# FINANCE 781: Exam 2024, Numerical Questions

### Question 1


Let y = [4,6,3,6,10] and x = [3,6,9,4,9]. 

If we perform an ordinary least squares (OLS) regression of y on x, what is the root mean squared error (RMSE) on the training data?


In [None]:
import numpy as np
from sklearn.linear_model import LinearRegression

# Given data
y = np.array([4, 6, 3, 6, 10])
x = np.array([3, 6, 9, 4, 9]).reshape(-1, 1)

# Fit the OLS regression model
model = LinearRegression().fit(x, y)

# Predict the y values
y_pred = model.predict(x)

# Calculate the residuals
residuals = y - y_pred

# Compute the RMSE
rmse = np.sqrt(np.mean(residuals**2))
rmse
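
As a cross-check on the scikit-learn result, the same RMSE can be obtained from the closed-form simple-regression formulas. This is a minimal sketch in plain Python; the variable names (`s_xy`, `s_xx`, `rmse_closed_form`) are illustrative and not part of the exam material. Like the cell above, it divides the sum of squared residuals by n rather than by n - 2.

```python
# Closed-form simple OLS in plain Python (no external libraries).
x = [3, 6, 9, 4, 9]
y = [4, 6, 3, 6, 10]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope = S_xy / S_xx, intercept = y_bar - slope * x_bar
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# In-sample residuals and RMSE (mean of squared residuals, then square root)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
rmse_closed_form = (sum(r ** 2 for r in residuals) / n) ** 0.5

print(round(rmse_closed_form, 4))  # approximately 2.2826
```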

### Question 2

You define PCA_3 as a function that applies principal components analysis (PCA) to a dataset and then retains the first 3 principal components as a new dataset.

Consider that you apply PCA_3 to a dataset comprised of stock market returns from companies listed on the S&P 500 index, and you label the resulting dataset df_PC.  

If the first principal component in df_PC explains 0.13 of the total variance in the original dataset, what is the maximum proportion of total variance that all 3 principal components in df_PC could explain?

In [None]:
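# Principal components are ordered by explained variance, so the 2nd and 3rd
# components can each explain at most as much as the 1st (0.13).
max_variance = 0.13*3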

max_variance

### Question 3

The following training data consists of two features ("momentum" and "investment") and a target variable ("return"):

 

        momentum            investment           return                                
          0.04                0.03                0.07                                
          0.06                0.05                0.08                                
          0.02                0.03                0.05                                
          0.05                0.05                0.06                                
          0.04                0.06                0.05                                
          0.08                0.09                0.11                                
          0.06                0.08                0.06

 

The following regression tree was constructed using the training data above (trained to a maximum depth of 2).

![Image description](image.png)

Based on the tree above and the training data provided, what is the predicted return for a new observation with momentum of 0.07 and investment of 0.06?

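The new observation (momentum = 0.07, investment = 0.06) satisfies the following splits, which define its leaf:

Investment ≤ 0.085

Momentum > 0.055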

The training observations that meet these criteria are:

Momentum = 0.06, Investment = 0.05, Return = 0.08
Momentum = 0.06, Investment = 0.08, Return = 0.06

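A regression tree predicts the mean target value of the training observations in the leaf, so the predicted return is the average return for these observations: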


In [None]:
ave_return = (0.08+0.06)/2

ave_return
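
The leaf lookup above can also be reproduced by filtering the training table directly. This is a minimal sketch: the split thresholds (0.085 and 0.055) are taken from the criteria quoted in the solution, and the helper `in_leaf` is a hypothetical name introduced only for illustration.

```python
# Training data as (momentum, investment, return) tuples.
training_data = [
    (0.04, 0.03, 0.07),
    (0.06, 0.05, 0.08),
    (0.02, 0.03, 0.05),
    (0.05, 0.05, 0.06),
    (0.04, 0.06, 0.05),
    (0.08, 0.09, 0.11),
    (0.06, 0.08, 0.06),
]

def in_leaf(momentum, investment):
    """Leaf reached by the new observation: Investment <= 0.085 and Momentum > 0.055."""
    return investment <= 0.085 and momentum > 0.055

# Confirm that the new observation actually lands in this leaf.
new_momentum, new_investment = 0.07, 0.06
assert in_leaf(new_momentum, new_investment)

# The prediction is the mean return of the training observations in the leaf.
leaf_returns = [r for m, i, r in training_data if in_leaf(m, i)]
predicted_return = sum(leaf_returns) / len(leaf_returns)

print(leaf_returns, round(predicted_return, 4))
```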

### Question 4

Audacious Capital Partners LLC (ACP) is a hedge fund that takes directional bets based on the expected profitability of stocks.

ACP has been developing a neural network that uses R&D Spending and Total Assets to predict the Next Quarter Profit.

The neural network architecture is displayed below. It uses ReLU activation for both the Hidden Layer and the Output Layer.


![Image description](image-2.png)

Unfortunately, ACP has had the misfortune of recently hiring an intern.

While under the impression that they were working on a test model, the intern added code to the production model that changed all the connection weights (w1 to w6) to 0 and all the biases (b1 to b3) to 1.

This updated model was applied to the following data:

 

    R&D Spending	    Total Assets	    Next Quarter Profit
        300	            2300	            211
        200	            4500	            239
        350	            3100	            222
        420	            3800	            191
 

Calculate the RMSE (Root Mean Square Error) of the neural network predictions applied to the data above.

In [None]:
import numpy as np

# Actual Next Quarter Profit values
actual_profits = np.array([211, 239, 222, 191])

# Predicted profits (all are 1 due to the modified model)
predicted_profits = np.ones_like(actual_profits)

# Calculate the RMSE
rmse = np.sqrt(np.mean((actual_profits - predicted_profits) ** 2))
rmse
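
To see why every prediction equals 1, the forward pass can be written out explicitly. The sketch below assumes a 2-input, 2-hidden-unit, 1-output layout, which is inferred from the six weights and three biases mentioned in the question; the assignment of w1 to w6 to particular connections is an assumption and does not affect the result, because all weights are zero.

```python
def relu(z):
    return max(0.0, z)

def forward(rd_spending, total_assets, w, b):
    """Forward pass for an assumed 2-2-1 network with ReLU on both layers."""
    w1, w2, w3, w4, w5, w6 = w
    b1, b2, b3 = b
    h1 = relu(w1 * rd_spending + w2 * total_assets + b1)
    h2 = relu(w3 * rd_spending + w4 * total_assets + b2)
    return relu(w5 * h1 + w6 * h2 + b3)

weights = [0.0] * 6   # the intern's change: all weights set to 0
biases = [1.0] * 3    # the intern's change: all biases set to 1

# (R&D Spending, Total Assets, Next Quarter Profit)
data = [(300, 2300, 211), (200, 4500, 239), (350, 3100, 222), (420, 3800, 191)]

nn_predictions = [forward(rd, ta, weights, biases) for rd, ta, _ in data]
squared_errors = [(actual - pred) ** 2
                  for (_, _, actual), pred in zip(data, nn_predictions)]
nn_rmse = (sum(squared_errors) / len(squared_errors)) ** 0.5

print(nn_predictions, round(nn_rmse, 3))
```

With zero weights the inputs never reach the output, so each hidden unit emits ReLU(1) = 1 and the output unit emits ReLU(0 + 0 + 1) = 1.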


### Question 5

You have built a neural network using Long Short-Term Memory (LSTM) layers to generate one-step-ahead forecasts of Apple's monthly stock returns.

For this regression task, you have decided on a network with three hidden layers and a lookback period of 12 months. 

The hidden layers are comprised of 23, 14, and 5 LSTM units, respectively.

The output layer is a fully connected (i.e., dense) layer with 1 unit and a linear activation function.

Following Gu, Kelly, and Xiu (2020, Review of Financial Studies), you have gathered 176 predictor variables to predict the stock returns.

How many trainable parameters does the model have?

In [None]:
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, LSTM
from tensorflow.keras.models import Model


def build_lstm_network(lookback, n_features):

    inputs = Input(shape=(lookback, n_features))
    x = LSTM(23, return_sequences=True)(inputs)
    x = LSTM(14, return_sequences=True)(x)
    x = LSTM(5, return_sequences=False)(x)
    outputs = Dense(1, activation='linear')(x)

    model = Model(inputs=inputs, outputs=outputs)

    # Compile the model
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss='mean_squared_error')

    return model

model = build_lstm_network(12,176)

model.summary()
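
The total reported by `model.summary()` can be checked by hand without TensorFlow. The sketch below assumes the standard LSTM cell used by Keras by default (four gates, each with input weights, recurrent weights, and a bias, no peephole connections); the helper names `lstm_params` and `dense_params` are hypothetical. Note that the lookback period of 12 does not enter the count, since the same weights are reused at every time step.

```python
def lstm_params(units, input_dim):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    """Fully connected layer: one weight per input plus one bias per unit."""
    return units * (input_dim + 1)

n_features = 176
lstm_units = [23, 14, 5]

layer_counts = []
input_dim = n_features
for units in lstm_units:
    layer_counts.append(lstm_params(units, input_dim))
    input_dim = units  # the next layer receives this layer's hidden state

layer_counts.append(dense_params(1, input_dim))
total_params = sum(layer_counts)

print(layer_counts, total_params)  # [18400, 2128, 400, 6] 20934
```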

### Question 6

You wish to use a neural network model to forecast the one-day-ahead realized volatility (RV) of the S&P 500 index using rolling windows.

You expect the neural network model to capture the persistence of the RV series.

You have preprocessed the data and stored it in the file 'predictors_and_volatility_data.csv'.

You have also defined your model in the function 'model_definition()' in the Python file 'ml_model.py', which you can import into the current script with the 'import ml_model' command.

You have started this project by writing the following code:



In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

import ml_model

df = pd.read_csv('predictors_and_volatility_data.csv')

window_size = 1000
reg_periods = df.shape[0] - window_size

predictions = []

for n in range(len(df) - reg_periods - 1):
    
    df_rw = df.iloc[n:reg_periods + n, :].copy()
    
    train_data, validation_data = train_test_split(df_rw, test_size=0.2)

    model = ml_model.model_definition()
    model.fit(train_data.drop("Target", axis=1),
              train_data["Target"],
              validation_data=(validation_data.drop("Target", axis=1), 
                               validation_data["Target"]),
              batch_size=1000,
              epochs=20
              )

    X_test = df.iloc[reg_periods + n: 
                     reg_periods + n + 1, :].drop("Target", axis=1)

    y_pred = model.predict(X_test)

    predictions.append(y_pred.flatten()[0])

Which line in the code will most likely lead to suboptimal results?

You may assume that the batch size, number of epochs, and rolling window length are optimal for the problem.

The line most likely leading to suboptimal results is line 18:

In [None]:
train_data, validation_data = train_test_split(df_rw, test_size=0.2)

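Because train_test_split() shuffles the rows by default, the validation set is drawn at random from within each rolling window, so the model is validated on observations that precede some of its training observations. This breaks the temporal ordering on which the persistence of the RV series depends and makes the validation loss an unreliable guide for time series forecasting. Passing shuffle=False keeps the split chronological.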
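
A chronological split keeps every validation observation after every training observation. The following is a minimal sketch in plain Python, using a hypothetical helper `chronological_split` on a list of time indices; it is meant to mirror what `train_test_split(df_rw, test_size=0.2, shuffle=False)` does, not to replace it.

```python
import math

def chronological_split(rows, validation_fraction=0.2):
    """Keep the earliest rows for training and the latest rows for validation."""
    n_val = math.ceil(len(rows) * validation_fraction)
    split_point = len(rows) - n_val
    return rows[:split_point], rows[split_point:]

# Toy rolling window of 10 time-ordered observations (0 = oldest, 9 = newest).
window = list(range(10))
train_idx, validation_idx = chronological_split(window)

print(train_idx)       # [0, 1, 2, 3, 4, 5, 6, 7]
print(validation_idx)  # [8, 9]
```

With a shuffled split, by contrast, the validation indices could fall anywhere in the window, including before most of the training data.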

### Question 7

Adapting the methodology from Britten-Jones (1999, Journal of Finance), you wish to calculate optimal portfolio weights by rescaling to 100% the coefficients of an l1 penalized linear regression of excess stock returns on a vector of 1’s.

You have the following code, which is incomplete because the 'reg = # [place the correct model here]' expression is missing.

Keeping all else equal, what is the expected return if you implement an l1 penalized linear regression with a penalization term of 0.0005? Note that the optimal portfolio allows short and long positions in the underlying assets.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.linear_model import LinearRegression, Lasso
from sklearn.model_selection import train_test_split

# Set random seed for reproducibility
np.random.seed(42)

# Parameters for the simulation
n_assets = 20       # Number of assets
n_periods = 252     # Number of trading days in a year
mu = 0.0005         # Mean daily return (0.05% per day, roughly 12.6% annually)
sigma = 0.01        # Daily volatility (1%)

# Simulate daily returns for each asset
excess_returns = np.random.normal(loc=mu, scale=sigma, size=(n_periods, n_assets))
excess_returns = pd.DataFrame(excess_returns, columns=[f"Asset_{i+1}" for i in range(n_assets)])
excess_returns['Target'] = 1

# training-validation split
train_data, validation_data = train_test_split(excess_returns, test_size=0.2, shuffle=False)

reg = Lasso(fit_intercept=False, alpha=0.0005, positive=False).fit(train_data.drop("Target", axis=1),
                                                                   train_data["Target"])

# Original coefficients
original_coefs = reg.coef_
# Rescale the coefficients so that they sum to 1 (i.e., 100%)
rescaled_coefs = original_coefs / np.sum(original_coefs)

expected_return = np.sum(validation_data.drop("Target", axis=1).dot(rescaled_coefs))

print(expected_return)
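
The rescaling step can be illustrated on a toy coefficient vector. The numbers below are hypothetical and are not output from the Lasso fit; they only show that dividing by the sum yields weights that add up to 100% while preserving short (negative) positions and any coefficients the l1 penalty has set to zero.

```python
# Hypothetical regression coefficients (NOT taken from the fitted model).
hypothetical_coefs = [40.0, -10.0, 0.0, 20.0]

# Rescale so the weights sum to 1 (i.e., 100%).
coef_sum = sum(hypothetical_coefs)
portfolio_weights = [c / coef_sum for c in hypothetical_coefs]

print(portfolio_weights)       # [0.8, -0.2, 0.0, 0.4]
print(sum(portfolio_weights))  # sums to 1 up to floating-point rounding
```

The rescaling is undefined when the coefficients sum to zero, and it becomes unstable when the sum is close to zero, which can happen when long and short positions nearly offset each other.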