**You are free to consult any online source for syntax, conceptual understanding, or any other help you need, but please write your own code.**

For the last problem (problem 4), don't worry if your results are not great. We are mainly looking at your approach to handle a new problem. Just put in your best effort and you will be fine.

# Problem 1 (10 Points)

Implement gradient descent to find the local minimum of the function:

\begin{gather}
f(x,y,z) = x^2 + y^2 + z^2 + x - 2y - 3z - e^{(-x^2 - y^2 - z^2)}
\end{gather}

Write the function from scratch; that is, calculate the partial derivatives beforehand. You may assume a reasonable learning rate such as 0.001, or use your own judgement. Use a stopping condition to decide whether the iteration has converged, and choose a maximum number of iterations.
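As an illustration of the loop structure and stopping condition only, here is a minimal sketch on a toy objective, $g(x, y) = (x - 1)^2 + (y + 2)^2$, which is deliberately not the assignment's function. The names `gradient_descent` and `toy_grad`, the tolerance `tol`, and the gradient-norm stopping rule are assumptions chosen for this example; other stopping rules (for instance, a small change in function value) are equally reasonable.

```python
import math

def gradient_descent(grad, start, lr=0.001, tol=1e-8, max_iters=100_000):
    """Follow the negative gradient until its norm falls below `tol`.

    Returns (point, iterations_used, converged_flag).
    """
    point = list(start)
    for iteration in range(1, max_iters + 1):
        g = grad(point)
        grad_norm = math.sqrt(sum(gi * gi for gi in g))
        if grad_norm < tol:  # stopping condition: gradient is (almost) zero
            return point, iteration, True
        # Gradient step: move against the gradient, scaled by the learning rate
        point = [p - lr * gi for p, gi in zip(point, g)]
    return point, max_iters, False  # iteration budget exhausted

# Toy objective (NOT the assignment's function): g(x, y) = (x - 1)^2 + (y + 2)^2
def toy_grad(p):
    x, y = p
    return [2 * (x - 1), 2 * (y + 2)]

minimum, iters, converged = gradient_descent(toy_grad, start=[0.0, 0.0], lr=0.1)
print(minimum, iters, converged)
```

For the assignment itself, the hand-derived partial derivatives of $f(x,y,z)$ would take the place of `toy_grad`.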

# Problem 2 (10 Points) Predictive Model 

Please build a model from the given cleaned data in predictive.csv to predict values of future data. Test your predictions on 10 randomly generated data points in the x1 domain (you may go a few percent outside the actual range). You need to report all the relevant performance metrics (MAE, MSE, RMSE, R2 score).
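For reference, the four metrics can be computed directly from their definitions. The sketch below uses only the standard library; the function name `regression_metrics` and the sample numbers are made up for illustration and are unrelated to predictive.csv. scikit-learn's `sklearn.metrics` module offers equivalent functions (`mean_absolute_error`, `mean_squared_error`, `r2_score`) if you prefer a library.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2 for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

# Made-up numbers, purely to exercise the function
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 9.0])
print(metrics)
```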

# Problem 3 (20 Points) Create a neural network model for classification 

Assume your coworker has cleaned a dataset and is giving you the data in data_classification.csv to build a model for classifying future data. Please build and test your model, and show that a linear classifier might not be suitable for this problem. Please document all your assumptions, provide relevant metrics (precision, recall, F1 score, ROC-AUC curve, PR curve), and plot the decision region.

# Problem 4 (70 Points + Bonus for Exceptional Results)

In this question, you are asked to analyze whether a time series model (RNN/GRU/LSTM) can give us a better estimate of future factor values. To be more concrete, you need to try some recurrent neural networks that use past values of the factors as input and the value of the factors at the next time interval as the target. You can try different designs for the network and its inputs, for example:

- The input is 1-dimensional (the future value of each factor depends only on its own past).
- The input is 5-dimensional (the future value of each factor depends on history of all factors).

You can either train 5 different neural networks to predict each of the 5 factors, or use one neural network to predict all 5 factors at once. You can also engineer more features as we discussed in the labs (SMA, EMA, etc.) and thus work with an even higher-dimensional input. Remember that this increases the number of parameters of your model and may make it more prone to overfitting. Start with simple features where you only use past values and then experiment with feature engineering choices. **Include only the model with simple features and any other more successful feature engineering choices in your submission.**
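Both input designs come down to slicing the factor history into (past window, next value) pairs. The sketch below shows one way to do this with NumPy, under stated assumptions: the helper name `make_windows`, the `lookback` of 12, and the random stand-in data are all hypothetical, and the real factor data would replace `fake_factors`.

```python
import numpy as np

def make_windows(series, lookback):
    """Turn a (T, n_factors) history into supervised pairs.

    Returns X with shape (T - lookback, lookback, n_factors) and
    y with shape (T - lookback, n_factors), where y[i] is the row
    that immediately follows the window X[i].
    """
    series = np.asarray(series, dtype=float)
    if series.ndim == 1:  # single factor: add a feature axis
        series = series[:, None]
    X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X, y

rng = np.random.default_rng(0)
fake_factors = rng.normal(size=(100, 5))  # synthetic stand-in for 5 factors

# 5-dimensional input: every factor sees the history of all factors
X_all, y_all = make_windows(fake_factors, lookback=12)
# 1-dimensional input: a factor sees only its own past
X_one, y_one = make_windows(fake_factors[:, 0], lookback=12)
print(X_all.shape, y_all.shape, X_one.shape, y_one.shape)
```

Build the windows separately within the training, validation, and testing periods (or split after windowing by target date) so that no target from a later period leaks into an earlier split.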

__Part 1 (Monthly Factor Data, 30 points + bonus for strong performance)__: 

__Step 1 (Model Building)__: 

Download the Fama-French 5-factor monthly [dataset](https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html).
    

Once you download the data, you will find that it contains a mix of annual and monthly factors and some text at the top, so a plain pd.read_csv call will raise an error (try looking at the CSV file in a text editor). To get the monthly data, you can do the following:

`df = pd.read_csv('F-F_Research_Data_5_Factors_2x3.csv', skiprows=3).iloc[:-59, :]`

Use 1970-2010 data for training data, 2010-2015 for validation, and 2015-2020 for testing. 

As you have seen in other courses, factor models are heavily used in asset pricing as well as in predicting future stock returns. For future stock return prediction, we typically have a separate model for factor loadings. By factor loadings, we mean the coefficients of the factors in the model. At time $t$, by taking the dot product of the best estimate of the factor loadings for the next time interval ($t+1$) and the best estimate of the factors at the next time interval ($t+1$), one can come up with an estimate of the future return of a single stock and create a long/short portfolio of stocks (long the stocks with positive predicted returns and short the ones with negative predicted returns). However, most of these predictive models simply use the current value of the factors as the best guess for the factor values at time $t+1$. An alternative choice is to use a moving average of factor values as an estimate of the factor values at the next time step.


Use data from 1970-2010 as training data, 2010-2015 as validation data, and 2015-2023 as testing data. You should tune the hyperparameters using the validation set only. After you have tuned the hyperparameters, use the model to make predictions on the testing data and report the accuracy and performance of the model. Compare the performance of the model with the two benchmarks mentioned above (last known value or moving average of previous values). Choose the window size for the moving average as you see fit.
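The two benchmarks need no training, only careful alignment of predictions with targets. Below is a minimal sketch on a tiny made-up series; the function names and the `window` value are assumptions for illustration. Both functions drop the first `window` targets so that the two benchmarks are scored on exactly the same points.

```python
import numpy as np

def last_value_benchmark(series, window):
    """Predict the value at t+1 as the observed value at t."""
    series = np.asarray(series, dtype=float)
    targets = series[window:]
    preds = series[window - 1:-1]
    return preds, targets

def moving_average_benchmark(series, window):
    """Predict the value at t+1 as the mean of the last `window` values up to t."""
    series = np.asarray(series, dtype=float)
    targets = series[window:]
    preds = np.array([series[t - window:t].mean(axis=0)
                      for t in range(window, len(series))])
    return preds, targets

def mse(preds, targets):
    return float(np.mean((preds - targets) ** 2))

# Tiny made-up series so the alignment is easy to check by hand
toy = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
lv_preds, lv_targets = last_value_benchmark(toy, window=3)
ma_preds, ma_targets = moving_average_benchmark(toy, window=3)
print(mse(lv_preds, lv_targets), mse(ma_preds, ma_targets))
```

The same functions accept a two-dimensional array of shape (T, 5), in which case each factor is forecast column by column.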

__Step 2 (Strategies)__ 

As you might have learnt from your past courses, all 5 of these factors are tradable portfolios by construction (for example, the SMB factor is a tradable portfolio that is long small caps and short big caps). Using the predictive model from the previous part, design a trading strategy. For example, you can long the factor with the highest estimated return and short the factor with the lowest estimated return, or use any other strategy of your choice. Calculate the total return and Sharpe ratio for the out-of-sample period.
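A minimal sketch of the two summary statistics follows, under assumptions you should state in your notebook: returns are per-period decimals, the Sharpe ratio is annualized with 12 periods per year (use a different value for daily data), the risk-free rate defaults to zero, and total return is compounded. The function names and sample returns are made up for illustration.

```python
import math

def sharpe_ratio(period_returns, periods_per_year=12, risk_free_per_period=0.0):
    """Annualized Sharpe ratio from per-period returns (sample standard deviation)."""
    excess = [r - risk_free_per_period for r in period_returns]
    n = len(excess)
    mean = sum(excess) / n
    variance = sum((r - mean) ** 2 for r in excess) / (n - 1)
    return mean / math.sqrt(variance) * math.sqrt(periods_per_year)

def total_return(period_returns):
    """Compounded return over the whole period (assumes profits are reinvested)."""
    growth = 1.0
    for r in period_returns:
        growth *= 1.0 + r
    return growth - 1.0

# Made-up monthly strategy returns, for illustration only
sample_returns = [0.01, 0.03, -0.01, 0.02]
sr = sharpe_ratio(sample_returns)
tr = total_return(sample_returns)
print(sr, tr)
```

If you trade a fixed notional each period instead of reinvesting, the total return is simply the sum of the per-period PNL divided by the starting cash.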

You can choose a starting cash amount such as 100,000 dollars and evaluate the performance of your trading strategy on this amount.

Suppose the factor value predicted for time step $t+1$ is $F_{t+1}$; you can then calculate the percentage return as $\frac{F_{t+1} - F_{t}}{F_{t}}$. You can similarly calculate the return of the true factors in the testing data.

To be more precise, if the vector of factor returns predicted by the neural network is $[0.1, 0.05, 0.07, 0.08, -0.09]$, then we will long the first factor and short the fifth factor. If the vector of factor returns in our testing dataset is $[0.05, 0.08, 0.54, 0.09, 0.02]$, then our PNL will be $100,000*(-0.02 + 0.05)$, which is equal to $3000$.
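The worked example above can be reproduced in a few lines. This is a sketch of the single-step long/short rule only; the function name `long_short_pnl` and the fixed `capital` argument are assumptions for illustration.

```python
def long_short_pnl(predicted_returns, realized_returns, capital=100_000):
    """Long the factor with the highest predicted return, short the lowest.

    PNL = capital * (realized return of long leg - realized return of short leg).
    """
    indices = range(len(predicted_returns))
    long_idx = max(indices, key=lambda i: predicted_returns[i])
    short_idx = min(indices, key=lambda i: predicted_returns[i])
    return capital * (realized_returns[long_idx] - realized_returns[short_idx])

predicted = [0.1, 0.05, 0.07, 0.08, -0.09]   # model output from the example
realized = [0.05, 0.08, 0.54, 0.09, 0.02]    # true returns from the example
pnl = long_short_pnl(predicted, realized)
print(round(pnl, 2))  # 3000.0
```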

You can sum this PNL across all time steps, which gives you the performance of the trading strategy on the entire testing (out-of-sample) data.

As you can see, the factors are themselves tradable, so we don't need to worry about factor loadings when calculating the PNL. Experiment with different trading strategies based on these factors.


__Part 2 (Daily Factor Data, 15 points + bonus for strong performance)__: 

Repeat the previous two steps by working on daily factor data (same [dataset](https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html)) and compare the approaches.

For daily data, the data contains header text. Set the parameter skiprows to 3 in pd.read_csv to work around this issue.

Feel free to use your judgement on this problem and make assumptions anywhere you get stuck; however, you need to clearly document your assumptions in the notebook. The problem is intentionally open-ended.

__Don't forget to use a lot of regularization to minimize the risk of overfitting__

__Part 3 (Transformer, 25 points + bonus for strong performance)__

Implement a Transformer model instead of an RNN/GRU/LSTM for Parts 1 and 2. You are welcome to use code from the labs as starter code, or you can use torch.nn.Transformer. Again, you can either train a transformer for each factor separately, or a single transformer for all of them. You need to choose a context window and engineer features appropriately, as discussed above and in the labs. Transformers are even more prone to overfitting than RNNs and LSTMs, so you need to be extra careful about regularization here. Also compute the number of parameters for each transformer model that you train. There are extra points for significant improvements over the baseline as well as over a typical RNN/LSTM model.

**Again, include only the model with simple features and any other more successful feature engineering choices in your submission.**