# Building A Prediction Model using Neural Networks

Part 2 of Q2 focuses on neural networks. Specifically, you will build a neural network regression model that predicts stock returns from the financial ratios reported in earnings announcements. Through this short exercise, you will become familiar with the basics of neural networks using [scikit-learn](https://scikit-learn.org/stable/), an open-source machine learning library for Python. The data were obtained from [WRDS](https://wrds-www.wharton.upenn.edu/) and pre-processed for this exercise.

## Neural Networks

Neural networks are flexible ML algorithms commonly used for both classification and regression, particularly in tasks characterized by significant non-linearities and complex interactions among the variables in the feature set. In `sklearn`, neural network models are based on the **multi-layer perceptron (MLP)** architecture.

The MLP algorithm learns a function $f(\cdot):R^m \rightarrow R^o$ by training on a dataset with an $m$-dimensional input $X = x_1, x_2, ..., x_m$ and an $o$-dimensional output $y$. The basic structure of an MLP (taken from the [scikit-learn](https://scikit-learn.org/stable/) website) is shown below:

<img src="https://scikit-learn.org/stable/_images/multilayerperceptron_network.png" width="400">

The leftmost layer is the *input* layer, which consists of a set of neurons $\{x_1, x_2, ..., x_m\}$ representing the input features.

Between the input layer and the output layer are the *hidden layers*. Each neuron in a hidden layer transforms the values from the previous layer into a weighted linear summation $w_1 x_1 + w_2 x_2 + ... + w_m x_m$.

An *activation function* $g(\cdot):R \rightarrow R$ then transforms this sum, and the result is passed on to the next layer (in the diagram, the output layer). The diagram above has only one hidden layer, whereas many practical neural networks have two or more hidden layers, each feeding its output to the next one. This layer-by-layer transmission from input to output is called *forward propagation*.
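To make forward propagation concrete, here is a minimal NumPy sketch of a network with one hidden layer, as in the diagram. It is an illustration rather than scikit-learn's implementation: the random weights, the layer sizes, and the bias vectors (the formulas above omit biases) are assumptions. The output layer is kept linear, which matches how `MLPRegressor` treats its output for regression.

```python
import numpy as np


def forward(x, weights, biases, activation=np.tanh):
    """Forward propagation through a small MLP.

    weights[i] has shape (n_in, n_out) and biases[i] has shape (n_out,).
    Hidden layers apply `activation`; the output layer is left linear.
    """
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        # weighted linear summation, then the activation function g
        a = activation(a @ W + b)
    # output layer: weighted summation only (identity output)
    return a @ weights[-1] + biases[-1]


rng = np.random.default_rng(0)
m, n_hidden, o = 3, 4, 1        # 3 inputs, one hidden layer of 4 neurons, 1 output
weights = [rng.normal(size=(m, n_hidden)), rng.normal(size=(n_hidden, o))]
biases = [np.zeros(n_hidden), np.zeros(o)]

x = rng.normal(size=(5, m))     # 5 samples with m features each
y_hat = forward(x, weights, biases)
print(y_hat.shape)  # (5, 1)
```

Adding more hidden layers only means appending more `(W, b)` pairs; the loop feeds each layer's output into the next one.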

During training, the network's predictions are compared with the actual values of the labeled data, and a performance measure (a loss function) quantifies the network's error. The weights in each layer are then adjusted to reduce this error, using gradients computed by *backpropagation*, and the process is repeated for a specified number of iterations. In this way the network gradually learns weights whose predictions better fit the training set. The *learning rate* controls the size of the adjustment made at each iteration.
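The training loop described above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions: a hypothetical toy dataset ($y = \sin x$ plus noise), one `tanh` hidden layer of 10 neurons, a mean-squared-error loss, and ordinary full-batch gradient descent. scikit-learn's `adam` and `sgd` solvers are more elaborate, but the predict / measure error / adjust weights cycle is the same.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical toy regression data: y = sin(x) plus a little noise
X = rng.uniform(-2, 2, size=(200, 1))
y = np.sin(X) + 0.1 * rng.normal(size=(200, 1))

n_hidden = 10
W1 = rng.normal(scale=0.5, size=(1, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_hidden, 1))
b2 = np.zeros(1)
learning_rate = 0.05             # size of each weight adjustment

losses = []
n = X.shape[0]
for it in range(3000):
    # Forward propagation
    a1 = np.tanh(X @ W1 + b1)
    pred = a1 @ W2 + b2
    losses.append(float(np.mean((pred - y) ** 2)))   # mean squared error

    # Backpropagation: gradient of the MSE with respect to each parameter
    d_pred = 2 * (pred - y) / n          # dL/dpred, shape (n, 1)
    dW2 = a1.T @ d_pred                  # shape (n_hidden, 1)
    db2 = d_pred.sum(axis=0)             # shape (1,)
    d_z1 = (d_pred @ W2.T) * (1 - a1 ** 2)   # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ d_z1                     # shape (1, n_hidden)
    db1 = d_z1.sum(axis=0)               # shape (n_hidden,)

    # Gradient-descent update, scaled by the learning rate
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

print(f"initial MSE {losses[0]:.4f}, final MSE {losses[-1]:.4f}")
```

A larger `learning_rate` takes bigger steps and can converge faster, but if it is too large the loss can oscillate or diverge; a smaller one is more stable but needs more iterations.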

Regarding the hyperparameters of the neural network, we can specify the activation function applied at each hidden layer, such as `relu` or `tanh`, as well as the *solver* used for weight optimization, such as `lbfgs`, `sgd`, or `adam`. We can also specify the maximum number of iterations and the learning rate of the training process. For the learning rate, we can set both the initial value and whether it stays constant or is adaptive, i.e., reduced whenever the training loss stops decreasing (in `sklearn`, this learning-rate schedule is only used by the `sgd` solver).
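As a sketch of where these hyperparameters go in `MLPRegressor`, the snippet below fits a small model on hypothetical synthetic data (the data, layer sizes, and values are illustrative assumptions, not recommendations). Note that `learning_rate` (`'constant'`, `'invscaling'`, or `'adaptive'`) only has an effect with `solver='sgd'`, while `learning_rate_init` applies to both `sgd` and `adam`.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))                          # hypothetical synthetic features
y = X[:, 0] - 2 * X[:, 1] ** 2 + 0.1 * rng.normal(size=300)

model = MLPRegressor(hidden_layer_sizes=(50, 50),      # two hidden layers
                     activation='relu',                # g(.) in the hidden layers
                     solver='sgd',                     # weight-optimization method
                     learning_rate='adaptive',         # schedule (sgd only)
                     learning_rate_init=0.001,         # initial step size
                     max_iter=500,                     # maximum number of epochs
                     random_state=0)
model.fit(X, y)

print(model.n_layers_)          # input + 2 hidden + output = 4
print(model.n_iter_, model.loss_)
```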

Let's first import the libraries. We will use the scikit-learn package to build the neural network in Python. If you are not using the Anaconda distribution, you might need to follow the instructions on [this page](https://scikit-learn.org/stable/install.html) to install the correct version for your system.

In [None]:
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV

Load in the first dataset on daily stock returns.

In [None]:
security = pd.read_csv("PSet2Q2data_stock.csv")

security

Load in the second dataset on firm financial ratios.

In [None]:
ratios = pd.read_csv("PSet2Q2data_firm.csv")

ratios

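Merge the two datasets on `public_date`, drop every column that contains missing values (`dropna(axis=1)` removes columns, not rows), and remove redundant columns. The return `ret` is copied into a new last column, `ret2`, which serves as the target.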

In [None]:
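# Inner-join on the reporting date, then drop every column with a missing value
merged = ratios.merge(security, on='public_date').dropna(axis=1)
# Copy the return into a new last column, ret2, which serves as the target
merged['ret2'] = merged['ret']
# Drop the original return column and the id/date columns
merged = merged.drop(['ret', 'id', 'date'], axis=1)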

merged
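The effect of these three steps is easiest to see on a tiny example. The frames below are hypothetical miniatures of the two inputs (the column names `pe_ratio` and `debt_ratio` and all values are made up); they show that `dropna(axis=1)` removes the whole `debt_ratio` column because it contains one missing value, and that the target ends up as the last column.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature versions of the two datasets
ratios = pd.DataFrame({'public_date': ['2020-01-31', '2020-02-29'],
                       'pe_ratio': [15.2, 16.1],
                       'debt_ratio': [0.4, np.nan]})      # one missing value
security = pd.DataFrame({'public_date': ['2020-01-31', '2020-02-29'],
                         'id': [1, 2],
                         'date': ['2020-01-31', '2020-02-29'],
                         'ret': [0.012, -0.008]})

merged = ratios.merge(security, on='public_date').dropna(axis=1)  # drops the debt_ratio column
merged['ret2'] = merged['ret']                                     # target moved to the end
merged = merged.drop(['ret', 'id', 'date'], axis=1)
print(merged.columns.tolist())   # ['public_date', 'pe_ratio', 'ret2']
```

Calling `dropna()` with the default `axis=0` would instead keep every column and drop the rows that contain missing values.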

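Partition the dataset into a training set (roughly 90% of the data) and a test set (roughly 10%). Because stock returns are time-series data, only past information may be used for training. The rows are therefore sorted by date and split without shuffling, so that the test set contains the most recent observations.

In [None]:
# Sort chronologically (assumes public_date can be parsed by pd.to_datetime)
merged = merged.sort_values('public_date', key=pd.to_datetime).reset_index(drop=True)
x = merged.iloc[:, 3:-1]   # features: all columns except the first three and the target
y = merged.iloc[:, -1]     # target: ret2, the last column

# shuffle=False: the last 10% of rows (the most recent dates) form the test set
x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.1, shuffle=False)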
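The difference between a chronological and a shuffled split can be checked on a small hypothetical series (20 consecutive dates, made-up feature and return values). With `shuffle=False`, `train_test_split` keeps the row order, so every training date precedes every test date; with the default `shuffle=True`, the test rows are scattered across the sample.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical series that is already sorted by date
df = pd.DataFrame({'date': pd.date_range('2015-01-01', periods=20, freq='D'),
                   'feature': np.arange(20.0),
                   'ret': np.linspace(-0.01, 0.01, 20)})
x, y = df[['feature']], df['ret']

# Chronological split: 18 training rows, then the last 2 rows as the test set
x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.1, shuffle=False)
print(df.loc[x_tr.index, 'date'].max() < df.loc[x_te.index, 'date'].min())  # True

# Default shuffled split: test rows are drawn from random positions in time
_, x_te_shuffled, _, _ = train_test_split(x, y, test_size=0.1, random_state=0)
print(sorted(x_te_shuffled.index))
```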

The multi-layer perceptron is highly sensitive to feature scaling, so the data must be pre-processed and scaled before training to obtain satisfactory results. We use `StandardScaler` to standardize each feature in the feature set to mean 0 and variance 1. The scaler is fitted on the training set only, so that the means and variances used for scaling carry no information from the test set.

In [None]:
scaler = StandardScaler()

scaler.fit(x_tr)
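What `StandardScaler` computes can be verified by hand on hypothetical data: each column is transformed as $z = (x - \mu_{\text{train}}) / \sigma_{\text{train}}$, where the mean and the (population, `ddof=0`) standard deviation come from the training set and are reused for the test set. The two made-up features below deliberately live on very different scales.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Hypothetical features on very different scales
x_train = rng.normal(loc=[5.0, -200.0], scale=[2.0, 50.0], size=(100, 2))
x_test = rng.normal(loc=[5.0, -200.0], scale=[2.0, 50.0], size=(10, 2))

scaler = StandardScaler().fit(x_train)   # learns per-column mean and std from training data
z_train = scaler.transform(x_train)
z_test = scaler.transform(x_test)        # test data reuse the training mean and std

# Same transformation written out by hand
manual = (x_test - x_train.mean(axis=0)) / x_train.std(axis=0)
print(np.allclose(z_test, manual))       # True
print(z_train.mean(axis=0).round(6), z_train.std(axis=0).round(6))
```

Only the training columns are guaranteed to have mean 0 and variance 1; the scaled test columns will be close to, but not exactly, standardized.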

Next, use the scaler to scale the x variables in both the training and test sets.

In [None]:
#<GRADED>

#</GRADED>

### Training - Neural Network
Now we can begin the training process. First, let *clf* be an MLP regression model with 4 hidden layers of sizes 250, 150, 150, and 100. It uses the `tanh` activation function in the hidden layers and the `adam` solver for weight optimization, runs for at most 300 iterations, and fixes `random_state=1` so that the results are reproducible.

In [None]:
clf = MLPRegressor(hidden_layer_sizes=(250, 150, 150, 100),
                   max_iter=300, activation='tanh',
                   solver='adam', random_state=1)
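To see what `hidden_layer_sizes=(250, 150, 150, 100)` means in terms of weights, the sketch below fits the same architecture for a few iterations on hypothetical random data with 8 features (an assumption; with the real data, the first weight matrix has one row per feature). The fit is only there to create `coefs_` and `intercepts_`, so a `ConvergenceWarning` is expected.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))     # hypothetical: 8 input features
y = rng.normal(size=50)

net = MLPRegressor(hidden_layer_sizes=(250, 150, 150, 100), max_iter=5,
                   activation='tanh', solver='adam', random_state=1)
net.fit(X, y)   # a few iterations suffice to create the weight matrices

# coefs_[i] holds the weights between layer i and layer i + 1
print([W.shape for W in net.coefs_])
# [(8, 250), (250, 150), (150, 150), (150, 100), (100, 1)]
n_params = sum(W.size for W in net.coefs_) + sum(b.size for b in net.intercepts_)
print(n_params)    # 77751
print(net.n_layers_)   # 6: input + 4 hidden + output
```

Even with only 8 inputs this network has tens of thousands of trainable parameters, which is one reason the choice of architecture matters for a modest-sized dataset.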

Fit the model on the training dataset.

In [None]:
#<GRADED>

#</GRADED>

Next, use the fitted model to create your predictions and calculate the score of the model.

In [None]:
#<GRADED>

#</GRADED>
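For regressors such as `MLPRegressor`, `score` returns the coefficient of determination $R^2 = 1 - SS_{\text{res}} / SS_{\text{tot}}$. The values below are hypothetical returns and predictions, used only to show that the manual formula and `sklearn.metrics.r2_score` agree. $R^2$ equals 0 when the model does no better than always predicting the mean, and it becomes negative when the model does worse than that.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical actual and predicted returns
y_true = np.array([0.02, -0.01, 0.03, 0.00, -0.02])
y_pred = np.array([0.01, 0.00, 0.01, 0.01, 0.00])

ss_res = np.sum((y_true - y_pred) ** 2)            # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # total sum of squares
r2_manual = 1 - ss_res / ss_tot
print(r2_manual, r2_score(y_true, y_pred))

# Predicting the opposite sign is worse than predicting the mean: negative R^2
print(r2_score(y_true, -y_true))
```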

The score of our current model is very low, so let us look for a better one. Grid search performs hyperparameter optimization: it trains the model on every combination of the specified hidden-layer sizes, activation functions, solvers, learning rates, and other hyperparameters, and identifies the combination that gives the best results.

First, we specify the grid of hyperparameters to train our neural network model over.

In [None]:
param_grid = {
    'hidden_layer_sizes': [(250,150,150,100), (100,100,100,100), (150,100,100,50), (250,150,50), (150,100,50)],
    'max_iter': [300, 500],
    'activation': ['tanh', 'relu'],
    'solver': ['sgd', 'adam'],
    'alpha': [0.0001, 0.005],                    # strength of the L2 regularization penalty
    'learning_rate': ['constant', 'adaptive'],   # only used by the 'sgd' solver
}
# 5-fold cross-validation over every combination, using all available CPU cores
grid = GridSearchCV(clf, param_grid, n_jobs=-1, cv=5)
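It helps to know how much work this grid implies before running it. The sketch below repeats the grid so that it is self-contained and counts the candidates with `ParameterGrid`: 160 combinations, or 800 model fits with `cv=5` (plus one final refit). Because `learning_rate` is ignored by `adam`, a list-of-dicts grid, which `GridSearchCV` also accepts, is one way to skip the duplicate `adam` candidates; this pruned grid is a suggestion, not part of the original exercise.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    'hidden_layer_sizes': [(250, 150, 150, 100), (100, 100, 100, 100),
                           (150, 100, 100, 50), (250, 150, 50), (150, 100, 50)],
    'max_iter': [300, 500],
    'activation': ['tanh', 'relu'],
    'solver': ['sgd', 'adam'],
    'alpha': [0.0001, 0.005],
    'learning_rate': ['constant', 'adaptive'],
}
n_candidates = len(ParameterGrid(param_grid))
print(n_candidates, n_candidates * 5)   # 160 800

# Hypothetical pruned grid: vary learning_rate only where it is used (sgd)
common = {k: v for k, v in param_grid.items() if k not in ('solver', 'learning_rate')}
pruned_grid = [
    {**common, 'solver': ['sgd'], 'learning_rate': ['constant', 'adaptive']},
    {**common, 'solver': ['adam']},
]
print(len(ParameterGrid(pruned_grid)))   # 120
```

Also note that `cv=5` splits the training rows into five consecutive folds, so most fits are validated on data that precede some of their training data. If strictly chronological validation is wanted, scikit-learn's `TimeSeriesSplit` (for example `cv=TimeSeriesSplit(n_splits=5)`) keeps each validation fold after its training folds.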

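Next, we fit the grid search on the training set. For each combination of hyperparameters, `GridSearchCV` runs 5-fold cross-validation on the training data; `best_params_` reports the combination with the highest mean validation score, and by default the model is then refit on the whole training set with those hyperparameters.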

In [None]:
grid.fit(x_tr, y_tr)

print(grid.best_params_) 
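The main attributes of a fitted `GridSearchCV` object can be explored on a much smaller hypothetical problem (synthetic data and a 4-candidate grid chosen only so that it runs quickly). `best_score_` is the mean cross-validated $R^2$ of the best candidate, not a training score, and with the default `refit=True`, `predict` and `score` on the search object delegate to `best_estimator_`.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))        # hypothetical synthetic data
y = X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=120)

search = GridSearchCV(MLPRegressor(max_iter=500, random_state=1),
                      {'hidden_layer_sizes': [(10,), (20, 10)],
                       'alpha': [0.0001, 0.01]},
                      cv=3)
search.fit(X, y)

print(search.best_params_)   # candidate with the highest mean validation R^2
print(search.best_score_)    # that mean validation R^2
# The refit best model is what search.predict uses
print(np.allclose(search.predict(X), search.best_estimator_.predict(X)))  # True
```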

Use the fitted model from the grid search to predict y. Store the predictions in *grid_pred*, and output the score for the model.

In [None]:
#<GRADED>

#</GRADED>

Briefly describe the model that gave the best results in the hyperparameter optimization, giving as much detail as possible about the specific hyperparameter values.

In [None]:
#<GRADED>
```

```
#</GRADED>