<h1>One-day Stock Price Forecasting Using Deep Neural Networks</h1>

<h1>Table of Contents</h1>

<ul>
    <li><a href="#Description">Problem Description</a></li>
    <li><a href="#Data Preparation">Data Preparation</a></li>
    <li><a href="#Model">Model</a></li>
    <li><a href="#Train">Training</a></li>
    <li><a href="#Results">Results</a></li>
    <li><a href="#Source">Source code</a></li>
</ul>
<hr>

<h2 href="#Problem Description">Problem Description</h2>

Deep neural networks can model a process by learning from a given dataset. That is, a neural network is a function $\hat{y}=f(x|W)$ with a set of trainable parameters $W$ that can be used to approximate a wide class of functions.

If a given neural network can be trained to mimic the behaviour of some time series $s(t)$, we can use that network to make predictions for a given time. More formally, we want to perform univariate one-step forecasting on a dataset of daily close prices to capture the behaviour of a particular stock.

\begin{align}
\hat{s}(t) = f(s(t-1),s(t-2),...,s(t-l)|W)\\
\end{align}

Here, $W$ is the set of trainable weights that are used inside the network to map a feature vector to a target (input to output). We represent the daily close prices as follows,

\begin{align}
d = \{(\mathbf{x}_i,y_i)\}^{n-1}_{i=0}\\
\end{align}

The dataset has $n$ tuples $(\mathbf{x}_i,y_i)$, where $\mathbf{x}_i=[s(t-1), s(t-2), ..., s(t-l)]$ is the $\textit{i-th}$ feature vector and $y_i=s(t)$ is the corresponding scalar target. The parameter $l$ controls the number of past samples used for each forecast.

Note that $d$ is not the original time series $s(t)$. We first have to re-frame the raw data set as one suitable for a supervised learning problem by transforming the univariate time series into a multivariate one via time delay embedding,

\begin{equation}
X = 
\begin{bmatrix}
s_0 & s_1 & s_2 & \cdots & s_{l-1}\\
s_1 & s_2 & s_3 &\cdots & s_{l}\\
s_2 & s_3 & s_4 & \cdots & s_{l+1}\\
\vdots & \vdots & \vdots & \ddots & \vdots \\
s_{m-l} & s_{m-l+1} & s_{m-l+2} & \cdots & s_{m-1}\\
\end{bmatrix} 
\end{equation}

Here, $m$ is the index of the last sample in the original data set and the feature vectors are the rows of $X$. Similarly, the targets are the following scalar entries,

\begin{equation}
Y = 
\begin{bmatrix}
s_l\\
s_{l+1}\\
s_{l+2}\\
\vdots\\
s_{m}
\end{bmatrix}
\end{equation}
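
The embedding above can be sketched in a few lines of plain Python. This is only an illustration of the construction of $X$ and $Y$, not the data preparation script linked under Source code; the function name `embed_series` and the toy prices are assumptions made for the example.

```python
def embed_series(series, lags):
    """Return (X, Y) for one-step forecasting with `lags` past samples per row.

    Row i of X is [s_i, ..., s_{i+lags-1}] and Y[i] is s_{i+lags}.
    """
    if lags < 1 or lags >= len(series):
        raise ValueError("lags must satisfy 1 <= lags < len(series)")
    n_rows = len(series) - lags
    X = [list(series[i:i + lags]) for i in range(n_rows)]
    Y = [series[i + lags] for i in range(n_rows)]
    return X, Y


# Toy series s_0..s_5 (so m = 5) embedded with l = 3 lags
prices = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
X, Y = embed_series(prices, lags=3)
for row, target in zip(X, Y):
    print(row, "->", target)
```

With $m=5$ and $l=3$ this gives $m-l+1=3$ rows, the last one being $[s_2, s_3, s_4]$ with target $s_5$, which matches the last rows of $X$ and $Y$.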

<h2 href="#Data Preparation">Data Preparation</h2>

We need to build the supervised learning dataset and wrap it in a `Dataset` class that works with PyTorch's `DataLoader`.

```python
import pandas as pd
import torch
from sklearn.model_selection import train_test_split
from torch.utils.data import Dataset


class DataSupervised(Dataset):
    '''
        Custom dataset to work with DataLoader for supervised learning.

        file_name (str): Path to csv file.
        target_cols (int): Number of steps (forecasts), one by default (last column).
        train (bool): Train split (first 80% of the rows) or test split (last 20%), True by default.
    '''

    def __init__(self, file_name, target_cols=1, train=True):

        stock_supervised = pd.read_csv(file_name).values
        # shuffle=False keeps the rows in time order, so the test rows come after the training rows
        X_train, X_test = train_test_split(stock_supervised, test_size=0.2, shuffle=False)
        split = X_train if train else X_test
        # The last target_cols columns are the targets; slicing keeps Y two-dimensional
        self.X = torch.FloatTensor(split[:, :-target_cols])
        self.Y = torch.FloatTensor(split[:, -target_cols:])
        self.n_samples = self.X.shape[0]

    def __getitem__(self, index):
        return self.X[index], self.Y[index]

    def __len__(self):
        return self.n_samples
```

<h2 href="#Model">Model</h2>

One of the simplest architectures for multi-dimensional inputs is the [multi-layer perceptron](https://en.wikipedia.org/wiki/Multilayer_perceptron). It is composed of an input layer, a variable number of fully connected hidden layers, and an output layer.

<p>The number of neurons in the input layer corresponds to the number of features the model uses, while the number of nodes in the output layer tells us how many samples we are forecasting. The number of neurons in the hidden layer translates to the complexity of the model. A model with too many hidden neurons could result in a model that is too complex for the dataset and thus lead to overfitting. In contrast, a model with too few hidden neurons can fail to capture the complexity of the data and result in underfitting.</p>

<p>We define our neural network as a class that allows for a custom number of hidden layers and a custom number of neurons per hidden layer. The non-linear transformation in the hidden layers is tanh.</p>

```python
import torch
import torch.nn as nn


class ANN(nn.Module):
    """Artificial Neural Network

        MLP with tanh activation function for the hidden layers and linear transformation
        for the output layer by default.

        Layers -- (list) Numbers of neurons in each layer.
    """

    def __init__(self, Layers):
        super(ANN, self).__init__()
        self.hidden = nn.ModuleList()

        # One linear map per pair of consecutive layer sizes
        for input_size, output_size in zip(Layers, Layers[1:]):
            linear = nn.Linear(input_size, output_size)
            self.hidden.append(linear)

    def forward(self, x):
        layers = len(self.hidden)
        for layer, linear_transform in enumerate(self.hidden):
            if layer < layers - 1:
                x = torch.tanh(linear_transform(x))  # hidden layers: tanh
            else:
                x = linear_transform(x)              # output layer: linear
        return x
```

![vanilla_MLP](figures/stocks_vanilla_MLP.png)
*Multi-dimensional input fully connected neural network with $p$ perceptrons or neurons in a single hidden layer. The nodes represent the inputs, activations or outputs and the edges represent the weights and biases. The biases are assumed to be zero for simplicity. The yellow circles represent a linear transformation while the blue rectangle represents a non-linear transformation (activation function).*
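
To make the link between layer sizes and model complexity concrete, the sketch below counts the trainable parameters of a fully connected network from its list of layer sizes. The helper `count_parameters` and the layer sizes are assumptions for illustration. Unlike the figure, the count includes one bias per neuron, since `nn.Linear` uses a bias by default.

```python
def count_parameters(layers):
    """Number of weights and biases in a fully connected network.

    Each pair of consecutive layers (n_in, n_out) contributes
    n_in * n_out weights plus n_out biases.
    """
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))


# Hypothetical sizes: 5 lagged prices in, two hidden layers, one forecast out
small = count_parameters([5, 10, 10, 1])
wide = count_parameters([5, 100, 100, 1])
print(small, wide)  # 181 10801
```

Going from 10 to 100 neurons per hidden layer multiplies the parameter count by roughly 60, which is why wide networks can overfit a few thousand daily samples.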

<h2 href="#Train">Training</h2>

To train the network there are a couple of things we need to define:
* The loss function
* How the data gets fed to the network

<p>We are modeling a time series via non-linear regression, so the mean squared error (MSE) is a good starting point for measuring the performance of the network.</p>

\begin{align}
\mathit{MSE}:=\frac{1}{n} \sum_{i=0}^{n-1} (\mathbf{y}_{i}-\hat{\mathbf{y}}_{i})^2 
\end{align}

<p>Where $\hat{\mathbf{y}}_{i}$ is the i-th estimated value. Notice that the definition above averages over all training samples for a given epoch.</p>
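
As a quick worked example of the definition, the sketch below evaluates the MSE on four made-up prices and forecasts. The function `mse` and the numbers are assumptions for illustration; in the training code this role is played by the `criterion` argument.

```python
def mse(targets, predictions):
    """Mean squared error between two equally long sequences."""
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(targets, predictions)) / len(targets)


# Made-up close prices and forecasts
y = [100.0, 102.0, 101.0, 105.0]
y_hat = [101.0, 100.0, 101.0, 103.0]
print(mse(y, y_hat))  # (1 + 4 + 0 + 4) / 4 = 2.25
```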

<p>For very large datasets it is sometimes convenient to feed the training samples in batches to avoid computing the gradient over all samples at once. The batch gradient is an approximation that approaches the true gradient as the batch size tends to the size of the whole dataset. As a consequence, the MSE is computed for a given batch.</p>

<p>Lastly, we shuffle the training samples at each epoch to increase the robustness of the model. This makes it unlikely that the model is fed the same batches in later epochs.</p>

[What is batch size in neural networks](https://stats.stackexchange.com/questions/153531/what-is-batch-size-in-neural-network) <br>
[Why should the data be shuffled for machine learning tasks](https://datascience.stackexchange.com/questions/24511/why-should-the-data-be-shuffled-for-machine-learning-tasks) <br>
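
The batching and per-epoch shuffling described above can be sketched without any framework. This only mimics conceptually what a `DataLoader` created with `shuffle=True` does; the helper `epoch_batches`, the seed, and the sizes are assumptions for illustration.

```python
import random


def epoch_batches(n_samples, batch_size, rng):
    """Shuffle sample indices and split them into consecutive batches."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i:i + batch_size] for i in range(0, n_samples, batch_size)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
for epoch in range(2):
    batches = epoch_batches(n_samples=10, batch_size=4, rng=rng)
    print("epoch", epoch, batches)
```

Every epoch still visits each sample exactly once, but the composition of the batches changes. Note that the last batch is smaller when the batch size does not divide the number of samples, so the mean of the per-batch losses is only approximately the loss over all samples.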

```python
def train_series(train_loader, model, criterion, optimizer, epochs=10, test_loader=None, display_batch=False):
    """Train and test (optional) a time-series model, returns the loss at each epoch.

    train_loader -- DataLoader object with the training dataset.
    model -- Neural Network to be trained.
    criterion -- Loss function.
    optimizer -- optimization algorithm to update the network weights.
    epochs -- Number of forward and backward passes on the whole dataset.
    test_loader -- DataLoader object with the test dataset (default None).
    display_batch -- Display epoch, batch index and batch length (default False).
    """

    model.train()
    results = {'training loss': [], 'validation error': []}   # one entry per epoch
    for epoch in range(epochs):

        total = 0.0  # sum of the batch losses in this epoch
        for batch_idx, (x, y) in enumerate(train_loader):
            if display_batch:
                print('epoch {}, batch idx {} , batch len {}'.format(epoch, batch_idx, len(y)))
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
            total += loss.item()  # cumulative loss over batches
        # Divide by the number of batches, not by the last batch index
        results['training loss'].append(total / len(train_loader))  # ~ loss over all training samples

        if test_loader is not None:
            results['validation error'].append(test_series(test_loader, model, criterion))

    return results
```

We also define the auxiliary method for computing the error on a test set.

```python
import torch


def test_series(test_loader, model, criterion):
    """Test a time-series model, returns the error.

    test_loader -- DataLoader object with the test dataset.
    model -- Neural Network to be evaluated.
    criterion -- Loss function.
    """

    model.eval()
    error = 0.0
    with torch.no_grad():  # no gradients are needed for evaluation
        for x, y in test_loader:
            error += criterion(model(x), y).item()   # cumulative error over batches
    model.train()  # restore training mode for the caller

    # Divide by the number of batches, not by the last batch index
    return error / len(test_loader)  # ~ error over all testing samples
```

<h2 href="#Results">Results</h2>

We test the model on the [chart](https://finance.yahoo.com/quote/ANET/?guccounter=1&guce_referrer=aHR0cHM6Ly93d3cuZ29vZ2xlLmNvbS8&guce_referrer_sig=AQAAACcYE76pxfIMIu7x_2P-BSD9ufvEL1Gjz1_QKIR6uRffsIMUP_xzVzuOGJusIHMIbajTyvL-8LbgoetbBKZ1Cl6TfQToIJ5gV4x3DG9aOT3lALbP59gjIJZmmihz52uicQWfUuMDu5jnlMXgH4t3BXl_CduQBEgS-FzwuyFF_4vd) for [Arista Networks](https://www.arista.com/en/) (ANET) for the dates of 06/06/2014 to 08/18/2020. The train/test split follows an 80/20 split to mimic a production setting where we wish to predict prices in the future. Splitting the data randomly can lead to look-ahead bias.
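
A chronological split of this kind can be sketched as follows. The helper `chronological_split` and the day indices are assumptions for illustration. With scikit-learn, `train_test_split` shuffles by default, and passing `shuffle=False` gives the same kind of ordered split.

```python
def chronological_split(rows, test_fraction=0.2):
    """Split time-ordered rows into (train, test) without shuffling."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    n_test = max(1, round(len(rows) * test_fraction))
    cut = len(rows) - n_test
    return rows[:cut], rows[cut:]


# Stand-in for time-ordered samples: the integers are day indices
days = list(range(10))
train_days, test_days = chronological_split(days)
print(train_days, test_days)
```

Every test day comes after every training day, so the model is never evaluated on a date that precedes data it was trained on.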

```python
model = ANN(Layers=[in_features, 10, 10, out_features])
```

![Result](figures/train_test_error_1_10_10_1.png)

* Candles Plot of stock
* Plot of estimated values

<h2 href="#Source">Source code</h2>

* [Python script](https://github.com/Randomized-Neurais-Learners/FinML/blob/master/Forecasting/StocksANN.py)
* [Data request](https://github.com/Randomized-Neurais-Learners/FinML/blob/master/Data/getHistoricalDaily.py)
* [Data preparation](https://github.com/Randomized-Neurais-Learners/FinML/blob/master/Data/series_to_supervised.py)
* [FinML module](https://github.com/Randomized-Neurais-Learners/FinML/blob/master/FinML.py)