# Problem description

Deep learning neural networks are capable of modeling a process by learning from a given dataset. That is, a neural network is a function $\hat{y}=f(x|W)$, parameterized by a set of trainable weights $W$, that can approximate a wide class of functions.

If a given neural network can be trained to mimic the behaviour of some time series $s(t)$, we can use that network to predict future values of the series. More formally, we want to perform univariate one-step forecasting on a dataset of daily close prices to capture the behaviour of a particular stock, that is, to predict the close at time $t$ from the $l$ closes that precede it:

\begin{align}
\hat{s}(t) = f(s(t-1),s(t-2),...,s(t-l)|W)\\
\end{align}

Here, $W$ is the set of trainable weights that the network uses to map a feature vector (input) to a target (output). We represent the training data built from the daily close prices as follows,

\begin{align}
d = \{(\mathbf{x}_i,y_i)\}^{n}_{i=0}\\
\end{align}

The dataset has $n+1$ tuples $(\mathbf{x}_i,y_i)$, where $\mathbf{x}_i=[s(t-l), \dots, s(t-2), s(t-1)]$ is the $\textit{i-th}$ feature vector, holding the $l$ closes that precede time $t$ ordered from oldest to newest, and $y_i=s(t)$ is the corresponding scalar target. The parameter $l$ (the window length) controls how many past samples are used for each forecast.

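Note that $d$ is not the original time series $s(t)$: we first have to re-frame the raw series into a dataset suitable for supervised learning by sliding a window of length $l$ over it, which yields the feature matrix $X$ and the target vector $Y$: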

\begin{equation}
X = 
\begin{bmatrix}
s_0 & s_1 & s_2 & \cdots & s_{l-1}\\
s_1 & s_2 & s_3 &\cdots & s_{l}\\
s_2 & s_3 & s_4 & \cdots & s_{l+1}\\
\vdots & \vdots & \vdots & \ddots & \vdots \\
s_{m-l} & s_{m-l+1} & s_{m-l+2} & \cdots & s_{m-1}\\
\end{bmatrix} ,
%
Y = 
\begin{bmatrix}
s_l\\
s_{l+1}\\
s_{l+2}\\
\vdots\\
s_{m}
\end{bmatrix}
\end{equation}


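Here, the original series has $m+1$ samples $s_0, s_1, \dots, s_m$ (zero-indexed), so $m$ is the index of the last sample. Each row of $X$ is a feature vector $\mathbf{x}_i=[s_i, \dots, s_{i+l-1}]$ and the matching entry of $Y$ is its target $y_i=s_{i+l}$, for $i=0,\dots,m-l$; hence $n=m-l$.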
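To make the one-step formulation $\hat{s}(t) = f(s(t-1),\dots,s(t-l)|W)$ concrete, the sketch below computes a forecast for every $t \ge l$ from the *observed* window of the $l$ previous values, which is exactly what one row of $X$ holds. The function name `one_step_forecasts`, the toy close prices, and the "persistence" model that stands in for the trained network (it ignores $W$ and repeats the most recent close) are all hypothetical, chosen only for illustration.

```python
from typing import Callable, List, Sequence


def one_step_forecasts(s: Sequence[float], l: int,
                       f: Callable[[List[float]], float]) -> List[float]:
    """Return [ŝ(l), ŝ(l+1), ..., ŝ(m)] for the series s = [s_0, ..., s_m].

    Each ŝ(t) is computed from the observed window [s(t-l), ..., s(t-1)],
    ordered oldest to newest like a row of X; predictions are never fed back.
    """
    preds = []
    for t in range(l, len(s)):
        window = list(s[t - l:t])  # the l observed values preceding time t
        preds.append(f(window))
    return preds


# Hypothetical toy closes and a "persistence" stand-in for f: ŝ(t) = s(t-1).
closes = [10.0, 10.5, 10.2, 10.8, 11.0]
print(one_step_forecasts(closes, l=2, f=lambda window: window[-1]))
# [10.5, 10.2, 10.8]
```

Because every forecast uses observed lags only, errors do not accumulate from one step to the next; a recursive multi-step forecast, which feeds predictions back into the window, would behave differently.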

# Libraries

```python
import torch                    # Package with data structures for tensors and their mathematical operations.
import torch.nn as nn           # Package for building and training neural networks.
```

# Make dataset
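A minimal sketch of the sliding-window re-framing described above, assuming the closes are already available as a 1-D sequence or tensor. The function name `make_dataset` is hypothetical; it relies on PyTorch's `Tensor.unfold(dimension, size, step)` to extract every window of $l+1$ consecutive values, then splits each window into $l$ features and one target.

```python
import torch


def make_dataset(series, l: int):
    """Re-frame a 1-D series s_0, ..., s_m into a supervised dataset (X, Y).

    Row i of X is [s_i, ..., s_{i+l-1}] and Y[i] = s_{i+l}, for i = 0, ..., m - l,
    so X has shape (m - l + 1, l) and Y has shape (m - l + 1, 1).
    """
    series = torch.as_tensor(series, dtype=torch.float32)
    if series.dim() != 1:
        raise ValueError("expected a 1-D series")
    if not 0 < l < len(series):
        raise ValueError("window length l must satisfy 0 < l < len(series)")
    windows = series.unfold(0, l + 1, 1)  # every length-(l+1) window, step 1
    X = windows[:, :l]                    # first l values of each window: features
    Y = windows[:, l:]                    # last value of each window: target (column)
    return X, Y


# Usage with a stand-in series s_0, ..., s_4 = 0, 1, 2, 3, 4 and l = 2.
X, Y = make_dataset(torch.arange(5, dtype=torch.float32), l=2)
# X rows: [0, 1], [1, 2], [2, 3];  Y: [2], [3], [4]
```

Keeping `Y` as a column vector matches the shape of a network with a single output neuron, which avoids broadcasting surprises when comparing predictions with targets.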

# Model

We start with the simplest architecture for a multi-dimensional input: a "vanilla" multi-layer perceptron with three fully connected layers of nodes (input, hidden, output). Note that the `ANN` class allows a custom number of hidden layers and neurons per hidden layer, as well as a choice of non-linear transformation (activation function) in the hidden layers.


![vanilla_MLP](figures/stocks_vanilla_MLP.png)
*Multi-dimensional input fully connected neural network with $p$ perceptrons or neurons in the hidden layer. The nodes represent the inputs, activations or outputs and the edges represent the weights and biases. The biases are assumed to be zero for simplicity. The yellow circles represent a linear transformation while the blue rectangle represents a non-linear transformation (activation function).*

The number of neurons in the input layer corresponds to the number of features the model uses (the window length $l$), while the number of nodes in the output layer is the number of samples being forecast (one, for one-step forecasting). The number of neurons in the hidden layer controls the capacity of the model. A model with too many hidden neurons can be too complex for the dataset and thus overfit (cite). In contrast, a model with too few hidden neurons can fail to capture the structure of the data and underfit (cite).
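One way to see how the hidden-layer size drives model complexity is to count trainable parameters. The sketch below builds the $l \to p \to 1$ architecture with `nn.Sequential` (the helper names `vanilla_mlp` and `n_params` are hypothetical, not part of the `ANN` class). Unlike the figure, `nn.Linear` includes biases by default, so the count is $lp + p$ for the first layer plus $p + 1$ for the output layer, i.e. $p(l+2)+1$ in total, growing linearly with $p$.

```python
import torch
import torch.nn as nn


def vanilla_mlp(l: int, p: int) -> nn.Sequential:
    """Hypothetical helper: l input features -> p tanh hidden neurons -> 1 output."""
    return nn.Sequential(
        nn.Linear(l, p),  # input -> hidden: l*p weights + p biases
        nn.Tanh(),        # non-linear activation of the hidden layer
        nn.Linear(p, 1),  # hidden -> output: p weights + 1 bias
    )


def n_params(model: nn.Module) -> int:
    """Total number of trainable scalars in the model."""
    return sum(w.numel() for w in model.parameters() if w.requires_grad)


l = 5  # window length, i.e. number of input features (illustrative value)
for p in (2, 10, 100):
    print(f"p={p:3d}: {n_params(vanilla_mlp(l, p))} parameters")
# p=  2: 15 parameters
# p= 10: 71 parameters
# p=100: 701 parameters

# A batch of 4 windows maps to 4 one-step forecasts.
print(vanilla_mlp(l, 10)(torch.zeros(4, l)).shape)  # torch.Size([4, 1])
```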

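```python
class ANN(nn.Module):
    """Fully connected network with one nn.Linear per consecutive pair in Layers.

    Layers = [l, p_1, ..., p_k, 1] gives l inputs, k hidden layers and one output.
    """
    def __init__(self, Layers, activation=torch.tanh):
        super().__init__()
        self.activation = activation  # non-linearity applied after each hidden layer
        # nn.ModuleList registers every layer so its weights appear in .parameters().
        self.hidden = nn.ModuleList()
        for input_size, output_size in zip(Layers, Layers[1:]):
            self.hidden.append(nn.Linear(input_size, output_size))

    def forward(self, x):
        last = len(self.hidden) - 1
        for layer, linear_transform in enumerate(self.hidden):
            x = linear_transform(x)
            if layer < last:
                x = self.activation(x)  # no activation on the (linear) output layer
        return x
```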

# Training function