# Neural Network
Recurrent neural networks (RNNs) suffer from the well-known vanishing / exploding gradient problem. It arises because the recurrence reuses the same weight matrix at every time step. Repeated multiplication by that matrix drives the gradient towards zero as the number of steps increases when the weights have magnitude below one, and makes it explode when they have magnitude above one. Note that this is most pronounced in RNNs rather than in feed-forward deep NNs (DNNs), because DNNs can have a different weight matrix between each pair of layers, allowing diverging and vanishing effects to partly cancel (although very deep DNNs are not immune). In the example of a simple neural net with no hidden layers and no activation function, unrolled over $n$ steps,

\begin{align}
    x^{(n)} = W^n x^{(0)},
\end{align}

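the entries of $W^n$ vanish as $n$ grows if every eigenvalue of the square weight matrix $W$ has magnitude below one, and explode if any eigenvalue has magnitude above one. In the scalar case, $x^{(n)} = w^n x^{(0)}$, this reduces to $|w| < 1$ and $|w| > 1$ respectively.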
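A quick numerical check of the scalar case of this recurrence is sketched below. The helper name `repeated_weight` and the chosen weights are illustrative assumptions; the point is only that the magnitude of the weight relative to one, not its sign, decides whether the value vanishes or explodes.

```python
def repeated_weight(w, steps, x0=1.0):
    """Apply x <- w * x `steps` times: the scalar case of x^(n) = W^n x^(0)."""
    x = x0
    for _ in range(steps):
        x = w * x
    return x


# Weights with |w| < 1 shrink towards zero, |w| > 1 blow up,
# and a negative sign only makes the value alternate in sign.
for w in (0.5, 1.0, 1.5, -0.5, -1.5):
    print(f"w = {w:+.1f}: x after 50 steps = {repeated_weight(w, 50):.3e}")
```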

During backpropagation, the values used to update the weights in the RNN are called the gradients. These vanish by this mechanism as the sequence length increases, so workarounds have been developed, which leads us to LSTMs and GRUs.
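For the linear example above, the effect on the gradient can be written out explicitly. As a sketch, assume $W$ is a $k \times k$ matrix that can be diagonalised as $W = Q \Lambda Q^{-1}$ with eigenvalues $\lambda_i$ (an assumption made here for illustration). The sensitivity of the final state to the initial one is then

\begin{align}
    \frac{\partial x^{(n)}}{\partial x^{(0)}} = W^n = Q \Lambda^n Q^{-1}, \qquad \Lambda^n = \mathrm{diag}(\lambda_1^n, \dots, \lambda_k^n),
\end{align}

so components with $|\lambda_i| < 1$ decay geometrically in $n$, while components with $|\lambda_i| > 1$ grow geometrically.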

## LSTM & GRU
A long short-term memory (LSTM) cell is the building block of a gated recurrent neural network architecture, in which gates act as mechanisms that determine what a cell ought to forget or remember. The vanishing gradient problem is mitigated in these cells by learned gate values that weight how much information is carried forward from previous steps in the sequence. This is a short explanation of our choice of cell for our NN. A comprehensive tutorial that best explains the whole theory can be found [here](https://skymind.ai/wiki/lstm) or [here](https://colah.github.io/posts/2015-08-Understanding-LSTMs/).

Replacing the hidden units of a normal neural network with LSTM cells introduces a new connection between cells called the *cell state*. This is indicated as the line from $C_{t-1}$ to $C_t$ below.

![image_title](https://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-C-line.png)

The full LSTM cell structure includes a configuration of sigmoid ($\sigma$) and hyperbolic tangent ($\tanh$) functions to handle memory and forgetting. These images from [colah's](https://colah.github.io/posts/2015-08-Understanding-LSTMs/) blog (also linked as the tutorial above) are pretty common in LSTM NN tutorials.

![image_title](https://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-chain.png)
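The diagram can be read as a handful of equations. Below is a minimal sketch of one LSTM step in plain Python for a single input feature and a single hidden unit, following the standard formulation described in colah's tutorial. The function name `lstm_step`, the parameter dictionary keys, and the weight values are hypothetical and untrained; a real layer uses weight matrices rather than scalars.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p):
    """One step of a scalar LSTM cell (one input feature, one hidden unit)."""
    # Forget gate: fraction of the old cell state to keep (0 = forget, 1 = keep).
    f_t = sigmoid(p["w_f"] * x_t + p["u_f"] * h_prev + p["b_f"])
    # Input gate: fraction of the new candidate value to write.
    i_t = sigmoid(p["w_i"] * x_t + p["u_i"] * h_prev + p["b_i"])
    # Candidate value proposed for the cell state.
    c_tilde = math.tanh(p["w_c"] * x_t + p["u_c"] * h_prev + p["b_c"])
    # Cell state update: the C_{t-1} -> C_t line in the diagram.
    c_t = f_t * c_prev + i_t * c_tilde
    # Output gate: how much of the squashed cell state is exposed as h_t.
    o_t = sigmoid(p["w_o"] * x_t + p["u_o"] * h_prev + p["b_o"])
    h_t = o_t * math.tanh(c_t)
    return h_t, c_t


# Arbitrary illustrative parameters (not trained values).
params = {
    "w_f": 0.5, "u_f": 0.1, "b_f": 1.0,
    "w_i": 0.4, "u_i": 0.2, "b_i": 0.0,
    "w_c": 0.9, "u_c": 0.3, "b_c": 0.0,
    "w_o": 0.6, "u_o": 0.1, "b_o": 0.0,
}

h, c = 0.0, 0.0
for x in [0.2, -0.1, 0.4]:
    h, c = lstm_step(x, h, c, params)
    print(f"x = {x:+.1f} -> h = {h:+.4f}, C = {c:+.4f}")
```

Because the cell state is updated additively (`f_t * c_prev + i_t * c_tilde`), a forget gate close to one lets information pass through many steps almost unchanged, which is how the cell eases the vanishing gradient problem.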

Since the LSTM cell was introduced, several thousand variant NN architectures have been explored. An empirical comparative study of such architectures, <cite data-cite="6461367/TRDBDLCG"></cite>, indicates that the gated recurrent unit (GRU) outperforms the LSTM on all tasks except language modelling. Since we are looking at stock time-series data from the S&P 500, we will compare a GRU's performance with that of an LSTM cell.

*Note that to run TensorFlow without errors you might want to set up TensorFlow on your GPU if you aren't using a cloud shell or a Kaggle Kernel / Notebook (Kaggle includes free GPU usage). A quick tutorial from a StackOverflow thread on allowing TensorFlow GPU access can be found [here](https://stackoverflow.com/questions/51306862/how-to-use-tensorflow-gpu).*

But do we need knowledge of these diagrams? Do we need to implement each of these sigmoid / hyperbolic tangent activation functions? 

No. Keras has many of these cells pre-built: we simply tell it the dimensions of our layers and define the shape and size of our input data, and Keras will do the rest as we sit back and watch our model train. We just need to know which cell is best for our application, and why certain errors and other artifacts may arise in the results.
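As a rough illustration of how little code this takes, the sketch below builds one LSTM model and one GRU model with `tf.keras`. The window length `TIMESTEPS`, the feature count `N_FEATURES`, the number of units, and the helper name `build_model` are all assumptions chosen for illustration, not the settings used later on.

```python
import tensorflow as tf

TIMESTEPS = 30   # assumed look-back window (number of past time steps per sample)
N_FEATURES = 1   # assumed: a single feature per time step, e.g. a closing price


def build_model(cell="lstm", units=32):
    """Build a small single-layer recurrent regressor with the chosen cell type."""
    rnn_layer = {"lstm": tf.keras.layers.LSTM, "gru": tf.keras.layers.GRU}[cell]
    model = tf.keras.Sequential([
        # Input shape is (time steps, features); the batch dimension is implicit.
        tf.keras.Input(shape=(TIMESTEPS, N_FEATURES)),
        rnn_layer(units),
        # One output value per sample, e.g. the next price in the series.
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model


lstm_model = build_model("lstm")
gru_model = build_model("gru")
lstm_model.summary()
gru_model.summary()
```

Swapping the cell type is a one-word change, which makes a like-for-like comparison between the two straightforward.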