# Recurrent Neural Network

Can take a sequence of inputs with **no predetermined limit** on its length.  

Takes one or more input vectors and produces one or more output vectors. The output(s) are **influenced** not just by **weights** applied to the inputs, as in a regular NN, but also by a **hidden state** vector that represents the **context** accumulated from prior input(s)/output(s).  
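The recurrence can be sketched as follows. This is a minimal sketch that assumes the common "vanilla" (Elman) formulation $h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)$ and $y_t = W_{hy} h_t + b_y$. The weight values and the helper names (`rnn_step`, `run_rnn`) are hypothetical. What matters is that the same weights are reused at every step, so the loop handles a sequence of any length, and the same input can produce a different output depending on the hidden state.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def add(*vs: Vector) -> Vector:
    """Elementwise sum of equally sized vectors."""
    return [sum(xs) for xs in zip(*vs)]

def rnn_step(x: Vector, h_prev: Vector, W_xh: Matrix, W_hh: Matrix, b_h: Vector) -> Vector:
    """New hidden state: tanh(W_xh x + W_hh h_prev + b_h)."""
    return [math.tanh(z) for z in add(matvec(W_xh, x), matvec(W_hh, h_prev), b_h)]

def run_rnn(xs: List[Vector], W_xh, W_hh, b_h, W_hy, b_y):
    """Run the RNN over a sequence of any length; return per-step outputs and final h."""
    h = [0.0] * len(b_h)  # initial hidden state: no context yet
    outputs = []
    for x in xs:
        h = rnn_step(x, h, W_xh, W_hh, b_h)        # same weights at every step
        outputs.append(add(matvec(W_hy, h), b_y))  # y_t = W_hy h_t + b_y
    return outputs, h

# Hypothetical weights: 2-dim input, 3-dim hidden state, 1-dim output.
W_xh = [[0.5, -0.3], [0.1, 0.8], [-0.6, 0.2]]
W_hh = [[0.2, 0.0, -0.1], [0.0, 0.3, 0.1], [0.1, -0.2, 0.4]]
b_h = [0.0, 0.1, -0.1]
W_hy = [[1.0, -1.0, 0.5]]
b_y = [0.0]

short_seq = [[1.0, 0.0], [0.0, 1.0]]
long_seq = short_seq * 5  # 10 steps, same weights
ys_short, _ = run_rnn(short_seq, W_xh, W_hh, b_h, W_hy, b_y)
ys_long, h_final = run_rnn(long_seq, W_xh, W_hh, b_h, W_hy, b_y)
print(len(ys_short), len(ys_long))  # 2 10
```

In `long_seq`, steps 0 and 2 receive the same input `[1.0, 0.0]` but produce different outputs, because the hidden state carries context from the steps before them.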

![rnn](https://miro.medium.com/max/1260/1*aIT6tmnk3qHpStkOX3gGcQ.png)

![rnn2](https://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-unrolled.png)

![types](https://i.imgur.com/yweicB5.png)

## Deep RNN
Four possible ways to add depth:  
1) Stack **hidden states** one on top of another, feeding the output of one to the next.  
2) Add **nonlinear hidden layers** between the input and the hidden state.  
3) Increase depth in the **hidden-to-hidden** transition.  
4) Increase depth in the **hidden-to-output** transition.  
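The first option (stacking hidden states) can be sketched like this. It is a minimal sketch that assumes tanh cells and made-up weights. At every time step, layer 1's new hidden state becomes the input of layer 2. The names `rnn_step` and `run_stacked_rnn` are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def rnn_step(x: Vector, h_prev: Vector, W_xh: Matrix, W_hh: Matrix, b_h: Vector) -> Vector:
    """One tanh RNN step: tanh(W_xh x + W_hh h_prev + b_h)."""
    return [math.tanh(a + b + c) for a, b, c in zip(matvec(W_xh, x), matvec(W_hh, h_prev), b_h)]

def run_stacked_rnn(xs: List[Vector], layers) -> List[List[Vector]]:
    """layers: list of (W_xh, W_hh, b_h) tuples, bottom layer first.
    Returns the hidden states of every layer at every time step."""
    hs = [[0.0] * len(b_h) for _, _, b_h in layers]  # one hidden state per layer
    history = []
    for x in xs:
        inp = x
        for k, (W_xh, W_hh, b_h) in enumerate(layers):
            hs[k] = rnn_step(inp, hs[k], W_xh, W_hh, b_h)
            inp = hs[k]  # output of layer k feeds layer k + 1
        history.append([h[:] for h in hs])
    return history

# Hypothetical 2-layer stack: input dim 1 -> hidden dim 2 -> hidden dim 2.
layer1 = ([[0.7], [-0.4]], [[0.1, 0.2], [0.0, 0.3]], [0.0, 0.0])
layer2 = ([[0.5, 0.5], [-0.5, 0.5]], [[0.2, 0.0], [0.0, 0.2]], [0.1, -0.1])
history = run_stacked_rnn([[1.0], [0.5], [-1.0]], [layer1, layer2])
print(len(history), len(history[0]))  # 3 2  (time steps, layers)
```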

## Bidirectional RNN
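**Look into the future to fix the past.**  
Two RNNs read the sequence, one forward (past to future) and one backward (future to past). The output at each time step combines both hidden states, so it can use context from later in the sequence as well as earlier.  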

![bidirectional RNN](https://miro.medium.com/max/1260/1*4boTkuSnOzkVfsvatgYthQ.png)
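Here is a minimal sketch of that idea. It assumes the common scheme in which one tanh RNN runs left to right, a second RNN with its own weights runs right to left, and the two hidden states are concatenated at each position. The weights and function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def rnn_states(xs: List[Vector], W_xh: Matrix, W_hh: Matrix, b_h: Vector) -> List[Vector]:
    """Hidden state after each input of xs, processed in the given order."""
    h = [0.0] * len(b_h)
    states = []
    for x in xs:
        h = [math.tanh(a + b + c) for a, b, c in zip(matvec(W_xh, x), matvec(W_hh, h), b_h)]
        states.append(h)
    return states

def birnn(xs: List[Vector], fwd, bwd) -> List[Vector]:
    """Concatenate the forward state (past context) with the backward state (future context)."""
    forward = rnn_states(xs, *fwd)
    backward = rnn_states(xs[::-1], *bwd)[::-1]  # re-align to the original positions
    return [f + b for f, b in zip(forward, backward)]

# Hypothetical weights for each direction: input dim 1 -> hidden dim 2.
fwd = ([[0.6], [-0.2]], [[0.3, 0.0], [0.1, 0.2]], [0.0, 0.0])
bwd = ([[-0.4], [0.9]], [[0.2, 0.1], [0.0, 0.3]], [0.0, 0.0])
xs = [[1.0], [0.0], [-1.0]]
outputs = birnn(xs, fwd, bwd)
print(len(outputs), len(outputs[0]))  # 3 4  (positions, features per position)
```

Changing only the last input leaves the forward half of position 0 untouched but changes its backward half. That is the "look into the future" part.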

## Recursive Neural Network
The same transition (with shared weights) is applied repeatedly to the inputs, but **not necessarily in a sequential fashion**.  
It can operate on any hierarchical tree structure.  
It parses through the input nodes, combining child nodes into parent nodes and combining those with other child/parent nodes to build a tree-like structure.  

![recursive RNN](https://miro.medium.com/max/1260/1*IbpHou3FVc5Mfw4t6XzL6w.png)
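The bottom-up combination can be sketched as follows. This minimal sketch assumes a binary tree written as nested Python tuples (leaves are vectors) and a single shared composition layer $p = \tanh(W [c_1; c_2] + b)$ that merges two children into a parent of the same size. The tree, the word vectors, the weights, and the names `compose`/`encode` are all illustrative.

```python
import math
from typing import List, Tuple, Union

Vector = List[float]
Tree = Union[Vector, Tuple["Tree", "Tree"]]

# Hypothetical shared weights: map a concatenated pair of 2-dim children to a 2-dim parent.
W = [[0.5, -0.2, 0.3, 0.1],
     [0.0, 0.4, -0.3, 0.6]]
b = [0.0, 0.1]

def compose(left: Vector, right: Vector) -> Vector:
    """Parent = tanh(W [left; right] + b), using the same weights at every node."""
    children = left + right  # concatenation
    return [math.tanh(sum(w * c for w, c in zip(row, children)) + bias)
            for row, bias in zip(W, b)]

def encode(tree: Tree) -> Vector:
    """Recursively combine children into parents until one root vector remains."""
    if isinstance(tree, tuple):
        left, right = tree
        return compose(encode(left), encode(right))
    return tree  # a leaf is already a vector

# ((the cat) (sat down)) with made-up 2-dim word vectors
the, cat, sat, down = [0.1, 0.2], [0.9, -0.1], [0.3, 0.7], [-0.5, 0.4]
root = encode(((the, cat), (sat, down)))
print(len(root))  # 2: the root has the same size as every node
```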

## Encoder-Decoder Sequence-to-Sequence RNN
Used extensively in translation services.  
Two RNNs:  
- An **encoder**, which keeps updating its hidden state as it reads the input and produces a single final “context” output.  
- A **decoder**, which receives this context and translates it into a sequence of outputs.  

![edss RNN](https://miro.medium.com/max/1260/1*EtPN2quUtNhl156ebppRPQ.png)
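Below is a minimal sketch of the two-RNN setup. It assumes tanh cells, made-up weights, and a decoder that feeds each output back in as its next input for a fixed number of steps. Real translation systems typically generate tokens until an end-of-sequence symbol instead. `encode`, `decode`, and every weight are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def rnn_step(x: Vector, h: Vector, W_xh: Matrix, W_hh: Matrix, b_h: Vector) -> Vector:
    """One tanh RNN step: tanh(W_xh x + W_hh h + b_h)."""
    return [math.tanh(a + c + d) for a, c, d in zip(matvec(W_xh, x), matvec(W_hh, h), b_h)]

# Hypothetical encoder weights (input dim 1 -> hidden dim 2).
ENC = ([[0.8], [-0.5]], [[0.3, 0.1], [0.0, 0.4]], [0.0, 0.0])
# Hypothetical decoder weights (previous output dim 1 -> hidden dim 2) and readout layer.
DEC = ([[0.6], [0.2]], [[0.5, -0.1], [0.2, 0.3]], [0.0, 0.0])
W_hy, b_y = [[1.0, -0.7]], [0.0]

def encode(xs: List[Vector]) -> Vector:
    """Read the whole input sequence; the final hidden state is the context."""
    h = [0.0, 0.0]
    for x in xs:
        h = rnn_step(x, h, *ENC)
    return h

def decode(context: Vector, n_steps: int) -> List[Vector]:
    """Start from the context and generate n_steps outputs, feeding each one back in."""
    h, y = context, [0.0]  # [0.0] stands in for a start-of-sequence input
    outputs = []
    for _ in range(n_steps):
        h = rnn_step(y, h, *DEC)
        y = [a + c for a, c in zip(matvec(W_hy, h), b_y)]
        outputs.append(y)
    return outputs

context = encode([[1.0], [0.5], [-0.5], [0.2]])  # 4 input steps
translation = decode(context, n_steps=3)          # 3 output steps
print(len(context), len(translation))  # 2 3
```

The input and output lengths differ (4 vs. 3). The only thing passed between the two RNNs is the fixed-size context vector.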

## Vanishing Gradient

![vanishing gradient](https://i.imgur.com/5iQdihD.png)
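A small numerical sketch shows why the gradient vanishes. Assume a scalar tanh RNN $h_t = \tanh(w\,h_{t-1} + x_t)$. By the chain rule, $\partial h_T / \partial h_0 = \prod_{t=1}^{T} w\,(1 - h_t^2)$. When every factor has magnitude below 1, this product decays exponentially with the number of steps $T$, so early inputs have almost no influence on learning. The weight $w = 0.9$ and the constant input $0.5$ are made-up values.

```python
import math
from typing import List

def gradient_through_time(w: float, xs: List[float]) -> float:
    """d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1} + x_t), starting from h_0 = 0."""
    h, grad = 0.0, 1.0
    for x in xs:
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)  # chain rule: d h_t / d h_{t-1} = w * (1 - h_t^2)
    return grad

w = 0.9
for T in (1, 5, 10, 20, 50):
    print(T, gradient_through_time(w, [0.5] * T))
```

LSTMs, covered next, are designed to ease this problem. Their cell state is updated by addition instead of being passed through a squashing nonlinearity at every step.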

## Long Short-Term Memory Networks (LSTMs)
Capable of learning **long-term dependencies**.  

Like other RNNs, LSTMs have the form of a chain of repeating neural network modules. However, instead of a single neural network layer, each repeating module has four, interacting in a very specific way.

![lstm](https://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-chain.png)
![lstm legend](https://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM2-notation.png)

In the diagram above:  
- Each line carries an entire vector, from the output of one node to the inputs of others.  
- The pink circles represent pointwise operations, like vector addition, while the yellow boxes are learned neural network layers.  
- Lines merging denote concatenation, while a line forking denotes its content being copied and the copies going to different locations.  

### Step by Step

The key to LSTMs is the **cell state**, the **horizontal line running through the top** of the diagram.  

The cell state is kind of like a conveyor belt. It runs straight down the entire chain, with only some minor linear interactions. It’s very easy for information to just flow along it unchanged.  

1) Decide what information we’re going to throw away from the cell state. This decision is made by a sigmoid layer called the “forget gate layer.” It looks at $h_{t-1}$ and $x_t$, and outputs a number between 0 and 1 ($f_t$) for each number in the cell state $C_{t-1}$. A 1 represents “completely keep this” while a 0 represents “completely get rid of this.”  
![step1](https://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-focus-f.png)

2) Decide what new information we’re going to store in the cell state. This has two parts. First, a sigmoid layer called the “input gate layer” ($i_t$) decides which values we’ll update. Next, a tanh layer creates a vector of new candidate values, $\tilde{C}_t$, that could be added to the state. In the next step, we’ll combine these two to create an update to the state.  
![step2](https://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-focus-i.png)

3) Update the old cell state, $C_{t-1}$, into the new cell state $C_t$. The previous steps already decided what to do; we just need to actually do it. We multiply the old state by $f_t$, forgetting the things we decided to forget earlier. Then we add $i_t * \tilde{C}_t$: the new candidate values, scaled by how much we decided to update each state value.  
![step3](https://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-focus-C.png)

4) Decide what we’re going to output. This output will be based on our cell state, but will be a filtered version. First, we run a sigmoid layer which decides what parts of the cell state we’re going to output. Then, we put the cell state through tanh (to push the values to be between −1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.  
![step4](https://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-focus-o.png)
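The four steps can be written as equations in the standard LSTM notation that matches the diagrams. Here $[h_{t-1}, x_t]$ is the concatenation, $\sigma$ is the sigmoid, and $*$ is pointwise multiplication:

$$
\begin{aligned}
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \\
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \\
\tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \\
C_t &= f_t * C_{t-1} + i_t * \tilde{C}_t \\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \\
h_t &= o_t * \tanh(C_t)
\end{aligned}
$$

A minimal pure-Python sketch of one step follows, assuming those equations. The `params` layout (one `(W, b)` pair per layer, keyed `"f"`, `"i"`, `"c"`, `"o"`) and the random weights are hypothetical choices for illustration, not a library API.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def layer(W: Matrix, b: Vector, v: Vector, act: Callable[[float], float]) -> Vector:
    """One 'yellow box': act(W v + b), applied elementwise."""
    return [act(sum(w * x for w, x in zip(row, v)) + bias) for row, bias in zip(W, b)]

def lstm_step(x: Vector, h_prev: Vector, c_prev: Vector,
              params: Dict[str, Tuple[Matrix, Vector]]) -> Tuple[Vector, Vector]:
    """One LSTM step; params maps a hypothetical key per layer to its (W, b)."""
    hx = h_prev + x                                   # lines merging: [h_{t-1}, x_t]
    f = layer(*params["f"], hx, sigmoid)              # step 1: forget gate f_t
    i = layer(*params["i"], hx, sigmoid)              # step 2: input gate i_t
    c_tilde = layer(*params["c"], hx, math.tanh)      # step 2: candidate values
    c = [ft * cp + it * cn for ft, cp, it, cn in zip(f, c_prev, i, c_tilde)]  # step 3
    o = layer(*params["o"], hx, sigmoid)              # step 4: output gate o_t
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]  # step 4: filtered cell state
    return h, c

random.seed(0)
H, X = 3, 2  # hypothetical hidden and input sizes
params = {key: ([[random.uniform(-0.5, 0.5) for _ in range(H + X)] for _ in range(H)], [0.0] * H)
          for key in "fico"}
h, c = [0.0] * H, [0.0] * H
for x in [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

With zero weights, a very high forget-gate bias, and a very low input-gate bias, $f_t \approx 1$ and $i_t \approx 0$. In that case $C_t$ comes out equal to $C_{t-1}$, which is the “conveyor belt” behavior of the cell state.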