## RNN

- An RNN is a network with loops that allow information to persist, motivated by the persistence of human thought.


- Unfortunately, as the gap between the relevant information and the point where it is needed grows, RNNs become unable to learn to connect the information.


- In theory, RNNs can handle long-term dependencies, but in practice they struggle to learn them, so LSTMs are used instead. The reasons for this difficulty are examined in detail in the following paper: ["Learning Long-Term Dependencies with Gradient Descent is Difficult",  Bengio, et al. (1994)](http://ai.dinfo.unifi.it/paolo//ps/tnn-94-gradient.pdf)


- LSTMs have been successful on a wide range of problems, including speech recognition, language modeling, translation, and image captioning. The following blog post covers these in detail: [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)

<img  src="./image/RNN.PNG" width="65%">
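
The loop and the long-term dependency problem described above can be sketched with a scalar RNN. This is a minimal illustration, not the formulation from the cited paper: the function name `rnn_forward`, the weights `w_h` and `w_x`, and the input values are all assumptions chosen for the example. It tracks how strongly the final hidden state depends on the initial one, and that dependence shrinks as the sequence gets longer.

```python
import math

def rnn_forward(xs, w_h, w_x, h0=0.0):
    """Run a scalar RNN h_t = tanh(w_h * h_{t-1} + w_x * x_t) over a sequence.

    Returns the final hidden state and d h_T / d h_0, accumulated with the
    chain rule one time step at a time.
    """
    h = h0
    grad = 1.0
    for x in xs:
        h = math.tanh(w_h * h + w_x * x)  # the loop: h feeds back into itself
        grad *= w_h * (1.0 - h * h)       # derivative of tanh is 1 - tanh^2
    return h, grad

# Same weights, short versus long sequence (hypothetical values).
short_h, short_grad = rnn_forward([0.5] * 5, w_h=0.9, w_x=1.0)
long_h, long_grad = rnn_forward([0.5] * 50, w_h=0.9, w_x=1.0)
print(abs(short_grad), abs(long_grad))
```

With `w_h` below 1, every step multiplies the gradient by a factor smaller than 1, so the influence of early inputs fades quickly over long gaps.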


## LSTM Networks

<img  src="./image/LSTM_1.PNG" width="65%">

<img  src="./image/LSTM_2.PNG" width="55%">

- LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods of time is practically their default behavior, not something they struggle to learn.


- Compared with a standard RNN, whose repeating module contains a single neural network layer, the LSTM's repeating module contains four layers that interact in a very special way. In the diagrams:
    - Each line carries an entire vector, from the output of one node to the inputs of others.
    - Pink circles represent pointwise operations, like vector addition.
    - Yellow boxes are learned neural network layers.
    - Merging lines denote concatenation.
    - A forking line denotes its content being copied, with the copies going to different locations.


### The Core Idea Behind LSTMs

- The key to the LSTM is the cell state, which runs horizontally through the diagram.
    - It is like a conveyor belt: it runs straight along the chain with only some minor linear interactions.
    - The LSTM can remove information from or add information to the cell state, carefully regulated by structures called gates.

<img  src="./image/LSTM_3.PNG" width="45%">

- Gates are a way to optionally let information through. Each gate consists of a sigmoid layer and a pointwise multiplication: a sigmoid output of 0 means "let nothing through", and an output of 1 means "let everything through".

- An LSTM has three of these gates to protect and control the cell state.
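
The gating mechanism above can be sketched in a few lines. This is an illustrative example only: the helper name `apply_gate` and the sample numbers are assumptions. In an actual LSTM the gate logits come from a learned layer rather than being set by hand.

```python
import math

def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def apply_gate(gate_logits, values):
    """Scale each value by sigmoid(logit): near 0 blocks it, near 1 passes it."""
    return [sigmoid(g) * v for g, v in zip(gate_logits, values)]

values = [2.0, -3.0, 5.0]
logits = [-20.0, 0.0, 20.0]  # strongly closed, half open, strongly open
gated = apply_gate(logits, values)
print(gated)
```

The first value is almost entirely blocked, the second is halved, and the third passes through nearly unchanged.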


### Step-by-Step LSTM Walk Through
    
1) Forget Gate 
<img  src="./image/LSTM_4.PNG" width="45%">
- $ f_{t} = \sigma (W_{f} \cdot \left[ h_{t-1}, x_{t} \right] + b_{f}) $
- Decide what information to throw away from the cell state.
    - This decision is made by a sigmoid layer called the "forget gate layer". It outputs a number between 0 and 1 for each entry of the cell state $ C_{t-1} $.
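
As a rough sketch of the forget gate formula, the following computes $ f_{t} $ with plain Python lists. The function name `forget_gate`, the sizes (hidden size 2, input size 1), and the weight values are hypothetical and chosen only so the result is easy to check by hand.

```python
import math

def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def forget_gate(W_f, b_f, h_prev, x_t):
    """Compute f_t = sigmoid(W_f . [h_{t-1}, x_t] + b_f).

    W_f is a list of rows; the result has one entry per cell-state entry.
    """
    concat = h_prev + x_t  # list concatenation plays the role of [h_{t-1}, x_t]
    return [
        sigmoid(sum(w * v for w, v in zip(row, concat)) + b)
        for row, b in zip(W_f, b_f)
    ]

# Hypothetical parameters: W_f is 2 x 3 (hidden size 2, input size 1).
W_f = [[0.5, -0.5, 1.0],
       [0.0, 0.0, 0.0]]
b_f = [0.0, 10.0]
f_t = forget_gate(W_f, b_f, h_prev=[0.2, 0.2], x_t=[0.0])
print(f_t)
```

The large positive bias in the second row pushes that gate value close to 1, meaning "keep almost all of this cell-state entry".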
    
<br>

2) Input Gate
<img  src="./image/LSTM_5.PNG" width="45%">    
- $ i_{t} = \sigma (W_{i} \cdot \left[ h_{t-1}, x_{t} \right] + b_{i}) $
- $ \tilde{C_{t}} = tanh (W_{C} \cdot \left[ h_{t-1}, x_{t} \right] + b_{C}) $
- Decide what new information to store in the cell state.
    -  A sigmoid layer called the "input gate layer" decides which values we'll update.
    -  A tanh layer creates a vector of new candidate values, $ \tilde{C_{t}}$,  that could be added to the state.
    -  Combine these two to create an update to the state.
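
The two layers of this step can be sketched together as follows. The function name `input_gate_and_candidate`, the helper `affine`, and all parameter values are assumptions made for illustration.

```python
import math

def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, v, b):
    """Return W . v + b, with W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]

def input_gate_and_candidate(W_i, b_i, W_C, b_C, h_prev, x_t):
    """Compute i_t, the candidate values C~_t, and their pointwise product."""
    concat = h_prev + x_t  # [h_{t-1}, x_t]
    i_t = [sigmoid(a) for a in affine(W_i, concat, b_i)]
    C_tilde = [math.tanh(a) for a in affine(W_C, concat, b_C)]
    update = [i * c for i, c in zip(i_t, C_tilde)]  # what gets added to the state
    return i_t, C_tilde, update

# Hypothetical parameters: hidden size 2, input size 1.
W_i = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
b_i = [0.0, -20.0]  # second gate is almost fully closed
W_C = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
b_C = [0.0, 0.0]
i_t, C_tilde, update = input_gate_and_candidate(
    W_i, b_i, W_C, b_C, h_prev=[0.0, 0.0], x_t=[1.0]
)
print(update)
```

Both entries share the same candidate value, but only the first one contributes a noticeable update because the second input gate is nearly closed.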

<br>

3) Update
<img  src="./image/LSTM_6.PNG" width="45%">    
- $ C_{t} = f_{t} * C_{t-1} + i_{t} * \tilde{C_{t}} $
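- Update the old cell state, $ C_{t-1} $, into the new cell state, $ C_{t} $.
- Multiplying the old state by $ f_{t} $ forgets what we decided to forget. The added term $ i_{t} * \tilde{C_{t}} $ is the new candidate values, scaled by how much we decided to update each state value.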
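
A small worked example of the update formula, using hand-picked gate values (an assumption for illustration; real gate values come from the sigmoid layers and lie strictly between 0 and 1). The function name `update_cell_state` is hypothetical.

```python
def update_cell_state(f_t, C_prev, i_t, C_tilde):
    """Compute C_t = f_t * C_{t-1} + i_t * C~_t, entry by entry."""
    return [
        f * c_old + i * c_new
        for f, c_old, i, c_new in zip(f_t, C_prev, i_t, C_tilde)
    ]

C_prev = [4.0, 4.0, 4.0]
C_tilde = [-1.0, -1.0, -1.0]
f_t = [1.0, 0.0, 0.5]  # keep, forget, keep half
i_t = [0.0, 1.0, 0.5]  # ignore candidate, accept it, accept half
C_t = update_cell_state(f_t, C_prev, i_t, C_tilde)
print(C_t)  # [4.0, -1.0, 1.5]
```

The first entry keeps its old value, the second is fully replaced by the candidate, and the third is an even blend of both.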

<br>

4) Output Gate
<img  src="./image/LSTM_6.PNG" width="45%">    
- $ o_{t} = \sigma (W_{o} \cdot \left[ h_{t-1}, x_{t} \right] + b_{o}) $
- $ h_{t} = o_{t} * tanh(C_{t}) $
- Decide what we're going to output.
    - This output will be based on our cell state, but will be a filtered version.
- A sigmoid layer decides which parts of the cell state we're going to output.
- Put the cell state through tanh (to push the values to between -1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.
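
Putting the four steps of the walk-through together, one LSTM time step can be sketched in plain Python as below. This is a minimal sketch under stated assumptions, not a reference implementation: `init_params`, `lstm_step`, the parameter dictionary layout, the sizes, and the random initialization are all hypothetical.

```python
import math
import random

def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, v, b):
    """Return W . v + b, with W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]

def init_params(hidden_size, input_size, seed=0):
    """Hypothetical initializer: small random weights, zero biases."""
    rng = random.Random(seed)
    params = {}
    for name in ("f", "i", "C", "o"):
        params["W_" + name] = [
            [rng.uniform(-0.5, 0.5) for _ in range(hidden_size + input_size)]
            for _ in range(hidden_size)
        ]
        params["b_" + name] = [0.0] * hidden_size
    return params

def lstm_step(params, h_prev, C_prev, x_t):
    """One LSTM time step: forget gate, input gate, update, output gate."""
    concat = h_prev + x_t  # [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(params["W_f"], concat, params["b_f"])]
    i_t = [sigmoid(a) for a in affine(params["W_i"], concat, params["b_i"])]
    C_tilde = [math.tanh(a) for a in affine(params["W_C"], concat, params["b_C"])]
    C_t = [f * c + i * g for f, c, i, g in zip(f_t, C_prev, i_t, C_tilde)]
    o_t = [sigmoid(a) for a in affine(params["W_o"], concat, params["b_o"])]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, C_t)]  # filtered cell state
    return h_t, C_t

params = init_params(hidden_size=3, input_size=2)
h, C = [0.0] * 3, [0.0] * 3
for x_t in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    h, C = lstm_step(params, h, C, x_t)
print(h, C)
```

Because $ h_{t} $ is a sigmoid output times a tanh output, every entry of the hidden state stays strictly between -1 and 1, while the cell state itself is not bounded in that way.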


## GRU - Gated Recurrent Unit

<img  src="./image/GRU.PNG" width="45%">

- $ z_{t} =  \sigma (W_{z} \cdot \left[ h_{t-1}, x_{t} \right]) $
- $ r_{t} =  \sigma (W_{r} \cdot \left[ h_{t-1}, x_{t} \right]) $
- $ \tilde{h_{t}} =  tanh (W \cdot \left[ r_{t} * h_{t-1}, x_{t} \right]) $
- $ h_{t} = (1-z_{t}) * h_{t-1} + z_{t} * \tilde{h_{t}} $

- The GRU combines the forget and input gates into a single update gate. It also merges the cell state and hidden state, and makes some other changes.
- The resulting model is simpler than standard LSTM models and has been growing increasingly popular.
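
The four GRU equations above translate fairly directly into code. The following is a sketch under assumptions: the names `gru_step` and `matvec` are hypothetical, there are no bias terms (matching the equations as written here), and the all-zero weights are chosen only so the result can be checked by hand.

```python
import math

def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    """Return W . v, with W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def gru_step(W_z, W_r, W, h_prev, x_t):
    """One GRU time step following the equations for z_t, r_t, h~_t and h_t."""
    concat = h_prev + x_t  # [h_{t-1}, x_t]
    z_t = [sigmoid(a) for a in matvec(W_z, concat)]  # update gate
    r_t = [sigmoid(a) for a in matvec(W_r, concat)]  # reset gate
    reset_concat = [r * h for r, h in zip(r_t, h_prev)] + x_t  # [r_t * h_{t-1}, x_t]
    h_tilde = [math.tanh(a) for a in matvec(W, reset_concat)]
    return [(1.0 - z) * h + z * g for z, h, g in zip(z_t, h_prev, h_tilde)]

# Hypothetical parameters: hidden size 2, input size 1, all weights zero.
zeros = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
h_t = gru_step(zeros, zeros, zeros, h_prev=[1.0, -0.5], x_t=[3.0])
print(h_t)  # [0.5, -0.25]
```

With zero weights both gates sit at 0.5 and the candidate is 0, so the new hidden state is exactly half of the previous one. A single vector `h_t` plays the role of both the cell state and the hidden state.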

### Variants on Long Short Term Memory 

- Peephole Connection LSTMs
- Coupled forget and input gates
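
For comparison with the standard equations in the walk-through, the two variants can be written as follows, using the forms given in the Christopher Olah blog post listed under Reference. In the peephole variant the gate layers also look at the cell state:

- $ f_{t} = \sigma (W_{f} \cdot \left[ C_{t-1}, h_{t-1}, x_{t} \right] + b_{f}) $
- $ i_{t} = \sigma (W_{i} \cdot \left[ C_{t-1}, h_{t-1}, x_{t} \right] + b_{i}) $
- $ o_{t} = \sigma (W_{o} \cdot \left[ C_{t}, h_{t-1}, x_{t} \right] + b_{o}) $

With coupled forget and input gates, new information is written only where old information is forgotten, so $ i_{t} $ is replaced by $ 1 - f_{t} $:

- $ C_{t} = f_{t} * C_{t-1} + (1 - f_{t}) * \tilde{C_{t}} $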

#### CF)
- LSTMs were a big step in what we can accomplish with RNNs. It's natural to wonder: is there another big step? A common opinion among researchers is: "Yes! There is a next step and it's attention!" The idea is to let every step of an RNN pick information to look at from some larger collection of information.

- Grid LSTMs
- Work using RNNs in generative models


### Reference: 

- http://colah.github.io/posts/2015-08-Understanding-LSTMs/ 
<br> Christopher Olah's blog

- LSTM paper: http://www.bioinf.jku.at/publications/older/2604.pdf
- GRU paper: https://arxiv.org/pdf/1406.1078v3.pdf

