# Gated Recurrent Unit

Credits:
- [Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling](https://arxiv.org/pdf/1412.3555v1.pdf)

- [Understanding GRU Networks](https://towardsdatascience.com/understanding-gru-networks-2ef37df6c9be)

## What are GRUs?
The gated recurrent unit (GRU) is a variant of the recurrent neural network (RNN) closely related to long short-term memory (LSTM). Both GRUs and LSTMs give the RNN a gated memory, which makes it more effective when the network has to remember information from far back in the sequence to solve the task.

### GRU vs LSTM
GRUs use fewer trainable parameters and therefore less memory, and they execute and train faster than LSTMs, whereas LSTMs tend to be more accurate on datasets with longer sequences. In short, **if the sequences are long or accuracy is critical, go for LSTM; for lower memory consumption and faster operation, use GRU.**
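The parameter savings can be worked out by hand. A GRU has three weight blocks (update gate, reset gate, candidate activation) where an LSTM has four (input, forget and output gates plus the cell candidate). The sketch below assumes each block has an input kernel, a recurrent kernel and a bias, and that Keras' default `reset_after=True` keeps a second bias per GRU block; the function names are illustrative only. With an input size of 64 and 256 units it reproduces the 247,296 GRU parameters reported in the model summary further down.

```python
def gru_params(input_dim: int, units: int, reset_after: bool = True) -> int:
    # Three blocks: update gate, reset gate, candidate activation.
    # Each has an input kernel (input_dim x units), a recurrent kernel
    # (units x units) and a bias; reset_after=True adds a second bias.
    biases = 2 * units if reset_after else units
    return 3 * (input_dim * units + units * units + biases)


def lstm_params(input_dim: int, units: int) -> int:
    # Four blocks: input, forget and output gates plus the cell candidate.
    return 4 * (input_dim * units + units * units + units)


print(gru_params(64, 256))                     # 247296
print(gru_params(64, 256, reset_after=False))  # 246528
print(lstm_params(64, 256))                    # 328704
```

For the same layer width the GRU needs roughly three quarters of the LSTM's weights, which is where the memory and speed advantage comes from.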

**Similarities**

Unlike the traditional recurrent unit, which always replaces its activation with a new value computed from the current input and the previous hidden state, both LSTMs and GRUs keep the existing content and add new content on top of it.
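A scalar sketch makes the contrast concrete. The weights `W` and `U` below are hypothetical values chosen only for illustration, and the vanilla update is reused as a stand-in candidate (a real GRU also applies a reset gate when computing it). The traditional unit overwrites the stored value, while the gated unit interpolates and keeps most of it when the update gate `z` is small.

```python
import math

# Hypothetical scalar weights, chosen only for illustration.
W, U = 0.5, 0.5


def vanilla_step(h_prev: float, x: float) -> float:
    # Traditional recurrent unit: the new activation replaces the old one outright.
    return math.tanh(W * x + U * h_prev)


def gated_step(h_prev: float, h_candidate: float, z: float) -> float:
    # Gated (GRU-style) unit: keep a (1 - z) share of the existing content
    # and add a z share of the new candidate content.
    return (1.0 - z) * h_prev + z * h_candidate


h_prev, x = 0.9, -1.0
replaced = vanilla_step(h_prev, x)           # about -0.05: the stored 0.9 is gone
kept = gated_step(h_prev, replaced, z=0.05)  # about 0.85: most of 0.9 survives
print(replaced, kept)
```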

**Differences**

LSTM units can control the exposure of their memory content: the output gate determines how much of the memory content is seen or used by other units. The GRU, on the other hand, exposes its full content without any control.

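The LSTM unit controls the amount of new memory content added to the memory cell independently of the forget gate. The GRU, on the other hand, controls the information flow from the previous activation when computing the new candidate activation, but does not independently control the amount of candidate activation being added; keeping old content and adding new content are tied together through the update gate.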
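Both differences can be seen in a scalar sketch of the two state updates (gate values are passed in directly rather than computed, purely for illustration). The LSTM can keep all of its old content and add all of the new content in the same step, and its output gate `o` can hide the cell completely; the GRU's single update gate `z` forces a trade-off, and its state is exposed as-is.

```python
import math


def lstm_step(c_prev: float, c_candidate: float, f: float, i: float, o: float):
    # LSTM: forget gate f and input gate i are set independently,
    # and the output gate o controls how much of the cell is exposed.
    c = f * c_prev + i * c_candidate
    h = o * math.tanh(c)
    return c, h


def gru_step(h_prev: float, h_candidate: float, z: float) -> float:
    # GRU: one update gate z couples "keep old" (1 - z) and "add new" (z);
    # the whole state is the output, with no output gate.
    return (1.0 - z) * h_prev + z * h_candidate


# LSTM keeps all old content AND adds all new content, then hides it (o = 0).
c, h = lstm_step(1.0, 1.0, f=1.0, i=1.0, o=0.0)
print(c, h)  # 2.0 0.0

# GRU: keeping more old content always means adding less new content.
print([gru_step(1.0, 0.0, z) for z in (0.0, 0.25, 1.0)])  # [1.0, 0.75, 0.0]
```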

![Diagram12.png](attachment:Diagram12.png)

## Gates
**Update gate**

Decides how much the unit updates its activation, or content. In other words, it helps the model determine how much of the past information needs to be passed along to the future. The update gate is computed as:

![Diagram13.png](attachment:Diagram13.png)
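Written out in the notation of Chung et al. (2014), where $x_t$ is the current input, $h_{t-1}$ the previous activation, $\sigma$ the logistic sigmoid, and bias terms are omitted as in the paper:

$$
z_t = \sigma\left(W_z x_t + U_z h_{t-1}\right)
$$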

**Reset gate**

The model uses this gate to decide how much of the past information to forget.

![Diagram14.png](attachment:Diagram14.png)
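In the same notation (Chung et al., 2014), the reset gate $r_t$, the candidate activation $\tilde{h}_t$ and the new activation $h_t$ are computed as below, where $z_t$ is the update gate output and $\odot$ denotes element-wise multiplication (biases omitted). When $r_t$ is close to 0, the candidate ignores the previous activation, which is how the unit forgets past information.

$$
\begin{aligned}
r_t &= \sigma\left(W_r x_t + U_r h_{t-1}\right) \\
\tilde{h}_t &= \tanh\left(W x_t + U\left(r_t \odot h_{t-1}\right)\right) \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
$$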

## Implementation using TensorFlow 

The built-in `keras.layers.GRU` layer lets you quickly build recurrent models without having to make difficult configuration choices.
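To make the gate equations concrete before using the Keras layer, here is a minimal NumPy sketch of a GRU forward pass. It follows the Chung et al. formulation with biases added; the parameter names (`W_z`, `U_r`, `b_h`, ...) and the random initialisation are hypothetical and for illustration only. It is not Keras' implementation: Keras' GRU, for instance, weights the old state by `z` and the candidate by `1 - z`, and by default (`reset_after=True`) applies the reset gate after the recurrent matrix multiplication.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def init_gru(input_dim, units, seed=0):
    # Hypothetical small random weights for the three blocks:
    # z = update gate, r = reset gate, h = candidate activation.
    rng = np.random.default_rng(seed)
    p = {}
    for g in ("z", "r", "h"):
        p["W_" + g] = rng.normal(0.0, 0.1, size=(input_dim, units))
        p["U_" + g] = rng.normal(0.0, 0.1, size=(units, units))
        p["b_" + g] = np.zeros(units)
    return p


def gru_step(x_t, h_prev, p):
    z = sigmoid(x_t @ p["W_z"] + h_prev @ p["U_z"] + p["b_z"])  # update gate
    r = sigmoid(x_t @ p["W_r"] + h_prev @ p["U_r"] + p["b_r"])  # reset gate
    # Candidate activation: the reset gate scales the previous state first.
    h_tilde = np.tanh(x_t @ p["W_h"] + (r * h_prev) @ p["U_h"] + p["b_h"])
    # Interpolate between the old state and the candidate.
    return (1.0 - z) * h_prev + z * h_tilde


def gru_forward(xs, p, units):
    # xs has shape (timesteps, input_dim); returns every hidden state,
    # shape (timesteps, units), similar to return_sequences=True.
    h = np.zeros(units)
    states = []
    for x_t in xs:
        h = gru_step(x_t, h, p)
        states.append(h)
    return np.stack(states)


params = init_gru(input_dim=64, units=256)
xs = np.random.default_rng(1).normal(size=(10, 64))
hs = gru_forward(xs, params, units=256)
print(hs.shape)  # (10, 256)
```

Because each new state is a convex combination of the previous state and a `tanh` output, the activations stay inside (-1, 1).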

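```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
```

```python
model = keras.Sequential()
# Map integer token ids (vocabulary of 1000) to 64-dimensional vectors.
model.add(layers.Embedding(input_dim=1000, output_dim=64))

# GRU with 256 units; return_sequences=True emits an output at every
# timestep, so the next recurrent layer receives a full sequence.
model.add(layers.GRU(256, return_sequences=True))
# SimpleRNN with 128 units returns only its final output.
model.add(layers.SimpleRNN(128))
# Dense layer with 10 output units.
model.add(layers.Dense(10))
model.summary()
```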

```
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
embedding (Embedding)        (None, None, 64)          64000
_________________________________________________________________
gru (GRU)                    (None, None, 256)         247296
_________________________________________________________________
simple_rnn (SimpleRNN)       (None, 128)               49280
_________________________________________________________________
dense (Dense)                (None, 10)                1290
Total params: 361,866
Trainable params: 361,866
Non-trainable params: 0
_________________________________________________________________
```
