# Understanding RNN structure
- Unlike feedforward networks, RNNs can handle data in "sequential" format because they carry a "state" forward from one step to the next
- Thus, grasping the concepts of **"sequences"** and (hidden) **"states"** is crucial to understanding RNNs

<br>
<img src="https://github.com/DevavratSinghBisht/neural-networks/blob/main/2.RNN/Assets/rnn_structure.jpeg?raw=true" style="width: 500px"/>
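To make the idea of a preserved "state" concrete, here is a minimal NumPy sketch of a tanh RNN forward pass. The function name `simple_rnn_forward`, the weight names `W_x`, `W_h`, `b`, and the tiny sizes are illustrative assumptions, not Keras internals; the point is only that each new state is computed from the current input and the previous state.

```python
import numpy as np

def simple_rnn_forward(x, W_x, W_h, b):
    """Run a tanh RNN over x of shape (batch_size, timesteps, input_dim)."""
    batch_size, timesteps, _ = x.shape
    units = W_h.shape[0]
    h = np.zeros((batch_size, units))      # initial hidden state
    outputs = []
    for t in range(timesteps):
        # the new state depends on the current input AND the previous state
        h = np.tanh(x[:, t, :] @ W_x + h @ W_h + b)
        outputs.append(h)
    # stack the per-step states into (batch_size, timesteps, units)
    return np.stack(outputs, axis=1), h

rng = np.random.default_rng(0)
input_dim, units = 3, 4                    # hypothetical sizes
W_x = rng.normal(size=(input_dim, units))
W_h = rng.normal(size=(units, units))
b = np.zeros(units)

x = rng.normal(size=(2, 5, input_dim))     # 2 sequences, 5 timesteps each
all_states, last_state = simple_rnn_forward(x, W_x, W_h, b)
print(all_states.shape)   # (2, 5, 4)
print(last_state.shape)   # (2, 4)
```

The first return value holds the state at every timestep, and the second holds only the final one.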

In [23]:
import numpy as np
from keras.models import Model, Sequential
from keras.layers import Input, SimpleRNN, LSTM, GRU

## 1. SimpleRNN 

The input to SimpleRNN should be a 3D tensor => (batch_size, timesteps, input_dim)
- **batch_size**: omitted when creating the RNN instance (== None). Usually set when fitting the model.
- **timesteps**: number of time steps in each input sequence
- **input_dim**: number of features at each time step

In [24]:
# for instance, consider the array below
x = np.array([[
             [1,    # => feature 1
              2,    # => feature 2
              3],   # => feature 3     # => timestep 1
             [4, 5, 6]                 # => timestep 2
             ],                                  # => sample 1 of the batch
             [[7, 8, 9], [10, 11, 12]],          # => sample 2 of the batch
             [[13, 14, 15], [16, 17, 18]]        # => sample 3 of the batch
             ])

In [25]:
print('(Batch size, timesteps, input_dim) = ',x.shape)

(Batch size, timesteps, input_dim) =  (3, 2, 3)
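As a small sketch of how the three axes are addressed, the same array can be indexed axis by axis. The variable names `first_sample`, `second_step`, and `one_feature` are chosen here for illustration only.

```python
import numpy as np

x = np.array([
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
    [[13, 14, 15], [16, 17, 18]],
])

first_sample = x[0]        # every timestep of the first sequence
second_step = x[0, 1]      # second timestep of the first sequence
one_feature = x[2, 0, 1]   # third sequence, first timestep, second feature

print(first_sample.shape)  # (2, 3)
print(second_step)         # [4 5 6]
print(one_feature)         # 14
```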


**Let's assume** that each input sample has shape **(10, 30)**, i.e. 10 timesteps with 30 features each, for the following examples

In [26]:
# rnn = SimpleRNN(50)(Input(shape = (10,)))          => error: input is 2D (batch_size, 10), but 3D is required
# rnn = SimpleRNN(50)(Input(shape = (10, 30, 40)))   => error: input is 4D, but 3D is required
rnn = SimpleRNN(50)(Input(shape = (10, 30)))

**return_state** = **return_sequences** = **False** ====> output_shape = **(batch_size = None, num_units)**

In [27]:
rnn = SimpleRNN(50)(Input(shape = (10, 30)))
print(rnn.shape)

(None, 50)


**return_sequences = True** ====> output_shape = **(batch_size, timesteps, num_units)**

In [28]:
rnn = SimpleRNN(50, return_sequences = True)(Input(shape = (10, 30)))
print(rnn.shape)

(None, 10, 50)


return_state = True ===> outputs a list of tensors: **[output, state]**
- if return_sequences == False     =>>    output_shape = (batch_size, num_units)
- if return_sequences == True      =>>    output_shape = (batch_size, timesteps, num_units)
- in both cases                    =>>    state_shape = (batch_size, num_units)

In [29]:
rnn = SimpleRNN(50, return_sequences = False, return_state = True)(Input(shape = (10, 30)))
print(rnn[0].shape)         # shape of output
print(rnn[1].shape)         # shape of last state

(None, 50)
(None, 50)


In [30]:
rnn = SimpleRNN(50, return_sequences = True, return_state = True)(Input(shape = (10, 30)))
print(rnn[0].shape)         # shape of output
print(rnn[1].shape)         # shape of last state

(None, 10, 50)
(None, 50)


The output and the last state can be unpacked as below

In [31]:
output, state = SimpleRNN(50, return_sequences = True, return_state = True)(Input(shape = (10, 30)))

In [32]:
print(output.shape)
print(state.shape)

(None, 10, 50)
(None, 50)
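The shapes above suggest how the two tensors relate: for a SimpleRNN, the returned state is the output at the last timestep. The sketch below checks this numerically. It assumes Keras is installed as in the cells above; the batch of 4 random samples and the names `seq_out` and `last_state` are made up for illustration.

```python
import numpy as np
from keras.models import Model
from keras.layers import Input, SimpleRNN

inputs = Input(shape=(10, 30))
output, state = SimpleRNN(50, return_sequences=True, return_state=True)(inputs)
model = Model(inputs=inputs, outputs=[output, state])

# 4 random sequences, each with 10 timesteps and 30 features
x = np.random.rand(4, 10, 30).astype("float32")
seq_out, last_state = model.predict(x, verbose=0)

print(seq_out.shape)      # (4, 10, 50)
print(last_state.shape)   # (4, 50)
# the final timestep of the full sequence output matches the returned state
print(np.allclose(seq_out[:, -1, :], last_state))   # True
```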


## 2. LSTM

<img src="https://github.com/DevavratSinghBisht/neural-networks/blob/main/2.RNN/Assets/understanding_lstm.png?raw=true" style="width: 500px"/>

- The outputs of an LSTM are quite similar to those of a simple RNN, but there are subtle differences
- If you compare the two diagrams below, the LSTM passes one more type of "state" on to the next module

<br>
<img src="https://github.com/DevavratSinghBisht/neural-networks/blob/main/2.RNN/Assets/simple_rnn.png?raw=true" style="width: 500px"/>

<center> Standard RNN </center>

<br>
<img src="https://github.com/DevavratSinghBisht/neural-networks/blob/main/2.RNN/Assets/lstm.png?raw=true" style="width: 500px"/>

<center> LSTM </center>

In addition to the "hidden state (ht)" found in a simple RNN, the LSTM structure also has a "cell state (Ct)"

<br>
<img src="https://github.com/DevavratSinghBisht/neural-networks/blob/main/2.RNN/Assets/lstm_hidden_state.png?raw=true" style="width: 500px"/>

<center> Hidden State </center>

<br>
<img src="https://github.com/DevavratSinghBisht/neural-networks/blob/main/2.RNN/Assets/lstm_cell_state.png?raw=true" style="width: 500px"/>

<center> Cell State </center>
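The two states can be seen side by side in a minimal NumPy sketch of one LSTM step, using the standard gate equations. The function `lstm_step`, the packed weight layout (input, forget, candidate, output), and the small sizes are assumptions for illustration and are not meant to mirror Keras's internal implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (input_dim, 4*units), U: (units, 4*units), b: (4*units,)."""
    units = h_prev.shape[1]
    z = x_t @ W + h_prev @ U + b
    i = sigmoid(z[:, 0 * units:1 * units])   # input gate
    f = sigmoid(z[:, 1 * units:2 * units])   # forget gate
    g = np.tanh(z[:, 2 * units:3 * units])   # candidate cell values
    o = sigmoid(z[:, 3 * units:4 * units])   # output gate
    c_t = f * c_prev + i * g                 # cell state: gated mix of old and new
    h_t = o * np.tanh(c_t)                   # hidden state: gated view of the cell
    return h_t, c_t

rng = np.random.default_rng(0)
batch_size, timesteps, input_dim, units = 2, 5, 3, 4   # hypothetical sizes
W = rng.normal(size=(input_dim, 4 * units))
U = rng.normal(size=(units, 4 * units))
b = np.zeros(4 * units)

x = rng.normal(size=(batch_size, timesteps, input_dim))
h = np.zeros((batch_size, units))
c = np.zeros((batch_size, units))
for t in range(timesteps):
    # both states are handed on to the next step
    h, c = lstm_step(x[:, t, :], h, c, W, U, b)

print(h.shape, c.shape)   # (2, 4) (2, 4)
```

Both states have the same shape, `(batch_size, units)`, but only the hidden state is squashed through the output gate.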

In [33]:
lstm = LSTM(50)(Input(shape = (10, 30)))

In [34]:
print(lstm.shape)

(None, 50)


In [35]:
lstm = LSTM(50, return_sequences = False, return_state = True)(Input(shape = (10, 30)))
print(lstm[0].shape)         # shape of output
print(lstm[1].shape)         # shape of hidden state
print(lstm[2].shape)         # shape of cell state

(None, 50)
(None, 50)
(None, 50)


In [36]:
lstm = LSTM(50, return_sequences = True, return_state = True)(Input(shape = (10, 30)))
print(lstm[0].shape)         # shape of output
print(lstm[1].shape)         # shape of hidden state
print(lstm[2].shape)         # shape of cell state

(None, 10, 50)
(None, 50)
(None, 50)


In [37]:
output, hidden_state, cell_state = LSTM(50, return_sequences = True, return_state = True)(Input(shape = (10, 30)))

In [38]:
print(output.shape)
print(hidden_state.shape)
print(cell_state.shape)

(None, 10, 50)
(None, 50)
(None, 50)


## 3. GRU
<br>
<img src="https://github.com/DevavratSinghBisht/neural-networks/blob/main/2.RNN/Assets/gru_structure.JPG?raw=true" style="width: 500px"/>

<center> Hidden State </center>

- GRU, a popular variant of LSTM, does not have a cell state
- Hence, it has only a hidden state, like a simple RNN
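A minimal NumPy sketch of one GRU step shows that a single hidden state is all that gets carried forward. The function `gru_step`, the packed weight layout (update, reset, candidate), and the blending convention `z * h_prev + (1 - z) * h_cand` are assumptions for illustration; some references swap the roles of `z` and `1 - z`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W, U, b):
    """One GRU step. W: (input_dim, 3*units), U: (units, 3*units), b: (3*units,)."""
    units = h_prev.shape[1]
    W_z, W_r, W_h = W[:, :units], W[:, units:2 * units], W[:, 2 * units:]
    U_z, U_r, U_h = U[:, :units], U[:, units:2 * units], U[:, 2 * units:]
    b_z, b_r, b_h = b[:units], b[units:2 * units], b[2 * units:]

    z = sigmoid(x_t @ W_z + h_prev @ U_z + b_z)              # update gate
    r = sigmoid(x_t @ W_r + h_prev @ U_r + b_r)              # reset gate
    h_cand = np.tanh(x_t @ W_h + (r * h_prev) @ U_h + b_h)   # candidate state
    return z * h_prev + (1.0 - z) * h_cand                   # the only state

rng = np.random.default_rng(0)
batch_size, timesteps, input_dim, units = 2, 5, 3, 4   # hypothetical sizes
W = rng.normal(size=(input_dim, 3 * units))
U = rng.normal(size=(units, 3 * units))
b = np.zeros(3 * units)

x = rng.normal(size=(batch_size, timesteps, input_dim))
h = np.zeros((batch_size, units))
for t in range(timesteps):
    h = gru_step(x[:, t, :], h, W, U, b)

print(h.shape)   # (2, 4)
```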

In [39]:
gru = GRU(50)(Input(shape = (10, 30)))

In [40]:
print(gru.shape)

(None, 50)


In [41]:
gru = GRU(50, return_sequences = False, return_state = True)(Input(shape = (10, 30)))
print(gru[0].shape)         # shape of output
print(gru[1].shape)         # shape of hidden state

(None, 50)
(None, 50)


In [42]:
gru = GRU(50, return_sequences = True, return_state = True)(Input(shape = (10, 30)))
print(gru[0].shape)         # shape of output
print(gru[1].shape)         # shape of hidden state

(None, 10, 50)
(None, 50)


In [43]:
output, hidden_state = GRU(50, return_sequences = True, return_state = True)(Input(shape = (10, 30)))

In [44]:
print(output.shape)
print(hidden_state.shape)

(None, 10, 50)
(None, 50)


## LSTM vs GRU

* A GRU is slightly less complex than an LSTM and performs approximately as well when trained on a small amount of data.
* LSTMs tend to give better results than GRUs when trained on a sufficiently large amount of data.
---
GRUs are generally used when sequences are not very long and you want quick training with decent accuracy, or when compute infrastructure is a constraint. LSTMs are preferred when sequences are longer and long-range context matters.
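The "slightly less complex" point can be made concrete by counting trainable parameters. The sketch below assumes the textbook formulation, where each block has one input kernel, one recurrent kernel, and one bias: a simple RNN has 1 block, a GRU has 3, and an LSTM has 4. The helper `recurrent_params` is hypothetical. Keras's `GRU` may report a slightly higher count in configurations that keep a second bias vector, so treat the GRU figure as approximate.

```python
def recurrent_params(input_dim, units, n_blocks):
    """Parameters for n_blocks sets of [input kernel, recurrent kernel, bias]."""
    per_block = units * (input_dim + units) + units
    return n_blocks * per_block

input_dim, units = 30, 50   # same sizes as the examples above

simple_rnn_count = recurrent_params(input_dim, units, n_blocks=1)
gru_count = recurrent_params(input_dim, units, n_blocks=3)
lstm_count = recurrent_params(input_dim, units, n_blocks=4)

print(simple_rnn_count, gru_count, lstm_count)   # 4050 12150 16200
```

Under these assumptions a GRU has three quarters of the parameters of an LSTM with the same number of units.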

<img src="https://github.com/DevavratSinghBisht/neural-networks/blob/main/2.RNN/Assets/lstm_vs_gru.png?raw=true" style="width: 500px"/>