# Understanding RNN structure
- Unlike feedforward networks, RNNs can handle data with a "sequential" format because they carry a "state" forward from one step to the next
- Thus, grasping the concepts of **"sequences"** and (hidden) **"states"** is crucial to understanding RNNs

<br>
<img src="http://karpathy.github.io/assets/rnn/charseq.jpeg" style="width: 500px"/>

In [56]:
import numpy as np
from keras.models import Model, Sequential
from keras.layers import *

## 1. SimpleRNN 

The input to SimpleRNN must be a 3D tensor of shape (batch_size, timesteps, input_dim)
- **batch_size**: omitted when creating the RNN instance (== None). Usually set when fitting the model.
- **timesteps**: number of time steps in each input sequence
- **input_dim**: number of features at each time step
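Data often arrives as a 2D array with one scalar per time step, which lacks the feature axis. The sketch below shows one way to add it with plain NumPy indexing; the array name `series` and its sizes are made up for illustration.

```python
import numpy as np

# Hypothetical data: 3 sequences, 4 time steps each, one scalar per step.
series = np.arange(12).reshape(3, 4)

# Append a trailing feature axis so the shape becomes
# (batch_size, timesteps, input_dim) with input_dim == 1.
series_3d = series[..., np.newaxis]

print(series.shape)     # (3, 4)
print(series_3d.shape)  # (3, 4, 1)
```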

In [62]:
# for instance, consider the array below
x = np.array([[
             [1,    # => input_dim 1
              2,    # => input_dim 2 
              3],   # => input_dim 3     # => timestep 1                            
             [4, 5, 6]                   # => timestep 2
             ],                                  # => sample 1 in the batch
             [[7, 8, 9], [10, 11, 12]],          # => sample 2 in the batch
             [[13, 14, 15], [16, 17, 18]]        # => sample 3 in the batch
             ])

In [64]:
print('(Batch size, timesteps, input_dim) = ',x.shape)

(Batch size, timesteps, input_dim) =  (3, 2, 3)


In [80]:
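# rnn = SimpleRNN(50)(Input(shape = (10,))) => error: input is 2D including the batch axis, 3D is required
# rnn = SimpleRNN(50)(Input(shape = (10, 30, 40))) => error: input is 4D including the batch axis, 3D is required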
rnn = SimpleRNN(50)(Input(shape = (10, 30)))

**return_sequences = False** and **return_state = False** (the defaults) ====> output_shape = **(batch_size = None, num_units)**

In [68]:
rnn = SimpleRNN(50)(Input(shape = (10, 30)))
print(rnn.shape)

(?, 50)


**return_sequences = True** ====> output_shape = **(batch_size, timesteps, num_units)**

In [79]:
rnn = SimpleRNN(50, return_sequences = True)(Input(shape = (10, 30)))
print(rnn.shape)

(?, ?, 50)


**return_state = True** ====> returns a list of tensors: **[output, state]**, where state is the last hidden state
- if return_sequences == False     =>>    output_shape = (batch_size, num_units)
- if return_sequences == True      =>>    output_shape = (batch_size, timesteps, num_units)
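To see where these shapes come from, here is a minimal NumPy sketch of a tanh RNN forward pass. It is not Keras's implementation: the function `simple_rnn_forward`, the weight names, and the random weights are all assumptions made for illustration, and the two flags only mimic the behaviour described above.

```python
import numpy as np

def simple_rnn_forward(x, W_x, W_h, b, return_sequences=False, return_state=False):
    """Illustrative tanh RNN: h_t = tanh(x_t @ W_x + h_{t-1} @ W_h + b)."""
    batch_size, timesteps, _ = x.shape
    num_units = W_h.shape[0]
    h = np.zeros((batch_size, num_units))       # initial state is all zeros
    all_h = []
    for t in range(timesteps):
        h = np.tanh(x[:, t, :] @ W_x + h @ W_h + b)
        all_h.append(h)
    # Either every hidden state stacked along the time axis, or only the last one.
    output = np.stack(all_h, axis=1) if return_sequences else h
    return (output, h) if return_state else output

rng = np.random.default_rng(0)
batch_size, timesteps, input_dim, num_units = 3, 10, 30, 50
inputs = rng.normal(size=(batch_size, timesteps, input_dim))
W_x = rng.normal(scale=0.1, size=(input_dim, num_units))
W_h = rng.normal(scale=0.1, size=(num_units, num_units))
b = np.zeros(num_units)

last_output, last_state = simple_rnn_forward(inputs, W_x, W_h, b, return_state=True)
seq_output, seq_state = simple_rnn_forward(
    inputs, W_x, W_h, b, return_sequences=True, return_state=True)

print(last_output.shape, last_state.shape)  # (3, 50) (3, 50)
print(seq_output.shape, seq_state.shape)    # (3, 10, 50) (3, 50)
```

In this sketch the output with return_sequences = False is the same array as the state, and with return_sequences = True the last time step of the output equals the state.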

In [77]:
rnn = SimpleRNN(50, return_sequences = False, return_state = True)(Input(shape = (10, 30)))
print(rnn[0].shape)         # shape of output
print(rnn[1].shape)         # shape of last state

(?, 50)
(?, 50)


In [78]:
rnn = SimpleRNN(50, return_sequences = True, return_state = True)(Input(shape = (10, 30)))
print(rnn[0].shape)         # shape of output
print(rnn[1].shape)         # shape of last state

(?, ?, 50)
(?, 50)


The output and the last state can be unpacked directly, as below

In [81]:
output, state = SimpleRNN(50, return_sequences = True, return_state = True)(Input(shape = (10, 30)))

In [82]:
print(output.shape)
print(state.shape)

(?, ?, 50)
(?, 50)


## 2. LSTM
- The outputs of an LSTM are quite similar to those of a SimpleRNN, but there are subtle differences
- If you compare the two diagrams below, the LSTM passes one more type of "state" on to the next module

<br>
<img src="http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-SimpleRNN.png" style="width: 500px"/>

<center> Standard RNN </center>

<br>
<img src="http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-chain.png" style="width: 500px"/>

<center> LSTM </center>

In addition to the "hidden state (h_t)" of a simple RNN, the LSTM structure also has a "cell state (C_t)"

<br>
<img src="http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-focus-o.png" style="width: 500px"/>

<center> Hidden State </center>

<br>
<img src="http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-focus-C.png" style="width: 500px"/>

<center> Cell State </center>
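The two states can be made concrete with a small NumPy sketch of one LSTM step, using the standard gate equations. This is an illustration, not Keras's code: `lstm_step`, the stacked gate order `[i, f, g, o]`, and the random weights are assumptions chosen for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step; W, U, b hold the four gates stacked as [i, f, g, o]."""
    z = x_t @ W + h_prev @ U + b                  # (batch, 4 * units)
    i, f, g, o = np.split(z, 4, axis=-1)
    # Cell state: keep part of the old cell state, add part of the new candidate.
    c_t = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    # Hidden state: a gated, squashed view of the cell state.
    h_t = sigmoid(o) * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(0)
batch_size, timesteps, input_dim, num_units = 3, 10, 30, 50
inputs = rng.normal(size=(batch_size, timesteps, input_dim))
W = rng.normal(scale=0.1, size=(input_dim, 4 * num_units))
U = rng.normal(scale=0.1, size=(num_units, 4 * num_units))
b = np.zeros(4 * num_units)

h = np.zeros((batch_size, num_units))   # hidden state
c = np.zeros((batch_size, num_units))   # cell state
for t in range(timesteps):
    h, c = lstm_step(inputs[:, t, :], h, c, W, U, b)

print(h.shape, c.shape)  # (3, 50) (3, 50)
```

Both states have shape (batch_size, num_units), which matches the two state tensors the Keras layer returns when return_state = True.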

In [83]:
lstm = LSTM(50)(Input(shape = (10, 30)))

In [85]:
print(lstm.shape)

(?, 50)


In [86]:
lstm = LSTM(50, return_sequences = False, return_state = True)(Input(shape = (10, 30)))
print(lstm[0].shape)         # shape of output
print(lstm[1].shape)         # shape of hidden state
print(lstm[2].shape)         # shape of cell state

(?, 50)
(?, 50)
(?, 50)


In [87]:
lstm = LSTM(50, return_sequences = True, return_state = True)(Input(shape = (10, 30)))
print(lstm[0].shape)         # shape of output
print(lstm[1].shape)         # shape of hidden state
print(lstm[2].shape)         # shape of cell state

(?, ?, 50)
(?, 50)
(?, 50)


In [88]:
output, hidden_state, cell_state = LSTM(50, return_sequences = True, return_state = True)(Input(shape = (10, 30)))

In [89]:
print(output.shape)
print(hidden_state.shape)
print(cell_state.shape)

(?, ?, 50)
(?, 50)
(?, 50)


## 3. GRU
- GRU, a popular variant of LSTM, does not have a cell state
- Hence, it has only a hidden state, like SimpleRNN
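A single GRU step can be sketched in NumPy to show that only one state is carried forward. The function `gru_step`, the stacked gate order `[z, r, h]`, and the random weights are assumptions for illustration. The update uses the convention h_t = z * h_{t-1} + (1 - z) * candidate; some references swap the roles of z and 1 - z.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W, U, b):
    """One GRU step; W, U, b hold update, reset, candidate stacked as [z, r, h]."""
    x_z, x_r, x_h = np.split(x_t @ W + b, 3, axis=-1)
    U_z, U_r, U_h = np.split(U, 3, axis=-1)
    z = sigmoid(x_z + h_prev @ U_z)               # update gate
    r = sigmoid(x_r + h_prev @ U_r)               # reset gate
    h_cand = np.tanh(x_h + (r * h_prev) @ U_h)    # candidate hidden state
    # Blend the old hidden state with the candidate; no separate cell state exists.
    return z * h_prev + (1.0 - z) * h_cand

rng = np.random.default_rng(0)
batch_size, timesteps, input_dim, num_units = 3, 10, 30, 50
inputs = rng.normal(size=(batch_size, timesteps, input_dim))
W = rng.normal(scale=0.1, size=(input_dim, 3 * num_units))
U = rng.normal(scale=0.1, size=(num_units, 3 * num_units))
b = np.zeros(3 * num_units)

h = np.zeros((batch_size, num_units))
for t in range(timesteps):
    h = gru_step(inputs[:, t, :], h, W, U, b)

print(h.shape)  # (3, 50)
```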

In [90]:
gru = GRU(50)(Input(shape = (10, 30)))

In [91]:
print(gru.shape)

(?, 50)


In [93]:
gru = GRU(50, return_sequences = False, return_state = True)(Input(shape = (10, 30)))
print(gru[0].shape)         # shape of output
print(gru[1].shape)         # shape of hidden state

(?, 50)
(?, 50)


In [94]:
gru = GRU(50, return_sequences = True, return_state = True)(Input(shape = (10, 30)))
print(gru[0].shape)         # shape of output
print(gru[1].shape)         # shape of hidden state

(?, ?, 50)
(?, 50)


In [95]:
output, hidden_state = GRU(50, return_sequences = True, return_state = True)(Input(shape = (10, 30)))

In [96]:
print(output.shape)
print(hidden_state.shape)

(?, ?, 50)
(?, 50)
