## RNN State Test using Keras 

In this notebook I demonstrate the difference between return_sequences and return_state in the Keras API for recurrent layers. I use both LSTM and GRU layers for this purpose.

### Basic Imports 

In [1]:
from keras.models import Model
from keras.layers import Input, LSTM, GRU
import numpy as np

Using TensorFlow backend.


In [2]:
T = 10 # sequence length or time-steps
D = 2 # Input dimension or features 
M = 4 # Hidden layer size 

In [3]:
X = np.random.randn(1,T,D)

In [4]:
X.shape

# This is the input shape that an RNN layer expects

# The input must be a 3-D tensor of shape (batch, time-steps, features)

(1, 10, 2)
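As a small illustration of the (batch, time-steps, features) layout, the sketch below shows two common ways of getting data into that shape with NumPy. The variable names single_sequence, batched and batch_of_three are made up for this example and are not used elsewhere in the notebook.

```python
import numpy as np

T, D = 10, 2  # time-steps and features, same values as above

# A single sequence stored as a 2-D array of shape (T, D)
single_sequence = np.random.randn(T, D)

# Add a leading batch axis so the array becomes (1, T, D)
batched = np.expand_dims(single_sequence, axis=0)
print(batched.shape)

# Several sequences of equal length stack into (batch, T, D)
batch_of_three = np.stack([np.random.randn(T, D) for _ in range(3)])
print(batch_of_three.shape)
```

A 2-D array of shape (T, D) passed directly to a recurrent layer would be missing the batch axis, which is why the extra dimension is added first.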

### LSTM 

In [5]:
def lstm_1():
    '''
    This function defines an LSTM layer and prints its output, hidden state and cell state.
    return_state is set to True and return_sequences is set to False.
    '''
    
    input_ = Input(shape=(T,D))
    rnn = LSTM(units=M,return_state=True,return_sequences=False)
    x = rnn(input_)
    
    model = Model(inputs=input_ , outputs= x)
    o,h,c = model.predict(X)
    
    print('Output:',o)
    print('\n')
    print('Output Shape:',o.shape)
    print('\n')
    print('Hidden State:',h)
    print('\n')
    print('Hidden State Shape:',h.shape)
    print('\n')
    print('Cell State:',c)
    print('\n')
    print('Cell State shape:',c.shape)

In [7]:
def lstm_2():
    '''
    This function defines an LSTM layer and prints its output, hidden state and cell state.
    return_state is set to True and return_sequences is set to True.
    '''
    
    input_ = Input(shape=(T,D))
    rnn = LSTM(units=M,return_sequences=True,return_state=True)
    x = rnn(input_)
    
    model = Model(input_,x)
    o,h,c = model.predict(X)
    
    print('Output:',o)
    print('\n')
    print('Output Shape:',o.shape)
    print('\n')
    print('Hidden State:',h)
    print('\n')
    print('Hidden State Shape:',h.shape)
    print('\n')
    print('Cell State:',c)
    print('\n')
    print('Cell State shape:',c.shape)

### GRU

In [8]:
def gru_1():
    '''
    This function defines a GRU layer and prints its output and hidden state.
    return_state is set to True and return_sequences is set to False.
    '''
    
    input_ = Input(shape=(T,D))
    rnn = GRU(units=M,return_state=True,return_sequences=False)
    x = rnn(input_)
    
    model = Model(input_,x)
    o,h = model.predict(X)
    
    print('Output:',o)
    print('\n')
    print('Output Shape:',o.shape)
    print('\n')
    print('Hidden State:',h)
    print('\n')
    print('Hidden State Shape:',h.shape)

In [9]:
def gru_2():
    '''
    This function defines a GRU layer and prints its output and hidden state.
    return_state is set to True and return_sequences is set to True.
    '''
    
    input_ = Input(shape=(T,D))
    rnn = GRU(units=M,return_state=True,return_sequences=True)
    x = rnn(input_)
    
    model = Model(input_,x)
    o,h = model.predict(X)
    
    print('Output:',o)
    print('\n')
    print('Output Shape:',o.shape)
    print('\n')
    print('Hidden State:',h)
    print('\n')
    print('Hidden State Shape:',h.shape)

### LSTM output

In [11]:
print('lstm_1')
print('\n')
lstm_1()

lstm_1


Output: [[-0.13705452  0.12022021 -0.3441377   0.07776075]]


Output Shape: (1, 4)


Hidden State: [[-0.13705452  0.12022021 -0.3441377   0.07776075]]


Hidden State Shape: (1, 4)


Cell State: [[-0.22892734  0.18493187 -0.70387197  0.12077542]]


Cell State shape: (1, 4)


We can see that the output and the hidden state are identical, while the cell state is different. Notice the shape of every object here: they are all of shape (1, 4), which corresponds to the batch size and the hidden dimension respectively. Because return_sequences is set to False, the output contains only the hidden state of the final time-step, and the returned hidden and cell states are those of the final time-step as well.
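To make it clearer why the output equals the hidden state while the cell state differs, here is a minimal NumPy sketch of a textbook LSTM forward pass. It is not the Keras implementation: the function lstm_forward, the stacked weight layout and the random weights are all assumptions made for illustration, so the numbers will not match the Keras output above.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_forward(X, W, U, b):
    """Run a plain LSTM over X of shape (batch, T, D).

    W: (D, 4*M), U: (M, 4*M) and b: (4*M,) hold the stacked weights of the
    input, forget, candidate and output blocks (an assumed layout).
    Returns all hidden states (batch, T, M) plus the final h and c.
    """
    batch, T, _ = X.shape
    M = U.shape[0]
    h = np.zeros((batch, M))
    c = np.zeros((batch, M))
    hidden_states = []
    for t in range(T):
        z = X[:, t, :] @ W + h @ U + b
        i, f, g, o = np.split(z, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # next cell state
        h = sigmoid(o) * np.tanh(c)                   # next hidden state
        hidden_states.append(h)
    return np.stack(hidden_states, axis=1), h, c

rng = np.random.default_rng(0)
T, D, M = 10, 2, 4
X = rng.standard_normal((1, T, D))
W = rng.standard_normal((D, 4 * M)) * 0.1
U = rng.standard_normal((M, 4 * M)) * 0.1
b = np.zeros(4 * M)

seq, h, c = lstm_forward(X, W, U, b)
output = seq[:, -1, :]  # what return_sequences=False would hand back

print(output.shape, h.shape, c.shape)
print(np.allclose(output, h))  # output is the last hidden state
print(np.allclose(h, c))       # hidden state is a gated, squashed view of c
```

The hidden state is computed from the cell state through the output gate and a tanh, which is why the two states have the same shape but different values.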

In [12]:
print('lstm_2')
print('\n')
lstm_2()

lstm_2


Output: [[[-0.01556229  0.07676451  0.04730725  0.00345482]
  [-0.05091928  0.13667482 -0.0356027  -0.00430598]
  [-0.10059121  0.2106916  -0.10895777 -0.02924241]
  [ 0.04250218 -0.0860673   0.16977929  0.0072417 ]
  [ 0.09593287 -0.22544116  0.13827918  0.00587416]
  [ 0.04078832 -0.06416251  0.2189483   0.00339779]
  [ 0.0487584  -0.14450897  0.20324133  0.00274296]
  [ 0.06455808 -0.18928595  0.2756748   0.00673052]
  [-0.05552567 -0.00833835  0.02959559 -0.02320448]
  [-0.08483997  0.04004847 -0.06473035 -0.03224741]]]


Output Shape: (1, 10, 4)


Hidden State: [[-0.08483997  0.04004847 -0.06473035 -0.03224741]]


Hidden State Shape: (1, 4)


Cell State: [[-0.19424611  0.07681724 -0.16359454 -0.06385694]]


Cell State shape: (1, 4)


We can already see the difference when return_sequences is set to True: the output now contains the hidden state at every time-step of the sequence, while the returned hidden and cell states are still those of the final time-step. Notice the shape of the output, (1, 10, 4), which corresponds to batch, time-steps and hidden dimension respectively. When stacking multiple LSTM layers, every layer except the last one must set return_sequences to True, so that the input to the next LSTM layer is a 3-D tensor with the time dimension on the second axis. We can also see that the last row of the output is the same as the hidden state, which confirms that with return_sequences set to False only the last hidden state is returned as the output.
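The stacking point can be sketched with a two-layer model. This assumes the same keras installation used in the imports at the top of the notebook; the names seq, top and the *_val variables are chosen only for this example, and the weights are randomly initialised, so only the shapes and the equality check are meaningful.

```python
import numpy as np
from keras.models import Model
from keras.layers import Input, LSTM

T, D, M = 10, 2, 4
X = np.random.randn(1, T, D)

inputs = Input(shape=(T, D))
# First layer keeps the time axis so the next layer receives a 3-D tensor
seq, h, c = LSTM(units=M, return_sequences=True, return_state=True)(inputs)
# Second layer consumes the full sequence and returns only its last output
top = LSTM(units=M)(seq)

model = Model(inputs=inputs, outputs=[seq, h, c, top])
seq_val, h_val, c_val, top_val = model.predict(X, verbose=0)

print(seq_val.shape)  # first layer: one hidden vector per time-step
print(top_val.shape)  # second layer: a single vector per sample
# The last time-step of the sequence output should match the hidden state
print(np.allclose(seq_val[:, -1, :], h_val, atol=1e-6))
```

If the first layer used return_sequences=False instead, its output would be 2-D and the second LSTM layer would reject it.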

### GRU output

In [13]:
print('gru_1')
gru_1()

gru_1
Output: [[ 0.43486446  0.4459237   0.11545931 -0.26306632]]


Output Shape: (1, 4)


Hidden State: [[ 0.43486446  0.4459237   0.11545931 -0.26306632]]


Hidden State Shape: (1, 4)


In [14]:
print('gru_2')
gru_2()

gru_2
Output: [[[-0.03148022  0.12049837  0.18103163  0.18454805]
  [-0.19749367  0.04313508  0.2563645  -0.09047672]
  [-0.41434777 -0.14893222  0.31628466 -0.6242127 ]
  [ 0.28279668 -0.00580456 -0.25375867 -0.3491187 ]
  [ 0.39277342 -0.20834535 -0.45176262 -0.33512896]
  [ 0.30699134  0.05046801 -0.0405878   0.07690015]
  [ 0.31485742 -0.08253266 -0.23865432 -0.02356813]
  [ 0.3808537  -0.06204725 -0.36540538  0.06372052]
  [-0.06494007 -0.04870018  0.03072032 -0.36238673]
  [-0.19713812 -0.12900753  0.09717467 -0.49170685]]]


Output Shape: (1, 10, 4)


Hidden State: [[-0.19713812 -0.12900753  0.09717467 -0.49170685]]


Hidden State Shape: (1, 4)


The same reasoning extends to GRU. The only difference is the absence of a cell state. This is understandable, since GRU uses a single update gate to blend the previous hidden state with a candidate state, whereas LSTM uses separate forget and input gates to generate the next cell state, which in turn is used to produce the next hidden state. In effect, GRU merges the cell state and the hidden state into one hidden state, whereas in LSTM they are two separate states.
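The single-state behaviour can be sketched with one GRU step written in NumPy. This is a simplified textbook formulation with biases omitted, not the Keras implementation; gru_step and the weight names are hypothetical, and libraries differ on whether the update gate z weights the old state or the candidate.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step for x of shape (batch, D) and h of shape (batch, M)."""
    z = sigmoid(x @ Wz + h @ Uz)                  # update gate
    r = sigmoid(x @ Wr + h @ Ur)                  # reset gate
    h_candidate = np.tanh(x @ Wh + (r * h) @ Uh)  # candidate state
    # The update gate alone decides how much of the old state to keep
    return z * h + (1.0 - z) * h_candidate

rng = np.random.default_rng(0)
T, D, M = 10, 2, 4
X = rng.standard_normal((1, T, D))
Wz, Wr, Wh = (rng.standard_normal((D, M)) * 0.1 for _ in range(3))
Uz, Ur, Uh = (rng.standard_normal((M, M)) * 0.1 for _ in range(3))

h = np.zeros((1, M))
for t in range(T):
    h = gru_step(X[:, t, :], h, Wz, Uz, Wr, Ur, Wh, Uh)

print(h.shape)  # the only state carried between time-steps
```

Only h is passed from one step to the next, which matches the two return values (output and hidden state) seen in gru_1 and gru_2.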