<a href="https://colab.research.google.com/github/AtrCheema/Miscellaneous_DL_Tutorials/blob/master/return_sequences_vs_return_states.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Intro
This notebook describes the difference between the `return_sequences` and `return_state` arguments of the LSTM/RNN layers in TensorFlow/Keras.

In [1]:
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input
from tensorflow.keras.layers import LSTM
import numpy as np
from tensorflow.keras.layers import MaxPooling1D, Flatten, Conv1D


np.set_printoptions(suppress=True) # to suppress scientific notation while printing arrays

def reset_graph(seed=313):
    tf.compat.v1.reset_default_graph()
    tf.compat.v1.set_random_seed(seed)
    np.random.seed(seed)

tf.__version__

'2.2.0'

In [2]:
seq_len = 10
in_features = 3
batch_size = 2
units = 5

# define input data
data = np.random.normal(0,1, size=(batch_size, seq_len, in_features))
print('input shape is', data.shape)


input shape is (2, 10, 3)


In [3]:
reset_graph()

# define model
inputs1 = Input(shape=(seq_len, in_features))
lstm1 = LSTM(units)(inputs1)
model = Model(inputs=inputs1, outputs=lstm1)

# check output
output = model.predict(data)
print('output shape is ', output.shape)
print(output)

output shape is  (2, 5)
[[-0.1566722  -0.09671225 -0.07435499  0.2380382  -0.10205627]
 [ 0.0498487   0.10540111 -0.11872431  0.21326743 -0.07617775]]


### Return Sequences
If we set `return_sequences=True`, the layer returns the hidden state (which is also its output) at every time step, instead of only the final one.

In [4]:
reset_graph()

print('input shape is', data.shape)

# define model
inputs1 = Input(shape=(seq_len, in_features))
lstm1 = LSTM(units, return_sequences=True)(inputs1)
model = Model(inputs=inputs1, outputs=lstm1)

# check output
output = model.predict(data)
print('output shape is ', output.shape)
print(output)

input shape is (2, 10, 3)
output shape is  (2, 10, 5)
[[[-0.02143264 -0.00227202  0.06692377  0.03960254  0.06273782]
  [ 0.11968565  0.08252696  0.09503155  0.0289942   0.17401429]
  [ 0.10317614  0.03702856  0.16926843  0.06439783  0.24887747]
  [ 0.25664562  0.21938275  0.0558738   0.03023281  0.17119098]
  [ 0.10064318  0.04220012  0.15114217  0.22362679  0.03877142]
  [ 0.04813049  0.06628726  0.09269002  0.21057403  0.16603518]
  [-0.06222967 -0.02064434  0.05820563  0.17465067  0.06314038]
  [-0.0490917  -0.07563521 -0.07562283  0.1867909  -0.07249714]
  [-0.0628821  -0.13354729 -0.24176472  0.13450752 -0.2201733 ]
  [-0.1566722  -0.09671225 -0.07435499  0.2380382  -0.10205627]]

 [[-0.06512719 -0.12316308 -0.23606628 -0.09947219 -0.19536504]
  [-0.24930197 -0.31348827 -0.31217012  0.03622868 -0.14607252]
  [-0.24061005 -0.22410582 -0.08258545  0.14922056 -0.16823886]
  [-0.07348045 -0.09826367 -0.13519506  0.0518603  -0.24591225]
  [-0.10792634 -0.08394783 -0.06631595  0.123018
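
The printed values suggest that the last time step of the sequence output is the same vector the earlier model returned without `return_sequences` (both models are built after `reset_graph()` with the same seed, so they appear to start from identical weights). A minimal check, using only the numbers printed above for the first sample:

```python
import numpy as np

# Hidden states of the first sample, copied from the printed output above
# (10 time steps x 5 units).
seq_sample0 = np.array([
    [-0.02143264, -0.00227202,  0.06692377,  0.03960254,  0.06273782],
    [ 0.11968565,  0.08252696,  0.09503155,  0.0289942,   0.17401429],
    [ 0.10317614,  0.03702856,  0.16926843,  0.06439783,  0.24887747],
    [ 0.25664562,  0.21938275,  0.0558738,   0.03023281,  0.17119098],
    [ 0.10064318,  0.04220012,  0.15114217,  0.22362679,  0.03877142],
    [ 0.04813049,  0.06628726,  0.09269002,  0.21057403,  0.16603518],
    [-0.06222967, -0.02064434,  0.05820563,  0.17465067,  0.06314038],
    [-0.0490917,  -0.07563521, -0.07562283,  0.1867909,  -0.07249714],
    [-0.0628821,  -0.13354729, -0.24176472,  0.13450752, -0.2201733 ],
    [-0.1566722,  -0.09671225, -0.07435499,  0.2380382,  -0.10205627],
])

# Output of the first sample from the model built without return_sequences.
final_sample0 = np.array(
    [-0.1566722, -0.09671225, -0.07435499, 0.2380382, -0.10205627]
)

# Indexing the last time step recovers the "final output" of the layer.
last_step = seq_sample0[-1]
print(last_step.shape)                        # (5,)
print(np.allclose(last_step, final_sample0))  # True
```

On the full batch the same slice would be `output[:, -1, :]`.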

### Return States
If we set `return_state=True`, the layer returns its final output together with the final hidden state `state_h` (identical to that output) and the final cell state `state_c`.

In [5]:
reset_graph()

# define model
inputs1 = Input(shape=(seq_len, in_features))
lstm1, state_h, state_c = LSTM(units, return_state=True)(inputs1)
model = Model(inputs=inputs1, outputs=[lstm1, state_h, state_c])

# check output
_h, h, c = model.predict(data)
print('_h: shape {} values \n {}\n'.format(_h.shape, _h))
print('h: shape {} values \n {}\n'.format(h.shape, h))
print('c: shape {} values \n {}'.format(c.shape, c))

_h: shape (2, 5) values 
 [[-0.1566722  -0.09671225 -0.07435499  0.2380382  -0.10205627]
 [ 0.0498487   0.10540111 -0.11872431  0.21326743 -0.07617775]]

h: shape (2, 5) values 
 [[-0.1566722  -0.09671225 -0.07435499  0.2380382  -0.10205627]
 [ 0.0498487   0.10540111 -0.11872431  0.21326743 -0.07617775]]

c: shape (2, 5) values 
 [[-0.29146802 -0.22284117 -0.17079654  0.5928285  -0.2525362 ]
 [ 0.07988599  0.1933969  -0.30316094  0.4730413  -0.22530162]]
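
The hidden and cell states are linked: in a standard LSTM, `h = o * tanh(c)`, where `o` is the output gate with values between 0 and 1. As a rough sanity check, and assuming the layer uses its default `tanh` activation, the output gate can be recovered from the values printed above:

```python
import numpy as np

# Final hidden state and cell state, copied from the printed output above.
h = np.array([
    [-0.1566722, -0.09671225, -0.07435499, 0.2380382, -0.10205627],
    [0.0498487, 0.10540111, -0.11872431, 0.21326743, -0.07617775],
])
c = np.array([
    [-0.29146802, -0.22284117, -0.17079654, 0.5928285, -0.2525362],
    [0.07988599, 0.1933969, -0.30316094, 0.4730413, -0.22530162],
])

# Assumption: default tanh activation, so h = o * tanh(c)  =>  o = h / tanh(c).
o = h / np.tanh(c)
print(np.round(o, 3))

# A sigmoid-like gate must lie strictly between 0 and 1.
print(((o > 0) & (o < 1)).all())  # True
```

This also explains why every entry of `h` has the same sign as, and a smaller magnitude than, the corresponding entry of `c`.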


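## Using both at the same time
We can also use `return_sequences` and `return_state` together. The first output is then the hidden state at every time step, while `state_h` and `state_c` are still the final hidden state and the final cell state.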

In [6]:
reset_graph()

# define model
inputs1 = Input(shape=(seq_len, in_features))
lstm1, state_h, state_c = LSTM(units, return_state=True, return_sequences=True)(inputs1)
model = Model(inputs=inputs1, outputs=[lstm1, state_h, state_c])

# check output
_h, h, c = model.predict(data)
print('_h: shape {} values \n {}\n'.format(_h.shape, _h))
print('h: shape {} values \n {}\n'.format(h.shape, h))
print('c: shape {} values \n {}'.format(c.shape, c))

_h: shape (2, 10, 5) values 
 [[[-0.02143264 -0.00227202  0.06692377  0.03960254  0.06273782]
  [ 0.11968565  0.08252696  0.09503155  0.0289942   0.17401429]
  [ 0.10317614  0.03702856  0.16926843  0.06439783  0.24887747]
  [ 0.25664562  0.21938275  0.0558738   0.03023281  0.17119098]
  [ 0.10064318  0.04220012  0.15114217  0.22362679  0.03877142]
  [ 0.04813049  0.06628726  0.09269002  0.21057403  0.16603518]
  [-0.06222967 -0.02064434  0.05820563  0.17465067  0.06314038]
  [-0.0490917  -0.07563521 -0.07562283  0.1867909  -0.07249714]
  [-0.0628821  -0.13354729 -0.24176472  0.13450752 -0.2201733 ]
  [-0.1566722  -0.09671225 -0.07435499  0.2380382  -0.10205627]]

 [[-0.06512719 -0.12316308 -0.23606628 -0.09947219 -0.19536504]
  [-0.24930197 -0.31348827 -0.31217012  0.03622868 -0.14607252]
  [-0.24061005 -0.22410582 -0.08258545  0.14922056 -0.16823886]
  [-0.07348045 -0.09826367 -0.13519506  0.0518603  -0.24591225]
  [-0.10792634 -0.08394783 -0.06631595  0.12301818 -0.12757494]
  [-0.03
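
The four combinations of the two flags can be summarised by their output shapes. The helper below is hypothetical (it is not part of Keras) and simply encodes the shapes observed in the cells above:

```python
def lstm_output_shapes(batch, seq_len, units,
                       return_sequences=False, return_state=False):
    """Expected output shapes of an LSTM layer (hypothetical helper).

    Returns a list with one shape when return_state is False, and three
    shapes (output, state_h, state_c) when it is True.
    """
    # The main output covers every time step only if return_sequences is set.
    main = (batch, seq_len, units) if return_sequences else (batch, units)
    if not return_state:
        return [main]
    # The states always describe the final time step only.
    state = (batch, units)
    return [main, state, state]


for return_sequences in (False, True):
    for return_state in (False, True):
        shapes = lstm_output_shapes(2, 10, 5, return_sequences, return_state)
        print(return_sequences, return_state, shapes)
```

With `batch=2`, `seq_len=10` and `units=5`, the last combination gives `[(2, 10, 5), (2, 5), (2, 5)]`, which matches the shape of `_h` printed above.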

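## LSTM to 1D CNN

We can put a 1D CNN after the LSTM to extract further features from its output. This requires `return_sequences=True`, because `Conv1D` expects a 3D input of shape `(batch, steps, channels)`.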

In [7]:
reset_graph()

print('input shape is', data.shape)

# define model
inputs = Input(shape=(seq_len, in_features))
lstm_layer = LSTM(units, return_sequences=True)
lstm_outputs = lstm_layer(inputs)
print('lstm output: ', lstm_outputs.shape)

# input_shape is not needed here: in the functional API the shape is inferred from lstm_outputs
conv1 = Conv1D(filters=64, kernel_size=2, activation='relu')(lstm_outputs)
print('conv output: ', conv1.shape)

max1d1 = MaxPooling1D(pool_size=2)(conv1)
print('max pool output: ', max1d1.shape)

flat1 = Flatten()(max1d1)
print('flatten output: ', flat1.shape)

model = Model(inputs=inputs, outputs=flat1)

# check output
output = model.predict(data)
print('output shape: ', output.shape)

input shape is (2, 10, 3)
lstm output:  (None, 10, 5)
conv output:  (None, 9, 64)
max pool output:  (None, 4, 64)
flatten output:  (None, 256)
output shape:  (2, 256)
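
The shapes printed above follow from simple arithmetic. The sketch below assumes the Keras defaults used in the cell (`padding='valid'`, convolution stride 1, pooling stride equal to `pool_size`); the two helper functions are hypothetical and only illustrate the formula:

```python
def conv1d_valid_length(length, kernel_size, stride=1):
    """Output length of a 1D convolution with 'valid' padding."""
    return (length - kernel_size) // stride + 1


def maxpool1d_valid_length(length, pool_size, stride=None):
    """Output length of 1D max pooling with 'valid' padding."""
    stride = pool_size if stride is None else stride
    return (length - pool_size) // stride + 1


seq_len, filters = 10, 64

conv_len = conv1d_valid_length(seq_len, kernel_size=2)     # 10 - 2 + 1 = 9
pool_len = maxpool1d_valid_length(conv_len, pool_size=2)   # (9 - 2) // 2 + 1 = 4
flat_len = pool_len * filters                              # 4 * 64 = 256

print(conv_len, pool_len, flat_len)  # 9 4 256
```

Note that the number of LSTM units (5) only affects the number of convolution weights, not the output shape, which is determined by `filters`.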


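The output of a simple RNN cell looks roughly as below, where $U$ is the input kernel, $W$ is the recurrent kernel and $b$ is the bias. An LSTM applies a transformation of this form four times per step (three gates and the candidate cell state), which is why the weight shapes below have `4 * units = 20` columns.
$$
h_t = \tanh(b + W h_{t-1} + U X_t)
$$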
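
As an illustration, the recurrence above can be unrolled in NumPy. This is a minimal sketch with random weights, not the Keras implementation; it uses a row-vector layout (`h @ W`, `x @ U`), and the function name `simple_rnn` is made up for this example:

```python
import numpy as np


def simple_rnn(X, U, W, b):
    """Unroll h_t = tanh(b + W h_{t-1} + U X_t) over the time axis.

    X: (batch, seq_len, in_features), U: (in_features, units),
    W: (units, units), b: (units,).
    Returns all hidden states (batch, seq_len, units) and the last one.
    """
    batch, seq_len, _ = X.shape
    units = W.shape[0]
    h = np.zeros((batch, units))  # initial hidden state h_0 = 0
    states = []
    for t in range(seq_len):
        h = np.tanh(b + h @ W + X[:, t, :] @ U)
        states.append(h)
    return np.stack(states, axis=1), h


rng = np.random.default_rng(313)
X = rng.normal(size=(2, 10, 3))
U = rng.normal(scale=0.1, size=(3, 5))
W = rng.normal(scale=0.1, size=(5, 5))
b = np.zeros(5)

all_h, last_h = simple_rnn(X, U, W, b)
print(all_h.shape, last_h.shape)  # (2, 10, 5) (2, 5)
```

Here `all_h` plays the role of the `return_sequences=True` output and `last_h` the role of the default output.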

In [8]:
kernel, recurrent_kernel, bias = lstm_layer.get_weights()
print('kernel U: ', kernel.shape)  # input weights, shape (in_features, 4 * units)
print('recurrent kernel, W: ', recurrent_kernel.shape)  # weights applied to the previous hidden state h_{t-1}, shape (units, 4 * units)
print('bias: ', bias.shape)  # shape (4 * units,)

kernel U:  (3, 20)
recurrent kernel, W:  (5, 20)
bias:  (20,)
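
To see where the factor of four comes from, here is a hedged NumPy sketch of an LSTM forward pass that consumes weights in the shapes printed above. It assumes the gate blocks are ordered input, forget, candidate, output, and that the gates use a sigmoid and the candidate and output use `tanh`, which to my knowledge matches the `tf.keras` defaults. The weights are random, so the numbers will not match the layer above:

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_forward(X, kernel, recurrent_kernel, bias):
    """Plain LSTM forward pass (sketch; assumed gate order: i, f, c, o).

    X: (batch, seq_len, in_features), kernel: (in_features, 4 * units),
    recurrent_kernel: (units, 4 * units), bias: (4 * units,).
    Returns the hidden-state sequence, the final h and the final c.
    """
    batch, seq_len, _ = X.shape
    units = recurrent_kernel.shape[0]
    h = np.zeros((batch, units))
    c = np.zeros((batch, units))
    hidden_states = []
    for t in range(seq_len):
        # One matrix product yields the pre-activations of all four blocks.
        z = X[:, t, :] @ kernel + h @ recurrent_kernel + bias
        z_i, z_f, z_c, z_o = np.split(z, 4, axis=1)
        i, f, o = sigmoid(z_i), sigmoid(z_f), sigmoid(z_o)
        c = f * c + i * np.tanh(z_c)  # update the cell state
        h = o * np.tanh(c)            # hidden state, i.e. the layer output
        hidden_states.append(h)
    return np.stack(hidden_states, axis=1), h, c


rng = np.random.default_rng(313)
in_features, units = 3, 5
X = rng.normal(size=(2, 10, in_features))
kernel = rng.normal(scale=0.1, size=(in_features, 4 * units))        # (3, 20)
recurrent_kernel = rng.normal(scale=0.1, size=(units, 4 * units))    # (5, 20)
bias = np.zeros(4 * units)                                           # (20,)

seq, h, c = lstm_forward(X, kernel, recurrent_kernel, bias)
print(seq.shape, h.shape, c.shape)  # (2, 10, 5) (2, 5) (2, 5)
```

If the assumptions above hold, passing `*lstm_layer.get_weights()` instead of the random weights should approximately reproduce the layer's own output, up to float32 rounding.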


## Credits
This post is inspired by Jason Brownlee's [page](https://machinelearningmastery.com/return-sequences-and-return-states-for-lstms-in-keras/).