# Difference between ```return_sequences``` and ```return_state``` for LSTMs in TF2 Keras

As part of its TF2 LSTM implementation, the Keras API provides access to both _return sequences_ and _return state_. How these outputs are used, and how they differ, can be confusing when designing sophisticated recurrent neural network models such as the encoder-decoder model.


After completing this tutorial, you will know:

- __That return sequences return the hidden state output for each input time step__.

- __That return state returns the hidden state output and cell state for the last input time step__.

- __That return sequences and return state can be used at the same time__.


#### Long Short-Term Memory RNN

- Creating a layer of LSTM memory units allows you to specify the number of memory units within the layer.

- Each unit or cell within the layer has an _internal cell state_, often abbreviated as "c", and outputs a _hidden state_, often abbreviated as "h".

The Keras API allows you to access these data, which can be useful or even required when developing sophisticated recurrent neural network architectures, such as the encoder-decoder model.
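As a quick illustration of how the number of memory units relates to "h" and "c", the following minimal sketch builds one LSTM layer and inspects the shapes it returns. The batch size, number of time steps, number of features and number of units are hypothetical values chosen only for this example.

```python
import numpy as np
import tensorflow as tf

# Hypothetical sizes, chosen only for illustration.
batch_size, time_steps, n_features, n_units = 2, 5, 3, 4

# Random input of shape (batch, time steps, features).
x = np.random.rand(batch_size, time_steps, n_features).astype("float32")

# return_state=True makes the layer return its output plus h and c.
layer = tf.keras.layers.LSTM(units=n_units, return_state=True)
output, state_h, state_c = layer(x)

# Each of the three tensors holds one value per unit for every sample.
shapes = {
    "output": tuple(output.shape),
    "state_h": tuple(state_h.shape),
    "state_c": tuple(state_c.shape),
}
print(shapes)
```

Both the hidden state and the cell state have one entry per memory unit, so their last dimension equals `n_units` regardless of how many time steps or features the input has.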

For the rest of this tutorial, we will look at the API for accessing these data.


[Tutorial](https://machinelearningmastery.com/return-sequences-and-return-states-for-lstms-in-keras/)

In [None]:
# Specify GPU to be used-
%env CUDA_DEVICE_ORDER = PCI_BUS_ID
%env CUDA_VISIBLE_DEVICES = 0

In [1]:
import tensorflow
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers import Flatten, BatchNormalization, LeakyReLU, Reshape
from tensorflow.keras.layers import LSTM, GRU, Input, Dense, RepeatVector
from tensorflow.keras.layers import Bidirectional, TimeDistributed, Dropout
# from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential, Model
import tensorflow.keras.backend as K
import tensorflow as tf

from sklearn.preprocessing import MinMaxScaler, StandardScaler, RobustScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# import plotly
# import plotly.express as px
import pickle

### Return Sequences

__Each LSTM cell will output one hidden state h for each input time step__. By default, the Keras LSTM layer returns only the hidden state from the last time step.

We can demonstrate this in Keras with _a very small model with a single LSTM layer that itself contains a single LSTM cell_.

Here, we will have _one input sample with 3 time steps and one feature observed at each time step_:
```
t1 = 0.1
t2 = 0.2
t3 = 0.3
```

In [29]:
# Initialize an LSTM layer having one LSTM cell-
lstm_layer = LSTM(units = 1, input_shape = (3, 1))

In [27]:
# Create data with one feature having 3 time steps-
X = np.array([0.1, 0.2, 0.3]).reshape((1, 3, 1))

In [28]:
X.shape

(1, 3, 1)

In [30]:
# Get output for a single hidden state for the input sequence with 3 time steps-
output = lstm_layer(X)

In [31]:
print(f"Output for a single hidden state for an input with one feature and 3 time steps = {output.numpy()[0][0]:.4f}")

Output for a single hidden state for an input with one feature and 3 time steps = 0.0210


#### Access the _hidden state output_ for each input time step

It is possible to access the hidden state output for each input time step. This can be done by setting the ```return_sequences``` argument to True when defining the LSTM layer, as follows-

In [33]:
# Define LSTM layer to access hidden state output for each input time step-
lstm_layer = LSTM(
    units = 1, return_sequences = True,
    input_shape = (3, 1)
)

In [34]:
# Get output for input sequence with 3 time steps-
output = lstm_layer(X)

In [35]:
print(f"Output for each time step:\n{output.numpy()}")

Output for each time step:
[[[0.01174813]
  [0.03592343]
  [0.07357237]]]
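The sequence above does not end in the 0.0210 printed earlier because each newly constructed LSTM layer draws its own random initial weights. A minimal sketch, assuming we copy the weights from one layer to the other with `get_weights` and `set_weights`, shows that the default output is the last element of the returned sequence. The names `last_only` and `full_sequence` are introduced here only for illustration.

```python
import numpy as np
from tensorflow.keras.layers import LSTM

# Same single sample as above: 3 time steps, 1 feature.
X = np.array([0.1, 0.2, 0.3], dtype="float32").reshape((1, 3, 1))

last_only = LSTM(units=1)                             # returns the last hidden state only
full_sequence = LSTM(units=1, return_sequences=True)  # returns one hidden state per time step

# Call each layer once so that its weights are created.
last_only(X)
full_sequence(X)

# Copy the weights so that both layers compute the same function.
full_sequence.set_weights(last_only.get_weights())

h_last = last_only(X).numpy()      # shape (1, 1)
h_all = full_sequence(X).numpy()   # shape (1, 3, 1)

# The last time step of the sequence should match the default output.
print(np.allclose(h_all[:, -1, :], h_last, atol=1e-6))
```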


Note: You must set ```return_sequences = True``` when stacking LSTM layers so that the second LSTM layer has a three-dimensional sequence input. For more details, refer to [Stacked Long Short-Term Memory Networks](https://machinelearningmastery.com/stacked-long-short-term-memory-networks/)

You may also need to access the sequence of hidden state outputs when predicting a sequence of outputs with a Dense output layer wrapped in a ```TimeDistributed``` layer. Refer to [How to Use the TimeDistributed Layer for Long Short-Term Memory Networks in Python](https://machinelearningmastery.com/timedistributed-layer-for-long-short-term-memory-networks-in-python/)
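Both uses can be sketched in one small model. The layer sizes (8 and 4 units) and the name `stacked_model` are arbitrary choices for this illustration, not values taken from the tutorials linked above.

```python
import numpy as np
from tensorflow.keras.layers import Input, LSTM, Dense, TimeDistributed
from tensorflow.keras.models import Model

X = np.array([0.1, 0.2, 0.3], dtype="float32").reshape((1, 3, 1))

inputs = Input(shape=(3, 1))
# The first LSTM must return the full sequence, shape (batch, 3, 8),
# so that the second LSTM receives three-dimensional input.
x = LSTM(units=8, return_sequences=True)(inputs)
# The second LSTM also returns a sequence, shape (batch, 3, 4),
# because we want one prediction per time step.
x = LSTM(units=4, return_sequences=True)(x)
# TimeDistributed applies the same Dense layer to every time step.
outputs = TimeDistributed(Dense(1))(x)

stacked_model = Model(inputs=inputs, outputs=outputs)
y = stacked_model.predict(X, verbose=0)
print(y.shape)  # (1, 3, 1): one output value per input time step
```

If the first layer were left at its default of `return_sequences = False`, it would emit a two-dimensional tensor and the second LSTM layer would reject it.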


### Return States

- __The output of an LSTM cell or layer of cells is called the hidden state__. This is confusing, because __each LSTM cell retains an internal state that is not output, called the cell state, or c__.

- Generally, we do not need to access the cell state unless we are developing sophisticated models where subsequent layers may need to have their cell state initialized with the final cell state of another layer, such as in an encoder-decoder model.

Keras provides the ```return_state``` argument in the LSTM layer that will provide access to the hidden state output ```(state_h)``` and the cell state ```(state_c)```. For example-

```
lstm1, state_h, state_c = LSTM(units = 1, return_state = True)(inputs1)
```

This may look confusing because both ```lstm1``` and ```state_h``` refer to the same hidden state output. The reason for these two tensors being separate will become clear in the next section.

We can demonstrate access to the hidden and cell states of the cells in the LSTM layer as follows-

In [39]:
# Define LSTM model-
inputs1 = Input(shape = (3, 1))
lstm1, state_h, state_c = LSTM(units = 1, return_state = True)(inputs1)
model = Model(inputs = inputs1, outputs = [lstm1, state_h, state_c])

In [40]:
# Get LSTM's predictions-
model.predict(X)

[array([[-0.09086772]], dtype=float32),
 array([[-0.09086772]], dtype=float32),
 array([[-0.18937905]], dtype=float32)]

Running the example returns 3 arrays:

- The LSTM hidden state output for the last time step.
- The LSTM hidden state output for the last time step (again).
- The LSTM cell state for the last time step.

__The hidden state and the cell state could in turn be used to initialize the states of another LSTM layer with the same number of cells__.
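A minimal sketch of this idea, in the spirit of an encoder-decoder model, passes the encoder's final states to a second LSTM through the `initial_state` call argument. The number of units, the decoder input length and the names `seq2seq`, `enc_X` and `dec_X` are hypothetical choices for illustration only.

```python
import numpy as np
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.models import Model

n_units = 4  # hypothetical; both layers must use the same number of units

# Encoder: keep only the final hidden state and cell state.
encoder_inputs = Input(shape=(3, 1))
_, enc_h, enc_c = LSTM(units=n_units, return_state=True)(encoder_inputs)

# Decoder: start from the encoder's final states instead of zeros.
decoder_inputs = Input(shape=(2, 1))
decoder_outputs = LSTM(units=n_units, return_sequences=True)(
    decoder_inputs, initial_state=[enc_h, enc_c]
)

seq2seq = Model(inputs=[encoder_inputs, decoder_inputs], outputs=decoder_outputs)

enc_X = np.array([0.1, 0.2, 0.3], dtype="float32").reshape((1, 3, 1))
dec_X = np.zeros((1, 2, 1), dtype="float32")  # placeholder decoder input

dec_out = seq2seq.predict([enc_X, dec_X], verbose=0)
print(dec_out.shape)  # (1, 2, 4): one hidden state per decoder time step
```

The states have shape (batch, units), which is why the receiving layer needs the same number of units as the layer that produced them.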

In [50]:
# Sanity check (without using Model)-
lstm_layer = LSTM(
    units = 1, return_state = True,
    input_shape = (3, 1)
)

In [51]:
lstm_layer(X)

[<tf.Tensor: shape=(1, 1), dtype=float32, numpy=array([[-0.07009666]], dtype=float32)>,
 <tf.Tensor: shape=(1, 1), dtype=float32, numpy=array([[-0.07009666]], dtype=float32)>,
 <tf.Tensor: shape=(1, 1), dtype=float32, numpy=array([[-0.14744249]], dtype=float32)>]
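To make the relationship between the hidden state and the cell state more concrete, the sketch below recomputes both in NumPy from the layer's weights and compares them with what Keras returns. It assumes the layer's default settings (tanh activation, sigmoid recurrent activation) and that the weights are laid out gate by gate in the order input, forget, candidate, output. The helper `sigmoid` and the loop variables are introduced here only for illustration.

```python
import numpy as np
from tensorflow.keras.layers import LSTM

def sigmoid(z):
    """Logistic function used for the three gates."""
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([0.1, 0.2, 0.3], dtype="float32").reshape((1, 3, 1))

layer = LSTM(units=1, return_state=True)
_, keras_h, keras_c = layer(X)

# Weights in the order kernel, recurrent kernel, bias.
kernel, recurrent_kernel, bias = [w.astype("float64") for w in layer.get_weights()]

# Both states start at zero and are updated once per time step.
h = np.zeros((1, 1))
c = np.zeros((1, 1))
for t in range(X.shape[1]):
    x_t = X[:, t, :].astype("float64")
    z = x_t @ kernel + h @ recurrent_kernel + bias
    # Assumed gate layout: input, forget, candidate, output.
    z_i, z_f, z_g, z_o = np.split(z, 4, axis=1)
    c = sigmoid(z_f) * c + sigmoid(z_i) * np.tanh(z_g)  # cell state stays internal
    h = sigmoid(z_o) * np.tanh(c)                       # hidden state is the output

print(np.allclose(h, keras_h.numpy(), atol=1e-5))
print(np.allclose(c, keras_c.numpy(), atol=1e-5))
```

The loop shows that the hidden state is a gated, squashed view of the cell state, which is why the two values returned by the layer differ.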

### Return States and Sequences

We can access both the sequence of hidden states and the cell state at the same time. This can be done by configuring the LSTM layer to return both sequences and states. An example is:

```
lstm1, state_h, state_c = LSTM(1, return_sequences = True, return_state = True)(inputs1)
```



In [52]:
# Define LSTM model-
inputs1 = Input(shape = (3, 1))
lstm1, state_h, state_c = LSTM(
    units = 1, return_sequences = True,
    return_state = True)(inputs1)

model = Model(inputs = inputs1, outputs = [lstm1, state_h, state_c])

In [53]:
# Make predictions-
model.predict(X)

[array([[[-0.00882794],
         [-0.02347509],
         [-0.04204081]]], dtype=float32),
 array([[-0.04204081]], dtype=float32),
 array([[-0.07500947]], dtype=float32)]

Running the example, we can now see why the LSTM output tensor and hidden state output tensor are declared separately.

- __The layer returns the hidden state for each input time step, then separately, the hidden state output for the last time step and the cell state for the last input time step__.

- This can be confirmed by seeing that the last value in the returned sequences (first array) matches the value in the hidden state (second array).
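Rather than comparing the printed numbers by eye, the same check can be done programmatically. This is a minimal sketch that calls the layer directly on the input. The names `sequence` and `last_step` are used only for this illustration.

```python
import numpy as np
from tensorflow.keras.layers import LSTM

X = np.array([0.1, 0.2, 0.3], dtype="float32").reshape((1, 3, 1))

layer = LSTM(units=1, return_sequences=True, return_state=True)
sequence, state_h, state_c = layer(X)

# Take the hidden state at the final time step of the returned sequence.
last_step = sequence[:, -1, :]

# It should equal the separately returned hidden state.
print(np.allclose(last_step.numpy(), state_h.numpy()))
```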

In [54]:
# Sanity check (without using Model)-
lstm_layer = LSTM(
    units = 1, return_state = True,
    return_sequences = True, input_shape = (3, 1)
)

In [55]:
lstm_layer(X)

[<tf.Tensor: shape=(1, 3, 1), dtype=float32, numpy=
 array([[[0.00771256],
         [0.0218019 ],
         [0.04130548]]], dtype=float32)>,
 <tf.Tensor: shape=(1, 1), dtype=float32, numpy=array([[0.04130548]], dtype=float32)>,
 <tf.Tensor: shape=(1, 1), dtype=float32, numpy=array([[0.07166928]], dtype=float32)>]

### Summary

We saw that:

- return sequences return the hidden state output for each input time step.
- return state returns the hidden state output and cell state for the last input time step.
- return sequences and return state can be used at the same time.

#### Additional URLs

- [Reference](https://www.kaggle.com/code/kmkarakaya/lstm-output-types-return-sequences-state/notebook)

- [Reference2](https://sanjivgautamofficial.medium.com/lstm-in-keras-56a59264c0b2)

- [Reference3](https://www.dlology.com/blog/how-to-use-return_state-or-return_sequences-in-keras/)

- [StackOverflow](https://stackoverflow.com/questions/42755820/how-to-use-return-sequences-option-and-timedistributed-layer-in-keras)

- [Reference4](https://colab.research.google.com/github/kmkarakaya/ML_tutorials/blob/master/LSTM_Understanding_Output_Types.ipynb)