# Difference Between Return Sequences and Return States for LSTMs in TensorFlow Keras

The Keras deep learning library provides an implementation of the _Long Short-Term Memory_, or LSTM, recurrent neural network.

As part of this implementation, the Keras API provides access to both _return sequences_ and _return state_. How these two outputs differ, and when to use each, can be confusing when designing sophisticated recurrent neural network models, such as the encoder-decoder model.

[Reference](https://machinelearningmastery.com/return-sequences-and-return-states-for-lstms-in-keras/)

In [7]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Input, Flatten, LSTM, Bidirectional, GRU, RepeatVector, TimeDistributed, Dropout
from tensorflow.keras.models import Sequential, Model

In [2]:
print(f"TensorFlow version: {tf.__version__}")

TensorFlow version: 2.8.0


In [3]:
# Check GPU availability-
gpu_devices = tf.config.list_physical_devices('GPU')
# print(f"GPU: {gpu_devices}")

if gpu_devices:
    print(f"GPU: {gpu_devices}")
    details = tf.config.experimental.get_device_details(gpu_devices[0])
    print(f"GPU details: {details.get('device_name', 'Unknown GPU')}")
else:
    print("No GPU found")

No GPU found


### Long Short-Term Memory

- The Long Short-Term Memory, or __LSTM__, is a recurrent neural network whose units are __composed of internal gates__.

- Unlike simple recurrent neural networks, the network’s internal gates allow the model to be trained successfully using backpropagation through time, or BPTT, and largely mitigate the vanishing gradient problem.

- In the Keras deep learning library, LSTM layers can be created using the ```LSTM()``` class.

- __Creating a layer of LSTM memory units allows you to specify the number of memory units within the layer__.

- __Each unit or cell within the layer has an internal cell state, often abbreviated as “c”, and outputs a hidden state, often abbreviated as “h”__.

The Keras API allows you to access these data, which can be useful or even required when developing sophisticated recurrent neural network architectures, such as the encoder-decoder model.

#### Return Sequences

__Each LSTM cell produces one hidden state _h_ for each input time step__. By default, Keras returns only the hidden state from the last time step.

```h = LSTM(X)```

We can demonstrate this in Keras with a very small model with a single LSTM layer that itself contains a single LSTM cell.

In this example, we will have one input sample with 3 time steps and one feature observed at each time step:

```
t1 = 0.1
t2 = 0.2
t3 = 0.3
```

In [4]:
# Define LSTM model-
inputs1 = Input(shape = (3, 1))
lstm1 = LSTM(
    units = 1, activation = tf.keras.activations.tanh
)(inputs1)
model = Model(inputs = inputs1, outputs = lstm1)

In [5]:
# Get model summary-
model.summary()

Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 3, 1)]            0         
                                                                 
 lstm (LSTM)                 (None, 1)                 12        
                                                                 
Total params: 12
Trainable params: 12
Non-trainable params: 0
_________________________________________________________________
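The 12 parameters reported in the summary can be reproduced by hand. The sketch below assumes the default Keras LSTM configuration (bias terms enabled); the helper name `lstm_param_count` is hypothetical and not part of Keras.

```python
def lstm_param_count(units: int, input_dim: int) -> int:
    """Count trainable parameters of a standard LSTM layer with bias terms."""
    # Each of the 4 weight blocks (input gate, forget gate,
    # cell candidate, output gate) holds:
    #   input weights:     input_dim * units
    #   recurrent weights: units * units
    #   bias:              units
    per_block = input_dim * units + units * units + units
    return 4 * per_block


print(lstm_param_count(units=1, input_dim=1))   # 12, matches the summary above
print(lstm_param_count(units=12, input_dim=1))  # 672
```

Note that the count does not depend on the number of time steps, because the same weights are reused at every step.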


In [9]:
# Define input data-
data = np.array([0.1, 0.2, 0.3]).reshape((1, 3, 1))
print(f"Input sequence shape: {data.shape}")

Input sequence shape: (1, 3, 1)


In [10]:
# Predict for this data-
print(model.predict(data))

[[0.09937763]]


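Running this code example __outputs a single hidden state for the input sequence with 3 time steps__. The exact value depends on the randomly initialized weights, so it differs between runs and between the models defined in this notebook.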

- It is __possible to access the hidden state output for each input time step__.

- This can be done by setting the ```return_sequences``` argument to ```True``` when defining the LSTM layer.

In [18]:
# Define LSTM model to access hidden state output for each time step-
lstm1 = LSTM(
    units = 1, activation = tf.keras.activations.tanh,
    return_sequences = True
)(inputs1)
model = Model(inputs = inputs1, outputs = lstm1)

In [19]:
# Get model summary-
model.summary()

Model: "model_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 3, 1)]            0         
                                                                 
 lstm_3 (LSTM)               (None, 3, 1)              12        
                                                                 
Total params: 12
Trainable params: 12
Non-trainable params: 0
_________________________________________________________________


In [20]:
# Predict for the input data above-
model.predict(data)

array([[[0.01469561],
        [0.04029939],
        [0.07396863]]], dtype=float32)

In [21]:
model.predict(data).shape

(1, 3, 1)

This code example __returns a sequence of 3 values, one hidden state output for each input time step for the single LSTM cell in the layer__.
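To make the relationship between the two outputs concrete, here is a minimal pure-Python sketch of a single-unit LSTM stepping through the same 3-step input. It follows the standard LSTM equations rather than Keras's internal implementation, and the weights `w`, `u`, `b` as well as the function name `lstm_forward` are made-up values chosen only for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_forward(xs, w, u, b):
    """Run a single-unit LSTM over a sequence with one feature per step.

    w, u, b are dicts keyed by 'i' (input gate), 'f' (forget gate),
    'g' (cell candidate) and 'o' (output gate).
    Returns the hidden state at every step and the final cell state.
    """
    h, c = 0.0, 0.0  # initial hidden state and cell state
    hidden_states = []
    for x in xs:
        i = sigmoid(w['i'] * x + u['i'] * h + b['i'])
        f = sigmoid(w['f'] * x + u['f'] * h + b['f'])
        g = math.tanh(w['g'] * x + u['g'] * h + b['g'])
        o = sigmoid(w['o'] * x + u['o'] * h + b['o'])
        c = f * c + i * g          # update the internal cell state
        h = o * math.tanh(c)       # hidden state emitted at this step
        hidden_states.append(h)
    return hidden_states, c


# Hypothetical weights, not taken from any trained model.
w = {'i': 0.5, 'f': 0.5, 'g': 0.5, 'o': 0.5}
u = {'i': 0.1, 'f': 0.1, 'g': 0.1, 'o': 0.1}
b = {'i': 0.0, 'f': 1.0, 'g': 0.0, 'o': 0.0}

hs, c_final = lstm_forward([0.1, 0.2, 0.3], w, u, b)
print(hs)      # all 3 hidden states: what return_sequences=True exposes
print(hs[-1])  # last hidden state only: the default layer output
```

The default output is simply the last element of the sequence that `return_sequences=True` exposes.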

In [14]:
# Define LSTM model to access hidden state output for each time step-
lstm1 = LSTM(
    units = 12, activation = tf.keras.activations.tanh,
    return_sequences = True
)(inputs1)
model = Model(inputs = inputs1, outputs = lstm1)

In [16]:
# Predict for the input data above-
model.predict(data).shape

(1, 3, 12)

This code example __returns a sequence of 3 vectors, one for each input time step, where each vector holds the hidden state outputs of all _12_ LSTM cells in the layer__.

- You must set ```return_sequences=True``` when stacking LSTM layers so that the second LSTM layer has a three-dimensional sequence input. Refer to [Stacked Long Short-Term Memory Networks](https://machinelearningmastery.com/stacked-long-short-term-memory-networks/) for more details.

- You may also need to access the sequence of hidden state outputs when predicting a sequence of outputs with a Dense output layer wrapped in a ```TimeDistributed``` layer. Refer to [How to Use the TimeDistributed Layer for Long Short-Term Memory Networks in Python](https://machinelearningmastery.com/timedistributed-layer-for-long-short-term-memory-networks-in-python/) for more details.
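Both cases can be sketched together in one small model. The layer sizes (8 and 4 units) and the name `stacked_model` are arbitrary choices for illustration, and `Dense` is imported here because the import cell at the top of this notebook does not include it.

```python
import numpy as np
from tensorflow.keras.layers import Input, LSTM, Dense, TimeDistributed
from tensorflow.keras.models import Model

inputs = Input(shape=(3, 1))

# The first LSTM must return the full sequence so the second LSTM
# receives a 3D input of shape (batch, time steps, features).
x = LSTM(units=8, return_sequences=True)(inputs)

# The second LSTM also returns sequences so that the Dense layer
# can be applied to every time step.
x = LSTM(units=4, return_sequences=True)(x)
outputs = TimeDistributed(Dense(1))(x)

stacked_model = Model(inputs=inputs, outputs=outputs)

data = np.array([0.1, 0.2, 0.3]).reshape((1, 3, 1))
pred = stacked_model.predict(data, verbose=0)
print(pred.shape)  # (1, 3, 1): one prediction per time step
```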

### Return States

- __The output of an LSTM cell or layer of cells is called the hidden state__.

- This is confusing, because __each LSTM cell retains an internal state that is not output, called the _cell state_, or _c___.

- __Generally, we do not need to access the cell state (or, internal state) unless we are developing sophisticated models where subsequent layers may need to have their cell state initialized with the final cell state of another layer__, such as in an encoder-decoder model.

- Keras provides the ```return_state``` argument to the LSTM layer that will provide access to the hidden state output (_state_h_) and the cell state (_state_c_). As an example-

```
lstm1, state_h, state_c = LSTM(units = 1, return_state = True)(inputs1)
```

- __This may look confusing because both ```lstm1``` and ```state_h``` refer to the same hidden state output__. The reason for these two tensors being separate will become clear in the next section.

- We can demonstrate access to the hidden and cell states of the cells in the LSTM layer with a worked example listed below-

In [45]:
# Define LSTM model-
lstm1, state_h, state_c = LSTM(
    units = 1, activation = tf.keras.activations.tanh,
    return_state = True)(inputs1)

model = Model(inputs = inputs1, outputs = [lstm1, state_h, state_c])

In [46]:
# Get model summary-
model.summary()

Model: "model_6"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 3, 1)]            0         
                                                                 
 lstm_6 (LSTM)               [(None, 1),               12        
                              (None, 1),                         
                              (None, 1)]                         
                                                                 
Total params: 12
Trainable params: 12
Non-trainable params: 0
_________________________________________________________________


In [47]:
# Predict for the input data above-
model.predict(data)

[array([[0.06094353]], dtype=float32),
 array([[0.06094353]], dtype=float32),
 array([[0.11918673]], dtype=float32)]

- This returns 3 arrays:
    1. The LSTM hidden state output for the last time step.
    1. The LSTM hidden state output for the last time step (again).
    1. The LSTM cell state for the last time step.
    
- The hidden state and the cell state could in turn be used to initialize the states of another LSTM layer with the same number of cells.
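As a rough sketch of that idea, the final states of one LSTM (an "encoder") can be passed to a second LSTM (a "decoder") through the `initial_state` argument when the layer is called. The variable names, the number of units, and the input shapes below are assumptions made for illustration only.

```python
import numpy as np
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.models import Model

units = 4  # both layers need the same number of cells

# Encoder: keep only the final hidden state and cell state.
encoder_inputs = Input(shape=(3, 1))
_, enc_h, enc_c = LSTM(units, return_state=True)(encoder_inputs)

# Decoder: start from the encoder's final states instead of zeros.
decoder_inputs = Input(shape=(2, 1))
decoder_outputs = LSTM(units, return_sequences=True)(
    decoder_inputs, initial_state=[enc_h, enc_c]
)

seq2seq = Model([encoder_inputs, decoder_inputs], decoder_outputs)

enc_data = np.array([0.1, 0.2, 0.3]).reshape((1, 3, 1))
dec_data = np.zeros((1, 2, 1))
out = seq2seq.predict([enc_data, dec_data], verbose=0)
print(out.shape)  # (1, 2, 4): 2 decoder time steps, 4 units each
```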

### Return States and Sequences

- We can access both the sequence of hidden states and the cell states at the same time.

- This can be done by configuring the LSTM layer to both return sequences and return states-
```
lstm1, state_h, state_c = LSTM(
    units = 1, return_sequences = True,
    return_state = True)(inputs1)
```


In [26]:
# Define LSTM model-
lstm1, state_h, state_c = LSTM(
    units = 1, activation = tf.keras.activations.tanh,
    return_sequences = True, return_state = True)(inputs1)

model = Model(inputs = inputs1, outputs = [lstm1, state_h, state_c])

In [27]:
model.summary()

Model: "model_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_1 (InputLayer)        [(None, 3, 1)]            0         
                                                                 
 lstm_5 (LSTM)               [(None, 3, 1),            12        
                              (None, 1),                         
                              (None, 1)]                         
                                                                 
Total params: 12
Trainable params: 12
Non-trainable params: 0
_________________________________________________________________


- With this code example, we can now see why the LSTM output tensor and the hidden state output tensor are declared separately.

- The layer returns three separate tensors:
    - the hidden state for each input time step,
    - the hidden state output for the last time step, and
    - the cell state for the last input time step.

- This can be confirmed by seeing that the last value in the returned sequences (first array) matches the value in the hidden state (second array).

In [29]:
# Make predictions using LSTM-
lstm_output = model.predict(data)

In [30]:
type(lstm_output), len(lstm_output)

(list, 3)

In [32]:
print(f"Hidden state for each time step shape: {lstm_output[0].shape},"
      f" hidden state output for last time step shape: {lstm_output[1].shape}"
      f" & cell state output for last time step shape: {lstm_output[2].shape}"
     )

Hidden state for each time step shape: (1, 3, 1), hidden state output for last time step shape: (1, 1) & cell state output for last time step shape: (1, 1)


In [43]:
lstm_output[0][0, 2, 0]

0.10932464

In [37]:
print(f"Hidden state output for last time-step = {lstm_output[1][0]} &"
      f" cell state output for last time-step = {lstm_output[2][0]}")

Hidden state output for last time-step = [0.10932464] & cell state output for last time-step = [0.19064409]
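The same comparison can be made programmatically rather than by eye. The sketch below builds a fresh model with 12 units (an arbitrary choice, so the weights and values differ from the output above) and checks that the last step of the returned sequence equals the returned hidden state; the names `check_model`, `seq_out`, `h_out` and `c_out` are illustrative.

```python
import numpy as np
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.models import Model

inputs = Input(shape=(3, 1))
seq, state_h, state_c = LSTM(
    units=12, return_sequences=True, return_state=True
)(inputs)
check_model = Model(inputs=inputs, outputs=[seq, state_h, state_c])

data = np.array([0.1, 0.2, 0.3]).reshape((1, 3, 1))
seq_out, h_out, c_out = check_model.predict(data, verbose=0)

print(seq_out.shape, h_out.shape, c_out.shape)  # (1, 3, 12) (1, 12) (1, 12)

# The last time step of the sequence is the final hidden state.
print(np.allclose(seq_out[:, -1, :], h_out))  # True
```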
