<a href="https://colab.research.google.com/github/https-deeplearning-ai/tensorflow-1-public/blob/master/C3/W3/ungraded_labs/C3_W3_Lab_2_multiple_layer_LSTM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!wget https://raw.githubusercontent.com/doantronghieu/DEEP-LEARNING/main/helper_DL.py
!pip install colorama
import matplotlib.pyplot as plt
plt.rcParams.update({'font.size':15})
import seaborn as sns
sns.set()
import helper_DL as helper

# Ungraded Lab: Multiple LSTMs

In this lab, you will build a model with multiple stacked LSTM layers. Since you already know the preceding steps (e.g. downloading the dataset and preparing the data), we won't go over them again, so you can focus on the model-building code.

## Download and Prepare the Dataset

In [None]:
import tensorflow_datasets as tfds

# Download the subword encoded pretokenized dataset
dataset, info = tfds.load('imdb_reviews/subwords8k', with_info = True, as_supervised = True)

# Get the tokenizer
tokenizer = info.features['text'].encoder

As in the previous lab, we increased `BATCH_SIZE` here to make training faster. If you are running this on your local machine and have a powerful processor, feel free to use the value from the lecture (i.e., 64) to get the same results as Laurence.
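As a rough illustration of why a larger batch speeds up each epoch, the arithmetic below counts optimizer steps per epoch. It assumes the IMDB training split holds 25,000 reviews and that the last, partial batch is kept (the `padded_batch` default). Fewer, larger steps usually use the GPU more efficiently, although each step takes longer and the final accuracy can differ slightly.

```python
import math

# Assumption: the IMDB reviews training split contains 25,000 examples.
NUM_TRAIN_EXAMPLES = 25_000

def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of batches per epoch; the last batch may be partial."""
    return math.ceil(num_examples / batch_size)

for batch_size in (64, 256):
    steps = steps_per_epoch(NUM_TRAIN_EXAMPLES, batch_size)
    print(f"BATCH_SIZE={batch_size:>3}: {steps} steps per epoch")
# BATCH_SIZE= 64: 391 steps per epoch
# BATCH_SIZE=256: 98 steps per epoch
```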

In [None]:
BUFFER_SIZE = 10000
BATCH_SIZE  = 256

# Get the train and test splits
train_data, test_data = dataset['train'], dataset['test']

# Shuffle the training data
train_dataset = train_data.shuffle(BUFFER_SIZE)

# Batch the datasets, padding each batch to the length of its longest sequence
train_dataset = train_dataset.padded_batch(BATCH_SIZE)
test_dataset  = test_data.padded_batch(BATCH_SIZE)
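`padded_batch` does not pad every review to one global length: each batch is padded (with zeros at the end, by default) only up to the longest sequence in that batch, so batch shapes vary from batch to batch. The NumPy sketch below imitates that behavior on toy token lists; the helper `pad_batches` and the token IDs are hypothetical and are not part of `tf.data`.

```python
import numpy as np

def pad_batches(sequences, batch_size, pad_value=0):
    """Hypothetical helper: group variable-length token lists into batches,
    padding each batch with pad_value up to its own longest sequence."""
    batches = []
    for start in range(0, len(sequences), batch_size):
        chunk = sequences[start:start + batch_size]
        max_len = max(len(seq) for seq in chunk)
        batch = np.full((len(chunk), max_len), pad_value, dtype=np.int64)
        for row, seq in enumerate(chunk):
            batch[row, :len(seq)] = seq  # post-padding: real tokens first
        batches.append(batch)
    return batches

# Hypothetical token IDs, only to show the per-batch shapes
toy_reviews = [[5, 12, 7], [3, 9], [4, 4, 4, 4, 8], [6]]
for batch in pad_batches(toy_reviews, batch_size=2):
    print(batch.shape)
# (2, 3)
# (2, 5)
```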

## Build and Compile the Model

You can build a multiple-layer LSTM model by stacking `LSTM` layers in your `Sequential` model and setting `return_sequences=True` on every `LSTM` layer that feeds into another `LSTM`. This is because an `LSTM` layer expects a sequence as input, so if the previous layer is also an LSTM, it must output the full sequence rather than only its final hidden state. The code cell below demonstrates this flag in action: when `return_sequences` is `True`, the output is 3-dimensional, `(batch_size, timesteps, features)`; when it is `False` (the default), the output is 2-dimensional, `(batch_size, features)`.

In [None]:
import tensorflow as tf
import tensorflow.keras as tfk
from tensorflow import nn
from tensorflow.keras import layers, losses, optimizers, models, Model
import numpy as np

In [None]:
# Hyperparameters
BATCH_SIZE = 1  # Batch size
TIMESTEPS  = 20 # Sequence length
FEATURES   = 16 # Embedding size
LSTM_DIM   = 8  # LSTM output units

# Define array input with random values
random_input = np.random.rand(BATCH_SIZE, TIMESTEPS, FEATURES)
print(f'Shape of input array: {random_input.shape}')

# Define LSTM that returns a single output
lstm = layers.LSTM(LSTM_DIM)
result = lstm(random_input)
print(f'Shape of LSTM output (return_sequences = False): {result.shape}')

# Define LSTM that returns a sequence
lstm_rs = layers.LSTM(LSTM_DIM, return_sequences = True)
result = lstm_rs(random_input)
print(f'Shape of LSTM output (return_sequences = True):  {result.shape}')
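The Keras cell above only prints shapes. To show where those shapes come from, here is a minimal NumPy sketch of an LSTM forward pass. It assumes random weights, zero biases, and gates stacked in an `[input, forget, candidate, output]` layout; it is not meant to reproduce Keras's exact numbers. With `return_sequences=True` it returns the hidden state at every timestep; otherwise it returns only the final one, which equals the last slice of the sequence. Only the 3-D sequence can feed a second LSTM, which is why stacking needs the flag.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(x, W, U, b, return_sequences=False):
    """Minimal LSTM forward pass (sketch, not Keras's implementation).

    x: (batch, timesteps, features); W: (features, 4*units);
    U: (units, 4*units); b: (4*units,).
    Gates are stacked as [input, forget, candidate, output].
    """
    batch, timesteps, _ = x.shape
    units = U.shape[0]
    h = np.zeros((batch, units))  # hidden state
    c = np.zeros((batch, units))  # cell state
    outputs = []
    for t in range(timesteps):
        z = x[:, t, :] @ W + h @ U + b
        i = sigmoid(z[:, :units])                 # input gate
        f = sigmoid(z[:, units:2 * units])        # forget gate
        g = np.tanh(z[:, 2 * units:3 * units])    # candidate cell values
        o = sigmoid(z[:, 3 * units:])             # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    seq = np.stack(outputs, axis=1)  # (batch, timesteps, units)
    return seq if return_sequences else h  # h is seq[:, -1, :]

rng = np.random.default_rng(0)
BATCH, TIMESTEPS, FEATURES, UNITS = 1, 20, 16, 8
x = rng.random((BATCH, TIMESTEPS, FEATURES))
W = rng.normal(scale=0.1, size=(FEATURES, 4 * UNITS))
U = rng.normal(scale=0.1, size=(UNITS, 4 * UNITS))
b = np.zeros(4 * UNITS)

last = lstm_forward(x, W, U, b)
seq = lstm_forward(x, W, U, b, return_sequences=True)
print(last.shape)                        # (1, 8)
print(seq.shape)                         # (1, 20, 8)
print(np.allclose(seq[:, -1, :], last))  # True

# Stacking: the 3-D `seq` is a valid input for a second LSTM layer
UNITS2 = 4
W2 = rng.normal(scale=0.1, size=(UNITS, 4 * UNITS2))
U2 = rng.normal(scale=0.1, size=(UNITS2, 4 * UNITS2))
b2 = np.zeros(4 * UNITS2)
print(lstm_forward(seq, W2, U2, b2).shape)  # (1, 4)
```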

The next cell implements the stacked LSTM architecture.

In [None]:
# Hyperparameters
EMBEDDING_DIM = 64
LSTM1_DIM     = 64
LSTM2_DIM     = 32
DENSE_DIM     = 64

# Build the model
model = models.Sequential([
    layers.Embedding(tokenizer.vocab_size, EMBEDDING_DIM),
    # return_sequences = True so the next LSTM receives the full sequence
    layers.Bidirectional(layers.LSTM(LSTM1_DIM, return_sequences = True)),
    layers.Bidirectional(layers.LSTM(LSTM2_DIM, return_sequences = False)),
    layers.Dense(DENSE_DIM, activation = nn.relu),
    layers.Dense(1, activation = nn.sigmoid)
])

model.summary()

# Set the training parameters
model.compile(loss = losses.binary_crossentropy,
              optimizer = optimizers.Adam(),
              metrics = ['accuracy'])
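As a cross-check on `model.summary()`, the sketch below counts trainable parameters by hand. It assumes the standard Keras formulas: an LSTM has four gates, each with an input kernel, a recurrent kernel and a bias, and `Bidirectional` wraps two independent LSTMs whose outputs, with the default `merge_mode='concat'`, are concatenated, doubling the width fed forward. The vocabulary size of 8185 is an assumption for `imdb_reviews/subwords8k`; pass `tokenizer.vocab_size` to use the real value.

```python
def lstm_params(input_dim: int, units: int) -> int:
    """Trainable weights of one LSTM: 4 gates x (input kernel + recurrent kernel + bias)."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim: int, units: int) -> int:
    return input_dim * units + units

def stacked_model_params(vocab_size, embedding_dim=64, lstm1_dim=64,
                         lstm2_dim=32, dense_dim=64):
    counts = {
        "embedding": vocab_size * embedding_dim,
        # Bidirectional = forward LSTM + backward LSTM with separate weights
        "bi_lstm_1": 2 * lstm_params(embedding_dim, lstm1_dim),
        # 'concat' merge: the second LSTM sees 2 * lstm1_dim features
        "bi_lstm_2": 2 * lstm_params(2 * lstm1_dim, lstm2_dim),
        "dense": dense_params(2 * lstm2_dim, dense_dim),
        "output": dense_params(dense_dim, 1),
    }
    counts["total"] = sum(counts.values())
    return counts

counts = stacked_model_params(vocab_size=8185)  # assumption: subwords8k vocab size
for name, n in counts.items():
    print(f"{name:>10}: {n:,}")
# last line printed:      total: 635,329
```

Most of the weights sit in the embedding table; the two bidirectional LSTM layers add about 107k parameters between them.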

## Train the Model

The additional LSTM layer will lengthen training time compared with the single-LSTM model in the previous lab. With the parameters set above, expect around 2 minutes per epoch with the Colab GPU enabled.

In [None]:
NUM_EPOCHS = 10

history = model.fit(train_dataset, epochs = NUM_EPOCHS, validation_data = test_dataset)

In [None]:
helper.plot_history_curves(history)
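`plot_history_curves` comes from the external `helper_DL.py` file downloaded at the top of the notebook. If that file is unavailable, a minimal matplotlib stand-in could look like the sketch below. The function name `plot_history_sketch` is hypothetical, and it assumes `history.history` contains the `accuracy`, `val_accuracy`, `loss` and `val_loss` keys that Keras records when the model is compiled with `metrics=['accuracy']` and `fit` receives `validation_data`.

```python
import matplotlib.pyplot as plt

def plot_history_sketch(history_dict, metrics=("accuracy", "loss")):
    """Hypothetical stand-in for helper.plot_history_curves:
    one panel per metric, training vs. validation curves."""
    fig, axes = plt.subplots(1, len(metrics), figsize=(12, 4), squeeze=False)
    for ax, metric in zip(axes[0], metrics):
        epochs = range(1, len(history_dict[metric]) + 1)
        ax.plot(epochs, history_dict[metric], label=f"train {metric}")
        val_key = f"val_{metric}"
        if val_key in history_dict:  # present only when validation_data was given
            ax.plot(epochs, history_dict[val_key], label=f"validation {metric}")
        ax.set_xlabel("epoch")
        ax.set_ylabel(metric)
        ax.legend()
    fig.tight_layout()
    return fig

# Usage in this notebook, after model.fit:
# plot_history_sketch(history.history)
# plt.show()
```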

## Wrap Up

This lab showed how you can build deep networks by stacking LSTM layers. In the next labs, you will continue exploring other architectures you can use to implement your sentiment classification model.