<a href="https://colab.research.google.com/github/https-deeplearning-ai/tensorflow-1-public/blob/master/C3/W3/ungraded_labs/C3_W3_Lab_4_imdb_reviews_with_GRU_LSTM_Conv1D.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!wget https://raw.githubusercontent.com/doantronghieu/DEEP-LEARNING/main/helper_DL.py
!pip install colorama
import matplotlib.pyplot as plt
plt.rcParams.update({'font.size':15})
import seaborn as sns
sns.set()
import helper_DL as helper

# Ungraded Lab: Building Models for the IMDB Reviews Dataset

In this lab, you will build four models and train them on the [IMDB Reviews dataset](https://www.tensorflow.org/datasets/catalog/imdb_reviews) with full word encoding. Each model uses a different layer after the embedding: `Flatten`, `LSTM`, `GRU`, or `Conv1D`. You will compare their performance and see which architecture might be best for this particular dataset. Let's begin!

## Imports

You will first import common libraries that will be used throughout the exercise.

In [None]:
import tensorflow as tf
import tensorflow.keras as tfk
from tensorflow import nn
from tensorflow.keras import layers, losses, optimizers, models, Model
import tensorflow.keras.preprocessing as tfkp
import numpy as np

## Download and Prepare the Dataset

Next, you will download the `plain_text` version of the `IMDB Reviews` dataset.

In [None]:
import tensorflow_datasets as tfds

# Load the IMDB Reviews dataset
imdb, info = tfds.load('imdb_reviews', with_info = True, as_supervised = True)

In [None]:
# Get the train and test sets
train_data, test_data = imdb['train'], imdb['test']

# Initialize sentences and labels lists
training_sentences = []
training_labels    = []
testing_sentences  = []
testing_labels     = []

# Loop over all training examples and save the sentences and labels
for sentence, label in train_data:
    training_sentences.append(sentence.numpy().decode('utf8'))
    training_labels   .append(label   .numpy())

# Loop over all test examples and save the sentences and labels
for sentence, label in test_data:
    testing_sentences.append(sentence.numpy().decode('utf8'))
    testing_labels   .append(label   .numpy())

# Convert labels lists to numpy array
training_labels_final = np.array(training_labels)
testing_labels_final  = np.array(testing_labels)

Unlike the subword-encoded set you used in the previous labs, this one requires you to build the vocabulary from scratch and generate padded sequences. You already know how to do that with the `Tokenizer` class and the `pad_sequences()` function.
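
To make the vocabulary step more concrete, here is a minimal pure-Python sketch of the idea behind a word-level tokenizer: words are ranked by frequency, index 1 is reserved for the out-of-vocabulary token, and any word ranked at or beyond `num_words` is replaced by that token. The helper names `build_word_index` and `to_sequence` are hypothetical, and the sketch only lowercases and splits on whitespace, so it is a simplification rather than a reimplementation of the Keras class.

```python
from collections import Counter

def build_word_index(texts, oov_token="<OOV>"):
    """Map each word to an integer, most frequent first (index 1 is the OOV token)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # Sort by descending count; ties keep first-seen order because sorted() is stable
    ordered = sorted(counts, key=lambda w: -counts[w])
    index = {oov_token: 1}
    for word in ordered:
        index[word] = len(index) + 1
    return index

def to_sequence(text, index, num_words, oov_token="<OOV>"):
    """Encode a text; unknown words and words ranked >= num_words become OOV."""
    oov = index[oov_token]
    encoded = []
    for word in text.lower().split():
        idx = index.get(word, oov)
        encoded.append(idx if idx < num_words else oov)
    return encoded

# Toy sentences for illustration only (not taken from the dataset)
toy_texts = ["the movie was great", "the plot was thin"]
toy_index = build_word_index(toy_texts)
toy_sequence = to_sequence("the ending was great", toy_index, num_words=5)
print(toy_index)
print(toy_sequence)
```

In this toy run, "ending" was never seen and "great" is ranked outside the `num_words` limit, so both are encoded as the OOV index.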

In [None]:
# Parameters
vocab_size     = 10000
max_length     = 120
trunc_type     = 'post'
oov_tok        = '<OOV>'

# Initialize the Tokenizer class
tokenizer = tfkp.text.Tokenizer(num_words = vocab_size, oov_token = oov_tok)

# Generate the word index dictionary for the training sentences
tokenizer.fit_on_texts(training_sentences)
word_index = tokenizer.word_index

# Generate and pad the training sequences
sequences = tokenizer.texts_to_sequences(training_sentences)
padded    = tfkp.sequence.pad_sequences(sequences, maxlen = max_length, truncating = trunc_type)

# Generate and pad the test sequences
testing_sequences = tokenizer.texts_to_sequences(testing_sentences)
testing_padded    = tfkp.sequence.pad_sequences(testing_sequences, maxlen = max_length, truncating = trunc_type)
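
The effect of the `maxlen` and `truncating` arguments can be illustrated with a small pure-Python sketch. The function `pad_or_truncate` is a hypothetical helper that handles a single sequence; it assumes the same defaults as `pad_sequences` (`padding = 'pre'`, `truncating = 'pre'`) and is meant only to show which tokens survive.

```python
def pad_or_truncate(seq, maxlen, padding="pre", truncating="pre", value=0):
    """Return a list of exactly maxlen items for one sequence."""
    if len(seq) > maxlen:
        # 'pre' drops tokens from the start, 'post' drops them from the end
        seq = seq[-maxlen:] if truncating == "pre" else seq[:maxlen]
    missing = [value] * (maxlen - len(seq))
    # 'pre' puts the padding before the tokens, 'post' after them
    return missing + list(seq) if padding == "pre" else list(seq) + missing

short_review = [11, 12, 13]
long_review = [1, 2, 3, 4, 5, 6]

print(pad_or_truncate(short_review, 5))                    # [0, 0, 11, 12, 13]
print(pad_or_truncate(long_review, 4, truncating="post"))  # [1, 2, 3, 4]
print(pad_or_truncate(long_review, 4, truncating="pre"))   # [3, 4, 5, 6]
```

With `truncating = 'post'`, a long review keeps its first `max_length` tokens; with the default `'pre'`, it keeps its last ones. Using the same setting for the training and test sequences keeps the two sets comparable.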

## Model 1: Flatten

First up is simply using a `Flatten` layer after the embedding. Its main advantage is that it is very fast to train. Observe the results below.

*Note: You might see a different graph in the lectures. This is because we adjusted the `BATCH_SIZE` for training so subsequent models will train faster.*

In [None]:
# Hyperparameters
EMBEDDING_DIM = 16
DENSE_DIM     = 6

# Build the model
model_flatten = models.Sequential([
    layers.Embedding(vocab_size, EMBEDDING_DIM, input_length = max_length),
    layers.Flatten(),
    layers.Dense(DENSE_DIM, activation = nn.relu),
    layers.Dense(1, activation = nn.sigmoid)
])

model_flatten.summary()

# Set the training parameters
model_flatten.compile(loss = losses.binary_crossentropy,
                      optimizer = optimizers.Adam(),
                      metrics = ['accuracy'])
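
As a rough check on the model summary, the parameter counts can be worked out by hand. The sketch below repeats the hyperparameter values from the cells above; `dense_params` is a hypothetical helper, and the total should agree with what `model_flatten.summary()` reports if the arithmetic is right.

```python
# Values copied from the notebook cells above
vocab_size = 10000
max_length = 120
EMBEDDING_DIM = 16
DENSE_DIM = 6

def dense_params(n_inputs, n_units):
    """One weight per input-unit pair plus one bias per unit."""
    return n_inputs * n_units + n_units

embedding_params = vocab_size * EMBEDDING_DIM        # one vector per vocabulary entry
flatten_out = max_length * EMBEDDING_DIM             # Flatten has no weights, it only reshapes
flatten_hidden = dense_params(flatten_out, DENSE_DIM)
flatten_output = dense_params(DENSE_DIM, 1)

flatten_total = embedding_params + flatten_hidden + flatten_output
print(embedding_params, flatten_hidden, flatten_output, flatten_total)  # 160000 11526 7 171533
```

Most of the weights sit in the embedding table, which every model in this lab shares in size; the layers after it account for the differences.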

In [None]:
NUM_EPOCHS = 10
BATCH_SIZE = 128

history_flatten = model_flatten.fit(padded, training_labels_final, batch_size = BATCH_SIZE,
                                    epochs = NUM_EPOCHS,
                                    validation_data = (testing_padded, testing_labels_final))

# Plot the accuracy and loss history
helper.plot_history_curves(history_flatten)

## Model 2: LSTM

Next, you will use an LSTM, wrapped in a `Bidirectional` layer so the sequence is read in both directions. This is slower to train but useful in applications where the order of the tokens is important.

In [None]:
# Hyperparameters
EMBEDDING_DIM = 16
LSTM_DIM      = 32
DENSE_DIM     = 6

# Build the model
model_lstm = models.Sequential([
    layers.Embedding(vocab_size, EMBEDDING_DIM, input_length = max_length),
    layers.Bidirectional(layers.LSTM(LSTM_DIM)),
    layers.Dense(DENSE_DIM, activation = nn.relu),
    layers.Dense(1, activation = nn.sigmoid)
])

model_lstm.summary()

# Set the training parameters
model_lstm.compile(loss = losses.binary_crossentropy,
                   optimizer = optimizers.Adam(),
                   metrics = ['accuracy'])
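
The size of the recurrent layer in the summary can also be derived by hand. This is a sketch of the standard LSTM parameter formula, assuming the default `Bidirectional` behavior of concatenating the forward and backward outputs; `lstm_params` is a hypothetical helper name.

```python
# Values copied from the notebook cell above
EMBEDDING_DIM = 16
LSTM_DIM = 32
DENSE_DIM = 6

def lstm_params(input_dim, units):
    """Four gate blocks, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)

lstm_one_direction = lstm_params(EMBEDDING_DIM, LSTM_DIM)
lstm_bidirectional = 2 * lstm_one_direction      # separate forward and backward copies
lstm_out = 2 * LSTM_DIM                          # the two final outputs are concatenated
lstm_hidden = lstm_out * DENSE_DIM + DENSE_DIM   # dense layer fed by the concatenation

print(lstm_one_direction, lstm_bidirectional, lstm_out, lstm_hidden)  # 6272 12544 64 390
```

Note that the dense layer now sees only 64 inputs instead of the 1,920 produced by `Flatten`, because the LSTM summarizes the whole sequence into a single vector.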

In [None]:
NUM_EPOCHS = 10
BATCH_SIZE = 128

history_lstm = model_lstm.fit(padded, training_labels_final, batch_size = BATCH_SIZE,
                              epochs = NUM_EPOCHS,
                              validation_data = (testing_padded, testing_labels_final))

# Plot the accuracy and loss history
helper.plot_history_curves(history_lstm)

## Model 3: GRU

The *Gated Recurrent Unit* or [GRU](https://www.tensorflow.org/api_docs/python/tf/keras/layers/GRU) is usually referred to as a simpler version of the LSTM. It can be used in applications where the sequence is important but you want faster results and can sacrifice some accuracy. You will notice in the model summary that it is a bit smaller than the LSTM and it also trains faster by a few seconds.

In [None]:
# Hyperparameters
EMBEDDING_DIM = 16
GRU_DIM       = 32
DENSE_DIM     = 6

# Build the model
model_gru = models.Sequential([
    layers.Embedding(vocab_size, EMBEDDING_DIM, input_length = max_length),
    layers.Bidirectional(layers.GRU(GRU_DIM)),
    layers.Dense(DENSE_DIM, activation = nn.relu),
    layers.Dense(1, activation = nn.sigmoid)
])

model_gru.summary()

# Set the training parameters
model_gru.compile(loss = losses.binary_crossentropy,
                  optimizer = optimizers.Adam(),
                  metrics = ['accuracy'])
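
To see why the GRU is somewhat smaller than the LSTM, the two recurrent layers can be compared with a short calculation. This sketch assumes the TensorFlow 2.x default of `reset_after = True`, which gives each gate block two bias vectors; `gru_params` and `lstm_params` are hypothetical helper names.

```python
# Values copied from the notebook cells above
EMBEDDING_DIM = 16
GRU_DIM = 32
LSTM_DIM = 32

def gru_params(input_dim, units, reset_after=True):
    """Three gate blocks; with reset_after=True each block has two bias vectors."""
    biases = 2 * units if reset_after else units
    return 3 * (units * (input_dim + units) + biases)

def lstm_params(input_dim, units):
    """Four gate blocks, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)

gru_bidirectional = 2 * gru_params(EMBEDDING_DIM, GRU_DIM)
lstm_bidirectional_ref = 2 * lstm_params(EMBEDDING_DIM, LSTM_DIM)
print(gru_bidirectional, lstm_bidirectional_ref)  # 9600 12544
```

The GRU has three gate blocks where the LSTM has four, which accounts for most of the difference in size.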

In [None]:
NUM_EPOCHS = 10
BATCH_SIZE = 128

history_gru = model_gru.fit(padded, training_labels_final, batch_size = BATCH_SIZE,
                            epochs = NUM_EPOCHS,
                            validation_data = (testing_padded, testing_labels_final))

# Plot the accuracy and loss history
helper.plot_history_curves(history_gru)

## Model 4: Convolution

Lastly, you will use a convolution layer to extract features from your dataset. You will append a [GlobalMaxPooling1D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/GlobalMaxPooling1D) layer to reduce the convolution output to one value per filter before passing it on to the dense layers. Like the model with `Flatten`, this also trains much faster than the ones using RNN layers like `LSTM` and `GRU`.
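
Global pooling collapses the time axis so that each filter contributes a single number, whatever the sequence length. The pure-Python sketch below shows the max and average variants side by side on a made-up feature map; the function names are hypothetical and the numbers are for illustration only.

```python
def global_max_pool(feature_map):
    """feature_map: list of time steps, each a list of per-filter activations."""
    return [max(column) for column in zip(*feature_map)]

def global_average_pool(feature_map):
    """Same input layout; returns the mean activation of each filter."""
    return [sum(column) / len(column) for column in zip(*feature_map)]

# Toy feature map: 4 time steps, 2 filters (made-up numbers)
toy_map = [
    [0.0, 1.0],
    [2.0, 1.0],
    [4.0, 1.0],
    [2.0, 5.0],
]
print(global_max_pool(toy_map))      # [4.0, 5.0]
print(global_average_pool(toy_map))  # [2.0, 2.0]
```

Max pooling keeps only the strongest response of each filter, while average pooling blends all positions, so the two can give quite different summaries of the same feature map.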

In [None]:
# Hyperparameters
EMBEDDING_DIM = 16
FILTERS       = 128
KERNEL_SIZE   = 5
DENSE_DIM     = 6

# Build the model
model_conv = models.Sequential([
    layers.Embedding(vocab_size, EMBEDDING_DIM, input_length = max_length),
    layers.Conv1D(filters = FILTERS, kernel_size = KERNEL_SIZE, activation = nn.relu),
    layers.GlobalMaxPooling1D(),
    layers.Dense(DENSE_DIM, activation = nn.relu),
    layers.Dense(1, activation = nn.sigmoid)
])

model_conv.summary()

# Set the training parameters
model_conv.compile(loss = losses.binary_crossentropy,
                   optimizer = optimizers.Adam(),
                   metrics = ['accuracy'])
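
The shapes and sizes in the convolution model's summary follow from a few lines of arithmetic. This sketch assumes the `Conv1D` defaults of `'valid'` padding and a stride of 1, and repeats the hyperparameter values from the cell above.

```python
# Values copied from the notebook cells above
max_length = 120
EMBEDDING_DIM = 16
FILTERS = 128
KERNEL_SIZE = 5
DENSE_DIM = 6

# With 'valid' padding and stride 1, the kernel fits in this many positions
conv_steps = max_length - KERNEL_SIZE + 1

# Each filter spans KERNEL_SIZE time steps across all embedding channels, plus one bias
conv_params = KERNEL_SIZE * EMBEDDING_DIM * FILTERS + FILTERS

# Global pooling leaves one value per filter, so the dense layer sees FILTERS inputs
conv_hidden = FILTERS * DENSE_DIM + DENSE_DIM

print(conv_steps, conv_params, conv_hidden)  # 116 10368 774
```

The convolution weights do not depend on `max_length`, which is one reason this model stays small even for longer inputs.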

In [None]:
NUM_EPOCHS = 10
BATCH_SIZE = 128

history_conv = model_conv.fit(padded, training_labels_final, batch_size = BATCH_SIZE,
                              epochs = NUM_EPOCHS,
                              validation_data = (testing_padded, testing_labels_final))

# Plot the accuracy and loss history
helper.plot_history_curves(history_conv)

## Wrap Up

Now that you've seen the results for each model, can you make a recommendation on what works best for this dataset? Do you still get the same results if you tweak some hyperparameters like the vocabulary size? Try changing a few more values, such as `max_length`, `EMBEDDING_DIM`, or the layer sizes, to get more insight into which model performs best.
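
One possible way to compare the four runs side by side is to rank them by their best validation accuracy. The helper `summarize_histories` below is hypothetical; it assumes each history dictionary contains a `val_accuracy` list, which is the key Keras normally records when a model is compiled with `metrics = ['accuracy']` and trained with validation data.

```python
def summarize_histories(histories, metric="val_accuracy"):
    """Return (name, best value, 1-based epoch) per model, best model first.

    histories: dict mapping a label to a Keras-style history dict,
    for example {"flatten": history_flatten.history}.
    """
    rows = []
    for name, hist in histories.items():
        values = hist[metric]
        best_epoch = max(range(len(values)), key=values.__getitem__)
        rows.append((name, values[best_epoch], best_epoch + 1))
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Usage inside this notebook, after all four training cells have run:
# for name, best, epoch in summarize_histories({
#     "flatten": history_flatten.history,
#     "lstm":    history_lstm.history,
#     "gru":     history_gru.history,
#     "conv":    history_conv.history,
# }):
#     print(f"{name:8s} best val_accuracy {best:.4f} at epoch {epoch}")
```

A best epoch that falls well before the last one, with training accuracy still climbing afterwards, is a hint that the model has started to overfit.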