# Long Short-Term Memory for Sentiment Classification

This notebook applies recurrent neural networks, including LSTMs, to the [IMDB sentiment classification](https://keras.io/api/datasets/imdb/) task. The dataset is built for binary sentiment classification: it provides 25,000 highly polar movie reviews for training and another 25,000 for testing.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import optimizers
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence


## Dataset

### Get the data
The IMDB sentiment dataset is available in `keras.datasets`.

In [None]:
help(imdb.load_data)

Help on function load_data in module keras.src.datasets.imdb:

load_data(path='imdb.npz', num_words=None, skip_top=0, maxlen=None, seed=113, start_char=1, oov_char=2, index_from=3, **kwargs)
    Loads the [IMDB dataset](https://ai.stanford.edu/~amaas/data/sentiment/).
    
    This is a dataset of 25,000 movies reviews from IMDB, labeled by sentiment
    (positive/negative). Reviews have been preprocessed, and each review is
    encoded as a list of word indexes (integers).
    For convenience, words are indexed by overall frequency in the dataset,
    so that for instance the integer "3" encodes the 3rd most frequent word in
    the data. This allows for quick filtering operations such as:
    "only consider the top 10,000 most
    common words, but eliminate the top 20 most common words".
    
    As a convention, "0" does not stand for a specific word, but instead is used
    to encode the pad token.
    
    Args:
        path: where to cache the data (relative to `~/.keras/dataset

In [None]:
max_features = 20000  # vocabulary size: keep only the max_features most frequent words
maxlen = 80           # every review is padded or truncated to this many words

print('Loading data...')
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=max_features)
print(len(X_train), 'train sequences')
print(len(X_test), 'test sequences')

Loading data...
25000 train sequences
25000 test sequences
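
The integer encoding described in the help text can be reversed to read a review. The following is a minimal sketch, assuming the defaults shown in the `load_data` signature (`start_char=1`, `oov_char=2`, `index_from=3`). `decode_review` is a helper written for this illustration, and `toy_word_index` is a hypothetical stand-in for the word-to-rank dictionary that `imdb.get_word_index()` returns.

```python
def decode_review(encoded, word_index, index_from=3):
    """Map a list of integer ids back to a readable string."""
    # Reserved ids under the load_data defaults:
    # 0 = padding, 1 = start of sequence, 2 = out-of-vocabulary.
    id_to_word = {0: "<pad>", 1: "<start>", 2: "<oov>"}
    # Real words are shifted up by index_from to make room for the reserved ids.
    for word, rank in word_index.items():
        id_to_word[rank + index_from] = word
    return " ".join(id_to_word.get(i, "<oov>") for i in encoded)

# Hypothetical three-word vocabulary, ranked by frequency (1 = most frequent).
toy_word_index = {"the": 1, "movie": 2, "great": 3}
print(decode_review([1, 4, 5, 2, 6], toy_word_index))
# <start> the movie <oov> great
```

With the real dataset, passing `imdb.get_word_index()` in place of `toy_word_index` should decode an unpadded training review in the same way.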


### Data Preprocessing

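Keras has already tokenized the reviews and encoded them as integer sequences. The only remaining step is to pad or truncate every review to `maxlen` tokens so that the reviews fit into a single array.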

In [None]:
print('Pad sequences (samples x time)')
X_train = sequence.pad_sequences(X_train, maxlen=maxlen)
X_test = sequence.pad_sequences(X_test, maxlen=maxlen)
print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)
X_train[1]

Pad sequences (samples x time)
X_train shape: (25000, 80)
X_test shape: (25000, 80)


array([ 125,   68,    2, 6853,   15,  349,  165, 4362,   98,    5,    4,
        228,    9,   43,    2, 1157,   15,  299,  120,    5,  120,  174,
         11,  220,  175,  136,   50,    9, 4373,  228, 8255,    5,    2,
        656,  245, 2350,    5,    4, 9837,  131,  152,  491,   18,    2,
         32, 7464, 1212,   14,    9,    6,  371,   78,   22,  625,   64,
       1382,    9,    8,  168,  145,   23,    4, 1690,   15,   16,    4,
       1355,    5,   28,    6,   52,  154,  462,   33,   89,   78,  285,
         16,  145,   95], dtype=int32)
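
By default `pad_sequences` uses `padding='pre'` and `truncating='pre'`: short reviews are left-padded with zeros, and long reviews keep only their last `maxlen` tokens. This explains why `X_train[1]` above does not begin with the start token `1`: the review was longer than 80 tokens and its beginning was cut. The sketch below mimics that default behaviour for a single sequence in plain Python; `pad_pre` is a helper written for this illustration, not a Keras function.

```python
def pad_pre(seq, maxlen, value=0):
    """Mimic the pad_sequences defaults for one sequence."""
    if len(seq) >= maxlen:
        # Too long: drop tokens from the front and keep the last maxlen.
        return list(seq[-maxlen:])
    # Too short: left-pad with the pad value.
    return [value] * (maxlen - len(seq)) + list(seq)

print(pad_pre([5, 6, 7], 5))           # [0, 0, 5, 6, 7]
print(pad_pre([1, 2, 3, 4, 5, 6], 4))  # [3, 4, 5, 6]
```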

## RNN

### Build the RNN model

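We add an `Input` layer so that the model is built as soon as it is defined, which lets `model.summary()` report output shapes and parameter counts. Without it, the model can still be trained, but it cannot be meaningfully summarized until it has been built.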

In [None]:
X_train.shape

(25000, 80)

In [None]:
model = keras.Sequential()
model.add(layers.Input((maxlen,)))
# The Embedding layer maps each integer word index to a dense real vector of fixed size
model.add(layers.Embedding(max_features, 16))
model.add(layers.SimpleRNN(32))
model.add(layers.Dense(1, activation='sigmoid'))

optimizer = optimizers.RMSprop(learning_rate=0.001)
model.compile(loss='binary_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy'])

### Inspect the model

Use the `.summary()` method to print a short description of the model.

In [None]:
model.summary()
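
The summary output is not shown here, but the parameter counts it should report can be worked out by hand from the layer sizes above. This is a sketch using the standard formulas for `Embedding`, `SimpleRNN`, and `Dense` layers; the variable names are chosen for this illustration.

```python
vocab_size, embed_dim, rnn_units = 20000, 16, 32

# Embedding: one embed_dim-sized vector per vocabulary entry.
embedding_params = vocab_size * embed_dim
# SimpleRNN: input kernel + recurrent kernel + bias.
rnn_params = embed_dim * rnn_units + rnn_units * rnn_units + rnn_units
# Dense(1): one weight per RNN unit plus one bias.
dense_params = rnn_units * 1 + 1

total_params = embedding_params + rnn_params + dense_params
print(embedding_params, rnn_params, dense_params, total_params)
# 320000 1568 33 321601
```

Almost all of the parameters sit in the embedding table; the recurrent layer itself is tiny by comparison.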

### Train the model

In [None]:
EPOCHS = 32
BATCH = 64

early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=2)

model.fit(X_train, y_train,
          batch_size=BATCH,
          epochs=EPOCHS,
          validation_split=0.2,
          verbose=1,
          callbacks=[early_stop])

Epoch 1/32
313/313 ━━━━━━━━━━━━━━━━━━━━ 13s 34ms/step - accuracy: 0.6018 - loss: 0.6422 - val_accuracy: 0.8090 - val_loss: 0.4285
Epoch 2/32
313/313 ━━━━━━━━━━━━━━━━━━━━ 19s 28ms/step - accuracy: 0.8456 - loss: 0.3726 - val_accuracy: 0.8204 - val_loss: 0.3984
Epoch 3/32
313/313 ━━━━━━━━━━━━━━━━━━━━ 12s 33ms/step - accuracy: 0.8879 - loss: 0.2826 - val_accuracy: 0.8286 - val_loss: 0.3859
Epoch 4/32
313/313 ━━━━━━━━━━━━━━━━━━━━ 18s 25ms/step - accuracy: 0.9155 - loss: 0.2259 - val_accuracy: 0.8040 - val_loss: 0.5361
Epoch 5/32
313/313 ━━━━━━━━━━━━━━━━━━━━ 10s 32ms/step - accuracy: 0.9360 - loss: 0.1799 - val_accuracy: 0.8162 - val_loss: 0.4340


<keras.src.callbacks.history.History at 0x7e0675abdfd0>
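
Training stopped after 5 of the 32 epochs because `val_loss` did not improve for `patience=2` consecutive epochs. The plain-Python sketch below replays that rule on the validation losses from the log above. It is a simplified illustration of the patience logic (assuming the default `min_delta=0`), not the Keras implementation, and `stopping_epoch` is a name made up for this example.

```python
def stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # Improvement: remember it and reset the counter.
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

val_losses = [0.4285, 0.3984, 0.3859, 0.5361, 0.4340]
print(stopping_epoch(val_losses, patience=2))  # 5
```

The best validation loss was reached at epoch 3. Since `restore_best_weights` defaults to `False`, the model evaluated below keeps the weights from the last epoch rather than the best one.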

In [None]:
_, acc = model.evaluate(X_test, y_test, batch_size=64, verbose=0)
print("Testing set accuracy: {:.2f}%".format(acc*100))

Testing set accuracy: 81.95%


## RNN using the entire sequence instead of the last output

In [None]:
model = keras.Sequential()
model.add(layers.Input((maxlen,)))
# The Embedding layer maps each integer word index to a dense real vector of fixed size
model.add(layers.Embedding(max_features, 16))
model.add(layers.SimpleRNN(32, return_sequences=True, dropout=0.2, recurrent_dropout=0.2))
model.add(layers.Flatten())
model.add(layers.Dense(1, activation='sigmoid'))

optimizer = optimizers.RMSprop(learning_rate=0.001)
model.compile(loss='binary_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy'])
model.summary()
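
With `return_sequences=True` the recurrent layer emits its hidden state at every time step instead of only the last one, so its output has shape `(80, 32)` per review and `Flatten` turns that into 80 × 32 = 2560 features for the final `Dense` layer. The toy example below illustrates the difference with a scalar recurrence in plain Python; the weights `w_x`, `w_h`, and `b` are arbitrary values chosen for this sketch, and `simple_rnn` is not the Keras layer.

```python
import math

def simple_rnn(inputs, w_x, w_h, b, return_sequences=False):
    """Scalar toy RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)."""
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    # Either every hidden state, or only the final one.
    return states if return_sequences else states[-1]

xs = [1.0, 0.5, -0.5]
all_states = simple_rnn(xs, w_x=0.8, w_h=0.5, b=0.0, return_sequences=True)
last_state = simple_rnn(xs, w_x=0.8, w_h=0.5, b=0.0)

print(len(all_states))               # 3, one state per time step
print(all_states[-1] == last_state)  # True
```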

In [None]:
EPOCHS = 32
BATCH = 64

early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=2)

model.fit(X_train, y_train,
          batch_size=BATCH,
          epochs=EPOCHS,
          validation_split=0.2,
          verbose=1,
          callbacks=[early_stop])

Epoch 1/32
313/313 ━━━━━━━━━━━━━━━━━━━━ 15s 40ms/step - accuracy: 0.4996 - loss: 0.7091 - val_accuracy: 0.5054 - val_loss: 0.6979
Epoch 2/32
313/313 ━━━━━━━━━━━━━━━━━━━━ 20s 39ms/step - accuracy: 0.5225 - loss: 0.6984 - val_accuracy: 0.6608 - val_loss: 0.6428
Epoch 3/32
313/313 ━━━━━━━━━━━━━━━━━━━━ 20s 39ms/step - accuracy: 0.6891 - loss: 0.5902 - val_accuracy: 0.8308 - val_loss: 0.3796
Epoch 4/32
313/313 ━━━━━━━━━━━━━━━━━━━━ 20s 37ms/step - accuracy: 0.8548 - loss: 0.3452 - val_accuracy: 0.8392 - val_loss: 0.3559
Epoch 5/32
313/313 ━━━━━━━━━━━━━━━━━━━━ 21s 38ms/step - accuracy: 0.8840 - loss: 0.2919 - val_accuracy: 0.8440 - val_loss: 0.3514
Epoch 6/32
313/313 ━━━━━━━━━━━━━━━━━━━━ 12s 39ms/step - accuracy: 0.9014 - loss: 0.2474 - val_accuracy: 0.8258 - val_loss: 0.3967
Epoch 7/32
[1m3

<keras.src.callbacks.history.History at 0x7e067d534590>

In [None]:
_, acc = model.evaluate(X_test, y_test, batch_size=64, verbose=0)
print("Testing set accuracy: {:.2f}%".format(acc*100))

Testing set accuracy: 83.78%


## LSTM

### Build the model

In [None]:
model = keras.Sequential()
model.add(layers.Input((maxlen,)))
# The Embedding layer maps each integer word index to a dense real vector of fixed size
model.add(layers.Embedding(max_features, 16))
model.add(layers.LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(layers.Dense(1, activation='sigmoid'))

optimizer = optimizers.RMSprop(learning_rate=0.001)
model.compile(loss='binary_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy'])

### Inspect the model

Use the `.summary()` method to print a short description of the model.

In [None]:
model.summary()
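
An LSTM layer holds four weight sets (input, forget, and output gates plus the cell candidate), each with the same shape as a simple RNN's weights. The sketch below works out the parameter counts that the summary should report for this model, using the standard LSTM formula; the variable names are chosen for this illustration.

```python
vocab_size, embed_dim, lstm_units = 20000, 16, 128

lstm_embedding_params = vocab_size * embed_dim
# Four gates, each with an input kernel, a recurrent kernel, and a bias.
lstm_layer_params = 4 * (embed_dim * lstm_units + lstm_units * lstm_units + lstm_units)
# Dense(1) on top of the 128-dimensional final hidden state.
lstm_dense_params = lstm_units * 1 + 1

lstm_total_params = lstm_embedding_params + lstm_layer_params + lstm_dense_params
print(lstm_layer_params, lstm_total_params)
# 74240 394369
```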

### Train the model

In [None]:
EPOCHS = 32
BATCH = 64

early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=2)

model.fit(X_train, y_train,
          batch_size=BATCH,
          epochs=EPOCHS,
          validation_split=0.2,
          verbose=1,
          callbacks=[early_stop])

Epoch 1/32
313/313 ━━━━━━━━━━━━━━━━━━━━ 86s 258ms/step - accuracy: 0.5931 - loss: 0.6534 - val_accuracy: 0.8096 - val_loss: 0.4186
Epoch 2/32
313/313 ━━━━━━━━━━━━━━━━━━━━ 81s 255ms/step - accuracy: 0.8283 - loss: 0.4037 - val_accuracy: 0.7942 - val_loss: 0.4371
Epoch 3/32
313/313 ━━━━━━━━━━━━━━━━━━━━ 85s 264ms/step - accuracy: 0.8565 - loss: 0.3471 - val_accuracy: 0.8268 - val_loss: 0.4457


<keras.src.callbacks.history.History at 0x7e067d8160d0>

In [None]:
_, acc = model.evaluate(X_test, y_test, batch_size=64, verbose=0)
print("Testing set accuracy: {:.2f}%".format(acc*100))

Testing set accuracy: 81.88%


## Stacked LSTM

### Build the model

In [None]:
model = keras.Sequential()
model.add(layers.Input((maxlen,)))
model.add(layers.Embedding(max_features, 16))
model.add(layers.LSTM(128, return_sequences=True, dropout=0.2, recurrent_dropout=0.2))
model.add(layers.LSTM(128, return_sequences=True, dropout=0.2, recurrent_dropout=0.2))
model.add(layers.LSTM(128, return_sequences=False, dropout=0.2, recurrent_dropout=0.2))
model.add(layers.Dense(1, activation='sigmoid'))

optimizer = optimizers.RMSprop(learning_rate=0.001)
model.compile(loss='binary_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy'])
model.summary()
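
When LSTM layers are stacked, every layer except the last needs `return_sequences=True`, because the next LSTM expects a full sequence of vectors as input rather than a single vector. The sketch below traces the per-review output shapes through the stack (batch axis omitted); `lstm_output_shape` is a helper written for this illustration.

```python
maxlen, embed_dim, units = 80, 16, 128

def lstm_output_shape(input_shape, units, return_sequences):
    """Per-sample output shape of an LSTM layer for a (timesteps, features) input."""
    timesteps, _ = input_shape
    return (timesteps, units) if return_sequences else (units,)

shape = (maxlen, embed_dim)  # output of the Embedding layer
trace = [shape]
for return_sequences in (True, True, False):
    shape = lstm_output_shape(shape, units, return_sequences)
    trace.append(shape)

print(trace)
# [(80, 16), (80, 128), (80, 128), (128,)]
```

Only the last layer collapses the time axis, so the final `Dense` layer sees one 128-dimensional vector per review.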


### Train the model

In [None]:
EPOCHS = 32
BATCH = 64

early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=2)

model.fit(X_train, y_train,
          batch_size=BATCH,
          epochs=EPOCHS,
          validation_split=0.2,
          verbose=1,
          callbacks=[early_stop])

Epoch 1/32
313/313 ━━━━━━━━━━━━━━━━━━━━ 302s 925ms/step - accuracy: 0.5006 - loss: 0.6931 - val_accuracy: 0.6888 - val_loss: 0.5878
Epoch 2/32


In [None]:
_, acc = model.evaluate(X_test, y_test, batch_size=64, verbose=0)
print("Testing set accuracy: {:.2f}%".format(acc*100))

## Bidirectional LSTM

In [None]:
model = keras.Sequential()
model.add(layers.Input((maxlen,)))
model.add(layers.Embedding(max_features, 16))
model.add(layers.Bidirectional(layers.LSTM(128, return_sequences=False, dropout=0.2, recurrent_dropout=0.2)))
model.add(layers.Dense(1, activation='sigmoid'))

optimizer = optimizers.RMSprop(learning_rate=0.001)
model.compile(loss='binary_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy'])
model.summary()
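
`Bidirectional` runs one LSTM over the sequence from left to right and a second, independent copy from right to left. With the default `merge_mode='concat'`, the two final states are concatenated, which doubles both the output width and the recurrent parameter count. A quick sketch of that arithmetic, with variable names chosen for this illustration:

```python
embed_dim, units = 16, 128

# Parameters of one direction: four gates, each with kernel, recurrent kernel, bias.
one_direction_params = 4 * (embed_dim * units + units * units + units)
# Forward and backward LSTMs do not share weights.
bidirectional_params = 2 * one_direction_params
# Concatenating the two final states doubles the feature width.
bidirectional_output_dim = 2 * units
# Dense(1) on top of the concatenated state.
bidirectional_dense_params = bidirectional_output_dim * 1 + 1

print(bidirectional_params, bidirectional_output_dim, bidirectional_dense_params)
# 148480 256 257
```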

### Train the model

In [None]:
EPOCHS = 32
BATCH = 64

early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=2)

model.fit(X_train, y_train,
          batch_size=BATCH,
          epochs=EPOCHS,
          validation_split=0.2,
          verbose=1,
          callbacks=[early_stop])

_, acc = model.evaluate(X_test, y_test, batch_size=64, verbose=0)
print("Testing set accuracy: {:.2f}%".format(acc*100))