# Classification with LSTMs and Bi-LSTMs

## Douglas Rice

*This tutorial was originally created by Burt Monroe for his prior work with the Essex Summer School. I've updated and modified it.*

In this notebook, we move beyond the simple feed-forward architectures from our earlier neural networks and set up networks that explicitly try to learn from *sequences*. We'll look specifically at **L**ong **S**hort-**T**erm **M**emory (LSTM) and **bi**directional LSTM (bi-LSTM) networks. Building these models in Keras requires only small changes to our earlier code. Computationally, however, we are adding significant complexity, so the models will take longer to estimate.



## Set Everything Up

As always, we start by setting up our environment and loading the modules and functionality that we'll need to estimate the models.

In [None]:
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers, models  # take both from tensorflow.keras rather than mixing in standalone keras

max_features = 5000  # keep only the 5,000 most frequent words
# maxlen=None pads shorter reviews to the length of the longest review in each array.
# Set maxlen=200 or 500 for less padding at the expense of truncating longer reviews.
maxlen = None
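To make the effect of `maxlen` concrete, here is a minimal pure-Python sketch of padding and truncating. The helper `pad_pre` is hypothetical and written only for illustration. It assumes the default behavior of `pad_sequences` as I understand it (zeros added at the front, and over-long sequences cut from the front), which is worth checking against your installed Keras version.

```python
def pad_pre(sequences, maxlen=None, value=0):
    """Hypothetical helper: left-pad (and left-truncate) lists of word indices."""
    if maxlen is None:
        # No limit given: use the longest sequence, so nothing is truncated
        maxlen = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        # Keep only the last `maxlen` tokens of an over-long sequence
        kept = seq[-maxlen:] if maxlen > 0 else []
        # Fill the front with `value` until the row has length `maxlen`
        padded.append([value] * (maxlen - len(kept)) + kept)
    return padded


reviews = [[11, 12, 13, 14], [21, 22], [31]]  # made-up word indices
full = pad_pre(reviews)             # maxlen=None: pad to the longest review
short = pad_pre(reviews, maxlen=2)  # maxlen=2: less padding, but truncation
print(full)
print(short)
```

With `maxlen=None` no review loses any words, but every row is as long as the longest review, which is why a smaller `maxlen` trades information for speed.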


## Load the IMDB movie review sentiment data

We'll stick with the IMDB movie review sentiment data that ships with Keras for this exercise. One benefit is that we can maintain pretty direct comparisons across all of these different modeling approaches.

In [None]:
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(
    num_words=max_features
)
# Hold out the first 10,000 training reviews as a validation set
x_val = x_train[:10000]
partial_x_train = x_train[10000:]
y_val = y_train[:10000]
partial_y_train = y_train[10000:]
print(len(x_train), "Training sequences")
print(len(x_test), "Test sequences")
# Pad each set of reviews so that every row within an array has the same length
partial_x_train = keras.preprocessing.sequence.pad_sequences(partial_x_train, maxlen=maxlen)
x_val = keras.preprocessing.sequence.pad_sequences(x_val, maxlen=maxlen)
x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=maxlen)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
25000 Training sequences
25000 Test sequences


## Build a basic LSTM model

Building a basic LSTM is very simple in Keras. We just add an LSTM layer to our Sequential model, right after the embedding layer.

In [None]:
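model = models.Sequential()
model.add(layers.Input(shape=(None,), dtype="int32"))  # variable-length sequences of word indices
model.add(layers.Embedding(max_features, 16))  # map each word index to a 16-dimensional vector
model.add(layers.LSTM(16))  # read the sequence and return the output at the last time step
model.add(layers.Dense(1, activation="sigmoid"))  # probability that the review is positive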

model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, None, 16)          80000     
                                                                 
 lstm (LSTM)                 (None, 16)                2112      
                                                                 
 dense (Dense)               (None, 1)                 17        
                                                                 
Total params: 82,129
Trainable params: 82,129
Non-trainable params: 0
_________________________________________________________________
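The parameter counts in the summary above can be reproduced by hand. The sketch below uses the standard counting formulas for embedding, LSTM, and dense layers. The function names are my own, for illustration only. An LSTM has four sets of weights (three gates plus the candidate cell state), each with input weights, recurrent weights, and a bias.

```python
def embedding_params(vocab_size, embed_dim):
    # One embed_dim-long vector per word in the vocabulary
    return vocab_size * embed_dim


def lstm_params(input_dim, units):
    # Four weight sets, each with input weights, recurrent weights, and biases
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # One weight per input-output pair, plus one bias per output unit
    return input_dim * units + units


counts = {
    "embedding": embedding_params(5000, 16),
    "lstm": lstm_params(16, 16),
    "dense": dense_params(16, 1),
}
total = sum(counts.values())
print(counts)
print("Total params:", total)
```

Nearly all of the parameters sit in the embedding layer. The LSTM itself is small, but it is applied once per time step, which is where the extra training time comes from.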


## Train and evaluate the model

In [None]:
model.compile("adam", "binary_crossentropy", metrics=["accuracy"])

In [None]:
model.fit(partial_x_train, partial_y_train, batch_size=512, epochs=12, validation_data=(x_val, y_val))

Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12


<keras.callbacks.History at 0x7f606f3ec410>

In [None]:
model.evaluate(x_test, y_test)



[0.33969274163246155, 0.8617200255393982]

Test accuracy is about 86%.
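Because the model was compiled with `metrics=["accuracy"]`, the list returned by `model.evaluate` holds the test loss first and the test accuracy second. A small sketch of unpacking and formatting those two numbers, with the values copied from the output above rather than recomputed:

```python
# Values copied from the model.evaluate output above: [loss, accuracy]
results = [0.33969274163246155, 0.8617200255393982]

loss, accuracy = results
summary_line = f"Test loss: {loss:.3f}, test accuracy: {accuracy:.1%}"
print(summary_line)
```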

## Build a basic bi-LSTM model

Let's see if a bidirectional LSTM does any better.

In [None]:
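model = models.Sequential()
model.add(layers.Input(shape=(None,), dtype="int32"))  # variable-length sequences of word indices
model.add(layers.Embedding(max_features, 16))  # map each word index to a 16-dimensional vector
model.add(layers.Bidirectional(layers.LSTM(16)))  # one LSTM reads forward, one backward; outputs are concatenated
model.add(layers.Dense(1, activation="sigmoid"))  # probability that the review is positive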

model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_1 (Embedding)     (None, None, 16)          80000     
                                                                 
 bidirectional (Bidirectiona  (None, 32)               4224      
 l)                                                              
                                                                 
 dense_1 (Dense)             (None, 1)                 33        
                                                                 
Total params: 84,257
Trainable params: 84,257
Non-trainable params: 0
_________________________________________________________________
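The summary shows the bidirectional layer producing 32 outputs with 4,224 parameters, exactly double the 16 outputs and 2,112 parameters of the single LSTM. That is because the wrapper runs two separate LSTMs, one over the sequence as given and one over it reversed, and by default concatenates their outputs. The sketch below illustrates only that wiring. `toy_rnn` is a made-up stand-in for a recurrent layer, not an LSTM, and its fixed coefficients are arbitrary.

```python
import math


def toy_rnn(sequence, units):
    """Made-up recurrent pass (not an LSTM): returns the final hidden state."""
    state = [0.0] * units
    for x in sequence:
        # Each unit mixes its previous state with the current input
        state = [math.tanh(0.5 * h + 0.1 * (j + 1) * x) for j, h in enumerate(state)]
    return state


def toy_bidirectional(sequence, units):
    forward = toy_rnn(sequence, units)                   # reads left to right
    backward = toy_rnn(list(reversed(sequence)), units)  # reads right to left
    return forward + backward                            # concatenate the two states


sequence = [0.2, -0.5, 0.9]  # arbitrary illustrative inputs
combined = toy_bidirectional(sequence, units=16)
print(len(combined))
```

The dense layer that follows therefore sees 32 inputs instead of 16, which accounts for its 33 parameters (32 weights plus a bias).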


In [None]:
model.compile("adam", "binary_crossentropy", metrics=["accuracy"])

In [None]:
model.fit(partial_x_train, partial_y_train, batch_size=512, epochs=12, validation_data=(x_val, y_val))

Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12


<keras.callbacks.History at 0x7f607c016c90>

In [None]:
model.evaluate(x_test, y_test)



[0.3619060814380646, 0.8525999784469604]


About 85% test accuracy. We're going in the wrong direction!

## Build a more expressive, deeper bi-LSTM model with dropout

Bi-LSTMs seem to gain power when stacked in multiple layers. Let's stack two of them, enlarge the embedding (64 dimensions) and the first LSTM layer (32 units), and add some regularization through dropout.

In [None]:
model = models.Sequential()
model.add(layers.Input(shape=(None,), dtype="int32"))
model.add(layers.Embedding(max_features, 64))
model.add(layers.Dropout(0.3))
# return_sequences=True is necessary when stacking LSTMs: the next LSTM needs one output per time step
model.add(layers.Bidirectional(layers.LSTM(32, return_sequences=True)))
model.add(layers.Dropout(0.2))
model.add(layers.Bidirectional(layers.LSTM(16)))
model.add(layers.Dense(1, activation="sigmoid"))

model.summary()
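To see why the first bi-LSTM layer needs `return_sequences=True`, it helps to trace the output shapes through the stack. The sketch below is a hypothetical shape calculator, not Keras code. It assumes a recurrent layer expects 3D input `(batch, timesteps, features)` and returns either one output per time step or only the last one.

```python
def recurrent_output_shape(input_shape, units, return_sequences=False, bidirectional=False):
    """Hypothetical helper: output shape of an (optionally bidirectional) LSTM."""
    if len(input_shape) != 3:
        raise ValueError(f"expected (batch, timesteps, features), got {input_shape}")
    batch, timesteps, _ = input_shape
    width = 2 * units if bidirectional else units  # concatenated directions
    if return_sequences:
        return (batch, timesteps, width)  # one output per time step
    return (batch, width)                 # only the last output


embedded = (None, None, 64)  # output shape of Embedding(max_features, 64)
first = recurrent_output_shape(embedded, 32, return_sequences=True, bidirectional=True)
second = recurrent_output_shape(first, 16, bidirectional=True)
print(first, second)

# Without return_sequences, the first layer collapses the time axis
collapsed = recurrent_output_shape(embedded, 32, bidirectional=True)
try:
    recurrent_output_shape(collapsed, 16, bidirectional=True)
    stacking_failed = False
except ValueError:
    stacking_failed = True
print(collapsed, stacking_failed)
```

Only the last recurrent layer drops the time axis, so the final dense layer receives a single 32-dimensional vector per review.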


In [None]:
model.compile("adam", "binary_crossentropy", metrics=["accuracy"])

In [None]:
model.fit(partial_x_train, partial_y_train, batch_size=512, epochs=3, validation_data=(x_val, y_val))

Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x7f6077c4a4d0>

In [None]:
model.evaluate(x_test, y_test)



[0.3764491677284241, 0.8516799807548523]

About 85% again.

It's worth noting that the even bigger, more expressive model in the Keras documentation (a 128-dimensional embedding layer and two 64-node bi-LSTM layers, for 2.8 million parameters) reaches a test-set accuracy of 86.8% (https://keras.io/examples/nlp/bidirectional_lstm_imdb/).

And our basic feed-forward network with some dropout did a bit better still, at 88%.