# **Deep Learning RNN/LSTM**

### Dr. Santosh Chapaneri
### Lead AI Product Engineer, Wolters Kluwer

- The IMDB dataset contains 25,000 highly polar movie reviews (positive or negative) for training and another 25,000 for testing.

- The problem is to determine whether a given movie review has a positive or negative sentiment.

- The words have been replaced by integers that give the frequency rank of each word in the dataset, so each review is a sequence of integers.
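The integer encoding described above can be illustrated with a minimal sketch. The toy corpus and the helper name build_rank_index are assumptions made for illustration only; the real IMDB index is precomputed and ships with Keras.

```python
from collections import Counter

def build_rank_index(texts):
    """Map each word to its frequency rank (1 = most frequent word)."""
    counts = Counter(word for text in texts for word in text.split())
    # Sort by descending count, breaking ties alphabetically for determinism
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: rank for rank, word in enumerate(ordered, start=1)}

# Hypothetical toy corpus, not taken from IMDB
toy_reviews = ["the movie was good", "the movie was bad", "the plot was good"]
rank_index = build_rank_index(toy_reviews)

# Each review becomes a list of integer ranks
encoded = [[rank_index[w] for w in review.split()] for review in toy_reviews]
print(rank_index)
print(encoded)  # [[1, 4, 2, 3], [1, 4, 2, 5], [1, 6, 2, 3]]
```

Frequent words receive small integers, which is why limiting the vocabulary to the most frequent words amounts to keeping only indices below a threshold.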

- **Word Embedding**

- We will map each movie review into a real-valued vector domain, a popular technique for text called word embedding. Words are encoded as real-valued vectors in a high-dimensional space, where similarity in meaning translates to closeness in the vector space.

- Keras provides a convenient way to convert positive integer representations of words into word embeddings: the Embedding layer.

- We will map each word onto a real-valued vector of length 32. We will also limit the vocabulary to the 5,000 most frequent words; every other word is replaced by a single out-of-vocabulary index (2 by default in imdb.load_data). Finally, the sequence length (number of words) varies between reviews, so we will constrain each review to 500 words, truncating longer reviews and padding shorter ones with zeros.
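Conceptually, an embedding layer behaves like a trainable lookup table with one row per word index. The sketch below mimics that lookup with NumPy; the random matrix stands in for learned weights, and the variable names are assumptions for illustration rather than Keras internals.

```python
import numpy as np

vocab_size = 5000      # number of word indices kept (top_words in the notebook)
embedding_dim = 32     # length of each word vector

rng = np.random.default_rng(0)
# Stand-in for the trainable weight matrix; real values are learned during training
embedding_matrix = rng.normal(size=(vocab_size, embedding_dim))

# A hypothetical padded review: a few word indices followed by zero padding
review = np.zeros(500, dtype=int)
review[:5] = [1, 14, 47, 8, 30]

# The lookup selects one row of the matrix per word index
embedded = embedding_matrix[review]
print(embedded.shape)  # (500, 32)
```

Each 500-word review therefore enters the recurrent layer as a 500 x 32 array of real values.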

In [1]:
import numpy as np

from keras.datasets import imdb

from keras.models import Sequential
from keras.layers import Dense, SimpleRNN, LSTM, GRU, Embedding
from keras.preprocessing import sequence

# fix random seed for reproducibility
np.random.seed(2026)

- We constrain the vocabulary to the top 5,000 words. The dataset comes pre-split into equally sized training and test sets (25,000 reviews each).

In [3]:
# load the dataset but only keep the top n words; all other words are replaced by the out-of-vocabulary index (2)
top_words = 5000

(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words = top_words)

In [4]:
X_train.shape

(25000,)

In [5]:
len(X_train[20]) # vary number to see length of different samples

129

In [6]:
X_train[2][:10]

[1, 14, 47, 8, 30, 31, 7, 4, 249, 108]

- We need to truncate and pad the input sequences so that they are all the same length for modeling.

- The model can learn that the zero values carry no information. The reviews still differ in the length of their actual content, but Keras requires vectors of the same length to perform the computation in batches.
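The truncate-and-pad step can be sketched in plain Python. The helper name pad_post is hypothetical; it is meant to mirror what pad_sequences does when called with padding = 'post' and the default truncating = 'pre', under the assumption that maxlen is positive.

```python
def pad_post(sequences, maxlen, value=0):
    """Pad with `value` at the end; keep the last `maxlen` items of long sequences."""
    padded = []
    for seq in sequences:
        kept = list(seq)[-maxlen:]   # truncating='pre' drops the earliest items
        padded.append(kept + [value] * (maxlen - len(kept)))
    return padded

toy = [[1, 14, 47], [1, 8, 30, 31, 7, 4, 249]]
print(pad_post(toy, maxlen=5))
# [[1, 14, 47, 0, 0], [30, 31, 7, 4, 249]]
```

Note that with post-padding a recurrent layer reads the zeros last. Whether that hurts in practice depends on the model; options such as padding = 'pre' or the Embedding layer's mask_zero argument exist for this case.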

In [11]:
# Retrieve the word index file mapping words to indices
word_index = imdb.get_word_index()

# Reverse the word index to obtain a dict mapping indices to words
inverted_word_index = dict((i, word) for (word, i) in word_index.items())

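# Decode one sequence from the dataset (index 289)
# Note: imdb.load_data() shifts every word index by 3 (index_from=3),
# so this direct lookup returns misaligned words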
idx = 289
decoded_sequence = " ".join(inverted_word_index[i] for i in X_train[idx])
decoded_sequence

"the plot great fat that movie is completely related you most is quite br mad idea project this as on and wishing to and rent and it time dialog driving this results not then fat that new have character it when was why and to wealth he time whose no way as you to from unique her each if is very you film is and their reasons drama something well at her played to good live he her serious from way that her very friends was big known doesn't as an show cast i i of their there keep around much not was appear for were favorite he over harry as it course but wishing most be tight br love it 4 or of because to that it laughs in whatever of too no all service film because laughing type be graphic laughing results not if and most as it were how my in favorite this of on i i was two and looks in ever want has don't play well at her buy no from folks of its five piece this were acting no and with age no and johnny and in i'd and workers in never ride just made most all parts episode about minute n

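The decoded text above reads as scrambled because imdb.load_data reserves indices 0, 1 and 2 (padding, start of sequence, out of vocabulary) and shifts real word indices by 3 by default. A minimal sketch of an offset-aware decoder follows; the helper name decode_review, the placeholder tokens and the toy word index are assumptions for illustration.

```python
def decode_review(encoded, word_index, index_from=3):
    """Decode a review encoded with the default imdb.load_data index offset."""
    # Shift the raw word ranks so they line up with the encoded indices
    reverse_index = {i + index_from: word for word, i in word_index.items()}
    # Indices 0, 1 and 2 are reserved; the token strings are illustrative
    reverse_index.update({0: "<pad>", 1: "<start>", 2: "<oov>"})
    return " ".join(reverse_index.get(i, "<oov>") for i in encoded)

# Hypothetical miniature word index in the same word -> rank format
toy_word_index = {"the": 1, "and": 2, "movie": 3, "great": 4}
print(decode_review([1, 4, 6, 2, 7, 0], toy_word_index))
# <start> the movie <oov> great <pad>
```

With the real data, the call would presumably be decode_review(X_train[idx], imdb.get_word_index()).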
In [12]:
# truncate and pad input sequences
max_review_length = 500

X_train_pad = sequence.pad_sequences(X_train, maxlen = max_review_length, padding = 'post')

X_test_pad = sequence.pad_sequences(X_test, maxlen = max_review_length, padding = 'post')

In [13]:
X_train_pad.shape

(25000, 500)

In [16]:
X_train_pad[20]

array([   1,  617,   11, 3875,   17,    2,   14,  966,   78,   20,    9,
         38,   78,   15,   25,  413,    2,    5,   28,    8,  106,   12,
          8,    4,  130,   43,    8,   67,   48,   12,  100,   79,  101,
        433,    5,   12,  127,    4,  769,    9,   38,  727,   12,  186,
        398,   34,    6,  312,  396,    2,  707,    4,  732,   26, 1235,
         21,    2,  128,   74,    4,    2,    5,    4,  116,    9, 1639,
         10,   10,    4,    2,    2,  186,    8,   28,   77, 2586,   39,
          4, 4135,    2,    7,    2,    2,   50,  161,  306,    8,   30,
          6,  686,  204,  326,   11,    4,  226,   20,   10,   10,   13,
        258,   14,   20,    8,   30,   38,   78,   15,   13, 1498,   91,
          7,    4,   96,  143,   10,   10,    2,    2,  144, 3261,   27,
        419,   11,  902,   29,  540,  887,    4,  278,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
          0,    0,    0,    0,    0,    0,    0,   

In [17]:
# For demo only (faster execution): 1,000 training, 500 validation and 500 test reviews
X_tr = X_train_pad[:1000]
y_tr = y_train[:1000]

X_val = X_train_pad[1000:1500]
y_val = y_train[1000:1500]

X_test = X_test_pad[:500]
y_test = y_test[:500]

In [18]:
y_val.shape

(500,)

- Define, compile and fit our RNN model

- The first layer is the Embedding layer, which represents each word as a vector of length 32.

- The next layer is the RNN layer with 100 recurrent units.

- Finally, because this is a classification problem we use a Dense output layer with a single neuron and a sigmoid activation function to make 0 or 1 predictions for the two classes (good and bad) in the problem.

- Because it is a binary classification problem, log loss is used as the loss function (binary_crossentropy in Keras).

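- The Adam optimization algorithm is used.

- The simple RNN below is fit for 50 epochs on the reduced training subset, and the later models for 3 epochs each. Such models can overfit quickly, so the validation metrics are reported after every epoch. A batch size of 64 reviews is used to space out weight updates.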
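To make the recurrent layer less of a black box, the following NumPy sketch runs the standard tanh recurrence over one embedded review and feeds the final state to a single sigmoid neuron. The random weights and all names are illustrative assumptions; Keras learns the corresponding weights during training.

```python
import numpy as np

def simple_rnn_last_state(inputs, W_x, W_h, b):
    """Run h_t = tanh(x_t @ W_x + h_{t-1} @ W_h + b) and return the final h."""
    h = np.zeros(W_h.shape[0])
    for x_t in inputs:               # one embedded word per time step
        h = np.tanh(x_t @ W_x + h @ W_h + b)
    return h

rng = np.random.default_rng(0)
timesteps, input_dim, units = 500, 32, 100
inputs = rng.normal(size=(timesteps, input_dim))       # stand-in for one embedded review
W_x = rng.normal(scale=0.1, size=(input_dim, units))   # input-to-hidden weights
W_h = rng.normal(scale=0.1, size=(units, units))       # hidden-to-hidden weights
b = np.zeros(units)

h_last = simple_rnn_last_state(inputs, W_x, W_h, b)

# A single sigmoid neuron turns the final state into a probability
w_out = rng.normal(scale=0.1, size=units)
prob = 1.0 / (1.0 + np.exp(-(h_last @ w_out)))
print(h_last.shape, float(prob))
```

The same hidden-to-hidden weights are reused at every one of the 500 time steps, which is what distinguishes a recurrent layer from a stack of independent dense layers.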

In [19]:
# create the model
embedding_veclen = 32

model = Sequential()
model.add(Embedding(top_words, embedding_veclen))
model.add(SimpleRNN(100))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])

In [22]:
model.fit(X_tr, y_tr, validation_data=(X_val, y_val), epochs=50, batch_size=64)

Epoch 1/50
16/16 ━━━━━━━━━━━━━━━━━━━━ 1s 54ms/step - accuracy: 0.5320 - loss: 0.6900 - val_accuracy: 0.5080 - val_loss: 0.6940
Epoch 2/50
16/16 ━━━━━━━━━━━━━━━━━━━━ 1s 55ms/step - accuracy: 0.5080 - loss: 0.6913 - val_accuracy: 0.4820 - val_loss: 0.6935
Epoch 3/50
16/16 ━━━━━━━━━━━━━━━━━━━━ 1s 55ms/step - accuracy: 0.5060 - loss: 0.6935 - val_accuracy: 0.4980 - val_loss: 0.6971
Epoch 4/50
16/16 ━━━━━━━━━━━━━━━━━━━━ 1s 54ms/step - accuracy: 0.5280 - loss: 0.6901 - val_accuracy: 0.4960 - val_loss: 0.6947
Epoch 5/50
16/16 ━━━━━━━━━━━━━━━━━━━━ 1s 52ms/step - accuracy: 0.5560 - loss: 0.6872 - val_accuracy: 0.5080 - val_loss: 0.6932
Epoch 6/50
16/16 ━━━━━━━━━━━━━━━━━━━━ 1s 51ms/step - accuracy: 0.5440 - loss: 0.6885 - val_accuracy: 0.5240 - val_loss: 0.6929
Epoch 7/50
16/16 ━━━━

<keras.src.callbacks.history.History at 0x1c0114c5290>

In [23]:
model.summary()
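The summary output is not reproduced here. As a rough cross-check, the trainable parameter counts can be worked out by hand, assuming the default Keras layer configurations (biases enabled, no extra options).

```python
top_words, embedding_veclen, units = 5000, 32, 100

embedding_params = top_words * embedding_veclen        # one vector per word index
rnn_params = units * (embedding_veclen + units + 1)    # input, recurrent and bias weights
dense_params = units * 1 + 1                           # weights plus bias of the output neuron

total = embedding_params + rnn_params + dense_params
print(embedding_params, rnn_params, dense_params, total)
# 160000 13300 101 173401
```

Most of the parameters sit in the embedding table, not in the recurrent layer.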

In [24]:
# Final evaluation of the model

scores = model.evaluate(X_test, y_test)
scores

16/16 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - accuracy: 0.5240 - loss: 0.6919


[0.6918753981590271, 0.5239999890327454]
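For context, a loss of about 0.692 with an accuracy of about 0.52 is close to what a classifier would score if it predicted a probability of 0.5 for every review. The sketch below computes binary cross-entropy by hand to show that reference value; the labels and probabilities are made up for illustration.

```python
import math

def binary_crossentropy(y_true, y_prob):
    """Mean log loss over paired labels and predicted probabilities."""
    losses = [-(y * math.log(p) + (1 - y) * math.log(1 - p))
              for y, p in zip(y_true, y_prob)]
    return sum(losses) / len(losses)

labels = [0, 1, 1, 0]
chance_loss = binary_crossentropy(labels, [0.5] * 4)              # equals ln(2), about 0.6931
confident_loss = binary_crossentropy(labels, [0.1, 0.9, 0.8, 0.2])
print(chance_loss, confident_loss)
```

A loss that stays near ln(2) suggests the model has not yet learned a useful signal from the 1,000 training reviews.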

# **Stacked RNN**

In [25]:
model_st_rnn = Sequential()
model_st_rnn.add(Embedding(top_words, embedding_veclen))
model_st_rnn.add(SimpleRNN(100, return_sequences = True))
model_st_rnn.add(SimpleRNN(100, return_sequences = False))
model_st_rnn.add(Dense(1, activation='sigmoid'))
model_st_rnn.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
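Stacking works because the first recurrent layer returns its state at every time step (return_sequences = True), giving the second layer a full sequence to read. The toy NumPy layer below, with random weights and hypothetical names, shows the resulting shapes for a single review.

```python
import numpy as np

def rnn_layer(inputs, units, return_sequences, rng):
    """Toy tanh RNN layer with random weights; returns all states or only the last."""
    input_dim = inputs.shape[-1]
    W_x = rng.normal(scale=0.1, size=(input_dim, units))
    W_h = rng.normal(scale=0.1, size=(units, units))
    h = np.zeros(units)
    states = []
    for x_t in inputs:
        h = np.tanh(x_t @ W_x + h @ W_h)
        states.append(h)
    return np.stack(states) if return_sequences else h

rng = np.random.default_rng(0)
embedded = rng.normal(size=(500, 32))   # stand-in for one embedded review

first = rnn_layer(embedded, 100, return_sequences=True, rng=rng)
second = rnn_layer(first, 100, return_sequences=False, rng=rng)
print(first.shape, second.shape)  # (500, 100) (100,)
```

Only the last recurrent layer drops the time dimension, so the Dense output layer receives one 100-dimensional vector per review.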

In [27]:
model_st_rnn.fit(X_tr, y_tr, validation_data=(X_val, y_val), epochs=3, batch_size=64)

Epoch 1/3
16/16 ━━━━━━━━━━━━━━━━━━━━ 4s 139ms/step - accuracy: 0.5100 - loss: 0.7094 - val_accuracy: 0.5220 - val_loss: 0.6939
Epoch 2/3
16/16 ━━━━━━━━━━━━━━━━━━━━ 2s 127ms/step - accuracy: 0.5260 - loss: 0.6947 - val_accuracy: 0.5180 - val_loss: 0.6971
Epoch 3/3
16/16 ━━━━━━━━━━━━━━━━━━━━ 2s 130ms/step - accuracy: 0.4980 - loss: 0.6973 - val_accuracy: 0.5220 - val_loss: 0.6923


<keras.src.callbacks.history.History at 0x1c011c9ab90>

In [28]:
model_st_rnn.summary()

In [32]:
# Final evaluation of the model
scores = model_st_rnn.evaluate(X_test, y_test)
scores

16/16 ━━━━━━━━━━━━━━━━━━━━ 0s 25ms/step - accuracy: 0.4760 - loss: 0.6981


[0.6981307864189148, 0.47600001096725464]

# **LSTM**

In [29]:
model_lstm = Sequential()
model_lstm.add(Embedding(top_words, embedding_veclen))
model_lstm.add(LSTM(100, return_sequences = False))
model_lstm.add(Dense(1, activation='sigmoid'))
model_lstm.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
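An LSTM unit differs from a simple RNN unit by keeping a separate cell state that is updated through gates. The NumPy sketch below implements one common textbook formulation of a single LSTM step; the random weights and names are assumptions, and details such as weight layout may differ from the Keras implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step; W, U and b hold the four gate blocks stacked along the last axis."""
    z = x_t @ W + h_prev @ U + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # input, forget and output gates
    g = np.tanh(g)                                 # candidate cell update
    c_t = f * c_prev + i * g                       # keep part of the old memory, add new
    h_t = o * np.tanh(c_t)                         # hidden state exposed to the next layer
    return h_t, c_t

rng = np.random.default_rng(0)
input_dim, units = 32, 100
W = rng.normal(scale=0.1, size=(input_dim, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

h, c = np.zeros(units), np.zeros(units)
for x_t in rng.normal(size=(500, input_dim)):      # stand-in for one embedded review
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)  # (100,) (100,)
```

The forget gate lets the cell state carry information across many time steps, which is the usual motivation for preferring LSTM over a simple RNN on long reviews.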

In [30]:
model_lstm.fit(X_tr, y_tr, validation_data=(X_val, y_val), epochs=3, batch_size=64)

Epoch 1/3
16/16 ━━━━━━━━━━━━━━━━━━━━ 6s 276ms/step - accuracy: 0.4910 - loss: 0.6937 - val_accuracy: 0.4900 - val_loss: 0.6941
Epoch 2/3
16/16 ━━━━━━━━━━━━━━━━━━━━ 4s 228ms/step - accuracy: 0.5140 - loss: 0.6928 - val_accuracy: 0.4920 - val_loss: 0.6937
Epoch 3/3
16/16 ━━━━━━━━━━━━━━━━━━━━ 4s 238ms/step - accuracy: 0.5300 - loss: 0.6928 - val_accuracy: 0.4920 - val_loss: 0.6936


<keras.src.callbacks.history.History at 0x1c00f774b50>

In [31]:
model_lstm.summary()

In [33]:
# Final evaluation of the model

scores = model_lstm.evaluate(X_test, y_test)
scores

16/16 ━━━━━━━━━━━━━━━━━━━━ 1s 46ms/step - accuracy: 0.5320 - loss: 0.6920


[0.6919567584991455, 0.5320000052452087]

# **Deep LSTM**

In [34]:
model_st_lstm = Sequential()

model_st_lstm.add(Embedding(top_words, embedding_veclen))

model_st_lstm.add(LSTM(100, return_sequences = True))
model_st_lstm.add(LSTM(100, return_sequences = False))

model_st_lstm.add(Dense(1, activation='sigmoid'))

model_st_lstm.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])

In [35]:
model_st_lstm.fit(X_tr, y_tr, validation_data=(X_val, y_val), epochs=3, batch_size=64)

Epoch 1/3
16/16 ━━━━━━━━━━━━━━━━━━━━ 13s 710ms/step - accuracy: 0.4800 - loss: 0.6949 - val_accuracy: 0.5240 - val_loss: 0.6929
Epoch 2/3
16/16 ━━━━━━━━━━━━━━━━━━━━ 11s 681ms/step - accuracy: 0.4930 - loss: 0.6936 - val_accuracy: 0.4760 - val_loss: 0.6939
Epoch 3/3
16/16 ━━━━━━━━━━━━━━━━━━━━ 11s 697ms/step - accuracy: 0.5060 - loss: 0.6931 - val_accuracy: 0.4760 - val_loss: 0.6936


<keras.src.callbacks.history.History at 0x1c00f134250>

In [36]:
model_st_lstm.summary()
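The summary output is not shown here, but the LSTM parameter counts can be estimated by hand. The sketch assumes the default Keras LSTM configuration with biases, where each layer has four gate blocks; the helper name lstm_params is hypothetical.

```python
def lstm_params(input_dim, units):
    """Four gate blocks, each with input weights, recurrent weights and a bias."""
    return 4 * units * (input_dim + units + 1)

embedding_params = 5000 * 32
first_lstm = lstm_params(32, 100)     # fed by 32-dimensional embeddings
second_lstm = lstm_params(100, 100)   # fed by the first layer's 100-dimensional states
dense_params = 100 + 1

single_total = embedding_params + first_lstm + dense_params
deep_total = single_total + second_lstm
print(first_lstm, second_lstm)   # 53200 80400
print(single_total, deep_total)  # 213301 293701
```

An LSTM layer has four times the parameters of a simple RNN layer of the same size, which is consistent with the longer step times in the training logs.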

In [38]:
# Final evaluation of the model
scores = model_st_lstm.evaluate(X_test, y_test)
scores

16/16 ━━━━━━━━━━━━━━━━━━━━ 1s 74ms/step - accuracy: 0.5240 - loss: 0.6926


[0.692596435546875, 0.5239999890327454]