# Recurrent Neural Net (RNN)
Now we will build a recurrent neural net (RNN) for sequence data and use it to classify sentiment in the IMDB movie reviews dataset.

## Imports
First, let's import everything we'll need.

In [4]:
import numpy as np
import matplotlib.pyplot as plt

from keras.datasets import imdb
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import LSTM, RNN, GRU
from keras.layers import Conv1D, MaxPooling1D
from keras.layers import Bidirectional
from keras.layers import Embedding, BatchNormalization
from keras.layers import Average
from keras.preprocessing import sequence

## Load Data
Now let's load the IMDB data.

In [15]:
num_classes = 2
max_idx = 10000

# Keep only the 10,000 most frequent words; rarer words are replaced by the out-of-vocabulary index (2)
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_idx, index_from=3)

print("x_train shape is: ", x_train.shape)
print("y_train shape is: ", y_train.shape)

print("x_test shape is: ", x_test.shape)
print("y_test shape is: ", y_test.shape)


# Save the test labels for later
labels = y_test

x_train shape is:  (25000,)
y_train shape is:  (25000,)
x_test shape is:  (25000,)
y_test shape is:  (25000,)


Let's first take a quick look at what our data looks like:

In [6]:
print("X_train set:", x_train[0])
print("The length of this review is: ", len(x_train[0]))
print("Y_train set:", y_train[0])

X_train set: [1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]
The length of this review is:  218
Y_train set:

As you can see, each review in the training features is encoded as a sequence of integers (words are indexed by their frequency in the dataset), and the label is 1 for a positive review and 0 for a negative review.

Because these integers are identifiers rather than meaningful magnitudes, we will later add an embedding layer that maps each word index to a learned dense vector.
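To make the idea of an embedding concrete, here is a minimal NumPy sketch of the lookup an embedding layer performs. The sizes, the random matrix, and the toy review are hypothetical and chosen only for illustration; in the real model the matrix is learned during training.

```python
import numpy as np

# Hypothetical sizes for illustration; the notebook uses a 10000-word vocabulary.
vocab_size = 10
embed_dim = 4

rng = np.random.default_rng(0)
# One row per word index. In a trained model these rows are learned weights.
embedding_matrix = rng.normal(size=(vocab_size, embed_dim))

# A toy "review" encoded as word indices (assumed values).
toy_review = np.array([1, 5, 2, 7])

# The lookup is plain row indexing: each integer selects one vector.
embedded = embedding_matrix[toy_review]
print(embedded.shape)  # (4, 4): one 4-dimensional vector per token
```

The network then operates on these vectors instead of on the raw integers.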

## Format Data
In order to feed the data into a neural network, we must make sure that all training and testing examples are the same length. Shorter reviews are padded with zeros and longer reviews are truncated.

In [16]:
# Pad or truncate every review to a length of 400 tokens
max_len = 400

x_train = sequence.pad_sequences(x_train, maxlen=max_len)
x_test = sequence.pad_sequences(x_test, maxlen=max_len)

print("new x_train shape is: ", x_train.shape)
print("new x_test shape is: ", x_test.shape)

new x_train shape is:  (25000, 400)
new x_test shape is:  (25000, 400)
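As a rough illustration of what the padding step does, the sketch below pads and truncates at the start of each sequence, which is how I understand the default behaviour of `pad_sequences`. The helper name `pad_pre` and the toy sequences are hypothetical, and this is not the library's implementation.

```python
import numpy as np

def pad_pre(seqs, max_len, value=0):
    """Left-pad or left-truncate each sequence to exactly max_len items."""
    out = np.full((len(seqs), max_len), value, dtype="int32")
    for row, seq in enumerate(seqs):
        if len(seq) == 0:
            continue  # an empty sequence stays all padding
        trimmed = list(seq)[-max_len:]       # keep only the last max_len tokens
        out[row, -len(trimmed):] = trimmed   # right-align; padding fills the left
    return out

padded = pad_pre([[1, 2, 3], [4, 5, 6, 7, 8, 9]], max_len=5)
print(padded)
# [[0 0 1 2 3]
#  [5 6 7 8 9]]
```

Padding at the front means the real words sit at the end of each row, closest to the final step of the recurrent layer.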


## Build Model
Now we can build our RNN/LSTM model.

In [8]:
# Create model
model = Sequential()

# Map each word index to a 64-dimensional embedding vector
model.add(Embedding(input_dim=max_idx+1, output_dim=64,
                    input_length=max_len))

# Add 1D Convolutional layer
model.add(Conv1D(filters=32, kernel_size=3, padding='same', activation='relu'))
model.add(Dropout(.15))
model.add(MaxPooling1D(pool_size=2))

# Add bidirectional LSTM layer
model.add(Bidirectional(LSTM(16, return_sequences=False, recurrent_dropout=.15)))
model.add(Dropout(.15))

# Add dense layer (the LSTM returns only its final output, so there is no time axis here)
model.add(Dense(32))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(.2))

# Add final sigmoid layer (a single unit giving the probability of a positive review)
model.add(Dense(1))
model.add(Activation('sigmoid'))

# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
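The final layer and the loss can be sketched in plain NumPy as follows. This is a minimal illustration of a sigmoid output with binary cross-entropy, not the Keras implementation; the logits and toy labels are made-up values.

```python
import numpy as np

def sigmoid(z):
    """Squash a raw score into a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(losses))

# Hypothetical raw outputs of the last Dense(1) layer and their true labels.
toy_logits = np.array([2.0, -1.0, 0.0])
toy_labels = np.array([1.0, 0.0, 1.0])

toy_probs = sigmoid(toy_logits)
toy_loss = binary_crossentropy(toy_labels, toy_probs)
toy_preds = (toy_probs > 0.5).astype(int)  # same 0.5 threshold used later in the notebook

print(toy_preds)  # [1 0 0]
```

Note that a probability of exactly 0.5 falls on the negative side of a strict `> 0.5` threshold.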

Let's take a quick look at our model:

In [9]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 400, 64)           640064    
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 400, 32)           6176      
_________________________________________________________________
dropout_1 (Dropout)          (None, 400, 32)           0         
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 200, 32)           0         
_________________________________________________________________
bidirectional_1 (Bidirection (None, 32)                6272      
_________________________________________________________________
dropout_2 (Dropout)          (None, 32)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 32)                1056      
__________
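The parameter counts in the summary above can be checked by hand. The helper functions below are hypothetical names introduced for this sketch; the formulas are the standard ones for each layer type, and the layer sizes are taken from the model definition.

```python
def embedding_params(vocab_size, dim):
    # One dim-sized vector per vocabulary entry.
    return vocab_size * dim

def conv1d_params(kernel_size, in_channels, filters):
    # One kernel of shape (kernel_size, in_channels) per filter, plus one bias each.
    return kernel_size * in_channels * filters + filters

def lstm_params(in_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * ((in_dim + units) * units + units)

def dense_params(in_dim, units):
    # Weight matrix plus one bias per unit.
    return in_dim * units + units

print(embedding_params(10001, 64))   # 640064
print(conv1d_params(3, 64, 32))      # 6176
print(2 * lstm_params(32, 16))       # 6272 (two directions)
print(dense_params(32, 32))          # 1056
```

The bidirectional wrapper doubles the LSTM count because it trains one LSTM per direction, and its output width of 32 is the two 16-unit outputs concatenated.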

## Train Model
Now let's fit the model.

In [11]:
model.fit(x_train, y_train, batch_size=128,
          epochs=5, verbose=1, validation_data=(x_test, y_test))

Train on 25000 samples, validate on 25000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x4a815f8>

## Evaluate Model
Let's see how well our model did.

In [12]:
score = model.evaluate(x_test, y_test)

print("Test score:", score[0])
print("Test accuracy:", score[1])

results = model.predict(x_test)

Test score: 0.4056099704217911
Test accuracy: 0.84444


With a test accuracy of about 84%, we get pretty solid results (though it does look like this model is overfitting).

Just to check, let's decode two test reviews and compare them with what our RNN predicts:
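One way to check for overfitting is to inspect the `history` dictionary of the object returned by `model.fit`, which holds per-epoch `loss` and `val_loss` lists when validation data is supplied. The sketch below uses two hypothetical helpers and made-up numbers; the values are for illustration only and are not this notebook's results.

```python
def best_epoch(history_dict):
    """Return the 1-based epoch with the lowest validation loss."""
    val_loss = history_dict["val_loss"]
    return min(range(len(val_loss)), key=val_loss.__getitem__) + 1

def looks_overfit(history_dict):
    """True if validation loss bottomed out early while training loss kept falling."""
    loss, val_loss = history_dict["loss"], history_dict["val_loss"]
    best = best_epoch(history_dict) - 1
    return best < len(val_loss) - 1 and loss[-1] < loss[best]

# Made-up values for illustration only, NOT results from this notebook.
toy_history = {
    "loss":     [0.60, 0.40, 0.30, 0.22, 0.15],
    "val_loss": [0.50, 0.40, 0.38, 0.41, 0.45],
}

print(best_epoch(toy_history))     # 3
print(looks_overfit(toy_history))  # True
```

In the notebook this would be applied to something like `history = model.fit(...)` followed by `looks_overfit(history.history)`.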

In [26]:
# Build an index-to-word lookup. The raw word index is shifted by 3
# to match index_from=3, which reserves 0, 1, and 2 for special tokens.
word_to_id = imdb.get_word_index()
word_to_id = {key: (value+3) for key, value in word_to_id.items()}
word_to_id["<PAD>"] = 0
word_to_id["<START>"] = 1
word_to_id["<UNK>"] = 2
id_to_word = {value: key for key, value in word_to_id.items()}

print("Positive example:\n")
print(' '.join(id_to_word[word_id] for word_id in x_test[10]))
pred = "Positive" if results[10][0] > 0.5 else "Negative"
print("The model predicted: {} {}".format(pred, results[10]))

print("Negative example:\n")
print(' '.join(id_to_word[word_id] for word_id in x_test[3]))
pred = "Positive" if results[3][0] > 0.5 else "Negative"
print("The model predicted: {} {}".format(pred, results[3]))

Positive example:

<PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PAD> <PA
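The decoded review is dominated by `<PAD>` tokens because it was padded to 400 entries. A small variation skips the padding index when decoding. The `decode` helper and the miniature vocabulary below are hypothetical and stand in for the real `id_to_word` dictionary.

```python
def decode(ids, id_to_word, pad_id=0):
    """Join the words for a sequence of ids, skipping padding entries."""
    return " ".join(id_to_word.get(i, "<UNK>") for i in ids if i != pad_id)

# Hypothetical miniature vocabulary for illustration only.
toy_id_to_word = {0: "<PAD>", 1: "<START>", 2: "<UNK>", 4: "great", 5: "movie"}

# A toy padded review: three padding entries followed by the actual tokens.
toy_padded_review = [0, 0, 0, 1, 4, 5, 2]

print(decode(toy_padded_review, toy_id_to_word))  # <START> great movie <UNK>
```

With the real lookup this would be called as `decode(x_test[10], id_to_word)`.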

## Compare to Regular Neural Network

Let's see how this would compare to a regular Dense neural network.

In [25]:
# Create Model
model = Sequential()

# Create first dense layer
model.add(Dense(512, input_shape=(400,)))
model.add(Activation('relu'))

# Create second dense layer
model.add(Dense(512))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.2))

# Create third dense layer
model.add(Dense(256))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.2))

# Create final sigmoid layer
model.add(Dense(1))
model.add(Activation('sigmoid'))

# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

Now fit the model:

In [17]:
model.fit(x_train, y_train, batch_size=128,
          epochs=5, verbose=1, validation_data=(x_test, y_test))

Train on 25000 samples, validate on 25000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x10b47240>

In [18]:
score = model.evaluate(x_test, y_test)

print("Test score:", score[0])
print("Test accuracy:", score[1])

Test score: 0.9144534546279908
Test accuracy: 0.5108


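Wow, what a difference between a vanilla neural net and the recurrent neural net. At about 51% test accuracy, the regular neural net is barely better than random guessing on this balanced two-class problem. A likely reason is that it receives the raw word indices as if they were numeric magnitudes, with no embedding layer to give them meaning.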
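For a dense network, a fixed-size bag-of-words encoding is a common alternative input to padded word indices. The sketch below shows such a multi-hot encoding in NumPy. The `multi_hot` helper and the toy sequences are hypothetical, it is meant for the unpadded reviews, and no claim is made here about the accuracy it would reach on this dataset.

```python
import numpy as np

def multi_hot(seqs, vocab_size):
    """Encode each sequence as a 0/1 vector marking which word ids occur in it."""
    out = np.zeros((len(seqs), vocab_size), dtype="float32")
    for row, seq in enumerate(seqs):
        # Repeated ids simply set the same column to 1 again.
        out[row, np.asarray(seq, dtype=int)] = 1.0
    return out

# Two toy reviews over a hypothetical 5-word vocabulary.
encoded = multi_hot([[1, 3, 3], [0, 2]], vocab_size=5)
print(encoded)
# [[0. 1. 0. 1. 0.]
#  [1. 0. 1. 0. 0.]]
```

This representation discards word order, which is exactly the information the recurrent model is able to use.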