## Stacked Bi-Directional LSTMs

Reference: [Jon Krohn](https://github.com/the-deep-learners/TensorFlow-LiveLessons/blob/master/notebooks/stacked_bidirectional_lstm.ipynb)

In this notebook, we classify the sentiment of IMDB movie reviews using a stacked bidirectional LSTM.

In [None]:

import keras
from keras.datasets import imdb
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Dense, Dropout, Embedding, SpatialDropout1D, LSTM
from keras.layers import Bidirectional # note this dependency
from keras.callbacks import ModelCheckpoint
import os
from sklearn.metrics import roc_auc_score
import matplotlib.pyplot as plt
%matplotlib inline

Load & Preprocess Data

In [None]:
# vector-space embedding
n_dim = 64
n_unique_words = 10000
max_review_length = 200
# can be longer than for a single-direction LSTM: reading each review in both directions
# keeps words near either end close to one direction's output, which eases vanishing gradients
pad_type = trunc_type = 'pre'
drop_embed = 0.2


(x_train, y_train), (x_valid, y_valid) = imdb.load_data(num_words=n_unique_words)


x_train = pad_sequences(x_train, maxlen=max_review_length, padding=pad_type, truncating=trunc_type, value=0)
x_valid = pad_sequences(x_valid, maxlen=max_review_length, padding=pad_type, truncating=trunc_type, value=0)
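To make `padding='pre'` and `truncating='pre'` concrete, here is a minimal pure-Python sketch that mimics that one behaviour of `pad_sequences` on lists of integers. The helper `pad_pre` is hypothetical, not part of Keras, and ignores the other options `pad_sequences` supports.

```python
def pad_pre(sequences, maxlen, value=0):
    """Mimic pad_sequences(..., padding='pre', truncating='pre') for lists of ints."""
    padded = []
    for seq in sequences:
        # 'pre' truncation drops tokens from the start, keeping the last maxlen tokens
        kept = list(seq[len(seq) - maxlen:]) if len(seq) > maxlen else list(seq)
        # 'pre' padding puts the filler value in front of the real tokens
        padded.append([value] * (maxlen - len(kept)) + kept)
    return padded


print(pad_pre([[5, 6, 7], [1, 2, 3, 4, 5, 6]], maxlen=4))
# [[0, 5, 6, 7], [3, 4, 5, 6]]
```

With both set to `'pre'`, every padded review ends with real tokens, so the forward LSTM finishes on actual words rather than on a run of zeros.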


Set Hyperparameters

In [None]:
# output directory name
output_dir = 'weights_stackedLSTM'

# training details
epochs = 6
batch_size = 128

# LSTM layer architecture: two stacked LSTM layers instead of one
n_lstm_1 = 128
n_lstm_2 = 128
drop_lstm = 0.3  # this made a difference!
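`drop_embed`, set in the preprocessing cell, is the rate used by `SpatialDropout1D` on the embeddings. Unlike ordinary dropout, it drops whole embedding dimensions (channels) across every time step of a review. A minimal pure-Python sketch of that idea for a single review, assuming inverted-dropout scaling of the surviving values by `1 / (1 - rate)` during training; the function name is hypothetical and this is not Keras's implementation.

```python
import random


def spatial_dropout_1d(x, rate, rng):
    """Toy SpatialDropout1D for one sequence.

    x is a list of time steps, each a list of embedding values (channels).
    """
    n_channels = len(x[0])
    scale = 1.0 / (1.0 - rate)  # keeps the expected activation unchanged
    # one keep/drop decision per channel, shared by every time step
    keep = [rng.random() >= rate for _ in range(n_channels)]
    return [[v * scale if keep[c] else 0.0 for c, v in enumerate(step)] for step in x]


embedded = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # 2 time steps, 3 channels
print(spatial_dropout_1d(embedded, rate=0.2, rng=random.Random(0)))
```

Any channel that is dropped is zero at every time step, which is the property that distinguishes it from element-wise dropout.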

Build the model

In [None]:
model = Sequential()
model.add(Embedding(n_unique_words, n_dim, input_length=max_review_length))
model.add(SpatialDropout1D(drop_embed))
# return_sequences=True: the first layer outputs a vector at every time step,
# so the second LSTM still receives a sequence to read
model.add(Bidirectional(LSTM(n_lstm_1, dropout=drop_lstm, return_sequences=True)))
model.add(Bidirectional(LSTM(n_lstm_2, dropout=drop_lstm)))
model.add(Dense(1, activation='sigmoid'))
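How the two `Bidirectional` layers fit together can be sketched with a toy recurrence in plain Python. The scalar step functions below are stand-ins, not LSTM cells, and each `(forward, backward)` pair stands in for Keras's default of concatenating the two directions' outputs (`merge_mode='concat'`).

```python
def run_rnn(seq, step, h0=0.0):
    """Apply a recurrent step function; return the state after every time step."""
    states, h = [], h0
    for x in seq:
        h = step(h, x)
        states.append(h)
    return states


def bidirectional(seq, step, return_sequences=False):
    """Toy Bidirectional wrapper: (forward, backward) pairs mimic concatenation."""
    forward = run_rnn(seq, step)
    # read the sequence backwards, then flip the states so index t lines up with token t
    backward = run_rnn(seq[::-1], step)[::-1]
    if return_sequences:
        # one pair per time step: still a sequence the next layer can read
        return list(zip(forward, backward))
    # final states only: forward after the last token, backward after the first
    return (forward[-1], backward[0])


def step(h, x):
    # toy scalar recurrence standing in for an LSTM cell (an assumption, not an LSTM)
    return 0.5 * h + x


def step_pairs(h, x):
    # second layer: its input at each time step is a (forward, backward) pair
    return 0.5 * h + sum(x)


layer_1 = bidirectional([1.0, 2.0, 3.0], step, return_sequences=True)
layer_2 = bidirectional(layer_1, step_pairs)
print(layer_1)  # [(1.0, 2.75), (2.5, 3.5), (4.25, 3.0)]
print(layer_2)  # (11.1875, 8.5625)
```

Without `return_sequences=True` the first layer would hand the second only one pair per review instead of one per time step, which is why only the last LSTM layer returns just its final state.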

In [None]:
# each Bidirectional wrapper holds a forward and a backward LSTM,
# so its parameter count is double that of a single LSTM layer

model.summary()
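The doubling can be checked by hand. Assuming the default `use_bias=True`, a Keras LSTM with `units` cells and `input_dim` input features has input weights, recurrent weights and a bias for each of its four gates, i.e. `4 * (units * (input_dim + units) + units)` parameters. The sketch below totals this model's layers in plain Python so the numbers can be compared with `model.summary()`.

```python
def lstm_params(input_dim, units):
    # four gates (input, forget, cell, output), each with input weights,
    # recurrent weights and a bias
    return 4 * (units * (input_dim + units) + units)


n_unique_words, n_dim = 10000, 64
n_lstm_1 = n_lstm_2 = 128

embedding = n_unique_words * n_dim                   # one 64-d vector per word
bi_lstm_1 = 2 * lstm_params(n_dim, n_lstm_1)         # forward + backward copies
bi_lstm_2 = 2 * lstm_params(2 * n_lstm_1, n_lstm_2)  # input: concatenated 256-d output
dense = 2 * n_lstm_2 + 1                             # 256 weights + 1 bias
total = embedding + bi_lstm_1 + bi_lstm_2 + dense

print(embedding, bi_lstm_1, bi_lstm_2, dense)  # 640000 197632 394240 257
print(total)                                   # 1232129
```

The second bidirectional layer has roughly twice the parameters of the first because its input is the 256-dimensional concatenation of the first layer's two directions.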

Compile the model

In [None]:
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
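`binary_crossentropy` pairs with the single sigmoid output: for a label y in {0, 1} and a predicted probability p, the per-review loss is -(y log p + (1 - y) log(1 - p)), averaged over the batch. A minimal pure-Python sketch of that formula; the clipping constant `eps` is an assumption added to keep `log()` finite (Keras clips probabilities in a similar way), and this is not Keras's own code.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# a confident correct prediction costs little, a confident wrong one costs a lot
print(round(binary_crossentropy([1], [0.9]), 3))  # 0.105
print(round(binary_crossentropy([1], [0.1]), 3))  # 2.303
```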

In [None]:
# save the model at the end of every epoch

modelcheckpoint = ModelCheckpoint(filepath=output_dir+"/weights.{epoch:02d}.keras")
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

Train the model

* Use a GPU for RNNs, especially for a bidirectional LSTM.

* Because the model reads every review in both directions, training takes a very long time on an ordinary CPU.



In [8]:
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1, validation_data=(x_valid, y_valid), callbacks=[modelcheckpoint]);

Epoch 1/6
Epoch 2/6
Epoch 3/6
Epoch 4/6
Epoch 5/6
Epoch 6/6


Evaluate the epoch with the highest validation accuracy / lowest validation loss

In [9]:
# insert the number of the best epoch

model.load_weights(output_dir+"/weights.01.keras") # file numbers start at 01, not 00
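Instead of typing the epoch in by hand, one option is to keep the `History` object that `model.fit` returns and pick the epoch with the lowest validation loss from `history.history['val_loss']`. The helper below is a hypothetical sketch of that choice in plain Python; it relies on Keras filling `{epoch:02d}` with a 1-based epoch number, so list index `i` maps to file number `i + 1`.

```python
def best_checkpoint(val_losses, output_dir, pattern="/weights.{epoch:02d}.keras"):
    """Return the checkpoint path for the epoch with the lowest validation loss.

    val_losses[i] is the loss after epoch i + 1, e.g. history.history['val_loss'].
    """
    best_index = min(range(len(val_losses)), key=lambda i: val_losses[i])
    # Keras numbers epochs from 1 in checkpoint file names
    return (output_dir + pattern).format(epoch=best_index + 1)


# usage, assuming the training cell is changed to `history = model.fit(...)`:
# model.load_weights(best_checkpoint(history.history['val_loss'], output_dir))
```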

In [None]:
y_hat = model.predict(x_valid)



In [None]:
plt.hist(y_hat)
_ = plt.axvline(x=0.5, color='orange')
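The orange line marks the 0.5 decision threshold: reviews to its right are classified as positive. A minimal sketch of turning sigmoid outputs into labels and an accuracy at that threshold, in plain Python on flat lists (the `y_hat` array from `model.predict` has shape `(n, 1)` and would need flattening first, e.g. with `y_hat.ravel()`); the helper name is hypothetical.

```python
def accuracy_at_threshold(y_true, y_prob, threshold=0.5):
    """Fraction of reviews whose thresholded prediction matches the label."""
    y_pred = [1 if p > threshold else 0 for p in y_prob]
    correct = sum(1 for y, p in zip(y_true, y_pred) if y == p)
    return correct / len(y_true)


print(accuracy_at_threshold([0, 0, 1, 1], [0.1, 0.6, 0.7, 0.4]))  # 0.5
```

Unlike ROC AUC, this number changes if the threshold is moved.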


In [None]:
"{:0.2f}".format(roc_auc_score(y_valid, y_hat)*100.0)

Remember that most models classify the easy reviews correctly and reach a ROC AUC of about 90%; classifying the last few percent of reviews is much harder. A one-point gain from 90% to 91% is therefore a much bigger improvement than a gain from 60% to 61%!
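ROC AUC can be read as the probability that a randomly chosen positive review receives a higher score than a randomly chosen negative one, with ties counting as half. A minimal pure-Python sketch of that pairwise definition; it compares every positive with every negative, so it is only practical for small inputs, whereas `roc_auc_score` computes the same quantity efficiently.

```python
def pairwise_auc(y_true, y_score):
    """Probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


print(pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Multiplying the result by 100 gives the percentage scale used in the cell above.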