# Import Module


For this model, I use the IMDB data from the Keras datasets. The training set contains 25,000 movie reviews from IMDB, labeled by sentiment (positive or negative), and a further 25,000 reviews are provided for testing. The reviews have been preprocessed: each review is encoded as a sequence of integer word indexes. Words are indexed by overall frequency in the dataset.
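
To make the frequency-based indexing concrete, here is a minimal sketch on a toy corpus. The `reviews` list and the `word_index` name are illustrative assumptions, not part of the Keras dataset, and the real loader also reserves a few low indexes for special tokens, which this toy version ignores.

```python
from collections import Counter

# Toy corpus (hypothetical), standing in for the real reviews.
reviews = [
    "the movie was great",
    "the movie was bad",
    "great acting the best",
]

# Count every word across the whole corpus.
counts = Counter(word for review in reviews for word in review.split())

# Rank words by descending frequency; ties are broken alphabetically so the
# result is deterministic. Index 1 goes to the most frequent word.
ranked = sorted(counts, key=lambda w: (-counts[w], w))
word_index = {word: rank for rank, word in enumerate(ranked, start=1)}

# Encode each review as a list of integer word indexes.
encoded = [[word_index[w] for w in review.split()] for review in reviews]

print(word_index["the"])  # 1, the most frequent word
print(encoded[0])         # [1, 3, 4, 2]
```

Because low indexes correspond to frequent words, keeping only the top N words amounts to keeping the indexes below a threshold.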

In [None]:
from __future__ import print_function
import keras 
import numpy as np 
from keras.models import Sequential, Model
from keras.layers import Dense, Dropout, BatchNormalization 
from keras.layers import LSTM, Embedding, Input, Bidirectional
from keras.preprocessing import sequence
from keras import optimizers

import time
import os

from keras.datasets import imdb

# Specify Parameters

In [None]:
max_features = 2000
max_len = 200

batch_size = 32
epochs = 3
n_classes = 2

embedding_dim = 128
lstm_layer_dim = 64
n_val_samples = 5000
learning_rate = 0.01
decay = learning_rate / epochs


# Load Data

We need to load the IMDB dataset. We are constraining the dataset to the top 2,000 words. We also split the dataset into training, testing and validation sets.

Note that the data has already been preprocessed: every word is mapped to an integer, and each review comes in as a sequence of the integers for the words it contains.

The labels come as a vector of 1s and 0s, where 1 is a positive sentiment for the review, and 0 is negative.

In [None]:
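(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
# Hold out the last n_val_samples reviews for validation before trimming the
# training set, so the two sets do not overlap.
x_val = x_train[-n_val_samples:]
y_val = y_train[-n_val_samples:]
x_train = x_train[:-n_val_samples]
y_train = y_train[:-n_val_samples]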
print('x_train Shape: ', x_train.shape)
print('y_train Shape: ', y_train.shape)
print('x_val Shape: ', x_val.shape)
print('y_val Shape: ', y_val.shape)
print('x_test Shape: ', x_test.shape)
print('y_test Shape: ', y_test.shape)
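
A holdout split should leave the training and validation sets disjoint, which depends on the order of the slicing. The following is a minimal sketch with placeholder data (the `samples`, `train`, and `val` names are assumptions for illustration), not the notebook's actual arrays.

```python
# Stand-in for the 25,000 training samples (hypothetical data).
samples = list(range(25000))
n_val = 5000

# Take the validation slice first, then shrink the training set,
# so the two sets share no samples.
val = samples[-n_val:]
train = samples[:-n_val]

print(len(train), len(val))        # 20000 5000
print(set(train).isdisjoint(val))  # True
```

If the training set were trimmed first and the validation slice taken from the trimmed result, the validation samples would also be part of the training set.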

## Preprocess input data

The reviews stay as sequences of integer word indexes. The Embedding layer in the model turns each index into a dense vector, so no (0,1)-vector encoding of the inputs is needed here.

We do need to truncate and pad the input sequences so that they all have the same length for modeling. The reviews still differ in actual content length, and the model has to learn that the zero padding carries no information, but Keras requires equal-length vectors to perform the computation.

In [None]:
x_train = sequence.pad_sequences(x_train, maxlen= max_len)
x_test = sequence.pad_sequences(x_test, maxlen = max_len)
x_val = sequence.pad_sequences(x_val, maxlen= max_len)

print('After Padding x_train Shape: ', x_train.shape)
print('After padding x_test Shape: ', x_test.shape)
print('After padding x_val Shape: ', x_val.shape)
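
To show what padding and truncation do to individual sequences, here is a small pure-Python sketch. The `pad_pre` helper is hypothetical and only mirrors the default behaviour of `pad_sequences` as I understand it (zeros added at the front, long sequences cut from the front).

```python
def pad_pre(seqs, maxlen, value=0):
    """Left-pad or left-truncate each sequence to exactly maxlen items."""
    out = []
    for seq in seqs:
        kept = seq[-maxlen:]  # keep only the last maxlen items
        out.append([value] * (maxlen - len(kept)) + kept)
    return out

padded = pad_pre([[5, 8, 2], [7]], maxlen=2)
print(padded)  # [[8, 2], [0, 7]]
```

The long sequence loses its first item and the short one gains a leading zero, so both end up with length 2.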

And we'll also one-hot encode the output.

In [None]:
y_train = keras.utils.to_categorical(y_train, n_classes)
y_test = keras.utils.to_categorical(y_test, n_classes)
y_val = keras.utils.to_categorical(y_val, n_classes)
print('Training set labels size: ' , y_train.shape)
print('Test set labels size: ', y_test.shape)
print('Validation set labels size: ', y_val.shape)
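
For reference, one-hot encoding maps each integer label to a row with a single 1 at that label's position. The `one_hot` helper below is a hypothetical pure-Python stand-in for illustration; `to_categorical` returns a float array rather than nested lists.

```python
def one_hot(labels, n_classes):
    """Return one row per label, with a 1 at the label's position."""
    return [[1 if k == label else 0 for k in range(n_classes)]
            for label in labels]

print(one_hot([0, 1, 1], n_classes=2))  # [[1, 0], [0, 1], [0, 1]]
```

With two classes, a negative review (label 0) becomes `[1, 0]` and a positive review (label 1) becomes `[0, 1]`.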

# Build Model Architecture


The first layer is the Embedding layer, which uses 128-dimensional vectors to represent each word. A batch normalization layer then normalizes the embedded values for the next layer, a bidirectional LSTM with 64 memory units per direction. After that, a dropout layer reduces overfitting.

Finally, because this is a classification problem, we use a Dense output layer with two neurons and a softmax activation function to produce the probability of each label ('positive' or 'negative').
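
As a rough illustration of what the softmax output means, the sketch below converts two raw scores into probabilities that sum to 1. The `softmax` function and the example scores are assumptions for illustration, not values taken from the model.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for the two classes: index 0 negative, index 1 positive.
probs = softmax([1.0, 3.0])
print(probs)
print(softmax([0.0, 0.0]))  # [0.5, 0.5]
```

The class with the larger score receives the larger probability, and equal scores give an even split.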

In [None]:
# Option 1: Sequential model
model = Sequential()
model.add(Embedding(max_features, embedding_dim, input_length = max_len))
model.add(BatchNormalization())
model.add(Bidirectional(LSTM(lstm_layer_dim)))
model.add(Dropout(0.25))
model.add(Dense(2, activation = 'softmax'))

model.summary()
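
The parameter counts reported by `model.summary()` can be sanity-checked by hand. The sketch below is an estimate under stated assumptions: a standard LSTM with four gates and biases, a Bidirectional wrapper that concatenates both directions, and batch normalization with four parameters per feature. It has not been checked against an actual run.

```python
# Values from the parameter cell of this notebook.
max_features = 2000
embedding_dim = 128
lstm_layer_dim = 64
n_classes = 2

# Embedding: one vector of size embedding_dim per word index.
embedding_params = max_features * embedding_dim

# Batch normalization: gamma, beta, moving mean, moving variance per feature.
batch_norm_params = 4 * embedding_dim

# One LSTM direction: four gates, each with input weights, recurrent weights, and a bias.
lstm_one_direction = 4 * (lstm_layer_dim * (embedding_dim + lstm_layer_dim) + lstm_layer_dim)
bi_lstm_params = 2 * lstm_one_direction

# Dense: the concatenated Bi-LSTM output has 2 * lstm_layer_dim features.
dense_params = (2 * lstm_layer_dim) * n_classes + n_classes

total = embedding_params + batch_norm_params + bi_lstm_params + dense_params
print(embedding_params, batch_norm_params, bi_lstm_params, dense_params)  # 256000 512 98816 258
print(total)  # 355586
```

Most of the parameters sit in the embedding table, so `max_features` and `embedding_dim` are the main levers on model size.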

In [None]:
# Option 2: Functional API
# The input tensor is named sequence_input so that it does not shadow the
# keras.preprocessing.sequence module imported earlier.
sequence_input = Input(shape=(max_len,), dtype='int32')
embedding = Embedding(max_features, embedding_dim, input_length=max_len)(sequence_input)
batch_norm = BatchNormalization()(embedding)

bi_lstm = Bidirectional(LSTM(lstm_layer_dim))(batch_norm)
drop_out = Dropout(0.25)(bi_lstm)
preds = Dense(2, activation='softmax')(drop_out)
model = Model(sequence_input, preds)
model.summary()

# Compilation
Compile the model here. Feel free to experiment with different optimizers!

In [None]:
# Compile the model with a loss function and an optimizer.
sgd = optimizers.SGD(lr = learning_rate, decay = decay, momentum= 0.9, nesterov= True)
model.compile(loss = 'categorical_crossentropy',
              optimizer = sgd, 
              metrics =['accuracy'])
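
To get a feel for the `decay` argument, the sketch below tabulates the learning rate over training. It assumes the legacy Keras 2 SGD behaviour, where time-based decay is applied on every update step as `lr / (1 + decay * iterations)`. The `decayed_lr` helper and the 20,000-sample training size are illustrative assumptions.

```python
# Values from the parameter cell of this notebook.
learning_rate = 0.01
epochs = 3
decay = learning_rate / epochs
batch_size = 32
n_train = 20000  # assumed: 25,000 training reviews minus 5,000 held out

steps_per_epoch = n_train // batch_size  # 625 updates per epoch

def decayed_lr(step):
    """Assumed time-based decay, applied per update step rather than per epoch."""
    return learning_rate / (1.0 + decay * step)

for epoch in range(epochs + 1):
    step = epoch * steps_per_epoch
    print('after epoch', epoch, 'lr =', round(decayed_lr(step), 5))
```

Under this assumption the learning rate falls to roughly a third of its initial value after the first epoch, because the decay is counted in batches rather than epochs.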