# Import Module


In [None]:
import keras 
import numpy as np 
from keras.models import Sequential, Model
from keras.layers import Dense, Dropout, BatchNormalization 
from keras.layers import LSTM, Embedding, Input, Bidirectional
from keras.preprocessing import sequence
from keras.optimizers import SGD
import time
import os

from keras.datasets import imdb

# Define Parameters

In [None]:
max_features = 2000
max_len = 200
batch_size = 32
epochs = 2
n_classes = 2
embedding_dim = 256
lstm_layer_dim = 64
n_val_samples = 5000
learning_rate = 0.01
decay = learning_rate / epochs

# Load Data

We need to load the IMDB dataset, keeping only the 2,000 most frequent words (`max_features`). `imdb.load_data` returns the data already split into training (50%) and test (50%) sets; the cell below then drops the last `n_val_samples` (5,000) reviews from the training set.

Notice that the data has already been pre-processed: every word has been replaced by an integer index, and each review comes in as a sequence of the indices of the words it contains.

The labels come as a vector of 1's and 0's, where 1 is a positive sentiment for the review, and 0 is negative.

In [None]:
(x_train, y_train),(x_test,y_test) = imdb.load_data(num_words = max_features)
x_train = x_train[:-n_val_samples]
y_train = y_train[:-n_val_samples]

print('x_train Shape: ', x_train.shape)
print('y_train Shape: ', y_train.shape)
print('x_test Shape: ', x_test.shape)
print('y_test Shape: ', y_test.shape)

## Preprocess input data

The reviews are already integer sequences, which is what the Embedding layer expects, so we do not need to turn them into (0,1)-vectors here.

We do need to truncate and pad the input sequences so that they all have the same length (`max_len` = 200). The reviews still differ in how much real content they hold, and the model can learn that the padded zero values carry no information, but Keras requires vectors of the same length to perform the computation.

In [None]:
x_train = sequence.pad_sequences(x_train, maxlen= max_len)
x_test = sequence.pad_sequences(x_test, maxlen = max_len)

print('After Padding x_train Shape: ', x_train.shape)
print('After padding x_test Shape: ', x_test.shape)
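To make the padding step concrete, here is a minimal pure-Python sketch of what `pad_sequences` does with its default settings (`padding='pre'`, `truncating='pre'`). The helper name `pad_pre` and the toy sequences are illustrative assumptions, not part of Keras.

```python
def pad_pre(sequences, maxlen, value=0):
    """Mimic the pad_sequences defaults: pad and truncate at the front."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]                  # keep only the last maxlen tokens
        pad = [value] * (maxlen - len(seq))        # zeros needed to reach maxlen
        padded.append(pad + seq)                   # zeros go in front of the review
    return padded

# Hypothetical toy reviews: one shorter and one longer than maxlen
toy_reviews = [[5, 8], [1, 2, 3, 4, 5, 6]]
padded_reviews = pad_pre(toy_reviews, maxlen=4)
print(padded_reviews)
```

Short reviews gain leading zeros, while long reviews lose their earliest words.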

And we'll also one-hot encode the output.

In [None]:
y_train = keras.utils.to_categorical(y_train, n_classes)
y_test = keras.utils.to_categorical(y_test, n_classes)

print('Training set labels size: ' , y_train.shape)
print('Test set labels size: ', y_test.shape)

# Build Model 


The first layer is an Embedding layer that represents each word as a vector of length 256 (`embedding_dim`). We then add a batch normalization layer to normalize the embedded values before they reach the next layer, a bidirectional LSTM with 64 memory units in each direction. After that, a dropout layer helps reduce overfitting.

Finally, because this is a classification problem, we use a Dense output layer with two neurons (one per class) and a softmax activation function to produce the probability of each label ('positive' or 'negative').

In [None]:
#Option1: Sequential Model 
model = Sequential()
model.add(Embedding(max_features, embedding_dim, input_length = max_len))
model.add(BatchNormalization())
model.add(Bidirectional(LSTM(lstm_layer_dim)))
model.add(Dropout(0.25))
model.add(Dense(2, activation = 'softmax'))

model.summary()

In [None]:
#Option2: Functional API 
# Named sequence_input so it does not shadow the keras.preprocessing.sequence module
sequence_input = Input(shape = (max_len, ), dtype = 'int32')
embedding = Embedding(max_features, embedding_dim, input_length = max_len)(sequence_input)
batch_norm = BatchNormalization()(embedding)

bi_lstm = Bidirectional(LSTM(lstm_layer_dim))(batch_norm)
drop_out = Dropout(0.25)(bi_lstm)
preds = Dense(2,activation='softmax')(drop_out)

model = Model(sequence_input, preds)
model.summary()
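As a sanity check on `model.summary()`, the parameter counts can be worked out by hand. The sketch below assumes the standard Keras parameterization (an LSTM with four gates plus biases, batch normalization over the last axis, and `Bidirectional` concatenating both directions); the variable names are illustrative.

```python
max_features, embedding_dim, lstm_layer_dim, n_classes = 2000, 256, 64, 2

# One embedding vector per word in the vocabulary
embedding_params = max_features * embedding_dim

# gamma, beta, moving mean and moving variance for each embedding feature
batch_norm_params = 4 * embedding_dim

# Per direction: 4 gates, each with input weights, recurrent weights and a bias
lstm_params = 4 * lstm_layer_dim * (embedding_dim + lstm_layer_dim + 1)
bi_lstm_params = 2 * lstm_params

# The Dense layer sees the two concatenated LSTM outputs (2 * 64 features)
dense_params = (2 * lstm_layer_dim) * n_classes + n_classes

total_params = embedding_params + batch_norm_params + bi_lstm_params + dense_params
print(embedding_params, batch_norm_params, bi_lstm_params, dense_params, total_params)
```

Most of the parameters sit in the embedding matrix, which is why `max_features` and `embedding_dim` dominate the model size.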

# Compilation
Compile the model here. Feel free to experiment with different optimizers!

In [None]:
#Compile the model using a loss function and an optimizer.
sgd = SGD(lr = learning_rate, decay = decay, momentum= 0.9, nesterov= True)
model.compile(loss = 'categorical_crossentropy',
              optimizer = sgd, 
              metrics =['accuracy'])
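To get a feel for the `decay` value, the sketch below assumes the time-based rule used by the legacy Keras `SGD` optimizer, where the learning rate is divided by `1 + decay * iterations` and `iterations` counts batch updates. The function name `decayed_lr` and the figure of 500 steps per epoch (16,000 training samples with a batch size of 32) are assumptions for illustration.

```python
learning_rate = 0.01
epochs = 2
decay = learning_rate / epochs          # 0.005, as in the parameters cell

def decayed_lr(iteration):
    """Effective learning rate after a given number of batch updates."""
    return learning_rate / (1.0 + decay * iteration)

steps_per_epoch = 16000 // 32           # assumed: 16,000 samples, batch size 32
for iteration in (0, steps_per_epoch, 2 * steps_per_epoch):
    print(iteration, round(decayed_lr(iteration), 6))
```

Under this assumption the learning rate drops to roughly a sixth of its initial value by the end of the second epoch.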

# Training
Run the model here. Feel free to experiment with different batch_size, and number of epochs!

In [None]:
#Train Model 
model.fit(x_train,y_train, 
         epochs= epochs,
         batch_size = batch_size,
         verbose = 1,
         validation_split=0.2,
         shuffle=True)

In [10]:
#Optional
from keras.callbacks import ModelCheckpoint,EarlyStopping
checkpoint = ModelCheckpoint(filepath = 'imdb.model_best.hdf5',
                             verbose = 1, 
                             monitor = 'val_acc',
                             save_best_only = True)
earlystopping = EarlyStopping(monitor ='val_acc', min_delta=0)

model.fit(x_train,y_train, 
         epochs= epochs,
         batch_size = batch_size,
         verbose = 1,
         validation_split=0.2,
         shuffle=True,
         callbacks =[checkpoint, earlystopping])

Train on 16000 samples, validate on 4000 samples
Epoch 1/2

Epoch 00001: val_acc improved from -inf to 0.81225, saving model to imdb.model_best.hdf5
Epoch 2/2

Epoch 00002: val_acc improved from 0.81225 to 0.83425, saving model to imdb.model_best.hdf5


<keras.callbacks.History at 0x7fb755bd2860>

When you set **verbose** to 1, Keras prints a progress bar that updates after every batch, followed by the metrics for each epoch.

After creating the **checkpoint**, you pass it to `model.fit` through the `callbacks` argument.

When we set **validation_split** to 0.2, `fit` holds out 20% of the training data as a validation set.
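The sample counts in the training log follow directly from this ratio. Here is a small sketch of the arithmetic, assuming 20,000 training reviews remain (the 25,000 IMDB training reviews minus the 5,000 dropped earlier) and that Keras takes the validation samples from the end of the arrays before shuffling.

```python
n_samples = 20000            # assumed: 25,000 training reviews minus n_val_samples
validation_split = 0.2

# Index where the held-out validation block starts
split_at = int(n_samples * (1.0 - validation_split))
n_train = split_at
n_val = n_samples - split_at
print('Train on %d samples, validate on %d samples' % (n_train, n_val))
```

This matches the "Train on 16000 samples, validate on 4000 samples" line printed by `fit`.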

By setting the **save_best_only** parameter to True, you tell the checkpoint to keep only the model that has achieved the best validation accuracy so far.

**ModelCheckpoint**: saves the model after each epoch.
* $filepath$: string, path to save the model file.
* $monitor$: quantity to monitor.
* $verbose$: verbosity mode, 0 or 1.
* $save\_best\_only$: if save_best_only=True, the latest best model according to the quantity monitored will not be overwritten.
* $mode$: one of {auto, min, max}. If save_best_only=True, the decision to overwrite the current save file is made based on either the maximization or the minimization of the monitored quantity. For val_acc, this should be max, for val_loss this should be min, etc. In auto mode, the direction is automatically inferred from the name of the monitored quantity.
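The `save_best_only` behaviour in max mode can be sketched in a few lines. This is a simplified illustration rather than the Keras implementation; the function `epochs_that_save` is hypothetical, and the first history reuses the two `val_acc` values from the training log above.

```python
def epochs_that_save(val_acc_history):
    """Return the (1-based) epochs at which a checkpoint would be written."""
    best = float('-inf')
    saved = []
    for epoch, val_acc in enumerate(val_acc_history, start=1):
        if val_acc > best:          # max mode: higher val_acc is better
            best = val_acc
            saved.append(epoch)     # the file is overwritten with the new best
    return saved

logged_run = epochs_that_save([0.81225, 0.83425])   # values from the log above
toy_run = epochs_that_save([0.81, 0.80, 0.83])      # hypothetical history
print(logged_run, toy_run)
```

In the logged run both epochs improve, so the file is written twice; in the toy history the second epoch is skipped because it is worse than the first.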

**EarlyStopping**: it will stop training when a monitored quantity has stopped improving.
* $monitor$: quantity to be monitored.
* $min\_delta$: minimum change in the monitored quantity to qualify as an improvement, i.e. an absolute change of less than min_delta will count as no improvement.
* $patience$: number of epochs with no improvement after which training will be stopped.
* $verbose$: verbosity mode.
* $mode$: one of {auto, min, max}. In min mode, training will stop when the quantity monitored has stopped decreasing; in max mode it will stop when the quantity monitored has stopped increasing; in auto mode, the direction is automatically inferred from the name of the monitored quantity.
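To see how `min_delta` and `patience` interact, here is a simplified sketch of the max-mode stopping rule. It is meant as an illustration under stated assumptions, not as the Keras source; the function `stopping_epoch` and the accuracy histories are hypothetical.

```python
def stopping_epoch(val_acc_history, min_delta=0.0, patience=0):
    """Return the (1-based) epoch at which training stops, or None."""
    best = float('-inf')
    wait = 0
    for epoch, val_acc in enumerate(val_acc_history, start=1):
        if val_acc - min_delta > best:   # improved by more than min_delta
            best = val_acc
            wait = 0
        else:
            wait += 1                    # another epoch without improvement
            if wait >= patience:
                return epoch
    return None                          # never triggered

history = [0.80, 0.82, 0.81, 0.815, 0.83]
print(stopping_epoch(history))                               # stops at first drop
print(stopping_epoch(history, patience=2))                   # tolerates one bad epoch
print(stopping_epoch([0.80, 0.805, 0.83], min_delta=0.01))   # gain too small
print(stopping_epoch([0.81225, 0.83425]))                    # the logged run
```

With the settings used in this notebook (`min_delta=0` and the default patience), training stops at the first epoch whose validation accuracy fails to improve, which never happens in the two logged epochs.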




# Evaluation
This will give you the loss and accuracy of the model, as evaluated on the test set.

In [11]:
# Evaluate loss and accuracy on the test set
loss, acc = model.evaluate(x_test, y_test,
                           batch_size = batch_size,
                           verbose=1)
print('Test Accuracy: %.2f%%' % (acc*100))

Test Accuracy: 83.84%


# Prediction

In [12]:
# Make predictions 
predictions = model.predict(x_test[:10])

In [13]:
print('Predictions for the first 10 test samples: ')
predictions

Predictions for the first 10 test samples: 


array([[0.383759  , 0.61624104],
       [0.11922486, 0.88077515],
       [0.7489442 , 0.25105575],
       [0.59171015, 0.40828985],
       [0.00688119, 0.99311876],
       [0.22190663, 0.77809334],
       [0.4930913 , 0.5069087 ],
       [0.95532894, 0.04467105],
       [0.25497398, 0.745026  ],
       [0.17399378, 0.82600623]], dtype=float32)
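Each row holds the predicted probabilities for the two classes, so the predicted label is the index of the larger value. The sketch below applies this to the array printed above in plain Python; the `class_names` list is an assumption that follows the label convention (0 is negative, 1 is positive).

```python
# Probabilities copied from the output above
predictions = [
    [0.383759, 0.61624104],
    [0.11922486, 0.88077515],
    [0.7489442, 0.25105575],
    [0.59171015, 0.40828985],
    [0.00688119, 0.99311876],
    [0.22190663, 0.77809334],
    [0.4930913, 0.5069087],
    [0.95532894, 0.04467105],
    [0.25497398, 0.745026],
    [0.17399378, 0.82600623],
]
class_names = ['negative', 'positive']   # assumed: column 0 = negative, 1 = positive

# Index of the largest probability in each row
predicted_labels = [max(range(len(row)), key=row.__getitem__) for row in predictions]
predicted_names = [class_names[label] for label in predicted_labels]
print(predicted_labels)
print(predicted_names)
```

Note that the seventh sample is a near tie (0.493 against 0.507), so its label is much less certain than the others.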