In [12]:
import numpy
from keras.datasets import imdb
from keras.models import Sequential
from keras.layers import LSTM,Dense,Convolution1D,MaxPooling1D
from keras.layers.embeddings import Embedding
from keras.preprocessing import sequence
#fix random seed
numpy.random.seed(7)

We need to load the IMDB dataset. We are constraining the dataset to the top 5,000 words. We also split the dataset into train (50%) and test (50%) sets.

In [2]:
#load the dataset
top_words=5000
(X_train,y_train),(X_test,y_test)=imdb.load_data(nb_words=top_words)

Next, we need to truncate and pad the input sequences so that they are all the same length for modeling. The model will learn the zero values carry no information so indeed the sequences are not the same length in terms of content, but same length vectors is required to perform the computation in Keras.

In [3]:
#truncate adn pad input sequences
max_review_length=500
X_train=sequence.pad_sequences(X_train,maxlen=max_review_length)
X_test=sequence.pad_sequences(X_test,maxlen=max_review_length)


We can now define, compile and fit our LSTM model.

The first layer is the Embedded layer that uses 32 length vectors to represent each word. The next layer is the LSTM layer with 100 memory units (smart neurons). Finally, because this is a classification problem we use a Dense output layer with a single neuron and a sigmoid activation function to make 0 or 1 predictions for the two classes (good and bad) in the problem.

Because it is a binary classification problem, log loss is used as the loss function (binary_crossentropy in Keras). The efficient ADAM optimization algorithm is used. The model is fit for only 2 epochs because it quickly overfits the problem. A large batch size of 64 reviews is used to space out weight updates.

In [4]:
#create the model
embedding_vector_length=32
model=Sequential()
model.add(Embedding(top_words,embedding_vector_length,input_length=max_review_length))
model.add(LSTM(100))
model.add(Dense(1,activation='sigmoid'))
model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
model.fit(X_train,y_train,nb_epoch=2,batch_size=64,verbose=1)


Epoch 1/2
Epoch 2/2


<keras.callbacks.History at 0x7f155d5e3a50>

In [5]:
#Final evaluation of the model
scores=model.evaluate(X_test,y_test,verbose=0)
print("Accuracy: %.2f%%"% (scores[1]*100))

Accuracy: 83.70%


## LSTM with Dropout
Recurrent Neural networks like LSTM generally have the problem of overfitting.

Dropout can be applied between layers using the Dropout Keras layer. We can do this easily by adding new Dropout layers between the Embedding and LSTM layers and the LSTM and Dense output layers. We can also add dropout to the input on the Embedded layer by using the dropout parameter.

In [7]:
model=Sequential()
model.add(Embedding(top_words,embedding_vector_length,input_length=max_review_length))
model.add(LSTM(100,dropout_W=0.2,dropout_U=0.2))
model.add(Dense(1,activation='sigmoid'))
model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
model.fit(X_train,y_train,nb_epoch=1,batch_size=64)
#final evaluation
scores=model.evaluate(X_test,y_test,verbose=1)

Epoch 1/1


## LSTM with CNN
Convolutional neural networks excel at learning the spatial structure in input data.

The IMDB review data does have a one-dimensional spatial structure in the sequence of words in reviews and the CNN may be able to pick out invariant features for good and bad sentiment. This learned spatial features may then be learned as sequences by an LSTM layer.

We can easily add a one-dimensional CNN and max pooling layers after the Embedding layer which then feed the consolidated features to the LSTM. We can use a smallish set of 32 features with a small filter length of 3. The pooling layer can use the standard length of 2 to halve the feature map size

In [13]:
model = Sequential()
model.add(Embedding(top_words, embedding_vector_length, input_length=max_review_length))
model.add(Convolution1D(nb_filter=32, filter_length=3, border_mode='same', activation='relu'))
model.add(MaxPooling1D(pool_length=2))
model.add(LSTM(100))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
model.fit(X_train,y_train,nb_epoch=2,verbose=1)
model.evaluate(X_test,y_test,verbose=0)
print scores[1]*100

Epoch 1/2
Epoch 2/2
79.88
