# Sequence Classification of Movie Reviews

Sequence classification is a predictive modeling problem where you have some sequence of inputs over space or time, and the task is to predict a category for the sequence. This problem is difficult because the sequences can vary in length, comprise a very large vocabulary of input symbols, and may require the model to learn the long-term context or dependencies between symbols in the input sequence.

In this project, you will discover how you can develop LSTM recurrent neural network models for sequence classification problems in Python using the Keras deep learning library. After completing this project, you will know:

* How to develop an LSTM model for a sequence classification problem.
* How to reduce overfitting in your LSTM models through the use of dropout.
* How to combine LSTM models with Convolutional Neural Networks that excel at learning spatial relationships.

Let's get started.

## Simple LSTM for Sequence Classification

The problem that we will use to demonstrate sequence learning in this tutorial is the IMDB movie review sentiment classification problem. We can quickly develop a small LSTM for the IMDB problem and achieve good accuracy. Let's start by importing the classes and functions required for this model and initializing the random number generator to a constant value to ensure we can easily reproduce the results.

In [1]:
## Listing 26.1

We need to load the IMDB dataset. We are constraining the dataset to the top 5,000 words. We also split the dataset into train (50%) and test (50%) sets.

In [2]:
## Listing 26.2
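
As a rough illustration of what constraining the dataset to the top 5,000 words means, the sketch below caps a review that is encoded as word-frequency ranks. The function name `cap_vocabulary`, the sample review, and the placeholder index `oov_index=2` are assumptions made for this example and are not taken from the original listing. In Keras, this effect is normally obtained through the `num_words` argument of `imdb.load_data`.

```python
def cap_vocabulary(review, top_words=5000, oov_index=2):
    """Replace every word index >= top_words with one placeholder index."""
    return [idx if idx < top_words else oov_index for idx in review]


# A made-up review: two of its words fall outside the top 5,000.
review = [1, 14, 22, 7012, 43, 16000, 530]
capped = cap_vocabulary(review)
print(capped)  # [1, 14, 22, 2, 43, 2, 530]
```

Rare words collapse into a single "unknown" symbol, which keeps the input vocabulary, and therefore the embedding table, to a fixed size.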

Next, we need to truncate and pad the input sequences so that they are all the same length for modeling. The sequences are not the same length in terms of content, but same-length vectors are required to perform the computation in Keras. The model will learn that the zero values used for padding carry no information.

In [3]:
## Listing 26.3
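
The padding and truncation step can be sketched in plain Python. The helper `pad_or_truncate` is hypothetical. It pads with zeros on the left and keeps the last `maxlen` items of long sequences, which is assumed here to mirror the default behavior of Keras's `pad_sequences`. The `maxlen=5` value is chosen only to keep the output readable.

```python
def pad_or_truncate(seq, maxlen, pad_value=0):
    """Return seq as a list of exactly maxlen items (maxlen must be > 0).

    Short sequences get pad_value on the left; long sequences keep only
    their last maxlen items.
    """
    if len(seq) >= maxlen:
        return list(seq[-maxlen:])
    return [pad_value] * (maxlen - len(seq)) + list(seq)


reviews = [[11, 12, 13], [21, 22, 23, 24, 25, 26, 27]]
padded = [pad_or_truncate(r, maxlen=5) for r in reviews]
print(padded)  # [[0, 0, 11, 12, 13], [23, 24, 25, 26, 27]]
```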

We can now define, compile, and fit our LSTM model. The first layer is the `Embedding` layer, which uses vectors of length 32 to represent each word. The next layer is the `LSTM` layer with 100 memory units (smart neurons). Finally, because this is a classification problem, we use a `Dense` output layer with a single neuron and a sigmoid activation function to make 0 or 1 predictions for the two classes (good and bad). Because it is a binary classification problem, log loss is used as the loss function (`binary_crossentropy` in Keras). The efficient Adam optimization algorithm is used. The model is fit for only three epochs because it quickly overfits the problem. A large batch size of 64 reviews is used to space out weight updates.
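
To make the output layer and loss function concrete, here is a minimal sketch of a sigmoid activation followed by log loss, written with the standard library only. The function names, the `eps` clipping constant, and the sample values are assumptions for illustration. Keras computes the same quantity internally when `binary_crossentropy` is selected, although its numerical details may differ.

```python
import math


def sigmoid(z):
    """Squash a real-valued score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean log loss over a batch; eps keeps log() away from zero."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Made-up raw scores from the single output neuron and their true labels.
scores = [2.0, -1.0, 0.0]
labels = [1, 0, 1]
probs = [sigmoid(z) for z in scores]
loss = binary_crossentropy(labels, probs)
print(probs, loss)
```

Confident correct predictions contribute little to the loss, while the uncertain third prediction (probability 0.5) contributes the most.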

In [4]:
## Listing 26.4
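
It can help to check how many weights this architecture has. The arithmetic below uses the sizes stated above (5,000 words, 32-dimensional embeddings, 100 LSTM units, one output neuron) and the standard parameter formulas for each layer type. The helper names are hypothetical.

```python
def embedding_params(vocab_size, embed_dim):
    # One embed_dim-length vector per word in the vocabulary.
    return vocab_size * embed_dim


def lstm_params(input_dim, units):
    # Four gate/candidate transforms, each with input weights,
    # recurrent weights, and a bias vector.
    return 4 * ((input_dim + units) * units + units)


def dense_params(input_dim, units):
    # Weight matrix plus one bias per output unit.
    return input_dim * units + units


counts = {
    "embedding": embedding_params(5000, 32),
    "lstm": lstm_params(32, 100),
    "dense": dense_params(100, 1),
}
total = sum(counts.values())
print(counts, total)
# {'embedding': 160000, 'lstm': 53200, 'dense': 101} 213301
```

Most of the weights sit in the embedding table, which is one reason for capping the vocabulary.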

Once fit, we estimate the performance of the model on unseen reviews.

In [5]:
## Listing 26.5
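
Classification accuracy on the test set amounts to thresholding the sigmoid outputs and counting matches. The sketch below assumes a threshold of 0.5 and uses made-up probabilities. In Keras, the figure itself would normally come from `model.evaluate`.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of predictions that match the true labels."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    correct = sum(1 for y, p in zip(y_true, preds) if y == p)
    return correct / len(y_true)


# Hypothetical labels and predicted probabilities for five reviews.
y_true = [1, 0, 1, 1, 0]
y_prob = [0.91, 0.20, 0.45, 0.77, 0.63]
acc = accuracy(y_true, y_prob)
print(f"Accuracy: {acc * 100:.2f}%")  # Accuracy: 60.00%
```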

For completeness, here is the complete code listing for this LSTM network on the IMDB dataset.

In [6]:
## Listing 26.6

Running this example produces the following output. Note that if you are using the TensorFlow backend, you may see some warning messages related to PoolAllocator, which you can ignore for now.

In [7]:
## Listing 26.7

You can see that this simple LSTM with little tuning achieves near state-of-the-art results on the IMDB problem. Importantly, this is a template that you can use to apply LSTM networks to your sequence classification problems. Now, let's look at some extensions of this simple model that you may also want to bring to your problems.

## LSTM For Sequence Classification With Dropout

Recurrent neural networks like LSTMs are prone to overfitting. Dropout can be applied between layers using the Keras `Dropout` layer. We can do this easily by adding new `Dropout` layers between the `Embedding` and `LSTM` layers and between the `LSTM` and `Dense` output layers. For example:

In [9]:
## Listing 26.8
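
What a dropout layer does to the activations passing between two layers can be sketched as follows. This is a plain-Python illustration of inverted dropout, not the Keras implementation. The rate of 0.2, the seed, and the sample activations are assumptions chosen for the example.

```python
import random


def dropout(values, rate, training=True, rng=None):
    """Inverted dropout: zero each value with probability `rate` and scale
    the survivors by 1 / (1 - rate) so the expected activation is unchanged.
    At inference time (training=False) the values pass through untouched."""
    if not training or rate == 0.0:
        return list(values)
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


activations = [0.5, 1.0, -2.0, 0.25, 3.0]
dropped = dropout(activations, rate=0.2, rng=random.Random(7))
unchanged = dropout(activations, rate=0.2, training=False)
print(dropped)
print(unchanged)
```

Because a different random subset of activations is removed on each training batch, the next layer cannot rely on any single input, which is what discourages overfitting.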

The full code listing example above with the addition of Dropout layers is as follows:

In [10]:
## Listing 26.9

Running this example provides the following output.

In [11]:
## Listing 26.10

We can see dropout having the desired impact on training, with a slightly slower trend in convergence and, in this case, a lower final accuracy. The model could probably use a few more training epochs and achieve a higher skill (try it and see).

Alternatively, dropout can be applied precisely and separately to the input and recurrent connections of the memory units within the LSTM. Keras provides this capability with parameters on the `LSTM` layer: `dropout` for configuring the input dropout and `recurrent_dropout` for configuring the recurrent dropout. For example, we can modify the first example to add dropout to the input and recurrent connections as follows:

In [12]:
## Listing 26.11
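
To show where the two kinds of dropout act, the sketch below uses a toy element-wise recurrence rather than a real LSTM. One mask is applied to the inputs and a separate mask to the recurrent state. Reusing the same masks at every time step is an assumption of this sketch, as are the function names, weights, rates, and sample sequence.

```python
import math
import random


def make_mask(size, rate, rng):
    """Inverted-dropout mask: 0.0 for dropped units, 1 / (1 - rate) otherwise."""
    keep = 1.0 - rate
    return [1.0 / keep if rng.random() >= rate else 0.0 for _ in range(size)]


def run_toy_rnn(inputs, w_x, w_h, dropout=0.0, recurrent_dropout=0.0, seed=0):
    """Toy recurrence h_t = tanh(w_x * x_t + w_h * h_{t-1}), element-wise.

    `dropout` masks the input connections, and `recurrent_dropout` masks
    the connections from the previous state.
    """
    rng = random.Random(seed)
    size = len(inputs[0])
    in_mask = make_mask(size, dropout, rng)
    rec_mask = make_mask(size, recurrent_dropout, rng)
    h = [0.0] * size
    for x in inputs:
        h = [math.tanh(w_x * xi * mi + w_h * hi * mr)
             for xi, mi, hi, mr in zip(x, in_mask, h, rec_mask)]
    return h


sequence = [[0.1, 0.4, -0.3], [0.7, -0.2, 0.5], [0.0, 0.9, -0.6]]
h_plain = run_toy_rnn(sequence, w_x=0.8, w_h=0.5)
h_drop = run_toy_rnn(sequence, w_x=0.8, w_h=0.5,
                     dropout=0.2, recurrent_dropout=0.2)
print(h_plain)
print(h_drop)
```

Layer-wise dropout only touches what flows between layers, whereas these two masks act inside the recurrence itself.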

The full code listing with a more precise LSTM dropout is listed below for completeness.

In [13]:
## Listing 26.12

Running this example provides the following output.

In [14]:
## Listing 26.13

We can see that the LSTM-specific dropout has a more pronounced effect on the convergence of the network than the layer-wise dropout. As above, the number of epochs was kept constant and could be increased to see if the model's skill can be further lifted. Dropout is a powerful technique for combating overfitting in your LSTM models, and it is a good idea to try both methods, but you may get better results with the LSTM-specific dropout provided in Keras.

## LSTM and CNN For Sequence Classification

Convolutional neural networks excel at learning the spatial structure in input data. The IMDB review data has a one-dimensional spatial structure in the sequence of words in reviews, and the CNN may pick out invariant features for the good and bad sentiment. These learned spatial features may then be learned as sequences by an LSTM layer. We can easily add one-dimensional CNN and max-pooling layers after the `Embedding` layer, which then feed the consolidated features to the LSTM. We can use a small set of 32 features with a small filter length of 3. The pooling layer can use the standard length of 2 to halve the feature map size. For example, we would create the model as follows:

In [15]:
## Listing 26.14
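
The effect of the convolution and pooling layers on sequence length can be sketched in plain Python. For readability, the example convolves a single-channel signal with one hand-picked kernel, whereas the model described above learns 32 filters over 32 embedding channels. Zero padding that preserves the input length is an assumption of this sketch, and all function names are hypothetical.

```python
def conv1d_same(seq, kernel):
    """1D convolution (cross-correlation) with zero padding so the output
    has the same length as the input. Assumes an odd kernel length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(seq) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(seq))]


def max_pool1d(seq, pool_size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(seq[i:i + pool_size])
            for i in range(0, len(seq) - pool_size + 1, pool_size)]


def conv1d_params(in_channels, filters, kernel_size):
    """Weights plus one bias per filter for a 1D convolutional layer."""
    return kernel_size * in_channels * filters + filters


signal = [1.0, 3.0, 2.0, 5.0, 4.0, 0.0, 6.0, 1.0]
features = conv1d_same(signal, kernel=[1.0, 0.0, -1.0])
pooled = max_pool1d(features, pool_size=2)
print(len(signal), len(features), len(pooled))  # 8 8 4
print(conv1d_params(in_channels=32, filters=32, kernel_size=3))  # 3104
```

With a pool length of 2, the LSTM that follows sees half as many time steps as the embedding layer produced.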

The full code listing with CNN and LSTM layers is listed below for completeness.

In [16]:
## Listing 26.15

Running this example provides the following output.

In [17]:
## Listing 26.16

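We can see that we achieve similar results to the first example with faster training time. The convolutional layer itself adds a small number of weights (3 × 32 × 32 + 32 = 3,104 for the configuration described above), so the speed-up comes from the pooling layer halving the number of time steps the LSTM must process rather than from a smaller model. We would expect that even better results could be achieved if this example were further extended to use dropout.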

## Summary

In this project, you discovered how to develop LSTM network models for sequence classification predictive modeling problems. Specifically, you learned:

* How to develop a simple single-layer LSTM model for the IMDB movie review sentiment classification problem.
* How to extend your LSTM model with layer-wise and LSTM-specific dropout to reduce overfitting.
* How to combine the spatial structure learning properties of a Convolutional Neural Network with the sequence learning of an LSTM.