# 1. Intro
Two sessions ago we saw how to use a feedforward neural network to classify a sentence using a bag-of-words representation. That representation discards all word-order information and only records whether or not each keyword occurs in a document. In this session we will classify sentences using RNNs, which represent a sentence as an ordered sequence of words rather than as a bag of words.
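To see concretely what a bag-of-words representation throws away, here is a small sketch using two made-up reviews (hypothetical examples, not taken from the IMDB data). They contain exactly the same words in a different order, so they get identical keyword-presence vectors despite having opposite sentiment.

```python
# Two hypothetical reviews with the same words but opposite meanings
review_a = "the acting was good not bad"
review_b = "the acting was bad not good"

# A bag-of-words representation: record only which vocabulary words occur
vocabulary = sorted(set(review_a.split()) | set(review_b.split()))

def bag_of_words(sentence):
    words = set(sentence.split())
    return [1 if word in words else 0 for word in vocabulary]

print(bag_of_words(review_a) == bag_of_words(review_b))  # True: word order is lost

# An ordered sequence of words keeps the difference
print(review_a.split() == review_b.split())  # False
```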

# 2. Data
As we did in lab 4, download the IMDB movie review dataset to create a sentiment analysis dataset.

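```python
from keras.datasets import imdb

((train_data, train_labels), (test_data, test_labels)) = imdb.load_data(num_words=10000, start_char=1, oov_char=2, index_from=2)

# Index 0 is reserved for padding, 1 marks the start of a review and 2 replaces out-of-vocabulary words.
word_index = imdb.get_word_index()  # maps each word to its frequency rank (1 = most frequent)
# With index_from=2, a word of rank r is stored as r + 2, which is also its position in this list
vocab = ['PAD', 'START', 'UNKNOWN'] + sorted(word_index.keys(), key=word_index.get)

index2token = { index: token for (index, token) in enumerate(vocab) }
token2index = { token: index for (index, token) in enumerate(vocab) }
```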
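The offsets passed to `load_data` and the layout of `vocab` have to agree. The sketch below uses a tiny hypothetical word index (`toy_word_index`, standing in for `imdb.get_word_index()`) and encodes a sentence the way we understand `load_data` to do it: the start marker first, then each word's rank shifted by `index_from`. Decoding with the same `vocab` construction recovers the words.

```python
# A tiny hypothetical stand-in for imdb.get_word_index(): word -> frequency rank (1 = most frequent)
toy_word_index = {'the': 1, 'and': 2, 'movie': 3, 'great': 4}

# Same construction as above: three special tokens, then the words sorted by rank
toy_vocab = ['PAD', 'START', 'UNKNOWN'] + sorted(toy_word_index.keys(), key=toy_word_index.get)
toy_index2token = { index: token for (index, token) in enumerate(toy_vocab) }

index_from = 2  # same value as passed to imdb.load_data above
start_char = 1  # same value as passed to imdb.load_data above

# Encode: start marker, then each word's rank shifted by index_from
encoded = [start_char] + [toy_word_index[w] + index_from for w in "the movie great".split()]
print(encoded)  # [1, 3, 5, 6]
print(' '.join(toy_index2token[i] for i in encoded))  # START the movie great
```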

Here is an example of what a sentence looks like:

```python
print(train_data[0])
print()
print(' '.join(index2token[i] for i in train_data[0]))
```

At the moment there is a problem: the sentences all have different lengths. This makes it impossible to store the datasets as matrices (regular rectangular arrays of numbers), and Keras expects all of its inputs to be regularly shaped.
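A quick way to see the problem is to ask NumPy for a rectangular integer matrix built from sentences of different lengths. The three index lists below are made up for illustration; NumPy refuses to build the matrix because there is no single row length to use.

```python
import numpy as np

# Three hypothetical sentences (lists of word indices) of different lengths
sentences = [[1, 14, 22, 16], [1, 194, 1153], [1, 14, 47, 8, 30]]

try:
    matrix = np.array(sentences, dtype=np.int32)
except ValueError as error:
    matrix = None
    print('Cannot build a matrix:', error)

print(matrix)  # None
```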

```python
print(len(train_data[0]))
print(len(train_data[1]))
print(len(train_data[2]))
```

To fix this, we make all sentences as long as the longest sentence by adding a special word to the end of each shorter sentence, which we then tell the neural network to ignore. This word is called a pad token. The pad token is represented by index 0, so we can start with a matrix full of zeros and copy each sentence's word indices into the beginning of its row, leaving every other position as zero padding.

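```python
import numpy as np

max_len_train = max(len(sent) for sent in train_data.tolist())
padded_train_data = np.zeros([len(train_data), max_len_train], np.int32)
for (i, sent) in enumerate(train_data.tolist()):
    padded_train_data[i, :len(sent)] = sent  # numpy magic! (set the first len(sent) elements in row i to the contents of sent)

# Do the same for the test set. It gets its own maximum length, since a test review might be
# longer than every training review (the model below accepts inputs of any length).
max_len_test = max(len(sent) for sent in test_data.tolist())
padded_test_data = np.zeros([len(test_data), max_len_test], np.int32)
for (i, sent) in enumerate(test_data.tolist()):
    padded_test_data[i, :len(sent)] = sent
```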

This is what our dataset looks like now. Note how much memory is being 'wasted' on pad tokens.

```python
print(max_len_train)
print(padded_train_data[0].tolist())
```
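To put a number on the waste, the sketch below pads a few toy sentences the same way as above and measures the fraction of matrix entries that are pad tokens. The sentences are made up; on the real data the same measure is `(padded_train_data == 0).mean()`, which works because index 0 is used only for padding.

```python
import numpy as np

# Hypothetical sentences: one long review and three short ones
sentences = [[1, 5, 6, 7, 8, 9, 10, 11], [1, 5], [1, 6, 7], [1, 9, 4]]

max_len = max(len(sent) for sent in sentences)
padded = np.zeros([len(sentences), max_len], np.int32)
for (i, sent) in enumerate(sentences):
    padded[i, :len(sent)] = sent

pad_fraction = (padded == 0).mean()  # share of entries that hold the pad token
print(padded)
print(pad_fraction)  # 0.5: half of the matrix is padding
```

A single long review is enough to force every row to its length, which is why the padded IMDB matrix is mostly zeros.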

# 3. Sequence Classification via RNNs
We will now build a Keras neural network that learns to classify these sentences. Note that RNNs are significantly slower than feedforward neural networks on long sequences, because they must process the words one at a time, with each step waiting for the result of the previous one.
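The slowdown comes from the recurrence itself. The following NumPy sketch (random weights, not the trained Keras layer; `simple_rnn` is our own hypothetical helper) applies the update a simple RNN performs at each word, `h_t = tanh(x_t W + h_(t-1) U + b)`, which is the form Keras's `SimpleRNN` uses with its default `tanh` activation. The batch dimension is omitted for clarity. Because each step needs the previous state, the loop cannot be run in parallel across words.

```python
import numpy as np

def simple_rnn(inputs, W, U, b):
    """Run a simple RNN over inputs of shape (timesteps, features); return the final state."""
    state = np.zeros(U.shape[0])
    for x_t in inputs:  # one word at a time: step t must wait for step t-1
        state = np.tanh(x_t @ W + state @ U + b)
    return state

rng = np.random.default_rng(0)
embedding_size, units, timesteps = 8, 8, 5  # 8 matches the sizes used in the model below
W = rng.normal(size=(embedding_size, units))  # input-to-state weights
U = rng.normal(size=(units, units))           # state-to-state (recurrent) weights
b = np.zeros(units)

sentence = rng.normal(size=(timesteps, embedding_size))  # stand-in for 5 embedded words
final_state = simple_rnn(sentence, W, U, b)
print(final_state.shape)  # (8,)
```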

```python
from keras import models
from keras import layers

rnn_model = models.Sequential()
# Leave input_length as None to tell Keras that sentences can be of any length,
# and set mask_zero=True so that every 0 (pad token) input is ignored by the layers that follow
rnn_model.add(layers.Embedding(len(vocab), 8, input_length=None, mask_zero=True))
rnn_model.add(layers.SimpleRNN(8))  # returns only the final state (8 numbers) of each sentence
rnn_model.add(layers.Dense(1, activation='sigmoid'))  # probability that the review is positive (1) rather than negative (0)

rnn_model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['acc']
)

history = rnn_model.fit(
    padded_train_data, train_labels,
    epochs=2,
    batch_size=128,
    validation_data=(padded_test_data, test_labels)
)
```
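What does `mask_zero=True` buy us? As we understand Keras's masking, when an RNN reaches a masked timestep it does not update its state but carries the previous one forward, so trailing pad tokens leave the final state unchanged. The NumPy sketch below mimics that behaviour with random weights (an illustration, not Keras's own code; `masked_simple_rnn` is a hypothetical helper) and checks that a sentence gives the same final state with and without padding.

```python
import numpy as np

def masked_simple_rnn(inputs, mask, W, U, b):
    """Simple RNN that keeps its previous state at masked (False) timesteps."""
    state = np.zeros(U.shape[0])
    for x_t, keep in zip(inputs, mask):
        new_state = np.tanh(x_t @ W + state @ U + b)
        state = new_state if keep else state  # pad token: carry the old state forward
    return state

rng = np.random.default_rng(0)
vocab_size, embedding_size, units = 20, 8, 8
embeddings = rng.normal(size=(vocab_size, embedding_size))  # row 0 is the pad token's vector
W = rng.normal(scale=0.5, size=(embedding_size, units))
U = rng.normal(scale=0.5, size=(units, units))
b = np.zeros(units)

sentence = np.array([1, 7, 3, 12])
padded = np.array([1, 7, 3, 12, 0, 0, 0])  # the same sentence followed by pad tokens

state_plain = masked_simple_rnn(embeddings[sentence], sentence != 0, W, U, b)
state_masked = masked_simple_rnn(embeddings[padded], padded != 0, W, U, b)
state_unmasked = masked_simple_rnn(embeddings[padded], np.ones(len(padded), bool), W, U, b)

print(np.allclose(state_plain, state_masked))    # True: masked pads are ignored
print(np.allclose(state_plain, state_unmasked))  # False: without a mask the pads change the state
```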

## 3.1. Multi-Layer RNNs
If you have time to spare, you can stack two RNN layers. The first RNN scans over the inputs, whilst the second RNN scans over the first RNN's intermediate states. This allows the second RNN to work with a processed version of the input rather than with the embedded words directly. To get access to the intermediate states of an RNN, pass the `return_sequences=True` parameter to the first RNN layer.
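To see why the first layer needs `return_sequences=True`, here is a NumPy sketch (random weights; `simple_rnn` is our own hypothetical helper, not a Keras function) in which an RNN returns either only its final state or its state after every word. Only the full sequence of states gives the second RNN something to scan over. Keras adds a leading batch dimension, which is omitted here.

```python
import numpy as np

def simple_rnn(inputs, W, U, b, return_sequences=False):
    """Simple RNN over (timesteps, features); returns all states or only the last one."""
    state = np.zeros(U.shape[0])
    states = []
    for x_t in inputs:
        state = np.tanh(x_t @ W + state @ U + b)
        states.append(state)
    return np.stack(states) if return_sequences else state

rng = np.random.default_rng(0)
timesteps, embedding_size, units = 6, 8, 8
sentence = rng.normal(size=(timesteps, embedding_size))  # stand-in for 6 embedded words

# Layer 1 returns one state per word: shape (6, 8)
W1, U1, b1 = rng.normal(size=(embedding_size, units)), rng.normal(size=(units, units)), np.zeros(units)
first_layer_states = simple_rnn(sentence, W1, U1, b1, return_sequences=True)

# Layer 2 scans over those states and returns only its final state: shape (8,)
W2, U2, b2 = rng.normal(size=(units, units)), rng.normal(size=(units, units)), np.zeros(units)
final_state = simple_rnn(first_layer_states, W2, U2, b2)

print(first_layer_states.shape)  # (6, 8)
print(final_state.shape)         # (8,)
```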

```python
# Make a sequence classifier with two layers of RNNs.
```

## 3.2. Bi-Directional RNNs
RNNs, especially simple RNNs, have trouble remembering long sequences. Look at the diagram illustrating what an unrolled RNN looks like. The path from the first word in the sentence to the output is longer than the path from the last word to the output, so the first word has to go through more non-linear transformations than the last word before it influences the output. As a consequence, the first words tend to be forgotten. One trick that used to be applied was to scan the sentence in reverse order so that the first words are remembered better than the last ones. You can try it by passing the `go_backwards=True` parameter to `SimpleRNN`.
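In effect, `go_backwards=True` makes the RNN read the words from last to first, so the first word of the review is the last one to update the state. The NumPy sketch below (random weights; `simple_rnn` is our own hypothetical helper, not part of Keras) checks that reading backwards equals reading the reversed sentence forwards, and counts how many further recurrent steps each word's information must survive before reaching the output in each direction.

```python
import numpy as np

def simple_rnn(inputs, W, U, b, go_backwards=False):
    """Simple RNN over (timesteps, features), optionally reading the sequence in reverse."""
    if go_backwards:
        inputs = inputs[::-1]  # last word first
    state = np.zeros(U.shape[0])
    for x_t in inputs:
        state = np.tanh(x_t @ W + state @ U + b)
    return state

rng = np.random.default_rng(0)
timesteps, embedding_size, units = 5, 8, 8
sentence = rng.normal(size=(timesteps, embedding_size))
W, U, b = rng.normal(size=(embedding_size, units)), rng.normal(size=(units, units)), np.zeros(units)

backward_state = simple_rnn(sentence, W, U, b, go_backwards=True)
print(np.allclose(backward_state, simple_rnn(sentence[::-1], W, U, b)))  # True

# Further recurrent steps each word (by position t) passes through before the output
steps_after_forward = [timesteps - 1 - t for t in range(timesteps)]
steps_after_backward = [t for t in range(timesteps)]
print(steps_after_forward)   # [4, 3, 2, 1, 0]: the first word is furthest from the output
print(steps_after_backward)  # [0, 1, 2, 3, 4]: now the first word is closest
```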

```python
# Make a sequence classifier with an RNN that goes backwards.
```

Of course, there is no need to choose between scanning the sentence in one direction or the other: you can go both ways at once. This uses two RNNs, one going forward and one going backward. The final states produced by the two RNNs are then concatenated into one vector that represents the sequence. In Keras you make a bidirectional RNN by wrapping the RNN layer in `layers.Bidirectional`.
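A minimal NumPy sketch of the idea, with random weights and our own hypothetical `simple_rnn` helper: two RNNs with separate weights read the sentence in opposite directions, and their final states are concatenated, so 8 units per direction give a 16-number sentence vector. This matches `layers.Bidirectional`'s default `merge_mode='concat'`.

```python
import numpy as np

def simple_rnn(inputs, W, U, b):
    """Simple RNN over (timesteps, features); returns the final state."""
    state = np.zeros(U.shape[0])
    for x_t in inputs:
        state = np.tanh(x_t @ W + state @ U + b)
    return state

rng = np.random.default_rng(0)
timesteps, embedding_size, units = 5, 8, 8
sentence = rng.normal(size=(timesteps, embedding_size))  # stand-in for 5 embedded words

def random_weights():
    return rng.normal(size=(embedding_size, units)), rng.normal(size=(units, units)), np.zeros(units)

forward_weights = random_weights()   # the forward and backward RNNs
backward_weights = random_weights()  # do not share parameters

forward_state = simple_rnn(sentence, *forward_weights)
backward_state = simple_rnn(sentence[::-1], *backward_weights)
sentence_vector = np.concatenate([forward_state, backward_state])
print(sentence_vector.shape)  # (16,)
```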

```python
# Make a sequence classifier with a bi-directional RNN.
```

# 4. Further exercises

Now try to improve the accuracy of your classifiers.