# Chapter 6: Recurrent Neural Network

This notebook is a replica of Chapter 6: Recurrent Neural Network from the book *Deep Learning with Keras* by Antonio Gulli and Sujit Pal.

LSTMs have been used extensively by the **natural language processing (NLP)** community for various applications. One such application is building language models. A language model allows us to predict the probability of a word in a text given the previous words. Language models are important for various higher-level tasks such as machine translation and spelling correction.

A side effect of the ability to predict the next word given previous words is a generative model that allows us to generate text by sampling from the output probabilities. In language modeling, our input is typically a sequence of words and the output is a sequence of predicted words. The training data used is existing unlabeled text, where we set the label $y_t$ at time $t$ to be the input $x_{t+1}$ at time $t+1$.
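
To make the labeling scheme concrete, here is a minimal sketch that is not part of the book's code. It swaps the LSTM for a much simpler count-based bigram model, purely to show how the label $y_t$ is just the next input $x_{t+1}$ and how next-character probabilities can be estimated from unlabeled text. The function name `bigram_probs` and the toy string are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def bigram_probs(text):
    """Estimate P(next_char | current_char) from raw counts."""
    counts = defaultdict(Counter)
    # The label for position t is simply the character at position t + 1.
    for x_t, y_t in zip(text, text[1:]):
        counts[x_t][y_t] += 1
    return {
        ch: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for ch, followers in counts.items()
    }

toy_text = "abab ac"  # hypothetical toy corpus, not the Alice text
probs = bigram_probs(toy_text)
print(probs["a"])  # distribution over the characters that follow "a"
```

An LSTM plays the same role as the count table here, except that it conditions on a longer history and generalizes to contexts it has not seen verbatim.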

In this example of using Keras for building LSTMs, we will train a character-based language model on the text of *Alice in Wonderland* to predict the next character given the 10 previous characters. We have chosen to build a character-based model here because it has a smaller vocabulary and trains faster. The idea is the same as for a word-based language model, except that we use characters instead of words. We will then use the trained model to generate some text in the same style.

In [1]:
import numpy as np
import pandas as pd

import collections

import re, nltk

from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout, Activation

# Visualization
import seaborn as sns

# this allows plots to appear directly in the notebook
import matplotlib.pyplot as plt

Using TensorFlow backend.


In [2]:
import tensorflow as tf
tf.test.gpu_device_name()

'/device:GPU:0'

In [3]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

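We read our input text from the text of Alice in Wonderland on the Project Gutenberg website (http://www.gutenberg.org/files/11/11-0.txt). The file contains line breaks and non-ASCII characters, so we do some preliminary cleanup (stripping surrounding whitespace, lowercasing, and dropping empty lines) and store the result in a variable called `text`. Non-ASCII characters such as curly quotes are left in place and become part of the vocabulary: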

In [4]:
INPUT_FILE = "data/alice_in_wonderland.txt"

In [5]:
# extract the input as a stream of characters
lines = []
# the Gutenberg file is UTF-8; state the encoding explicitly so the read is portable
with open(INPUT_FILE, encoding="utf-8") as fin:
    for line in fin:
        line = line.strip().lower()
        if len(line) == 0:
            continue
        lines.append(line)
text = " ".join(lines)

Since we are building a character-level LSTM, our vocabulary is the set of characters that occur in the text. There are 60 of them in our case. Since we will be dealing with the indexes to these characters rather than the characters themselves, the following code snippet creates the necessary lookup tables:

In [6]:
# chars is the sorted list of unique characters (our character "vocabulary");
# nb_chars is its size, i.e. the number of features per time step
chars = sorted(set(text))
nb_chars = len(chars)

In [7]:
print(chars)

[' ', '!', '#', '$', '%', '(', ')', '*', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '?', '@', '[', ']', '_', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '‘', '’', '“', '”', '\ufeff']


In [8]:
print('Total vocab:', nb_chars)

Total vocab: 60


In [9]:
# creating lookup tables i.e. mapping of unique chars to integers
char2index = dict((c, i) for i, c in enumerate(chars))
index2char = dict((i, c) for i, c in enumerate(chars))
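
As a quick sanity check on lookup tables of this kind, the sketch below builds the two mappings for a short string and confirms that encoding followed by decoding returns the original text. The string `sample_text` is a hypothetical stand-in for the notebook's `text` variable.

```python
sample_text = "hello world"  # hypothetical stand-in for `text`

# Same construction as in the notebook, on a tiny input
chars = sorted(set(sample_text))
char2index = {c: i for i, c in enumerate(chars)}
index2char = {i: c for i, c in enumerate(chars)}

# Encode every character to its integer index, then map the indices back
encoded = [char2index[c] for c in sample_text]
decoded = "".join(index2char[i] for i in encoded)

print(chars)
print(encoded)
print(decoded)
```

Because the characters are sorted before enumeration, the mapping is deterministic for a given text, which matters if a trained model is saved and reloaded later.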

The next step is to create the input and label texts. We do this by stepping through the text by a number of characters given by the `STEP` variable (`1` in our case) and then extracting a span of text whose size is determined by the `SEQLEN` variable (`10` in our case). The next character after the span is our label character:

In [10]:
# create inputs and labels from the text. We do this by stepping
# through the text ${step} character at a time, and extracting a 
# sequence of size ${seqlen} and the next output char. For example,
# assuming an input text "The sky was falling", we would get the 
# following sequence of input_chars and label_chars (first 5 only)
#   The sky wa -> s
#   he sky was ->  
#   e sky was  -> f
#    sky was f -> a
#   sky was fa -> l
SEQLEN = 10
STEP = 1

In [11]:
input_chars = []
label_chars = []
for i in range(0, len(text) - SEQLEN, STEP):
    input_chars.append(text[i:i + SEQLEN])
    label_chars.append(text[i + SEQLEN])
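
The effect of `SEQLEN` and `STEP` is easiest to see on a short string. The following sketch wraps the same loop in a helper (the name `make_windows` is an assumption, not something the notebook defines) and applies it to the example sentence from the comment above.

```python
def make_windows(text, seqlen, step):
    """Slide a window of length seqlen over text, moving step chars at a time."""
    inputs, labels = [], []
    for i in range(0, len(text) - seqlen, step):
        inputs.append(text[i:i + seqlen])   # the input span
        labels.append(text[i + seqlen])     # the character right after the span
    return inputs, labels

toy = "the sky was falling"
inputs, labels = make_windows(toy, seqlen=10, step=1)
for span, label in list(zip(inputs, labels))[:3]:
    print(repr(span), "->", repr(label))

# A larger step yields fewer, less overlapping training examples
inputs3, labels3 = make_windows(toy, seqlen=10, step=3)
print(len(inputs), len(inputs3))
```

With `step=1` almost every character serves as a label once, so the number of training examples is roughly the length of the text.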

The next step is to vectorize these input and label texts. Each row of the input to the LSTM corresponds to one of the input texts shown previously. There are `SEQLEN` characters in this input, and since our vocabulary size is given by `nb_chars`, we represent each input character as a one-hot encoded vector of size `nb_chars`. Thus each input row is a tensor of shape (`SEQLEN`, `nb_chars`). Our output label is a single character, so, just like each character of the input, it is represented as a one-hot vector of size `nb_chars`. Thus, the shape of each label is (`nb_chars`,):

In [12]:
# X shape to be [samples, time steps, features]
# y shape to be [samples, features]
# (the np.bool alias was removed from recent NumPy releases; the builtin bool is equivalent)
X = np.zeros((len(input_chars), SEQLEN, nb_chars), dtype=bool)
y = np.zeros((len(input_chars), nb_chars), dtype=bool)
for i, input_char in enumerate(input_chars):
    for j, ch in enumerate(input_char):
        X[i, j, char2index[ch]] = 1
    y[i, char2index[label_chars[i]]] = 1
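
The one-hot layout can be checked by hand on a tiny vocabulary. The sketch below uses plain Python lists rather than NumPy arrays so that it runs without any dependencies. The helper `one_hot` and the three-character vocabulary are assumptions for illustration only.

```python
def one_hot(index, size):
    """Return a list of zeros with a single 1 at position index."""
    vec = [0] * size
    vec[index] = 1
    return vec

chars = sorted(set("abca"))  # hypothetical vocabulary: ['a', 'b', 'c']
char2index = {c: i for i, c in enumerate(chars)}

seq = "cab"
# One row per time step, one column per vocabulary entry: shape (3, 3) here
row = [one_hot(char2index[ch], len(chars)) for ch in seq]
for ch, vec in zip(seq, row):
    print(ch, vec)

# Decoding picks the position of the 1 in each vector
decoded = "".join(chars[vec.index(1)] for vec in row)
print(decoded)
```

In the notebook the same structure gains a leading sample axis, which gives `X` its three dimensions.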

Finally, we are ready to build our model. We give each LSTM layer an output dimension of 128. This is a hyperparameter that needs to be determined by experimentation. In general, if we choose too small a size, the model does not have sufficient capacity for generating good text, and you will see long runs of repeating characters or repeating word groups. On the other hand, if the value chosen is too large, the model has too many parameters and needs a lot more data to train effectively. The model in this notebook stacks two LSTM layers, each followed by a dropout layer with rate 0.2. The first LSTM sets `return_sequences=True` so that the second LSTM receives an output at every time step. We want to return a single character as output, not a sequence of characters, so the second LSTM sets `return_sequences=False`. We have already seen that the input to the LSTM is of shape (`SEQLEN`, `nb_chars`). In addition, we set `unroll=True`, which can speed up an RNN on short sequences such as ours, at the cost of higher memory use.

The LSTM output is fed to a dense (fully connected) layer. The dense layer has `nb_chars` units and emits a score for each character in the vocabulary. The activation on the dense layer is a `softmax`, which normalizes the scores to probabilities. The character with the highest probability is chosen as the prediction. We compile the model with the categorical cross-entropy loss function, which is well suited to categorical outputs, and the RMSprop optimizer:

In [13]:
# Build the model. We use two stacked LSTM layers followed by a fully
# connected layer to compute the most likely predicted output char
HIDDEN_SIZE = 128
BATCH_SIZE = 32
NUM_ITERATIONS = 25
NUM_EPOCHS_PER_ITERATION = 1
NUM_PREDS_PER_EPOCH = 100

In [14]:
model = Sequential()
model.add(LSTM(units=HIDDEN_SIZE, return_sequences=True, input_shape=(SEQLEN, nb_chars), unroll=True))
model.add(Dropout(0.2))
model.add(LSTM(units=HIDDEN_SIZE, return_sequences=False, unroll=True))
model.add(Dropout(0.2))
model.add(Dense(nb_chars))
model.add(Activation("softmax"))
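
To get a feel for how the hidden size drives model capacity, the parameter count of this architecture can be worked out by hand. The sketch below assumes the standard LSTM formulation with four gates and a single bias vector per gate, which should agree with what `model.summary()` reports for these layers. The helper names are hypothetical, and the dropout and activation layers are left out because they have no trainable parameters.

```python
def lstm_params(input_dim, units):
    # Four gates, each with an input kernel, a recurrent kernel, and a bias
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # One weight per input/output pair plus one bias per output unit
    return input_dim * units + units

HIDDEN_SIZE = 128
nb_chars = 60  # vocabulary size reported earlier in the notebook

first_lstm = lstm_params(nb_chars, HIDDEN_SIZE)      # sees one-hot characters
second_lstm = lstm_params(HIDDEN_SIZE, HIDDEN_SIZE)  # sees the first LSTM's outputs
dense = dense_params(HIDDEN_SIZE, nb_chars)
total = first_lstm + second_lstm + dense

print(first_lstm, second_lstm, dense, total)
```

Since the recurrent kernel grows with the square of the hidden size, doubling `HIDDEN_SIZE` roughly quadruples the size of the second LSTM layer.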

In [15]:
model.compile(loss="categorical_crossentropy", optimizer="rmsprop")
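
For readers who want to see what the softmax activation and the categorical cross-entropy loss compute, here is a small dependency-free sketch. It is an illustration of the formulas, not Keras's actual implementation, and the scores and the clipping constant `eps` are assumed values.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Negative log-probability assigned to the true (one-hot) class."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

scores = [2.0, 1.0, 0.1]      # hypothetical dense-layer outputs for 3 characters
probs = softmax(scores)
loss = categorical_crossentropy([1, 0, 0], probs)

print(probs)
print(loss)
```

With a one-hot label the loss reduces to the negative log of the probability given to the correct character, so it approaches zero as that probability approaches 1.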

Our training approach is a little different from what we have seen so far. So far our approach has been to train a model for a fixed number of epochs, then evaluate it against a portion of held-out test data. Since we don't have any labeled data here, we train the model for one epoch (`NUM_EPOCHS_PER_ITERATION=1`) and then test it. We repeat this for 25 iterations (`NUM_ITERATIONS=25`); in practice you can stop as soon as the output becomes intelligible. So effectively, we are training for `NUM_ITERATIONS` epochs and testing the model after each epoch.

Our test consists of generating a character from the model given a random input, then dropping the first character from the input and appending the predicted character from our previous run, and generating another character from the model. We continue this 100 times (`NUM_PREDS_PER_EPOCH=100`) and generate and print the resulting string. The string gives us an indication of the quality of the model:

In [16]:
# We train the model one epoch at a time and test the generated output after each epoch
for iteration in range(NUM_ITERATIONS):
    print("=" * 50)
    print("Iteration #: %d" % (iteration))

    model.fit(X, y, batch_size=BATCH_SIZE, epochs=NUM_EPOCHS_PER_ITERATION)

    # testing model
    # randomly choose a row from input_chars, then use it to
    # generate text from model for next 100 chars
    test_idx = np.random.randint(len(input_chars))
    test_chars = input_chars[test_idx]
    print("Generating from seed: %s" % (test_chars))
    print(test_chars, end="")
    for _ in range(NUM_PREDS_PER_EPOCH):
        # one-hot encode the current SEQLEN-character window
        X_test = np.zeros((1, SEQLEN, nb_chars))
        for j, ch in enumerate(test_chars):
            X_test[0, j, char2index[ch]] = 1
        pred = model.predict(X_test, verbose=0)[0]
        ypred = index2char[np.argmax(pred)]
        print(ypred, end="")
        # move forward with test_chars + ypred
        test_chars = test_chars[1:] + ypred
    # end the generated line before the next iteration starts
    print()

Iteration #: 0
Epoch 1/1
Generating from seed: muttering 
Iteration #: 1
Epoch 1/1
Generating from seed: hat i’m go
Iteration #: 2
Epoch 1/1
Generating from seed: ce looked 
Iteration #: 3
Epoch 1/1
Generating from seed:  ‘you are,
Iteration #: 4
Epoch 1/1
Generating from seed: ep instant
Iteration #: 5
Epoch 1/1
Generating from seed: gain!’ sai
Iteration #: 6
Epoch 1/1
Generating from seed: e this las
Iteration #: 7
Epoch 1/1
Generating from seed: ative work
Iteration #: 8
Epoch 1/1
Generating from seed: ’ so alice
Iteration #: 9
Epoch 1/1
Generating from seed:  you didn’
Iteration #: 10
Epoch 1/1
Generating from seed: ttle. ‘’ti
Iteration #: 11
Epoch 1/1
Generating from seed: compliance
Iteration #: 12
Epoch 1/1
Generating from seed: ither with
Iteration #: 13
Epoch 1/1
Generating from seed: at sort it
Iteration #: 14
Epoch 1/1
Generating from seed: ft off whe
Iteration #: 15
Epoch 1/1
Generating from seed: t countrie
Iteration #: 16
Epoch 1/1
Generating from seed: ng, not gr
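
The generation loop above always picks the most probable character with `np.argmax`, which tends to produce repetitive text. An alternative, hinted at earlier when we mentioned sampling from the output probabilities, is to draw the next character at random according to those probabilities. The sketch below shows one common way to do this with a temperature parameter. It is not part of the book's code, and the function name `sample_index` and the example probabilities are assumptions.

```python
import math
import random

def sample_index(probs, temperature=1.0, rng=random):
    """Draw an index from probs after rescaling by temperature.

    Temperatures below 1 sharpen the distribution (closer to argmax);
    temperatures above 1 flatten it (more varied output).
    """
    logits = [math.log(max(p, 1e-12)) / temperature for p in probs]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    return rng.choices(range(len(probs)), weights=weights, k=1)[0]

probs = [0.1, 0.7, 0.2]  # hypothetical softmax output over 3 characters

# Greedy choice, equivalent to np.argmax(probs)
greedy = max(range(len(probs)), key=probs.__getitem__)

# Sampled choices, using a seeded generator so runs are repeatable
rng = random.Random(0)
draws = [sample_index(probs, temperature=1.0, rng=rng) for _ in range(1000)]
print(greedy, [draws.count(i) for i in range(len(probs))])
```

In the notebook's loop, this would mean replacing `np.argmax(pred)` with a call such as `sample_index(pred, temperature=0.5)`; the best temperature is a matter of experimentation.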

Generating the next character or next word of text is not the only thing you can do with this sort of model. This kind of model has been successfully used to make stock predictions (see *Financial Market Time Series Prediction with Recurrent Neural Networks*, by A. Bernal, S. Fok, and R. Pidaparthi, 2012) and to generate classical music (see *DeepBach: A Steerable Model for Bach Chorales Generation*, by G. Hadjeres and F. Pachet, arXiv:1612.01010, 2016), to name a few interesting applications. Andrej Karpathy covers a few other fun examples, such as generating fake Wikipedia pages, algebraic geometry proofs, and Linux source code, in his blog post *The Unreasonable Effectiveness of Recurrent Neural Networks* at http://karpathy.github.io/2015/05/21/rnn-effectiveness/.