Welcome to another part of the series. This time we will build a model that predicts the next word (a character, actually) based on a few of the previous ones. We will extend it a bit by asking it for 3 suggestions instead of only 1. Similar models are widely used today. You might be using one without even knowing! Here's one example:

<iframe width="100%" height="480" src="https://www.youtube.com/embed/5DSfFDdybzg" frameborder="0" allowfullscreen></iframe>

# Recurrent Neural Networks

Our weapon of choice for this task will be Recurrent Neural Networks (RNNs). But why? What's wrong with the type of networks we've used so far? Nothing! Yet, they lack something that proves to be quite useful in practice - memory! 

In short, RNN models provide a way to examine not only the current input but also the one provided one step back. If we turn that around, we can say that the decision reached at time step $t - 1$ directly affects the future at step $t$.

It seems like a waste to throw out the memory of what you've seen so far and start from scratch every time. That's what other types of Neural Networks do. Let's end this madness!

## Definition

RNNs define a recurrence relation over time steps which is given by:

$$S_{t} = f(S_{t-1} * W_{rec} + X_t * W_x)$$

Where $S_t$ is the state at time step $t$, $X_t$ is an exogenous input at time $t$, and $W_{rec}$ and $W_x$ are weight parameters. The feedback loop gives memory to the model because it can remember information between time steps.

RNNs can compute the current state $S_t$ from the current input $X_t$ and previous state $S_{t-1}$, or predict the next state $S_{t + 1}$ from the current state $S_t$ and current input $X_t$.
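
To make the recurrence a bit more concrete, here is a minimal NumPy sketch that unrolls it over a short sequence. It assumes scalar weights, a zero initial state and $f = \tanh$; none of these choices come from the model we train later, and the helper name `rnn_states` is made up for illustration.

```python
import numpy as np

def rnn_states(X, W_x, W_rec, S0=0.0, f=np.tanh):
    """Unroll S_t = f(S_{t-1} * W_rec + X_t * W_x) over a 1-D input sequence."""
    states = []
    S = S0
    for x_t in X:
        # The new state mixes the previous state (memory) with the current input
        S = f(S * W_rec + x_t * W_x)
        states.append(S)
    return np.array(states)

# A single pulse followed by zeros (illustrative values only)
X = np.array([1.0, 0.0, 0.0, 0.0])
states = rnn_states(X, W_x=1.0, W_rec=0.5)
print(states)
```

Even though the input is zero after the first step, the state stays positive and fades gradually. That lingering trace of the first input is the "memory" described above.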

# Setup

In [None]:
import numpy as np
np.random.seed(42)
import tensorflow as tf
tf.set_random_seed(42)  # TF 1.x API; in TF 2.x use tf.random.set_seed(42)
from keras.models import Sequential, load_model
from keras.layers import Dense, Activation, Dropout, RepeatVector
from keras.layers import LSTM, TimeDistributed
from keras.optimizers import RMSprop
import matplotlib.pyplot as plt
import pickle
import sys
import heapq
import seaborn as sns
from pylab import rcParams

%matplotlib inline

sns.set(style='whitegrid', palette='muted', font_scale=1.5)

rcParams['figure.figsize'] = 12, 5

# Loading the data

We will use Friedrich Nietzsche's Beyond Good and Evil as a training corpus for our model.

In [None]:
path = 'nietzsche.txt'
text = open(path).read().lower()
print('corpus length:', len(text))

# Preprocessing

Let's find all unique chars in the corpus and create char to index and index to char maps:

In [None]:
chars = sorted(list(set(text)))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

print(f'unique chars: {len(chars)}')

Next, let's cut the corpus into chunks of 40 characters, spacing the sequences by 3 characters. Additionally, we will store the next character (the one we need to predict) for every sequence:

In [None]:
SEQUENCE_LENGTH = 40
step = 3
sentences = []
next_chars = []
for i in range(0, len(text) - SEQUENCE_LENGTH, step):
    sentences.append(text[i: i + SEQUENCE_LENGTH])
    next_chars.append(text[i + SEQUENCE_LENGTH])
print(f'num training examples: {len(sentences)}')
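
To see what the sliding window does, here is a small sketch of the same loop on a toy string. The helper `make_windows` and the toy values (a window of 4, a step of 3) are assumptions for illustration, not part of the original notebook.

```python
def make_windows(text, seq_len, step):
    """Cut text into overlapping chunks and record the character after each one."""
    inputs, targets = [], []
    for i in range(0, len(text) - seq_len, step):
        inputs.append(text[i:i + seq_len])   # the chunk the model sees
        targets.append(text[i + seq_len])    # the character it should predict
    return inputs, targets

toy_inputs, toy_targets = make_windows("hello world", seq_len=4, step=3)
for chunk, target in zip(toy_inputs, toy_targets):
    print(repr(chunk), '->', repr(target))
```

Each chunk overlaps the previous one by `seq_len - step` characters, which is why a step of 3 yields roughly a third as many training examples as there are characters in the corpus.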

It is time to generate our features and labels. We will use the previously generated sequences and the characters that need to be predicted to create 0-1 encoded vectors using the `char_indices` map:

In [None]:
# Use the builtin bool: the np.bool alias was removed in NumPy 1.24
X = np.zeros((len(sentences), SEQUENCE_LENGTH, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        X[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1
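
As a sanity check on the encoding idea, the sketch below one-hot encodes a tiny string and decodes it back. The `toy_*` names and the four-character text are made up for illustration; the real maps are `char_indices` and `indices_char`.

```python
import numpy as np

toy_text = "abca"
toy_chars = sorted(set(toy_text))  # ['a', 'b', 'c']
toy_char_indices = {c: i for i, c in enumerate(toy_chars)}
toy_indices_char = {i: c for c, i in toy_char_indices.items()}

# One row per character, one column per vocabulary entry
encoded = np.zeros((len(toy_text), len(toy_chars)), dtype=bool)
for t, char in enumerate(toy_text):
    encoded[t, toy_char_indices[char]] = 1

# argmax finds the single True in each row, which maps back to a character
decoded = ''.join(toy_indices_char[i] for i in encoded.argmax(axis=1))
print(encoded.astype(int))
print(decoded)
```

Every row contains exactly one `True`, so the encoding loses no information and can always be reversed.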

Let's have a look at a training sequence:

In [None]:
sentences[100]

In [None]:
next_chars[100]

The encoded data looks like:

In [None]:
X[0][0]

In [None]:
y[0]

And for the dimensions:

In [None]:
X.shape

In [None]:
y.shape

We have 200285 training examples, each sequence has a length of 40, and there are 57 unique chars.

# Building the model

In [None]:
model = Sequential()
model.add(LSTM(128, input_shape=(SEQUENCE_LENGTH, len(chars))))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))
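
It can help to estimate how many weights this network has. The sketch below uses the standard parameter-count formulas for an LSTM layer with biases and for a dense layer, and assumes the 57-character vocabulary reported earlier. The helper names are hypothetical, and the totals are only expected to match what `model.summary()` prints if the layers keep their default settings.

```python
def lstm_param_count(input_dim, units):
    # Four gate/candidate blocks, each with input weights,
    # recurrent weights, and one bias vector
    return 4 * (units * (input_dim + units) + units)

def dense_param_count(input_dim, units):
    # One weight per input-output pair plus one bias per output
    return input_dim * units + units

n_chars = 57  # assumed vocabulary size
lstm_params = lstm_param_count(n_chars, 128)
dense_params = dense_param_count(128, n_chars)
print(lstm_params, dense_params, lstm_params + dense_params)
```

Under these assumptions the LSTM layer accounts for the vast majority of the roughly 100 thousand parameters.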

## Training

In [None]:
# optimizer = RMSprop(lr=0.01)
# model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])

# history = model.fit(X, y, validation_split=0.05, batch_size=128, epochs=20, shuffle=True).history

## Saving

It took a lot of time to train our model. Let's save our progress:

In [None]:
# model.save('keras_model.h5')
# pickle.dump(history, open("history.p", "wb"))

And load it back, just to make sure it works:

In [None]:
model = load_model('keras_model.h5')
history = pickle.load(open("history.p", "rb"))

# Evaluation

Let's have a look at how our accuracy and loss change over training epochs:

In [None]:
plt.plot(history['acc'])
plt.plot(history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left');

In [None]:
plt.plot(history['loss'])
plt.plot(history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left');

# Prediction

Finally, it is time to predict some words using our model. Let's define some helper functions for sampling from our RNN:

In [None]:
def prepare_input(text):
    x = np.zeros((1, SEQUENCE_LENGTH, len(chars)))

    for t, char in enumerate(text):
        x[0, t, char_indices[char]] = 1.
        
    return x
    
def sample(preds, top_n=3):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds)
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    
    return heapq.nlargest(top_n, range(len(preds)), preds.take)

def predict_completion(text):
    # Greedily append the most likely character until a space ends the word
    completion = ''
    while True:
        x = prepare_input(text)
        preds = model.predict(x, verbose=0)[0]
        next_index = sample(preds, top_n=1)[0]
        next_char = indices_char[next_index]

        # Slide the window: drop the oldest character, append the prediction
        text = text[1:] + next_char
        completion += next_char

        # A space marks the end of the predicted word
        if next_char == ' ':
            return completion
        
def predict_completions(text, n=3):
    x = prepare_input(text)
    preds = model.predict(x, verbose=0)[0]
    next_indices = sample(preds, n)
    return [indices_char[idx] + predict_completion(text[1:] + indices_char[idx]) for idx in next_indices]
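
The `heapq.nlargest(top_n, range(len(preds)), preds.take)` call in `sample` may look cryptic. It ranks the indices by the probability stored at each index. Here is a small sketch with made-up probabilities (the `toy_preds` values are purely illustrative):

```python
import heapq
import numpy as np

# Made-up probabilities for a 4-character vocabulary
toy_preds = np.array([0.05, 0.60, 0.10, 0.25])

# Rank indices 0..3 using the probability at each index as the sort key
top_indices = heapq.nlargest(3, range(len(toy_preds)), toy_preds.take)
print(top_indices)

# One possible alternative: sort ascending, reverse, keep the first three
alt_indices = list(np.argsort(toy_preds)[::-1][:3])
print(alt_indices)
```

Both approaches return the indices of the most probable characters in descending order of probability, and `predict_completions` then maps them back to characters with `indices_char`.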

Next, we need sequences of 40 characters to use as seeds for our completions. We will use quotes from Friedrich Nietzsche himself:

In [None]:
quotes = [
    "It is not a lack of love, but a lack of friendship that makes unhappy marriages.",
    "That which does not kill us makes us stronger.",
    "I'm not upset that you lied to me, I'm upset that from now on I can't believe you.",
    "And those who were seen dancing were thought to be insane by those who could not hear the music.",
    "It is hard enough to remember my opinions, without also remembering my reasons for them!"
]

In [None]:
for q in quotes:
    seq = q[:40].lower()
    print(seq)
    print(predict_completions(seq, 5))
    print()