# RNN example with Chainer

In this example, we build a language model from stacked LSTMs and train it on the [Penn Treebank](https://catalog.ldc.upenn.edu/docs/LDC95T7/cl93.html) dataset.

A language model is a probabilistic model over sequences of words. It assigns a probability \\(p(w)\\) to a sequence of words \\(w = (w\_1, \ldots, w\_n)\\). We model this probability with a Recurrent Neural Network (RNN). Specifically, we decompose the probability as
\\[p(w) = \prod\_{t=1}^{n} p(w\_t|w\_1, \ldots, w\_{t-1})\\] and model the conditional probability on the right-hand side with the RNN. At time \\(t\\), the RNN outputs the probability distribution over words given the previous words \\(w\_1, \ldots, w\_{t-1}\\). The RNN holds the information about the previous words as a state, written as \\(h\_t\\) at time \\(t\\). The RNN simultaneously outputs the probability distribution of the next word and updates its internal state. Schematically,
\\[(p(w\_t|w\_1, \ldots, w\_{t-1}), h\_t) = \mathrm{RNN}(w\_{t-1}, h\_{t-1}). \\]
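
To make the decomposition concrete, here is a minimal sketch in plain Python. The conditional probabilities are made-up numbers for a three-word sequence, not the output of any model, and `sequence_log_prob` is a hypothetical helper name introduced only for this illustration.

```python
import math


def sequence_log_prob(cond_probs):
    """Return log p(w) given the conditionals p(w_t | w_1, ..., w_{t-1})."""
    # The product over t becomes a sum in log space, which avoids
    # numerical underflow for long sequences.
    return sum(math.log(p) for p in cond_probs)


# Hypothetical conditionals for a 3-word sequence (illustrative values only).
cond_probs = [0.2, 0.5, 0.1]
log_p = sequence_log_prob(cond_probs)
p = math.exp(log_p)  # same value as 0.2 * 0.5 * 0.1, up to rounding
```

Working in log space is the usual choice in practice, because the product of many small probabilities quickly becomes too small to represent as a float.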

## Penn Treebank (PTB)

A treebank is a text corpus annotated with syntactic or semantic structure. The Penn Treebank (PTB) is one of the best-known treebanks and consists of approximately 4.5 million words. The sentences in the dataset are annotated with POS (part-of-speech) tags. In this tutorial, we do not use the grammatical structure; we just treat the dataset as a bundle of sentences.

https://catalog.ldc.upenn.edu/docs/LDC95T7/cl93.html

## Procedures

This example takes the following steps:

1. Import packages
2. Prepare dataset
3. Prepare model
4. Setup optimizer
5. Training
6. Save models

## Code

### 1. Import packages 

In [1]:
from __future__ import division
from __future__ import print_function

import numpy as np

import chainer
import chainer.datasets as D
import chainer.functions as F
import chainer.links as L
import chainer.optimizers as O
from chainer import training
from chainer.training import extensions as E

### 2. Prepare dataset

The following picture shows how to create a minibatch from the raw dataset.

![How to create minibatch](../image/chainer_rnn_minibatch.png)
Fig. How to create a minibatch

The raw dataset is a long sequence of integers, each of which is the ID of a single word. We turn it into training data made of pairs of the current word and the next word. Each minibatch is built from pairs taken at equally spaced positions in the sequence. This procedure corresponds to the `offsets` computation and the `get_words` method in the following code.
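
As a rough illustration of this scheme, the sketch below applies the same offset arithmetic to a toy sequence of 10 word IDs with a batch size of 2. The data values and the `make_batch` helper are hypothetical and exist only for this example; the real iterator follows in the next cell.

```python
# Toy "dataset": a sequence of 10 word IDs (illustrative values only).
dataset = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
batch_size = 2
length = len(dataset)

# Equally spaced starting positions, one per row of the minibatch.
offsets = [i * length // batch_size for i in range(batch_size)]


def make_batch(iteration):
    """Return the (current word, next word) pairs for one iteration."""
    cur = [dataset[(o + iteration) % length] for o in offsets]
    nxt = [dataset[(o + iteration + 1) % length] for o in offsets]
    return list(zip(cur, nxt))


first = make_batch(0)   # pairs starting at positions 0 and 5
second = make_batch(1)  # every row advances by one word
```

Each row of the minibatch walks through its own contiguous stretch of the corpus, so the recurrent state of a row always sees consecutive words.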

In [2]:
class ParallelSequentialIterator(chainer.dataset.Iterator):

    def __init__(self, dataset, batch_size, repeat=True):
        self.dataset = dataset
        self.batch_size = batch_size
        self.epoch = 0
        self.is_new_epoch = False
        self.repeat = repeat
        length = len(dataset)
        # Equally spaced starting positions, one per sequence in the minibatch
        self.offsets = [i * length // batch_size for i in range(batch_size)]
        self.iteration = 0

    def get_words(self):
        return [self.dataset[(offset + self.iteration) % len(self.dataset)]
                for offset in self.offsets]

    def __next__(self):
        length = len(self.dataset)
        if not self.repeat and self.iteration * self.batch_size >= length:
            raise StopIteration
        
        # Get current words that will be fed to RNN
        cur_words = self.get_words()
        self.iteration += 1
        # Get next words that will be the target values.
        next_words = self.get_words()

        epoch = self.iteration * self.batch_size // length
        self.is_new_epoch = self.epoch < epoch
        if self.is_new_epoch:
            self.epoch = epoch

        return list(zip(cur_words, next_words))

    @property
    def epoch_detail(self):
        return self.iteration * self.batch_size / len(self.dataset)


    def serialize(self, serializer):
        self.iteration = serializer('iteration', self.iteration)
        self.epoch = serializer('epoch', self.epoch)

In [3]:
# Load the Penn Treebank dataset as one long sequence of word IDs
# train/val/test are just arrays of integers
train, val, test = D.get_ptb_words()
n_vocab = max(train) + 1

# Get iterators of datasets
batchsize = 20
train_iter = ParallelSequentialIterator(train, batchsize)
val_iter = ParallelSequentialIterator(val, 1, repeat=False)
test_iter = ParallelSequentialIterator(test, 1, repeat=False)

### 3. Prepare model

In [4]:
# Definition of a recurrent net for language modeling
class RNNForLM(chainer.Chain):

    def __init__(self, n_vocab, n_units, train=True):
        super(RNNForLM, self).__init__(
            embed=L.EmbedID(n_vocab, n_units),
            l1=L.LSTM(n_units, n_units),
            l2=L.LSTM(n_units, n_units),
            l3=L.Linear(n_units, n_vocab),
        )
        for param in self.params():
            param.data[...] = np.random.uniform(-0.1, 0.1, param.data.shape)
        self.train = train

    def reset_state(self):
        self.l1.reset_state()
        self.l2.reset_state()

    def __call__(self, x):
        h0 = self.embed(x)
        h1 = self.l1(F.dropout(h0, train=self.train))
        h2 = self.l2(F.dropout(h1, train=self.train))
        y = self.l3(F.dropout(h2, train=self.train))
        return y
    
# Prepare an RNNLM model
rnn = RNNForLM(n_vocab, 650)
model = L.Classifier(rnn)
model.compute_accuracy = False  # we only want the perplexity

gpu = 1  # GPU device ID; use a negative value to run on the CPU
if gpu >= 0:
    chainer.cuda.get_device(gpu).use()
    model.to_gpu()

### 4. Setup optimizer

In [5]:
optimizer = O.SGD(lr=1.0)
optimizer.setup(model)
optimizer.add_hook(chainer.optimizer.GradientClipping(5.))

### 5. Training and 6. Save models

The most typical way of training an RNN is to unfold it in time, regard it as a simple feed-forward neural network (i.e. a computational graph without cycles), and do back propagation as usual. This procedure is known as **Back Propagation Through Time** (BPTT for short). When the input sequence is long, however, full BPTT becomes impractical because the unfolded graph for the whole sequence cannot fit into memory. In that case, we truncate the graph into short time ranges so that errors do not propagate too far back in time. This heuristic is known as **truncated Back Propagation Through Time** (truncated BPTT).
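
The idea can be sketched without any framework. The example below is a toy, not the Chainer implementation: it assumes a scalar linear RNN \\(h\_t = a h\_{t-1} + x\_t\\) with a squared-error loss, and `truncated_bptt_grads` is a hypothetical name. The state value is carried from one window to the next, but the backward pass stops at each window boundary.

```python
def truncated_bptt_grads(xs, ys, a, bprop_len):
    """Yield the gradient of each window's loss with respect to `a`.

    Toy scalar RNN: h_t = a * h_{t-1} + x_t,
    loss_t = 0.5 * (h_t - y_t) ** 2.
    """
    h = 0.0  # state carried across windows (value only, no gradient)
    for start in range(0, len(xs), bprop_len):
        window_x = xs[start:start + bprop_len]
        window_y = ys[start:start + bprop_len]

        # Forward pass inside the window, remembering every state.
        states = [h]
        for x in window_x:
            states.append(a * states[-1] + x)

        # Backward pass: it stops at the window boundary, so states[0]
        # is treated as a constant.
        grad_a, dh = 0.0, 0.0
        for t in reversed(range(len(window_x))):
            dh += states[t + 1] - window_y[t]  # d loss_t / d h_t
            grad_a += dh * states[t]           # d h_t / d a = h_{t-1}
            dh *= a                            # propagate to h_{t-1}

        h = states[-1]  # keep the value, drop the history
        yield grad_a


# Four time steps split into two windows of length 2.
grads = list(truncated_bptt_grads([1.0, 1.0, 1.0, 1.0],
                                  [0.0, 0.0, 0.0, 0.0],
                                  a=0.5, bprop_len=2))
```

Memory use is bounded by `bprop_len` because only the states of the current window are stored, at the price of ignoring dependencies that reach further back than one window.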

To realize truncated BPTT in Chainer, we make a customized ``Updater``.

In [6]:
class BPTTUpdater(training.StandardUpdater):

    def __init__(self, train_iter, optimizer, bprop_len, device):
        super(BPTTUpdater, self).__init__(
            train_iter, optimizer, device=device)
        self.bprop_len = bprop_len

    def update_core(self):
        loss = 0
        train_iter = self.get_iterator('main')
        optimizer = self.get_optimizer('main')

        for i in range(self.bprop_len):
            batch = train_iter.__next__()

            # self.converter concatenates the word IDs into arrays and sends them to the device
            x, t = self.converter(batch, self.device)

            # Compute the loss at this time step and accumulate it
            loss += optimizer.target(chainer.Variable(x), chainer.Variable(t))

        optimizer.target.cleargrads()  # Clear the parameter gradients
        loss.backward()  # Backprop
        loss.unchain_backward()  # Truncate the graph
        optimizer.update()  # Update the parameters

In [7]:
# Setup trainer
epoch = 20
bproplen = 35
updater = BPTTUpdater(train_iter, optimizer, bproplen, gpu)
trainer = training.Trainer(updater, (epoch, 'epoch'))

In [8]:
# Append an extension for evaluation with validation dataset.
eval_model = model.copy()
eval_rnn = eval_model.predictor
eval_rnn.train = False

trainer.extend(E.Evaluator(
    val_iter, eval_model, device=gpu,
    eval_hook=lambda _: eval_rnn.reset_state()))  # Reset the RNN state at the beginning of each evaluation

In [9]:
# Append an extension for logging
def compute_perplexity(result):
    result['perplexity'] = np.exp(result['main/loss'])
    if 'validation/main/loss' in result:
        result['val_perplexity'] = np.exp(result['validation/main/loss'])

interval = 200
trainer.extend(E.LogReport(postprocess=compute_perplexity,
                           trigger=(interval, 'iteration')))

trainer.extend(E.PrintReport(
        ['epoch', 'iteration', 'perplexity', 'val_perplexity']
), trigger=(interval, 'iteration'))
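
The `compute_perplexity` function above exponentiates the reported loss. This gives the perplexity under the assumption that the loss is the mean per-word cross-entropy (negative log-likelihood). A minimal sketch with made-up probabilities, where `perplexity` is a hypothetical helper that is not part of Chainer:

```python
import math


def perplexity(target_probs):
    """Perplexity from the probabilities a model assigned to the true words."""
    # Mean negative log-likelihood per word (the cross-entropy loss).
    nll = -sum(math.log(p) for p in target_probs) / len(target_probs)
    return math.exp(nll)


# A model that always assigns probability 1/4 to the correct word is as
# uncertain as a uniform choice among 4 words.
uniform_ppl = perplexity([0.25, 0.25, 0.25])
```

Read this way, a perplexity of \\(k\\) means the model is on average as uncertain as if it were choosing uniformly among \\(k\\) words.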

In [10]:
# Append an extension for saving training snapshots
trainer.extend(E.snapshot())
trainer.extend(E.snapshot_object(model, 'model_iter_{.updater.iteration}'))

In [11]:
trainer.run()

epoch       iteration   perplexity  val_perplexity
0           200         1275.53
0           400         569.226
0           600         348.365
0           800         287.389
0           1000        273.528
0           1200        236.117
1           1400        223.513     203.803
1           1600        207.171
1           1800        201.297
1           2000        195.856
1           2200        169.499
1           2400        165.077
1           2600        154.577
2           2800        152.006     151.676
2           3000        158.183
2           3200        148.043
2           3400        145.232

### (optional) Evaluation with test dataset

In [12]:
# Evaluate the final model with test dataset.
eval_rnn.reset_state()
evaluator = E.Evaluator(test_iter, eval_model, device=gpu)
result = evaluator()
print('test perplexity:', np.exp(float(result['main/loss'])))

test perplexity: 85.4273086004
