## Recurrent Neural Network (RNN) with Torch

Simple tutorial to train your first RNN with torch. More specifically, we will train a Gated Reccurrent Unit (GRU) network.

As a toy dataset, we will use the common MNIST dataset. More information about MNIST can be found at http://yann.lecun.com/exdb/mnist/.

First, we import some dependecies.

In [1]:
require 'torch'
require 'rnn'
require 'optim'
mnist = require 'mnist'

We load and shuffle the data.

In [2]:
train_data = mnist.traindataset()
test_data = mnist.testdataset()

perm = torch.randperm(train_data.label:size(1)):long()
train_data.data = train_data.data:index(1, perm):float()
train_data.label = train_data.label:index(1, perm)

test_data.data = test_data.data:float()

We define some hyperparameters of the network.

RNNs are designed to handle time sequences.

Here, the images have a 28x28 shape, and we will consider them as a sequence of 28 time steps, each time step being a 28-dimensional vector. The network will look at images one column at a time.

In [3]:
inputSize = train_data.data:size(3) -- length of the input
rho = 100000
dropout = .2 -- each hidden neuron will be dropped with this given probability
hiddenSize = 1024 -- number of hidden units of the GRU cell, also equals to the output size of the GRU
classes_n = 10

We create the network and print the network's architecture.

In [4]:
function create_model()
    model = nn.Sequential()
    model:add(nn.SplitTable(1,2))
    -- model:add(nn.Sequencer(nn.GRU(inputSize, hiddenSize, rho, dropout)))
    model:add(nn.Sequencer(nn.LSTM(inputSize, hiddenSize)))
    model:add(nn.SelectTable(-1)) -- last step of output sequence
    model:add(nn.Linear(hiddenSize, classes_n))
    return model
end

In [5]:
model = create_model()
print(model:__tostring())

nn.Sequential {
  [input -> (1) -> (2) -> (3) -> (4) -> output]
  (1): nn.SplitTable
  (2): nn.Sequencer @ nn.LSTM(28 -> 1024)
  (3): nn.SelectTable(-1)
  (4): nn.Linear(1024 -> 10)
}	


We create the loss function. Here, we chose the common cross entropy loss that is suitable for classification.

In [6]:
criterion = nn.CrossEntropyCriterion()

We add one to the labels because the function `nn.CrossEntropyCriterion` expects classes starting from 1 and not 0. Thus, the class number 1 corresponds to the digit 0, ..., the class number 10 corresponds to the digit 9.

In [7]:
train_data.label = (train_data.label + 1.0):float()
test_data.label = (test_data.label + 1.0):float()

Defining now hyperparameters related to the network training.

In [8]:
optimState = {learningRate=0.01, momentum=0.5} -- parameters for the (minibatch) gradient descent algorithm

batchSize = 128
maxEpoch = 20

display_n = 500000 -- display training loss and accuracy every x step
test_n = 500 -- test the model every x step

In [9]:
-- placeholder for mini-batch batch
X_batch = train_data.data[{{1, batchSize}}]:clone():float()
Y_batch = train_data.label[{{1, batchSize}}]:clone():float()

-- placeholder for batch indices
batchIndices = torch.LongTensor(batchSize)

Optionally, we can cast the model created as well as the different variables on GPU.

In [10]:
require 'cutorch'
require 'cunn'
cutorch.setDevice(3) -- chose which GPU to use
model = model:cuda()
criterion = criterion:cuda()
train_data.data = train_data.data:cuda()
test_data.data = test_data.data:cuda()
X_batch = X_batch:cuda()
Y_batch = Y_batch:cuda()

Function that returns the accuracy of a batch of predictions.

In [11]:
function get_accuracy(output_, y_true)
    --- output_: a `Tensor`, output from the model.
    --- y_true: labels
    --- Returns accuracy
    y_true = y_true:double()
    _, y_predicted = torch.max(output_, 2)
    y_predicted = y_predicted:double()
    accuracy = y_true:eq(y_predicted):sum() / y_true:size(1)
    return accuracy, y_predicted
end

We now train the network.

In [12]:
-- model parameters for optimization
local params, gradParams = model:getParameters()
local iteration = 0

local steps_per_epoch = train_data.label:size(1) / batchSize

while true do
   iteration = iteration + 1
    
   batchIndices:random(1, train_data.label:size(1))
   X_batch:copy(train_data.data:index(1, batchIndices))
   Y_batch:copy(train_data.label:index(1, batchIndices))

    ----------------------------------------
    -------------- TRAINING ----------------
    ----------------------------------------
   function feval(params)
      gradParams:zero()
      outputs = model:forward(X_batch)
      loss = criterion:forward(outputs, Y_batch)
      local gradOutputs = criterion:backward(outputs, Y_batch)
      model:backward(X_batch, gradOutputs)
      return loss, gradParams
   end
    
   timer = torch.Timer() -- timer
   optim.adagrad(feval, params, optimState)
   datum_sec = batchSize / 
   timer:time().real
    
   if iteration % display_n == 0 then
          print(string.format("TRAINING – epoch %.2f, loss = %.4f, %.2f datum/sec", iteration / steps_per_epoch, loss, datum_sec))
   end
    ----------------------------------------
    ----------------------------------------
    
    
    ----------------------------------------
    ------------- EVALUATION ---------------
    ----------------------------------------
   if (iteration - 1) % test_n == 0 or iteration / steps_per_epoch >= maxEpoch then
       model:evaluate() -- evaluation mode, i.e. do not use dropout
       local outputs_ = model:forward(test_data.data)
       local loss_ = criterion:forward(outputs, test_data.label)
       local accuracy, y_predicted = get_accuracy(outputs_, test_data.label)
       print(string.format('TEST – epoch %.2f, loss: %.4f, accuracy: %.4f', iteration / steps_per_epoch, loss_, accuracy))
       model:training() -- training mode, i.e. use dropout
   end
    ----------------------------------------
    ----------------------------------------
    
    if iteration / steps_per_epoch > maxEpoch then
        break
   end
end

TEST – epoch 0.00, loss: 2.8839, accuracy: 0.2114	


TEST – epoch 1.07, loss: 0.2373, accuracy: 0.9172	


TEST – epoch 2.14, loss: 0.2161, accuracy: 0.9286	


TEST – epoch 3.20, loss: 0.2019, accuracy: 0.9371	


TEST – epoch 4.27, loss: 0.2028, accuracy: 0.9414	


TEST – epoch 5.34, loss: 0.2098, accuracy: 0.9420	


TEST – epoch 6.40, loss: 0.2170, accuracy: 0.9427	


TEST – epoch 7.47, loss: 0.2233, accuracy: 0.9457	


TEST – epoch 8.54, loss: 0.2309, accuracy: 0.9441	


TEST – epoch 9.60, loss: 0.2460, accuracy: 0.9432	


TEST – epoch 10.67, loss: 0.2557, accuracy: 0.9433	


TEST – epoch 11.74, loss: 0.2603, accuracy: 0.9433	


TEST – epoch 12.80, loss: 0.2640, accuracy: 0.9450	


TEST – epoch 13.87, loss: 0.2721, accuracy: 0.9437	


TEST – epoch 14.94, loss: 0.2756, accuracy: 0.9443	


TEST – epoch 16.00, loss: 0.2802, accuracy: 0.9440	


TEST – epoch 17.07, loss: 0.2842, accuracy: 0.9439	


TEST – epoch 18.14, loss: 0.2870, accuracy: 0.9434	


TEST – epoch 19.20, loss: 0.2910, accuracy: 0.9441	


TEST – epoch 20.00, loss: 0.2932, accuracy: 0.9436	


TEST – epoch 20.00, loss: 0.2933, accuracy: 0.9436	
