# Hands-on tutorial on Convnets with Torch

Adapted from [assignment 2](http://cs231n.github.io/assignment2/) and [assignment 3](http://cs231n.github.io/assignment3/) of CS231N (the ConvNet course at Stanford).

Pre-requisites:

CS231N: [this part](http://cs231n.github.io/neural-networks-3) and [this part](http://cs231n.github.io/convolutional-networks/)




Math/linear algebra: [tensors](http://www.physlink.com/Education/AskExperts/ae168.cfm) (N-rank generalizations of numbers (0-tensors), vectors (1-tensors) and matrices (2-tensors))

Basics of Torch: [Deep Learning with Torch: the 60-minute blitz](https://github.com/soumith/cvpr2015/blob/master/Deep%20Learning%20with%20Torch.ipynb)


[Install Torch](http://torch.ch/docs/getting-started.html)

# Warm-up exercise: Add two tensors
Hint: see the documentation for `torch.add`: https://github.com/torch/torch7/blob/master/doc/maths.md#res-torchaddres-tensor1-tensor2

In [None]:
function addTensors(a,b)
    return -- your code here
end

In [None]:
a = torch.ones(8,2)
b = torch.Tensor(2,8):fill(4)
print(addTensors(a,b))

# Load the data
Load part of the CIFAR-10 dataset, which consists of 32x32 color images.

In [None]:
-- os.execute('wget -c https://s3.amazonaws.com/torch7/data/cifar10torchsmall.zip')
-- os.execute('unzip cifar10torchsmall.zip')
trainset = torch.load('cifar10-train.t7')
testset = torch.load('cifar10-test.t7')
classes = {'airplane', 'automobile', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck'}

In [None]:
print(trainset)

# Visualize some images


In [None]:
math.randomseed(os.time())
-- see http://stackoverflow.com/a/20157671

In [None]:
require 'image'

random_number = math.random(10000)
img = image.scale(trainset.data[random_number], 200) -- increase image size

itorch.image(img) -- display the random_number-th image in the dataset
print(classes[trainset.label[random_number]])

# Pre-process data


In [None]:
-- ignore setmetatable for now, it is a feature beyond the scope of this tutorial. It sets the index operator.
setmetatable(trainset, 
    {__index = function(t, i) 
                    return {t.data[i], t.label[i]} 
                end}
);
trainset.data = trainset.data:double() -- convert the data from a ByteTensor to a DoubleTensor.

function trainset:size() 
    return self.data:size(1) 
end
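
For readers curious about the `setmetatable` call above, here is a minimal sketch of what the `__index` metamethod does. `nn.StochasticGradient` expects `dataset[i]` to return an `{input, target}` pair and `dataset:size()` to return the number of samples. The table `toyset` below is a hypothetical stand-in that uses plain Lua values instead of tensors, so it runs without Torch.

```lua
-- Toy stand-in for trainset: strings and numbers instead of tensors (illustration only).
toyset = { data = {'img1', 'img2'}, label = {3, 7} }

setmetatable(toyset, {
    -- called whenever a key (here an integer index) is not found in the table itself
    __index = function(t, i)
        return {t.data[i], t.label[i]}
    end
})

sample = toyset[2]          -- key 2 does not exist, so __index builds the pair
print(sample[1], sample[2]) -- the 2nd "image" and its label
```

Keys that do exist in the table, such as `data` and `label`, are looked up directly and never reach `__index`.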

In [None]:
mean = {} -- store the mean, to normalize the test set in the future
stdv  = {} -- store the standard-deviation for the future
for i=1,3 do -- over each image channel
    mean[i] = trainset.data[{ {}, {i}, {}, {}  }]:mean() -- mean estimation
    print('Channel ' .. i .. ', Mean: ' .. mean[i])
    trainset.data[{ {}, {i}, {}, {}  }]:add(-mean[i]) -- mean subtraction
    
    stdv[i] = trainset.data[{ {}, {i}, {}, {}  }]:std() -- std estimation
    print('Channel ' .. i .. ', Standard Deviation: ' .. stdv[i])
    trainset.data[{ {}, {i}, {}, {}  }]:div(stdv[i]) -- std scaling
end
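
Written as a formula, the loop above standardizes every pixel of channel $c$ with statistics estimated on the training set. The symbols $x$, $\mu_c$ and $\sigma_c$ are notation introduced here; they correspond to `trainset.data`, `mean[c]` and `stdv[c]`.

$$
\hat{x}_{n,c,h,w} = \frac{x_{n,c,h,w} - \mu_c}{\sigma_c}, \qquad c \in \{1, 2, 3\}
$$

After this step each channel of the training set has zero mean and unit standard deviation. The values $\mu_c$ and $\sigma_c$ are kept so that the same transformation can later be applied to the test set.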

# Train a ConvNet!

The architecture is conv-relu-pool-dense-LogSoftMax, where the conv layer uses stride 1.

Try 20 filters of size 5x5. The pool layer uses non-overlapping 2x2 pooling regions.

Hint: since the input image is a (3,32,32)-tensor and the conv layer uses stride=1 and padding=0, the output of the conv layer is of size

(output channels, (32-5)/1+1, (32-5)/1+1), see http://cs231n.github.io/convolutional-networks/#conv
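
The hint can be worked through in plain Lua. This is only a sketch of the size arithmetic: the helper `outputSize` is a hypothetical name introduced here, not a Torch function, and it applies the usual formula floor((input + 2*padding - kernel) / stride) + 1 to both the conv and the pool layer.

```lua
-- Hypothetical helper (not part of Torch): spatial output size of a conv or pooling layer.
function outputSize(inputSize, kernelSize, stride, padding)
    return math.floor((inputSize + 2 * padding - kernelSize) / stride) + 1
end

convSize = outputSize(32, 5, 1, 0)        -- 5x5 conv, stride 1, no padding
poolSize = outputSize(convSize, 2, 2, 0)  -- 2x2 non-overlapping pooling
denseInputs = 20 * poolSize * poolSize    -- 20 filters, flattened for the dense layer
print(convSize, poolSize, denseInputs)
```

With the suggested settings this gives a 28x28 map per filter after the conv layer, 14x14 after pooling, and 3920 inputs to the dense layer.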


In [None]:
require 'nn';

In [None]:
-- your code here

In [None]:
criterion = nn.ClassNLLCriterion()
trainer = nn.StochasticGradient(net, criterion)
trainer.learningRate = 0.0001
trainer.maxIteration = 10 -- just do 10 epochs of training.

In [None]:
trainer:train(trainset)

### Questions:
### 1. Why is the output layer LogSoftMax, and not SoftMax?
### 2. For the criterion, why do we minimize the cross-entropy instead of, for example, maximizing accuracy?

(your answers here)

In [None]:
net:zeroGradParameters() -- zero the internal gradient buffers of the network 

Let's look at the performance of this model.

Let's take an image at random and see whether the model classifies it correctly.

In [None]:
testset.data = testset.data:double()   -- convert from Byte tensor to Double tensor
for i=1,3 do -- over each image channel
    testset.data[{ {}, {i}, {}, {}  }]:add(-mean[i]) -- mean subtraction    
    testset.data[{ {}, {i}, {}, {}  }]:div(stdv[i]) -- std scaling
end

In [None]:

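random_number = math.random(10000)
img = image.scale(testset.data[random_number], 200) -- increase image size

itorch.image(img) -- display the random_number-th image in the dataset

print('Predictions of the model:')
-- run the forward pass once; torch.exp returns a new tensor of probabilities
-- instead of exponentiating the network's output buffer in place
probabilities = torch.exp(net:forward(testset.data[random_number]))
for i=1,#classes do
    print(classes[i], math.floor(100*probabilities[i]) .. ' %')
end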

--print(net:forward(testset.data[random_number]):exp()    )
print( 'Answer: ' .. classes[testset.label[random_number]])

Compute the error on the test set.

Hint: https://github.com/torch/nn/blob/master/doc/training.md#nn.traningneuralnet.dok

If your computer crashes, use a smaller test set, selected with the `narrow` operation:
https://github.com/torch/torch7/blob/master/doc/tensor.md

In [None]:
-- your code here

Alright, fine. Some examples sucked, but how many in total seem to be correct over the test set?

In [None]:
correct = 0
for i=1,10000 do
    local groundtruth = testset.label[i]
    local prediction = net:forward(testset.data[i])
    local confidences, indices = torch.sort(prediction, true)  -- true means sort in descending order
    if groundtruth == indices[1] then
        correct = correct + 1
    end
end

In [None]:
print( 100*correct/10000 .. ' % ')

That should look waaay better than chance, which is 10% accuracy (randomly picking a class out of 10 classes), if the network learnt something (and if you learnt something too, btw).

Hmmm, which classes performed well, and which did not?

In [None]:
class_performance = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
for i=1,10000 do
    local groundtruth = testset.label[i]
    local prediction = net:forward(testset.data[i])
    local confidences, indices = torch.sort(prediction, true)  -- true means sort in descending order
    if groundtruth == indices[1] then
        class_performance[groundtruth] = class_performance[groundtruth] + 1
    end
end

In [None]:
for i=1,#classes do
    print(classes[i], 100*class_performance[i]/1000 .. ' %')
end

# Dropout

To reduce overfitting, we can use dropout. See what happens!

See: http://torch.ch/blog/2015/07/30/cifar.html
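
To build intuition for what a dropout layer does during training, here is a minimal sketch of "inverted" dropout on a plain Lua table. It illustrates the general technique only and is not the implementation of `nn.Dropout`; the function name `dropoutSketch` and the use of a Lua table instead of a tensor are assumptions made for this example.

```lua
-- Illustrative only: inverted dropout on a plain Lua table of activations.
-- p is the probability of dropping a unit; surviving units are scaled by
-- 1/(1-p) so that the expected activation stays the same.
function dropoutSketch(activations, p)
    local out = {}
    for i, a in ipairs(activations) do
        if math.random() < p then
            out[i] = 0            -- unit dropped
        else
            out[i] = a / (1 - p)  -- unit kept and rescaled
        end
    end
    return out
end

dropped = dropoutSketch({1, 2, 3, 4}, 0.5)
print(table.concat(dropped, ' '))
```

Each run zeroes a different random subset of the units, which is what discourages the network from relying on any single one. At test time no units are dropped.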

In [None]:
-- your code here

Let's look at the performance on the test set:

In [None]:
-- your code here

### Question: did you notice an improvement?

(your answer here)

# Data augmentation

Rotate an image from the dataset by a random angle between -0.5 and 0.5 radians.

See https://github.com/torch/image/blob/master/doc/simpletransform.md
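
As a starting point, the random angle itself can be drawn in plain Lua. This sketch covers only the sampling step; `randomAngle` is a hypothetical helper name, and applying the rotation to an image is left to the exercise.

```lua
-- Hypothetical helper: uniform random angle in [-maxAngle, maxAngle) radians.
function randomAngle(maxAngle)
    -- math.random() with no arguments returns a float in [0, 1)
    return (2 * math.random() - 1) * maxAngle
end

angle = randomAngle(0.5)
print(angle)
```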

In [None]:
-- your code here

Rewrite the same neural network, adding data augmentation (horizontal flip and rotation).

Hint: use lines 21-43 of https://github.com/szagoruyko/cifar.torch/blob/master/train.lua

In [None]:
-- your code here

In [None]:
criterion = nn.ClassNLLCriterion()
trainer = nn.StochasticGradient(net, criterion)
trainer.learningRate = 0.0001
trainer.maxIteration = 10

In [None]:
trainer:train(trainset)

Let's look at the performance on the test set:

In [None]:
print(criterion:forward(net:forward(testset.data:narrow(1, 1, 500)),testset.label:narrow(1, 1, 500)))

### Question: did you notice an improvement?

(your answer here)