<h1>MNIST digit recognition with LeNet</h1>

In this practical session we will build a convolutional neural network that is able to recognise the digits 0-9 in images.

You can run the code in a cell by selecting the cell and pressing Shift+Enter.

<h4>1) Import statements</h4>
First, import some of the packages we will need (run the cell below).

Documentation for each of these packages can be found online: <br>
For numpy: https://docs.scipy.org/doc/numpy-dev/user/quickstart.html <br>
For matplotlib: http://matplotlib.org/api/pyplot_api.html <br>
For lasagne: http://lasagne.readthedocs.io <br>
For random: https://docs.python.org/2/library/random.html <br>

In [None]:
import cPickle
import gzip
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import theano
import lasagne
import time
import random
random.seed(0)

<h4>2) Loading the data</h4>
Download the data from: http://deeplearning.net/data/mnist/mnist.pkl.gz and save it somewhere on your disk. The function below loads the data from the location where you have saved it (path) and stores it in numpy arrays. The data is already split into a training set, a validation set and a test set. Each of these three sets is saved in two separate variables, one containing the labels and one containing the images. The labels are arrays of integers between 0 and 9. The images are 4-dimensional arrays with one entry per label; the last two dimensions hold the image height and width. The function also pads each 28x28 image with zeros to 32x32, the input size that the network in 5) expects.

Change the path in the second cell below to the location where you have saved it and run the two cells.
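
The reshaping and padding steps in loadMNIST can be tried on a small dummy array first. The sketch below is only an illustration: `flat_images` is a made-up stand-in for the flattened images in the pickle file, and np.reshape is used instead of np.resize (for arrays whose size already matches, both give the same result).

```python
import numpy as np

# Made-up stand-in for the flattened MNIST images: 3 samples of 28*28 = 784 values.
flat_images = np.arange(3 * 784, dtype=np.float32).reshape(3, 784)

# Reshape to (samples, channels, height, width).
images = flat_images.reshape(len(flat_images), 1, 28, 28)

# Add 2 zero pixels on each side of the last two axes: 28 + 2 + 2 = 32.
padded = np.pad(images, ((0, 0), (0, 0), (2, 2), (2, 2)), 'constant', constant_values=0)

print(images.shape)
print(padded.shape)
```

The original pixels end up in the centre of the padded array, at rows and columns 2 to 29.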

In [None]:
def loadMNIST(path):
    f = gzip.open(path, 'rb')
    train_set, valid_set, test_set = cPickle.load(f)
    f.close()
    
    train_set_labels = train_set[1]
    train_set_images = np.resize(train_set[0],(len(train_set_labels),1,28,28))
    train_set_images = np.pad(train_set_images,((0,0),(0,0),(2,2),(2,2)),'constant', constant_values=0)
   
    valid_set_labels = valid_set[1]
    valid_set_images = np.resize(valid_set[0],(len(valid_set_labels),1,28,28))
    valid_set_images = np.pad(valid_set_images,((0,0),(0,0),(2,2),(2,2)),'constant', constant_values=0)

    test_set_labels = test_set[1]
    test_set_images = np.resize(test_set[0],(len(test_set_labels),1,28,28))
    test_set_images = np.pad(test_set_images,((0,0),(0,0),(2,2),(2,2)),'constant', constant_values=0)
    
    return train_set_labels, train_set_images, valid_set_labels, valid_set_images, test_set_labels, test_set_images

In [None]:
train_set_labels, train_set_images, valid_set_labels, valid_set_images, test_set_labels, test_set_images = loadMNIST(r'D:\mnist\mnist.pkl.gz')

<h4>3) Visualising the data</h4>
Let's look at the data we've just loaded! 

How many samples are in each set? (Use .shape to see the dimensions) 

How large are the images? 

How many samples are there for each of the 10 digits? 

Show some of the images with plt.imshow (use cmap='gray_r' for black digits on a white background and interpolation='none' to see the real pixels). You can access one of the training images as train_set_images[i,0].
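
One possible way to count the samples per digit is np.bincount. The sketch below uses randomly generated labels (`toy_labels`, a hypothetical stand-in for train_set_labels), so the counts it prints say nothing about the real data.

```python
import numpy as np

# Randomly generated labels standing in for train_set_labels.
rng = np.random.RandomState(0)
toy_labels = rng.randint(0, 10, size=1000)

# np.bincount counts how often each non-negative integer occurs;
# minlength=10 guarantees one entry per digit even if a digit is absent.
counts = np.bincount(toy_labels, minlength=10)

for digit in range(10):
    print('digit {}: {} samples'.format(digit, counts[digit]))
```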

In [None]:
#plt.imshow(train_set_images[0,0],cmap='gray_r',interpolation='none')

<h4>4) One-hot-encoding</h4>
Convert the labels from a number between 0 and 9 to 'one-hot-encoding'. This means that for a label with number 3, there is a 1 at element 3 and a 0 everywhere else, i.e. [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]. These are our target nodes: the node at position 3 should be active when the input image shows a 3. The code below does this for the training labels. 

Do the same for the validation labels!
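
As a side note, the same encoding can also be written without a loop. The sketch below compares a loop version with a broadcasting version on a handful of made-up labels (`toy_labels` is not part of the data set); it is one possible formulation, not the only one.

```python
import numpy as np

# A few made-up labels for illustration.
toy_labels = np.array([5, 0, 4, 1, 9])

# Loop version: fill one column per digit.
onehot_loop = np.zeros((len(toy_labels), 10), dtype=np.int16)
for n in range(10):
    onehot_loop[:, n] = toy_labels == n

# Broadcasting version: compare a column of labels with a row of digits 0..9.
onehot_vec = (toy_labels[:, None] == np.arange(10)).astype(np.int16)

print(onehot_vec)
```

Each row contains exactly one 1, in the column given by the label.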

In [None]:
labels = np.zeros((len(train_set_labels),10),dtype=np.int16)        
for n in xrange(10):
    labels[:,n] = train_set_labels==n

print train_set_labels[:10]
print labels[:10]

#validlabels = np.zeros((len(valid_set_labels),10),dtype=np.int16)        
#for n in xrange(10):
#    validlabels[:,n] = valid_set_labels==n

<h4>5) Building the network</h4>
The function below builds the LeNet network that we looked at in the lecture. Each layer is created by a function that takes the previous layer as input and returns the network with the new layer added. The print statements show the output dimensions after each layer. 

Can you recognise all the elements of the network from the lecture?
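
The printed shapes can be checked by hand. The sketch below does the arithmetic for the spatial size, assuming the convolutions use no padding and stride 1 and the pooling layers use non-overlapping windows (believed to be the Lasagne defaults, but worth checking in the documentation). The helper names `conv_output_size` and `pool_output_size` are made up for this example.

```python
def conv_output_size(size, filter_size):
    # Unpadded convolution with stride 1: the filter must fit inside the image.
    return size - filter_size + 1


def pool_output_size(size, pool_size):
    # Non-overlapping pooling: the stride equals the pool size.
    return size // pool_size


sizes = [32]                                    # padded input image
sizes.append(conv_output_size(sizes[-1], 5))    # first convolution
sizes.append(pool_output_size(sizes[-1], 2))    # first max pooling
sizes.append(conv_output_size(sizes[-1], 5))    # second convolution
sizes.append(pool_output_size(sizes[-1], 2))    # second max pooling

# After flattening: 16 feature maps of sizes[-1] x sizes[-1] values each.
flat_units = 16 * sizes[-1] * sizes[-1]

print(sizes)
print(flat_units)
```

Under these assumptions the spatial size goes 32, 28, 14, 10, 5, and the first dense layer receives 400 inputs.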

In [None]:
def buildLeNet(X1):
    inputlayer = lasagne.layers.InputLayer(shape=(None, 1, 32, 32),input_var=X1)    
    print inputlayer.output_shape
    
    layer1 = lasagne.layers.Conv2DLayer(inputlayer, num_filters=6, filter_size=(5,5), nonlinearity=lasagne.nonlinearities.rectify, W=lasagne.init.GlorotUniform())
    print layer1.output_shape 
    
    layer2 = lasagne.layers.MaxPool2DLayer(layer1, pool_size=(2, 2))
    print layer2.output_shape 
    
    layer3 = lasagne.layers.Conv2DLayer(layer2, num_filters=16, filter_size=(5,5), nonlinearity=lasagne.nonlinearities.rectify, W=lasagne.init.GlorotUniform())
    print layer3.output_shape 
    
    layer4 = lasagne.layers.MaxPool2DLayer(layer3, pool_size=(2, 2))
    print layer4.output_shape 
    
    layer4 = lasagne.layers.flatten(layer4)
    print layer4.output_shape 
    
    layer5 = lasagne.layers.DenseLayer(layer4,num_units=120,nonlinearity=lasagne.nonlinearities.rectify)    
    print layer5.output_shape 
    
    layer6 = lasagne.layers.DenseLayer(layer5,num_units=84,nonlinearity=lasagne.nonlinearities.rectify)
    print layer6.output_shape 
    
    outputlayer = lasagne.layers.DenseLayer(layer6,num_units=10,nonlinearity=lasagne.nonlinearities.softmax)     
    print outputlayer.output_shape 
    
    return layer1, layer2, layer3, layer4, layer5, layer6, outputlayer

In [None]:
X = theano.tensor.tensor4()
Y = theano.tensor.matrix()
layer1, layer2, layer3, layer4, layer5, layer6, outputlayer = buildLeNet(X)

<h4>6) Training function</h4>
Define the functions for training the network. We will use the negative log likelihood (called categorical cross-entropy in lasagne) as the loss function (second line) and stochastic gradient descent with momentum as the optimiser (fourth line).
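
To make these two ingredients concrete, here is a plain NumPy sketch of categorical cross-entropy and of one classical momentum step. It is an illustration under assumptions, not the Lasagne implementation: the function names are made up, the example uses 3 classes instead of 10, and the momentum value of 0.9 is assumed to match the library default.

```python
import numpy as np


def categorical_crossentropy(predictions, targets):
    # predictions: rows of class probabilities; targets: one-hot rows.
    # For one-hot targets this is minus the log probability of the true class.
    return -np.sum(targets * np.log(predictions), axis=1)


def momentum_step(param, grad, velocity, learning_rate=0.001, momentum=0.9):
    # Classical momentum: keep a decaying running sum of past gradient steps.
    velocity = momentum * velocity - learning_rate * grad
    return param + velocity, velocity


predictions = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.8, 0.1]])
targets = np.array([[1, 0, 0],
                    [0, 1, 0]])
loss = categorical_crossentropy(predictions, targets).mean()
print(loss)

# One update of a toy parameter vector with a made-up gradient.
param = np.array([1.0, -2.0])
velocity = np.zeros(2)
grad = np.array([0.5, -0.5])
param, velocity = momentum_step(param, grad, velocity)
print(param)
```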

In [None]:
outputtrain = lasagne.layers.get_output(outputlayer) #function that gets the output from the network defined before.
trainloss = lasagne.objectives.categorical_crossentropy(outputtrain, Y).mean() #function that computes the mean crossentropy between the output and the real labels.
params = lasagne.layers.get_all_params(outputlayer, trainable=True) #function that gets all the parameters (weights) in the network.
updates = lasagne.updates.momentum(trainloss, params, learning_rate=0.001) #function that performs an update of the weights based on the loss.
train = theano.function(inputs=[X, Y], outputs=trainloss, updates=updates, allow_input_downcast=True) #function that does all the above based on training samples X and real labels Y.

<h4>7) Validation function</h4>
Define a function to validate the network.

In [None]:
validate = theano.function(inputs=[X, Y], outputs=trainloss, allow_input_downcast=True) #function that computes the loss without performing an update

<h4>8) Test function</h4>
Define the functions for testing the network. 

In [None]:
outputtest = lasagne.layers.get_output(outputlayer, deterministic=True) #function that gets the output from the network defined before.
test = theano.function(inputs=[X], outputs=outputtest, allow_input_downcast=True) #function that gets the output based on input X

<h4>9) Training the network</h4>
Do the training in random batches of a specific number of samples (we set the values below to 250 batches of 100 samples). 

Use random.sample(a,n) to select a random batch of n samples from array a. 

Next, use the train(X,Y) function we have defined in 6) to perform an update of the network based on a random batch of training images X and training labels Y. 

The train function returns the loss. Save the loss of each training batch in the variable 'losslist' so we can look at them later (you can use .append() to add the current loss to the list).

Also keep track of the loss for random batches from the validation set to check whether your network is overfitting on the training set. You can use validate(X,Y) from 7) to compute the loss on the validation set (without doing an update).

<b>Remember</b> that if you restart the training process from the beginning you also need to reinitialise the network by running the cells starting from 5) again.
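
The batch selection can be tried on toy data before using it in the training loop. In the sketch below `toy_images` and `toy_labels` are made-up stand-ins for the real arrays, and the indices are drawn from a plain range so the example does not depend on how random.sample treats NumPy arrays.

```python
import random
import numpy as np

random.seed(0)

# Made-up stand-ins: 20 "images" of 2x2 pixels and their one-hot labels.
toy_images = np.arange(20 * 4, dtype=np.float32).reshape(20, 1, 2, 2)
toy_labels = np.eye(10, dtype=np.int16)[np.arange(20) % 10]

batch_size = 5

# random.sample draws without replacement, so no sample appears twice in a batch.
batch_indices = random.sample(range(len(toy_images)), batch_size)

# Indexing with a list of indices selects the same rows from both arrays,
# so every image stays paired with its own label.
batch_images = toy_images[batch_indices]
batch_labels = toy_labels[batch_indices]

print(batch_images.shape)
print(batch_labels.shape)
```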

In [None]:
trainingsamples = np.arange(len(train_set_labels)) #numbers from 0 until the number of samples
validsamples = np.arange(len(valid_set_labels))

minibatches = 250
minibatchsize = 100 

losslist = []
validlosslist = []

t0 = time.time()

for i in xrange(minibatches):
    #select random training and validation samples and perform training and validation steps here.
    pass #placeholder so that the cell runs; replace it with your own code
    
    #solution:
    #minibatchsamples = random.sample(trainingsamples,minibatchsize)                

    #loss = train(train_set_images[minibatchsamples],labels[minibatchsamples])
    #if (i+1)%50==0:
    #    print 'Loss minibatch {}: {}'.format(i,loss)
    #losslist.append(loss)            

    #validminibatchsamples = random.sample(validsamples,minibatchsize)           
    #validloss = validate(valid_set_images[validminibatchsamples],validlabels[validminibatchsamples])
    #if (i+1)%50==0:
    #    print 'Loss validation minibatch {}: {}'.format(i,validloss)
    #validlosslist.append(validloss)

t1 = time.time()
print 'Training time: {} seconds'.format(t1-t0)

<h4>9) Loss curves</h4>
Plot the loss curves for the training and validation sets (use plt.plot(losslist) for the training loss). 

Is 250 batches enough to train the network? How many do we need? 

What happens if you change the learning rate in 6)? 

What happens if you change the minibatchsize? 

What happens if you use another optimizer? 

Try to get the loss as low as possible! 

What happens if you make changes to the network? For example, use more or fewer filters or nodes, remove a layer, etc.
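
Losses computed on single minibatches are noisy, which can make the curves hard to compare. One optional aid is a moving average. The sketch below applies it to synthetic values (`toy_losses` is generated for this example and is not a real training result); the window length of 20 is an arbitrary choice.

```python
import numpy as np

# Synthetic noisy, decaying values standing in for losslist.
rng = np.random.RandomState(0)
toy_losses = 2.0 * np.exp(-np.arange(200) / 50.0) + 0.1 * rng.rand(200)

window = 20

# Average over a sliding window; mode='valid' keeps only the positions
# where the whole window fits, so the result is window - 1 values shorter.
smoothed = np.convolve(toy_losses, np.ones(window) / window, mode='valid')

print(len(toy_losses))
print(len(smoothed))
```

The smoothed values can be passed to plt.plot in the same way as the raw list.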

In [None]:
#plt.figure()
#plt.plot(losslist)
#plt.plot(validlosslist)
#plt.legend(['Training loss','Validation loss'])

<h4>10) Evaluation on the test set</h4>
Evaluate the network on the test set with the test(X) function we have defined in 8). You can use np.argmax() to select the node with the highest probability. 

How well did it do? How many of the 10 000 test samples did it label correctly?
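
The step from network outputs to an accuracy can be illustrated on a tiny made-up example. The probabilities and labels below are invented for this sketch (3 classes, 4 samples) and are not outputs of the network.

```python
import numpy as np

# Invented network outputs: one row of class probabilities per sample.
toy_probabilities = np.array([[0.1, 0.7, 0.2],
                              [0.8, 0.1, 0.1],
                              [0.3, 0.3, 0.4],
                              [0.2, 0.5, 0.3]])
toy_true_labels = np.array([1, 0, 1, 1])

# axis=1 picks, for every sample, the class with the highest probability.
toy_predictions = np.argmax(toy_probabilities, axis=1)

correct = np.sum(toy_predictions == toy_true_labels)
accuracy = float(correct) / len(toy_true_labels)

print(toy_predictions)
print('Accuracy: {}'.format(accuracy))
```

Here three of the four samples are labelled correctly, giving an accuracy of 0.75.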

In [None]:
#t0 = time.time()
#test_set_predictions = np.argmax(test(test_set_images),axis=1)
#t1 = time.time()
#print 'Testing time: {} seconds'.format(t1-t0)

In [None]:
#TP = np.sum(test_set_labels==test_set_predictions)
#print 'Accuracy: {}'.format(float(TP)/float(len(test_set_labels)))

<h4>11) Visualising what the network has learned</h4>
To see what is happening within the network we can visualise the learned filters and their feature maps. We now define an additional function that obtains the feature maps after the first layer.

In [None]:
outputlayer1 = lasagne.layers.get_output(layer1) 
outputfeatures = theano.function(inputs=[X], outputs=outputlayer1, allow_input_downcast=True) 

<h4>12) Visualising the features</h4>
Let's look at the feature maps after the first layer for one of the images from the test set. We have defined the function 'outputfeatures' for that. 

Look at the shape of the features variable. 

Visualise the 6 feature maps for some of the 10 000 test samples with plt.imshow.

In [None]:
features = outputfeatures(test_set_images)

In [None]:
#print features.shape
#for i in xrange(6):
#    plt.figure()
#    plt.imshow(features[1,i],cmap='gray_r',interpolation='none')

<h4>13) Visualising the filters</h4>
Let's look at the filters that are learned. We can use the lasagne function 'get_all_param_values' for that. These are the filters that are applied to the images to obtain the feature maps that we saw above.

Look at the shape of the filters and biases.

Visualise the 6 filters of the first layer with plt.imshow. Do you see any structure in the learned filters?
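
To get a feeling for how a 5x5 filter turns an image into a feature map, the sketch below slides a hand-made filter over a toy image and applies the rectifier. It is a simplified illustration: the function name `correlate2d_valid`, the image and the filter are all made up, the bias is left out, and the filter is applied without flipping (cross-correlation), which may differ from what the library does internally.

```python
import numpy as np


def correlate2d_valid(image, kernel):
    # Slide the kernel over every position where it fits entirely inside the image
    # and sum the element-wise products.
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out


# Toy 32x32 image with a vertical edge: left half 0, right half 1.
image = np.zeros((32, 32))
image[:, 16:] = 1.0

# Hand-made filter that responds to a dark-to-bright step from left to right.
kernel = np.zeros((5, 5))
kernel[:, :2] = -1.0
kernel[:, 3:] = 1.0

# Rectifier: keep positive responses, set negative ones to zero.
feature_map = np.maximum(correlate2d_valid(image, kernel), 0)

print(feature_map.shape)
print(feature_map[0, 10:18])
```

The feature map is 28x28 and is non-zero only in the columns around the edge, which is the kind of structure to look for in the learned filters and their feature maps.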

In [None]:
weights = lasagne.layers.get_all_param_values(layer1)
filters = weights[0]
biases = weights[1]

In [None]:
#print filters.shape
#print biases.shape
#for i in xrange(6):
#    plt.figure()
#    plt.imshow(filters[i,0],cmap='gray_r',interpolation='none')

<h4>14) Visualising other layers </h4>
Can you also visualise the features and kernels of other layers? For example, take a look at the features after the third layer.

In [None]:
#outputlayer3 = lasagne.layers.get_output(layer3) 
#outputfeatures3 = theano.function(inputs=[X], outputs=outputlayer3, allow_input_downcast=True) 

In [None]:
#features = outputfeatures3(test_set_images)
#print features.shape
#for i in xrange(6):
#    plt.figure()
#    plt.imshow(features[1,i],cmap='gray_r',interpolation='none')

In [None]:
#weights3 = lasagne.layers.get_all_param_values(layer3)
#filters3 = weights3[0]
#biases3 = weights3[1]
#for i in xrange(6):
#    plt.figure()
#    plt.imshow(filters3[i,0],cmap='gray_r',interpolation='none')