# Computational Cognitive Neuroscience

## Assignment 1 - Training an MLP on MNIST

#### Douwe van Erp (s4258126) & Arianne Meijer - van de Griend (s4620135)

In [1]:
import chainer.functions as F
import chainer.links as L
from chainer import Chain
from chainer import iterators, optimizers
from chainer import report, training
from chainer.training import extensions
import utils

The *MLP* class specifies the architecture of the multilayer perceptron. It consists of two layers: a hidden layer with *n_units* units and an output layer with *n_out* units. Passing *None* as the input size of a *Linear* link lets Chainer infer it from the first batch. We use the *rectified linear unit* (ReLU) as the activation function for the hidden layer, since it is a common and effective choice for image recognition.

In [2]:
class MLP(Chain):
    def __init__(self, n_units, n_out):
        super(MLP, self).__init__()
        with self.init_scope():
            self.l1 = L.Linear(None, n_units)
            self.l2 = L.Linear(None, n_out)

    def __call__(self, x):
        h1 = F.relu(self.l1(x))
        y = self.l2(h1)
        return y
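
To make the forward pass concrete, the sketch below repeats the same computation (affine map, ReLU, affine map) in plain Python on a tiny hand-picked network. The helper names (`relu`, `linear`, `mlp_forward`, `count_params`) and the toy weights are our own illustrative assumptions, not part of Chainer. The parameter count assumes the 784 input features and 10 hidden and output units used later in this notebook.

```python
def relu(v):
    """Element-wise rectified linear unit: max(0, x)."""
    return [max(0.0, x) for x in v]

def linear(W, b, x):
    """Affine map W x + b, with W given as a list of rows."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def mlp_forward(params, x):
    """Hidden layer with ReLU, followed by a linear output layer (logits)."""
    W1, b1, W2, b2 = params
    h1 = relu(linear(W1, b1, x))
    return linear(W2, b2, h1)

def count_params(n_in, n_units, n_out):
    """Weights plus biases of both fully connected layers."""
    return (n_in * n_units + n_units) + (n_units * n_out + n_out)

# Toy network (hypothetical values): 3 inputs, 2 hidden units, 2 outputs.
W1 = [[1.0, -1.0, 0.0],
      [0.5, 0.5, -2.0]]
b1 = [0.0, 0.0]
W2 = [[1.0, 0.0],
      [-1.0, 2.0]]
b2 = [0.5, 0.0]

logits = mlp_forward((W1, b1, W2, b2), [2.0, 1.0, 1.0])
n_params = count_params(784, 10, 10)
```

The second hidden unit receives a negative input and is therefore switched off by the ReLU, so only the first hidden unit contributes to the logits.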

The *Classifier* class wraps the *MLP* class and computes the *softmax cross-entropy loss* on its outputs, together with the classification accuracy. By invoking the *report* function, Chainer reports the accuracy and loss during training.


In [3]:
class Classifier(Chain):
    def __init__(self, predictor):
        super(Classifier, self).__init__()
        # Register the predictor as a child link so the optimizer sees its parameters.
        with self.init_scope():
            self.predictor = predictor

    def __call__(self, x, t):
        y = self.predictor(x)
        loss = F.softmax_cross_entropy(y, t)
        accuracy = F.accuracy(y, t)
        report({'loss': loss, 'accuracy': accuracy}, self)
        return loss
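
As a rough illustration of what the loss and accuracy measure, the sketch below computes both in plain Python for a toy batch. The function names mirror the Chainer ones, but these are our own simplified versions that assume averaging over the batch. The logits and labels are made up for the example.

```python
import math

def softmax(logits):
    """Turn unnormalized scores into probabilities (shifted for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cross_entropy(batch_logits, labels):
    """Mean negative log-probability assigned to the correct class."""
    losses = [-math.log(softmax(z)[t]) for z, t in zip(batch_logits, labels)]
    return sum(losses) / len(losses)

def accuracy(batch_logits, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    hits = sum(1 for z, t in zip(batch_logits, labels)
               if z.index(max(z)) == t)
    return hits / len(labels)

# Toy batch (hypothetical values): two samples, three classes.
batch_logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
labels = [0, 2]

loss = softmax_cross_entropy(batch_logits, labels)
acc = accuracy(batch_logits, labels)
```

The second sample has uniform logits, so its loss contribution equals log(3), the loss of a uniform guess over three classes.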

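Now we can chain the different components of the model. We initialize our multilayer perceptron with 10 hidden units and 10 output units, one per digit class. The MLP is wrapped in Chainer's built-in *L.Classifier*, which by default uses the same softmax cross-entropy loss and accuracy reporting as the *Classifier* class above. Finally, we use stochastic gradient descent (SGD) to optimize our model.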

In [4]:
model = L.Classifier(MLP(10, 10))
optimizer = optimizers.SGD()
optimizer.setup(model)
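
The update that SGD applies to each parameter can be sketched in a few lines. The `sgd_step` helper and the toy objective below are illustrative assumptions. The learning rate of 0.01 is assumed to match the default of `optimizers.SGD()`, which we have not overridden above.

```python
def sgd_step(params, grads, lr=0.01):
    """One SGD update: move each parameter against its gradient."""
    return [p - lr * g for p, g in zip(params, grads)]

# Toy objective (hypothetical): f(w) = (w - 3)^2 with gradient 2 * (w - 3).
w = [0.0]
for _ in range(1000):
    grad = [2.0 * (w[0] - 3.0)]
    w = sgd_step(w, grad)
```

In actual training the gradient is computed on a minibatch rather than on the full objective, which is what makes the procedure stochastic.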

The *get_mnist* function from the *utils* module selects n_train = n_test = 100 samples per class for the training and test sets. Because there are 10 digits (classes), both datasets contain 1000 samples, each with 784 features (28 × 28 pixels).

In [5]:
train, test = utils.get_mnist(n_train=100, n_test=100, n_dim=1, with_label=True)
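
The implementation of `get_mnist` is not shown here. The sketch below illustrates one way to draw a fixed number of samples per class, using a hypothetical `sample_per_class` helper and a toy dataset. It is not the course utility itself.

```python
import random
from collections import defaultdict

def sample_per_class(samples, labels, n_per_class, seed=0):
    """Return (sample, label) pairs with n_per_class items for every class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for label in sorted(by_class):
        chosen.extend(rng.sample(by_class[label], n_per_class))
    return [(samples[i], labels[i]) for i in chosen]

# Toy data (hypothetical): 3 classes, 5 samples each, 4 features per sample.
labels = [c for c in range(3) for _ in range(5)]
samples = [[float(i)] * 4 for i in range(len(labels))]
subset = sample_per_class(samples, labels, n_per_class=2)

# The sizes quoted above follow from simple arithmetic.
n_samples = 10 * 100      # classes * samples per class
n_features = 28 * 28      # pixels in one MNIST image
```

Sampling the same number of items from every class keeps the small dataset balanced, so accuracy is not skewed towards frequent digits.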

Chainer's *SerialIterator* is used to create minibatches of size 32. The training iterator shuffles the data so that consecutive minibatches are not correlated, while the test iterator makes a single unshuffled pass over the test set (*repeat=False*).

In [6]:
train_iter = iterators.SerialIterator(train, batch_size=32, shuffle=True)
test_iter = iterators.SerialIterator(test, batch_size=32, repeat=False, shuffle=False)
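
The idea behind shuffled minibatching can be sketched without Chainer. The `minibatches` generator below is a simplified stand-in that makes a single pass over the data. It is an assumption for illustration and does not reproduce how *SerialIterator* handles epoch boundaries when repeating.

```python
import random

def minibatches(dataset, batch_size, shuffle=True, seed=0):
    """Yield consecutive batches from a (optionally shuffled) index order."""
    order = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [dataset[i] for i in order[start:start + batch_size]]

# Stand-in for the 1000 training samples: the integers 0..999.
dataset = list(range(1000))
batches = list(minibatches(dataset, batch_size=32))

n_batches = len(batches)          # 31 full batches plus one partial batch
last_batch_size = len(batches[-1])
```

With 1000 samples and a batch size of 32, a single pass yields 32 batches, the last of which holds the remaining 8 samples.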

Now we train the model for 20 epochs, using several extensions to report the accuracy and loss during training.

In [7]:
updater = training.StandardUpdater(train_iter, optimizer)
trainer = training.Trainer(updater, (20, 'epoch'), out='result')
trainer.extend(extensions.Evaluator(test_iter, model))
trainer.extend(extensions.LogReport())
trainer.extend(extensions.PrintReport(['epoch', 'main/accuracy', 'validation/main/accuracy']))
trainer.extend(extensions.PlotReport(['main/loss', 'validation/main/loss'], 'epoch'))
trainer.run()

epoch       main/accuracy  validation/main/accuracy
1           0.193359       0.192383                  
2           0.251008       0.25293                   
3           0.335685       0.300781                  
4           0.396169       0.390625                  
5           0.472656       0.47168                   
6           0.548387       0.532227                  
7           0.597782       0.582031                  
8           0.673387       0.615234                  
9           0.723633       0.633789                  
10          0.751008       0.65332                   
11          0.767137       0.678711                  
12          0.777218       0.681641                  
13          0.796875       0.689453                  
14          0.794355       0.706055                  
15          0.8125         0.697266                  
16          0.816532       0.710938                  
17          0.822266       0.729492                  
18          0.826613       0.7

The model achieves an accuracy of 73.7% on the test set. However, given the small dataset sizes, the accuracy is highly dependent on which samples are chosen, how they are shuffled, and the random weight initialization of the MLP. The plot shows that the validation loss keeps declining along with the training loss, which indicates that the model has not overfitted to the training set.

![](result/plot.png)
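
The comparison between training and validation performance can also be made numerically. The sketch below uses three rows copied from the printed report above. The helpers `generalization_gap` and `best_epoch` are our own hypothetical names, and the dictionary keys mirror those passed to *PrintReport*.

```python
def generalization_gap(entry):
    """Training accuracy minus validation accuracy for one epoch."""
    return entry['main/accuracy'] - entry['validation/main/accuracy']

def best_epoch(log, key='validation/main/accuracy'):
    """Epoch whose entry has the highest value for the given key."""
    return max(log, key=lambda e: e[key])['epoch']

# Rows taken from the report printed during training.
log = [
    {'epoch': 15, 'main/accuracy': 0.8125, 'validation/main/accuracy': 0.697266},
    {'epoch': 16, 'main/accuracy': 0.816532, 'validation/main/accuracy': 0.710938},
    {'epoch': 17, 'main/accuracy': 0.822266, 'validation/main/accuracy': 0.729492},
]

gaps = [generalization_gap(e) for e in log]
best = best_epoch(log)
```

A gap on its own does not establish overfitting. What matters is whether the validation metric starts to worsen while the training metric keeps improving, which these rows do not show.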