# CNN example with Chainer

In this example, we classify the handwritten digits of the MNIST dataset with LeNet5. This example is based on the official Chainer example.

## MNIST

The MNIST dataset is a collection of images of handwritten digits. It consists of 60000 training samples and 10000 test samples. Each sample is a grayscale image whose size is normalized to 28x28, and each image contains a single digit from 0 to 9. The task is to classify which digit is written in the image.

![Example of MNIST](../image/mnist.png)
Fig. Example images from the MNIST dataset.

## LeNet5

LeNet5 is one of the most famous CNN architectures, proposed by LeCun et al. in [LeCun+98].

![The architecture of LeNet5](../image/lenet5.png)
Fig. The architecture of LeNet5 (cited from [LeCun+98]).


[LeCun+98]: LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. *Proceedings of the IEEE*, 86(11), 2278-2324.

## Basics of Chainer

**chainer.Variable**

* Data nodes of computational graphs.
* Wrapper for ``numpy.ndarray`` and ``cupy.ndarray``

**chainer.Function**

* Non-parameterized operator nodes of computational graphs.
* e.g. ``F.exp``, ``F.sigmoid``, ``F.LinearFunction``
* Fully-connected layers are realized as \\(f(x, W, b) = Wx + b\\)

**chainer.Link**

* Parameterized operator nodes of computational graphs.
* e.g. ``L.Linear``, ``L.Convolution2D``
* Fully-connected layers are realized as \\(f_{W, b}(x) = Wx + b\\)
* In most cases, it internally uses the corresponding ``Function``.

**chainer.Chain**

* An object that bundles ``Link`` instances.
* ``Chain`` itself is a derived class of ``Link``.

![linear function and linear link](../image/linear.png)
Fig. Linear function and linear link
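
The difference between a function, a link, and a chain can be sketched in plain Python. This is a toy illustration only: `linear`, `Linear`, and `TwoLayerChain` below are hypothetical stand-ins written for this note, not Chainer's classes, and they use lists instead of ``numpy.ndarray``.

```python
# Toy illustration only; these are NOT Chainer's implementations.

def linear(x, W, b):
    """Stateless 'function': W and b are passed in as ordinary arguments."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


class Linear(object):
    """Stateful 'link': owns W and b and delegates to the function."""

    def __init__(self, W, b):
        self.W = W
        self.b = b

    def __call__(self, x):
        return linear(x, self.W, self.b)


class TwoLayerChain(object):
    """'Chain': bundles links and composes them in the forward pass."""

    def __init__(self, l1, l2):
        self.l1 = l1
        self.l2 = l2

    def __call__(self, x):
        return self.l2(self.l1(x))


W = [[1.0, 2.0], [3.0, 4.0]]
b = [0.5, -0.5]
x = [1.0, 1.0]

layer = Linear(W, b)
net = TwoLayerChain(layer, Linear([[1.0, 1.0]], [0.0]))

print(linear(x, W, b))  # [3.5, 6.5]
print(layer(x))         # [3.5, 6.5] (same computation, parameters held by the link)
print(net(x))           # [10.0]
```

The function and the link compute the same value; the only difference is who holds the parameters, which is what lets an optimizer find and update them through the link.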


**chainer.training.Trainer and related modules**

* An object that is responsible for the training loop.
* Repeats forward/backward propagation and parameter updates.
* Training procedures are customized by trainer-related modules.
    * ``Dataset`` and its ``Iterator`` extract mini batches from the dataset.
    * ``Updater`` determines how parameters are updated.
* Extra procedures (e.g. taking snapshots of models) are added as ``Extension`` objects.


![The relationship of trainer related modules](../image/trainer_related_modules.png)
Fig. Trainer-related modules
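
The division of labour among these modules can be sketched with a toy training loop in plain Python. All names below (`make_iterator`, `sgd_update`, `run_trainer`) are hypothetical and only mimic the roles of the iterator, updater, trainer, and extensions. The model is a single scalar weight fitted to y = 2x, and the extensions are assumed to run once per epoch.

```python
# Toy sketch of the roles of the trainer-related modules; not Chainer code.

def make_iterator(dataset, batchsize):
    """Iterator: yields mini batches that cover the dataset once (one epoch)."""
    for i in range(0, len(dataset), batchsize):
        yield dataset[i:i + batchsize]


def sgd_update(w, batch, lr):
    """Updater: forward/backward pass and one SGD step for the model y = w * x."""
    # Gradient of the mean squared error (w * x - y) ** 2 with respect to w.
    grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
    return w - lr * grad


def mean_loss(w, dataset):
    """Mean squared error of the model y = w * x over the whole dataset."""
    return sum((w * x - y) ** 2 for x, y in dataset) / len(dataset)


def run_trainer(dataset, n_epoch, batchsize, lr, extensions):
    """Trainer: repeats the updates and calls every extension after each epoch."""
    w = 0.0
    for epoch in range(1, n_epoch + 1):
        for batch in make_iterator(dataset, batchsize):
            w = sgd_update(w, batch, lr)
        for extension in extensions:
            extension(epoch, w)
    return w


dataset = [(float(x), 2.0 * x) for x in range(1, 9)]  # samples of y = 2x
log = []


def log_report(epoch, w):
    """Extension: records the loss on the dataset after each epoch."""
    log.append((epoch, mean_loss(w, dataset)))


w = run_trainer(dataset, n_epoch=20, batchsize=4, lr=0.01,
                extensions=[log_report])
print(w)  # close to 2.0
```

Swapping `sgd_update` for another rule changes how parameters are updated without touching the loop, and appending to `extensions` adds logging or snapshots in the same way.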

## Procedures

This example takes the following steps:

1. Import packages
2. Prepare dataset
3. Prepare model
4. Setup optimizer
5. Training
6. Save models

## Code

### 1. Import packages

In [1]:
from __future__ import print_function

import chainer
from chainer import cuda
import chainer.training.extensions as E
import chainer.functions as F
import chainer.links as L
import chainer.optimizers as O
import chainer.datasets as D
from chainer import training

### 2. Prepare dataset

#### How to handle image data as a tensor

In most frameworks, a mini batch of images is represented as a 4-dimensional tensor.
The axes represent the number of samples, channels, height, and width. For example, RGB images have 3 channels, which represent red, green, and blue, respectively. As the MNIST dataset is grayscale, its number of channels is 1. Similarly, the input and output of a 2D convolution layer are also 4-dimensional, and their number of channels is not necessarily 1 even when we handle grayscale images.

There are two common orderings of the axes of image data:

* ``shape = (samples, channels, height, width) e.g. (B, 1, 28, 28)``
* ``shape = (samples, height, width, channels) e.g. (B, 28, 28, 1)``

Chainer and Theano use the former format, while TensorFlow uses the latter. Keras can switch the ordering depending on the backend.
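
To make the two orderings concrete, the sketch below computes where one element lands in a flat, row-major buffer under each layout. The helper names `offset_nchw` and `offset_nhwc` are hypothetical, and a contiguous row-major array is assumed.

```python
def offset_nchw(n, c, h, w, shape):
    """Flat offset of element (n, c, h, w) for the axis order (N, C, H, W)."""
    _, C, H, W = shape
    return ((n * C + c) * H + h) * W + w


def offset_nhwc(n, c, h, w, shape):
    """Flat offset of the same element for the axis order (N, H, W, C)."""
    _, H, W, C = shape
    return ((n * H + h) * W + w) * C + c


# A hypothetical mini batch of two RGB images of size 4 x 5.
nchw_shape = (2, 3, 4, 5)
nhwc_shape = (2, 4, 5, 3)

# In (N, C, H, W), horizontally neighbouring pixels of one channel are adjacent.
print(offset_nchw(0, 1, 2, 3, nchw_shape))  # 33
print(offset_nchw(0, 1, 2, 4, nchw_shape))  # 34

# In (N, H, W, C), the channels of one pixel are adjacent.
print(offset_nhwc(0, 1, 2, 3, nhwc_shape))  # 40
print(offset_nhwc(0, 2, 2, 3, nhwc_shape))  # 41
```

With a single channel, as in MNIST, both formulas give the same offset, so `(B, 1, 28, 28)` and `(B, 28, 28, 1)` describe the same memory layout and differ only in shape.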

In [2]:
train, test = D.get_mnist(ndim=3)  # each image has shape (1, 28, 28)

# Get iterators of datasets
batchsize = 128
train_iter = chainer.iterators.SerialIterator(train, batchsize)
test_iter = chainer.iterators.SerialIterator(test, batchsize, repeat=False, shuffle=False)
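
As a rough check of what a batch size of 128 implies, the sketch below counts the mini batches in one pass over each split. `batch_sizes` is a hypothetical helper that models a single, non-repeating pass (as with `repeat=False`). A repeating iterator such as `train_iter` may treat the epoch boundary differently, so for training the count is best read as an approximate number of iterations per epoch.

```python
import math


def batch_sizes(n_samples, batchsize):
    """Sizes of the mini batches produced by one pass over n_samples items."""
    n_batches = int(math.ceil(n_samples / float(batchsize)))
    return [min(batchsize, n_samples - i * batchsize) for i in range(n_batches)]


train_batches = batch_sizes(60000, 128)
test_batches = batch_sizes(10000, 128)

print(len(train_batches))  # 469
print(train_batches[-1])   # 96
print(len(test_batches))   # 79
print(test_batches[-1])    # 16
```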

### 3. Prepare model

In [3]:
# Discriminator
class LeNet5(chainer.Chain):

    def __init__(self):
        super(LeNet5, self).__init__(
            # The size of each image of MNIST is 28 x 28,
            # but LeNet5 is designed so that its input is 32 x 32.
            # So we add the padding of size (32 - 28) / 2 = 2.
            conv1=L.Convolution2D(1, 6, 5, pad=2),
            conv2=L.Convolution2D(6, 16, 5),
            fc1=L.Linear(400, 120),
            fc2=L.Linear(120, 84),
            fc3=L.Linear(84, 10)
        )
        self.train = True

    # Implementation of forward propagation.
    def __call__(self, x):
        h = F.tanh(self.conv1(x))
        h = F.max_pooling_2d(h, 2)
        h = F.tanh(self.conv2(h))
        h = F.max_pooling_2d(h, 2)
        h = F.tanh(self.fc1(h))
        h = F.tanh(self.fc2(h))
        return self.fc3(h)

# Classifier forwards the data to the discriminator to get the prediction
# and calculates the loss with the specified loss function.
# By default, softmax_cross_entropy loss (softmax followed by cross entropy loss) is used.
# Classifier itself is a derived class of Chain.
model = L.Classifier(LeNet5())

gpu = 0  # GPU ID to use. Negative value to use CPU.
if gpu >= 0:
    # Transfer the model to GPU
    cuda.get_device(gpu).use()
    model.to_gpu()
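
The input size of 400 for `fc1` follows from the sizes of the feature maps. The sketch below traces the spatial size through the network using the standard output-size formula for convolutions. `conv_out` and `pool_out` are hypothetical helpers, and `pool_out` assumes the input size is divisible by the pooling window, which holds at both pooling steps here.

```python
def conv_out(size, ksize, stride=1, pad=0):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * pad - ksize) // stride + 1


def pool_out(size, ksize):
    """Output length of non-overlapping pooling (stride equal to ksize)."""
    return size // ksize


size = 28                              # MNIST images are 28 x 28
size = conv_out(size, ksize=5, pad=2)  # conv1: 28 (same as a 32 x 32 input without padding)
size = pool_out(size, 2)               # first pooling: 14
size = conv_out(size, ksize=5)         # conv2: 10
size = pool_out(size, 2)               # second pooling: 5
n_features = 16 * size * size          # 16 channels of 5 x 5 feature maps

print(n_features)  # 400
```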

### 4. Setup optimizer

In [4]:
optimizer = O.Adam()
optimizer.setup(model)
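
For intuition about what the optimizer does at each iteration, here is a minimal sketch of the Adam update rule for a single scalar parameter. `adam_step` is a hypothetical helper, not Chainer's implementation. The default hyperparameters are the values suggested in the Adam paper and are assumed here, not read from the library.

```python
import math


def adam_step(param, grad, m, v, t,
              alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter at step t (t starts at 1)."""
    m = beta1 * m + (1.0 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1.0 - beta2) * grad * grad  # running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** t)               # bias-corrected estimates
    v_hat = v / (1.0 - beta2 ** t)
    param = param - alpha * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Minimise f(w) = (w - 3) ** 2, whose gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
history = []
for t in range(1, 101):
    w, m, v = adam_step(w, 2.0 * (w - 3.0), m, v, t)
    history.append(w)

print(history[0])  # about 0.001: the first step has size close to alpha
```

Because the step is normalised by the running gradient magnitude, the first update moves the parameter by roughly `alpha` regardless of how large the gradient is.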

### 5. Training & 6. Save model

In [5]:
epoch = 10

# Setup trainer
updater = training.StandardUpdater(train_iter, optimizer, device=gpu)
trainer = training.Trainer(updater, (epoch, 'epoch'))

# Evaluate the model with the test dataset for each epoch
trainer.extend(E.Evaluator(test_iter, model, device=gpu))

# Dump a computational graph from 'loss' variable at the first iteration
trainer.extend(E.dump_graph('main/loss'))

# Take a snapshot every `epoch` epochs, i.e. once at the end of training
trainer.extend(E.snapshot(), trigger=(epoch, 'epoch'))

# Write a log of evaluation statistics for each epoch
trainer.extend(E.LogReport())

# Save two plot images to the result dir
trainer.extend(
    E.PlotReport(['main/loss', 'validation/main/loss'],
                 'epoch', file_name='loss.png'))
trainer.extend(
    E.PlotReport(['main/accuracy', 'validation/main/accuracy'],
                 'epoch', file_name='accuracy.png'))

# Print selected entries of the log to stdout
trainer.extend(E.PrintReport(
        ['epoch', 'main/loss', 'validation/main/loss',
         'main/accuracy', 'validation/main/accuracy', 'elapsed_time']))

# Print a progress bar to stdout
# trainer.extend(E.ProgressBar())
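
A trigger such as `(epoch, 'epoch')` is an interval: the extension runs once every `epoch` epochs. The toy function below, which is hypothetical and not Chainer's trigger class, lists the epochs at which an interval trigger with a given period would fire during a 10-epoch run.

```python
def fired_epochs(period, n_epoch):
    """Epochs at which an interval trigger with the given period fires."""
    return [e for e in range(1, n_epoch + 1) if e % period == 0]


epoch = 10
print(fired_epochs(1, epoch))      # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(fired_epochs(epoch, epoch))  # [10]
```

With a period equal to the total number of epochs, the extension runs exactly once, after the final epoch.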

In [6]:
# Execute trainer
trainer.run()

epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1           0.267172    0.0726448             0.925423       0.977848                  6.78626
2           0.0686068   0.0469076             0.979777       0.985858                  13.2431
3           0.0467516   0.047265              0.986074       0.984474                  19.6243
4           0.0357922   0.0421441             0.989166       0.986155                  26.0474
5           0.0277272   0.0371696             0.991455       0.988528                  32.4075
6           0.0216469   0.0338254             0.993487       0.988924                  38.8193
7           0.0170145   0.0361768             0.994903       0.988924                  45.2158
8           0.0132753   0.0389417             0.996127       0.987935                  51.6177
9           0.01141     0.0330201             0.996718