Neural Networks
===============

This tutorial is based on [Deep Learning with PyTorch: A 60 Minute Blitz](https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html).

In this tutorial, we build a classifier for handwritten-digit images from the MNIST dataset:

![classifier](fig/classifier.png)

We use the famous **LeNet5** network:

![LeNet](fig/LeNet.png)

It is a simple feed-forward network. It takes the input, feeds it
through several layers one after the other, and then finally gives the
output.

For an intuition of how a convolutional layer works, see the animations below.

<table><tr>
    <td><img src="https://www.cc.gatech.edu/~san37/img/dl/conv.gif" border=0></td>
    <td><img src="https://upload.wikimedia.org/wikipedia/commons/4/4f/3D_Convolution_Animation.gif" border=0></td>
</tr></table>
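
As a rough check on what the animations show, the sketch below applies a 5x5 convolution and a 2x2 max pooling to a dummy 28x28 single-channel image, and compares the shapes with the usual output-size formula $\lfloor (W - K + 2P) / S \rfloor + 1$. The helper `conv_out_size` and the dummy input are illustrative names, not part of the original tutorial.

```python
import torch
import torch.nn as nn

def conv_out_size(size, kernel, stride=1, padding=0):
    # Output width/height of a convolution or pooling layer (hypothetical helper)
    return (size + 2 * padding - kernel) // stride + 1

x = torch.zeros(1, 1, 28, 28)   # (N, C, H, W): one dummy MNIST-sized image
conv = nn.Conv2d(1, 6, 5)       # in_channels=1, out_channels=6, kernel_size=5
pool = nn.MaxPool2d(2, 2)       # kernel_size=2, stride=2

after_conv = conv(x)
after_pool = pool(after_conv)

print(conv_out_size(28, 5))            # 24
print(after_conv.shape)                # torch.Size([1, 6, 24, 24])
print(conv_out_size(24, 2, stride=2))  # 12
print(after_pool.shape)                # torch.Size([1, 6, 12, 12])
```

With no padding, each 5x5 convolution shrinks the feature map by 4 pixels per side length, and each 2x2 pooling halves it.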

A typical training procedure for a neural network is as follows:

![](fig/torch_flow.png)
- Define the neural network that has some learnable **parameters** (or
  weights)
- Iterate over a dataset of inputs
- Process input through the network
- Compute the loss (how far the output is from being correct)
- Propagate gradients back into the network’s parameters
- Update the weights of the network, typically using a simple update rule:
  $\theta = \theta - \alpha \nabla_\theta \mathcal{L}(\theta)$
  * The learning rate $\alpha$ (`learning_rate`) is a **hyperparameter**
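
The update rule in the last step can be sketched by hand on a toy problem. The example below is only an illustration under assumed values: a single parameter `theta`, a made-up quadratic loss with its minimum at 3, and a learning rate of 0.1. None of these come from the tutorial's network.

```python
import torch

theta = torch.tensor([0.0], requires_grad=True)  # single learnable parameter
alpha = 0.1                                      # learning rate (hyperparameter)

for step in range(100):
    loss = ((theta - 3.0) ** 2).sum()   # toy loss, minimum at theta = 3
    loss.backward()                     # compute d(loss)/d(theta)
    with torch.no_grad():
        theta -= alpha * theta.grad     # theta = theta - alpha * gradient
    theta.grad.zero_()                  # reset the gradient for the next step

print(theta.item())   # close to 3.0
```

In practice an optimizer object performs the update and the gradient reset, so this loop is rarely written by hand.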

Define the network
------------------

Let’s define this network:



In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class LeNet(nn.Module): # inherit from nn.Module!

    def __init__(self):
        super(LeNet, self).__init__()
        self.features = nn.Sequential( # input: 1*28*28
            # in_channel, out_channel, kernel_size
            nn.Conv2d(1, 6, 5), # 6*24*24
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2), # 6*12*12
            nn.Conv2d(6, 16, 5), # 16*8*8
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2) # 16*4*4
        )
        self.classifier = nn.Sequential(
            nn.Linear(16 * 4 * 4, 120),
            nn.ReLU(inplace=True),
            nn.Linear(120, 84),
            nn.ReLU(inplace=True),
            nn.Linear(84, 10)
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(-1, 16 * 4 * 4)
        x = self.classifier(x)
        return x

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # in_chan, out_chan, kernel_size
        self.conv1 = nn.Conv2d(1, 8, 3, stride=1)
        self.relu = nn.ReLU()
        self.fc = nn.Linear(26 * 26 * 8, 10)

    def forward(self, x):
        output = self.conv1(x)
        output = self.relu(output)
        output = output.view(output.shape[0],-1)
        output = self.fc(output)
        return output

net = Net()
print(net)

You just have to define the ``forward`` function, and the ``backward``
function (where gradients are computed) is automatically defined for you
using ``autograd``.
You can use any of the Tensor operations in the ``forward`` function.

The learnable parameters of a model are returned by ``net.parameters()``.
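
To make this concrete, the sketch below lists the parameters of a model with the same layers as `Net`. It is rebuilt here with `nn.Sequential` and `nn.Flatten` (an assumption made only to keep the snippet self-contained), so the parameter names differ from those of the class above.

```python
import torch.nn as nn

# Same layers as Net, rebuilt so that this snippet runs on its own
model = nn.Sequential(
    nn.Conv2d(1, 8, 3, stride=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(26 * 26 * 8, 10),
)

# Each parameter is a Tensor with requires_grad=True
for name, param in model.named_parameters():
    print(name, tuple(param.shape), param.requires_grad)

num_params = sum(p.numel() for p in model.parameters())
print(num_params)   # 54170 = (8*1*3*3 + 8) + (5408*10 + 10)
```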

Loss Function
-------------
A loss function takes the (output, target) pair of inputs, and computes a
value that estimates how far away the output is from the target.

There are several different
[loss functions](https://pytorch.org/docs/nn.html#loss-functions) under the
nn package.
A simple loss is ``nn.MSELoss``, which computes the mean-squared error
between the input and the target.
For classification problems, however, we commonly use [``nn.CrossEntropyLoss``](https://pytorch.org/docs/stable/nn.html#crossentropyloss).

(**Notice**: In PyTorch's implementation, **CrossEntropyLoss = LogSoftmax + NLLLoss**, so the last layer of the network should output raw scores without a softmax.)
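
The relationship can be checked numerically. The sketch below uses random logits and made-up targets (both are assumptions for illustration) and compares `nn.CrossEntropyLoss` with `F.log_softmax` followed by `F.nll_loss`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # raw network outputs: 4 samples, 10 classes
target = torch.tensor([3, 0, 9, 1])    # class indices, one per sample

ce = nn.CrossEntropyLoss()(logits, target)
manual = F.nll_loss(F.log_softmax(logits, dim=1), target)

print(torch.allclose(ce, manual))   # True
```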

In [None]:
criterion = nn.CrossEntropyLoss()

Later we can define loss as

``loss = criterion(output, target)``

then call `loss.backward()` to do the differentiation automatically.

If you follow ``loss`` in the backward direction, using its
``.grad_fn`` attribute, you will see a graph of computations. For the
``LeNet`` model defined above, it looks like this:

```python
    input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
          -> view -> linear -> relu -> linear -> relu -> linear
          -> CrossEntropyLoss
          -> loss
```

So, when we call ``loss.backward()``, the loss is differentiated w.r.t. every
Tensor in the graph that has ``requires_grad=True`` (here, the network's parameters),
and each of these Tensors has the gradient accumulated into its ``.grad`` attribute.
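
Because gradients are accumulated rather than overwritten, they have to be reset between updates. A minimal sketch with a made-up two-element tensor `w` (not part of the tutorial's network):

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w * w).sum().backward()   # d/dw sum(w^2) = 2w
first = w.grad.clone()     # tensor([2., 4.])

(w * w).sum().backward()   # a second backward pass adds to the existing .grad
second = w.grad.clone()    # tensor([4., 8.])

w.grad.zero_()             # reset in place; optimizer.zero_grad() serves the same purpose
print(first, second, w.grad)
```

This is why the training loop calls `optimizer.zero_grad()` before every backward pass.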

Optimizer
---------

In order to update the weights/parameters following the rule below, we need to define an optimizer.

``weight = weight - learning_rate * gradient``

Here we use Stochastic Gradient Descent (SGD):

In [None]:
LEARNING_RATE = 0.1
optimizer = torch.optim.SGD(net.parameters(), lr=LEARNING_RATE)

Other useful optimizers, including Nesterov-SGD, Adam, and RMSProp, can be found in ``torch.optim``.
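
As a sketch of how these alternatives are constructed, the snippet below builds three optimizers for a stand-in `nn.Linear` model. The model and the learning rates are placeholder values chosen for illustration, not tuned settings for MNIST.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)   # stand-in model, only used to supply parameters

# Nesterov momentum is an option of SGD and requires a non-zero momentum
sgd_nesterov = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, nesterov=True)
adam = torch.optim.Adam(model.parameters(), lr=1e-3)
rmsprop = torch.optim.RMSprop(model.parameters(), lr=1e-3)

# All optimizers share the same interface: zero_grad(), backward(), step()
before = model.weight.detach().clone()
loss = model(torch.ones(8, 4)).pow(2).mean()
adam.zero_grad()
loss.backward()
adam.step()
changed = not torch.equal(before, model.weight.detach())
print(changed)
```

Swapping the optimizer leaves the rest of the training loop unchanged.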

## Load dataset

Now we can load the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset and prepare for training.
![MNIST](https://upload.wikimedia.org/wikipedia/commons/thumb/2/27/MnistExamples.png/220px-MnistExamples.png)

We do not need to download it ourselves: the `torchvision` package bundles common datasets and downloads them on request.

In [None]:
from torchvision import datasets, transforms

BATCH_SIZE = 64 # Mini-batch size

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307, ), (0.3081, ))
])

train_dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST('./data', train=False, download=True, transform=transform)

train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)
test_dataloader = torch.utils.data.DataLoader(test_dataset, batch_size=BATCH_SIZE, shuffle=False)
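
To see how a `DataLoader` splits a dataset into mini-batches without downloading anything, the sketch below uses random tensors as a stand-in for MNIST. The dataset size of 150 and the tensors themselves are assumptions for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 150 fake grayscale "images" with random labels, standing in for MNIST
images = torch.randn(150, 1, 28, 28)
labels = torch.randint(0, 10, (150,))
loader = DataLoader(TensorDataset(images, labels), batch_size=64, shuffle=True)

batch_shapes = [tuple(data.shape) for data, target in loader]
print(len(loader))     # 3 batches: 64 + 64 + 22 samples
print(batch_shapes)    # the last batch is smaller than batch_size
```

Note that `len(loader)` counts batches, while `len(loader.dataset)` counts samples.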

## Let's train!

First, define the hyperparameters.

In [None]:
NUM_EPOCHS = 20
# LEARNING_RATE
# BATCH_SIZE

Next, set the training device. To train on a GPU, set `DEVICE = torch.device('cuda')`.

In [None]:
DEVICE = torch.device('cpu') # cuda
net = net.to(DEVICE)
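
A common alternative, sketched here with a lowercase `device` variable to keep it separate from the cell above, is to pick the GPU automatically when PyTorch can see one:

```python
import torch

# Use the GPU when one is visible to PyTorch, otherwise stay on the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.zeros(2, 3).to(device)   # tensors and models are moved the same way
print(x.device.type)
```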

Then put together everything we have written so far.

In [None]:
net.train() # set the network in train mode
for epoch_idx in range(NUM_EPOCHS):
    for batch_idx, (data, target) in enumerate(train_dataloader): # train in minibatch
        # get (x_i, y_i)
        # be careful of their shape
        # data: (N, channels, height, width), e.g. (64, 1, 28, 28) for a full batch
        # target: (N, )
        data, target = data.to(DEVICE), target.to(DEVICE)

        optimizer.zero_grad() # first zero_grad
        output = net(data) # forward

        loss = criterion(output, target) # calculate loss
        
        loss.backward() # backward
        optimizer.step() # update parameters
        break # stops after one batch per epoch, so this cell only checks that the loop runs

But wait, how can we know the performance of the network?

We should add the evaluation/inference module.

In [None]:
def evaluate(model_eval, loader_eval, criterion_eval):
    model_eval.eval() # set the network in evaluation mode
    loss_eval = 0
    correct = 0.
    with torch.no_grad():
        for data, target in loader_eval:
            data, target = data.to(DEVICE), target.to(DEVICE)
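            output = model_eval(data) # use the model passed in, not the global net
            # with the default reduction='mean' the criterion returns the batch mean,
            # so weight it by the batch size before summing
            loss_eval += criterion_eval(output, target).item() * data.size(0)

            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()

    num_samples = len(loader_eval.dataset)
    loss_eval = loss_eval / num_samples # mean loss per sample
    accuracy = correct / num_samples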
    response = {'loss': loss_eval, 'acc': accuracy}
    return response
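
The accuracy computation inside `evaluate` can be traced on a tiny hand-made batch. The scores and targets below are invented for illustration.

```python
import torch

output = torch.tensor([[0.1, 2.0, 0.3],    # highest score: class 1
                       [1.5, 0.2, 0.1],    # highest score: class 0
                       [0.0, 0.1, 0.9]])   # highest score: class 2
target = torch.tensor([1, 2, 2])

pred = output.argmax(dim=1, keepdim=True)               # shape (3, 1)
correct = pred.eq(target.view_as(pred)).sum().item()    # samples 0 and 2 match
accuracy = correct / len(target)
print(correct, accuracy)
```

`target.view_as(pred)` reshapes the targets from `(3,)` to `(3, 1)` so that the comparison is element-wise rather than broadcast.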

Then redefine the training process.

In [None]:
for epoch_idx in range(NUM_EPOCHS):
    net.train() # evaluate() switches to eval mode, so switch back every epoch
    for batch_idx, (data, target) in enumerate(train_dataloader):
        data, target = data.to(DEVICE), target.to(DEVICE)
        optimizer.zero_grad()

        output = net(data)

        loss = criterion(output, target)

        loss.backward()
        optimizer.step()

    # add evaluation below
    train_resp = evaluate(net, train_dataloader, criterion)
    eval_resp = evaluate(net, test_dataloader, criterion)

    print ('-*-*-*-*-*- Epoch {} -*-*-*-*-*-'.format(epoch_idx))
    print ('Train Loss: {:.6f}\t'.format(train_resp['loss']))
    print ('Train Acc: {:.6f}\t'.format(train_resp['acc']))
    print ('Eval Loss: {:.6f}\t'.format(eval_resp['loss']))
    print ('Eval Acc: {:.6f}\t'.format(eval_resp['acc']))
    print ('\n')
    torch.save(net, 'count.pth') # save model each epoch

To follow the training progress more easily, we can use the `tqdm` package (install it with `pip install tqdm`).

In [None]:
from tqdm import tqdm

def evaluate(model_eval, loader_eval, criterion_eval):
    model_eval.eval() # set the network in evaluation mode
    loss_eval = 0
    correct = 0.
    pbar = tqdm(total = len(loader_eval), desc='Evaluation', ncols=100)
    with torch.no_grad():
        for data, target in loader_eval:
            data, target = data.to(DEVICE), target.to(DEVICE)
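            output = model_eval(data) # use the model passed in, not the global net
            # with the default reduction='mean' the criterion returns the batch mean,
            # so weight it by the batch size before summing
            loss_eval += criterion_eval(output, target).item() * data.size(0)

            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()
            pbar.update(1)
    pbar.close()

    num_samples = len(loader_eval.dataset)
    loss_eval = loss_eval / num_samples # mean loss per sample
    accuracy = correct / num_samples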
    response = {'loss': loss_eval, 'acc': accuracy}
    return response

for epoch_idx in range(NUM_EPOCHS):
    net.train() # evaluate() switches to eval mode, so switch back every epoch
    pbar = tqdm(total = len(train_dataloader), desc='Train - Epoch {}'.format(epoch_idx), ncols=100)
    for batch_idx, (data, target) in enumerate(train_dataloader):
        data, target = data.to(DEVICE), target.to(DEVICE)
        optimizer.zero_grad()

        output = net(data)

        loss = criterion(output, target)

        loss.backward()
        optimizer.step()
        pbar.update(1)
    pbar.close()

    # add evaluation below
    train_resp = evaluate(net, train_dataloader, criterion)
    eval_resp = evaluate(net, test_dataloader, criterion)

    print ('-*-*-*-*-*- Epoch {} -*-*-*-*-*-'.format(epoch_idx))
    print ('Train Loss: {:.6f}\t'.format(train_resp['loss']))
    print ('Train Acc: {:.6f}\t'.format(train_resp['acc']))
    print ('Eval Loss: {:.6f}\t'.format(eval_resp['loss']))
    print ('Eval Acc: {:.6f}\t'.format(eval_resp['acc']))
    print ('\n')
    torch.save(net, 'count.pth') # save model each epoch
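
The loop above saves the whole model object with `torch.save(net, ...)`. A commonly used alternative is to save only the parameters through `state_dict()` and load them into a freshly built model. The sketch below assumes a stand-in `nn.Linear` model and a hypothetical file name `checkpoint.pth` in a temporary directory.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)   # stand-in for the trained network
path = os.path.join(tempfile.mkdtemp(), 'checkpoint.pth')   # hypothetical file name

torch.save(model.state_dict(), path)   # save only the parameters

restored = nn.Linear(4, 2)             # rebuild the same architecture first
restored.load_state_dict(torch.load(path))
restored.eval()                        # switch to evaluation mode for inference

same = all(torch.equal(a, b) for a, b in zip(model.state_dict().values(),
                                             restored.state_dict().values()))
print(same)   # True
```

Saving the `state_dict` keeps the checkpoint independent of how the model class is imported, at the cost of having to rebuild the architecture before loading.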