# Series 04. Part II - Kaggle Workflow using Pytorch: Kannada MNIST

Welcome to the series. This is the second part of tutorial series #04. The overview of tutorial series #04 is as follows:

- **Part I  :** Convolutional Neural Network  [[notebook](#04.a.Convolutional_Neural_Net)]
- **Part II :** Hands-on with Kannada MNIST dataset using Pytorch (this notebook)
- **Part III:** Pytorch model training and deployment in AWS Sagemaker [[notebook](#04.b.Kaggle_Kannada_Pytorch)]

## Table of Contents

1. [__Getting Started__](#100)
    1. [__Background__](#110)
    2. [__Goal__](#120)
    3. [__Packages__](#130)
2. [__Meet and Greet Data__](#200)
    1. [__Downloading Kaggle Dataset__](#210)
    2. [__Loading Dataset__](#220)
    3. [__Visualising Dataset Distribution__](#230)
    4. [__Cross Validation__](#240)
3. [__Pytorch Dataset Object & Data Loader__](#300)
    1. [__Pytorch Dataset Object__](#310)
    2. [__Image Transformation Pipeline__](#320)
    3. [__Pytorch Data loader__](#330)
    4. [__Visualising Kannada Image__](#340)
4. [__Building CNN Model__](#400)
    1. [__Designing CNN using Pytorch__](#410)
    2. [__Constructing Neural Network Object__](#420)
5. [__Training Session__](#500)
    1. [__Evaluation Metric__](#510)
    2. [__Model Training__](#520)
    3. [__Model Evaluation__](#530)
    4. [__Run Complete Training Session__](#540)
    5. [__Visualising Training Result__](#550)
    6. [__Writing Submission to Kaggle__](#560) [Optional]

6. [__REFERENCES__](#600)

---

## 1. Getting Started


### 1.1 Background

Kannada is a language spoken predominantly by people of Karnataka in southwestern India. The language has roughly 45 million native speakers and is written using the Kannada script. [Wikipedia](https://en.wikipedia.org/wiki/Kannada)


![Kannada](images/kannada_banner.png)

Many businesses and banks in India need Optical Character Recognition (OCR) that can read handwritten Kannada digits on cheques and other written documents so that important tasks can be automated. Even a simple digit recogniser can help businesses whose applications rely on reading handwritten digits, for example by automating the validation of thousands of bank cheque transactions.

Vinay Uday Prabhu collected a dataset of handwritten Kannada digits and kindly shared it for experimentation: Prabhu, Vinay Uday. "Kannada-MNIST: A new handwritten digits dataset for the Kannada language." arXiv preprint [arXiv:1908.01242 (2019)](https://arxiv.org/abs/1908.01242).

### 1.2 Goal
This tutorial shows you how to build a convolutional neural network using PyTorch. The dataset we are using is the [Kannada MNIST dataset](https://www.kaggle.com/c/Kannada-MNIST), a playground dataset from Kaggle. Here you can expect to gain insight into:
- Using PyTorch for data science modelling
- The workflow of participating in a data science competition on the [Kaggle](https://www.kaggle.com/) platform
- Using a [Convolutional Neural Network](http://cs231n.stanford.edu/) to model an image classification problem

For more information about PyTorch, please visit the [official PyTorch website](https://pytorch.org/).

### 1.3 Packages

We will also set up our project by importing the libraries and modules that we need.

In [15]:
%%bash
pip3 install --user kaggle





In [2]:
# standard library
import os, sys
import math
import json, logging, argparse

In [3]:
# data analytics library
import numpy as np
import pandas as pd

In [4]:
# Pytorch Library
import torch
import torch.nn as nn
import torch.distributed as dist
import torch.optim as optim
import torch.nn.functional as F

# Pytorch Utilities and Vision Library
from torch.utils.data import Dataset, DataLoader
from torch.utils.data.distributed import DistributedSampler

from torchvision import datasets, transforms

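try:
    from torchsummary import summary
except ImportError:
    # fallback when torchsummary is not installed: just print the module
    def summary(model, *args, **kwargs):
        print(model)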

In [5]:
# visualisation
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

In [6]:
%%html
<style>
    table {
        display: inline-block
    }
</style>

In [7]:
from IPython.display import HTML
import random


def hide_toggle(for_next=False):
    this_cell = """$('div.cell.code_cell.rendered.selected')"""
    next_cell = this_cell + '.next()'

    toggle_text = 'Toggle show/hide'  # text shown on toggle link
    target_cell = this_cell  # target cell to control with toggle
    js_hide_current = ''  # bit of JS to permanently hide code in current cell (only when toggling next cell)

    if for_next:
        target_cell = next_cell
        toggle_text += ' next cell'
        js_hide_current = this_cell + '.find("div.input").hide();'

    js_f_name = 'code_toggle_{}'.format(str(random.randint(1,2**64)))

    html = """
        <script>
            function {f_name}() {{
                {cell_selector}.find('div.input').toggle();
            }}

            {js_hide_current}
        </script>

        <a href="javascript:{f_name}()">{toggle_text}</a>
    """.format(
        f_name=js_f_name,
        cell_selector=target_cell,
        js_hide_current=js_hide_current, 
        toggle_text=toggle_text
    )
    return HTML(html)

hide_toggle()

In [8]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

## 2. Meet and Greet Data: Kaggle Kannada MNIST
_(Duration: 20 min)_

In [9]:
data_dir = 'data/Kannada'

### 2.1 Downloading Kaggle Dataset

In this workshop, we will quickly demonstrate how to download Kaggle datasets into our Jupyter environment. Kaggle has been, and remains, the de facto platform for trying your hand at data science projects. The platform hosts a large collection of free datasets for machine learning projects.

In [18]:
!python3 -m pip uninstall -y kaggle

Uninstalling kaggle-1.5.6:
  Successfully uninstalled kaggle-1.5.6


In [12]:
%%bash
## Copy API key file to where Kaggle expects it
## Make sure to download the kaggle key file next to this notebook
mkdir -p ~/.kaggle
cp kaggle.json ~/.kaggle/kaggle.json && chmod 600 ~/.kaggle/kaggle.json

Next, we download the data using the Kaggle client, which makes an API call to stream the data. Once we receive the data, we unzip it into the folder `data/Kannada/raw`.

The Kaggle API authenticates us using the credential file. The download begins when we run

```
$ kaggle competitions download -c Kannada-MNIST
```

In [16]:
%%bash
## Download Kannada MNIST from Kaggle
kaggle competitions download -c Kannada-MNIST

bash: line 2: kaggle: command not found


Finally, once the download finishes, we unzip the data into a folder called `data/Kannada/raw`. We also create a directory `data/Kannada/processed` to save our processed dataset later.

In [None]:
%%bash
## Create our data directory
mkdir -p data/Kannada/raw
mkdir -p data/Kannada/processed

In [None]:
%%bash
## Unzip to data/Kannada directory
unzip Kannada-MNIST.zip -d data/Kannada/raw

It's always good practice to keep the raw data we just downloaded in a separate folder called `raw`. Later in the process, we will have pre-processed/cleaned versions of our dataset that we want to store separately from the raw data.

Now let's check the content of our dataset in the `raw` folder. We see several files here:

In [None]:
# Check the data folder content
for dirname, _, filenames in os.walk(data_dir):
    for filename in filenames:
        print('data at: ' + os.path.join(dirname, filename))

### 2.2 Loading Dataset

#### Dataset Files

We get 4 files after downloading the data from Kaggle. According to the official Kaggle documentation, the files are as follows:
- `train.csv`: the training set. The first column of every row is the label; the remaining 784 values in the row are the pixel values of the flattened image.
- `Dig-MNIST.csv`: similar to the training set in `train.csv`. The contributor kindly provides this set so that we have a validation/testing set before we make a submission.
- `test.csv`: the submission set. Unlike `train.csv` and `Dig-MNIST.csv`, it doesn't come with labels because it is meant to be used for our competition submission. We will refer to the data from this file as the *submission set* from now on.
- `sample_submission.csv`: a sample submission file in the correct format.

Let's read our dataset.

In [None]:
from sklearn.model_selection import train_test_split

# Load data
train = pd.read_csv(os.path.join(data_dir, 'raw/train.csv'))
test = pd.read_csv(os.path.join(data_dir, 'raw/Dig-MNIST.csv'))  # this will be our test set for evaluation
submission_set = pd.read_csv(os.path.join(data_dir, 'raw/test.csv')).iloc[:,1:]  # drop the first column; this set is only for submission

# Separate train data and labels
train_data = train.drop('label', axis=1)
train_targets = train['label']

# Separate test data and labels
test_images = test.drop('label', axis=1)
test_labels = test['label']

* Each handwritten image has a dimension of $28$x$28$ pixels.
* The dataset has already been flattened, so each image is stored as 784 pixel values.
* In total, we have $60000$ images in the training set.

#### Overview

Let's look at the content of the files. There are a few things to note about our data:
- Each row of the file is a flattened image that was originally 28 x 28 pixels.
- Each image is flattened into a row of 784 values and saved in a `.csv` file.
- The images are grayscale, which means they have only one channel.
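As a minimal sketch of the layout described above, one row can be split into its label and reshaped back into a 28 x 28 image. The row here is synthetic (not taken from `train.csv`), so the values are purely illustrative.

```python
import numpy as np

IMG_SIDE = 28  # images are 28 x 28 pixels

# Synthetic stand-in for one CSV row: label first, then 784 pixel values.
row = np.concatenate(([7], np.arange(IMG_SIDE * IMG_SIDE) % 256))

label = int(row[0])                         # first column is the digit label
pixels = row[1:].astype(np.uint8)           # remaining 784 values are pixels
image = pixels.reshape(IMG_SIDE, IMG_SIDE)  # row-major reshape to 28 x 28

print(label, pixels.shape, image.shape)
```

Because the reshape is row-major, pixel number 28 of the flattened row becomes the first pixel of the second image row.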

In [None]:
train_data.head(10).iloc[:,490:520]

### 2.3 Visualising Dataset Distribution

* It is important to know how the data is distributed across the labels.
* This dataset is __evenly__ distributed across the classes, as you can see below.
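To illustrate what an evenly distributed label set means numerically, here is a small NumPy sketch. The labels are hypothetical (10 classes with 6 samples each) and are not read from the Kannada files.

```python
import numpy as np

# Hypothetical labels for illustration only: 10 classes, 6 samples each.
labels = np.repeat(np.arange(10), 6)

counts = np.bincount(labels, minlength=10)  # samples per class
fractions = counts / counts.sum()           # normalised, sums to 1

# A balanced dataset has (nearly) equal fractions for every class.
is_balanced = np.allclose(fractions, 1 / 10)
print(fractions, is_balanced)
```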

In [None]:
train_dist = train_targets.value_counts(normalize=True)
test_dist = test_labels.value_counts(normalize=True)

# the submission set has no labels, so only the labelled sets are plotted
dists_to_plot = [('Trainset', train_dist), ('Testset', test_dist)]

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(14, 5))

for i, (name, dist) in enumerate(dists_to_plot):
    sns.barplot(x=dist.index, y=dist.values, ax=axes[i])
    axes[i].set_title(name + ' distribution');

### 2.4 Cross Validation: Training - Validation - Test Split

**Splitting the training set into train and validation sets** - To measure how well the model generalises, we fit the model on the training set and tune it according to the error on the validation set. We then determine the final performance of the model on the test set.

This kernel is prepared on a Kaggle competition dataset. Kaggle gives us a training set for training the model and a test set without labels for prediction. Because we don't have those labels, we don't know the final performance on Kaggle's test set until we submit our predictions.

**Why do we need a testing set? Isn't the validation set enough?** - Because we develop the model according to its validation performance, the model becomes biased towards the validation set, which is a form of overfitting. Evaluating on a separate testing set after validation lets us check whether our modelling has overfitted to the validation set.

To evaluate the model properly, we split our training set into training and validation sets. For testing, we use `Dig-MNIST.csv`, which was prepared for the purpose of evaluation.

A commonly preferred split of the training set is:
  * Training set -  $80$% 
  * Validation set -  $20$%

How do we split the training and validation sets? For a simple splitting scheme, we can use `sklearn.model_selection.train_test_split` as follows. What about the testing set? We already have it in the `test_images` dataframe, so we only need to split the training set into training and validation sets.
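To make the 80/20 split concrete, here is a minimal NumPy sketch of a shuffled index split. The helper `split_indices` is hypothetical and only illustrates the idea; it is not how `train_test_split` is implemented.

```python
import numpy as np

def split_indices(n_samples, val_fraction=0.2, seed=0):
    """Return shuffled (train_idx, val_idx) index arrays."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)         # shuffle all row indices
    n_val = int(round(n_samples * val_fraction))
    return order[n_val:], order[:n_val]

train_idx, val_idx = split_indices(60000, val_fraction=0.2)
print(len(train_idx), len(val_idx))  # 48000 12000
```

Fixing the seed makes the split reproducible between runs, which helps when comparing models trained at different times.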

In [None]:
# Train Test Split for evaluation
train_images, val_images, train_labels, val_labels = train_test_split(train_data, 
                                                                     train_targets, 
                                                                     test_size=0.2)

In [None]:
# Reset Index
train_images.reset_index(drop=True, inplace=True)
train_labels.reset_index(drop=True, inplace=True)

val_images.reset_index(drop=True, inplace=True)
val_labels.reset_index(drop=True, inplace=True)

test_images.reset_index(drop=True, inplace=True)
test_labels.reset_index(drop=True, inplace=True)

After splitting, our dataset is organised as follows:

| Split      | Number of data | Shape of Image | Number of labels
| ---:       | ---:           | ---:           |  ---:
|Training    |  48000         |  1 x 28 x 28   |  48000
|Validation  |  12000         |  1 x 28 x 28   |  12000
|Testing     |  10240         |  1 x 28 x 28   |  10240
|Submission  |  5000          |  1 x 28 x 28   |  5000

In [None]:
print('Shape of training set   :', train_images.shape)
print('Shape of validation set :', val_images.shape)
print('Shape of testing set    :', test_images.shape)

print('Shape of submission set :', submission_set.shape)

### Saving Dataset Split

Now that we have done several splitting steps, let's save the data we have at this point. This is good practice: the next time you continue your work, you don't have to run the code from the beginning. Writing the data could take some time.

In [None]:
processed_data_dir = os.path.join(data_dir, 'processed/')

# Save data to local folder first
train_images.to_csv(processed_data_dir + 'train.csv', index=False, header=False)
train_labels.to_csv(processed_data_dir + 'train_labels.csv', index=False, header=False)

val_images.to_csv(processed_data_dir + 'validation.csv', index=False, header=False)
val_labels.to_csv(processed_data_dir + 'validation_labels.csv', index=False, header=False)

test_images.to_csv(processed_data_dir + 'test.csv', index=False, header=False)
test_labels.to_csv(processed_data_dir + 'test_labels.csv', index=False, header=False)

submission_set.to_csv(processed_data_dir + 'submission.csv', index=False, header=False)

## 3. Pytorch Dataset Object & Data Loader

A lot of effort in solving any machine learning problem goes into preparing the data. PyTorch provides many tools to make data loading easy and, hopefully, to make your code more readable. Let's see how we can load and preprocess/augment data from our Kannada dataset.

### 3.1 Pytorch Dataset Object

Our raw dataset came in `.csv` format. The table is represented by a `pd.DataFrame` object, which supports spreadsheet-like table manipulation.

However, our Kannada dataset is meant to be a set of images, not a table. If you look carefully, each row of the table stores the pixels of one flattened image. Representing images as a `pd.DataFrame` table is impractical because it makes image operations such as filtering, rotation, and convolution difficult.

Let's see how we can solve this problem and automate the conversion in a PyTorch `Dataset` object.

In [None]:
class KannadaDataSet(Dataset):
    """ Representation of our Kannada dataset as a whole.
    This object dictates how to access a single image and its label
    from our data pool, and converts the flattened row into the matrix
    format that represents an image.
    """
    def __init__(self, X, labels, transforms=None):
        """
        Arguments:
        --------------------
            .. X (pd.DataFrame): table representation of flattened Kannada image pixels
            .. labels (array-like or None): ground truth label of each digit, ranging from 0 to 9.
               Pass None for the unlabelled submission set
            .. transforms (callable, optional): transformation applied to each image
        """
        self.X = X
        self.labels = labels
        self.transforms = transforms
         
    def __len__(self):
        return (len(self.X))
    
    def __getitem__(self, i):
        data = self.X.iloc[i,:]
        data = np.array(data).astype(np.uint8).reshape(IMGSIZE,IMGSIZE,1)
        
        if self.transforms:
            data = self.transforms(data)
            
        if self.labels is not None:
            # for train set, val set, and test set
            return (data, self.labels[i])
        else:
            # for kaggle submission
            # since submission set will not have labels
            return data

### 3.2 Image Transformation Pipeline

As part of the data science workflow, we need to write some preprocessing code. One issue with our dataset right now is that it needs to be converted from `np.ndarray` matrix form to `torch.Tensor`.

In addition, we would like to introduce some data augmentation, such as rotation and translation, to increase the variety of our dataset. This way our model sees more varied data and becomes more robust to image translation and rotation.

Let's define four transforms:
- **`ToPILImage`**: converts our matrix to a PIL Image, which allows image-based operations.
- **`RandomCrop`**: crops the image at a random location. With a crop size equal to the 28 x 28 image size and no padding, it returns the image unchanged.
- **`RandomAffine`**: rotates the image randomly between -5 and 5 degrees and translates it randomly by up to 10% of its width and height. This is data augmentation.
- **`ToTensor`**: converts the image to a `torch.Tensor`, moving the channel axis first and scaling pixel values to the range [0, 1].
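The layout change and rescaling performed by the final tensor conversion can be mimicked in plain NumPy. This is an illustrative sketch on a synthetic image, not torchvision's implementation.

```python
import numpy as np

# Synthetic grayscale image in height x width x channel layout, uint8 in [0, 255].
hwc = np.zeros((28, 28, 1), dtype=np.uint8)
hwc[10, 5, 0] = 255

# Move the channel axis first (H x W x C -> C x H x W) and rescale to [0, 1].
chw = np.transpose(hwc, (2, 0, 1)).astype(np.float32) / 255.0

print(chw.shape, chw.min(), chw.max())
```

The channel-first layout is what `nn.Conv2d` expects once a batch dimension is added in front.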

In [None]:
IMGSIZE = 28

# Transformations for the train
train_trans = transforms.Compose(([
    transforms.ToPILImage(),
    transforms.RandomCrop(IMGSIZE),
    transforms.RandomAffine(degrees=5, translate=(0.1, 0.1)),
    transforms.ToTensor(), # automatically divide pixels by 255
]))

# Transformations for the validation & test sets
val_trans = transforms.Compose(([
    transforms.ToPILImage(),
    transforms.ToTensor(), # automatically divide pixels by 255
]))

### 3.3 Pytorch Data Loader

Loading data may take a few moments. You may also choose to increase the `batch_size` if you want to load more data at a time.
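To see how `batch_size` determines the number of mini-batches per epoch, here is a plain-Python sketch. The generator `iterate_batches` is a hypothetical helper for illustration; the actual `DataLoader` also handles shuffling and collation.

```python
import math

def iterate_batches(n_samples, batch_size):
    """Yield (start, stop) index ranges, one per mini-batch."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)

n_train, batch_size = 48000, 64
batches = list(iterate_batches(n_train, batch_size))

# Number of mini-batches per epoch is ceil(n_samples / batch_size).
assert len(batches) == math.ceil(n_train / batch_size)
print(len(batches))  # 750
```

When the dataset size is not a multiple of the batch size, the last range is simply shorter than the others.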

In [None]:
batch_size = 64

# Initialise dataset object for each set
train_data = KannadaDataSet(train_images, train_labels, train_trans)
val_data   = KannadaDataSet(val_images, val_labels, val_trans)
test_data  = KannadaDataSet(test_images, test_labels, val_trans)
submission_data = KannadaDataSet(submission_set, None, val_trans)

# Define Dataloader for each set
train_loader = DataLoader(train_data,
                          batch_size=batch_size,
                          shuffle=True)

val_loader = DataLoader(val_data, 
                        batch_size=batch_size, # batch_size=1000
                        shuffle=False)

test_loader = DataLoader(test_data,
                         batch_size=batch_size, # batch_size=1000
                         shuffle=False)

# for kaggle submission
submission_loader = DataLoader(submission_data,
                               batch_size=batch_size,
                               shuffle=False)

### Visualise Image in Training Batch

The first step in a classification task is to take a look at the data, make sure it is loaded in correctly, then make any initial observations about patterns in that data.

In [None]:
# obtain one batch of training images
dataiter = iter(train_loader)
images, labels = next(dataiter)  # iterators no longer expose a .next() method
images = images.numpy()

# plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize=(25, 6))
for idx in np.arange(16):
    ax = fig.add_subplot(2, 16 // 2, idx+1, xticks=[], yticks=[])  # grid size must be an integer
    ax.imshow(np.squeeze(images[idx]), cmap='gray')
    ax.set_title('digit ' + str(labels[idx].item()), fontsize=16)  # .item() gets single value in scalar tensor

### Visualise Image in Detail

Now let's look at one image from the Kannada MNIST dataset in detail. Notice that the pixel values range from $0$ to $1$, so no further rescaling is required in the preprocessing step.

In [None]:
img = np.squeeze(images[1])

fig = plt.figure(figsize = (12,12)) 
ax = fig.add_subplot(111)
ax.imshow(img, cmap='gray')
width, height = img.shape
thresh = img.max()/2.5

for x in range(width):
    for y in range(height):
        val = round(img[x][y],2) if img[x][y] !=0 else 0
        ax.annotate(str(val), xy=(y,x),
                    horizontalalignment='center',
                    verticalalignment='center',
                    color='white' if img[x][y]<thresh else 'black')

ax.set_title('Kannada Digit in detail: label %d' % labels[1].item());

---

## 4. Building CNN Model
_(Duration: 25 min)_

The building stage is mainly where you specify the architecture of the model.

Here you decide the filter size and padding type used in the convolution operations, and add pooling, batch normalisation, dropout, and activation function layers.


### 4.1 Designing CNN using Pytorch

In [None]:
class KannadaCNN(nn.Module):
    """ Simple Convolutional Neural Network to classify 
    Kannada handwritten digit
    """
    def __init__(self, drop_p=0.4, num_classes=10):
        """ Initialise and build network layers 
        Arguments
        --------------------
            .. drop_p (float, range[0, 1.]): constant probability in our dropout layer
            .. num_classes (int): number of target classes in our data
        """
        super().__init__()
        self.num_classes = num_classes
        
        # First hidden layer
        self.conv2d_0 = nn.Conv2d(1, 64, kernel_size=5, padding=2)
        self.convbn_0 = nn.BatchNorm2d(num_features=64)

        self.conv2d_1 = nn.Conv2d(64, 64, kernel_size=5, padding=2)
        self.convbn_1 = nn.BatchNorm2d(num_features=64)

        self.pool_1 = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        self.drop_1 = nn.Dropout2d(p=drop_p)

        # Second hidden layer
        self.conv2d_2 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.convbn_2 = nn.BatchNorm2d(num_features=128)

        self.conv2d_3 = nn.Conv2d(128, 128, kernel_size=3, padding=1)
        self.convbn_3 = nn.BatchNorm2d(num_features=128)

        self.pool_2 = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        self.drop_2 = nn.Dropout2d(p=drop_p)

        # Third hidden layer
        self.conv2d_4 = nn.Conv2d(128, 256, kernel_size=3, padding=1)
        self.convbn_4 = nn.BatchNorm2d(num_features=256)
        
        self.conv2d_5 = nn.Conv2d(256, 256, kernel_size=3, padding=1)
        self.convbn_5 = nn.BatchNorm2d(num_features=256)

        self.pool_3 = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        self.drop_3 = nn.Dropout(p=drop_p)

        # Dense fully connected layer
        self.dense_linear_1 = nn.Linear(256*3*3, 512)
        self.drop_4 = nn.Dropout(p=drop_p)

        self.dense_linear_2 = nn.Linear(512, 256)
        self.drop_5 = nn.Dropout(p=drop_p)

        self.dense_linear_3 = nn.Linear(256, 128)
        self.out_layer = nn.Linear(128, num_classes)

    def forward(self, x):
        """ Define the feed-forward flow of our neural network
        """

        x = self.conv2d_0(x)
        x = self.convbn_0(x)
        x = F.leaky_relu(x)
        
        x = self.conv2d_1(x)
        x = self.convbn_1(x)
        x = F.leaky_relu(x)

        x = self.pool_1(x)
        x = self.drop_1(x)

        x = self.conv2d_2(x)
        x = self.convbn_2(x)
        x = F.leaky_relu(x)

        x = self.conv2d_3(x)
        x = self.convbn_3(x)
        x = F.leaky_relu(x)

        x = self.pool_2(x)
        x = self.drop_2(x)

        x = self.conv2d_4(x)
        x = self.convbn_4(x)
        x = F.leaky_relu(x)
        
        x = self.conv2d_5(x)
        x = self.convbn_5(x)
        x = F.leaky_relu(x)
        
        x = self.pool_3(x)
        x = self.drop_3(x)

        x = x.view(-1, 256*3*3)
        x = self.dense_linear_1(x)
        x = F.relu(x)
        x = self.drop_4(x)
        
        x = self.dense_linear_2(x)
        x = F.relu(x)
        x = self.drop_5(x)
        
        x = self.dense_linear_3(x)
        x = F.relu(x)

        out = self.out_layer(x)
        return out

hide_toggle()
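The `256*3*3` input size of the first dense layer follows from how each block changes the spatial size. The sketch below applies the standard output-size formula, floor((size + 2 * padding - kernel) / stride) + 1, to trace a 28 x 28 input through the three blocks. `conv_out` is a hypothetical helper written for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28
# (kernel, padding) of the two convolutions in each of the three blocks
for block, (kernel, padding) in enumerate([(5, 2), (3, 1), (3, 1)], start=1):
    size = conv_out(size, kernel, padding=padding)  # first conv keeps the size
    size = conv_out(size, kernel, padding=padding)  # second conv keeps the size
    size = conv_out(size, kernel=2, stride=2)       # 2x2 max-pool halves it
    print('after block', block, '->', size)

flat_features = 256 * size * size  # 256 channels in the last block
print(flat_features)  # 2304
```

The spatial size goes 28 -> 14 -> 7 -> 3, so the flattened feature vector has 256 * 3 * 3 = 2304 values.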

### 4.2 Construct Neural Network Object

Now that we have defined our CNN model class using `nn.Module` and added all the layers, let's bring it to life by constructing the object so that we can use it to make predictions.

In [None]:
# construct our CNN and move it to the runtime device
net = KannadaCNN().to(device)

# optimiser and loss function
optimiser = optim.Adam(net.parameters(), lr=5e-4)
criterion = nn.CrossEntropyLoss()

# display model summary
summary(net, input_size=(1,IMGSIZE,IMGSIZE))  # IMGSIZE = 28

---

## 5. Training Session

Now let's fit our convolutional neural network by using our training set. For every epoch, we will evaluate the performance of our network on validation set and monitor the result. Finally, we perform one more evaluation on our testing set to make sure our setup is not biased towards our validation set.

### 5.1 Evaluation Metric
We define a function to help calculate the accuracy metric. When building machine learning models, we often focus on accuracy: getting the right class for an image or the right category for a paragraph of text.

$$ \text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Number of all images in dataset}}
$$

However, measuring only the accuracy of the highest-probability prediction limits our understanding of the network and the areas where it can be applied.

#### Concept of Top-N Accuracy Metric

Let's define two terms here.

**Top-1 accuracy** - In a classification problem, top-1 accuracy takes the class with the maximum value among the final softmax outputs, i.e. the class the network is most confident about, as the prediction for the input. A prediction counts as correct only if this class matches the target label.

$$ \text{Top1} = \frac{\text{Number of times the correct class has the highest probability}}{\text{Number of all images in dataset}}
$$

**Top-N accuracy** - Top-N accuracy measures how often the target class falls within the N highest values of the softmax distribution. For the top-5 score, we check whether the target label is one of the 5 predictions with the highest probabilities.

$$ \text{Top5} = \frac{\text{Number of times the correct class is among the 5 highest probabilities}}{\text{Number of all images in dataset}}
$$

If you want to know more about evaluation metrics in machine learning, check out a nice course on [Coursera](https://www.coursera.org/lecture/recommender-metrics/rank-aware-top-n-metrics-Wk98r) from the University of Minnesota.
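The two definitions can be checked by hand on a tiny example. The following NumPy sketch uses hypothetical scores for 4 samples over 3 classes; `topk_accuracy` is an illustrative helper, separate from the PyTorch function used in this notebook.

```python
import numpy as np

def topk_accuracy(scores, targets, k=1):
    """Percentage of rows whose target is among the k highest scores."""
    topk = np.argsort(scores, axis=1)[:, -k:]       # indices of the k largest scores
    hits = (topk == targets[:, None]).any(axis=1)   # is the target in the top k?
    return 100.0 * hits.mean()

# Hypothetical scores for 4 samples over 3 classes.
scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2],
                   [0.2, 0.3, 0.5],
                   [0.6, 0.3, 0.1]])
targets = np.array([1, 1, 2, 2])

print(topk_accuracy(scores, targets, k=1))  # 50.0
print(topk_accuracy(scores, targets, k=2))  # 75.0
```

The second sample is wrong under top-1 but counted as correct under top-2, which is why top-N accuracy can never be lower than top-1 accuracy.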

In [None]:
def accuracy(output, target, topk=(1,)):
    """Computes the top-k accuracy (in %) for the specified values of k"""
    maxk = max(topk)
    batch_size = target.size(0)

    # indices of the maxk highest scores, shape (batch_size, maxk)
    _, pred = output.topk(maxk, 1, True, True)
    pred = pred.t()
    correct = pred.eq(target.view(1, -1).expand_as(pred))

    res = []
    for k in topk:
        # reshape (not view): the slice of `correct` may be non-contiguous
        correct_k = correct[:k].reshape(-1).float().sum(0)
        res.append(correct_k.mul_(100.0 / batch_size))
    return res

### 5.2 Model Fitting

Let's define what happens when we fit our network to the data.

1. Load the data one mini-batch at a time.
2. Perform lightweight data preprocessing/transformation (under the hood of `torch.utils.data.DataLoader`).
3. Feed the data forward through our `CNN` and let it classify which digit the image belongs to. The network makes its guess by outputting a score for each class.
4. Calculate the error/loss from the ground truth and our `CNN` prediction.
5. Backpropagate the loss and perform a gradient descent step to update the weights and parameters of our `CNN`, so that it makes a better guess next time.
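The steps above can be sketched without PyTorch on a toy problem. The example below trains a softmax classifier with hand-written gradients on hypothetical 2-D data; it stands in for the CNN purely to show the loop structure, and step 2 (transformation) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (hypothetical): 2 well-separated classes in 2 dimensions.
X = np.vstack([rng.normal(-2, 0.5, (100, 2)), rng.normal(2, 0.5, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

W = np.zeros((2, 2))   # weights: features x classes
b = np.zeros(2)        # one bias per class
lr, batch_size = 0.1, 20

def forward(xb):
    logits = xb @ W + b
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)  # softmax

for epoch in range(5):
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):       # 1. load a mini-batch
        idx = order[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        probs = forward(xb)                          # 3. feed forward
        loss = -np.log(probs[np.arange(len(yb)), yb]).mean()  # 4. cross-entropy
        grad = probs.copy()
        grad[np.arange(len(yb)), yb] -= 1            # d(loss)/d(logits)
        grad /= len(yb)
        W -= lr * (xb.T @ grad)                      # 5. gradient descent update
        b -= lr * grad.sum(axis=0)

accuracy = (forward(X).argmax(axis=1) == y).mean()
print(accuracy)
```

In PyTorch, the gradient computation and the update in step 5 are handled by `loss.backward()` and `optimizer.step()` respectively.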

In [None]:
def train_helper(train_loader, model, optimizer, criterion,
                  epoch, device='cpu', log_interval=25):
    # set to training mode
    model.train()

    # training result to record
    train_loss = 0.0
    train_top1 = 0.0
    train_top5 = 0.0

    for batch_idx, (data, target) in enumerate(train_loader, start=1):
        # convert tensor for current runtime device
        data, target = data.to(device), target.to(device)

        # reset optimiser gradient to zero
        optimizer.zero_grad()

        # feed forward
        out = model(data)
        
        # calculate loss and optimise network params
        loss = criterion(out, target)
        loss.backward()
        
        # optimize weight to account for loss/gradient
        optimizer.step()

        # calculate training accuracy for top1 and top5
        top1, top5 = accuracy(out, target, topk=(1,5))

        # update result records
        train_top1 += top1.item()
        train_top5 += top5.item()
        train_loss += loss.item()

        # logging loss output to stdout
        if batch_idx % log_interval == 0:
            print('Train Epoch: {:03d} [{:05d}/{:05d} ({:2.0f}%)] | '
                  'Top1 Acc: {:4.1f} \t| Top5 Acc: {:4.1f} \t| Loss: {:.4f}'
                  .format(epoch, batch_idx * len(data), len(train_loader.sampler),
                      100 * batch_idx / len(train_loader),
                      top1, top5, loss.item()))

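    # display training result
    # note: loss.item() is already a per-batch mean, so dividing the summed value
    # by the dataset size gives a loss scaled down by roughly the batch size
    train_loss /= len(train_loader.dataset)
    train_top1 /= len(train_loader)  # average accuracy over mini-batches
    train_top5 /= len(train_loader)  # average accuracy over mini-batches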

    print('Training Summary Epoch: {:03d} | '
          'Average Top1 Acc: {:.2f}  | Average Top5 Acc: {:.2f} | Loss: {:.4f}'
          .format(epoch, train_top1, train_top5, train_loss))
    
    return train_loss, train_top1, train_top5 

### 5.3 Model Evaluation

For both validation and testing, the flow is the same except that we use images from a different set/split.

1. Load the data one mini-batch at a time.
2. Perform lightweight data preprocessing/transformation (under the hood of `torch.utils.data.DataLoader`).
3. Feed the data forward through our `CNN` and let it classify which digit the image belongs to. The network makes its guess by outputting a score for each class.
4. Calculate the error/loss, top-1 accuracy, and top-5 accuracy as our evaluation metrics.
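The helpers in this section average accuracy over mini-batches. When the last batch is smaller than the others, that average can differ from the per-sample accuracy. The NumPy sketch below uses hypothetical predictions to show the effect in an exaggerated form.

```python
import numpy as np

# Hypothetical per-sample correctness for 10 samples (1 = correct prediction).
correct = np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0])
batch_size = 4  # gives batches of sizes 4, 4 and 2

batch_acc = [correct[i:i + batch_size].mean()
             for i in range(0, len(correct), batch_size)]

mean_of_batches = 100 * np.mean(batch_acc)  # averages 100%, 100% and 0%
per_sample = 100 * correct.mean()           # 8 correct out of 10

print(round(mean_of_batches, 2), per_sample)
```

With thousands of images and only one short batch, the difference is small, but it is worth keeping in mind when comparing numbers across batch sizes.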

In [None]:
def test_helper(test_loader, model, criterion, 
                 epoch, device='cpu'):
    # set to evaluation mode (disables dropout, uses running batch-norm statistics)
    model.eval()
    
    test_loss = 0.0  # record testing loss
    test_top1 = 0.0
    test_top5 = 0.0

    # gradients are not needed for evaluation
    with torch.no_grad():
        for batch_idx, (data, target) in enumerate(test_loader, start=1):
            # move tensors to the current runtime device
            data, target = data.to(device), target.to(device)

            # feed forward
            out = model(data)

            # calculate loss
            loss = criterion(out, target)

            # calculate accuracy for top1 and top5
            top1, top5 = accuracy(out, target, topk=(1,5))

            # accumulate results
            test_top1 += top1.item()
            test_top5 += top5.item()
            test_loss += loss.item()

    # display validation/testing result
    test_loss /= len(test_loader.dataset)  # average loss over all images
    test_top1 /= len(test_loader)
    test_top5 /= len(test_loader)

    print('Test Summary Epoch: {:03d} | '
          'Average Top1 Acc: {:.2f}  | Average Top5 Acc: {:.2f} | Loss: {:.4f}'
          .format(epoch, test_top1, test_top5, test_loss))
    
    return test_loss, test_top1, test_top5

### 5.4 Running Complete Train Session

Finally, combining training and model evaluation in our flow, we run the model for several epochs. Below are the training **hyperparameters** you need to be aware of:

- **`epochs`**: the number of times our `CNN` sees the whole training set.
- **`batch size`**: the number of images processed in each training step, often called the mini-batch size.

In [None]:
# ----------------------------
# COMPLETE TRAINING SESSION
# ----------------------------

train_losses = []
val_losses = []

train_accuracies = []
val_accuracies = []

for epoch in range(1, 2 + 1):
    print('\n' + '-' * 100)
    # run session on training set
    train_loss, train_acc, _  = train_helper(train_loader, net, optimiser, criterion,
                                              epoch=epoch, device=device, log_interval=100)
    train_losses.append(train_loss)
    train_accuracies.append(train_acc)

    # run session on validation set
    val_loss, val_acc, _ = test_helper(val_loader, net, criterion, epoch, device=device)
    val_losses.append(val_loss)
    val_accuracies.append(val_acc)

# finally, run session testing set
print('\n' + 'Final Test Set Result:\n'+ '*' * 80)
test_helper(test_loader, net, criterion, epoch, device=device);

### 5.5 Visualise Training Session Results

In [None]:
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(14, 5))

# Plot Error training vs validation

axes[0].set_title('Losses over Epochs')
axes[0].plot(train_losses);
axes[0].plot(val_losses);

axes[0].set_ylabel('Error');
axes[0].set_xlabel('Epochs');

axes[0].set_ylim(0, 0.005);
axes[0].legend(labels=['train error', 'validation error']);

# Plot Accuracy training vs validation
axes[1].set_title('Accuracy over Epochs')
axes[1].plot(train_accuracies);
axes[1].plot(val_accuracies);

axes[1].set_ylabel('Accuracy (%)');
axes[1].set_xlabel('Epochs');

axes[1].set_ylim(80, 100);
axes[1].legend(labels=['train acc', 'validation acc']);
