# MNIST + MLP + CNN

You can open this notebook either within a supported container or in Google Colaboratory [here](https://colab.research.google.com/github/drinkingkazu/TRISEP2022ML/blob/master/NeuralNetworks/NN-01-CNN.ipynb).

In this notebook, we design and train a Convolutional Neural Network (CNN) for a hand-written digit classification task. We use the MNIST dataset, which contains 28x28 pixel images of hand-written digits (0 to 9, so 10 classification targets).

## Goals
1. Design a CNN and train it on MNIST

Let's start with the usual imports!

In [1]:
import matplotlib.pyplot as plt
import matplotlib as mpl
%matplotlib inline
mpl.rcParams['figure.figsize'] = [8, 6]
mpl.rcParams['font.size'] = 16
mpl.rcParams['axes.grid'] = True

import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
torch.device(device)
import numpy as np
SEED=12345
_=np.random.seed(SEED)
_=torch.manual_seed(SEED)

## Create MNIST Dataset
Following the previous notebook, let's create the train and test datasets and their dataloaders.

In [2]:
from torchvision import datasets, transforms
# Data file download directory
LOCAL_DATA_DIR = './mnist-data'
# Use prepared data handler from pytorch (torchvision)
train_dataset = datasets.MNIST(LOCAL_DATA_DIR, train=True, download=True,
                         transform=transforms.Compose([transforms.ToTensor()]))

train_loader = torch.utils.data.DataLoader(train_dataset,
                                           batch_size=32,
                                           shuffle=True,
                                           num_workers=4)

# Use prepared data handler from pytorch (torchvision)
test_dataset = datasets.MNIST(LOCAL_DATA_DIR, train=False, download=True,
                              transform=transforms.Compose([transforms.ToTensor()]))

test_loader = torch.utils.data.DataLoader(test_dataset,
                                          batch_size=32,
                                          shuffle=False,
                                          num_workers=4)

### Define train and test functions

In [3]:
from ipywidgets import IntProgress
from IPython.display import display
from torch.utils.tensorboard import SummaryWriter
import time

def run_train(model, loader,  
              num_iterations=100, log_dir='log',
              lr=0.001, optimizer='SGD', device=None):
    print("\nTraining...")
    tstart = time.time()
    if log_dir:
        writer = SummaryWriter(log_dir=log_dir)
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = getattr(torch.optim,optimizer)(model.parameters(),lr=lr)
    f = IntProgress(min=0,max=int(num_iterations/100),bar_style='info')
    display(f)
    
    iteration = 0
    while iteration < num_iterations:
        for data,label in loader:
            
            if device:
                data,label = data.to(device),label.to(device)

            loss = criterion(model(data), label)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            
            if log_dir: 
                writer.add_scalar('loss/train', loss.item(), iteration)
            if iteration%100 == 0:
                f.value += 1
            # Break if we have consumed all iteration counts
            iteration += 1
            if iteration >= num_iterations:
                break
    if log_dir:
        # Flush pending events so TensorBoard sees the full loss curve
        writer.close()
    print('done',time.time()-tstart,'[s]')

def run_test(model,loader,device=None):

    label_v, softmax_v = [],[]
    softmax = torch.nn.Softmax(dim=1)
    f = IntProgress(min=0,max=int(len(loader)),bar_style='info')
    display(f)
    
    with torch.set_grad_enabled(False):
        for data,label in loader:
            if device:
                data,label = data.to(device), label.to(device)
            label_v.append  ( label.detach().reshape(-1)   )
            softmax_v.append( softmax(model(data)).detach())
            f.value += 1
    return torch.concat(label_v).cpu().numpy(), torch.concat(softmax_v).cpu().numpy()


## Logistic regression with CNN

We now design a CNN for the same classification task. Let's define three convolution layers, each followed by a LeakyReLU activation and a MaxPool2d layer for downsampling.

In [4]:
class CNN(torch.nn.Module):
    def __init__(self,num_filters=16):
        
        super(CNN, self).__init__()
        # feature extractor CNN
        self._feature_extractor = torch.nn.Sequential(
            torch.nn.Conv2d(1,num_filters,3,padding=1),
            torch.nn.LeakyReLU(),
            torch.nn.MaxPool2d(2,2),
            torch.nn.Conv2d(num_filters,num_filters*2,3,padding=1),
            torch.nn.LeakyReLU(),
            torch.nn.MaxPool2d(2,2),
            torch.nn.Conv2d(num_filters*2,num_filters*4,3,padding=1),
            torch.nn.LeakyReLU(),
            torch.nn.MaxPool2d(7,7))
        # classifier MLP
        self._classifier = torch.nn.Linear(num_filters*4,10)

    def forward(self, x):
        # extract features
        features = self._feature_extractor(x)
        # flatten the 3d tensor (2d space x channels = features)
        features = features.view(-1, np.prod(features.size()[1:]))
        # classify and return
        return self._classifier(features)
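
To see why the classifier takes `num_filters*4` inputs, it can help to trace the tensor shape through the feature extractor. The following is a minimal sketch that rebuilds the same layer stack outside the class (with `num_filters=16`) and feeds it a dummy batch of zeros rather than real MNIST images; the variable names `layers` and `shapes` are only for illustration.

```python
import torch

# Same layer stack as the feature extractor above, with num_filters=16
num_filters = 16
layers = torch.nn.Sequential(
    torch.nn.Conv2d(1, num_filters, 3, padding=1),
    torch.nn.LeakyReLU(),
    torch.nn.MaxPool2d(2, 2),
    torch.nn.Conv2d(num_filters, num_filters * 2, 3, padding=1),
    torch.nn.LeakyReLU(),
    torch.nn.MaxPool2d(2, 2),
    torch.nn.Conv2d(num_filters * 2, num_filters * 4, 3, padding=1),
    torch.nn.LeakyReLU(),
    torch.nn.MaxPool2d(7, 7),
)

# Dummy batch: 32 single-channel 28x28 images (all zeros, not real MNIST)
x = torch.zeros(32, 1, 28, 28)
shapes = []
with torch.no_grad():
    for layer in layers:
        x = layer(x)
        shapes.append(tuple(x.shape))
        print(f"{type(layer).__name__:10s} -> {tuple(x.shape)}")
```

With `padding=1`, each 3x3 convolution keeps the spatial size, so only the pooling layers shrink the image: 28 to 14, 14 to 7, and finally 7 to 1. The last tensor has one spatial pixel per channel, which is why flattening yields exactly `num_filters*4` features.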

### Exercise 1

Now let's train this CNN model!

1. Train the CNN for 4000 iterations with the `Adam` optimizer on the CPU, measure the wall-clock time taken, and plot the loss.
2. Do the same using the GPU.


You should see that, this time, the GPU gives a good speed-up. This is because a CNN performs many independent multiplications of the kernel weights with local patches of the input, and these can run in parallel, which is exactly what a GPU is suited for.
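
One way to compare devices in isolation is to time a handful of optimizer steps on a fixed batch. The sketch below is not the exercise solution: it uses a hypothetical helper `time_training_steps`, a much smaller stand-in model, and a synthetic random batch instead of MNIST, so the absolute numbers will differ from what `run_train` reports. It calls `torch.cuda.synchronize()` around the timed region because CUDA kernels run asynchronously.

```python
import time
import torch

def time_training_steps(device, num_steps=20, batch_size=32):
    """Time a few Adam steps of a tiny conv model on `device` (illustrative helper)."""
    model = torch.nn.Sequential(
        torch.nn.Conv2d(1, 16, 3, padding=1),
        torch.nn.LeakyReLU(),
        torch.nn.MaxPool2d(28, 28),   # 28x28 -> 1x1
        torch.nn.Flatten(),
        torch.nn.Linear(16, 10),
    ).to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    criterion = torch.nn.CrossEntropyLoss()
    # Synthetic stand-in for one MNIST batch
    data = torch.rand(batch_size, 1, 28, 28, device=device)
    label = torch.randint(0, 10, (batch_size,), device=device)

    def step():
        loss = criterion(model(data), label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    step()  # warm-up step, excluded from the timing
    if device.type == 'cuda':
        torch.cuda.synchronize()
    tstart = time.perf_counter()
    for _ in range(num_steps):
        step()
    if device.type == 'cuda':
        torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
    return time.perf_counter() - tstart

timings = {'cpu': time_training_steps(torch.device('cpu'))}
if torch.cuda.is_available():
    timings['cuda'] = time_training_steps(torch.device('cuda'))
for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f} [s]")
```

For a model this small the GPU may show little or no advantage; the gap usually grows with the number of filters and the batch size.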

In [5]:
%load_ext tensorboard

%tensorboard --logdir mnist_cnn

### Exercise 2

Run inference on the test dataset. Compute the accuracy over all the images in the test dataset.
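
As a hint, accuracy can be computed from the two arrays that `run_test` returns (labels and softmax scores) by taking the highest-scoring class per image. The sketch below uses small made-up arrays with 4 classes instead of the real test output, purely to show the arithmetic.

```python
import numpy as np

# Made-up stand-ins for the two arrays returned by run_test
label = np.array([3, 1, 0, 2])
softmax = np.array([
    [0.1, 0.1, 0.1, 0.7],  # predicts 3 (correct)
    [0.2, 0.6, 0.1, 0.1],  # predicts 1 (correct)
    [0.3, 0.4, 0.2, 0.1],  # predicts 1 (wrong, label is 0)
    [0.1, 0.1, 0.7, 0.1],  # predicts 2 (correct)
])

# Predicted class = index of the largest softmax score in each row
prediction = softmax.argmax(axis=1)
# Accuracy = fraction of images where the prediction matches the label
accuracy = (prediction == label).mean()
print("accuracy:", accuracy)  # 0.75
```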

### Exercise 3
How many parameters are there in our CNN model?
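
One possible way to check a hand count is to sum `numel()` over the model's parameters. The sketch below uses a hypothetical helper `count_parameters` and, to stay self-contained, rebuilds the same layers as a plain `torch.nn.Sequential` with `num_filters=16`; applying the helper to an instance of the `CNN` class defined earlier should work the same way.

```python
import torch

def count_parameters(model):
    """Total number of trainable parameters (illustrative helper)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

num_filters = 16
model = torch.nn.Sequential(
    torch.nn.Conv2d(1, num_filters, 3, padding=1),
    torch.nn.LeakyReLU(),
    torch.nn.MaxPool2d(2, 2),
    torch.nn.Conv2d(num_filters, num_filters * 2, 3, padding=1),
    torch.nn.LeakyReLU(),
    torch.nn.MaxPool2d(2, 2),
    torch.nn.Conv2d(num_filters * 2, num_filters * 4, 3, padding=1),
    torch.nn.LeakyReLU(),
    torch.nn.MaxPool2d(7, 7),
    torch.nn.Flatten(),
    torch.nn.Linear(num_filters * 4, 10),
)

# Per-tensor breakdown: weights and biases of each layer that has parameters
for name, p in model.named_parameters():
    print(f"{name:10s} {tuple(p.shape)} -> {p.numel()}")

total = count_parameters(model)
print("total parameters:", total)
```

To cross-check by hand: a `Conv2d` layer with a 3x3 kernel holds `in_channels * out_channels * 9` weights plus `out_channels` biases, and a `Linear` layer holds `in_features * out_features` weights plus `out_features` biases. The activation and pooling layers have no parameters.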