## TBMI26 - Deep Learning Lab ##
### Lab overview ###
In this lab, you will experience the power of deep learning in an image classification task. The aim is to create a Convolutional Neural Network (CNN) and train it on the CIFAR10 dataset.
***
**CNN**
There are hundreds, maybe thousands, of CNN architectures for image classification. In this lab, we will train LeNet [1] on an image classification dataset. The architecture of the network is shown below.
<img src="images/lenet.png" alt="Lenet Architecture" title="Lenet Architecture" />

Your <font color=blue>**first task**</font> is to try different combinations of activation functions and subsampling methods. For example:
1. Sigmoid activation + average pooling subsampling 
2. Sigmoid activation + max pooling subsampling 
3. ReLU activation + average pooling subsampling 

***

The <font color=blue>**second task**</font> is to plot the convergence curves (loss vs. epochs) and (accuracy vs. epochs) and see which of the three combinations above converges faster.

***

**CIFAR10**
CIFAR10 is a long-standing benchmark dataset for image classification. It contains 60,000 color images of size 32 x 32 from 10 different classes. The dataset is divided into a training set (50,000 images) and a test set (10,000 images).

Your <font color=blue>**third task**</font> is to take the last network from the first task and retrain it using data augmentation (random horizontal flip and random crop). How does this affect the performance of the network?

***

The <font color=blue>**final task**</font> is to show some test images with their corresponding ground truth and predictions.

***

[1] LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.

### Import Modules

In [None]:
import os
import glob

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.optim import lr_scheduler
from torch.autograd import Variable
import torch.backends.cudnn as cudnn

import torchvision
import torchvision.transforms as transforms

import matplotlib.pyplot as plt
import numpy as np

from models.Lenet import LeNet
from train import *
from test import *

%matplotlib notebook  

# Autoreload modules when they are updated
%load_ext autoreload
%autoreload 2

### Download and Load CIFAR-10 Dataset
First, we download and load the CIFAR10 dataset. PyTorch has a built-in function for that. We apply transformations to the images in the training set and the test set separately. These transformations are *ToTensor*, which converts the images to tensors and scales them to the range from 0 to 1, and *Normalize*, which standardizes each channel with the given per-channel mean and standard deviation so that the images have roughly zero mean along each channel. This kind of normalization was shown to improve the accuracy of CNNs. More data augmentation transformations could be added to *transform_train*.
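
To make the two transformations concrete, here is a minimal NumPy sketch of the arithmetic that *ToTensor* followed by *Normalize* performs on a single image. The helper name `to_tensor_and_normalize` and the random test image are hypothetical and only serve as illustration; the mean and standard deviation values are the ones used in this notebook.

```python
import numpy as np

# Per-channel statistics used in this notebook's Normalize call
mean = np.array([0.4914, 0.4822, 0.4465], dtype=np.float32)
std = np.array([0.2023, 0.1994, 0.2010], dtype=np.float32)

def to_tensor_and_normalize(img_uint8):
    """Mimic ToTensor followed by Normalize for one H x W x 3 uint8 image."""
    x = img_uint8.astype(np.float32) / 255.0   # ToTensor: scale to [0, 1]
    x = x.transpose(2, 0, 1)                   # ToTensor: H x W x C -> C x H x W
    # Normalize: subtract the mean and divide by the std of each channel
    return (x - mean[:, None, None]) / std[:, None, None]

# Hypothetical 32 x 32 RGB image filled with random pixel values
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
normalized = to_tensor_and_normalize(fake_image)
print(normalized.shape)
```

Multiplying by the standard deviation and adding the mean undoes the normalization, which is what the visualization code at the end of this notebook relies on.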

In [None]:
# Check if CUDA support is available (GPU)
use_cuda = torch.cuda.is_available()

# Image transformations to apply to all images in the dataset (Data Augmentation)
transform_train = transforms.Compose([
    transforms.ToTensor(),                # Convert images to Tensors (The data structure that is used by Pytorch)
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)), # Normalize the images to zero mean
])

# Image transformations for the test set.
transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

# Specify the path to the CIFAR-10 dataset and create a dataloader where you specify the "batch_size"
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_train)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=256, shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform_test)
testloader = torch.utils.data.DataLoader(testset, batch_size=128, shuffle=False, num_workers=2)

# Specify the class labels
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

### Load and initialize the network model
The network is constructed by creating an object of the class LeNet which defines the network architecture. By default, when this object is created, the weights of each convolution layer are initialized randomly.
Afterwards, we define the loss function. We use *CrossEntropyLoss* as it suits multi-class classification tasks. Then we define the training optimizer; you can choose between classic stochastic gradient descent (*SGD*) and the *Adam* optimizer.
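
As a rough illustration of what the loss measures, the following NumPy sketch computes a mean cross-entropy from raw network outputs (logits) and integer class labels, which is the same kind of input that *CrossEntropyLoss* expects. The function `cross_entropy` and the example values are hypothetical and not part of the lab code.

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean cross-entropy for logits of shape (N, C) and integer targets of shape (N,)."""
    # Subtract the row maximum for numerical stability (does not change the result)
    shifted = logits - logits.max(axis=1, keepdims=True)
    # Log-softmax: log of the predicted probability of every class
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Negative log-probability of the correct class, averaged over the batch
    return -log_probs[np.arange(len(targets)), targets].mean()

# Two hypothetical samples with three classes
logits = np.array([[2.0, 0.5, -1.0],
                   [0.0, 0.0, 0.0]])
targets = np.array([0, 2])
loss = cross_entropy(logits, targets)
print(loss)
```

A network that assigns equal scores to all C classes gets a loss of log(C), which is a useful reference point when reading the training loss at the first epoch.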

### 1. Sigmoid activation + average pooling subsampling

In [None]:
# Load and initialize the network architecture 
model1 = LeNet(activation=F.sigmoid, pooling=F.avg_pool2d, pretrained=False)

# Whether to load the last saved checkpoint
use_checkpoint=False

if use_cuda:
    model1.cuda()
    cudnn.benchmark = True

# The objective (loss) function
objective = nn.CrossEntropyLoss()

# The optimizer used for training the model
optimizer = optim.Adam(model1.parameters())

#### Start Training

In [None]:
start_epoch = 1
num_epochs = 50
model1, loss_log1, acc_log1 = train(model1, trainloader, optimizer, objective, use_cuda, start_epoch, num_epochs=num_epochs)

#### Evaluate the network (Run this cell to evaluate on the test set)

In [None]:
test_acc1 = test(model1, testloader, use_cuda)

<font color=blue>**What do you observe regarding training and test accuracies?**</font>

### 2. Sigmoid activation + max pooling subsampling

Check the PyTorch documentation for the name of the max pooling function and modify the function call below to use it:
http://pytorch.org/docs/master/nn.html#pooling-functions
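
To see how the two subsampling methods differ, here is a small NumPy sketch of non-overlapping 2 x 2 pooling on a single feature map. It only illustrates the arithmetic; the helper `pool2x2` is hypothetical and is not the PyTorch function you are asked to look up.

```python
import numpy as np

def pool2x2(x, mode):
    """Non-overlapping 2 x 2 pooling of an H x W array (H and W must be even)."""
    h, w = x.shape
    windows = x.reshape(h // 2, 2, w // 2, 2)   # split into 2 x 2 windows
    if mode == "max":
        return windows.max(axis=(1, 3))         # keep the strongest response
    return windows.mean(axis=(1, 3))            # average all four responses

feature_map = np.array([[1., 2., 0., 0.],
                        [3., 4., 0., 8.],
                        [0., 0., 1., 1.],
                        [0., 0., 1., 1.]])
avg = pool2x2(feature_map, "avg")
mx = pool2x2(feature_map, "max")
print(avg)
print(mx)
```

Note how the isolated strong response in the top-right window survives max pooling but is diluted by average pooling.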

In [None]:
# Load and initialize the network architecture 
model2 = LeNet(activation=F.sigmoid, pooling=[], pretrained=False)

if use_cuda:
    model2.cuda()

optimizer = optim.Adam(model2.parameters()) 


model2, loss_log2, acc_log2 = train(model2, trainloader, optimizer, objective, use_cuda, start_epoch, num_epochs=50)

In [None]:
test_acc2 = test(model2, testloader, use_cuda)

<font color=blue>**How does max pooling affect the training and test accuracies, and why in your opinion?**</font>

### 3. ReLU activation + average pooling subsampling 

Check the PyTorch documentation for the name of the ReLU activation function and modify the function call below to use it:
http://pytorch.org/docs/master/nn.html#non-linear-activation-functions
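
One property that is often mentioned when comparing the two activations is the size of their gradients. The NumPy sketch below evaluates both activations and their derivatives at a few points; the function names are hypothetical and chosen only for this illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)              # never larger than 0.25

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    return (x > 0).astype(x.dtype)    # 1 for positive inputs, 0 otherwise

x = np.array([-6.0, -1.0, 0.0, 1.0, 6.0])
print(sigmoid_grad(x))   # shrinks towards 0 for large |x|
print(relu_grad(x))      # stays at 1 for every positive input
```

Keep this in mind when you compare the convergence curves of the sigmoid and ReLU networks.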

In [None]:
# Load and initialize the network architecture 
model3 = LeNet(activation=[], pooling=F.avg_pool2d, pretrained=False)

if use_cuda:
    model3.cuda()

optimizer = optim.Adam(model3.parameters()) 


model3, loss_log3, acc_log3 = train(model3, trainloader, optimizer, objective, use_cuda, start_epoch, num_epochs=50)

In [None]:
test_acc3 = test(model3, testloader, use_cuda)

<font color=blue>**How does the ReLU activation affect the training and test accuracies, and why in your opinion?**</font>

### Plot convergence curves
Show how the network converges during training by plotting loss vs. epoch and accuracy vs. epoch. A network that converges well should show an overall decreasing loss and an overall increasing accuracy, although small fluctuations between epochs are normal with mini-batch training.
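
Besides reading the curves by eye, one simple way to quantify "converges faster" is to find the first epoch at which a run reaches a chosen accuracy. The sketch below assumes that each accuracy log is a list with one value in percent per epoch; the helper `epochs_to_reach` and the example logs are hypothetical.

```python
def epochs_to_reach(acc_log, threshold, start_epoch=1):
    """Return the first epoch whose accuracy is >= threshold, or None if never reached."""
    for epoch, acc in enumerate(acc_log, start=start_epoch):
        if acc >= threshold:
            return epoch
    return None

# Hypothetical accuracy logs (%), one value per epoch
demo_logs = {
    "run A": [20.0, 35.0, 48.0, 55.0, 61.0],
    "run B": [30.0, 52.0, 60.0, 63.0, 65.0],
}
for name, log in demo_logs.items():
    print(name, epochs_to_reach(log, threshold=50.0))
```

The same helper could be applied to `acc_log1`, `acc_log2` and `acc_log3` once the three networks have been trained, provided they follow the assumed format.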

In [None]:
fig = plt.figure()
plt.subplot(211)
plt.plot(range(start_epoch,num_epochs+1), acc_log1, '--b', label='Sig + Avg_pool')
plt.plot(range(start_epoch,num_epochs+1), acc_log2, '.m', label='Sig + Max_pool')
plt.plot(range(start_epoch,num_epochs+1), acc_log3, 'k', label='ReLU + Avg_pool')
plt.xlabel('Epochs')
plt.ylabel('Training Accuracy (%)')
legend = plt.legend()

plt.subplot(212)
plt.plot(range(start_epoch,num_epochs+1), loss_log1, '--b', label='Sig + Avg_pool')
plt.plot(range(start_epoch,num_epochs+1), loss_log2, '.m', label='Sig + Max_pool')
plt.plot(range(start_epoch,num_epochs+1), loss_log3, 'k', label='ReLU + Avg_pool')
plt.xlabel('Epochs')
plt.ylabel('Training Loss')
legend = plt.legend()

<font color=blue>**Which network converges faster, and why in your opinion?**</font>

## Data Augmentation

Check the torchvision.transforms documentation to see how to perform RandomCrop (to a size of 32 x 32 with a padding of 4 pixels) and random horizontal flip on the input.
http://pytorch.org/docs/master/torchvision/transforms.html

Add these two transformations instead of the brackets [ ] below
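
To clarify what the two augmentations do to an image, here is a NumPy sketch of the underlying operations. It is not the torchvision API you are asked to find; the helper names are hypothetical, and zero padding is assumed for the border.

```python
import numpy as np

def random_crop_with_padding(img, size, padding, rng):
    """Zero-pad an H x W x C image on every side, then cut a random size x size window."""
    padded = np.pad(img, ((padding, padding), (padding, padding), (0, 0)))
    top = rng.integers(0, padded.shape[0] - size + 1)
    left = rng.integers(0, padded.shape[1] - size + 1)
    return padded[top:top + size, left:left + size]

def random_horizontal_flip(img, rng, p=0.5):
    """Mirror the image left-to-right with probability p."""
    return img[:, ::-1] if rng.random() < p else img

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))     # hypothetical 32 x 32 RGB image
augmented = random_horizontal_flip(random_crop_with_padding(image, 32, 4, rng), rng)
print(augmented.shape)
```

With a padding of 4 and a crop size of 32, the output keeps the original size but the content is shifted by up to 4 pixels in each direction, so the network sees a slightly different version of every image in each epoch.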

In [None]:
# Image transformations to apply to all images in the dataset (Data Augmentation)
transform_train = transforms.Compose([
    [], # Crop all the images randomly to a fixed size
    [],    # Randomly flip some of the images horizontally
    transforms.ToTensor(),                # Convert images to Tensors (The data structure that is used by Pytorch)
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)), # Normalize the images to zero mean
])

# Specify the path to the CIFAR-10 dataset and create a dataloader where you specify the "batch_size"
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_train)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=256, shuffle=True, num_workers=2)

Don't forget to replace the brackets [ ] with the relu activation

In [None]:
# Load and initialize the network architecture 
model4 = LeNet(activation=[], pooling=F.avg_pool2d, pretrained=False)

if use_cuda:
    model4.cuda()

optimizer = optim.Adam(model4.parameters()) 


model4, loss_log4, acc_log4 = train(model4, trainloader, optimizer, objective, use_cuda, start_epoch, num_epochs=50)

In [None]:
test_acc4 = test(model4, testloader, use_cuda)

<font color=blue>**How does data augmentation affect the training and test accuracies?**</font>

### Test Accuracy Per Class

In [None]:
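# Count correct predictions and total samples for each class
class_correct = [0.0] * 10
class_total = [0.0] * 10
with torch.no_grad():                          # no gradients needed for evaluation
    for images, labels in testloader:
        if use_cuda:
            images, labels = images.cuda(), labels.cuda()
        outputs = model4(images)
        _, predicted = torch.max(outputs, 1)   # index of the highest score per image
        c = (predicted == labels)
        # Go through every image in the batch (not only the first four)
        for i in range(labels.size(0)):
            label = labels[i].item()
            class_correct[label] += c[i].item()
            class_total[label] += 1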


for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))

<font color=blue>**Which class has the lowest accuracy? How do we improve it?**</font>

### Visualize some test images with groundtruth and network predictions

In [None]:
def imshow(inp, mean=None, std=None, title=None):          
    # Check if input is torch, convert it to numpy
    if torch.is_tensor(inp):
        if inp.shape[0] == 3 :
            inp = inp.cpu().numpy().transpose((1, 2, 0))
        elif inp.shape[0] == 1 :
            inp = np.squeeze(inp.cpu().numpy(), 0)
        
    if mean is not None and std is not None:
        inp = std * inp + mean
    plt.imshow(inp.clip(0,1))

    if title is not None:
        plt.title(title)

        
dataiter = iter(testloader)
images, labels = next(dataiter)

print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(10)))

# print images
plt.figure()
img = torchvision.utils.make_grid(images[0:10], 10)

imshow(img, (0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))



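# Run the network on the images shown above; otherwise `outputs` would still
# hold the last batch from the per-class accuracy loop
inputs = images.cuda() if use_cuda else images
with torch.no_grad():
    outputs = model4(inputs)
_, predicted = torch.max(outputs, 1)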

print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(10)))

<font color=blue>**Which are the mostly confused classes and why are they confused in your opinion?**</font>
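
A confusion matrix is a common tool for answering this kind of question, since it counts how often each true class is predicted as each other class. The NumPy sketch below uses small hypothetical label lists; the helpers `confusion_matrix` and `most_confused_pair` are not part of the lab code.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Entry [t, p] counts samples of true class t that were predicted as class p."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        cm[t, p] += 1
    return cm

def most_confused_pair(cm):
    """Return (true class, predicted class) of the largest off-diagonal entry."""
    off_diag = cm.copy()
    np.fill_diagonal(off_diag, 0)     # ignore correct predictions
    t, p = np.unravel_index(off_diag.argmax(), off_diag.shape)
    return int(t), int(p)

# Hypothetical labels for seven samples and three classes
true_labels = [0, 0, 1, 1, 2, 2, 2]
predicted_labels = [0, 1, 1, 1, 1, 1, 2]
cm = confusion_matrix(true_labels, predicted_labels, 3)
print(cm)
print(most_confused_pair(cm))
```

For the lab, the true and predicted labels would have to be collected over the whole test set, and the resulting indices can be mapped to names through `classes`.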

## <font color=Red>Extra Task</font>

Run the last network (with data augmentation) three times and plot the convergence curves for the three runs as we did in the second task.
Are they identical?
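
When thinking about this question, it helps to remember that training involves several sources of randomness, such as the weight initialization, the shuffling of the training set and the random augmentations. The sketch below shows how the seed of PyTorch's random number generator controls the values that are drawn; the helper `sample_initial_weights` is hypothetical and only stands in for a random weight initialization.

```python
import torch

def sample_initial_weights(seed=None):
    """Draw a small random weight tensor, optionally after fixing the seed."""
    if seed is not None:
        torch.manual_seed(seed)
    return torch.randn(3, 3)

same_a = sample_initial_weights(seed=0)
same_b = sample_initial_weights(seed=0)
different = sample_initial_weights(seed=1)
print(torch.equal(same_a, same_b))      # same seed
print(torch.equal(same_a, different))   # different seed
```

Fixing the seed makes runs more comparable, but it may not remove all variation on a GPU, for example when `cudnn.benchmark` is enabled as in this notebook.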