# Laboratory 08 -  Convolutional Neural Networks

Thus far you have seen much of pytorch functionality for building, training, validation, testing and inference with deep models in the context of fully connected networks. While these networks provide us with the means to obtain actual classification/regression output predictions, they also pose several difficulties in extracting patterns from our data. As such, fully connected layers are mainly used as final layers that summarize extracted patterns into the actual outputs, whereas hidden layers are constructed using other types of layers that are able to extract meaningful patterns. One such type of layers are convolutional layers and networks that use them are called convolutional  neural networks.  

So in this laboratory you are going to build your own covolutional neural network. You are going to walk through all the steps necessary to build, train, test and make inferences using the CFAR10 dataset. The data set details can be found [here](https://www.cs.toronto.edu/~kriz/cifar.html) and [here](https://en.wikipedia.org/wiki/CIFAR-10).

Without further ado, here are the imports you'll need in this lab.

In [31]:
import torch
import torch.nn.functional as F
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.datasets as datasets
import torchvision.transforms as transforms

from torch.utils.data.sampler import SubsetRandomSampler

import numpy as np
import matplotlib.pyplot as plt

from collections import OrderedDict, namedtuple
from itertools import product
torch.manual_seed(42)
torch.cuda.manual_seed_all(42)
np.random.seed(42)

You'll want to train your network using the GPU's if available. In order to do this in a independent way from your machine capabilities you can use the `device` object instantiated below:

In [32]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

### Exercise 1

Given the network architecture below, using the class notebooks implement the network and appropriate saving/loading model parameters functions.

```Python
Net(
    (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): BatchNorm2d(128, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
    (5): ReLU(inplace=True)
    (6): MaxPool2d(kernel_size=(2, 2), stride=2, padding=0, dilation=1, ceil_mode=False)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): BatchNorm2d(128, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
    (9): ReLU(inplace=True)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): BatchNorm2d(256, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True)
    (12): ReLU(inplace=True)
    (13): MaxPool2d(kernel_size=(2, 2), stride=2, padding=0, dilation=1, ceil_mode=False))
(classifier): Sequential(
    (0): Dropout(p=0.2, inplace=True)
    (2): Linear(in_features=16384, out_features=1024, bias=True)
    (3): ReLU(inplace=True))
)
```

In [33]:
# TODO 1.1. Implement the network architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        # TODO: implement feature extraction and classifier
        #  - use logsoftmax on the output
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.BatchNorm2d(64, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True),
            nn.ReLU(inplace=True),

            nn.Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.BatchNorm2d(128, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=(2, 2), stride=2, padding=0, dilation=1, ceil_mode=False),
    
            nn.Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.BatchNorm2d(128, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True),
            nn.ReLU(inplace=True),
            
            nn.Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
            nn.BatchNorm2d(256, eps=1e-05, momentum=0.05, affine=True, track_running_stats=True),
            nn.ReLU(inplace=True),
            
            nn.MaxPool2d(kernel_size=(2, 2), stride=2, padding=0, dilation=1, ceil_mode=False)
        )

        self.classifier = nn.Sequential(
            nn.Dropout(p=0.2, inplace=True),
            nn.Linear(in_features=16384, out_features=1024, bias=True),
            nn.ReLU(inplace=True),
            nn.LogSoftmax(dim = 1)
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(-1, 16384)
        x = self.classifier(x)
        return x
    
# TODO 1.2. Implement Network saving/loading functions
def save_model(model, checkpoint='checkpoint.pth'):
    # TODO: implementation goes here
    checkpoint_model = OrderedDict(model.named_children())
    checkpoint_dict = {
        'model': checkpoint_model,
        'state_dict': model.state_dict()
    }
    torch.save(checkpoint_dict, checkpoint)



def load_model(model = None, checkpoint='./checkpoints/checkpoint.pth'):
    checkpoint_dict = torch.load(checkpoint)
    
    if model is None:
        model = nn.Sequential(checkpoint_dict['model'])
    model.load_state_dict(checkpoint_dict['state_dict'])
    
    return model


### Exercise 2

As usual you have to load in your data while applying the apropriate transformations to create the train, validate and test loaders. For the CFAR10 dataset you can use `datasets.CIFAR10()` from `torchvision` in a similar manner as you did for FashionMNIST. However, this dataset contains RGB images with 3 channales. Hence, you'll need to provide a transformation that normalizes the data across all 3 channales. So, for each of the channels use a $\mu=0.5$ and $\sigma=0.5$.

Complete the TODO's below.

**Note:** At a minimum you'll have to define the batch size hyper-parameter, so this is a good place to define all other training parameters you'll need (e.g. the number of epochs to train).  

In [34]:
# TODO 2.1. Implement train_valid_split function
def train_valid_split(dataset, valid_percent=0.2, batch_size=32):
    train_length = len(dataset)
    indices = list(range(train_length))
    # np.random.shuffle(indices)
    split = int(np.floor(valid_percent * train_length))
    train_idx, valid_idx = indices[split:], indices[:split]
    train_sampler = SubsetRandomSampler(train_idx)
    valid_sampler = SubsetRandomSampler(valid_idx)
    
    train_loader = torch.utils.data.DataLoader(
                      dataset,
                      batch_size = batch_size,
                      sampler = train_sampler,
                      num_workers = 0
    )

    valid_loader = torch.utils.data.DataLoader(
                      dataset,
                      batch_size = batch_size,
                      sampler = valid_sampler,
                      num_workers = 0
    )

    return train_loader, valid_loader

# TODO 2.2. Define the batch_size (and other hyper-parameters)
batch_size = 64
epochs = 15
valid_percent = 0.25
learning_rate = 0.003
patience_thresold = 5


# TODO 2.3. Define the transformation used in loading the dataset
transforms = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

train_set = datasets.CIFAR10(root = 'data', train = True, download = True, transform = transforms)
test_set = datasets.CIFAR10(root = 'data', train = False, download = True, transform = transforms)

# TODO 2.4. Create the train, valid and test loaders
train_loader, valid_loader = train_valid_split(train_set, valid_percent, batch_size)
test_loader = torch.utils.data.DataLoader(test_set, batch_size = batch_size, num_workers = 0)

Files already downloaded and verified
Files already downloaded and verified


### Exercise 3

You now have to implement the actual traning of the network. While this is no diffrent from training you've done in the context of fully connected networks, here you'll have to agument the training process with early stopping.


So, implement the training function such that it supports early stopping with model saving and also prints the training and valdation losses in each epoch.

**Note:** To run the actual traning you'll also need to instantiate the appropiate loss and optimizer. Since, the network achitecture applies `LogSoftmax()` to get the final output, the loss needs to be the negative log loss, wheareas for the optimizer you can safely use `Adam()`.

In [35]:
# TODO 3.1. Implement the training function
def train(model, train_loader, valid_loader, criterion, optimizer, epochs=15, run = None):
    # TODO: implement the training loop with model saving and early stopping
    #   - use train_valid_split to get training and validation loaders
    #   - print the epoch, training loss and validation loss
    valid_loss_min = np.Inf
    patience = patience_thresold
    best_model_checkpoint = 'checkpoint.pth'
    model.to(device)
    best_model = None
    if run is not None:
        optimizer = optim.Adam(model.parameters(), lr = run.lr)
        epochs = run.epochs
        patience = run.patience_threshold

    
    for epoch in range(epochs):
        train_loss = 0
        model.train()
        for data, target in train_loader:
            data, target = data.to(device), target.to(device)
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()
            train_loss += loss.item() * train_loader.batch_size
        
        valid_loss = 0
        model.eval()
        with torch.no_grad():
            for data, target in valid_loader:
                data, target = data.to(device), target.to(device)
                output = model(data)
                loss = criterion(output, target)
                valid_loss += loss.item() * batch_size

        train_loss = train_loss / len(train_loader.sampler)
        valid_loss = valid_loss / len(valid_loader.sampler)

        print("Epoch: {}/{}.. ".format(epoch + 1, epochs),
          "Training Loss: {:.3f}.. ".format(train_loss),
          "Validation Loss: {:.3f}.. ".format(valid_loss),
          "Early Stopping Patience: {}.. ".format(patience + 1))
        
        patience -= 1
        if valid_loss <= valid_loss_min:
            print('Validation loss decreased ({:.4f} --> {:.4f})..'.format(valid_loss_min, valid_loss))
            print('Saving to ', best_model_checkpoint,  '\n')
            save_model(model, best_model_checkpoint)
            best_model = model
            valid_loss_min = valid_loss
            if run is not None:
                patience = run.patience_threshold
            else:
                patience = patience_thresold
        
        if patience == 0:
            break
           
    print('Done!\n')
    return best_model_checkpoint, valid_loss_min, best_model

# TODO 3.2. Instantiate the model
model = Net()

# TODO 3.3. Instantiate the loss function
criterion = nn.NLLLoss()

# TODO 3.4. Instantiate the optimizer
optimizer = optim.Adam(model.parameters(), lr = learning_rate)

# TODO 3.5. Train the model
best_model_checkpoint, _, _ = train(model, train_loader, valid_loader, criterion, optimizer, epochs=epochs)

Epoch: 1/15..  Training Loss: 3.699..  Validation Loss: 2.873..  Early Stopping Patience: 6.. 
Validation loss decreased (inf --> 2.8735)..
Saving to  checkpoint.pth 

Epoch: 2/15..  Training Loss: 2.553..  Validation Loss: 2.343..  Early Stopping Patience: 6.. 
Validation loss decreased (2.8735 --> 2.3427)..
Saving to  checkpoint.pth 

Epoch: 3/15..  Training Loss: 2.303..  Validation Loss: 2.268..  Early Stopping Patience: 6.. 
Validation loss decreased (2.3427 --> 2.2684)..
Saving to  checkpoint.pth 

Epoch: 4/15..  Training Loss: 2.153..  Validation Loss: 2.247..  Early Stopping Patience: 6.. 
Validation loss decreased (2.2684 --> 2.2468)..
Saving to  checkpoint.pth 

Epoch: 5/15..  Training Loss: 2.042..  Validation Loss: 2.206..  Early Stopping Patience: 6.. 
Validation loss decreased (2.2468 --> 2.2056)..
Saving to  checkpoint.pth 

Epoch: 6/15..  Training Loss: 1.954..  Validation Loss: 2.088..  Early Stopping Patience: 6.. 
Validation loss decreased (2.2056 --> 2.0879)..
Savin

### Exercise 4

Up to now we generally computed (and printed) the overall loss and accuracy for the test set. However, a more detailed summary such as the prediction accuracy for each class is often more useful in evaluating the behavior of the a network architecture. You can do this by first computing the correct predictions in each mini-batch and then use this to compute the total number correct predictions for each class:

```Python

# This gives you a tensor of 0/1, where 1 corresponds to correct predictions
correct_outputs = pred.eq(target.data.view_as(pred))

...

# Computing the number of correct predictions in each class and
# the number of examples in each class (class_correct / class_total) 
for i in range(len(images))
    class_correct[label] += correct_outputs[i].item()
    class_total[label] += 1

...

# Computing class i accuracy 
class_accuracy = class_correct[i] / class_total[i]

```

**Note:** To actually compare output probability distribution with the one-hot encoded target you can transform each output by using the `torch.max()` method on the appropriate dimension, i.e. supplying the correct`dim=` argument. 

So, in the exercise below implement the test function and print the overall loss and accuracy, plus the accuracy for each class.


In [36]:
# TODO 4.1. Implement the test loop and print relavant information: 
# test loss, class accuracy, overall accuracy

def test(model, test_loader, criterion):

    class_correct = list(0. for i in range(len(test_set.classes)))
    class_total = list(0. for i in range(len(test_set.classes)))

    # TODO: implement the test loop
    #   - print relavant information: test loss, class accuracy, overall accuracy

    test_loss = 0
    model.eval()
    for data, target in test_loader:
        data, target = data.to(device), target.to(device)
        output = model(data)
        loss = criterion(output, target)
        test_loss += loss.item() * batch_size

        ps = torch.exp(output)
        _, pred = torch.max(ps, dim=1)
        correct_tensor = pred.eq(target.data.view_as(pred))

        if torch.cuda.is_available():
            correct = np.squeeze(correct_tensor.cpu().numpy())
        else:
            correct = np.squeeze(correct_tensor.numpy())

        for i in range(len(data)):
            label = target.data[i].item()
            class_correct[label] += correct[i].item()
            class_total[label] += 1
    
    # average test loss
    test_loss = test_loss/len(test_loader.dataset)
    print('Test Loss: {:.4f}'.format(test_loss))

    for i in range(len(test_set.classes)):
        if class_total[i] > 0:
            print('Test Accuracy of %5s: %2d%% (%2d/%2d)' % (
                test_set.classes[i], 100 * class_correct[i] / class_total[i],
                np.sum(class_correct[i]), np.sum(class_total[i])))
        else:
            print('Test Accuracy of %5s: N/A (no training examples)' % (test.classes[i]))

    print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (
        100. * np.sum(class_correct) / np.sum(class_total),
        np.sum(class_correct), np.sum(class_total)))
    
model = Net()
model = load_model(model, best_model_checkpoint).to(device)
test(model, test_loader, criterion)

Test Loss: 2.0700
Test Accuracy of airplane: 83% (832/1000)
Test Accuracy of automobile:  0% ( 0/1000)
Test Accuracy of  bird: 70% (705/1000)
Test Accuracy of   cat:  0% ( 0/1000)
Test Accuracy of  deer: 75% (755/1000)
Test Accuracy of   dog: 85% (856/1000)
Test Accuracy of  frog: 87% (874/1000)
Test Accuracy of horse: 86% (866/1000)
Test Accuracy of  ship: 88% (882/1000)
Test Accuracy of truck: 87% (875/1000)

Test Accuracy (Overall): 66% (6645/10000)


### Exercise 5

As you have noticed, our network performs poorly. To determine whether this is a hyper-parameter issue or we actually need to make changes in the network architecture, you can do a hyper-parameter search and view how the network preforms for each combinations of parameters in each run. So, in this exercise you will have to:
- Implement a RunBuilder class that outputs a list of parameters describing each run.
- Refactor the training function such that it accepts a dictionary of parameters and use this dictionary to do the actual training.
- The training function must return the absolute best model checkpoint among all the training runs. While during training we selected the best model based on the validation error, here you'll want to also keep track of the best model in each run durring testing.

In [37]:
# TODO 5.1. Implement traning and test with hyper-parameter search
class RunBuilder():
    @staticmethod
    def get_runs(params):
        Run = namedtuple('Run', params.keys())

        runs = []
        for v in product(*params.values()):
            runs.append(Run(*v))
        return runs

In [38]:
params = OrderedDict(
    epochs = [30],
    lr = [0.0003, 0.003, 0.03],
    batch_size = [32, 64],
    patience_threshold = [6, 10]
)
runs = RunBuilder.get_runs(params)

In [39]:
min_valid_loss = np.Inf
best_run = None
best_model = None
for run in runs:
    model = Net()
    criterion = nn.NLLLoss()
    optimizer = optim.Adam(model.parameters(), lr = run.lr)
    batch_size = run.batch_size
    train_loader, valid_loader = train_valid_split(train_set, valid_percent, batch_size)
    best_model_checkpoint, valid_loss, crt_model = train(model, train_loader, valid_loader, criterion, optimizer, epochs = epochs, run = run)
    if valid_loss < min_valid_loss:
        min_valid_loss = valid_loss
        best_model = crt_model
        best_run = run
    print(str(runs.index(run)) + " out of " + str(len(runs)) + ": " + str(run) + " " + str(min_valid_loss))

Epoch: 1/30..  Training Loss: 3.330..  Validation Loss: 3.062..  Early Stopping Patience: 7.. 
Validation loss decreased (inf --> 3.0615)..
Saving to  checkpoint.pth 

Epoch: 2/30..  Training Loss: 2.960..  Validation Loss: 2.961..  Early Stopping Patience: 7.. 
Validation loss decreased (3.0615 --> 2.9609)..
Saving to  checkpoint.pth 

Epoch: 3/30..  Training Loss: 2.826..  Validation Loss: 2.836..  Early Stopping Patience: 7.. 
Validation loss decreased (2.9609 --> 2.8362)..
Saving to  checkpoint.pth 

Epoch: 4/30..  Training Loss: 2.729..  Validation Loss: 2.857..  Early Stopping Patience: 7.. 
Epoch: 5/30..  Training Loss: 2.647..  Validation Loss: 2.778..  Early Stopping Patience: 6.. 
Validation loss decreased (2.8362 --> 2.7776)..
Saving to  checkpoint.pth 

Epoch: 6/30..  Training Loss: 2.580..  Validation Loss: 2.690..  Early Stopping Patience: 7.. 
Validation loss decreased (2.7776 --> 2.6905)..
Saving to  checkpoint.pth 

Epoch: 7/30..  Training Loss: 2.523..  Validation Los

In [40]:
test_loader = torch.utils.data.DataLoader(test_set, batch_size = best_run.batch_size, num_workers = 0)
test(best_model, test_loader, criterion)

Test Loss: 0.7913
Test Accuracy of airplane: 73% (736/1000)
Test Accuracy of automobile: 93% (939/1000)
Test Accuracy of  bird: 73% (730/1000)
Test Accuracy of   cat: 68% (685/1000)
Test Accuracy of  deer: 81% (819/1000)
Test Accuracy of   dog: 72% (721/1000)
Test Accuracy of  frog: 88% (889/1000)
Test Accuracy of horse: 80% (802/1000)
Test Accuracy of  ship: 89% (897/1000)
Test Accuracy of truck: 85% (856/1000)

Test Accuracy (Overall): 80% (8074/10000)
