# Coursework1: Convolutional Neural Networks 

## Instructions

Please submit a version of this notebook containing your answers **together with your trained model** on CATe as CW2.zip. Write your answers in the cells below each question.

### Setting up working environment 

For this coursework you will need to train a large network, so we recommend working with Google Colaboratory, which provides free GPU time. You will need a Google account to do so. 

Please log in to your account and go to the following page: https://colab.research.google.com. Then upload this notebook.

For GPU support, go to "Edit" -> "Notebook Settings", and select "Hardware accelerator" as "GPU".

You will need to install PyTorch by running the following cell:

In [None]:
!pip install torch torchvision



## Introduction

For this coursework you will implement one of the most commonly used models for image recognition tasks, the Residual Network. The architecture was introduced in 2015 by Kaiming He et al. in the paper ["Deep residual learning for image recognition"](https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf). 
<br>

In a residual network, each block contains some convolutional layers plus "skip" connections, which allow the activations to bypass those layers and then be summed with the output of the skipped layers. The image below illustrates a building block in residual networks.

![resnet-block](utils/resnet-block.png)
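
As a rough illustration of the skip connection described above, here is a minimal sketch of a block computing `relu(f(x) + x)`. The class name `TinyResidual` and its layer sizes are hypothetical and chosen only for this example; it uses `torch.nn` directly, so it is not a solution to Part 1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyResidual(nn.Module):
    """Hypothetical minimal block: out = relu(f(x) + x)."""

    def __init__(self, channels):
        super().__init__()
        # f(x): two 3x3 convolutions that preserve the spatial size
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        residual = self.conv2(F.relu(self.conv1(x)))
        # Skip connection: the input bypasses the convolutions and is added back
        return F.relu(residual + x)


block = TinyResidual(channels=16)
x = torch.randn(2, 16, 32, 32)
y = block(x)
print(y.shape)  # same shape as the input
```

Because the input is added to the output of the convolutions, both must have the same shape; when they do not, a projection on the shortcut path is needed.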

Depending on the number of building blocks, ResNets come in different architectures, for example ResNet-50 and ResNet-101. Here you are required to build ResNet-18 to perform classification on the CIFAR-10 dataset, so your network will have the following architecture:

![resnet](utils/resnet.png)

## Part 1 (40 points)

In this part, you will use basic PyTorch operations to define the 2D convolution, the max pooling operation, the linear layer and 2D batch normalization. 

### YOUR TASK

- implement the forward pass for Conv2D, MaxPool2D, Linear and BatchNorm2d
- You are **NOT** allowed to use the torch.nn modules

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Conv2d(nn.Module):
    def __init__(self,
                 in_channels,
                 out_channels,
                 kernel_size,
                 stride=1,
                 padding=0,
                 bias=True):

        super(Conv2d, self).__init__()
        """
        An implementation of a convolutional layer.

        The input consists of N data points, each with C channels, height H and
        width W. We convolve each input with F different filters, where each filter
        spans all C channels and has height HH and width WW.

        Parameters:
        - w: Filter weights of shape (F, C, HH, WW)
        - b: Biases, of shape (F,)
        - kernel_size: Size of the convolving kernel
        - stride: The number of pixels between adjacent receptive fields in the
            horizontal and vertical directions.
        - padding: The number of pixels that will be used to zero-pad the input.
        """

        ########################################################################
        # TODO: Define the parameters used in the forward pass                 #
        ########################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        assert(type(in_channels) is int and type(out_channels) is int)
        assert(type(stride) is int and type(padding) is int)
        assert(type(bias) is bool)
        assert(type(kernel_size) is int or type(kernel_size) is tuple)
        if type(kernel_size) is tuple:
            assert(len(kernel_size) == 2 and type(kernel_size[0]) is int and type(kernel_size[1]) is int)
            self.kernel_size = kernel_size
        else:
            self.kernel_size = (kernel_size, kernel_size)
        k = torch.tensor(1.0/in_channels)
        w = torch.sqrt(k)*torch.randn((out_channels, in_channels, self.kernel_size[0], self.kernel_size[1]), requires_grad = True)
        self.w = nn.Parameter(w)
        b = torch.sqrt(k)*torch.randn(out_channels, requires_grad = True)
        self.b = nn.Parameter(b)
        self.stride = stride
        self.padding = padding
        self.bias = bias
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ########################################################################
        #                             END OF YOUR CODE                         #
        ########################################################################

    def forward(self, x):
        """
        Input:
        - x: Input data of shape (N, C, H, W)
        Output:
        - out: Output data, of shape (N, F, H', W').
        """

        ########################################################################
        # TODO: Implement the forward pass                                     #
        ########################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        
        #Calculate the output shape
        N = x.shape[0]
        C = self.w.shape[0] #out channels
        H_ = (x.shape[2]+2*self.padding-self.kernel_size[0])//self.stride+1
        W_ = (x.shape[3]+2*self.padding-self.kernel_size[1])//self.stride+1
        # reference: https://pytorch.org/docs/stable/nn.html?highlight=unfold#torch.nn.Unfold
        # First Unfold the filter with the given size
        x_unf = F.unfold(x, self.kernel_size, padding = self.padding, stride = self.stride)
        # then we can use matrix multiplication to calculate convolution
        out_unf = x_unf.transpose(1,2).matmul(self.w.view(self.w.size(0),-1).t()).transpose(1,2)
        out = out_unf.view(N, C, H_, W_)
        if self.bias:
            # Broadcast the per-channel bias over (N, F, H', W'); this also
            # works when the output is not square
            out = out + self.b.view(1, -1, 1, 1)
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ########################################################################
        #                             END OF YOUR CODE                         #
        ########################################################################

        return out
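
One way to sanity-check the unfold-based forward pass is to compare it against `F.conv2d` on a small, deliberately non-square input. The sketch below is self-contained and uses a hypothetical helper `conv2d_unfold` (no bias) rather than the class above; the shapes are arbitrary.

```python
import torch
import torch.nn.functional as F


def conv2d_unfold(x, w, stride=1, padding=0):
    """Convolution as a matrix product over unfolded patches (no bias)."""
    n, _, h, w_in = x.shape
    f, _, kh, kw = w.shape
    h_out = (h + 2 * padding - kh) // stride + 1
    w_out = (w_in + 2 * padding - kw) // stride + 1
    # patches: (N, C*kh*kw, L) with L = h_out * w_out
    patches = F.unfold(x, (kh, kw), padding=padding, stride=stride)
    # (F, C*kh*kw) @ (N, C*kh*kw, L) -> (N, F, L)
    out = w.view(f, -1) @ patches
    return out.view(n, f, h_out, w_out)


torch.manual_seed(0)
x = torch.randn(2, 3, 9, 7)   # non-square on purpose
w = torch.randn(4, 3, 3, 3)
ours = conv2d_unfold(x, w, stride=2, padding=1)
ref = F.conv2d(x, w, stride=2, padding=1)
max_err = (ours - ref).abs().max().item()
print(ours.shape, max_err)
```

Testing with different height and width is worthwhile because shape bugs in the bias or reshape steps are often invisible on square inputs.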

In [None]:
class MaxPool2d(nn.Module):
    def __init__(self, kernel_size):
        super(MaxPool2d, self).__init__()
        """
        An implementation of a max-pooling layer.

        Parameters:
        - kernel_size: the size of the window to take a max over
        """
        ########################################################################
        # TODO: Define the parameters used in the forward pass                 #
        ########################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        
        # Check that kernel size is either an int or a tuple of two ints
        assert(type(kernel_size) is int or type(kernel_size) is tuple)
        if type(kernel_size) is tuple:
            assert(len(kernel_size) == 2 and type(kernel_size[0]) is int and type(kernel_size[1]) is int)
            self.kernel_size = kernel_size
        else:
            self.kernel_size = (kernel_size, kernel_size)

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ########################################################################
        #                             END OF YOUR CODE                         #
        ########################################################################

    def forward(self, x):
        """
        Input:
        - x: Input data of shape (N, C, H, W)
        Output:
        - out: Output data, of shape (N, C, H', W').
        """
        ########################################################################
        # TODO: Implement the forward pass                                     #
        ########################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        # Get the size of the kernel
        kh, kw = self.kernel_size
        N, C = x.shape[0], x.shape[1]
        # Unfold into non-overlapping windows: (N, C*kh*kw, L)
        x1 = F.unfold(x, kernel_size = self.kernel_size, stride = self.kernel_size)
        # Separate the channel and window dimensions: (N, C, kh*kw, L)
        x1 = x1.view(N, C, kh*kw, -1)
        max_pool, _ = x1.max(dim = 2)
        # Output shape for stride = kernel size and no padding
        H_ = (x.shape[2]-kh)//kh+1
        W_ = (x.shape[3]-kw)//kw+1
        out = max_pool.view(N, C, H_, W_)

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ########################################################################
        #                             END OF YOUR CODE                         #
        ########################################################################

        return out
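
The pooling logic can be checked in the same way against `F.max_pool2d`. This is a minimal sketch with a hypothetical helper `max_pool2d_unfold`, assuming the stride equals the kernel size and no padding is used, as in the layer above.

```python
import torch
import torch.nn.functional as F


def max_pool2d_unfold(x, kernel_size):
    """Max pooling with stride == kernel size, built on F.unfold."""
    kh, kw = kernel_size
    n, c, h, w = x.shape
    h_out = (h - kh) // kh + 1
    w_out = (w - kw) // kw + 1
    patches = F.unfold(x, kernel_size=(kh, kw), stride=(kh, kw))  # (N, C*kh*kw, L)
    patches = patches.view(n, c, kh * kw, -1)
    return patches.max(dim=2).values.view(n, c, h_out, w_out)


x = torch.randn(2, 3, 9, 7)   # sizes not divisible by the kernel
ours = max_pool2d_unfold(x, (2, 2))
ref = F.max_pool2d(x, kernel_size=2)
same = torch.equal(ours, ref)
print(ours.shape, same)
```

Inputs whose size is not a multiple of the kernel exercise the floor in the output-size formula, which is where an off-by-one error is most likely to appear.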

In [None]:
class Linear(nn.Module):
    def __init__(self, in_channels, out_channels, bias=True):
        super(Linear, self).__init__()
        """
        An implementation of a Linear layer.

        Parameters:
        - weight: the learnable weights of the module of shape (in_channels, out_channels).
        - bias: the learnable bias of the module of shape (out_channels).
        """
        ########################################################################
        # TODO: Define the parameters used in the forward pass                 #
        ########################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        assert(type(in_channels) is int and type(out_channels) is int and type(bias) is bool)
        k = torch.tensor(1.0/in_channels)
        weight = torch.randn((in_channels, out_channels), requires_grad = True)*torch.sqrt(k)
        self.weight = nn.Parameter(weight)
        self.use_bias = False
        if bias:
            bias = torch.randn(out_channels, requires_grad = True)*torch.sqrt(k)
            self.bias = nn.Parameter(bias)
            self.use_bias = True
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ########################################################################
        #                             END OF YOUR CODE                         #
        ########################################################################

    def forward(self, x):
        """
        Input:
        - x: Input data of shape (N, *, H) where * means any number of additional
        dimensions and H = in_channels
        Output:
        - out: Output data of shape (N, *, H') where * means any number of additional
        dimensions and H' = out_channels
        """
        ########################################################################
        # TODO: Implement the forward pass                                     #
        ########################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        out = torch.matmul(x, self.weight)
        if self.use_bias:
            out = out + self.bias

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ########################################################################
        #                             END OF YOUR CODE                         #
        ########################################################################

        return out

In [None]:
class BatchNorm2d(nn.Module):
    def __init__(self, num_features, eps=1e-05, momentum=0.1):
        super(BatchNorm2d, self).__init__()
        """
        An implementation of a Batch Normalization over a mini-batch of 2D inputs.

        The mean and standard-deviation are calculated per-dimension over the
        mini-batches and gamma and beta are learnable parameter vectors of
        size num_features.

        Parameters:
        - num_features: C from an expected input of size (N, C, H, W).
        - eps: a value added to the denominator for numerical stability. Default: 1e-5
        - momentum: momentum - the value used for the running_mean and running_var
        computation. Default: 0.1
        - gamma: the learnable weights of shape (num_features).
        - beta: the learnable bias of the module of shape (num_features).
        """
        ########################################################################
        # TODO: Define the parameters used in the forward pass                 #
        ########################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        assert(type(num_features) is int)
        assert(type(eps) is float and type(momentum) is float)
        assert(momentum >= 0 and momentum <= 1)
        self.num_features = num_features
        self.eps = eps
        self.momentum = momentum
        # gamma and beta are set according to the documentation of pytorch BatchNorm2d
        # implementation: https://pytorch.org/docs/stable/nn.html?highlight=maxpool#torch.nn.AdaptiveMaxPool2d
        self.gamma = nn.Parameter(torch.ones(num_features, requires_grad = True))
        self.beta = nn.Parameter(torch.zeros(num_features, requires_grad = True))
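        # Register the running statistics as buffers so that they follow the
        # module across devices and are saved in state_dict()
        self.register_buffer('running_mean', torch.zeros(num_features))
        self.register_buffer('running_var', torch.ones(num_features))
        # self.training is already provided by nn.Module and toggled by
        # .train() / .eval(), so no separate flag is needed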

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ########################################################################
        #                             END OF YOUR CODE                         #
        ########################################################################

    def forward(self, x):
        """
        During training this layer keeps running estimates of its computed mean and
        variance, which are then used for normalization during evaluation.
        Input:
        - x: Input data of shape (N, C, H, W)
        Output:
        - out: Output data of shape (N, C, H, W) (same shape as input)
        """
        ########################################################################
        # TODO: Implement the forward pass                                     #
        #       (be aware of the difference for training and testing)          #
        ########################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        
        # nn.Module tracks the mode in self.training (toggled by .train()/.eval()).
        # PyTorch's BatchNorm2d normalises with the biased batch variance but
        # updates running_var with the unbiased estimate, so both are computed.
        if self.training:
            mean = x.mean([0,2,3])
            var = x.var([0,2,3], unbiased = False)
            # detach() keeps the running statistics out of the autograd graph
            self.running_mean = (1-self.momentum)*self.running_mean + self.momentum*mean.detach()
            self.running_var = (1-self.momentum)*self.running_var + self.momentum*x.var([0,2,3]).detach()
        else:
            mean = self.running_mean
            var = self.running_var
        x = (x-mean.view(1,-1,1,1))/torch.sqrt(var+self.eps).view(1,-1,1,1)*self.gamma.view(1,-1,1,1) + self.beta.view(1,-1,1,1)
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ########################################################################
        #                             END OF YOUR CODE                         #
        ########################################################################

        return x
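
The difference between the two variance estimates in training mode can be seen by reproducing the statistics by hand and comparing with `nn.BatchNorm2d`. This is a small sketch on random data; the tensor sizes are arbitrary, and the reference layer is used only for checking.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 3, 5, 5)
momentum, eps = 0.1, 1e-5

# Manual training-mode statistics over the (N, H, W) dimensions
mean = x.mean(dim=(0, 2, 3))
var_biased = x.var(dim=(0, 2, 3), unbiased=False)    # used to normalise the batch
var_unbiased = x.var(dim=(0, 2, 3), unbiased=True)   # used for the running estimate

manual_out = (x - mean.view(1, -1, 1, 1)) / torch.sqrt(var_biased.view(1, -1, 1, 1) + eps)
manual_running_var = (1 - momentum) * torch.ones(3) + momentum * var_unbiased

# Reference layer (gamma = 1 and beta = 0 at initialisation)
bn = nn.BatchNorm2d(3, eps=eps, momentum=momentum)
bn.train()
ref_out = bn(x)

out_matches = torch.allclose(manual_out, ref_out, atol=1e-4)
running_var_matches = torch.allclose(manual_running_var, bn.running_var, atol=1e-5)
print(out_matches, running_var_matches)
```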

## Part 2

In this part, you will train a ResNet-18 on the CIFAR-10 dataset. Code for training and evaluation is provided. 

### Your Task

1. Train your network to achieve the best possible test set accuracy after a maximum of 10 epochs of training.

2. You can use techniques such as hyper-parameter search and data pre-processing.

3. If necessary, you can also use another optimizer

4. **Answer the following question:**
Given such a network with a large number of trainable parameters, and a training set containing a large amount of data, what do you think is the best strategy for hyperparameter searching? 

**Answer for 4**. For a model with this many parameters and this much training data, the available time has to drive the hyperparameter search strategy. A simple grid search may be too time-consuming, especially on limited hardware. One option is Bayesian optimization over the hyperparameters. Another is a genetic algorithm that evolves a population of hyperparameter sets, although this is less mathematically rigorous than Bayesian optimization. In this coursework I designed a genetic algorithm to search for the hyperparameters to feed into the Adam optimiser. Each set of hyperparameters is evaluated on a validation set; the best set is then evaluated on the test set.

In [None]:
import torch
from torch.nn import Conv2d, MaxPool2d
import torch.nn as nn
import torch.nn.functional as F

Next, we define ResNet-18:

In [None]:
# define resnet building blocks

class ResidualBlock(nn.Module): 
    def __init__(self, inchannel, outchannel, stride=1): 
        
        super(ResidualBlock, self).__init__() 
        
        self.left = nn.Sequential(Conv2d(inchannel, outchannel, kernel_size=3, 
                                         stride=stride, padding=1, bias=False), 
                                  nn.BatchNorm2d(outchannel), 
                                  nn.ReLU(inplace=True), 
                                  Conv2d(outchannel, outchannel, kernel_size=3, 
                                         stride=1, padding=1, bias=False), 
                                  nn.BatchNorm2d(outchannel)) 
        
        self.shortcut = nn.Sequential() 
        
        if stride != 1 or inchannel != outchannel: 
            
            self.shortcut = nn.Sequential(Conv2d(inchannel, outchannel, 
                                                 kernel_size=1, stride=stride, 
                                                 padding = 0, bias=False), 
                                          nn.BatchNorm2d(outchannel) ) 
            
    def forward(self, x): 
        
        out = self.left(x) 
        
        out += self.shortcut(x) 
        
        out = F.relu(out) 
        
        return out


    
    # define resnet

class ResNet(nn.Module):
    
    def __init__(self, ResidualBlock, num_classes = 10):
        
        super(ResNet, self).__init__()
        
        self.inchannel = 64
        self.conv1 = nn.Sequential(Conv2d(3, 64, kernel_size = 3, stride = 1,
                                            padding = 1, bias = False), 
                                  nn.BatchNorm2d(64), 
                                  nn.ReLU())
        
        self.layer1 = self.make_layer(ResidualBlock, 64, 2, stride = 1)
        self.layer2 = self.make_layer(ResidualBlock, 128, 2, stride = 2)
        self.layer3 = self.make_layer(ResidualBlock, 256, 2, stride = 2)
        self.layer4 = self.make_layer(ResidualBlock, 512, 2, stride = 2)
        self.maxpool = MaxPool2d(4)
        self.fc = nn.Linear(512, num_classes)
        
    
    def make_layer(self, block, channels, num_blocks, stride):
        
        strides = [stride] + [1] * (num_blocks - 1)
        
        layers = []
        
        for stride in strides:
            
            layers.append(block(self.inchannel, channels, stride))
            
            self.inchannel = channels
            
        return nn.Sequential(*layers)
    
    
    def forward(self, x):
        
        x = self.conv1(x)
        
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        
        x = self.maxpool(x)
        
        x = x.view(x.size(0), -1)
        
        x = self.fc(x)
        
        return x
    
    
def ResNet18():
    return ResNet(ResidualBlock)
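
To see why the final linear layer takes 512 input features, it helps to trace the spatial size through the network. The sketch below applies the standard convolution output-size formula to the strides and paddings used above; it is plain Python and assumes 32x32 CIFAR-10 inputs.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32                        # CIFAR-10 images are 32x32
size = conv_out(size, 3, 1, 1)   # conv1
trace = [("conv1", 64, size)]
for name, channels, stride in [("layer1", 64, 1), ("layer2", 128, 2),
                               ("layer3", 256, 2), ("layer4", 512, 2)]:
    # Only the first 3x3 convolution of each stage uses the stage stride
    size = conv_out(size, 3, stride, 1)
    trace.append((name, channels, size))
pooled = size // 4               # MaxPool2d(4) with stride 4
flat_features = 512 * pooled * pooled

for name, channels, s in trace:
    print(f"{name}: {channels} x {s} x {s}")
print("features into fc:", flat_features)
```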

### Loading dataset
We will import images from the [torchvision.datasets](https://pytorch.org/docs/stable/torchvision/datasets.html) library. <br>
First, we need to define the alterations (transforms) we want to apply to our images, since transforms are applied when the data is loaded. <br>
Define the following transforms using the torchvision.transforms library -- you can read the transforms documentation [here](https://pytorch.org/docs/stable/torchvision/transforms.html): <br>
1. Convert images to tensor
2. Normalize mean and std of images with values: mean=[0.4914, 0.4822, 0.4465], std=[0.2023, 0.1994, 0.2010]
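
For intuition, the normalization step subtracts the per-channel mean and divides by the per-channel standard deviation. The sketch below reproduces that arithmetic with plain tensors on a random stand-in image; it illustrates the formula and is not a replacement for the torchvision transform.

```python
import torch

mean = torch.tensor([0.4914, 0.4822, 0.4465]).view(3, 1, 1)
std = torch.tensor([0.2023, 0.1994, 0.2010]).view(3, 1, 1)

# A fake image tensor in [0, 1] with shape (C, H, W), as ToTensor would produce
img = torch.rand(3, 32, 32)
normalised = (img - mean) / std

# A pixel equal to the channel mean maps to exactly zero
mean_img = mean.expand(3, 32, 32)
zeros = (mean_img - mean) / std
print(normalised.shape, zeros.abs().max().item())
```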

In [None]:
import torch.optim as optim
from torch.utils.data import DataLoader
from torch.utils.data import sampler

import torchvision.datasets as dset

import numpy as np

import torchvision.transforms as T

##############################################################
#                       YOUR CODE HERE                       #       
##############################################################

transform = T.Compose([
           T.RandomRotation(20),   #We add RandomRotation and RandomHorizontalFlip to create a variety of data
           T.RandomCrop(size=32, padding=4),
           T.RandomHorizontalFlip(),
           T.ToTensor(),
           T.Normalize(mean = [0.4914, 0.4822, 0.4465], std = [0.2023, 0.1994, 0.2010])
])

##############################################################
#                       END OF YOUR CODE                     #
##############################################################




Now load the dataset using the transform you defined above, with batch_size = 64<br>
You can check the documentation [here](https://pytorch.org/docs/stable/torchvision/datasets.html).<br>
Then create data loaders (using DataLoader from torch.utils.data) for the training and test set

In [None]:

##############################################################
#                       YOUR CODE HERE                       #       
##############################################################

data_dir = './data'

train_num = 49000
val_num = 1000

cifar10_train = dset.CIFAR10(data_dir, train = True, transform = transform, download = True)
data_train = DataLoader(cifar10_train, batch_size = 64, sampler= sampler.SubsetRandomSampler(range(train_num)), num_workers = 4)

data_val = DataLoader(cifar10_train, batch_size = 64, sampler=sampler.SubsetRandomSampler(range(train_num, train_num + val_num)), num_workers = 4)

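# Evaluate on un-augmented images: only convert to tensor and normalize
transform_test = T.Compose([
           T.ToTensor(),
           T.Normalize(mean = [0.4914, 0.4822, 0.4465], std = [0.2023, 0.1994, 0.2010])
])
cifar10_test = dset.CIFAR10(data_dir, train = False, transform = transform_test, download = True)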
data_test = DataLoader(cifar10_test, batch_size = 64)


##############################################################
#                       END OF YOUR CODE                     #       
##############################################################



Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified


In [None]:
USE_GPU = True
dtype = torch.float32 

if USE_GPU and torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')
    
    

print_every = 100
def check_accuracy(loader, model):
    # function for test accuracy on validation and test set
    
    if loader.dataset.train:
        print('Checking accuracy on validation set')
    else:
        print('Checking accuracy on test set')   
    num_correct = 0
    num_samples = 0
    model.eval()  # set model to evaluation mode
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device=device, dtype=dtype)  # move to device
            y = y.to(device=device, dtype=torch.long)
            scores = model(x)
            _, preds = scores.max(1)
            num_correct += (preds == y).sum()
            num_samples += preds.size(0)
        acc = float(num_correct) / num_samples
        print('Got %d / %d correct (%.2f)' % (num_correct, num_samples, 100 * acc))

        

def train_part(model, optimizer, epochs=1, print_ = True):
    """
    Train a model on CIFAR-10 using the PyTorch Module API.
    
    Inputs:
    - model: A PyTorch Module giving the model to train.
    - optimizer: An Optimizer object we will use to train the model
    - epochs: (Optional) A Python integer giving the number of epochs to train for
    
    Returns: Nothing, but prints model accuracies during training.
    """
    model = model.to(device=device)  # move the model parameters to CPU/GPU
    for e in range(epochs):
        if print_:
          print(len(loader_train))
        for t, (x, y) in enumerate(loader_train):
            model.train()  # put model to training mode
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.long)

            scores = model(x)
            loss = F.cross_entropy(scores, y)

            # Zero out all of the gradients for the variables which the optimizer
            # will update.
            optimizer.zero_grad()

            loss.backward()

            # Update the parameters of the model using the gradients
            optimizer.step()

            if t % print_every == 0 and print_ == True:
                print('Epoch: %d, Iteration %d, loss = %.4f' % (e, t, loss.item()))
                check_accuracy(loader_val, model)
                print()

In [None]:
# Here we use a genetic algorithm to find the best hyperparameters
# Step 1. Define the relevant functions
def return_accuracy(loader, model, device = torch.device('cuda')):
    # function for test accuracy on validation and test set 
    num_correct = 0
    num_samples = 0
    model.eval()  # set model to evaluation mode
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device=device, dtype=dtype)  # move to device
            y = y.to(device=device, dtype=torch.long)
            scores = model(x)
            _, preds = scores.max(1)
            num_correct += (preds == y).sum()
            num_samples += preds.size(0)
        acc = float(num_correct) / num_samples
        return acc

def eval_geno(genotype, bit):
  feat = genotype//(2**bit)
  new_geno = genotype - 2**bit*feat
  return feat, new_geno

def pheno(genotype, genom_l = 15):
  assert(type(genotype) == int)
  assert(0<=genotype<2**genom_l)
  # Get ams grad
  geno_ams, genotype = eval_geno(genotype, 14)
  amsgrad = geno_ams > 0
  # Get learning rate
  geno_lr, genotype = eval_geno(genotype, 11)
  lr = 2**geno_lr*1e-5
  # Get beta 1
  geno_b1, genotype = eval_geno(genotype, 8)
  b1 = 0.75+0.03*geno_b1
  # Get beta 2
  geno_b2, genotype = eval_geno(genotype, 5)
  b2 = 0.997+0.0003*geno_b2
  # Get eps
  geno_eps, genotype = eval_geno(genotype, 3)
  eps = 10**(-9+geno_eps)
  geno_wdk, genotype = eval_geno(genotype, 0)
  weight_decay = 0.06*geno_wdk
  return lr, (b1, b2), eps, weight_decay, amsgrad

def breed(parents, geno_l = 15):
  # Single-point crossover: high bits from parent 0, low bits from parent 1
  splits = np.random.randint(0, geno_l)
  p_0, p_1 = int(parents[0]), int(parents[1])
  low_mask = (1 << splits) - 1
  child = (p_0 & ~low_mask) | (p_1 & low_mask)
  # Mutate: flip each of the 12 lowest bits with probability 0.1
  for i in range(12):
    if np.random.rand() < 0.1:
      child ^= 1 << i
  return child
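
As a worked example of the encoding used by `pheno`, the sketch below decodes genotype 31992 into its bit fields with shifts and masks. The `FIELDS` table and the helper `decode_fields` are hypothetical names introduced here; the field layout is read off the `eval_geno` calls above.

```python
# Field layout assumed from pheno(): (name, lowest bit, width in bits)
FIELDS = [("amsgrad", 14, 1), ("lr", 11, 3), ("beta1", 8, 3),
          ("beta2", 5, 3), ("eps", 3, 2), ("weight_decay", 0, 3)]


def decode_fields(genotype):
    """Split a 15-bit genotype into its raw integer fields."""
    return {name: (genotype >> low) & ((1 << width) - 1)
            for name, low, width in FIELDS}


raw = decode_fields(31992)
lr = 2 ** raw["lr"] * 1e-5
beta1 = 0.75 + 0.03 * raw["beta1"]
beta2 = 0.997 + 0.0003 * raw["beta2"]
eps = 10.0 ** (-9 + raw["eps"])
weight_decay = 0.06 * raw["weight_decay"]
print(raw)
print(lr, beta1, beta2, eps, weight_decay, raw["amsgrad"] > 0)
```

With this layout the learning rate ranges from 1e-5 to 1.28e-3 in powers of two, and the weight decay from 0 to 0.42 in steps of 0.06.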

In [None]:
loader_train = data_train
loader_val = data_val

# Testing different params
best_geno = None
best_accuracy = 0
# create 15 genotypes
geno_length = 15
genos = np.array([29880, 31928, 30904, 21529, 21969, 2473, 19267, 19640, 17215, 27730, 15423, 31992,
 19047, 25413, 1108])#np.random.randint(0, 2**(geno_length), size = 15)
num_of_generations = 10
acc = np.array([0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0, 0.0, 0.0, 0.0, 0.0])
for i in range(num_of_generations):
  print('Heres a new generation ...')
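  # The top 3 genotypes of the previous generation keep their scores, so only
  # genos[3:] are retrained. Note that in the first generation this leaves
  # genos[:3] unevaluated with an accuracy of 0.
  s = 3
  for gen in genos[3:]: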
    lr, betas, eps, weight_decay, amsgrad = pheno(int(gen))
    # define and train the network
    model = ResNet18()
    optimizer = optim.Adam(model.parameters(), lr = lr, betas = betas , eps = eps, weight_decay = weight_decay, amsgrad = amsgrad)
    train_part(model, optimizer, epochs = 10, print_ = False)
    acc[s] = return_accuracy(loader_val, model, device = device)
    s += 1
  genos = genos[acc.argsort()[::-1]]
  acc[::-1].sort()
  print(acc)
  
  print(genos)
  # keep the top 3 and breed/mutate them to refill the rest of the population
  for j in range(3,15):
    parents = np.random.choice(genos[:3], 2)
    genos[j] = breed(parents)
  if acc.max() > best_accuracy:
    best_accuracy = acc.max()
    best_geno = genos[0]
  print('Generation: ',i+1,' average acc: ', acc.mean(), ' max acc: ', acc.max())

Heres a new generation ...
[0.834 0.744 0.737 0.729 0.727 0.711 0.695 0.677 0.675 0.533 0.451 0.079
 0.    0.    0.   ]
[31992 19047 21529 19267  2473 21969  1108 17215 19640 25413 27730 15423
 30904 31928 29880]
Generation:  1  average acc:  0.5061333333333332  max acc:  0.834
Heres a new generation ...
[0.837 0.834 0.833 0.759 0.758 0.75  0.744 0.738 0.737 0.734 0.73  0.728
 0.718 0.201 0.174]
[31992 31992 30968 19047 21529 19047 19047 19047 21529 21643 19047 21561
 19043 31986 31868]
Generation:  2  average acc:  0.685  max acc:  0.837
Heres a new generation ...
[0.838 0.837 0.834 0.833 0.833 0.83  0.828 0.828 0.828 0.819 0.818 0.817
 0.816 0.245 0.167]
[31984 31992 31992 31864 30968 30904 30968 31976 30968 29304 31992 30832
 31992 32249 31996]
Generation:  3  average acc:  0.7447333333333332  max acc:  0.838
Heres a new generation ...
[0.838 0.837 0.835 0.834 0.833 0.833 0.83  0.826 0.825 0.818 0.818 0.341
 0.187 0.181 0.164]
[31984 31992 31848 31992 31992 30968 31992 31992 31992 3

In [None]:
# code for optimising your network performance

##############################################################
#                       YOUR CODE HERE                       #       
##############################################################

# Here we use the best hyperparameters found by the genetic algorithm to train on the whole training set (training + validation split)

data_train = DataLoader(cifar10_train, batch_size = 64, sampler=sampler.SubsetRandomSampler(range(train_num + val_num)), num_workers = 4)

loader_train = data_train
loader_test = data_test

lr, betas, eps, weight_decay, amsgrad = pheno(29880)#pheno(int(genos[0])) 

##############################################################
#                       END OF YOUR CODE                     #
##############################################################


# define and train the network
model = ResNet18()
#optimizer = optim.Adam(model.parameters(), lr = lr, betas = betas , eps = eps, weight_decay = weight_decay, amsgrad = amsgrad)
optimizer = optim.Adam(model.parameters(), lr = 0.0056, amsgrad = True)
train_part(model, optimizer, epochs = 10)


# report test set accuracy

check_accuracy(loader_test, model)


# save the model
torch.save(model.state_dict(), 'model.pt')

782
Epoch: 0, Iteration 0, loss = 4.6675

Epoch: 0, Iteration 100, loss = 2.1702

Epoch: 0, Iteration 200, loss = 2.0308



## Part 3

The code provided below will allow you to visualise the feature maps computed by different layers of your network. Run the code (install matplotlib if necessary) and **answer the following questions**: 

1. Compare the feature maps from low-level layers to high-level layers, what do you observe? 

2. Using the training log, the reported test set accuracy and the feature maps, analyse the performance of your network. If you think the performance is sufficiently good, explain why; if not, what might be the problem and how can you improve the performance?

3. What are the other possible ways to analyse the performance of your network?

**YOUR ANSWER FOR PART 3 HERE**

Answers
1. In the low-level layers, the feature maps still reflect the structure of the input: contour-like features and high-intensity regions are visible in the first layers of the network. In the high-level layers, some pixels have much higher intensity than others, and the outputs look like activation maps with individual pixels activated. The overall structure of the original image is no longer recognisable and the features are more abstract.
2. The test set accuracy reaches 87% after 10 epochs. This is acceptable but leaves room for improvement. The loss still appears to be decreasing at around 10 epochs, which suggests that training for more epochs may further improve performance. Compared with published feature maps (https://arxiv.org/pdf/1311.2901.pdf), our feature maps are less distinct, so the network may not yet be learning good enough representations for each feature. As mentioned above, we could train for more epochs or increase the complexity of the model.
3. Besides test set accuracy, we can use the average loss on the test set to evaluate performance. For a more detailed analysis we can build a confusion matrix for the test set and measure precision, recall, F1-score and other metrics. Looking at the confusion matrix is especially important when dealing with imbalanced datasets.
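
A confusion matrix and the per-class metrics mentioned in answer 3 could be computed along the following lines. This is a sketch on made-up labels for a 3-class problem; the helper `confusion_matrix` is a hypothetical name, and in practice `preds` and `targets` would be collected over the test loader.

```python
import torch


def confusion_matrix(preds, targets, num_classes):
    """Rows are true classes, columns are predicted classes."""
    idx = targets * num_classes + preds
    counts = torch.bincount(idx, minlength=num_classes ** 2)
    return counts.view(num_classes, num_classes)


# Toy labels for a 3-class problem (made up for illustration)
targets = torch.tensor([0, 0, 1, 1, 2, 2, 2])
preds = torch.tensor([0, 1, 1, 1, 2, 0, 2])
cm = confusion_matrix(preds, targets, num_classes=3)

true_pos = cm.diag().float()
precision = true_pos / cm.sum(dim=0).clamp(min=1)   # per predicted class
recall = true_pos / cm.sum(dim=1).clamp(min=1)      # per true class
f1 = 2 * precision * recall / (precision + recall).clamp(min=1e-12)
print(cm)
print(precision, recall, f1)
```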

In [None]:
#!pip install matplotlib

import matplotlib.pyplot as plt

plt.tight_layout()


activation = {}
def get_activation(name):
    def hook(model, input, output):
        activation[name] = output.detach()
    return hook

vis_labels = ['conv1', 'layer1', 'layer2', 'layer3', 'layer4']

for l in vis_labels:

    getattr(model, l).register_forward_hook(get_activation(l))
        
data, _ = cifar10_test[0]
data = data.unsqueeze_(0).to(device = device, dtype = dtype)

output = model(data)



for idx, l in enumerate(vis_labels):

    act = activation[l].squeeze()

    if idx < 2:
        ncols = 8
    else:
        ncols = 32
        
    nrows = act.size(0) // ncols
    
    fig, axarr = plt.subplots(nrows, ncols)
    fig.suptitle(l)


    for i in range(nrows):
        for j in range(ncols):
            axarr[i, j].imshow(act[i * ncols + j].cpu())
            axarr[i, j].axis('off')
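
The visualisation above relies on forward hooks to capture intermediate activations. The mechanism can be tried in isolation on a small model; the sketch below uses a hypothetical two-convolution network and the names `captured` and `save_output`, which are not part of the coursework code.

```python
import torch
import torch.nn as nn

# Hypothetical two-convolution model, standing in for the ResNet above
toy = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1),
)

captured = {}


def save_output(name):
    def hook(module, inputs, output):
        # detach() so the stored tensor does not keep the autograd graph alive
        captured[name] = output.detach()
    return hook


handles = [toy[0].register_forward_hook(save_output("first_conv")),
           toy[2].register_forward_hook(save_output("second_conv"))]

with torch.no_grad():
    toy(torch.randn(1, 3, 32, 32))

for handle in handles:
    handle.remove()   # hooks stay registered until removed

print({name: tuple(t.shape) for name, t in captured.items()})
```

Removing the handles afterwards avoids capturing activations on every later forward pass, which matters if the same model is evaluated again.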