### Name: Ihebenachi Chigozirim
### CID:

In [None]:
!pip install pycm livelossplot
%pylab inline

#### Provided imports (add more if you need them)

In [None]:
import random
import progressbar
import copy
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  
from matplotlib import cm
import numpy as np
import torch
from sklearn.metrics import accuracy_score
from torchvision import models, datasets, transforms
from torch.utils.data import DataLoader, Subset, SubsetRandomSampler, TensorDataset
import torch.nn as nn
import torch.nn.functional as F
from livelossplot import PlotLosses
from sklearn.model_selection import StratifiedShuffleSplit

def set_seed(seed):
    """
    Use this to set ALL the random seeds to a fixed value and take out any randomness from cuda kernels
    """
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

    torch.backends.cudnn.benchmark = False     # disable the cuDNN auto-tuner; its algorithm choice can differ between runs
    torch.backends.cudnn.deterministic = True  # restrict cuDNN to deterministic convolution algorithms
    torch.backends.cudnn.enabled   = True

    return True

device = 'cpu'
if torch.cuda.device_count() > 0 and torch.cuda.is_available():
    print("Cuda installed! Running on GPU!")
    device = 'cuda'
else:
    print("No GPU available!")

## 1-Prepare your LeNet-5 network [10 points]
Use the code provided in the Jupyter Notebook template and modify it as you see fit to be able to perform a forward pass using the single dummy tensor input `x` provided. The lines of code that will do the forward pass and print the network are provided in the template.
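
One way to check the layer sizes before running the forward pass is to follow the spatial size of the feature map through each layer. The sketch below is illustrative only: `conv2d_out` and `lenet5_flat_features` are hypothetical helper names, and it assumes the kernel sizes and strides used in the template (5x5 convolutions with stride 1, 2x2 pooling with stride 2, 16 output channels before flattening).

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet5_flat_features(size=32, c1_padding=0):
    """Number of features reaching the first linear layer for a square input."""
    size = conv2d_out(size, kernel=5, padding=c1_padding)  # c1
    size = conv2d_out(size, kernel=2, stride=2)            # s2
    size = conv2d_out(size, kernel=5)                      # c3
    size = conv2d_out(size, kernel=2, stride=2)            # s4
    return 16 * size * size                                # 16 channels after c3


print(lenet5_flat_features(32, c1_padding=0))  # 400
print(lenet5_flat_features(32, c1_padding=2))  # 576
```

For a 32x32 input, only the unpadded first convolution yields the 16*5*5 = 400 features that a `nn.Linear(16*5*5, 120)` layer expects. The number of input channels of the first convolution must also match the 3 channels of the dummy tensor.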

In [None]:
#     make modifications in the code below


class LeNet5(nn.Module):
  def __init__(self):
    super(LeNet5, self).__init__()
    self.c1 = nn.Conv2d(3, 6, kernel_size=5, stride=1, padding=0)  # 3 input channels (RGB); no padding, so 32x32 -> 28x28
    self.s2 = nn.MaxPool2d(kernel_size=2, stride=2)   
    self.c3 = nn.Conv2d(6, 16, kernel_size=5, stride=1)   
    self.s4 = nn.MaxPool2d(kernel_size=2, stride=2)
    self.c5 = nn.Linear(16*5*5, 120)    
    self.f6 = nn.Linear(120, 84)   
    self.output = nn.Linear(84, 10)  
    self.act = nn.ReLU()           
    
  def forward(self, x):
    x = self.act(self.c1(x))    
    x = self.act(self.s2(x))   
    x = self.act(self.c3(x))   
    x = self.act(self.s4(x))    
    x = x.view(-1, x.size(1)*x.size(2)*x.size(3))     
    x = self.act(self.c5(x))       
    x = self.act(self.f6(x))    
    return self.output(x)          
  
# dummy input of the same size as the CIFAR-10 images
x = torch.ones((1, 3, 32, 32))
model = LeNet5()
y = model(x)

print(model)

## 2-Load CIFAR-10 [10 points]
Use `torchvision.datasets.CIFAR10` to load the CIFAR-10 dataset (training and test sets).

In [None]:
# your code goes here

## 3-Plot data [5 points]
Plot 25 images of the training set together with their corresponding label names.
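
The following sketch shows one possible way to lay out the images in a square grid. `plot_image_grid` is a hypothetical helper; it assumes the images are given as an array of shape `(N, H, W, 3)` (as in the `data` attribute of a torchvision CIFAR-10 dataset) and the labels as integers indexing into a list of class names.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_image_grid(images, labels, class_names, n=25):
    """Plot the first n images in a square grid, titled with their label names."""
    side = int(np.ceil(np.sqrt(n)))
    fig, axes = plt.subplots(side, side, figsize=(2 * side, 2 * side), squeeze=False)
    for i, ax in enumerate(axes.flat):
        ax.axis("off")
        if i < n:
            ax.imshow(images[i])
            ax.set_title(class_names[labels[i]], fontsize=8)
    fig.tight_layout()
    return fig
```

With a loaded training set this could be called as `plot_image_grid(train_set.data, train_set.targets, train_set.classes)`, assuming `train_set` is the CIFAR-10 training set.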

In [None]:
# your code goes here

## 4-Create a training, validation split [5 points]
Split the data using `sklearn.model_selection.StratifiedShuffleSplit`:

- 90% of the data in the training set
- 10% of the data in the validation set

Prepare the downloaded datasets to be used with your modified network in **1-Prepare your LeNet-5 network**.
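
As a rough illustration of the split, the sketch below applies `StratifiedShuffleSplit` to a toy label array standing in for the CIFAR-10 targets. The helper name `stratified_indices` and the seed value are assumptions.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit


def stratified_indices(targets, val_fraction=0.1, seed=42):
    """Return (train_idx, val_idx) with the class proportions preserved in both parts."""
    targets = np.asarray(targets)
    splitter = StratifiedShuffleSplit(n_splits=1, test_size=val_fraction, random_state=seed)
    # Only the labels matter for stratification, so a dummy feature array is enough.
    train_idx, val_idx = next(splitter.split(np.zeros(len(targets)), targets))
    return train_idx, val_idx


# Toy stand-in for the real labels: 10 classes with 100 samples each.
toy_targets = np.repeat(np.arange(10), 100)
train_idx, val_idx = stratified_indices(toy_targets)
print(len(train_idx), len(val_idx))
print(np.bincount(toy_targets[val_idx]))  # samples per class in the validation part
```

The returned index arrays can then be used to build the two parts of the dataset, for example with `torch.utils.data.Subset(train_set, train_idx)`.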

In [None]:
# your code goes here

## 5-Grid search [20 points]
From the list below, select two hyperparameters and perform a 2D grid-search to find the optimal values for these two hyperparameters. The range of values to test is provided. Justify your choice of the two hyperparameters you want to tune (write a paragraph in a markdown cell explaining why you chose these two particular parameters).

The list of hyperparameters to choose from is:

a)  Random Number Seed:  **42**  [31, 42, 53] \
b)  Learning Rate:  **1e-2**  [1e-1, 1e-2, 1e-3] \
c)  Momentum:  **0.5**  [0.2, 0.5, 0.8] \
d)  Batch Size:  **64**  [64, 128, 640] \
e)  Number of epochs:  **30**  [10, 30, 50]

The **values in bold** next to each hyperparameter are the values you need to use if you are not tuning this particular hyperparameter. The values between square brackets are the values to use if you choose to tune this particular hyperparameter.

Fixed hyperparameters:

- Optimiser: SGD + momentum
- Test batch size: 1000

Write the results in two tables (one for the loss and one for the accuracy) where the columns and rows are the first and second hyperparameters you have selected. You can use markdown tables or create the tables in Python.

You don't need to show the `livelossplot` plots for every combination you try, as the results will be summarised in the tables, but plot at least two of the grid-search runs.

Select the best values for the two hyperparameters you have chosen to optimise and **justify your choice**.
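
The structure of a 2D grid search can be sketched as below. Everything here is illustrative: `grid_search`, `markdown_table` and `placeholder_run` are hypothetical names, the choice of learning rate and momentum is arbitrary and not a recommendation, and the placeholder returns NaN instead of real metrics, so it must be replaced by a function that actually trains and validates the network.

```python
import itertools


def grid_search(run_fn, name_a, values_a, name_b, values_b):
    """Call run_fn once per (a, b) combination and collect the metrics it returns."""
    results = {}
    for a, b in itertools.product(values_a, values_b):
        results[(a, b)] = run_fn(**{name_a: a, name_b: b})
    return results


def markdown_table(results, metric, name_a, values_a, name_b, values_b):
    """Rows are values of hyperparameter a, columns are values of hyperparameter b."""
    lines = [f"| {name_a} / {name_b} | " + " | ".join(str(b) for b in values_b) + " |"]
    lines.append("|" + " --- |" * (len(values_b) + 1))
    for a in values_a:
        cells = " | ".join(f"{results[(a, b)][metric]:.4f}" for b in values_b)
        lines.append(f"| {a} | {cells} |")
    return "\n".join(lines)


def placeholder_run(lr, momentum):
    # PLACEHOLDER: a real implementation would train the network with these
    # values and return the validation loss and accuracy it measured.
    return {"loss": float("nan"), "accuracy": float("nan")}


lrs, momenta = [1e-1, 1e-2, 1e-3], [0.2, 0.5, 0.8]
results = grid_search(placeholder_run, "lr", lrs, "momentum", momenta)
table = markdown_table(results, "loss", "lr", lrs, "momentum", momenta)
print(table)
```

Keeping the metrics in a dictionary keyed by the hyperparameter pair makes it straightforward to produce both tables (loss and accuracy) from a single set of runs.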

In [None]:
# your code goes here

## 6-Train with best hyperparameters [5 points]

Once you have your two best hyperparameters, retrain the model by combining the validation, training, and test sets as you see fit. Report the final accuracy on the test set. Use `livelossplot` to plot the values of the training evolution.
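
A minimal sketch of a training and evaluation loop with SGD and momentum follows. The toy tensors and the linear model are stand-ins so that the example runs on its own, and the function names `train_one_epoch` and `evaluate` are assumptions. The learning rate and momentum are the default values listed in the grid-search section.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, optimiser, loss_fn, device="cpu"):
    """Run one pass over the loader and return the mean training loss."""
    model.train()
    total_loss, n = 0.0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimiser.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimiser.step()
        total_loss += loss.item() * x.size(0)
        n += x.size(0)
    return total_loss / n


def evaluate(model, loader, loss_fn, device="cpu"):
    """Return (mean loss, accuracy) over the loader without updating the model."""
    model.eval()
    total_loss, correct, n = 0.0, 0, 0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            out = model(x)
            total_loss += loss_fn(out, y).item() * x.size(0)
            correct += (out.argmax(dim=1) == y).sum().item()
            n += x.size(0)
    return total_loss / n, correct / n


# Toy stand-in data: 64 samples, 8 features, 3 classes.
torch.manual_seed(0)
toy_data = TensorDataset(torch.randn(64, 8), torch.randint(0, 3, (64,)))
toy_loader = DataLoader(toy_data, batch_size=16, shuffle=True)
toy_model = nn.Linear(8, 3)
optimiser = torch.optim.SGD(toy_model.parameters(), lr=1e-2, momentum=0.5)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(3):
    train_loss = train_one_epoch(toy_model, toy_loader, optimiser, loss_fn)
eval_loss, eval_acc = evaluate(toy_model, toy_loader, loss_fn)
```

With `livelossplot`, the per-epoch values could be logged inside the loop with `liveloss.update({...})` followed by `liveloss.send()` in recent versions of the library, where `liveloss = PlotLosses()`.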

In [None]:
# your code goes here

## 7-Answer the following questions [1 point each]
Which of these data-augmentation transforms would be reasonable to apply to CIFAR10 and why? 

1. Left-Right Flips
2. Random Rotations by up to 10 Degrees
3. Up-Down Flips
4. Shifting up-down, left-right by 5 pixels
5. Contrast Changes
6. Adding Gaussian Noise
7. Random Rotations by up to 90 Degrees

Justify each one of your answers.

---

Write your answers here:

1. 
2. 
3. 
4. 
5. 
6. 
7. 



## 8-Plot augmented data [13 points]

Select one of the data-augmentation transforms you decided were reasonable in the previous question. Implement it, and apply it to 9 images of the CIFAR-10 dataset.

Plot the 9 transformed images.
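
As an example of implementing a transform by hand, the sketch below flips a batch of images left to right. It assumes that left-right flips were judged reasonable in the previous question and that the images are stored as a tensor of shape `(N, C, H, W)`. The helper name `flip_left_right` is hypothetical.

```python
import torch


def flip_left_right(images):
    """Flip a batch of images of shape (N, C, H, W) along the width axis."""
    return torch.flip(images, dims=[-1])


# Small synthetic batch so the effect of the flip is easy to verify.
batch = torch.arange(2 * 3 * 4 * 4, dtype=torch.float32).reshape(2, 3, 4, 4)
flipped = flip_left_right(batch)
```

For random flips during training, `torchvision.transforms.RandomHorizontalFlip` offers the same operation applied with a given probability.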

In [None]:
# your code goes here

## 9-Visualising loss landscapes paper [10 points]

Read the provided paper [Visualising the Loss Landscape of Neural Nets](https://arxiv.org/pdf/1712.09913.pdf).

This paper contains a lot of advanced concepts. You only need to read and understand it well up to, and including, section 4 (Proposed Visualization: Filter-Wise Normalisation) to answer the questions below. In section 4 you don't need to fully understand the rationale for doing Filter-Wise Normalisation, but you do need to understand what Filter-Wise Normalisation is.

Answer the following questions (in a markdown cell):

1. What are $\delta$, $\eta$, $\alpha$ and $\beta$ in equation (1)? [5 points]
2. What does Filter-Wise Normalisation do? [5 points] You don't need to explain the reasons for doing it, just how it modifies the random directions $\delta$ and $\eta$.

Explain well and justify your answers. **(Don't answer in 1. that $\delta$ is a random direction just because it says that in 2. Explain what is meant by a random direction well.)**

---

Write your answers here:

1. 


2. 



## 10-Visualise loss landscape [15 points]

Use the formula described in equation (1) in the paper in combination with the Filter-Wise Normalisation to generate landscape plots. For that, use your final trained model (output of question **6**) and 25 values for $\alpha$ and 25 values for $\beta$ to generate a 2D plot with 625 points. Use the provided snippets of code in the Jupyter Notebook template to assist you in generating the plots and to guide you in the functions you will need to implement (not mandatory, you can implement everything from scratch if you prefer).

Note that in this question you will not be comparing the smoothness of different loss landscapes (as they do in the paper), you will only be plotting the loss function landscape around the loss value corresponding to your trained network.
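
The evaluation of the loss on a 25 x 25 grid can be sketched as follows. This is a self-contained illustration under stated assumptions, not the required solution: a tiny linear model and random data stand in for the trained network and the dataset, the helper names `filter_normalised_direction` and `loss_surface` are hypothetical, 1D parameters (biases) are left unperturbed, and the range [-1, 1] for both coefficients is an arbitrary choice.

```python
import numpy as np
import torch
import torch.nn as nn


def filter_normalised_direction(weights, seed):
    """Random Gaussian direction in which each filter is rescaled to the norm of the matching filter in weights."""
    gen = torch.Generator().manual_seed(seed)
    direction = []
    for w in weights:
        d = torch.randn(w.size(), generator=gen)
        if d.dim() <= 1:
            d = torch.zeros_like(d)  # assumption: biases are not perturbed
        else:
            # A filter is one slice along the first dimension (one output channel or neuron).
            d_norm = d.flatten(1).norm(dim=1)
            w_norm = w.flatten(1).norm(dim=1)
            scale = w_norm / (d_norm + 1e-10)
            d = d * scale.view(-1, *([1] * (d.dim() - 1)))
        direction.append(d)
    return direction


def loss_surface(model, loss_fn, inputs, targets, alphas, betas, seed=42):
    """Evaluate the loss at origin + alpha * delta + beta * eta for every grid point."""
    origin = [p.detach().clone() for p in model.parameters()]
    delta = filter_normalised_direction(origin, seed)
    eta = filter_normalised_direction(origin, seed + 1)
    surface = np.zeros((len(alphas), len(betas)))
    model.eval()
    with torch.no_grad():
        for i, a in enumerate(alphas):
            for j, b in enumerate(betas):
                for p, w, d, e in zip(model.parameters(), origin, delta, eta):
                    p.copy_(w + float(a) * d + float(b) * e)
                surface[i, j] = loss_fn(model(inputs), targets).item()
        for p, w in zip(model.parameters(), origin):
            p.copy_(w)  # restore the original weights
    return surface


# Toy stand-ins for the trained network and the data.
torch.manual_seed(0)
model = nn.Linear(4, 3)
inputs, targets = torch.randn(8, 4), torch.randint(0, 3, (8,))
loss_fn = nn.CrossEntropyLoss()

alphas = np.linspace(-1, 1, 25)
betas = np.linspace(-1, 1, 25)
surface = loss_surface(model, loss_fn, inputs, targets, alphas, betas)
```

To plot the result, coordinate arrays that match the `surface[i, j]` indexing can be built with `xx, yy = np.meshgrid(alphas, betas, indexing="ij")`. The centre of the grid corresponds to the unperturbed weights.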



In [None]:
# The following snippets of code are only to assist you. You can decide to use 
# them or not. They are only intended to provide you with some functionality
#  you may find useful when trying to generate the loss landscape plots.


# function to create random directions:
def create_random_directions(weights, ignore1D=False, seed=42):
    torch.manual_seed(seed)
    direction = [torch.randn(w.size()).to(device) for w in weights]
    
    # apply normalisation: every perturbation tensor d is rescaled to the norm of its corresponding tensor w in weights
    # (note: this rescales each tensor as a whole, not each individual filter)
    for d, w in zip(direction, weights):
        if ignore1D and d.dim() <= 1:
            d.fill_(0)
        d.mul_(w.norm()/(d.norm() + 1e-10)) # add small perturbation to avoid division by zero

    return direction


# function to update weights
def update_weights(model, origin_weights, x_dir, y_dir, dx=0.1, dy=0.1):
    updates = [x.to(device)*dx + y.to(device)*dy for (x, y) in zip(x_dir, y_dir)]
    for (p, w, u) in zip(model.parameters(), origin_weights, updates):
        p.data = w + u
    return None


# function to plot loss landscape as a surface
def plot_loss_landscape(xx, yy, loss_landscape):
    fig, ax = plt.subplots(figsize=(8, 8),subplot_kw={"projection": "3d"})
    surf = ax.plot_surface(xx, yy, loss_landscape, cmap='viridis', edgecolor='none',
                       linewidth=0, antialiased=True,  rstride=1, cstride=1,)
    ax.set_xlabel(r'X')
    ax.set_ylabel(r'Y')
    ax.set_zlabel(r'Loss')
    fig.colorbar(surf, shrink=0.5, aspect=5)
    plt.show()


# function to plot loss landscape as a contour
def contour_loss_landscape(xx, yy, loss_landscape):
    fig, ax = plt.subplots(figsize=(7, 7))
    surf = ax.contourf(xx, yy, loss_landscape, cmap='viridis', levels=100)
    ax.set_xlabel(r'X')
    ax.set_ylabel(r'Y')
    fig.colorbar(surf, shrink=0.5, aspect=5)
    plt.show()


# BONUS: functions to compute the angle between 2 random vectors
# in high-dimensional spaces two random vectors are quite likely to be
# orthogonal (or almost). No points involved here, this is just for fun!
#
def angle(vec1, vec2):
    return torch.acos(torch.dot(vec1, vec2)/(vec1.norm()*vec2.norm())).item()

def rad2deg(angle):
    return angle*180/np.pi

def concat_torch_list(torch_list):
    # flatten every tensor and concatenate them, leaving the input list unmodified
    return torch.cat([t.flatten() for t in torch_list])

In [None]:
# your code here

### 10-(continued)

Discuss your results, and justify the choices you make along the generation process of plotting the loss landscapes (for example, the range you choose for your $\alpha$ and $\beta$ values).

---

Write your answers here: