#### Notebook modified from the original version by `Lukas Mosser` and `Navjot Kukreja`

In [None]:
!pip install pycm livelossplot
%pylab inline

## ACSE 8 Session 4:
# From Convolutions to ConvNets

### Objectives of this session:
Thursday 6 May 13:00-16:00h:
- Very quick overview of convolutions in traditional Computer Vision with examples of... cats.
- Torch layer operations simple examples with... cats.
- Torch Convolutional layers.
- Implementation of a network similar to LeNet5.
- Train our LeNet5-like network on MNIST
Friday 7 May 13:00h-16:00h
- Train MNIST again using data augmentation
- Transfer Learning: bees & ants (***Debbie***)


In practical 2, we learned how to **train a feed-forward network**.
In practical 3, we learned how to **optimise for hyperparameters** with cross-validation.

Today we will use these two techniques on **CNNs**.

<img src="https://miro.medium.com/max/2340/1*Fw-ehcNBR9byHtho-Rxbtw.gif" alt="network" width="600"/>


#### A few imports before we get started

In [None]:
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedShuffleSplit

from livelossplot import PlotLosses
from pycm import *

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader
import torchvision.transforms as transforms
from torchvision.datasets import MNIST


def set_seed(seed):
    """
    Use this to set ALL the random seeds to a fixed value and take out any randomness from cuda kernels
    """
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

    torch.backends.cudnn.benchmark = False  # disable the inbuilt cuDNN auto-tuner (it searches for the fastest convolution algorithms, which can differ between runs)
    torch.backends.cudnn.enabled   = False

    return True

device = 'cpu'
if torch.cuda.device_count() > 0 and torch.cuda.is_available():
    print("Cuda installed! Running on GPU!")
    device = 'cuda'
else:
    print("No GPU available!")

### Mounting the google drive for later storage

In [None]:
from google.colab import drive
drive.mount('/content/gdrive/')

# Important concepts from previous sessions revisited:

- **Recap 1** `StratifiedShuffleSplit` to split our training dataset into training and validation subsets (with `n_splits=1` this is a single stratified hold-out split rather than full k-fold cross-validation):

  - compute indices using `StratifiedShuffleSplit`
  - standardise data
  - create normalised training, validation, and test datasets as TensorDatasets

In [None]:
# `.data` and `.targets` replace the deprecated `.train_data`/`.train_labels`/`.test_data`/`.test_labels` attributes
shuffler = StratifiedShuffleSplit(n_splits=1, test_size=0.1, random_state=42).split(mnist_train.data, mnist_train.targets)
indices = [(train_idx, validation_idx) for train_idx, validation_idx in shuffler][0]

def apply_standardization(X):
  X /= 255.    # scale pixel values to [0, 1]
  X -= 0.1307  # subtract the MNIST mean
  X /= 0.3081  # divide by the MNIST standard deviation
  return X

X_train, y_train = apply_standardization(mnist_train.data[indices[0]].float()), mnist_train.targets[indices[0]]
X_val, y_val = apply_standardization(mnist_train.data[indices[1]].float()), mnist_train.targets[indices[1]]
X_test, y_test = apply_standardization(mnist_test.data.float()), mnist_test.targets

mnist_train = TensorDataset(X_train, y_train.long())
mnist_validate = TensorDataset(X_val, y_val.long())
mnist_test = TensorDataset(X_test, y_test.long())

<img src="https://scikit-learn.org/stable/_images/grid_search_cross_validation.png" alt="network" width="500"/>

<br>

- **Recap 2** livelossplot to visualise training evolution

<img src="https://raw.githubusercontent.com/stared/livelossplot/master/livelossplot.gif" alt="network" width="800"/>

## Computer Vision - Convolutions as Feature Detectors

In the following exercise we'll do some classical computer vision before moving to convolutional networks.
We will use the [Sobel-filter](https://en.wikipedia.org/wiki/Sobel_operator), a classical convolution operator.

### Task 1:
Implement the Sobel Filter $G_x$ (according to its wiki definition, see link above) as a simple 2D convolution operation.

- First instantiate a [`nn.Conv2d`](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html) object with a single 3x3 kernel, padding=1, taking in a single image channel and outputting one channel.  
- Then modify the weight matrix to reflect the sobel filter.
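
The two steps above can be sketched as follows, using a synthetic step-edge image instead of the cat so that the result is easy to check by hand. This is only a sketch under stated assumptions: the names `sobel`, `kernel` and `image` are illustrative, and the bias is switched off for simplicity. Note that `nn.Conv2d` computes a cross-correlation (the kernel is not flipped), so with the Wikipedia $G_x$ coefficients the sign of the response is reversed relative to a true convolution.

```python
import torch
import torch.nn as nn

# One 3x3 kernel, one input channel, one output channel; padding=1 keeps the image size
sobel = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)

# G_x coefficients as given in the Wikipedia definition
kernel = torch.tensor([[1., 0., -1.],
                       [2., 0., -2.],
                       [1., 0., -1.]])

# Overwrite the randomly initialised weights; the weight shape is (out_ch, in_ch, 3, 3)
with torch.no_grad():
    sobel.weight.copy_(kernel.view(1, 1, 3, 3))

# Synthetic 8x8 image: dark left half, bright right half (a vertical edge)
image = torch.zeros(1, 1, 8, 8)
image[..., 4:] = 1.0

# detach() removes the result from the computational graph (needed before .numpy())
edges = sobel(image).detach()
print(edges[0, 0, 3])  # one interior row: non-zero only next to the edge
```

The filter responds only in the two columns adjacent to the vertical edge (and at the image border, where zero padding creates an artificial edge); flat regions give zero.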

In [None]:
from PIL import Image  # PIL is the Python Imaging Library
import requests        # library that provides an easy way to make http requests
from io import BytesIO # lets us read raw bytes as a file

url = "https://cataas.com/cat" # cat as a service!
response = requests.get(url)   # requests a cat
img = np.array(Image.open(BytesIO(response.content)).convert('L')).astype(float) # BytesIO tells python to read it as a file (and .content extracts only the image bytes)
plt.imshow(img, cmap="gray")   # matplotlib likes numpy arrays

### `code along` Implement the Sobel Filter $G_x$

In [None]:
# define sobel as an instance of an nn.Conv2d class.
# print the size of the filter
# define a filter as a torch.Tensor with the right coefficients
# assign filter values to the sobel object
# load the cat image as a tensor
# filter the cat image
# plot the Sobel-filtered version of the cat
## detach() is necessary to detach filtered_cat from the computational graph before it can be converted to a numpy array for plotting

## Some useful Pytorch Layers

- [`nn.Conv2d`](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html): Convolutional layers are parameterized by their kernel-weights and biases and are often used to reduce the spatial dimensionality.
- [`nn.ConvTranspose2d`](https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html): Transposed convolutions (not deconvolutions!); similar to convolutions, but normally used to upsample (increase the spatial dimensionality). [Interesting blog discussing problems with transposed convolutions](https://distill.pub/2016/deconv-checkerboard/)
- [`nn.UpsamplingBilinear2d`](https://pytorch.org/docs/stable/generated/torch.nn.UpsamplingBilinear2d.html) for upsampling (also check nearest neighbor upsampling, `nn.Upsample`)
- [`nn.MaxPool2d`](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html): Pooling layers summarize spatial information (also check  `nn.AvgPool2d`)
- [`nn.Dropout2d`](https://pytorch.org/docs/stable/generated/torch.nn.Dropout2d.html): Dropout also exists in two (and more) dimensions; it can be used to regularise the training of deep networks
- Batch normalisation: Normalises the activations of each channel (not the weights) to zero mean and unit variance using mini-batch statistics, followed by a learnable scale and shift, while keeping a running average of those statistics for use at evaluation time. Introduced in [this paper](https://arxiv.org/abs/1502.03167). Originally, it was thought that batch normalisation accelerates training by reducing the internal covariate shift, but a [later paper](https://arxiv.org/abs/1805.11604) questioned whether that is the real reason why it works so well. It seems to help learning in very deep convolutional neural networks, but why this is the case is not really well understood.

The pytorch documentation is extremely well organised and I highly recommend you use it to your own advantage.
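
To make the "downsample" and "upsample" descriptions above concrete, here is a small sketch that prints the output shape of several of these layers for a single 28x28 input. The layer settings mirror the ones used in the next cell, except that nearest-neighbour `nn.Upsample` stands in for the bilinear version; the dictionary name `layers` is just for illustration.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 28)  # (batch, channels, height, width)

layers = {
    # kernel 5, padding 2, stride 1: spatial size is preserved
    "conv": nn.Conv2d(1, 1, kernel_size=5, padding=2, stride=1),
    # transposed convolution: out = (in - 1) * stride + kernel = 27 * 2 + 4 = 58
    "conv_transposed": nn.ConvTranspose2d(1, 1, kernel_size=4, stride=2),
    # nearest-neighbour upsampling doubles height and width
    "upsample": nn.Upsample(scale_factor=2, mode="nearest"),
    # 2x2 max pooling with stride 2 halves height and width
    "pool": nn.MaxPool2d(kernel_size=2, stride=2),
}

shapes = {name: tuple(layer(x).shape) for name, layer in layers.items()}
for name, shape in shapes.items():
    print(f"{name:16s} {shape}")
```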

In [None]:
convolution = nn.Conv2d(1, 1, kernel_size=5, padding=2, stride=1)
transposed_convolution = nn.ConvTranspose2d(1, 1, kernel_size=4, stride=2)
upsampling = nn.UpsamplingBilinear2d(scale_factor=2)
pool = nn.MaxPool2d(kernel_size=2, stride=2)
dropout = nn.Dropout(0.5)
#dropout = nn.Dropout2d(0.5)
batchnorm = nn.BatchNorm2d(1) ##1 corresponds to the number of output channels in the convolutional layer

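plt.imshow(img)

# x is assumed to be the cat image as a (1, 1, H, W) float tensor, as built in the Sobel code-along above
x = torch.from_numpy(img).float()[None, None]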

fig, axarr = plt.subplots(2, 3, figsize=(24, 12))
for ax, op, name in zip(axarr.flatten(), [convolution, transposed_convolution, upsampling, pool, dropout, batchnorm], ["conv", "conv_transposed", "upsample", "pool", "dropout", "batchnorm"]):
  filtered = op(x)
  im = ax.imshow(filtered[0, 0].detach().numpy())
  ax.set_title(name, fontsize=18)
  #fig.colorbar(im, ax=ax, fraction=0.03)
plt.show()


### A clearer/simpler toy-example to understand [`nn.Dropout`](https://pytorch.org/docs/stable/generated/torch.nn.Dropout.html) layers:

In [None]:
p = 0.1
m = nn.Dropout(p=p)
input = torch.randn(25, 25)*100000
output = m(input)

plt.figure(figsize = (8,8))
im = plt.imshow(output.detach().numpy(), cmap='seismic')#, vmin=-0.1, vmax=0.1)
plt.colorbar(im,fraction=0.044, pad=0.1)

What does the value `p` do?
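
One way to investigate this question numerically is sketched below, assuming an all-ones input so the effect is easy to read off. In training mode `nn.Dropout` zeroes each element with probability `p` and rescales the survivors by `1/(1-p)`; in evaluation mode it leaves the input untouched. The variable names are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
p = 0.1
dropout = nn.Dropout(p=p)
ones = torch.ones(1000, 1000)

# Training mode: elements are zeroed at random, the rest are scaled up
dropout.train()
out_train = dropout(ones)
zero_fraction = (out_train == 0).float().mean().item()
survivor_value = out_train.max().item()
print(f"fraction zeroed: {zero_fraction:.3f} (p = {p})")
print(f"value of surviving elements: {survivor_value:.4f} (1/(1-p) = {1 / (1 - p):.4f})")

# Evaluation mode: dropout is the identity
dropout.eval()
out_eval = dropout(ones)
print("unchanged in eval mode:", torch.equal(out_eval, ones))
```

The rescaling keeps the expected value of each activation the same in training and evaluation, which is why no correction is needed at test time.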

### A clearer/simpler toy-example to understand [`nn.BatchNorm2d`](https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html) layers:

In [None]:
m = nn.BatchNorm2d(1)
input = torch.randn(1,1,25, 25)*100000
output = m(input)

plt.figure(figsize = (8,8))
im = plt.imshow(output[0,0].detach().numpy(), cmap='seismic')
plt.colorbar(im,fraction=0.044, pad=0.1)
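
The plot above can be complemented with a quick numerical check. This is a sketch under the assumption of a freshly initialised layer (scale 1, shift 0) in training mode: the output of `nn.BatchNorm2d` then has approximately zero mean and unit variance per channel, however large the input values are. The offset of 5000 is added here only to make the shift visible.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batchnorm = nn.BatchNorm2d(1)  # one channel; weight starts at 1 and bias at 0

# Badly scaled input: large spread and a non-zero mean
x = torch.randn(8, 1, 25, 25) * 100000 + 5000

batchnorm.train()  # use the statistics of the current mini-batch
y = batchnorm(x)

print("input : mean %.1f, std %.1f" % (x.mean().item(), x.std().item()))
print("output: mean %.4f, std %.4f" % (y.mean().item(), y.std(unbiased=False).item()))
```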

<br>

### Before we start coding CNNs, let's make sure we know what we are doing:

[CNN Explainer](https://poloclub.github.io/cnn-explainer/)

<br>

[Cool interactive MNIST classifier](https://www.cs.ryerson.ca/~aharley/vis/conv/flat.html)

<br>

### Task 2: A simple Convolutional Network - LeNet-5 (almost)
![](https://www.researchgate.net/profile/Vladimir_Golovko3/publication/313808170/figure/fig3/AS:552880910618630@1508828489678/Architecture-of-LeNet-5.png)

We will now use the layer classes we just saw to implement a version of Yann LeCun's LeNet-5 (see figure above).


- Here the network is shown to have inputs of size 32x32, so we will tell our first convolutional layer to add some padding to our 28x28 MNIST images.  
- All convolutional layers with trainable parameters should have:
  - kernel-size=5
  - stride 1
  - padding 2.  
- All MaxPool layers use a kernel size 2 and a stride value of 2.
- Use ReLUs for all activations.
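
One possible reading of this specification is sketched below; it is not the reference solution. It assumes the usual LeNet-5 channel and unit counts (6 and 16 feature maps, then 120, 84 and 10 units) and follows the bullet list literally by using padding 2 on both convolutions, which gives 16x7x7 features before the first linear layer. If the second convolution used no padding, as the figure suggests, that number would be 16x5x5 instead. The class name `LeNet5Sketch` and the attribute names are illustrative.

```python
import torch
import torch.nn as nn

class LeNet5Sketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.c1 = nn.Conv2d(1, 6, kernel_size=5, stride=1, padding=2)   # 28x28 -> 28x28
        self.s2 = nn.MaxPool2d(kernel_size=2, stride=2)                 # 28x28 -> 14x14
        self.c3 = nn.Conv2d(6, 16, kernel_size=5, stride=1, padding=2)  # 14x14 -> 14x14
        self.s4 = nn.MaxPool2d(kernel_size=2, stride=2)                 # 14x14 -> 7x7
        self.f5 = nn.Linear(16 * 7 * 7, 120)
        self.f6 = nn.Linear(120, 84)
        self.output = nn.Linear(84, 10)  # one logit per digit class
        self.act = nn.ReLU()

    def forward(self, x):
        x = self.s2(self.act(self.c1(x)))
        x = self.s4(self.act(self.c3(x)))
        x = x.flatten(start_dim=1)       # (batch, 16, 7, 7) -> (batch, 784)
        x = self.act(self.f5(x))
        x = self.act(self.f6(x))
        return self.output(x)            # raw logits; the loss applies the softmax

model = LeNet5Sketch()
logits = model(torch.randn(4, 1, 28, 28))  # a batch of four single-channel images
n_params = sum(p.numel() for p in model.parameters())
print(logits.shape, n_params)
```

Note that the network expects inputs of shape `(batch, 1, 28, 28)`, i.e. with an explicit channel dimension.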


In [None]:
class LeNet5(nn.Module):
  def __init__(self):
    super(LeNet5, self).__init__()
    # define a 2D convolutional layer
    # define a maxpool layer
    # new 2D convolutional layer
    # another maxpool layer
    # first linear layer
    # second linear layer
    # final output layer
    # activation function
    
  def forward(self, x):
    # activate pass through the first layer
    # activate pass through the second layer
    # activate pass through the third layer
    # activate pass through the fourth layer
    # flatten (return a "flattened" view of the 2d tensor as inputs for the fully connected layer)
    # activate pass through fifth layer
    # activate pass through last layer
    # return output
  


**Bonus**: The original [paper](http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf) differs in a few important ways from what we just implemented. Can you name any?

- It uses sigmoid-type activations rather than ReLUs, and each pooling (subsampling) layer has a trainable weight and bias.
- Not all filters in C3 act on all feature maps of S2.

That's why we have an *almost* LeNet-5.

### The MNIST Dataset - Hello World of Deep-Learning - Now with ConvNets!

In [None]:
mnist_train = MNIST("./", download=True, train=True)
mnist_test = MNIST("./", download=True, train=False)

### `code along` Instantiate and create a ```StratifiedShuffleSplit``` using sklearn.
1. Create a ```sklearn.model_selection.StratifiedShuffleSplit``` object with 1-split and a test-size of 10%.
2. Get the training and validation indices from the shuffle-split
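
The two steps can be tried out on synthetic labels first. The sketch below assumes a toy dataset of 1000 items with 100 items per class (the arrays `labels` and `features` are made up for illustration) and checks that the 10% validation subset keeps the class balance.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Toy data: 10 classes with 100 items each; the features are irrelevant for the split
labels = np.repeat(np.arange(10), 100)
features = np.zeros((len(labels), 1))

# 1. One split with 10% of the data held out
splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.1, random_state=42)

# 2. split() is a generator of (train_indices, validation_indices) pairs
train_idx, val_idx = next(splitter.split(features, labels))

print(len(train_idx), len(val_idx))
print(np.bincount(labels[val_idx]))  # items per class in the validation subset
```

Because the split is stratified, each class contributes the same share to the validation subset as it does to the full dataset.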

In [None]:
# find indices to split the data


### `code along` Standardise and split the MNIST dataset
The original MNIST data is given in gray-scale values between 0 and 255.
You will need to write a normalisation method that takes in a ```torch.Tensor``` and performs normalisation.
The mean of MNIST is 0.1307 and its standard deviation is 0.3081 (after division by 255).
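
The arithmetic is: scale to [0, 1], subtract the mean, divide by the standard deviation. A minimal sketch on two hand-picked pixel values follows; the function name `standardise` and its default arguments are illustrative, and it returns a new tensor rather than modifying its input.

```python
import torch

def standardise(images, mean=0.1307, std=0.3081):
    """Map raw 0-255 pixel values to standardised floats."""
    scaled = images.float() / 255.0  # gray-scale values in [0, 1]
    return (scaled - mean) / std     # zero mean, unit variance over MNIST

# The darkest and the brightest possible pixel
pixels = torch.tensor([[0, 255]], dtype=torch.uint8)
standardised = standardise(pixels)
print(standardised)
```

A black pixel maps to `-0.1307 / 0.3081` and a white pixel to `(1 - 0.1307) / 0.3081`, so the standardised images are not confined to [0, 1].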

In [None]:
# define a standardisation function


In [None]:
# standardise the data


### `code along` Instantiate a ```torch.utils.data.TensorDataset``` for training, validation and test data

Remember that a TensorDataset wraps tensors that are already in memory so that a DataLoader can index, shuffle and batch them.

And remember that torch likes all categorical data to be in a ```.long()``` format.
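
As a reminder of how these pieces fit together, here is a small sketch with random stand-in data (the tensors `images` and `labels` are made up; the real ones come from the standardised MNIST split).

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Stand-in data: 100 fake 28x28 images with integer class labels
images = torch.randn(100, 28, 28)
labels = torch.randint(0, 10, (100,)).long()  # class labels as 64-bit integers

# TensorDataset pairs up items along the first dimension
dataset = TensorDataset(images, labels)

# DataLoader serves the dataset in mini-batches
loader = DataLoader(dataset, batch_size=32, shuffle=False)
X_batch, y_batch = next(iter(loader))

print(len(dataset), len(loader))
print(X_batch.shape, y_batch.dtype)
```

With 100 items and a batch size of 32 the loader yields four batches, the last of which holds only 4 items.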


In [None]:
# create the TensorDatasets containing mnist_train, mnist_validate, and mnist_test


Let's visualise an example of the images and check whether the data is normalised properly (compute .mean() and .std() on the training set.)

In [None]:
plt.imshow(X_train[0], cmap = 'gray')
print(X_train.mean(), X_train.std())

### Provided Train, Validation and Evaluate Functions

There is an error in these functions. Can you spot it?

In [None]:
def train(model, optimizer, criterion, data_loader):
    model.train()
    train_loss, train_accuracy = 0, 0
    for X, y in data_loader:
        X, y = X.to(device), y.to(device)
        optimizer.zero_grad()
        output = model(X.view(-1, 28*28))
        loss = criterion(output, y)
        loss.backward()
        train_loss += loss*X.size(0)
        y_pred = F.log_softmax(output, dim=1).max(1)[1]
        train_accuracy += accuracy_score(y.cpu().numpy(), y_pred.detach().cpu().numpy())*X.size(0)
        optimizer.step()  
        
    return train_loss/len(data_loader.dataset), train_accuracy/len(data_loader.dataset)
  
def validate(model, criterion, data_loader):
    model.eval()
    validation_loss, validation_accuracy = 0., 0.
    for X, y in data_loader:
        with torch.no_grad():
            X, y = X.to(device), y.to(device)
            output = model(X.view(-1, 28*28)) 
            loss = criterion(output, y)
            validation_loss += loss*X.size(0)
            y_pred = F.log_softmax(output, dim=1).max(1)[1]
            validation_accuracy += accuracy_score(y.cpu().numpy(), y_pred.cpu().numpy())*X.size(0)
            
    return validation_loss/len(data_loader.dataset), validation_accuracy/len(data_loader.dataset)
  
def evaluate(model, data_loader):
    model.eval()
    ys, y_preds = [], []
    for X, y in data_loader:
        with torch.no_grad():
            X, y = X.to(device), y.to(device)
            output = model(X.view(-1, 28*28)) 
            y_pred = F.log_softmax(output, dim=1).max(1)[1]
            ys.append(y.cpu().numpy())
            y_preds.append(y_pred.cpu().numpy())
            
    return np.concatenate(y_preds, 0),  np.concatenate(ys, 0)

### Set the hyperparameters of your model
- Seed: 42
- learning rate: 1e-2
- Optimizer: SGD
- momentum: 0.5
- Number of Epochs: 30
- Batchsize: 64
- Test Batch Size (no effect on training apart from time): 1000
- Shuffle the training set every epoch: Yes

In [None]:
seed = 42
lr = 1e-2
momentum = 0.5
batch_size = 64
test_batch_size = 1000
n_epochs = 30

### Instantiate our model, optimizer and loss function
Set the random number generator seed using ```set_seed``` to make everything reproducible.
As a criterion use a sensible loss for the multi-class classification problem.
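
A common choice for multi-class classification is the cross-entropy loss. The sketch below, on random stand-in logits, illustrates that `nn.CrossEntropyLoss` expects raw logits and integer class labels, and that it is equivalent to a log-softmax followed by the negative log-likelihood loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 10)               # 5 samples, 10 classes, no softmax applied
targets = torch.tensor([3, 1, 4, 1, 5])   # integer class labels (dtype long)

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, targets)

# The same quantity computed in two explicit steps
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(loss.item(), manual.item())
```

This is why the network itself does not need a softmax layer at the end when it is trained with this criterion.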

### Perform the training of the network and validation
Here we provide you with a method to visualize both training and validation loss while training your networks.

In [None]:
def train_model(momentum):
  set_seed(seed)
  model = LeNet5().to(device)
  optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)
  criterion = nn.CrossEntropyLoss()
  
  train_loader = DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=0)
  validation_loader = DataLoader(mnist_validate, batch_size=test_batch_size, shuffle=False, num_workers=0)
  test_loader = DataLoader(mnist_test, batch_size=test_batch_size, shuffle=False, num_workers=0)
  
  liveloss = PlotLosses()
  for epoch in range(n_epochs):
      logs = {}
      train_loss, train_accuracy = train(model, optimizer, criterion, train_loader)

      logs['' + 'log loss'] = train_loss.item()
      logs['' + 'accuracy'] = train_accuracy.item()

      validation_loss, validation_accuracy = validate(model, criterion, validation_loader)
      logs['val_' + 'log loss'] = validation_loss.item()
      logs['val_' + 'accuracy'] = validation_accuracy.item()

      liveloss.update(logs)
      liveloss.draw()
      
  return model

model = train_model(0.5)

<br>

Results obtained with the feed-forward network from previous session:

<img src="https://raw.githubusercontent.com/acse-2020/ACSE-8/main/implementation/practical_4/Figs/MNIST_feedforward_cross-validation.png?token=ABNZJP4TNJA6XMPP2CGJMVDATTTU2" alt="results" width="600"/>


### `code along` Implement an evaluate method
This method does the same as `validate` but, instead of reporting losses, it simply returns all predictions (and the true labels) for a given dataset (training, validation or test set).

In [None]:
# create a validation_loader and generate predictions


### `code along` Plotting a confusion matrix

We can use a confusion matrix to diagnose problems in our models.
We may see for example that our model confuses 9's for 4's quite often.
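
To see what such a matrix contains, here is a plain NumPy sketch that counts (true label, predicted label) pairs for a handful of made-up predictions. It is a stand-in for illustration only and does not use the pycm API; the helper `confusion_counts` and the convention of rows for true labels and columns for predictions are choices made here.

```python
import numpy as np

def confusion_counts(y_true, y_pred, n_classes):
    """Count how often each true class (row) is predicted as each class (column)."""
    counts = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(counts, (y_true, y_pred), 1)  # add 1 at every (true, predicted) position
    return counts

# Made-up example: one 4 is mistaken for a 9 and one 9 for a 4
y_true = np.array([4, 4, 9, 9, 9, 1])
y_pred = np.array([4, 9, 9, 4, 9, 1])

matrix = confusion_counts(y_true, y_pred, n_classes=10)
accuracy = np.trace(matrix) / matrix.sum()  # correct predictions lie on the diagonal
print(matrix[[4, 9]][:, [4, 9]])
print(accuracy)
```

Off-diagonal entries such as `matrix[9, 4]` are exactly the confusions the text refers to.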

In [None]:
# Create a confusion matrix from
# print the confusion matrix

And plot it to easily visualise where the classifier 'struggles'

In [None]:
import seaborn as sns

def to_raw_matrix(cm):
    plt_cm = []
    for i in cm.classes :
        row=[]
        for j in cm.classes:
            row.append(cm.table[i][j])
        plt_cm.append(row)
    plt_cm = np.array(plt_cm)
    return plt_cm

rcm = to_raw_matrix(cm) #store the confusion matrix values

sns.heatmap(rcm, cmap="Blues") # use sensible limits to be able to see where the network struggles to identify digits

## `code along` Assume that you have estimated your hyperparameters.

Now train your model on the full dataset and evaluate on the test set. How good is the accuracy?

In [None]:
mnist_train = MNIST("./", download=True, train=True) # reload MNIST

# check the code from practical_3 if you get stuck.


model_save_name = 'LeNet5_mnist_classifier.pt'
path = f"/content/gdrive/My Drive/models/{model_save_name}"
torch.save(model.state_dict(), path)

<br>

Results obtained with the feed-forward network from previous session:

<img src="https://raw.githubusercontent.com/acse-2020/ACSE-8/main/implementation/practical_4/Figs/MNIST_feedforward_final_training.png?token=ABNZJPY2YVTCBMJ7LSUOVJTATTUOG" alt="results" width="600"/>


# Summary of today's session:

- Convolutions as building blocks of CNNs.
- Overview of a few PyTorch layers used in CNNs.
- LeNet-5 architecture.
- Training a network similar to LeNet-5 on MNIST.
- Demonstration that CNNs outperform feed-forward networks on this image task because they exploit the spatial context of the input images.

## Custom Datasets and Transforms

PyTorch lets us extend the `Dataset` class with custom functionality.
Here we provide an example of such a custom dataset class.
There are three methods we need to implement:
- `__init__(*args, **kwargs)`: handles everything that must happen before the dataset is used
- `__len__(self)`: returns the length of the dataset, i.e. the number of data items
- `__getitem__(self, idx)`: takes the index of a specific data item and returns that item.
  - You can do whatever you want in these methods: apply transforms, normalise data, perform another computation, etc.
  - Here we also have the functionality to apply a set of [```torchvision.transforms```](https://pytorch.org/tutorials/beginner/data_loading_tutorial.

In [None]:
from torch.utils.data import Dataset 

class CustomImageTensorDataset(Dataset):
    def __init__(self, data, targets, transform=None):
        """
        Args:
            data (Tensor): A tensor containing the data e.g. images
            targets (Tensor): A tensor containing all the labels
            transform (callable, optional): Optional transform to be applied
                on a sample.
        """
        self.data = data
        self.targets = targets
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        sample, label = self.data[idx], self.targets[idx]
        sample = sample.view(1, 28, 28).float()/255.
        if self.transform:
            sample = self.transform(sample)

        return sample, label
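
A short usage sketch for this pattern follows, with random stand-in data. To keep the block self-contained it repeats the same three methods in a compact class called `TinyImageDataset` (an illustrative name), and it uses a hypothetical `standardise` function as the transform instead of a torchvision one.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class TinyImageDataset(Dataset):
    """Compact copy of the pattern above, for illustration only."""
    def __init__(self, data, targets, transform=None):
        self.data, self.targets, self.transform = data, targets, transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # add a channel dimension and scale the raw pixel values to [0, 1]
        sample = self.data[idx].view(1, 28, 28).float() / 255.
        if self.transform:
            sample = self.transform(sample)
        return sample, self.targets[idx]

def standardise(sample):
    # hypothetical transform: any callable that maps a tensor to a tensor works
    return (sample - 0.1307) / 0.3081

# Stand-in for raw MNIST: 20 random uint8 images with random labels
data = torch.randint(0, 256, (20, 28, 28), dtype=torch.uint8)
targets = torch.randint(0, 10, (20,))

dataset = TinyImageDataset(data, targets, transform=standardise)
sample, label = dataset[0]                     # calls __getitem__(0)
X_batch, y_batch = next(iter(DataLoader(dataset, batch_size=8)))
print(len(dataset), sample.shape, X_batch.shape)
```

The transform is applied every time an item is fetched, so a random transform produces a different version of the same image in each epoch.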

### Transforms

Transforms manipulate individual data items before they are passed to our models.
This is useful for:
 - Data augmentation, i.e. creating slightly modified instances of the data we have while preserving their labels.
 - Data preprocessing, such as normalisation or histogram equalisation.
 - Transforming targets: you may have complex labels that should change together with changes in the preprocessing of the images.

PyTorch, and especially torchvision, provide a [number of transforms](https://pytorch.org/docs/stable/torchvision/index.html) for you to use!
A nice tutorial on custom dataloaders and transforms can be found [here](https://github.com/utkuozbulak/pytorch-custom-dataset-examples).

Probably the most advanced library for image augmentation is [albumentations](https://github.com/albu/albumentations), which has been successfully applied in winning Kaggle competitions.
 

In [None]:
from torchvision.transforms import Compose, ToTensor, Normalize, RandomRotation, ToPILImage


#Often we will want to apply more transformations at training time than test time, therefore here we have two different ones
train_transform = Compose([
    ToPILImage(),
    RandomRotation(10),
    ToTensor(),
    Normalize(mean=[0.1307], std=[0.3081]), 

]) ##Compose different transforms together. PIL is Python Imaging Library useful for opening, manipulating, and saving many different image file formats.

#In Validation and Test Mode we only want to normalize our images, because they are already tensors
validation_test_transform = Compose([
    Normalize(mean=[0.1307], std=[0.3081])
])


<img src="https://raw.githubusercontent.com/acse-2020/ACSE-8/main/implementation/practical_4/Figs/data_augmentation.png?token=ABNZJP4FGWYIGD6DLD3KGVDATZENK" alt="network" width="600"/>


### `code along` Training with data augmentation

- Instantiate a ```CustomImageTensorDataset``` with data from the MNIST dataset
- Provide the training and validation and testing datasets with the right transforms
- Train LeNet-5 with data-augmentation on a validation set, then train on the full training set and report accuracies. Did you improve the model?






#### Create the ```CustomImageTensorDataset```:

In [None]:
# download mnist
# split in train and validation

# create train custom dataset
# create validation custom dataset
# create test custom dataset


### Training LeNet5 with data augmentation

In [None]:
def train_model_augmented(train_dataset, validation_dataset, momentum=0.5):
  set_seed(seed)
  model = LeNet5().to(device)
  optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)
  criterion = nn.CrossEntropyLoss()
  
  train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=0)
  validation_loader = DataLoader(validation_dataset, batch_size=test_batch_size, shuffle=False, num_workers=0)

  liveloss = PlotLosses()
  for epoch in range(n_epochs):
      logs = {}
      train_loss, train_accuracy = train(model, optimizer, criterion, train_loader)

      logs['' + 'log loss'] = train_loss.item()
      logs['' + 'accuracy'] = train_accuracy.item()

      validation_loss, validation_accuracy = validate(model, criterion, validation_loader)
      logs['val_' + 'log loss'] = validation_loss.item()
      logs['val_' + 'accuracy'] = validation_accuracy.item()

      liveloss.update(logs)
      liveloss.draw()
      
  return model

model = train_model_augmented(custom_mnist_train, mnist_validation)

### Training on the full dataset

We can apply transforms directly when we get MNIST from [`torchvision.datasets.MNIST`](https://pytorch.org/vision/stable/datasets.html#mnist)

In [None]:
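mnist_train = MNIST("./", download=True, train=True, transform=Compose([
    RandomRotation(10),
    ToTensor(),
    Normalize(mean=[0.1307], std=[0.3081]),
]))
# the test set also needs ToTensor and Normalize (but no augmentation) so that the DataLoader receives tensors
mnist_test = MNIST("./", download=True, train=False, transform=Compose([
    ToTensor(),
    Normalize(mean=[0.1307], std=[0.3081]),
]))
train_loader = DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=4)
test_loader = DataLoader(mnist_test, batch_size=test_batch_size, shuffle=False, num_workers=0)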

set_seed(seed)
model = LeNet5().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)
criterion = nn.CrossEntropyLoss()

liveloss = PlotLosses()
for epoch in range(n_epochs):
    logs = {}
    train_loss, train_accuracy = train(model, optimizer, criterion, train_loader)

    logs['' + 'log loss'] = train_loss.item()
    logs['' + 'accuracy'] = train_accuracy.item()
    liveloss.update(logs)
    liveloss.draw()
    logs['val_' + 'log loss'] = 0.
    logs['val_' + 'accuracy'] = 0.

test_loss, test_accuracy = validate(model, criterion, test_loader)    
print("Avg. Test Loss: %1.3f" % test_loss.item(), " Avg. Test Accuracy: %1.3f" % test_accuracy.item())
print("")

model_save_name = 'LeNet5_mnist_classifier_with_augmentation.pt'
path = f"/content/gdrive/My Drive/models/{model_save_name}"
torch.save(model.state_dict(), path)

#### And voilà! We have improved the accuracy on the test set a bit more!