<a href="https://colab.research.google.com/github/dylanwalker/BA865/blob/master/BA865_Lecture_08_Exercise_Solutions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [0]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import torch
from torch.utils.data import DataLoader
import torchvision
from torchvision import transforms


# Create a directory for saved models if it does not already exist
os.makedirs('./models', exist_ok=True)


# Exercise: Train a CNN for labeling CIFAR-10 images

Simple fully connected networks perform well on the MNIST data, but they do poorly on CIFAR-10, whose images are more complex and have three color channels. By using a more advanced architecture built around convolution layers, we will build a better model than the current one.

Before writing the code for this model, let's think about its structure. First, we need to capture edges and other local features in the images, which is what convolution layers are for. The number of output channels is usually larger than the number of input channels so that the model can learn many different features, as in the following line.

```python
torch.nn.Conv2d(in_channels=3, out_channels=sizeOutChannels, kernel_size=3, padding=1)
```

Next, adding a batch normalization layer can speed up training and often improves performance. Note that `num_features` in `BatchNorm2d` must match `out_channels` of the preceding convolution layer.

```python
torch.nn.BatchNorm2d(num_features = sizeOutChannels)
```
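
To see what this layer does, here is a minimal sketch on random stand-in activations (the tensor `x` and its shift and scale are made up for illustration; they are not CIFAR data). In training mode, `BatchNorm2d` normalizes each channel over the batch and spatial dimensions, so each output channel should have a mean close to 0 and a variance close to 1.

```python
import torch

torch.manual_seed(0)

# Stand-in activations: batch of 8, 16 channels, 32x32 feature maps,
# deliberately shifted and scaled away from mean 0 / variance 1.
x = torch.randn(8, 16, 32, 32) * 3.0 + 5.0

bn = torch.nn.BatchNorm2d(num_features=16)  # must equal the channel count of x
bn.train()                                  # training mode: use batch statistics
y = bn(x)

# Per-channel statistics over the batch and spatial dimensions
channel_mean = y.mean(dim=(0, 2, 3))
channel_var = ((y - channel_mean.view(1, -1, 1, 1)) ** 2).mean(dim=(0, 2, 3))

print(y.shape)  # the shape is unchanged by batch normalization
print(channel_mean.abs().max().item(), channel_var.mean().item())
```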

Any activation layer can be added after the batch normalization layer. In this lecture, we will use the ReLU function. 

```python
torch.nn.ReLU()
```

Lastly, add a pooling layer to downsample the feature maps by aggregating neighboring values.

```python
torch.nn.MaxPool2d(kernel_size=2, stride=2)
```
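
As a small illustration of what this pooling layer does, the sketch below applies it to a single made-up 4 $\times$ 4 feature map (the tensor `x` is an assumption for demonstration only). Each non-overlapping 2 $\times$ 2 window is replaced by its maximum, so the height and width are halved.

```python
import torch

# A single 4x4 feature map holding the values 0..15 (batch of 1, 1 channel)
x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

pool = torch.nn.MaxPool2d(kernel_size=2, stride=2)
y = pool(x)

print(y.shape)  # torch.Size([1, 1, 2, 2])
print(y[0, 0])  # maxima of the four 2x2 windows: 5, 7, 13, 15
```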

Our model is built from combinations of convolution layers, batch normalization layers, activation layers, and pooling layers.

Below, create a new model class called CnnCIFAR (it needs to inherit from `torch.nn.Module`).

You can model it after the NN that we built above. However, to make your `forward` method simpler and to better organize your layers, you should combine layers into logical blocks using `torch.nn.Sequential()`.

In your constructor:
- Don't forget to call the super's constructor first
- add the arguments `sizeOutChannels` (the `out_channels` of the `Conv2d` layer) and `sizeHiddenLayer` (the output features of the first fully connected linear layer) to your constructor.
- define a convolution layer block that consists of sequential layers of:
 - a Conv2d
 - a BatchNorm2d
 - a ReLU
 - a MaxPool2d (`kernel_size=2`, `stride=2`)
- define a fully connected layer block that consists of sequential layers of:
 - a linear layer whose input size matches the flattened output of the convolution layer block (it will be some number * `sizeOutChannels` -- you'll have to figure out what that number is based on the `kernel_size` and `stride` of the MaxPool2d layer) and whose output size is given by our argument `sizeHiddenLayer`
 - a ReLU
 - a Dropout (`p=0.2`)
 - another linear layer with output size of 10 (for each of the 10 classes a CIFAR image can belong to).

Define your `forward` method to pass the input x through the logical layer blocks that are defined in your constructor.
- IMPORTANT: you'll want to flatten the output of the convolution layer block before passing it into the fully connected layer block. You can do this with `x.view(x.size(0),-1)`
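
The flattening step can be sketched on a stand-in tensor (the shape below is arbitrary and chosen only for illustration; it is not the shape your convolution block produces). `x.view(x.size(0), -1)` keeps the batch dimension and merges all remaining dimensions into one; `torch.flatten` with `start_dim=1` gives the same result.

```python
import torch

# Stand-in for a convolution block output: 4 images, 8 channels, 5x5 feature maps
x = torch.zeros(4, 8, 5, 5)

flat_view = x.view(x.size(0), -1)        # keep the batch dimension, flatten the rest
flat_fn = torch.flatten(x, start_dim=1)  # equivalent built-in function

print(flat_view.shape)  # torch.Size([4, 200]), since 8 * 5 * 5 = 200
print(flat_fn.shape)    # same shape
```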


In [0]:
# Write your CnnCIFAR class here
class CnnCIFAR(torch.nn.Module):
  def __init__(self, sizeOutChannels, sizeHiddenLayer):
    super(CnnCIFAR, self).__init__()
    self.conv_layer = torch.nn.Sequential(
        torch.nn.Conv2d(in_channels=3, out_channels=sizeOutChannels, kernel_size=3, padding=1),
        torch.nn.BatchNorm2d(num_features = sizeOutChannels),
        torch.nn.ReLU(),
        torch.nn.MaxPool2d(kernel_size=2, stride=2)
    )
    self.fc_layer = torch.nn.Sequential(
        torch.nn.Linear(sizeOutChannels*16*16, sizeHiddenLayer),
        torch.nn.ReLU(),
        torch.nn.Dropout(p=0.2),
        torch.nn.Linear(sizeHiddenLayer, 10) # return values to predict a class among 10 labels.
    )  

  def forward(self, x):
    # conv_layer
    x = self.conv_layer(x)
    # flatten
    x = x.view(x.size(0), -1)
    # fc_layer
    x = self.fc_layer(x)
    return x 

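In `self.fc_layer`, why does the first `torch.nn.Linear` accept an input of length sizeOutChannels $\times$ 16 $\times$ 16? It depends on the `kernel_size`, `padding`, and `stride` of the layers in the convolution block, which determine the size of its output. The `Conv2d` layer (`kernel_size=3`, `padding=1`) keeps each 32 $\times$ 32 image at 32 $\times$ 32, and the `MaxPool2d` layer (`kernel_size=2`, `stride=2`) halves each side to 16 $\times$ 16, giving sizeOutChannels feature maps of size 16 $\times$ 16. Trace a 32 $\times$ 32 input through the network yourself and check that the input size of `fc_layer` is correct.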
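
One way to check this arithmetic is with the standard output-size formula for convolution and pooling layers, $\lfloor (n + 2p - k)/s \rfloor + 1$ (assuming dilation 1). The helper `conv_output_size` below is a hypothetical name introduced for this sketch; it is not part of PyTorch.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a Conv2d or MaxPool2d layer (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel_size) // stride + 1

# Conv2d(kernel_size=3, padding=1) on a 32x32 image
after_conv = conv_output_size(32, kernel_size=3, stride=1, padding=1)
# MaxPool2d(kernel_size=2, stride=2) on the convolution output
after_pool = conv_output_size(after_conv, kernel_size=2, stride=2)

size_out_channels = 16  # example value for sizeOutChannels
flat_features = size_out_channels * after_pool * after_pool

print(after_conv, after_pool, flat_features)  # 32 16 4096
```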

Run the code below to ensure that CIFAR10 is loaded and transformed properly, in case a session disconnect happened earlier.

In [0]:
# Load the CIFAR10 data
transform_cifar = transforms.Compose( [ transforms.ToTensor(),transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)) ] )
trainset_cifar = torchvision.datasets.CIFAR10(root='./cifar10', train=True, download=True, transform=transform_cifar)
testset_cifar = torchvision.datasets.CIFAR10(root='./cifar10', train=False, download=True, transform=transform_cifar)

batch_size = 64
train_dl_cifar = DataLoader(trainset_cifar, batch_size=batch_size, shuffle=True)
test_dl_cifar = DataLoader(testset_cifar, batch_size=batch_size, shuffle=False) # no need to shuffle the test data

We can see how tensors change shape as they move through the different layers of our NN:

In [0]:
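xb1,yb1 = next(iter(train_dl_cifar))
print(xb1.shape) # a batch of images: torch.Size([64, 3, 32, 32])
conv = torch.nn.Conv2d(in_channels=3,out_channels=16,kernel_size=3,padding=1)
xb2 = conv(xb1)
print(xb2.shape) # after the convolution: torch.Size([64, 16, 32, 32])
pool = torch.nn.MaxPool2d(kernel_size=2,stride=2)
xb3 = pool(xb2)
print(xb3.shape) # after pooling: torch.Size([64, 16, 16, 16])
xb4 = xb3.view(xb3.size(0),-1)
print(xb4.shape) # after flattening: torch.Size([64, 4096])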

Run the code below to define the model with the given arguments for `sizeOutChannels` and `sizeHiddenLayer`, and to define a loss function (here we'll use the cross-entropy loss) and an optimizer (here we'll use stochastic gradient descent with momentum).

In [0]:
cnnCIFAR = CnnCIFAR(sizeOutChannels = 16, sizeHiddenLayer = 50)
cnnCIFAR = cnnCIFAR.cuda() # move the model to the GPU (requires a CUDA-enabled runtime)

cnn_CIFAR_loss_fn = torch.nn.CrossEntropyLoss() # use cross entropy loss
cnn_CIFAR_opt = torch.optim.SGD(cnnCIFAR.parameters(), lr=0.003, momentum=0.9) # where did I get these "magic numbers?"  Trial and error and voodoo.

In [0]:
cnnCIFAR.train()
# We will train the model for 15 epochs, the same as for the previous fully connected network.
for epoch in range(15):
    running_loss = 0.0
    for inputs, labels in train_dl_cifar:
        # data to train
        inputs = inputs.cuda()
        labels = labels.cuda()

        # reset the gradients accumulated in the previous step
        cnn_CIFAR_opt.zero_grad()

        # calculate loss and update parameters
        outputs = cnnCIFAR(inputs)
        loss = cnn_CIFAR_loss_fn(outputs, labels)
        loss.backward()
        cnn_CIFAR_opt.step()

        # Sum losses
        running_loss += loss.item()

    print(f"Epoch {epoch+1} loss = {running_loss/len(train_dl_cifar)}") # print the loss averaged over all batches in this epoch

Let's evaluate the trained model.


In [0]:
from sklearn.metrics import confusion_matrix

cnnCIFAR.eval() # put the model into evaluation mode -- may affect some types of layers (e.g., dropout)
with torch.no_grad():
  running_loss = 0
  total = 0
  correct = 0
  numClasses = len(test_dl_cifar.dataset.classes)
  cm = np.zeros((numClasses,numClasses),dtype=np.int32) # an empty matrix to hold the confusion matrix, we'll sum the confusion matrices for each batch
  for xb, yb in test_dl_cifar:
    xb = xb.cuda()
    yb = yb.cuda()
    pred = cnnCIFAR(xb)
    predLabels = torch.argmax(pred,dim=1)
    cm += confusion_matrix(yb.cpu().numpy(), predLabels.cpu().numpy(), labels=list(range(numClasses))) # add this batch's confusion matrix to the total matrix -- we have to specify the list of class indexes (as the keyword argument `labels`), or sklearn will shorten our cm to only the classes seen

In [0]:
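acc = np.diag(cm)/cm.sum(axis=1) # per-class accuracy: correct predictions for each class divided by the number of true examples of that class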
print(cm, '\n', acc)

In [0]:
print("Accuracy of the convolutional neural network: ", np.mean(acc))
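
The two accuracy computations above can be illustrated on a small made-up confusion matrix (the 3-class matrix `cm_toy` is hypothetical, not a result from this model). Rows are true classes and columns are predicted classes, so the diagonal divided by the row sums gives the per-class accuracy. When every class has the same number of examples, as in the CIFAR-10 test set (1,000 images per class), the mean of the per-class accuracies equals the overall accuracy.

```python
import numpy as np

# Hypothetical 3-class confusion matrix: rows are true classes, columns are predictions
cm_toy = np.array([[8, 1, 1],
                   [2, 6, 2],
                   [0, 1, 9]])

per_class_acc = np.diag(cm_toy) / cm_toy.sum(axis=1)  # accuracy within each true class
overall_acc = np.trace(cm_toy) / cm_toy.sum()         # fraction of all examples predicted correctly

print(per_class_acc)                      # [0.8 0.6 0.9]
print(per_class_acc.mean(), overall_acc)  # equal here because the classes are balanced
```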

The convolutional neural network performs better than the previous fully connected network. Could we still do better? We only added one convolutional layer block, but these images belong to many different classes. You should now go back and experiment with the model. What if you run it for more epochs? What if you change the arguments of the model (e.g., try adjusting `sizeOutChannels` and `sizeHiddenLayer`)? You could also try adding more convolutional layer blocks. Remember that the final output needs to match the number of classes we're trying to predict. See if you can do better. Don't be afraid to google around for examples of CNNs applied to images. What do they do? What kind of performance can they achieve on CIFAR-10 (a well-known, standard dataset)?
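
As one possible starting point for these experiments, here is a minimal sketch of a variant with two convolutional layer blocks. The class name `CnnCIFARTwoBlocks` and the choice to double the channel count in the second block are assumptions made for illustration, not part of the lecture, and no claim is made here about how well it performs. Note that the second pooling layer halves the feature maps again, so the first linear layer now takes 8 $\times$ 8 maps.

```python
import torch

class CnnCIFARTwoBlocks(torch.nn.Module):  # hypothetical name, not from the lecture
    def __init__(self, sizeOutChannels, sizeHiddenLayer):
        super().__init__()
        self.conv_layer1 = torch.nn.Sequential(
            torch.nn.Conv2d(3, sizeOutChannels, kernel_size=3, padding=1),
            torch.nn.BatchNorm2d(sizeOutChannels),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2),  # 32x32 -> 16x16
        )
        self.conv_layer2 = torch.nn.Sequential(
            torch.nn.Conv2d(sizeOutChannels, 2 * sizeOutChannels, kernel_size=3, padding=1),
            torch.nn.BatchNorm2d(2 * sizeOutChannels),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2),  # 16x16 -> 8x8
        )
        self.fc_layer = torch.nn.Sequential(
            torch.nn.Linear(2 * sizeOutChannels * 8 * 8, sizeHiddenLayer),
            torch.nn.ReLU(),
            torch.nn.Dropout(p=0.2),
            torch.nn.Linear(sizeHiddenLayer, 10),  # one output per CIFAR-10 class
        )

    def forward(self, x):
        x = self.conv_layer2(self.conv_layer1(x))
        x = x.view(x.size(0), -1)  # flatten before the fully connected block
        return self.fc_layer(x)

# Quick shape check on random stand-in images (no training involved)
model = CnnCIFARTwoBlocks(sizeOutChannels=16, sizeHiddenLayer=50)
dummy = torch.randn(4, 3, 32, 32)
out = model(dummy)
print(out.shape)  # torch.Size([4, 10])
```

It can be dropped into the training and evaluation cells above in place of `CnnCIFAR`, since it takes the same constructor arguments and produces the same output shape.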