# Laboratory 4: Convolutional Neural Networks

In this laboratory session we will train some CNNs to recognize color images in the [CIFAR-10 Dataset](https://www.cs.toronto.edu/~kriz/cifar.html).

## Part 1: Initial Setup and Data Exploration

We begin with some standard imports, as usual.

```python
# Standard imports
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('dark_background')

# Standard PyTorch imports (note the aliases).
import torch
import torch.nn as nn
import torch.nn.functional as F
```

### Exercise 1.1: Dataset and Dataloader Preparation

The `torchvision` library provides a class (with the same interface as MNIST) for the CIFAR-10 dataset. As with MNIST, it will automatically download and prepare the dataset for use. Use the `CIFAR10` class to load three splits: training (a subset of, say, 5000 images), validation (an **independent** subset of 2000 images drawn from the training set, i.e. one that shares no images with the training subset), and test.

**Note**: Don't forget to *transform* the images in the datasets to convert them to tensors and standardize them!

**Hint**: Feel free to copy-and-paste liberally from the notebook I published for the capsule lecture. **BUT**, make sure you know what you are doing, and be aware that *some* of the code will have to be adapted for use with the CIFAR10 dataset.

```python
from torchvision.datasets import CIFAR10
import torchvision.transforms as transforms

# Validation and train set size.
train_size = 5000
val_size = 2000

# Your code here.
```

### Exercise 1.2: Dataloaders
Set up dataloaders for **all** of the datasets -- even though the validation set is small! Test out the datasets defined above and the dataloaders to make sure you understand the dataset format. Visualize some of the images to get a feel for the type of images and classes in CIFAR-10.

```python
# Setup dataloaders for all three datasets. Use the largest batch size possible.
batch_size = 256

# Your code here.
```
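One possible sketch of the loaders plus a quick visual check. The helpers `make_loaders`, `denormalize` and `show_batch` are hypothetical names. Only the training loader is shuffled, since evaluation order does not affect the metrics. Images have to be un-standardized before plotting, otherwise `imshow` clips most pixel values. The class-name tuple follows the label order of torchvision's `CIFAR10` (the dataset also exposes it as its `classes` attribute).

```python
import torch
import matplotlib.pyplot as plt
from torch.utils.data import DataLoader

CIFAR10_CLASSES = ("airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck")

def make_loaders(ds_train, ds_val, ds_test, batch_size=256, num_workers=0):
    """Shuffle only the training data; validation/test order does not matter."""
    dl_train = DataLoader(ds_train, batch_size=batch_size, shuffle=True, num_workers=num_workers)
    dl_val = DataLoader(ds_val, batch_size=batch_size, shuffle=False, num_workers=num_workers)
    dl_test = DataLoader(ds_test, batch_size=batch_size, shuffle=False, num_workers=num_workers)
    return dl_train, dl_val, dl_test

def denormalize(images, mean, std):
    """Undo transforms.Normalize on a (B, C, H, W) batch and clip to [0, 1] for display."""
    mean = torch.tensor(mean).view(1, -1, 1, 1)
    std = torch.tensor(std).view(1, -1, 1, 1)
    return (images * std + mean).clamp(0.0, 1.0)

def show_batch(images, labels, mean, std, n=8):
    """Plot the first n images of a batch with their class names."""
    images = denormalize(images[:n], mean, std)
    fig, axes = plt.subplots(1, len(images), figsize=(2 * len(images), 2.5), squeeze=False)
    for ax, img, label in zip(axes[0], images, labels[:n]):
        ax.imshow(img.permute(1, 2, 0).numpy())   # CHW -> HWC, as imshow expects
        ax.set_title(CIFAR10_CLASSES[int(label)])
        ax.axis("off")
    plt.show()
```

For example, `xs, ys = next(iter(dl_train))` followed by `print(xs.shape, ys.shape)` and `show_batch(xs, ys, mean, std)`, where `mean` and `std` are whatever constants you passed to `Normalize` when building the datasets.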

## Part 2: Establishing a stable baseline

In this part of the laboratory we will establish a simple baseline as a starting point.

### Exercise 2.1: An MLP Baseline

Define a simple Multilayer Perceptron to classify the CIFAR-10 images. Define it as a class inheriting from `torch.nn.Module`. Don't make it too complex or too deep. We're just looking for a starting point. A *baseline*.

```python
# Your code here.
```
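A minimal baseline, assuming standardized 3x32x32 inputs. The hidden widths (512 and 256) are arbitrary choices, not values taken from the lab. The model returns raw logits because `nn.CrossEntropyLoss` (and `F.cross_entropy`) apply log-softmax internally.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """A small fully-connected baseline for 3x32x32 images and 10 classes."""
    def __init__(self, in_features=3 * 32 * 32, hidden=(512, 256), num_classes=10):
        super().__init__()
        layers = [nn.Flatten()]                 # (B, 3, 32, 32) -> (B, 3072)
        prev = in_features
        for width in hidden:
            layers += [nn.Linear(prev, width), nn.ReLU()]
            prev = width
        layers.append(nn.Linear(prev, num_classes))   # raw logits, no softmax
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```

Flattening throws away the spatial layout of the image, which is precisely the weakness the CNN in Part 3 is meant to address.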

### Exercise 2.2: Train and Evaluate your MLP Baseline

Train the model for a few (say, 20) epochs. Again, feel free to use my training code from the Capsule Lecture (or roll your own; mine is very basic). Make sure you plot training curves for both training and validation sets, and report final accuracy on the test set.

```python
# Your code here.
```
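A sketch of a basic training and evaluation loop (not the Capsule Lecture code itself). It assumes Adam with a learning rate of 1e-3 as an arbitrary starting point and records loss and accuracy on both splits after every epoch, which is all the training curves need. `run_epoch`, `fit` and `plot_history` are hypothetical helper names.

```python
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt

def run_epoch(model, loader, device, optimizer=None):
    """One pass over loader. Trains if an optimizer is given, otherwise only evaluates.
    Returns (mean loss, accuracy)."""
    training = optimizer is not None
    model.train(training)                       # train() vs eval() mode
    total_loss, correct, count = 0.0, 0, 0
    with torch.set_grad_enabled(training):      # no gradient bookkeeping during evaluation
        for xs, ys in loader:
            xs, ys = xs.to(device), ys.to(device)
            logits = model(xs)
            loss = F.cross_entropy(logits, ys)
            if training:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item() * ys.size(0)
            correct += (logits.argmax(dim=1) == ys).sum().item()
            count += ys.size(0)
    return total_loss / count, correct / count

def fit(model, dl_train, dl_val, epochs=20, lr=1e-3, device="cpu"):
    """Train with Adam and record per-epoch loss/accuracy on both splits."""
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    history = {"train_loss": [], "train_acc": [], "val_loss": [], "val_acc": []}
    for _ in range(epochs):
        tl, ta = run_epoch(model, dl_train, device, optimizer)
        vl, va = run_epoch(model, dl_val, device)
        history["train_loss"].append(tl)
        history["train_acc"].append(ta)
        history["val_loss"].append(vl)
        history["val_acc"].append(va)
    return history

def plot_history(history):
    """Loss and accuracy curves for the training and validation splits."""
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
    for split in ("train", "val"):
        ax_loss.plot(history[f"{split}_loss"], label=split)
        ax_acc.plot(history[f"{split}_acc"], label=split)
    ax_loss.set(xlabel="epoch", ylabel="loss")
    ax_acc.set(xlabel="epoch", ylabel="accuracy")
    ax_loss.legend()
    ax_acc.legend()
    plt.show()
```

Typical use: `model = MLP()`, `history = fit(model, dl_train, dl_val, epochs=20)`, `plot_history(history)`, and finally `test_loss, test_acc = run_epoch(model, dl_test, "cpu")` (no optimizer, so evaluation only) for the test accuracy.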

## Part 3: A CNN for CIFAR-10 Classification

OK, we have a (simple) MLP baseline for comparison. Let's implement a simple CNN to classify CIFAR-10 images and see if we can beat the MLP.

### Exercise 3.1: Defining the CNN

Define a simple CNN model with a few convolutional and maxpooling layers -- not too many, since CIFAR-10 images are only 32x32 pixels! Use two fully-connected layers after the last convolution and before the logit outputs. Test out the model by passing a *single* image through it to make sure it's working.

```python
# Your code here.
```
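A sketch of one small CNN. The channel widths (32, 64, 128) and hidden sizes (256, 128) are assumptions. Each 3x3 convolution with `padding=1` preserves the spatial size and each 2x2 max pool halves it, so the feature map shrinks 32 -> 16 -> 8 -> 4 and flattens to 128 * 4 * 4 = 2048 features. "Two fully-connected layers" is read here as two hidden layers followed by the linear logit layer; if you read it as two layers in total, drop the middle one.

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                  # 32x32 -> 16x16
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                  # 16x16 -> 8x8
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                  # 8x8 -> 4x4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                                     # (B, 128, 4, 4) -> (B, 2048)
            nn.Linear(128 * 4 * 4, 256), nn.ReLU(),           # first fully-connected layer
            nn.Linear(256, 128), nn.ReLU(),                   # second fully-connected layer
            nn.Linear(128, num_classes),                      # logits
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Sanity check with a single image (random stand-in for a standardized CIFAR-10 image).
model = SimpleCNN()
single = torch.randn(3, 32, 32)
logits = model(single.unsqueeze(0))   # add a batch dimension: (1, 3, 32, 32)
print(logits.shape)                   # torch.Size([1, 10])
```

With a real image, `model(ds_train[0][0].unsqueeze(0))` does the same check on actual data.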

### Exercise 3.2: Training and Evaluating your CNN

Train the CNN using hyperparameters similar to those you used for the MLP above (epochs, learning rate). Evaluate the model in the same way as before.

```python
# Your code here.
```

### Exercise 3.3: Adding data augmentation

See if you can improve the results of your CNN on CIFAR-10 by adding **data augmentation** to your training pipeline. To do this you will have to rethink your `Dataset` definition in order to incorporate random geometric transformations to images requested by your *training* `DataLoader`.

**Hint**: A good starting place for this is the **Torchvision** documentation for the `transforms` package.

```python
# Your code here.
```

## Part 4: Going Forward

In practice we usually don't train deep models from *scratch*. Especially if we don't have a lot of annotated data, we almost always use a **pre-trained** model either as a **feature extractor** or to **fine-tune** on our problem. The Torchvision library supports access to a [huge variety of pre-trained models](https://pytorch.org/vision/stable/models/resnet.html) that you can use for *exactly* this purpose. Always keep this in mind if you have an image recognition problem -- you can use a pre-trained model as a **feature extractor** and then train a *simple* MLP to solve your classification problem. This works *very* well in practice.

### Exercise 4.1: Adapt a pre-trained model

Adapt a ResNet (e.g. ResNet-18) pre-trained on ImageNet to classify images from CIFAR-10. Carefully consider what changes you might have to make in your pipeline to make CIFAR-10 images compatible with a network as deep as ResNet-18. There are several strategies you could take to perform this adaptation:
+ You could use the pre-trained ResNet as a **feature extractor** by computing the final hidden layer activations (i.e. the layer just before the classifier) on the entire training set. Then, you could train an MLP to classify the ten CIFAR-10 classes.
+ As an alternative, you could **fine-tune** the ResNet on the new dataset. To do this you will need to substitute the final classification layer with a new linear layer for the ten-class classification problem of CIFAR-10.

```python
from torchvision.models import resnet18, ResNet18_Weights
from torchvision.transforms import Compose, ToTensor, Resize   # <-- There is a hint in this import.
from torchvision.datasets import CIFAR10
```

```python
# Your code here.
```