In [1]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import torch
from torch import nn
from torch import optim
#keep

# Basic image classification

In this notebook, we are going to train a basic CNN on the MNIST dataset, a small dataset that contains handwritten digits.

## Implementing a Simple CNN

You can use [`nn.Conv2d()`](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html) to create a single convolutional layer in PyTorch. Some of its interesting arguments are:

- `in_channels`: the number of channels in the input tensor. When the input is an RGB image, this is `3`. For a single-channel grayscale image, this is `1`. Note that each convolution filter in the layer will have this number of channels, as the filter needs to cover the *entire depth* of the input.
- `out_channels`: the number of convolution filters that this layer has. Each filter will produce a new channel in the output. As such, if another `nn.Conv2d()` takes the output of this layer as its input, that layer's `in_channels` should be set to this layer's `out_channels`.
- `kernel_size`: the height and width of the convolution filters. If this is a single integer, it will be used for both the height and the width (i.e., square filters).
- `stride`: the stride (step size) of the convolution operation (default: `1`).
- `padding`: the amount of padding to add to the input (default: `0`). *Padding* is a border of zero-valued (black) pixels that is added around the input image. It lets the filters be centered on pixels at the edge of the image and can prevent the output resolution from shrinking relative to the input resolution. When `padding` is set to `1`, a black border one pixel wide is added at each image edge.
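
To make the effect of `kernel_size`, `stride` and `padding` concrete, here is a minimal sketch that passes a dummy batch through a single convolution and checks the output resolution. The helper `conv_output_size` is a hypothetical name introduced for this illustration; it follows the output-size formula from the `nn.Conv2d` documentation, assuming a dilation of 1.

```python
import torch
from torch import nn


def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution, assuming dilation 1."""
    return (size + 2 * padding - kernel_size) // stride + 1


# Dummy batch: 8 single-channel images of 28x28 pixels (the MNIST resolution)
x = torch.zeros(8, 1, 28, 28)

# 64 filters of 5x5 pixels, applied with stride 2 and padding 2
conv = nn.Conv2d(in_channels=1, out_channels=64, kernel_size=5, stride=2, padding=2)
out = conv(x)

print(out.shape)                      # torch.Size([8, 64, 14, 14])
print(conv_output_size(28, 5, 2, 2))  # 14
print(conv.weight.shape)              # torch.Size([64, 1, 5, 5])
```

The weight tensor has shape `(out_channels, in_channels, height, width)`: every one of the 64 filters spans the full depth of the input.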

Apart from convolutional layers, a CNN also typically contains *pooling layers*. Similar to conv layers, a pooling layer uses a sliding window to operate on its input. Instead of computing an inner product, however, the pooling window **aggregates** the underlying values, e.g., by computing the *maximum* or *average* value. For example:

- [`nn.MaxPool2d()`](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html) applies max pooling.
- [`nn.AvgPool2d()`](https://pytorch.org/docs/stable/generated/torch.nn.AvgPool2d.html) applies average pooling.

Just like `nn.Conv2d()`, these pooling layers take `kernel_size`, `stride` and `padding` (default: `0`) as arguments. Note that, unlike for `nn.Conv2d()`, the default `stride` of a pooling layer is equal to its `kernel_size`.

The pooling layers also have *adaptive* equivalents: [`nn.AdaptiveMaxPool2d()`](https://pytorch.org/docs/stable/generated/torch.nn.AdaptiveMaxPool2d.html) and [`nn.AdaptiveAvgPool2d()`](https://pytorch.org/docs/stable/generated/torch.nn.AdaptiveAvgPool2d.html). These adaptive layers produce an **output of a fixed, predefined width and height no matter the input size** (the number of channels stays the same). This is in contrast with regular pooling layers, where the output size depends on the input size.
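
The difference between regular and adaptive pooling can be sketched by feeding two inputs of different resolution through both kinds of layers. The tensor names and sizes below are arbitrary choices for this illustration.

```python
import torch
from torch import nn

max_pool = nn.MaxPool2d(kernel_size=2, stride=2)  # regular pooling: 2x2 window, step 2
adaptive_pool = nn.AdaptiveAvgPool2d(1)           # adaptive pooling: always a 1x1 output

small = torch.randn(1, 64, 14, 14)
large = torch.randn(1, 64, 20, 20)

# Regular pooling: the output size follows the input size
print(max_pool(small).shape)       # torch.Size([1, 64, 7, 7])
print(max_pool(large).shape)       # torch.Size([1, 64, 10, 10])

# Adaptive pooling: the output size is fixed, only the channel count is kept
print(adaptive_pool(small).shape)  # torch.Size([1, 64, 1, 1])
print(adaptive_pool(large).shape)  # torch.Size([1, 64, 1, 1])
```

With an output size of 1, `nn.AdaptiveAvgPool2d` simply averages each channel over all spatial positions, which is often called global average pooling.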

There are multiple ways to tie layers together into a network. One of the easiest ways is through [`nn.Sequential()`](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html). You can check out the corresponding documentation for more info and examples.

Create a neural network with the following layers:

- A `Conv2d` layer with 64 5x5 filters, stride 2 and padding 2
- An `AdaptiveAvgPool2d` layer that outputs width and height 1
- A `Flatten` layer that flattens its input to shape `(N,64)`
- A `Linear` (i.e., fully-connected) layer that transforms its input to 10 output values, corresponding to the 10 classes in the MNIST dataset.
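
One possible way to assemble these four layers with `nn.Sequential()` is sketched below. It assumes single-channel (grayscale) input, hence `in_channels=1`; the dummy batch is only there to check the output shape.

```python
import torch
from torch import nn

model = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=64, kernel_size=5, stride=2, padding=2),
    nn.AdaptiveAvgPool2d(1),  # (N, 64, H, W) -> (N, 64, 1, 1)
    nn.Flatten(),             # (N, 64, 1, 1) -> (N, 64)
    nn.Linear(64, 10),        # (N, 64)       -> (N, 10), one score per digit class
)

# Sanity check on a dummy batch of four 28x28 images
dummy = torch.zeros(4, 1, 28, 28)
print(model(dummy).shape)  # torch.Size([4, 10])
```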

In [3]:
# ... WRITE YOUR CODE HERE ... #

Let's try and train our model! But first, we'll need two more things:

1. Data to train on
2. Training logic

We'll define both in the following two sections.

## Obtaining the Data for Batch Training

In the cell below, we have defined a function that returns the first half of the images and the corresponding labels of an MNIST split (i.e., the training set when `train=True`, or the MNIST test set, which we use as validation set here, when `train=False`).

In [3]:
from torchvision.datasets import MNIST

def get_mnist(train=True):
    mnist = MNIST(root='../data', download=True, train=train)
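    # Add a channel axis, (N, 28, 28) -> (N, 1, 28, 28), and rescale the pixels to [0, 1]
    data = mnist.data.float()[:, None, :, :] / 255
    # Keep only the first half of the split
    half_len = len(data) // 2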

    return data[:half_len], mnist.targets[:half_len]

#keep

Use this function to create the training and validation set. Inspect the shapes of the returned tensors.
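
To see what the preprocessing inside `get_mnist()` does to the tensor shapes without downloading anything, the sketch below applies the same two steps to a random stand-in for `mnist.data`. The tensor `raw` and its size of 6 images are assumptions made for this illustration only.

```python
import torch

# Stand-in for `mnist.data`: 6 random 28x28 images with integer pixel values in [0, 255]
raw = torch.randint(0, 256, (6, 28, 28), dtype=torch.uint8)

# Same transformation as in get_mnist(): add a channel axis and rescale to [0, 1]
data = raw.float()[:, None, :, :] / 255
half_len = len(data) // 2

print(raw.shape, raw.dtype)    # torch.Size([6, 28, 28]) torch.uint8
print(data.shape, data.dtype)  # torch.Size([6, 1, 28, 28]) torch.float32
print(data[:half_len].shape)   # torch.Size([3, 1, 28, 28])
```

Since the MNIST training split holds 60,000 images and the test split 10,000, `get_mnist()` should return 30,000 and 5,000 images respectively.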

In [None]:
# ... WRITE YOUR CODE HERE ... #

## Defining the Training Loop

Now that we have our first model and data ready, we can define our training loop! We have already implemented it below. Read through it, and **ensure that you understand what's happening**. The function **returns three NumPy arrays**: the first one contains the training losses, the second one the validation losses and the third one the validation accuracies throughout training.

In [5]:
def train_classifier(model, x_train, y_train, x_val, y_val, optimizer, loss_fn, num_epochs):
    """
    Train a classifier using batch gradient descent.

    Args:
        model: The classification model.
        x_train: A Tensor containing the training images.
        y_train: A Tensor with the true label of each training image.
        x_val: A Tensor containing the validation images.
        y_val: A Tensor with the true label of each validation image.
        optimizer: The optimizer.
        loss_fn: The loss function.
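        num_epochs: The number of epochs to train.

    Returns:
        Three NumPy arrays with one entry per epoch: the training losses,
        the validation losses and the validation accuracies.
    """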
    train_loss_curve = []
    val_loss_curve = []
    val_acc_curve = []

    # Iterate over the epochs
    for epoch in range(num_epochs):
        # Put model in training mode
        model.train()

        # Compute predictions
        y_pred = model(x_train)

        # Compute loss
        loss = loss_fn(y_pred, y_train)

        # Backpropagate + optimizer step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Append to train loss curve
        train_loss_curve.append(loss.detach().cpu().numpy())

        # Put model in evaluation mode
        model.eval()

        # Compute predictions (without storing gradients)
        with torch.inference_mode():
            y_pred = model(x_val)

        # Compute validation loss
        loss = loss_fn(y_pred, y_val)
        val_loss_curve.append(loss.cpu().numpy())

        # Compute validation accuracy
        acc = (y_val == y_pred.argmax(dim=1)).float().mean()
        val_acc_curve.append(acc.cpu().numpy())

        if epoch % 10 == 0:
            # Log losses
            print(f"Train loss at epoch {epoch}: {train_loss_curve[-1]:.4f}")
            print(f"Validation loss after epoch {epoch}: {val_loss_curve[-1]:.4f}")
            print(f"Validation accuracy after epoch {epoch}: {val_acc_curve[-1]:.4f}")
            print()

    return np.array(train_loss_curve), np.array(val_loss_curve), np.array(val_acc_curve)
#keep

## Running the Training

Time to train our home-grown CNN! 🪴

- Define a loss function for training classification (see [here](https://pytorch.org/docs/stable/nn.html#loss-functions)). Use cross entropy loss.
- Create an optimizer (see [here](https://pytorch.org/docs/stable/optim.html)). Use the Adam algorithm.
- Call `train_classifier()`, and store the returned training and validation curves. Train for 100 epochs.
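
As a rough sketch of how a cross-entropy loss and an Adam optimizer fit together, the example below runs a few manual optimization steps on random data. The names `toy_model`, `x` and `y` are hypothetical stand-ins, not the notebook's model or MNIST data, and the learning rate is an arbitrary choice.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

# Hypothetical stand-ins: a tiny linear classifier and random data
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x = torch.rand(16, 1, 28, 28)
y = torch.randint(0, 10, (16,))  # integer class labels, shape (N,)

loss_fn = nn.CrossEntropyLoss()  # expects raw scores (logits), no softmax layer needed
optimizer = optim.Adam(toy_model.parameters(), lr=1e-3)

losses = []
for step in range(5):
    logits = toy_model(x)        # shape (N, 10)
    loss = loss_fn(logits, y)
    optimizer.zero_grad()        # clear gradients of the previous step
    loss.backward()              # compute new gradients
    optimizer.step()             # update the parameters
    losses.append(loss.item())

print(losses)
```

Note that `nn.CrossEntropyLoss` takes the class indices directly as targets, so the labels do not have to be one-hot encoded.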

In [None]:
# ... WRITE YOUR CODE HERE ... #

You might notice that the training progresses rather slowly. Feel free to cancel it. **Move all data, as well as the model, to the GPU** by calling [`.to(device)`](https://pytorch.org/docs/stable/generated/torch.Tensor.to.html), with `device` set to `'cuda'` if CUDA is available, else `'cpu'`.
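
A minimal sketch of the device selection is shown below, using a hypothetical `toy_model` and tensor `x` instead of the notebook's model and data. It also runs on machines without a GPU, in which case everything simply stays on the CPU.

```python
import torch
from torch import nn

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Hypothetical stand-ins for the notebook's model and data
toy_model = nn.Linear(4, 2)
x = torch.zeros(3, 4)

toy_model = toy_model.to(device)  # moves the module's parameters and returns the module
x = x.to(device)                  # does not modify `x` in place, so re-assign the result

print(next(toy_model.parameters()).device, x.device)
```

A common pitfall is calling `x.to(device)` without storing the result, which leaves the original tensor where it was.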

In [7]:
# ... WRITE YOUR CODE HERE ... #

Redefine the optimizer (since our old optimizer still holds the CPU parameters) and call `train_classifier` again. **Do you notice a speed-up?**

In [None]:
# ... WRITE YOUR CODE HERE ... #

Plot the train loss curve, validation loss curve and the validation accuracy curve using Matplotlib.
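
One possible layout puts both loss curves in one subplot and the accuracy in a second one. The curves below are made-up placeholder values for illustration only, not real training results; in the notebook you would use the arrays returned by `train_classifier()` instead.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder curves (made-up values, NOT real training results)
epochs = np.arange(100)
train_loss = np.exp(-epochs / 30)
val_loss = np.exp(-epochs / 30) + 0.05
val_acc = 1 - np.exp(-epochs / 30)

fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))

# Left: both loss curves share one axis so they can be compared directly
ax_loss.plot(epochs, train_loss, label='Train loss')
ax_loss.plot(epochs, val_loss, label='Validation loss')
ax_loss.set_xlabel('Epoch')
ax_loss.set_ylabel('Loss')
ax_loss.legend()

# Right: validation accuracy on its own scale
ax_acc.plot(epochs, val_acc, label='Validation accuracy')
ax_acc.set_xlabel('Epoch')
ax_acc.set_ylabel('Accuracy')
ax_acc.legend()

fig.tight_layout()
```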

In [None]:
# ... WRITE YOUR CODE HERE ... #

## Improving the Model

Create a new model based on the previous one, but add an extra convolutional layer after the first one with the same hyperparameters. Make sure the number of input channels of the new layer matches the number of output channels of its predecessor.

In [10]:
# ... WRITE YOUR CODE HERE ... #

Move the new model to the GPU, redefine the optimizer and train it. Plot the train loss curve, validation loss curve and the validation accuracy curve using Matplotlib.

In [None]:
# ... WRITE YOUR CODE HERE ... #

Compare the loss curves of the new model with the first one.

Create a new model based on the previous one, but now **insert a `ReLU` layer after each convolution**.
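
A sketch of what such a network could look like is given below. The variable name `model_relu` is a hypothetical choice, and the layer settings are taken from the first model described earlier in this notebook.

```python
import torch
from torch import nn

model_relu = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=5, stride=2, padding=2),
    nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=5, stride=2, padding=2),  # in_channels = previous out_channels
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, 10),
)

# Sanity check on a dummy batch of four 28x28 images
print(model_relu(torch.zeros(4, 1, 28, 28)).shape)  # torch.Size([4, 10])
```

`nn.ReLU` has no parameters and does not change the shape of its input. Without a non-linearity in between, two stacked convolutions remain a linear (affine) map of the input.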

In [12]:
# ... WRITE YOUR CODE HERE ... #

Move the new model to the GPU, redefine the optimizer and train it. Plot the train loss curve, validation loss curve and the validation accuracy curve using Matplotlib.

In [None]:
# ... WRITE YOUR CODE HERE ... #

Again compare the loss curves of this model with the previous two. What do you notice?

## Inspecting the Results

Pass all validation samples through your best model and store the results in a variable `y_pred`. Hint: look at the validation routine inside `train_classifier`.
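
The inference pattern can be sketched as follows. Here `best_model` is an untrained, hypothetical stand-in and `x_val` contains random values, so the predictions themselves are meaningless; only the sequence of steps matters.

```python
import torch
from torch import nn

# Hypothetical stand-ins for the trained model and the validation images
best_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
x_val = torch.rand(32, 1, 28, 28)

best_model.eval()                    # put the model in evaluation mode
with torch.inference_mode():         # do not track gradients
    y_pred = best_model(x_val)       # raw scores, shape (N, 10)

pred_labels = y_pred.argmax(dim=1)   # index of the largest score per sample
print(y_pred.shape, pred_labels.shape)  # torch.Size([32, 10]) torch.Size([32])
```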

In [14]:
# ... WRITE YOUR CODE HERE ... #

Now write code to **select a random sample from the validation set**. **Visualize the sample** using Matplotlib and put the **predicted label and true label in the title** of the figure. Hint: use the index of the neuron with the largest output value as prediction.
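
A possible approach is sketched below with random stand-in tensors; in the notebook, `x_val`, `y_val` and `y_pred` would be the real validation images, labels and model outputs.

```python
import torch
import matplotlib.pyplot as plt

# Hypothetical stand-ins: random "images", labels and model outputs
x_val = torch.rand(32, 1, 28, 28)
y_val = torch.randint(0, 10, (32,))
y_pred = torch.randn(32, 10)

idx = torch.randint(0, len(x_val), (1,)).item()  # random sample index
pred = y_pred[idx].argmax().item()               # neuron with the largest output
true = y_val[idx].item()

fig, ax = plt.subplots()
# imshow expects a 2D array on the CPU, so drop the channel axis and convert
ax.imshow(x_val[idx, 0].cpu().numpy(), cmap='gray')
ax.set_title(f'Predicted: {pred}, true: {true}')
ax.axis('off')
```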

In [None]:
# ... WRITE YOUR CODE HERE ... #

Use [`sklearn.metrics.confusion_matrix`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html) and [`sklearn.metrics.ConfusionMatrixDisplay`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.ConfusionMatrixDisplay.html) to visualize the confusion matrix.
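
How the two scikit-learn helpers work together can be seen in this small, hand-made example with three classes. The label arrays `y_true` and `y_hat` are illustrative values, not MNIST results.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Hand-made labels for three classes (illustrative values only)
y_true = np.array([0, 1, 2, 2, 1, 0])
y_hat = np.array([0, 2, 2, 2, 1, 0])

# Rows correspond to the true class, columns to the predicted class
cm = confusion_matrix(y_true, y_hat, labels=[0, 1, 2])
print(cm)
# [[2 0 0]
#  [0 1 1]
#  [0 0 2]]

disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=[0, 1, 2])
disp.plot(cmap='Blues')
```

scikit-learn works on NumPy arrays, so PyTorch tensors that live on the GPU need to be converted first, e.g. with `.cpu().numpy()`.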

In [None]:
# ... WRITE YOUR CODE HERE ... #