# Exercises Part 2: Convolutional Neural Networks

Convolutional neural network examples for computer vision. Written for the 2024 DeepLabCut AI Residency.

While these CNNs are written to process 2-dimensional images, they can of course be adapted to deal with 1d or 3d inputs!

This notebook uses PyTorch to define and train neural networks.

## Setup

### Imports

In [None]:
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

## Image Classification with LeNet5

Some computer vision tasks are very difficult and require large, slow models. Other tasks, like image classification on the [MNIST](https://en.wikipedia.org/wiki/MNIST_database) dataset, are much easier, and smaller models can achieve good results.

The LeNet model family (developed by Yann LeCun in 1998) was among the first convolutional neural networks. The models are well described on [Wikipedia](https://en.wikipedia.org/wiki/LeNet), but you can also go read [Gradient-based learning applied to document recognition](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=726791), which has been cited more than 67,000 times.

In this notebook, you'll implement LeNet5 and train it on the MNIST dataset.

### Loading the Dataset

PyTorch has an [API method](https://pytorch.org/vision/stable/generated/torchvision.datasets.MNIST.html#torchvision.datasets.MNIST) to download the MNIST dataset, so we can simply use that.

In this example, I download the dataset to the folder this notebook is in, but you can change the path where the dataset is downloaded if you want to.

In [None]:
train_dataset_base = torchvision.datasets.MNIST(
    root=Path("."),
    train=True,
    download=True,
)

test_dataset_base = torchvision.datasets.MNIST(
    root=Path("."),
    train=False,
    download=True,
)

We can then look through a few images to see what the data looks like. Datasets in PyTorch define the `__getitem__` method, which lets us retrieve elements from the dataset as we would from a list.

As per their documentation, the `__getitem__` method returns `(image, target)`, where `target` is the index of the target class. As you can see in the plot, the images contain integer values between 0 and 255.

You can play around with the index used to look at different images in the dataset.

In [None]:
print(f"Training images: {len(train_dataset_base)}")
print(f"Test images:     {len(test_dataset_base)}\n")
image, target = train_dataset_base[0]

print(f"Target: {target}")
print(f"Image size: {image.size}")

plt.figure()
cbar = plt.imshow(image, cmap="gray")
plt.colorbar(cbar)
plt.show()
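
To make the indexing behaviour more concrete, here is a minimal sketch of a custom dataset. `SquaresDataset` is a hypothetical toy class invented for this illustration (it is not part of torchvision); it only shows how `__len__` and `__getitem__` back the `len(dataset)` and `dataset[index]` syntax used above.

```python
import torch
from torch.utils.data import Dataset


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: item i is the pair (i, i**2)."""

    def __init__(self, size: int):
        self.size = size

    def __len__(self) -> int:
        # Called by len(dataset)
        return self.size

    def __getitem__(self, index: int):
        # Called by dataset[index]
        if not 0 <= index < self.size:
            raise IndexError(index)
        x = torch.tensor(float(index))
        return x, x ** 2


toy_dataset = SquaresDataset(5)
toy_len = len(toy_dataset)
toy_input, toy_target = toy_dataset[3]
print(toy_len, toy_input.item(), toy_target.item())
```

The MNIST dataset follows the same pattern, except that each item is an `(image, target)` pair.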

### Image Preprocessing - Transformations

One very important step when training computer vision models is normalizing your data so that it has mean ~0 and standard deviation ~1. This helps your model converge. You can read more about it in any machine learning textbook, or a bit more about it in this [discussion on the PyTorch forum](https://discuss.pytorch.org/t/why-image-datasets-need-normalizing-with-means-and-stds-specified-like-in-transforms-normalize-mean-0-485-0-456-0-406-std-0-229-0-224-0-225/187818/4).

A set of normalization parameters you'll often see are computed from the ImageNet dataset:

```
mean=[0.485, 0.456, 0.406]
std=[0.229, 0.224, 0.225]
```

When working with "natural" images, it's very common to normalize using these parameters. This is discussed in [this thread](https://discuss.pytorch.org/t/discussion-why-normalise-according-to-imagenet-mean-and-std-dev-for-transfer-learning/115670). In our case, we can just normalize with a mean and standard deviation of `0.5`, which maps pixel values from the `[0, 1]` range produced by `ToTensor()` to `[-1, 1]`. We could also compute the mean and standard deviation of the pixels in the dataset and normalize with those.
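
As a rough sketch of both options, the snippet below applies the per-channel formula `(x - mean) / std` by hand. It uses random values as a stand-in for images (an assumption made so the example runs without downloading anything), and the names `fake_images`, `scaled` and `standardized` are made up for this illustration.

```python
import torch

torch.manual_seed(0)

# Stand-in for a batch of grayscale images already scaled to [0, 1]:
# shape (N, C, H, W). Real MNIST data is NOT used here.
fake_images = torch.rand(16, 1, 28, 28)

# Option 1: mean=0.5 and std=0.5 computes (x - 0.5) / 0.5,
# which maps the range [0, 1] to [-1, 1].
scaled = (fake_images - 0.5) / 0.5

# Option 2: estimate per-channel statistics from the data itself.
data_mean = fake_images.mean(dim=(0, 2, 3))  # one value per channel
data_std = fake_images.std(dim=(0, 2, 3))
standardized = (fake_images - data_mean.view(1, -1, 1, 1)) / data_std.view(1, -1, 1, 1)

print(f"Scaled range: [{scaled.min().item():.2f}, {scaled.max().item():.2f}]")
print(f"Standardized mean: {standardized.mean().item():.2f}")
print(f"Standardized std:  {standardized.std().item():.2f}")
```

With real data, the statistics would be computed once on the training set and then reused for the test set.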

In [None]:
transform = transforms.Compose(
    [
        transforms.ToTensor(),
        transforms.Normalize(0.5, 0.5)
    ]
)

image, target = train_dataset_base[0]
image = transform(image)

plt.figure()
cbar = plt.imshow(image.squeeze().numpy(), cmap="gray")
plt.colorbar(cbar)
plt.show()

We can re-build the dataset while directly using these transformations, so we receive the images in the format we want.

In [None]:
transform = transforms.Compose(
    [
        transforms.ToTensor(),
        transforms.Normalize(0.5, 0.5)
    ]
)

train_dataset = torchvision.datasets.MNIST(
    root=Path("."),
    train=True,
    download=True,
    transform=transform,
)

test_dataset = torchvision.datasets.MNIST(
    root=Path("."),
    train=False,
    download=True,
    transform=transform,
)

image, target = train_dataset[0]

plt.figure()
cbar = plt.imshow(image.squeeze().numpy(), cmap="gray")
plt.colorbar(cbar)
plt.show()

### Convolutional Neural Networks - 2D Convolutions in PyTorch

A CNN consists of different layers of convolutions, where the kernels are **learned**, instead of manually defined like we saw in the exercises about convolutions.

A convolutional layer in PyTorch is implemented through the [`nn.Conv2d`](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html) class. We can create a convolutional layer easily, as shown below. When we create the layer, the weights are initialized randomly, according to a specified distribution. The choice of distribution is actually very important, and there are many resources online to learn about it. This was also described by Yann LeCun in [Gradient-based learning applied to document recognition](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=726791) (check out the paper for more info):

> Before training, the weights are initialized with random values using a uniform distribution ...

In [None]:
conv = nn.Conv2d(1, 1, 3)

print(f"Conv Kernel Weight: {conv.weight}")
print()
print(f"Conv Kernel Bias: {conv.bias}")
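
If you want to control the initialization yourself, the functions in `torch.nn.init` can overwrite the weights of an existing layer. The sketch below draws the weights from a uniform distribution whose range shrinks with the fan-in of the layer. The bound `1 / sqrt(fan_in)` is an illustrative assumption, not the value used in the paper, and `conv_layer` is a name introduced for this example.

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)

conv_layer = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3)

# fan_in = number of inputs feeding each output value
# = in_channels * kernel_height * kernel_width = 1 * 3 * 3 = 9
kernel_h, kernel_w = conv_layer.kernel_size
fan_in = conv_layer.in_channels * kernel_h * kernel_w

# Illustrative bound only (an assumption, not the paper's value)
bound = 1.0 / math.sqrt(fan_in)

# Functions in nn.init ending with "_" modify the tensor in place
nn.init.uniform_(conv_layer.weight, -bound, bound)
nn.init.zeros_(conv_layer.bias)

print(f"fan_in: {fan_in}, bound: {bound:.3f}")
print(f"Largest |weight|: {conv_layer.weight.abs().max().item():.3f}")
```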

We can now apply this convolution to our image. But as mentioned in the docs, an [`nn.Conv2d`](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html) expects the input to be a Tensor of shape `(N, C_in, H, W)`, where `N` is the batch size. Our images are of shape `(1, H, W)` (which is `(1, 28, 28)` for MNIST), so we simply add a dimension of size 1 for the batch. We can do that with `unsqueeze()`.

In [None]:
img, target = train_dataset[0]

print(f"Image shape: {img.shape}")
img = img.unsqueeze(dim=0)
print(f"Batched image shape: {img.shape}")

img_out = conv(img)
img_out = img_out.detach().numpy().squeeze()

plt.figure()
plt.imshow(img_out, cmap="gray")
plt.show()
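
Note that the output image is slightly smaller than the input: a 3x3 kernel without padding cannot be centered on the border pixels. The sketch below checks the usual output-size formula against PyTorch. The helper `conv_output_size` is a hypothetical function written for this example, and it assumes a dilation of 1.

```python
import torch
import torch.nn as nn


def conv_output_size(size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution (dilation assumed to be 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


# Same shape as one batched MNIST image: (N, C_in, H, W)
dummy_image = torch.zeros(1, 1, 28, 28)

no_padding = nn.Conv2d(1, 1, kernel_size=3)
with_padding = nn.Conv2d(1, 1, kernel_size=3, padding=1)

shape_no_padding = tuple(no_padding(dummy_image).shape)
shape_with_padding = tuple(with_padding(dummy_image).shape)

print(shape_no_padding, conv_output_size(28, 3))
print(shape_with_padding, conv_output_size(28, 3, padding=1))
```

Keeping track of these sizes matters later on, since the first fully connected layer of a CNN needs to know how many values it receives.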

Note that this is **never** done in practice, but we can set the weights of our 2d convolution to be a Sobel edge detector, just like we did in the convolutions notebook. This is just to illustrate that while the kernel for the convolution is learned, the operation is exactly the same.

In [None]:
edge_detector = nn.Conv2d(1, 1, 3)
edge_detector.weight = nn.Parameter(
    torch.tensor(
        [
            [1.0, 0.0, -1.0],
            [2.0, 0.0, -2.0],
            [1.0, 0.0, -1.0],
        ],
    ).unsqueeze(dim=0).unsqueeze(dim=0)
)
edge_detector.bias = nn.Parameter(torch.tensor([0.0]))

img, target = train_dataset[0]
img = img.unsqueeze(dim=0)

img_out = edge_detector(img)
img_out = img_out.detach().numpy().squeeze()

plt.figure()
cbar = plt.imshow(img_out, cmap="gray")
plt.colorbar(cbar)
plt.show()
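
The same fixed-kernel operation can also be written with the functional API, which takes the kernel as an explicit argument. This sketch applies the Sobel kernel to a tiny synthetic image (an assumption made to keep the numbers easy to check by hand). As far as the PyTorch docs describe it, the operation is a cross-correlation, meaning the kernel is not flipped before being applied.

```python
import torch
import torch.nn.functional as F

# Sobel kernel with shape (out_channels, in_channels, kernel_h, kernel_w)
sobel_kernel = torch.tensor(
    [
        [1.0, 0.0, -1.0],
        [2.0, 0.0, -2.0],
        [1.0, 0.0, -1.0],
    ]
).reshape(1, 1, 3, 3)

# Synthetic 5x5 image: dark (0) in the two left columns, bright (1) elsewhere
step_image = torch.zeros(1, 1, 5, 5)
step_image[..., 2:] = 1.0

response = F.conv2d(step_image, sobel_kernel).squeeze()
print(response)
```

Each output row is `[-4, -4, 0]`: the response is non-zero only where the 3x3 window straddles the vertical edge, and zero where the window covers a uniform region.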

### Designing Neural Networks in PyTorch

Most of this part of the notebook is taken from this [tutorial from PyTorch](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html). It's very well written, and if you want to learn more you can check it out. **HOWEVER**, they use a LeNet5 as their example model, and I think it would be more beneficial to write the LeNet from scratch, without looking at the solution. So I would suggest first coding it up yourselves, and then looking at _a_ solution in the tutorial.

A neural net in PyTorch is an `nn.Module`. Just as neural networks are conceptually made of layers, these modules contain layers (which are themselves modules). They define a `forward` method, which returns the output of the neural network.
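
As a small illustration of this nesting, the sketch below defines a hypothetical `TinyNet` (a made-up module with arbitrary layer sizes) and inspects its children and parameters.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical example module, used only to show how modules nest."""

    def __init__(self):
        super().__init__()
        # Assigning a module as an attribute registers it as a child
        self.hidden = nn.Linear(4, 3)
        self.output = nn.Linear(3, 2)

    def forward(self, x):
        return self.output(torch.sigmoid(self.hidden(x)))


tiny_net = TinyNet()

child_names = [name for name, _ in tiny_net.named_children()]
# hidden: 4*3 weights + 3 biases, output: 3*2 weights + 2 biases
num_parameters = sum(p.numel() for p in tiny_net.parameters())

# Calling the module like a function runs forward()
tiny_out = tiny_net(torch.zeros(5, 4))

print(child_names)
print(num_parameters)
print(tiny_out.shape)
```

Because the child layers are registered automatically, `tiny_net.parameters()` yields the weights and biases of both layers, which is what gets passed to an optimizer.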

As an example, let's train a neural network to approximate an affine function, say `10x + 5`. We can "learn" this function with a very simple network, composed of a single linear layer. A linear layer is composed of weights $W$ and a bias $b$, and for input $x$ outputs:

> $f(x) = Wx + b$

We can train this network to learn the affine function we defined. Some resources for loss functions and optimizers:
- [PyTorch loss functions](https://pytorch.org/docs/stable/nn.html#loss-functions)
- [PyTorch optimizers](https://pytorch.org/docs/stable/optim.html#module-torch.optim)

Here, you can play around with the loss criterion, optimizer, learning rate and number of iterations used to see how quickly the model converges.

In [None]:
class AffineNet(nn.Module):
    """A net to learn an affine function"""

    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(1, 1)
        
    def forward(self, x):
        return self.layer(x)


num_iterations = 1000

net = AffineNet()
criterion = nn.MSELoss()
optimizer = optim.SGD(net.parameters(), lr=0.001)

running_loss = []
for i in range(num_iterations):  # run num_iterations training steps
    inputs = torch.rand(1)

    with torch.no_grad():
        targets = 10 * inputs + 5

    # zero the parameter gradients
    optimizer.zero_grad()

    # forward + backward + optimize
    outputs = net(inputs)
    loss = criterion(outputs, targets)
    loss.backward()
    optimizer.step()

    # get statistics
    running_loss.append(loss.item())


print("Finished Training")
print(f"Final loss: {running_loss[-1]}")
print(f"Final weight: {net.layer.weight}")
print(f"Final bias: {net.layer.bias}")

plt.figure()
plt.plot(running_loss)
plt.show()
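
One variation worth trying is to train on batches of samples instead of a single sample per step. The sketch below is one possible configuration, not a recommendation: the batch size of 64, the learning rate of 0.1 and the 2000 steps are assumptions chosen for this illustration, and all variable names are new.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

batched_net = nn.Linear(1, 1)
batched_criterion = nn.MSELoss()
batched_optimizer = optim.SGD(batched_net.parameters(), lr=0.1)

for _ in range(2000):
    # A batch of 64 samples with shape (batch_size, in_features)
    batch_inputs = torch.rand(64, 1)
    batch_targets = 10 * batch_inputs + 5

    batched_optimizer.zero_grad()
    batch_loss = batched_criterion(batched_net(batch_inputs), batch_targets)
    batch_loss.backward()
    batched_optimizer.step()

learned_weight = batched_net.weight.item()
learned_bias = batched_net.bias.item()
print(f"Learned weight: {learned_weight:.3f}")
print(f"Learned bias:   {learned_bias:.3f}")
```

Averaging the loss over a batch gives a less noisy gradient estimate, which in turn tolerates a larger learning rate than the single-sample loop above.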

We can add more layers to this neural network. By adding a second layer, our neural network is now composed of two layers $L_1$ and $L_2$, with weights and bias $W_1$, $b_1$ and $W_2$, $b_2$ respectively. Now, for input $x$, the net outputs:

> $f(x) = W_2(W_1x + b_1) + b_2$

Notice how larger nets might not perform better (in this case, it's more difficult to train). There are also many ways to set the model parameters so that $f(x)$ represents the affine function $10x + 5$.
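
To see why there are many valid solutions, expand the expression: $f(x) = (W_2 W_1)x + (W_2 b_1 + b_2)$. Without an activation function in between, two linear layers collapse into a single affine function, so for instance $W_1 = 2, b_1 = 0, W_2 = 5, b_2 = 5$ and $W_1 = 10, b_1 = 5, W_2 = 1, b_2 = 0$ both give $10x + 5$. The sketch below checks the collapse numerically on two randomly initialized layers (`layer_a` and `layer_b` are names introduced for this example).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer_a = nn.Linear(1, 1)  # plays the role of L_1
layer_b = nn.Linear(1, 1)  # plays the role of L_2

with torch.no_grad():
    # f(x) = W_2 (W_1 x + b_1) + b_2 = (W_2 W_1) x + (W_2 b_1 + b_2)
    effective_weight = layer_b.weight @ layer_a.weight
    effective_bias = layer_b.weight @ layer_a.bias + layer_b.bias

    x = torch.rand(8, 1)
    stacked_output = layer_b(layer_a(x))
    collapsed_output = x @ effective_weight.T + effective_bias

print(torch.allclose(stacked_output, collapsed_output, atol=1e-6))
```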

In [None]:
class TwoLayerNet(nn.Module):
    """A two-layer net to learn an affine function"""

    def __init__(self):
        super().__init__()
        self.layer_1 = nn.Linear(1, 1)
        self.layer_2 = nn.Linear(1, 1)
        
    def forward(self, x):
        out_1 = self.layer_1(x)
        out_2 = self.layer_2(out_1)
        return out_2


num_iterations = 1000

net = TwoLayerNet()
criterion = nn.MSELoss()
optimizer = optim.SGD(net.parameters(), lr=0.001)

running_loss = []
for i in range(num_iterations):  # run num_iterations training steps
    inputs = torch.rand(1)

    with torch.no_grad():
        targets = 10 * inputs + 5

    # zero the parameter gradients
    optimizer.zero_grad()

    # forward + backward + optimize
    outputs = net(inputs)
    loss = criterion(outputs, targets)
    loss.backward()
    optimizer.step()

    # get statistics
    running_loss.append(loss.item())


print("Finished Training")
print(f"Final loss: {running_loss[-1]}")
print(f"Final weight 1: {net.layer_1.weight}")
print(f"Final bias 1:   {net.layer_1.bias}")
print(f"Final weight 2: {net.layer_2.weight}")
print(f"Final bias 2:   {net.layer_2.bias}")

plt.figure()
plt.plot(running_loss)
plt.show()

### Exercise - Implementing LeNet5

Based on the [Wikipedia article](https://en.wikipedia.org/wiki/LeNet) or the [paper's description](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=726791), implement a LeNet5 model and train it on the MNIST dataset. You'll need the following building blocks:

- [`MaxPool2d`](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html)
- [`Conv2d`](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html)
- [`Linear`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear)
- [`Sigmoid`](https://pytorch.org/docs/stable/generated/torch.nn.Sigmoid.html#torch.nn.Sigmoid)
- [`torch.flatten`](https://pytorch.org/docs/stable/generated/torch.flatten.html#torch.flatten)
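
Without giving away the solution, here is a sketch of how these building blocks change the shape of a tensor. The layer sizes are arbitrary values picked for this illustration and are deliberately NOT the LeNet5 ones. Printing shapes like this is a handy way to find the `in_features` of your first `Linear` layer.

```python
import torch
import torch.nn as nn

# Hypothetical sizes chosen for illustration; they are NOT the LeNet5 values
dummy_batch = torch.zeros(4, 1, 28, 28)  # (N, C, H, W)

demo_conv = nn.Conv2d(in_channels=1, out_channels=3, kernel_size=3)
demo_activation = nn.Sigmoid()
demo_pool = nn.MaxPool2d(kernel_size=2)

# conv: 28 -> 26 (no padding), pool: 26 -> 13
features = demo_pool(demo_activation(demo_conv(dummy_batch)))
print(features.shape)

# Flatten everything except the batch dimension before a Linear layer
flat = torch.flatten(features, start_dim=1)
print(flat.shape)

demo_linear = nn.Linear(in_features=flat.shape[1], out_features=10)
logits = demo_linear(flat)
print(logits.shape)
```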

In [None]:
class LeNet5(nn.Module):
    """PyTorch tutorial Neural Network"""

    def __init__(self):
        super().__init__()
        # TODO: implement the model!


Once your model is defined, you can train it on the MNIST dataset: look at the tutorial for more information about that!
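
If you'd like a starting point, below is a minimal sketch of a classification training loop. It makes several assumptions: random tensors stand in for MNIST so that the snippet runs on its own, the model is a placeholder linear classifier (not LeNet5), and the batch size, learning rate, momentum and epoch count are arbitrary. To use it for the exercise, you would replace the placeholder model with your `LeNet5` and build the `DataLoader` from `train_dataset`.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Random stand-in data with MNIST-like shapes (NOT real images)
stand_in_images = torch.randn(256, 1, 28, 28)
stand_in_labels = torch.randint(0, 10, (256,))
loader = DataLoader(
    TensorDataset(stand_in_images, stand_in_labels), batch_size=32, shuffle=True
)

# Placeholder classifier (NOT LeNet5); replace with your own model
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

criterion = nn.CrossEntropyLoss()  # expects raw logits and integer class labels
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

epoch_losses = []
for epoch in range(2):
    total_loss = 0.0
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    epoch_losses.append(total_loss / len(loader))
    print(f"Epoch {epoch + 1}: mean loss {epoch_losses[-1]:.3f}")

# Evaluation: switch to eval mode and disable gradient tracking
model.eval()
with torch.no_grad():
    predictions = model(stand_in_images).argmax(dim=1)
    accuracy = (predictions == stand_in_labels).float().mean().item()
print(f"Accuracy on the stand-in data: {accuracy:.2%}")
```

On real data, the accuracy should be measured on `test_dataset` rather than on the training images, so that it reflects how well the model generalizes.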