## Problem 5: Convolutional Neural Networks (PyTorch Implementation)

Time to implement your first convolutional neural network (CNN) in PyTorch!

For this assignment, we'll be training the network on the canonical MNIST dataset. After building the network, we'll experiment with an array of hyperparameters, tweaking the network's width, depth, learning rate and more in pursuit of the highest classification accuracy we can muster.

You may find the PyTorch tutorials helpful as you complete this problem: https://pytorch.org/tutorials/beginner/basics/intro.html. If you haven't gone through them yet, we suggest you do. Pay particular attention to the tutorial on the optimization loop, which you will need to build more or less from scratch.
### Step 0: Setup Environment
If you haven't set up PyTorch locally, you can do so following this [local installation guide](https://pytorch.org/get-started/locally/).


Installing PyTorch locally is **not** necessary for the course. You can access PyTorch either through:

- the class partition `cpsc452` on [McCleary](https://docs.ycrc.yale.edu/clusters/mccleary/)

- [Google Colab](https://colab.research.google.com)

If you are new to the Yale High Performance Clusters (HPC), please consult this [guide](https://docs.ycrc.yale.edu/clusters-at-yale/).
<div style="display:none">

```bash
[mccleary ~]$ salloc --reservation=cpsc452
[cpsc452_netID@gpu ~]$  bash
```

```bash
# sbatch.script
#SBATCH -p cpsc452
```

As usual, we'll start by importing the necessary libraries and setting up our environment. Please run the following cell to do so.
!pip install numpy matplotlib tqdm scikit-learn
from typing import Callable

import torch
import torch.nn as nn            # neural network modules
import torch.nn.functional as F  # activation functions
import torch.optim as optim      # optimizer
import torch.utils.data          # dataloader
import torchvision.datasets as datasets

import numpy as np
import matplotlib.pyplot as plt
import tqdm

torch.manual_seed(42)
# Download the MNIST dataset
mnist_train = datasets.MNIST(root='./data', train=True, download=True, transform=None)
mnist_test = datasets.MNIST(root='./data', train=False, download=True, transform=None)

# Load into torch datasets
train_dataset = torch.utils.data.TensorDataset(mnist_train.data.unsqueeze(1).float(), mnist_train.targets.long())
test_dataset = torch.utils.data.TensorDataset(mnist_test.data.unsqueeze(1).float(), mnist_test.targets.long())

# Visualize the data
for i in range(100):
    plt.subplot(10, 10, i+1)
    plt.imshow(train_dataset[i][0][0], cmap='gray')
    plt.axis('off')
### Step 1: Learn PyTorch Basics
In this section, you will practice basic PyTorch layers (`Conv2d`, `MaxPool2d`, `Linear`) and tensor reshaping operations. Refer to the PyTorch documentation for details of these operations.
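
As a rough aid for checking shapes, the output size along one spatial dimension of `nn.Conv2d` and `nn.MaxPool2d` follows the formula in the PyTorch documentation: $\lfloor (n + 2p - d(k - 1) - 1) / s \rfloor + 1$, for input size $n$, kernel size $k$, stride $s$, padding $p$, and dilation $d$. The sketch below evaluates it in plain Python. The helper name `output_size` is our own, not a PyTorch function.

```python
def output_size(size: int, kernel_size: int, stride: int = 1,
                padding: int = 0, dilation: int = 1) -> int:
    """Spatial output size of a convolution or pooling window (floor mode)."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# A 3x3 convolution with stride 1 and padding 1 preserves a 28x28 input.
after_conv = output_size(28, kernel_size=3, stride=1, padding=1)
# A 2x2 pooling window with stride 2 halves it.
after_pool = output_size(after_conv, kernel_size=2, stride=2)
print(after_conv, after_pool)  # 28 14
```

Evaluating the helper by hand before running a layer is a quick way to predict whether a shape assertion will pass.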
# Part 1: Explore `nn.Module`
image = torch.randn(1, 1, 28, 28)  # image: (1, 1, 28, 28)


# TODO: define a 3x3 convolutional layer that maps 1 input channel to 32 output channels
# refer to https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html
# you might need to specify the input channels, output channels, kernel size, stride, padding, etc.
conv_1 = ...
output_1 = conv_1(image)    # image: (1, 1, 28, 28) -> output_1: (1, 32, 28, 28)
assert output_1.shape == (1, 32, 28, 28), "The shape of output_1 is incorrect!"


# TODO: define a max pooling layer that halves the height and width of the input
# refer to https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html
# you might need to specify the kernel size, stride, padding, etc.
pool_1 = ...
output_2 = pool_1(output_1) # output_1: (1, 32, 28, 28) -> output_2: (1, 32, 14, 14)
assert output_2.shape == (1, 32, 14, 14), "The shape of output_2 is incorrect!"


# TODO: define a 3x3 convolutional layer that maps 32 input channels to 64 output channels
conv_2 = ...
output_3 = conv_2(output_2) # output_2: (1, 32, 14, 14) -> output_3: (1, 64, 14, 14)
assert output_3.shape == (1, 64, 14, 14), "The shape of output_3 is incorrect!"


# TODO: define a max pooling layer that halves the height and width of the input
pool_2 = ...
output_4 = pool_2(output_3) # output_3: (1, 64, 14, 14) -> output_4: (1, 64, 7, 7)
assert output_4.shape == (1, 64, 7, 7), "The shape of output_4 is incorrect!"


# TODO: flatten the output of the previous layer
# refer to https://pytorch.org/docs/stable/generated/torch.flatten.html
flatten_4 = ...            # output_4: (1, 64, 7, 7) -> flatten_4: (1, 64 * 7 * 7)
assert flatten_4.shape == (1, 64 * 7 * 7), "The shape of flatten_4 is incorrect!"


# TODO: define a linear layer that maps 64 * 7 * 7 input features to 10 output features
# refer to https://pytorch.org/docs/stable/generated/torch.nn.Linear.html
# you might need to specify the input size, output size, etc.
fc = ...
output_5 = fc(flatten_4)     # flatten_4: (1, 64 * 7 * 7) -> output_5: (1, 10)
assert output_5.shape == (1, 10), "The shape of output_5 is incorrect!"


# Part 2: Explore reshape, squeeze, unsqueeze, transpose, repeat
tensor = torch.tensor([[[1, 2, 3, 4], [5, 6, 7, 8]]])  # tensor: (1, 2, 4)

# TODO: reshape the tensor to (2, 2, 2)
# refer to https://pytorch.org/docs/stable/generated/torch.reshape.html
reshaped_tensor = ...
assert torch.allclose(reshaped_tensor, torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])), "The reshaped tensor is incorrect!"


# TODO: squeeze the first dimension of the tensor
# refer to https://pytorch.org/docs/stable/generated/torch.squeeze.html
squeezed_tensor = ...
assert torch.allclose(squeezed_tensor, torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8]])), "The squeezed tensor is incorrect!"


# TODO: unsqueeze the first dimension of the tensor
# refer to https://pytorch.org/docs/stable/generated/torch.unsqueeze.html
unsqueeze_tensor = ...
assert torch.allclose(unsqueeze_tensor, torch.tensor([[[[1, 2, 3, 4], [5, 6, 7, 8]]]])), "The unsqueezed tensor is incorrect!"


# TODO: transpose dim 1 and dim 2 of the tensor
# refer to https://pytorch.org/docs/stable/generated/torch.transpose.html
transposed_tensor = ...
assert torch.allclose(transposed_tensor, torch.tensor([[[1, 5], [2, 6], [3, 7], [4, 8]]])), "The transposed tensor is incorrect!"


# TODO: repeat the tensor 3 times along dim 0
# refer to https://pytorch.org/docs/stable/generated/torch.Tensor.repeat.html
repeated_tensor = ...
assert torch.allclose(repeated_tensor, torch.tensor([[[1, 2, 3, 4], [5, 6, 7, 8]], [[1, 2, 3, 4], [5, 6, 7, 8]], [[1, 2, 3, 4], [5, 6, 7, 8]]])), "The repeated tensor is incorrect!"
### Step 2: Build and Train a SimpleCNN on MNIST Dataset
Follow the TODOs to build a simple network with two convolutional layers followed by one fully connected layer. This first ``SimpleCNN`` uses linear layers only (no activation functions or pooling). You will use it as a baseline model for the next step.
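
One way to see why a network built from linear layers only is limited: with no activation function in between, stacked linear layers collapse into a single linear map (convolutions and fully connected layers are both linear up to a bias term). The toy sketch below checks this in plain Python, using made-up 2x2 weight matrices as stand-ins for layers; the helpers `matmul` and `matvec` are illustrative only.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(a, x):
    """Multiply a matrix by a vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


w1 = [[1, 2], [3, 4]]  # hypothetical weights of "layer 1"
w2 = [[0, 1], [1, 1]]  # hypothetical weights of "layer 2"
x = [5, 6]

two_layers = matvec(w2, matvec(w1, x))  # apply the layers one after another
one_layer = matvec(matmul(w2, w1), x)   # apply the single merged matrix
print(two_layers, one_layer)  # [39, 56] [39, 56]
```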
class SimpleCNN(nn.Module):
    def __init__(
        self,
        input_dim: int = 1,
        output_dim: int = 10,
        hidden_dim_list: list = [4, 8],
    ):
        super().__init__()
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.hidden_dim_list = hidden_dim_list

        # TODO: define the layers of the network
        self.conv_1 = ...
        self.conv_2 = ...
        self.fc = ...

    def forward(self, x):
        x = self.conv_1(x)
        x = self.conv_2(x)
        x = ...             # TODO: flatten the output of the previous layer
        x = self.fc(x)
        return x
Implement the training function for your CNN. The function should take the model, loss function, training data loader, test data loader, and optimizer as input. It should return the training and test losses and accuracies recorded after each epoch.

Implement the ``plot_metrics`` function to visualize the training history.

**Warning**: In each iteration of the training loop, `loss` is a tensor that is still attached to the computation graph. Use `loss.item()` to extract its scalar value before logging it or accumulating an average. Accumulating the tensor itself keeps every iteration's graph in memory, which can cause unexpected out-of-memory errors.
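
The accumulation pattern can be sketched on a toy problem. The snippet below is a minimal illustration, not the expected solution: the tiny `nn.Linear` model, the random data, and the variable names `running_loss`, `num_correct`, and `num_seen` are all assumptions made for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 3)                      # toy stand-in for a real network
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

running_loss = 0.0   # plain Python float, not a tensor
num_correct = 0
num_seen = 0

for _ in range(3):                           # three random "batches"
    data = torch.randn(8, 4)
    target = torch.randint(0, 3, (8,))

    optimizer.zero_grad()
    logits = model(data)
    loss = loss_fn(logits, target)
    loss.backward()
    optimizer.step()

    # .item() detaches the value from the graph before accumulating it;
    # weighting by batch size gives a per-sample average at the end.
    running_loss += loss.item() * data.size(0)
    num_correct += (logits.argmax(dim=1) == target).sum().item()
    num_seen += data.size(0)

mean_loss = running_loss / num_seen
accuracy = num_correct / num_seen
```

Weighting each batch loss by its size keeps the average correct even when the last batch of an epoch is smaller than the others.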
def plot_metrics(train_metrics, test_metrics, xlabel, ylabel, title):
    # TODO: plot train and test metrics in a single plot
    raise NotImplementedError()

def train(model, loss_fn, train_loader, test_loader, optimizer, epochs=5):
    """Train the model.
    Args:
        model: the model
        loss_fn: the loss function
        train_loader: the training data loader
        test_loader: the testing data loader
        optimizer: the optimizer
        epochs: the number of epochs to train
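    Returns:
        train_losses: the training loss after each epoch
        test_losses: the testing loss after each epoch
        train_accuracies: the training accuracy after each epoch
        test_accuracies: the testing accuracy after each epoch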
    """
    train_losses = []
    test_losses = []
    train_accuracies = []
    test_accuracies = []

    loop = tqdm.tqdm(range(1, epochs + 1))

    for epoch in loop:
        # TODO: implement training and testing loop

        # train the model for one epoch
        train_loss, train_accuracy = ...
        ...

        # test the model for one epoch
        test_loss, test_accuracy = ...
        ...

        loop.set_description(f'Epoch {epoch}')
        loop.set_postfix(train_loss=train_loss, test_loss=test_loss, train_accuracy=train_accuracy, test_accuracy=test_accuracy)
    return train_losses, test_losses, train_accuracies, test_accuracies


def train_epoch(model, loss_fn, train_loader, optimizer):
    """Train the model for one epoch.
    Args:
        model: the model
        loss_fn: the loss function
        train_loader: the training data loader
        optimizer: the optimizer
    Returns:
        train_loss: the loss of the epoch
        train_accuracy: the accuracy of the epoch
    """
    model.train()  # set model to training mode
    train_loss = 0
    train_accuracy = 0

    for batch_idx, (data, target) in enumerate(train_loader):
        # TODO: implement training iteration
        raise NotImplementedError

    return train_loss, train_accuracy

def test_epoch(model, loss_fn, test_loader):
    """Test the model for one epoch.
    Args:
        model: the model
        loss_fn: the loss function
        test_loader: the testing data loader
    Returns:
        test_loss: the loss of the epoch
        test_accuracy: the accuracy of the epoch
    """
    model.eval()  # set model to evaluation mode
    test_loss = 0
    test_accuracy = 0

    with torch.no_grad():  # disable gradient calculation
        for data, target in test_loader:
            # TODO: implement test iteration
            raise NotImplementedError

    return test_loss, test_accuracy
Use the training function above to train your ``SimpleCNN`` on the ``MNIST`` dataset. You should get a training accuracy below 92%. Don't worry, we will improve it in the next step.

Here are some hyperparameters you can try to improve the performance of your model (we will dive into hyperparameter tuning in the last step):
- Number of hidden units
- Learning rate
- Number of training epochs
- Batch size
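
To get a feel for how batch size and epoch count interact, the arithmetic below counts optimizer updates for the values used in this notebook. It assumes the data loader keeps the final partial batch, which is the `DataLoader` default (`drop_last=False`).

```python
import math

num_train = 60_000  # size of the MNIST training split
batch_size = 64
epochs = 10

steps_per_epoch = math.ceil(num_train / batch_size)  # last batch is smaller
total_steps = steps_per_epoch * epochs
last_batch = num_train - (steps_per_epoch - 1) * batch_size
print(steps_per_epoch, total_steps, last_batch)  # 938 9380 32
```

Halving the batch size doubles the number of updates per epoch, so the learning rate and the number of epochs are usually worth revisiting together with it.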
batch_size = 64
learning_rate = 1e-4
epochs = 10
input_dim = 1
hidden_dim_list = [4, 8]
output_dim = ...    # TODO: define the output dimension

model = ...
loss_fn = ...       # refer to https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html
optimizer = ...     # refer to https://pytorch.org/docs/stable/generated/torch.optim.SGD.html

train_loader = ...  # refer to https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader
test_loader = ...   # refer to https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader

train_losses, test_losses, train_accuracies, test_accuracies = train(model, loss_fn, train_loader, test_loader, optimizer, epochs=epochs)

plt.subplot(2, 1, 1)
plot_metrics(train_losses, test_losses, xlabel="Epoch", ylabel="Loss", title="Loss")
plt.subplot(2, 1, 2)
plot_metrics(train_accuracies, test_accuracies, xlabel="Epoch", ylabel="Accuracy", title="Accuracy")
### Step 3: Improve the SimpleCNN
As you can see in the previous step, the training accuracy of the ``SimpleCNN`` is poor. In this step, you will improve the performance of the ``SimpleCNN`` by adding ``nn.MaxPool2d``, ``nn.Dropout``, and activation functions.

**Hint**: The max pooling layer is used to downsample the input along the spatial dimensions (width and height) independently for each channel. It is recommended to add the max pooling layer after the activation function.
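
To make the downsampling concrete, here is a plain-Python sketch of 2x2 max pooling with stride 2 on a single channel. The function `max_pool_2x2` is a hand-written illustration of the idea, not how `nn.MaxPool2d` is implemented.

```python
def max_pool_2x2(channel):
    """2x2 max pooling with stride 2 on one channel (a list of rows).

    A trailing odd row or column is dropped, matching floor-mode pooling.
    """
    rows, cols = len(channel), len(channel[0])
    return [
        [max(channel[r][c], channel[r][c + 1],
             channel[r + 1][c], channel[r + 1][c + 1])
         for c in range(0, cols - 1, 2)]
        for r in range(0, rows - 1, 2)
    ]


channel = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pooled = max_pool_2x2(channel)
print(pooled)  # [[6, 8], [14, 16]]
```

In a multi-channel tensor the same operation is applied to each channel separately, so the number of channels is unchanged while height and width are halved.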
class CNN(nn.Module):
    def __init__(
        self,
        input_dim: int = 1,
        output_dim: int = 10,
        hidden_dim_list: list = [4, 8],
        p: float = 0.0,
        act_fn: Callable = F.relu,
    ):
        super().__init__()
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.hidden_dim_list = hidden_dim_list

        # TODO: define the layers of the network
        self.conv_1 = ...
        self.pool_1 = ...
        self.conv_2 = ...
        self.pool_2 = ...
        self.fc = ...
        self.act_fn = ...
        self.dropout = ...

    def forward(self, x):
        # TODO: add activation functions and dropout to the correct layers
        x = self.conv_1(x)
        x = self.pool_1(x)
        x = self.conv_2(x)
        x = self.pool_2(x)
        x = ...             # TODO: flatten the output of the previous layer
        x = self.fc(x)
        return x
Again, use the training function and the same set of hyperparameters above to train your ``CNN`` on the ``MNIST`` dataset. You should get a training accuracy around 95%.
batch_size = 64
learning_rate = 1e-4
epochs = 10
input_dim = 1
hidden_dim_list = [4, 8]
output_dim = ...    # TODO: define the output dimension
act_fn = F.relu
p = 0.0

model = ...
loss_fn = ...       # refer to https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html
optimizer = ...     # refer to https://pytorch.org/docs/stable/generated/torch.optim.SGD.html

train_loader = ...  # refer to https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader
test_loader = ...   # refer to https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader

train_losses, test_losses, train_accuracies, test_accuracies = train(model, loss_fn, train_loader, test_loader, optimizer, epochs=epochs)

plt.subplot(2, 1, 1)
plot_metrics(train_losses, test_losses, xlabel="Epoch", ylabel="Loss", title="Loss")
plt.subplot(2, 1, 2)
plot_metrics(train_accuracies, test_accuracies, xlabel="Epoch", ylabel="Accuracy", title="Accuracy")
Run the following experiments:
- Try adjusting the learning rate to improve the network's accuracy. You might also try increasing the number of epochs. Record your results in a table.
- Try training your network with different non-linearities between the layers (e.g., ReLU, softplus, ELU, tanh). Record your test results for each in a table.
- Try changing the width of the hidden layers, keeping the activation function that performs best. Remember to add these results to your table.
- Experiment with the optimizer of your network (e.g., SGD, Adam, RMSProp). Record your test results for each in a table.
batch_size = 64
learning_rate = ...
epochs = ...
input_dim = 1
hidden_dim_list = [..., ...]
output_dim = ...    # TODO: define the output dimension
act_fn = ...        # TODO: define the activation function
p = ...             # TODO: define the dropout probability

model = ...
loss_fn = ...       # refer to https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html
optimizer = ...     # refer to https://pytorch.org/docs/stable/generated/torch.optim.SGD.html

train_loader = ...  # refer to https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader
test_loader = ...   # refer to https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader

train_losses, test_losses, train_accuracies, test_accuracies = train(model, loss_fn, train_loader, test_loader, optimizer, epochs=epochs)

plt.subplot(1, 2, 1)
plot_metrics(train_losses, test_losses, xlabel="Epoch", ylabel="Loss", title="Loss")
plt.subplot(1, 2, 2)
plot_metrics(train_accuracies, test_accuracies, xlabel="Epoch", ylabel="Accuracy", title="Accuracy")
### Step 4: Hyperparameter Tuning
Now the interesting part begins. Try to improve the performance of your ``CNN`` by tuning the hyperparameters. You should be able to get a training accuracy around 98% and a validation accuracy around 97%.

Here are some new parameters you can try to improve the performance of your model:
- ``Optimizer (SGD, Adam, RMSProp, etc)``: Different optimizers may lead to different convergence speed and performance.
- ``Weight decay (L2 penalty)``: Weight decay is a regularization technique to prevent overfitting. It is recommended to use a small weight decay value (e.g., 1e-4).
- ``Activation function (ReLU, Leaky ReLU, Tanh, etc)``: Different activation functions may lead to different convergence speed and performance.
- ``Dropout rate``: Dropout is a regularization technique to prevent overfitting. It is recommended to use a small dropout rate (e.g., 0.2).
- ...

Please implement a grid search algorithm to find the best set of hyperparameters and report the best validation accuracy you can get. Any hyperparameter can be tuned!
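
The overall shape of a grid search can be sketched independently of PyTorch. In the snippet below, `grid_search` and `toy_evaluate` are hypothetical names: `toy_evaluate` is a made-up scoring function standing in for "train a fresh model with this configuration and return its validation accuracy".

```python
import itertools


def grid_search(grid, evaluate):
    """Try every combination in `grid` and return the best (config, score)."""
    names = list(grid)
    best_config, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_evaluate(config):
    # Placeholder score, NOT a real accuracy: peaks at lr=1e-3, dropout=0.0.
    return -abs(config["lr"] - 1e-3) - config["dropout"]


grid = {"lr": [1e-4, 1e-3, 1e-2], "dropout": [0.0, 0.2]}
best_config, best_score = grid_search(grid, toy_evaluate)
print(best_config)  # {'lr': 0.001, 'dropout': 0.0}
```

Using `itertools.product` avoids one nested loop per hyperparameter, at the cost of slightly less explicit code. Note that the number of training runs is the product of the list lengths, so the grid grows quickly.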
# Grid search
batch_size = 64
learning_rate = [..., ..., ...]
epochs = 10
input_dim = 1
hidden_dim_list = [4, 8]    # to save time, don't tune this
output_dim = ...            # TODO: define the output dimension
act_fn = [..., ..., ...]    # refer to https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity

# TODO: and all other hyperparameters you want to tune, e.g., dropout
# dropout = [..., ..., ...]
# optimizers = [..., ..., ...]   # refer to https://pytorch.org/docs/stable/optim.html

loss_fn = ...       # refer to https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html

best_accuracy = 0
best_model = None
best_history = None
for lr in learning_rate:
    for af in act_fn:
        print(...)
        # TODO: implement the grid search
        raise NotImplementedError

print(f"Best accuracy: {best_accuracy}")
print(f"Best model: {best_model}")

plt.subplot(1, 2, 1)
plot_metrics(best_history[0], best_history[1], xlabel="Epoch", ylabel="Loss", title="Loss")
plt.subplot(1, 2, 2)
plot_metrics(best_history[2], best_history[3], xlabel="Epoch", ylabel="Accuracy", title="Accuracy")
### Step 5: Confusion Matrix
With your best performing model, plot a confusion matrix showing which digits were misclassified, and what they were misclassified as. What numbers are frequently confused with one another by your model?
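
For intuition about what the matrix contains, here is a plain-Python sketch that counts (true label, predicted label) pairs and lists the most frequent off-diagonal entries. It follows the same convention as `sklearn.metrics.confusion_matrix` (rows are true labels, columns are predictions). The helper names and the short label lists are made up for illustration.

```python
def confusion_counts(y_true, y_pred, num_classes):
    """matrix[t][p] = number of samples with true label t predicted as p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def most_confused(matrix, top_k=3):
    """Return up to top_k (true, predicted, count) off-diagonal entries."""
    pairs = [(count, t, p)
             for t, row in enumerate(matrix)
             for p, count in enumerate(row)
             if t != p and count > 0]
    pairs.sort(reverse=True)
    return [(t, p, count) for count, t, p in pairs[:top_k]]


# Toy labels for a 3-class problem (not MNIST results).
y_true = [0, 1, 2, 2, 1, 0, 2]
y_pred = [0, 2, 2, 1, 2, 0, 2]

matrix = confusion_counts(y_true, y_pred, num_classes=3)
print(matrix)                 # [[2, 0, 0], [0, 0, 2], [0, 1, 2]]
print(most_confused(matrix))  # [(1, 2, 2), (2, 1, 1)]
```

Sorting the off-diagonal counts is one simple way to answer which classes are confused with one another most often.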
from sklearn.metrics import confusion_matrix

model = best_model
model.eval()  # set model to evaluation mode

# TODO: implement the confusion matrix
raise NotImplementedError