For this notebook, replace each `_FILL_` placeholder with the code or logic needed to make the notebook run.



# MNIST CNN Digit Recognition Network

For this problem, you will code a basic digit recognition network. The data are black-and-white images of the digits 0 to 9, each stored as a (1, 28, 28) volume. Each pixel of the raw image is an intensity between 0 and 255 (the `ToTensor` transform used below rescales it to [0, 1]), and together the (1, 28, 28) pixels can be visualized as a picture of a digit. The data is given to you as $\{(x^{(i)}, y^{(i)})\}_{i=1}^{N}$ where $y^{(i)}$ is the given label and $x^{(i)}$ is the (1, 28, 28) image. The data is downloaded through `torchvision`, a library of computer vision datasets and models.

At a high level, the model and notebook go as follows:
*   You first download the data and specify the batch size of B = 16. Each image will need to be turned from a (1, 28, 28) volume into a series of other volumes via convolutional layers or max pooling layers.
*   You will pass the data through several layers to build a CNN classifier. Use the hints below to get the right dimensions and figure out what the layers should be. Be careful with the loss function. Add regularization (L1 and L2) manually.

See the comments below and fill in the code wherever `_FILL_` is specified. All asserts should pass, and test accuracy should be about 95%.
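
The shape hints in the model cell can be checked by hand before writing any layers. The sketch below is plain Python, not part of the notebook's solution; it assumes 3x3 and 1x1 convolutions without padding and 2x2 max pooling with stride 2, which is what the model cell further down uses. The helper name `conv_out` is made up for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution or pooling window (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

side = 28
side = conv_out(side, kernel=3)            # cnn1:     28 -> 26
side = conv_out(side, kernel=2, stride=2)  # max pool: 26 -> 13
side = conv_out(side, kernel=3)            # cnn2:     13 -> 11
side = conv_out(side, kernel=2, stride=2)  # max pool: 11 -> 5
side = conv_out(side, kernel=1)            # cnn3:      5 -> 5

# cnn3 has a single output channel, so flattening gives 1 * 5 * 5 features.
flat_features = 1 * side * side
print(side, flat_features)  # 5 25
```

The floor division is what turns the 11x11 map into 5x5 rather than 5.5: the last row and column are simply dropped by the pooling window.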






In [1]:
!pip install torchmetrics

Collecting torchmetrics
  Downloading torchmetrics-1.2.0-py3-none-any.whl (805 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m805.2/805.2 kB[0m [31m9.9 MB/s[0m eta [36m0:00:00[0m
Collecting lightning-utilities>=0.8.0 (from torchmetrics)
  Downloading lightning_utilities-0.9.0-py3-none-any.whl (23 kB)
Installing collected packages: lightning-utilities, torchmetrics
Successfully installed lightning-utilities-0.9.0 torchmetrics-1.2.0


In [2]:
import torchvision
from torchvision import transforms
import torch
from torch.utils.data import DataLoader, TensorDataset
import torch.nn as nn
import torchmetrics

In [3]:
SEED = 1
torch.manual_seed(SEED)
_FILL_ = '_FILL_'

In [4]:
image_path = './'

# Use ToTensor
transform = transforms.Compose([transforms.ToTensor()])

mnist_train_dataset = torchvision.datasets.MNIST(
    root=image_path,
    train=True,
    transform=transform,
    download=True
  )

mnist_test_dataset = torchvision.datasets.MNIST(
    root=image_path,
    train=False,
    transform=transform,
    download=False
)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./MNIST/raw/train-images-idx3-ubyte.gz
100%|██████████| 9912422/9912422 [00:00<00:00, 234576929.45it/s]
Extracting ./MNIST/raw/train-images-idx3-ubyte.gz to ./MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./MNIST/raw/train-labels-idx1-ubyte.gz
100%|██████████| 28881/28881 [00:00<00:00, 22016665.54it/s]
Extracting ./MNIST/raw/train-labels-idx1-ubyte.gz to ./MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./MNIST/raw/t10k-images-idx3-ubyte.gz
100%|██████████| 1648877/1648877 [00:00<00:00, 78121831.72it/s]
Extracting ./MNIST/raw/t10k-images-idx3-ubyte.gz to ./MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./MNIST/raw/t10k-labels-idx1-ubyte.gz
100%|██████████| 4542/4542 [00:00<00:00, 7183457.30it/s]
Extracting ./MNIST/raw/t10k-labels-idx1-ubyte.gz to ./MNIST/raw
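
The `ToTensor` transform used above converts each 8-bit image to a float tensor in channel-first layout and divides the intensities by 255, so the model sees values in [0, 1] rather than 0 to 255. The sketch below only illustrates that rescaling in plain Python; it is not torchvision's implementation, and the helper name and the 2x2 patch are made up.

```python
def to_unit_range(pixels):
    """Map 8-bit intensities (0..255) to floats in [0.0, 1.0]."""
    return [[value / 255 for value in row] for row in pixels]

# A made-up 2x2 patch of raw intensities, not taken from MNIST.
raw_patch = [[0, 128], [255, 64]]
scaled_patch = to_unit_range(raw_patch)

# Black stays 0.0 and full white becomes 1.0.
print(scaled_patch[0][0], scaled_patch[1][0])  # 0.0 1.0
```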



In [5]:
BATCH_SIZE = 16
LR = 0.1
L1_WEIGHT = 1e-10
L2_WEIGHT = 1e-12
EPOCHS = 20
# Get the dataloader for train and test
train_dl = DataLoader(mnist_train_dataset, BATCH_SIZE, shuffle=True)
test_dl = DataLoader(mnist_test_dataset, BATCH_SIZE, shuffle=True)
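
The model below asserts that every batch has exactly `BATCH_SIZE` images, so it relies on the dataset size being a multiple of the batch size. MNIST has 60,000 training and 10,000 test images, and both are divisible by 16. The plain-Python sketch below (the helper name `batch_layout` is made up) shows how many batches a loader would yield and how large the last one is; with a batch size that does not divide the dataset, the last batch would be smaller unless `drop_last=True` is passed to `DataLoader`.

```python
def batch_layout(num_samples, batch_size):
    """Return (number of batches, size of the last batch) without dropping any samples."""
    full, remainder = divmod(num_samples, batch_size)
    num_batches = full + (1 if remainder else 0)
    last_batch = remainder if remainder else batch_size
    return num_batches, last_batch

print(batch_layout(60_000, 16))  # (3750, 16)
print(batch_layout(10_000, 16))  # (625, 16)
print(batch_layout(10_000, 32))  # (313, 16): the last batch is only half full
```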

In [6]:
class CNNClassifier(nn.Module):

  def __init__(self):
    super().__init__()
    self.cnn1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3)
    self.cnn2 = nn.Conv2d(in_channels=32, out_channels=16, kernel_size=3)
    self.cnn3 = nn.Conv2d(in_channels=16, out_channels=1, kernel_size=1)
    self.linear = nn.Linear(in_features=25, out_features=10)

  def forward(self, x):
    assert(x.shape == (BATCH_SIZE, 1, 28, 28))

    # Pass through cnn layer 1
    # (28, 28, 1) -> (26, 26, 32)
    x = self.cnn1(x)
    assert(x.shape == (BATCH_SIZE, 32, 26, 26))

    # Pass through max pooling to give the result shape below
    # (26, 26, 32) -> (13, 13, 32)
    x = nn.functional.max_pool2d(x, kernel_size=2, stride=2)
    assert(x.shape == (BATCH_SIZE, 32, 13, 13))

    # Apply ReLU
    x = nn.functional.relu(x)

    # Pass through cnn layer 2 to give the result below
    # (13, 13, 32) -> (11, 11, 16)
    x = self.cnn2(x)
    assert(x.shape == (BATCH_SIZE, 16, 11, 11))

    # Pass through max pooling to give the result below
    # (11, 11, 16) -> (5, 5, 16)
    x = nn.functional.max_pool2d(x, kernel_size=2, stride=2)
    assert(x.shape == (BATCH_SIZE, 16, 5, 5))

    # Apply ReLU
    x = nn.functional.relu(x)

    # Pass through cnn layer 3 to give the result below
    # (5, 5, 16) -> (5, 5, 1)
    x = self.cnn3(x)
    assert(x.shape == (BATCH_SIZE, 1, 5, 5))

    # Apply ReLU
    x = nn.functional.relu(x)

    # Flatten to get the result below
    # (5, 5, 1) -> (25, )
    x = x.view(x.size(0), -1)
    assert(x.shape == (BATCH_SIZE, 25))

    # Pass through linear layer to get the result below
    # (25, ) -> (10, )
    x = self.linear(x)
    assert(x.shape == (BATCH_SIZE, 10))

    # Return the logits
    return x

model = CNNClassifier()
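
The forward pass above returns raw logits with no softmax, which is the input `nn.CrossEntropyLoss` expects: it applies log-softmax and the negative log-likelihood internally. The sketch below reproduces that computation for a single example in plain Python; the function name is made up, and it illustrates the formula rather than PyTorch's actual implementation.

```python
import math

def cross_entropy_from_logits(logits, target):
    """Negative log-probability of class `target` under softmax(logits)."""
    # Subtract the maximum before exponentiating for numerical stability.
    largest = max(logits)
    log_sum_exp = largest + math.log(sum(math.exp(z - largest) for z in logits))
    return log_sum_exp - logits[target]

# With ten equal logits every class has probability 1/10, so the loss is log(10).
uniform_loss = cross_entropy_from_logits([0.0] * 10, target=3)

# A large logit on the correct class drives the loss toward zero.
confident_logits = [0.0] * 10
confident_logits[3] = 10.0
confident_loss = cross_entropy_from_logits(confident_logits, target=3)

print(round(uniform_loss, 4))  # 2.3026
```

An untrained 10-class model should therefore start near a loss of about 2.3; applying a softmax inside the model before this loss would normalize twice and flatten the gradients.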

In [7]:
# Get the loss function; remember you are outputting the logits
loss_fn = nn.CrossEntropyLoss()

# Set the optimizer to SGD and let the learning rate be LR
# Do not add L2 regularization; add it manually below ...
optimizer = torch.optim.SGD(model.parameters(), lr=LR)

torch.manual_seed(SEED)
for epoch in range(EPOCHS):
    accuracy_hist_train = 0
    loss_hist_train = 0
    # Loop through the x and y pairs of data
    for x_batch, y_batch in train_dl:
        # Get the model predictions
        y_pred = model(x_batch)

        # Get the loss
        loss = loss_fn(y_pred, y_batch)

        # Add an L1 regularization with a weight of L1_WEIGHT to the objective
        l1_reg = L1_WEIGHT * torch.norm(model.linear.weight, p=1)

        # Add an L2 regularization with a weight of L2_WEIGHT to the objective
        # (torch.norm with p=2 is the unsquared norm of the weights)
        l2_reg = L2_WEIGHT * torch.norm(model.linear.weight, p=2)

        # Add the regularizers to the objective
        loss += l1_reg + l2_reg

        # Get the gradients
        loss.backward()

        # Accumulate the loss
        # Remember: loss is a mean over the batch, and we need the total sum over the samples in the dataset
        loss_hist_train += loss.item() * len(y_batch)

        # Update the parameters
        optimizer.step()

        # Zero out the gradient
        optimizer.zero_grad()

        # Get the number of correct predictions, do this with torchmetrics
        is_correct = torchmetrics.Accuracy(task='multiclass', num_classes=10)(y_pred.argmax(dim=1), y_batch).item() * len(y_batch)

        accuracy_hist_train += is_correct
    accuracy_hist_train /= len(train_dl.dataset)
    loss_hist_train /= len(train_dl.dataset)
    print(f'Train Metrics Epoch {epoch} Loss {loss_hist_train:.4f} Accuracy {accuracy_hist_train:.4f}')

    accuracy_hist_test = 0
    loss_hist_test = 0
    # Get the average value of each metric across the test batches
    with torch.no_grad():
      # Loop through the x and y pairs of data
      for x_batch, y_batch in test_dl:
          # Get the model predictions
          y_batch_pred = model(x_batch)

          # Get the loss
          loss = loss_fn(y_batch_pred, y_batch)

          # Add an L1 regularization with a weight of L1_WEIGHT to the objective
          l1_reg = L1_WEIGHT * torch.norm(model.linear.weight, p=1)

          # Add an L2 regularization with a weight of L2_WEIGHT to the objective
          l2_reg = L2_WEIGHT * torch.norm(model.linear.weight, p=2)

          # Add the regularizers to the objective
          loss += (l1_reg + l2_reg)

          # Accumulate the loss
          # Remember: loss is a mean over the batch, and we need the total sum over the samples in the dataset
          loss_hist_test += loss.item() * len(y_batch)

          # Get the number of correct predictions via torchmetrics
          is_correct = torchmetrics.Accuracy(task='multiclass', num_classes=10)(y_batch_pred.argmax(dim=1), y_batch).item() * len(y_batch)

          # Get the accuracy
          accuracy_hist_test += is_correct

      # Normalize the metrics by the right number
      accuracy_hist_test /= len(test_dl.dataset)
      loss_hist_test /= len(test_dl.dataset)
      print(f'Test Metrics Epoch {epoch} Loss {loss_hist_test:.4f} Accuracy {accuracy_hist_test:.4f}')
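
The loop above adds two penalties on `model.linear.weight` to the cross-entropy loss. For a weight matrix, `torch.norm(..., p=1)` is the sum of absolute values over all entries and `torch.norm(..., p=2)` is the square root of the sum of squares, so the L2 term here is the unsquared norm. Many references define the L2 (ridge) penalty with the squared norm instead; the plain-Python sketch below puts the variants side by side. The helper names and the toy matrix are made up for illustration.

```python
import math

def l1_penalty(weights, coeff):
    """coeff * sum of |w| over all entries (matches torch.norm(w, p=1))."""
    return coeff * sum(abs(w) for row in weights for w in row)

def l2_norm_penalty(weights, coeff):
    """coeff * sqrt(sum of w^2): the unsquared norm used in the loop above."""
    return coeff * math.sqrt(sum(w * w for row in weights for w in row))

def l2_squared_penalty(weights, coeff):
    """coeff * sum of w^2: the squared form common in ridge regression."""
    return coeff * sum(w * w for row in weights for w in row)

# Toy 2x2 weight matrix chosen so the norms are easy to check by hand.
toy_weights = [[3.0, -4.0], [0.0, 0.0]]
print(l1_penalty(toy_weights, 1.0))          # 7.0
print(l2_norm_penalty(toy_weights, 1.0))     # 5.0
print(l2_squared_penalty(toy_weights, 1.0))  # 25.0
```

With `L1_WEIGHT = 1e-10` and `L2_WEIGHT = 1e-12`, either form of the L2 term is negligible next to the cross-entropy loss, so the choice would only start to matter with much larger coefficients.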

Train Metrics Epoch 0 Loss 0.4114 Accuracy 0.8718
Test Metrics Epoch 0 Loss 0.2177 Accuracy 0.9320
Train Metrics Epoch 1 Loss 0.2289 Accuracy 0.9299
Test Metrics Epoch 1 Loss 0.1655 Accuracy 0.9489
Train Metrics Epoch 2 Loss 0.1926 Accuracy 0.9418
Test Metrics Epoch 2 Loss 0.1787 Accuracy 0.9473
Train Metrics Epoch 3 Loss 0.1765 Accuracy 0.9464
Test Metrics Epoch 3 Loss 0.1465 Accuracy 0.9533
Train Metrics Epoch 4 Loss 0.1664 Accuracy 0.9494
Test Metrics Epoch 4 Loss 0.1904 Accuracy 0.9409
Train Metrics Epoch 5 Loss 0.1550 Accuracy 0.9530
Test Metrics Epoch 5 Loss 0.1330 Accuracy 0.9578
Train Metrics Epoch 6 Loss 0.1507 Accuracy 0.9539
Test Metrics Epoch 6 Loss 0.1424 Accuracy 0.9550
Train Metrics Epoch 7 Loss 0.1457 Accuracy 0.9554
Test Metrics Epoch 7 Loss 0.1236 Accuracy 0.9610
Train Metrics Epoch 8 Loss 0.1396 Accuracy 0.9574
Test Metrics Epoch 8 Loss 0.1268 Accuracy 0.9589
Train Metrics Epoch 9 Loss 0.1362 Accuracy 0.9584
Test Metrics Epoch 9 Loss 0.1287 Accuracy 0.9590
Train Metr