For this notebook, fill in each `_FILL_` placeholder with the code or logic needed to make the notebook run.



# MNIST MLP Digit Recognition Network

For this problem, you will code a basic digit recognition network. The data are images of the digits 0 to 9, stored as (1, 28, 28) grayscale arrays. Each pixel of the image is an intensity between 0 and 255, and together the (1, 28, 28) array can be visualized as a picture of a digit. The data is given to you as $\{(x^{(i)}, y^{(i)})\}_{i=1}^{N}$, where $y^{(i)}$ is the given label and $x^{(i)}$ is the (1, 28, 28) image. This data comes from `torchvision`, a library of computer vision datasets and models.

At a high level, the model and notebook work as follows:
*   You first download the data and specify the batch size of B = 64. Each image will need to be turned from a (1, 28, 28) volume into a vector of dimension 784 = 1 * 28 * 28. So each full batch will be of size (64, 784).
*   Then, you pass the data through two hidden layers, one with weights of dimension (784, 32) and another of dimension (32, 16). After each of these linear maps, you apply a TanH nonlinearity.
*   Finally, you pass the result through a (16, 10) linear layer and return the log softmax of its output.
*   What objective do you use? Be careful!
*   How do you compute accuracy both manually and with torchmetrics?
*   How do you compute AUROC?

See the comments below and fill in the code wherever `_FILL_` is specified. All asserts should pass and accuracy should be higher than 85%. If you use another nonlinearity, like ReLU, you might get higher accuracy. Play around with this, but submit working code that does better than 85%.
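
The shapes described in the list above can be checked before any data is downloaded. The following is a minimal sketch, not the notebook's solution: it uses an `nn.Sequential` stand-in named `sketch` and an arbitrary batch size `B = 4`, both chosen only for this illustration.

```python
import torch
import torch.nn as nn

B = 4  # arbitrary batch size, used only for this illustration
x = torch.zeros(B, 1, 28, 28)   # dummy batch with MNIST-shaped images
flat = x.view(x.size(0), -1)    # flatten each image: (B, 1, 28, 28) -> (B, 784)

# Stand-in for the network described above (hypothetical name `sketch`)
sketch = nn.Sequential(
    nn.Linear(784, 32), nn.Tanh(),
    nn.Linear(32, 16), nn.Tanh(),
    nn.Linear(16, 10), nn.LogSoftmax(dim=1),
)

out = sketch(flat)  # (B, 10) log-probabilities, one row per image
n_params = sum(p.numel() for p in sketch.parameters())
print(flat.shape, out.shape, n_params)
```

Counting weights and biases by hand gives (784 * 32 + 32) + (32 * 16 + 16) + (16 * 10 + 10) = 25,818 parameters, which is what `n_params` should report.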







In [1]:
!pip install torchmetrics

Collecting torchmetrics
  Downloading torchmetrics-1.2.0-py3-none-any.whl (805 kB)
Collecting lightning-utilities>=0.8.0 (from torchmetrics)
  Downloading lightning_utilities-0.9.0-py3-none-any.whl (23 kB)
Installing collected packages: lightning-utilities, torchmetrics
Successfully installed lightning-utilities-0.9.0 torchmetrics-1.2.0


In [2]:
import torchvision
from torchvision import transforms
import torch
from torch.utils.data import DataLoader, TensorDataset
import torch.nn as nn
import torchmetrics

In [3]:
SEED = 1
torch.manual_seed(SEED)
_FILL_ = '_FILL_'

In [4]:
image_path = './'

# Use ToTensor to convert each image to a tensor and rescale it from [0, 255] to [0.0, 1.0]
# (see also transforms.Compose for chaining several transforms).
# transforms.ToTensor() converts a PIL Image or numpy.ndarray (H x W x C) in the
# range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0],
# when the PIL Image is in one of the supported modes or the ndarray has dtype uint8.
transform = transforms.ToTensor()
# Equivalent: transforms.Compose([transforms.ToTensor()])

mnist_train_dataset = torchvision.datasets.MNIST(
    root=image_path,
    train=True,
    transform=transform,
    download=True
  )

mnist_test_dataset = torchvision.datasets.MNIST(
    root=image_path,
    train=False,
    transform=transform,
    download=False
)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./MNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 9912422/9912422 [00:00<00:00, 122499119.74it/s]


Extracting ./MNIST/raw/train-images-idx3-ubyte.gz to ./MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./MNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 28881/28881 [00:00<00:00, 95911079.83it/s]

Extracting ./MNIST/raw/train-labels-idx1-ubyte.gz to ./MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./MNIST/raw/t10k-images-idx3-ubyte.gz



100%|██████████| 1648877/1648877 [00:00<00:00, 31428441.44it/s]


Extracting ./MNIST/raw/t10k-images-idx3-ubyte.gz to ./MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./MNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 4542/4542 [00:00<00:00, 21429166.22it/s]


Extracting ./MNIST/raw/t10k-labels-idx1-ubyte.gz to ./MNIST/raw



In [5]:
BATCH_SIZE = 64
LR = 0.001
EPOCHS = 20
# Define the DataLoaders for train and test
train_dl = DataLoader(mnist_train_dataset, BATCH_SIZE, shuffle=True)
test_dl = DataLoader(mnist_test_dataset, BATCH_SIZE, shuffle=True)
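
MNIST has 60,000 training and 10,000 test images, so with `BATCH_SIZE = 64` the last batch of each loader is smaller than the rest (`DataLoader` keeps it by default, since `drop_last=False`). This is why the training loop multiplies each per-batch mean by `len(y_batch)` before dividing by the dataset size. The sketch below uses plain Python and a hypothetical helper `batch_sizes` to show the arithmetic.

```python
def batch_sizes(n_samples, batch_size):
    """Sizes of the batches yielded when the last partial batch is kept."""
    full, rest = divmod(n_samples, batch_size)
    return [batch_size] * full + ([rest] if rest else [])

train_sizes = batch_sizes(60_000, 64)
test_sizes = batch_sizes(10_000, 64)
print(len(train_sizes), train_sizes[-1])
print(len(test_sizes), test_sizes[-1])

# Toy per-sample losses split into batches of size 2, 2, 1
losses = [1.0, 3.0, 2.0, 4.0, 10.0]
batches, start = [], 0
for size in batch_sizes(len(losses), 2):
    batches.append(losses[start:start + size])
    start += size

batch_means = [sum(b) / len(b) for b in batches]

# Weighting each batch mean by its length recovers the true dataset mean
weighted = sum(m * len(b) for m, b in zip(batch_means, batches)) / len(losses)
# A plain average of batch means over-weights the short last batch
unweighted = sum(batch_means) / len(batch_means)
true_mean = sum(losses) / len(losses)
print(weighted, unweighted, true_mean)
```

In the toy example the weighted average matches the true mean, while the unweighted average of batch means does not.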

In [6]:
class MLPClassifier(nn.Module):

  def __init__(self):
    super().__init__()
    # Define the layers
    self.linear1 = nn.Linear(784, 32)
    self.linear2 = nn.Linear(32, 16)
    self.linear3 = nn.Linear(16, 10)

  def forward(self, x):
    # Flatten x to be of last dimension 784
    x = x.view(x.size(0), -1)

    # Pass through linear layer 1
    x = self.linear1(x)

    # Apply tanh
    x = torch.tanh(x)

    # Pass through linear layer 2
    x = self.linear2(x)

    # Apply tanh
    x = torch.tanh(x)

    # Pass through linear layer 3
    x = self.linear3(x)

    # Return the LogSoftmax of the data
    # This will affect the loss we choose below
    return nn.functional.log_softmax(x, dim=1)

model = MLPClassifier()
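
Because the model returns log-probabilities, the matching loss is `nn.NLLLoss`. The sketch below, on random logits rather than the notebook's model, checks that `log_softmax` followed by `NLLLoss` equals `CrossEntropyLoss` applied to the raw logits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 10)            # stand-in for the output of the last linear layer
targets = torch.randint(0, 10, (5,))   # stand-in class labels

log_probs = F.log_softmax(logits, dim=1)

nll = nn.NLLLoss()(log_probs, targets)       # expects log-probabilities
ce = nn.CrossEntropyLoss()(logits, targets)  # expects raw logits

# log_softmax is idempotent: applying it to log-probabilities returns them unchanged
twice = F.log_softmax(log_probs, dim=1)

print(nll.item(), ce.item())
```

Since `log_softmax` leaves log-probabilities unchanged, feeding this model's output to `CrossEntropyLoss` would give the same value; `NLLLoss` avoids the redundant computation and makes the intent explicit.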

In [7]:
# Get the loss function; remember you are outputting the LogSoftmax, so be careful which loss you pick.
# nn.CrossEntropyLoss() combines nn.LogSoftmax() (that is, log(softmax(x))) and nn.NLLLoss() in a single class,
# so it expects the raw output of the network (called logits), not the output of a softmax function.
# Because this model already returns log-probabilities, pair it with nn.NLLLoss().
loss_fn = nn.NLLLoss()

# Set the optimizer to SGD and let the learning rate be LR
optimizer = torch.optim.SGD(model.parameters(), lr=LR)

torch.manual_seed(SEED)
for epoch in range(EPOCHS):
    accuracy_hist_train = 0
    auroc_hist_train = 0.0
    loss_hist_train = 0
    # Loop through the x and y pairs of data
    for x_batch, y_batch in train_dl:
        # Get the model predictions
        pred = model(x_batch)

        # Get the loss
        loss = loss_fn(pred, y_batch)

        # Get the gradients
        loss.backward()

        # Add to the running loss
        # Remember: loss is a mean over the batch, and we need the total sum over all samples in the dataset
        loss_hist_train += loss.item() * len(y_batch)

        # Update the parameters
        optimizer.step()

        # Zero out the gradient
        optimizer.zero_grad()

        # Get the correct predictions directly
        # This should be a boolean tensor with one entry per sample
        is_correct_1 = (pred.argmax(dim=1) == y_batch)

        # Get the number of correct predictions with torchmetrics
        # This should be a float: batch accuracy times batch length
        is_correct_2 = torchmetrics.Accuracy(task="multiclass", num_classes=10)(pred.argmax(dim=1), y_batch).item() * len(y_batch)

        # round() guards the comparison against floating-point error in is_correct_2
        assert is_correct_1.sum().item() == round(is_correct_2)

        accuracy_hist_train += is_correct_2 # is_correct_2 is a scalar

        # Get the AUROC - make sure to multiply by the batch length since this is just the AUC over the batch and you want to take a weighted average later
        auroc_hist_train += torchmetrics.AUROC(task="multiclass", num_classes=10)(pred, y_batch).item() * len(y_batch)
    accuracy_hist_train /= len(train_dl.dataset)
    auroc_hist_train /= len(train_dl.dataset)
    loss_hist_train /= len(train_dl.dataset)
    print(f'Train Metrics Epoch {epoch} Loss {loss_hist_train:.4f} Accuracy {accuracy_hist_train:.4f} AUROC {auroc_hist_train:.4f}')

    # Get the average value of each metric across the test batches
    accuracy_hist_test = 0
    auroc_hist_test = 0.0
    loss_hist_test = 0
    # Use torch.no_grad() so that no gradients are computed; we only want to evaluate the model
    with torch.no_grad():
      # Loop through the x and y pairs of data
      for x_batch, y_batch in test_dl:
          # Get the model predictions
          pred = model(x_batch)

          # Get the loss
          loss = loss_fn(pred, y_batch)

          # Add to the running loss
          # Remember: loss is a mean over the batch, and we need the total sum over all samples in the dataset
          loss_hist_test += loss.item() * len(y_batch)

          # Get the number of correct predictions via torchmetrics
          is_correct = torchmetrics.Accuracy(task="multiclass", num_classes=10)(pred.argmax(dim=1), y_batch).item() * len(y_batch)

          # Accumulate the correct predictions for the accuracy
          accuracy_hist_test += is_correct

          # Get AUROC
          auroc_hist_test += torchmetrics.AUROC(task="multiclass", num_classes=10)(pred, y_batch).item() * len(y_batch)
      # Normalize the metrics by the right number
      accuracy_hist_test /= len(test_dl.dataset)
      auroc_hist_test /= len(test_dl.dataset)
      loss_hist_test /= len(test_dl.dataset)
      print(f'Test Metrics Epoch {epoch} Loss {loss_hist_test:.4f} Accuracy {accuracy_hist_test:.4f} AUROC {auroc_hist_test:.4f}')



Train Metrics Epoch 0 Loss 2.2644 Accuracy 0.2251 AUROC 0.7626
Test Metrics Epoch 0 Loss 2.2155 Accuracy 0.3305 AUROC 0.8753
Train Metrics Epoch 1 Loss 2.1683 Accuracy 0.4055 AUROC 0.8955
Test Metrics Epoch 1 Loss 2.1078 Accuracy 0.5435 AUROC 0.9100
Train Metrics Epoch 2 Loss 2.0493 Accuracy 0.5686 AUROC 0.9182
Test Metrics Epoch 2 Loss 1.9734 Accuracy 0.6048 AUROC 0.9290
Train Metrics Epoch 3 Loss 1.9050 Accuracy 0.6139 AUROC 0.9329
Test Metrics Epoch 3 Loss 1.8189 Accuracy 0.6390 AUROC 0.9396
Train Metrics Epoch 4 Loss 1.7482 Accuracy 0.6353 AUROC 0.9403
Test Metrics Epoch 4 Loss 1.6618 Accuracy 0.6526 AUROC 0.9459
Train Metrics Epoch 5 Loss 1.5959 Accuracy 0.6420 AUROC 0.9471
Test Metrics Epoch 5 Loss 1.5162 Accuracy 0.6528 AUROC 0.9515
Train Metrics Epoch 6 Loss 1.4587 Accuracy 0.6486 AUROC 0.9530
Test Metrics Epoch 6 Loss 1.3877 Accuracy 0.6642 AUROC 0.9557
Train Metrics Epoch 7 Loss 1.3386 Accuracy 0.6676 AUROC 0.9580
Test Metrics Epoch 7 Loss 1.2751 Accuracy 0.6939 AUROC 0.9593


In [9]:
# Get train/test final accuracy directly; make sure you scale the raw pixel data by 255.0
# Should be around 85%
pred = model(mnist_train_dataset.data.float() / 255.0)
is_correct = (pred.argmax(dim=1) == mnist_train_dataset.targets).float()
print(f'Total Final Train accuracy: {is_correct.mean():.4f}')

pred = model(mnist_test_dataset.data.float() / 255.0)
is_correct = (pred.argmax(dim=1) == mnist_test_dataset.targets).float()
print(f'Total Final Test accuracy: {is_correct.mean():.4f}')


Total Final Train accuracy: 0.8555
Total Final Test accuracy: 0.8593
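
On the AUROC question from the introduction: for a binary problem, AUROC can be read as the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half. The sketch below implements that definition in plain Python with a hypothetical helper `binary_auroc`; it is meant for intuition and is not how torchmetrics computes the value internally.

```python
def binary_auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
auc = binary_auroc(scores, labels)
print(auc)  # 3 of the 4 positive/negative pairs are ordered correctly: 0.75
```

A common multiclass extension applies this one-vs-rest for each class and averages the results. Because AUROC depends on how all examples rank against each other, the weighted average of per-batch AUROC values used during training is an approximation of, not a substitute for, AUROC computed over the whole dataset.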
