### Using a neural network to recognize handwritten digits


We will build a neural network that classifies images of handwritten digits.  

We will build and train a model on the classic **MNIST Handwritten Digits** dataset. Each grayscale image is a $28 \times 28$ matrix (tensor); some examples are shown below:

<img src="https://upload.wikimedia.org/wikipedia/commons/2/27/MnistExamples.png" width="500" />

MNIST is a classification problem: the task is to take an input image and classify it into one of ten buckets, the digits from $0$ to $9$.

In [None]:
# RUN THIS CELL FIRST
import math
import matplotlib.pyplot as plt
import torch
import torch.nn as nn

### Loading an external dataset

The cell below imports the MNIST dataset, which is already pre-split into train and test sets.  

The download takes approximately 63MB of space.

In [None]:
%pip install torchvision

In [None]:
# DO NOT REMOVE THIS CELL – THIS DOWNLOADS THE MNIST DATASET
# RUN THIS CELL BEFORE YOU RUN THE REST OF THE CELLS BELOW
# Import the dataset helpers from torchvision
from torchvision import datasets

# Download the MNIST train and test sets (~63MB) into the current directory
mnist_train = datasets.MNIST("./", train=True, download=True)
mnist_test  = datasets.MNIST("./", train=False, download=True)

# Flatten each 28x28 image into a 784-length vector and scale pixels from [0, 255] to [0, 1]
x_train = mnist_train.data.reshape(-1, 784) / 255
y_train = mnist_train.targets

x_test = mnist_test.data.reshape(-1, 784) / 255
y_test = mnist_test.targets

### Define the model architecture and implement the forward pass
Create a 3-layer network in the `__init__` method of the model `DigitNet`.  
These layers are all `Linear` layers and should correspond to the following architecture:

<img src="img_linear_nn.png" width="600">

In our data, each image $x$ has been flattened from a $28 \times 28$ image into a 784-element vector.
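As a rough illustration of that flattening step, here is a minimal sketch on made-up data. The tensor `fake_images` is a hypothetical stand-in for `mnist_train.data` (random pixel values, not real digits); the reshape and scaling mirror what the loading cell does.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for a batch of two 28x28 grayscale images (uint8 pixels).
fake_images = torch.randint(0, 256, (2, 28, 28), dtype=torch.uint8)

# Flatten each image row by row into a 784-length vector and scale to [0, 1].
flat = fake_images.reshape(-1, 784) / 255

print(flat.shape)  # torch.Size([2, 784])
print(flat.dtype)  # torch.float32

# Pixel (row r, column c) of image i ends up at position r * 28 + c of row i.
r, c = 3, 5
print(flat[0, r * 28 + c] == fake_images[0, r, c] / 255)  # tensor(True)
```

Flattening discards the 2D layout, which is acceptable here because `Linear` layers treat their input as an unordered feature vector anyway.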

After initializing the layers, stitch them together in the `forward` method. The network should look like this:

$$x \rightarrow \text{Linear(512)} \rightarrow \text{ReLU} \rightarrow \text{Linear(128)} \rightarrow \text{ReLU} \rightarrow \text{Linear(10)} \rightarrow \text{Softmax} \rightarrow \hat{y}$$

**Softmax Layer**: The final softmax activation is commonly used for classification tasks because it normalizes the outputs into a vector of values in the range $[0, 1]$ that sum to 1, so they can be read as a probability distribution over the classes. Unlike a single sigmoid output, which only handles binary classification, softmax extends to any number of classes.
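To make the normalization concrete, here is a small sketch on made-up scores. The `logits` values are arbitrary and chosen only for illustration; `dim=1` is the class dimension for a tensor of shape (batch, classes).

```python
import torch
import torch.nn.functional as F

# Hypothetical raw scores for a batch of 2 inputs and 3 classes.
logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.0, 0.0, 0.0]])

# dim=1 normalizes across classes, so every row becomes a probability distribution.
probs = F.softmax(logits, dim=1)

print(probs)
print(probs.sum(dim=1))     # each row sums to 1 (up to floating-point rounding)
print(probs.argmax(dim=1))  # softmax keeps the largest logit as the largest probability
```

Passing the wrong `dim` (for example `dim=0`) would normalize across the batch instead, so the rows would no longer be per-image distributions.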

In [None]:
import torch.nn.functional as F

class DigitNet(nn.Module):
    def __init__(self, input_dimensions, num_classes):
        """
        Creates the 3 Linear layers using the torch.nn layers API.
        ReLU is applied functionally in `forward`, so it needs no layer here.
        """
        super().__init__()
        self.fc1 = nn.Linear(input_dimensions, 512)
        self.fc2 = nn.Linear(512, 128)
        self.fc3 = nn.Linear(128, num_classes)
        
    def forward(self, x, softmax_output=False):
        """
        Performs the forward pass for the network.
        
        PARAMS:
            x : the input tensor of shape (batch_size, input_dimensions)
            softmax_output : if True, return class probabilities instead of raw logits
            
        RETURNS
            the output of the entire 3-layer model, of shape (batch_size, num_classes)
        """
        # Pass the inputs through the sequence of layers; Softmax (if requested)
        # is applied over the class dimension (dim=1).
        x = F.relu(self.fc1(x))  # First layer + ReLU
        x = F.relu(self.fc2(x))  # Second layer + ReLU
        x = self.fc3(x)          # Third layer (logits)
        if softmax_output:       # Apply Softmax if specified
            x = F.softmax(x, dim=1)
        return x

### 3.2 Training Loop

As demonstrated in Section 3.2, implement the function `train_model` that performs the following for every epoch/iteration:

1. set the optimizer's gradients to zero
2. forward pass
3. calculate the loss
4. backpropagate using the loss
5. take an optimizer step to update the weights

This time, use the Adam optimizer to train the network.  
Use Cross-Entropy Loss, since this is a classification task.  
Train for 20 epochs.  
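One detail worth checking before writing the loop: `nn.CrossEntropyLoss` expects raw logits and integer class indices, and applies log-softmax internally, so the model should not apply Softmax before the loss. The sketch below verifies this on made-up values; the `logits` and `targets` tensors are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical logits for 2 samples and 3 classes, with their true class indices.
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])  # shape (batch,), not one-hot

# Cross-entropy computed directly from the logits.
loss_from_logits = nn.CrossEntropyLoss()(logits, targets)

# The same quantity by hand: log-softmax, then mean negative log-likelihood.
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(loss_from_logits.item(), manual.item())
print(torch.allclose(loss_from_logits, manual))  # True
```

Feeding already-softmaxed probabilities into `nn.CrossEntropyLoss` would apply the normalization twice and generally weakens the training signal.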

In [None]:
from torch.utils.data import TensorDataset, DataLoader
import torch.optim as optim

def train_model(x_train, y_train, epochs=20):
    """
    Trains the model for the given number of epochs using mini-batches of 64
    
    PARAMS:
        x_train : a tensor of training features of shape (60000, 784)
        y_train : a tensor of training labels of shape (60000,)
        epochs  : number of epochs, default of 20
        
    RETURNS:
        the final model 
    """
    model = DigitNet(784, 10)
    optimiser = optim.Adam(model.parameters(), lr=1e-3) # use Adam
    loss_fn = nn.CrossEntropyLoss()   # use cross-entropy loss

    dataset = TensorDataset(x_train, y_train)
    dataloader = DataLoader(dataset, batch_size=64, shuffle=True)

    for epoch in range(epochs):
        for batch_x, batch_y in dataloader:
            optimiser.zero_grad()
            y_pred = model(batch_x)  # raw logits; CrossEntropyLoss applies log-softmax internally
            loss = loss_fn(y_pred, batch_y)
            loss.backward()
            optimiser.step()
        
        # Report progress every 5 epochs (the loss shown is for the last mini-batch only)
        if (epoch + 1) % 5 == 0:
            print(f"Epoch [{epoch + 1}/{epochs}], Loss: {loss.item():.4f}")

    return model
                
digit_model = train_model(x_train, y_train)

## Explore the model

Now that we have trained the model, let us run a few predictions with it.

In [None]:
# This is a demonstration: You can use this cell for exploring your trained model
digit_model.eval()
with torch.no_grad():
    for idx in range(3):
        scores = digit_model(x_test[idx:idx+1], softmax_output=True)
        print(scores)
        _, predictions = torch.max(scores, 1)
        print("true label:", y_test[idx].item())
        print("pred label:", predictions[0].item())
        
        plt.imshow(x_test[idx].numpy().reshape(28, 28), cmap='gray')
        plt.axis("off")
        plt.show()

### Evaluate the model

Now that we have trained the model, we should evaluate it using our test set.  
We will use accuracy (the fraction of test images for which the model predicts the correct label) to measure model performance.  

Our model takes in a (n x 784) tensor and returns a (n x 10) tensor of scores, one per class (raw logits by default, or probabilities when `softmax_output=True`). We convert the scores into predictions by taking the index of the maximum score in each row; softmax preserves the ordering of the scores, so the predictions are the same either way.  
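As a quick sanity check of that conversion, here is a sketch on a tiny hand-made example. The `scores` and `labels` tensors are hypothetical values, not model output.

```python
import torch

# Hypothetical scores for 4 samples and 3 classes, with their true labels.
scores = torch.tensor([[0.1, 0.7, 0.2],
                       [0.8, 0.1, 0.1],
                       [0.3, 0.3, 0.4],
                       [0.6, 0.3, 0.1]])
labels = torch.tensor([1, 0, 1, 0])

# The index of the largest score in each row is the predicted class.
predictions = scores.argmax(dim=1)
print(predictions)  # tensor([1, 0, 2, 0])

# Three of the four predictions match the labels.
accuracy = (predictions == labels).float().mean().item() * 100
print(accuracy)  # 75.0
```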

In [None]:
def get_accuracy(scores, labels):
    """
    helper function that returns accuracy of model (out of 100%)
    PARAMS:
        scores : the (n x 10) output scores of the network (logits or softmax probabilities)
        labels : the ground truth labels
        
    RETURNS:
        accuracy out of 100%
    """
    # Get the predicted class by finding the index of the maximum score
    predictions = torch.argmax(scores, dim=1)
    
    # Compare predictions with the ground truth labels
    correct_predictions = (predictions == labels).float()
    
    # Compute accuracy as a percentage
    accuracy = correct_predictions.mean().item() * 100
    
    return accuracy


digit_model.eval()
with torch.no_grad():  # gradients are not needed for evaluation
    scores = digit_model(x_test) # n x 10 tensor of logits
get_accuracy(scores, y_test)