![Practicum AI Logo image](images/practicum_ai_logo.png) <img src='images/practicumai_deep_learning.png' alt='Practicum AI: Deep Learning Foundations icon' align='right' width=50>

***
# *Practicum AI:* Deep Learning - MNIST Classifier

This exercise is adapted from Baig et al. (2020) <i>The Deep Learning Workshop</i> from <a href="https://www.packtpub.com/product/the-deep-learning-workshop/9781839219856">Packt Publishers</a> (Exercise 2.07, page 92).

## Amelia's AI Adventure Continues...

<img alt="A cartoon of Dr. Amelia's dog looking at a computer with a stack of papers next to it showing some handwritten digits." src="images/Amelias_Dog_MNIST.jpg" padding=20 align="right" width=250>Amelia and her nutrition studies are back! After her adventures with image recognition and binary classification, she's curious to dive deeper. 

While Amelia's data collection process is working for most participants in her study, some do not like using the phone application to submit their survey responses. They keep sending in handwritten responses. Realizing that the data from these study participants is still vital to her research, Dr. Amelia is now looking to automate entering these responses using a program to read the numbers that make up the survey responses.

Again, Amelia decides to start with the basics: recognizing handwritten numbers. That's where the MNIST dataset comes in. With its vast collection of handwritten digits, it's the perfect training ground for Amelia's next AI venture.

**Note:** The cartoon of Dr. Amelia's dog was generated with AI assistance.

Training a model on the MNIST dataset is often considered the "Hello world!" of AI. It is a commonly used first introduction to image recognition with deep learning.


![AI Application Development Pathway model](https://github.com/PracticumAI/deep_learning_2_draft/blob/main/M3-AppDev.00_00_22_23.Still001.png?raw=true)

 >&#128221; While you're going through this notebook, see if you can figure out which steps here are associated with each of the steps of the Development Pathway.

## MNIST Handwritten Digit Classification Dataset

The [MNIST](http://yann.lecun.com/exdb/mnist/) (Modified National Institute of Standards and Technology) training dataset contains 60,000 28×28 pixel grayscale images of handwritten single digits between 0 and 9, with an additional 10,000 images available for testing. 

The MNIST dataset is frequently used in machine learning research and has become a standard benchmark for image classification models. Top-performing models often achieve a classification accuracy above 99%, with an error rate between 0.4% and 0.2% on the hold-out test dataset.
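
To make those error rates concrete, the short sketch below converts them into a number of misclassified images on the 10,000-image test set. The helper name `misclassified_count` is made up for this illustration.

```python
def misclassified_count(n_images, error_rate):
    """Approximate number of wrong predictions for a given error rate."""
    return round(n_images * error_rate)

TEST_SET_SIZE = 10_000  # size of the MNIST test set

for error_rate in (0.004, 0.002):  # 0.4% and 0.2%
    wrong = misclassified_count(TEST_SET_SIZE, error_rate)
    print(f"Error rate {error_rate:.1%}: about {wrong} of {TEST_SET_SIZE:,} images misclassified")
```

In other words, the best models get only a few dozen of the 10,000 test digits wrong.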

In this exercise, you will implement a deep neural network (multi-layer) capable of classifying these images of handwritten digits into one of 10 classes. 

Amelia knows that to start any AI project, she'll need the right tools. She begins by importing the necessary libraries to set the stage for her digit-reading neural network.

## 1. Import libraries

Import the necessary libraries.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
import torchvision
import torchvision.transforms as transforms
import pytorch_lightning as pl
from pytorch_lightning import Trainer

import pandas as pd 
import numpy as np

import matplotlib.pyplot as plt  # Import the matplotlib library for plotting and visualization.


## 2. Load the MNIST dataset

Amelia will need to import the MNIST dataset from PyTorch's [torchvision.datasets module](https://pytorch.org/vision/stable/datasets.html#mnist). The `train_features` and `val_features` variables contain the training and test images, while `train_labels` and `val_labels` contain the corresponding labels for each item in those datasets.  

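Notice that the normalization values for MNIST differ from those used with ImageNet data. MNIST images have a single grayscale channel, so `Normalize` takes one mean and one standard deviation instead of three.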


In [None]:
# Define transforms to convert PIL images to tensors and normalize
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))  # MNIST mean and std
])

# Load the MNIST dataset from torchvision
train_dataset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
val_dataset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)

# Extract features and labels for compatibility with visualization parts below
train_features = train_dataset.data.numpy()
train_labels = train_dataset.targets.numpy()
val_features = val_dataset.data.numpy()
val_labels = val_dataset.targets.numpy()
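
To see what the transform does to individual pixel values, here is a minimal NumPy sketch of the same arithmetic: `ToTensor` scales 0-255 values to the 0-1 range, and `Normalize` then subtracts the mean and divides by the standard deviation. The function `normalize_pixels` and the sample values are illustrative only; the notebook itself relies on the torchvision transforms.

```python
import numpy as np

MNIST_MEAN, MNIST_STD = 0.1307, 0.3081  # the values passed to Normalize above

def normalize_pixels(raw):
    """Mimic ToTensor (scale 0-255 to 0-1) followed by Normalize."""
    scaled = np.asarray(raw, dtype=np.float32) / 255.0
    return (scaled - MNIST_MEAN) / MNIST_STD

raw_pixels = np.array([0, 128, 255], dtype=np.uint8)  # black, mid-gray, white
print(normalize_pixels(raw_pixels))
```

Black pixels end up slightly below zero and white pixels well above it, so the inputs the model sees are roughly centered around zero.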

## 3. Visualize the data

Before we start to work with data, it is always good to get a better idea of what we are working with.

How many images do we have in our training and testing datasets?

**Note**: We are using the un-transformed `train_features` here. Later, when we train the model, we will use `train_dataset` and a dataloader that will transform the images.


In [None]:
print(f"Training images: {len(train_features)}")
print(f"Test images: {len(val_features)}")
print(f"Image shape: {train_features[0].shape}")

Let's have a look at a random image. You can run this cell multiple times and get a different image each time.

In [None]:
# Set line width for numpy array printing
np.set_printoptions(linewidth=150)

# Select a random number from train_features
select = np.random.randint(0,len(train_features))

# Print the image array - the longer line width set above lets each 28-value row fit on one line
print(train_features[select])

# Display the image as an actual image
plt.imshow(train_features[select], cmap='gray')
plt.show()

# Print the true label for the image from train_labels
print(f"The true label for this image is a {train_labels[select]}.")

The output of the cell above should help clarify how images are encoded in our data. Each pixel has a value from 0 (black) to 255 (white). Since our images are grayscale, we only have one grid of pixels. For color images, we would have three: one each for red, green, and blue.

Our datasets have 60,000 images in the `train_features` and 10,000 images in the `val_features`. We will use these data as we move forward.
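
The difference between one grid and three can be seen in the array shapes. The sketch below builds two blank stand-in images with NumPy. It assumes the height x width x channel layout that NumPy and matplotlib commonly use for color images (PyTorch tensors usually put the channel first).

```python
import numpy as np

gray_image = np.zeros((28, 28), dtype=np.uint8)      # one grid: height x width
color_image = np.zeros((28, 28, 3), dtype=np.uint8)  # three grids: red, green, blue

gray_image[10:18, 10:18] = 255      # draw a white square on a black background
color_image[10:18, 10:18, 0] = 255  # the same square, in the red channel only

print("Grayscale shape:", gray_image.shape, "values per image:", gray_image.size)
print("Color shape:", color_image.shape, "values per image:", color_image.size)
```

A color image of the same size carries three times as many values as a grayscale one.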

## 4. Build the sequential model using PyTorch Lightning

Now, the fun part begins! Amelia sets out to build her neural network. In the previous exercises, Amelia called a pre-trained model for image recognition and then built a single-layer network for her binary classifier. With her confidence high, she is going to create this model herself out of multiple layers. This approach gives her (and you!) the most control over the function of the model.

Using PyTorch Lightning, we'll create a model class that encapsulates the neural network architecture and training logic. The model will have the following structure:

* First, add a flatten layer to unroll the 28x28 pixel images into a single array of 784 values. 
* Add a dense hidden layer with 50 units (neurons) and ReLU (Rectified Linear Unit) activation function.
   * The ReLU function will allow the model to capture non-linearities.
* Add a second, dense hidden layer with 20 units and ReLU activation function.
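* Add a dense output layer with 10 units and the softmax activation function.
   * We use ten neurons, one for each of the digits 0-9. 
   * The softmax function ensures the output values are probabilities that sum to 1, making it suitable for classification.
   * The code below applies `log_softmax`, the logarithm of softmax, because it pairs with the negative log-likelihood loss (`nll_loss`) used during training.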

Here's a graphical view of what we are doing:

![A diagram of the neural network being created. It shows the input 28X28 image being flattened into a 784 dimension array. That is the input. There are two hidden, fully connected layers with 50 and 20 neurons each. The final output layer has 10 neurons for the 10 classes in our model.](images/MNIST_neural_network.png)
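
The two activation functions named in the list above can be sketched in a few lines of NumPy. This is only an illustration with made-up scores; the model itself uses PyTorch's built-in versions.

```python
import numpy as np

def relu(z):
    """Replace negative values with 0 and leave positive values unchanged."""
    return np.maximum(z, 0.0)

def softmax(z):
    """Turn a vector of raw scores into probabilities that sum to 1."""
    shifted = z - np.max(z)  # subtracting the max avoids overflow in exp
    exp_z = np.exp(shifted)
    return exp_z / exp_z.sum()

hidden = np.array([-2.0, 0.0, 3.5])
print("ReLU:", relu(hidden))

# Made-up raw scores for the ten digit classes 0-9
scores = np.array([1.0, 2.0, 0.5, 4.0, -1.0, 0.0, 0.3, 1.2, 2.2, 0.1])
probs = softmax(scores)
print("Probabilities sum to:", probs.sum())
print("Most likely digit:", probs.argmax())
```

The class with the largest raw score also gets the largest probability, so softmax changes the scale of the outputs but not their ranking.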



In [None]:
# Define our model; metrics are logged once per epoch rather than at every step
class MNISTClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # Define the layers of the model
        self.flatten = nn.Flatten()        # Flatten 28x28 to 784
        self.fc1 = nn.Linear(784, 50)      # 784 inputs to 50 neurons
        self.fc2 = nn.Linear(50, 20)       # 50 inputs to 20 neurons
        self.fc3 = nn.Linear(20, 10)       # 20 to 10 output classes
        
    def forward(self, x):
        # Define how the data flows through the layers of the model
        # Also add in activation functions and other options
        x = self.flatten(x)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.log_softmax(self.fc3(x), dim=1)  # Log softmax for NLLLoss
        return x
    
    def training_step(self, batch, batch_idx):
        # Define how the model is trained
        x, y = batch
        y_hat = self(x)
        loss = F.nll_loss(y_hat, y)
        preds = torch.argmax(y_hat, dim=1)
        acc = (preds == y).float().mean()
        self.log('train_loss', loss, on_step=False, on_epoch=True)
        self.log('train_acc', acc, on_step=False, on_epoch=True)
        return loss
    
    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self(x)
        loss = F.nll_loss(y_hat, y)
        preds = torch.argmax(y_hat, dim=1)
        acc = (preds == y).float().mean()
        self.log('val_loss', loss, on_step=False, on_epoch=True)
        self.log('val_acc', acc, on_step=False, on_epoch=True)
        return {'val_loss': loss, 'val_acc': acc}
    
    def configure_optimizers(self):
        # Define the optimizer and learning rate
        return optim.Adam(self.parameters(), lr=0.001)

# Instantiate the model
model = MNISTClassifier()
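
The model above pairs `log_softmax` in `forward` with `nll_loss` in the training and validation steps. As a rough sketch of what that combination computes, the NumPy code below takes the log probability assigned to each true label and averages the negatives. The scores and labels are made up, and only three classes are used to keep the arrays small.

```python
import numpy as np

def log_softmax(scores):
    """Row-wise log of softmax, computed in a numerically stable way."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def nll_loss(log_probs, labels):
    """Mean negative log probability assigned to each true label."""
    picked = log_probs[np.arange(len(labels)), labels]
    return -picked.mean()

# Two made-up samples with 3 classes (the real model has 10)
scores = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.1, 3.0]])

loss_good = nll_loss(log_softmax(scores), np.array([0, 2]))  # labels match the top scores
loss_bad = nll_loss(log_softmax(scores), np.array([1, 0]))   # labels match low scores
print(f"Loss when predictions agree with labels: {loss_good:.3f}")
print(f"Loss when predictions disagree:          {loss_bad:.3f}")
```

The loss is small when the model puts high probability on the correct class and grows as that probability shrinks, which is what the optimizer works to reduce.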

## 5. Prepare the data loaders

In PyTorch, we need to create data loaders to efficiently batch and iterate through our data during training. PyTorch Lightning works seamlessly with PyTorch's DataLoader.

```python
# Create data loaders
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=64, shuffle=False)
```

In [None]:
# Code it!
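
To get a feel for what a batch size of 64 means, this small sketch works out how many batches each loader yields per pass through the data. It assumes the `DataLoader` default of keeping the final, smaller batch (`drop_last=False`); the helper `batch_summary` is made up for this example.

```python
import math

def batch_summary(n_samples, batch_size):
    """Return (number of batches, size of the final batch)."""
    n_batches = math.ceil(n_samples / batch_size)
    last_batch = n_samples - (n_batches - 1) * batch_size
    return n_batches, last_batch

for name, n_samples in (("train", 60_000), ("val", 10_000)):
    n_batches, last_batch = batch_summary(n_samples, 64)
    print(f"{name}: {n_batches} batches per epoch, final batch has {last_batch} images")
```

Because neither dataset size is a multiple of 64, the last batch in each epoch is smaller than the rest.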


## 6. Inspect the model configuration using print

Display a summary of the model's architecture, including the layers, their shapes, and the number of parameters.


In [None]:
print(model)

# Count the total parameters
total_params = sum(p.numel() for p in model.parameters())
print(f"\nTotal parameters: {total_params}")

The model summary indicates that this model has 40,480 parameters (weights and biases). **Note**: If your model does not show `Total parameters: 40480`, double check your model was set up correctly.
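
The total can be checked by hand. Each fully connected layer has one weight per input-output pair plus one bias per output, as in this short sketch (the helper `count_parameters` is illustrative, not part of PyTorch):

```python
layer_sizes = [784, 50, 20, 10]  # input, two hidden layers, output

def count_parameters(sizes):
    """Sum weights and biases over consecutive fully connected layers."""
    total = 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = n_in * n_out
        biases = n_out
        print(f"{n_in:>4} -> {n_out:<3} weights: {weights:>6,}  biases: {biases:>3}")
        total += weights + biases
    return total

total = count_parameters(layer_sizes)
print(f"Total parameters: {total:,}")
```

Almost all of the parameters sit in the first layer, because it connects every one of the 784 input pixels to each of the 50 neurons.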

## 7. Train the model using PyTorch Lightning

Now, train the model on the MNIST dataset using PyTorch Lightning's `Trainer`. We'll set the training to run for 10 epochs.

Train the model using the training data:
* `train_loader`: the DataLoader containing input images and labels
* `max_epochs=10`: the number of times the model will cycle through the entire dataset

```python
# Create a PyTorch Lightning trainer
trainer = Trainer(max_epochs=10, enable_progress_bar=True)

# Train the model
trainer.fit(model, train_loader)
```

In [None]:
# Code it!


## 8. Evaluate the model

Finally, evaluate your model's performance on the test set. 

In [None]:
trainer.validate(model, dataloaders=val_loader)
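
The accuracy reported here comes from the comparison in `validation_step`: `(preds == y).float().mean()`. A NumPy sketch of the same calculation on one made-up batch of eight predictions:

```python
import numpy as np

preds = np.array([7, 2, 1, 0, 4, 1, 4, 9])   # made-up model predictions
labels = np.array([7, 2, 1, 0, 4, 1, 4, 5])  # made-up true labels

correct = preds == labels                      # True where the prediction matches
batch_acc = correct.astype(np.float32).mean()  # fraction of True values

print(f"Correct: {correct.sum()} of {len(labels)}")
print(f"Batch accuracy: {batch_acc:.3f}")
```

Converting the True/False comparison to numbers and averaging gives the fraction of correct predictions.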

## 9. Model predictions

Let's see how the model performs on some randomly selected images.  Are its predictions correct?  

Select a specific image from the test dataset for examination and prediction.

Here, the variable `loc` is set to the index 200, which selects the 201st image because indexing starts at 0. Change `loc` to try other images.

```python
loc = 200

# Extract the corresponding image from the val_features array and store it in the 'val_image' variable.
val_image = val_features[loc]
```

In [None]:
# Code it!


First, let's take a look at the shape of the image.

* Get and display the shape (dimensions) of the `val_image` variable.
* This provides insight into the structure and size of the image.

```python
val_image.shape
```

In [None]:
# Code it!


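We see that our image is 28x28 pixels. However, the model expects a batch dimension in addition to the image dimensions, and a simple call to the `reshape()` method or `unsqueeze()` adds it. The model was also trained on scaled and normalized pixel values, so the raw 0-255 values in `val_image` need the same treatment before prediction.

* Scale the pixel values to 0-1 and normalize them with the MNIST mean and standard deviation used in the training transform.
* Reshape the `val_image` from a 2D array (28x28) to a 3D array (1x28x28).
* This is commonly done to match the input shape that the model expects when making predictions on single samples.

```python
# Apply the training-time scaling and normalization, then add the batch dimension
val_image_tensor = ((torch.tensor(val_image, dtype=torch.float32) / 255.0 - 0.1307) / 0.3081).unsqueeze(0)
```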

In [None]:
# Code it!
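
The effect of adding a batch dimension is easiest to see in the shapes. This sketch uses a blank NumPy array as a stand-in for the image; `np.newaxis` and `reshape` play the role that `unsqueeze(0)` plays for a PyTorch tensor.

```python
import numpy as np

image = np.zeros((28, 28), dtype=np.float32)  # blank stand-in for one MNIST image

batched = image[np.newaxis, ...]     # add a leading batch dimension
reshaped = image.reshape(1, 28, 28)  # same result using reshape

# What a flatten layer does next: keep the batch dimension, unroll the rest
flattened = batched.reshape(batched.shape[0], -1)

print("Original:", image.shape)
print("With batch dimension:", batched.shape, reshaped.shape)
print("After flattening:", flattened.shape)
```

A single image becomes a batch of one, which the flatten layer then unrolls into one row of 784 values.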


Now call the model's forward pass to make a prediction, assign the output to `result`, and then view its contents.

* Use the trained model to predict the label for the `val_image_tensor`.
* The model returns log probabilities, so we'll convert them to probabilities and display both.
* Each value in the array corresponds to the model's predicted probability that the image belongs to a particular class (digit).


In [None]:
model.eval()
with torch.no_grad():
    result = model(val_image_tensor)
    probabilities = torch.exp(result)  # Convert log probabilities to probabilities

# Print the array of probabilities to the console.
print("Log probabilities:", result)
print("Probabilities:", probabilities)

As we see, the model has returned ten values, one for each digit class, with the highest value marking the most likely digit.  Use the `argmax` function to see the model's prediction.

* Use the `argmax` method to find the index (label) of the maximum value in the `result` tensor.
   * This gives us the model's most likely prediction for the class (digit) of the `val_image`.

```python
predicted_digit = result.argmax(dim=1).item()
print(f"Predicted digit: {predicted_digit}")
```

In [None]:
# Code it!
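
Taking `argmax` of the log probabilities gives the same answer as taking it of the probabilities, because the logarithm preserves ordering. The NumPy sketch below checks this on a made-up output for a single image:

```python
import numpy as np

# Made-up probabilities for the ten digit classes, for a batch of one image
made_up_probs = np.array([[0.01, 0.02, 0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]])

log_probs = np.log(made_up_probs)  # what a log-softmax output looks like
probs = np.exp(log_probs)          # exp undoes the log

print("argmax of log probabilities:", log_probs.argmax(axis=1)[0])
print("argmax of probabilities:    ", probs.argmax(axis=1)[0])
```

Both calls pick the same index, so there is no need to convert back to probabilities before choosing the predicted digit.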


To verify the prediction, check the label of the corresponding image.

* Using the index loc, retrieve the true label (actual digit) for the `val_image` from the `val_labels` array.
   * This gives us the actual class (digit) of the `val_image` to compare with the model's prediction.

```python
true_digit = val_labels[loc]
print(f"True digit: {true_digit}")
```

In [None]:
# Code it!


Finally, visualize the image with pyplot.

* Use the `imshow` function from the `matplotlib` library to display the `val_image` as a visual image.
   * This helps in visually examining the content of the `val_image` (which is represented as a 28x28 array of pixel values).


In [None]:
plt.imshow(val_features[loc], cmap='gray')
plt.title(f'Predicted: {predicted_digit}, True: {true_digit}')
plt.show()

And we did it! We helped Amelia create a model that can recognize handwritten digits!


## Bonus exercise

* Write a function that ties all these steps into one function call. The function should take an input image and print the image with the predicted digit and true digit.

## Before continuing
###  <img src='images/alert_icon.svg' alt="Alert icon" width=40 align=center> Alert!
> Before continuing to another notebook within the same Jupyter session,
> use the **"Running Terminals and Kernels" tab** (below the File Browser tab) to **shut down this kernel**. 
> This will free up this notebook's GPU memory, making it available for
> your next notebook.
>
> Every time you run multiple notebooks within a Jupyter session with a GPU, this should be done.
>
> ![Screenshot of the Running Terminals and Kernels tab used to shut down kernels before starting a new notebook](images/stop_kernel.png)

----
## Push changes to GitHub <img src="images/push_to_github.png" alt="Push to GitHub icon" align="right" width=150>

 Remember to **add**, **commit**, and **push** the changes you have made to this notebook to GitHub to keep your repository in sync.

In Jupyter, those are done in the git tab on the left. In Google Colab, use File > Save a copy in GitHub.
