<center>
<table>
  <tr>
    <td><img src="https://portal.nccs.nasa.gov/datashare/astg/training/python/logos/nasa-logo.svg" width="100"/> </td>
     <td><img src="https://portal.nccs.nasa.gov/datashare/astg/training/python/logos/ASTG_logo.png?raw=true" width="80"/> </td>
     <td> <img src="https://www.nccs.nasa.gov/sites/default/files/NCCS_Logo_0.png" width="130"/> </td>
    </tr>
</table>
</center>

        
<center>
<h1><font color= "blue" size="+3">ASTG Python Course Series</font></h1>
</center>

---

<center>
    <h1><font color="red">Image Classification Model with PyTorch</font></h1>
</center>

# <font color="red">Objectives</font>

In this presentation, we show how to build a Machine Learning (ML) model with PyTorch for an image classification problem.
We cover the following:

- Introduce the MNIST dataset
- Read the MNIST dataset to create PyTorch tensors
- Set the hyperparameters
- Create an ML model
- Train the model
- Evaluate the model

# <font color="red">References</font>

- [PyTorch](https://pytorch.org/) from pytorch.org
- [Efficiently Building PyTorch Models: A Step-by-Step Guide](https://myscale.com/blog/efficient-pytorch-model-building-step-by-step-guide/) from myscale.com
- [MNIST Handwritten Digit Recognition in PyTorch](https://nextjournal.com/gkoehler/pytorch-mnist) by Gregor Koehler et al.
- [Create and train a PyTorch model for digit classification using the MNIST dataset](https://learn.arm.com/learning-paths/cross-platform/pytorch-digit-classification-arch-training/) from learn.arm.com

# <font color="red"> Python packages used</font>

- __Matplotlib__: Create visualizations.
- __Pandas__: Data (two-dimensional labelled array) manipulation and analysis.
- __PyTorch__: Build, train, and evaluate deep learning models based on neural networks.

In [None]:
try:
    import google.colab
    print("Running in Google Colab")
except ImportError:
    print("Not running in Google Colab")
else:
    print("Installing modules in Google Colab")
    !pip3 uninstall --yes torch torchaudio torchvision torchtext torchdata
    !pip3 install torch torchaudio torchvision torchtext torchdata

In [None]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
import matplotlib.pyplot as plt

In [None]:
import numpy as np

In [None]:
import pandas as pd

In [None]:
import torch
from torch import nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader, Subset
from torchvision import datasets
from torchvision import transforms 

# <font color="red">Image Classification</font> 

We use the [MNIST data set](http://yann.lecun.com/exdb/mnist/) (Modified National Institute of Standards and Technology database).

* It is a large database of handwritten digits that is commonly used for training various image processing systems.
* The database is also widely used for training and testing in the field of machine learning.
* The dataset we will be using contains 70000 images of handwritten digits (`0-9`), of which 10000 are reserved for testing.
* Each image has `28x28` pixels.
* It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal effort on preprocessing and formatting.

![TSF](https://static.javatpoint.com/tutorial/tensorflow/images/mnist-dataset-in-cnn.jpg)
Image Source: [https://www.javatpoint.com/tensorflow-mnist-dataset-in-cnn](https://www.javatpoint.com/tensorflow-mnist-dataset-in-cnn)

# <font color="red">Loading the dataset</font>

## <font color="blue">Read the data</font>

We use `datasets.MNIST()` to get the dataset.

- `transforms.Compose()`: Combine multiple data transformation operations.
- `transforms.ToTensor()`: Convert the data to a tensor.
- `transforms.Normalize()`: Standardize the data, i.e., subtract the mean (`0.1307`) and divide by the standard deviation (`0.3081`). 
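
As a small worked example of the standardization step, the sketch below applies the same arithmetic to single pixel values in plain Python. It assumes the image has already gone through `transforms.ToTensor()`, which scales 8-bit pixel values from `[0, 255]` to `[0.0, 1.0]`. The helper name `standardize` is hypothetical and only used for illustration.

```python
# Constants quoted in the text above for the MNIST training set.
MNIST_MEAN = 0.1307
MNIST_STD = 0.3081


def standardize(pixel, mean=MNIST_MEAN, std=MNIST_STD):
    """Subtract the mean and divide by the standard deviation (one pixel)."""
    return (pixel - mean) / std


# Extreme pixel values after ToTensor(): 0.0 and 1.0
lowest = standardize(0.0)
highest = standardize(1.0)
print(f"0.0 -> {lowest:.4f}, 1.0 -> {highest:.4f}")
```

After this step the pixel values are no longer confined to `[0, 1]`: they range from roughly `-0.42` to roughly `2.82`.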

In [None]:
transform = transforms.Compose(
    [transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))]
)

In [None]:
train_dataset = datasets.MNIST(
    root='data', 
    train=True, 
    download=True, 
    transform=transform
)

In [None]:
test_dataset = datasets.MNIST(
    root='data', 
    train=False, 
    download=True, 
    transform=transform
)

In [None]:
len(train_dataset)

In [None]:
len(test_dataset)

## <font color="blue">Visualize the data</font>

Select 96 random images from the training dataset and display them.

In [None]:
import random
# Sample 96 distinct indices; range(len(...)) already covers every valid index
numbers = random.sample(range(len(train_dataset)), 96)

In [None]:
sample_train_dataset = Subset(train_dataset, numbers)

In [None]:
len(sample_train_dataset)

In [None]:
def display_digits(sample_dataset):
    """
      Given a dataset whose items are (image, label) pairs,
      this function plots the first 96 images and their labels.
    """
    # Figure size (width, height) in inches
    fig = plt.figure(figsize=(8, 6))

    # Adjust the subplots 
    fig.subplots_adjust(left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)

    for i in range(96):
        X, y = sample_dataset[i]
        # Initialize the subplots: 
        #    Add a subplot in the grid of 8 by 12, at the i+1-th position
        ax = fig.add_subplot(8, 12, i + 1, xticks=[], yticks=[])
        
        # Display an image at the i-th position
        ax.imshow(X.reshape(28, 28), cmap=plt.cm.binary)

        # label the image with the target value
        ax.text(0, 7, str(y))

    # Show the plot
    plt.show()

In [None]:
display_digits(sample_train_dataset)

# <font color="red">Creating the ML model</font>

## <font color="blue">Set the hyperparameters</font>

It is a good practice to declare the following parameters before creating the model for ease of change and understanding.

__Dataset parameters__

These parameters are defined by the dataset used:

- number of features
- number of classes to predict

We have here a `28x28` image as input and a digit between 0 and 9 as output of the neural network (reference: [learn.arm.com](https://learn.arm.com/learning-paths/cross-platform/pytorch-digit-classification-arch-training/model/)).

In [None]:
input_size = 28*28
num_classes = 10

__Model parameters__

- batch size
- number of epochs
- learning rate (step size of the optimizer updates)
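
To see how these settings interact, the sketch below works out how many batches one epoch contains. It assumes the standard MNIST split described earlier (60000 training and 10000 test images) and the values set in the next cell; the variable names `batches_per_epoch`, `last_batch_size`, and `total_steps` are illustrative only.

```python
import math

train_size = 60000       # 70000 images minus the 10000 reserved for testing
test_size = 10000
batch_size_train = 64
batch_size_test = 1000
num_epochs = 10

# Number of batches needed to cover each dataset once
batches_per_epoch = math.ceil(train_size / batch_size_train)
test_batches = math.ceil(test_size / batch_size_test)

# The last training batch is smaller when the sizes do not divide evenly
last_batch_size = train_size - (batches_per_epoch - 1) * batch_size_train

# One optimizer update per training batch
total_steps = batches_per_epoch * num_epochs

print(batches_per_epoch, last_batch_size, test_batches, total_steps)
```

With these values, one epoch takes 938 training batches (the last one holding 32 images), so 10 epochs correspond to 9380 optimizer updates.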

In [None]:
num_epochs = 10
batch_size_train = 64
batch_size_test = 1000
learning_rate = 0.01
num_hidden_nodes = 96
momentum = 0.5
log_interval = 100

random_seed = 1
torch.backends.cudnn.enabled = False
torch.manual_seed(random_seed)

Device configuration:

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

## <font color="blue">Building the ML model with PyTorch</font>

__Class to create a multi-layer model.__

We define an `ImageClassifierNetwork` class which consists of two main components:

1. `__init__()` method

We create a sequential network consisting of:

- A fully-connected (Linear) layer with `num_hidden_nodes` nodes, followed by the `Tanh` activation function.
- A Dropout layer with a `20%` dropout rate to prevent overfitting.
- A second Linear layer, with `num_hidden_nodes` nodes, followed by the `Sigmoid` activation function.
- Another Dropout layer, that removes `20%` of the nodes.
- A final Linear layer, with `num_classes` nodes (matching the number of classes in the dataset), followed by a `LogSoftmax` activation function that outputs class log-probabilities, which is the input that the `NLLLoss` loss function used below expects.

The input is first flattened from its original `28x28` pixel format into a 1D array of 784 elements using `nn.Flatten()`.

2. `forward()` method

- This method defines the forward pass of the network.
- It takes an input tensor `x`, flattens it using `self.flatten`, and then passes it through the defined sequential stack of layers (`self.net`).

The output (stored in a variable called `logits` in later cells) holds the log-probability of each class; the predicted digit is the class with the largest value.

In [None]:
class ImageClassifierNetwork(nn.Module):
    def __init__(self, input_size, num_hidden_nodes, num_classes):
        super(ImageClassifierNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.net = nn.Sequential(
            nn.Linear(input_size, num_hidden_nodes),
            #nn.ReLU(),
            nn.Tanh(),
            nn.Dropout(.2),
            
            nn.Linear(num_hidden_nodes, num_hidden_nodes),
            #nn.ReLU(),
            nn.Sigmoid(),
            nn.Dropout(.2),
            
            nn.Linear(num_hidden_nodes, num_classes),
            # LogSoftmax (not Softmax): nn.NLLLoss expects log-probabilities
            nn.LogSoftmax(dim=1)
        )

    def forward(self, x):
        # (batch, 1, 28, 28) -> (batch, 784)
        x = self.flatten(x)
        output = self.net(x)
        return output
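
The flattening step can be checked in isolation. The sketch below is a minimal illustration with made-up all-zero tensors (not MNIST data): it passes a hypothetical batch of 64 single-channel `28x28` images, and then a single image, through `nn.Flatten()`.

```python
import torch
from torch import nn

# nn.Flatten() keeps the first dimension and flattens all the others
flatten = nn.Flatten()

# Hypothetical batch: 64 grayscale images of 28x28 pixels
batch = torch.zeros(64, 1, 28, 28)
flat_batch = flatten(batch)

# A single dataset item has shape (1, 28, 28): the channel dimension
# is kept as the first dimension, so the result is (1, 784)
single = torch.zeros(1, 28, 28)
flat_single = flatten(single)

print(flat_batch.shape, flat_single.shape)
```

This is why a single `(1, 28, 28)` image can be fed to the network without adding an explicit batch dimension: its channel dimension plays the role of a batch of size one.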

The total number of trainable parameters for this network is calculated as follows:

- First hidden layer: $input\_size \times num\_hidden\_nodes + num\_hidden\_nodes$  parameters (weights and biases).
- Second hidden layer: $num\_hidden\_nodes \times num\_hidden\_nodes + num\_hidden\_nodes$ parameters.
- Output layer: $num\_hidden\_nodes \times num\_classes + num\_classes$ parameters.
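
Plugging in the values used in this notebook (`input_size = 784`, `num_hidden_nodes = 96`, `num_classes = 10`) gives the totals below. The helper `linear_params` is a hypothetical name introduced only for this sketch.

```python
input_size = 28 * 28
num_hidden_nodes = 96
num_classes = 10


def linear_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of one Linear layer."""
    return n_in * n_out + n_out


first_hidden = linear_params(input_size, num_hidden_nodes)
second_hidden = linear_params(num_hidden_nodes, num_hidden_nodes)
output_layer = linear_params(num_hidden_nodes, num_classes)
total = first_hidden + second_hidden + output_layer

print(first_hidden, second_hidden, output_layer, total)
```

The three layers hold 75360, 9312, and 970 parameters respectively, for a total of 85642. The activation and dropout layers contribute no trainable parameters.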

Note that the activation functions (`Tanh` and `Sigmoid`) add no trainable parameters, but they are essential:
- Activation functions make deep learning possible.
   - Inserting non-linear activation functions between layers is what allows a deep learning model to approximate any function, rather than just linear ones.
- Without them, the stacked Linear layers defined above would collapse into a single linear transformation (one matrix multiplication plus a bias).
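
To illustrate why the non-linear activations matter, the sketch below stacks two small Linear layers with no activation in between and checks that they are equivalent to a single Linear layer. The layer sizes and names (`layer1`, `layer2`, `merged`) are arbitrary choices for this illustration, not part of the notebook's model.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Two linear layers in double precision, with no activation between them
layer1 = nn.Linear(4, 3).double()
layer2 = nn.Linear(3, 2).double()
x = torch.randn(5, 4, dtype=torch.float64)

stacked = layer2(layer1(x))

# W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2): a single linear layer
merged = nn.Linear(4, 2).double()
with torch.no_grad():
    merged.weight.copy_(layer2.weight @ layer1.weight)
    merged.bias.copy_(layer2.weight @ layer1.bias + layer2.bias)

same_without_activation = torch.allclose(stacked, merged(x))

# With a Tanh in between, the equivalence no longer holds
same_with_tanh = torch.allclose(layer2(torch.tanh(layer1(x))), merged(x))

print(same_without_activation, same_with_tanh)
```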

__Create the model__

In [None]:
torch.manual_seed(1)

model = ImageClassifierNetwork(
    input_size=input_size, 
    num_hidden_nodes=num_hidden_nodes, 
    num_classes=num_classes
)

In [None]:
model.to(device)

__Print model information__

In [None]:
print('\t Model information: \n')
print(model)

In [None]:
print('\t Model parameters: \n')
for param in model.parameters():
    print(param)

In [None]:
for name, param in model.named_parameters():
    print(f"Parameter name: {name}, Parameter values: {param}")

__Determine the number of trainable parameters per layer__

In [None]:
def print_trainable_parameters_per_layer(model):
    n = 20
    m = 10
    p = n+m+2
    print(f"{'-'*p}")
    print(f"{'Modules':<{n}}  {'Parameters':{m}}")
    print(f"{'-'*p}")
    for name, param in model.named_parameters():
        if param.requires_grad:
            print(f"{name:<{n}}  {param.numel():{m}}")

In [None]:
print_trainable_parameters_per_layer(model)

#### <font color="green">Basic testing of the model with an arbitrary image</font>

Running the model now produces arbitrary, unreliable outputs because the network has not yet been trained to recognize any patterns in the data.

In [None]:
X, y = sample_train_dataset[0]

In [None]:
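# Disable dropout for inference
model.eval()
with torch.no_grad():
    # The image must live on the same device as the model
    logits = model(X.to(device))
    pred = logits.argmax(dim=1, keepdim=True)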

In [None]:
print(pred[0].item())

__The next step is to train the model using a dataset and an optimization process, such as gradient descent, so that it can learn to make accurate predictions.__

## <font color="blue"> Defining a DataLoader</font>

- We pass the dataset to our dataloader, and our `batch_size` hyperparameter as initialization arguments.
- This creates an iterable data loader, so we can easily iterate over each batch using a loop.
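
The batching behaviour can be seen on a toy dataset. The sketch below is only an illustration: it wraps 150 all-zero placeholder images (not MNIST data) in a `TensorDataset` and iterates over them in batches of 64. The names `toy_dataset` and `toy_loader` are hypothetical.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data: 150 blank 28x28 images, all labelled 0
images = torch.zeros(150, 1, 28, 28)
labels = torch.zeros(150, dtype=torch.long)
toy_dataset = TensorDataset(images, labels)

toy_loader = DataLoader(toy_dataset, batch_size=64)

# Each iteration yields one (data, target) pair of batched tensors
batch_sizes = [len(data) for data, target in toy_loader]

print(len(toy_loader), batch_sizes)
```

Note that `len(toy_loader)` is the number of batches (3 here), while `len(toy_loader.dataset)` is the number of items (150); the last batch holds the 22 leftover items.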

In [None]:
train_loader = DataLoader(train_dataset, batch_size=batch_size_train)

In [None]:
test_loader = DataLoader(test_dataset, batch_size=batch_size_test)

## <font color="blue">The training loop</font>

The typical approach to training a neural network in PyTorch involves:

- Feeding batches of train data through the network.
- Calculating the prediction error or loss using a loss function, such as Cross-Entropy for classification tasks.
- Optimizing the model’s weights and biases using backpropagation.
   - Backpropagation involves computing the gradient of the loss with respect to each parameter and then updating the parameters using an optimizer, like Stochastic Gradient Descent (SGD) or Adam.
- Repeating the process for multiple epochs until the model achieves satisfactory performance, balancing accuracy and generalization.
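
The steps above can be sketched on a deliberately tiny problem. The example below is not the MNIST model: it fits a one-parameter linear model to synthetic data (`y = 2x + 1`) with a mean squared error loss and plain SGD, purely to show the order of the calls. All names (`toy_model`, `loss_fn`, `opt`, `history`) are illustrative.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic regression data: y = 2x + 1
x = torch.linspace(-1, 1, 32).unsqueeze(1)
y = 2 * x + 1

toy_model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
opt = torch.optim.SGD(toy_model.parameters(), lr=0.1)

history = []
for step in range(100):
    opt.zero_grad()                  # clear gradients from the previous step
    loss = loss_fn(toy_model(x), y)  # forward pass and loss
    loss.backward()                  # backpropagation: compute gradients
    opt.step()                       # update weights and biases
    history.append(loss.item())

print(f"first loss: {history[0]:.4f}, last loss: {history[-1]:.6f}")
```

The same four calls (`zero_grad`, forward pass, `backward`, `step`) appear in the training function defined later in this notebook.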

__Define the loss function__

- Loss is a measure of how well a model’s predictions match the true labels of the data.
- It quantifies the difference between the predicted output and the actual output. The lower the loss, the better the model’s performance.
- The goal of training is to minimize the loss, and get the model’s predictions closer to the actual labels.
- In classification tasks, a common loss function is Cross-Entropy Loss, while Mean Squared Error (MSE) is often used for regression tasks.
- We use `NLLLoss` (Negative Log Likelihood Loss), which is used for multi-class classification models whose final layer outputs log-probabilities (for example, through `LogSoftmax`).
  - It compares these log-probabilities with the true class labels.
  - It calculates the negative log likelihood of the correct class, which is essentially the measure of how confident your model was about the correct prediction.
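
A hand computation may help make this concrete. The sketch below, in plain Python, converts three made-up raw scores into log-probabilities and evaluates the negative log likelihood for two possible true labels. The helpers `log_softmax` and `nll` are hypothetical names for this illustration; `nn.NLLLoss` performs the lookup-and-negate step and, by default, averages the result over the batch.

```python
import math


def log_softmax(scores):
    """Log-probabilities from raw scores (numerically stable form)."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]


def nll(log_probs, target):
    """Negative log likelihood of the true class for one sample."""
    return -log_probs[target]


scores = [2.0, 1.0, 0.1]          # made-up scores for 3 classes
log_probs = log_softmax(scores)

loss_if_class_0 = nll(log_probs, 0)   # model favours the true class
loss_if_class_2 = nll(log_probs, 2)   # model favours another class

print(f"{loss_if_class_0:.3f} {loss_if_class_2:.3f}")
```

The loss is small (about 0.417) when the true class has the highest score and much larger (about 2.317) when the model assigns it a low probability.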

In [None]:
loss_function = nn.NLLLoss()

__Define the optimizer__

- The optimizer updates the model’s parameters based on the gradients computed during backpropagation.
   - It determines how the model learns from the data. 
- We use the `Adam` (Adaptive Moment Estimation) optimizer.
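
For intuition, the sketch below writes out a single Adam update for one scalar parameter in plain Python. It is a simplified illustration, assuming the default `torch.optim.Adam` settings (`betas=(0.9, 0.999)`, `eps=1e-8`) and ignoring options such as weight decay; the function `adam_step` is a hypothetical helper, not a PyTorch API.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter at step t (t starts at 1)."""
    # Running averages of the gradient and of the squared gradient
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction for the zero-initialized averages
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Parameter update scaled by the adaptive step size
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


new_param, m, v = adam_step(param=1.0, grad=3.0, m=0.0, v=0.0, t=1)
print(f"{new_param:.6f}")
```

On the first step the update has a magnitude very close to the learning rate (here the parameter moves from `1.0` to about `0.99`), whatever the scale of the gradient.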

In [None]:
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

__Feed train data into the model__

In [None]:
def train_model_per_batch(data, target, 
                model, loss_function, optimizer) -> float:
    #data, target = data.to(device), target.to(device)

    # Zero the gradients
    optimizer.zero_grad()

    # Perform forward pass
    output = model(data)

    # Compute loss
    loss = loss_function(output, target)

    # Perform backward pass
    loss.backward()

    # Perform optimization
    optimizer.step()
    
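    # Report the batch loss shifted by a constant 1 (the test loss uses the
    # same offset); the shift does not affect training
    return 1.+loss.item()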

__Variables to keep track of the progress of the training__

In [None]:
train_losses = list()
train_counter = list()
test_losses = list()
test_counter = [i*len(train_loader.dataset) for i in range(num_epochs + 1)]

__Function to train the model one epoch at a time__

In [None]:
n_train_samples = len(train_loader.dataset)  # total number of training images
n_batches = len(train_loader)                # number of batches per epoch

def train_one_epoch(epoch_idx):
    # Re-enable dropout: compute_accuracy() leaves the model in eval mode
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        n_data = len(data)
        loss_val = train_model_per_batch(data, target, model, loss_function, optimizer)
        if batch_idx % log_interval  == 0:
            print(f'Train Epoch: {epoch_idx} [{batch_idx*n_data}/{n_train_samples}' 
                  f'({100.*batch_idx/n_batches:.0f}%)]\tLoss: {loss_val:.6f}')
            train_losses.append(loss_val)
            # Number of training examples seen so far
            train_counter.append((batch_idx*batch_size_train) + ((epoch_idx-1)*n_train_samples))

__Function for computing the accuracy of the model__

In [None]:
def compute_accuracy(model, dataloader):
    """
    Compute the average loss and the percentage of correct classifications.
    """

    # Switch to evaluation mode (disables dropout)
    model.eval()

    n_items = len(dataloader.dataset)
    correct = 0
    test_loss = 0.0

    with torch.no_grad(): 
        for data, target in dataloader:
            data, target = data.to(device), target.to(device)
            logits = model(data)
            # sum up batch loss
            test_loss += loss_function(logits, target).item()
            # get the index of the largest output value (the predicted class)
            pred = logits.argmax(dim=1)
            correct += (pred == target).sum().item()
    # average the loss over the batches
    test_loss /= len(dataloader)
    # store the loss with the same constant offset of 1 used for the training loss
    test_losses.append(1.+test_loss)    
    perc = 100. * correct / n_items
    print(f'\nTest set: Average loss: {1.+test_loss:.4f}, Accuracy: {correct}/{n_items} ({perc:.0f}%)\n')
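
The accuracy computation can be illustrated on a handful of made-up predictions. The sketch below uses four hypothetical samples with three classes (not model output): for each row it picks the index of the largest log-probability and compares it with the true label.

```python
import torch

# Made-up log-probabilities for 4 samples and 3 classes
log_probs = torch.log(torch.tensor([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]))
targets = torch.tensor([0, 1, 2, 1])

# Predicted class = index of the largest value in each row
pred = log_probs.argmax(dim=1)

# Count matches and convert to a percentage
correct = (pred == targets).sum().item()
accuracy = 100.0 * correct / len(targets)

print(pred.tolist(), correct, accuracy)
```

Three of the four predictions match their labels, so the accuracy is 75%.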

__Train the model over `num_epochs` epochs and collect statistics__

In [None]:
compute_accuracy(model, test_loader)
for epoch_idx in range(1, num_epochs+1):
    train_one_epoch(epoch_idx)
    compute_accuracy(model, test_loader)  

## <font color="blue">Evaluate the model's performance</font>

In [None]:
fig = plt.figure()
plt.plot(train_counter, train_losses, color='blue')
plt.scatter(test_counter, test_losses, color='red')
plt.legend(['Train Loss', 'Test Loss'], loc='upper right')
plt.xlabel('Number of training examples seen')
plt.ylabel('Loss')

__Quick check of some of the predicted classifications__

In [None]:
examples = enumerate(train_loader)
batch_idx, (example_data, example_targets) = next(examples)

In [None]:
# Disable dropout and move the images to the model's device
model.eval()
with torch.no_grad():
  output = model(example_data.to(device))

In [None]:
fig = plt.figure()
for i in range(6):
  plt.subplot(2,3,i+1)
  plt.tight_layout()
  plt.imshow(example_data[i][0], cmap='gray', interpolation='none')
  plt.title("Prediction: {}".format(
    output.data.max(1, keepdim=True)[1][i].item()))
  plt.xticks([])
  plt.yticks([])