## Multi-class classification problem


In [None]:
# Uncomment the next line to install packages
# pip install torch torchvision matplotlib pandas seaborn requests

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, random_split
import requests
import os


import matplotlib.pyplot as plt

%matplotlib inline

## Load Data

PyTorch provides two powerful data primitives: `torch.utils.data.DataLoader` and `torch.utils.data.Dataset` that allow you to use pre-loaded datasets as well as prepare your own data. `Dataset` stores the samples and their corresponding labels, and `DataLoader` wraps an iterable around the Dataset to enable easy access to the samples.
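As a minimal sketch of these two primitives, the example below defines a hypothetical in-memory dataset, `ToyDigits`, filled with random tensors (it is an illustration only, not part of USPS or torchvision). A custom `Dataset` needs only `__len__` and `__getitem__`; the `DataLoader` then takes care of batching.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDigits(Dataset):
    """Hypothetical dataset of random 1x16x16 'images' with labels in 0..9."""

    def __init__(self, num_samples: int = 20) -> None:
        generator = torch.Generator().manual_seed(0)
        self.images = torch.rand(num_samples, 1, 16, 16, generator=generator)
        self.labels = torch.randint(0, 10, (num_samples,), generator=generator)

    def __len__(self) -> int:
        # Number of samples in the dataset
        return len(self.labels)

    def __getitem__(self, index: int) -> tuple[torch.Tensor, torch.Tensor]:
        # One (image, label) pair
        return self.images[index], self.labels[index]


toy_loader = DataLoader(ToyDigits(), batch_size=8, shuffle=False)
batch_shapes = [(images.shape, labels.shape) for images, labels in toy_loader]
print(batch_shapes)
```

With 20 samples and a batch size of 8, the loader yields three batches, and the last one holds only the 4 remaining samples.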

### USPS Dataset
* Handwritten digits with 10 classes
* 16x16 pixels for each image 
* 6 000 data examples in training set, 1 291 examples in validation set, 2 007 in test set

In [None]:
url = "https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/usps.bz2"
r = requests.get(url, allow_redirects=True)
r.raise_for_status()  # Stop here if the download failed
os.makedirs("USPS/", exist_ok=True)
with open("USPS/usps.bz2", "wb") as f:
    f.write(r.content)

In [None]:
# Loading the USPS dataset from torchvision.datasets
dataset = torchvision.datasets.USPS(
    root="USPS/", train=True, transform=transforms.ToTensor(), download=False
)

In [None]:
# Get info from dataset
dataset

In [None]:
# Obtain the inputs and targets:
inputs = dataset.data
targets = dataset.targets

In [None]:
# Let's look at a data point
sample_index = 88

data_sample = dataset.data[sample_index]
target_sample = dataset.targets[sample_index]
print("Sample type and shape : ", type(data_sample), data_sample.shape)
print("Label type and value : ", type(target_sample), target_sample)

In [None]:
sample_index = 88
plt.imshow(dataset.data[sample_index], cmap=plt.cm.gray_r, interpolation="nearest")
plt.title("image label: %d" % dataset.targets[sample_index])
plt.show()

## PyTorch Tensor

PyTorch documentation: https://pytorch.org/docs/stable/index.html

In [None]:
tensor_data_point = torch.tensor(data_sample)
print("Tensor type :", type(tensor_data_point), ", and shape : ", tensor_data_point.shape)

In [None]:
# Pyplot can manage torch Tensors
plt.imshow(tensor_data_point, cmap=plt.cm.gray_r)
plt.title("Tensor display")
plt.show()

In [None]:
# Split the dataset into training and validation sets
train_set, val_set = random_split(dataset, [6000, 1291])

## Build your Neural Network
The `torch.nn` namespace provides all the building blocks you need to create your own neural network, such as fully connected or convolutional layers. We define our neural network by subclassing `nn.Module`. The layers are initialized in `__init__`, and every `nn.Module` subclass implements the operations on input data in its `forward` method.

Inheritance in Python (https://www.programiz.com/python-programming/inheritance)

In [None]:
class Model(nn.Module):
    """A simple feedforward neural network for multi-class classification.

    This model consists of two fully connected layers with ReLU activation
    for the hidden layer and softmax activation for the output layer.

    Attributes
    ----------
    l1 : nn.Linear
        First linear layer mapping input (256) to hidden layer (100).
    l2 : nn.Linear
        Second linear layer mapping hidden layer (100) to output (10 classes).
    """

    def __init__(self) -> None:
        """Initialize the Model with two linear layers."""
        super(Model, self).__init__()
        # We allocate space for the weights
        self.l1 = nn.Linear(16 * 16, 100)
        self.l2 = nn.Linear(100, 10)
        # Input size is 16*16; the output size must equal the number of classes

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        """Perform forward pass through the network.

        Parameters
        ----------
        inputs : torch.Tensor
            Input tensor of shape (batch_size, 256) representing flattened images.

        Returns
        -------
        torch.Tensor
            Output tensor of shape (batch_size, 10) with class probabilities
            (softmax normalized).
        """
        h = F.relu(
            self.l1(inputs)
        )  # You can use anything here, as long as it is built from PyTorch functions
        outputs = F.softmax(
            self.l2(h), dim=1
        )  # Use softmax as the activation function for the last layer
        return outputs

Description of AutoGrad (https://pytorch.org/docs/stable/notes/autograd.html)
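To illustrate what autograd does, here is a small sketch with a single hypothetical parameter `w` and a hand-picked loss (both are assumptions made for this example, not part of the model above). PyTorch records the operations of the forward pass, and `backward()` fills `w.grad` with the derivative of the loss.

```python
import torch

# A single trainable scalar, tracked by autograd
w = torch.tensor(3.0, requires_grad=True)

# Forward pass: toy_loss = (2w - 1)^2
toy_loss = (2 * w - 1) ** 2

# Backward pass: d(toy_loss)/dw = 4 * (2w - 1) = 20 at w = 3
toy_loss.backward()
print(w.grad)  # tensor(20.)
```

The same mechanism computes the gradient of the training loss with respect to every weight of `l1` and `l2` when `loss.backward()` is called.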

### Instantiation and forward call

In [None]:
# We initialize the Model class
my_model = Model()  # it calls the init method
print(" - What is the type of my_model ?", type(my_model))
print("=" * 50)
print(" - Description of the internal of the Network :", my_model)
print("=" * 50)
print(
    " - Content of the first Layer :",
    my_model.l1.weight,
)

In [None]:
# Shape of the Linear Layer
print("A :", my_model.l1.weight.shape, " b : ", my_model.l1.bias.shape)

In [None]:
# Let's explore the forward pass
example_batch_size = 3
example_loader = DataLoader(dataset, batch_size=example_batch_size, shuffle=True)

for images, labels in example_loader:
    print("Original tensor shape", images.shape)
    print("=" * 50)
    print("Impact of the view method", images.view(example_batch_size, -1).shape)
    print("=" * 50)
    example_output = my_model(images.view(example_batch_size, -1))
    print("Shape of the output", example_output.shape)
    print("=" * 50)
    print("Predictions for the first image :", example_output[0].detach())
    print("=" * 50)
    print(
        "Sum of all outputs : ", torch.sum(example_output[0]).detach()
    )  # detach() removes the autograd history from the printed tensor
    break

In [None]:
# Example of One Hot Encoding

labels_one_hot = torch.zeros(example_batch_size, 10)
print("Original Labels : ", labels.detach())
print("=" * 50)
print("One Hot encoding :", labels_one_hot.scatter_(1, labels.view(-1, 1), 1).detach())

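As a possible alternative to the `scatter_` approach, PyTorch also ships `torch.nn.functional.one_hot`. The sketch below compares the two on a few hypothetical labels (`example_targets` is an illustrative name, not a variable from this notebook).

```python
import torch
import torch.nn.functional as F

example_targets = torch.tensor([3, 0, 9])  # hypothetical integer class labels

# Approach used above: scatter ones into a matrix of zeros
one_hot_scatter = torch.zeros(len(example_targets), 10)
one_hot_scatter.scatter_(1, example_targets.view(-1, 1), 1)

# Built-in helper: it returns integers, so cast to float before using MSE
one_hot_builtin = F.one_hot(example_targets, num_classes=10).float()

print(torch.equal(one_hot_scatter, one_hot_builtin))
```

Passing `num_classes=10` explicitly matters: without it, the width of the result is inferred from the largest label present in the batch.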
## Train your Model

In [None]:
# Create the model:
model = Model()

# Choose the hyperparameters for training:
num_epochs = 10
batch_size = 10

# Use the mean squared error (MSE) loss function
criterion = nn.MSELoss()

# Use the SGD optimizer with a learning rate of 0.1
# It is initialized on the parameters of our model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

In [None]:
from torch.utils.data import Dataset


# define a function for training
def train(
    num_epochs: int,
    batch_size: int,
    criterion: nn.Module,
    optimizer: torch.optim.Optimizer,
    model: nn.Module,
    dataset: Dataset,
) -> list[float]:
    """Train a neural network model on a given dataset.

    Parameters
    ----------
    num_epochs : int
        Number of complete passes through the training dataset.
    batch_size : int
        Number of samples per gradient update.
    criterion : nn.Module
        Loss function to optimize (e.g., nn.MSELoss, nn.CrossEntropyLoss).
    optimizer : torch.optim.Optimizer
        Optimization algorithm (e.g., SGD, Adam).
    model : nn.Module
        Neural network model to train.
    dataset : Dataset
        Training dataset containing (image, label) pairs.

    Returns
    -------
    list[float]
        List of average training losses for each epoch.

    Notes
    -----
    This function uses one-hot encoding for labels and MSE loss.
    For CrossEntropyLoss, the one-hot encoding should be removed.
    """
    train_error: list[float] = []
    train_loader = DataLoader(dataset, batch_size, shuffle=True)
    model.train()  # Indicates to the network we are in training mode
    for epoch in range(num_epochs):
        epoch_average_loss: float = 0.0
        for images, labels in train_loader:
            # Reshape the inputs from [N, 1, img_shape, img_shape] to [N, img_shape*img_shape].
            # images.size(0) is used instead of batch_size because the last batch may be smaller.
            n_samples = images.size(0)
            y_pre = model(images.view(n_samples, -1))

            # One-hot encoding of the labels so as to calculate the MSE error:
            labels_one_hot = torch.zeros(n_samples, 10)
            labels_one_hot.scatter_(1, labels.view(-1, 1), 1)

            loss = criterion(y_pre, labels_one_hot)  # Real number
            optimizer.zero_grad()  # Set the gradients of all parameters to 0
            loss.backward()  # Computes dloss/da for every parameter a which has requires_grad=True
            optimizer.step()  # Updates the weights
            epoch_average_loss += loss.item() * n_samples / len(dataset)
        train_error.append(epoch_average_loss)
        print(f"Epoch [{epoch + 1}/{num_epochs}], Loss: {epoch_average_loss:.4f}")
    return train_error

In [None]:
train_error = train(num_epochs, batch_size, criterion, optimizer, model, train_set)

In [None]:
# plot the training error wrt. the number of epochs:
plt.plot(range(1, num_epochs + 1), train_error)
plt.xlabel("num_epochs")
plt.ylabel("Train error")
plt.title("Visualization of convergence")

### Evaluate the Model on validation set

In [None]:
# Calculate the accuracy to evaluate the model
@torch.no_grad()
def accuracy(dataset: Dataset, model: nn.Module) -> None:
    """Compute and print the classification accuracy of a model on a dataset.

    Parameters
    ----------
    dataset : Dataset
        Dataset containing (image, label) pairs to evaluate.
    model : nn.Module
        Trained neural network model to evaluate.

    Returns
    -------
    None
        Prints the accuracy percentage to stdout.

    Notes
    -----
    This function sets the model to evaluation mode and uses no gradient
    computation for efficiency. Images are expected to be 16x16 pixels
    and will be flattened to 256 features.
    """
    model.eval()  # Set the model to evaluation mode
    correct: int = 0
    dataloader = DataLoader(dataset)
    for images, labels in dataloader:
        images = images.view(-1, 16 * 16)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)  # Index of the highest score = predicted class
        correct += int((predicted == labels).sum().item())  # Keep the counter a plain int

    print(f"Accuracy of the model : {100 * correct / len(dataset):.2f} %")

In [None]:
accuracy(val_set, model)

In [None]:
val_index = 66

(image, label) = val_set[val_index]
output = model(image.view(-1, 16 * 16))
_, prediction = torch.max(output.data, 1)

plt.imshow(image.view(16, 16), cmap=plt.cm.gray_r, interpolation="nearest")
plt.title("Prediction label: %d" % prediction)

## ⚠️ Note on the use of Softmax in the last layer

Applying a softmax in the last layer of a multi-class classification network is typically not what is done in PyTorch. Instead, the model outputs raw scores (logits), and the transformation into probabilities happens inside the loss function `nn.CrossEntropyLoss`, which combines `nn.LogSoftmax` (to compute the log-probabilities) and `nn.NLLLoss` (negative log-likelihood loss).
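The relationship between these three modules can be checked numerically. The sketch below uses random scores for a few hypothetical samples (`example_logits` and `example_classes` are made up for the illustration) and compares the one-step loss with the two-step computation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
example_logits = torch.randn(4, 10)  # raw scores for 4 hypothetical samples
example_classes = torch.tensor([1, 0, 7, 9])  # integer class indices

# One step: cross entropy applied directly to the raw scores
loss_combined = nn.CrossEntropyLoss()(example_logits, example_classes)

# Two steps: log-softmax first, then negative log-likelihood
log_probabilities = F.log_softmax(example_logits, dim=1)
loss_two_steps = nn.NLLLoss()(log_probabilities, example_classes)

print(torch.allclose(loss_combined, loss_two_steps))
```

This is why a model trained with `nn.CrossEntropyLoss` should return the output of its last linear layer unchanged: adding a softmax in `forward` would normalize the scores twice.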

### Exercise 1: Impact of the architecture of the model
Define your own class `Model` to improve the predictions:

* Convolutional layers are often a good choice for images. Replace `nn.Linear` with [nn.Conv2d](https://pytorch.org/docs/stable/nn.html#conv2d).
* Try to add more layers (1, 2, 3, more ?)
* Change the number of neurons in hidden layers (5, 10, 20, more ?)
* Try different activation functions such as [sigmoid](https://pytorch.org/docs/stable/nn.functional.html#torch.nn.functional.sigmoid), [tanh](https://pytorch.org/docs/stable/nn.functional.html#torch.nn.functional.tanh), [relu](https://pytorch.org/docs/stable/nn.functional.html#torch.nn.functional.relu), etc.
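As one possible starting point for the convolutional variant, here is a hedged sketch of a small network for 1x16x16 inputs. The class name `ConvModel`, the channel counts, and the kernel sizes are arbitrary choices made for illustration, not a recommended architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvModel(nn.Module):
    """Hypothetical small CNN for 1x16x16 images and 10 classes."""

    def __init__(self) -> None:
        super().__init__()
        self.conv1 = nn.Conv2d(1, 8, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(8, 16, kernel_size=3, padding=1)
        self.fc = nn.Linear(16 * 4 * 4, 10)

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        h = F.max_pool2d(F.relu(self.conv1(inputs)), 2)  # (N, 8, 8, 8)
        h = F.max_pool2d(F.relu(self.conv2(h)), 2)  # (N, 16, 4, 4)
        return self.fc(h.flatten(start_dim=1))  # (N, 10) raw scores


conv_outputs = ConvModel()(torch.zeros(5, 1, 16, 16))
print(conv_outputs.shape)
```

Two things would need adapting before plugging such a model into `train` and `accuracy`: the images must keep their `[N, 1, 16, 16]` shape instead of being flattened with `view`, and, as long as the MSE loss is used, a softmax would have to be applied to these raw scores.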

### Exercise 2: Impact of the optimizer
Retrain the model by using different parameters of the optimizer; you can change its parameters in the cell initializing it, after the definition of your model.

* Use different batch sizes, from 10 to 1 000 for instance
* Try different values of the learning rate (between 0.001 and 10), and see how these impact the training process. Do all network architectures react the same way to different learning rates?
* Change the duration of the training by increasing the number of epochs
* Try other optimizers, such as [Adam](https://pytorch.org/docs/stable/optim.html?highlight=adam#torch.optim.Adam) or [RMSprop](https://pytorch.org/docs/stable/optim.html?highlight=rmsprop#torch.optim.RMSprop)
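Swapping the optimizer only changes the line that constructs it. The sketch below runs a single update step with three optimizers on a hypothetical linear model and random data; the learning rates are illustrative guesses, not tuned values, and all `toy_` names are invented for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
toy_inputs = torch.randn(6, 256)
toy_targets = torch.randn(6, 10)
toy_criterion = nn.MSELoss()

# Each entry builds an optimizer from an iterable of parameters
candidate_optimizers = {
    "SGD": lambda params: torch.optim.SGD(params, lr=0.1),
    "Adam": lambda params: torch.optim.Adam(params, lr=0.001),
    "RMSprop": lambda params: torch.optim.RMSprop(params, lr=0.001),
}

losses_after_one_step = {}
for name, make_optimizer in candidate_optimizers.items():
    torch.manual_seed(1)  # same initial weights for every optimizer
    toy_model = nn.Linear(256, 10)
    toy_optimizer = make_optimizer(toy_model.parameters())

    toy_loss = toy_criterion(toy_model(toy_inputs), toy_targets)
    toy_optimizer.zero_grad()
    toy_loss.backward()
    toy_optimizer.step()

    # Loss on the same data after one update
    losses_after_one_step[name] = toy_criterion(toy_model(toy_inputs), toy_targets).item()

print(losses_after_one_step)
```

In the notebook itself, the equivalent change is to replace `torch.optim.SGD(model.parameters(), lr=0.1)` in the cell that defines `optimizer`, then recreate the model and retrain.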

### Exercise 3: Impact of the loss function
MSE is rarely used for classification. The cross-entropy loss is usually a better choice for multi-class classification problems. In PyTorch, it is provided by [nn.CrossEntropyLoss](https://pytorch.org/docs/stable/nn.html#crossentropyloss). Replace the MSE loss with it and observe the impact.

**Note:** To use `nn.CrossEntropyLoss` correctly, do not apply an activation function to the last layer of your network. One-hot encoding is also no longer needed to compute the loss, so delete the encoding steps in the `train` function.
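A single training step under these two rules might look like the sketch below. It assumes a hypothetical model built with `nn.Sequential` that returns raw scores, and it uses random tensors as stand-ins for one batch of USPS images and labels.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same layer sizes as Model above, but without softmax on the output
logit_model = nn.Sequential(nn.Linear(16 * 16, 100), nn.ReLU(), nn.Linear(100, 10))
ce_criterion = nn.CrossEntropyLoss()
ce_optimizer = torch.optim.SGD(logit_model.parameters(), lr=0.1)

fake_images = torch.rand(10, 1, 16, 16)  # stand-in for one batch of images
fake_labels = torch.randint(0, 10, (10,))  # integer class indices, no one-hot encoding

ce_logits = logit_model(fake_images.view(fake_images.size(0), -1))
ce_loss = ce_criterion(ce_logits, fake_labels)  # the labels are passed as they are

ce_optimizer.zero_grad()
ce_loss.backward()
ce_optimizer.step()
print(ce_logits.shape, ce_loss.item())
```

The `accuracy` function needs no change for such a model, because taking the index of the largest raw score gives the same prediction as taking the index of the largest probability.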

### Exercise 4: Prediction on test set

Once you have a model that performs well on the validation set, you should evaluate it on a test set that has never been used before, to obtain a final accuracy value.

In [None]:
url = "http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/usps.t.bz2"
r = requests.get(url, allow_redirects=True)
r.raise_for_status()  # Stop here if the download failed
with open("USPS/usps.t.bz2", "wb") as f:
    f.write(r.content)

In [None]:
# Loading the USPS test set from torchvision.datasets
test_set = torchvision.datasets.USPS(
    root="USPS/", train=False, transform=transforms.ToTensor(), download=False
)

In [None]:
accuracy(test_set, model)