# Multiclass Classification on MNIST 🔢

---

This notebook applies the PyTorch framework end to end to the **MNIST handwritten digit classification** problem. It combines the data pipelines of Chapter 12 with the model definition and training mechanics of **Chapter 13: Going Deeper**.

### 1. High-Performance Data Pipeline with `torchvision`

The notebook sets up a robust data handling system tailored for image data:

* **Dataset Loading:** Uses `torchvision.datasets.MNIST` to load the 60,000-image training set and the 10,000-image test set from `root`. Passing `download=True` makes torchvision fetch the files first if they are not already there.
* **Preprocessing:** **`transforms.ToTensor()`** is the only transform applied to the images. It performs three steps:
    1.  Converts the input PIL image (or NumPy array) into a PyTorch **Tensor**.
    2.  Reorders the axes from height × width × channels to channels × height × width, so each grayscale MNIST image becomes a tensor of shape `(1, 28, 28)`.
    3.  Scales the pixel values from the integer range 0 to 255 to the floating-point range **0.0 to 1.0**.
* **Batching:** **`torch.utils.data.DataLoader`** is created with `batch_size=64` and `shuffle=True`. It reshuffles the training data at the start of every epoch and yields one mini-batch at a time, so the model never processes the whole dataset at once and its parameters are updated many times per epoch (938 times for 60,000 images).

### 2. Multilayer Perceptron (MLP) Architecture

The model is defined as a **Multilayer Perceptron** designed for classification:

* **Input Layer:** MNIST images are $28 \times 28$ grayscale pixels. An `nn.Flatten` layer turns each `(1, 28, 28)` image tensor into a vector of **784** features ($28 \times 28$), which the first linear layer accepts.
* **Hidden Layers:** Two **`nn.Linear`** hidden layers (32 and 16 units by default), each followed by an **`nn.ReLU`** non-linearity. The non-linearities let the network learn mappings from raw pixel values to class scores that a purely linear model could not represent.
* **Output Layer:** The final linear layer has **10 output units**, corresponding to the 10 possible digits (0 through 9). No explicit Softmax layer is needed here because the chosen loss function handles it internally.
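The following sketch rebuilds the same 784 → 32 → 16 → 10 stack with `nn.Sequential` (a swapped-in container; the notebook itself uses `nn.ModuleList`) and checks the shapes on a random batch standing in for real images. It also shows that the 10 outputs are unnormalized logits, which only become probabilities after a softmax.

```python
import torch
from torch import nn

# Same layer stack as the notebook's MLP_MNIST, expressed with nn.Sequential
model = nn.Sequential(
    nn.Flatten(),            # (N, 1, 28, 28) -> (N, 784)
    nn.Linear(28 * 28, 32),
    nn.ReLU(),
    nn.Linear(32, 16),
    nn.ReLU(),
    nn.Linear(16, 10),       # 10 raw scores (logits), one per digit
)

x = torch.rand(64, 1, 28, 28)  # random stand-in for one mini-batch
flat = nn.Flatten()(x)
logits = model(x)
print(flat.shape)    # torch.Size([64, 784])
print(logits.shape)  # torch.Size([64, 10])

# Logits are unnormalized; softmax turns each row into probabilities summing to 1
probs = torch.softmax(logits, dim=1)
print(torch.allclose(probs.sum(dim=1), torch.ones(64)))  # True
```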

### 3. End-to-End Training and Gradient Flow

The core of the notebook implements the iterative optimization process:

* **Loss Function:** **`nn.CrossEntropyLoss`**, the standard choice for multiclass problems. It takes the **raw model outputs (logits)** and internally combines a **log-softmax** with the **negative log-likelihood loss** (`nn.NLLLoss`) in one step, which is more numerically stable than applying Softmax and a logarithm separately.
* **Training Loop Mechanics:**
    1.  **Forward Pass:** `pred = model(x_batch)`.
    2.  **Loss Calculation:** `loss = loss_fn(pred, y_batch)`.
    3.  **Backward Pass (Autograd):** `loss.backward()` triggers the **Autograd engine** to calculate the gradient of the loss with respect to *every* parameter in the network.
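    4.  **Parameter Update:** `optimizer.step()` updates all weights and biases using the gradients just computed. The optimizer here is `optim.Adam` with a learning rate of 0.001, which adapts the step size for each parameter instead of moving along the plain steepest-descent direction.
    5.  **Gradient Reset:** `optimizer.zero_grad()` is called at the end of each iteration because `backward()` adds to the existing `.grad` values rather than overwriting them; without the reset, gradients from previous batches would accumulate.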
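* **Accuracy Calculation:** Training accuracy is tracked by using `torch.argmax(pred, dim=1)` to find the predicted class and comparing it to the true label (`y_batch`). Because `pred` is computed before each `optimizer.step()`, the per-epoch figure reflects the model as it changed during the epoch rather than its final state.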
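Two claims in the list above can be checked directly in a few lines: that `nn.CrossEntropyLoss` on logits equals the negative log-likelihood of log-softmax outputs, and that `backward()` accumulates gradients until they are cleared. The logits, targets, and the scalar `w` below are arbitrary illustrative values, not taken from the notebook.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 10)           # illustrative raw outputs for a batch of 4
targets = torch.tensor([3, 0, 9, 1])  # illustrative true class indices

# CrossEntropyLoss on logits equals NLL loss on log-softmax outputs
ce = nn.CrossEntropyLoss()(logits, targets)
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(ce, manual))  # True

# Accuracy as in the training loop: argmax over the class dimension
acc = (torch.argmax(logits, dim=1) == targets).float().mean()
print(f'Batch accuracy: {acc:.2f}')

# backward() adds into .grad rather than overwriting it
w = torch.tensor(2.0, requires_grad=True)
(3 * w).backward()
first = w.grad.item()        # d(3w)/dw = 3.0
(3 * w).backward()
accumulated = w.grad.item()  # 3.0 + 3.0 = 6.0
w.grad.zero_()               # clear it (optimizer.zero_grad() does this for every parameter)
print(first, accumulated, w.grad.item())  # 3.0 6.0 0.0
```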

### 4. Final Evaluation and Generalization

* **Test Set Evaluation:** After training is complete, the model's performance is measured on the separate **Test Set** (10,000 never-before-seen images).
* **Result:** The final printed **Test Accuracy** (0.9695) measures how well the model **generalizes** to new data. It is about one percentage point below the last-epoch training accuracy (0.9804), a small gap that indicates only mild overfitting.
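Evaluating all 10,000 test images in one forward pass works for this small model, but a batched loop scales to larger models and keeps the dataset's `transform` in the path. Below is a minimal sketch of such a loop. `evaluate_accuracy` is a hypothetical helper, and the toy data (one-hot inputs passed through `nn.Identity`) is an assumption chosen only so the expected result is obvious.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate_accuracy(model, loader):
    """Hypothetical helper: fraction of correctly classified samples in `loader`."""
    model.eval()                      # disable training-only behavior; leaves the model in eval mode
    correct, total = 0, 0
    with torch.no_grad():             # no computation graph is needed for evaluation
        for x_batch, y_batch in loader:
            pred = model(x_batch)
            correct += (torch.argmax(pred, dim=1) == y_batch).sum().item()
            total += y_batch.size(0)
    return correct / total

# Toy check: each input is the one-hot encoding of its label,
# so an identity "model" predicts every label correctly.
labels = torch.randint(0, 10, (100,))
inputs = nn.functional.one_hot(labels, num_classes=10).float()
loader = DataLoader(TensorDataset(inputs, labels), batch_size=32)
acc = evaluate_accuracy(nn.Identity(), loader)
print(acc)  # 1.0
```

Applied to this notebook, the call would be `evaluate_accuracy(model, DataLoader(mnist_dataset_test, batch_size=64))`. Since the network has no dropout or batch-norm layers, the result should agree with the single-pass computation.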

In [21]:
import torch
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
from torchvision import transforms
from torch import nn, optim
import matplotlib.pyplot as plt

In [2]:
transform = transforms.Compose([
    transforms.ToTensor()
])

In [10]:
image_path= './'
# download=True fetches the files into image_path only if they are not already there
mnist_dataset_train = MNIST(root= image_path, train= True, transform= transform, download= True)
mnist_dataset_test = MNIST(root= image_path, train= False, transform= transform, download= True)
torch.manual_seed(1)
batch_size = 64
train_dl = DataLoader(mnist_dataset_train, batch_size, shuffle= True)

In [38]:
class MLP_MNIST(nn.Module):
    def __init__(self, image_size, hidden_units= (32, 16)):
        super().__init__()
        # Number of input features: channels * height * width (1 * 28 * 28 = 784)
        input_size = image_size[0] * image_size[1] * image_size[2]
        layers = [nn.Flatten()]
        # One Linear + ReLU block per hidden layer
        for hidden_unit in hidden_units:
            layers += [nn.Linear(input_size, hidden_unit), nn.ReLU()]
            input_size = hidden_unit
        # Output layer: 10 logits, one per digit class
        layers.append(nn.Linear(hidden_units[-1], 10))
        self.module_list = nn.ModuleList(layers)

    def forward(self, x):
        # Apply the layers in order
        for l in self.module_list:
            x = l(x)
        return x

In [39]:
image_size = mnist_dataset_train[0][0].shape
model = MLP_MNIST(image_size)
model

MLP_MNIST(
  (module_list): ModuleList(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=784, out_features=32, bias=True)
    (2): ReLU()
    (3): Linear(in_features=32, out_features=16, bias=True)
    (4): ReLU()
    (5): Linear(in_features=16, out_features=10, bias=True)
  )
)
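The printed layer sizes determine the number of trainable parameters. Each `nn.Linear(in_features, out_features)` stores an `out_features × in_features` weight matrix plus `out_features` biases, so:

$$
\begin{aligned}
\text{Linear}(784, 32) &: 784 \times 32 + 32 = 25{,}120 \\
\text{Linear}(32, 16) &: 32 \times 16 + 16 = 528 \\
\text{Linear}(16, 10) &: 16 \times 10 + 10 = 170 \\
\text{total} &= 25{,}120 + 528 + 170 = 25{,}818
\end{aligned}
$$

The same total can be read off in code with `sum(p.numel() for p in model.parameters())`.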

In [40]:
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr= 0.001)

In [41]:
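torch.manual_seed(1)
epochs = 20
for epoch in range(epochs):
    model.train()
    acc_hist = 0  # running count of correct training predictions this epoch
    for x_batch, y_batch in train_dl:
        pred = model(x_batch)          # forward pass: logits of shape (batch, 10)
        loss = loss_fn(pred, y_batch)
        loss.backward()                # compute gradients
        optimizer.step()               # update parameters
        optimizer.zero_grad()          # clear gradients before the next batch
        is_correct = (torch.argmax(pred, dim= 1) == y_batch).float()
        acc_hist += is_correct.sum()
    acc_hist /= len(train_dl.dataset)  # convert the count to a fraction of the 60,000 samples
    print(f'Epoch {epoch}, Accuracy: {acc_hist:.4f}')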

Epoch 0, Accuracy: 0.8592
Epoch 1, Accuracy: 0.9288
Epoch 2, Accuracy: 0.9417
Epoch 3, Accuracy: 0.9506
Epoch 4, Accuracy: 0.9556
Epoch 5, Accuracy: 0.9599
Epoch 6, Accuracy: 0.9631
Epoch 7, Accuracy: 0.9656
Epoch 8, Accuracy: 0.9677
Epoch 9, Accuracy: 0.9693
Epoch 10, Accuracy: 0.9712
Epoch 11, Accuracy: 0.9724
Epoch 12, Accuracy: 0.9745
Epoch 13, Accuracy: 0.9752
Epoch 14, Accuracy: 0.9759
Epoch 15, Accuracy: 0.9769
Epoch 16, Accuracy: 0.9778
Epoch 17, Accuracy: 0.9787
Epoch 18, Accuracy: 0.9803
Epoch 19, Accuracy: 0.9804


In [44]:
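model.eval()
with torch.no_grad():
    # .data holds the raw uint8 images (10000, 28, 28); dividing by 255 reproduces ToTensor's scaling
    pred = model(mnist_dataset_test.data / 255.0)
is_correct = (torch.argmax(pred, dim= 1) == mnist_dataset_test.targets).float()
print(f'Test Accuracy: {is_correct.mean():.4f}')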

Test Accuracy: 0.9695
