# 🧠 MNIST Digit Classifier with PyTorch

This notebook trains a simple neural network to classify handwritten digits (0–9) from the MNIST dataset using PyTorch.


In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms

## 📦 Setup & Hyperparameters

We define:
- `input_size = 28 * 28 = 784` (MNIST images are 28×28 pixels, flattened into a single vector)
- `hidden_size = 64` (number of neurons in the hidden layer)
- `num_classes = 10` (digits from 0 to 9)
- `num_epochs`, `batch_size`, and `learning_rate` for training

In [2]:
input_size = 28 * 28
hidden_size = 64
num_classes = 10
num_epochs = 5
batch_size = 64
learning_rate = 0.001
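As a quick sanity check on these values, the number of optimizer steps per epoch follows directly from the dataset size and `batch_size`. The sketch below assumes the standard MNIST split of 60,000 training and 10,000 test images; the names `num_train`, `num_test`, `train_steps`, and `test_steps` are illustrative and not part of the notebook.

```python
import math

# Value from the hyperparameter cell above.
batch_size = 64

# Assumption: the standard MNIST split sizes.
num_train, num_test = 60_000, 10_000

# The last batch may be smaller than batch_size, hence the ceiling.
train_steps = math.ceil(num_train / batch_size)
test_steps = math.ceil(num_test / batch_size)

print(train_steps)  # 938
print(test_steps)   # 157
```

The 938 training steps match the `Step [.../938]` counter printed during training.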

## 🧹 Load and Transform the MNIST Dataset

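We use `torchvision.datasets.MNIST` to load the training and test sets. `transforms.ToTensor()` converts each image to a float tensor of shape 1×28×28 with pixel values scaled to [0, 1]. Each `DataLoader` then serves its dataset in batches; only the training set is shuffled.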

In [3]:
train_dataset = datasets.MNIST(
    root='./data',
    train=True,
    transform=transforms.ToTensor(),
    download=True)

test_dataset = datasets.MNIST(
    root='./data',
    train=False,
    transform=transforms.ToTensor())

train_loader = torch.utils.data.DataLoader(
    dataset=train_dataset,
    batch_size=batch_size,
    shuffle=True)

test_loader = torch.utils.data.DataLoader(
    dataset=test_dataset,
    batch_size=batch_size,
    shuffle=False)
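To see what a `DataLoader` yields without downloading anything, here is a minimal sketch that uses random tensors in place of MNIST. The names `fake_images`, `fake_labels`, `fake_dataset`, and `fake_loader` are hypothetical stand-ins; only the shapes mirror the real data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for MNIST: 200 random "images" with the shape
# ToTensor() produces (1 channel, 28x28, values in [0, 1]).
fake_images = torch.rand(200, 1, 28, 28)
fake_labels = torch.randint(0, 10, (200,))
fake_dataset = TensorDataset(fake_images, fake_labels)

fake_loader = DataLoader(fake_dataset, batch_size=64, shuffle=True)

batch_shapes = [tuple(images.shape) for images, _ in fake_loader]
print(len(fake_loader))  # 4 batches: 64 + 64 + 64 + 8
print(batch_shapes[-1])  # (8, 1, 28, 28)

# Flatten one batch the same way the training loop does.
images, labels = next(iter(fake_loader))
flat = images.view(-1, 28 * 28)
print(flat.shape)  # torch.Size([64, 784])
```

The final batch is smaller because 200 is not a multiple of 64. The same happens with the real training set.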

## 🧱 PyTorch Data Pipeline: Dataset vs DataLoader

Whenever we work with a dataset (e.g. MNIST, CIFAR10, etc.), we typically follow this **standard PyTorch pattern**:

---

### ✅ Step 1: Load Dataset

```python
from torchvision import datasets, transforms

# Training dataset
train_dataset = datasets.MNIST(
    root='./data',
    train=True,
    transform=transforms.ToTensor(),
    download=True
)

# Test dataset
test_dataset = datasets.MNIST(
    root='./data',
    train=False,
    transform=transforms.ToTensor()
)
```

### ✅ Step 2: Create DataLoaders

```python
from torch.utils.data import DataLoader

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)
```

***We almost always go:***

```text
Dataset ➜ DataLoader ➜ Training Loop
```

## 🧱 Neural Network Architecture

A simple feedforward neural network with:
- One hidden layer with ReLU activation
- Output layer with 10 neurons (for digits 0–9)
- No softmax layer: the network returns raw scores (logits), and `CrossEntropyLoss` applies log-softmax internally

In [4]:
class NeuralNet(nn.Module):

    def __init__(self, input_size, hidden_size, num_classes):
        super(NeuralNet, self).__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out = self.linear1(x)
        out = self.relu(out)
        out = self.linear2(out)
        return out

### 📌 Understanding the Code: `NeuralNet` Class

- **`class NeuralNet(nn.Module)`**  
  Defines a neural network class by inheriting from `torch.nn.Module`.

- **`__init__()` constructor**
  - `linear1`: Fully connected layer from input → hidden layer.
  - `relu`: ReLU activation, which introduces the non-linearity that lets the model learn more than a linear mapping.
  - `linear2`: Fully connected layer from hidden layer → output layer (final class scores).

- **`forward()` method**  
  This defines how data flows through the network:
  - Input is passed to `linear1`
  - Output goes through the ReLU activation
  - Result is passed to `linear2` to produce the final output
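The data flow described above can be traced by checking tensor shapes at each stage. This is a minimal sketch, not the notebook's model: it rebuilds the same three layers with `nn.Sequential` (the name `sketch` is hypothetical) and feeds in a made-up batch of four flattened images.

```python
import torch
import torch.nn as nn

# Same layer sizes as NeuralNet, expressed with nn.Sequential for brevity.
sketch = nn.Sequential(
    nn.Linear(784, 64),  # input -> hidden
    nn.ReLU(),
    nn.Linear(64, 10),   # hidden -> class scores
)

x = torch.rand(4, 784)            # made-up batch of 4 flattened images
hidden = sketch[1](sketch[0](x))  # output after linear1 + ReLU
logits = sketch(x)                # full forward pass

print(hidden.shape)  # torch.Size([4, 64])
print(logits.shape)  # torch.Size([4, 10])

# Each nn.Linear holds in*out weights plus out biases:
# (784*64 + 64) + (64*10 + 10) = 50890
n_params = sum(p.numel() for p in sketch.parameters())
print(n_params)  # 50890
```

ReLU clamps negative values to zero, so every entry of `hidden` is non-negative, while `logits` can take either sign.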

In [5]:
model = NeuralNet(input_size, hidden_size, num_classes)

**`model = NeuralNet(...)`**  
  Instantiates the model with the given architecture:
  - `input_size`: Size of the input features (e.g., 784 for 28x28 MNIST images)
  - `hidden_size`: Number of hidden neurons (e.g., 64)
  - `num_classes`: Total number of output classes (e.g., 10 for digits 0–9)

## 🎯 Loss Function & Optimizer

We use:
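- `CrossEntropyLoss` (combines `LogSoftmax` and `NLLLoss`, so it expects raw logits rather than probabilities)
- `Adam`, an adaptive-learning-rate optimizer that usually converges faster than plain SGD on problems like this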

In [6]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
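The claim that `CrossEntropyLoss` combines `LogSoftmax` and `NLLLoss` can be checked numerically. The sketch below uses made-up logits and labels (not output from the model above) and compares the one-step loss against the manual two-step version.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 10)              # made-up scores: 5 samples, 10 classes
targets = torch.tensor([3, 0, 9, 1, 7])  # made-up class labels

# One step: CrossEntropyLoss applied directly to raw logits.
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# Two steps: log-softmax over the class dimension, then negative log-likelihood.
log_probs = F.log_softmax(logits, dim=1)
loss_manual = F.nll_loss(log_probs, targets)

print(torch.allclose(loss_ce, loss_manual))  # True
```

This is why the model's `forward()` returns logits directly: adding a softmax layer before `CrossEntropyLoss` would apply the normalization twice.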

## 🏋️ Training the Model

We loop through the data multiple times (epochs), performing:
- Forward pass
- Loss computation
- Backward pass
- Optimization

In [7]:
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.view(-1, 28*28)  # Flatten

        outputs = model(images)
        loss = criterion(outputs, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i+1) % 100 == 0:
            print(f'Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{len(train_loader)}], Loss: {loss.item():.4f}')

Epoch [1/5], Step [100/938], Loss: 0.4399
Epoch [1/5], Step [200/938], Loss: 0.4252
Epoch [1/5], Step [300/938], Loss: 0.3026
Epoch [1/5], Step [400/938], Loss: 0.3713
Epoch [1/5], Step [500/938], Loss: 0.1903
Epoch [1/5], Step [600/938], Loss: 0.3173
Epoch [1/5], Step [700/938], Loss: 0.1238
Epoch [1/5], Step [800/938], Loss: 0.1656
Epoch [1/5], Step [900/938], Loss: 0.2141
Epoch [2/5], Step [100/938], Loss: 0.1938
Epoch [2/5], Step [200/938], Loss: 0.0804
Epoch [2/5], Step [300/938], Loss: 0.0514
Epoch [2/5], Step [400/938], Loss: 0.3999
Epoch [2/5], Step [500/938], Loss: 0.1085
Epoch [2/5], Step [600/938], Loss: 0.1009
Epoch [2/5], Step [700/938], Loss: 0.1515
Epoch [2/5], Step [800/938], Loss: 0.2240
Epoch [2/5], Step [900/938], Loss: 0.2875
Epoch [3/5], Step [100/938], Loss: 0.1774
Epoch [3/5], Step [200/938], Loss: 0.0853
Epoch [3/5], Step [300/938], Loss: 0.0574
Epoch [3/5], Step [400/938], Loss: 0.1598
Epoch [3/5], Step [500/938], Loss: 0.1482
Epoch [3/5], Step [600/938], Loss:

### Training Loop Explanation

- **Outer loop (`for epoch in range(num_epochs)`):** Runs the training process multiple times over the whole dataset.
- **Inner loop (`for i, (images, labels) in enumerate(train_loader)`):** Goes through the dataset batch by batch.
- **Flatten images (`images.view(-1, 28*28)`):** Converts each 28x28 image into a 1D vector of size 784 so the model can process it.
- **Forward pass (`outputs = model(images)`):** Sends input images into the model to get predictions.
- **Loss calculation (`loss = criterion(outputs, labels)`):** Measures how far predictions are from the true labels.
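- **Zero gradients (`optimizer.zero_grad()`):** Clears the gradients from the previous step. PyTorch accumulates gradients by default, so skipping this would mix gradients from different batches.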
- **Backward pass (`loss.backward()`):** Computes gradients (how to adjust weights).
- **Update weights (`optimizer.step()`):** Updates model parameters using gradients and optimizer.
- **Progress print:** Every 100 steps, prints current epoch, step, and loss to track training progress.
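To illustrate why the loop calls `optimizer.zero_grad()` on every step, here is a small sketch on a toy linear layer with made-up data. The names `layer`, `toy_optimizer`, and `toy_criterion` are hypothetical, and `MSELoss` is used only to keep the example short.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 1)
toy_optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)
toy_criterion = nn.MSELoss()
x = torch.rand(8, 3)  # made-up inputs
y = torch.rand(8, 1)  # made-up targets

# First backward pass: gradients are stored in .grad.
toy_criterion(layer(x), y).backward()
grad_once = layer.weight.grad.clone()

# Second backward pass without zero_grad(): gradients are added, not replaced.
toy_criterion(layer(x), y).backward()
grad_twice = layer.weight.grad.clone()
print(torch.allclose(grad_twice, 2 * grad_once))  # True

# Clearing first restores the single-pass gradient.
toy_optimizer.zero_grad()
toy_criterion(layer(x), y).backward()
print(torch.allclose(layer.weight.grad, grad_once))  # True
```

No `step()` is taken here, so the weights stay fixed and the gradients are directly comparable.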

In [8]:
model.eval()  # switch to evaluation mode
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.view(-1, 28*28)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print(f'Test Accuracy: {100 * correct / total:.2f}%')

Test Accuracy: 96.75%


### Evaluation Loop Explanation

- **`model.eval()`**  
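  Switches the model into **evaluation mode**. This changes the behavior of layers such as dropout (disabled) and batch normalization (uses running statistics instead of batch statistics). This model contains neither, so the call has no effect here, but it is a good habit before any evaluation.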

- **`with torch.no_grad():`**  
  Disables gradient tracking since we don’t need to update weights during evaluation. This saves memory and speeds up computation.

- **Initialize counters (`correct = 0, total = 0`)**  
  We prepare to keep track of how many predictions are correct vs. the total number of test samples.

- **Loop over batches (`for images, labels in test_loader`)**  
  Goes through the test dataset in batches.

- **Flatten images (`images.view(-1, 28*28)`)**  
  Converts each test image from 28×28 pixels into a vector of size 784, just like we did in training.

- **Forward pass (`outputs = model(images)`)**  
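  Sends the test images through the trained model to get raw class scores (logits). These are not probabilities; applying softmax would convert them, but it is not needed to pick the top class.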

- **Get predictions (`_, predicted = torch.max(outputs.data, 1)`)**  
  Chooses the class with the highest score (the model’s prediction) for each image.

- **Count total samples and correct predictions**  
  - `total += labels.size(0)` adds the number of samples in the batch.  
  - `correct += (predicted == labels).sum().item()` counts how many predictions match the true labels.

- **Final accuracy calculation**  
  After looping through all test data, we compute accuracy as:  
  Accuracy = $\frac{\text{correct predictions}}{\text{total samples}} \times 100$
  and print it as a percentage.
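The prediction and accuracy steps can be checked by hand on a tiny example. The logits and labels below are made up for illustration (4 samples, 3 classes) and are not output from the trained model.

```python
import torch

# Made-up logits for 4 samples and 3 classes.
logits = torch.tensor([
    [2.0, 0.5, -1.0],
    [0.1, 0.2, 3.0],
    [1.5, 1.0, 0.0],
    [-0.5, 2.5, 0.3],
])
labels = torch.tensor([0, 2, 1, 1])

# Index of the highest score per row; same indices torch.max(logits, 1) returns.
predicted = logits.argmax(dim=1)

# Softmax is only needed if actual probabilities are wanted.
probs = torch.softmax(logits, dim=1)

correct = (predicted == labels).sum().item()
accuracy = 100 * correct / labels.size(0)

print(predicted.tolist())  # [0, 2, 0, 1]
print(accuracy)            # 75.0
```

Softmax preserves the ordering of scores within each row, so taking the argmax of `probs` gives the same predictions as taking it over `logits`.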
