# 🔢 Handwritten Digit Classification with PyTorch

This notebook builds a complete neural network to classify handwritten digits (0-9) using the MNIST dataset.

## What You'll Learn
- Loading and preprocessing MNIST data
- Building a neural network with `nn.Module`
- Implementing a training loop
- Evaluating the model with accuracy, a classification report, and a confusion matrix

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
import torchvision.datasets as datasets
from torch.utils.data import DataLoader

from matplotlib import pyplot as plt

## Data Preprocessing

We apply two transforms:
1. **ToTensor()**: Converts a PIL image to a tensor and scales pixel values to [0, 1]
2. **Normalize((0.5,), (0.5,))**: Subtracts a mean of 0.5 and divides by a standard deviation of 0.5, which maps values from [0, 1] to [-1, 1]

Centering the inputs around zero helps the model train faster and more stably.

In [None]:
transform = transforms.Compose([
    transforms.ToTensor(),
    # MNIST has one channel, so mean and std are one-element tuples
    transforms.Normalize((0.5,), (0.5,))
])

train_dataset = datasets.MNIST(
    root="data",
    train=True,
    transform=transform,
    download=True
)

test_dataset = datasets.MNIST(
    root="data",
    train=False,
    transform=transform,
    download=True
)
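
To make the normalization arithmetic concrete, here is a minimal sketch of what `Normalize` computes per pixel, written with plain tensor operations instead of torchvision. The sample pixel values and variable names are illustrative assumptions, not data from MNIST.

```python
import torch

# Illustrative pixel values after ToTensor(): already scaled to [0, 1].
pixels = torch.tensor([0.0, 0.25, 0.5, 1.0])

# Normalize(mean, std) computes (x - mean) / std for each channel.
mean, std = 0.5, 0.5
normalized = (pixels - mean) / std

print(normalized.tolist())  # [-1.0, -0.5, 0.0, 1.0]
```

The endpoints 0 and 1 land on -1 and 1, which is where the [-1, 1] range comes from.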

## Dataset Overview

MNIST contains:
- 60,000 training images
- 10,000 test images
- 10 classes (digits 0-9)
- Each image is 28×28 grayscale

In [None]:
print(f"Training samples: {len(train_dataset)}")
print(f"Test samples: {len(test_dataset)}")

## Creating DataLoaders

- **batch_size=64**: Process 64 images at a time
- **shuffle=True** for training: Randomize order each epoch
- **shuffle=False** for testing: Keep order consistent for evaluation

In [None]:
train_loader = DataLoader(
    dataset=train_dataset,
    batch_size=64,
    shuffle=True
)

test_loader = DataLoader(
    dataset=test_dataset,
    batch_size=64,
    shuffle=False
)
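
As a quick sanity check on these settings, the number of batches per epoch can be worked out by hand. The sketch below uses only the dataset sizes and batch size stated above and assumes the DataLoader default of `drop_last=False`; the variable names are made up for illustration.

```python
import math

# Dataset sizes and batch size taken from the text above.
train_size, test_size, batch_size = 60_000, 10_000, 64

# With drop_last=False, a DataLoader yields ceil(N / batch_size) batches.
train_batches = math.ceil(train_size / batch_size)
test_batches = math.ceil(test_size / batch_size)

# The final batch holds whatever is left over.
last_train_batch = train_size - (train_batches - 1) * batch_size
last_test_batch = test_size - (test_batches - 1) * batch_size

print(train_batches, last_train_batch)  # 938 32
print(test_batches, last_test_batch)    # 157 16
```

These counts should match `len(train_loader)` and `len(test_loader)`, and they show that the last batch of each loader is smaller than 64.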

## Visualizing a Sample

In [None]:
data_iter = iter(train_loader)
images, labels = next(data_iter)

plt.figure(figsize=(2,2))
plt.imshow(images[0].numpy().squeeze(), cmap='gray')
plt.title(f"Label: {labels[0].item()}")
plt.show()

## Model Architecture

Our neural network:
1. **Flatten**: Convert 28×28 image to 784-element vector
2. **Linear(784, 128)**: First hidden layer
3. **ReLU**: Activation function
4. **Linear(128, 64)**: Second hidden layer
5. **ReLU**: Activation function
6. **Linear(64, 10)**: Output layer (10 classes, no softmax!)

**Note**: We don't apply softmax in the model because `CrossEntropyLoss` applies log-softmax internally and expects raw scores (logits).

In [None]:
class DigitsClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.network = nn.Sequential(
            nn.Flatten(),
            nn.Linear(784, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 10)
        )
    
    def forward(self, x):
        return self.network(x)
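
One way to check the architecture is to push a dummy batch through it and count the parameters. The sketch below rebuilds the same layer stack so that it runs on its own; the random input is a stand-in for real images, and the names `network` and `dummy_batch` are illustrative.

```python
import torch
import torch.nn as nn

# Same layer stack as DigitsClassifier, rebuilt here so the snippet is self-contained.
network = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

# Random stand-in for one batch of 64 MNIST images: (batch, channel, height, width).
dummy_batch = torch.randn(64, 1, 28, 28)
logits = network(dummy_batch)
print(logits.shape)  # torch.Size([64, 10])

# Each Linear(in, out) layer holds in * out weights plus out biases.
num_params = sum(p.numel() for p in network.parameters())
print(num_params)  # 109386
```

The total breaks down as 100,480 + 8,256 + 650 parameters for the three linear layers, so almost all of the capacity sits in the first layer.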

## Setting Up Training

- **Model**: Our DigitsClassifier
- **Optimizer**: Adam with learning rate 0.001
- **Loss**: CrossEntropyLoss (for multi-class classification)

In [None]:
model = DigitsClassifier()
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()
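
The relationship between `CrossEntropyLoss` and softmax can be verified numerically. This is a small sketch on random logits (not on the model's real outputs), with an arbitrary seed and made-up target labels, comparing the built-in loss against log-softmax followed by negative log-likelihood.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # Arbitrary seed so the random logits are reproducible.

# Random stand-ins for raw model outputs (4 samples, 10 classes) and their labels.
logits = torch.randn(4, 10)
targets = torch.tensor([3, 7, 0, 9])

# CrossEntropyLoss applied directly to raw logits.
loss_direct = nn.CrossEntropyLoss()(logits, targets)

# The same quantity built by hand: log-softmax, then negative log-likelihood.
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(loss_direct, loss_manual))  # True
```

Because the loss already normalizes the scores, adding a softmax layer to the model would normalize them twice and distort the gradients.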

## Training Loop

For each epoch:
1. Iterate through batches
2. Forward pass: Get predictions
3. Calculate loss
4. Backward pass: Compute gradients
5. Update weights

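**Important**: Call `optimizer.zero_grad()` before `loss.backward()`. PyTorch accumulates gradients across backward passes, so gradients left over from the previous batch would otherwise be added to the new ones.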

In [None]:
epochs = 5

for epoch in range(epochs):
    running_loss = 0.0
    for images, labels in train_loader:
        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)
        
        # Track loss
        running_loss += loss.item()
        
        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        
        # Update weights
        optimizer.step()
    
    avg_loss = running_loss / len(train_loader)
    print(f"Epoch [{epoch+1}/{epochs}], Loss: {avg_loss:.4f}")
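
To see why the gradients need clearing, consider a toy case with a single weight. This sketch is separate from the notebook's model; the loss `2 * w` and the SGD optimizer are chosen only because they make the numbers easy to read.

```python
import torch

# A single trainable weight; the loss 2 * w has a constant gradient of 2.
w = torch.tensor(1.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

# Two backward passes without clearing: the gradients add up.
(2 * w).backward()
(2 * w).backward()
accumulated = w.grad.item()
print(accumulated)  # 4.0

# Clearing first leaves only the gradient of the current loss.
optimizer.zero_grad()
(2 * w).backward()
fresh = w.grad.item()
print(fresh)  # 2.0
```

In the training loop above, skipping `zero_grad()` would have the same effect: every update would use the sum of all gradients seen so far rather than the gradient of the current batch.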

## Model Evaluation

**Important steps for evaluation:**
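1. `model.eval()`: Set the model to evaluation mode, which switches layers such as dropout and batch normalization to inference behavior (this model has neither, but calling it is a good habit)
2. `torch.no_grad()`: Disable gradient tracking, which saves memory and computation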
3. `torch.max(outputs, 1)`: Get predicted class (highest score)

In [None]:
model.eval()

total = 0
correct = 0

with torch.no_grad():
    for images, labels in test_loader:
        outputs = model(images)
        # .data is unnecessary inside torch.no_grad()
        _, predicted = torch.max(outputs, 1)
        correct += (predicted == labels).sum().item()
        total += labels.size(0)

accuracy = 100 * correct / total
print(f'Accuracy on test set: {accuracy:.2f}%')
print(f'Correct: {correct} / {total}')
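
The prediction step can be illustrated on a few hand-written rows of scores. The logits and labels below are invented for the example; the sketch shows what `torch.max(outputs, 1)` returns and that `argmax` gives the same class indices.

```python
import torch

# Hand-written logits for 3 samples and 4 classes (illustrative values).
outputs = torch.tensor([
    [0.1, 2.0, -1.0, 0.3],
    [3.0, 0.5, 0.2, -0.7],
    [0.0, 0.1, 0.2, 1.5],
])
labels = torch.tensor([1, 2, 3])

# torch.max along dim=1 returns (largest value, index of largest value) per row.
values, predicted = torch.max(outputs, 1)
print(predicted.tolist())  # [1, 0, 3]

# argmax returns only the indices, which is all the accuracy count needs.
same = torch.equal(predicted, outputs.argmax(dim=1))
correct = (predicted == labels).sum().item()
print(same, correct)  # True 2
```

Two of the three toy predictions match their labels, which is the same comparison the evaluation loop sums over the whole test set.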

## Detailed Classification Report

We use scikit-learn's `classification_report` to get precision, recall, and F1-score for each class.

In [None]:
model.eval()

all_predicted = []
all_labels = []

with torch.no_grad():
    for images, labels in test_loader:
        outputs = model(images)
        # .data is unnecessary inside torch.no_grad()
        _, predicted = torch.max(outputs, 1)
        all_labels.extend(labels.numpy())
        all_predicted.extend(predicted.numpy())

In [None]:
from sklearn.metrics import classification_report

report = classification_report(all_labels, all_predicted)
print(report)
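
For readers unsure how the per-class numbers in the report are derived, here is a plain-Python sketch of precision, recall, and F1 for a single class. The toy labels are invented, the helper `per_class_metrics` is hypothetical, and returning 0.0 on an empty denominator is a simplifying assumption rather than scikit-learn's exact behavior.

```python
# Toy labels, invented for illustration (not model output).
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2, 2]

def per_class_metrics(y_true, y_pred, cls):
    """Return (precision, recall, f1), treating `cls` as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)  # correctly predicted as cls
    fp = sum(t != cls and p == cls for t, p in pairs)  # wrongly predicted as cls
    fn = sum(t == cls and p != cls for t, p in pairs)  # cls samples that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

precision, recall, f1 = per_class_metrics(y_true, y_pred, cls=1)
print(f"{precision:.2f} {recall:.2f} {f1:.2f}")  # 0.67 0.67 0.67
```

Precision answers "of the samples predicted as this digit, how many were right?", while recall answers "of the samples that really are this digit, how many were found?".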

## Confusion Matrix

The confusion matrix shows:
- Diagonal: Correct predictions
- Off-diagonal: Misclassifications (row = actual, column = predicted)

In [None]:
from sklearn.metrics import confusion_matrix
import seaborn as sns

cm = confusion_matrix(all_labels, all_predicted)

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=range(10), yticklabels=range(10))
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()
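
Beyond reading the heatmap by eye, the matrix can be queried directly. The sketch below uses a small invented 3-class matrix (not the notebook's results) to compute per-class recall and to locate the largest off-diagonal entry, that is, the most frequent confusion.

```python
import numpy as np

# Toy 3-class confusion matrix, invented for illustration
# (rows = actual, columns = predicted).
cm = np.array([
    [5, 1, 0],
    [0, 4, 2],
    [1, 0, 6],
])

# Per-class recall: diagonal (correct) counts divided by each row total.
recall = cm.diagonal() / cm.sum(axis=1)

# Zero the diagonal on a copy so only misclassifications remain,
# then find the position of the largest remaining count.
errors = cm.copy()
np.fill_diagonal(errors, 0)
actual, predicted = np.unravel_index(np.argmax(errors), errors.shape)

print(int(actual), int(predicted), int(errors[actual, predicted]))  # 1 2 2
```

Applied to the real `cm` from the cell above, the same steps would report which actual digit is most often mistaken for which predicted digit.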

## Key Takeaways

1. **Data preprocessing** (normalization) helps training stability
2. **nn.Sequential** makes it easy to chain layers
3. **CrossEntropyLoss** includes softmax - don't apply it twice!
4. **model.eval()** and **torch.no_grad()** are essential for evaluation
5. **Confusion matrix** reveals which digits are commonly confused