# LeNet-5 on MNIST with PyTorch

Replicate the NumPy model in PyTorch and train it on Kaggle's GPU T4 x2 accelerator.

## Network Architecture

| Layer      | Type                 | Parameters                             | Output Shape (input 28x28) |
| ---------- | -------------------- | -------------------------------------- | -------------------------- |
| **C1**     | Convolution          | 6 filters, 5×5 kernel, stride=1, pad=2 | (6, 28, 28)                |
|            | Activation (Tanh)    |                                        | (6, 28, 28)                |
| **S2**     | Average Pooling      | 2×2 window, stride=2                   | (6, 14, 14)                |
| **C3**     | Convolution          | 16 filters, 5×5 kernel, stride=1       | (16, 10, 10)               |
|            | Activation (Tanh)    |                                        | (16, 10, 10)               |
| **S4**     | Average Pooling      | 2×2 window, stride=2                   | (16, 5, 5)                 |
| **C5**     | Convolution          | 120 filters, 5×5 kernel, stride=1      | (120, 1, 1)                |
|            | Activation (Tanh)    |                                        | (120,)                     |
| **F6**     | Fully Connected      | 120 → 84                               | (84,)                      |
|            | Activation (Sigmoid) |                                        | (84,)                      |
| **Output** | Fully Connected      | 84 → 10                                | (10,)                      |

![Architecture](figures/Architecture.png)

Image source: Zhang, Aston and Lipton, Zachary C. and Li, Mu and Smola, Alexander J. - https://github.com/d2l-ai/d2l-en
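
The output shapes in the table follow from the usual size formula for convolution and pooling windows, `(size + 2*pad - kernel) // stride + 1`. The sketch below walks a 28×28 input through the five spatial layers. The helper `conv_out` and the `layers` list are illustrative names, not part of the notebook.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial size after a convolution or pooling window."""
    return (size + 2 * pad - kernel) // stride + 1

# (name, output channels, kernel, stride, padding), copied from the table
layers = [
    ("C1", 6, 5, 1, 2),
    ("S2", 6, 2, 2, 0),
    ("C3", 16, 5, 1, 0),
    ("S4", 16, 2, 2, 0),
    ("C5", 120, 5, 1, 0),
]

size = 28  # MNIST images are 28x28
shapes = {}
for name, channels, kernel, stride, pad in layers:
    size = conv_out(size, kernel, stride, pad)
    shapes[name] = (channels, size, size)
    print(name, shapes[name])
```

The padding of 2 in C1 is what keeps the 28×28 resolution. Without it, the feature map would shrink to 24×24 and C5 would no longer reduce to 1×1.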

In [1]:
import torch
from torch import nn
from torch import optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.datasets import fetch_openml
import numpy as np

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

Normalize MNIST using the dataset's standard mean (0.1307) and std (0.3081).

In [2]:
mnist = fetch_openml('mnist_784', version=1, as_frame=False)

X = mnist['data']       # Shape: (70000, 784)
y = mnist['target']     # Shape: (70000,)

X = X / 255.0               # Normalize pixel values to [0, 1]
X = (X - 0.1307) / 0.3081   # Standardize
y = y.astype(np.int32)

X = X.reshape(-1, 1, 28, 28) # Reshape for CNN

# Split into train/test (60k train, 10k test)
X_train, X_test = X[:60000], X[60000:]
y_train, y_test = y[:60000], y[60000:]
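
As a quick sanity check on the standardization above, the sketch below applies the same two steps to single pixel values. `standardize` is a hypothetical helper introduced only for illustration; the constants are the ones used in the cell.

```python
MEAN, STD = 0.1307, 0.3081  # constants taken from the cell above

def standardize(pixel):
    """Map a raw 0-255 pixel to the standardized value fed to the network."""
    return (pixel / 255.0 - MEAN) / STD

lowest = standardize(0)       # background pixels
highest = standardize(255)    # brightest stroke pixels
print(f"{lowest:.4f} {highest:.4f}")
```

Inputs therefore range from roughly -0.42 to 2.82 rather than 0 to 1, with the average pixel intensity mapped to about zero.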


## PyTorch Tensors & DataLoaders
Convert the NumPy arrays to PyTorch tensors and wrap them in `DataLoader`s with a batch size of 64. Only the training loader shuffles its data.

In [3]:
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.long)
y_test_tensor = torch.tensor(y_test, dtype=torch.long)

train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
train_dataloader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_dataset = TensorDataset(X_test_tensor, y_test_tensor)
test_dataloader = DataLoader(test_dataset, batch_size=64, shuffle=False)
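
With a batch size of 64, neither split divides evenly, so the last batch of each epoch is smaller. The sketch below works out the batch counts, assuming the `DataLoader` default of keeping the final partial batch (`drop_last=False`). `batch_plan` is an illustrative helper, not a PyTorch function.

```python
import math

def batch_plan(num_samples, batch_size):
    """Return (number of batches, size of the final batch)."""
    num_batches = math.ceil(num_samples / batch_size)
    last_batch = num_samples - (num_batches - 1) * batch_size
    return num_batches, last_batch

train_plan = batch_plan(60000, 64)
test_plan = batch_plan(10000, 64)
print(train_plan, test_plan)
```

This is why the training loop weights each batch loss by `xb.size(0)` before averaging: a plain mean over batches would slightly over-weight the short final batch.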

## Define LeNet-5 Model

In [4]:
class LeNet5(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, stride=1, padding=2),
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5, stride=1),
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2, stride=2),
            nn.Conv2d(in_channels=16, out_channels=120, kernel_size=5, stride=1),
            nn.Tanh(),
            nn.Flatten(),
            nn.Linear(in_features=120, out_features=84),
            nn.Sigmoid(),
            nn.Linear(in_features=84, out_features=10)
        )
    def forward(self, x):
        return self.net(x)
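
To get a feel for the model size, the learnable parameters can be counted by hand from the layer definitions above. This is a minimal sketch with hypothetical helper names (`conv_params`, `linear_params`). It assumes every layer has a bias, which is the default for `nn.Conv2d` and `nn.Linear`.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus one bias per output channel."""
    return out_channels * in_channels * kernel_size * kernel_size + out_channels

def linear_params(in_features, out_features):
    """Weights plus one bias per output feature."""
    return in_features * out_features + out_features

counts = {
    "C1": conv_params(1, 6, 5),
    "C3": conv_params(6, 16, 5),
    "C5": conv_params(16, 120, 5),
    "F6": linear_params(120, 84),
    "Output": linear_params(84, 10),
}
total = sum(counts.values())
print(counts, total)
```

The same figure can be cross-checked against the instantiated model with `sum(p.numel() for p in model.parameters())`.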

In [5]:
model = LeNet5()

# Initialize weights: LeCun uniform for Conv2d layers, Xavier uniform scaled by 4 for Linear layers.
def init_weights(m):
    if isinstance(m, nn.Conv2d):
        fan_in = m.in_channels * m.kernel_size[0] * m.kernel_size[1]
        limit = (3.0 / fan_in) ** 0.5
        nn.init.uniform_(m.weight, -limit, limit)
        if m.bias is not None:
            nn.init.zeros_(m.bias)
    elif isinstance(m, nn.Linear):
        fan_in, fan_out = m.in_features, m.out_features
        limit = 4.0 * (6.0 / (fan_in + fan_out)) ** 0.5
        nn.init.uniform_(m.weight, -limit, limit)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

model.apply(init_weights)
# Use DataParallel for multi-GPU support (Kaggle's GPU T4 x2)
model = nn.DataParallel(model)
model = model.to(device)

optimizer = optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Training loop
epochs = 35
for epoch in range(epochs):
    model.train()
    total_loss = 0.0
    correct = 0
    total = 0
    
    for xb, yb in train_dataloader:
        xb, yb = xb.to(device), yb.to(device)
        
        optimizer.zero_grad()
        yhat = model(xb)
        loss = loss_fn(yhat, yb)
        loss.backward()
        optimizer.step()

        total_loss += loss.item() * xb.size(0)
        preds = torch.argmax(yhat, dim=1)
        correct += (preds == yb).sum().item()
        total += xb.size(0)

    avg_loss = total_loss / total
    accuracy = correct / total

    # Validation (evaluated on the 10k test split)
    model.eval()
    val_loss = 0.0
    val_correct = 0
    val_total = 0
    with torch.no_grad():
        for xb, yb in test_dataloader:
            xb, yb = xb.to(device), yb.to(device)
            yhat = model(xb)
            loss = loss_fn(yhat, yb)
            val_loss += loss.item() * xb.size(0)
            preds = torch.argmax(yhat, dim=1)
            val_correct += (preds == yb).sum().item()
            val_total += xb.size(0)
    
    avg_val_loss = val_loss / val_total
    val_accuracy = val_correct / val_total

    print(f"Epoch {epoch+1}: "
          f"Train Loss: {avg_loss:.4f}, Train Acc: {accuracy:.4f}, "
          f"Val Loss: {avg_val_loss:.4f}, Val Acc: {val_accuracy:.4f}")

Epoch 1: Train Loss: 0.4126, Train Acc: 0.8905, Val Loss: 0.2272, Val Acc: 0.9383
Epoch 2: Train Loss: 0.1985, Train Acc: 0.9455, Val Loss: 0.1602, Val Acc: 0.9552
Epoch 3: Train Loss: 0.1473, Train Acc: 0.9591, Val Loss: 0.1240, Val Acc: 0.9657
Epoch 4: Train Loss: 0.1180, Train Acc: 0.9668, Val Loss: 0.1031, Val Acc: 0.9702
Epoch 5: Train Loss: 0.0999, Train Acc: 0.9722, Val Loss: 0.0892, Val Acc: 0.9743
Epoch 6: Train Loss: 0.0882, Train Acc: 0.9753, Val Loss: 0.0808, Val Acc: 0.9755
Epoch 7: Train Loss: 0.0794, Train Acc: 0.9776, Val Loss: 0.0733, Val Acc: 0.9773
Epoch 8: Train Loss: 0.0731, Train Acc: 0.9799, Val Loss: 0.0684, Val Acc: 0.9794
Epoch 9: Train Loss: 0.0677, Train Acc: 0.9813, Val Loss: 0.0616, Val Acc: 0.9812
Epoch 10: Train Loss: 0.0634, Train Acc: 0.9826, Val Loss: 0.0593, Val Acc: 0.9819
Epoch 11: Train Loss: 0.0597, Train Acc: 0.9839, Val Loss: 0.0564, Val Acc: 0.9829
Epoch 12: Train Loss: 0.0566, Train Acc: 0.9843, Val Loss: 0.0538, Val Acc: 0.9830
Epoch 13: Tra