## MNIST Digit Classification

The MNIST dataset contains 70,000 grayscale images of handwritten digits (28×28 pixels each), split into 60,000 training and 10,000 test images.  
The task is to build a model that can automatically classify these digits into one of ten classes (0–9).

Although MNIST is considered an introductory benchmark, the dataset still presents meaningful challenges:  
handwriting varies widely in shape, thickness, orientation, and overall style.  
A successful classifier must therefore learn useful representations from raw pixel values and generalise across diverse writing patterns.

This project focuses on:

1. Implementing baseline and improved neural network models for MNIST classification  
   (a simple MLP and a convolutional model).
2. Understanding how architectural choices influence learning behaviour and accuracy.
3. Applying core deep-learning components such as activation functions, optimisers, and regularisation.
4. Producing clear, reproducible training results suitable for inclusion in a portfolio or research application.

The goal is to demonstrate a practical and principled approach to building image-classification models,  
while highlighting the difference between fully-connected and convolutional architectures.
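
To make the architectural contrast concrete, here is a minimal sketch of the two model families. The layer sizes (128 hidden units, 16 and 32 convolution channels) and the helper `count_parameters` are illustrative assumptions, not the models trained in this project.

```python
import torch
import torch.nn as nn

# Hypothetical baseline: a fully-connected network on flattened pixels.
mlp = nn.Sequential(
    nn.Flatten(),                 # (N, 1, 28, 28) -> (N, 784)
    nn.Linear(28 * 28, 128),
    nn.ReLU(),
    nn.Linear(128, 10),           # one logit per digit class
)

# Hypothetical convolutional model: small filters shared across all positions.
cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),   # (N, 16, 28, 28)
    nn.ReLU(),
    nn.MaxPool2d(2),                              # (N, 16, 14, 14)
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # (N, 32, 14, 14)
    nn.ReLU(),
    nn.MaxPool2d(2),                              # (N, 32, 7, 7)
    nn.Flatten(),                                 # (N, 1568)
    nn.Linear(32 * 7 * 7, 10),
)


def count_parameters(model):
    """Number of trainable parameters in a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# A dummy batch of four blank images is enough to check the output shapes.
dummy = torch.zeros(4, 1, 28, 28)
mlp_logits = mlp(dummy)
cnn_logits = cnn(dummy)
print("MLP:", tuple(mlp_logits.shape), count_parameters(mlp), "parameters")
print("CNN:", tuple(cnn_logits.shape), count_parameters(cnn), "parameters")
```

With these particular sizes the convolutional sketch has fewer trainable parameters than the MLP (20,490 versus 101,770), because each convolution filter is reused at every image position instead of learning a separate weight per pixel.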



In [9]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt


In [10]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)


Using device: cpu


In [None]:
batch_size = 128

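# NOTE: this pipeline is defined here but is not passed to the datasets below.
transform = transforms.Compose([
    transforms.ToTensor(),                       # uint8 [0, 255] -> float [0, 1]
    transforms.Normalize((0.1307,), (0.3081,)),  # MNIST training-set mean and std
])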





In [None]:
train_dataset = datasets.MNIST(
    root="./data",
    train=True,
    transform=ToTensor(),
    download=True
)

test_dataset = datasets.MNIST(
    root="./data",
    train=False,
    transform=ToTensor(),
    download=True
)
print("Train size:", len(train_dataset), "Test size:", len(test_dataset))

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader  = DataLoader(test_dataset,  batch_size=batch_size, shuffle=False)

Train size: 60000 Test size: 10000


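Here I deliberately did **not** normalise the tensors up front: the datasets use plain `ToTensor()` rather than the `transform` pipeline defined earlier. Let's see whether normalisation really matters.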

Next, we define the training and evaluation loops as reusable functions so they can be shared by later experiments.

In [15]:
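def train_one_epoch(model, optimizer, criterion, dataloader, device):
    """Train for one epoch; return (mean loss per sample, accuracy)."""
    model.train()  # training-mode behaviour (e.g. dropout active)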
    running_loss = 0.0
    correct = 0
    total = 0
    
    for images, labels in dataloader:
        images = images.to(device)
        labels = labels.to(device)
        
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        
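        # loss is a batch mean (assuming a mean-reduced criterion), so weight it by batch size
        running_loss += loss.item() * images.size(0)
        
        predicted = outputs.argmax(dim=1)  # index of the largest logit = predicted class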
        correct += (predicted == labels).sum().item()
        total += labels.size(0)
    
    epoch_loss = running_loss / total
    epoch_acc = correct / total
    return epoch_loss, epoch_acc
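
The reason for multiplying each batch loss by `images.size(0)` can be checked with plain arithmetic. The sketch below assumes the criterion returns a per-batch mean (the default for `nn.CrossEntropyLoss`), and the loss values are made up purely for illustration.

```python
dataset_size = 60_000
batch_size = 128

# One epoch sees 468 full batches plus one smaller final batch.
full_batches, remainder = divmod(dataset_size, batch_size)
batch_sizes = [batch_size] * full_batches + [remainder]

# Hypothetical per-batch mean losses: 0.5 everywhere, 2.0 on the small last batch.
batch_mean_losses = [0.5] * full_batches + [2.0]

# What train_one_epoch computes: sum(mean loss * batch size) / number of samples.
weighted = sum(l * n for l, n in zip(batch_mean_losses, batch_sizes)) / dataset_size

# Naive alternative: every batch counts equally, whatever its size.
naive = sum(batch_mean_losses) / len(batch_mean_losses)

print("batches:", len(batch_sizes), "last batch size:", remainder)
print("sample-weighted mean loss:", weighted)
print("mean of batch means:      ", naive)
```

The two numbers differ because the final batch holds only 96 samples; weighting by batch size gives the exact per-sample average, while the mean of batch means over-counts the small batch.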


In [16]:
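def evaluate(model, criterion, dataloader, device):
    """Evaluate without gradient tracking; return (mean loss per sample, accuracy)."""
    model.eval()  # inference-mode behaviour (e.g. dropout disabled)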
    running_loss = 0.0
    correct = 0
    total = 0
    
    with torch.no_grad():
        for images, labels in dataloader:
            images = images.to(device)
            labels = labels.to(device)
            
            outputs = model(images)
            loss = criterion(outputs, labels)
            
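            # weight the batch-mean loss by batch size, as in train_one_epoch
            running_loss += loss.item() * images.size(0)
            
            predicted = outputs.argmax(dim=1)  # index of the largest logit = predicted class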
            correct += (predicted == labels).sum().item()
            total += labels.size(0)
    
    epoch_loss = running_loss / total
    epoch_acc = correct / total
    return epoch_loss, epoch_acc
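
One possible way to tie the two functions together is a small driver loop. The helper `fit`, the choice of Adam, and the default learning rate are assumptions made for this sketch and are not taken from the notebook; `train_fn` and `eval_fn` are expected to have the same signatures as `train_one_epoch` and `evaluate` above.

```python
import torch
import torch.nn as nn
import torch.optim as optim


def fit(model, train_loader, test_loader, train_fn, eval_fn,
        epochs=3, lr=1e-3, device="cpu"):
    """Hypothetical driver: alternate training and evaluation, collecting metrics."""
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()                   # mean-reduced by default
    optimizer = optim.Adam(model.parameters(), lr=lr)   # assumed optimiser choice
    history = {"train_loss": [], "train_acc": [], "test_loss": [], "test_acc": []}

    for epoch in range(1, epochs + 1):
        train_loss, train_acc = train_fn(model, optimizer, criterion, train_loader, device)
        test_loss, test_acc = eval_fn(model, criterion, test_loader, device)

        history["train_loss"].append(train_loss)
        history["train_acc"].append(train_acc)
        history["test_loss"].append(test_loss)
        history["test_acc"].append(test_acc)

        print(f"epoch {epoch}: train loss {train_loss:.4f}, acc {train_acc:.4f} | "
              f"test loss {test_loss:.4f}, acc {test_acc:.4f}")

    return history


# Intended usage inside the notebook (not executed here):
# model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))
# history = fit(model, train_loader, test_loader, train_one_epoch, evaluate, device=device)
```

Passing the training and evaluation functions in as arguments keeps the driver independent of any one model, so the same loop can serve both the MLP and the convolutional model.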
