# Kaggle MNIST Challenge: Digit Recognizer
### By Aninda Metta Citta and Liao Ru Xin, Juliette

This notebook walks through our solution to the Kaggle Digit Recognizer challenge. The goal is to build our own neural network that takes a 28x28 pixel image of a single handwritten digit from the Kaggle MNIST dataset and determines which digit (0-9) it shows. Our workflow covers loading the data, preprocessing, model training, and generating predictions for submission.

```python
import torch
import numpy as np
import pandas as pd
```

## Dataset and Preprocessing

The Kaggle data is provided as CSV files with a header row. In the labelled training file, each row is one handwritten digit image:
- **First column**: the actual digit (0-9)
- **Next 784 columns**: pixel intensities (0-255) from the 28×28 image, flattened into a single row

Custom PyTorch `Dataset` classes are implemented to:
1. Load the CSV file
2. Separate labels from pixels (training set only)
3. Reshape each flattened row into a 28x28 image
4. Normalise pixel intensities from 0-255 to the range [0, 1]
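
To illustrate steps 2-4 on a single row, the sketch below builds a one-row CSV in memory and parses it the same way. It assumes the Kaggle column layout `label, pixel0, ..., pixel783`; the row values themselves are made up for illustration.

```python
import io

import numpy as np
import pandas as pd

# Assumed train.csv layout: a "label" column followed by pixel0 ... pixel783.
header = ["label"] + [f"pixel{i}" for i in range(784)]
pixels = [0] * 784
pixels[0] = 255    # top-left pixel fully "on" (made-up value)
pixels[783] = 128  # bottom-right pixel half "on" (made-up value)
row = [7] + pixels
csv_text = ",".join(header) + "\n" + ",".join(str(v) for v in row) + "\n"

df = pd.read_csv(io.StringIO(csv_text))  # first line becomes the column names

label = int(df.iloc[0, 0])                         # step 2: the digit label
image = df.iloc[0, 1:].to_numpy(dtype=np.float32)  # step 2: the 784 pixel values
image = image.reshape(28, 28)                      # step 3: (784,) -> (28, 28)
image = image / 255.0                              # step 4: scale to [0, 1]

print(label, image.shape, image[0, 0], image.min(), image.max())
```

Pixel `pixel783` ends up at position `[27, 27]`, since row-major reshaping places flat index `i` at `[i // 28, i % 28]`.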

```python
class MNISTTrainDataset(torch.utils.data.Dataset):
    def __init__(self, csv_filename, transform=None, target_transform=None):
        # pandas uses the first row of the CSV as the header (column names) by default
        self.data = pd.read_csv(csv_filename)
        self.transform = transform
        self.target_transform = target_transform

    def __getitem__(self, idx):
        label = int(self.data.iloc[idx, 0])  # first column is the digit label
        image = self.data.iloc[idx, 1:].to_numpy(dtype=np.float32)  # all columns but the first
        # fixed scale from [0, 255] to [0, 1]; avoids dividing by zero on an all-blank image
        image = image / 255.0
        image = image.reshape((28, 28))  # dimension conversion (784,) -> (28, 28)
        if self.transform:  # apply any transform given as input
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)

        return image, label

    def __len__(self):
        return len(self.data)
```

```python
class MNISTTestDataset(torch.utils.data.Dataset):
    def __init__(self, csv_filename, transform=None):
        # pandas uses the first row of the CSV as the header (column names) by default
        self.data = pd.read_csv(csv_filename)
        self.transform = transform  # no target_transform: the test set has no labels

    def __getitem__(self, idx):
        image = self.data.iloc[idx].to_numpy(dtype=np.float32)  # all columns, since there is no label
        image = image / 255.0  # fixed scale from [0, 255] to [0, 1]
        image = image.reshape((28, 28))  # dimension conversion (784,) -> (28, 28)
        if self.transform:  # apply any transform given as input
            image = self.transform(image)

        return image

    def __len__(self):
        return len(self.data)
```

## Train and Test sets

The Kaggle challenge provides two datasets: `train.csv`, which is labelled, and `test.csv`, which is not. The `MNISTTrainDataset` and `MNISTTestDataset` classes from the previous section read these two CSV files respectively.
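
Because `test.csv` has no labels, measuring accuracy would require holding back part of the labelled data. The following is a hypothetical sketch (not part of our pipeline) using `torch.utils.data.random_split`; a small synthetic `TensorDataset` stands in for `MNISTTrainDataset`, and the 80/20 ratio and the seed are arbitrary choices.

```python
import torch

# Synthetic stand-in for the labelled training set: 1000 images with labels 0-9.
images = torch.rand(1000, 28, 28)
labels = torch.randint(0, 10, (1000,))
full_dataset = torch.utils.data.TensorDataset(images, labels)

# Hold out 20% of the labelled samples for validation (ratio is an arbitrary choice).
n_val = len(full_dataset) // 5
n_train = len(full_dataset) - n_val
generator = torch.Generator().manual_seed(42)  # fixed seed -> reproducible split
train_part, val_part = torch.utils.data.random_split(
    full_dataset, [n_train, n_val], generator=generator
)

print(len(train_part), len(val_part))
```

With the real data, `MNISTTrainDataset("data/train.csv")` would take the place of `full_dataset`, and only `train_part` would be passed to the training DataLoader.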

```python
train_dataset = MNISTTrainDataset("data/train.csv")
test_dataset = MNISTTestDataset("data/test.csv")
```

## DataLoader Setup

DataLoaders are used to:
- group samples into batches (a batch size of 64 balances training speed and memory use)
- shuffle the training data each epoch, so that batches differ between epochs and the model does not learn anything from the fixed order of the samples
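
The sketch below shows what the DataLoader produces from a dataset that, like `MNISTTrainDataset`, returns a `(28, 28)` float32 NumPy array and an integer label. `ToyDigits` is a hypothetical stand-in filled with random data.

```python
import numpy as np
import torch


class ToyDigits(torch.utils.data.Dataset):
    """Hypothetical stand-in for MNISTTrainDataset, filled with random data."""

    def __init__(self, n):
        rng = np.random.default_rng(0)
        self.images = rng.random((n, 28, 28), dtype=np.float32)  # values in [0, 1)
        self.labels = rng.integers(0, 10, size=n)                 # digits 0-9

    def __getitem__(self, idx):
        return self.images[idx], int(self.labels[idx])

    def __len__(self):
        return len(self.images)


loader = torch.utils.data.DataLoader(ToyDigits(200), batch_size=64, shuffle=True)

# The default collate function stacks the NumPy arrays into one tensor per batch.
images, labels = next(iter(loader))
print(images.shape, images.dtype, labels.dtype)  # torch.Size([64, 28, 28]) torch.float32 torch.int64

batch_sizes = [x.shape[0] for x, _ in loader]
print(len(loader), batch_sizes)                  # 4 [64, 64, 64, 8]
```

Note the final partial batch of 8 samples (200 = 3 x 64 + 8); passing `drop_last=True` to the DataLoader would discard it.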

```python
train_loader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=64,
    shuffle=True
)
```

## Model: Baseline Fully Connected Neural Network

A simple baseline network is used:
- `fc1`: a fully connected layer that takes all 784 pixels (28x28 flattened) as input and maps them to 128 hidden neurons
- ReLU activation on the hidden neurons, which introduces non-linearity
- `fc2`: the output layer, which maps the 128 hidden neurons to 10 outputs (one for each digit)

In the forward pass:
- the input image tensor is flattened from `(batch, 28, 28)` (the shape the DataLoader produces from our dataset) into `(batch, 784)` using `view`
- the result runs through the hidden layer with ReLU
- the final output is a vector of raw scores (logits), one for each digit class
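
For reference, the same architecture can be written with `torch.nn.Sequential`, using `torch.nn.Flatten` in place of `view`. This is an equivalent sketch rather than the class we train, and the random batch is only there to check output shapes and the parameter count.

```python
import torch

# Equivalent to DigitNet: flatten -> Linear(784, 128) -> ReLU -> Linear(128, 10)
model_seq = torch.nn.Sequential(
    torch.nn.Flatten(),            # (batch, 28, 28) or (batch, 1, 28, 28) -> (batch, 784)
    torch.nn.Linear(28 * 28, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 10),
)

x = torch.rand(4, 28, 28)          # a random batch of 4 "images"
logits = model_seq(x)
print(logits.shape)                # torch.Size([4, 10])

# Weights + biases: 784*128 + 128 + 128*10 + 10
n_params = sum(p.numel() for p in model_seq.parameters())
print(n_params)                    # 101770
```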

```python
class DigitNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(28 * 28, 128)  # 784 inputs -> 128 hidden neurons
        self.fc2 = torch.nn.Linear(128, 10)       # 128 hidden neurons -> 10 class scores

    def forward(self, x):
        x = x.view(x.size(0), -1)  # flatten to (batch, 784)
        x = torch.relu(self.fc1(x))
        return self.fc2(x)  # raw logits
```

## Loss Function and Optimisation

- **Loss function:** `CrossEntropyLoss` is used for multi-class classification. It expects raw logits from the model (it applies log-softmax internally) and labels as integer class indices (0-9).
- **Optimiser:** `Adam` with a learning rate of 0.001 is used because its adaptive, per-parameter step sizes typically train stably with little tuning.
- **Epochs:** 5
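
As a quick check of what `CrossEntropyLoss` computes, the sketch below compares it with an explicit log-softmax followed by negative log-likelihood, using made-up logits and integer targets. It also shows that equal scores for every class give a loss of ln 10 ≈ 2.303, roughly where an untrained 10-class model with near-uniform outputs would start.

```python
import math

import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)               # made-up raw scores for 4 images, 10 classes
targets = torch.tensor([3, 0, 9, 7])      # integer class indices, not one-hot vectors

criterion = torch.nn.CrossEntropyLoss()
loss = criterion(logits, targets)
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)  # same computation, spelled out
print(loss.item(), manual.item())

# With equal scores for every class, each class gets probability 1/10.
uniform_loss = criterion(torch.zeros(4, 10), targets)
print(uniform_loss.item(), math.log(10))
```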

```python
model = DigitNet()
criterion = torch.nn.CrossEntropyLoss()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
num_epochs = 5
```

## Training Loop

This is where learning occurs. For each epoch:
1. Set the model to training mode with `model.train()`
2. Iterate over batches from `train_loader`
3. Compute outputs (forward pass) and loss
4. Zero gradients, backpropagate (`loss.backward()`), and update parameters to improve predictions (`optimiser.step()`)
5. Track and print the average loss per epoch

We print the training loss for each epoch to help verify that the model is learning (the loss should generally decrease over epochs).
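
One way to check that a loop like this is wired correctly before using the full dataset is to run the same steps on a small synthetic problem and confirm the loss drops. The sketch below is hypothetical: the data is random, and the label rule (the index of the largest of the first 10 pixels) is invented purely so there is something learnable. It also tracks training accuracy, which the loop in this notebook does not.

```python
import torch

torch.manual_seed(0)

# Synthetic data: 256 random 28x28 "images"; the label is the position of the
# largest value among the first 10 pixels (an invented, learnable rule).
images = torch.rand(256, 28, 28)
labels = images.view(256, -1)[:, :10].argmax(dim=1)
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(images, labels), batch_size=64, shuffle=True
)

model = torch.nn.Sequential(
    torch.nn.Flatten(), torch.nn.Linear(784, 128), torch.nn.ReLU(), torch.nn.Linear(128, 10)
)
criterion = torch.nn.CrossEntropyLoss()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)

epoch_losses = []
for epoch in range(20):
    model.train()
    running_loss, correct = 0.0, 0
    for x, y in loader:
        outputs = model(x)                 # forward pass
        loss = criterion(outputs, y)
        optimiser.zero_grad()              # clear gradients from the previous step
        loss.backward()                    # backpropagate
        optimiser.step()                   # update parameters
        running_loss += loss.item()
        correct += (outputs.argmax(dim=1) == y).sum().item()
    epoch_losses.append(running_loss / len(loader))
    if epoch % 5 == 4:
        print(f"Epoch {epoch + 1}: loss {epoch_losses[-1]:.4f}, "
              f"train acc {correct / len(loader.dataset):.2%}")
```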

```python
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0

    for images, labels in train_loader:
        # forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # backward pass
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()

        running_loss += loss.item()

    print(f"Epoch [{epoch+1}/{num_epochs}], "
          f"Loss: {running_loss / len(train_loader):.4f}")
```

```text
Epoch [1/5], Loss: 0.3882
Epoch [2/5], Loss: 0.1735
Epoch [3/5], Loss: 0.1220
Epoch [4/5], Loss: 0.0934
Epoch [5/5], Loss: 0.0741
```


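## Test DataLoader

A separate DataLoader is created for the test set. Shuffling is disabled (`shuffle=False`) so that predictions come out in the same order as the rows of `test.csv`, which is needed to match each prediction to its `ImageId` in the submission file.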
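
The sketch below illustrates why `shuffle=False` matters here: iterating the loader returns samples in their original order, so the i-th prediction corresponds to the i-th row of the CSV. The index tensor is a made-up stand-in for the test images.

```python
import torch

# Stand-in "test set" whose samples are just their own row positions 0..199.
row_ids = torch.arange(200)
dataset = torch.utils.data.TensorDataset(row_ids)

ordered = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=False)
shuffled = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=True)

order_kept = torch.cat([batch for (batch,) in ordered])
order_mixed = torch.cat([batch for (batch,) in shuffled])

print(torch.equal(order_kept, row_ids))    # True: batches follow row order
print(torch.equal(order_mixed, row_ids))   # almost certainly False
```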

```python
test_loader = torch.utils.data.DataLoader(
    test_dataset,
    batch_size=64,
    shuffle=False
)
```

## Model Prediction

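This is where we use the trained model to predict digits for the test images, which it has not seen during training. Because `test.csv` is unlabelled, this step produces predictions only; accuracy cannot be measured here.
- Switch the model to evaluation mode with `model.eval()`
- Disable gradient computation with `torch.no_grad()` (saves memory and time)
- Compute predicted class labels using `torch.max(outputs, 1)`, which returns the highest score and its index for each image; the index is the predicted digit

Finally, we save the predicted class labels to the CSV file "mnist_test_predictions.csv" for submission to Kaggle.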
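
A small worked example of the prediction step, with made-up scores for three images and three classes: `torch.max(outputs, 1)` returns both the highest score and its index for each row, and the index matches `argmax`. If true labels were available (for example on a held-out part of `train.csv`), accuracy would be the fraction of matching predictions; the labels below are invented.

```python
import torch

outputs = torch.tensor([
    [0.1, 2.0, -1.0],   # highest score at index 1
    [3.0, 0.5,  0.2],   # highest score at index 0
    [0.0, 0.0,  5.0],   # highest score at index 2
])

with torch.no_grad():
    max_scores, predicted = torch.max(outputs, 1)  # values and indices along dim 1

print(predicted)                                      # tensor([1, 0, 2])
print(torch.equal(predicted, outputs.argmax(dim=1)))  # True

labels = torch.tensor([1, 2, 2])                      # hypothetical ground truth
accuracy = (predicted == labels).float().mean().item()
print(f"accuracy: {accuracy:.3f}")                    # 2 of 3 correct -> 0.667
```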

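```python
model.eval()  # switch to evaluation mode

all_preds = []

with torch.no_grad():  # no gradients are needed for inference
    for images in test_loader:
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)  # index of the highest score = predicted digit
        all_preds.append(predicted)

# Concatenate all batches into one tensor of predictions, in test.csv row order
all_preds = torch.cat(all_preds)
```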

```python
# Save predictions to CSV in Kaggle's submission format: ImageId (starting at 1), Label
final_prediction = pd.DataFrame({
    "ImageId": np.arange(1, len(all_preds) + 1),
    "Label": all_preds.numpy(),
})
final_prediction.to_csv("mnist_test_predictions.csv", index=False)
```
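
Before uploading, reading the written file back can catch format mistakes. This hypothetical sketch writes made-up predictions to a temporary directory in the same way as the cell above and then reloads them; it assumes the `ImageId,Label` layout with `ImageId` starting at 1, as produced above.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import torch

fake_preds = torch.tensor([2, 0, 9, 9, 3])  # made-up predictions for 5 test images

submission = pd.DataFrame({
    "ImageId": np.arange(1, len(fake_preds) + 1),  # 1-based image ids
    "Label": fake_preds.numpy(),
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mnist_test_predictions.csv")
    submission.to_csv(path, index=False)   # no extra index column
    check = pd.read_csv(path)              # read back before the directory is removed

print(list(check.columns))   # ['ImageId', 'Label']
print(check.head())
```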

## Conclusion

This notebook demonstrates an end-to-end baseline pipeline for the Kaggle MNIST Digit Recognizer challenge:
- Custom `Dataset` classes load CSV rows as 28x28 images (and labels, for the training set)
- A simple fully connected neural network is trained on the labelled `train.csv` using `CrossEntropyLoss` and Adam, with the training loss decreasing every epoch
- Predictions for the unlabelled `test.csv` are saved to a CSV file for submission to Kaggle

#### Possible Improvements

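- Use convolutional neural networks (CNNs), which are likely to improve accuracy because they exploit the 2D structure of the image
- Train for more epochs
- Tune hyperparameters such as the learning rate and batch size
- Hold out part of `train.csv` as a validation set to measure accuracy on labelled data the model has not trained on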
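
As a possible starting point for the CNN improvement, here is a hypothetical small CNN that could replace `DigitNet` in the same training loop. The layer sizes are illustrative choices, not tuned values; because our dataset yields `(batch, 28, 28)` tensors, the model adds the channel dimension itself.

```python
import torch


class SmallCNN(torch.nn.Module):
    """Hypothetical CNN: two conv/pool stages followed by two linear layers."""

    def __init__(self):
        super().__init__()
        self.features = torch.nn.Sequential(
            torch.nn.Conv2d(1, 32, kernel_size=3, padding=1),   # (B, 1, 28, 28) -> (B, 32, 28, 28)
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2),                              # -> (B, 32, 14, 14)
            torch.nn.Conv2d(32, 64, kernel_size=3, padding=1),  # -> (B, 64, 14, 14)
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2),                              # -> (B, 64, 7, 7)
        )
        self.classifier = torch.nn.Sequential(
            torch.nn.Flatten(),                                 # -> (B, 64 * 7 * 7)
            torch.nn.Linear(64 * 7 * 7, 128),
            torch.nn.ReLU(),
            torch.nn.Linear(128, 10),                           # raw logits for 10 digits
        )

    def forward(self, x):
        if x.dim() == 3:          # (B, 28, 28) from our Dataset -> add channel dim
            x = x.unsqueeze(1)
        return self.classifier(self.features(x))


cnn = SmallCNN()
out = cnn(torch.rand(8, 28, 28))
print(out.shape)  # torch.Size([8, 10])
```

It would be trained the same way as `DigitNet`, by creating `SmallCNN()` instead before building the optimiser.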