This notebook demonstrates handwritten digit classification using:

- A custom-built Multi-Layer Perceptron (MLP) implemented with NumPy
- A standard PyTorch-based MLP

Both models are trained and evaluated on the MNIST dataset. The goal is to:

- Build a neural network from scratch for educational clarity
- Compare its performance with a PyTorch equivalent
- Visualize training metrics and draw conclusions about the strengths and limitations of both approaches


In [None]:
import torch
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

from models import NeuralNetMLP
from training import custom_train, train_torch_model
from utils import compute_mse_and_acc, plot_training_curves

SEED = 10
np.random.seed(SEED)
torch.manual_seed(SEED)

mnist = fetch_openml('mnist_784', version=1, parser='auto')
X = mnist.data.values.astype(np.float32)
y = mnist.target.values.astype(np.int64)

# Scale pixel values from [0, 255] to [-1, 1].
# Use the known pixel maximum rather than the max of the first image only.
X = ((X / 255.0) - 0.5) * 2

# Train, validation, test split
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=10000, random_state=SEED, stratify=y)
X_train, X_valid, y_train, y_valid = train_test_split(X_temp, y_temp, test_size=5000, random_state=SEED, stratify=y_temp)

num_features = X_train.shape[1]
num_hidden = 50
num_classes = 10
num_epochs = 50
minibatch_size = 100
learning_rate = 0.1

custom_model = NeuralNetMLP(num_features, num_hidden, num_classes)
custom_loss, custom_train_acc, custom_valid_acc = custom_train(
    custom_model, X_train, y_train, X_valid, y_valid,
    num_epochs, minibatch_size, learning_rate)



In [None]:
import torch.nn as nn

X_train_t = torch.tensor(X_train)
y_train_t = torch.tensor(y_train, dtype=torch.int64)
X_valid_t = torch.tensor(X_valid)
y_valid_t = torch.tensor(y_valid, dtype=torch.int64)

train_dataset = torch.utils.data.TensorDataset(X_train_t, y_train_t)
valid_dataset = torch.utils.data.TensorDataset(X_valid_t, y_valid_t)

torch_model = nn.Sequential(
    nn.Linear(num_features, num_hidden),
    nn.Sigmoid(),
    nn.Linear(num_hidden, num_classes),
    nn.Sigmoid()
)

loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(torch_model.parameters(), lr=learning_rate)

torch_loss, torch_train_acc, torch_valid_acc = train_torch_model(torch_model, train_dataset, valid_dataset,
                                                                 num_epochs, minibatch_size, loss_fn, optimizer)

plot_training_curves(custom_loss, custom_train_acc, custom_valid_acc,
                     torch_loss, torch_train_acc, torch_valid_acc)


In [None]:
X_test_t = torch.tensor(X_test)
y_test_t = torch.tensor(y_test, dtype=torch.int64)

_, custom_test_acc = compute_mse_and_acc(custom_model, X_test, y_test, custom=True, minibatch_size=minibatch_size)
_, torch_test_acc = compute_mse_and_acc(torch_model, X_test, y_test, custom=False, minibatch_size=minibatch_size)

print(f"Custom Model Test Accuracy:  {custom_test_acc * 100:.2f}%")
print(f"PyTorch Model Test Accuracy: {torch_test_acc * 100:.2f}%")


### Conclusion

- The custom-built neural network achieved **~94.88% test accuracy**, surpassing the PyTorch model's **~90.63%**.
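- Both models use the same architecture (one hidden layer of 50 sigmoid units, sigmoid outputs, MSE loss), so the gap probably comes from training details rather than model capacity. Likely candidates are different weight initialization and different loss scaling: `nn.MSELoss` with its default `reduction='mean'` averages over every output element, which yields smaller gradients than an implementation that averages over the batch only.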
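
One training-dynamics difference that is easy to overlook is how the squared error is averaged. The sketch below uses random stand-in data, not the notebook's models, and assumes a custom implementation that divides the gradient by the batch size only. That is an assumption, since the `models` module is not shown here. `nn.MSELoss` with its default `reduction='mean'` divides by the total number of elements instead.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, num_classes = 100, 10

# Stand-ins for sigmoid outputs and one-hot targets (random, illustrative only)
a_out = rng.uniform(0.0, 1.0, size=(batch_size, num_classes))
y_onehot = np.eye(num_classes)[rng.integers(0, num_classes, size=batch_size)]

# Gradient w.r.t. the outputs when the squared error is averaged over the batch only
# (assumed behaviour of a hand-written implementation)
grad_batch_mean = 2.0 * (a_out - y_onehot) / batch_size

# Gradient when the squared error is averaged over every element,
# i.e. batch_size * num_classes, as with reduction='mean'
grad_element_mean = 2.0 * (a_out - y_onehot) / (batch_size * num_classes)

# The two gradients differ by a constant factor equal to the number of classes
ratio = np.abs(grad_batch_mean).sum() / np.abs(grad_element_mean).sum()
print(f"gradient scale ratio: {ratio:.1f}")
```

If this assumption holds, the same `learning_rate` would act like a roughly ten times smaller step in the PyTorch run, which could account for part of the accuracy gap.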

### Takeaways

- Building an MLP from scratch solidifies understanding of forward and backward propagation.
- PyTorch, while efficient, requires thoughtful setup to match custom logic (like using sigmoid + MSE).
- This project exercises both the mathematical underpinnings and the practical engineering side of machine learning.
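
To make the forward and backward passes concrete, here is a minimal NumPy sketch of a one-hidden-layer sigmoid network with MSE loss, checked against a finite-difference gradient. It is not the `NeuralNetMLP` class from `models`. The function names, shapes, and the element-wise mean in the loss are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, x):
    w_h, b_h, w_out, b_out = params
    a_h = sigmoid(x @ w_h.T + b_h)          # hidden activations, shape (n, hidden)
    a_out = sigmoid(a_h @ w_out.T + b_out)  # output activations, shape (n, classes)
    return a_h, a_out

def mse(params, x, y_onehot):
    _, a_out = forward(params, x)
    return np.mean((a_out - y_onehot) ** 2)  # mean over all elements

def backward(params, x, y_onehot):
    w_h, b_h, w_out, b_out = params
    a_h, a_out = forward(params, x)
    d_a_out = 2.0 * (a_out - y_onehot) / a_out.size  # dLoss/da_out
    delta_out = d_a_out * a_out * (1.0 - a_out)      # back through output sigmoid
    grad_w_out = delta_out.T @ a_h
    grad_b_out = delta_out.sum(axis=0)
    delta_h = (delta_out @ w_out) * a_h * (1.0 - a_h)  # back through hidden sigmoid
    grad_w_h = delta_h.T @ x
    grad_b_h = delta_h.sum(axis=0)
    return grad_w_h, grad_b_h, grad_w_out, grad_b_out

# Tiny random problem: 5 examples, 8 features, 4 hidden units, 3 classes
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(5, 8))
y_onehot = np.eye(3)[rng.integers(0, 3, size=5)]
params = [rng.normal(0, 0.1, size=(4, 8)), np.zeros(4),
          rng.normal(0, 0.1, size=(3, 4)), np.zeros(3)]

# Compare the analytic gradient of one hidden weight with a central difference
analytic = backward(params, x, y_onehot)[0][0, 0]
eps = 1e-6
params[0][0, 0] += eps
loss_plus = mse(params, x, y_onehot)
params[0][0, 0] -= 2 * eps
loss_minus = mse(params, x, y_onehot)
params[0][0, 0] += eps  # restore the original weight
numeric = (loss_plus - loss_minus) / (2 * eps)
print(f"analytic={analytic:.3e}  numeric={numeric:.3e}")
```

A gradient check like this is a cheap way to confirm a hand-written backward pass before training on the full dataset.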

Next steps could include:
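- Switching the PyTorch model to ReLU hidden activations with `CrossEntropyLoss`, which applies log-softmax internally, so the model should output raw logits without a separate Softmax layer.
- Implementing momentum, weight decay, or the Adam optimizer.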
- Extending the custom model to include modular layers and activation functions.
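
As a rough sketch of the first two next steps, the PyTorch model could be set up as follows. The batch is random stand-in data, and the `momentum` and `weight_decay` values are illustrative assumptions rather than tuned settings. Note that `nn.CrossEntropyLoss` expects raw logits and integer class labels, so the model has no Softmax layer and the targets need no one-hot encoding.

```python
import torch
import torch.nn as nn

torch.manual_seed(10)
num_features, num_hidden, num_classes = 784, 50, 10

relu_model = nn.Sequential(
    nn.Linear(num_features, num_hidden),
    nn.ReLU(),
    nn.Linear(num_hidden, num_classes),  # raw logits; the loss applies log-softmax
)
loss_fn = nn.CrossEntropyLoss()
# momentum and weight_decay values are placeholders, not recommendations
optimizer = torch.optim.SGD(relu_model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)

# Random stand-in minibatch scaled to [-1, 1], with integer labels
x_batch = torch.rand(100, num_features) * 2 - 1
y_batch = torch.randint(0, num_classes, (100,))

# One training step
logits = relu_model(x_batch)
loss = loss_fn(logits, y_batch)
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(tuple(logits.shape), float(loss))
```

Swapping in `torch.optim.Adam(relu_model.parameters())` would cover the Adam variant without other changes to the step above.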
