# Multi-Layer Perceptron from Scratch and with PyTorch

This notebook trains and evaluates MLP (Multi-Layer Perceptron) models with one and two hidden layers, referred to below as "single-layer" and "two-layer". Each architecture appears in two forms:

- Implemented from scratch using NumPy
- Implemented using PyTorch's `nn.Module`

We will train these models on the **MNIST dataset** for handwritten digit classification.


In [None]:
import numpy as np
import torch
import torch.nn as nn
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

from models import NeuralNetMLP, TwoLayerNeuralNetMLP, TorchMLP, TorchMLP2
from training import custom_train, train_custom_2layer, train_torch_model
from utils import compute_ce_and_acc, plot_training_curves


## Load and Preprocess MNIST Data

We use the `fetch_openml` API to load the MNIST dataset and perform preprocessing:
- Normalize pixel values to range [-1, 1]
- Split the 70,000 images into training (55,000), validation (5,000), and test (10,000) sets, stratified by label
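
To see what the normalization does at the ends of the pixel range, here is a quick check on a few hand-picked intensities. The values are hypothetical and not taken from MNIST; the formula is the same one used in the preprocessing cell.

```python
import numpy as np

# Hypothetical raw pixel intensities spanning the 8-bit range
raw_pixels = np.array([0, 64, 128, 255], dtype=np.float32)

# Same rescaling as the preprocessing cell: [0, 255] -> [0, 1] -> [-1, 1]
scaled = ((raw_pixels / 255.0) - 0.5) * 2

print(scaled.min(), scaled.max())
```

Black pixels (0) map to -1 and fully white pixels (255) map to 1, so the inputs are roughly centered around zero.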


In [None]:
SEED = 10
np.random.seed(SEED)
torch.manual_seed(SEED)

mnist = fetch_openml('mnist_784', version=1, parser='auto')
X = mnist.data.values.astype(np.float32)
y = mnist.target.values.astype(np.int64)

# Normalize pixel values
X = ((X / 255.0) - 0.5) * 2

# Split data
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=10000, random_state=SEED, stratify=y)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_temp, y_temp, test_size=5000, random_state=SEED, stratify=y_temp)

num_features = X_train.shape[1]
num_classes = 10


## Train NumPy Single-Layer MLP

We first train a custom neural network implemented from scratch using NumPy, with **one hidden layer**.
This model uses:

- Sigmoid activation
- Mean Squared Error (MSE) loss
- Manual backpropagation
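
The three ingredients above can be sketched on a tiny synthetic problem. This is not the code of `NeuralNetMLP` (which lives in `models` and is not shown here). It is a minimal illustration that assumes a sigmoid on both the hidden and the output layer and an MSE loss averaged over all output elements. All names and sizes are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def int_to_onehot(y, num_labels):
    onehot = np.zeros((y.shape[0], num_labels))
    onehot[np.arange(y.shape[0]), y] = 1.0
    return onehot

rng = np.random.default_rng(0)
n, d, h, c = 8, 5, 4, 3  # examples, features, hidden units, classes (toy sizes)
X_demo = rng.normal(size=(n, d))
y_onehot = int_to_onehot(rng.integers(0, c, size=n), c)

W_h = rng.normal(scale=0.1, size=(h, d))
b_h = np.zeros(h)
W_out = rng.normal(scale=0.1, size=(c, h))
b_out = np.zeros(c)

def forward(X):
    a_h = sigmoid(X @ W_h.T + b_h)          # hidden activations, shape (n, h)
    a_out = sigmoid(a_h @ W_out.T + b_out)  # output activations, shape (n, c)
    return a_h, a_out

def mse(a_out, targets):
    return np.mean((a_out - targets) ** 2)

a_h, a_out = forward(X_demo)
loss_before = mse(a_out, y_onehot)

# Backward pass: apply the chain rule layer by layer
d_loss__d_a_out = 2.0 * (a_out - y_onehot) / (n * c)  # derivative of the mean
delta_out = d_loss__d_a_out * a_out * (1.0 - a_out)   # through the output sigmoid
grad_W_out = delta_out.T @ a_h
grad_b_out = delta_out.sum(axis=0)
delta_h = (delta_out @ W_out) * a_h * (1.0 - a_h)     # through the hidden sigmoid
grad_W_h = delta_h.T @ X_demo
grad_b_h = delta_h.sum(axis=0)

# One plain gradient-descent step
lr = 0.1
W_out -= lr * grad_W_out
b_out -= lr * grad_b_out
W_h -= lr * grad_W_h
b_h -= lr * grad_b_h

loss_after = mse(forward(X_demo)[1], y_onehot)
print(loss_before, loss_after)
```

If the gradients are right, a single small step on the same batch should lower the loss slightly.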


In [None]:
# Hyperparameters
NUM_EPOCHS = 50
NUM_HIDDEN_1 = 50
MINIBATCH_SIZE = 100
LEARNING_RATE = 0.1

print("Training NumPy single-layer MLP...")
model_custom_1layer = NeuralNetMLP(num_features, NUM_HIDDEN_1, num_classes)

loss_custom_1layer, acc_train_custom_1layer, acc_valid_custom_1layer = custom_train(
    model_custom_1layer,
    X_train, y_train,
    X_valid, y_valid,
    num_epochs=NUM_EPOCHS,
    minibatch_size=MINIBATCH_SIZE,
    learning_rate=LEARNING_RATE
)


### Evaluation on Test Set (Single-Layer NumPy)

After training, we evaluate the model's accuracy on the **unseen test set**.
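
The internals of `compute_ce_and_acc` are not shown in this notebook. As a rough sketch of how accuracy can be accumulated over minibatches, assume a hypothetical `predict_fn` that returns one score per class for each example:

```python
import numpy as np

def minibatch_accuracy(predict_fn, X, y, minibatch_size=100):
    """Fraction of examples whose highest-scoring class matches the label."""
    correct = 0
    for start in range(0, X.shape[0], minibatch_size):
        X_batch = X[start:start + minibatch_size]
        y_batch = y[start:start + minibatch_size]
        scores = predict_fn(X_batch)  # expected shape: (batch, num_classes)
        correct += int((np.argmax(scores, axis=1) == y_batch).sum())
    return correct / X.shape[0]

# Toy stand-in for a model: the score is highest for the class index
# closest to the single input feature.
def toy_predict(X):
    return -np.abs(X - np.arange(3))

X_toy = np.array([[0.0], [1.0], [2.0], [1.0]])
y_toy = np.array([0, 1, 2, 0])

acc = minibatch_accuracy(toy_predict, X_toy, y_toy, minibatch_size=3)
print(f"{acc * 100:.2f}%")
```

Counting correct predictions and dividing once at the end keeps the result exact even when the last minibatch is smaller than the others.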


In [None]:
_, acc_test_custom_1layer = compute_ce_and_acc(
    model_custom_1layer,
    X_test, y_test,
    custom=True,
    minibatch_size=MINIBATCH_SIZE
)

print(f"Custom NumPy Single-Layer MLP Test Accuracy: {acc_test_custom_1layer * 100:.2f}%")


## Train NumPy Two-Layer MLP

Now we train a custom neural network from scratch with **two hidden layers**, implemented using NumPy.

Key differences from the single-layer version:
- Two hidden layers with sigmoid activations
- Cross-entropy loss with softmax output
- Manual backpropagation extended for depth
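
Pairing softmax with cross-entropy simplifies the first step of backpropagation: the gradient of the mean loss with respect to the logits is `(probs - onehot) / n`. The sketch below illustrates this on two hand-written examples. The names are hypothetical and the code is not taken from `TwoLayerNeuralNetMLP`.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, y):
    # Mean negative log-probability assigned to the true class
    n = y.shape[0]
    return -np.mean(np.log(probs[np.arange(n), y]))

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
y_true = np.array([0, 1])

probs = softmax(logits)
loss = cross_entropy(probs, y_true)

# Gradient of the mean loss with respect to the logits
onehot = np.eye(logits.shape[1])[y_true]
grad_logits = (probs - onehot) / y_true.shape[0]

print(loss)
print(grad_logits)
```

From `grad_logits`, the remaining layers are handled with the same chain-rule steps as in the single-hidden-layer case, applied once more for the extra hidden layer.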


In [None]:
from models import TwoLayerNeuralNetMLP

# Hidden layer sizes
NUM_HIDDEN_1 = 50
NUM_HIDDEN_2 = 50

print("Training NumPy two-layer MLP...")
model_custom_2layer = TwoLayerNeuralNetMLP(num_features, NUM_HIDDEN_1, NUM_HIDDEN_2, num_classes)

loss_custom_2layer, acc_train_custom_2layer, acc_valid_custom_2layer = custom_train(
    model_custom_2layer,
    X_train, y_train,
    X_valid, y_valid,
    num_epochs=NUM_EPOCHS,
    minibatch_size=MINIBATCH_SIZE,
    learning_rate=LEARNING_RATE
)


### Evaluation on Test Set (Two-Layer NumPy)

We now evaluate the custom NumPy MLP with two hidden layers on the test set.


In [None]:
_, acc_test_custom_2layer = compute_ce_and_acc(
    model_custom_2layer,
    X_test, y_test,
    custom=True,
    minibatch_size=MINIBATCH_SIZE
)

print(f"Custom NumPy Two-Layer MLP Test Accuracy: {acc_test_custom_2layer * 100:.2f}%")


## Train PyTorch Single-Layer MLP

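Next, we'll train a PyTorch model with a single hidden layer so we can compare the framework's training against our custom NumPy implementation. This model is trained with cross-entropy loss rather than MSE, so the comparison with the NumPy single-layer model is not strictly like-for-like.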
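
The body of `train_torch_model` is not shown in this notebook. A conventional PyTorch minibatch loop looks roughly like the sketch below, which runs on a tiny synthetic dataset. The data, layer sizes, and variable names are hypothetical stand-ins, and the real helper may differ.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Tiny synthetic dataset (hypothetical stand-in for MNIST)
X_demo = torch.randn(64, 20)
y_demo = torch.randint(0, 3, (64,))
loader = DataLoader(TensorDataset(X_demo, y_demo), batch_size=16, shuffle=True)

# No softmax at the end: nn.CrossEntropyLoss expects raw logits
model = nn.Sequential(nn.Linear(20, 8), nn.Sigmoid(), nn.Linear(8, 3))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

epoch_losses = []
for epoch in range(5):
    model.train()
    running_loss = 0.0
    for X_batch, y_batch in loader:
        logits = model(X_batch)
        loss = loss_fn(logits, y_batch)

        optimizer.zero_grad()  # clear gradients from the previous step
        loss.backward()        # autograd computes all parameter gradients
        optimizer.step()       # SGD update

        running_loss += loss.item() * X_batch.size(0)
    epoch_losses.append(running_loss / len(loader.dataset))

print(epoch_losses)
```

Compared with the NumPy version, the backward pass is a single `loss.backward()` call, since autograd derives the gradients from the forward computation.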


In [None]:
import torch
import torch.nn as nn
from training import train_torch_model

# Define PyTorch single-layer model
model_torch_single = nn.Sequential(
    nn.Linear(num_features, NUM_HIDDEN_1),
    nn.Sigmoid(),
    nn.Linear(NUM_HIDDEN_1, num_classes)
)

# Loss and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model_torch_single.parameters(), lr=LEARNING_RATE)

# Prepare datasets
import torch.utils.data as data_utils

train_dataset_torch = data_utils.TensorDataset(torch.tensor(X_train), torch.tensor(y_train))
valid_dataset_torch = data_utils.TensorDataset(torch.tensor(X_valid), torch.tensor(y_valid))

print("Training PyTorch single-layer MLP...")
loss_torch_single, train_acc_torch_single, valid_acc_torch_single = train_torch_model(
    model_torch_single,
    train_dataset_torch,
    valid_dataset_torch,
    num_epochs=NUM_EPOCHS,
    minibatch_size=MINIBATCH_SIZE,
    loss_fn=loss_fn,
    optimizer=optimizer
)


### Evaluation on Test Set (PyTorch Single-Layer)

We now evaluate the PyTorch single-layer model on the test set.


In [None]:
_, acc_test_torch_single = compute_ce_and_acc(
    model_torch_single,
    X_test,
    y_test,
    custom=False,
    minibatch_size=MINIBATCH_SIZE
)

print(f"PyTorch Single-Layer MLP Test Accuracy: {acc_test_torch_single * 100:.2f}%")


## Train PyTorch Two-Layer MLP

Now, we'll train a PyTorch model with two hidden layers, allowing us to compare deeper architectures.


In [None]:
# Define PyTorch two-layer model
model_torch_two = nn.Sequential(
    nn.Linear(num_features, NUM_HIDDEN_1),
    nn.Sigmoid(),
    nn.Linear(NUM_HIDDEN_1, NUM_HIDDEN_2),
    nn.Sigmoid(),
    nn.Linear(NUM_HIDDEN_2, num_classes)
)

# Reuse loss_fn from the single-layer cell; create a new optimizer for this model's parameters
optimizer_two = torch.optim.SGD(model_torch_two.parameters(), lr=LEARNING_RATE)

print("Training PyTorch two-layer MLP...")
loss_torch_two, train_acc_torch_two, valid_acc_torch_two = train_torch_model(
    model_torch_two,
    train_dataset_torch,
    valid_dataset_torch,
    num_epochs=NUM_EPOCHS,
    minibatch_size=MINIBATCH_SIZE,
    loss_fn=loss_fn,
    optimizer=optimizer_two
)


### Evaluation on Test Set (PyTorch Two-Layer)

We now evaluate the PyTorch two-layer model on the test set.


In [None]:
_, acc_test_torch_two = compute_ce_and_acc(
    model_torch_two,
    X_test,
    y_test,
    custom=False,
    minibatch_size=MINIBATCH_SIZE
)

print(f"PyTorch Two-Layer MLP Test Accuracy: {acc_test_torch_two * 100:.2f}%")


## Training Curves Comparison

Let's visualize and compare the training loss and accuracy curves for:

- Custom NumPy single-layer MLP  
- PyTorch single-layer MLP  
- PyTorch two-layer MLP  


In [None]:
import matplotlib.pyplot as plt

epochs = range(1, NUM_EPOCHS + 1)

plt.figure(figsize=(14, 6))

# Plot training loss
plt.subplot(1, 2, 1)
plt.plot(epochs, loss_custom_1layer, label='Custom Single-Layer Loss')
plt.plot(epochs, loss_torch_single, label='PyTorch Single-Layer Loss')
plt.plot(epochs, loss_torch_two, label='PyTorch Two-Layer Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Training Loss Over Epochs')
plt.legend()

# Plot training accuracy
plt.subplot(1, 2, 2)
plt.plot(epochs, acc_train_custom_1layer, label='Custom Single-Layer Accuracy')
plt.plot(epochs, train_acc_torch_single, label='PyTorch Single-Layer Accuracy')
plt.plot(epochs, train_acc_torch_two, label='PyTorch Two-Layer Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Training Accuracy (%)')
plt.title('Training Accuracy Over Epochs')
plt.legend()

plt.tight_layout()
plt.show()

## Conclusion

In this notebook, we implemented and trained multilayer perceptron (MLP) models with one and two hidden layers from scratch using NumPy and compared them with their PyTorch counterparts.

Key takeaways:
- The custom NumPy implementation helps deepen understanding of the underlying math and mechanics of neural networks.
- PyTorch models offer more flexibility and efficiency, especially with automatic differentiation and GPU acceleration.
- Adding a second hidden layer increases the model's capacity. Whether it improves accuracy on MNIST should be judged from the test accuracies and training curves above.
- Visualizing training curves enables us to better understand model convergence and compare different architectures.

This hands-on approach bridges theory and practice, preparing you to build and experiment with more advanced neural networks in real-world applications.
