# Multilayer Perceptrons

<div style="display: flex; align-items: center;">
    <img src="../imgs/MLP.jpg" alt="Your Image" width="600" style="margin-right: 20px;">
    <div>
        <p>The Multi-Layer Perceptron (MLP) serves as a fundamental building block in neural network architectures, offering a straightforward yet powerful framework for various machine learning tasks. A typical MLP comprises an input layer, hidden layer(s), and output layer, all densely interconnected—a characteristic known as fully connected.</p>
        <p>MLPs excel in classification tasks, making them essential tools in machine learning. They adeptly discern patterns and relationships in complex datasets, particularly in tasks like image recognition and natural language processing. Driven by the backpropagation algorithm, MLPs iteratively adjust parameters to optimize performance on a given task.</p>
        <p>Join us as we embark on implementing an MLP from scratch, exploring neural network architecture and diving into the intricacies of training and testing. Our destination? The renowned MNIST dataset—a staple in machine learning benchmarks. Through this journey, we aim to understand the inner workings of MLPs and witness their prowess in digit recognition!</p>
    </div>
</div>


## MLP code implementation

Let's first define a simple MLP with only one hidden layer.

This multilayer perceptron (MLP) model consists of three linear layers (fully connected layers). And during the forward propagation process, data passes through each linear layer, followed by the application of the ReLU activation function to introduce non-linearity into the model. Finally, the predicted output `y_hat` is returned from the output layer.

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()
        
        self.input_fc = nn.Linear(input_size, 256)
        self.hidden_fc_1 = nn.Linear(256, 128)
        self.output_fc = nn.Linear(128, output_size)
        
    def forward(self, x):
        batch_size = x.shape[0]
        x = x.view(batch_size, -1)
        
        x = F.relu(self.input_fc(x))
        x = F.relu(self.hidden_fc_1(x))
        y_hat = self.output_fc(x)
        
        return y_hat

## Train&Test MLP on MNIST

The MNIST dataset is a widely-used benchmark dataset in the field of machine learning. It consists of a large collection of handwritten digits, ranging from 0 to 9. Each digit image is a grayscale image with a size of 28x28 pixels. MNIST is often used for training and testing various machine learning algorithms, particularly for tasks like image classification and digit recognition. Its simplicity and accessibility make it an ideal starting point for learning and experimenting with machine learning techniques.

In [2]:
import torch
import torchvision

import numpy as np
import random
import time
from torch.cuda.amp import GradScaler, autocast

### Set the Seed

In [3]:
SEED = 1234
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

### Download and allocate data

In [4]:
ROOT = '.data'
train_valid_data = torchvision.datasets.MNIST(root=ROOT, train=True, download=True)
test_data = torchvision.datasets.MNIST(root=ROOT, train=False, download=False)
VALID_RATE = 0.1
train_data, valid_data = torch.utils.data.random_split(train_valid_data, 
                                           [int(len(train_valid_data)*(1-VALID_RATE)),
                                            int(len(train_valid_data)* VALID_RATE)])

print(f"The numble of train examples:{len(train_data)}")
print(f"The numble of valid examples:{len(valid_data)}")
print(f"The numble of test examples:{len(test_data)}")

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz to .data\MNIST\raw\train-images-idx3-ubyte.gz


100%|███████████████████████████████████████████████████████████████████| 9912422/9912422 [00:02<00:00, 3305933.81it/s]


Extracting .data\MNIST\raw\train-images-idx3-ubyte.gz to .data\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz to .data\MNIST\raw\train-labels-idx1-ubyte.gz


100%|████████████████████████████████████████████████████████████████████████| 28881/28881 [00:00<00:00, 134341.61it/s]


Extracting .data\MNIST\raw\train-labels-idx1-ubyte.gz to .data\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz to .data\MNIST\raw\t10k-images-idx3-ubyte.gz


100%|███████████████████████████████████████████████████████████████████| 1648877/1648877 [00:01<00:00, 1079676.14it/s]


Extracting .data\MNIST\raw\t10k-images-idx3-ubyte.gz to .data\MNIST\raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Failed to download (trying next):
HTTP Error 403: Forbidden

Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz
Downloading https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz to .data\MNIST\raw\t10k-labels-idx1-ubyte.gz


100%|█████████████████████████████████████████████████████████████████████████| 4542/4542 [00:00<00:00, 1512667.05it/s]

Extracting .data\MNIST\raw\t10k-labels-idx1-ubyte.gz to .data\MNIST\raw

The numble of train examples:54000
The numble of valid examples:6000
The numble of test examples:10000





### Data normalization

In [5]:
mean = train_valid_data.data.float().mean()/255.0
std = train_valid_data.data.float().std()/255.0

print(f"Calculated mean:{mean}")
print(f"Calculated std:{std}")

Calculated mean:0.13066047430038452
Calculated std:0.30810779333114624


### Data augmentation

In [6]:
train_transforms = torchvision.transforms.Compose([
    torchvision.transforms.RandomRotation(5, fill=(0,)),
    torchvision.transforms.RandomCrop(28, padding=2),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=[mean], std=[std])
])
test_transforms = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=[mean], std=[std])
])

train_data.dataset.transform = train_transforms
valid_data.dataset.transform = test_transforms
test_data.transform = test_transforms

print(train_data[0][0].shape)

torch.Size([1, 28, 28])


### Load data

In [7]:
BATCH_SIZE = 256
train_dataloader = torch.utils.data.DataLoader(train_data, shuffle=True, batch_size=BATCH_SIZE)
valid_dataloader = torch.utils.data.DataLoader(valid_data, batch_size=BATCH_SIZE)
test_dataloader = torch.utils.data.DataLoader(test_data, batch_size=BATCH_SIZE)

### Load model and Start to train&valid

In [8]:
model = MLP(input_size=28*28, output_size=10)
if torch.cuda.is_available():
    model = model.cuda()

loss_fn = nn.CrossEntropyLoss()
if torch.cuda.is_available():
    loss_fn = loss_fn.cuda()

learning_rate = 0.01
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
scaler = GradScaler()

total_train_step = 0
total_valid_step = 0
epoch = 10
start_time = time.time()

print("----- Now start training -----")
for i in range(epoch):
    print(f"----- Start training with epoch {i + 1} -----")
    epoch_train_loss = 0.0

    model.train()
    for data in train_dataloader:
        inputs, targets = data
        if torch.cuda.is_available():
            inputs = inputs.cuda()
            targets = targets.cuda()
        with autocast():
            outputs = model(inputs)
            loss = loss_fn(outputs, targets)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        optimizer.zero_grad()
        scaler.update()

        total_train_step += 1
        epoch_train_loss += loss.item()

    end_time = time.time()
    avg_epoch_train_loss = epoch_train_loss / len(train_dataloader)
    print(f"This epoch training time is {end_time - start_time} seconds.")
    print(f"Training step: {total_train_step}, Average Loss: {avg_epoch_train_loss}")
    start_time = time.time()

    model.eval()
    total_valid_loss = 0
    total_accuracy = 0
    num_samples = 0
    with torch.no_grad():
        for data in valid_dataloader:
            inputs, targets = data
            if torch.cuda.is_available():
                inputs = inputs.cuda()
                targets = targets.cuda()
            outputs = model(inputs)
            loss = loss_fn(outputs, targets)
            total_valid_loss += loss.item()
            accuracy = (outputs.argmax(1) == targets).sum().item()
            total_accuracy += accuracy
            num_samples += len(inputs)

    avg_valid_loss = total_valid_loss / len(valid_dataloader)
    avg_accuracy = total_accuracy / num_samples * 100
    print(f"The total Loss on valid dataset: {avg_valid_loss}")
    print(f"The Accuracy on valid dataset: {avg_accuracy:.2f}%")
    total_valid_step += 1

----- Now start training -----
----- Start training with epoch 1 -----
This epoch training time is 8.421887397766113 seconds.
Training step: 211, Average Loss: 0.2805302936177683
The total Loss on valid dataset: 0.18387826159596443
The Accuracy on valid dataset: 94.57%
----- Start training with epoch 2 -----
This epoch training time is 9.451736450195312 seconds.
Training step: 422, Average Loss: 0.12983831309523627
The total Loss on valid dataset: 0.1489059990271926
The Accuracy on valid dataset: 95.53%
----- Start training with epoch 3 -----
This epoch training time is 9.430496215820312 seconds.
Training step: 633, Average Loss: 0.1062478459400447
The total Loss on valid dataset: 0.1367642773936192
The Accuracy on valid dataset: 96.10%
----- Start training with epoch 4 -----
This epoch training time is 9.696455717086792 seconds.
Training step: 844, Average Loss: 0.08660069806280576
The total Loss on valid dataset: 0.14670784818008542
The Accuracy on valid dataset: 96.02%
----- Start t

### Go testing!

In [9]:
model.eval()

total_test_loss = 0
total_test_accuracy = 0
num_samples = 0
with torch.no_grad():
    for data in test_dataloader:
        inputs, targets = data
        if torch.cuda.is_available():
            inputs = inputs.cuda()
            targets = targets.cuda()
        outputs = model(inputs)
        loss = loss_fn(outputs, targets)
        total_test_loss += loss.item()
        accuracy = (outputs.argmax(1) == targets).sum().item()
        total_test_accuracy += accuracy
        num_samples += len(inputs)

avg_test_loss = total_test_loss / len(test_dataloader)
avg_test_accuracy = total_test_accuracy / num_samples * 100

print("----- Test Results -----")
print(f"The total Loss on test dataset: {avg_test_loss}")
print(f"The Accuracy on test dataset: {avg_test_accuracy:.2f}%") 

----- Test Results -----
The total Loss on test dataset: 0.15674970799518634
The Accuracy on test dataset: 96.86%
