#### Project: A multilayer perceptron for multiclass classification with Batch Normalization

Batch Normalization: This is a technique that normalizes the inputs to each layer's activation function over the current mini-batch in order to speed up learning. We normalize by subtracting the mini-batch mean from the data and dividing by the mini-batch standard deviation.
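
The normalization described above can be checked by hand. The sketch below is illustrative only: the batch size, feature count and `eps` value are assumptions chosen for the example, not values taken from this project. It normalizes a synthetic mini-batch per feature and compares the result with `nn.BatchNorm1d` in training mode, whose learnable scale and shift start at 1 and 0.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic mini-batch: 8 examples, 4 features (illustrative sizes only)
x = torch.randn(8, 4) * 3.0 + 5.0

# Manual batch normalization: per-feature statistics over the batch dimension
eps = 1e-5
mean = x.mean(dim=0, keepdim=True)
var = x.var(dim=0, unbiased=False, keepdim=True)  # biased variance over the batch
x_hat = (x - mean) / torch.sqrt(var + eps)

# Reference: nn.BatchNorm1d in training mode (scale = 1, shift = 0 at initialization)
bn = nn.BatchNorm1d(4, eps=eps)
bn.train()
reference = bn(x)

max_diff = (x_hat - reference).abs().max().item()
print("max difference between manual and nn.BatchNorm1d:", max_diff)
```

The small `eps` term keeps the division stable when a feature has almost no variance within the batch.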


#### Packages selection
- First, import all the packages needed for this project

In [3]:
import time
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets
from torchvision import transforms
from torch.utils.data import DataLoader

# make cuDNN use deterministic algorithms when CUDA is available (for reproducibility)
if torch.cuda.is_available():
    torch.backends.cudnn.deterministic = True

#### Settings
- Configure the device
- Define the hyperparameters, which can be tuned to achieve better accuracy
- Load and explore the data

In [6]:
# device: uses the GPU with index 3 when CUDA is available; adjust the index to match your machine
device = torch.device("cuda:3" if torch.cuda.is_available() else "cpu")

# hyperparameters
learning_rate = 0.001
random_seed = 1
num_epochs = 10
batch_size = 64


# model architecture parameters
num_features = 784
num_hidden_1 = 128
num_hidden_2 = 256
num_classes = 10


# dataset => MNIST
# Note: transforms.ToTensor() scales images to the 0-1 range

train_dataset = datasets.MNIST(root='data',
                              train=True,
                              transform=transforms.ToTensor(),
                              download=True)

test_dataset = datasets.MNIST(root='data',
                             train=False,
                             transform=transforms.ToTensor())

train_loader = DataLoader(dataset=train_dataset,
                         batch_size=batch_size,
                         shuffle=True)

test_loader = DataLoader(dataset=test_dataset,
                        batch_size=batch_size,
                        shuffle=False)

# check the dataset
for images, labels in train_loader:
    print("Image batch dimension", images.shape)
    print("Image label dimension", labels.shape)
    break

Image batch dimension torch.Size([64, 1, 28, 28])
Image label dimension torch.Size([64])
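
The batch shape printed above explains why `num_features` is 784: each 1 x 28 x 28 image is flattened into one vector before it reaches the first linear layer. A minimal sketch, using a tensor of zeros as a stand-in for a real MNIST batch:

```python
import torch

# Stand-in for one MNIST batch: 64 grayscale images of 28x28 pixels
images = torch.zeros(64, 1, 28, 28)

# Collapse channel, height and width into a single feature axis of length 784
flat = images.view(-1, 28 * 28)
print(flat.shape)  # torch.Size([64, 784])
```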


#### Define the architecture of the model
- The number of input units is determined by the number of features in the data
- The number of hidden layers and the number of hidden units in each layer are chosen iteratively
- The number of output units is determined by the intended outcome; here, one unit per class
- Here we build a 3-layer multilayer perceptron, i.e. 2 hidden layers and 1 output layer
- Note: We don't count the input layer as one of the layers.

In [8]:
"""
Architecture
X -> Linear -> BatchNorm -> ReLU -> Linear -> BatchNorm -> ReLU -> Linear -> Softmax layer -> y
"""

class MultilayerPerceptron(nn.Module):
    def __init__(self, num_features, num_classes):
        super(MultilayerPerceptron, self).__init__()
        
        # 1st hidden layer
        self.linear_1 = nn.Linear(num_features, num_hidden_1)
        # Normalize this layer's output before applying the activation function
        self.linear_1_bn = nn.BatchNorm1d(num_hidden_1)
        
        # 2nd hidden layer
        self.linear_2 = nn.Linear(num_hidden_1, num_hidden_2)
        # Normalize this layer's output before applying the activation function
        self.linear_2_bn = nn.BatchNorm1d(num_hidden_2)
        
        # output layer
        self.linear_out = nn.Linear(num_hidden_2, num_classes)
        
    def forward(self, x):
        out = self.linear_1(x)
        out = self.linear_1_bn(out)
        out = F.relu(out)
        
        out = self.linear_2(out)
        out = self.linear_2_bn(out)
        out = F.relu(out)
        
        outputs = self.linear_out(out)
        probas = F.softmax(outputs, dim=1)
        return outputs, probas
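
As a quick sanity check of the architecture, the sketch below pushes a random batch through a stand-in network with the same layer sizes. `sketch_model` is a hypothetical `nn.Sequential` equivalent built only for this example, not the class defined above. It shows the shape of the raw scores, confirms that the softmax rows sum to 1, and counts the trainable parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-in mirroring the layer sizes 784 -> 128 -> 256 -> 10
sketch_model = nn.Sequential(
    nn.Linear(784, 128), nn.BatchNorm1d(128), nn.ReLU(),
    nn.Linear(128, 256), nn.BatchNorm1d(256), nn.ReLU(),
    nn.Linear(256, 10),
)

x = torch.rand(5, 784)              # 5 fake flattened images with values in [0, 1)
logits = sketch_model(x)            # raw class scores, shape [5, 10]
probas = F.softmax(logits, dim=1)   # each row becomes a probability distribution

print(logits.shape)
print(probas.sum(dim=1))

# Trainable parameters: weights and biases of the linear layers plus the
# scale and shift of each BatchNorm layer
num_params = sum(p.numel() for p in sketch_model.parameters())
print(num_params)
```

In training mode BatchNorm needs more than one example per batch, which is why the fake batch holds 5 rows.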

#### Loss function and optimizer
- Instantiate the model
- Define the loss function to use, e.g. cross entropy or MSE loss
- Define the optimization algorithm to use, e.g. SGD, Adam, RMSprop or Momentum

In [9]:
torch.manual_seed(random_seed)
model = MultilayerPerceptron(num_features, num_classes).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
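
The loss used in the training loop of this notebook is `F.cross_entropy`, which expects raw scores (logits) rather than softmax probabilities. That is why the model's forward pass returns both. The sketch below uses made-up logits and labels to show that `F.cross_entropy` matches a log-softmax followed by the negative log-likelihood loss.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # made-up raw scores for 4 examples
targets = torch.tensor([3, 0, 9, 1])   # made-up class labels

# Built-in loss: takes raw logits, not probabilities
loss_builtin = F.cross_entropy(logits, targets)

# The same quantity written out: log-softmax, then negative log-likelihood
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(loss_builtin.item(), loss_manual.item())
```

Passing already-softmaxed probabilities to `F.cross_entropy` would apply the softmax twice and distort the loss.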

#### Compute accuracy
- A function to compute train and test accuracy

In [16]:
def compute_accuracy(model, data_loader):
    model.eval()
    correct_predictions, num_examples = 0, 0
    with torch.no_grad():
        for features, labels in data_loader:
            features = features.view(-1, 28*28).to(device)
            labels = labels.to(device)
            outputs, probas = model(features)
            _, predicted_labels = torch.max(probas, 1)
            num_examples += labels.size(0)
            correct_predictions += (predicted_labels == labels).sum()
        return correct_predictions.float() / num_examples * 100
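
The call to `model.eval()` matters for a network with Batch Normalization: in evaluation mode the layer normalizes with its stored running statistics instead of the statistics of the current batch, and that mode persists until `train()` is called again. A minimal sketch on a single `nn.BatchNorm1d` layer, with illustrative sizes and data:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(3)
x = torch.randn(16, 3) * 2.0 + 4.0   # synthetic batch with non-zero mean

bn.train()
out_train = bn(x)   # normalized with this batch's statistics; running stats are updated

bn.eval()
out_eval = bn(x)    # normalized with the stored running mean and variance

print(out_train.mean(dim=0))  # close to zero for every feature
print(out_eval.mean(dim=0))   # generally not zero after a single update
print(bn.training)            # stays False until bn.train() is called
```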

#### Training a model requires the following steps
- Reset all the gradients to zero (0)
- Make a forward pass (make a prediction)
- Calculate the loss
- Perform back propagation
- Update all the parameters (weights and biases)
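
The five steps above can be sketched on a toy problem. Everything in this example is hypothetical and chosen only to keep it small: the synthetic data, the single linear layer `toy_model`, plain SGD and the learning rate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy problem (hypothetical): 32 examples, 4 features, 3 classes
features = torch.randn(32, 4)
targets = torch.randint(0, 3, (32,))

toy_model = nn.Linear(4, 3)
toy_optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.1)

losses = []
for step in range(20):
    toy_optimizer.zero_grad()                  # 1. reset the gradients to zero
    logits = toy_model(features)               # 2. forward pass (prediction)
    loss = F.cross_entropy(logits, targets)    # 3. calculate the loss
    loss.backward()                            # 4. backpropagation
    toy_optimizer.step()                       # 5. update weights and biases
    losses.append(loss.item())

print("first loss: %.4f, last loss: %.4f" % (losses[0], losses[-1]))
```

The gradients can be reset at any point before `backward()` is called; what matters is that gradients from the previous step do not accumulate into the current one.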

In [18]:
start_time = time.time()
total_step = len(train_loader)
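for epoch in range(num_epochs):
    # compute_accuracy() leaves the model in eval mode, so switch back to
    # training mode so that BatchNorm uses batch statistics again
    model.train()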
    for i, (images, labels) in enumerate(train_loader):
        images = images.view(-1, 28*28).to(device)
        labels = labels.to(device)
        
        # Forward and Back pass
        outputs, probas = model(images)
        loss = F.cross_entropy(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        # Logging
        if not i % 50:
            print('Epoch %03d/%03d | Batch: %03d/%03d | loss: %.4f'
                 %(epoch+1, num_epochs, i, total_step, loss))
    print('Epoch: %03d/%03d training accuracy: %.2f%%' %(
    epoch+1, num_epochs, compute_accuracy(model, train_loader)))
    print('Time elapsed: %.2f min ' % ((time.time() - start_time) / 60))
print('Total training time: %.2f min' % ((time.time() - start_time) / 60))

Epoch 001/010 | Batch: 000/938 | loss: 0.0657
Epoch 001/010 | Batch: 050/938 | loss: 0.0457
Epoch 001/010 | Batch: 100/938 | loss: 0.0015
Epoch 001/010 | Batch: 150/938 | loss: 0.0003
Epoch 001/010 | Batch: 200/938 | loss: 0.0002
Epoch 001/010 | Batch: 250/938 | loss: 0.0003
Epoch 001/010 | Batch: 300/938 | loss: 0.0071
Epoch 001/010 | Batch: 350/938 | loss: 0.0136
Epoch 001/010 | Batch: 400/938 | loss: 0.0047
Epoch 001/010 | Batch: 450/938 | loss: 0.0001
Epoch 001/010 | Batch: 500/938 | loss: 0.0037
Epoch 001/010 | Batch: 550/938 | loss: 0.0001
Epoch 001/010 | Batch: 600/938 | loss: 0.0243
Epoch 001/010 | Batch: 650/938 | loss: 0.0019
Epoch 001/010 | Batch: 700/938 | loss: 0.0034
Epoch 001/010 | Batch: 750/938 | loss: 0.0109
Epoch 001/010 | Batch: 800/938 | loss: 0.0596
Epoch 001/010 | Batch: 850/938 | loss: 0.1468
Epoch 001/010 | Batch: 900/938 | loss: 0.0168
Epoch: 001/010 training accuracy: 99.58%
Time elapsed: 0.38 min 
Epoch 002/010 | Batch: 000/938 | loss: 0.0309
Epoch 002/010 |

Epoch 009/010 | Batch: 750/938 | loss: 0.0152
Epoch 009/010 | Batch: 800/938 | loss: 0.0002
Epoch 009/010 | Batch: 850/938 | loss: 0.0019
Epoch 009/010 | Batch: 900/938 | loss: 0.0001
Epoch: 009/010 training accuracy: 99.69%
Time elapsed: 3.82 min 
Epoch 010/010 | Batch: 000/938 | loss: 0.0000
Epoch 010/010 | Batch: 050/938 | loss: 0.0027
Epoch 010/010 | Batch: 100/938 | loss: 0.0000
Epoch 010/010 | Batch: 150/938 | loss: 0.0503
Epoch 010/010 | Batch: 200/938 | loss: 0.0000
Epoch 010/010 | Batch: 250/938 | loss: 0.0031
Epoch 010/010 | Batch: 300/938 | loss: 0.0058
Epoch 010/010 | Batch: 350/938 | loss: 0.0000
Epoch 010/010 | Batch: 400/938 | loss: 0.0016
Epoch 010/010 | Batch: 450/938 | loss: 0.0000
Epoch 010/010 | Batch: 500/938 | loss: 0.0065
Epoch 010/010 | Batch: 550/938 | loss: 0.0000
Epoch 010/010 | Batch: 600/938 | loss: 0.0005
Epoch 010/010 | Batch: 650/938 | loss: 0.0001
Epoch 010/010 | Batch: 700/938 | loss: 0.0139
Epoch 010/010 | Batch: 750/938 | loss: 0.0082
Epoch 010/010 |

#### Testing/Evaluation

In [20]:
# Print the accuracy
print("Test Accuracy:  %.2f%%" %(compute_accuracy(model, test_loader)))

Test Accuracy:  98.02%
