<img src="https://futurejobs.my/wp-content/uploads/2021/05/d-min-1024x297.png" width="300"> </img>

> **Copyright &copy; 2021 Skymind Education Group Sdn. Bhd.**<br>
 <br>
This program and the accompanying materials are made available under the
terms of the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0). \
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations
under the License. <br>
<br>**SPDX-License-Identifier: Apache-2.0** 

# MNIST - MultiLayer Perceptron
Authored by : [Nazurah Kamil](mailto:nazurah.kamil@skymind.my)

In this notebook, we will apply a <a href=https://en.wikipedia.org/wiki/Multilayer_perceptron>multilayer perceptron</a>
to classify the digits 0 to 9 using only linear layers. The <a href=https://en.wikipedia.org/wiki/MNIST_database>MNIST
dataset</a> will be used throughout this hands-on. <br>

At the end of the notebook you will be able to:
* Use the MNIST dataset.
* Classify the digits in the MNIST dataset.
* Work with a multilayer perceptron that uses linear layers only.

Let's import the libraries that we want to use.

In [None]:
import numpy as np
import os

import torch
import torchvision
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torchvision import transforms, datasets
from torchvision.utils import make_grid
from torch.utils.data import DataLoader

import time
from torch.utils.tensorboard import SummaryWriter
%load_ext tensorboard
import matplotlib.pyplot as plt
%matplotlib inline

The first step is to download the dataset and apply a transformation to it. Here, `ToTensor()` converts each image into a tensor.

In [None]:
# Downloading and loading data... download may take some time
train_set = torchvision.datasets.MNIST(
    root='../datasets',
    train=True,
    download=True,
    transform=transforms.Compose([transforms.ToTensor()])
)
test_set = torchvision.datasets.MNIST(
    root='../datasets',
    train=False,
    download=True,
    transform=transforms.Compose([transforms.ToTensor()])
)

In [None]:
# Show the first result of training set
print(train_set[0])

The first record is a two-item tuple: the first item is the image, and the second item is its label. Here, the first label is 5. Next, let's see the shape of the image in our first record.

In [None]:
# Show the size and label for our first record data
image, label = train_set[0]
print("Shape : {}, Label : {}".format(image.shape, label))

In [None]:
# Show the first image of trainset
plt.imshow(train_set[0][0].reshape(28, 28), cmap='gray')
plt.show()

# DataLoader

Use `DataLoader` to load the dataset so that we can iterate over it in batches. The batch size is configured here too.

In [None]:
torch.manual_seed(123)  # For consistent results
train_loader = DataLoader(dataset=train_set, batch_size=100, shuffle=True)
test_loader = DataLoader(dataset=test_set, batch_size=100, shuffle=False)

# Put loader into dictionary
dataloaders = {'train': train_loader, 'test': test_loader}
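
As a rough illustration of what `DataLoader` does with `batch_size`, the sketch below wraps a small synthetic dataset and inspects the batches it yields. The random tensors only stand in for MNIST, and the names `fake_images`, `fake_labels`, `demo_set`, and `demo_loader` are made up for this example.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for MNIST: 250 grayscale "images" of size 1x28x28
fake_images = torch.rand(250, 1, 28, 28)
fake_labels = torch.randint(0, 10, (250,))
demo_set = TensorDataset(fake_images, fake_labels)

demo_loader = DataLoader(dataset=demo_set, batch_size=100, shuffle=False)

# len(loader) is the number of batches: ceil(250 / 100) = 3
num_batches = len(demo_loader)
# The last batch is smaller because 250 is not a multiple of 100
batch_sizes = [X.size(0) for X, y in demo_loader]

print(num_batches)   # 3
print(batch_sizes)   # [100, 100, 50]
```

By the same arithmetic, the 60,000 MNIST training images with `batch_size=100` give 600 batches per epoch.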

It's important to have a balanced dataset so that the model does not become biased toward the most frequent classes during training.

In [None]:
# Count each label
total = 0
count_dict = {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0}

for data in train_loader:
    image, label = data
    for y in label:
        count_dict[int(y)] += 1
        total += 1

print(count_dict)

In [None]:
# Percentage of each label
for i in count_dict:
    print("{} : {:.2f} % ".format(i, count_dict[i]/total*100))
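
The same per-label counts can probably be obtained without a Python loop. The sketch below uses `torch.bincount` on a short made-up label tensor (`demo_labels`). In the notebook, the labels could be taken from `train_set.targets`, assuming the installed torchvision version exposes that attribute.

```python
import torch

# Made-up labels for illustration; not the real MNIST labels
demo_labels = torch.tensor([5, 0, 4, 1, 9, 2, 1, 3, 1, 4])

# Count the occurrences of each class 0..9 in one call
counts = torch.bincount(demo_labels, minlength=10)
percentages = counts.float() / counts.sum() * 100

print(counts.tolist())  # [1, 3, 1, 1, 2, 1, 0, 0, 0, 1]
for digit, pct in enumerate(percentages.tolist()):
    print("{} : {:.2f} % ".format(digit, pct))
```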

It seems that our dataset is quite balanced. Next, let's view the first 10 images from `train_loader`.

In [None]:
np.set_printoptions(formatter={'int': (lambda x: f'{x:4}')})  # Widen the array

# Grab the first batch of images
for images, labels in train_loader:
    break

# Print the first 10 labels
print('Labels: ', labels[:10].numpy())

# Print the first 10 images
im = make_grid(images[:10], nrow=10)
plt.figure(figsize=(10, 6))
plt.imshow(np.transpose(im, (1, 2, 0)))  # transpose from CHW to HWC
plt.show()

# Model Development

**Set Up Hyperparameters**<br>
Our images are 28x28 pixels, so each one must be flattened into a vector of 784 (28x28) values to fit into the model. This
784 is the number of inputs to the model. The number of outputs is 10 because we want to classify 10 labels
(0, 1, 2, 3, 4, 5, 6, 7, 8, 9). <br>
The other hyperparameters required are the number of epochs and the learning rate.

In [None]:
# Set up hyperparameter
epochs = 10
num_input = 784  # 28x28
num_output = 10
lr_rate = 0.001

We would normally use log_softmax as the activation function in the output layer. This part is **skipped** because we will be
using Cross-Entropy Loss as our loss function. As stated in the <a href=https://pytorch.org/docs/stable/nn.html#crossentropyloss>
`torch.nn.CrossEntropyLoss()`</a> documentation, this ***criterion*** combines `nn.LogSoftmax()` and `nn.NLLLoss()`
in one single class, so the model should return raw scores (logits).
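
To check this equivalence numerically, here is a small sketch on made-up logits and labels (`logits` and `targets` are illustrative values, not outputs of the model in this notebook). It compares `nn.CrossEntropyLoss` on raw scores with an explicit `log_softmax` followed by `nn.NLLLoss`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # raw scores for 4 samples and 10 classes
targets = torch.tensor([3, 7, 0, 9])   # made-up class labels

# Option 1: CrossEntropyLoss applied directly to the raw scores
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# Option 2: explicit log_softmax followed by NLLLoss
loss_nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(loss_ce, loss_nll))  # True
```

Applying `log_softmax` inside the model and then using `nn.CrossEntropyLoss` would apply the log-softmax twice, which is why the output layer stays linear here.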

In [None]:
class MultiLayerPerceptron(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(num_input, 164)
        self.fc2 = nn.Linear(164, 100)
        self.fc3 = nn.Linear(100, num_output)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
        # Apply log_softmax in output layer (skip this part)
        # return F.log_softmax(x, dim = 1)

In [None]:
# Initialize model
model = MultiLayerPerceptron()
model

# Model Parameters

The following layers make up our model:
1. nn.Linear(784,164)
2. nn.ReLU()
3. nn.Linear(164,100)
4. nn.ReLU()
5. nn.Linear(100,10)

In [None]:
# Calculate total model parameters
total_params = 0  # Avoid shadowing the built-in sum()
for param in model.parameters():
    item = param.numel()
    print(f'{item:>6}')
    total_params += item
print(f'-------\n{total_params}')

The sum of the model parameters is **146250**. Here you can see that a fully connected model uses a **large number of parameters**, whereas a CNN
shares its weights across the image and typically needs fewer parameters, which reduces computation.
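
The total can be worked out by hand: each linear layer has one weight per input-output connection plus one bias per output unit. The sketch below repeats that arithmetic in plain Python, with the layer sizes copied from the model above (`layer_sizes` is a helper name introduced only for this example).

```python
# Layer sizes taken from the model above: 784 -> 164 -> 100 -> 10
layer_sizes = [784, 164, 100, 10]

total_params = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    weights = n_in * n_out   # one weight per input-output connection
    biases = n_out           # one bias per output unit
    print(f'Linear({n_in}, {n_out}): {weights} weights + {biases} biases')
    total_params += weights + biases

print(total_params)  # 146250
```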

# Flatten Image
Let's look at a batch tensor from our training data. This batch tensor has a shape of [100, 1, 28, 28]. In order to apply our
model to the data, use `.view()` to flatten it into [100, 784].

In [None]:
# Load first batch and print shape
for images, labels in train_loader:
    print('Batch shape (before flatten):', images.size(), labels.shape)
    break

# Flatten image
print('After flatten image :', images.view(-1, 784).size())

Take note that we need to **flatten the images** before feeding the data into the model.
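
For reference, `.view(-1, 784)` is not the only way to flatten a batch. The sketch below, run on a synthetic batch rather than real MNIST data, suggests that `torch.flatten` and `nn.Flatten` give the same result. The notebook itself keeps using `.view()`.

```python
import torch
import torch.nn as nn

batch = torch.rand(100, 1, 28, 28)   # synthetic batch shaped like MNIST

flat_view = batch.view(-1, 784)               # as used in this notebook
flat_fn = torch.flatten(batch, start_dim=1)   # flattens everything except the batch dimension
flat_layer = nn.Flatten()(batch)              # module form, start_dim defaults to 1

print(flat_view.shape)                    # torch.Size([100, 784])
print(torch.equal(flat_view, flat_fn))    # True
print(torch.equal(flat_view, flat_layer)) # True
```

Using `-1` for the first dimension lets the same line work even when a batch holds fewer than 100 images.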

# Start the Training

Set up loss function and optimizer.

In [None]:
# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr_rate)

In [None]:
writer = SummaryWriter('../run/MLP_MNIST')

During training, the test score is calculated after each epoch so that the loss curves for the train and test data
can be compared.

In [None]:
torch.manual_seed(0)
start_time = time.time()  # Start time

# Implement model training and validation loop
loss_score = {'train': [], 'test': []}
accuracy_score = {'train': [], 'test': []}

for epoch in range(1, epochs+1):
    print(f'Epoch {epoch}\n--------')

    for loader in ['train', 'test']:

        running_loss = 0.0
        running_size = 0
        correct = 0
        log_interval = 100

        if loader == 'train':
            model.train()
        else:
            model.eval()

        for iter, (X, y) in enumerate(dataloaders[loader]):
            iter += 1
            # Set gradient calculation on / off
            with torch.set_grad_enabled(loader == 'train'):
                output = model(X.view(-1, 784))  # Flatten the image here
                loss = criterion(output, y)

                # Calculate loss
                running_loss += loss.item() * output.size(0)
                running_size += output.size(0)

                # Calculate accuracy
                predict = torch.max(output, 1)[1]
                correct += (predict == y).sum().item()

                if loader == 'train':
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()
                    # Print every 100 iteration
                    if iter == 1 or (iter % log_interval) == 0:
                        print('Iteration:{} Loss:{:.6} Accuracy:{:.6} Batch size:[{}/{}]'.format(
                            int(iter),
                            running_loss/running_size,
                            (100*correct)/running_size,
                            running_size,
                            len(train_set)
                        ))

        # Accuracy and loss per epoch
        accuracy = (100*correct) / running_size
        loss_per_epoch = running_loss / running_size

        print('\n{} Loss:{}'.format(loader.capitalize(), loss_per_epoch))
        print('{} Accuracy:{}'.format(loader.capitalize(), accuracy))

        loss_score[loader].append(loss_per_epoch)
        accuracy_score[loader].append(accuracy)

        writer.add_scalars('Losses', {loader: loss_per_epoch}, epoch)
        writer.add_scalars('Accuracy', {loader: accuracy}, epoch)
    print('***************\n')

# Print the time elapsed
print(f'\nDuration: {time.time() - start_time:.0f} seconds')
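
The loop above multiplies each batch loss by the batch size before summing. The plain-Python sketch below, using made-up per-sample losses, illustrates why: the criterion returns a batch mean by default, so weighting by batch size recovers the true per-sample average when batches differ in size.

```python
# Made-up per-sample losses, split into uneven batches (3, 3, and 1 samples)
batches = [[0.9, 0.6, 0.3], [0.4, 0.4, 0.1], [2.0]]

running_loss = 0.0
running_size = 0
for batch in batches:
    batch_mean = sum(batch) / len(batch)      # what the criterion returns by default
    running_loss += batch_mean * len(batch)   # undo the mean to get the batch total
    running_size += len(batch)

weighted = running_loss / running_size                          # per-sample average
naive = sum(sum(b) / len(b) for b in batches) / len(batches)    # mean of batch means

print(f'{weighted:.4f} vs {naive:.4f}')
```

With MNIST and a batch size of 100, every batch is the same size, so both averages coincide. The weighting only matters when the last batch is smaller.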

In [None]:
# Visualize loss
fig, ax = plt.subplots()
fig.set_size_inches(14, 7)
ax.set_title("Loss Score against Epoch")
ax.grid(True)
ax.set_xlabel("Epoch Number")
ax.set_ylabel("Loss Score")

ax.plot(loss_score['train'], color='goldenrod', label='Training Loss')
ax.plot(loss_score['test'], color='green', label='Test Loss')
ax.legend()

In [None]:
# Visualize accuracy
fig, ax = plt.subplots()
fig.set_size_inches(14, 7)
ax.set_title("Accuracy against Epoch")
ax.grid(True)
ax.set_xlabel("Epoch Number")
ax.set_ylabel("Accuracy")

ax.plot(accuracy_score['train'], color='goldenrod', label='Training Accuracy')
ax.plot(accuracy_score['test'], color='green', label='Test Accuracy')
ax.legend()

From the graph above, we can see that **this model is overfitting the training data**.

In [None]:
%tensorboard --logdir ../run/MLP_MNIST --port 6009

# Save Model

This step will save the learned parameters of our model.

In [None]:
os.makedirs("../generated_model", exist_ok=True)  # Create the folder if it does not exist
torch.save(model.state_dict(), '../generated_model/mlp_MNIST.pt')  # Save the learned parameters
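
As a minimal round-trip sketch of this save step, the example below saves the `state_dict` of a tiny stand-in model to a temporary folder and loads it back. `tiny_model`, `restored`, and the file name are hypothetical and unrelated to the MLP trained above.

```python
import os
import tempfile

import torch
import torch.nn as nn

torch.manual_seed(0)
tiny_model = nn.Linear(4, 2)   # hypothetical stand-in for the MLP
sample = torch.rand(3, 4)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'tiny_model.pt')
    torch.save(tiny_model.state_dict(), path)   # saves the parameters only, not the class

    restored = nn.Linear(4, 2)                  # the architecture must be rebuilt first
    # map_location='cpu' lets a checkpoint saved on a GPU load on a CPU-only machine
    restored.load_state_dict(torch.load(path, map_location='cpu'))
    restored.eval()

with torch.no_grad():
    same_output = torch.equal(tiny_model(sample), restored(sample))

print(same_output)  # True
```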

# Load Model

Here, we load the saved parameters into a new model object and evaluate it on our test set to make sure it works.

In [None]:
new_model = MultiLayerPerceptron()
new_model.load_state_dict(torch.load('../generated_model/mlp_MNIST.pt'))
new_model.eval()

In [None]:
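correct = 0
total_loss = 0.0
with torch.no_grad():
    for X, y in test_loader:
        output = new_model(X.view(-1, 784))
        # Loss: weight each batch's mean loss by its batch size
        total_loss += criterion(output, y).item() * y.size(0)

        # Accuracy
        predict = torch.max(output, 1)[1]
        correct += (predict == y).sum().item()

# Average over the whole test set instead of reporting only the last batch
num_samples = len(test_loader.dataset)
loss = total_loss / num_samples
accuracy = correct / num_samples
print(f'Loss : {loss:.4f} Accuracy :{accuracy*100:.4f}')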

# Test on a Single Data

We want to check whether our model can predict the first data point of the test set. The cell below shows that the actual label of the first
data point in the test set is 7.

In [None]:
image, label = test_set[0]
print("Shape : {}, Label : {}".format(image.shape, label))

In [None]:
for image, label in test_loader:

    output = new_model(image.view(-1, 784))
    predict = torch.max(output, 1)[1]

    print(f'Prediction: {predict[0]}')

    print(f'Label size: {label.size()}')

    print(f'Actual label: {label[0]}')
    break

Looks like our model predicted correctly. Next, we will use a GPU for faster computation.

# (Optional) - Using GPU

***When do we need GPU?***<br>
These are common situations that we might face when training a deep learning model:
* Large dataset
* Large model that contains millions of parameters
* Complex model<br>

Any of the situations above can slow down training. So, we can leverage a GPU to shorten the training time.

***Important Note :***<br><br>
You will need to install cudatoolkit and a compatible cudnn on your laptop.
In this course, we will be using cuda version 10.2 and cudnn version 7.6.5.
Follow this <a href=https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#install-windows>link</a> if you want to install cuda on your laptop. 

 ***The steps below are skipped because cuda is already installed in our environment***
* **cudatoolkit** - Please download pytorch version that includes "cudatoolkit" at <a href=https://pytorch.org/get-started/locally/>pytorch.org</a>.<br>
* **cudnn version** - The next step is to find a cudnn version that is compatible with your laptop/computer.
Please refer to <a href=https://www.programmersought.com/article/20033556147/>this page</a> for more info.<br>
* **cuda-pytorch** - If you want to know more about cuda in pytorch, click <a href=https://pytorch.org/docs/stable/notes/cuda.html>here</a>.

First, we want to know if we have CUDA available on our laptop or desktop.

In [None]:
torch.cuda.is_available()

Two things must be placed on the GPU:
1. Model
2. Tensor
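
A common device-agnostic way to handle both items is sketched below. It is not the exact code used in the cells that follow, and `demo_model` and `X` are made-up stand-ins. It falls back to the CPU when CUDA is unavailable, so it should run on any machine.

```python
import torch
import torch.nn as nn

# Fall back to the CPU when CUDA is not available
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

demo_model = nn.Linear(784, 10).to(device)   # 1. move the model's parameters
X = torch.rand(5, 784).to(device)            # 2. move the input tensor

# Model and input now live on the same device, so the forward pass works
output = demo_model(X)
print(output.device)
```

If the model and the tensor sit on different devices, PyTorch raises a runtime error, which is why both must be moved.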

In [None]:
# Get the ID of the default device (only valid when CUDA is available)
if torch.cuda.is_available():
    print(torch.cuda.current_device())

In [None]:
# Move our model to the GPU
model_gpu = MultiLayerPerceptron()

if torch.cuda.is_available():
    device = torch.device("cuda:0")
    model_gpu.to(device)

In [None]:
# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model_gpu.parameters(), lr=0.001)

In [None]:
torch.manual_seed(0)
start_time = time.time()  # Start time

# Implement model training and validation loop
loss_score = {'train': [], 'test': []}
accuracy_score = {'train': [], 'test': []}

for epoch in range(1, epochs+1):
    print(f'Epoch {epoch}\n--------')

    for loader in ['train', 'test']:

        running_loss = 0.0
        running_size = 0
        correct = 0
        log_interval = 100

        if loader == 'train':
            model_gpu.train()
        else:
            model_gpu.eval()

        for iter, (X, y) in enumerate(dataloaders[loader], start=1):  # Count iterations from 1

            # Move the batch to the GPU
            if torch.cuda.is_available():
                X = X.to(device)
                y = y.to(device)

            # Set gradient calculation on / off
            with torch.set_grad_enabled(loader == 'train'):
                output = model_gpu(X.view(-1, 784))  # Flatten the image here
                loss = criterion(output, y)

                # Calculate loss
                running_loss += loss.item() * output.size(0)
                running_size += output.size(0)

                # Calculate accuracy
                predict = torch.max(output, 1)[1]
                correct += (predict == y).sum().item()

                if loader == 'train':
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()
                    # Print every 100 iteration
                    if iter == 1 or (iter % log_interval) == 0:
                        print('Iteration:{} Loss:{:.6} Accuracy:{:.6} Batch size:[{}/{}]'.format(
                            int(iter),
                            running_loss/running_size,
                            (100*correct)/running_size,
                            running_size,
                            len(train_set)
                        ))

        # Accuracy and loss per epoch
        accuracy = (100*correct) / running_size
        loss_per_epoch = running_loss / running_size

        print('\n{} Loss:{}'.format(loader.capitalize(), loss_per_epoch))
        print('{} Accuracy:{}'.format(loader.capitalize(), accuracy))

        loss_score[loader].append(loss_per_epoch)
        accuracy_score[loader].append(accuracy)

    print('***************\n')

# Print the time elapsed
print(f'\nDuration: {time.time() - start_time:.0f} seconds')

In [None]:
# Visualize loss
fig, ax = plt.subplots()
fig.set_size_inches(14, 7)
ax.set_title("Loss Score against Epoch")
ax.grid(True)
ax.set_xlabel("Epoch Number")
ax.set_ylabel("Loss Score")

ax.plot(loss_score['train'], color='goldenrod', label='Training Loss')
ax.plot(loss_score['test'], color='green', label='Test Loss')
ax.legend()

In [None]:
# Visualize accuracy
fig, ax = plt.subplots()
fig.set_size_inches(14, 7)
ax.set_title("Accuracy against Epoch")
ax.grid(True)
ax.set_xlabel("Epoch Number")
ax.set_ylabel("Accuracy")

ax.plot(accuracy_score['train'], color='goldenrod', label='Training Accuracy')
ax.plot(accuracy_score['test'], color='green', label='Test Accuracy')
ax.legend()

Training generally takes less time when we use a GPU, although for a model this small the difference may be modest.

# Conclusion

A multilayer perceptron can overfit the training data. Furthermore, full connectivity is wasteful:
it uses a huge number of parameters, which makes the model take longer to train.