# Part 1: Train MLP

## Goals
* Learn how to train an MLP model using PyTorch.
* Learn how to export a trained model to ONNX.

## References
* [Ryzen AI Software Platform](https://ryzenai.docs.amd.com/en/latest/inst.html)

* [MNIST dataset](https://www.tensorflow.org/datasets/catalog/mnist)


---

## Step 1: Set up the environment

**1.1**

Follow the instructions on the [Ryzen AI Software Platform](https://ryzenai.docs.amd.com/en/latest/inst.html) website to install the Ryzen AI software platform.

**1.2**

Activate the created environment and use `pip install <package_name>` to install the necessary packages.

**1.3: Import necessary packages**

Run the following cell to import the packages used to train the model and export it for inference on the Ryzen AI NPU.

```python
import torch
import torchvision
from torchvision import datasets
from torch.utils.data import DataLoader
import numpy as np
import torchvision.transforms as transforms
```

---

## Step 2: Dataset
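We will use an MNIST-style dataset for training and testing our MLP model. MNIST contains 28x28 grayscale images of handwritten digits in 10 classes. The code below loads Fashion-MNIST, a drop-in replacement with the same image size, number of classes, and split sizes (60,000 training and 10,000 test images), whose images show clothing items instead of digits. Both datasets are available through the torchvision package.

Because torchvision can download these datasets automatically, we use its dataset API to fetch them.

### Downloading the Dataset

* `root` is the directory where the dataset is stored (here `./data`).
* `train` selects the training split (`True`) or the test split (`False`).
* `download=True` downloads the files if they are not already present under `root`.
* `transform`: `transforms.ToTensor()` converts each PIL image into a `torch.Tensor` of shape `[1, 28, 28]` with pixel values scaled to `[0.0, 1.0]`.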
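To see what `ToTensor` does to a single grayscale image, here is a minimal NumPy sketch. It assumes an 8-bit, single-channel input like the images in this dataset; `to_tensor_like` is a hypothetical helper for illustration, not torchvision's implementation (which also handles multi-channel images and returns a `torch.Tensor`).

```python
import numpy as np


def to_tensor_like(img_uint8: np.ndarray) -> np.ndarray:
    """Mimic ToTensor for a single-channel (H, W) uint8 image."""
    # Scale pixel values from [0, 255] to [0.0, 1.0] ...
    scaled = img_uint8.astype(np.float32) / 255.0
    # ... and add a leading channel axis: (H, W) -> (1, H, W)
    return scaled[np.newaxis, :, :]


fake_image = np.zeros((28, 28), dtype=np.uint8)  # hypothetical all-black image
fake_image[14, 14] = 255                         # one white pixel in the center
x = to_tensor_like(fake_image)
print(x.shape, x.dtype, x.min(), x.max())
```

The resulting `(1, 28, 28)` layout is why each sample later shows up as `torch.Size([1, 28, 28])`.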

```python
# To train on the original handwritten digits instead, swap in datasets.MNIST with the same arguments:
# train_data = datasets.MNIST(root="./dataset/", train=True, download=True, transform=transforms.ToTensor())
# test_data  = datasets.MNIST(root="./dataset/", train=False, download=True, transform=transforms.ToTensor())

# Compose lets further transforms be appended later; for now it only converts images to tensors
transform = transforms.Compose([transforms.ToTensor()])
train_data = datasets.FashionMNIST(root="./data", train=True, download=True, transform=transform)
test_data = datasets.FashionMNIST(root="./data", train=False, download=True, transform=transform)
```

```python
print("train data length: ", len(train_data))
print("test data length: ", len(test_data))

# Each sample is an (image, label) tuple
print("train sample length: ", len(train_data[0]))
print("train data image: ", train_data[0][0].shape)
print("train data label: ", train_data[0][1])
```

```
train data length:  60000
test data length:  10000
train sample length:  2
train data image:  torch.Size([1, 28, 28])
train data label:  9
```


### Batch Size
Batch size is one of the most important hyperparameters (settings chosen manually rather than learned by the model).

Batch size is the number of samples processed in one forward and backward pass, i.e. per parameter update. It affects both the accuracy and the efficiency of training. Smaller batches give noisier gradient estimates and more updates per epoch, which makes each epoch slower but can help generalization; larger batches make better use of parallel hardware but produce fewer updates per epoch and may need a retuned learning rate to reach the same accuracy.

In practice, it is common to try several batch sizes to find a good setting.
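Batch size directly sets how many parameter updates happen per epoch. The small pure-Python sketch below computes this for the 60,000 training images; `batches_per_epoch` is a hypothetical helper that mirrors what `len(DataLoader)` returns with the default sampler (rounding up, or down when `drop_last=True`).

```python
import math


def batches_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of optimizer steps in one epoch."""
    if drop_last:
        # The incomplete final batch is discarded
        return num_samples // batch_size
    # The incomplete final batch is kept
    return math.ceil(num_samples / batch_size)


for bs in (16, 64, 256):
    n = batches_per_epoch(60000, bs)
    last = 60000 - (n - 1) * bs  # size of the final (possibly partial) batch
    print(f"batch_size={bs:>3}: {n} updates per epoch, last batch has {last} samples")
```

With `batch_size = 64`, one epoch is 938 updates, and the final batch holds only 32 images.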


```python
batch_size = 64
```

### Create DataLoader
`DataLoader` is a core PyTorch interface. It wraps a `Dataset` and yields mini-batches of tensors according to the batch size, the shuffling option, and other settings, ready for training.

Specifically, `DataLoader` can do the following:

* Batch processing: group a fixed number of samples (the batch size) into each batch and stack them into tensors.
* Shuffle the data: when training a model, we usually want each epoch to visit the samples in a different random order, which `shuffle=True` provides.
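The two behaviors above can be sketched in a few lines of pure Python. `simple_batches` is a hypothetical, simplified stand-in: the real `DataLoader` also collates samples into tensors, uses its own random generator, and reshuffles automatically every time you iterate over it. Here a different `seed` per epoch plays that role.

```python
import random
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def simple_batches(data: Sequence[T], batch_size: int, shuffle: bool,
                   seed: int = 0) -> Iterator[List[T]]:
    """Yield lists of samples, optionally in a shuffled order."""
    indices = list(range(len(data)))
    if shuffle:
        # A new random order; pass a different seed per epoch to reshuffle
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        # The final batch may be smaller than batch_size
        yield [data[i] for i in indices[start:start + batch_size]]


samples = list(range(10))
for epoch in range(2):
    print(epoch, list(simple_batches(samples, batch_size=4, shuffle=True, seed=epoch)))
```

Every sample appears exactly once per epoch; only the order and grouping change.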


```python
train_loader = DataLoader(dataset=train_data, batch_size=batch_size, shuffle=True)
# Evaluation does not need a random order, so the test set is not shuffled
test_loader = DataLoader(dataset=test_data, batch_size=batch_size, shuffle=False)
```

---

## Step 3: Building an MLP model

A Multilayer Perceptron (MLP) is a type of feedforward artificial neural network (ANN). It consists of multiple layers of nodes, including an input layer, one or more hidden layers, and an output layer. Each node, except for those in the input layer, is a neuron that uses a nonlinear activation function. MLPs are capable of learning complex patterns by adjusting the weights of the connections through a process called backpropagation.

In this implementation, we will create an MLP by inheriting from the `torch.nn.Module` class provided by PyTorch. This allows us to leverage the rich functionality PyTorch provides for defining and managing neural networks. We will override the `__init__` method to set up the layers of the network and the `forward` method to specify the data flow during the forward pass.

### Implementation
1. Inheriting from `torch.nn.Module`:

* By inheriting from `torch.nn.Module`, we can create our custom neural network class. This inheritance allows us to define the architecture and forward propagation logic while taking advantage of PyTorch's built-in features for training and optimization.

2. Defining the `__init__` method:

* In the `__init__` method, we initialize the layers of the MLP. This involves defining the linear transformations (fully connected layers) and the activation functions. Each linear layer is represented by `torch.nn.Linear`, and we use `torch.nn.ReLU` as the activation function between the layers.

3. Defining the `forward` method:

* The `forward` method defines the forward pass of the network. It describes how the input data passes through each layer, undergoing transformations and activations in sequence.

```python
class MLP(torch.nn.Module):
    """MLP with two hidden layers: input -> hidden -> hidden -> 10 class logits."""

    def __init__(self, num_input, num_hidden) -> None:
        super().__init__()
        self.linear1 = torch.nn.Linear(num_input, num_hidden)
        self.relu1 = torch.nn.ReLU()
        self.linear2 = torch.nn.Linear(num_hidden, num_hidden)
        self.relu2 = torch.nn.ReLU()
        self.linear3 = torch.nn.Linear(num_hidden, 10)  # one output per class

    def forward(self, x):
        # x: (batch, num_input) flattened images
        x = self.relu1(self.linear1(x))
        x = self.relu2(self.linear2(x))
        # Raw scores (logits); CrossEntropyLoss applies log-softmax internally
        return self.linear3(x)
```
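For comparison, the same layer stack can be written more compactly with `torch.nn.Sequential`. This is an alternative sketch (the `make_mlp_sequential` name is hypothetical), not a change to the tutorial's model.

```python
import torch


def make_mlp_sequential(num_input: int, num_hidden: int, num_classes: int = 10) -> torch.nn.Sequential:
    # Same layer stack as the MLP class: Linear -> ReLU -> Linear -> ReLU -> Linear
    return torch.nn.Sequential(
        torch.nn.Linear(num_input, num_hidden),
        torch.nn.ReLU(),
        torch.nn.Linear(num_hidden, num_hidden),
        torch.nn.ReLU(),
        torch.nn.Linear(num_hidden, num_classes),
    )


seq_model = make_mlp_sequential(28 * 28, 320)
logits = seq_model(torch.randn(2, 28 * 28))  # batch of 2 flattened images
print(logits.shape)
print(list(seq_model.state_dict().keys()))
```

One practical difference: parameters in a `Sequential` are named by position (`0.weight`, `2.weight`, ...) rather than by attribute (`linear1.weight`, ...), so saved weights are not interchangeable between the two definitions without renaming keys. The explicit subclass is also easier to extend with custom logic in `forward`.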

---

## Step 4: Train the model
Intuitively, training means searching for the model parameters that perform best on this dataset.

This raises two questions:
* How do we define which parameters are best? (loss function)
* How do we update the model's parameters? (optimizer)

### Initialize the model
The `MLP` class needs the input size and the hidden layer size.

* The input size is the number of pixels per image (28x28 = 784), because each image is flattened into a one-dimensional vector before it is fed to the model.
* The hidden layer size is a hyperparameter: it is fixed when the model is created and does not change during training (only the weights do). Here we use 320.
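To make these sizes concrete, here is a small pure-Python count of the trainable parameters for `num_input = 784` and `num_hidden = 320`. Each `torch.nn.Linear(n_in, n_out)` holds an `n_out x n_in` weight matrix plus `n_out` biases, and ReLU has no parameters. The helper names are illustrative; on the real model, `sum(p.numel() for p in model.parameters())` should give the same total.

```python
def linear_params(n_in: int, n_out: int) -> int:
    # Weight matrix (n_out x n_in) plus one bias per output unit
    return n_in * n_out + n_out


def mlp_params(num_input: int, num_hidden: int, num_classes: int = 10) -> int:
    # linear1 + linear2 + linear3 (the ReLU layers add nothing)
    return (linear_params(num_input, num_hidden)
            + linear_params(num_hidden, num_hidden)
            + linear_params(num_hidden, num_classes))


print(mlp_params(28 * 28, 320))  # 357130
```

Most of the parameters (250,880 weights) sit in the first layer, because it connects every one of the 784 pixels to every hidden unit.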

```python
num_input = 28 * 28
num_hidden = 320
model = MLP(num_input, num_hidden)
```

### Setting the device
We can train the model on either a GPU or the CPU.

If CUDA and a CUDA-enabled build of PyTorch are installed, you can use the GPU to speed up training. In that case, the model and every batch of data must be on the same device, so tensors have to be moved explicitly between the GPU and the CPU.
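A minimal sketch of that device-transfer pattern, using a hypothetical `to_device` helper and a tiny stand-in model: the model and each batch are moved to the same device, and results are moved back to the CPU before converting to NumPy. It falls back to the CPU when CUDA is unavailable.

```python
import torch


def to_device(batch, device):
    """Move an (images, labels) batch onto the given device."""
    images, labels = batch
    return images.to(device), labels.to(device)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
toy_model = torch.nn.Linear(4, 2).to(device)  # hypothetical tiny model

batch = (torch.randn(3, 4), torch.tensor([0, 1, 1]))  # created on the CPU
images, labels = to_device(batch, device)
logits = toy_model(images)  # model and inputs now share a device

# NumPy only works with CPU tensors, so move results back before converting
preds = logits.argmax(dim=1).cpu().numpy()
print(device, preds.shape)
```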



```python
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)  # move the model's parameters from the CPU to the selected device
```

### Loss function

The loss function measures how well the model predicts: it computes a number that quantifies the difference between the predicted value and the ground truth.

Therefore, the smaller the loss, the better the model fits the data.

A simple intuition is that the loss could be the difference between the predicted value and the true value. In practice, we use losses better suited to the task.

For this classification task, we use **`CrossEntropyLoss`** (`torch.nn.CrossEntropyLoss`). It applies log-softmax to the model's raw output scores (logits) and returns the negative log-probability of the correct class, averaged over the batch.
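A pure-Python sketch of the computation `CrossEntropyLoss` performs on raw logits, with illustrative helper names: the per-sample loss is the log-sum-exp of the logits minus the logit of the target class (that is, -log of the softmax probability of the correct class), and the batch loss is the mean. `torch.nn.functional.cross_entropy` should agree with these values up to floating-point rounding.

```python
import math
from typing import List


def cross_entropy(logits: List[float], target: int) -> float:
    """-log softmax(logits)[target], computed stably via log-sum-exp."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def batch_cross_entropy(batch_logits: List[List[float]], targets: List[int]) -> float:
    """Mean over the batch, matching CrossEntropyLoss's default reduction='mean'."""
    losses = [cross_entropy(z, t) for z, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)


confident = cross_entropy([4.0, 0.0, 0.0], target=0)  # high score on the correct class
wrong = cross_entropy([4.0, 0.0, 0.0], target=1)      # high score on a wrong class
print(confident, wrong)
```

A confident correct prediction yields a loss near zero, while a confident wrong prediction is penalized heavily.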

```python
loss_fn = torch.nn.CrossEntropyLoss()
```

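### Optimizer

Next we answer the second question: how do we update the model's parameters?

An optimizer uses the gradients computed by backpropagation to adjust the model's parameters so that the loss decreases.

PyTorch provides several optimizers in **`torch.optim`**, such as SGD and Adam. Here we use Adam with its default learning rate (0.001).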
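The `zero_grad()` / `backward()` / `step()` cycle used in the training loop can be seen in isolation on a hypothetical one-parameter problem. This sketch uses plain SGD instead of Adam so the update `w <- w - lr * dloss/dw` is easy to follow; the loop structure is the same with any `torch.optim` optimizer.

```python
import torch

# A single learnable parameter, starting far from the optimum at 3.0
w = torch.tensor([0.0], requires_grad=True)
toy_optimizer = torch.optim.SGD([w], lr=0.1)

for step in range(100):
    toy_loss = ((w - 3.0) ** 2).sum()  # minimized when w == 3
    toy_optimizer.zero_grad()          # clear gradients from the previous step
    toy_loss.backward()                # compute dloss/dw and store it in w.grad
    toy_optimizer.step()               # w <- w - lr * w.grad

print(w.item())
```

Since the gradient is `2 * (w - 3)`, each step shrinks the distance to 3 by a factor of 0.8, so after 100 steps `w` is essentially 3.0.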

```python
optimizer = torch.optim.Adam(model.parameters())
```

### Epochs

An epoch is one complete pass over the training dataset.

During one epoch, the model runs forward propagation and backpropagation on every training sample. Because the data is processed in mini-batches, the parameters are updated once per batch, i.e. `len(train_loader)` times per epoch.

We set the number of epochs to 10 in this example.

```python
epochs = 10
```

### Training process
**Train the model**
* Get a batch of data (images and labels) and copy it to the device
* Flatten the images and feed them to the model to get the output (predicted scores)
* Compute the loss
* Update the model's parameters using the optimizer
* Compare the output with the labels to compute accuracy

**Evaluate the model**
* Get a batch of data (images and labels) and copy it to the device
* Flatten the images and feed them to the model to get the output (predicted scores)
* Compute the loss
* Compare the output with the labels to compute accuracy
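The training loop that follows adds up each batch's accuracy and divides by the number of batches. Because the last batch is smaller (60,000 training images with batch size 64 leave a final batch of 32), this mean of per-batch accuracies can differ slightly from the fraction of all samples classified correctly. The hypothetical helper below compares the two on illustrative batch counts.

```python
from typing import List, Tuple


def accuracy_stats(batches: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Each entry is (num_correct, batch_size); returns (mean per-batch accuracy, overall accuracy)."""
    mean_of_batches = sum(correct / size for correct, size in batches) / len(batches)
    overall = sum(correct for correct, _ in batches) / sum(size for _, size in batches)
    return mean_of_batches, overall


# Illustrative numbers: a full batch of 64 all correct, then a final batch of 32 with 16 correct
mean_acc, overall_acc = accuracy_stats([(64, 64), (16, 32)])
print(mean_acc, overall_acc)
```

With 938 batches the effect is small, but accumulating the number of correct predictions and the number of samples separately gives the exact per-sample accuracy.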

```python
for epoch in range(epochs):
    train_loss = 0.0
    train_acc = 0.0
    model.train()  # training mode (matters for layers like dropout; good practice even without them)
    for img, labels in train_loader:
        img, labels = img.to(device), labels.to(device)  # copy data to device
        img = torch.flatten(img, start_dim=1)  # (batch, 1, 28, 28) -> (batch, 784)

        outputs = model(img)
        loss = loss_fn(outputs, labels)  # compute loss value

        # Update the parameters
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Sum the loss over this epoch
        train_loss += loss.item()
        # The predicted class is the index of the largest logit
        preds = outputs.argmax(dim=1)
        # Fraction of correct predictions in this batch
        num_correct = (preds == labels).sum().item()
        train_acc += num_correct / img.shape[0]

    # After each epoch, evaluate the current model on the test dataset
    model.eval()
    val_loss = 0.0
    val_acc = 0.0
    with torch.no_grad():  # gradients are not needed for evaluation
        for inputs, labels in test_loader:
            inputs, labels = inputs.to(device), labels.to(device)  # copy data to device
            inputs = torch.flatten(inputs, start_dim=1)

            outputs = model(inputs)
            loss = loss_fn(outputs, labels)

            # Accumulate loss and accuracy on the test dataset
            val_loss += loss.item()
            preds = outputs.argmax(dim=1)
            num_correct = (preds == labels).sum().item()
            val_acc += num_correct / inputs.shape[0]

    # Print the average per-batch metrics for this epoch
    print('[{}/{}]: train loss:{:.3f}, train acc:{:.3f}%, val loss:{:.3f}, val acc:{:.3f}%'.format(
        epoch + 1, epochs,
        train_loss / len(train_loader), 100 * train_acc / len(train_loader),
        val_loss / len(test_loader), 100 * val_acc / len(test_loader)))
```

```
[1/10]: train loss:0.500, train acc:81.836%, val loss:0.443, val acc:83.708%
[2/10]: train loss:0.360, train acc:86.822%, val loss:0.399, val acc:85.529%
[3/10]: train loss:0.326, train acc:87.973%, val loss:0.370, val acc:86.684%
[4/10]: train loss:0.302, train acc:88.826%, val loss:0.369, val acc:86.485%
[5/10]: train loss:0.284, train acc:89.471%, val loss:0.335, val acc:87.958%
[6/10]: train loss:0.267, train acc:89.907%, val loss:0.365, val acc:87.012%
[7/10]: train loss:0.255, train acc:90.372%, val loss:0.328, val acc:88.714%
[8/10]: train loss:0.243, train acc:90.747%, val loss:0.335, val acc:88.346%
[9/10]: train loss:0.232, train acc:91.206%, val loss:0.356, val acc:87.560%
[10/10]: train loss:0.220, train acc:91.671%, val loss:0.326, val acc:88.595%
```


---

## Step 5: Save the model
Now that the MLP model is trained, we save it to disk, including both its parameters and its structure.

This way, we can load and reuse the model later without training it again.
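Saving the whole module with `torch.save(model)`, as the next cell does, relies on pickling, so the `MLP` class must be importable when the file is loaded. A common alternative, sketched here under the assumption that you can rebuild the architecture in code before loading, is to save only the `state_dict` (a dictionary of parameter tensors). The `build_model` helper and file name are hypothetical, and a temporary directory keeps the example self-contained.

```python
import os
import tempfile

import torch


def build_model() -> torch.nn.Module:
    # Hypothetical stand-in with the same layer sizes as the tutorial's MLP
    return torch.nn.Sequential(
        torch.nn.Linear(784, 320), torch.nn.ReLU(),
        torch.nn.Linear(320, 320), torch.nn.ReLU(),
        torch.nn.Linear(320, 10),
    )


trained = build_model()

with tempfile.TemporaryDirectory() as tmp:
    weights_path = os.path.join(tmp, "mlp_state_dict.pt")  # hypothetical file name
    # Save only the parameters (a dict of tensors), not the pickled module
    torch.save(trained.state_dict(), weights_path)

    # To load: rebuild the architecture in code, then copy the weights in
    restored = build_model()
    restored.load_state_dict(torch.load(weights_path, map_location="cpu"))
    restored.eval()

same = all(torch.equal(a, b) for a, b in zip(trained.state_dict().values(),
                                             restored.state_dict().values()))
print(same)
```

Because a `state_dict` contains only tensors, it also loads under the stricter `weights_only=True` mode of `torch.load`.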

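```python
import os


def save_model(model, path: str):
    # Create the output directory (and any missing parents) if it does not exist
    os.makedirs(path, exist_ok=True)
    # Pickles the whole module; the MLP class must be importable when loading it back
    torch.save(model, os.path.join(path, "mlp_trained.pt"))


model_path = "./models"
save_model(model, model_path)
```

---

## Step 6: Export ONNX model
After the previous step we have a trained model, which we can export to ONNX, a standard format for describing computational graphs. ONNX serves as a bridge between deep learning frameworks and inference engines, and a model expressed as an ONNX graph is usually easier to deploy.

```python
path = "./models"
torch_model = os.path.join(path, "mlp_trained.pt")
# map_location="cpu" loads the weights onto the CPU even if the model was trained on a GPU,
# matching the CPU example input used for export below.
# weights_only=False is required to unpickle a full module on PyTorch versions
# where torch.load defaults to weights_only=True.
model = torch.load(torch_model, map_location="cpu", weights_only=False)
model.eval()  # switch to inference mode before export
```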

### torch.onnx.export API
The `torch.onnx.export` API runs the model on an example input, records the operations executed along the way, and writes the resulting computational graph to a file in ONNX format.

```python
onnx_model = os.path.join(path, "mlp_trained.onnx")  # output path of the ONNX model
img = torch.randn([1, 28 * 28])  # example input: one flattened 28x28 image
input_names = ["input"]
output_names = ["output"]
# Mark dimension 0 of the input and output as dynamic and name it "batch_size",
# so the exported model accepts any batch size
dynamic_axes = {"input": {0: "batch_size"}, "output": {0: "batch_size"}}
model.eval()
torch.onnx.export(
    model,
    (img,),  # model arguments are passed as a tuple
    onnx_model,
    export_params=True,  # store the trained weights inside the ONNX file
    opset_version=13,
    input_names=input_names,
    output_names=output_names,
    dynamic_axes=dynamic_axes,
)
```

### Visualize the computational graph
We can use the Netron website (https://netron.app/) to view the ONNX model by opening `./models/mlp_trained.onnx` in it.
>Netron is a viewer for neural network, deep learning and machine learning models.

```python
from IPython.display import IFrame, display

netron_url = "https://netron.app/"

# Embed the Netron web app in the notebook output
iframe = IFrame(netron_url, width=800, height=1000)
display(iframe)
```