# Hello World Example

This simple Jupyter Notebook walks through the four steps of compiling and running a PyTorch model on the embedded Neural Processing Unit (NPU) in your AMD Ryzen AI-enabled PC. The code and output below refer to the NPU as the IPU. The steps are as follows:

1. Get model
2. Export to ONNX
3. Quantize
4. Run Model on CPU and IPU

In [12]:
# Before starting, be sure you've installed the requirements listed in the requirements.txt file:
!python -m pip install -r requirements.txt
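
If you want to confirm that the installation worked before moving on, a small check like the one below can help. It is only a sketch: the module names are the ones imported in the cells of this notebook, and `missing_modules` is a hypothetical helper name, not part of any library.

```python
from importlib.util import find_spec

# Import names used by the cells in this notebook.
REQUIRED_MODULES = ["torch", "numpy", "onnxruntime", "vai_q_onnx"]


def missing_modules(names):
    """Return the names in `names` that cannot be found on the import path."""
    return [name for name in names if find_spec(name) is None]


# An empty list means every module was found.
print("Missing modules:", missing_modules(REQUIRED_MODULES))
```

If the list is not empty, rerun the `pip install` cell above in the same environment that runs this notebook.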



### 1. Get Model
Here, we'll use the PyTorch library to define and instantiate a simple neural network model called `SmallModel`.

In [5]:
import torch

torch.manual_seed(0)

# Define model class
class SmallModel(torch.nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()
        self.fc = torch.nn.Linear(input_size, output_size)

    def forward(self, x):
        output = self.fc(x)
        return output
    

# Instantiate the model
input_size = 10
output_size = 5
pytorch_model = SmallModel(input_size, output_size)

print(pytorch_model)

SmallModel(
  (fc): Linear(in_features=10, out_features=5, bias=True)
)
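
For reference, the single `torch.nn.Linear` layer printed above applies an affine map. With `in_features=10` and `out_features=5`, and writing $N$ for the batch size, the shapes work out as:

$$
y = x W^{\top} + b, \qquad x \in \mathbb{R}^{N \times 10},\; W \in \mathbb{R}^{5 \times 10},\; b \in \mathbb{R}^{5},\; y \in \mathbb{R}^{N \times 5}
$$

The model therefore has $5 \times 10 + 5 = 55$ trainable parameters.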


### 2. Export to ONNX

The following code exports the PyTorch model (`pytorch_model`) to the ONNX (Open Neural Network Exchange) format. The Vitis AI Quantizer takes an ONNX file as its input, so this export is required before quantization.

In [6]:
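# Prep for ONNX export
import os

# Example input used to trace the model; the key "x" matches the argument name of `forward`
inputs = {"x": torch.rand(input_size, input_size)}  # 10 samples with 10 features each
input_names = ['input']
output_names = ['output']
# Mark the batch dimension as dynamic so the exported model accepts any batch size
dynamic_axes = {'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}}
tmp_model_path = "models/helloworld.onnx"
os.makedirs(os.path.dirname(tmp_model_path), exist_ok=True)  # make sure the output folder exists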

# Call export function
torch.onnx.export(
    pytorch_model,
    inputs,
    tmp_model_path,
    export_params=True,
    opset_version=13,  # Recommended opset
    input_names=input_names,
    output_names=output_names,
    dynamic_axes=dynamic_axes,
)
    

### 3. Quantize Model

We'll quantize the newly exported ONNX model to INT8 using the static quantization method provided by the Vitis AI Quantizer. No calibration data reader is supplied in this example, so the quantizer calibrates on random data, as the first line of the log output reports. For more information on this quantization method, see [Vitis AI ONNX Quantization](https://ryzenai.docs.amd.com/en/latest/vai_quant/vai_q_onnx.html).
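
To make "INT8 with a power-of-two scale" concrete, here is a minimal pure-Python sketch. It is a simplified illustration, not the Vitis AI Quantizer's implementation: the function names are hypothetical, and the error-minimizing search only mimics the general idea suggested by the `PowerOfTwoMethod.MinMSE` setting used in the next cell.

```python
def quantize_pow2(value, frac_bits):
    """Map a float to a signed 8-bit integer using the scale 2**-frac_bits."""
    q = round(value * 2 ** frac_bits)  # Python rounds exact halves to the even integer
    return max(-128, min(127, q))      # saturate to the INT8 range


def dequantize_pow2(q, frac_bits):
    """Map the 8-bit integer back to a float."""
    return q / 2 ** frac_bits


def mean_squared_error(values, frac_bits):
    """Average squared round-trip error for one choice of scale."""
    errors = [
        (v - dequantize_pow2(quantize_pow2(v, frac_bits), frac_bits)) ** 2
        for v in values
    ]
    return sum(errors) / len(errors)


def best_frac_bits(values, candidates=range(0, 16)):
    """Pick the power-of-two scale with the lowest round-trip error."""
    return min(candidates, key=lambda bits: mean_squared_error(values, bits))


print(quantize_pow2(0.18359375, 8))                      # 47
print(dequantize_pow2(quantize_pow2(0.6, 8), 8))         # 0.49609375 (saturated at 127)
print(best_frac_bits([0.2, -0.7, 1.3]))                  # 6
```

The outputs printed at the end of this notebook are all multiples of 2^-8, and the value 0.49609375 equals 127 × 2^-8, which is consistent with this kind of scheme.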

In [7]:
import vai_q_onnx
from onnxruntime.quantization import QuantFormat, QuantType

# `input_model_path` is the path to the original, unquantized ONNX model.
input_model_path = "models/helloworld.onnx"

# `output_model_path` is the path where the quantized model will be saved.
output_model_path = "models/helloworld_quantized.onnx"

vai_q_onnx.quantize_static(
    input_model_path,
    output_model_path,
    calibration_data_reader=None,
    quant_format=QuantFormat.QDQ,
    calibrate_method=vai_q_onnx.PowerOfTwoMethod.MinMSE,
    activation_type=QuantType.QInt8,
    weight_type=QuantType.QInt8,
    enable_dpu=True,
    extra_options={'ActivationSymmetric': True}
)

print('Calibrated and quantized model saved at:', output_model_path)

INFO:vai_q_onnx.quantize:calibration_data_reader is None, using random data for calibration
INFO:vai_q_onnx.quant_utils:The input ONNX model models/helloworld.onnx can create InferenceSession successfully
INFO:vai_q_onnx.quant_utils:Random input name input shape [1, 10] type <class 'numpy.float32'> 
INFO:vai_q_onnx.quant_utils:The input ONNX model models/helloworld.onnx can run inference successfully
INFO:vai_q_onnx.quantize:Removed initializers from input
INFO:vai_q_onnx.quantize:Loading model...
INFO:vai_q_onnx.quantize:enable_dpu is True, optimize the model for better hardware compatibility.
INFO:vai_q_onnx.quantize:Start calibration...
INFO:vai_q_onnx.quantize:Start collecting data, runtime depends on your model size and the number of calibration dataset.
INFO:vai_q_onnx.quant_utils:Random input name input shape [1, 10] type <class 'numpy.float32'> 
INFO:vai_q_onnx.calibrate:Finding optimal threshold for each tensor using PowerOfTwoMethod.MinMSE algorithm ...
INFO:vai_q_onnx.calibr

[VAI_Q_ONNX_INFO]: Time information:
2023-12-06 16:41:30.666613
[VAI_Q_ONNX_INFO]: OS and CPU information:
                                        system --- Windows
                                          node --- vgodsoe-ryzen
                                       release --- 10
                                       version --- 10.0.22621
                                       machine --- AMD64
                                     processor --- AMD64 Family 25 Model 116 Stepping 1, AuthenticAMD
[VAI_Q_ONNX_INFO]: Tools version information:
                                        python --- 3.9.18
                                          onnx --- 1.15.0
                                   onnxruntime --- 1.15.1
                                    vai_q_onnx --- 1.16.0+be3c70b
[VAI_Q_ONNX_INFO]: Quantized Configuration information:
                                   model_input --- models/helloworld.onnx
                                  model_output --- models/helloworld_quantized

Computing range: 100%|██████████| 2/2 [00:00<00:00, 752.75tensor/s]
INFO:vai_q_onnx.qdq_quantizer:Remove QuantizeLinear & DequantizeLinear on certain operations(such as conv-relu).


Calibrated and quantized model saved at: models/helloworld_quantized.onnx


### 4. Run Model

#### CPU Run

Before running the model on the IPU, we'll run it on the CPU and record the execution time for comparison. We'll also enable ONNX Runtime's profiling to get more information about the inference. For more information on this, see [Profiling Tools](https://onnxruntime.ai/docs/performance/tune-performance/profiling-tools.html) from ONNX Runtime.

In [8]:
import onnxruntime
import numpy as np
from timeit import default_timer as timer

# Specify the path to the quantized ONNX model
onnx_model_path = "models/helloworld_quantized.onnx"

# Create some random input data for testing
input_data = np.random.uniform(low=-1, high=1, size=[1,10]).astype(np.float32)

cpu_options = onnxruntime.SessionOptions()
cpu_options.enable_profiling = True

# Create Inference Session to run the quantized model on the CPU
cpu_session = onnxruntime.InferenceSession(
    onnx_model_path,
    providers=['CPUExecutionProvider'],
    sess_options=cpu_options,
)
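
# Time a single inference call
start = timer()
cpu_results = cpu_session.run(None, {'input': input_data})
cpu_total = timer() - start

# Stop profiling; the returned file name is shown as the cell output
cpu_session.end_profiling()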


'onnxruntime_profile__2023-12-06_16-44-40.json'
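
The profile file named above can be inspected with a few lines of standard-library Python. The sketch below assumes the file is a JSON array of trace events in which each timed event carries a `cat` (category) field and a `dur` (duration in microseconds) field. `summarize_profile` is a hypothetical helper name, and the field layout is an assumption you should check against your own file.

```python
import json
from collections import defaultdict


def summarize_profile(path):
    """Sum the event durations per category found in a profile JSON file."""
    with open(path, "r", encoding="utf-8") as handle:
        events = json.load(handle)

    totals = defaultdict(int)
    for event in events:
        # Skip entries that carry no duration.
        if "dur" in event:
            totals[event.get("cat", "unknown")] += event["dur"]
    return dict(totals)
```

Calling `summarize_profile('onnxruntime_profile__2023-12-06_16-44-40.json')` would return a dictionary of total durations keyed by category, which gives a quick first look before opening the file in a trace viewer.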

#### IPU Run

Now we'll run the same quantized model on the IPU through the Vitis AI Execution Provider and time the execution, so that we can compare the results with the CPU run.

In [9]:
# Compile and run

# Point to the config file path used for the VitisAI Execution Provider
config_file_path = "vaip_config.json"

aie_options = onnxruntime.SessionOptions()
aie_options.enable_profiling = True

aie_session = onnxruntime.InferenceSession(
    onnx_model_path,
    providers=['VitisAIExecutionProvider'],
    sess_options=aie_options,
    provider_options=[{'config_file': config_file_path}]
)

start = timer()
ryzen_outputs = aie_session.run(None, {'input': input_data})
aie_total = timer() - start

# Stop profiling; the returned file name is shown as the cell output
aie_session.end_profiling()

'onnxruntime_profile__2023-12-06_16-44-58.json'
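
The comparison is only meaningful if the model actually ran through the Vitis AI Execution Provider. A small guard like the one below can make a missing provider fail loudly. `require_provider` is a hypothetical helper, shown here as a sketch. The list of available providers would come from `onnxruntime.get_available_providers()`.

```python
def require_provider(name, available):
    """Return `name` if it is in `available`, otherwise raise a clear error."""
    if name not in available:
        raise RuntimeError(
            f"Execution provider '{name}' is not available. "
            f"Available providers: {list(available)}"
        )
    return name
```

For example, calling `require_provider('VitisAIExecutionProvider', onnxruntime.get_available_providers())` before creating `aie_session` would stop the notebook early on a machine without the provider. After the session is created, `aie_session.get_providers()` lists the providers the session was set up with.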

Let's gather the results and compare them:

In [11]:
print(f"Ryzen Results: {ryzen_outputs}")
print(f"CPU Results: {cpu_results}")

print(f"CPU Total Time: {cpu_total}")
print(f"IPU Total Time: {aie_total}")

Ryzen Results: [array([[ 0.18359375,  0.49609375,  0.49609375, -0.48828125,  0.15625   ]],
      dtype=float32)]
CPU Results: [array([[ 0.18359375,  0.49609375,  0.49609375, -0.48828125,  0.15625   ]],
      dtype=float32)]
CPU Total Time: 0.0006584999999859065
IPU Total Time: 0.0005332000000066728
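
Each total above comes from a single timed call, so the numbers can vary noticeably from run to run and may include one-time overhead from the first inference. For a steadier comparison, one option is to warm up first and take the median of several runs. The helper below is a sketch of that idea; `benchmark` and its default values are illustrative choices, not part of ONNX Runtime.

```python
from statistics import median
from timeit import default_timer as timer


def benchmark(run_once, warmup=3, repeats=20):
    """Return the median wall-clock time in seconds of calling `run_once()`."""
    # Warm-up calls are executed but not timed.
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(repeats):
        start = timer()
        run_once()
        samples.append(timer() - start)
    return median(samples)
```

With the sessions created earlier, this could be used as `benchmark(lambda: cpu_session.run(None, {'input': input_data}))` and likewise with `aie_session`.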
