# How to Use Furiosa SDK from Start to Finish

This notebook demonstrates how to use Furiosa SDK from start to finish.

## Prerequisites

The Furiosa SDK needs to have been installed. If not, it can be installed following instructions on https://furiosa-ai.github.io/docs/latest/ko/ (Korean) or https://furiosa-ai.github.io/docs/latest/en/ (English). The `torchvision` and `scipy` packages also need to be installed for this demonstration.

```console
$ pip install 'furiosa-sdk[quantizer]' torchvision scipy
```

In [1]:
import time

import numpy as np
import onnx
import torch
import torchvision
from torchvision import transforms
import tqdm

from furiosa.optimizer import optimize_model
from furiosa.quantizer import quantize, Calibrator, CalibrationMethod
import furiosa.runtime.session

libfuriosa_hal.so --- v0.11.0, built @ 43c901f


## Load PyTorch Model

As a running example, we employ the pre-trained ResNet-50 model from Torchvision.

In [2]:
torch_model = torchvision.models.resnet50(weights='DEFAULT')
torch_model = torch_model.eval()  # Set the model to inference mode.

Downloading: "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" to /home/elicer/.cache/torch/hub/checkpoints/resnet50-11ad3fa6.pth
100%|██████████| 97.8M/97.8M [00:01<00:00, 84.8MB/s]


The ResNet50 model has been trained with the following preprocessing applied: https://pytorch.org/vision/stable/models.html We will use the same preprocessing for calibration and inference.

In [3]:
preprocess = transforms.Compose(
    [
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ]
)

## Export PyTorch Model to ONNX Model

We call the `torch.onnx.export` function to export the PyTorch ResNet-50 model to an ONNX model. The function executes a PyTorch model provided as its first argument, recording a trace of what operators are used during the execution, and then converts those operators into ONNX equivalents. Because `torch.onnx.export` runs the model, we need to provide the function with an input tensor as its second argument, which can be random so long as it satisfies the shape and type of the model's input. As of Furiosa SDK v0.6, ONNX OpSet 12 is the most well-supported version.

In [4]:
# Generate a dummy input of the shape, (1, 3, 224, 224), of the model's input.
dummy_input = (torch.randn(1, 3, 224, 224),)

# Export the PyTorch model into an ONNX model.
torch.onnx.export(
    torch_model,  # PyTorch model to export
    dummy_input,  # model input
    "../convert/resnet50.onnx",  # where to save the exported ONNX model
    opset_version=13,  # the ONNX OpSet version to export the model to
    do_constant_folding=True,  # whether to execute constant folding for optimization
    input_names=["input"],  # the ONNX model's input names
    output_names=["output"],  # the ONNX model's output names
)

# Load the exported ONNX model.
onnx_model = onnx.load_model("../convert/resnet50.onnx")

## Load Dataset

We will use subsets of the ImageNet dataset for calibration and validation. 

You need to download `ILSVRC2012_img_val.tar` and `ILSVRC2012_devkit_t12.tar.gz` externally and place them in the `imagenet` directory. Torchvision cannot download the ImageNet dataset automatically because it is no longer publicly accessible: https://github.com/pytorch/vision/pull/1457.

Note that it may take several minutes to run this step for the first time because it involves decompressing the archive files. It will take much less time to complete subsequently.

In [8]:
imagenet = torchvision.datasets.ImageNet("imagenet", split="val", transform=preprocess)

RuntimeError: The archive ILSVRC2012_devkit_t12.tar.gz is not present in the root directory or is corrupted. You need to download it externally and place it in imagenet.

## Calibrate and Quantize ONNX Model

For quick demonstration, a small number of samples randomly chosen from the ImageNet dataset is used for calibration.

In [9]:
calibration_dataset = torch.utils.data.Subset(imagenet, torch.randperm(len(imagenet))[:100])
calibration_dataloader = torch.utils.data.DataLoader(calibration_dataset, batch_size=1)

NameError: name 'imagenet' is not defined

We call the `optimize_model` function to optimize onnx model, before calibration/quantization of the model.

In [10]:
onnx_model = optimize_model(onnx_model)

2023-11-25 08:12:53.983820575 [E:onnxruntime:Default, env.cc:251 ThreadMain] pthread_setaffinity_np failed for thread: 2414, index: 2, mask: {2, 5, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2023-11-25 08:12:53.983991238 [E:onnxruntime:Default, env.cc:251 ThreadMain] pthread_setaffinity_np failed for thread: 2413, index: 1, mask: {1, 4, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2023-11-25 08:12:53.984020917 [E:onnxruntime:Default, env.cc:251 ThreadMain] pthread_setaffinity_np failed for thread: 2412, index: 0, mask: {0, 3, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2023-11-25 08:12:54.629961970 [E:onnxruntime:Default, env.cc:251 ThreadMain] pthread_setaffinity_np failed for thread: 2417, index: 2, mask: {2, 5, }, error code: 22 error msg: Invalid argument. Specify the n

We use Calibrator to calibrate the model with various CalibrationMethod (e.g. MIN_MAX_ASYM, ENTROPY_ASYM, ...)

In [11]:
help(Calibrator)

Help on class Calibrator in module furiosa.quantizer.calibrator:

class Calibrator(builtins.object)
 |  Calibrator(model: Union[onnx.onnx_ml_pb2.ModelProto, bytes], calibration_method: furiosa.quantizer.calibrator.CalibrationMethod, *, percentage: float = 99.99)
 |  
 |  Calibrator.
 |  
 |  This collects the values of tensors in an ONNX model and computes
 |  their ranges.
 |  
 |  Methods defined here:
 |  
 |  __init__(self, model: Union[onnx.onnx_ml_pb2.ModelProto, bytes], calibration_method: furiosa.quantizer.calibrator.CalibrationMethod, *, percentage: float = 99.99)
 |      Args:
 |          model (onnx.ModelProto or bytes): An ONNX model to
 |              calibrate.
 |          calibration_method (CalibrationMethod): A calibration
 |              method.
 |          percentage (float): A percentage to use with percentile
 |              calibration. Defaults to 99.99 (i.e. 99.99%-percentile
 |              calibration).
 |  
 |  collect_data(self, calibration_dataset: Iterable

In [9]:
help(CalibrationMethod)

Help on class CalibrationMethod in module furiosa.quantizer:

class CalibrationMethod(enum.IntEnum)
 |  CalibrationMethod(value, names=None, *, module=None, qualname=None, type=None, start=1)
 |  
 |  Calibration method.
 |  
 |  Attributes:
 |      MIN_MAX_ASYM (CalibrationMethod):
 |          Min-max calibration (Asymmetric).
 |      MIN_MAX_SYM (CalibrationMethod):
 |          Min-max calibration (Symmetric).
 |      ENTROPY_ASYM (CalibrationMethod):
 |          Entropy calibration (Aymmetric).
 |      ENTROPY_SYM (CalibrationMethod):
 |          Entropy calibration (Symmetric).
 |      PERCENTILE_ASYM (CalibrationMethod):
 |          Percentile calibration (Asymmetric).
 |      PERCENTILE_SYM (CalibrationMethod):
 |          Percentile calibration (Symmetric).
 |      MSE_ASYM (CalibrationMethod):
 |          Mean squared error (MSE) calibration (Asymmetric).
 |      MSE_SYM (CalibrationMethod):
 |          Mean squared error (MSE) calibration (Symmetric).
 |      SQNR_ASYM (Calibr

In [35]:
import onnxruntime as rt
# onnx_model2 = onnx.load_model("../convert/resnet50.onnx")
sess = rt.InferenceSession("../convert/resnet50.onnx", None)

input_name = sess.get_inputs()[0].name
print("input name", input_name)
input_shape = sess.get_inputs()[0].shape
print("input shape", input_shape)
input_type = sess.get_inputs()[0].type
print("input type", input_type)

output_name = sess.get_outputs()[0].name
print("output name", output_name)
output_shape = sess.get_outputs()[0].shape
print("output shape", output_shape)
output_type = sess.get_outputs()[0].type
print("output type", output_type)

x = np.random.rand(1, 3, 224, 224)
x = x.astype(np.float32)
res = sess.run([output_name], {input_name: x})
print(res[0].shape)

input name input
input shape [1, 3, 224, 224]
input type tensor(float)
output name output
output shape [1, 1000]
output type tensor(float)
(1, 1000)


2023-11-25 08:26:44.480583480 [E:onnxruntime:Default, env.cc:251 ThreadMain] pthread_setaffinity_np failed for thread: 2562, index: 0, mask: {0, 3, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2023-11-25 08:26:44.480595524 [E:onnxruntime:Default, env.cc:251 ThreadMain] pthread_setaffinity_np failed for thread: 2563, index: 1, mask: {1, 4, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2023-11-25 08:26:44.480606326 [E:onnxruntime:Default, env.cc:251 ThreadMain] pthread_setaffinity_np failed for thread: 2564, index: 2, mask: {2, 5, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.


Before the Calibrator actually computes the ranges, input data should be collected.

In [36]:
calibrator = Calibrator(onnx_model, CalibrationMethod.MIN_MAX_ASYM)

for i in range(10):
    calibrator.collect_data([[np.random.rand(1,3,224,224).astype(np.float32)]])

# for calibration_data, _ in tqdm.tqdm(calibration_dataloader, desc="Calibration", unit="images", mininterval=0.5):
#     calibrator.collect_data([[calibration_data.numpy()]])

2023-11-25 08:26:52.175669302 [E:onnxruntime:, env.cc:251 ThreadMain] pthread_setaffinity_np failed for thread: 2568, index: 2, mask: {2, 5, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2023-11-25 08:26:52.175674230 [E:onnxruntime:, env.cc:251 ThreadMain] pthread_setaffinity_np failed for thread: 2567, index: 1, mask: {1, 4, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.
2023-11-25 08:26:52.175670792 [E:onnxruntime:, env.cc:251 ThreadMain] pthread_setaffinity_np failed for thread: 2566, index: 0, mask: {0, 3, }, error code: 22 error msg: Invalid argument. Specify the number of threads explicitly so the affinity is not set.


In [37]:
ranges = calibrator.compute_range()

With the range computed, now we can quantize the model by calling `quantize` function.

In [38]:
graph = quantize(onnx_model, ranges)

In [13]:
help(quantize)

Help on function quantize in module furiosa.quantizer:

quantize(model: Union[onnx.onnx_ml_pb2.ModelProto, bytes], tensor_name_to_range: Mapping[str, Sequence[float]], *, with_quantize: bool = True, normalized_pixel_outputs: Optional[Sequence[int]] = None) -> Graph
    Quantize an ONNX model on the basis of the range of its tensors.
    
    Args:
        model (onnx.ModelProto or bytes): An ONNX model to quantize.
        tensor_name_to_range (Mapping[str, Sequence[float]]):
            A mapping from a tensor name to a 2-tuple (or list) of the
            tensor's min and max.
        with_quantize (bool): Whether to put a Quantize operator at the
            beginning of the resulting model. Defaults to True.
        normalized_pixel_outputs (Optional[Sequence[int]]):
            A sequence of indices of output tensors in the ONNX model
            that produce pixel values in a normalized format ranging
            from 0.0 to 1.0. If specified, the corresponding output
           

In case you want already have calibrated model once and have ranges info, you can save the ranges info inside a file and
load it in order to skip calibration phase.

In [40]:
import json

with open("../convert/resnet_ranges.json", "w") as f:
    f.write(json.dumps(ranges, indent=4))
with open("../convert/resnet_ranges.json", "r") as f:
    ranges = json.load(f)

graph = quantize(onnx_model, ranges)

## Run Inference with Quantized Model

For quick demonstration, we use randomly chosen 1000 samples from the ImageNet dataset for validation.

In [44]:
# validation_dataset = torch.utils.data.Subset(imagenet, torch.randperm(len(imagenet))[:1000])
# validation_dataloader = torch.utils.data.DataLoader(validation_dataset, batch_size=1)

correct_predictions, total_predictions = 0, 0
elapsed_time = 0
with furiosa.runtime.session.create(graph) as session:
    for i in range(10):
        image = np.random.rand(1,3,224,224).astype(np.float32)

    # for image, label in tqdm.tqdm(validation_dataloader, desc="Evaluation", unit="images", mininterval=0.5):
    #     image = image.numpy()
        start = time.perf_counter_ns()
        outputs = session.run(image)
        elapsed_time += time.perf_counter_ns() - start
        
        # prediction = np.argmax(outputs[0].numpy(), axis=1)  # postprocessing  
        # if prediction == label.numpy():
        #     correct_predictions += 1
        # total_predictions += 1

[2m2023-11-25T08:31:33.973269Z[0m [32m INFO[0m [2mfuriosa_rt_core::driver::event_driven::coord[0m[2m:[0m FuriosaRT (v0.10.3, rev: 394c19392, built at: 2023-11-22T08:53:04Z) bootstrapping ...
[2m2023-11-25T08:31:33.976967Z[0m [32m INFO[0m [2mfuriosa_rt_core::driver::event_driven::coord[0m[2m:[0m Found furiosa-compiler (v0.10.1, rev: 8b00177, built at: 2023-11-23T02:22:00Z)
[2m2023-11-25T08:31:33.976975Z[0m [32m INFO[0m [2mfuriosa_rt_core::driver::event_driven::coord[0m[2m:[0m Found libhal (type: warboy, v0.12.0, rev: 56530c0 built at: 2023-11-16T12:37:25Z)
[2m2023-11-25T08:31:33.976980Z[0m [32m INFO[0m [2mfuriosa_rt_core::driver::event_driven::coord[0m[2m:[0m [Runtime-3] detected 1 NPU device(s):
[2m2023-11-25T08:31:34.007398Z[0m [32m INFO[0m [2mfuriosa_rt_core::driver::event_driven::coord[0m[2m:[0m - [0] npu:10:0-1 (warboy-b0-2pe, 128dpes, firmware: 1.7.8, e9f371e)
[2m2023-11-25T08:31:34.007474Z[0m [32m INFO[0m [2mfuriosa_rt_core::driver::ev

In [16]:
accuracy = correct_predictions / total_predictions
print(f"Accuracy: {accuracy:%}")

latency = elapsed_time / total_predictions
print(f"Average Latency: {latency / 1_000_000} ms")

Accuracy: 82.800000%
Average Latency: 12.019611846 ms
