# Torch ONNX Export

Continuing from the previous section, [fpgaconvnet-torch](https://github.com/Yu-Zhewen/fpgaconvnet-torch/tree/main/models) can compress CNN models using techniques such as quantization, pruning and encoding. These techniques affect the performance and resource utilization of the hardware accelerator. We therefore annotate this compression-related information as attributes of the ONNX nodes, using the [`generate_onnx_files`](https://github.com/Yu-Zhewen/fpgaconvnet-torch/blob/main/models/utils.py#L147) function.

In [1]:
```python
import os
from models import initialize_wrapper
from quantization.utils import QuantMode, quantize_model

dataset_name = 'cifar10'
dataset_path = os.path.expanduser("~/dataset/cifar10")
model_name = 'vgg16_bn'
batch_size = 64
workers = 4

model_wrapper = initialize_wrapper(dataset_name, model_name,
                                   dataset_path, batch_size, workers)
```

```
Files already downloaded and verified
Files already downloaded and verified
Using cache found in /home/zy18/.cache/torch/hub/chenyaofo_pytorch-cifar-models_master
```

In [2]:
```python
print("NETWORK FP16 Inference")
# reload the model every time a new quantization mode is tested
model_wrapper.load_model()
quantize_model(model_wrapper, {
    'weight_width': 16, 'data_width': 16, 'mode': QuantMode.NETWORK_FP})
model_wrapper.inference("test")
model_wrapper.generate_onnx_files("fp16")
```

```
NETWORK FP16 Inference
network weight min: tensor(-0.6226, grad_fn=<MinimumBackward0>)
network weight max: tensor(0.4982, grad_fn=<MaximumBackward0>)
Inference mode: calibrate
Using cache found in /home/zy18/.cache/torch/hub/chenyaofo_pytorch-cifar-models_master
 * Acc@1 100.000 Acc@5 100.000
activation min: tensor(-10.3823)
activation max: tensor(13.2761)
Inference mode: test
 * Acc@1 94.170 Acc@5 99.710
Using cache found in /home/zy18/.cache/torch/hub/chenyaofo_pytorch-cifar-models_master
'fp16/vgg16_bn.onnx'
```
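
The weight and activation ranges printed above are what a range-based quantizer needs. The sketch below assumes that `QuantMode.NETWORK_FP` means one fixed-point format for all weights and one for all activations, shared across layers (signed 16-bit words with a power-of-two scale). The helpers `choose_frac_bits` and `quantize_fixed` are hypothetical, and `quantize_model` may select scales differently.

```python
import math

def choose_frac_bits(max_abs: float, width: int) -> int:
    """Fractional bits for a signed `width`-bit word that must hold +/-max_abs.

    One bit is the sign, integer bits cover max_abs, the rest are fractional.
    (Hypothetical helper; a generic power-of-two fixed-point scheme.)
    """
    int_bits = math.floor(math.log2(max_abs)) + 1 if max_abs > 0 else 0
    int_bits = max(int_bits, 0)  # magnitudes below 1 need no integer bits
    return width - 1 - int_bits

def quantize_fixed(x: float, width: int, frac_bits: int) -> float:
    """Round x to the nearest representable value, saturating at the range ends."""
    scale = 2 ** frac_bits
    qmin, qmax = -(2 ** (width - 1)), 2 ** (width - 1) - 1
    q = max(qmin, min(qmax, round(x * scale)))
    return q / scale

# Ranges printed by the NETWORK FP16 cell above.
w_frac = choose_frac_bits(max(abs(-0.6226), abs(0.4982)), 16)
a_frac = choose_frac_bits(max(abs(-10.3823), abs(13.2761)), 16)
print(w_frac, a_frac)
print(quantize_fixed(0.4982, 16, w_frac))
```

Under these assumptions the weights (max magnitude 0.6226) get 15 fractional bits and the activations (max magnitude 13.2761) get 11, since the latter need 4 integer bits.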

The exported ONNX graph can also be visualized with [`netron`](https://github.com/lutzroeder/netron):

In [3]:
```python
import netron

netron.start('fp16/vgg16_bn.onnx')
```

```
Serving 'fp16/vgg16_bn.onnx' at http://localhost:8080
('localhost', 8080)
```

Taking convolutional layers as an example, each layer's attributes include `acc_width`, `block_floating_point`, `data_width` and `weight_width`, which describe its quantization. If an ONNX file does not contain this custom quantization information, fpgaconvnet-model assumes fp16 precision by default.

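In addition, if you don't need the accuracy results, you can switch precision by directly modifying the `sideband_info` attribute of the `model_wrapper` object instead of calling `quantize_model` and running inference. For example, to export with 8-bit channel-wise block floating point: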

In [4]:
```python
model_wrapper.load_model()
model_wrapper.sideband_info['quantization'] = {
    'weight_width': 8, 'data_width': 8, 'mode': QuantMode.CHANNEL_BFP}
model_wrapper.generate_onnx_files("bfp8")
netron.start('bfp8/vgg16_bn.onnx')
```

```
Using cache found in /home/zy18/.cache/torch/hub/chenyaofo_pytorch-cifar-models_master
Using cache found in /home/zy18/.cache/torch/hub/chenyaofo_pytorch-cifar-models_master
Serving 'bfp8/vgg16_bn.onnx' at http://localhost:8081
('localhost', 8081)
```
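
To give a feel for what `QuantMode.CHANNEL_BFP` with 8-bit widths implies, here is a minimal, generic block floating point sketch in plain Python. It assumes that each output channel forms one block sharing a single exponent and that the 8 bits are a signed mantissa. The function names `bfp_quantize_block` and `channel_bfp` are hypothetical; fpgaconvnet-torch's quantizer may choose exponents, rounding or bit allocation differently.

```python
import math

def bfp_quantize_block(values, width=8):
    """Quantize one block with a shared exponent and signed `width`-bit mantissas.

    Generic BFP sketch (assumption), not fpgaconvnet-torch's quantizer.
    Returns the dequantized values and the shared exponent.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values), 0
    shared_exp = math.floor(math.log2(max_abs)) + 1  # max_abs < 2**shared_exp
    step = 2.0 ** (shared_exp - (width - 1))         # value of one mantissa LSB
    qmin, qmax = -(2 ** (width - 1)), 2 ** (width - 1) - 1
    quantized = [max(qmin, min(qmax, round(v / step))) * step for v in values]
    return quantized, shared_exp

def channel_bfp(weights, width=8):
    """Channel-wise BFP: each output channel (row) is its own block."""
    return [bfp_quantize_block(row, width)[0] for row in weights]

weights = [[0.5, -0.25, 0.1],   # channel 0: small values, fine step (2**-7)
           [3.0, 1.0, -2.0]]    # channel 1: larger values, coarser step (2**-5)
print(channel_bfp(weights, width=8))
# → [[0.5, -0.25, 0.1015625], [3.0, 1.0, -2.0]]
```

Giving each channel its own exponent lets channels with small weights keep a finer step than a single network-wide scale would allow, at the cost of storing one exponent per channel.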