This IPython notebook demonstrates post-training calibration (PTC) based quantization, which lets the user introduce quantization into any network. The quantized model can then be evaluated directly in PyTorch.

In [1]:
!pip install netron

import torch
import torch.nn as nn
import edgeai_torchmodelopt
import copy
import netron
import torchvision
from tqdm import tqdm




In [2]:
model = torchvision.models.resnet50(weights='DEFAULT')
example_input = torch.rand((1, 3, 224, 224))

y = model(example_input)
print("Output Shape is : {}".format(y.shape))

Output Shape is : torch.Size([1, 1000])


In [3]:
model_export_name = "./orig_simple_network_ptc.onnx"
torch.onnx.export(model, example_input, model_export_name)
netron.start(model_export_name, 8080)

Serving './orig_simple_network_ptc.onnx' at http://localhost:8080


('localhost', 8080)

Here we wrap the model in PTCFxModule, which handles calibration and conversion to the final quantized network. It also supports bias calibration for layers that have a bias; setting a bias calibration factor enables it (0.01 generally works well). num_batch_norm_update_epochs and num_observer_update_epochs set the number of epochs during which the batch norm parameters and the observers, respectively, are updated. The epoch count advances each time model.train() is called. Calibration should be run in training mode to use the full functionality. Here we calibrate for 3 epochs. Only a small number of samples from the data distribution is needed (around 100 is generally enough).
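
As a rough illustration of how the two epoch counts interact, the sketch below lists which updates would be active in each calibration epoch. It is not the library's implementation: the helper name calibration_schedule is hypothetical, and the assumption that an update stays active for the first N epochs and is frozen afterwards is inferred from the "Freezing BN" and "Freezing ranges" messages printed during calibration in this notebook.

```python
def calibration_schedule(num_epochs, num_batch_norm_update_epochs, num_observer_update_epochs):
    """Return, per epoch, whether BN statistics and observer ranges are still updating.

    Hypothetical helper for illustration only; assumes an update is active
    for the first N epochs and frozen for all later ones.
    """
    schedule = []
    for epoch in range(num_epochs):
        schedule.append({
            "epoch": epoch,
            "update_batch_norm": epoch < num_batch_norm_update_epochs,
            "update_observers": epoch < num_observer_update_epochs,
        })
    return schedule


# Values used in this notebook: 3 epochs, 1 BN epoch, 2 observer epochs.
for entry in calibration_schedule(3, 1, 2):
    print(entry)
```

Under that assumption, the first epoch updates both batch norm statistics and observer ranges, the second updates only the observer ranges, and the third runs with both frozen.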

In [4]:
model = edgeai_torchmodelopt.xmodelopt.quantization.v2.PTCFxModule(
    model,
    backend='qnnpack',
    bias_calibration_factor=0.01,
    num_batch_norm_update_epochs=1,
    num_observer_update_epochs=2,
)



Here is the calibration step for the network. Random data is used here only as an example. **Replace it with samples from your own dataset.**

In [5]:
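num_calib_images = 10  # kept small for this example; around 100 samples is suggested above
num_epochs = 3
for epoch in range(num_epochs):
    # Each call to model.train() starts a new calibration epoch.
    model.train()
    for i in tqdm(range(num_calib_images)):
        # Random input stands in for real calibration data.
        output = model(torch.rand(1, 3, 224, 224))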

100%|██████████| 10/10 [00:01<00:00,  9.73it/s]


Freezing BN for subsequent epochs


100%|██████████| 10/10 [00:00<00:00, 10.24it/s]


Freezing ranges for subsequent epochs


100%|██████████| 10/10 [00:00<00:00, 12.09it/s]
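
To calibrate on real data, the loop above can be wrapped in a small helper that draws a limited number of batches from any iterable of input tensors, such as a torch.utils.data.DataLoader over your own dataset. This is a minimal sketch: the name calibrate and its parameters are hypothetical, and it assumes each item yielded by the loader is an input batch the model accepts directly.

```python
from itertools import islice


def calibrate(model, loader, num_epochs=3, max_batches=100):
    """Run calibration passes over at most `max_batches` batches per epoch.

    Hypothetical helper: `model` is the wrapped module from the cell above and
    `loader` is any re-iterable collection of input batches.
    """
    for epoch in range(num_epochs):
        # train() is called once per epoch, as in the loop above.
        model.train()
        for batch in islice(loader, max_batches):
            model(batch)
    return model
```

If the loader yields (image, label) pairs, as most classification datasets do, pass only the image tensor to the model inside the loop.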


We have the quantized and calibrated 8-bit network now.

In [6]:
print(model)

PTCFxModule(
  (module): GraphModule(
    (activation_post_process_0): AdaptiveActivationFakeQuantize(
      fake_quant_enabled=tensor([1], dtype=torch.uint8), observer_enabled=tensor([0], dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0039]), zero_point=tensor([0], dtype=torch.int32)
      (activation_post_process): CustomAdaptiveActivationObserverqscheme_torch_per_tensor_affine__range_shrink_percentile_0(min_val=1.8891554418587475e-06, max_val=0.9999961853027344)
    )
    (conv1): ConvReLU2d(
      (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3))
      (1): ReLU(inplace=True)
    )
    (activation_post_process_1): AdaptiveActivationFakeQuantize(
      fake_quant_enabled=tensor([1], dtype=torch.uint8), observer_enabled=tensor([0], dtype=torch.uint8), quant_min=0, quant_max=255, dtype=torch.quint8, qscheme=torch.per_tensor_affine, ch_axis=-1, scale=tensor([0.0843]), zero_point=tensor
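
The scale and zero_point printed above follow from the observed min_val and max_val. The sketch below uses the standard per-tensor affine formula, with the range first widened to include zero, as PyTorch's built-in min/max observers do. Whether the custom observer used here follows exactly the same steps is an assumption, and the function name affine_qparams is hypothetical.

```python
def affine_qparams(min_val, max_val, quant_min=0, quant_max=255):
    """Compute (scale, zero_point) for per-tensor affine quantization."""
    # Widen the range so that 0.0 is exactly representable.
    min_val = min(min_val, 0.0)
    max_val = max(max_val, 0.0)
    scale = (max_val - min_val) / (quant_max - quant_min)
    # Guard against a degenerate all-zero range.
    scale = max(scale, 1e-12)
    zero_point = quant_min - round(min_val / scale)
    # Keep the zero point inside the integer range.
    zero_point = max(quant_min, min(quant_max, zero_point))
    return scale, zero_point


# Range reported by the first observer in the printout above.
scale, zero_point = affine_qparams(1.8891554418587475e-06, 0.9999961853027344)
print(round(scale, 4), zero_point)
```

For the first observer this gives a scale of about 0.0039 and a zero point of 0, matching the values shown for activation_post_process_0.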

In [7]:
model_export_name = "./converted_simple_network_ptc.onnx"
model.export(example_input, model_export_name)
netron.start(model_export_name, 8080)


Stopping http://localhost:8080
Serving './converted_simple_network_ptc.onnx' at http://localhost:8080


('localhost', 8080)

Netron may show the fused quantized operators as separate nodes, because the exported model is fake-quantized (Q-DQ).
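
A Q-DQ pair quantizes a value to an integer level and immediately dequantizes it back to floating point, so the surrounding operators still run on floats that carry the quantization error. The sketch below shows that arithmetic on plain Python floats. The function name quantize_dequantize is hypothetical, and the scale and zero point are the rounded values from the first activation quantizer printed earlier.

```python
def quantize_dequantize(x, scale, zero_point, quant_min=0, quant_max=255):
    """Quantize x to an integer level, then map it back to a float."""
    # Quantize: divide by the scale, round, add the zero point, then clamp.
    q = round(x / scale) + zero_point
    q = max(quant_min, min(quant_max, q))
    # Dequantize: remove the zero point and multiply by the scale.
    return (q - zero_point) * scale


for value in (-0.5, 0.3, 2.0):
    print(value, "->", quantize_dequantize(value, scale=0.0039, zero_point=0))
```

Values outside the representable range saturate at its ends, and values inside it snap to the nearest multiple of the scale.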