# Post-training Optimization Toolkit Basics Tutorial

The Post-training Optimization Toolkit (POT) helps optimize a model by applying optimization techniques such as low-precision quantization and sparsity. Please refer to the documentation at https://docs.openvinotoolkit.org/latest/_README.html for details. In this tutorial we'll concentrate on INT8 quantization capabilities.

This notebook demonstrates basic capabilities of POT:
* POT configuration files structure
* How to run POT in simplified mode
* How to measure accuracy of FP32, INT8 models using POT config 
* How to create your own POT config
* How to properly benchmark the workload

## Step 0. Prerequisites.

To run quantization you need a pre-trained model in IR format and a calibration dataset. Let's prepare both in this step. In this tutorial we'll use SampLeNet, a very simple model trained specifically for sample purposes on the CIFAR-10 dataset.

### Step 0.1 Converting model to IR.

SampLeNet is distributed as part of OpenVINO and is used in the AccuracyChecker sample.

In [28]:
!ls /opt/intel/openvino/deployment_tools/open_model_zoo/tools/accuracy_checker/data/test_models/

pytorch_model	       SampLeNet.caffemodel  samplenet-symbol.json
samplenet-0000.params  samplenet.onnx	     SampLeNet.xml
SampLeNet.bin	       samplenet.pb
SampLeNet.blob	       SampLeNet.prototxt


Here are SampLeNet models trained with different frameworks. Let's take the Caffe one and convert it to IR.
BKM (best known method): To achieve proper accuracy, always check what normalization was applied at the model training stage. Model Optimizer can apply mean and scale values if appropriate. This information can be obtained from the model training script.
In our case, means and scales were applied to the model.

In [21]:
!/opt/intel/openvino/deployment_tools/model_optimizer/mo.py \
--input_model /opt/intel/openvino/deployment_tools/open_model_zoo/tools/accuracy_checker/data/test_models/SampLeNet.caffemodel \
--output_dir IR \
--mean_values [125.307,122.961,113.8575] \
--scale_values [51.5865,50.847,51.255]

Model Optimizer arguments:
Common parameters:
	- Path to the Input Model: 	/opt/intel/openvino/deployment_tools/open_model_zoo/tools/accuracy_checker/data/test_models/samplenet.onnx
	- Path for generated IR: 	/home/u40686/My-Notebooks/POT_training_simplenet/IR
	- IR output name: 	samplenet
	- Log level: 	ERROR
	- Batch: 	Not specified, inherited from the model
	- Input layers: 	Not specified, inherited from the model
	- Output layers: 	Not specified, inherited from the model
	- Input shapes: 	Not specified, inherited from the model
	- Mean values: 	[125.307,122.961,113.8575]
	- Scale values: 	[51.5865,50.847,51.255]
	- Scale factor: 	Not specified
	- Precision of IR: 	FP32
	- Enable fusing: 	True
	- Enable grouped convolutions fusing: 	True
	- Move mean values to preprocess section: 	False
	- Reverse input channels: 	False
ONNX specific parameters:
Model Optimizer version: 	2020.2.0-60-g0bc66e26ff

[ SUCCESS ] Generated IR version 10 model.
[ SUCCESS ] XML file: /home/u40686/My-Noteboo
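
To make the effect of `--mean_values` and `--scale_values` concrete, here is a minimal sketch of the per-channel normalization they describe: the mean is subtracted from each channel value and the result is divided by the scale. The function name `normalize_pixel` is made up for illustration. Model Optimizer embeds the equivalent operations into the IR; it does not run Python code like this.

```python
# Per-channel values taken from the mo.py command above.
MEAN_VALUES = [125.307, 122.961, 113.8575]
SCALE_VALUES = [51.5865, 50.847, 51.255]


def normalize_pixel(pixel):
    """Return (value - mean) / scale for each channel of one pixel."""
    return [(v - m) / s for v, m, s in zip(pixel, MEAN_VALUES, SCALE_VALUES)]


# A pixel equal to the per-channel mean is mapped to zeros.
print(normalize_pixel(MEAN_VALUES))
# A white pixel ends up a few "scale units" above zero in every channel.
print(normalize_pixel([255, 255, 255]))
```

If these values were omitted at conversion time, the same normalization would have to be applied elsewhere in the pipeline before inference.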

### Step 0.2. Getting the dataset.

The dataset is an essential part of quantization. It's needed to collect calibration statistics and to measure accuracy using the Accuracy Checker tool. That's why much of the information we'll need later is about the dataset on which the model was trained.
Let's download and prepare it.
In our example the model was trained on CIFAR-10, which consists of 60000 32x32 colour images in 10 classes.

In [None]:
!wget http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz

Extract the archive:

In [None]:
!tar -xzf cifar-10-python.tar.gz

In [None]:
!ls cifar-10-batches-py
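
For orientation, here is a small sketch of how the extracted batch files are laid out, based on the format described on the CIFAR-10 download page: each file is a pickled dictionary whose `data` entry holds one 3072-value row per image (1024 red values, then 1024 green, then 1024 blue, in row-major order). The helper names `load_batch` and `pixel_at` are illustrative only, and unpickling the real files is assumed to require NumPy to be installed.

```python
import pickle


def load_batch(path):
    """Unpickle one CIFAR-10 batch file into a dict with bytes keys."""
    with open(path, "rb") as f:
        return pickle.load(f, encoding="bytes")


def pixel_at(row, x, y):
    """Return the (R, G, B) values at column x, row y of one 3072-value image row."""
    offset = y * 32 + x
    return (row[offset], row[1024 + offset], row[2048 + offset])


# Example usage once the archive has been extracted:
# batch = load_batch("cifar-10-batches-py/test_batch")
# print(len(batch[b"labels"]), pixel_at(batch[b"data"][0], 0, 0))
```

In this tutorial the conversion to image files is handled by the Accuracy Checker annotation converter, so this code is only meant to show what the raw files contain.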

## Step 1. Getting familiar with POT.

Let's check how POT works and how to use it.

In [None]:
!pot -h

As you can see, all the "magic" is inside the config.json file, so let's look into it.
OpenVINO ships POT config templates and config examples.

In [None]:
!ls /opt/intel/openvino/deployment_tools/tools/post_training_optimization_toolkit/configs

Templates (template_accuracy_aware_quantization.json, template_default_quantization.json, template_tpe.json) contain all possible POT parameters with very detailed explanations. If you need to know the meaning of a certain parameter, this is a good resource to learn from.
Examples provide accuracy_checker, quantization and sparsity configs for several well-known public topologies. This is good material to get started with if you need to quantize the models listed here or similar ones.

In [None]:
!ls /opt/intel/openvino/deployment_tools/tools/post_training_optimization_toolkit/configs/examples/

In [None]:
!ls /opt/intel/openvino/deployment_tools/tools/post_training_optimization_toolkit/configs/examples/accuracy_checker

In [None]:
!ls /opt/intel/openvino/deployment_tools/tools/post_training_optimization_toolkit/configs/examples/quantization

In [None]:
!ls /opt/intel/openvino/deployment_tools/tools/post_training_optimization_toolkit/configs/examples/quantization/classification

In [None]:
!ls /opt/intel/openvino/deployment_tools/tools/post_training_optimization_toolkit/configs/examples/sparsity

Let's take a look into the DefaultQuantization template

In [None]:
!cat /opt/intel/openvino/deployment_tools/tools/post_training_optimization_toolkit/configs/template_default_quantization.json

It contains 3 main sections: "model", "engine", "compression".

"model" is the simplest: it keeps the model name and the paths to the IR.

"engine" describes how the model will be executed. There are 2 modes: "simplified", which runs basic scenarios to roughly estimate the performance gain and doesn't require dataset labels; and "accuracy checker", which produces a more accurate quantized model and lets you tune image reading, preprocessing, etc., but requires a labeled dataset.

"compression" should contain all the information needed by the optimization algorithm.

More detailed information is available in the README.md file.

In [None]:
!cat /opt/intel/openvino/deployment_tools/tools/post_training_optimization_toolkit/configs/README.md
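
The three sections described above can be checked programmatically before launching POT. The sketch below is a convenience check only, written for the config layouts used in this notebook: an engine with `"type": "simplified"` versus an engine that carries `launchers` and `datasets`. Both helper names are hypothetical and are not part of POT.

```python
def missing_sections(config):
    """List the main POT config sections that are absent from the dict."""
    return [name for name in ("model", "engine", "compression") if name not in config]


def engine_mode(config):
    """Guess which engine mode a POT config dict describes."""
    engine = config.get("engine", {})
    if engine.get("type") == "simplified":
        return "simplified"
    if "launchers" in engine and "datasets" in engine:
        return "accuracy_checker"
    return "unknown"


example = {
    "model": {"model_name": "demo", "model": "demo.xml", "weights": "demo.bin"},
    "engine": {"type": "simplified", "data_source": "path/to/the/dataset"},
}
print(missing_sections(example))  # the example has no "compression" section
print(engine_mode(example))
```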

Let's run POT in different modes, compare them, and practice creating POT configs.

## Step 2. Rough INT8 performance estimation (simplified mode). 

Imagine we have a model and we're not satisfied with its performance. Low-precision quantization is one of the optimization options. But not all models quantize well: sometimes the performance gain is insignificant and using this approach is a waste of time. It really depends on the workload (model, data, etc.). You can quickly check whether it's worth applying quantization using "simplified mode". Let's do that.

First of all, let's estimate the performance of the full-precision (FP32) model using Benchmark App, a special tool recommended for performance estimation. This is how to work with the tool:

In [None]:
!python3 /opt/intel/openvino/deployment_tools/tools/benchmark_tool/benchmark_app.py -h

In [33]:
!python3 /opt/intel/openvino/deployment_tools/tools/benchmark_tool/benchmark_app.py -m IR/SampLeNet.xml -i cifar-10-python/01_cat.png

[Step 1/11] Parsing and validating input arguments
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
         API version............. 2.1.42025
[ INFO ] Device info
         CPU
         MKLDNNPlugin............ version 2.1
         Build................... 42025

[Step 3/11] Reading the Intermediate Representation network
[ INFO ] Read network took 8.79 ms
[Step 4/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 5/11] Configuring input of the model
[Step 6/11] Setting device configuration
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 46.60 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'data' precision U8, dimensions (NCHW): 1 3 32 32
[ INFO ] Infer Request 0 filling
[ INFO ] Fill input 'data' with random values (image is expected)
[ INFO ] Infer Request 1 filling
[ INFO ] Fill input 'data' with random va

Those are good throughput and latency numbers, but let's see whether we can improve them. For that we need to create a POT config. To use simplified mode, specify the "type": "simplified" and "data_source": "path/to/the/dataset" fields in the "engine" section.

In [30]:
!cat SampLeNet_simplified.json

{
  "model": {
    "model_name": "SampLeNet_simplified",
    "model": "IR/SampLeNet.xml",
    "weights": "IR/SampLeNet.bin"
  },
  "engine": {
    "type": "simplified",
    "data_source": "cifar-10-python" 
  },
  "compression": {
    "target_device": "CPU",
    "algorithms": [
      {
        "name": "DefaultQuantization",
        "params": {
          "preset": "performance",
          "stat_subset_size": 300,
        }
      }
    ]
  }
}
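
As an alternative to editing the file by hand, the same config can be generated from Python. This is a minimal sketch that mirrors the fields shown above; the function name `make_simplified_config` is made up for illustration. Generating the file this way sidesteps hand-editing slips such as a trailing comma, which strict JSON parsers reject.

```python
import json


def make_simplified_config(model_name, xml_path, bin_path, data_source, subset_size=300):
    """Build a simplified-mode POT config dict with the fields used in this tutorial."""
    return {
        "model": {"model_name": model_name, "model": xml_path, "weights": bin_path},
        "engine": {"type": "simplified", "data_source": data_source},
        "compression": {
            "target_device": "CPU",
            "algorithms": [
                {
                    "name": "DefaultQuantization",
                    "params": {"preset": "performance", "stat_subset_size": subset_size},
                }
            ],
        },
    }


config = make_simplified_config(
    "SampLeNet_simplified", "IR/SampLeNet.xml", "IR/SampLeNet.bin", "cifar-10-python"
)
text = json.dumps(config, indent=2)
print(text)
# To save it: open("SampLeNet_simplified.json", "w").write(text)
```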

Let's run quantization in simplified mode, using the "-d" option to simplify results storage and further reuse of the models by Benchmark App.

In [31]:
!pot -c SampLeNet_simplified.json -d

INFO:app.run:Output log dir: ./results
INFO:app.run:Creating pipeline:
 Algorithm: DefaultQuantization
 Parameters:
	preset                     : performance
	stat_subset_size           : 300
	target_device              : CPU
	exec_log_dir               : ./results
INFO:compression.statistics.collector:Start computing statistics for algorithms : DefaultQuantization
INFO:compression.statistics.collector:Computing statistics finished
INFO:compression.pipeline.pipeline:Start algorithm: DefaultQuantization
INFO:compression.statistics.collector:Start computing statistics for algorithms : ActivationChannelAlignment
INFO:compression.statistics.collector:Computing statistics finished
INFO:compression.statistics.collector:Start computing statistics for algorithms : MinMaxQuantization,FastBiasCorrection
INFO:compression.statistics.collector:Computing statistics finished
INFO:compression.pipeline.pipeline:Finished: DefaultQuantization


Rough estimation of how we can benefit from INT8:

In [34]:
!python3 /opt/intel/openvino/deployment_tools/tools/benchmark_tool/benchmark_app.py -m results/optimized/SampLeNet_simplified.xml -i cifar-10-python/01_cat.png

[Step 1/11] Parsing and validating input arguments
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
         API version............. 2.1.42025
[ INFO ] Device info
         CPU
         MKLDNNPlugin............ version 2.1
         Build................... 42025

[Step 3/11] Reading the Intermediate Representation network
[ INFO ] Read network took 14.69 ms
[Step 4/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 5/11] Configuring input of the model
[Step 6/11] Setting device configuration
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 91.30 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'data' precision U8, dimensions (NCHW): 1 3 32 32
[ INFO ] Infer Request 0 filling
[ INFO ] Fill input 'data' with random values (image is expected)
[ INFO ] Infer Request 1 filling
[ INFO ] Fill input 'data' with random v
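
To compare the two benchmark runs numerically, the throughput can be pulled out of each report and turned into a speedup ratio. The sketch below assumes the report ends with a line of the form `Throughput: <number> FPS`; that line is not visible in the truncated outputs above, so treat the pattern as an assumption. The sample reports use made-up numbers and are not results from this notebook.

```python
import re


def parse_throughput(report):
    """Return the FPS value from a line like 'Throughput: 123.45 FPS', or None."""
    match = re.search(r"Throughput:\s*([0-9]+(?:\.[0-9]+)?)\s*FPS", report)
    return float(match.group(1)) if match else None


def speedup(fp32_fps, int8_fps):
    """Ratio of INT8 throughput to FP32 throughput."""
    return int8_fps / fp32_fps


# Made-up numbers for illustration only.
fp32_report = "Latency: 2.00 ms\nThroughput: 1000.00 FPS"
int8_report = "Latency: 1.00 ms\nThroughput: 2500.00 FPS"
print(speedup(parse_throughput(fp32_report), parse_throughput(int8_report)))
```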

## Step 3. Measuring accuracy of FP32 model. Accuracy Checker configuration. 

Looks like we can dramatically accelerate our workload. But what about accuracy? Being able to run the model with the benchmark tool does not guarantee that the inference results are correct. You can check the model output visually or manually, but we offer the Accuracy Checker tool, which estimates accuracy metrics of a given model on a given dataset. It can be used directly through the accuracy_check alias or called from the POT config.

In [35]:
!accuracy_check -h

  _np_qint8 = np.dtype([("qint8", np.int8, 1)])

  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])

  _np_qint16 = np.dtype([("qint16", np.int16, 1)])

  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])

  _np_qint32 = np.dtype([("qint32", np.int32, 1)])

  np_resource = np.dtype([("resource", np.ubyte, 1)])

usage: accuracy_check [-h] [-d DEFINITIONS] -c CONFIG [-m MODELS [MODELS ...]]
                      [-s SOURCE] [-a ANNOTATIONS] [-e EXTENSIONS]
                      [--cpu_extensions_mode {avx512,avx2,sse4}]
                      [-b BITSTREAMS]
                      [--stored_predictions STORED_PREDICTIONS]
                      [-C CONVERTED_MODELS] [-M MODEL_OPT

That was how to use Accuracy Checker directly; it uses its own .yaml configuration files. To avoid producing too many configs, here we'll call it from the "engine" section of the same POT config. Let's change it like this:

In [25]:
!cat SampLeNet_FP32.json

{
  "model": {
    "model_name": "SampLeNet_FP32",
    "model": "IR/SampLeNet.xml",
    "weights": "IR/SampLeNet.bin"
  },
  "engine": {
    "launchers": [
      {
        "framework": "dlsdk",
        "device": "CPU",
        "adapter": "classification"
      }
    ],
    "datasets": [
      {
        "name": "classification_dataset",
        "data_source": "cifar-10-python",
        "annotation_conversion": {
          "converter": "cifar",
          "data_batch_file": "cifar-10-batches-py/test_batch",
          "convert_images": true,
          "converted_images_dir": "cifar-10-python",
          "num_classes": 10
        },
        "reader": "pillow_imread",
        "metrics": [
          {
            "name": "accuracy@top1",
            "type": "accuracy",
            "top_k": 1
          },
          {
            "name": "accuracy@top5",
            "type": "accuracy",
            "top_k": 5
          }
        ]
      }
    ]
  }
}

**Accuracy Checker configuration files have the following parts:**

* **Launchers** are the inference backends. A launcher can be the OpenVINO Inference Engine or a framework such as TensorFlow, PyTorch, ONNX Runtime, etc. By choosing different launchers you can compare the accuracy of the model inferred with OpenVINO against the framework it was trained in. Here you can also specify the inference device. In the example above we're using the Inference Engine launcher on CPU. Click [here](https://docs.openvinotoolkit.org/latest/_tools_accuracy_checker_README.html) to see the full list of launchers.

   **Please note that for quantization via POT, the OpenVINO Inference Engine is the only available inference backend.**

* **Adapters**. An adapter converts the network inference output to a metric-specific format. [Here are the available adapters](https://docs.openvinotoolkit.org/latest/_tools_accuracy_checker_accuracy_checker_adapters_README.html). We have a simple classification model, so our adapter is "classification".

* **Annotation Converters**. Today there are thousands of datasets and each has its own annotation format. AC uses its own internal dataset annotation format, so an annotation converter converts the dataset annotation from its original format to the AC one. [Check supported datasets](https://docs.openvinotoolkit.org/latest/_tools_accuracy_checker_accuracy_checker_annotation_converters_README.html)
    In this example we're using CIFAR-10, so let's specify that, and don't forget to set the number of classes.

* **Readers**. A reader defines how images from the dataset are read. Look for [implemented readers](https://docs.openvinotoolkit.org/latest/_tools_accuracy_checker_accuracy_checker_data_readers_README.html)
We're going to read RGB images in .png format, so several readers from the list work for us. OpenCV imread is the default one; we'll use it because the same reader was used at the training stage, which makes our experiments more precise.

* [Preprocessors](https://docs.openvinotoolkit.org/latest/_tools_accuracy_checker_accuracy_checker_preprocessor_README.html) and [Postprocessors](https://docs.openvinotoolkit.org/latest/_tools_accuracy_checker_accuracy_checker_postprocessor_README.html) - additional pre- and post-processing of the calibration dataset can be added if appropriate (that is, if it was done during model training). The most popular preprocessing step is resizing images to fit the model input shape. In our case it is not required because the images already have the same size as the model input. We already added normalization (scale and mean values) on the Model Optimizer side, so we don't need to put it here. If we had forgotten to add "--mean_values" and "--scale_values" to the MO command line, we could do it here. The only preprocessing we should add is BGR to RGB conversion, because the topology was trained on RGB images but the OpenCV reader (opencv_imread) reads in BGR.

* **Metrics**. A metric defines how accuracy is measured. Different CV tasks such as classification, detection, segmentation, etc. have different approaches to measuring accuracy. You can choose among [these metrics](https://docs.openvinotoolkit.org/latest/_tools_accuracy_checker_accuracy_checker_metrics_README.html).
Our SampLeNet is a classification model and CIFAR-10 is a classification dataset, so let's apply the most popular classification metrics: `top1` and `top5`.
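
To illustrate what the `top1` and `top5` metrics measure, here is a small pure-Python sketch of top-k accuracy: a sample counts as correct when its true label is among the k classes with the highest scores. This is a simplified illustration with made-up scores and is not Accuracy Checker's implementation.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices sorted from the highest score to the lowest.
        ranked = sorted(range(len(sample_scores)), key=lambda i: sample_scores[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Three samples, three classes; scores are made up for illustration.
scores = [
    [0.1, 0.7, 0.2],  # best guess: class 1
    [0.5, 0.3, 0.2],  # best guess: class 0
    [0.2, 0.3, 0.5],  # best guess: class 2
]
labels = [1, 1, 0]
print(top_k_accuracy(scores, labels, 1))
print(top_k_accuracy(scores, labels, 2))
```

With k equal to the number of classes the metric is always 1.0, which is why top-5 on a 10-class dataset is much easier to score well on than top-1.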

To calculate the accuracy of the full-precision model using POT and Accuracy Checker, leave the "compression" section empty in the POT config and add the "-e" (evaluation) parameter to the command line.

In [39]:
!pot -c SampLeNet_FP32.json -e -d

INFO:app.run:Output log dir: ./results
INFO:app.run:Creating pipeline:
IE version: 2.1.42025
Loaded CPU plugin version:
    CPU - MKLDNNPlugin: 2.1.42025
INFO:compression.pipeline.pipeline:Evaluation of generated model
INFO:compression.engines.ac_engine:Start inference on the whole dataset
Total dataset size: 10000
1000 / 10000 processed in 2.036s
2000 / 10000 processed in 1.999s
3000 / 10000 processed in 2.038s
4000 / 10000 processed in 2.005s
5000 / 10000 processed in 2.041s
6000 / 10000 processed in 2.001s
7000 / 10000 processed in 2.037s
8000 / 10000 processed in 1.999s
9000 / 10000 processed in 2.030s
10000 / 10000 processed in 2.016s
10000 objects processed in 20.202 seconds
INFO:compression.engines.ac_engine:Inference finished
INFO:app.run:accuracy@top1              : 0.7502
INFO:app.run:accuracy@top5              : 0.9822
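
When several runs need to be compared, it can be handy to extract the metric values from the log text. The sketch below parses lines in the `INFO:app.run:<metric> : <value>` form shown above; the function name `parse_metrics` is hypothetical, and the pattern is only known to match the log format printed in this notebook.

```python
import re


def parse_metrics(log_text):
    """Map metric names to float values from lines like 'INFO:app.run:accuracy@top1 : 0.7502'."""
    pattern = re.compile(r"^INFO:app\.run:(\S+)\s*:\s*([0-9.]+)\s*$", re.MULTILINE)
    return {name: float(value) for name, value in pattern.findall(log_text)}


# Lines copied from the evaluation output above.
log_text = """INFO:app.run:Output log dir: ./results
INFO:compression.engines.ac_engine:Inference finished
INFO:app.run:accuracy@top1              : 0.7502
INFO:app.run:accuracy@top5              : 0.9822
"""
print(parse_metrics(log_text))
```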


The FP32 model accuracy is exactly the same as the Caffe output, so our model is running correctly. Let's quantize it.

## Step 4. Running calibration algorithms. Evaluating accuracy and performance.

Currently (OpenVINO 2020.2) there are 2 "production quality" quantization algorithms: DefaultQuantization and AccuracyAwareQuantization. Let's read how they work in the README file below:

In [None]:
!cat /opt/intel/openvino/deployment_tools/tools/post_training_optimization_toolkit/compression/algorithms/quantization/README.md
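
As rough intuition for what quantization to 8 bits means numerically, here is a minimal min-max sketch: floating-point values are mapped onto 256 integer levels spanning the observed range, and mapping them back introduces a small rounding error. This is a generic illustration under simplifying assumptions (a single range for all values, no calibration statistics); it is not POT's implementation of DefaultQuantization.

```python
def minmax_quantize(values, levels=256):
    """Map floats to integer levels in [0, levels - 1] using the observed min and max."""
    lo, hi = min(values), max(values)
    # Fall back to a scale of 1.0 when all values are equal, to avoid dividing by zero.
    scale = (hi - lo) / (levels - 1) or 1.0
    quantized = [round((v - lo) / scale) for v in values]
    return quantized, scale, lo


def dequantize(quantized, scale, lo):
    """Map integer levels back to approximate floating-point values."""
    return [q * scale + lo for q in quantized]


values = [-1.0, -0.25, 0.0, 0.5, 1.0]
quantized, scale, lo = minmax_quantize(values)
restored = dequantize(quantized, scale, lo)
print(quantized)
print(max(abs(a - b) for a, b in zip(values, restored)))  # worst-case rounding error
```

The rounding error per value is bounded by half of `scale`, which is why a wide value range (for example, one caused by outliers) costs precision for all other values.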

### Step 4.1 Running DefaultQuantization

Let's apply DefaultQuantization by filling in the "compression" section as in the config below:

In [40]:
!cat SampLeNet_DefaultQuantization.json

{
  "model": {
    "model_name": "SampLeNet_DefaultQuantization",
    "model": "IR/SampLeNet.xml",
    "weights": "IR/SampLeNet.bin"
  },
  "engine": {
    "launchers": [
      {
        "framework": "dlsdk",
        "device": "CPU",
        "adapter": "classification"
      }
    ],
    "datasets": [
      {
        "name": "classification_dataset",
        "data_source": "cifar-10-python",
        "annotation_conversion": {
          "converter": "cifar",
          "data_batch_file": "cifar-10-batches-py/test_batch",
          "convert_images": true,
          "converted_images_dir": "cifar-10-python",
          "num_classes": 10
        },
        "reader": "opencv_imread",
        "preprocessing": [
            {
                "type": "bgr_to_rgb"
            }
        ],
        "metrics": [
          {
            "name": "accuracy@top1",
            "type": "accuracy",
            "top_k": 1
          },
          {
            "name": "accuracy@top5",
            "type": "acc

Let's run it, again using the "-e" option to see the accuracy results.

In [41]:
!pot -c SampLeNet_DefaultQuantization.json -e -d

INFO:app.run:Output log dir: ./results
INFO:app.run:Creating pipeline:
 Algorithm: DefaultQuantization
 Parameters:
	preset                     : performance
	stat_subset_size           : 300
	target_device              : CPU
	exec_log_dir               : ./results
IE version: 2.1.42025
Loaded CPU plugin version:
    CPU - MKLDNNPlugin: 2.1.42025
INFO:compression.statistics.collector:Start computing statistics for algorithms : DefaultQuantization
INFO:compression.statistics.collector:Computing statistics finished
INFO:compression.pipeline.pipeline:Start algorithm: DefaultQuantization
INFO:compression.statistics.collector:Start computing statistics for algorithms : ActivationChannelAlignment
INFO:compression.statistics.collector:Computing statistics finished
INFO:compression.statistics.collector:Start computing statistics for algorithms : MinMaxQuantization,FastBiasCorrection
INFO:compression.statistics.collector:Computing statistics finished
INFO:compression.pipeline.pipeline:Finished:

Accuracy doesn't deviate much from the FP32 model, so we could stop here, but let's check AccuracyAwareQuantization as well. Just change the algorithm name in the "compression" section.

### Step 4.2 Running AccuracyAwareQuantization

In [19]:
!cat SampLeNet_AccuracyAwareQuantization.json

{
  "model": {
    "model_name": "SampLeNet_AccuracyAware",
    "model": "IR/SampLeNet.xml",
    "weights": "IR/SampLeNet.bin"
  },
  "engine": {
    "launchers": [
      {
        "framework": "dlsdk",
        "device": "CPU",
        "adapter": "classification"
      }
    ],
    "datasets": [
      {
        "name": "classification_dataset",
        "data_source": "cifar-10-python",
        "annotation_conversion": {
          "converter": "cifar",
          "data_batch_file": "cifar-10-batches-py/test_batch",
          "convert_images": true,
          "converted_images_dir": "cifar-10-python",
          "num_classes": 10
        },
        "reader": "pillow_imread",
        "metrics": [
          {
            "name": "accuracy@top1",
            "type": "accuracy",
            "top_k": 1
          },
          {
            "name": "accuracy@top5",
            "type": "accuracy",
            "top_k": 5
          }
        ]
      }
    ]
  },
  "compression": {
    "target_device

In [20]:
!pot -c SampLeNet_AccuracyAwareQuantization.json -e -d

INFO:app.run:Output log dir: ./results
INFO:app.run:Creating pipeline:
 Algorithm: AccuracyAwareQuantization
 Parameters:
	preset                     : performance
	stat_subset_size           : 300
	target_device              : CPU
	exec_log_dir               : ./results
IE version: 2.1.42025
Loaded CPU plugin version:
    CPU - MKLDNNPlugin: 2.1.42025
INFO:compression.statistics.collector:Start computing statistics for algorithms : AccuracyAwareQuantization
INFO:compression.statistics.collector:Computing statistics finished
INFO:compression.pipeline.pipeline:Start algorithm: AccuracyAwareQuantization
INFO:compression.algorithms.quantization.accuracy_aware.algorithm:Start original model inference
INFO:compression.engines.ac_engine:Start inference of 10000 images
Total dataset size: 10000
1000 / 10000 processed in 1.545s
2000 / 10000 processed in 1.504s
3000 / 10000 processed in 1.560s
4000 / 10000 processed in 1.492s
5000 / 10000 processed in 1.519s
6000 / 10000 processed in 1.473s
700

AccuracyAwareQuantization produced an INT8 model with the same accuracy level as the default one.
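
The comparison between the FP32 and INT8 accuracy can be expressed as a simple drop check, which is the idea behind accuracy-aware quantization: accept the quantized model only if the accuracy drop stays within a chosen budget. In the sketch below only the FP32 top-1 value comes from this notebook; the INT8 value and the budget are hypothetical placeholders, and the actual POT parameter controlling the allowed drop should be looked up in template_accuracy_aware_quantization.json.

```python
def accuracy_drop(fp32_accuracy, int8_accuracy):
    """Absolute accuracy lost by the quantized model relative to the FP32 model."""
    return fp32_accuracy - int8_accuracy


def within_budget(fp32_accuracy, int8_accuracy, max_drop):
    """True when the quantized model loses no more than max_drop accuracy."""
    return accuracy_drop(fp32_accuracy, int8_accuracy) <= max_drop


FP32_TOP1 = 0.7502               # from the FP32 evaluation earlier in this notebook
HYPOTHETICAL_INT8_TOP1 = 0.7480  # made-up value, not a result from this notebook
HYPOTHETICAL_MAX_DROP = 0.01     # made-up budget of one percentage point

print(within_budget(FP32_TOP1, HYPOTHETICAL_INT8_TOP1, HYPOTHETICAL_MAX_DROP))
```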

Now it's time to compare the performance of the quantized model using Benchmark App:

In [42]:
!python3 /opt/intel/openvino/deployment_tools/tools/benchmark_tool/benchmark_app.py -m results/optimized/SampLeNet_DefaultQuantization.xml

[Step 1/11] Parsing and validating input arguments
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
         API version............. 2.1.42025
[ INFO ] Device info
         CPU
         MKLDNNPlugin............ version 2.1
         Build................... 42025

[Step 3/11] Reading the Intermediate Representation network
[ INFO ] Read network took 14.91 ms
[Step 4/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 5/11] Configuring input of the model
[Step 6/11] Setting device configuration
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 89.70 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'data' precision U8, dimensions (NCHW): 1 3 32 32
[ INFO ] Infer Request 0 filling
[ INFO ] Fill input 'data' with random values (image is expected)
[ INFO ] Infer Request 1 filling
[ INFO ] Fill input 'data' with random v

Looks like we have good performance gain!