# Tutorial: compress and run a custom network

This brief tutorial shows how to compress a custom network with EfficientBioAI and run inference with the compressed model.
- Model: a naive 2D U-Net taken from [GitHub](https://github.com/milesial/Pytorch-UNet/blob/master/unet/unet_model.py)
- Data: [Simulated nuclei of HL60 cells stained with Hoechst](http://celltrackingchallenge.net/2d-datasets/)
- Compression strategy: L2-norm pruning and int8 quantization-aware training (QAT)
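As a generic illustration of L2-norm (magnitude) filter pruning, the sketch below ranks the output filters of a conv weight by their L2 norm and selects the weakest fraction for removal. It is not EfficientBioAI's pruner: the helper names `l2_filter_norms` and `filters_to_prune` and the `ratio` parameter are hypothetical, and plain nested lists stand in for a weight tensor of shape `[out_channels][in_channels][kh][kw]`.

```python
import math


def l2_filter_norms(weight):
    """L2 norm of each output filter; weight is nested lists [out][in][kh][kw]."""
    norms = []
    for filt in weight:
        squared = sum(v * v for channel in filt for row in channel for v in row)
        norms.append(math.sqrt(squared))
    return norms


def filters_to_prune(weight, ratio):
    """Indices of the `ratio` fraction of filters with the smallest L2 norm."""
    norms = l2_filter_norms(weight)
    n_prune = int(len(norms) * ratio)
    by_norm = sorted(range(len(norms)), key=lambda i: norms[i])
    return sorted(by_norm[:n_prune])


# Four filters with one input channel and a 1x2 kernel (hypothetical values).
w = [
    [[[3.0, 4.0]]],  # norm 5.0
    [[[0.1, 0.0]]],  # norm 0.1
    [[[1.0, 0.0]]],  # norm 1.0
    [[[0.0, 0.2]]],  # norm 0.2
]
print(filters_to_prune(w, 0.5))  # [1, 3]
```

In structured pruning, the selected filters (and the matching input channels of the following layer) are then removed, which shrinks the network instead of just zeroing weights; fine-tuning afterwards recovers some of the lost accuracy.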

EfficientBioAI focuses only on the compression step. It knows nothing about how your dataset is preprocessed or how your model is trained and run, so you need to provide the following for compression:

- a calibration dataloader containing several images;
- a training function, used to fine-tune the compressed model;
- an inference function, used for calibration during the quantization step.

Once these are provided, the package can compress the model and run inference with it.
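The tutorial's own `custom.infer` is not shown. Assuming the calibration function receives the model, a dataloader yielding MONAI-style dict batches with an `"img"` key, and the number of batches to use, a minimal sketch could look like this; `calibrate_sketch` is hypothetical. Its only job is to push a few batches through the model so that the quantization observers can record activation ranges; outputs are discarded. In a real PyTorch hook you would also wrap the loop in `torch.no_grad()`.

```python
from itertools import islice


def calibrate_sketch(model, dataloader, calib_num):
    """Hypothetical calibration hook: run the first `calib_num` batches through `model`."""
    seen = 0
    for batch in islice(dataloader, calib_num):
        model(batch["img"])  # output ignored; observers inside the model collect statistics
        seen += 1
    return seen


recorded = []
batches = [{"img": f"batch{i}", "seg": None} for i in range(75)]
print(calibrate_sketch(recorded.append, batches, calib_num=4))  # 4
```

This is consistent with the `4/75` progress bar in the compression log further down, where only a handful of batches are used for calibration.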

In [1]:
import torch
from model.unet import Unet
from tqdm.contrib import tenumerate
from copy import deepcopy

  from .autonotebook import tqdm as notebook_tqdm


In [None]:
# No pretrained model is available, so download the data and train one from scratch first:
!wget http://data.celltrackingchallenge.net/training-datasets/Fluo-N2DH-SIM+.zip -P ./data
!unzip ./data/Fluo-N2DH-SIM+.zip -d ./data
!rm ./data/Fluo-N2DH-SIM+.zip
!python train_unet.py --data_path "./data/Fluo-N2DH-SIM+/02" --gt_path "./data/Fluo-N2DH-SIM+/02_GT/SEG" --num_epoch 20

In [2]:
# Set the seed for reproducibility:
from monai.utils import set_determinism

seed_value = 2023
torch.manual_seed(seed_value)
torch.backends.cudnn.deterministic = True
set_determinism(seed=seed_value)

## 1. Compress the model

### 1.1 Load the model:

In [3]:
state_dict = torch.load("./unet.pth", map_location="cpu")  # load onto the CPU, where inference runs below
net = Unet(in_channels=1, classes=2)
net.load_state_dict(state_dict)

<All keys matched successfully>

### 1.2 Logic that users must provide:

In [4]:
from functools import partial
from monai.data import DataLoader, Dataset
from custom import train, infer
from data import generate_data_dict, train_transform, test_transform
import yaml
import os
from pathlib import Path

# 1. Training (fine-tuning) and inference (calibration) functions, with their hyperparameters pre-bound:
fine_tune = partial(train, num_epoch=1)
calibrate = partial(infer, calib_num=4)

# 2. An iterable over the data (here a MONAI DataLoader), used for both calibration and fine-tuning:
train_data_path = Path("./data/Fluo-N2DH-SIM+/02")
train_gt_path = Path("./data/Fluo-N2DH-SIM+/02_GT/SEG")
dataset = Dataset(
    data=generate_data_dict(train_data_path, train_gt_path), transform=train_transform
)
dataloader = DataLoader(dataset, batch_size=2, shuffle=True, num_workers=0)

2023-05-04 17:41:43,662 - Resource 'XMLSchema.xsd' is already loaded


150it [00:00, 327339.02it/s]
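`functools.partial` is what keeps the pipeline agnostic of your hyperparameters: `num_epoch` and `calib_num` are bound once, and the caller only has to supply the remaining arguments. The exact call signature the pipeline uses is not shown here; the sketch assumes it passes the model and the dataloader. `train_sketch` is a hypothetical stand-in for `custom.train` that only describes the call it received.

```python
from functools import partial


def train_sketch(model, dataloader, num_epoch, lr=1e-3):
    """Hypothetical stand-in for custom.train; it only describes the call it received."""
    return f"fine-tune {model} on {len(dataloader)} batches for {num_epoch} epoch(s), lr={lr}"


# Bind the hyperparameters once; callers then pass only the model and the data.
fine_tune_sketch = partial(train_sketch, num_epoch=1)

batches = list(range(75))  # stands in for a dataloader with 75 batches
print(fine_tune_sketch("unet", batches))
# Keyword arguments given at call time override the pre-bound ones.
print(fine_tune_sketch("unet", batches, num_epoch=3))
```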


### 1.3 Compress the model:

In [21]:
from efficientbioai.compress_ppl import Pipeline
from efficientbioai.utils.misc import Dict2ObjParser

In [22]:
cfg_path = Path("./custom_config.yaml")
with open(cfg_path, "r") as stream:
    config_yml = yaml.safe_load(stream)
    config = Dict2ObjParser(config_yml).parse()
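`Dict2ObjParser` presumably turns the nested dictionary loaded from YAML into an object with attribute access, which is why section 2 can read `config.model.model_name` while `Pipeline.setup` receives the plain dictionary `config_yml`. A minimal stdlib sketch of the same idea follows; it is not the library's implementation, `dict2obj` is a hypothetical helper, it assumes string keys, and it uses only `model.model_name`, the one key this tutorial reads (the log below reports `model_name: academic`).

```python
from types import SimpleNamespace


def dict2obj(d):
    """Recursively convert nested dicts (and lists of dicts) to attribute-access objects."""
    if isinstance(d, dict):
        return SimpleNamespace(**{k: dict2obj(v) for k, v in d.items()})
    if isinstance(d, list):
        return [dict2obj(v) for v in d]
    return d


example_yml = {"model": {"model_name": "academic"}}
example_config = dict2obj(example_yml)
print(example_config.model.model_name)  # academic
```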

In [23]:
exp_path = Path("./exp/test_opv_int4")
exp_path.mkdir(parents=True, exist_ok=True)  # also creates ./exp if it does not exist yet
pipeline = Pipeline.setup(config_yml)
pipeline(deepcopy(net), dataloader, fine_tune, calibrate, exp_path)
pipeline.network2ir()

2023-05-04 18:37:18 [EFFICIENTBIOAI] INFO: start to compress: quantize: False, prune: False, backend: openvino, model_name: academic
2023-05-04 18:37:18 [EFFICIENTBIOAI] INFO: Start quantization...
[MQBENCH] INFO: Quantize model Scheme: BackendType.OPENVINO Mode: Training
[MQBENCH] INFO: Weight Quant Scheme is overrided!
[MQBENCH] INFO: Activation Quant Scheme is overrided!
[MQBENCH] INFO: Weight Qconfig:
    FakeQuantize: LearnableFakeQuantize Params: {}
    Oberver:      EMAQuantileObserver Params: Symmetric: True / Bitwidth: 4 / Per channel: False / Pot scale: False / Extra kwargs: {}
[MQBENCH] INFO: Activation Qconfig:
    FakeQuantize: LearnableFakeQuantize Params: {}
    Oberver:      EMAQuantileObserver Params: Symmetric: True / Bitwidth: 4 / Per channel: False / Pot scale: False / Extra kwargs: {}
[MQBENCH] INFO: Replace module to qat module.
[MQBENCH] INFO: Now all weight quantizers will effectively use only 7 bits out of 8 bits. This resolves the overflow issue problem on AVX

  5%|▌         | 4/75 [00:09<02:54,  2.46s/it]

[MQBENCH] INFO: Disable observer and Enable quantize.
[MQBENCH] INFO: Merge BN for deploy.





[MQBENCH] INFO: Export to onnx.




[MQBENCH] INFO: Extract qparams for OPENVINO.
[MQBENCH] INFO: Finish deploy process.
2023-05-04 18:37:29 [EFFICIENTBIOAI] INFO: Quantization finished!
Check for a new version of Intel(R) Distribution of OpenVINO(TM) toolkit here https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit/download.html?cid=other&source=prod&campid=ww_2023_bu_IOTG_OpenVINO-2022-3&content=upg_all&medium=organic or on https://github.com/openvinotoolkit/openvino
[ INFO ] The model was converted to IR v11, the latest model format that corresponds to the source DL framework input/output format. While IR v11 is backwards compatible with OpenVINO Inference Engine API v1.0, please use API v2.0 (as of 2022.1) to take advantage of the latest improvements in IR v11.
Find more information about API v2.0 and IR v11 at https://docs.openvino.ai/latest/openvino_2_0_transition_guide.html
[ SUCCESS ] Generated IR version 11 model.
[ SUCCESS ] XML file: /home/ISAS.DE/yu.zhou/EfficientBioAI/tutorial/academic_deploy_model.xml
[ SUCCESS ] BIN file: /home/ISAS.DE/yu.zhou/EfficientBioAI/tutorial/academic_deploy_model.bin
2023-05-04 18:37:31 [EFFICIENTBIOAI] IN
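The log above reports symmetric, per-tensor, 4-bit quantization for both weights and activations (`Symmetric: True / Bitwidth: 4 / Per channel: False`), matching the `test_opv_int4` experiment path. The quantize-dequantize step that fake-quantization nodes simulate during QAT can be sketched in plain Python. This is a simplification: here the scale comes from `max |x|`, whereas the `EMAQuantileObserver` in the log estimates ranges from running statistics and `LearnableFakeQuantize` then learns the scale during training. `symmetric_fake_quant` is a hypothetical helper.

```python
def symmetric_fake_quant(values, bitwidth):
    """Quantize-dequantize floats with one symmetric per-tensor scale.

    Returns (dequantized values, integer codes, scale).
    """
    qmax = 2 ** (bitwidth - 1) - 1   # 7 for 4 bits, 127 for 8 bits
    qmin = -(2 ** (bitwidth - 1))    # -8 for 4 bits, -128 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [min(max(round(v / scale), qmin), qmax) for v in values]  # integers
    dequantized = [q * scale for q in codes]  # what the next layer actually sees
    return dequantized, codes, scale


values = [-1.0, -0.3, 0.0, 0.45, 0.7]
deq, codes, scale = symmetric_fake_quant(values, bitwidth=4)
print(codes)  # [-7, -2, 0, 3, 5]
```

Each dequantized value differs from the original by at most half a quantization step (`scale / 2`), which is the error QAT teaches the network to tolerate.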

## 2. Infer the model

We run inference with the OpenVINO inference engine.

In [24]:
from efficientbioai.infer.backend.openvino import create_opv_model
from monai.inferers import sliding_window_inference

In [25]:
model_name = config.model.model_name
cfg_path = exp_path / f"{model_name}.yaml"
infer_path = exp_path / "academic_deploy_model.xml"

In [26]:
test_data_path = Path("./data/Fluo-N2DH-SIM+/01")
test_gt_path = Path("./data/Fluo-N2DH-SIM+/01_GT/SEG")
test_dataset = Dataset(
    data=generate_data_dict(test_data_path, test_gt_path), transform=test_transform
)
test_dataloader = DataLoader(test_dataset, batch_size=1, shuffle=False, num_workers=0)

65it [00:00, 75187.47it/s]
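`generate_data_dict` comes from the tutorial's local `data` module, which is not shown. Assuming it pairs each frame with its mask, a hypothetical equivalent for the Cell Tracking Challenge layout (images named `t000.tif` in `01/`, masks named `man_seg000.tif` in `01_GT/SEG/`) could look like this. It returns the `"img"`/`"seg"` dictionaries that the inference loops below read; `pair_ctc_frames` is an assumption, not the tutorial's code.

```python
import re
from pathlib import Path


def pair_ctc_frames(img_dir, gt_dir):
    """Hypothetical helper: pair CTC images tNNN.tif with masks man_segNNN.tif by frame number."""
    masks = {
        int(re.search(r"\d+", p.stem).group()): p
        for p in Path(gt_dir).glob("man_seg*.tif")
    }
    pairs = []
    for img in sorted(Path(img_dir).glob("t*.tif")):
        frame = int(re.search(r"\d+", img.stem).group())
        if frame in masks:  # skip frames without a ground-truth mask
            pairs.append({"img": str(img), "seg": str(masks[frame])})
    return pairs
```

For example, `pair_ctc_frames(test_data_path, test_gt_path)` would be called with the same paths as above.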


Inference with the quantized model:

In [11]:
quantized_model = create_opv_model(infer_path)

In [12]:
for i, batch_data in tenumerate(test_dataloader):
    data, label = batch_data["img"], batch_data["seg"]
    sliding_window_inference(
        inputs=data,
        predictor=quantized_model,
        device=torch.device("cpu"),
        roi_size=(128, 128),
        sw_batch_size=1,
        overlap=0.1,
        mode="constant",
    )

100%|██████████| 65/65 [00:16<00:00,  3.93it/s]
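`sliding_window_inference` splits each test image into 128x128 tiles with 10% overlap and calls the predictor on each tile (`sw_batch_size=1`, so one tile per call); with `mode="constant"`, overlapping predictions are averaged with equal weights. The tile positions along one axis can be sketched as follows. This simplifies MONAI's scheme (step `int(roi_size * (1 - overlap))`, last window clamped to the image border) and ignores padding and blending; `window_starts` and the 300-pixel example axis are hypothetical.

```python
import math


def window_starts(image_size, roi_size, overlap):
    """Start offsets of sliding windows along one axis (simplified MONAI-style scan)."""
    if roi_size >= image_size:
        return [0]  # a single window (MONAI pads smaller images up to roi_size)
    step = max(int(roi_size * (1 - overlap)), 1)
    n = math.ceil((image_size - roi_size) / step) + 1
    # The last window is shifted back so that it ends exactly at the border.
    return sorted({min(i * step, image_size - roi_size) for i in range(n)})


print(window_starts(300, 128, 0.1))  # [0, 115, 172]
```

A 2D image needs one window per combination of row and column offsets, so a hypothetical 300x300 image would take 3 x 3 = 9 predictor calls.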


Inference with the original float32 model in PyTorch (not through the OpenVINO engine):

In [13]:
normal_model = net
normal_model.eval()
for i, batch_data in tenumerate(test_dataloader):
    data, label = batch_data["img"], batch_data["seg"]
    sliding_window_inference(
        inputs=data,
        predictor=normal_model,
        device=torch.device("cpu"),
        roi_size=(128, 128),
        sw_batch_size=1,
        overlap=0.1,
        mode="constant",
    )

100%|██████████| 65/65 [01:05<00:00,  1.00s/it]


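In this run, inference over the 65 test images took about 16 s with the compressed OpenVINO model and about 65 s with the float32 PyTorch model, roughly a 4x speedup. Note that this comparison mixes two effects: the compression itself and the switch from PyTorch to the OpenVINO runtime.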
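The speedup can be estimated from the tqdm readouts of the two loops, either from the per-iteration rates or from the total elapsed times. Both are approximate, since tqdm rounds elapsed time to whole seconds.

```python
# Readouts copied from the two progress bars above (65 test images each).
quantized_it_per_s = 3.93  # "65/65 [00:16<00:00,  3.93it/s]"
fp32_s_per_it = 1.00       # "65/65 [01:05<00:00,  1.00s/it]"

# Ratio of rates: (3.93 it/s) / (1 / 1.00 it/s).
speedup_by_rate = quantized_it_per_s * fp32_s_per_it
# Ratio of total elapsed times: 01:05 = 65 s versus 00:16 = 16 s.
speedup_by_wall_clock = 65 / 16
print(f"{speedup_by_rate:.2f}x by rate, {speedup_by_wall_clock:.2f}x by wall clock")
# 3.93x by rate, 4.06x by wall clock
```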

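Measure the latency and throughput of the exported models with OpenVINO's `benchmark_app`. Here `-shape [-1,1,128,128]` keeps the batch dimension dynamic, `-data_shape [4,1,128,128]` feeds batches of four 128x128 tiles, `-api async` uses asynchronous inference requests, and `-t 15` runs each benchmark for 15 seconds. Only `exp/test_opv_int4` is produced in this notebook; the `exp/test_opv_int8` and `exp/test_opv_fp32` models are presumably exported by separate runs of the pipeline with other configurations.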

In [19]:
!benchmark_app -m "exp/test_opv_int8/academic_deploy_model.xml" -d CPU -api async -t 15 -shape [-1,1,128,128] -data_shape [4,1,128,128]

[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ] 
[ INFO ] Device info:
[ INFO ] CPU
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 27.69 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ]     input (node: input) : f32 / [...] / [?,1,128,128]
[ INFO ] Model outputs:
[ INFO ]     output (node: output) : f32 / [...] / [?,2,128,128]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: ?
[ INFO ] Reshaping model: 'input': [?,1,128,128]
[ INFO ] Reshape model took 1.45 ms
[Step 6/11] Configuring input of the model
[ INFO ] Model inputs:
[ INFO ]     inpu

In [20]:
!benchmark_app -m "exp/test_opv_fp32/academic_deploy_model.xml" -d CPU -api async -t 15 -shape [-1,1,128,128] -data_shape [4,1,128,128]

[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ] 
[ INFO ] Device info:
[ INFO ] CPU
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 19.15 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ]     input (node: input) : f32 / [...] / [?,1,128,128]
[ INFO ] Model outputs:
[ INFO ]     output (node: output) : f32 / [...] / [?,2,128,128]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: ?
[ INFO ] Reshaping model: 'input': [?,1,128,128]
[ INFO ] Reshape model took 0.61 ms
[Step 6/11] Configuring input of the model
[ INFO ] Model inputs:
[ INFO ]     inpu

In [27]:
!benchmark_app -m "exp/test_opv_int4/academic_deploy_model.xml" -d CPU -api async -t 15 -shape [-1,1,128,128] -data_shape [4,1,128,128]

[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ] 
[ INFO ] Device info:
[ INFO ] CPU
[ INFO ] Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 13.50 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ]     input (node: input) : f32 / [...] / [?,1,128,128]
[ INFO ] Model outputs:
[ INFO ]     output (node: output) : f32 / [...] / [?,2,128,128]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: ?
[ INFO ] Reshaping model: 'input': [?,1,128,128]
[ INFO ] Reshape model took 1.02 ms
[Step 6/11] Configuring input of the model
[ INFO ] Model inputs:
[ INFO ]     inpu
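`benchmark_app` measures the exported IR directly. For a rough comparison of the two predictors used earlier from within Python, a minimal stdlib timing sketch could look like the following; `measure` is a hypothetical helper. It times one call at a time, so it reports sequential latency rather than the pipelined throughput that `-api async` measures.

```python
import statistics
import time


def measure(fn, inputs, warmup=2):
    """Time `fn` on a non-empty list of inputs.

    Returns (median latency in ms, throughput in calls per second).
    """
    for x in inputs[:warmup]:
        fn(x)  # warm-up calls are excluded from the statistics
    latencies = []
    start = time.perf_counter()
    for x in inputs:
        t0 = time.perf_counter()
        fn(x)
        latencies.append(time.perf_counter() - t0)
    total = time.perf_counter() - start
    return statistics.median(latencies) * 1e3, len(inputs) / total


latency_ms, throughput = measure(lambda n: sum(range(n)), [100_000] * 20)
print(f"median latency {latency_ms:.3f} ms, throughput {throughput:.1f} calls/s")
```

With the models from this tutorial, `fn` would be one of the predictors passed to `sliding_window_inference` above (for example `quantized_model` or `normal_model`) and `inputs` a list of input tiles; timings depend heavily on the CPU and on thread settings.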