# Tutorial: quantize and run a custom network

This brief tutorial shows how to compress a custom network with EfficientBioAI and run inference with it.
- Model: a plain 2D U-Net taken from [Pytorch-UNet on GitHub](https://github.com/milesial/Pytorch-UNet/blob/master/unet/unet_model.py)
- Data: [Simulated nuclei of HL60 cells stained with Hoechst](http://celltrackingchallenge.net/2d-datasets/)
- Compression strategy: L1-norm pruning and post-training quantization (PTQ) to int8
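
To give a feel for the two techniques named above, here is a minimal pure-Python sketch of L1-norm filter pruning followed by symmetric int8 quantization. It is only an illustration of the general ideas, not EfficientBioAI's implementation; the function names and the toy weights are made up for this example.

```python
# Illustrative only: NOT EfficientBioAI's implementation.

def l1_norms(filters):
    """Return the L1 norm (sum of absolute weights) of each filter."""
    return [sum(abs(w) for w in f) for f in filters]

def prune_by_l1(filters, ratio):
    """Drop the `ratio` fraction of filters with the smallest L1 norm."""
    norms = l1_norms(filters)
    n_drop = int(len(filters) * ratio)
    # Indices sorted by ascending norm; the first n_drop are removed.
    drop = set(sorted(range(len(filters)), key=norms.__getitem__)[:n_drop])
    return [f for i, f in enumerate(filters) if i not in drop]

def quantize_int8(values):
    """Symmetric per-tensor int8 quantization: q = round(x / scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float values."""
    return [v * scale for v in q]

# Toy "layer" with four filters of two weights each (hypothetical values).
filters = [[0.6, -0.25], [0.01, 0.02], [1.0, -1.0], [0.1, 0.0]]
kept = prune_by_l1(filters, ratio=0.5)
flat = [w for f in kept for w in f]
q, scale = quantize_int8(flat)
restored = dequantize(q, scale)
```

The two filters with the smallest L1 norm are removed, and the remaining weights are stored as integers in [-127, 127] plus a single scale. In static PTQ, activation ranges are estimated in a similar way from a few calibration samples, which is why a calibration dataloader is needed.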

Because the package focuses only on compression and knows nothing about how the dataset is pre-processed or how training and inference are run, users need to provide the following for compression:

- a calibration dataloader containing several images;
- the training API, which is used to fine-tune the compressed model;
- the inference API, which is used for calibration during the quantization step.

Once these are provided, the package can compress the model and run inference.
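
The sketch below shows one possible shape for the user-provided training and inference functions and how `functools.partial` fixes their extra arguments. The signatures `(model, dataloader, ...)` are an assumption made for illustration; the toy model and the list standing in for a dataloader are hypothetical, and the real functions would use PyTorch.

```python
from functools import partial

def infer(model, dataloader, calib_num):
    """Run the model on the first `calib_num` batches (hypothetical signature)."""
    outputs = []
    for i, batch in enumerate(dataloader):
        if i >= calib_num:
            break
        outputs.append(model(batch["img"]))
    return outputs

def train(model, dataloader, num_epoch):
    """Placeholder fine-tuning loop (hypothetical signature).

    Only the forward pass is shown; loss computation and the optimizer
    step are omitted. Returns the number of steps it would have run.
    """
    steps = 0
    for _ in range(num_epoch):
        for batch in dataloader:
            model(batch["img"])
            steps += 1
    return steps

# Stand-ins for a real network and a real dataloader.
toy_model = lambda x: x * 2
toy_loader = [{"img": i, "seg": 0} for i in range(10)]

# Bind the extra arguments so both callables only need (model, dataloader).
calibrate = partial(infer, calib_num=4)
fine_tune = partial(train, num_epoch=2)

calib_outputs = calibrate(toy_model, toy_loader)
n_steps = fine_tune(toy_model, toy_loader)
```

Binding the arguments up front means whoever calls `calibrate` and `fine_tune` does not need to know about `calib_num` or `num_epoch`.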

In [1]:
import torch
from model.unet import Unet
from tqdm.contrib import tenumerate


In [None]:
# No pretrained model is available, so train one from scratch:
!wget http://data.celltrackingchallenge.net/training-datasets/Fluo-N2DH-SIM+.zip -P ./data
!unzip ./data/Fluo-N2DH-SIM+.zip -d ./data
!rm ./data/Fluo-N2DH-SIM+.zip
!python train_unet.py --data_path "./data/Fluo-N2DH-SIM+/02" --gt_path "./data/Fluo-N2DH-SIM+/02_GT/SEG" --num_epoch 20

In [2]:
# Set the seed for reproducibility:
from monai.utils import set_determinism

seed_value = 2023
torch.manual_seed(seed_value)
torch.backends.cudnn.deterministic = True
set_determinism(seed=seed_value)

## 1. Compress the model

### 1.1 Load the model:

In [3]:
state_dict = torch.load("./unet.pth")
net = Unet(in_channels=1, classes=2)
net.load_state_dict(state_dict)

<All keys matched successfully>

### 1.2 Logic that users must provide:

In [4]:
from functools import partial
from monai.data import DataLoader, Dataset
from custom import train, infer
from data import generate_data_dict, train_transform, test_transform
import yaml
import os
from pathlib import Path

# 1. Training and inference logic:
fine_tune = partial(train, num_epoch=2)  # fine-tunes the compressed model
calibrate = partial(infer, calib_num=4)  # calibration pass for quantization

# 2. Iterable data, here is a dataloader, used for calibration and fine-tuning:
train_data_path = Path("./data/Fluo-N2DH-SIM+/02")
train_gt_path = Path("./data/Fluo-N2DH-SIM+/02_GT/SEG")
dataset = Dataset(
    data=generate_data_dict(train_data_path, train_gt_path), transform=train_transform
)
dataloader = DataLoader(dataset, batch_size=2, shuffle=True, num_workers=0)

2023-04-27 15:32:53,006 - Resource 'XMLSchema.xsd' is already loaded


150it [00:00, 124091.83it/s]
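
The helper `generate_data_dict` comes from the tutorial's own `data` module and is not shown here. As a rough sketch of what such a helper might do, the function below pairs image files with segmentation masks and returns a list of dictionaries with the `"img"` and `"seg"` keys used later in this tutorial. The function name, the pairing by sorted order, and the file names in the demo are assumptions.

```python
import tempfile
from pathlib import Path

def pair_images_and_masks(img_dir, seg_dir, pattern="*.tif"):
    """Pair images and masks by sorted order (hypothetical helper)."""
    imgs = sorted(Path(img_dir).glob(pattern))
    segs = sorted(Path(seg_dir).glob(pattern))
    if len(imgs) != len(segs):
        raise ValueError(f"{len(imgs)} images but {len(segs)} masks")
    return [{"img": str(i), "seg": str(s)} for i, s in zip(imgs, segs)]

# Demo on empty placeholder files; the naming scheme is an assumption.
with tempfile.TemporaryDirectory() as tmp:
    img_dir = Path(tmp, "02")
    seg_dir = Path(tmp, "02_GT", "SEG")
    img_dir.mkdir()
    seg_dir.mkdir(parents=True)
    for k in range(3):
        (img_dir / f"t{k:03d}.tif").touch()
        (seg_dir / f"man_seg{k:03d}.tif").touch()
    pairs = pair_images_and_masks(img_dir, seg_dir)
```

A list of dictionaries in this form can be passed as `data` to a MONAI `Dataset`, whose transform then loads the files behind each key.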


### 1.3 Compress the model:

In [5]:
from efficientbioai.compress_ppl import Pipeline
from efficientbioai.utils import Dict2ObjParser



In [6]:
cfg_path = Path("./custom_config.yaml")
with open(cfg_path, "r") as stream:
    config_yml = yaml.safe_load(stream)
    config = Dict2ObjParser(config_yml).parse()
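
The parsed `config` is later accessed with attributes, as in `config.model.model_name`. A minimal sketch of how a nested dictionary can be turned into such an object is shown below. This is not the package's `Dict2ObjParser`; the function name and the config content are hypothetical, and only the `model.model_name` field is known from this tutorial.

```python
from types import SimpleNamespace

def dict_to_obj(value):
    """Recursively convert nested dicts into attribute-access objects."""
    if isinstance(value, dict):
        return SimpleNamespace(**{k: dict_to_obj(v) for k, v in value.items()})
    if isinstance(value, list):
        return [dict_to_obj(v) for v in value]
    return value

# Hypothetical config content; "my_unet" is a placeholder value.
config_dict = {"model": {"model_name": "my_unet"}}
cfg = dict_to_obj(config_dict)
name = cfg.model.model_name
```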

In [None]:
exp_path = Path("./exp")
exp_path.mkdir(exist_ok=True)
pipeline = Pipeline.setup(config_yml)
pipeline(net, dataloader, fine_tune, calibrate, exp_path)
pipeline.network2ir()

## 2. Infer the model

We use the OpenVINO inference engine to run inference.

In [8]:
from efficientbioai.infer.backend.openvino import create_opv_model
from monai.inferers import sliding_window_inference

In [9]:
model_name = config.model.model_name
cfg_path = exp_path / f"{model_name}.yaml"
infer_path = exp_path / "academic_deploy_model.xml"

In [10]:
test_data_path = Path("./data/Fluo-N2DH-SIM+/01")
test_gt_path = Path("./data/Fluo-N2DH-SIM+/01_GT/SEG")
test_dataset = Dataset(
    data=generate_data_dict(test_data_path, test_gt_path), transform=test_transform
)
test_dataloader = DataLoader(test_dataset, batch_size=1, shuffle=False, num_workers=0)

65it [00:00, 77188.49it/s]


Inference with the quantized model:

In [11]:
quantized_model = create_opv_model(infer_path)

In [12]:
for i, batch_data in tenumerate(test_dataloader):
    data, label = batch_data["img"], batch_data["seg"]
    sliding_window_inference(
        inputs=data,
        predictor=quantized_model,
        device=torch.device("cpu"),
        roi_size=(128, 128),
        sw_batch_size=1,
        overlap=0.1,
        mode="constant",
    )

100%|██████████| 65/65 [00:16<00:00,  3.96it/s]
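
To illustrate what `roi_size=(128, 128)` and `overlap=0.1` mean, the sketch below computes where sliding windows would start along one axis. It is a simplified illustration of the tiling idea, not MONAI's exact implementation, and the image size of 300 pixels is a made-up example.

```python
def window_starts(image_len, roi_len, overlap):
    """Start offsets of sliding windows along one axis (simplified sketch)."""
    # Step between consecutive windows; at least 1 to guarantee progress.
    step = max(int(roi_len * (1 - overlap)), 1)
    # The last window is shifted back so it ends at the image border.
    last_start = max(image_len - roi_len, 0)
    starts = []
    pos = 0
    while True:
        start = min(pos, last_start)
        starts.append(start)
        if start + roi_len >= image_len:
            break
        pos += step
    return starts

# Hypothetical 300 x 300 image, 128 x 128 windows, 10% overlap.
starts_1d = window_starts(300, 128, 0.1)
windows_2d = [(y, x) for y in starts_1d for x in starts_1d]
```

Each window is passed to the predictor separately and the outputs are blended where windows overlap. With `mode="constant"`, overlapping predictions are weighted equally.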


Inference with the original model (float32, not run on the OpenVINO engine):

In [28]:
normal_model = net
normal_model.eval()
for i, batch_data in tenumerate(test_dataloader):
    data, label = batch_data["img"], batch_data["seg"]
    sliding_window_inference(
        inputs=data,
        predictor=normal_model,
        device=torch.device("cpu"),
        roi_size=(128, 128),
        sw_batch_size=1,
        overlap=0.1,
        mode="constant",
    )

100%|██████████| 65/65 [01:05<00:00,  1.00s/it]


On this run, compression cut the time to process the 65 test images from about 65 s to about 16 s, a speedup of roughly 4x.
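
The speedup can be checked from the two progress bars above. The short calculation below uses only the figures they report (65 images, 3.96 it/s for the quantized model, 1.00 s/it for the float32 model); the variable names are illustrative.

```python
n_images = 65
quantized_rate = 3.96         # images per second, quantized model
float_secs_per_image = 1.00   # seconds per image, float32 model

quantized_total = n_images / quantized_rate      # total seconds, quantized
float_total = n_images * float_secs_per_image    # total seconds, float32
speedup = float_total / quantized_total

print(f"quantized: {quantized_total:.1f} s, float32: {float_total:.1f} s, "
      f"speedup: {speedup:.2f}x")
```

These timings come from a single run on one machine, so the exact ratio will vary with hardware and load.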