## Automatic quantization for YOLOv8

This notebook demonstrates a simple procedure for quantizing Ultralytics YOLOv8 models for OpenVINO.

Our quantization process consists of calibrating the fake-quantized model, then adjusting the quantization thresholds and fine-tuning the weights via distillation. Finally, we run the quantized model with the YOLOv8 and OpenVINO frameworks to measure its inference speed and accuracy.
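To make "quantization thresholds" concrete: OpenVINO represents INT8 quantization with `FakeQuantize` operations, which clamp a tensor to a range `[input_low, input_high]` and snap it onto `levels` evenly spaced values. The sketch below implements that element-wise rule in plain Python for a single scalar range, assuming the output range equals the input range. It is only an illustration of the arithmetic; how ENOT chooses and tunes these thresholds internally is not shown here.

```python
def fake_quantize(x: float, input_low: float, input_high: float,
                  levels: int = 256) -> float:
    """Quantize-dequantize one value (illustrative sketch).

    Assumes output_low == input_low and output_high == input_high,
    i.e. the value is mapped back to the original scale.
    """
    if input_high <= input_low:
        raise ValueError("input_high must be greater than input_low")
    output_low, output_high = input_low, input_high
    # Values outside the threshold range are clipped.
    if x <= input_low:
        return output_low
    if x > input_high:
        return output_high
    # Map onto levels - 1 equal steps, round, then map back.
    step = (x - input_low) / (input_high - input_low) * (levels - 1)
    return round(step) / (levels - 1) * (output_high - output_low) + output_low


# A symmetric range [-1, 1] with 256 levels: 0.5 lands on the nearest grid point.
print(fake_quantize(0.5, -1.0, 1.0))   # about 0.498 (= 127/255)
print(fake_quantize(2.0, -1.0, 1.0))   # 1.0 (clipped)
```

A narrower range gives a finer rounding step but clips more outliers; calibration and distillation trade these two error sources off.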

### Main chapters of this notebook:
1. Set up the environment
2. Prepare the dataset and create dataloaders
3. Export YOLOv8 to ONNX
4. Quantize YOLOv8
5. Measure inference time using the OpenVINO framework
6. Measure mAP

## Set up the environment

First, let's set up the environment and make some common imports.

1. Install the `enot-autodl` package and create a Jupyter kernel with it
2. Install the `ultralytics` package with YOLOv8
3. Install `openvino` and `openvino-dev`

To install the `enot-autodl` package, follow the [installation guide](https://enot-autodl.rtd.enot.ai/en/latest/installation_guide.html).  
For steps 2 and 3, run the commands below:

In [None]:
!pip install ultralytics==8.0.199
!pip install openvino==2023.2.0 openvino-dev==2023.2.0

In [None]:
# You may need to uncomment this line and set the variable to the index of a free GPU
# %env CUDA_VISIBLE_DEVICES=0

In [None]:
import shutil
import itertools
import numpy as np
from tqdm.auto import tqdm

import torch
from torch.optim import RAdam
from torch.optim.lr_scheduler import CosineAnnealingLR

# quantization procedure
from enot.quantization import distill
from enot.quantization import OpenVINOFakeQuantizedModel
from enot.quantization import calibrate
from enot.quantization import RMSELoss

# converter from ONNX to PyTorch
from onnx2torch import convert

# dataset creation functions
from ultralytics.utils import DEFAULT_CFG
from ultralytics.cfg import get_cfg
from ultralytics.data.utils import check_det_dataset
from ultralytics.data import build_dataloader, build_yolo_dataset

# function for loading yolo checkpoint
from ultralytics import YOLO

# openvino functions
from tutorial_utils.openvino import benchmark
from tutorial_utils.openvino import convert_model

In [None]:
QUANT_ONNX_PATH = './yolov8s_openvino_int8.onnx'

OV_FP32_NAME = "yolov8s_fp32"
OV_INT8_NAME = "yolov8s_int8"

OV_FP32_FULL_NAME = f"{OV_FP32_NAME}_openvino_model/{OV_FP32_NAME}.xml"
OV_INT8_FULL_NAME = f"{OV_INT8_NAME}_openvino_model/{OV_INT8_NAME}.xml"

BATCH_SIZE = 8
IMG_SIZE = 640
IMG_SHAPE = (BATCH_SIZE, 3, IMG_SIZE, IMG_SIZE)

## Prepare dataset and create dataloaders

We use the COCO128 dataset (the first 128 images of MS COCO train2017) in this example.


The `check_det_dataset`, `build_yolo_dataset` and `build_dataloader` functions prepare the data for you; specifically, they:
1. Download and unpack the dataset to `datasets/coco128` (or to the existing YOLOv8 datasets path, if the `ultralytics` package is already configured)
2. Build the dataset and a dataloader over the training images

**IMPORTANT NOTE**: since this is an example notebook, we train and validate the model on **THE SAME DATASET** (in `coco128.yaml`, the train and val splits point to the same images). For better performance and generalization, use separate datasets for training and validation.
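Ultralytics dataset YAML files accept a directory, a text file listing image paths, or a list of either for `train` and `val`, so one way to follow the note above is to split your own image list into two disjoint files. The helper below, `split_train_val`, is hypothetical (it is not part of `ultralytics`) and only sketches a deterministic split.

```python
import random
from typing import List, Sequence, Tuple


def split_train_val(paths: Sequence[str], val_fraction: float = 0.2,
                    seed: int = 0) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: split image paths into disjoint train/val lists.

    The same seed always produces the same split, so experiments stay comparable.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be in (0, 1)")
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


images = [f"images/img_{i:04d}.jpg" for i in range(10)]
train, val = split_train_val(images, val_fraction=0.2, seed=1)
print(len(train), len(val))  # 8 2
```

Each list could then be written to its own `.txt` file and referenced from the `train` and `val` fields of a dataset YAML.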

In [None]:
cfg = get_cfg(DEFAULT_CFG, None)
cfg.data = 'coco128.yaml'

In [None]:
data = check_det_dataset(cfg.data)
if 'yaml_file' in data:
    cfg.data = data['yaml_file']

trainset, testset = data['train'], data.get('val') or data.get('test')

In [None]:
dataset = build_yolo_dataset(
    cfg=cfg,
    img_path=trainset,
    batch=BATCH_SIZE,
    data=data,
)

dataloader = build_dataloader(
    dataset=dataset,
    batch=BATCH_SIZE,
    workers=cfg.workers,
)

## Baseline YOLOv8 ONNX creation

Since the default YOLOv8 model contains conditional execution ('if' nodes), we export it to ONNX and convert it back to PyTorch with `onnx2torch` to obtain a model suitable for quantization.

In [None]:
model = YOLO(model='yolov8s')
onnx_path = model.export(format='onnx', dynamic=True, imgsz=IMG_SIZE)

In [None]:
regular_model = convert(onnx_path).cuda()
regular_model.eval();

## YOLOv8 Quantization

In [None]:
# Define a function that converts dataloader samples to model inputs.


def sample_to_model_inputs(x):
    # Each dataloader sample is a dict; its 'img' entry is a uint8 tensor with a batch of images.
    x = x['img']

    # Model is on CUDA, so input images should also be on CUDA.
    x = x.cuda()

    # Convert the tensor from uint8 to float.
    x = x.float()

    # YOLOv8 image normalization (0-255 to 0-1 normalization)
    x /= 255
    return x

In [None]:
# See for details: https://enot-autodl.rtd.enot.ai/en/stable/reference_documentation/quantization.html#enot.quantization.OpenVINOFakeQuantizedModel

fake_quantized_model = OpenVINOFakeQuantizedModel(regular_model).cuda()

In [None]:
# Calibrate quantization thresholds using 10 batches.

with torch.no_grad(), calibrate(fake_quantized_model):
    for batch in itertools.islice(dataloader, 10):
        batch = sample_to_model_inputs(batch)
        fake_quantized_model(batch)
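The `calibrate` context manager collects activation statistics while the model runs on a few batches; the exact statistic ENOT uses is not described in this notebook. A common baseline rule is max-abs calibration, sketched below in plain Python with lists of floats standing in for activation tensors. `calibrate_abs_max` is a hypothetical helper, and it assumes one symmetric range per tensor.

```python
import itertools
from typing import Iterable, Sequence, Tuple


def calibrate_abs_max(batches: Iterable[Sequence[float]],
                      n_batches: int = 10) -> Tuple[float, float]:
    """Hypothetical max-abs calibration: return a symmetric range [-t, t].

    t is the largest |activation| seen in the first n_batches batches,
    mirroring the itertools.islice(dataloader, 10) loop above.
    """
    t = 0.0
    for batch in itertools.islice(batches, n_batches):
        t = max(t, max((abs(v) for v in batch), default=0.0))
    if t == 0.0:
        raise ValueError("no non-zero activations observed")
    return -t, t


activations = [[0.1, -0.5], [2.0, 0.3], [-3.0], [100.0]]
print(calibrate_abs_max(activations, n_batches=3))  # (-3.0, 3.0)
```

Max-abs is sensitive to rare outliers (the `100.0` would dominate if included), which is one reason thresholds are fine-tuned afterwards rather than fixed at calibration.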

In [None]:
# Distill model quantization thresholds and weights using RMSE loss.

n_epochs = 5
with distill(fq_model=fake_quantized_model, tune_weight_scale_factors=True) as (qdistill_model, params):
    optimizer = RAdam(params=params, lr=0.005, betas=(0.9, 0.95))
    scheduler = CosineAnnealingLR(optimizer=optimizer, T_max=len(dataloader) * n_epochs)
    distillation_criterion = RMSELoss()

    for _ in range(n_epochs):
        for batch in (tqdm_it := tqdm(dataloader)):
            batch = sample_to_model_inputs(batch)

            optimizer.zero_grad()
            loss: torch.Tensor = torch.tensor(0.0).cuda()
            for student_output, teacher_output in qdistill_model(batch):
                loss += distillation_criterion(student_output, teacher_output)

            loss.backward()
            optimizer.step()
            scheduler.step()

            tqdm_it.set_description(f'loss: {loss.item():.3f}')
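The distillation loop above sums one RMSE term per (student, teacher) output pair and steps the cosine scheduler once per batch, so with `T_max=len(dataloader) * n_epochs` the learning rate decays from `0.005` towards zero by the end of training. The plain-Python sketch below reproduces that arithmetic on lists of floats; `rmse`, `distillation_loss` and `cosine_lr` are illustrative helpers, not the ENOT `RMSELoss` or PyTorch implementations. `cosine_lr` uses the closed-form cosine annealing formula given in the PyTorch `CosineAnnealingLR` documentation.

```python
import math
from typing import Iterable, Sequence, Tuple


def rmse(student: Sequence[float], teacher: Sequence[float]) -> float:
    """Root-mean-square error between two equally sized outputs."""
    if len(student) != len(teacher) or not student:
        raise ValueError("outputs must be non-empty and of equal length")
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student))


def distillation_loss(pairs: Iterable[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Sum of per-output RMSE terms, like the loop over qdistill_model(batch)."""
    return sum(rmse(s, t) for s, t in pairs)


def cosine_lr(step: int, base_lr: float, t_max: int, eta_min: float = 0.0) -> float:
    """Cosine annealing: base_lr at step 0, eta_min at step t_max."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max)) / 2


# Example: two outputs, then the schedule for a hypothetical 16 batches x 5 epochs.
print(distillation_loss([([0.0], [1.0]), ([0.0, 0.0], [2.0, 2.0])]))  # 3.0
t_max = 16 * 5
print([round(cosine_lr(s, 0.005, t_max), 5) for s in (0, t_max // 2, t_max)])
```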

In [None]:
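# Enable quantization mode, then move the model to CPU for ONNX export.
fake_quantized_model.cuda()
fake_quantized_model.enable_quantization_mode(True)
fake_quantized_model.cpu()

# Export to ONNX with a dynamic batch dimension.
torch.onnx.export(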
    model=fake_quantized_model,
    args=torch.ones(*IMG_SHAPE),
    f=QUANT_ONNX_PATH,
    input_names=['images'],
    output_names=['output'],
    opset_version=13,
    dynamic_axes={'images': {0: 'batch_size'}},
)

In [None]:
torch.cuda.empty_cache()

## Measure model speed using the OpenVINO framework

### OpenVINO FP32 

In [None]:
# Convert yolov8s.pt to an OpenVINO model; pt_path determines the name of the output directory
yolov8s = YOLO('yolov8s.pt')
yolov8s.model.pt_path = OV_FP32_NAME + ".pt"
yolov8s.export(format='openvino', dynamic=True);

In [None]:
benchmark(OV_FP32_FULL_NAME, IMG_SHAPE)

### ENOT OpenVINO INT8

In [None]:
# Convert yolov8s_openvino_int8.onnx to an OpenVINO model
convert_model(QUANT_ONNX_PATH, OV_INT8_NAME)

# Copy metadata so that YOLO knows the class names and input shapes
shutil.copy(OV_FP32_NAME + "_openvino_model/metadata.yaml", OV_INT8_NAME + "_openvino_model");

In [None]:
benchmark(OV_INT8_FULL_NAME, IMG_SHAPE)
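`benchmark` comes from the local `tutorial_utils.openvino` module, whose implementation is not shown in this notebook. If you want to time the models yourself, a generic latency harness is sketched below; `time_callable`, `images_per_second` and `speedup` are hypothetical helpers. `fn` is assumed to wrap a single inference call on one batch of shape `IMG_SHAPE`, and warm-up runs are excluded because the first calls often include one-off initialization costs.

```python
import statistics
import time
from typing import Callable, Dict


def time_callable(fn: Callable[[], object], warmup: int = 5, runs: int = 50) -> Dict[str, float]:
    """Median and mean wall-clock latency of fn() in milliseconds, after warm-up."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {"median_ms": statistics.median(samples), "mean_ms": statistics.mean(samples)}


def images_per_second(batch_size: int, latency_ms: float) -> float:
    """Throughput for one batch processed in latency_ms milliseconds."""
    return batch_size * 1000.0 / latency_ms


def speedup(fp32_ms: float, int8_ms: float) -> float:
    """How many times faster the INT8 model is than the FP32 one."""
    return fp32_ms / int8_ms


# Dummy workload standing in for a model call.
stats = time_callable(lambda: sum(range(10_000)), warmup=2, runs=10)
print(stats)
print(images_per_second(8, 40.0), speedup(20.0, 8.0))  # 200.0 2.5
```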

## Measure mAP

### OpenVINO FP32 

In [None]:
# Make sure you have converted the model to OpenVINO format above
YOLO(OV_FP32_NAME + "_openvino_model", task='detect').val(data=cfg.data, imgsz=IMG_SIZE);

### OpenVINO INT8 

In [None]:
# Make sure you have converted the model to OpenVINO format above
YOLO(OV_INT8_NAME + "_openvino_model", task='detect').val(data=cfg.data, imgsz=IMG_SIZE);
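`val()` reports mAP50 and mAP50-95, and both rest on intersection over union (IoU) between predicted and ground-truth boxes: a prediction counts as a true positive at threshold τ if it matches a ground-truth box of the same class with IoU ≥ τ, and mAP50-95 averages AP over τ = 0.50, 0.55, ..., 0.95. The sketch below shows only the IoU part for boxes in `(x1, y1, x2, y2)` format; `box_iou` is an illustrative helper, not the Ultralytics implementation.

```python
from typing import List, Sequence


def box_iou(a: Sequence[float], b: Sequence[float]) -> float:
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# IoU thresholds averaged by mAP50-95.
IOU_THRESHOLDS: List[float] = [round(0.5 + 0.05 * i, 2) for i in range(10)]

iou = box_iou((0, 0, 2, 2), (1, 1, 3, 3))
print(round(iou, 4))                                  # 0.1429 (= 1/7)
print(sum(iou >= t for t in IOU_THRESHOLDS))          # 0: a miss at every threshold
```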

---