## Automatic quantization and optimized inference for YOLO-v5 with enot-lite backend

This notebook demonstrates a simple procedure for quantizing Ultralytics YOLOv5.

Our quantization process consists of quantized model calibration, quantization threshold adjustment, and weight fine-tuning using distillation. Finally, we demonstrate inference of the quantized model using the [enot-lite](https://enot-lite.rtd.enot.ai/en/stable/) framework.
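
To make the role of a quantization threshold concrete, here is a minimal sketch of symmetric int8 fake quantization in plain Python. It is an illustration only and not enot's implementation: the helper name `fake_quantize` and the symmetric [-127, 127] integer grid are assumptions made for this example.

```python
def fake_quantize(value: float, threshold: float, levels: int = 127) -> float:
    """Simulate symmetric int8 quantization of a single float value.

    Hypothetical helper for illustration; it is not part of enot.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    # Clip to the representable range [-threshold, threshold].
    clipped = max(-threshold, min(threshold, value))
    # Snap to the integer grid, then map back to float ("fake" quantization).
    q = round(clipped * levels / threshold)
    return q * threshold / levels


# A larger threshold clips fewer outliers but makes the grid coarser.
for t in (1.0, 4.0):
    print(t, [fake_quantize(v, t) for v in (0.3, 0.9, 3.0)])
```

Adjusting the threshold trades clipping error on large values against rounding error on small ones, which is why thresholds are tuned after calibration.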

### Main chapters of this notebook:
1. Set up the environment
1. Prepare the dataset and create dataloaders
1. Baseline YOLO-v5 ONNX creation
1. Quantize YOLO-v5
1. Measure the speed of the default YOLO model run with plain PyTorch and of the quantized YOLO model run with enot-lite using the TensorRT int8 backend
1. Measure mAP for the float and quantized versions

Before running this example, make sure that TensorRT supports int8 inference on your GPU (``cuda compute capability`` > 6.1, as described [here](https://docs.nvidia.com/deeplearning/tensorrt/support-matrix/index.html#hardware-precision-matrix)).
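
To look your GPU up in the linked matrix you need its compute capability. The sketch below prints it, assuming PyTorch is installed and a CUDA device is visible; `format_capability` is a hypothetical helper introduced for this example, and the linked NVIDIA matrix remains the authoritative source on int8 support.

```python
def format_capability(capability):
    """Turn a (major, minor) pair into the 'major.minor' form used in NVIDIA's tables."""
    major, minor = capability
    return f"{major}.{minor}"


try:
    import torch  # optional: only needed to query the local GPU
except ImportError:
    torch = None

if torch is not None and torch.cuda.is_available():
    # Device index 0 refers to the first GPU visible to this process.
    print("Compute capability:", format_capability(torch.cuda.get_device_capability(0)))
else:
    print("No CUDA device visible to PyTorch; check the capability manually.")
```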

## Set up the environment

First, let's set up the environment and make some common imports.

In [None]:
%env CUDA_VISIBLE_DEVICES=1

1. Install the enot-autodl and enot-lite libraries and create a Jupyter kernel with them.
2. Clone a specific commit of the YOLOv5 repository: https://github.com/ultralytics/yolov5/commit/f76a78e7078185ecdc67470d8658103cf2067c81
3. Replace the val.py script with our val.py.
4. Replace the path to the COCO dataset folder in the 'yolov5/data/coco.yaml' file. If you do not have a pre-downloaded MS COCO dataset, you can leave the path as is and the dataset will be downloaded automatically.

Steps 2 and 3 will be done with these commands:

In [None]:
! git clone https://github.com/ultralytics/yolov5
! cd yolov5/ && git checkout f76a78e7078185ecdc67470d8658103cf2067c81
! cp tutorial_utils/val.py yolov5/val.py

In [None]:
import sys
sys.path.append('yolov5/')

import numpy as np

import torch
from pathlib import Path

from itertools import islice

# quantization procedure
from enot.quantization import TrtFakeQuantizedModel
from enot.quantization import DefaultQuantizationDistiller

# optimized inference
from enot_lite.benchmark import Benchmark
from enot_lite.type import BackendType
from enot_lite.type import ModelType

# converters from onnx to pytorch
from onnx2torch import convert
from onnx2torch.utils.custom_export_to_onnx import OnnxToTorchModuleWithCustomExport

# dataset creation functions
from yolov5.utils.dataloaders import create_dataloader
from yolov5.utils.general import check_dataset

# onnx conversion function
from yolov5.export import export_onnx

# common validation function for Ultralytics YOLO models
from yolov5.val import run

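### In the following cell we set up all necessary constants

* `ENOT_HOME_DIR` - ENOT framework home directory
* `ENOT_DATASETS_DIR` - directory with the COCO dataset
* `PROJECT_DIR` - project directory to save training logs, checkpoints, ...
* `ONNX_PATH` - path of the float (FP32) ONNX model
* `QUANT_ONNX_PATH` - path of the quantized (int8) ONNX model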

In [None]:
ENOT_HOME_DIR = Path.home() / '.enot'
ENOT_DATASETS_DIR = ENOT_HOME_DIR / 'datasets/coco_for_yolo'
PROJECT_DIR = ENOT_HOME_DIR / 'enot-lite_quantization'
QUANT_ONNX_PATH = PROJECT_DIR / 'yolov5s_trt_int8.onnx'
ONNX_PATH = PROJECT_DIR / 'yolov5s_fp32.onnx'

ENOT_HOME_DIR.mkdir(exist_ok=True)
PROJECT_DIR.mkdir(exist_ok=True)

BATCH_SIZE = 8
IMG_SIZE = 640
IMG_SHAPE = (BATCH_SIZE, 3, IMG_SIZE, IMG_SIZE)

## Prepare dataset and create dataloaders

We will use the MS COCO dataset in this example.

The `create_dataloader` and `check_dataset` functions prepare the dataset for you in this example; specifically, they:
1. download and unpack the dataset into the folder specified in `yolov5/data/coco.yaml`;
1. create and return train and validation dataloaders.

**IMPORTANT NOTE**: since this is an example notebook, we train and validate the model on **THE SAME DATASET**. For better performance and generalization, use separate datasets for training and validation.
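
One simple way to keep the two sets apart is to shuffle the sample list once and cut it into disjoint parts. The sketch below does this in plain Python; `split_samples`, the 80/20 ratio, and the placeholder file names are assumptions for illustration and are not part of YOLOv5 or enot.

```python
import random


def split_samples(samples, calibration_fraction=0.8, seed=0):
    """Shuffle samples and split them into disjoint calibration and validation lists."""
    if not 0.0 < calibration_fraction < 1.0:
        raise ValueError("calibration_fraction must be between 0 and 1")
    shuffled = list(samples)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * calibration_fraction)
    return shuffled[:cut], shuffled[cut:]


image_names = [f"image_{i:04d}.jpg" for i in range(10)]  # placeholder names
calibration, validation = split_samples(image_names)
print(len(calibration), len(validation))
```

With COCO itself, a simpler route is probably to build the distillation dataloader from the dataset's train split and keep the val split for evaluation only.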


In [None]:
import yaml

In [None]:
with open('yolov5/data/coco.yaml', 'r') as f:
    coco_cfg = yaml.load(f, yaml.Loader)

coco_cfg['path'] = ENOT_DATASETS_DIR.as_posix()

with open('yolov5/data/coco.yaml', 'w') as f:
    yaml.dump(coco_cfg, f)

data = check_dataset('yolov5/data/coco.yaml', autodownload=False)

valid_dataloader = create_dataloader(data["val"], IMG_SIZE, BATCH_SIZE, 32, False, pad=0.5, rect=False)[0]

## Baseline YOLO-v5 onnx creation

In [None]:
# Since the default YOLOv5 model contains conditional execution ('if' nodes), we have to save
# it to ONNX format and convert back to PyTorch to perform quantization.


yolov5s = torch.hub.load(
    'ultralytics/yolov5', 
    'yolov5s', 
    autoshape=False,
)

In [None]:
# You can also get the ONNX YOLOv5 model by using Ultralytics' default export script:
# `python export.py --weights yolov5.pt --include onnx --dynamic`
# Here we simply export our model to ONNX with the imported `export_onnx` function.


export_onnx(
    yolov5s, 
    torch.zeros(*IMG_SHAPE, dtype=torch.float32), 
    Path(ONNX_PATH), 
    opset=13, 
    train=False, 
    dynamic=True, 
    simplify=True,
)

regular_model = convert(ONNX_PATH).cuda()
regular_model.eval();

## Quantize YOLO-v5

In [None]:
# Let's define a function for converting dataset samples to model inputs.
# This is required since we have to pass samples into any network in a unified manner.
# This function may also perform additional manipulations with images, as done below.
# For complete documentation of such conversion functions, see
# https://enot-autodl.rtd.enot.ai/en/latest/reference_documentation/dataloader2model.html


def sample_to_model_inputs(x):
    # x[0] is the first item of the dataloader sample: the sample is a tuple whose 0th element is a tensor with images.
    x = x[0]
    
    # Model is on CUDA, so input images should also be on CUDA.
    x = x.cuda()  
    
    # Convert the tensor from uint8 to float data type.
    x = x.float()
    
    # YOLOv5 image normalization (0-255 to 0-1 normalization)
    x /= 255  
    return (x, ), {}
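
The `(x, ), {}` return value is a pair of positional and keyword arguments. The toy sketch below shows how such a pair can be consumed, assuming the caller unpacks it as `model(*args, **kwargs)`; `call_model`, `toy_sample_to_model_inputs`, and `toy_model` are hypothetical stand-ins that need neither PyTorch nor a GPU.

```python
def call_model(model, model_inputs):
    """Call `model` with an (args, kwargs) pair produced by a conversion function."""
    args, kwargs = model_inputs
    return model(*args, **kwargs)


def toy_sample_to_model_inputs(sample):
    # Keep only the first element of the sample and scale 0-255 values to 0-1.
    images = [pixel / 255 for pixel in sample[0]]
    return (images,), {}


def toy_model(images):
    # Stand-in "model" that simply sums its input.
    return sum(images)


sample = ([0, 255, 255], "labels are ignored")
print(call_model(toy_model, toy_sample_to_model_inputs(sample)))
```

Returning both parts, even when the keyword dictionary is empty, lets the same calling code serve models with different signatures.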

In [None]:
fake_quantized_model = TrtFakeQuantizedModel(regular_model).cuda()

# Distill model quantization thresholds and weights using RMSE loss.
# Note that we are using **valid_dataloader** here to keep the computation fast.
# In real use, you should use your training data, or at least part of it.

dist = DefaultQuantizationDistiller(
    quantized_model=fake_quantized_model,
    dataloader=valid_dataloader,
    sample_to_model_inputs=sample_to_model_inputs,
    device='cuda',
    logdir=PROJECT_DIR,
    verbose=2,
)

dist.distill()

In [None]:
fake_quantized_model.cuda()
fake_quantized_model.enable_quantization_mode(True)
fake_quantized_model.cpu()

torch.onnx.export(
    model=fake_quantized_model,
    args=torch.ones(*IMG_SHAPE),
    f=QUANT_ONNX_PATH,
    input_names=['images'],
    output_names=['output'],
    opset_version=13,
    dynamic_axes={
        'images': {0 : 'batch_size'}
    },
)

## Speed measurement

In [None]:
torch.cuda.empty_cache()

In [None]:
BATCH_SIZE = 8

torch_input = torch.ones((BATCH_SIZE, 3, IMG_SIZE, IMG_SIZE), dtype=torch.float32).cpu()
onnx_input = {
    'images': np.ones((BATCH_SIZE, 3, IMG_SIZE, IMG_SIZE), dtype=np.float32)
}

benchmark = Benchmark(
    batch_size=BATCH_SIZE,
    torch_model=yolov5s,
    torch_input=torch_input,
    backends=[
        BackendType.TORCH_CUDA, 
        (BackendType.AUTO_GPU, ModelType.YOLO_V5),
    ],
    onnx_model=QUANT_ONNX_PATH,
    onnx_input=onnx_input,
)
benchmark.run()
benchmark.print_results()
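
For intuition about what a speed benchmark measures, here is a minimal timing loop in plain Python. It is a sketch and not how `Benchmark` works internally: `measure_throughput` and `dummy_step` are hypothetical names, and the warm-up and iteration counts are arbitrary defaults.

```python
import time


def measure_throughput(step, batch_size, warmup=10, iterations=50):
    """Return samples per second for a callable that processes one batch."""
    for _ in range(warmup):
        step()  # warm-up runs are excluded from the timing
    start = time.perf_counter()
    for _ in range(iterations):
        step()
    elapsed = time.perf_counter() - start
    return batch_size * iterations / elapsed


def dummy_step():
    # Stand-in for one forward pass over a batch.
    sum(range(10_000))


print(f"{measure_throughput(dummy_step, batch_size=8):.1f} samples/s")
```

When timing GPU code by hand, the device has to be synchronized (for example with `torch.cuda.synchronize()`) before each timer reading, because CUDA kernels are launched asynchronously.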

In [None]:
torch.cuda.empty_cache()

## mAP evaluation

In [None]:
opt = {
    'data':'yolov5/data/coco.yaml',
    'weights':'yolov5s.pt',
    'half': True,
    'batch_size': 8,
}

In [None]:
run(**opt);

In [None]:
torch.cuda.empty_cache()

In [None]:
opt['use_enot'] = True
opt['enot_weights'] = QUANT_ONNX_PATH
opt['half'] = False
opt['device'] = 'cpu'
opt['batch_size'] = 8

In [None]:
run(**opt);