# 🚀 Quantize classification models for tiny edge deployment 

## Get Pretrained Model

Focoos offers three pretrained classification models in different sizes:

- fai-cls-n-coco  (nano, optimized for Arduino Nicla Vision) 
- fai-cls-s-coco  (small)
- fai-cls-m-coco  (medium)

All models are trained on the COCO dataset at 224px resolution.

Choose the model size that best fits your accuracy and efficiency needs.

In [None]:
from pprint import pprint

from focoos import ModelManager

model_name = "fai-cls-n-coco"  # you can also load a model from Focoos Hub with "hub://YOUR_MODEL_REF"

model = ModelManager.get(model_name)
pprint(model.model_info)

## Export as optimized ONNX for edge deployment

For edge deployment, we need to export the model to a more portable runtime, such as ONNX Runtime.

In [None]:
import os

from PIL import Image

from focoos import ASSETS_DIR, MODELS_DIR, RuntimeType

image_size = 96  # 96px input, smaller than the 224px training resolution to reduce compute

exported_model = model.export(
    runtime_type=RuntimeType.ONNX_CPU,  # ONNX Runtime on CPU, suitable for edge devices
    image_size=image_size,
    dynamic_axes=False,  # quantization needs static input shapes!
    simplify_onnx=True,  # simplify and optimize the ONNX model graph
    onnx_opset=18,
    out_dir=os.path.join(MODELS_DIR, "my_edge_model"),  # save to the models dir
)

# benchmark the ONNX model
exported_model.benchmark(iterations=100)

# run a test inference with the ONNX model
im = ASSETS_DIR / "federer.jpg"
result = exported_model.infer(im, annotate=True)
Image.fromarray(result.image)
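The export above uses a 96px input, while the models were trained at 224px. As a rough estimate (assuming the compute of a convolutional backbone grows with the number of input pixels, and ignoring the classifier head and any fixed-cost layers), the arithmetic below shows how much work the smaller input saves. The lower resolution may also cost some accuracy, so it is worth checking predictions after export.

```python
# Rough estimate: in a convolutional backbone, per-layer compute scales with
# the spatial size of the feature maps, i.e. with the input pixel count.
train_size = 224  # training resolution stated above
export_size = 96  # export resolution used in the cell above


def pixel_ratio(a: int, b: int) -> float:
    """Ratio of pixel counts between square inputs of side a and side b."""
    return (a * a) / (b * b)


ratio = pixel_ratio(train_size, export_size)
print(f"{train_size}px has {ratio:.2f}x the pixels of {export_size}px")
```

Here 224 x 224 = 50176 pixels versus 96 x 96 = 9216 pixels, roughly a 5.4x reduction in backbone work under the stated assumption.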

## Quantize exported model to int8 (or uint8)
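Before running the quantizer, it may help to see what 8-bit affine quantization does numerically. The sketch below assumes simple min/max calibration (whether `OnnxQuantizer` uses this method is not stated here) and uses random numbers as a hypothetical stand-in for activations collected on calibration images. It also shows that int8 and uint8 use the same scale, with the zero-point shifted by 128.

```python
import numpy as np


def affine_params(x_min, x_max, qmin, qmax):
    """Scale and zero-point mapping [x_min, x_max] onto the integers [qmin, qmax].

    The range is widened to include 0 so that 0.0 is exactly representable.
    """
    x_min = min(float(x_min), 0.0)
    x_max = max(float(x_max), 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero range
    zero_point = int(round(qmin - x_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(x, scale, zero_point, qmin, qmax, dtype):
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(dtype)


def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale


# Hypothetical stand-in for activations observed on calibration images.
rng = np.random.default_rng(0)
activations = rng.normal(loc=0.5, scale=1.0, size=10_000).astype(np.float32)
lo, hi = activations.min(), activations.max()  # min/max calibration range

results = {}
for name, qmin, qmax, dtype in [("uint8", 0, 255, np.uint8), ("int8", -128, 127, np.int8)]:
    scale, zp = affine_params(lo, hi, qmin, qmax)
    q = quantize(activations, scale, zp, qmin, qmax, dtype)
    err = np.abs(dequantize(q, scale, zp) - activations).max()
    results[name] = (scale, zp, err)
    print(f"{name}: scale={scale:.5f} zero_point={zp} max_abs_error={err:.5f}")
```

Within the calibrated range, the round-trip error is at most half a quantization step (`scale / 2`), which is why calibration images that resemble real inputs matter: values outside the observed range get clipped.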

In [None]:
from focoos.infer.quantizer import OnnxQuantizer, QuantizationCfg

quantization_cfg = QuantizationCfg(
    size=image_size,  # input size: must match the exported model
    calibration_images_folder=str(ASSETS_DIR),  # calibration images folder: it is strongly recommended
    # to use the validation split of the dataset the model was trained on.
    # Here, as an example, we use the assets folder.
    format="QO",  # QO (QOperator): every quantized operator has its own ONNX definition, e.g. QLinearConv, MatMulInteger.
    # QDQ (Quantize-DeQuantize): inserts DequantizeLinear(QuantizeLinear(tensor)) between the original operators to simulate quantization and dequantization.
    per_channel=False,  # per-channel quantization: each channel has its own scale/zero-point, which is more accurate,
    # especially for convolutions, at the cost of extra memory and computation.
    normalize_images=True,  # normalize calibration images during preprocessing: needed when normalization happens outside the model's forward pass
)

quantizer = OnnxQuantizer(input_model_path=exported_model.model_path, cfg=quantization_cfg)
model_path = quantizer.quantize(
    benchmark=True  # benchmark both the fp32 and int8 models
)
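The `per_channel` and `format` options can be illustrated with a small NumPy sketch. It performs a quantize-then-dequantize round trip (the same idea the QDQ format encodes with `QuantizeLinear`/`DequantizeLinear`), assuming symmetric int8 weights, and compares one scale for the whole tensor against one scale per output channel. The weights are hypothetical, with deliberately uneven channel magnitudes; this only shows the numerics, not how ONNX Runtime stores scales in the graph.

```python
import numpy as np


def fake_quant_symmetric(w, axis=None):
    """Quantize to int8 and immediately dequantize (a QDQ-style round trip).

    axis=None -> one scale for the whole tensor (per-tensor)
    axis=0    -> one scale per output channel (per-channel)
    """
    if axis is None:
        max_abs = np.abs(w).max()
    else:
        reduce_axes = tuple(i for i in range(w.ndim) if i != axis)
        max_abs = np.abs(w).max(axis=reduce_axes, keepdims=True)
    scale = np.maximum(max_abs, 1e-12) / 127.0  # symmetric range [-127, 127]
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q.astype(np.float32) * scale


# Hypothetical conv weights (out_channels, in_channels, kH, kW) whose output
# channels have very different magnitudes.
rng = np.random.default_rng(0)
channel_mags = np.array([0.01, 0.1, 1.0, 5.0], dtype=np.float32).reshape(4, 1, 1, 1)
weights = (rng.normal(size=(4, 8, 3, 3)) * channel_mags).astype(np.float32)

err_tensor = np.abs(fake_quant_symmetric(weights) - weights).mean()
err_channel = np.abs(fake_quant_symmetric(weights, axis=0) - weights).mean()
print(f"mean abs error per-tensor:  {err_tensor:.6f}")
print(f"mean abs error per-channel: {err_channel:.6f}")
```

With a single per-tensor scale, the small-magnitude channels collapse to a few quantization levels (or to zero), which is why per-channel quantization tends to help convolutions, at the cost of storing one scale per channel.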

## Inference with the quantized model on CPU

In [None]:
from focoos import InferModel

quantized_model = InferModel(model_path, runtime_type=RuntimeType.ONNX_CPU)

res = quantized_model.infer(im, annotate=True)
Image.fromarray(res.image)
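To check that quantization did not change predictions too much, one option is to run the fp32 and int8 models on the same images and compare their raw class scores. The helper below is a hypothetical sketch: it assumes you already have both outputs as `(batch, num_classes)` arrays (extracting them from the Focoos result objects is not covered here), and the example numbers are made up.

```python
import numpy as np


def compare_outputs(fp32_scores, int8_scores):
    """Compare two (batch, num_classes) score arrays computed on the same inputs."""
    a = np.asarray(fp32_scores, dtype=np.float64)
    b = np.asarray(int8_scores, dtype=np.float64)
    # Fraction of images where both models pick the same top-1 class.
    top1_agreement = float(np.mean(a.argmax(axis=1) == b.argmax(axis=1)))
    # Largest absolute deviation in any single score.
    max_abs_diff = float(np.abs(a - b).max())
    # Per-image cosine similarity between score vectors, averaged.
    cos = np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    return {
        "top1_agreement": top1_agreement,
        "max_abs_diff": max_abs_diff,
        "mean_cosine": float(cos.mean()),
    }


# Hypothetical scores for 3 images and 4 classes.
fp32_scores = np.array([[0.1, 2.0, 0.3, -1.0], [1.5, 0.2, 0.1, 0.0], [0.0, 0.1, 0.2, 3.0]])
int8_scores = np.array([[0.12, 1.95, 0.31, -0.98], [0.4, 0.5, 0.1, 0.0], [0.0, 0.1, 0.22, 2.9]])
report = compare_outputs(fp32_scores, int8_scores)
print(report)
```

In this made-up example, the second image flips its top-1 class, so the agreement is 2/3. On real data, a low agreement would suggest revisiting the calibration images or enabling `per_channel`.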