# Post-Training Quantization + Conversion to IMX500 of a MobileNetV2 PyTorch Model


## Overview
This tutorial demonstrates how to apply post-training quantization (PTQ) to a pretrained PyTorch model using the [**Model Compression Toolkit (MCT)**](https://github.com/sony/model_optimization) and how to convert the resulting model to a binary format that can be loaded onto the IMX500 using the [**IMX500-converter**](https://developer.aitrios.sony-semicon.com/en/raspberrypi-ai-camera/documentation/imx500-converter?version=3.14.3&progLang=).

This example is not intended to evaluate MCT's PTQ performance, so it intentionally uses randomly generated data to speed up the process.

For the full tutorial on MCT's PTQ, see the [*MCT PTQ PyTorch Tutorial*](https://github.com/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_post_training_quantization.ipynb).

For tutorials on other quantization features of MCT, see the [*MCT Features Tutorials*](https://github.com/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/README.md).

## Summary
In this tutorial we cover the following steps:

1. Post-Training Quantization using MCT.
2. Converting the model to an IMX500-compatible representation using the IMX500 Converter.

## Setup
Install the relevant packages:

In [None]:
from importlib import util

if not util.find_spec('edge_mdt') or not util.find_spec("uni.pytorch"):
    print("Installing edge-mdt")
    !pip install edge-mdt[pt]

if not util.find_spec('torch') or not util.find_spec("torchvision"):
    !pip install -q torch torchvision

In [None]:
!pip install -q onnx
import torch
from torchvision.models import mobilenet_v2, MobileNet_V2_Weights

Load a pre-trained MobileNetV2 model from torchvision in 32-bit floating-point precision.

In [None]:
weights = MobileNet_V2_Weights.IMAGENET1K_V2

float_model = mobilenet_v2(weights=weights)

## Dataset preparation


## Representative Dataset
We're all set to use MCT's post-training quantization. To begin, we define a representative dataset generator, which MCT iterates over to calibrate the model. For demonstration purposes, we generate random data of the desired image shape instead of using real images. We then apply PTQ to our model using this generator. For more details on using MCT, refer to the MCT tutorials.

In [None]:
from typing import Iterator, List
 
NUM_ITERS = 20
BATCH_SIZE = 32
def get_representative_dataset(n_iter: int):
    """
    This function creates a representative dataset generator. The generator yields lists
        holding one torch tensor batch of shape: [Batch, C, H, W].
    Args:
        n_iter: number of iterations for MCT to calibrate on
    Returns:
        A representative dataset generator
    """
    def representative_dataset() -> Iterator[List]:
        for _ in range(n_iter):
            yield [torch.rand(BATCH_SIZE, 3, 224, 224)]
    return representative_dataset
representative_data_generator = get_representative_dataset(n_iter=NUM_ITERS)
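
Before handing the generator to MCT, it can help to confirm that it yields the expected number of batches and the expected input shape. The following is a minimal sketch, not part of MCT: `summarize_batches` is a hypothetical helper, and the stand-in generator uses `SimpleNamespace` objects with a `shape` attribute so the example runs without torch.

```python
from types import SimpleNamespace
from typing import Callable, Iterator, List, Set, Tuple


def summarize_batches(data_gen: Callable[[], Iterator[List]]) -> Tuple[int, Set[Tuple[int, ...]]]:
    """Count the items a representative dataset yields and collect the input shapes.

    Hypothetical helper: works with any object exposing a `.shape` attribute,
    such as a torch tensor.
    """
    num_batches = 0
    shapes: Set[Tuple[int, ...]] = set()
    for batch in data_gen():  # each yielded item is a list with one entry per model input
        for tensor in batch:
            shapes.add(tuple(tensor.shape))
        num_batches += 1
    return num_batches, shapes


def _demo_generator() -> Iterator[List]:
    # Stand-in for the torch-based generator: three fake batches of shape [32, 3, 224, 224]
    for _ in range(3):
        yield [SimpleNamespace(shape=(32, 3, 224, 224))]


demo_count, demo_shapes = summarize_batches(_demo_generator)
print(demo_count, demo_shapes)
```

In this notebook, calling `summarize_batches(representative_data_generator)` should report `NUM_ITERS` batches with a single shape of `(BATCH_SIZE, 3, 224, 224)`.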

## Target Platform Capabilities (TPC)
In addition, MCT optimizes the model for dedicated hardware platforms. This is done using TPC (for more details, please visit our [documentation](https://github.com/SonySemiconductorSolutions/aitrios-edge-mdt-tpc)).

In [None]:
from edgemdt_tpc import get_target_platform_capabilities
import model_compression_toolkit as mct

# Get a TPC object representing the imx500 hardware and use it for PyTorch model quantization in MCT
tpc = get_target_platform_capabilities(tpc_version='1.0', device_type='imx500')

## Post-Training Quantization using MCT
Now for the exciting part! Let’s run PTQ on the model. 

In [None]:
quantized_model, quantization_info = mct.ptq.pytorch_post_training_quantization(
        in_module=float_model,
        representative_data_gen=representative_data_generator,
        target_platform_capabilities=tpc
)

Our model is now quantized. MCT has created a simulated quantized model within the original PyTorch framework by inserting [quantization representation modules](https://github.com/sony/mct_quantizers). These modules, such as `PytorchQuantizationWrapper` and `PytorchActivationQuantizationHolder`, wrap PyTorch layers to simulate the quantization of weights and activations, respectively. While the size of the saved model remains unchanged, all the quantization parameters are stored within these modules and are ready for deployment on the target hardware. In this example, we used the default MCT settings, which quantize the model from 32 bits to 8 bits, for a nominal compression ratio of 4x.
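
To make "simulated quantization" and the 4x figure concrete, here is a minimal pure-Python sketch. It assumes a plain symmetric uniform quantizer, which is a generic textbook scheme and not necessarily the one MCT selects for the IMX500 TPC; `fake_quantize` and `weight_storage_bytes` are hypothetical helpers written for this illustration.

```python
from typing import List, Tuple


def fake_quantize(values: List[float], n_bits: int = 8) -> Tuple[List[float], float]:
    """Quantize a non-empty list of floats to signed n-bit integers and map them back.

    Assumption: symmetric uniform quantization with one scale for all values.
    Returns the dequantized values and the scale (step size).
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values), 1.0
    q_max = 2 ** (n_bits - 1) - 1  # 127 for 8 bits
    scale = max_abs / q_max
    # Round to the nearest integer step and clamp to the representable range
    integers = [max(-q_max, min(q_max, round(v / scale))) for v in values]
    return [q * scale for q in integers], scale


def weight_storage_bytes(num_params: int, bits_per_param: int) -> int:
    """Storage needed for the parameters alone, ignoring any file-format overhead."""
    return num_params * bits_per_param // 8


example_weights = [0.5, -1.0, 0.25, 0.013]
dequantized, scale = fake_quantize(example_weights)
print(dequantized, scale)

# Going from 32-bit to 8-bit parameters divides parameter storage by four
print(weight_storage_bytes(1000, 32) / weight_storage_bytes(1000, 8))
```

The dequantized values stay in floating point, which is why a simulated quantized model does not shrink on disk: the rounding error is already present, but the values are only stored as 8-bit integers once the model is converted for the target hardware.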

## Model Conversion

### Exporting to ONNX serialization 
To convert our model to a binary that can be loaded onto the IMX500, we first need to serialize it to ONNX format. Make sure the `save_model_path` argument points to the location where the ONNX file should be written.

In [None]:
import os
import model_compression_toolkit as mct
save_folder = './mobilenet_pt'
os.makedirs(save_folder, exist_ok=True)
onnx_path = os.path.join(save_folder, 'qmodel.onnx')
mct.exporter.pytorch_export_model(quantized_model, save_model_path=onnx_path, repr_dataset=representative_data_generator)

Before converting the model, make sure Java 17 or later is installed. On Colab, you can install the following distribution:

In [None]:
!sudo apt install -y openjdk-17-jre
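
To confirm that the Java requirement is met, the major version can be read from the output of `java -version`. The sketch below is one possible check, not part of the converter: `parse_java_major` and `installed_java_major` are hypothetical helpers, and the parser assumes the usual `version "17.0.9"` style banner (including the legacy `1.8.0_292` form).

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major version from `java -version` output, or None if not recognized."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Legacy releases report themselves as 1.x (for example 1.8 means Java 8)
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major() -> Optional[int]:
    """Return the major version of the `java` found on PATH, or None if unavailable."""
    java = shutil.which("java")
    if java is None:
        return None
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    # The version banner is normally written to stderr; stdout is included as a fallback
    return parse_java_major(result.stderr + result.stdout)


java_major = installed_java_major()
if java_major is None or java_major < 17:
    print("Java 17 or later was not found on PATH:", java_major)
else:
    print("Found Java", java_major)
```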

### Running the IMX500 Converter
Now we can convert the model to create the PackerOut, which can be loaded onto the IMX500.

In [None]:
import subprocess
import sys
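# Build the converter command: input ONNX file, output folder, overwrite previous results
cmd = ["imxconv-pt", "-i", onnx_path, "-o", save_folder, "--overwrite-output"]

# Put the current interpreter's bin directory first on PATH so the converter executable is found
env_bin_path = os.path.dirname(sys.executable)
os.environ["PATH"] = os.pathsep.join([env_bin_path, os.environ["PATH"]])
env = os.environ.copy()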

subprocess.run(cmd, env=env, check=True)
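
The cell above prepends the interpreter's directory to `PATH` because console scripts installed by pip usually live next to the Python executable. If the converter fails to start, a quick lookup can show whether the executable is visible at all. This is a small sketch using the standard library's `shutil.which`; `resolve_tool` is a hypothetical helper, and the temporary `demo-tool` file exists only to make the example self-contained.

```python
import os
import shutil
import tempfile
from typing import Optional


def resolve_tool(name: str, extra_dir: str) -> Optional[str]:
    """Look up an executable, searching `extra_dir` before the regular PATH."""
    search_path = os.pathsep.join([extra_dir, os.environ.get("PATH", "")])
    return shutil.which(name, path=search_path)


with tempfile.TemporaryDirectory() as tmp_dir:
    # Create a throwaway executable to demonstrate a successful lookup
    tool_path = os.path.join(tmp_dir, "demo-tool")
    with open(tool_path, "w") as handle:
        handle.write("#!/bin/sh\nexit 0\n")
    os.chmod(tool_path, 0o755)

    found = resolve_tool("demo-tool", tmp_dir)
    missing = resolve_tool("no-such-tool-for-this-demo", tmp_dir)

print("found:", found)
print("missing:", missing)
```

In this notebook, `resolve_tool("imxconv-pt", os.path.dirname(sys.executable))` returning `None` would suggest that the converter package was installed into a different environment than the running kernel.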

## Conclusion

In this tutorial, we demonstrated how to quantize a pre-trained model using MCT and then convert it to a binary suitable for IMX500 execution, all with a few lines of code. For full documentation of the IMX500 converter, see [here](https://developer.aitrios.sony-semicon.com/en/raspberrypi-ai-camera/documentation/imx500-converter?version=3.14.3&progLang=).

## Copyrights

Copyright 2025 Sony Semiconductor Israel, Inc. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
