[![Open notebook in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://github.com/afondiel/computer-science-notebook/tree/master/core/systems/edge-computing/edge-ai/lab/examples/deploy-with-qualcomm/notebooks/deploy_with_AIMET_quantization.ipynb)

## Deploy with Qualcomm AI Model Efficiency Toolkit (AIMET): **Quantization**

![AIMET: how it works](https://github.com/quic/aimet/blob/develop/Docs/images/how-it-works.png?raw=true)

```python
# Install PyTorch, torchvision, torchaudio and the AIMET PyTorch package
!pip install torch torchvision torchaudio aimet_torch
```

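The install cell pulls whichever `aimet_torch` release pip resolves for your environment, while the installation guide in the References is for AIMET 1.24.0. Method signatures can change between AIMET release lines, so it is worth checking which versions actually ended up installed before running the rest of the notebook. This is a minimal sketch that uses only the standard library. `installed_versions` is a hypothetical helper, and the distribution names are assumed to match the pip package names used above.

```python
from importlib import metadata


def installed_versions(packages=("torch", "torchvision", "aimet_torch")):
    """Hypothetical helper: map each distribution name to its version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in the current environment
    return versions


for pkg, ver in installed_versions().items():
    print(f"{pkg:12s} {ver or 'not installed'}")
```

A `None` entry means pip did not install that distribution into the environment the kernel is running in.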

```python
import torch
from aimet_torch.quantsim import QuantizationSimModel
from torchvision.models import resnet18
```

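```python
import os

from torchvision.models import ResNet18_Weights

# Step 1: Load a pretrained model.
# `weights=` replaces the deprecated `pretrained=True` flag (torchvision >= 0.13).
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1).eval().to(device)
dummy_input = torch.rand(1, 3, 224, 224, device=device)  # Example input used to trace the model

# Step 2: Create a Quantization Simulation Model.
# config_file=None selects AIMET's default quantizer configuration; replace it
# with the path to a target-specific JSON config file if you have one.
sim = QuantizationSimModel(
    model=model,
    dummy_input=dummy_input,
    default_output_bw=8,  # Activation (output) bit-width
    default_param_bw=8,   # Parameter (weight) bit-width
    config_file=None,
)

# Step 3: Compute encodings (calibration).
# AIMET calls the callback as evaluate_model(sim.model, forward_pass_callback_args)
# and uses the observed tensor ranges to set each quantizer's scale/offset.
# Random inputs keep the example self-contained but give meaningless encodings;
# use a representative subset of your training or validation data instead.
def evaluate_model(model, num_batches):
    """Run `num_batches` forward passes without gradient tracking."""
    model.eval()
    model_device = next(model.parameters()).device
    with torch.no_grad():
        for _ in range(num_batches):
            model(torch.rand(1, 3, 224, 224, device=model_device))

sim.compute_encodings(forward_pass_callback=evaluate_model, forward_pass_callback_args=5)

# Step 4 (optional): Quantization-aware training.
# Fine-tune `sim.model` with your usual training loop to recover accuracy lost to quantization.

# Step 5: Export the model and its quantization encodings for deployment.
# AIMET expects the export dummy input to be on the CPU.
os.makedirs('./quantized_model', exist_ok=True)
sim.export(path='./quantized_model', filename_prefix='resnet18_quantized', dummy_input=dummy_input.cpu())

print("Quantization simulation completed.")
```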
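Step 3 above feeds random tensors through the model. That keeps the notebook self-contained, but the resulting encodings say nothing about real activations. AIMET's forward-pass callback is meant to run representative data, and whatever is passed as `forward_pass_callback_args` reaches the callback as its second argument. This is a minimal sketch that assumes a PyTorch `DataLoader` of preprocessed `(images, labels)` batches. `calibration_forward_pass` and `calib_loader` are hypothetical names, not AIMET APIs.

```python
import itertools

import torch


def calibration_forward_pass(model, data_loader, num_batches=16):
    """Hypothetical AIMET forward-pass callback that runs real calibration batches.

    `data_loader` arrives via `forward_pass_callback_args`; labels are ignored
    because calibration only needs the activations produced by the forward pass.
    """
    device = next(model.parameters()).device
    model.eval()
    with torch.no_grad():
        for images, _labels in itertools.islice(data_loader, num_batches):
            model(images.to(device))
```

You would then wire it in as `sim.compute_encodings(forward_pass_callback=calibration_forward_pass, forward_pass_callback_args=calib_loader)`. For the torchvision ResNet-18, `ResNet18_Weights.IMAGENET1K_V1.transforms()` returns the preprocessing that the pretrained weights expect. That makes it a reasonable choice for the calibration dataset's transform.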

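For each quantizer, `compute_encodings` produces an encoding: a scale and an integer offset chosen from the tensor range seen during calibration. The simulation model then uses the encoding to quantize and immediately dequantize that tensor. The sketch below illustrates the idea in pure Python, using plain asymmetric min-max quantization at 8 bits to match `default_param_bw=8` and `default_output_bw=8`. It is a conceptual illustration, not AIMET's implementation. AIMET supports several range-selection schemes (for example `QuantScheme.post_training_tf` and `QuantScheme.post_training_tf_enhanced`), and the offset sign convention used here is an assumption.

```python
def minmax_encoding(values, bitwidth=8):
    """Return (scale, offset) for asymmetric uniform quantization of `values`.

    The observed range is widened to include 0 so that zero is exactly representable.
    """
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    num_steps = 2 ** bitwidth - 1
    scale = (hi - lo) / num_steps if hi > lo else 1.0
    offset = round(lo / scale)  # integer grid index that lands closest to `lo`
    return scale, offset


def quantize_dequantize(x, scale, offset, bitwidth=8):
    """Snap `x` to the integer grid, clamp to the bitwidth, and map back to float."""
    q = round(x / scale) - offset
    q = max(0, min(2 ** bitwidth - 1, q))
    return (q + offset) * scale


weights = [-0.8, 0.1, 0.5, 2.4]
scale, offset = minmax_encoding(weights)
print(f"scale={scale:.6f} offset={offset}")
print([round(quantize_dequantize(w, scale, offset), 4) for w in weights])
```

Zero stays exactly representable, and any value inside the calibrated range is reconstructed to within half a quantization step (`scale / 2`). Values outside the range are clamped, so the calibration data should cover the ranges seen at inference.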

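In the optional Step 4, training `sim.model` means differentiating through the quantize-dequantize ops, and their rounding has zero gradient almost everywhere. Quantization-aware training commonly sidesteps this with a straight-through estimator (STE). The STE uses the quantized value in the forward pass but treats rounding as the identity in the backward pass. This is a minimal PyTorch sketch of the generic `x + (qdq(x) - x).detach()` idiom. `fake_quantize_ste` is a hypothetical helper, not AIMET's internal quantizer code.

```python
import torch


def fake_quantize_ste(x, scale, offset, bitwidth=8):
    """Quantize-dequantize `x` in the forward pass with a straight-through gradient.

    Forward: (clamp(round(x / scale) - offset, 0, 2**bitwidth - 1) + offset) * scale.
    Backward: gradient of 1 w.r.t. every element of x (clamping is ignored).
    """
    q = torch.clamp(torch.round(x / scale) - offset, 0, 2 ** bitwidth - 1)
    x_qdq = (q + offset) * scale
    # The detached term contributes its value but no gradient, so d(out)/dx == 1.
    return x + (x_qdq - x).detach()


x = torch.tensor([-1.0, 0.0, 0.3, 1.0], requires_grad=True)
y = fake_quantize_ste(x, scale=2.0 / 255, offset=-128)
y.sum().backward()
print(y.detach(), x.grad)
```

This is the simplest STE variant. Many implementations also zero the gradient for elements that were clamped. Depending on the chosen quant scheme, AIMET's QAT can additionally learn the quantization ranges themselves.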
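## References
- AIMET GitHub repository: https://github.com/quic/aimet
- AIMET 1.24.0 installation guide: https://quic.github.io/aimet-pages/releases/1.24.0/install/index.html
- Qualcomm AI Model Efficiency Toolkit product page: https://www.qualcomm.com/developer/software/ai-model-efficiency-toolkit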