# Gradient-Based Post-Training Quantization using the Model Compression Toolkit - A Quick-Start Guide

[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/keras/example_keras_mobilenet_gptq.ipynb)

## Overview

This tutorial demonstrates how to quantize a pre-trained model using the **Model Compression Toolkit (MCT)** with **Gradient-based PTQ (GPTQ)**.
GPTQ is an optimization procedure that can markedly improve the accuracy of models quantized with post-training quantization.
It runs after the initial quantization and adjusts how the quantized weights are rounded.
GPTQ is especially effective for low bit-width quantization and mixed-precision quantization.
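
To make the idea of adjusting the rounding concrete, the toy sketch below compares round-to-nearest with a per-weight choice between rounding down and rounding up, scored by the error of a layer's output rather than of the weights themselves. It is an illustration only: the weights, inputs, step size, and helper names are made up, and the exhaustive search stands in for the gradient-based optimization that GPTQ uses. It is not MCT's implementation.

```python
import itertools
import math


def output_error(weights, quantized, inputs):
    """Sum of squared differences between the float and quantized dot products."""
    error = 0.0
    for x in inputs:
        y_float = sum(w * xi for w, xi in zip(weights, x))
        y_quant = sum(q * xi for q, xi in zip(quantized, x))
        error += (y_float - y_quant) ** 2
    return error


# Hypothetical toy values, chosen for illustration only.
weights = [0.26, -0.74, 0.49, 0.12]
scale = 0.5  # quantization step size
inputs = [[1.0, 1.0, 1.0, 1.0], [0.5, 1.0, -1.0, 2.0], [2.0, 0.0, 1.0, 1.0]]

# Baseline: round every weight to its nearest grid point independently.
nearest = [round(w / scale) * scale for w in weights]
nearest_error = output_error(weights, nearest, inputs)

# Alternative: for each weight choose floor (0) or floor + 1 (1), and keep
# the combination with the smallest output error.
floors = [math.floor(w / scale) for w in weights]
best_error, best_quantized = math.inf, None
for choice in itertools.product((0, 1), repeat=len(weights)):
    candidate = [(f + c) * scale for f, c in zip(floors, choice)]
    error = output_error(weights, candidate, inputs)
    if error < best_error:
        best_error, best_quantized = error, candidate

print(f"round-to-nearest error: {nearest_error:.4f}")
print(f"best rounding error:    {best_error:.4f}")
```

Round-to-nearest is itself one of the candidate combinations, so the searched rounding can never score worse on these inputs. An exhaustive search is infeasible for real layers, which is why a gradient-based procedure is used instead.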

This tutorial's scope is limited to demonstrating GPTQ usage. In this example, we quantize the model and evaluate the accuracy before and after quantization.

For an example of a full quantization flow that uses GPTQ, see the [full quantization tutorial](https://github.com/sony/model_optimization/blob/main/tutorials/notebooks/imx500_notebooks/keras/keras_yolov8n_for_imx500.ipynb).

## Summary

In this tutorial we will cover:

1. Gradient-Based Post-Training Quantization using MCT.
2. Loading and preprocessing ImageNet's validation dataset.
3. Constructing an unlabeled representative dataset.
4. Accuracy evaluation of the floating-point and the quantized models.

## Setup

Install and import the relevant packages:

In [None]:
TF_VER = '2.14.0'

!pip install -q tensorflow=={TF_VER}


In [None]:
import tensorflow as tf
import keras
import importlib.util

if not importlib.util.find_spec('model_compression_toolkit'):
    !pip install -q model_compression_toolkit

## Dataset preparation

**Note** that for demonstration purposes we use the validation set for the model quantization and GPTQ optimization. Usually, a subset of the training dataset is used, but loading it is a heavy procedure that is unnecessary for the sake of this demonstration.

This step may take several minutes...

In [None]:
import os
 
if not os.path.isdir('imagenet'):
    !mkdir imagenet
    !wget -P imagenet https://image-net.org/data/ILSVRC/2012/ILSVRC2012_devkit_t12.tar.gz
    !wget -P imagenet https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar
    
    !cd imagenet && tar -xzf ILSVRC2012_devkit_t12.tar.gz && \
     mkdir ILSVRC2012_img_val && tar -xf ILSVRC2012_img_val.tar -C ILSVRC2012_img_val

Rearrange the extracted data into one folder per label:

In [None]:
from pathlib import Path
import shutil

root = Path('./imagenet')
imgs_dir = root / 'ILSVRC2012_img_val'
target_dir = root /'val'

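def extract_labels():
    !pip install -q scipy
    # Import the submodule explicitly; a bare `import scipy` does not load scipy.io in every SciPy version
    import scipy.io
    mat = scipy.io.loadmat(root / 'ILSVRC2012_devkit_t12/data/meta.mat', squeeze_me=True)
    # Map each class index to its synset ID, keeping only synsets whose fifth field is 0
    cls_to_nid = {s[0]: s[1] for s in mat['synsets'] if s[4] == 0}
    # The ground-truth file lists one class index per validation image
    with open(root / 'ILSVRC2012_devkit_t12/data/ILSVRC2012_validation_ground_truth.txt', 'r') as f:
        return [cls_to_nid[int(cls)] for cls in f.readlines()]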

if not target_dir.exists():
    labels = extract_labels()
    for lbl in set(labels):
        os.makedirs(target_dir / lbl)
    
    for img_file, lbl in zip(sorted(os.listdir(imgs_dir)), labels):
        shutil.move(imgs_dir / img_file, target_dir / lbl)
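
Before moving on, it can be worth confirming that the rearrangement produced the expected folder layout. The helper below is a minimal sketch using only the standard library; the function names are hypothetical and not part of the tutorial's original code.

```python
from pathlib import Path


def count_images_per_class(val_dir):
    """Map each class folder name to the number of files it contains."""
    return {
        class_dir.name: sum(1 for p in class_dir.iterdir() if p.is_file())
        for class_dir in sorted(Path(val_dir).iterdir())
        if class_dir.is_dir()
    }


def summarize(counts):
    """Return (number of classes, total images, smallest class size)."""
    if not counts:
        return 0, 0, 0
    return len(counts), sum(counts.values()), min(counts.values())
```

Calling `summarize(count_images_per_class('./imagenet/val'))` on the complete ILSVRC2012 validation set should report 1000 classes and 50,000 images, with 50 images in every class. Any other result suggests an interrupted download or extraction.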


### Representative Dataset

GPTQ is a gradient-based optimization process, which requires a representative dataset to perform inference and compute gradients.

Separate representative datasets can be used for the PTQ statistics collection and for GPTQ. In this tutorial we use the same representative dataset for both.

A complete pass through the representative dataset generator constitutes one epoch (`batch_size` × `n_iter` samples).
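
The arithmetic behind an epoch, and one reason the representative dataset is passed as a callable rather than as a one-shot iterator, can be sketched without any image data. The helper names below are hypothetical, the strings stand in for image batches, `n_epochs=50` is only an example value, and the total batch count assumes that one full pass of the generator is consumed per epoch, as described above.

```python
def make_representative_dataset(n_iter):
    """Return a callable that starts a fresh pass over n_iter batches each time."""
    def representative_dataset():
        for i in range(n_iter):
            # Stand-in for a list holding one array of shape
            # (batch_size, 224, 224, 3); one list entry per model input.
            yield [f"batch_{i}"]
    return representative_dataset


def epoch_budget(n_iter, batch_size, n_epochs):
    """Return (samples per epoch, total batches over all epochs)."""
    return n_iter * batch_size, n_iter * n_epochs


gen = make_representative_dataset(n_iter=10)
first_pass = list(gen())
second_pass = list(gen())  # calling the function again restarts the pass

samples_per_epoch, total_batches = epoch_budget(n_iter=10, batch_size=50, n_epochs=50)
print(len(first_pass), len(second_pass), samples_per_epoch, total_batches)
```

With `n_iter=10` and `batch_size=50`, each epoch covers 500 samples. Because the function can be called again for every epoch, each pass yields a full set of batches instead of an exhausted iterator.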

In [None]:
def imagenet_preprocess_input(images, labels):
    return tf.keras.applications.mobilenet_v2.preprocess_input(images), labels

In [None]:
def get_representative_dataset(n_iter=10, batch_size=50):
    dataset = tf.keras.utils.image_dataset_from_directory(
        directory='./imagenet/val',
        batch_size=batch_size,
        image_size=[224, 224],
        shuffle=True,
        crop_to_aspect_ratio=True,
        interpolation='bilinear')
    dataset = dataset.map(lambda x, y: (imagenet_preprocess_input(x, y)))

    def representative_dataset():
        for _ in range(n_iter):
            yield [dataset.take(1).get_single_element()[0].numpy()]

    return representative_dataset

representative_dataset_gen = get_representative_dataset()

## Model Gradient-Based Post-Training quantization using MCT

This is the main part, in which we quantize our model.

First, we load a pre-trained MobileNetV2 model from Keras in 32-bit floating-point precision.

In [None]:
from keras.applications.mobilenet_v2 import MobileNetV2

float_model = MobileNetV2()

Next, we create a GPTQ configuration with possible GPTQ optimization options (such as the number of epochs for the optimization process). MCT will quantize the model and start the GPTQ process to optimize the model’s parameters and quantization parameters.

In addition, we need to define a target platform capabilities (TPC) object, which represents the hardware specifications of the platform on which we eventually wish to deploy the quantized model.

In [None]:
import model_compression_toolkit as mct

# Create a GPTQ configuration and set the number of training epochs.
# 50 epochs are sufficient for this tutorial. When GPTQ runs after mixed-precision quantization,
# more epochs will be required.
gptq_config = mct.gptq.get_keras_gptq_config(n_epochs=50)

# Specify the target platform capability (TPC)
tpc = mct.get_target_platform_capabilities("tensorflow", 'imx500', target_platform_version='v1')


### Run model Gradient-based Post-Training Quantization
Finally, we quantize our model using MCT's GPTQ API.

In [None]:
quantized_model, quantization_info = mct.gptq.keras_gradient_post_training_quantization(
    float_model,
    representative_dataset_gen,
    gptq_config=gptq_config,
    target_platform_capabilities=tpc)

That's it! Our model is now quantized.

## Models evaluation

In order to evaluate our models, we first need to load the validation dataset. As before, let's assume we downloaded the ImageNet validation dataset to a folder with the path below:

In [None]:
def get_validation_dataset():
    dataset = tf.keras.utils.image_dataset_from_directory(
        directory='./imagenet/val',
        batch_size=50,
        image_size=[224, 224],
        shuffle=False,
        crop_to_aspect_ratio=True,
        interpolation='bilinear')
    dataset = dataset.map(lambda x, y: (imagenet_preprocess_input(x, y)))
    return dataset

evaluation_dataset = get_validation_dataset()

Let's start with the floating-point model evaluation.

We need to compile the model before evaluation and set the loss and the evaluation metric:

In [None]:
float_model.compile(loss=keras.losses.SparseCategoricalCrossentropy(), metrics=["accuracy"])
results = float_model.evaluate(evaluation_dataset)

Finally, let's evaluate the quantized model:

In [None]:
quantized_model.compile(loss=keras.losses.SparseCategoricalCrossentropy(), metrics=["accuracy"])
results = quantized_model.evaluate(evaluation_dataset)

You can see that the quantized model shows only a very small accuracy degradation at a compression rate of 4x.
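
The factor-of-four figure follows from simple bit-width arithmetic. The sketch below assumes that weights go from 32-bit floats to 8 bits and ignores the small overhead of storing quantization parameters. The function names and the parameter count are illustrative assumptions, not values read from the model.

```python
def weight_storage_bytes(n_params, bits_per_weight):
    """Bytes needed to store n_params weights at the given bit width."""
    return n_params * bits_per_weight / 8


def compression_ratio(float_bits=32, quant_bits=8):
    """Ratio between float and quantized weight storage."""
    return float_bits / quant_bits


n_params = 1_000_000  # hypothetical parameter count
float_size = weight_storage_bytes(n_params, 32)
quant_size = weight_storage_bytes(n_params, 8)
print(float_size / quant_size, compression_ratio())
```

In the notebook, `float_model.count_params()` gives the real parameter count to plug in instead of the placeholder.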

Now we can export the quantized model to the TFLite and Keras formats:

In [None]:
mct.exporter.keras_export_model(
    model=quantized_model,
    save_model_path='qmodel.tflite',
    serialization_format=mct.exporter.KerasExportSerializationFormat.TFLITE,
    quantization_format=mct.exporter.QuantizationFormat.FAKELY_QUANT)

mct.exporter.keras_export_model(model=quantized_model, save_model_path='qmodel.keras')
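
As a quick check that the exported TFLite file loads and runs, a sketch along the following lines can be used. It relies on the standard `tf.lite.Interpreter` API; the helper name is hypothetical, and the all-zeros input is a placeholder that only exercises the model, so its output is not meaningful as a prediction.

```python
def run_tflite_inference(model_path, batch=None):
    """Run one forward pass through a TFLite model and return its first output."""
    # Imported lazily so that defining this helper does not require TensorFlow.
    import numpy as np
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()[0]
    output_details = interpreter.get_output_details()[0]

    if batch is None:
        # Placeholder input: zeros in the shape and dtype the model reports.
        batch = np.zeros(input_details['shape'], dtype=input_details['dtype'])

    interpreter.set_tensor(input_details['index'], batch)
    interpreter.invoke()
    return interpreter.get_tensor(output_details['index'])
```

For example, `run_tflite_inference('qmodel.tflite')` returns the model output for the placeholder input. Passing a preprocessed image batch that matches the reported input shape gives real predictions.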


## Conclusion

In this tutorial, we demonstrated how to quantize a pre-trained model using MCT with gradient-based optimization in a few lines of code. We saw that we can achieve a 4x compression ratio with minimal accuracy degradation.





Copyright 2023 Sony Semiconductor Israel, Inc. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
