[View the runnable example on GitHub](https://github.com/intel-analytics/BigDL/tree/main/python/nano/tutorial/notebook/inference/tensorflow/quantize_tensorflow_inference_inc.ipynb)

# Quantize TensorFlow Model for Inference using Intel Neural Compressor

With Intel Neural Compressor (INC) as the quantization engine, you can apply the `InferenceOptimizer.quantize` API to perform post-training quantization on your TensorFlow Keras models in only a few lines of code.

To quantize your model with INC, the following dependencies need to be installed first:

In [None]:
# for BigDL-Nano
!pip install --pre --upgrade bigdl-nano[tensorflow,inference]
!source bigdl-nano-init

> 📝 **Note**
> 
> We recommend running the commands above, especially `source bigdl-nano-init`, before the Jupyter kernel is started; otherwise some of the optimizations may not take effect.

Let's take an [EfficientNetB0 model](https://www.tensorflow.org/api_docs/python/tf/keras/applications/efficientnet/EfficientNetB0) pretrained on the ImageNet dataset as an example:

In [None]:
from tensorflow.keras.applications import EfficientNetB0

model = EfficientNetB0(weights='imagenet')

We then prepare the training and test datasets, along with a calibration batch, as follows:

In [None]:
import tensorflow as tf
import tensorflow_datasets as tfds
from tqdm import tqdm

batch_size = 128
img_size = 224

def prepare_dataset():
    (ds_train, ds_test), ds_info = tfds.load(
        "imagenet2012",
        data_dir="./data/",
        split=['train', 'validation'],  # the ImageNet validation split serves as the test set
        with_info=True,
        as_supervised=True
    )

    num_classes = ds_info.features['label'].num_classes

    def preprocessing(img, label):
        return tf.image.resize(img, (img_size, img_size)), tf.one_hot(label, num_classes)

    AUTOTUNE = tf.data.AUTOTUNE
    ds_train = ds_train.shuffle(1000).map(preprocessing).batch(batch_size, drop_remainder=False).prefetch(AUTOTUNE)
    ds_test = ds_test.map(preprocessing).batch(batch_size, drop_remainder=False).prefetch(AUTOTUNE)

    # use the first batch of training images as the calibration set
    for img, _ in tqdm(ds_train):
        calib_set = img
        break

    return ds_train, ds_test, calib_set

> 📝 **Note**
> 
> `tensorflow_datasets` requires you to download the source data manually into `download_config.manual_dir` (defaults to `~/tensorflow_datasets/downloads/manual/`). `manual_dir` should contain two files: `ILSVRC2012_img_train.tar` and `ILSVRC2012_img_val.tar`.
> 
> Please refer to the [TensorFlow Datasets documentation](https://www.tensorflow.org/datasets/catalog/imagenet2012) for more information.

To quantize the model with INC for inference, simply **import BigDL-Nano** `InferenceOptimizer` **and use it to quantize your TensorFlow model**:

In [None]:
from bigdl.nano.tf.keras import InferenceOptimizer

train_set, test_set, calibration_set = prepare_dataset()
q_model = InferenceOptimizer.quantize(model, 
                                      x=calibration_set)

> 📝 **Note**
> 
> By default, `InferenceOptimizer` quantizes your TensorFlow model to int8 precision through **static** post-training quantization; the 'dynamic' approach is not supported yet. Static quantization requires calibration data, which is passed as `x`. To avoid data leakage during calibration, we suggest using the training dataset or a subset of it.
> 
> Please refer to [API documentation](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/Nano/tensorflow.html) for more information on `InferenceOptimizer.quantize`.
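
To illustrate why static post-training quantization needs calibration data, here is a minimal conceptual sketch in plain Python. It is not INC's actual implementation: the helper names (`calibrate`, `quantize`, `dequantize`) and the simple min/max affine scheme are assumptions chosen for illustration only. The value range is estimated once from calibration samples and then reused, unchanged, for values seen at inference time.

```python
def calibrate(values, num_bits=8):
    """Derive affine quantization parameters from calibration values (hypothetical helper)."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    # The representable range must include 0 so that zero maps to an exact integer.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 if all values are 0
    zero_point = round(qmin - lo / scale)
    return scale, zero_point, qmin, qmax

def quantize(values, scale, zero_point, qmin, qmax):
    """Map floats to integers, clamping anything outside the calibrated range."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]

def dequantize(q_values, scale, zero_point):
    """Map integers back to approximate float values."""
    return [(q - zero_point) * scale for q in q_values]

# "Calibration": observe representative values once, before inference.
calibration_activations = [-1.0, -0.25, 0.0, 0.5, 3.0]
scale, zero_point, qmin, qmax = calibrate(calibration_activations)

# "Inference": the fixed parameters are applied to new values.
new_activations = [-0.5, 1.5, 10.0]
q = quantize(new_activations, scale, zero_point, qmin, qmax)
restored = dequantize(q, scale, zero_point)

# 10.0 lies outside the calibrated range, so it saturates at qmax.
print(q, restored)
```

Values inside the calibrated range are recovered to within one quantization step, while values outside it are clipped, which is why the calibration data should be representative of the inputs expected at inference time.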

You can then run inference with the quantized model as usual:

In [None]:
x = tf.random.normal(shape=(2, 224, 224, 3))
# use the quantized model here
y_hat = q_model(x)
predictions = tf.argmax(y_hat, axis=1)
print(predictions)
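
To check whether quantization actually speeds up inference on your machine, you could time the original and quantized models on the same input. The sketch below is a hedged example using only the standard library; `measure_latency` is a hypothetical helper (not part of BigDL-Nano), and `dummy_model` is a stand-in callable so the snippet runs without TensorFlow.

```python
import statistics
import time

def measure_latency(predict_fn, inputs, warmup=3, runs=10):
    """Return the median wall-clock time in seconds of predict_fn(inputs) over `runs` calls."""
    # Warm-up calls are excluded from timing (e.g. to skip one-off tracing overhead).
    for _ in range(warmup):
        predict_fn(inputs)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict_fn(inputs)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

# Stand-in for a model: any callable that accepts a batch works here.
def dummy_model(batch):
    return [sum(row) for row in batch]

latency = measure_latency(dummy_model, [[1.0, 2.0], [3.0, 4.0]])
print(f"median latency: {latency * 1e3:.3f} ms")
```

In the notebook, the same helper could be called as `measure_latency(model, x)` and `measure_latency(q_model, x)` to compare the two models; the actual numbers depend on your hardware and batch size.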

> 📚 **Related Readings**
> 
> - [How to install BigDL-Nano](https://bigdl.readthedocs.io/en/latest/doc/Nano/Overview/nano.html#install)
> - [How to install BigDL-Nano in Google Colab](https://bigdl.readthedocs.io/en/latest/doc/Nano/Howto/install_in_colab.html)