# Network quantization: reducing the number of bits used for model parameters

Model quantization is the process of remapping the range of numeric values a model works with (weights, activations, inputs) to a number system that can be represented with fewer bits.
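To make the remapping concrete, here is a minimal NumPy sketch of one common scheme, affine quantization of float32 values to int8. The helper names `quantize_int8` and `dequantize` and the sample values are illustrative assumptions, not part of TensorFlow; the converter used later in this notebook may choose ranges and rounding differently.

```python
import numpy as np

def quantize_int8(x):
    """Map float values onto int8 using a scale and a zero point."""
    x_min, x_max = float(x.min()), float(x.max())
    # Make sure the range contains 0.0 so that zero is represented exactly.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    span = x_max - x_min
    scale = span / 255.0 if span > 0 else 1.0  # guard against an all-zero input
    zero_point = int(round(-128 - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values from the int8 codes."""
    return (q.astype(np.float32) - zero_point) * scale

weights = np.array([-1.0, -0.25, 0.0, 0.5, 1.5], dtype=np.float32)
q, scale, zero_point = quantize_int8(weights)
restored = dequantize(q, scale, zero_point)

print("bytes before:", weights.nbytes, "bytes after:", q.nbytes)
print("max round-trip error:", np.abs(restored - weights).max())
```

Each value now takes one byte instead of four, and the round-trip error stays within about half a quantization step (`scale / 2`).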

## Post-training quantization

Let's use a simple dummy model. If you want to make this exercise more meaningful, feel free to use one of the models from `tf.keras.applications`, for example:
```python
model = tf.keras.applications.ResNet50(include_top=True, weights='imagenet')
```


In [1]:
import tensorflow as tf
print(tf.__version__)
from tensorflow import keras
from tensorflow.keras import layers
import pandas as pd
import numpy as np
# data
data = pd.read_csv("sample_google_scholar.csv")
data = data.dropna()
def convert_first_ten_characters_into_tensor(data):
    first_ten_characters = data[:10]
    converted = [ord(char)/256 for char in first_ten_characters]
    while len(converted) < 10:
        converted.append(0.0)
    return np.array(converted)
converted_affiliation = data['affiliation'].map(convert_first_ten_characters_into_tensor)
affiliation = np.vstack(converted_affiliation.values)
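# regex=False matches the literal string '.edu'; as a regex, '.' would match any character
converted_email = data['email'].str.contains('.edu', regex=False)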
labels = converted_email.values
# model 
input_shape = 10
model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Dense(128, activation="relu", name="layer1"),
        layers.Dense(64, activation="relu", name="layer2"),
        layers.Dense(1, activation="sigmoid", name="layer3"),
    ])
loss = tf.keras.losses.BinaryCrossentropy()
optimizer = tf.keras.optimizers.Adam()
model.compile(loss=loss, optimizer=optimizer)
# model fit 
model.fit(affiliation, labels, batch_size=16, epochs=5, validation_split=0.2)

2022-08-20 16:59:49.657295: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


2.11.0-dev20220820
Epoch 1/5
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: 'arguments' object has no attribute 'posonlyargs'
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f7fc2e2a6d0>

In [2]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 layer1 (Dense)              (None, 128)               1408      
                                                                 
 layer2 (Dense)              (None, 64)                8256      
                                                                 
 layer3 (Dense)              (None, 1)                 65        
                                                                 
Total params: 9,729
Trainable params: 9,729
Non-trainable params: 0
_________________________________________________________________


We can save this model in TensorFlow's SavedModel format:

In [3]:
tensorflow_model_path = './tf_model'
model.save('./tf_model')

INFO:tensorflow:Assets written to: ./tf_model/assets


In [4]:
# also save a single-file HDF5 copy so we can compare file sizes later
model.save('model.hdf5')

To convert the saved model, pass its directory to the `from_saved_model` function of `TFLiteConverter`:
```python
tf.lite.TFLiteConverter.from_saved_model(tensorflow_model_path)
```
Alternatively, since the model is already in memory, we can use `from_keras_model` from `TFLiteConverter` as follows:

In [5]:
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
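Setting `supported_types` to `tf.float16` asks the converter to store weights as 16-bit floats. As a rough illustration of what that means for a single weight matrix, the NumPy sketch below casts a float32 array to float16 and measures the storage saved and the rounding error introduced. The array is random stand-in data with the same shape as `layer1`'s kernel (10 x 128); it is an assumption for illustration, not the trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a (10, 128) Dense kernel; values are random, not trained weights.
w32 = rng.uniform(-1.0, 1.0, size=(10, 128)).astype(np.float32)
w16 = w32.astype(np.float16)

# Storage is halved: 4 bytes per value become 2.
print("float32 bytes:", w32.nbytes)
print("float16 bytes:", w16.nbytes)

# float16 keeps about 3 significant decimal digits, so each value moves slightly.
max_abs_err = float(np.max(np.abs(w16.astype(np.float32) - w32)))
print("max absolute rounding error:", max_abs_err)
```

For values in [-1, 1] the rounding error is small compared with typical weight magnitudes, which is why float16 quantization usually costs little accuracy.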

Finally, with a single line of code we can convert our model:

In [6]:
tfl_model = converter.convert()

INFO:tensorflow:Assets written to: /var/folders/lr/sp74bxw50pz1ylmkv7qtlf_m0000gp/T/tmpxswq5_rj/assets


2022-08-20 16:59:58.290062: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:362] Ignored output_format.
2022-08-20 16:59:58.290082: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:365] Ignored drop_control_dependency.
2022-08-20 16:59:58.290760: I tensorflow/cc/saved_model/reader.cc:45] Reading SavedModel from: /var/folders/lr/sp74bxw50pz1ylmkv7qtlf_m0000gp/T/tmpxswq5_rj
2022-08-20 16:59:58.292086: I tensorflow/cc/saved_model/reader.cc:89] Reading meta graph with tags { serve }
2022-08-20 16:59:58.292101: I tensorflow/cc/saved_model/reader.cc:130] Reading SavedModel debug info (if present) from: /var/folders/lr/sp74bxw50pz1ylmkv7qtlf_m0000gp/T/tmpxswq5_rj
2022-08-20 16:59:58.295621: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:365] MLIR V1 optimization pass is not enabled
2022-08-20 16:59:58.296533: I tensorflow/cc/saved_model/loader.cc:229] Restoring SavedModel bundle.
2022-08-20 16:59:58.332753: I tensorflow/cc/saved_model/loader.

print( dir( tfl_model) ) 

In [7]:
open("tfl_model.tflite", "wb").write(tfl_model)

22444

In [8]:
import os
print("Original model in MiB:", os.path.getsize('model.hdf5') / float(2**20))

Original model in MiB: 0.14318084716796875


In [9]:
print("Quantized model in MiB:", os.path.getsize('tfl_model.tflite') / float(2**20))

Quantized model in MiB: 0.021404266357421875


## Post-training quantization - Full integer quantization

In full integer quantization, every component involved in model inference (inputs, activations, and weights) is quantized to lower precision. For this type of quantization, you need to provide a representative dataset so that the converter can estimate the ranges of the activations.
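The reason a representative dataset is needed can be sketched in a few lines of NumPy: weights are known at conversion time, but activation ranges only show up when data flows through the model. The helpers `calibrate_range`, `int8_params` and `quantize` and the random "activations" below are hypothetical stand-ins; the converter's actual calibration procedure may differ.

```python
import numpy as np

def calibrate_range(batches):
    """Track the smallest and largest value seen across representative batches."""
    lo, hi = np.inf, -np.inf
    for batch in batches:
        lo = min(lo, float(batch.min()))
        hi = max(hi, float(batch.max()))
    return lo, hi

def int8_params(lo, hi):
    """Derive an int8 scale and zero point from an observed float range."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # the range must include zero
    scale = (hi - lo) / 255.0
    zero_point = int(round(-128 - lo / scale))
    return scale, zero_point

def quantize(x, scale, zero_point):
    return np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)

rng = np.random.default_rng(0)
# Hypothetical ReLU-like activations for 16 single-sample batches.
batches = [rng.uniform(0.0, 6.0, size=(1, 128)).astype(np.float32) for _ in range(16)]

lo, hi = calibrate_range(batches)
scale, zero_point = int8_params(lo, hi)

# A value outside the calibrated range saturates at the int8 limit.
q = quantize(np.array([0.0, 3.0, 10.0], dtype=np.float32), scale, zero_point)
print(scale, zero_point, q)
```

If the representative samples do not cover the values seen in production, out-of-range activations are clipped, which is why the samples should resemble real inputs.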

In [10]:
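def gen_rep():
    # Representative samples the converter uses to calibrate activation ranges.
    data = affiliation.astype(np.float32)
    data = tf.data.Dataset.from_tensor_slices(data).batch(1)
    # BATCH_SIZE (defined in the next cell) is the number of single-sample batches yielded.
    for i in data.take(BATCH_SIZE):
        yield [i]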

In [11]:
BATCH_SIZE = 16
converter = tf.lite.TFLiteConverter.from_saved_model('./tf_model/')

converter.optimizations = [tf.lite.Optimize.DEFAULT]

converter.representative_dataset = gen_rep

converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8  # or tf.uint8
converter.inference_output_type = tf.int8  # or tf.uint8

In [12]:
tflite_quant_model = converter.convert()

2022-08-20 16:59:58.660070: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:362] Ignored output_format.
2022-08-20 16:59:58.660085: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:365] Ignored drop_control_dependency.
2022-08-20 16:59:58.660191: I tensorflow/cc/saved_model/reader.cc:45] Reading SavedModel from: ./tf_model/
2022-08-20 16:59:58.661288: I tensorflow/cc/saved_model/reader.cc:89] Reading meta graph with tags { serve }
2022-08-20 16:59:58.661299: I tensorflow/cc/saved_model/reader.cc:130] Reading SavedModel debug info (if present) from: ./tf_model/
2022-08-20 16:59:58.669294: I tensorflow/cc/saved_model/loader.cc:229] Restoring SavedModel bundle.
2022-08-20 16:59:58.704235: I tensorflow/cc/saved_model/loader.cc:213] Running initialization op on SavedModel bundle at path: ./tf_model/
2022-08-20 16:59:58.712540: I tensorflow/cc/saved_model/loader.cc:305] SavedModel load for tags { serve }; Status: success: OK. Took 52350 microseconds.
fu

In [13]:
open("tflite_quant_model.tflite", "wb").write(tflite_quant_model)

12768

In [14]:
print("Original model in MiB:", os.path.getsize('model.hdf5') / float(2**20))

Original model in MiB: 0.14318084716796875


In [15]:
print("Quantized model in MiB:", os.path.getsize('tfl_model.tflite') / float(2**20))

Quantized model in MiB: 0.021404266357421875


In [16]:
print("Quantized model (full integer quantization) in MiB:", os.path.getsize('tflite_quant_model.tflite') / float(2**20))

Quantized model (full integer quantization) in MiB: 0.012176513671875
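As a sanity check on these numbers, we can estimate how many bytes the weights alone would need at each precision, using the 9,729 parameters reported by `model.summary()`. The helper `raw_weight_bytes` is a hypothetical name for this back-of-the-envelope calculation; it ignores everything in the files except the raw parameter values.

```python
def raw_weight_bytes(n_params, bytes_per_param):
    """Bytes needed to store n_params values at the given width, with no file overhead."""
    return n_params * bytes_per_param

n_params = 9729  # total parameter count reported by model.summary()
widths = {"float32": 4, "float16": 2, "int8": 1}

estimates = {name: raw_weight_bytes(n_params, width) for name, width in widths.items()}
for name, nbytes in estimates.items():
    print(f"{name}: {nbytes} bytes = {nbytes / 2**20:.4f} MiB")
```

The float16 estimate (19,458 bytes) and the int8 estimate (9,729 bytes) sit a few kilobytes below the actual `.tflite` files (22,444 and 12,768 bytes), which is consistent with the remainder being graph structure and metadata. The HDF5 file is much larger than the float32 estimate; one likely reason is that saving a compiled Keras model stores optimizer state by default, so comparing it directly with the `.tflite` files probably overstates the gain from quantization alone.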


## Performing quantization-aware training

In [17]:
import tensorflow_model_optimization as tfmot

In [18]:
# model 
input_shape = 10
model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Dense(128, activation="relu", name="layer1"),
        layers.Dense(64, activation="relu", name="layer2"),
        layers.Dense(1, activation="sigmoid", name="layer3"),
    ])

In [19]:
q_aware_model = tfmot.quantization.keras.quantize_model(model)

Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module 'gast' has no attribute 'Constant'
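Conceptually, `quantize_model` wraps the layers so that the forward pass simulates low-precision rounding while the underlying weights stay in floating point during training; this is often called fake quantization. The NumPy sketch below shows only that quantize-then-dequantize step. The function name `fake_quantize` and the fixed range of [-1, 1] are illustrative assumptions, not the internals of `tensorflow_model_optimization`.

```python
import numpy as np

def fake_quantize(x, x_min, x_max, num_bits=8):
    """Snap floats to the nearest of 2**num_bits levels in [x_min, x_max], staying in float."""
    levels = 2 ** num_bits - 1
    scale = (x_max - x_min) / levels
    clipped = np.clip(x, x_min, x_max)
    # Round to an integer grid index, then map straight back to float.
    return np.round((clipped - x_min) / scale) * scale + x_min

w = np.array([-0.8, -0.1, 0.03, 0.4, 1.7], dtype=np.float32)
w_fq = fake_quantize(w, x_min=-1.0, x_max=1.0)

step = 2.0 / 255
print(w_fq)
print("max error for in-range values:", np.abs(w_fq[:4] - w[:4]).max())
```

Because the model sees this rounding error during training, it can learn weights that tolerate it. In-range values move by at most half a step, and the out-of-range value 1.7 is clipped to the upper bound.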




In [20]:
q_aware_model.compile(
    # use a fresh optimizer so no state is shared with the first model
    optimizer=tf.keras.optimizers.Adam(),
    loss=loss,
    metrics=['accuracy'])

In [21]:
q_aware_model.fit(affiliation, labels, batch_size=16, epochs=5, validation_split=0.2)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f7fc37b6f10>

In [22]:
q_aware_model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 quantize_layer (QuantizeLay  (None, 10)               3         
 er)                                                             
                                                                 
 quant_layer1 (QuantizeWrapp  (None, 128)              1413      
 erV2)                                                           
                                                                 
 quant_layer2 (QuantizeWrapp  (None, 64)               8261      
 erV2)                                                           
                                                                 
 quant_layer3 (QuantizeWrapp  (None, 1)                70        
 erV2)                                                           
                                                                 
Total params: 9,747
Trainable params: 9,729
Non-traina