# Network quantization: reducing the number of bits used for model parameters

Model quantization is the process of remapping the range of numeric values a model works with, such as its parameters, onto a number system that can be represented with fewer bits.
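
To make the definition concrete, here is a minimal sketch of one common remapping, affine quantization of floats to 8-bit signed integers. The function names `quantize` and `dequantize` are hypothetical and chosen for illustration; this is not necessarily the scheme a particular framework applies.

```python
def quantize(values, num_bits=8):
    """Map floats onto signed integers of `num_bits` bits (affine scheme)."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    # Extend the observed range so that 0.0 is always representable.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    # Size of one integer step in float units (fall back to 1.0 for all-zero input).
    scale = (hi - lo) / (qmax - qmin) or 1.0
    # Integer that represents the float value 0.0.
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map the integers back to approximate float values."""
    return [(x - zero_point) * scale for x in q]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
q, scale, zero_point = quantize(weights)
restored = dequantize(q, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, zero_point, max_error)
```

Each value now fits in one byte instead of four, at the cost of a rounding error that stays within one quantization step (`scale`).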

## Post-training quantization

### TensorFlow

In [1]:
import absl.logging
absl.logging.set_verbosity(absl.logging.ERROR)
import tensorflow as tf
print(tf.__version__)

2.9.1


Let's use a pretrained ResNet50 model for the purpose of this presentation.

In [2]:
model = tf.keras.applications.ResNet50(include_top=True, weights='imagenet')

2022-08-15 23:08:22.534592: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels.h5


In [3]:
model.summary()

Model: "resnet50"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_1 (InputLayer)           [(None, 224, 224, 3  0           []                               
                                )]                                                                
                                                                                                  
 conv1_pad (ZeroPadding2D)      (None, 230, 230, 3)  0           ['input_1[0][0]']                
                                                                                                  
 conv1_conv (Conv2D)            (None, 112, 112, 64  9472        ['conv1_pad[0][0]']              
                                )                                                                 
                                                                                           

We can save this model in the SavedModel format:
```python
tensorflow_model_path = './tf_model'
model.save(tensorflow_model_path)
```

In [4]:
tensorflow_model_path = './tf_model'
model.save(tensorflow_model_path)

Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module 'gast' has no attribute 'Constant'
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: 'arguments' object has no attribute 'posonlyargs'
INFO:tensorflow:Assets written to: ./tf_model/assets


In [14]:
# Also save a single-file HDF5 copy; its size is used for the comparison later.
model.save('model.hdf5')





We can then build a converter from the saved model with the `from_saved_model` function of `TFLiteConverter`:
```python
tf.lite.TFLiteConverter.from_saved_model(tensorflow_model_path)
```
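Alternatively, since the model is already in memory, we can use `from_keras_model` from `TFLiteConverter`. Here the default optimizations are enabled and the supported types are restricted to `tf.float16`, so the weights are stored as 16-bit floats: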

In [5]:
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
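
To illustrate what storing weights as float16 means, the following sketch rounds a few made-up values to IEEE 754 half precision using only the standard library. The helper `to_float16_and_back` and the sample values are hypothetical; this is not the converter's own code, only an illustration of the precision and storage trade-off.

```python
import struct


def to_float16_and_back(x):
    """Round a Python float to IEEE 754 half precision and return it as a float."""
    return struct.unpack('<e', struct.pack('<e', x))[0]


# Made-up example weights (not taken from the ResNet50 model).
weights = [0.1, -1.2345678, 3.14159265, 1e-3]
rounded = [to_float16_and_back(w) for w in weights]

# Storage needed for the same values as float32 ('f') and as float16 ('e').
bytes_fp32 = len(struct.pack(f'<{len(weights)}f', *weights))
bytes_fp16 = len(struct.pack(f'<{len(weights)}e', *weights))

for w, r in zip(weights, rounded):
    print(f"{w:+.8f} -> {r:+.8f} (abs error {abs(w - r):.2e})")
print("float32 bytes:", bytes_fp32, "float16 bytes:", bytes_fp16)
```

The storage is halved, while each value keeps roughly three significant decimal digits.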

In [6]:
tfl_model = converter.convert()

Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module 'gast' has no attribute 'Constant'
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: 'arguments' object has no attribute 'posonlyargs'
INFO:tensorflow:Assets written to: /var/folders/lr/sp74bxw50pz1ylmkv7qtlf_m0000gp/T/tmp3ere2wgu/assets
2022-08-15 23:09:35.046843: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:362] Ignored output_format.
2022-08-15 23:09:35.047480: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:365] Ignored drop_control_dependency.
2022-08-15 23:09:35.058892: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /var/folders/lr/sp74bxw50pz1ylmkv7qtlf_m0000gp/T/tmp3ere2wgu
2022-08-15 23:09:35.106937: I tensorflow/cc/saved_model/reader.cc:81] Reading meta graph with tags { serve }
2022-08-15 23:09:35.106966: I tensorflow/cc/saved_model/reader.cc:122] Reading SavedModel debug info (if present) from: /var/folders/lr/sp74bxw50pz1ylmkv7qtlf_m0000gp/T/tmp3ere2wgu
2022-08-15 23:09:35.246640: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
2022-08-15 23:09:35.291778: I tensorflow/cc/saved_model/load

In [7]:
print(dir(tfl_model))

['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'center', 'count', 'decode', 'endswith', 'expandtabs', 'find', 'fromhex', 'hex', 'index', 'isalnum', 'isalpha', 'isascii', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']

The converter returns the serialized model as a plain `bytes` object, which is why `dir` lists byte-string methods. It can therefore be written to a file directly.


In [8]:
open("tfl_model.tflite", "wb").write(tfl_model)

51118632

In [15]:
import os
print("Original model in MiB:", os.path.getsize('model.hdf5') / float(2**20))

Original model in MiB: 98.29207611083984


In [16]:
print("Quantized model in MiB:", os.path.getsize('tfl_model.tflite') / float(2**20))

Quantized model in MiB: 48.750526428222656
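
The two sizes printed above can be turned into a compression ratio. The sketch below uses a hypothetical helper, `size_report`, with the byte count returned by `write` earlier and the original size reconstructed from the printed value (an assumption, since the exact byte count of `model.hdf5` is not shown).

```python
def size_report(original_bytes, quantized_bytes):
    """Summarize how much smaller the quantized file is."""
    mib = 2 ** 20
    ratio = quantized_bytes / original_bytes
    return {
        "original_mib": original_bytes / mib,
        "quantized_mib": quantized_bytes / mib,
        "ratio": ratio,
        "reduction_percent": (1.0 - ratio) * 100.0,
    }


# Assumption: original size reconstructed from the value printed for model.hdf5.
original_bytes = round(98.29207611083984 * 2 ** 20)
# Byte count reported when tfl_model.tflite was written.
quantized_bytes = 51118632

report = size_report(original_bytes, quantized_bytes)
print(f"ratio: {report['ratio']:.3f}, reduction: {report['reduction_percent']:.1f}%")
```

The ratio comes out close to one half, which is what storing 32-bit float weights as float16 would suggest.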


TO-DO:

- Compose a model from ResNet50 and retrain it, for example on CIFAR-10
    - Check accuracy
    - Check accuracy of the quantized version
- Do quantization during training
    - Check accuracy before and after
- ... Pruning, weight sharing
- Everything together?