# **Quantization in Deep Learning**

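Quantization reduces the numerical precision of a model's weights (and optionally its activations), for example from 32-bit floats to 8-bit integers. This shrinks the model so that it can run on edge devices such as microcontrollers and mobile phones.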
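To make the idea concrete, here is a minimal NumPy sketch of one common scheme, affine (scale and zero-point) quantization to int8. The helper names `quantize_int8` and `dequantize_int8` are illustrative only and are not part of TensorFlow; the converter used later in this notebook may choose its parameters differently.

```python
import numpy as np

def quantize_int8(x):
    """Map a float array to int8 using a scale and a zero point."""
    x = np.asarray(x, dtype=np.float32)
    # Include 0 in the range so that 0.0 maps exactly onto an integer.
    x_min = min(float(x.min()), 0.0)
    x_max = max(float(x.max()), 0.0)
    scale = (x_max - x_min) / 255.0
    if scale == 0.0:
        scale = 1.0  # all-zero input: any scale works
    zero_point = int(round(-128 - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    """Recover approximate float values from the int8 representation."""
    return (q.astype(np.float32) - zero_point) * scale

weights = np.array([-1.0, -0.25, 0.0, 0.5, 1.0], dtype=np.float32)
q, scale, zero_point = quantize_int8(weights)
restored = dequantize_int8(q, scale, zero_point)

print("bytes as float32:", weights.nbytes, "bytes as int8:", q.nbytes)
print("largest rounding error:", np.abs(weights - restored).max())
```

The int8 copy takes a quarter of the memory, and the price is a small rounding error in each value.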

Benefits:

1) Run ML models efficiently on edge devices
2) Faster inference

Two ways to do it in TensorFlow:

1) Post-training quantization: take your trained model and convert it with `tf.lite.TFLiteConverter`. The conversion alone already reduces the size. If you also enable the converter's quantization optimization, the resulting model is much smaller.

2) Quantization-aware training: wrap the model with `quantize_model(tf_model)` and train it again with `q_model.fit()` (similar to fine-tuning in transfer learning). The fine-tuned model can then be converted with TF Lite for edge devices.
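Quantization-aware training works by simulating the rounding during the forward pass, so the weights learn to tolerate it. The sketch below illustrates that "fake quantization" step in plain NumPy. The function `fake_quantize` is a hypothetical helper, not the TensorFlow Model Optimization implementation, and it leaves out the gradient handling that real training needs.

```python
import numpy as np

def fake_quantize(x, num_bits=8):
    """Snap x onto 2**num_bits evenly spaced levels but keep it as float32."""
    x = np.asarray(x, dtype=np.float32)
    x_min, x_max = float(x.min()), float(x.max())
    levels = 2 ** num_bits - 1
    scale = (x_max - x_min) / levels
    if scale == 0.0:
        return x.copy()  # constant input: nothing to round
    grid_index = np.round((x - x_min) / scale)  # integer position on the grid
    return (grid_index * scale + x_min).astype(np.float32)

rng = np.random.default_rng(0)
weights = rng.normal(size=(64, 10)).astype(np.float32)
inputs = rng.normal(size=(1, 64)).astype(np.float32)

# Forward pass of a dense layer with exact and with fake-quantized weights.
exact = inputs @ weights
simulated = inputs @ fake_quantize(weights, num_bits=4)
print("max output difference:", np.abs(exact - simulated).max())
```

Because the loss is computed on the `simulated` output, training sees the quantization error and can compensate for it.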

This notebook includes:

(1) Train a handwritten digits model

(2) Export it to disk and check the size of the model

(3) Apply two quantization techniques: (a) post-training quantization, (b) quantization-aware training

In [None]:
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np

In [None]:
(X_train, y_train) , (X_test, y_test) = keras.datasets.mnist.load_data()

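# Scale pixel values from [0, 255] to [0, 1]
X_train = X_train / 255
X_test = X_test / 255

# Flatten each 28x28 image into a 784-element vector.
# (Not used below: the model's Flatten layer does this itself.)
X_train_flattened = X_train.reshape(len(X_train), 28*28)
X_test_flattened = X_test.reshape(len(X_test), 28*28)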

In [None]:
# Create the model. The Flatten layer avoids a manual reshape of the inputs.
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(100, activation='relu'),
    # softmax gives a probability distribution over the 10 digit classes,
    # which is what sparse_categorical_crossentropy expects
    keras.layers.Dense(10, activation='softmax')
])

model.build(input_shape=(None, 28, 28))

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(X_train, y_train, epochs=5)

In [None]:
# Evaluate the model
model.evaluate(X_test, y_test)

In [None]:
model.save("ml_model.keras")

## **1) Post Training Quantization**

In [None]:
# Load the saved Keras model and export it in the TensorFlow SavedModel format
model = tf.keras.models.load_model('ml_model.keras')
model.export('exported_model')



In [None]:
# Convert the model with TF Lite (no quantization)
converter = tf.lite.TFLiteConverter.from_saved_model('exported_model')  # Use the new exported model path
tflite_model = converter.convert()

len(tflite_model)

In [None]:
# Convert the model with TF Lite and quantize it
converter = tf.lite.TFLiteConverter.from_saved_model('exported_model')  # Use the new exported model path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()

len(tflite_quant_model)

The lengths above are the sizes of the serialized models in bytes. The quantized model is considerably smaller, because `tf.lite.Optimize.DEFAULT` stores the weights as 8-bit integers instead of 32-bit floats.
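Raw byte counts are hard to compare at a glance, so a small helper can print them side by side. This is a sketch with a hypothetical function name, `size_report`, and the byte strings in `demo` are stand-ins so that it runs on its own; they are not measurements from this notebook.

```python
def size_report(models):
    """Return (name, kilobytes, ratio vs. the first entry) for each model."""
    names = list(models)
    baseline = len(models[names[0]])
    return [
        (name, len(blob) / 1024, baseline / len(blob))
        for name, blob in models.items()
    ]

# Stand-in data only. In the notebook, pass
# {"float32": tflite_model, "quantized": tflite_quant_model} instead.
demo = {"float32": bytes(4000), "quantized": bytes(1000)}

for name, kb, ratio in size_report(demo):
    print(f"{name:>10}: {kb:8.1f} KB  ({ratio:.1f}x smaller than baseline)")
```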

In [None]:
# Save the models
with open("tflite_model.tflite", "wb") as f:
    f.write(tflite_model)
    
with open("tflite_quant_model.tflite", "wb") as f:
    f.write(tflite_quant_model)
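A smaller file is only useful if the accuracy holds up, so it is worth checking the converted model on the test set. The sketch below assumes an object that follows the `tf.lite.Interpreter` interface (`allocate_tensors`, `get_input_details`, `set_tensor`, `invoke`, `get_tensor`) and a model that takes float32 input. The function name `tflite_accuracy` is illustrative.

```python
import numpy as np

def tflite_accuracy(interpreter, X, y):
    """Classification accuracy of a TF Lite interpreter, one sample at a time."""
    interpreter.allocate_tensors()
    input_index = interpreter.get_input_details()[0]["index"]
    output_index = interpreter.get_output_details()[0]["index"]

    correct = 0
    for sample, label in zip(X, y):
        # Add the batch dimension and cast to the float32 input the model expects.
        batch = np.expand_dims(sample, axis=0).astype(np.float32)
        interpreter.set_tensor(input_index, batch)
        interpreter.invoke()
        scores = interpreter.get_tensor(output_index)[0]
        correct += int(np.argmax(scores) == label)
    return correct / len(X)
```

In this notebook it could be called as `tflite_accuracy(tf.lite.Interpreter(model_content=tflite_quant_model), X_test, y_test)` and compared with the result of `model.evaluate(X_test, y_test)`.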

## **2) Quantization aware training**

In [None]:
import tensorflow_model_optimization as tfmot

quantize_model = tfmot.quantization.keras.quantize_model

# q_aware stands for quantization aware.
q_aware_model = quantize_model(model)

# `quantize_model` requires a recompile.
q_aware_model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

q_aware_model.summary()

In [None]:
q_aware_model.fit(X_train, y_train, epochs=1)
q_aware_model.evaluate(X_test, y_test)


In [None]:
converter = tf.lite.TFLiteConverter.from_keras_model(q_aware_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

tflite_qaware_model = converter.convert()

In [None]:
len(tflite_qaware_model)


In [None]:
with open("tflite_qaware_model.tflite", 'wb') as f:
    f.write(tflite_qaware_model)