# Your Details

Your Name: Dylan Rodrigues

Your ID Number: 24121479

# Etivity Task 4 - Part 2: Quantizing a TensorFlow/Keras Model

For this exercise, you will apply various quantization strategies to a convolutional neural network (CNN) trained on the Fashion MNIST dataset. Sections 1 and 2 of this exercise are already completed. Your task is to quantize this model in several ways using the TF Model Optimisation toolkit and to report on the results with your own code in Sections 3, 4 and 5.

By the end of this notebook, you'll be able to: 

* Understand Quantizations in TensorFlow 
* Quantize a CNN using the TensorFlow Model optimisation framework
* Analyse the model performance
* Results analysis

### Let's get started!
**Start** with sections [1] and [2] for which code is provided - then proceed with sections [3], [4] and [5] to begin this model quantization exercise.

    [1] Import data dependencies
    [2] Generate a TensorFlow/keras CNN model for the Fashion MNIST dataset
    [3] Convert model to TF Lite model
    [4] Perform Post Training Quantization (PTQ) to generate TF Lite model for:
        (a) PTQ using Float 16 Quantization
        (b) PTQ using Dynamic Range Quantization
        (c) PTQ using Full Integer (int8) Quantization 
        (d) Evaluate the TF Lite models
    [5] Perform Quantization Aware Training (QAT)
        (a) Train a TF model through tf.keras
        (b) Make it quantization-aware
        (c) Quantize the model using Dynamic Range Quantization
        (d) Evaluate the TF Lite model performance
    
   
### Important Note on Submission 

There are code exercises to complete in this task. Insert your code into the cells marked with the 'enter code here' text shown below, so that your work can be graded easily.

\### **ENTER CODE HERE**

Please make sure of the following:

1. You have not added any _extra_ `print` statement(s) in the assignment.
2. You have not added any _extra_ code cell(s) in the assignment.
3. You have not changed any of the function parameters.
4. You are not using any global variables inside your graded exercises. Unless specifically instructed to do so, please refrain from it and use the local variables instead.
5. You are not changing the assignment code where it is not required, like creating _extra_ variables.

### Installing the TensorFlow Model Optimisation toolkit

You must first install it using pip (comment this out once you have done this).

<span style='color: red;'>**Note:**</span> There is no need to run this command again if the toolkit was already installed successfully in the previous tutorial (hence it is commented out here).

In [1]:
# Install the TF optimization toolkit the first time 
#! pip install -q tensorflow-model-optimization

## 1. Import the data dependencies

In [2]:
import numpy as np
import tensorflow as tf
import time
import os
import pathlib
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from tensorflow import keras

In [3]:
# Check that we are using a GPU
physical_devices = tf.config.experimental.list_physical_devices('GPU')
print("Num GPUs Available: ", len(physical_devices))

Num GPUs Available:  0


## 2. Generate a TensorFlow Model

We'll build a CNN model to classify the 10 fashion item categories from the [FASHION_MNIST dataset](https://www.tensorflow.org/datasets/catalog/fashion_mnist).

Training won't take long because the model runs for just 5 epochs, which reaches roughly 90% accuracy.

In [4]:
# Load Fashion MNIST dataset
fashion_mnist = tf.keras.datasets.fashion_mnist
(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()

# Reshape data for CNN input
img_width, img_height = 28, 28
X_train = X_train.reshape(X_train.shape[0], img_width, img_height, 1)
X_test = X_test.reshape(X_test.shape[0], img_width, img_height, 1)
input_shape = (img_width, img_height, 1)

# Normalize the input image so that each pixel value is between 0 to 1.
X_train = X_train.astype(np.float32) / 255.0
X_test = X_test.astype(np.float32) / 255.0


# Define the model architecture
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Dropout(rate=0.1), # Randomly disable 10% of neurons
    tf.keras.layers.Conv2D(64, kernel_size=(3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Dropout(rate=0.1), # Randomly disable 10% of neurons
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])


# Build the model
model.compile(
    loss=tf.keras.losses.sparse_categorical_crossentropy, # loss function
    optimizer=tf.keras.optimizers.Adam(), # optimizer function
    metrics=['accuracy'] # reporting metric
)

# Train the fashion MNIST classification model
model.fit(
  X_train,
  y_train,
  epochs=5,
  validation_split=0.1
)

Epoch 1/5
1688/1688 ━━━━━━━━━━━━━━━━━━━━ 12s 6ms/step - accuracy: 0.7624 - loss: 0.6511 - val_accuracy: 0.8723 - val_loss: 0.3501
Epoch 2/5
1688/1688 ━━━━━━━━━━━━━━━━━━━━ 11s 6ms/step - accuracy: 0.8787 - loss: 0.3279 - val_accuracy: 0.8967 - val_loss: 0.2822
Epoch 3/5
1688/1688 ━━━━━━━━━━━━━━━━━━━━ 11s 6ms/step - accuracy: 0.8958 - loss: 0.2763 - val_accuracy: 0.9027 - val_loss: 0.2667
Epoch 4/5
1688/1688 ━━━━━━━━━━━━━━━━━━━━ 11s 6ms/step - accuracy: 0.9103 - loss: 0.2417 - val_accuracy: 0.9025 - val_loss: 0.2622
Epoch 5/5
1688/1688 ━━━━━━━━━━━━━━━━━━━━ 11s 6ms/step - accuracy: 0.9189 - loss: 0.2147 - val_accuracy: 0.9093 - val_loss: 0.2530


<keras.src.callbacks.history.History at 0x1aabd3ce540>

**Evaluate and save the model**

In [5]:
score = model.evaluate(X_test, y_test, verbose=1)
print("Test loss {:.4f}, accuracy {:.2f}%".format(score[0], score[1] * 100))

313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.9046 - loss: 0.2671
Test loss 0.2629, accuracy 90.52%


In [6]:
#Save the entire model into a model.h5 file
model.save("models/model.h5")
print("Saved model to disk")



Saved model to disk


## 3. Convert the trained model to TensorFlow Lite format

In the code cell below, convert the model to a **TensorFlow Lite** model and then save this unquantized TFLite model to the ./fashion_mnist_tflite_model directory

In [7]:
### ENTER CODE HERE
model = tf.keras.models.load_model('models/model.h5')
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

import pathlib
tflite_models_dir = pathlib.Path("./mnist_tflite_models/")
tflite_models_dir.mkdir(exist_ok=True, parents=True)

# Save the unquantized float model:
tflite_model_file = tflite_models_dir/"mnist_model.tflite"
tflite_model_file.write_bytes(tflite_model)



INFO:tensorflow:Assets written to: C:\Users\24121479\AppData\Local\Temp\tmpi15_a892\assets




Saved artifact at 'C:\Users\24121479\AppData\Local\Temp\tmpi15_a892'. The following endpoints are available:

* Endpoint 'serve'
  args_0 (POSITIONAL_ONLY): TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='input_layer')
Output Type:
  TensorSpec(shape=(None, 10), dtype=tf.float32, name=None)
Captures:
  1832830871760: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1832830871376: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1832830870800: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1832830869072: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1832830867152: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1832830870608: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1832830872144: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1832830872528: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1832839214096: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1832839210832: TensorSpec(shape=(), dtype=tf.resource, name=None)


1825276

It's now a TensorFlow Lite model, but it's still using 32-bit float values for all parameter data.
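As a sanity check on that file size, the parameter count can be worked out by hand. This is a minimal sketch, assuming the architecture defined in Section 2 and 4 bytes per float32 parameter; the helper names `conv2d_params` and `dense_params` are illustrative and not part of any library.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters


def dense_params(in_features, units):
    """Weight matrix plus one bias per unit."""
    return in_features * units + units


# Spatial size through the network (valid padding, 2x2 pooling):
# 28 -> conv 3x3 -> 26 -> pool -> 13 -> conv 3x3 -> 11 -> pool -> 5
flattened = 5 * 5 * 64

layer_params = {
    "conv2d_32": conv2d_params(3, 3, 1, 32),
    "conv2d_64": conv2d_params(3, 3, 32, 64),
    "dense_256": dense_params(flattened, 256),
    "dense_100": dense_params(256, 100),
    "dense_10": dense_params(100, 10),
}

total_params = sum(layer_params.values())
float32_bytes = total_params * 4  # 4 bytes per float32 value

print("parameters:", total_params)
print("estimated float32 bytes:", float32_bytes)
```

The estimate of 1,821,528 bytes is within about 4 KB of the 1,825,276 bytes written above; the remainder is presumably graph structure and metadata.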

## 4. Post-Training Quantization (PTQ)

### Part (a): PTQ using Float 16 Quantization
Here you will insert code for post-training float 16 quantization and then evaluate the file size compared to the unquantized tflite model size.

In [60]:
model = tf.keras.models.load_model('models/model.h5')
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_quant16_model = converter.convert()

# Info: Save the quantized 16-bit model
tflite_quant16_model_file = tflite_models_dir/"mnist_model_quant16.tflite"
tflite_quant16_model_file.write_bytes(tflite_quant16_model)



INFO:tensorflow:Assets written to: C:\Users\24121479\AppData\Local\Temp\tmpf07uelvm\assets




Saved artifact at 'C:\Users\24121479\AppData\Local\Temp\tmpf07uelvm'. The following endpoints are available:

* Endpoint 'serve'
  args_0 (POSITIONAL_ONLY): TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='input_layer')
Output Type:
  TensorSpec(shape=(None, 10), dtype=tf.float32, name=None)
Captures:
  1832830869456: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1832830866384: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1832839210064: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1832839207184: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1832839209296: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1832839207760: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1832839215632: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1832839215824: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1832839215248: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1832839216208: TensorSpec(shape=(), dtype=tf.resource, name=None)


915704

**Evaluate the reduction in size of the model** - how much smaller is the Quantized 16-bit model?

In [65]:
print("Float model in Mb:", os.path.getsize(tflite_model_file) / float(2**20))
print("Quantized 16-bit model in Mb:", os.path.getsize(tflite_quant16_model_file) / float(2**20))
print("Compression ratio:", os.path.getsize(tflite_model_file)/os.path.getsize(tflite_quant16_model_file))

Float model in Mb: 1.7407188415527344
Quantized 16-bit model in Mb: 0.8732833862304688
Compression ratio: 1.9933035129255743
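The near-2x ratio follows from storing each weight in 16 bits instead of 32. The sketch below illustrates the idea with NumPy only; the random array is a stand-in for real model weights (an assumption for illustration), and the converter's internal handling may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in weights; the real values live inside the trained model.
weights_fp32 = rng.uniform(-1.0, 1.0, size=10_000).astype(np.float32)

# Store as float16, then convert back to float32 as a float kernel would.
weights_fp16 = weights_fp32.astype(np.float16)
restored = weights_fp16.astype(np.float32)

size_ratio = weights_fp32.nbytes / weights_fp16.nbytes
max_abs_error = float(np.max(np.abs(weights_fp32 - restored)))

print("size ratio:", size_ratio)
print("max abs round-trip error:", max_abs_error)
```

For values in this range the round-trip error stays well below 1e-3, which is in line with the very small accuracy change reported for the 16-bit model in Part (d).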


### Part (b): PTQ using Dynamic Range Quantization
Next you will quantize the original model using dynamic range quantization, which converts the model weights from float to int8 at conversion time and quantizes activations dynamically at inference time. Convert the model using **Dynamic Range Quantization** and evaluate the model file size reduction.

In [10]:
model = tf.keras.models.load_model('models/model.h5')
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = converter.convert()

# Info: Save the quantized model
tflite_quant_model_file = tflite_models_dir/"mnist_model_quant.tflite"
tflite_quant_model_file.write_bytes(tflite_quant_model)



INFO:tensorflow:Assets written to: C:\Users\24121479\AppData\Local\Temp\tmpgbxl3_0s\assets




Saved artifact at 'C:\Users\24121479\AppData\Local\Temp\tmpgbxl3_0s'. The following endpoints are available:

* Endpoint 'serve'
  args_0 (POSITIONAL_ONLY): TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='input_layer')
Output Type:
  TensorSpec(shape=(None, 10), dtype=tf.float32, name=None)
Captures:
  1832874300944: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1832874308240: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1832874309008: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1832874306896: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1832874307088: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1832874305744: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1832874304400: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1832874305552: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1832874304592: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1832874310352: TensorSpec(shape=(), dtype=tf.resource, name=None)


469432

 **Evaluate the reduction in size of the model** - how much smaller is the Quantized model?

In [66]:
print("Float model in Mb:", os.path.getsize(tflite_model_file) / float(2**20))
print("Quantized model in Mb:", os.path.getsize(tflite_quant_model_file) / float(2**20))
print("Compression ratio:", os.path.getsize(tflite_model_file)/os.path.getsize(tflite_quant_model_file))

Float model in Mb: 1.7407188415527344
Quantized model in Mb: 0.44768524169921875
Compression ratio: 3.8882649670239777
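The roughly 4x reduction comes from storing each weight as one byte plus a shared scale. Below is a conceptual sketch of symmetric per-tensor int8 quantization in NumPy. The function names and the random weights are assumptions for illustration; the scheme TF Lite actually applies (for example per-channel scales for convolutions) may differ.

```python
import numpy as np


def quantize_symmetric_int8(weights):
    """Map float weights to int8 using a single per-tensor scale."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
# Stand-in for a dense layer's weight matrix.
w = rng.normal(0.0, 0.05, size=(256, 100)).astype(np.float32)

q, scale = quantize_symmetric_int8(w)
w_hat = dequantize(q, scale)
max_error = float(np.max(np.abs(w - w_hat)))

print("bytes before:", w.nbytes, "bytes after:", q.nbytes)
print("max abs error:", max_error, "half a step:", scale / 2)
```

The worst-case error per weight is half a quantization step, which is why a well-conditioned layer loses very little accuracy.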


### Part (c): PTQ using Full Integer (int8) Quantization 
Convert the original model using **full integer quantization** so that both weights and activations are converted from float32 to 8-bit integers. Evaluate the model file size reduction. Note that you will need to enable the default optimisation (`tf.lite.Optimize.DEFAULT`; `OPTIMIZE_FOR_SIZE` is a deprecated alias for it), supply a small representative dataset so that the converter can calibrate activation ranges, and make sure the input and output tensors use an 8-bit integer type.

In [32]:
def representative_data_gen():
  for input_value in tf.data.Dataset.from_tensor_slices(X_train).batch(1).take(100):
    yield [input_value]

model = tf.keras.models.load_model('models/model.h5')
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
# Ensure that if any ops can't be quantized, the converter throws an error
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# Set the input and output tensors to uint8 (APIs added in r2.3)
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

tflite_fullquant_model = converter.convert()

# Saving the fully-quantized 8-bit model:
tflite_fullquant_model_file = tflite_models_dir/"mnist_model_fullquant.tflite"
tflite_fullquant_model_file.write_bytes(tflite_fullquant_model)



INFO:tensorflow:Assets written to: C:\Users\24121479\AppData\Local\Temp\tmpdk29la6_\assets




Saved artifact at 'C:\Users\24121479\AppData\Local\Temp\tmpdk29la6_'. The following endpoints are available:

* Endpoint 'serve'
  args_0 (POSITIONAL_ONLY): TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='input_layer')
Output Type:
  TensorSpec(shape=(None, 10), dtype=tf.float32, name=None)
Captures:
  1834045648016: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1834045645136: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1834045646864: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1834045650128: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1834045649936: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1834045651280: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1834045651472: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1834045652240: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1834045651856: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1834045650704: TensorSpec(shape=(), dtype=tf.resource, name=None)




472568

**Check that the input and output tensors are in 8-bit integer format** (`uint8` here, as set on the converter)

In [33]:
interpreter = tf.lite.Interpreter(model_content=tflite_fullquant_model)
input_type = interpreter.get_input_details()[0]['dtype']
print('input: ', input_type)
output_type = interpreter.get_output_details()[0]['dtype']
print('output: ', output_type)

input:  <class 'numpy.uint8'>
output:  <class 'numpy.uint8'>


 **Evaluate the reduction in size of the model** - how much smaller is the Quantized model?

In [67]:
print("Float model in Mb:", os.path.getsize(tflite_model_file) / float(2**20))
print("Full Integer Quantized model in Mb:", os.path.getsize(tflite_fullquant_model_file) / float(2**20))
print("Compression ratio:", os.path.getsize(tflite_model_file)/os.path.getsize(tflite_fullquant_model_file))

Float model in Mb: 1.7407188415527344
Full Integer Quantized model in Mb: 0.45067596435546875
Compression ratio: 3.862462121853363
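The representative dataset exists so that the converter can observe realistic value ranges and choose quantization parameters for activations. The following is a simplified sketch of that calibration step for a `uint8` tensor. `calibrate_uint8` is a hypothetical helper and the random batch stands in for the 100 training images used above; the converter's real calibration logic is more involved.

```python
import numpy as np


def calibrate_uint8(samples):
    """Derive an affine (scale, zero_point) pair from the observed min/max."""
    lo = min(0.0, float(np.min(samples)))  # the range must contain 0.0
    hi = max(0.0, float(np.max(samples)))
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = int(round(-lo / scale))
    return scale, zero_point


rng = np.random.default_rng(0)
# Stand-in for 100 normalised 28x28 images with pixel values in [0, 1].
calibration_batch = rng.uniform(0.0, 1.0, size=(100, 28, 28, 1)).astype(np.float32)
calibration_batch[0, 0, 0, 0] = 0.0  # make sure both extremes are present
calibration_batch[0, 0, 1, 0] = 1.0

scale, zero_point = calibrate_uint8(calibration_batch)
print("scale:", scale, "zero_point:", zero_point)
```

For inputs normalised to [0, 1] this gives a scale of 1/255 and a zero point of 0, so each `uint8` value maps back to the original pixel intensity.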


### Part (d):  Evaluate the TF Lite models on all images

In this section, evaluate the four TF Lite models by running inference using the TensorFlow Lite [`Interpreter`](https://www.tensorflow.org/api_docs/python/tf/lite/Interpreter) to compare the model accuracies. First, build a **run_tflite_model()** function to run inference on a TF Lite model and then an **evaluate_model()** function to evaluate the TF Lite model on all images in the X_test dataset.

**Evaluate the model performance for these models** by reporting on the model accuracies.
1. Float model (Unquantized)
2. 16-bit quantized model
3. Initial quantized 8-bit model
4. Fully quantized 8-bit model 

In [51]:
# Helper function to run inference on a TFLite model
def run_tflite_model(tflite_file, test_image_indices):
  """Return the predicted class index for each requested X_test image."""

  # Initialize the interpreter
  interpreter = tf.lite.Interpreter(model_path=str(tflite_file))
  interpreter.allocate_tensors()

  input_details = interpreter.get_input_details()[0]
  output_details = interpreter.get_output_details()[0]

  predictions = np.zeros((len(test_image_indices),), dtype=int)
  for i, test_image_index in enumerate(test_image_indices):
    test_image = X_test[test_image_index]
    test_label = y_test[test_image_index]

    # Check if the input type is quantized, then rescale input data to uint8
    if input_details['dtype'] == np.uint8:
      input_scale, input_zero_point = input_details["quantization"]
      test_image = test_image / input_scale + input_zero_point

    test_image = np.expand_dims(test_image, axis=0).astype(input_details["dtype"])
    interpreter.set_tensor(input_details["index"], test_image)
    interpreter.invoke()
    output = interpreter.get_tensor(output_details["index"])[0]

    predictions[i] = output.argmax()

  return predictions
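The rescaling step inside `run_tflite_model()` applies the affine mapping between real values and integers. The sketch below spells that mapping out in both directions, with explicit rounding and clipping before the cast. The `scale` and `zero_point` values are assumed for illustration; in practice they come from `input_details["quantization"]` and `output_details["quantization"]`.

```python
import numpy as np


def quantize_input(x, scale, zero_point, dtype=np.uint8):
    """real -> integer: q = round(x / scale) + zero_point, clipped to the dtype range."""
    info = np.iinfo(dtype)
    q = np.round(x / scale) + zero_point
    return np.clip(q, info.min, info.max).astype(dtype)


def dequantize_output(q, scale, zero_point):
    """integer -> real: x = (q - zero_point) * scale."""
    return (q.astype(np.float32) - zero_point) * scale


# Assumed parameters for pixels normalised to [0, 1].
scale, zero_point = 1.0 / 255.0, 0

pixels = np.array([0.0, 0.2, 1.0], dtype=np.float32)
q = quantize_input(pixels, scale, zero_point)
restored = dequantize_output(q, scale, zero_point)

print(q)
print(restored)
```

Rounding before the cast matters because `astype` on its own truncates toward zero, which can shift values by up to one quantization step.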

In [52]:
# Helper function to evaluate a TFLite model on all images
def evaluate_model(tflite_file, model_type):
  """Print the accuracy of a TFLite model over the full X_test set."""

  test_image_indices = range(X_test.shape[0])
  predictions = run_tflite_model(tflite_file, test_image_indices)

  accuracy = (np.sum(y_test == predictions) * 100) / len(X_test)

  print('%s model accuracy is %.4f%% (Number of test samples=%d)' % (
      model_type, accuracy, len(X_test)))

1. Evaluate the float model

In [53]:
tflite_model_file

WindowsPath('mnist_tflite_models/mnist_model.tflite')

In [54]:
evaluate_model(tflite_model_file, model_type="Float")

Float model accuracy is 90.5200% (Number of test samples=10000)


2. Evaluate the 16-bit quantized model

In [55]:
evaluate_model(tflite_quant16_model_file, model_type="16-bit Quantized")

16-bit Quantized model accuracy is 90.5000% (Number of test samples=10000)


3. Evaluate the initial quantized 8-bit model

In [56]:
evaluate_model(tflite_quant_model_file, model_type="Quantized")

Quantized model accuracy is 90.5500% (Number of test samples=10000)


4. Evaluate the fully quantized 8-bit integer model

In [58]:
evaluate_model(tflite_fullquant_model_file, model_type="Fully Quantized")

Fully Quantized model accuracy is 90.4300% (Number of test samples=10000)
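The four results are easier to compare side by side. This short sketch tabulates the file sizes and accuracies reported in the cell outputs above; the numbers are copied from this notebook and the variable names are illustrative.

```python
# (file size in bytes, test accuracy in %) copied from the outputs above.
results = {
    "Float": (1825276, 90.52),
    "16-bit Quantized": (915704, 90.50),
    "Quantized": (469432, 90.55),
    "Fully Quantized": (472568, 90.43),
}

baseline_bytes, baseline_acc = results["Float"]

summary = []
print(f"{'Model':<18}{'Bytes':>10}{'Ratio':>9}{'Acc. diff':>11}")
for name, (size_bytes, accuracy) in results.items():
    ratio = baseline_bytes / size_bytes   # compression relative to the float model
    delta = accuracy - baseline_acc       # accuracy change in percentage points
    summary.append((name, size_bytes, ratio, delta))
    print(f"{name:<18}{size_bytes:>10}{ratio:>8.2f}x{delta:>+11.2f}")
```

All three quantized variants stay within 0.1 percentage points of the float model while shrinking the file by a factor of roughly 2 to 3.9.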


## 5. Quantization-Aware Training (QAT)

QAT simulates quantization during training and typically yields higher accuracy than post-training quantization.
Generally, QAT is a three-step process:

    (a) Train a regular model through tf.keras 
        YOU MAY HAVE TO 'import tf_keras as keras' and use model = keras.Sequential([...]) format.
    (b) Make it quantization-aware by applying the related API, allowing it to learn parameters that are robust to quantization loss.
    (c) Quantize the model using one of the approaches mentioned above and analyse its performance
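Step (b) works by inserting "fake quantization" into the forward pass: values are rounded to the grid that 8-bit inference would use, but kept as floats so training can continue. The NumPy sketch below shows only that forward computation. The function name and the fixed range of [-3, 3] are assumptions for illustration; the toolkit uses its own quantizer layers, learns the ranges during training, and handles gradients separately.

```python
import numpy as np


def fake_quantize(x, x_min, x_max, num_bits=8):
    """Quantize to 2**num_bits levels in [x_min, x_max], then dequantize.

    The result is still floating point, but it can only take values on the
    quantization grid, so the network sees the rounding error during training.
    """
    levels = 2 ** num_bits - 1
    scale = (x_max - x_min) / levels
    q = np.round((np.clip(x, x_min, x_max) - x_min) / scale)
    return q * scale + x_min


rng = np.random.default_rng(0)
# Stand-in for a layer's activations.
activations = rng.normal(0.0, 1.0, size=1000).astype(np.float32)
simulated = fake_quantize(activations, x_min=-3.0, x_max=3.0)

print("distinct values:", len(np.unique(simulated)))
print("max rounding error:", float(np.max(np.abs(simulated - np.clip(activations, -3.0, 3.0)))))
```

Because the loss is computed on these rounded values, the weights can adapt to the quantization error before the model is converted.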


### **Part (a)**: Train a model for the FASHION MNIST dataset again

In [68]:
# Load the MNIST handwritten-digit dataset (tf.keras.datasets.mnist, not Fashion MNIST)
mnist = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Normalize the input image so that each pixel value is between 0 to 1.
train_images = train_images.astype(np.float32) / 255.0
test_images = test_images.astype(np.float32) / 255.0

# Define the model architecture with tf_keras (Keras 2), as suggested in the
# Section 5 notes for use with the quantization-aware training API.
import tf_keras as keras

model = keras.Sequential([
  keras.layers.InputLayer(input_shape=(28, 28)),
  keras.layers.Reshape(target_shape=(28, 28, 1)),
  keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation='relu'),
  keras.layers.MaxPooling2D(pool_size=(2, 2)),
  keras.layers.Flatten(),
  keras.layers.Dense(10)
])


# Train the digit classification model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(
                  from_logits=True),
              metrics=['accuracy'])

model.fit(
  train_images,
  train_labels,
  epochs=5,
  validation_split=0.1
  #validation_data=(test_images, test_labels)
)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tf_keras.src.callbacks.History at 0x1ab08462600>

In [85]:
# Save the entire model into a model.h5 file
model.save("models/model.h5")
print("Saved model to disk")



Saved model to disk


### Part (b): Make the model quantization aware
Hint: Use q_aware_model = quantize_model(model)

In [69]:
import tensorflow_model_optimization as tfmot

quantize_model = tfmot.quantization.keras.quantize_model

# q_aware stands for quantization aware.
q_aware_model = quantize_model(model)

# `quantize_model` requires a recompile.
q_aware_model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(
                  from_logits=True),
              metrics=['accuracy'])

q_aware_model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 quantize_layer (QuantizeLa  (None, 28, 28)            3         
 yer)                                                            
                                                                 
 quant_reshape (QuantizeWra  (None, 28, 28, 1)         1         
 pperV2)                                                         
                                                                 
 quant_conv2d (QuantizeWrap  (None, 26, 26, 12)        147       
 perV2)                                                          
                                                                 
 quant_max_pooling2d (Quant  (None, 13, 13, 12)        1         
 izeWrapperV2)                                                   
                                                                 
 quant_flatten (QuantizeWra  (None, 2028)              1

#### Retrain the quantization aware model

In [70]:
q_aware_model.fit(
  train_images,
  train_labels,
  epochs=5,
  validation_split=0.1
  #validation_data=(test_images, test_labels)
)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tf_keras.src.callbacks.History at 0x1ab08a339e0>

#### Compare the accuracy of the baseline model to the new QAT model

In [71]:
_, baseline_model_accuracy = model.evaluate(
    test_images, test_labels, verbose=1)

_, q_aware_model_accuracy = q_aware_model.evaluate(
    test_images, test_labels, verbose=1)

print('Baseline test accuracy:', baseline_model_accuracy*100)
print('Quant test accuracy:', q_aware_model_accuracy*100)

Baseline test accuracy: 97.60000109672546
Quant test accuracy: 98.17000031471252


#### Fine tune with QAT on a subset of the training data

In [72]:
q_aware_model.fit(
  train_images[:1000],
  train_labels[:1000],
  epochs=5,
  validation_split=0.1
  #validation_data=(test_images, test_labels)
)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tf_keras.src.callbacks.History at 0x1ab05cb1a30>

#### Re-evaluate the model accuracies.

In [73]:
_, baseline_model_accuracy = model.evaluate(
    test_images, test_labels, verbose=1)

_, q_aware_model_accuracy = q_aware_model.evaluate(
    test_images, test_labels, verbose=1)

print('Baseline test accuracy:', baseline_model_accuracy*100)
print('Quant test accuracy:', q_aware_model_accuracy*100)

Baseline test accuracy: 97.60000109672546
Quant test accuracy: 97.9099988937378


#### Save the QAT model to the ./models directory

In [74]:
# Save the entire model into a qat_model.h5 file
# Note: the earlier model was already saved as model.h5. This call saves `model`,
# which is still the baseline network; the quantization-aware network is `q_aware_model`.
model.save("models/qat_model.h5")
print("Saved model to disk")

Saved model to disk




### Part (c): Convert the model to TF Lite format  using Dynamic Range Quantization

In [75]:
model = tf.keras.models.load_model('models/qat_model.h5')
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quantaware_model = converter.convert()

# Saving the quantized aware model:
tflite_quantaware_model_file = tflite_models_dir/"mnist_model_quantaware.tflite"
tflite_quantaware_model_file.write_bytes(tflite_quantaware_model)



INFO:tensorflow:Assets written to: C:\Users\24121479\AppData\Local\Temp\tmprnvgt0ni\assets




Saved artifact at 'C:\Users\24121479\AppData\Local\Temp\tmprnvgt0ni'. The following endpoints are available:

* Endpoint 'serve'
  args_0 (POSITIONAL_ONLY): TensorSpec(shape=(None, 28, 28), dtype=tf.float32, name='input_1')
Output Type:
  TensorSpec(shape=(None, 10), dtype=tf.float32, name=None)
Captures:
  1834045637840: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1834088026512: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1834088026320: TensorSpec(shape=(), dtype=tf.resource, name=None)
  1834088028048: TensorSpec(shape=(), dtype=tf.resource, name=None)


24016

**Evaluate the reduction in size of the model.** 

In [87]:
print("Float model in Mb:", os.path.getsize("models/model.h5") / float(2**20))
print("Quantized aware (QAT) model in Mb:", os.path.getsize(tflite_quantaware_model_file) / float(2**20))
print("Compression ratio:", os.path.getsize("models/model.h5")/os.path.getsize(tflite_quantaware_model_file))

Float model in Mb: 0.09896087646484375
Quantized aware (QAT) model in Mb: 0.0229034423828125
Compression ratio: 4.3207861425716185
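The cell above compares a Keras `.h5` file with a `.tflite` file, so the ratio mixes two storage formats. A like-for-like figure would use an unquantized TF Lite export of the same network as the reference. The helper below is a small sketch of such a comparison; `compare_sizes` is a hypothetical name, and the dummy files stand in for two real `.tflite` models.

```python
import os
import tempfile


def compare_sizes(reference_path, candidate_path):
    """Return (reference MiB, candidate MiB, compression ratio)."""
    ref_bytes = os.path.getsize(reference_path)
    cand_bytes = os.path.getsize(candidate_path)
    mib = float(2 ** 20)
    return ref_bytes / mib, cand_bytes / mib, ref_bytes / cand_bytes


# Demonstration with dummy files standing in for two .tflite models.
with tempfile.TemporaryDirectory() as tmp:
    float_path = os.path.join(tmp, "float.tflite")
    quant_path = os.path.join(tmp, "quant.tflite")
    with open(float_path, "wb") as f:
        f.write(b"\0" * 4000)
    with open(quant_path, "wb") as f:
        f.write(b"\0" * 1000)
    ref_mib, cand_mib, ratio = compare_sizes(float_path, quant_path)

print("compression ratio:", ratio)
```

The same helper could replace the three repeated `print` lines used for each size comparison in Section 4.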


### Part (d): Evaluate the TF Lite QAT model accuracy
Hint: Use the interpreter-based evaluate_model() function to get the accuracy result.

In [88]:
import numpy as np

def evaluate_model(interpreter):
  input_index = interpreter.get_input_details()[0]["index"]
  output_index = interpreter.get_output_details()[0]["index"]

  # Run predictions on every image in the "test" dataset.
  prediction_digits = []
  for i, test_image in enumerate(test_images):
    if i % 1000 == 0:
      print('Evaluated on {n} results so far.'.format(n=i))
    # Pre-processing: add batch dimension and convert to float32 to match with
    # the model's input data format.
    test_image = np.expand_dims(test_image, axis=0).astype(np.float32)
    interpreter.set_tensor(input_index, test_image)

    # Run inference.
    interpreter.invoke()

    # Post-processing: remove batch dimension and find the digit with highest
    # probability.
    output = interpreter.tensor(output_index)
    digit = np.argmax(output()[0])
    prediction_digits.append(digit)

  print('\n')
  # Compare prediction results with ground truth labels to calculate accuracy.
  prediction_digits = np.array(prediction_digits)
  accuracy = (prediction_digits == test_labels).mean()
  return accuracy

In [89]:
interpreter = tf.lite.Interpreter(model_content=tflite_quantaware_model)
interpreter.allocate_tensors()

test_accuracy = evaluate_model(interpreter)

print('Quant TFLite test_accuracy:', test_accuracy)
print('Quant TF test accuracy:', q_aware_model_accuracy)

Evaluated on 0 results so far.
Evaluated on 1000 results so far.
Evaluated on 2000 results so far.
Evaluated on 3000 results so far.
Evaluated on 4000 results so far.
Evaluated on 5000 results so far.
Evaluated on 6000 results so far.
Evaluated on 7000 results so far.
Evaluated on 8000 results so far.
Evaluated on 9000 results so far.


Quant TFLite test_accuracy: 0.9761
Quant TF test accuracy: 0.9790999889373779


## <span style='color: red;'>Comment on the results of this exercise:</span> ##


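Section 5 was run on the MNIST digits dataset loaded in Part (a). The baseline model reached a test accuracy of **97.60%**, and after five epochs of quantization-aware training the QAT model reached **98.17%**. Because the QAT model received five additional epochs of training on top of the baseline weights, part of that gain may come from the extra training rather than from quantization awareness itself. Fine-tuning on the first 1,000 training images lowered the QAT accuracy slightly to **97.91%**, which is still above the baseline.

After conversion to TF Lite with dynamic range quantization, the file shrank by about **4.3x**, from ~0.099 MB for the saved Keras `model.h5` to ~0.023 MB for the `.tflite` file. This ratio compares two different file formats, so it is not directly comparable to the 3.9x ratio measured between two TF Lite files in Section 4. The converted model scored **97.61%** on the test set. That is within 0.3 percentage points of the QAT model (**97.91%**) and almost identical to the baseline (**97.60%**), which is consistent with the conversion cell having loaded the file written by `model.save(...)` rather than one written from `q_aware_model`. In either case, 8-bit weight quantization cost essentially no accuracy here while substantially reducing the model footprint. This suggests that quantization is an effective route to deploying models in resource-constrained environments while maintaining high accuracy, with QAT available where post-training quantization alone loses too much.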