<a href="https://colab.research.google.com/github/AhmedFarrukh/DeepLearning-EdgeComputing/blob/main/PreliminaryExperiment_ResNet101V2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this notebook, we load the pretrained ResNet101V2 model, apply post-training dynamic range quantization, and measure the approximate change in inference time.

First, we import the necessary libraries.

In [None]:
import os
import sys
import time

import numpy as np
import tensorflow as tf
from PIL import Image


Next, we load the ResNet101V2 Model.

In [None]:
INPUT_IMG_SIZE = 224
INPUT_IMG_SHAPE = (224, 224, 3)
model = tf.keras.applications.ResNet101V2(
  input_shape=INPUT_IMG_SHAPE
)

Now we convert the model to TensorFlow Lite format without any optimization. This gives us a float baseline to compare against.

In [None]:
converter = tf.lite.TFLiteConverter.from_keras_model(model)

tflite_model = converter.convert()

Next, we convert the model to TensorFlow Lite format, with Dynamic Range Quantization.

In [None]:
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

tflite_model_quant = converter.convert()
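
To make the idea behind dynamic range quantization more concrete, here is a minimal NumPy sketch of storing float32 weights as 8-bit integers plus a single scale factor. It is an illustration only: the helper names (`quantize_int8`, `dequantize`) and the symmetric per-tensor scheme are assumptions made for this example, and the converter's actual scheme is more involved.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor quantization of float weights to int8 (illustrative)."""
    max_abs = float(np.max(np.abs(weights)))
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights from the int8 values."""
    return q.astype(np.float32) * scale

# Random stand-in for one layer's weights (not taken from ResNet101V2)
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=(64, 64)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print("float32 bytes:", w.nbytes, "int8 bytes:", q.nbytes)
print("max abs error:", float(np.max(np.abs(w - w_hat))))
```

The int8 copy takes a quarter of the storage, and the rounding error per weight is bounded by half the scale.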

Next, we write both models out to .tflite files.

In [None]:
import pathlib

tflite_models_dir = pathlib.Path("/tmp/tflite_models/")
tflite_models_dir.mkdir(exist_ok=True, parents=True)

# Save the unquantized/float model:
tflite_model_file = tflite_models_dir/"model.tflite"
tflite_model_file.write_bytes(tflite_model)
# Save the quantized model:
tflite_model_quant_file = tflite_models_dir/"model_quant.tflite"
tflite_model_quant_file.write_bytes(tflite_model_quant)

45646584

Let's check the sizes of the two TFLite models.

In [None]:
!ls -lh {tflite_models_dir}

total 214M
-rw-r--r-- 1 root root  44M Jun 21 13:33 model_quant.tflite
-rw-r--r-- 1 root root 171M Jun 21 13:33 model.tflite
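
The listing above can be turned into a compression ratio with a few lines of arithmetic. The two sizes are copied from the `ls -lh` output; the variable names are only for this sketch.

```python
# Sizes in MiB as printed by `ls -lh` above
float_size = 171   # model.tflite
quant_size = 44    # model_quant.tflite

ratio = float_size / quant_size
saving = 1.0 - quant_size / float_size

print(f"compression ratio: {ratio:.2f}x")
print(f"size reduction:    {saving:.1%}")
```

The ratio comes out slightly below the 4x that a pure float32-to-int8 conversion would give, presumably because not every tensor in the file is stored as int8.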


Next, we upload a sample image.

In [None]:
from google.colab import files

first_time = True
if(first_time):
  uploaded = files.upload()

  for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
        name=fn, length=len(uploaded[fn])))

Saving parrot.jpg to parrot (1).jpg
User uploaded file "parrot (1).jpg" with length 596977 bytes


Next, we preprocess the sample image so that it matches the model's input shape.

In [None]:
image_path = fn
img = Image.open(image_path).convert('RGB')
img = img.resize((224, 224), Image.BICUBIC)
input_data = np.array(img, dtype=np.float32) / 255.0
input_data = input_data.reshape(1, 224, 224, 3)
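
One thing that may be worth comparing: the cell above scales pixels to [0, 1], while the Keras documentation describes `tf.keras.applications.resnet_v2.preprocess_input` as scaling pixels to [-1, 1]. The sketch below only contrasts the two scalings on a few pixel values; the function names are hypothetical. This should not affect the timing measurements, although it could affect prediction quality.

```python
import numpy as np

def scale_to_unit_range(pixels):
    """Scaling used in this notebook: [0, 255] -> [0, 1]."""
    return pixels.astype(np.float32) / 255.0

def scale_to_signed_range(pixels):
    """Scaling to [-1, 1], as documented for ResNetV2 preprocessing."""
    return pixels.astype(np.float32) / 127.5 - 1.0

pixels = np.array([0, 64, 128, 255], dtype=np.uint8)
unit = scale_to_unit_range(pixels)
signed = scale_to_signed_range(pixels)

print("unit range:  ", unit)
print("signed range:", signed)
```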

Load the labels:

In [None]:
url = tf.keras.utils.get_file(
    'ImageNetLabels.txt',
    'https://storage.googleapis.com/download.tensorflow.org/data/ImageNetLabels.txt')
# Skip the first entry so that label indices line up with the model's 1000 outputs
with open(url) as f:
    imagenet_labels = np.array(f.read().splitlines())[1:]

First, load the interpreters.

In [None]:
interpreter = tf.lite.Interpreter(model_path=str(tflite_model_file))
interpreter.allocate_tensors()

interpreter_quant = tf.lite.Interpreter(model_path=str(tflite_model_quant_file))
interpreter_quant.allocate_tensors()

Next, run the inference using the un-optimized model.

In [None]:
input_index = interpreter.get_input_details()[0]["index"]
output_index = interpreter.get_output_details()[0]["index"]

interpreter.set_tensor(input_index, input_data)
interpreter.invoke()
start_time = time.time()
interpreter.invoke()
print(time.time() - start_time)
predictions = interpreter.get_tensor(output_index)
print(imagenet_labels[np.argmax(predictions[0])])

0.3248581886291504
macaw


Finally, run the inference using the quantized model.

In [None]:
input_index = interpreter_quant.get_input_details()[0]["index"]
output_index = interpreter_quant.get_output_details()[0]["index"]

interpreter_quant.set_tensor(input_index, input_data)
interpreter_quant.invoke()
start_time = time.time()
interpreter_quant.invoke()
print(time.time() - start_time)
predictions = interpreter_quant.get_tensor(output_index)
print(imagenet_labels[np.argmax(predictions[0])])

0.5406627655029297
macaw
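
The two cells above each time a single `invoke()` call after one warm-up call, so the reported figures (about 0.32 s and 0.54 s) are sensitive to noise. A rough sketch of a more robust measurement is shown below. `time_callable` is a hypothetical helper, and the lambda is a stand-in workload; in the notebook the callable would be `interpreter.invoke` or `interpreter_quant.invoke`.

```python
import statistics
import time

def time_callable(fn, warmup=1, runs=10):
    """Return per-call latencies in seconds, discarding the warm-up calls."""
    for _ in range(warmup):
        fn()
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    return latencies

# Stand-in workload so that the sketch runs without a TFLite interpreter
latencies = time_callable(lambda: sum(i * i for i in range(10_000)), warmup=2, runs=20)

print(f"mean:   {statistics.mean(latencies) * 1e3:.3f} ms")
print(f"median: {statistics.median(latencies) * 1e3:.3f} ms")
print(f"stdev:  {statistics.stdev(latencies) * 1e3:.3f} ms")
```

Reporting the median alongside the mean makes occasional slow runs easier to spot.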


Now, we use the validation set from the Imagenette dataset to run inference on the two models. First, we prepare the dataset.

In [None]:
!wget https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-320.tgz
!tar -xzf imagenette2-320.tgz

label_map = dict(
    n01440764='tench',
    n02102040='English springer',
    n02979186='cassette player',
    n03000684='chain saw',
    n03028079='church',
    n03394916='French horn',
    n03417042='garbage truck',
    n03425413='gas pump',
    n03445777='golf ball',
    n03888257='parachute'
)


def preprocess_image(file_path):
    # Open the image file
    img = Image.open(file_path).convert('RGB')

    # Resize using BICUBIC interpolation
    img = img.resize((224, 224), Image.BICUBIC)

    # Convert to NumPy array and normalize the pixel values
    input_data = np.array(img, dtype=np.float32) / 255.0

    # Reshape for the model input
    input_data = input_data.reshape(1, 224, 224, 3)
    return input_data

def load_and_preprocess_from_path_label(path):
    return preprocess_image(path)

# Set the directory path for validation data
val_dir = 'imagenette2-320/val'

# List dataset paths
val_image_paths = [os.path.join(dp, f) for dp, dn, filenames in os.walk(val_dir) for f in filenames if os.path.splitext(f)[1].lower() in ['.png','.jpg','.jpeg']]

# Create dataset from paths
val_images = tf.data.Dataset.from_tensor_slices(val_image_paths)

# Apply preprocessing using a map with the numpy function to ensure correct application
val_images_preprocessed = val_images.map(lambda x: tf.numpy_function(load_and_preprocess_from_path_label, [x], tf.float32))


--2024-06-21 13:20:48--  https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-320.tgz
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.179.125, 52.217.141.192, 54.231.171.16, ...
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.179.125|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 341663724 (326M) [application/x-tar]
Saving to: ‘imagenette2-320.tgz’

imagenette2-320.tgz  31%[=====>              ] 101.27M  16.7MB/s    eta 17s    ^C

gzip: stdin: unexpected end of file
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
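
The `label_map` defined above is not used by the timing loops, but it could be used to recover the ground-truth class of each validation image from its directory name. The sketch below assumes the `val/<wnid>/<file>` layout implied by `val_dir`; `example_label_map`, `label_from_path`, `top1_accuracy`, and the file name are hypothetical. Whether the names in `label_map` match the strings in ImageNetLabels.txt exactly is not checked here.

```python
import os

# Two entries copied from label_map above; the full dict would be used in practice
example_label_map = {"n01440764": "tench", "n03888257": "parachute"}

def label_from_path(path):
    """Map .../val/<wnid>/<file> to the human-readable class name."""
    wnid = os.path.basename(os.path.dirname(path))
    return example_label_map[wnid]

def top1_accuracy(predicted, expected):
    """Fraction of positions where the predicted label equals the expected one."""
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)

# Hypothetical file name, used only to show the path handling
path = "imagenette2-320/val/n01440764/example.JPEG"
print(label_from_path(path))
print(top1_accuracy(["tench", "church"], ["tench", "parachute"]))
```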


Next, we carry out inference using the un-quantized model:

In [None]:
infer_time = 0
num_images = 0
# The tensor indices do not change between runs, so look them up once
input_index = interpreter.get_input_details()[0]["index"]
output_index = interpreter.get_output_details()[0]["index"]

for image in val_images_preprocessed:
  interpreter.set_tensor(input_index, image)
  start_time = time.time()
  interpreter.invoke()
  infer_time += time.time() - start_time
  predictions = interpreter.get_tensor(output_index)
  num_images += 1
print("avg infer_time: ", infer_time/num_images)

KeyboardInterrupt: 

Finally, we carry out inference using the quantized model:

In [None]:
infer_time = 0
num_images = 0
# The tensor indices do not change between runs, so look them up once
input_index = interpreter_quant.get_input_details()[0]["index"]
output_index = interpreter_quant.get_output_details()[0]["index"]

for image in val_images_preprocessed.take(1000):
  interpreter_quant.set_tensor(input_index, image)
  start_time = time.time()
  interpreter_quant.invoke()
  infer_time += time.time() - start_time
  predictions = interpreter_quant.get_tensor(output_index)
  num_images += 1
# Divide by the number of images actually processed, not the full dataset length
print("avg infer_time: ", infer_time/num_images)

To compare the model before and after quantization, we can use the TFLite benchmark tool. First, download the benchmark binary.

In [None]:
!mkdir /tmp/benchmark
!wget https://storage.googleapis.com/tensorflow-nightly-public/prod/tensorflow/release/lite/tools/nightly/latest/linux_x86-64_benchmark_model -P /tmp/benchmark
!chmod +x /tmp/benchmark/linux_x86-64_benchmark_model

mkdir: cannot create directory ‘/tmp/benchmark’: File exists
--2024-06-21 13:38:51--  https://storage.googleapis.com/tensorflow-nightly-public/prod/tensorflow/release/lite/tools/nightly/latest/linux_x86-64_benchmark_model
Resolving storage.googleapis.com (storage.googleapis.com)... 74.125.204.207, 64.233.187.207, 64.233.188.207, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|74.125.204.207|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6237624 (5.9M) [application/octet-stream]
Saving to: ‘/tmp/benchmark/linux_x86-64_benchmark_model.2’


2024-06-21 13:38:52 (5.72 MB/s) - ‘/tmp/benchmark/linux_x86-64_benchmark_model.2’ saved [6237624/6237624]



Next, benchmark the original model:

In [None]:
!/tmp/benchmark/linux_x86-64_benchmark_model \
  --graph=/tmp/tflite_models/model.tflite \
  --num_threads=1

INFO: STARTING!
INFO: Log parameter values verbosely: [0]
INFO: Num threads: [1]
INFO: Graph: [/tmp/tflite_models/model.tflite]
INFO: Signature to run: []
INFO: #threads used for CPU inference: [1]
INFO: Loaded model /tmp/tflite_models/model.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
INFO: The input model file size (MB): 178.341
INFO: Initialized session in 1231.85ms.
INFO: Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
INFO: count=1 curr=511374

INFO: Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
INFO: count=50 first=515025 curr=286101 min=284629 max=515025 avg=334084 std=78671

INFO: Inference timings in us: Init: 1231851, First inference: 511374, Warmup (avg): 511374, Inference (avg): 334084
INFO: Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runt

Finally, benchmark the quantized model:

In [None]:
!/tmp/benchmark/linux_x86-64_benchmark_model \
  --graph=/tmp/tflite_models/model_quant.tflite \
  --num_threads=1

INFO: STARTING!
INFO: Log parameter values verbosely: [0]
INFO: Num threads: [1]
INFO: Graph: [/tmp/tflite_models/model_quant.tflite]
INFO: Signature to run: []
INFO: #threads used for CPU inference: [1]
INFO: Loaded model /tmp/tflite_models/model_quant.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
INFO: The input model file size (MB): 45.6466
INFO: Initialized session in 144.042ms.
INFO: Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
INFO: count=2 first=289289 curr=278135 min=278135 max=289289 avg=283712 std=5577

INFO: Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
INFO: count=50 first=285218 curr=274631 min=268874 max=455319 avg=308401 std=61370

INFO: Inference timings in us: Init: 144042, First inference: 289289, Warmup (avg): 283712, Inference (avg): 308401
INFO: Note: as the benchmark tool itself affects memory footprint, the following is on
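
To compare the two benchmark runs, the average inference times can be pulled out of the summary lines printed above. The sketch below copies those two lines verbatim; `parse_avg_us` is a hypothetical helper written for this example.

```python
import re

def parse_avg_us(line):
    """Extract the 'Inference (avg)' value in microseconds from a summary line."""
    match = re.search(r"Inference \(avg\): ([0-9.]+)", line)
    if match is None:
        raise ValueError("no 'Inference (avg)' field found")
    return float(match.group(1))

# Summary lines copied from the two benchmark outputs above
float_line = ("INFO: Inference timings in us: Init: 1231851, First inference: 511374, "
              "Warmup (avg): 511374, Inference (avg): 334084")
quant_line = ("INFO: Inference timings in us: Init: 144042, First inference: 289289, "
              "Warmup (avg): 283712, Inference (avg): 308401")

float_us = parse_avg_us(float_line)
quant_us = parse_avg_us(quant_line)

print(f"float: {float_us / 1000:.1f} ms, quantized: {quant_us / 1000:.1f} ms")
print(f"speedup: {float_us / quant_us:.3f}x")
```

The gap between the two averages (about 25.7 ms) is smaller than the standard deviation reported for either run (78671 us and 61370 us), so the difference may be within measurement noise.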