## Model compression demo

This notebook demonstrates model compression through quantization with TFLite. As the example model we use a ResNet50 mask/no-mask classifier that we trained beforehand; its weights are stored in ../data/classifier_model_weights/resnet50_classifier.h5. You can also train your own model with the train-mask-nomask notebook.

In [8]:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.resnet50 import preprocess_input

from sklearn import metrics 
from pathlib import Path

import numpy as np

### Extract test data and set up generator

In [9]:
test_dir = Path('../data/test')
model_dir = Path('../data/classifier_model_weights')

In [10]:
target_size = (112, 112)
batch_size = 32

test_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
test_generator = test_datagen.flow_from_directory(test_dir,
                                                  target_size=target_size,
                                                  batch_size=batch_size,
                                                  class_mode='binary',
                                                  classes=['not_masked', 'masked'],
                                                  shuffle=False)

Found 268 images belonging to 2 classes.


### Load the original model and check accuracy

In [11]:
model = tf.keras.models.load_model(str(model_dir / 'resnet50_classifier.h5'))
preds = [x[0] > 0.5 for x in model.predict(test_generator)]
acc = metrics.accuracy_score(test_generator.classes, preds)
print(f"The original model accuracy = {acc:.3f}")

The original model accuracy = 0.907
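To make the evaluation step explicit: the cell above treats each prediction as a single score, reads scores above 0.5 as class 1 (`masked`, the second entry in `classes`), and reports the share of predictions that match the labels. The sketch below reproduces that logic in plain Python. The helper name `threshold_accuracy` and the sample scores are made up for illustration and are not taken from the notebook's data.

```python
def threshold_accuracy(scores, labels, threshold=0.5):
    """Fraction of scores whose thresholded class equals the label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    # A score strictly above the threshold is read as class 1, otherwise class 0.
    preds = [int(s > threshold) for s in scores]
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)


# Made-up scores for four images; labels follow classes=['not_masked', 'masked'].
example_scores = [0.10, 0.80, 0.40, 0.95]
example_labels = [0, 1, 1, 1]
example_acc = threshold_accuracy(example_scores, example_labels)
print(f"accuracy = {example_acc:.2f}")
```

With `sklearn` available, `metrics.accuracy_score(labels, preds)` computes the same quantity, as in the cell above.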


### Convert to tflite model and check accuracy

In [12]:
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
with open(model_dir / 'resnet50_classifier.tflite', 'wb') as f:
    f.write(tflite_model)

Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
INFO:tensorflow:Assets written to: /tmp/tmp40i8qnqo/assets


We see that the TFLite file is slightly smaller than the original .h5 file, but this is only due to the format conversion; no compression has been applied at this point.

In [13]:
!ls -lh $model_dir

total 199M
-rw-r--r-- 1 toon toon 15M oct.  22  2020 best.h5
-rw-rw-r-- 1 toon toon 94M oct.  30 18:10 resnet50_classifier.h5
-rw-rw-r-- 1 toon toon 91M déc.  23 15:22 resnet50_classifier.tflite


As this is still the same model, just in a different format, we expect to see the same accuracy.

In [14]:
interpreter = tf.lite.Interpreter(str(model_dir / 'resnet50_classifier.tflite'))

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

interpreter.allocate_tensors()

preds = []
for batch_idx in range(len(test_generator)):
    for img in test_generator[batch_idx][0]:
        interpreter.set_tensor(input_details[0]['index'], np.expand_dims(img, axis=0))
        interpreter.invoke()
        output_data = interpreter.get_tensor(output_details[0]['index'])
        preds.append(output_data[0][0])
preds = [x > 0.5 for x in preds]

# Report the accuracy of the TFLite predictions, not the earlier Keras `acc`.
tflite_acc = metrics.accuracy_score(test_generator.labels, preds)
print(f"The TFlite model accuracy = {tflite_acc:.3f}")

The TFlite model accuracy = 0.907


### Dynamic range quantization

We repeat the conversion, this time enabling the default optimization, which quantizes the weights to 8-bit precision.
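To give an intuition for what 8-bit weight quantization means, here is a generic sketch of symmetric per-tensor quantization in plain Python. It is an illustration only and not TFLite's exact scheme: the function names and the sample weights are made up, and the converter decides the real details. The idea is that each 4-byte float is stored as a 1-byte integer plus one shared scale factor.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale maps the largest magnitude onto 127.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the integers back to approximate float values."""
    return [q * scale for q in quantized]


weights = [-0.50, -0.12, 0.0, 0.03, 0.25, 0.50]  # made-up float weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("quantized:", q)
print("scale:", scale)
print("largest round-trip error:", max_error)
```

The round-trip error is bounded by half the scale, which is the precision given up in exchange for the smaller storage size.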

In [15]:
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
with open(model_dir / 'resnet50_classifier_quantized.tflite', 'wb') as f:
    f.write(tflite_model)

INFO:tensorflow:Assets written to: /tmp/tmpisa9sfso/assets




In [16]:
!ls -lh $model_dir

total 221M
-rw-r--r-- 1 toon toon 15M oct.  22  2020 best.h5
-rw-rw-r-- 1 toon toon 94M oct.  30 18:10 resnet50_classifier.h5
-rw-rw-r-- 1 toon toon 23M déc.  23 15:24 resnet50_classifier_quantized.tflite
-rw-rw-r-- 1 toon toon 91M déc.  23 15:22 resnet50_classifier.tflite
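Putting numbers on the listing above, the sketch below computes the size ratio between the two TFLite files. The sizes are the rounded values printed by `ls -lh`, so the result is approximate; the variable names are illustrative.

```python
# Rounded sizes in MiB, copied from the `ls -lh` listing above.
sizes_mb = {
    "resnet50_classifier.h5": 94,
    "resnet50_classifier.tflite": 91,
    "resnet50_classifier_quantized.tflite": 23,
}

float_size = sizes_mb["resnet50_classifier.tflite"]
quant_size = sizes_mb["resnet50_classifier_quantized.tflite"]

ratio = quant_size / float_size      # fraction of the original size
reduction = float_size / quant_size  # how many times smaller

print(f"quantized / float = {ratio:.2f}")
print(f"size reduction    = {reduction:.1f}x")
```

A ratio close to 1/4 is consistent with storing each weight in 1 byte instead of the 4 bytes of a 32-bit float.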


We can see that, as expected, the quantized model takes up about a quarter of the disk space, since the 32-bit float weights are now stored as 8-bit integers.
Let's check the accuracy of this model as well.

In [17]:
interpreter = tf.lite.Interpreter(str(model_dir / 'resnet50_classifier_quantized.tflite'))
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
interpreter.allocate_tensors()

preds = []
for batch_idx in range(len(test_generator)):
    for img in test_generator[batch_idx][0]:
        interpreter.set_tensor(input_details[0]['index'], np.expand_dims(img, axis=0))
        interpreter.invoke()
        output_data = interpreter.get_tensor(output_details[0]['index'])
        preds.append(output_data[0][0])

preds = [x > 0.5 for x in preds]
tflite_acc = metrics.accuracy_score(test_generator.labels, preds)
print(f"The quantized TFlite model accuracy = {tflite_acc:.3f}")

The quantized TFlite model accuracy = 0.918


### Benchmark RAM usage

We use the TFLite benchmark tool to compare inference memory usage.
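Besides the memory footprint, the tool prints a one-line latency summary starting with `Inference timings in us:`. A small parser makes the two runs easier to compare. This is a sketch under the assumption that the summary keeps the comma-separated `name: value` layout seen in this notebook's output; `parse_timings` is a hypothetical helper, not part of the benchmark tool. The two sample lines are copied from the runs in this notebook.

```python
import re


def parse_timings(line):
    """Parse 'Inference timings in us: Init: 539, ...' into {name: microseconds}."""
    prefix = "Inference timings in us:"
    if not line.startswith(prefix):
        raise ValueError("not a timing summary line")
    # Each entry looks like 'Name (maybe with parens): 12345'.
    pairs = re.findall(r"([^,:]+):\s*([0-9.]+)", line[len(prefix):])
    return {name.strip(): float(value) for name, value in pairs}


float_line = ("Inference timings in us: Init: 539, First inference: 645537, "
              "Warmup (avg): 645537, Inference (avg): 777758")
quant_line = ("Inference timings in us: Init: 920, First inference: 245132, "
              "Warmup (avg): 194422, Inference (avg): 173387")

float_timings = parse_timings(float_line)
quant_timings = parse_timings(quant_line)
speedup = float_timings["Inference (avg)"] / quant_timings["Inference (avg)"]

print(f"average inference speedup: {speedup:.2f}x")
```

These timings come from a single machine with 4 threads, so the ratio should be read as indicative rather than general.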

In [18]:
!wget https://storage.googleapis.com/tensorflow-nightly-public/prod/tensorflow/release/lite/tools/nightly/latest/linux_x86-64_benchmark_model
!chmod +x linux_x86-64_benchmark_model

--2021-12-23 15:24:54--  https://storage.googleapis.com/tensorflow-nightly-public/prod/tensorflow/release/lite/tools/nightly/latest/linux_x86-64_benchmark_model
Resolving storage.googleapis.com (storage.googleapis.com)... 2a00:1450:400e:80c::2010, 2a00:1450:400e:811::2010, 2a00:1450:400e:80e::2010, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|2a00:1450:400e:80c::2010|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3715136 (3,5M) [application/octet-stream]
Saving to: ‘linux_x86-64_benchmark_model.1’


2021-12-23 15:24:55 (8,36 MB/s) - ‘linux_x86-64_benchmark_model.1’ saved [3715136/3715136]



In [19]:
non_compressed_model_path = (model_dir / 'resnet50_classifier.tflite').as_posix()
!./linux_x86-64_benchmark_model --graph=$non_compressed_model_path --num_threads=4

STARTING!
Log parameter values verbosely: [0]
Num threads: [4]
Graph: [../data/classifier_model_weights/resnet50_classifier.tflite]
#threads used for CPU inference: [4]
Loaded model ../data/classifier_model_weights/resnet50_classifier.tflite
The input model file size (MB): 95.0199
Initialized session in 0.539ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=1 curr=645537

Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=50 first=736118 curr=840687 min=610993 max=1034606 avg=777758 std=118600

Inference timings in us: Init: 539, First inference: 645537, Warmup (avg): 645537, Inference (avg): 777758
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
Memory footprint delta from the start of the tool (MB): init=3.11328 

In [20]:
compressed_model_path = (model_dir / 'resnet50_classifier_quantized.tflite').as_posix()
!./linux_x86-64_benchmark_model --graph=$compressed_model_path --num_threads=4

STARTING!
Log parameter values verbosely: [0]
Num threads: [4]
Graph: [../data/classifier_model_weights/resnet50_classifier_quantized.tflite]
#threads used for CPU inference: [4]
Loaded model ../data/classifier_model_weights/resnet50_classifier_quantized.tflite
The input model file size (MB): 23.8721
Initialized session in 0.92ms.
Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
count=3 first=245132 curr=169779 min=168356 max=245132 avg=194422 std=35861

Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
count=50 first=177161 curr=173635 min=167548 max=219950 avg=173387 std=7466

Inference timings in us: Init: 920, First inference: 245132, Warmup (avg): 194422, Inference (avg): 173387
Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretio