## Setup

In [53]:
import logging
logging.getLogger("tensorflow").setLevel(logging.DEBUG)

import tensorflow as tf
import numpy as np
print("TensorFlow version: ", tf.__version__)

TensorFlow version:  2.10.0


## Load Model

In [54]:
prefix_path = './CNN/'
model_name = '1230_005043'
path = prefix_path + model_name + '/Net.h5'
model = tf.keras.models.load_model(path, compile=False)

## Convert model

Now you can convert the trained model to TensorFlow Lite format using the TensorFlow Lite [Converter](https://www.tensorflow.org/lite/models/convert), and apply varying degrees of quantization.

Beware that some versions of quantization leave some of the data in float format. So the following sections show each option with increasing amounts of quantization, until we get a model that's entirely int8 or uint8 data. (Notice we duplicate some code in each section so you can see all the quantization steps for each option.)

First, here's a converted model with no quantization:

In [55]:
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()



INFO:tensorflow:Assets written to: C:\Users\Tiastly\AppData\Local\Temp\tmp4xg94wdf\assets




It's now a TensorFlow Lite model, but it's still using 32-bit float values for all parameter data.
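To make concrete what quantization does to parameter data, here is a minimal NumPy sketch of affine int8 quantization applied to a single tensor. It is an illustration only, not the converter's actual implementation; the random `weights` array and the helper names `quantize_int8` and `dequantize` are assumptions made up for this example.

```python
import numpy as np

def quantize_int8(x):
    """Map a float32 array onto int8 using one scale and zero point (illustrative)."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    x_min = min(float(x.min()), 0.0)
    x_max = max(float(x.max()), 0.0)
    scale = (x_max - x_min) / 255.0
    if scale == 0.0:
        scale = 1.0  # all-zero input: any scale works
    zero_point = int(round(-128 - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float32 values from the int8 representation."""
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
# Hypothetical stand-in for one layer's weights.
weights = rng.normal(0.0, 0.1, size=(3, 3, 16)).astype(np.float32)

q, scale, zero_point = quantize_int8(weights)
restored = dequantize(q, scale, zero_point)

print("bytes as float32:", weights.nbytes)
print("bytes as int8:   ", q.nbytes)
print("max abs error:   ", float(np.abs(weights - restored).max()))
```

The int8 copy takes a quarter of the storage, at the cost of a rounding error on the order of `scale`.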

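### Convert using float fallback quantization

Now let's enable the default `optimizations` flag and provide a representative dataset. The flag alone quantizes only the fixed parameters (such as weights), which is dynamic range quantization; the representative dataset lets the converter also calibrate the range of variable data (such as activations):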

In [56]:
# Load the test segments and add a trailing channel axis
test_segments = np.load("test_segments.npy")
x_test = test_segments.reshape(test_segments.shape + (1,))

def representative_dataset_generator():
    for value in x_test:
        # Each sample must be a float32 array with at least 3 dimensions, wrapped in a list
        yield [np.array(value, dtype=np.float32, ndmin=3)]
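As a quick check of what the generator yields, the sketch below runs the same logic on synthetic data. The shape `(5, 128)` is a made-up stand-in for the contents of `test_segments.npy`, whose real shape is not shown here, and the names `fake_segments`, `x_fake` and `fake_representative_dataset` are hypothetical.

```python
import numpy as np

# Hypothetical stand-in for test_segments.npy: 5 segments of 128 samples each.
fake_segments = np.zeros((5, 128), dtype=np.float64)
x_fake = fake_segments.reshape(fake_segments.shape + (1,))  # shape (5, 128, 1)

def fake_representative_dataset():
    for value in x_fake:
        # value has shape (128, 1); ndmin=3 prepends axes until the array is 3D
        yield [np.array(value, dtype=np.float32, ndmin=3)]

first = next(fake_representative_dataset())
print(len(first), first[0].shape, first[0].dtype)  # 1 (1, 128, 1) float32
```

With 2D samples like these, `ndmin=3` effectively adds a batch dimension of 1. If each sample already has three or more dimensions, `ndmin=3` leaves the shape unchanged, so the batch axis would have to be added some other way.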

In [57]:
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# converter.target_spec.supported_ops = [tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8]
converter.representative_dataset = representative_dataset_generator
tflite_model_quant = converter.convert()


interpreter = tf.lite.Interpreter(model_content=tflite_model_quant)
input_type = interpreter.get_input_details()[0]['dtype']
print('input: ', input_type)
output_type = interpreter.get_output_details()[0]['dtype']
print('output: ', output_type)



INFO:tensorflow:Assets written to: C:\Users\Tiastly\AppData\Local\Temp\tmplljuofdy\assets




input:  <class 'numpy.float32'>
output:  <class 'numpy.float32'>


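The model is now smaller: the weights are quantized and, because a representative dataset was provided, so is the variable data wherever integer ops are available. The input and output tensors are still float32, as the printed types show.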

### Save the models as files

You'll need a `.tflite` file to deploy your model on other devices. So let's save the converted models to files and then load them when we run inferences below.

In [58]:
import pathlib

tflite_models_dir = pathlib.Path(prefix_path + model_name + '/')
tflite_models_dir.mkdir(exist_ok=True, parents=True)

# Save the unquantized/float model:
# tflite_model_file = tflite_models_dir/f"{model_name}_unquantized.tflite"
# tflite_model_file.write_bytes(tflite_model)
# Save the quantized model:
tflite_model_quant_file = tflite_models_dir/"quantized.tflite"
tflite_model_quant_file.write_bytes(tflite_model_quant)

65856
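The number printed by the cell is the return value of `write_bytes`, that is, the number of bytes written. The sketch below shows one way to confirm a saved file's size; the temporary directory and the zero-filled `dummy_model` are assumptions for illustration, not the real model.

```python
import pathlib
import tempfile

# Placeholder bytes with the same length as the size reported above.
dummy_model = bytes(65856)

with tempfile.TemporaryDirectory() as tmp:
    model_file = pathlib.Path(tmp) / "quantized.tflite"
    written = model_file.write_bytes(dummy_model)   # returns the byte count
    size_on_disk = model_file.stat().st_size        # size as seen by the file system

print(f"written: {written} bytes, on disk: {size_on_disk} bytes "
      f"({size_on_disk / 1024:.1f} KiB)")
```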

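To compile the model into a C/C++ program, convert the `.tflite` file to a C array with `xxd`, run from the directory that contains the file:

xxd -i quantized.tflite > model.cpp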
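Where `xxd` is not at hand, a few lines of Python can produce a similar C array. This is a rough sketch that mimics the general layout of `xxd -i` output rather than reproducing it byte for byte; `bytes_to_c_array` is a hypothetical helper, and the eight input bytes are dummy data rather than a real model.

```python
import re

def bytes_to_c_array(data: bytes, filename: str, per_line: int = 12) -> str:
    """Render bytes as a C array plus a length variable, similar to `xxd -i`."""
    # Derive a C identifier from the file name (quantized.tflite -> quantized_tflite).
    name = re.sub(r"[^0-9a-zA-Z_]", "_", filename)
    lines = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk))
    body = ",\n".join(lines)
    return (f"unsigned char {name}[] = {{\n{body}\n}};\n"
            f"unsigned int {name}_len = {len(data)};\n")

# Dummy bytes for illustration; a real call would pass the saved model's bytes.
dummy = bytes([0x1c, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33])
source = bytes_to_c_array(dummy, "quantized.tflite")
print(source)
```

For the real model, the input would come from something like `pathlib.Path("quantized.tflite").read_bytes()`, and the returned string would be written to `model.cpp`.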