<a href="https://colab.research.google.com/github/Shra1surya/embedded-ai-model-export/blob/main/day2_day3_tflite_conversion_and_inference.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 📦 ONNX → TFLite Conversion and Deployment Simulation

This notebook continues from the ONNX quantization project. It shows how to:
- Convert a model from ONNX to TensorFlow (via ONNX-TF)
- Convert TensorFlow model to TensorFlow Lite (TFLite)
- Simulate inference with TFLite Interpreter
- Compare size and latency

✅ This simulates embedded deployment without hardware.

In [8]:
!pip install --upgrade typeguard

Collecting typeguard
  Downloading typeguard-4.4.4-py3-none-any.whl.metadata (3.3 kB)
Downloading typeguard-4.4.4-py3-none-any.whl (34 kB)
Installing collected packages: typeguard
  Attempting uninstall: typeguard
    Found existing installation: typeguard 2.13.3
    Uninstalling typeguard-2.13.3:
      Successfully uninstalled typeguard-2.13.3
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-addons 0.23.0 requires typeguard<3.0.0,>=2.7, but you have typeguard 4.4.4 which is incompatible.
Successfully installed typeguard-4.4.4


In [1]:
!pip install onnx tensorflow tf2onnx onnx-tf tflite-runtime


Collecting onnx
  Downloading onnx-1.18.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.9 kB)
Collecting tf2onnx
  Downloading tf2onnx-1.16.1-py3-none-any.whl.metadata (1.3 kB)
Collecting onnx-tf
  Downloading onnx_tf-1.10.0-py3-none-any.whl.metadata (510 bytes)
Collecting tflite-runtime
  Downloading tflite_runtime-2.14.0-cp311-cp311-manylinux2014_x86_64.whl.metadata (1.4 kB)
INFO: pip is looking at multiple versions of tf2onnx to determine which version is compatible with other requirements. This could take a while.
Collecting tf2onnx
  Downloading tf2onnx-1.16.0-py3-none-any.whl.metadata (1.2 kB)
  Downloading tf2onnx-1.15.1-py3-none-any.whl.metadata (1.2 kB)
  Downloading tf2onnx-1.15.0-py3-none-any.whl.metadata (1.2 kB)
  Downloading tf2onnx-1.14.0-py3-none-any.whl.metadata (1.2 kB)
  Downloading tf2onnx-1.13.0-py3-none-any.whl.metadata (1.2 kB)
  Downloading tf2onnx-1.12.1-py3-none-any.whl.metadata (1.2 kB)
  Downloading tf2onnx-1.12.0-py3-none-any.whl.me

In [3]:
!pip -q install tflite-runtime || echo "If this fails, we will use tf.lite.Interpreter from TensorFlow."

In [4]:
import tensorflow as tf, onnx, onnx_tf
print("TF:", tf.__version__)
print("onnx:", onnx.__version__)
print("onnx-tf:", onnx_tf.__version__)


TensorFlow Addons (TFA) has ended development and introduction of new features.
TFA has entered a minimal maintenance and release mode until a planned end of life in May 2024.
Please modify downstream libraries to take dependencies from other repositories in our TensorFlow community (e.g. Keras, Keras-CV, and Keras-NLP). 

For more information see: https://github.com/tensorflow/addons/issues/2807 

 The versions of TensorFlow you are currently using is 2.19.0 and is not supported. 
Some things might work, some things might not.
If you were to encounter a bug, do not file an issue.
If you want to make sure you're using a tested and supported configuration, either change the TensorFlow version or the TensorFlow Addons's version. 
You can find the compatibility matrix in TensorFlow Addon's readme:
https://github.com/tensorflow/addons


ModuleNotFoundError: No module named 'keras.src.engine'
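
The failed import above can be handled explicitly instead of stopping the notebook. The following is a minimal sketch, not part of the original notebook: `try_import` and `choose_path` are hypothetical helper names, and the only assumption is that a failed `onnx_tf` import should route the workflow to the Keras fallback used in the later cells.

```python
import importlib


def try_import(module_name):
    """Return (module, None) on success or (None, error_message) on failure."""
    try:
        return importlib.import_module(module_name), None
    except Exception as exc:  # a dependency of the module may raise at import time
        return None, f"{type(exc).__name__}: {exc}"


def choose_path(onnx_tf_module):
    """Pick the conversion path: onnx-tf if it imported, otherwise the Keras fallback."""
    return "onnx-tf" if onnx_tf_module is not None else "keras-fallback"


onnx_tf_module, import_error = try_import("onnx_tf")
path = choose_path(onnx_tf_module)
print("Conversion path:", path)
if import_error:
    print("onnx-tf unavailable:", import_error)
```

Catching the broad `Exception` is deliberate here: the error in this notebook came from a dependency of the package rather than from a missing `onnx_tf` itself.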

In [10]:
# 1) Remove tensorflow-addons (it pins typeguard<3 and does not support TF 2.19)
!pip -q uninstall -y tensorflow-addons

# 2) (Optional) sanity print of versions after removal
import tensorflow as tf, onnx
print("TF:", tf.__version__)
print("onnx:", onnx.__version__)


TF: 2.19.0
onnx: 1.18.0


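## ✅ Train → Convert → Infer (clean path)

### 1) Train the TensorFlow model (1–2 epochs)

Since the `onnx_tf` import failed above, this path builds the model directly in Keras. Run the cell below in a new Colab cell; it re-creates and trains `model_tf` even if an earlier cell already defined it: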

In [23]:
import tensorflow as tf
import numpy as np

# Load MNIST from Keras and scale pixel values to [0, 1]
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = (x_train.astype("float32") / 255.0)[..., None]  # (N, 28, 28, 1)
x_test  = (x_test.astype("float32")  / 255.0)[..., None]

# Define model (same topology as before)
inputs = tf.keras.Input(shape=(28,28,1))
x = tf.keras.layers.Conv2D(8, 3, activation='relu')(inputs)
x = tf.keras.layers.MaxPool2D()(x)
x = tf.keras.layers.Conv2D(16, 3, activation='relu')(x)
x = tf.keras.layers.MaxPool2D()(x)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(50, activation='relu')(x)
outputs = tf.keras.layers.Dense(10)(x)  # logits
model_tf = tf.keras.Model(inputs, outputs)

# Compile & train briefly
model_tf.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)

history = model_tf.fit(
    x_train, y_train,
    epochs=2, batch_size=128, validation_split=0.1, verbose=1
)

test_loss, test_acc = model_tf.evaluate(x_test, y_test, verbose=0)
print(f"✅ Test accuracy after training: {test_acc:.4f}")

# Save SavedModel for TFLite conversion
model_tf.export("tf_model")
print("✅ Saved trained TF model at ./tf_model")


Epoch 1/2
422/422 ━━━━━━━━━━━━━━━━━━━━ 20s 45ms/step - accuracy: 0.7717 - loss: 0.8701 - val_accuracy: 0.9713 - val_loss: 0.1074
Epoch 2/2
422/422 ━━━━━━━━━━━━━━━━━━━━ 18s 38ms/step - accuracy: 0.9649 - loss: 0.1195 - val_accuracy: 0.9785 - val_loss: 0.0788
✅ Test accuracy after training: 0.9734
Saved artifact at 'tf_model'. The following endpoints are available:

* Endpoint 'serve'
  args_0 (POSITIONAL_ONLY): TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='keras_tensor_41')
Output Type:
  TensorSpec(shape=(None, 10), dtype=tf.float32, name=None)
Captures:
  134514170737680: TensorSpec(shape=(), dtype=tf.resource, name=None)
  134514170734032: TensorSpec(shape=(), dtype=tf.resource, name=None)
  134514170736528: TensorSpec(shape=(), dtype=tf.resource, name=None)
  134514170733840: TensorSpec(shape=(), dtype=tf.resource, name=None)
  134514170736720: TensorSpec(shape=(), dtype=tf.resource, name=None)
  13451

In [11]:
import tensorflow as tf

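# Fallback: tiny TF CNN with the same topology as the PyTorch model.
# The weights are untrained (random). Do not run this cell after the
# training cell, or ./tf_model is overwritten with the untrained model.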
inputs = tf.keras.Input(shape=(28,28,1))
x = tf.keras.layers.Conv2D(8, 3, activation='relu')(inputs)
x = tf.keras.layers.MaxPooling2D()(x)
x = tf.keras.layers.Conv2D(16, 3, activation='relu')(x)
x = tf.keras.layers.MaxPooling2D()(x)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(50, activation='relu')(x)
outputs = tf.keras.layers.Dense(10)(x)
model_tf = tf.keras.Model(inputs, outputs)

model_tf.export("tf_model")
print("✅ Fallback TF SavedModel created at ./tf_model")

Saved artifact at 'tf_model'. The following endpoints are available:

* Endpoint 'serve'
  args_0 (POSITIONAL_ONLY): TensorSpec(shape=(None, 28, 28, 1), dtype=tf.float32, name='keras_tensor_25')
Output Type:
  TensorSpec(shape=(None, 10), dtype=tf.float32, name=None)
Captures:
  134518221011088: TensorSpec(shape=(), dtype=tf.resource, name=None)
  134518317510160: TensorSpec(shape=(), dtype=tf.resource, name=None)
  134518317510928: TensorSpec(shape=(), dtype=tf.resource, name=None)
  134518277508752: TensorSpec(shape=(), dtype=tf.resource, name=None)
  134518277513168: TensorSpec(shape=(), dtype=tf.resource, name=None)
  134518317510736: TensorSpec(shape=(), dtype=tf.resource, name=None)
  134518277506832: TensorSpec(shape=(), dtype=tf.resource, name=None)
  134518277509136: TensorSpec(shape=(), dtype=tf.resource, name=None)
✅ Fallback TF SavedModel created at ./tf_model


In [25]:
import tensorflow as tf, os

converter = tf.lite.TFLiteConverter.from_saved_model("tf_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # dynamic range quantization: weights stored as int8
tflite_model = converter.convert()

with open("mnist_model.tflite", "wb") as f:
    f.write(tflite_model)
print(" TFLite saved. Size:", os.path.getsize("mnist_model.tflite")/1024, "KB")

 TFLite saved. Size: 27.1640625 KB
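
With `Optimize.DEFAULT` and no representative dataset, the converter applies dynamic range quantization, which stores weights as 8-bit integers. The sketch below illustrates the idea in NumPy with symmetric per-tensor quantization. It is a simplified illustration, not the converter's actual implementation, and the weight matrix is random stand-in data with the shape of the first Dense layer (400 × 50).

```python
import numpy as np


def quantize_symmetric_int8(w):
    """Map float weights to int8 with a single scale (symmetric, per-tensor)."""
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Recover approximate float32 weights from int8 values."""
    return q.astype(np.float32) * scale


# Stand-in weights (assumption: random values, not the trained model's weights)
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.1, size=(400, 50)).astype(np.float32)

q, scale = quantize_symmetric_int8(weights)
restored = dequantize(q, scale)
max_err = float(np.max(np.abs(weights - restored)))

print("float32 bytes:", weights.nbytes)
print("int8 bytes:   ", q.nbytes)
print("max abs error:", max_err, "(bounded by scale / 2 =", scale / 2, ")")
```

The rounding error per weight is at most half the scale, which is why a small classifier like this one usually loses little accuracy.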


## 🔎 Run TFLite Inference

Two ways to feed an image:

A) Use Keras test image (matches the training preprocessing exactly)


In [26]:
import numpy as np, tensorflow as tf

# Uses x_test / y_test from the training cell above (Keras MNIST, scaled to [0, 1]).

# TFLite interpreter from TensorFlow
interpreter = tf.lite.Interpreter(model_path="mnist_model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]['index']
out = interpreter.get_output_details()[0]['index']

# Pick a test sample
idx = 0
x = x_test[idx:idx+1].astype(np.float32)  # shape: (1,28,28,1)
y = y_test[idx]

interpreter.set_tensor(inp, x)
interpreter.invoke()
logits = interpreter.get_tensor(out)
pred = np.argmax(logits)
print(f"✅ TFLite Prediction: {pred} | Label: {y}")

✅ TFLite Prediction: 7 | Label: 7


    TF 2.20. Please use the LiteRT interpreter from the ai_edge_litert package.
    See the [migration guide](https://ai.google.dev/edge/litert/migration)
    for details.
    


B) Or use the torchvision sample (ensure NHWC + float32 in [0,1])
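
The layout requirement can be checked on a dummy array before touching the interpreter. This is a minimal NumPy sketch, assuming a synthetic 28 × 28 `uint8` image in place of a real MNIST sample; it mimics the scaling and channel-first layout that `ToTensor` produces and then moves the channel axis last.

```python
import numpy as np

# Hypothetical stand-in for one raw grayscale image (height x width, uint8)
raw = (np.arange(28 * 28) % 256).astype(np.uint8).reshape(28, 28)

# Scale to [0, 1] and add a leading channel axis: (C, H, W)
chw = (raw.astype(np.float32) / 255.0)[None, ...]

# Add a batch axis: (N, C, H, W), the PyTorch convention
nchw = chw[None, ...]

# Move channels last: (N, H, W, C), the layout the TFLite model expects
nhwc = np.transpose(nchw, (0, 2, 3, 1))

print("NCHW:", nchw.shape, "-> NHWC:", nhwc.shape, "| dtype:", nhwc.dtype)
```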

In [27]:
import numpy as np, tensorflow as tf
from torchvision.datasets import MNIST
from torchvision import transforms

test_tf = transforms.ToTensor()  # returns float32 in [0,1]
mnist = MNIST(root="./data", train=False, download=True, transform=test_tf)
img, label = mnist[0]                           # tensor [1,28,28]
x = img.unsqueeze(0).numpy().astype(np.float32) # [1,1,28,28]
x = np.transpose(x, (0,2,3,1))                  # → [1,28,28,1] NHWC

interpreter = tf.lite.Interpreter(model_path="mnist_model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]['index']
out = interpreter.get_output_details()[0]['index']

interpreter.set_tensor(inp, x)
interpreter.invoke()
logits = interpreter.get_tensor(out)
pred = np.argmax(logits)
print(f"✅ TFLite Prediction: {pred} | Label: {label}")

✅ TFLite Prediction: 7 | Label: 7


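The prediction should match the label for most test images: the Keras model reached 0.9734 test accuracy after training, and the converted model is expected to stay close to that.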
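
The model's last layer outputs raw logits, so `np.argmax` gives the class but not a confidence. A softmax turns the logits into probabilities. The sketch below uses hypothetical logit values for illustration, not output captured from the interpreter.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


# Hypothetical logits for one image, shape (1, 10)
logits = np.array(
    [[-3.1, -5.0, 0.4, 1.2, -6.3, -2.2, -9.8, 11.5, -1.0, 2.7]],
    dtype=np.float32,
)

probs = softmax(logits)
pred = int(np.argmax(probs, axis=-1)[0])
confidence = float(probs[0, pred])
print(f"Predicted class: {pred} | confidence: {confidence:.4f}")
```

Softmax preserves the ordering of the logits, so the predicted class is the same with or without it.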

## ⏱️ Benchmark TFLite Inference Time

In [35]:
import time

def benchmark_tflite(interpreter, x, runs=100):
    """Return the mean latency in ms of one set_tensor/invoke/get_tensor cycle."""
    input_index = interpreter.get_input_details()[0]["index"]
    output_index = interpreter.get_output_details()[0]["index"]
    start = time.perf_counter()  # monotonic clock, better suited to timing than time.time()
    for _ in range(runs):
        interpreter.set_tensor(input_index, x)
        interpreter.invoke()
        _ = interpreter.get_tensor(output_index)
    end = time.perf_counter()
    return (end - start) * 1000 / runs

avg_time = benchmark_tflite(interpreter, x)
print(f"📊 Average inference time (TFLite): {avg_time:.3f} ms")


📊 Average inference time (TFLite): 0.025 ms
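
A single average hides run-to-run variation, and the first few calls are often slower than the rest. The sketch below is one possible harness, not the notebook's method: it times any callable with warm-up runs and reports mean, median, and 95th percentile. `benchmark` and `dummy_workload` are hypothetical names, and the dummy workload stands in for the interpreter call.

```python
import statistics
import time


def benchmark(fn, runs=100, warmup=10):
    """Time fn() per call and return latency statistics in milliseconds."""
    for _ in range(warmup):  # warm-up calls are not measured
        fn()
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    samples_ms.sort()
    p95_index = min(len(samples_ms) - 1, int(0.95 * len(samples_ms)))
    return {
        "runs": len(samples_ms),
        "mean_ms": statistics.fmean(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "p95_ms": samples_ms[p95_index],
    }


def dummy_workload():
    """Stand-in for an inference call (assumption: replace with the real one)."""
    return sum(i * i for i in range(1000))


stats = benchmark(dummy_workload, runs=50, warmup=5)
print(stats)
```

To time the interpreter, pass a small function that calls `interpreter.set_tensor(...)`, `interpreter.invoke()`, and `interpreter.get_tensor(...)` in place of `dummy_workload`.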


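(Optional) FP16 TFLite. Float16 weights take half the space of float32 weights, but the resulting file here (46.8 KB) is larger than the default-optimized model above (27.2 KB).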

In [39]:
import os

converter = tf.lite.TFLiteConverter.from_saved_model("tf_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]  # store weights as FP16
tflite_fp16 = converter.convert()

# Use a context manager so the file is flushed and closed
with open("mnist_model_fp16.tflite", "wb") as f:
    f.write(tflite_fp16)

print("✅ FP16 TFLite saved. Size:", os.path.getsize("mnist_model_fp16.tflite")/1024, "KB")


✅ FP16 TFLite saved. Size: 46.78125 KB


In [21]:
!ls -l mnist_model*


-rw-r--r-- 1 root root 47116 Aug  8 22:16 mnist_model_fp16.tflite
-rw-r--r-- 1 root root 27248 Aug  8 22:05 mnist_model.tflite
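
The two file sizes can be sanity-checked against the model's parameter count. The sketch below counts parameters for the CNN defined earlier and estimates the weight payload at 4, 2, and 1 bytes per weight. It is a back-of-the-envelope estimate: the real files are somewhat larger, presumably because they also hold the graph structure and metadata.

```python
def conv2d_params(kernel, in_ch, out_ch):
    """Weights plus biases of a square-kernel Conv2D layer."""
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(in_units, out_units):
    """Weights plus biases of a Dense layer."""
    return in_units * out_units + out_units


# Shapes: 28x28x1 -> conv 3x3 -> 26x26x8 -> pool -> 13x13x8
#         -> conv 3x3 -> 11x11x16 -> pool -> 5x5x16 -> flatten
flat_units = 5 * 5 * 16

layers = {
    "conv1": conv2d_params(3, 1, 8),
    "conv2": conv2d_params(3, 8, 16),
    "dense1": dense_params(flat_units, 50),
    "dense2": dense_params(50, 10),
}
total = sum(layers.values())
print("Parameters per layer:", layers)
print("Total parameters:", total)

for name, bytes_per_weight in [("float32", 4), ("float16", 2), ("int8", 1)]:
    print(f"{name}: ~{total * bytes_per_weight / 1024:.1f} KB of weights")
```

The float16 and int8 estimates land a few KB below the 47116-byte and 27248-byte files listed above, which is consistent with the FP16 model storing 2 bytes per weight and the default-optimized model storing roughly 1.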


## 📘 Summary

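- ONNX → TensorFlow conversion via onnx-tf was not completed: importing `onnx_tf` failed under TF 2.19.0 (`ModuleNotFoundError: No module named 'keras.src.engine'`), so an equivalent Keras model was trained instead
- TensorFlow → TFLite conversion succeeded (27.2 KB with default optimizations, 46.8 KB with FP16 weights)
- TFLite inference gave the correct prediction on the tested sample
- Inference time benchmarked at about 0.025 ms per run on the notebook host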

This simulation is useful for embedded developers targeting TFLite Micro or ARM ML SDKs.