<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/ds-kiel/TinyML-Labs/blob/WS23-24/Lab2.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/ds-kiel/TinyML-Labs/blob/WS23-24/Lab2.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
  <td>
    <a href="https://raw.githubusercontent.com/ds-kiel/TinyML-Labs/WS23-24/Lab2.ipynb" download><img src="https://www.tensorflow.org/images/download_logo_32px.png" />Download notebook</a>
  </td>
</table>

---


Before starting, you must save your own copy of this notebook: click on the "Copy To Drive" option in the top bar, or go to File --> Save a Copy to Drive. Name it *'Group\<Your group number\>_Lab2.ipynb'*. <ins>This is the master notebook, so you will not be able to save your changes without copying it!</ins> Once you have done that, make sure you are working on your copy of the notebook so that your work is saved.



---

---

THIS IS THE FINAL VERSION OF THE LAB!
---

---

# Lab 2: Quantization and On-Device Execution

In the first lab, you looked at the first part of the pipeline from data to executing models on low-power devices. You explored how to train neural networks with TensorFlow and how to profile a model's memory usage for a specific microcontroller. In this lab, we continue the pipeline and you will explore how to [convert](https://www.tensorflow.org/lite/models/convert/convert_models) a model to a [TensorFlow Lite (TFLite)](https://www.tensorflow.org/lite) model, how to [quantize](https://www.tensorflow.org/lite/performance/post_training_integer_quant) [a model](https://www.tensorflow.org/model_optimization/guide/quantization/post_training), how to use [quantization-aware training](https://www.tensorflow.org/model_optimization/guide/quantization/training), and finally how to deploy the model and use it on a microcontroller.

Like in the last lab, you first explore the full pipeline using a model trained on the [MNIST dataset](https://www.tensorflow.org/datasets/catalog/mnist). Afterward, you take your trained models from the second part of Lab 1 and convert, deploy, and execute them on a microcontroller, specifically the [Arduino Nano 33 BLE Sense](https://store.arduino.cc/products/arduino-tiny-machine-learning-kit).





## Setup

In [None]:
# If you have not done so already, install the following dependencies
!python -m pip install tensorflow tensorflow-model-optimization scikit-learn edgeimpulse numpy matplotlib seaborn

### Imports

In [None]:
import numpy as np
import pandas as pd

import tensorflow as tf
from tensorflow import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dense, Flatten, Dropout
from keras.callbacks import EarlyStopping

# from tensorflow.lite import TFLiteConverter
import tensorflow_model_optimization as tfmot

from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import edgeimpulse as ei

import matplotlib.pyplot as plt
import seaborn as sns

### Helper Functions

In [None]:
plt.style.use('seaborn-v0_8')

def plot_training_history(history, model_name):
    fig, (ax1, ax2) = plt.subplots(1, 2)
    fig.suptitle(f'Model {model_name}')
    fig.set_figwidth(15)

    ax1.plot(range(1, len(history.history['accuracy'])+1), history.history['accuracy'])
    ax1.plot(range(1, len(history.history['val_accuracy'])+1), history.history['val_accuracy'])
    ax1.set_title('Model accuracy')
    ax1.set(xlabel='epoch', ylabel='accuracy')
    ax1.legend(['training', 'validation'], loc='best')

    ax2.plot(range(1, len(history.history['loss'])+1), history.history['loss'])
    ax2.plot(range(1, len(history.history['val_loss'])+1), history.history['val_loss'])
    ax2.set_title('Model loss')
    ax2.set(xlabel='epoch', ylabel='loss')
    ax2.legend(['training', 'validation'], loc='best')
    plt.show()

### Edge Impulse API Key

Insert your Edge Impulse API Key as in Lab 1:

In [None]:
ei.API_KEY = "ei_dae2..." # Change this to your Edge Impulse API key

## MNIST

### Prepare the data

In [None]:
# Model / data parameters
labels = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
num_classes = len(labels)
input_shape = (28, 28, 1)

# Load the data and split it between train and test sets
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Scale images to the [0, 1] range
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255

# Make sure images have shape (28, 28, 1)
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
print("x_train shape:", x_train.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

### Build the model

---
**Task 1:** Add and train your best MNIST model from Lab 1 that has a small enough memory footprint to fit onto the target platform.

---

In [None]:
# Build MNIST model
def build_model_mnist(summary=False):
    model = Sequential()

    # ADD YOUR LAYERS HERE

    model.add(Dense(num_classes, activation='softmax'))

    # Compile model_mnist
    model.compile(
        optimizer='adam',
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )

    if summary:
        model.summary()

    return model

In [None]:
model_mnist = build_model_mnist()

### Train the model



In [None]:
early_stopping_cb = EarlyStopping(
    monitor=...,
    patience=...,
    min_delta=...,
    mode=...
)

num_epochs = 200
history_mnist = model_mnist.fit(x_train, y_train, batch_size=128, epochs=num_epochs, validation_split=0.1, callbacks=[early_stopping_cb])
plot_training_history(history_mnist, 1)
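
The `EarlyStopping` arguments above are left for you to choose. To get a feeling for how `patience` and `min_delta` interact, here is a simplified pure-Python sketch of loss-based early stopping (roughly what happens with `monitor='val_loss'` and `mode='min'`). The function `early_stopping_epoch` and the loss values are made up for illustration and are not part of Keras; the real callback has more options.

```python
def early_stopping_epoch(val_losses, patience, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None.

    An epoch only counts as an improvement if it lowers the best loss
    seen so far by more than min_delta.
    """
    best = float("inf")
    wait = 0  # number of consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses, one value per epoch
losses = [0.50, 0.40, 0.35, 0.36, 0.349, 0.37, 0.38]
stop_epoch = early_stopping_epoch(losses, patience=3, min_delta=0.01)
print(stop_epoch)  # prints 6
```

Note that epoch 5 (loss 0.349) is slightly better than epoch 3 (loss 0.35), but the gain is smaller than `min_delta`, so it does not reset the counter.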

### Save Model

In the previous lab, you always retrained a model when you continued working on the tasks. However, to come back to a model later, it is useful to save it. We can use the `model.save()` [function](https://www.tensorflow.org/guide/keras/serialization_and_saving), which exports a model, including its architecture and weights, to disk. With the `.keras` extension used below, the model is stored as a single file.

If you use Google Colab, you can find the saved model as a `.keras`-file on the left under `Files/`.

In [None]:
export_path = 'model_mnist.keras'
model_mnist.save(export_path)

---
**Task 2:** Profile the memory usage of your model with Edge Impulse.

---

In [None]:
# ADD YOUR MEMORY ESTIMATION HERE!

### Model Quantization

Your microcontroller cannot use the TensorFlow model directly. Instead, there is [TensorFlow Lite](https://www.tensorflow.org/lite) for deploying models on mobile and edge devices.
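
Quantization stores values as 8-bit integers plus a scale and a zero point instead of 32-bit floats. The following NumPy sketch shows the arithmetic of such an affine mapping on a handful of made-up weights. The helper names (`quantization_params`, `quantize`, `dequantize`) are assumptions for this illustration; the converter may choose its parameters differently (for example per channel).

```python
import numpy as np


def quantization_params(x_min, x_max, qmin=-128, qmax=127):
    """Derive scale and zero point so that [x_min, x_max] covers the int8 range."""
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map float values to int8: q = round(x / scale) + zero_point."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.int8)


def dequantize(q, scale, zero_point):
    """Map int8 values back to approximate floats: x = (q - zero_point) * scale."""
    return (q.astype(np.float32) - zero_point) * scale


# Made-up example weights
weights = np.array([-0.5, 0.0, 0.25, 0.75, 1.5], dtype=np.float32)
scale, zero_point = quantization_params(float(weights.min()), float(weights.max()))

q_weights = quantize(weights, scale, zero_point)
restored = dequantize(q_weights, scale, zero_point)

print("int8 values:", q_weights)
print("bytes float32 / int8:", weights.nbytes, "/", q_weights.nbytes)
print("max round-trip error:", float(np.max(np.abs(restored - weights))))
```

The int8 array needs a quarter of the memory, and each restored value differs from the original by at most about half a quantization step (`scale / 2`).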

---
**Task 3:** Load your MNIST model, convert it with TensorFlow Lite, and save the converted model to a `.tflite` file. (HINT: Check out [this](https://github.com/tensorflow/tflite-micro/tree/main/tensorflow/lite/micro/examples/hello_world) *Hello World* example.)

**Task 4:** Create a second TensorFlow Lite conversion that uses [optimizations](https://www.tensorflow.org/lite/api_docs/python/tf/lite/Optimize) and enforces integer-only weights.
(A possibly helpful [resource](https://www.tensorflow.org/lite/performance/post_training_quantization).)

**Task 5:** Create a third TensorFlow Lite conversion that, in addition to the conversion in *Task 4*, enforces integer-only quantization. (*Hint: Use a [representative dataset](https://www.tensorflow.org/lite/api_docs/python/tf/lite/RepresentativeDataset).*)

**Task 6:** Evaluate all three converted models and compare them to the Tensorflow model they are based on regarding profiled memory usage and accuracy. Use plots.

**Task 7:** Explain your findings from Task 6. Why is there such a difference in performance and in memory usage?

**Answer:** ...

---
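
The hint in Task 5 mentions a representative dataset. It is a callable that returns a generator; each step yields a list with one `float32` input batch, which the converter uses to calibrate value ranges. Below is a minimal sketch of that shape using NumPy only. The factory `make_representative_dataset` and the random stand-in images are assumptions for illustration; in the lab you would pass real training images such as `x_train`.

```python
import numpy as np


def make_representative_dataset(samples, num_samples=100):
    """Build a generator function that yields single-sample float32 batches."""
    def representative_dataset():
        for i in range(min(num_samples, len(samples))):
            # Slicing with i:i + 1 keeps a leading batch dimension of 1.
            yield [samples[i:i + 1].astype(np.float32)]
    return representative_dataset


# Random stand-in data with the MNIST input shape (28, 28, 1)
rng = np.random.default_rng(0)
stand_in_images = rng.random((250, 28, 28, 1))

rep_ds = make_representative_dataset(stand_in_images, num_samples=100)
batches = list(rep_ds())
print(len(batches), batches[0][0].shape, batches[0][0].dtype)
```

A callable like `rep_ds` is what you would typically assign to the converter's `representative_dataset` attribute before converting.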

In [None]:
# ADD YOUR MODEL CONVERSIONS HERE

In [None]:
# Save the converted model

with open('model.tflite', 'wb') as f:
    f.write(tflite_model)


### Quantization Aware Training

To improve on the performance of your converted models, you can use Quantization Aware Training before converting the model.
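
The core idea of quantization-aware training is to simulate the rounding of quantization during training while still computing in floating point, so the network can adapt to it. The sketch below shows such a "fake quantization" step in NumPy. It is a strong simplification under assumed min/max ranges and is not the implementation used by `tfmot`; the function name `fake_quantize` is made up for this example.

```python
import numpy as np


def fake_quantize(x, num_bits=8):
    """Round x onto a grid of 2**num_bits levels, but return float32 values."""
    levels = 2 ** num_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / levels
    if scale == 0.0:
        # All values are identical, nothing to round.
        return x.astype(np.float32)
    q = np.clip(np.round((x - x_min) / scale), 0, levels)
    return (q * scale + x_min).astype(np.float32)


# Made-up activations spread evenly over [-1, 1]
activations = np.linspace(-1.0, 1.0, 1001, dtype=np.float32)
simulated = fake_quantize(activations)

max_error = float(np.max(np.abs(simulated - activations)))
num_levels = len(np.unique(simulated))
print("distinct values:", num_levels, "max error:", max_error)
```

The output is still a float array, but it only takes values that an 8-bit representation could express, which is the error the training process then learns to tolerate.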

---
**Task 8:** Run the code below and explain the resulting model.

**Answer:** ...

---

In [None]:
model = keras.models.load_model('model_mnist.keras')
quantize_model = tfmot.quantization.keras.quantize_model

# q_aware stands for quantization aware.
q_aware_model = quantize_model(model)

# `quantize_model` requires a recompile.
q_aware_model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

q_aware_model.summary()

---
**Task 9:** Train (fit) the model and save it for future use. Make sensible choices for the number of epochs. Show the training performance.

---

In [None]:
# TRAIN THE MODEL HERE

---
**Task 10:** Quantize this model and save it.

**Task 11:** Evaluate the performance of the model after quantization-aware training and after additional quantization. Put the results into perspective by comparing them with the original model and the best quantized version of the original model. Compare memory usage and accuracy. Use plots. (You should compare 4 models here.)

**Task 12:** Explain your findings from *Task 11*.

**Answer:** ...

---

### Model Export - Library Creation

Up until now, we have created different models that we can test and evaluate using Python. However, most microcontrollers don't speak Python. Instead, they work with C/C++, so we need a C/C++ library of a model to execute it. Here you explore different ways to export your models to a C/C++ library.

#### Manual conversion of the model

---
**Task 13:** Convert your best performing quantized model to a C++ library with the code below and explain the content of the two resulting files.

**Answer:** ...

---

In [None]:
!apt-get update && apt-get -qq install xxd

In [None]:
MODEL_TFLITE = 'model.tflite' #enter the name of your TFlite file uploaded to the folders section
MODEL_TFLITE_MICRO = 'model.cc' #update the name of your .cc file (This can be anything)
!xxd -i {MODEL_TFLITE} > {MODEL_TFLITE_MICRO}
REPLACE_TEXT = MODEL_TFLITE.replace('/', '_').replace('.', '_')
!sed -i 's/'{REPLACE_TEXT}'/g_model/g' {MODEL_TFLITE_MICRO}
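
If `xxd` is not available on your system, the same kind of C array can be produced in plain Python. The sketch below mimics the general layout of `xxd -i` output (an `unsigned char` array plus a length variable). The function name `bytes_to_c_array` and the stand-in bytes are assumptions for illustration; in practice you would read the bytes from your `.tflite` file.

```python
def bytes_to_c_array(data, var_name="g_model", bytes_per_line=12):
    """Render raw bytes as a C array definition plus a length variable."""
    lines = []
    for start in range(0, len(data), bytes_per_line):
        chunk = data[start:start + bytes_per_line]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk))
    body = ",\n".join(lines)
    return (
        f"unsigned char {var_name}[] = {{\n{body}\n}};\n"
        f"unsigned int {var_name}_len = {len(data)};\n"
    )


# A few stand-in bytes instead of a real model file
demo_bytes = bytes([0x1c, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33])
c_source = bytes_to_c_array(demo_bytes)
print(c_source)
```

To use it on a real model, you could read the file with `open('model.tflite', 'rb').read()` and write the returned string to a `.cc` file.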

In [None]:
LIBRARY_NAME = 'mnist_model'
max_label_str_length = max([len(lbl) for lbl in labels]) + 1

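model_str = f"alignas(16) const unsigned char {LIBRARY_NAME}[] = "
with open(MODEL_TFLITE_MICRO, 'r') as file:
    data = file.read()
    # Keep everything from the array initializer onwards. xxd also emits
    # `unsigned int g_model_len = ...;`, which is made const and renamed here
    # so that it matches the `<LIBRARY_NAME>_len` declaration in the header.
    model_str += (data[data.index("{"):]
                  .replace("unsigned", "const")
                  .replace("g_model_len", f"{LIBRARY_NAME}_len"))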

labels_str = f"const char available_classes[][{max_label_str_length}] = {{"
for i in range(0, len(labels)):
    if i != 0:
        labels_str += ", "
    labels_str += "\""+labels[i]+"\""
labels_str += "};"

output_str = f"#include \"{LIBRARY_NAME}.h\"\n"
output_str += labels_str + "\n"
output_str += "const int available_classes_num = "+str(len(labels)) +";\n"
output_str += model_str

with open(f"{LIBRARY_NAME}.cpp", "w") as file:
    file.write(output_str)

header_str = "#ifndef TENSORFLOW_LITE_MODEL_H_\n#define TENSORFLOW_LITE_MODEL_H_\n\n"
header_str += "// Classes that can be detected by the neural network\n"
header_str += f"extern const char available_classes[][{max_label_str_length}];\n"
header_str += "extern const int available_classes_num;\n\n"
header_str += "// Pre-trained neural network\n"
header_str += f"extern const unsigned char {LIBRARY_NAME}[];\n"
header_str += f"extern const int {LIBRARY_NAME}_len;\n\n"
header_str += "#endif /* TENSORFLOW_LITE_MODEL_H_ */"

with open(f"{LIBRARY_NAME}.h", "w") as file:
    file.write(header_str)


Next, you will use your library in an Arduino program (or, if you prefer, in a Zephyr program) and execute the [inference on a microcontroller](https://www.tensorflow.org/lite/microcontrollers/get_started_low_level). I strongly recommend using [this](https://docs.arduino.cc/tutorials/nano-33-ble-sense/get-started-with-machine-learning) Arduino example as a starting point for your code. (If you prefer to use Zephyr, have a look at [this](https://github.com/ds-kiel/blueseer/) repository.)

---
**Task 14:** Write an Arduino program that takes a representation (e.g., an array of values) of an MNIST image as input (through the serial interface), classifies the image, and reports the result back to you through the serial interface. *If you like, write a companion program that performs the image transformation and serial communication.*

**Task 15:** Upload the program to the Arduino and compare the real memory usage with the Edge Impulse estimate. Was the estimate correct? How much does it differ?

**Answer:** ...

**Task 16:** Extend your Arduino program and measure the inference time on the Arduino. Was the estimate correct?

**Answer:** ...

**Task 17:** Perform inference for at least 20 images (more is better) and plot statistics (e.g., bar plot (mean) with error bar (standard deviation)) for the inference time. Does it vary? Why or why not?

**Answer:** ...



---
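
For the statistics in Task 17, you first need to summarize the times your Arduino reports. The following sketch uses only the Python standard library. The helper `summarize_inference_times` is an assumption for this example, and the numbers in `measured_us` are placeholders, not real measurements; replace them with the values you read from the serial interface.

```python
import statistics


def summarize_inference_times(times_us):
    """Return count, mean, sample standard deviation, min, and max."""
    return {
        "n": len(times_us),
        "mean_us": statistics.mean(times_us),
        "std_us": statistics.stdev(times_us),
        "min_us": min(times_us),
        "max_us": max(times_us),
    }


# Placeholder values in microseconds, NOT real measurements
measured_us = [41200, 41210, 41190, 41205, 41195] * 4

summary = summarize_inference_times(measured_us)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The mean and standard deviation can then be passed to a bar plot with an error bar, for example via the `yerr` argument of `plt.bar`.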

#### Model conversion with Edge Impulse (Optional task)

In the last lab, you used the profiling capabilities of Edge Impulse. Now you will explore how to use Edge Impulse to [create a library](https://docs.edgeimpulse.com/docs/tools/edge-impulse-python-sdk) to deploy your model. First, check the available target devices for deployment (`ei.model.list_deployment_targets()`) and find the correct Arduino corresponding to your hardware.

---
**Task 18 (optional):** Create an Arduino library with Edge Impulse and answer Tasks 14-17 for this method. Doing this is good preparation for Lab 3, in which we will dive deeper into Edge Impulse.

---

## FashionMNIST or CIFAR10

---
**Tasks:**
- **Task 19:** Use a well-performing model from Lab 1 and extend it with quantization-aware training, quantize it, and convert it to a C++ library.
- **Task 20:** Evaluate and compare the performance of all the intermediate models (including the input model and the output model) regarding accuracy, memory consumption, and inference time.
- **Task 21:** Create an Arduino program that takes an image as serial input and outputs the prediction of the class of the image (e.g., 'truck' if you use CIFAR10).
- **Task 22:** Measure the memory consumption on the Arduino. Does it match the predicted values?
- **Task 23:** Measure the inference time on the Arduino (for multiple (>20) images). Does it match the predicted value?

---
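
When you send images over the serial interface, the Arduino needs to know how many bytes to expect. One possible approach is to prefix the raw pixel bytes with a length field. The sketch below shows such a framing on the host side. The 2-byte little-endian header and the function names `frame_image` and `unframe_image` are a hypothetical convention chosen for this example, not a protocol required by the lab; the actual sending (for example with a serial library) is left out.

```python
import struct


def frame_image(pixels_u8):
    """Prefix raw 8-bit pixel values with a 2-byte little-endian length."""
    body = bytes(pixels_u8)
    return struct.pack("<H", len(body)) + body


def unframe_image(frame):
    """Inverse of frame_image, useful to test the framing on the host."""
    (length,) = struct.unpack_from("<H", frame, 0)
    body = frame[2:2 + length]
    if len(body) != length:
        raise ValueError("incomplete frame")
    return list(body)


# Stand-in for a CIFAR10-sized image (32 x 32 pixels, 3 channels), all mid-gray
image = [128] * (32 * 32 * 3)
frame = frame_image(image)
print(len(frame), frame[:2])
```

On the Arduino side, the program would first read the two length bytes and then read exactly that many pixel bytes before running inference.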