# CS 677 Lab 4 - AI/ML (and Semi-Cloud)

Welcome to the final lab of the course! In this lab, we will explore deploying an AI model on an edge device. You'll notice that training models for embedded edge devices is quite similar to training traditional models, but with a few additional parameters and considerations.

For this lab, you will create a Fitbit-like activity recognition band that utilizes a neural network (NN) along with an accelerometer and gyroscope to classify various activities. We will use TensorFlow Lite to develop our models.

Considerations for Edge ML:

- Models need to be small and optimized for limited resources.
- Not all edge devices have floating-point (FP) arithmetic logic units (ALUs).
- Input data storage must be efficiently managed due to space constraints.

Finally, we will integrate ThingsBoard to create a dashboard for your Fitbit-like activity tracker.

In [None]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

## Dataset

You are provided a modified version of a Human Activity Recognition dataset (we added some standing-position data to it). This dataset contains data from **33 volunteers** with demographic attributes including **gender**, **height**, **weight**, and **age**. Volunteers wore the wristband named **BandX** on their dominant hand while performing specific activities. The **MPU-6050** sensor module captured data from the **accelerometer** and **gyroscope**. The original dataset captures human activity data for the following **seven types of activities**:  

1. **Walking** (Wa)  
2. **Jogging** (J)  
3. **Typing** (T)  
4. **Writing** (Wr)  
5. **Upstairs movement** (U)  
6. **Downstairs movement** (D)  
7. **Cycling** (C) 

We then modified the dataset: we added a few samples of **Standing** (S) and removed **Upstairs movement** (U) and **Downstairs movement** (D) so that the application works better.
The sensor data includes Accelerometer (`ax`, `ay`, `az`) and Gyroscope (`gx`, `gy`, `gz`) data collected using the **MPU-6050** sensor, sampling at **20 Hz**.

The processed dataset was created by taking **2-second samples** of an activity (a 2-second sample at 20 Hz means that each input tensor holds 40 readings).
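
To make the window arithmetic concrete, here is a minimal sketch of how a raw sensor stream could be cut into 2-second windows. The non-overlapping split, the `make_windows` helper, and the random stand-in data are assumptions for illustration; the actual preprocessing used for the provided dataset may differ.

```python
import numpy as np

SAMPLE_RATE_HZ = 20      # from the dataset description
WINDOW_SECONDS = 2       # from the dataset description
WINDOW_LEN = SAMPLE_RATE_HZ * WINDOW_SECONDS  # 40 readings per window

def make_windows(stream, window_len=WINDOW_LEN):
    """Split a (T, 6) stream into non-overlapping (N, window_len, 6) windows."""
    stream = np.asarray(stream, dtype=np.float32)
    n_windows = stream.shape[0] // window_len
    trimmed = stream[: n_windows * window_len]   # drop the incomplete tail
    return trimmed.reshape(n_windows, window_len, stream.shape[1])

# Hypothetical recording: 10 seconds of ax, ay, az, gx, gy, gz readings
fake_stream = np.random.default_rng(0).normal(size=(10 * SAMPLE_RATE_HZ, 6))
windows = make_windows(fake_stream)
print(windows.shape)  # (5, 40, 6)
```

Each window has shape `(40, 6)`, which is the per-sample input shape your model will see.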

## Task 1.1
Load the dataset using NumPy and extract the train, validation and test sets. The given file is in NPZ format, so you can extract the data using its keys. (Print the loaded data temporarily for clarity).

After extracting the sets, ensure that all the data in the feature sets are of type `float32`. Additionally, one-hot encode the target sets. We advise using sklearn's `LabelEncoder` and `to_categorical` from `keras.utils`, but it's up to you.

The dataset has already been normalized and filtered for you.

Also, make a note of how the encodings correspond to specific activities (i.e., which encoding represents which activity).

In [None]:
encoder = LabelEncoder()
# Your code here

# Ensure data is numerical

# Convert labels to one-hot encoding

# Make a note of which encoding corresponds to which class
print("Classes and their encoding:")
for i, label in enumerate(encoder.classes_):
    print(f"{label} -> {i}")

x_train.shape
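
If the loading and encoding steps feel abstract, the sketch below walks through them on a tiny in-memory stand-in file using NumPy only. The key names (`x_train`, `y_train`), the `_demo` variables, and the labels are hypothetical; print `data.files` on the real file to see its actual keys. `np.unique` is used here as a stand-in for `LabelEncoder` (both order classes by sorted label), and `np.eye` indexing as a stand-in for `to_categorical`.

```python
import io
import numpy as np

# Build a tiny stand-in NPZ in memory; the key names are hypothetical.
buf = io.BytesIO()
np.savez(
    buf,
    x_train=np.zeros((4, 40, 6), dtype=np.float64),
    y_train=np.array(["Wa", "J", "S", "J"]),
)
buf.seek(0)

data = np.load(buf)
print(data.files)                        # list the keys actually present

x_demo = data["x_train"].astype(np.float32)   # force float32 features
labels = data["y_train"]

# Sorted unique labels -> integer codes
classes, codes = np.unique(labels, return_inverse=True)

# Integer codes -> one-hot rows
y_demo = np.eye(len(classes), dtype=np.float32)[codes]

for i, label in enumerate(classes):
    print(f"{label} -> {i}")
```

The printed mapping is what you should note down: the index of each class is the position of the 1 in its one-hot row.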


Now that we have loaded our dataset, let's design and train our model. We will be training a **Neural Network** for our use case using the **TensorFlow Keras** library to construct the layers of our model. 

The `keras.layers` module provides a variety of different layers, which you can explore in the [official documentation](https://www.tensorflow.org/api_docs/python/tf/keras/layers).

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense, Dropout, BatchNormalization, LSTM, SeparableConv1D

## Task 1.2 Creating and Training NN
Select a sequence of layers for your neural network. Use your prior knowledge of NN architectures to construct what you believe would be a suitable model.  

The `keras.layers` library offers many different layers to choose from, so feel free to experiment with different sequences.  

*(For example, in my attempt, I used `Conv1D`, `MaxPooling1D`, `Flatten`, `Dense`, `Dropout`, and `BatchNormalization`, but you are free to try different configurations!)*

In [None]:

# Model Architecture
model = Sequential([
    # Example of how to define the layers. Change these out with your own set of layers.
    Flatten(input_shape=(x_train.shape[1], x_train.shape[2])),  # Flatten (40,6) to 240
    Dense(y_train.shape[1], activation='softmax')

    ])
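
Since model size matters later in this lab, it helps to estimate it while designing the architecture. The sketch below counts the parameters of the placeholder Flatten + Dense model. The class count of 6 is derived from the dataset description (seven activities, minus U and D, plus S), and the byte figures are rough estimates that ignore file metadata.

```python
# Parameter count for the placeholder model: Flatten -> Dense(softmax)
window_len, n_channels = 40, 6      # input shape from the dataset description
n_classes = 6                       # Wa, J, T, Wr, C, S (derived, see above)

flat_inputs = window_len * n_channels                # 240 inputs after Flatten
dense_params = flat_inputs * n_classes + n_classes   # weights + biases

float32_bytes = dense_params * 4    # 4 bytes per float32 parameter
int8_bytes = dense_params * 1       # 1 byte per parameter if everything were int8

print(dense_params, float32_bytes, int8_bytes)  # 1446 5784 1446
```

Real quantized files are somewhat larger than the int8 figure, since biases are typically stored as 32-bit integers and the file carries extra metadata. You can compare your own model against `model.summary()`.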


Using the Keras Sequential model defined above, proceed with the following steps:  

1. Compile the model:  
   - Choose an appropriate **optimizer** (e.g., `Adam` or `SGD`).  
   - Use an appropriate **loss function** (e.g., `categorical_crossentropy` for multi-class classification).  
   - Select **metrics** to evaluate the model’s performance (e.g., `accuracy`).  

2. **Train** the model:  
   - Use the `.fit()` function with training data.  
   - Set an appropriate number of **epochs** and **batch size**.  
   - Use **validation data** to monitor performance.  

3. **Evaluate** the model:  
   - Use the `.evaluate()` function on the test set to measure final accuracy and loss.  


In [None]:

# Compile the model
model.compile(optimizer=, loss=, metrics=[])

# Train the model
history = model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=5,
    batch_size=32,
    verbose=1
)

# Evaluate the model
eval_results = model.evaluate(x_test, y_test, verbose=1)
print(f"Test Accuracy: {eval_results[1] * 100:.2f}%")

In [None]:
# Save the model
model.save("trained_model.h5")

In this section, evaluate your model’s classification report (the code for this has already been provided). Are the precision and recall for all target classes above the required threshold (90%)? Is the overall accuracy over 95%? If not, what adjustments can you make to improve the model’s performance?

In [None]:
from sklearn.metrics import precision_score, recall_score, f1_score, classification_report, accuracy_score

def evaluate_model(model, x_test, y_test, encoder=None):
    """
    Evaluate the model on test data and compute precision, recall, and F1-score.

    Parameters:
    - model: Trained model.
    - x_test: Test features.
    - y_test: True labels for test data (categorical or one-hot encoded).
    - encoder: LabelEncoder instance (optional; currently unused).

    Prints:
    - Classification report plus weighted precision, recall, and F1-score.
    """

    # Get predictions
    y_pred = model.predict(x_test)
    y_pred_classes = np.argmax(y_pred, axis=1)

    # If y_test is one-hot encoded, decode it
    if len(y_test.shape) > 1 and y_test.shape[1] > 1:
        y_test = np.argmax(y_test, axis=1)

    # Compute metrics
    precision = precision_score(y_test, y_pred_classes, average='weighted')
    recall = recall_score(y_test, y_pred_classes, average='weighted')
    f1 = f1_score(y_test, y_pred_classes, average='weighted')

    # Print metrics
    print("Classification Report:")
    print(classification_report(y_test, y_pred_classes))
    print(f"Precision: {precision:.2f}")
    print(f"Recall: {recall:.2f}")
    print(f"F1 Score: {f1:.2f}")

# Evaluate the model
evaluate_model(model, x_test, y_test, encoder=encoder)
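
The weighted averages printed above can hide a single weak class, so it is worth checking the 90% threshold per class. The sketch below computes per-class precision and recall from a confusion matrix using NumPy only. The helper name and the demo label arrays are hypothetical; with your model you would pass the class indices obtained via `np.argmax`.

```python
import numpy as np

def per_class_precision_recall(y_true, y_pred, n_classes):
    """Return (precision, recall) arrays, one entry per class index."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)           # rows: true, cols: predicted
    tp = np.diag(cm).astype(np.float64)
    predicted = cm.sum(axis=0)                   # column sums
    actual = cm.sum(axis=1)                      # row sums
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    return precision, recall

# Hypothetical class indices, just to exercise the function
y_true_demo = np.array([0, 0, 1, 1, 2, 2, 2, 2])
y_pred_demo = np.array([0, 1, 1, 1, 2, 2, 2, 0])

precision, recall = per_class_precision_recall(y_true_demo, y_pred_demo, n_classes=3)
THRESHOLD = 0.90                                 # from the lab requirement
for c, (p, r) in enumerate(zip(precision, recall)):
    status = "ok" if min(p, r) > THRESHOLD else "below threshold"
    print(f"class {c}: precision={p:.2f} recall={r:.2f} -> {status}")
```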


## Task 1.3 TF Lite

Now that you have created, trained, and saved your model, let's convert it into a **TensorFlow Lite (.tflite)** format. TensorFlow Lite is a lightweight version of TensorFlow designed for running machine learning models on edge devices such as microcontrollers, mobile phones, and IoT devices. It enables fast inference with low power consumption and reduced memory footprint.

The following block converts the trained model into a .tflite file.

In [None]:
# Load the trained model
model = tf.keras.models.load_model("trained_model.h5")

# Convert the model to TensorFlow Lite format
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Save the converted model to a .tflite file
with open("unquantized_model.tflite", "wb") as f:
    f.write(tflite_model)

One of the most powerful features of TensorFlow Lite (TFLite) is its ability to perform **quantization**. Quantization significantly reduces the **memory footprint** of the model and decreases the **computational cycles** required for inference, making it highly efficient for deployment on edge devices.  

However, quantization comes with a trade-off. The process introduces a certain level of **information loss**, which can negatively impact the model's accuracy. 
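
To get a feel for where the information loss comes from, here is a NumPy sketch of the general affine int8 quantization scheme, `q = round(x / scale) + zero_point`. The helper names and the way `scale` and `zero_point` are picked are illustrative assumptions; the TFLite converter chooses these values itself.

```python
import numpy as np

def quantize_int8(x, scale, zero_point):
    """Map float values to int8 with q = round(x / scale) + zero_point."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize_int8(q, scale, zero_point):
    """Recover approximate floats with x ~ (q - zero_point) * scale."""
    return (q.astype(np.float32) - zero_point) * scale

weights = np.array([-1.0, -0.25, 0.0, 0.3, 1.0], dtype=np.float32)

# Choose scale and zero point so [min, max] of the weights spans the int8 range
scale = float(weights.max() - weights.min()) / 255.0
zero_point = int(round(-128 - weights.min() / scale))

q = quantize_int8(weights, scale, zero_point)
restored = dequantize_int8(q, scale, zero_point)
print(q)
print(np.abs(weights - restored).max())   # rounding error, roughly scale / 2 here
```

Each weight now takes 1 byte instead of 4, at the cost of a small rounding error per value. Those errors accumulate across layers, which is why accuracy can drop after quantization.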

In this task, your goal is to **quantize the model** so that its size remains **below 100 KB** (size of .h file). When converting a TensorFlow model to TFLite, additional parameters and conditions can be applied to the **converter object** to modify the model. The TensorFlow Lite **TFLiteConverter API** provides various options for optimizing model size and performance. You can explore the available attributes and methods [here](https://www.tensorflow.org/api_docs/python/tf/lite/TFLiteConverter).  

One of the most useful attributes is **`target_spec`**, which allows control over the **data types** used in computations. By leveraging this, you can convert certain layers from floating-point precision to integer values, significantly reducing model size while maintaining acceptable accuracy. More details on this process can be found in [this guide](https://www.tensorflow.org/api_docs/python/tf/lite/TargetSpec).  

Your task is to experiment with **different quantization techniques** and apply modifications to ensure the model remains within the required size constraints while retaining as much accuracy as possible.

In [None]:
# Load the trained model
model = tf.keras.models.load_model("trained_model.h5")

# Convert to fully quantized TFLite model (int8)
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Your code here (Additional method calls)





converter._experimental_disable_per_channel_quantization = True
converter._experimental_disable_per_channel_quantization_for_dense_layers = True

# Convert and save
tflite_model = converter.convert()
with open("quantized_model.tflite", "wb") as f:
    f.write(tflite_model)
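
One quantization technique you may want to try is integer-only quantization, which needs a small calibration set so the converter can estimate activation ranges. The following is a sketch, not the required solution: the helper names, the sample count, and the stand-in data are assumptions, and you would normally pass a slice of your real training windows.

```python
import numpy as np

def make_representative_dataset(x_calib, n_samples=100):
    """Return a generator function yielding one float32 window per step."""
    x_calib = np.asarray(x_calib, dtype=np.float32)
    n = min(n_samples, len(x_calib))

    def representative_dataset():
        for i in range(n):
            # The converter expects a list with one array per model input,
            # each including the batch dimension: shape (1, 40, 6) here.
            yield [x_calib[i:i + 1]]

    return representative_dataset

def apply_full_int8(converter, representative_dataset):
    """Configure an existing TFLiteConverter for integer-only quantization."""
    import tensorflow as tf  # imported lazily; only needed when converting
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8
    return converter

# Stand-in calibration data with the dataset's window shape
x_calib_demo = np.zeros((8, 40, 6), dtype=np.float32)
rep_fn = make_representative_dataset(x_calib_demo, n_samples=5)
first = next(rep_fn())
print(len(first), first[0].shape, first[0].dtype)
```

Note that setting `inference_input_type` and `inference_output_type` to `tf.int8` changes what the interpreter expects: inputs must then be quantized before inference, and outputs come back as integers.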

To run your model on your ESP32, you need to **convert** the .tflite file into a **.h file**, either with ``xxd -i model.tflite > model.h`` on Unix systems or with the following code block.

In [None]:
# Read .tflite model
with open("quantized_model.tflite", "rb") as f:
    model_bytes = f.read()

# Convert to C array format
c_array = ", ".join(f"0x{b:02x}" for b in model_bytes)

# Write to .h file
with open("model.h", "w") as f:
    f.write("#ifndef MODEL_H\n#define MODEL_H\n\n")
    f.write(f"const unsigned char model_tflite[] = {{ {c_array} }};\n")
    f.write(f"const unsigned int model_tflite_len = {len(model_bytes)};\n")
    f.write("#endif // MODEL_H\n")
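
Keep in mind that the .h file is a text encoding of the model, so it is much larger on disk than the .tflite binary. The sketch below measures that ratio using the same formatting as the cell above. The `c_array_text` helper and the stand-in bytes are hypothetical.

```python
def c_array_text(model_bytes):
    """Format bytes the same way as the cell above: '0x1c, 0x00, ...'."""
    return ", ".join(f"0x{b:02x}" for b in model_bytes)

fake_model = bytes(range(256)) * 4          # 1024 stand-in bytes, not a real model
text = c_array_text(fake_model)

binary_size = len(fake_model)
text_size = len(text)                       # characters in the array literal
print(binary_size, text_size, text_size / binary_size)
```

Each byte becomes four characters plus a two-character separator, so the array literal is about six times the binary size. If the 100 KB limit is taken literally as the size of the .h text file, the .tflite binary would need to stay at roughly 16 KB. The array compiled into flash still occupies only `model_tflite_len` bytes.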


Here is some code to evaluate the accuracy of your quantized model.

In [None]:
# Load the TFLite model
interpreter = tf.lite.Interpreter(model_path="quantized_model.tflite")
interpreter.allocate_tensors()

# Get input and output tensors
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Prepare test data (assuming x_test and y_test are already preprocessed)
x_test = x_test.astype(np.float32)  # Ensure correct data type
y_true = np.argmax(y_test, axis=1)  # Convert one-hot to class indices

# Run inference on test dataset
y_pred = []
for i in range(len(x_test)):
    # Set the input tensor
    interpreter.set_tensor(input_details[0]['index'], x_test[i:i+1])
    
    # Run inference
    interpreter.invoke()
    
    # Get output tensor and store prediction
    output_data = interpreter.get_tensor(output_details[0]['index'])
    y_pred.append(np.argmax(output_data))

# Calculate accuracy
accuracy = accuracy_score(y_true, y_pred)
print(f"TFLite Model Accuracy: {accuracy:.4f}")
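
The evaluation loop above feeds `float32` data directly, which works only if the model's input tensor is still floating point. If you configured integer inputs and outputs, the input has to be quantized with the scale and zero point reported by the interpreter. The helpers below are a hedged sketch of that conversion: the function names are hypothetical, and the dictionaries are hand-written stand-ins for entries of `interpreter.get_input_details()` and `interpreter.get_output_details()`.

```python
import numpy as np

def to_model_input(x, input_detail):
    """Convert a float32 batch to the dtype the interpreter input expects."""
    dtype = input_detail["dtype"]
    if dtype == np.float32:
        return x.astype(np.float32)
    scale, zero_point = input_detail["quantization"]
    q = np.round(x / scale) + zero_point
    info = np.iinfo(dtype)
    return np.clip(q, info.min, info.max).astype(dtype)

def from_model_output(y, output_detail):
    """Convert a raw output tensor back to float32 scores."""
    if output_detail["dtype"] == np.float32:
        return y.astype(np.float32)
    scale, zero_point = output_detail["quantization"]
    return (y.astype(np.float32) - zero_point) * scale

# Stand-ins for input_details[0] and output_details[0]; values are made up
fake_input = {"dtype": np.int8, "quantization": (0.05, -3)}
fake_output = {"dtype": np.int8, "quantization": (1 / 256, -128)}

x = np.array([[0.0, 0.5, -1.0]], dtype=np.float32)
print(to_model_input(x, fake_input))
print(from_model_output(np.array([[-128, 127]], dtype=np.int8), fake_output))
```

In the loop, you would wrap the slice as `to_model_input(x_test[i:i+1], input_details[0])` before calling `set_tensor`. Since `np.argmax` is unaffected by a positive scale, dequantizing the output is optional for computing accuracy.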

***A concluding point***

Why do we need such a small model? Well, the ESP32-S3 dev board that you have is quite featureful. It has 16 MB of flash, a powerful dual-core CPU, and a floating point unit. These are by no means guarantees in the IoT world. Additionally, utilizing all these resources to their fullest is quite expensive in terms of energy consumption.

That's why it's really important to minimize the model's memory footprint as much as possible.

In [None]:
with open("quantized_model.tflite", "rb") as f:
    model_content = f.read()

tf.lite.experimental.Analyzer.analyze(model_content=model_content)


# Fin.