# Visual Wake Words (VWW) Project - Phase 3: Quantization & TFLite Conversion
**Goal:** Convert the trained MobileNetV2 model to TensorFlow Lite, apply Int8 quantization, and evaluate the result for edge deployment (simulated on the host with the TFLite interpreter).
**Environment:** Google Colab / TensorFlow 2.x

## 1. Setup & Data Loading
We need the validation set to evaluate the quantized model's accuracy.
*   **Note:** Ensure the `vww_dataset` folder and the `best_model_v2.keras` checkpoint from Phase 2 exist.

In [1]:
import tensorflow as tf
import numpy as np
import pathlib
import os

# Check if dataset exists
dataset_dir = pathlib.Path("vww_dataset")
if not dataset_dir.exists():
    raise FileNotFoundError("Dataset not found! Please run Phase 2 notebook first to download and organize data.")

IMG_SIZE = 96
BATCH_SIZE = 32

print("Loading validation data...")
val_ds = tf.keras.utils.image_dataset_from_directory(
    dataset_dir / 'validation',
    image_size=(IMG_SIZE, IMG_SIZE),
    batch_size=BATCH_SIZE,
    shuffle=False
)

# Preprocessing (Must match Phase 2)
def preprocess(image, label):
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    image = (image / 127.5) - 1.0
    return image, label

val_ds = val_ds.map(preprocess)

# Load the trained Keras model
model_path = 'best_model_v2.keras'
if not os.path.exists(model_path):
    raise FileNotFoundError(f"Model {model_path} not found! Please run Phase 2 training first.")

model = tf.keras.models.load_model(model_path)
print("✅ Model loaded successfully.")
model.summary()

Loading validation data...
Found 741 files belonging to 2 classes.
✅ Model loaded successfully.


## 2. TFLite Conversion Strategy
We will generate three versions of the model to compare:
1.  **Float32 (Baseline):** No quantization.
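2.  **Dynamic Range Quantization:** Weights are Int8, activations stay Float32 (the runtime may quantize them on the fly for supported ops). Weights shrink about 4x and CPU inference usually gets faster, but float support is still required at runtime.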
3.  **Full Integer Quantization (Int8):** **Crucial for MCUs.** Inputs, outputs, weights, and activations are all Int8. Requires a "Representative Dataset" to calibrate activation ranges.
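Calibration is what the representative dataset is for: the converter observes the real-valued range of each activation and maps it onto the 256 int8 levels with an affine scale and zero point, so that `x ≈ (q - zero_point) * scale`. The sketch below shows the idea for a single tensor using plain min/max tracking. It is a simplified illustration, not the converter's exact algorithm (TFLite quantizes weights symmetrically per channel and activations asymmetrically per tensor, with its own calibration logic). `observe_range` and `int8_params_from_range` are hypothetical names; `observe_range` accepts any generator function shaped like `representative_data_gen` in the next cell.

```python
import numpy as np

def observe_range(sample_gen):
    """Running min/max over every array yielded by a representative-dataset-style generator."""
    lo, hi = np.inf, -np.inf
    for inputs in sample_gen():          # each yield is a list of model inputs
        for x in inputs:
            x = np.asarray(x)
            lo = min(lo, float(x.min()))
            hi = max(hi, float(x.max()))
    return lo, hi

def int8_params_from_range(lo, hi, qmin=-128, qmax=127):
    """Asymmetric int8 (scale, zero_point) so that [lo, hi] maps onto [qmin, qmax]."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # include 0.0 so zero is exactly representable
    scale = (hi - lo) / (qmax - qmin)
    if scale == 0.0:                     # all-zero tensor: any positive scale works
        scale = 1.0
    # Defensive clip; in exact arithmetic the zero point already lies in [qmin, qmax]
    zero_point = int(np.clip(round(qmin - lo / scale), qmin, qmax))
    return scale, zero_point

# Range produced by preprocess(), which maps pixels to [-1, 1]
scale, zero_point = int8_params_from_range(-1.0, 1.0)
print(scale, zero_point)
```

For the `[-1, 1]` input range this simplified scheme gives a scale of 2/255 ≈ 0.00784, with a zero point of 0 or -1 depending on how the half-way value rounds.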

In [2]:
# 1. Representative Dataset Generator (For Int8 Calibration)
# We need ~100 samples from the training set to help the converter estimate activation ranges.
train_ds_rep = tf.keras.utils.image_dataset_from_directory(
    dataset_dir / 'train',
    image_size=(IMG_SIZE, IMG_SIZE),
    batch_size=1, # One image per element, so each sample is shaped [1, 96, 96, 3]
    shuffle=True
).map(preprocess).take(100)

def representative_data_gen():
    for input_value, _ in train_ds_rep:
        # Model expects [1, 96, 96, 3]
        yield [input_value]

print("Representative dataset generator ready.")

Found 1832 files belonging to 2 classes.
Representative dataset generator ready.


In [3]:
# 2. Conversion Functions

def convert_tflite(quantization_type='float32'):
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    
    if quantization_type == 'float32':
        filename = 'vww_v2_float32.tflite'
        
    elif quantization_type == 'dynamic':
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
        filename = 'vww_v2_dynamic.tflite'
        
    elif quantization_type == 'int8':
        converter.optimizations = [tf.lite.Optimize.DEFAULT]
        converter.representative_dataset = representative_data_gen
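        # Fail conversion if any op lacks an int8 kernel, instead of silently falling back to float
        converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
        # Quantize the model's input/output tensors too, so the MCU never handles float data
        converter.inference_input_type = tf.int8
        converter.inference_output_type = tf.int8
        filename = 'vww_v2_int8.tflite'

    else:
        raise ValueError(f"Unknown quantization_type: {quantization_type!r}")

    tflite_model = converter.convert()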
    
    with open(filename, 'wb') as f:
        f.write(tflite_model)
    
    print(f"Saved {filename} ({len(tflite_model) / 1024:.2f} KB)")
    return filename

# Run Conversions
print("Converting models...")
tflite_float = convert_tflite('float32')
tflite_dynamic = convert_tflite('dynamic')
tflite_int8 = convert_tflite('int8')

Converting models...
Saved vww_v2_int8.tflite (606.96 KB)


## 3. Evaluation: Accuracy vs. Size
Next we check how much accuracy the **Int8 model** loses relative to the Float32 baseline.
We will use the `tf.lite.Interpreter` to run inference on the entire validation set.
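The quantize/dequantize steps inside the evaluation loop can be factored into two helpers keyed on each tensor's own dtype. A minimal sketch, assuming the details dicts returned by `interpreter.get_input_details()` / `get_output_details()`, which carry `'dtype'` and `'quantization'` entries. The helper names `to_model_input` and `from_model_output` are hypothetical, and the scale/zero-point values in the example are illustrative, not read from this model.

```python
import numpy as np

def to_model_input(image, detail):
    """Turn one float image (H, W, C) into a batch-of-1 array matching `detail`."""
    x = np.asarray(image, dtype=np.float32)[np.newaxis, ...]  # add the batch dimension
    if detail["dtype"] == np.int8:
        scale, zero_point = detail["quantization"]
        # Round to nearest (astype alone would truncate toward zero and bias the values)
        x = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return x

def from_model_output(raw, detail):
    """Map a raw output tensor back to float scores, dequantizing if it is int8."""
    if detail["dtype"] == np.int8:
        scale, zero_point = detail["quantization"]
        return (raw.astype(np.float32) - zero_point) * scale
    return raw.astype(np.float32)

# Illustrative quantization parameters only, not read from vww_v2_int8.tflite
out_detail = {"dtype": np.int8, "quantization": (1 / 256, -128)}
print(from_model_output(np.array([[0]], dtype=np.int8), out_detail))  # [[0.5]]
```

Checking the output dtype independently of the input matters because `inference_input_type` and `inference_output_type` can be set separately, so a model may be quantized on one side only.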

In [None]:
def evaluate_tflite_model(tflite_file, dataset):
    # Load the TFLite model and allocate tensors.
    interpreter = tf.lite.Interpreter(model_path=tflite_file)
    interpreter.allocate_tensors()

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    input_scale, input_zero_point = input_details[0]['quantization']
    output_scale, output_zero_point = output_details[0]['quantization']

    # Check input and output separately: a model can be quantized on one side only
    is_int8_input = (input_details[0]['dtype'] == np.int8)
    is_int8_output = (output_details[0]['dtype'] == np.int8)

    correct_predictions = 0
    total_samples = 0

    print(f"Evaluating {tflite_file}...")

    # The interpreter is allocated for a batch of 1, so unbatch and feed one image at a time
    for images, labels in dataset.unbatch():
        image = images.numpy()
        if is_int8_input:
            # Quantize input: round(float_val / scale) + zero_point
            image_q = np.round(image / input_scale) + input_zero_point
            image = np.clip(image_q, -128, 127).astype(np.int8)
        input_data = np.expand_dims(image, axis=0)

        interpreter.set_tensor(input_details[0]['index'], input_data)
        interpreter.invoke()
        output_data = interpreter.get_tensor(output_details[0]['index'])

        # Dequantize int8 output so the 0.5 threshold applies to the real-valued score
        if is_int8_output:
            prediction = (output_data.astype(np.float32) - output_zero_point) * output_scale
        else:
            prediction = output_data

        # Binary classification threshold 0.5 (assumes a sigmoid output layer)
        predicted_label = 1 if prediction[0][0] > 0.5 else 0

        if predicted_label == labels.numpy():
            correct_predictions += 1
        total_samples += 1

        if total_samples % 100 == 0:
            print(".", end="")

    accuracy = correct_predictions / total_samples
    print(f"\nAccuracy: {accuracy:.4f}")
    return accuracy

# Evaluate Float32 vs Int8
# (Skipping Dynamic for time, as Int8 is our target)
acc_float = evaluate_tflite_model(tflite_float, val_ds)
acc_int8 = evaluate_tflite_model(tflite_int8, val_ds)

    TF 2.20. Please use the LiteRT interpreter from the ai_edge_litert package.
    See the [migration guide](https://ai.google.dev/edge/litert/migration)
    for details.
    


Evaluating vww_v2_float32.tflite...
..............
Accuracy: 0.9433
Evaluating vww_v2_int8.tflite...
..............
Accuracy: 0.9433



## 4. Final Report
Compare the file size reduction and accuracy loss.

In [5]:
size_float = os.path.getsize(tflite_float) / 1024
size_int8 = os.path.getsize(tflite_int8) / 1024

print("\n" + "="*40)
print("Model Comparison Summary")
print("="*40)
print(f"{'Model Type':<20} | {'Size (KB)':<10} | {'Accuracy':<10}")
print("-" * 46)
print(f"{'Float32':<20} | {size_float:<10.2f} | {acc_float:<10.4f}")
print(f"{'Int8 (Quantized)':<20} | {size_int8:<10.2f} | {acc_int8:<10.4f}")
print("-" * 46)
print(f"Size Reduction: {size_float / size_int8:.1f}x")
print(f"Accuracy Drop:  {(acc_float - acc_int8) * 100:.2f}%")
print("="*40)


Model Comparison Summary
Model Type           | Size (KB)  | Accuracy  
----------------------------------------------
Float32              | 1560.45    | 0.9433    
Int8 (Quantized)     | 606.96     | 0.9433    
----------------------------------------------
Size Reduction: 2.6x
Accuracy Drop:  0.00%
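With 741 validation images, one prediction is worth 1/741 ≈ 0.135 percentage points, so both models scoring 0.9433 means each classified the same number of images correctly (699). It does not mean they agree on every image: the int8 model could fix some errors and introduce others. A minimal sketch for checking this, assuming `evaluate_tflite_model` is (hypothetically) modified to also return its per-sample predicted labels; `compare_predictions` is a hypothetical helper.

```python
import numpy as np

def compare_predictions(labels, preds_float, preds_int8):
    """Summarize how two models' per-sample predictions relate to the labels and to each other."""
    labels = np.asarray(labels)
    a = np.asarray(preds_float)
    b = np.asarray(preds_int8)
    return {
        "acc_float": float(np.mean(a == labels)),
        "acc_int8": float(np.mean(b == labels)),
        "agreement": float(np.mean(a == b)),                       # fraction of identical predictions
        "int8_fixed": int(np.sum((a != labels) & (b == labels))),  # float wrong, int8 right
        "int8_broke": int(np.sum((a == labels) & (b != labels))),  # float right, int8 wrong
    }

# Toy example: equal accuracy, yet the two models disagree on half the samples
result = compare_predictions([1, 0, 1, 1], [1, 0, 0, 1], [1, 1, 1, 1])
print(result)
# -> {'acc_float': 0.75, 'acc_int8': 0.75, 'agreement': 0.5, 'int8_fixed': 1, 'int8_broke': 1}
```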


## 5. Deployment Analysis
### Model Size & MCU Compatibility
The quantized model is **~607 KB** (606.96 KB). Let's analyze its compatibility with common microcontrollers:

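*   **ESP32 (Xtensa LX6; the ESP32-S3 uses LX7):**
    *   **Flash:** Typically 4MB. **(✅ Compatible)**
    *   **RAM:** 520KB SRAM, of which noticeably less is free for the application. The model weights stay in Flash, but RAM must hold the tensor "Arena" (intermediate activations). A MobileNetV2 (alpha=0.35) at 96x96 is estimated to need on the order of 250KB-300KB of arena, which should fit; the exact figure should be measured on the target.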
*   **STM32H7 (Cortex-M7):**
    *   **Flash:** 1MB - 2MB. **(✅ Compatible)**
    *   **RAM:** ~1MB. **(✅ Compatible)**
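*   **STM32F4 (Cortex-M4):**
    *   **Flash:** 512KB - 1MB on common parts. **(⚠️ Tight)**
    *   With 512KB of Flash, this model (~607KB) **will not fit**. Shrinking the Flash footprint means reducing the number of weights, e.g. lowering `alpha` to 0.25 (Keras only ships ImageNet weights for `alpha` ≥ 0.35, so this means training from scratch). A smaller input resolution (e.g., 64x64) reduces the RAM needed for activations, but with a global-pooling head it does not change the weight count, so on its own it does not help Flash.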
*   **Arduino Nano 33 BLE Sense (Cortex-M4):**
    *   **Flash:** 1MB. **(✅ Compatible)**
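    *   **RAM:** 256KB. **(⚠️ Tight)** An arena in the 250KB-300KB range estimated above would leave little or no room for the application, so the arena should be measured on the device; a smaller input resolution or `alpha` may be needed.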
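The compatibility reasoning above boils down to two inequalities: the model must fit in Flash alongside the firmware, and the tensor arena must fit in RAM alongside everything else. A minimal sketch with hypothetical names; the board figures are the ones quoted in the list, and the 300KB arena is the upper end of the rough estimate, not a measured value. On the device, TFLM's `MicroInterpreter` can report the arena actually consumed after `AllocateTensors()` via `arena_used_bytes()`, which is the number worth plugging in.

```python
# Hypothetical helper. Board figures are the ones quoted in the list above;
# usable Flash/RAM is lower once firmware, stacks and drivers are counted,
# which the *_reserve_kb arguments are meant to model.

def check_fit(model_kb, arena_kb, flash_kb, ram_kb=None,
              flash_reserve_kb=0.0, ram_reserve_kb=0.0):
    """Return (fits_flash, fits_ram); fits_ram is None when the RAM size is unknown."""
    fits_flash = model_kb + flash_reserve_kb <= flash_kb
    fits_ram = None if ram_kb is None else arena_kb + ram_reserve_kb <= ram_kb
    return fits_flash, fits_ram


BOARDS_KB = {
    # name: (flash_kb, ram_kb); None = not stated in the list above
    "ESP32": (4096, 520),
    "STM32H7": (1024, 1024),
    "STM32F4 (512KB Flash)": (512, None),
    "Arduino Nano 33 BLE Sense": (1024, 256),
}

MODEL_KB = 606.96  # size of vww_v2_int8.tflite from the report above
ARENA_KB = 300     # assumption: upper end of the rough arena estimate, not measured

for name, (flash_kb, ram_kb) in BOARDS_KB.items():
    fits_flash, fits_ram = check_fit(MODEL_KB, ARENA_KB, flash_kb, ram_kb)
    print(f"{name:<28} flash ok: {fits_flash!s:<5}  ram ok: {fits_ram}")
```

With `ARENA_KB = 300` the Nano 33 BLE Sense fails the RAM check, while at 250 it passes with only 6KB to spare, which is why it is marked as tight.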

### C++ Code Generation
To deploy this model to a microcontroller using **TensorFlow Lite for Microcontrollers (TFLM)**, we need to embed the `.tflite` file in the firmware as a C byte array.
The standard `xxd -i` tool does this; the cell below is a pure-Python equivalent that writes a header (`.h`) with the declarations and a source file (`.cpp`) with the array itself.

In [None]:
import os

def convert_to_c_array(tflite_file, output_header_name, bytes_per_line=12):
    with open(tflite_file, 'rb') as f:
        data = f.read()

    # Wrap the bytes over many lines instead of emitting one multi-megabyte line
    lines = []
    for i in range(0, len(data), bytes_per_line):
        chunk = data[i:i + bytes_per_line]
        lines.append('  ' + ', '.join(f'0x{byte:02x}' for byte in chunk))
    hex_data = ',\n'.join(lines)

    header_code = """
#ifndef VWW_MODEL_H_
#define VWW_MODEL_H_

extern const unsigned char g_vww_model_data[];
extern const unsigned int g_vww_model_data_len;

#endif // VWW_MODEL_H_
"""

    # alignas(16) matches the TFLM examples; unaligned model data can fault on some cores
    c_source = f"""
#include "{os.path.basename(output_header_name)}"

alignas(16) const unsigned char g_vww_model_data[] = {{
{hex_data}
}};
const unsigned int g_vww_model_data_len = {len(data)};
"""

    with open(output_header_name, 'w') as f:
        f.write(header_code)

    # Swap only the extension (str.replace('.h', ...) would also hit '.h' elsewhere in the path)
    source_filename = os.path.splitext(output_header_name)[0] + '.cpp'
    with open(source_filename, 'w') as f:
        f.write(c_source)

    print(f"✅ Generated C++ files: {output_header_name}, {source_filename}")

    # Show the start of the generated source as a preview
    print("\n--- Sample of generated C++ code ---")
    print(c_source[:500] + "... (truncated)")

# Generate C++ code for the Int8 model
convert_to_c_array(tflite_int8, 'vww_model_data.h')
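Before flashing, it is cheap to confirm that the generated array round-trips to the exact `.tflite` bytes. A minimal sketch with hypothetical helper names; it assumes the generated source has the layout written by `convert_to_c_array` (a single brace-delimited array followed by a `..._len = N;` line), and relies on TFLite flatbuffers carrying the file identifier `TFL3` at byte offset 4.

```python
import re

def parse_c_array(c_source):
    """Recover (bytes, declared_len) from a generated model source file."""
    # Array contents sit between the first '{' and the next '}' (the data has no braces)
    body = c_source.split("{", 1)[1].split("}", 1)[0]
    data = bytes(int(tok, 16) for tok in re.findall(r"0x[0-9a-fA-F]{2}", body))
    match = re.search(r"_len\s*=\s*(\d+)\s*;", c_source)
    declared_len = int(match.group(1)) if match else None
    return data, declared_len

def looks_like_tflite(data):
    """TFLite flatbuffers carry the file identifier b'TFL3' at byte offset 4."""
    return len(data) >= 8 and data[4:8] == b"TFL3"

def verify_c_array(tflite_path, cpp_path):
    """True if the .cpp array matches the .tflite file byte for byte."""
    with open(tflite_path, "rb") as f:
        original = f.read()
    with open(cpp_path) as f:
        data, declared_len = parse_c_array(f.read())
    return data == original and declared_len == len(original) and looks_like_tflite(original)
```

Usage: `verify_c_array('vww_v2_int8.tflite', 'vww_model_data.cpp')` should return `True` after the cell above has run.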