# Quantization-Aware Training (QAT)

This Jupyter Notebook provides a comprehensive walkthrough of **Quantization-Aware Training (QAT)**. Unlike Post-Training Quantization (PTQ), where a fully-trained model is converted to a lower precision format, QAT simulates the effects of this lower precision *during* the training process itself. This allows the model's weights and biases to adapt to the constraints of quantization, which often results in higher accuracy for the final, compressed model.

The key steps in this notebook are:
1.  **Data Loading**: We will load the training and testing datasets from Weights & Biases (W&B) artifacts.
2.  **Model Configuration**: We will define the architecture and hyperparameters for a Multilayer Perceptron model.
3.  **Quantization-Aware Training**: We will train the model from scratch while applying quantization-aware techniques.
4.  **Hyperparameter Sweep**: We will perform an automated search across various configurations to identify the optimal model, tracking every experiment with W&B.
5.  **Artifact Logging**: The resulting models will be saved as a C++ header (`.h`) and a JSON file, then logged as versioned artifacts to W&B for deployment and analysis.

## 1. Library Imports

We begin by importing the necessary Python libraries. These packages provide the core functionalities for data handling, model building, and experiment tracking.

* `wandb`: The primary tool for logging experiments, managing configurations, and versioning artifacts.
* `os`: Used for interacting with the file system, particularly for creating file paths.
* `pandas` & `numpy`: Essential for data manipulation and high-performance numerical computation.
* `warnings`: Used to suppress warning messages so that the notebook output stays readable.
* `tensorflores.models.multilayer_perceptron`: Our custom implementation of a Multilayer Perceptron that supports QAT.
* `tensorflores.utils.clustering`: Contains the clustering algorithms used for the quantization process.
* `itertools`: A Python module for creating iterators for efficient looping, used here to generate hyperparameter combinations.
* `time`: To measure the duration of the training processes.

In [None]:
import wandb
import os
import pandas as pd
import numpy as np
import itertools
import time
import warnings

# Suppress warnings for a cleaner output
warnings.filterwarnings("ignore")

from tensorflores.models.multilayer_perceptron import MultilayerPerceptron
from tensorflores.utils.clustering import ClusteringMethods

# To run this notebook, you need a W&B account and an API key.
# Create a file named my_key.py containing the line: WANDB_KEY = 'your_api_key_here'
# Keep that file out of version control; the import below reads the key from it.
from my_key import WANDB_KEY

## 2. Weights & Biases Initialization

To ensure every part of our training pipeline is reproducible and tracked, we connect to the Weights & Biases platform.

### 2.1. Authentication

First, we log in using a W&B API key.

**Security Best Practice**: It is highly recommended to manage API keys using environment variables (`WANDB_API_KEY`) or the W&B CLI (`wandb login`) rather than hardcoding them in source files.
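
As a minimal sketch of the environment-variable approach, the helper below reads `WANDB_API_KEY` and reports whether it is set. The function name `resolve_wandb_key` is hypothetical and not part of W&B; the actual login call is left as a comment so the snippet runs without network access.

```python
import os


def resolve_wandb_key(environ=os.environ):
    """Return the key stored in WANDB_API_KEY, or None if it is unset or blank."""
    key = environ.get("WANDB_API_KEY", "").strip()
    return key or None


key = resolve_wandb_key()
if key is None:
    print("WANDB_API_KEY is not set; export it or run `wandb login` first.")
else:
    print("Found an API key in the environment.")
    # wandb.login(key=key)  # wandb.login() with no arguments also reads this variable
```

With the variable exported before the notebook starts, no key ever needs to appear in a source file.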

In [None]:
wandb.login(key = WANDB_KEY)

### 2.2. Initial Run for Data Loading

We start a preliminary run with the `job_type` set to `data-loading`. This helps organize our project by separating the data preparation stage from model training.

In [None]:
run =  wandb.init(project = "SBAI 2025", job_type = "data-loading", save_code=True)

## 3. Data Loading and Preparation

Because the model's performance depends on the data it is trained on, we download the versioned training and test datasets directly from W&B artifacts, so each run records which data it used.

### 3.1. Download Datasets

We use `run.use_artifact` to specify the desired dataset versions and `artifact.download()` to retrieve them locally.
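
The download cells in this section take the first entry returned by `os.listdir`, which works as long as the artifact contains a single file. A slightly more defensive variant, sketched here with a hypothetical helper `find_single_csv`, looks for exactly one `.csv` file and fails loudly otherwise. The temporary directory only stands in for the path returned by `artifact.download()`.

```python
import tempfile
from pathlib import Path


def find_single_csv(directory):
    """Return the only .csv file in `directory`; raise if there are zero or several."""
    csv_files = sorted(Path(directory).glob("*.csv"))
    if len(csv_files) != 1:
        raise FileNotFoundError(
            f"Expected exactly one CSV in {directory}, found {len(csv_files)}"
        )
    return csv_files[0]


# Demonstration on a throwaway directory that mimics a downloaded artifact
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "train.csv").write_text("a,b\n1,2\n")
    (Path(tmp) / "notes.txt").write_text("not a dataset")
    found = find_single_csv(tmp).name
    print("Selected file:", found)
```

In the notebook this would be used as `pd.read_csv(find_single_csv(path))`.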

In [None]:
# Download the training dataset
artifact = run.use_artifact(artifact_or_name = "thommasflores-ufrn/SBAI 2025/train_dataset:latest")
path = artifact.download()

# List files to identify the CSV
print("Files in the downloaded directory:", os.listdir(path))
csv_file_path_train = os.path.join(path, os.listdir(path)[0])
df_train = pd.read_csv(csv_file_path_train)
df_train

In [None]:
# Download the test dataset
artifact = run.use_artifact(artifact_or_name = "thommasflores-ufrn/SBAI 2025/test_dataset:latest")
path = artifact.download()
csv_file_path_test = os.path.join(path, os.listdir(path)[0])
df_test = pd.read_csv(csv_file_path_test)
df_test

### 3.2. Prepare Data for Training

We process the data by separating it into features (inputs) and the target variable, and then converting the pandas DataFrames into NumPy arrays, which is the required format for our model.

In [None]:
# Define input and target columns
target = ['CO2 (g/s) [estimated maf]']
# Named input_features to avoid shadowing Python's built-in input()
input_features = ['intake_pressure', 'intake_temperature', 'rpm', 'speed']

# Separate features and targets
X_train = df_train[input_features]
y_train = df_train[target]
X_test = df_test[input_features]
y_test = df_test[target]

# Convert to NumPy arrays
X_value_train = X_train.values
y_value_train = y_train.values
X_value_test = X_test.values
y_value_test = y_test.values

print('Train input shape: ', X_value_train.shape)
print('Train output shape: ', y_value_train.shape)
print('Test input shape: ', X_value_test.shape)
print('Test output shape: ', y_value_test.shape)

## 4. Single Quantization-Aware Training (QAT) Run

Before performing a large-scale hyperparameter sweep, it is instructive to walk through a single, complete QAT cycle. This helps in understanding the components involved in the process.

### 4.1. Define Quantization and Model Parameters

In QAT, the quantization method is a key part of the training configuration. We use clustering algorithms to quantize the weights and biases.

* **Clustering Method**: We choose from several options to group the model's parameters. Here, we select `autocloud`, a density-based algorithm. Other choices include `meanshift`, `affinity_propagation`, and `dbstream`.
* **Distance Metric**: This metric is used by the clustering algorithm to measure the similarity between parameter values. We will use the `euclidean` distance. Other common choices include `manhattan` and `cosine`.
* **Model Configuration**: All hyperparameters are defined in a `config` dictionary. This includes network architecture (`hidden_layer_sizes`, `activation_functions`), training parameters (`learning_rate`, `epochs`), and QAT settings (`training_with_quantization`, `epochs_quantization`). Logging this dictionary to W&B ensures complete reproducibility.
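
To make the idea of clustering-based quantization concrete, here is a deliberately simple sketch. It is not the `autocloud` or `meanshift` algorithm from `tensorflores`; it is a greedy one-dimensional threshold clustering chosen only for illustration, and the function names and weight values are made up. The point it shows is that many distinct parameter values collapse onto a small set of shared centroids.

```python
from statistics import fmean


def cluster_1d(values, threshold):
    """Greedy 1-D clustering over a non-empty list of floats.

    Walk the sorted values and open a new cluster whenever a value lies more
    than `threshold` above the smallest member of the current cluster.
    Returns the cluster centroids (means).
    """
    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for v in ordered[1:]:
        if v - clusters[-1][0] <= threshold:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return [fmean(c) for c in clusters]


def quantize(values, centroids):
    """Replace each value with its nearest centroid.

    In one dimension the absolute difference equals the Euclidean distance.
    """
    return [min(centroids, key=lambda c: abs(c - v)) for v in values]


weights = [0.10, 0.12, 0.11, -0.50, -0.52, 0.90]  # illustrative values only
centroids = cluster_1d(weights, threshold=0.05)
quantized = quantize(weights, centroids)
print(len(set(weights)), "distinct values ->", len(centroids), "centroids")
print(quantized)
```

After quantization, a model only needs to store the centroid table plus one small index per parameter, which is where the compression comes from.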

In [None]:
# Define the Clustering Method
Clustering_method = ClusteringMethods()

bias_clustering_method = Clustering_method.autocloud_biases(threshold_biases = 1.4148)
weight_clustering_method = Clustering_method.autocloud_weight(threshold_weights = 1.4148)

In [None]:
# Define the full configuration for a single run
config = {
    'input_size': 4,
    'output_size': 1,
    'hidden_layer_sizes': [16, 8],
    'activation_functions': ['relu', 'relu', 'linear'],
    'weight_bias_init': 'RandomNormal',
    'training_with_quantization': True,
    'epochs': 100,
    'epochs_quantization': 100,
    'learning_rate': 0.001,
    'loss_function': 'mean_squared_error',
    'optimizer': 'adamax',
    'batch_size': 36,
    'validation_split': 0.2,
    'distance_metric': "euclidean",
}

# Close the data-loading run so that the training run starts cleanly
run.finish()

# Initialize the W&B run with the specified config
wandb.init(project="SBAI 2025", job_type="training-model-QAT", config=config, save_code=True)

### 4.2. Model Instantiation and Training

We create an instance of our `MultilayerPerceptron` and call the `.train()` method. The `training_with_quantization=True` flag enables the QAT process. The model first trains normally for `epochs` and then fine-tunes the quantized parameters for `epochs_quantization`, allowing it to recover accuracy lost during the initial quantization step.
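
The internals of `MultilayerPerceptron.train` are not shown in this notebook, so the following is only a generic illustration of a two-phase schedule, not the library's implementation. It assumes a linear model, a fixed hypothetical codebook, and the common straight-through trick: in the second phase the loss and gradient are computed with the snapped weights, while the update is applied to the underlying full-precision weights.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([0.6, -0.9, 0.2])      # illustrative target weights
y = X @ true_w

codebook = np.linspace(-1.0, 1.0, 9)     # hypothetical fixed codebook, step 0.25


def snap(w):
    """Map each weight to its nearest codebook entry."""
    return codebook[np.abs(w[:, None] - codebook[None, :]).argmin(axis=1)]


def mse(w):
    return float(np.mean((X @ w - y) ** 2))


def grad(w):
    """Gradient of the MSE with respect to the weights."""
    return 2.0 * X.T @ (X @ w - y) / len(y)


w = np.zeros(3)
initial_loss = mse(w)
lr = 0.1

# Phase 1: ordinary full-precision training
for _ in range(100):
    w -= lr * grad(w)

# Phase 2: the gradient is evaluated at the snapped weights,
# but the update is applied to the full-precision weights
for _ in range(100):
    w -= lr * grad(snap(w))

w_q = snap(w)
print("initial loss:          ", initial_loss)
print("loss with snapped w_q: ", mse(w_q))
print("quantized weights:     ", w_q)
```

The quantized weights take values only from the codebook, yet the loss stays far below its starting value because training has already seen the effect of snapping.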

In [None]:
# Instantiate the model
nn = MultilayerPerceptron(input_size=config['input_size'],
                          output_size=config['output_size'],
                          hidden_layer_sizes=config['hidden_layer_sizes'],
                          activation_functions=config['activation_functions'],
                          weight_bias_init=config['weight_bias_init'],
                          training_with_quantization=config['training_with_quantization'])

# Train the model with QAT
nn.train(X=X_value_train, 
         y=y_value_train,
         epochs=config['epochs'], 
         epochs_quantization = config['epochs_quantization'],
         learning_rate=config['learning_rate'],
         loss_function=config['loss_function'], 
         optimizer=config['optimizer'], 
         batch_size=config['batch_size'], 
         validation_split=config['validation_split'],
         distance_metric = config['distance_metric'],
         bias_clustering_method = bias_clustering_method,
         weight_clustering_method = weight_clustering_method,         
         )

### 4.3. Evaluation and Logging

After training, we evaluate the model on the unseen test set by calculating the Mean Squared Error (MSE), the mean error (a measure of bias), and the Mean Absolute Error (MAE). These results are logged to W&B to track the performance of this specific configuration.
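
One detail worth checking when computing these metrics is array shape. This notebook does not state whether `nn.predict` returns shape `(n,)` or `(n, 1)`; if it were `(n,)`, subtracting the `(n, 1)` target array would silently broadcast to an `(n, n)` matrix and distort every metric. The sketch below, with a hypothetical helper `regression_errors` and made-up numbers, flattens both arrays first. It assumes a single-output model, as in this notebook.

```python
import numpy as np


def regression_errors(pred, y_true):
    """Compute MSE, mean error and MAE after flattening both arrays to 1-D."""
    pred = np.asarray(pred, dtype=float).reshape(-1)
    y_true = np.asarray(y_true, dtype=float).reshape(-1)
    if pred.shape != y_true.shape:
        raise ValueError(f"Shape mismatch: {pred.shape} vs {y_true.shape}")
    error = pred - y_true
    return {
        "mse": float(np.mean(error ** 2)),
        "mean_error": float(np.mean(error)),
        "mae": float(np.mean(np.abs(error))),
    }


pred = np.array([1.0, 2.0, 3.0])            # shape (3,)
y_true = np.array([[1.0], [2.0], [5.0]])    # shape (3, 1), like df[target].values

naive_shape = (pred - y_true).shape         # broadcasting produces a square matrix
metrics = regression_errors(pred, y_true)
print("naive subtraction shape:", naive_shape)
print(metrics)
```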

In [None]:
# Make predictions and calculate the error
pred = nn.predict(X_value_test)
error = pred - y_value_test
mse = np.mean(np.square(error))

# Log metrics to wandb and finish the run
wandb.log({
    'mse_test': mse,
    'mean_error': np.mean(error),
    'mean_absolute_error': np.mean(np.abs(error)),
})
wandb.finish()

## 5. Hyperparameter Sweep for Optimal QAT Model

While a single run is useful for understanding the process, finding the best model requires exploring multiple hyperparameter combinations. A hyperparameter sweep (or grid search) automates this exploration.

### 5.1. Define the Hyperparameter Search Space

We create lists of possible values for each hyperparameter we want to test. This includes network architectures, activation functions, learning rates, and quantization settings. The `itertools.product` function is then used to generate a list of all unique combinations from these lists.
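
Before launching a sweep it helps to know how many runs the grid implies. The sketch below reproduces the sizes of the lists used in this section, with plain strings standing in for the clustering method objects, and confirms that the Cartesian product matches the product of the list lengths. The dictionary layout and the name `combos` are illustrative choices, not part of the notebook's code.

```python
import itertools
import math

search_space = {
    "activation_functions": [
        ["relu", "relu", "linear"],
        ["tanh", "relu", "linear"],
        ["sigmoid", "sigmoid", "linear"],
    ],
    "hidden_layer_sizes": [[16, 8], [32, 16], [64, 32]],
    "learning_rate": [0.01, 0.001],
    "bias_clustering": ["autocloud", "meanshift"],    # stand-ins for method objects
    "weight_clustering": ["autocloud", "meanshift"],  # stand-ins for method objects
    "distance_metric": ["euclidean", "minkowski"],
    "epochs_quantization": [100, 60],
}

# Size of the grid without materialising it: 3 * 3 * 2 * 2 * 2 * 2 * 2
expected_total = math.prod(len(options) for options in search_space.values())

# Materialise the grid and turn the first combination into a readable dict
combos = list(itertools.product(*search_space.values()))
first = dict(zip(search_space, combos[0]))

print("Expected:", expected_total, "Generated:", len(combos))
print(first)
```

Since every combination is a full training run, the grid size grows multiplicatively with each added option, so it is worth trimming the lists before starting.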

In [None]:
# Define the sets of hyperparameters to explore
activation_sets = [
    ['relu', 'relu', 'linear'],
    ['tanh', 'relu', 'linear'],
    ['sigmoid', 'sigmoid', 'linear']
]

hidden_layer_sets = [
    [16, 8],
    [32, 16],
    [64, 32]
]

learning_rates = [0.01, 0.001]

bias_clustering_methods = [ 
    Clustering_method.autocloud_biases(threshold_biases=1.4148),
    Clustering_method.meanshift_biases(bandwidth_biases=0.005)
]

weight_clustering_methods = [ 
    Clustering_method.autocloud_weight(threshold_weights=1.4148),
    Clustering_method.meanshift_weight(bandwidth_weights=0.005)
]

distance_metrics = ["euclidean", "minkowski"]
epochs_quantization = [100, 60]

# Create all possible combinations
combinations = list(itertools.product(
    activation_sets,
    hidden_layer_sets,
    learning_rates,
    bias_clustering_methods,
    weight_clustering_methods,
    distance_metrics,
    epochs_quantization
))

print(f"Total combinations to be tested: {len(combinations)}")

### 5.2. Executing the Sweep

We now loop through every combination generated. For each one, the script will:
1.  Define the `config` dictionary for the specific trial.
2.  Initialize a new, separate W&B run to track the trial.
3.  Instantiate and train the model with the given configuration.
4.  Measure the training time.
5.  Evaluate the final model's performance on both train and test sets.
6.  Save the quantized model as deployable `tensorflores_QAT.h` (C++) and `tensorflores_QAT.json` files.
7.  Log these files as versioned artifacts to the W&B run.
8.  Log all performance metrics (MSE, MAE, training time) to W&B.
9.  Finalize the run.

This systematic approach allows us to use the W&B dashboard to easily compare all configurations and identify the one that provides the best balance of accuracy and efficiency.
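
For the timing step, one possible pattern is a small context manager built on `time.perf_counter`, which is a monotonic clock intended for measuring intervals. The name `stopwatch` is hypothetical, and `time.sleep` merely stands in for the call to `nn.train(...)`.

```python
import time
from contextlib import contextmanager


@contextmanager
def stopwatch(results, key):
    """Store the elapsed wall-clock seconds of the enclosed block in results[key]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[key] = time.perf_counter() - start


timings = {}
with stopwatch(timings, "train_time"):
    time.sleep(0.05)  # stand-in for nn.train(...)

print(f"train_time: {timings['train_time']:.3f} s")
```

The resulting dictionary can be merged into the metrics passed to `wandb.log`, and the duration is recorded even if the enclosed block raises an exception.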

In [None]:
for act_funcs, hidden_layers, lr, bias_method, weight_method, dist_metric, quant_epochs in combinations:
    # Define the configuration for the current experiment
    config = {
        'input_size': 4,
        'output_size': 1,
        'hidden_layer_sizes': hidden_layers,
        'activation_functions': act_funcs,
        'weight_bias_init': 'RandomNormal',
        'training_with_quantization': True,
        'epochs': 100,
        'learning_rate': lr,
        'loss_function': 'mean_squared_error',
        'optimizer': 'adamax',
        'batch_size': 36,
        'validation_split': 0.2,
        'bias_clustering_method': bias_method,
        'weight_clustering_method': weight_method,
        'distance_metric': dist_metric,
        'epochs_quantization': quant_epochs
    }

    # Initialize a new W&B run for this combination
    wandb.init(project="SBAI 2025", job_type="training-model-QAT-sweep", config=config, save_code=True)

    # Instantiate and train the model
    nn = MultilayerPerceptron(
        input_size=config['input_size'],
        output_size=config['output_size'],
        hidden_layer_sizes=config['hidden_layer_sizes'],
        activation_functions=config['activation_functions'],
        weight_bias_init=config['weight_bias_init'],
        training_with_quantization=config['training_with_quantization'],
    )

    start_time = time.time()
    nn.train(
        X=X_value_train,
        y=y_value_train,
        epochs=config['epochs'],
        learning_rate=config['learning_rate'],
        loss_function=config['loss_function'],
        optimizer=config['optimizer'],
        batch_size=config['batch_size'],
        validation_split=config['validation_split'],
        bias_clustering_method=config['bias_clustering_method'],
        weight_clustering_method=config['weight_clustering_method'],
        distance_metric=config['distance_metric'],
        epochs_quantization=config['epochs_quantization']
    )
    train_time = time.time() - start_time

    # Evaluate the trained model
    def evaluate(model, X, y):
        pred = model.predict(X)
        error = pred - y
        return np.mean(np.square(error)), np.mean(np.abs(error))

    mse_train, mae_train = evaluate(nn, X_value_train, y_value_train)
    mse_test, mae_test = evaluate(nn, X_value_test, y_value_test)

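    # Save the model in C++ and JSON formats
    # (create the output directories first, in case the save methods do not)
    os.makedirs('./cpp_models', exist_ok=True)
    os.makedirs('./json_models', exist_ok=True)
    nn.save_model_as_cpp('./cpp_models/tensorflores_QAT')
    nn.save_model_as_json('./json_models/tensorflores_QAT')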

    # Upload the saved models as W&B artifacts
    cpp_artifact = wandb.Artifact("cpp_QAT", type="model_QAT")
    cpp_artifact.add_file('./cpp_models/tensorflores_QAT.h')
    wandb.log_artifact(cpp_artifact)

    json_artifact = wandb.Artifact("json_QAT", type="model_QAT")
    json_artifact.add_file('./json_models/tensorflores_QAT.json')
    wandb.log_artifact(json_artifact)

    # Log performance metrics and training time
    wandb.log({
        'train_time': train_time,
        'mse_train': mse_train,
        'mae_train': mae_train,
        'mse_test': mse_test,
        'mae_test': mae_test
    })

    # Finish the W&B run for this combination
    wandb.finish()

Epoch 56/100, Loss: 1.6206464560251592, Val Loss: 3.961390921629187, Bias Clusters: 67, Weight Clusters: 51
