# Task B, Part 2: INT8 Quantization, and Task C: Power-of-Two Quantization

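This notebook takes the recurrent neural network (LSTM) trained in Task B, Part 1 to classify spoken digits (0–9) from audio recordings, applies post-training static INT8 quantization to it, and then rounds the weight scale factors to powers of two.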

- Dataset: [Free Spoken Digit Dataset (FSDD)](https://github.com/Jakobovski/free-spoken-digit-dataset)
- Framework: PyTorch
- Architecture: RNN with LSTM layers

In [1]:
import sys
import os

# Add the project root (parent of current folder) to Python path
project_root_dir = os.path.abspath(os.path.join(os.getcwd(), '..'))
sys.path.append(project_root_dir)


## Load Model Configuration from YAML

To keep the pipeline configurable and modular, model parameters such as the number of LSTM layers, the hidden size, and the learning rate are stored in a YAML file. The same structure can be adapted quickly to the related Tasks B and C by editing the configuration.

This section loads the model configuration using a custom utility function.

In [2]:
import src.utils as utils

In [3]:
utils.set_seed(42)

[INFO] Random seed set to: 42


In [4]:
import yaml
import json

model_config_path = os.path.join(project_root_dir, 'config', 'model_config.yaml')
model_config = utils.read_yaml_file(model_config_path)
# print(json.dumps(model_config, indent=2))
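
The cells in this notebook read a handful of nested keys from `model_config`. The sketch below shows the shape those lookups assume, written as a plain Python dict. Only the key names, `input_dim`, `num_layers`, and `output_dim` come from this notebook; the path, seed, and `hidden_dim` values are placeholders, and `require_keys` is a hypothetical helper, not part of `src.utils`.

```python
# Hypothetical config layout; only the key names are taken from this notebook.
example_config = {
    "dataset": {"path": "data/recordings"},   # placeholder path
    "data_splitting": {"test_size": 0.2},     # consistent with the 2400/600 split reported later
    "experiment": {"seed": 42},               # placeholder value
    "model": {
        "input_dim": 13,    # matches LSTM(13, ...) in the model printout
        "hidden_dim": 64,   # placeholder; this notebook overrides it to 24
        "num_layers": 2,
        "output_dim": 10,   # one class per digit
    },
}

REQUIRED = [
    ("dataset", "path"),
    ("data_splitting", "test_size"),
    ("experiment", "seed"),
    ("model", "input_dim"),
    ("model", "hidden_dim"),
    ("model", "num_layers"),
    ("model", "output_dim"),
]

def require_keys(config, required):
    """Raise KeyError early if a (section, key) pair is missing."""
    for section, key in required:
        if key not in config.get(section, {}):
            raise KeyError(f"missing config entry: {section}.{key}")

require_keys(example_config, REQUIRED)
```

Checking the keys once, right after loading, gives a clearer error than a `KeyError` raised several cells later.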

## Load and Split Dataset for Training and Evaluation

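In this section, we load the recordings from disk, generate data-label pairs, and split them into training and test sets according to the `test_size` defined in the YAML file. The loader also returns 50 calibration samples per class, which are used later to calibrate the quantized model.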

Using `test_size` and `seed` from the YAML config ensures that experiments are reproducible and easily tunable for other tasks by simply updating the configuration.
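
To illustrate why a fixed seed makes the split reproducible, here is a minimal standard-library sketch. It is not scikit-learn's algorithm, and `seeded_split` and the file names are hypothetical; it only shows that the same seed and `test_size` always yield the same partition.

```python
import random

def seeded_split(items, test_size, seed):
    """Shuffle a copy of items with a private RNG, then cut off the test share."""
    rng = random.Random(seed)          # private generator: global state is untouched
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical (file name, label) pairs standing in for the real recordings.
pairs = [(f"recording_{i}.wav", i % 10) for i in range(3000)]

train_a, test_a = seeded_split(pairs, test_size=0.2, seed=42)
train_b, test_b = seeded_split(pairs, test_size=0.2, seed=42)

print(len(train_a), len(test_a))                  # 2400 600
print(train_a == train_b and test_a == test_b)    # True: same seed, same split
```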


In [5]:
data_path = model_config['dataset']['path']
test_data_size = model_config['data_splitting']['test_size']
seed = model_config['experiment']['seed']

In [6]:
data_label_pairs, calibration_samples = utils.prepare_data_label_pairs(data_path, calibration_samples_per_class=50)

In [7]:
print(len(calibration_samples))

500


In [8]:
from sklearn.model_selection import train_test_split
train_data, test_data = train_test_split(data_label_pairs, test_size=test_data_size, random_state=seed)

## Transform Raw Data into PyTorch Dataset Objects

The `AudioFeaturesDataset` class converts raw data-label pairs into PyTorch-compatible datasets that provide easy access to samples and labels.

AudioFeaturesDataset is a custom dataset class that:

- Loads audio recordings of spoken digits along with their labels.
- Optionally cleans the audio by filtering out noise.
- Extracts MFCC features (a common speech feature).
- Pads or trims these features to a fixed length so all inputs have the same shape.
- Works with PyTorch to provide samples one-by-one when training or testing a model.
- Prepares the audio data in the format the network expects for training and evaluation.
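
The pad-or-trim step in the list above can be sketched in plain Python. This assumes zero-padding at the end of the sequence and a target length of 32 frames, both of which are illustrative choices; the actual behavior of `AudioFeaturesDataset` is not shown in this notebook. The feature count of 13 matches the model's input size.

```python
def pad_or_trim(frames, target_len, n_features):
    """Return exactly target_len frames: cut extras or append zero frames."""
    if len(frames) >= target_len:
        return [list(f) for f in frames[:target_len]]
    padding = [[0.0] * n_features for _ in range(target_len - len(frames))]
    return [list(f) for f in frames] + padding

N_FEATURES = 13   # MFCC coefficients per frame
TARGET_LEN = 32   # assumed fixed length, for illustration only

short_clip = [[1.0] * N_FEATURES for _ in range(20)]   # 20 frames: gets padded
long_clip = [[1.0] * N_FEATURES for _ in range(50)]    # 50 frames: gets trimmed

padded = pad_or_trim(short_clip, TARGET_LEN, N_FEATURES)
trimmed = pad_or_trim(long_clip, TARGET_LEN, N_FEATURES)
print(len(padded), len(trimmed))   # 32 32
```

With every sample brought to the same shape, a batch can be stacked into a single tensor without a custom collate function.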


In [9]:
from src.data_preprocessor import AudioFeaturesDataset

train_dataset = AudioFeaturesDataset(train_data)
test_dataset = AudioFeaturesDataset(test_data)

In [10]:
print(f"Train size: {len(train_dataset)}")
print(f"Test size: {len(test_dataset)}")


Train size: 2400
Test size: 600


## Create DataLoaders for Batch Processing

Using PyTorch DataLoaders, we enable efficient loading, batching, and shuffling of data during training and evaluation.
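
As a rough picture of what batching and shuffling mean here, the sketch below yields index batches the way a loader with `batch_size=32` would. `iterate_batches` is a hypothetical stand-in, not the `DataLoader` implementation; it only reproduces the batch counts for the dataset sizes printed above.

```python
import random

def iterate_batches(n_samples, batch_size, shuffle, seed=0):
    """Yield lists of sample indices, optionally in shuffled order."""
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, n_samples, batch_size):
        yield indices[start:start + batch_size]

train_batches = list(iterate_batches(2400, 32, shuffle=True))
test_batches = list(iterate_batches(600, 32, shuffle=False))

print(len(train_batches))      # 75 full batches
print(len(test_batches))       # 19 batches
print(len(test_batches[-1]))   # 24: the last test batch is smaller
```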

In [11]:
import torch

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=32, shuffle=False)

In [12]:
input_dim = model_config['model']['input_dim']
hidden_dim = model_config['model']['hidden_dim']
num_layers = model_config['model']['num_layers']
output_dim = model_config['model']['output_dim']

In [13]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

## Static Quantization

In [14]:
import src.model as model

hidden_dim = 24  # overrides the YAML value to match the Task B Part 1 checkpoint loaded below
quant_model = model.LSTMClassifier(input_dim=input_dim,
                       hidden_dim=hidden_dim,
                       num_layers=num_layers,
                       output_dim=output_dim).to(device)

In [15]:
memory_constraint_model_load_path = os.path.join(project_root_dir, 'outputs', 'models', 'task-b-part-1_weights.pth')
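# map_location keeps loading robust when the checkpoint was saved on a different device
quant_model.load_state_dict(torch.load(memory_constraint_model_load_path, map_location=device))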

<All keys matched successfully>

In [16]:
quant_model.eval()

LSTMClassifier(
  (lstm): LSTM(13, 24, num_layers=2, batch_first=True, dropout=0.3)
  (dropout): Dropout(p=0.3, inplace=False)
  (fc): Linear(in_features=24, out_features=10, bias=True)
)

In [17]:
import src.quantize as quantize
static_quant_model = quantize.StaticQuantizableModel(quant_model)

In [18]:
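# get_default_qconfig('x86') targets x86 CPUs; for ARM edge devices a backend such as 'qnnpack' would be the usual choice.
static_quant_model.qconfig = torch.quantization.get_default_qconfig('x86')
# torch.backends.quantized.engine = 'fbgemm'

from torch.ao.quantization import MinMaxObserver, PerChannelMinMaxObserver, QConfig, MovingAverageMinMaxObserver, MovingAveragePerChannelMinMaxObserver
# NOTE: this custom qconfig is defined for reference only. It is never assigned to the model,
# so prepare() below uses the default 'x86' qconfig set above.
# A per-tensor qscheme needs the per-tensor observer (MovingAverageMinMaxObserver).
qconfig = QConfig(activation=MovingAverageMinMaxObserver.with_args(qscheme=torch.per_tensor_affine, dtype=torch.quint8),
       weight=PerChannelMinMaxObserver.with_args(dtype=torch.qint8, qscheme=torch.per_channel_symmetric))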

static_quant_model_prepared = torch.quantization.prepare(static_quant_model)
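
Static quantization maps floating-point values onto an 8-bit integer grid using a scale and a zero point derived from an observed value range. The sketch below shows the general affine (asymmetric) scheme for `quint8` in plain Python. It is an illustration of the idea under simplifying assumptions, not PyTorch's exact observer arithmetic, and the function names are hypothetical.

```python
def affine_qparams(x_min, x_max, qmin=0, qmax=255):
    """Scale and zero point mapping [x_min, x_max] (widened to include 0) to [qmin, qmax]."""
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:
        return 1.0, qmin                      # degenerate range: all zeros
    zero_point = int(round(qmin - x_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Float -> integer code, clamped to the representable range."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Integer code -> approximate float."""
    return (q - zero_point) * scale

scale, zp = affine_qparams(-1.0, 3.0)         # range as an observer might record it
for x in (-1.0, -0.3, 0.0, 1.234, 3.0):
    q = quantize(x, scale, zp)
    print(f"{x:+.3f} -> q={q:3d} -> {dequantize(q, scale, zp):+.5f}")
```

For values inside the observed range, the round-trip error stays within half a quantization step, and zero is represented exactly.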



In [19]:
calibration_sampleset = AudioFeaturesDataset(calibration_samples)
calibration_loader = torch.utils.data.DataLoader(calibration_sampleset, batch_size=32, shuffle=True)
print(f"Calibration set size : {len(calibration_sampleset)}")

Calibration set size : 500


In [20]:
# Quantized kernels run on the CPU, so move the prepared model there before calibration.
device = torch.device("cpu")
static_quant_model_prepared = static_quant_model_prepared.to(device)

quantize.calibrate(static_quant_model_prepared, calibration_loader, device)
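
During calibration, the observers inserted by `prepare` watch the values flowing through the model and record their range. The sketch below illustrates the moving-average min/max idea behind one of the observer families imported above. It is a simplified stand-in with a hypothetical class name, and it is not necessarily the observer that the default `'x86'` qconfig selects.

```python
class MovingAverageRange:
    """Track a smoothed min/max over batches, damping the effect of outliers."""

    def __init__(self, momentum=0.01):
        self.momentum = momentum
        self.min_val = None
        self.max_val = None

    def observe(self, batch):
        lo, hi = min(batch), max(batch)
        if self.min_val is None:
            # First batch initializes the range directly.
            self.min_val, self.max_val = lo, hi
        else:
            # Later batches move the range only a fraction of the way.
            self.min_val += self.momentum * (lo - self.min_val)
            self.max_val += self.momentum * (hi - self.max_val)

obs = MovingAverageRange(momentum=0.1)
obs.observe([-1.0, 0.5, 2.0])     # range starts at [-1.0, 2.0]
obs.observe([-1.0, 0.0, 12.0])    # outlier 12.0 pulls the max only 10% of the way
print(obs.min_val, obs.max_val)
```

A smoothed range is less sensitive to a single extreme calibration sample than a plain running min/max, at the cost of clipping rare large values.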

In [21]:
static_quantized_model_converted = torch.quantization.convert(static_quant_model_prepared)

In [22]:
from src.evaluate import ModelEvaluator
print(device)
quant_test_instance = ModelEvaluator(
    static_quantized_model_converted, 
    test_loader,
    device
)

cpu


In [23]:
quant_test_instance.evaluate()

  cx_tensor = torch.stack(cx_list)



 Accuracy on test data: 85.00%

 Classification Report:
              precision    recall  f1-score   support

           0     0.7907    0.9444    0.8608        72
           1     0.9574    0.6522    0.7759        69
           2     0.9070    0.6842    0.7800        57
           3     0.7887    1.0000    0.8819        56
           4     0.8750    0.9492    0.9106        59
           5     0.9259    0.7937    0.8547        63
           6     0.8958    0.7679    0.8269        56
           7     0.7571    0.9636    0.8480        55
           8     0.8727    0.8421    0.8571        57
           9     0.8387    0.9286    0.8814        56

    accuracy                         0.8500       600
   macro avg     0.8609    0.8526    0.8477       600
weighted avg     0.8622    0.8500    0.8468       600


 Confusion Matrix:
[[68  0  2  1  0  0  0  0  0  1]
 [ 4 45  0  0  6  2  0 11  0  1]
 [ 8  0 39  2  0  0  0  2  0  6]
 [ 0  0  0 56  0  0  0  0  0  0]
 [ 1  1  0  0 56  0  0  1  0  0]

### Compute Inference Time

In [30]:
utils.compute_inference_time(static_quantized_model_converted, test_loader)

Median inference time: 8.7123 ms
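
`utils.compute_inference_time` is a project helper whose implementation is not shown in this notebook. As a hedged sketch of how a median latency could be measured, the function below times repeated calls to an arbitrary callable after a few warm-up runs. The name `median_latency_ms` and its parameters are assumptions for illustration.

```python
import statistics
import time

def median_latency_ms(fn, n_warmup=5, n_runs=30):
    """Median wall-clock time of fn() in milliseconds, after warm-up calls."""
    for _ in range(n_warmup):
        fn()                                   # warm caches before timing
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)

# Stand-in workload; in the notebook this would be a forward pass on one batch.
latency = median_latency_ms(lambda: sum(i * i for i in range(10_000)))
print(f"Median inference time: {latency:.4f} ms")
```

The median is less affected by occasional slow runs (for example, from OS scheduling) than the mean.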


### Save the model

In [31]:
static_quantized_model_converted.eval()

static_quant_model_save_path = os.path.join(project_root_dir, 'outputs', 'models', 'task-b-part-2_weights_jit.pth')

script_model = torch.jit.script(static_quantized_model_converted)
torch.jit.save(script_model, static_quant_model_save_path)

## Is the Model INT8 Quantized? Does It Meet the 36 KB Memory Constraint?

In [32]:
for name, module in static_quantized_model_converted.named_modules():
    if isinstance(module, torch.nn.quantized.Linear):
        print(f"\n{name} is a quantized Linear layer")
        weight = module.weight()
        print("Weight type:", type(weight))
        print("Weight dtype:", weight.dtype)
        print("Weight shape:", weight.shape)
        print("---")


model.lstm.layers.0.layer_fw.cell.igates is a quantized Linear layer
Weight type: <class 'torch.Tensor'>
Weight dtype: torch.qint8
Weight shape: torch.Size([96, 13])
---

model.lstm.layers.0.layer_fw.cell.hgates is a quantized Linear layer
Weight type: <class 'torch.Tensor'>
Weight dtype: torch.qint8
Weight shape: torch.Size([96, 24])
---

model.lstm.layers.1.layer_fw.cell.igates is a quantized Linear layer
Weight type: <class 'torch.Tensor'>
Weight dtype: torch.qint8
Weight shape: torch.Size([96, 24])
---

model.lstm.layers.1.layer_fw.cell.hgates is a quantized Linear layer
Weight type: <class 'torch.Tensor'>
Weight dtype: torch.qint8
Weight shape: torch.Size([96, 24])
---

model.fc is a quantized Linear layer
Weight type: <class 'torch.Tensor'>
Weight dtype: torch.qint8
Weight shape: torch.Size([10, 24])
---


In [33]:
total_fp32_bytes = 0
total_int8_bytes = 0

for name, module in static_quantized_model_converted.named_modules():
    if isinstance(module, torch.nn.quantized.Linear):
        weight = module.weight()
        if weight.is_quantized:
            num_elements = weight.numel()
            total_fp32_bytes += num_elements * 4  # FP32
            total_int8_bytes += num_elements * 1  # INT8

print(f"Estimated FP32 size: {total_fp32_bytes / 1024:.2f} KB")
print(f"Estimated INT8 size: {total_int8_bytes / 1024:.2f} KB")
print(f"Compression ratio: {total_fp32_bytes / total_int8_bytes:.2f}x")

Estimated FP32 size: 32.81 KB
Estimated INT8 size: 8.20 KB
Compression ratio: 4.00x
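
The figures above can be reproduced by hand from the weight shapes printed earlier. The short check below repeats that arithmetic; like the cell above, it counts weight elements only and leaves out biases and the per-channel scales and zero points.

```python
# Weight shapes as printed for the quantized Linear layers.
weight_shapes = {
    "lstm.layers.0 igates": (96, 13),   # 96 = 4 gates x 24 hidden units
    "lstm.layers.0 hgates": (96, 24),
    "lstm.layers.1 igates": (96, 24),
    "lstm.layers.1 hgates": (96, 24),
    "fc": (10, 24),
}

n_weights = sum(rows * cols for rows, cols in weight_shapes.values())
fp32_kb = n_weights * 4 / 1024   # 4 bytes per FP32 weight
int8_kb = n_weights * 1 / 1024   # 1 byte per INT8 weight

print(n_weights)                       # 8400
print(f"{fp32_kb:.2f} KB")             # 32.81 KB
print(f"{int8_kb:.2f} KB")             # 8.20 KB
print(f"{fp32_kb / int8_kb:.2f}x")     # 4.00x
```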


In [34]:
utils.print_quantized_layer_analysis(static_quantized_model_converted, "INT8 Model")


 INT8 Model - Layer Analysis
Layer Name                               | Num Parameters | Size (INT8) | Fits in 36KB?
---------------------------------------------------------------------------
model.lstm.layers.0.layer_fw.cell.igates |          1248 | 1.219 KB     | ✅
model.lstm.layers.0.layer_fw.cell.hgates |          2304 | 2.250 KB     | ✅
model.lstm.layers.1.layer_fw.cell.igates |          2304 | 2.250 KB     | ✅
model.lstm.layers.1.layer_fw.cell.hgates |          2304 | 2.250 KB     | ✅
model.fc                                 |           240 | 0.234 KB     | ✅

📦 Total Estimated Memory Usage
Total number of parameters:      8400
Estimated total size (INT8):     8.203 KB
Memory per parameter (INT8):     1 byte
Meets 36KB per-layer limit?      ✅ Yes


## Task C: Power-of-Two Quantization: Rounding Only the Weight Scale Factors to Powers of Two

In [36]:
import copy

po2_quantized_model = copy.deepcopy(static_quantized_model_converted)


In [37]:
import torch
import torch.nn as nn
import math

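def round_scale_to_power_of_two(scale):
    """Round a positive scale to the nearest power of two (nearest in the log2 domain)."""
    if scale <= 0:
        return scale  # leave non-positive (invalid) scales untouched
    return 2 ** round(math.log2(scale))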

def update_quantized_weights_and_show_scales(module: nn.Module):
    for name, submodule in module.named_children():
        if isinstance(submodule, torch.nn.quantized.Linear):
            old_wt = submodule.weight()
            qscheme = old_wt.qscheme()

            if qscheme == torch.per_tensor_affine:
                old_scale = old_wt.q_scale()
                old_zp = old_wt.q_zero_point()
                float_wt = old_wt.dequantize()
                new_scale = round_scale_to_power_of_two(old_scale)
                new_qweight = torch.quantize_per_tensor(
                    float_wt, scale=new_scale, zero_point=old_zp, dtype=torch.qint8
                )
                submodule.set_weight_bias(new_qweight, submodule.bias())
                print(f"[Per-tensor] {name}:")
                print(f"  Old scale: {old_scale:.8f}")
                print(f"  New scale: {new_scale:.8f}")
                print(f"  Zero point: {old_zp}")

            elif qscheme == torch.per_channel_affine:
                old_scales = old_wt.q_per_channel_scales()
                old_zps = old_wt.q_per_channel_zero_points()
                axis = old_wt.q_per_channel_axis()
                float_wt = old_wt.dequantize()
                new_scales = torch.tensor(
                    [round_scale_to_power_of_two(s.item()) for s in old_scales],
                    dtype=old_scales.dtype,
                    device=old_scales.device
                )
                new_qweight = torch.quantize_per_channel(
                    float_wt, new_scales, old_zps, axis=axis, dtype=torch.qint8
                )
                submodule.set_weight_bias(new_qweight, submodule.bias())
                print(f"[Per-channel] {name}:")
                print(f"  Old scales: {old_scales.tolist()}")
                print(f"  New scales: {new_scales.tolist()}")
                print(f"  Zero points: {old_zps.tolist()}")

            else:
                print(f"[Skipped] {name}: Unsupported qscheme {qscheme}")

        else:
            update_quantized_weights_and_show_scales(submodule)


In [38]:
update_quantized_weights_and_show_scales(po2_quantized_model)

[Per-channel] igates:
  Old scales: [0.001615653745830059, 0.0018714802572503686, 0.0020117308013141155, 0.002106273779645562, 0.0022053238935768604, 0.002103707753121853, 0.0016710897907614708, 0.002046100329607725, 0.0017540731932967901, 0.00208006682805717, 0.001712300698272884, 0.0019382458413019776, 0.0018706321716308594, 0.0017586955800652504, 0.0016365181654691696, 0.00345902843400836, 0.002487811027094722, 0.0021910173818469048, 0.0018338965019211173, 0.0013152362080290914, 0.002594183199107647, 0.002215113490819931, 0.001944198738783598, 0.0017140066483989358, 0.0021132363472133875, 0.0026520786341279745, 0.0017188069177791476, 0.002268699463456869, 0.0019005556823685765, 0.002209493424743414, 0.0021241006907075644, 0.002217937959358096, 0.001790273585356772, 0.0012400405248627067, 0.0017527616582810879, 0.0018183921929448843, 0.0016870113322511315, 0.0017570874188095331, 0.0020349151454865932, 0.0015564362984150648, 0.0017177117988467216, 0.0025220997631549835, 0.001715537626
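
To make the effect of the rounding concrete, the sketch below applies the same nearest-power-of-two rule to two of the scales printed above and shows one consequence. When a scale is rounded down, the largest representable weight (127 × scale) shrinks, so a large weight can be clipped. The weight value 0.33 is a hypothetical example, not taken from the model.

```python
import math

def nearest_power_of_two(scale):
    """Same rule as above: round log2(scale) to the nearest integer exponent."""
    return 2.0 ** round(math.log2(scale))

def quantize_symmetric(x, scale, qmin=-128, qmax=127):
    """Symmetric int8 quantization (zero point 0) with clamping."""
    return max(qmin, min(qmax, round(x / scale)))

old_scale = 0.0026520786341279745              # one of the printed per-channel scales
new_scale = nearest_power_of_two(old_scale)    # rounds DOWN to 2**-9
print(new_scale)                               # 0.001953125
print(nearest_power_of_two(0.001615653745830059))  # 0.001953125 (rounds up)

w = 0.33                                       # hypothetical large weight
q_old = quantize_symmetric(w, old_scale)       # fits on the old grid
q_new = quantize_symmetric(w, new_scale)       # saturates at 127 on the new grid
print(q_old, q_old * old_scale)
print(q_new, q_new * new_scale)                # 127 0.248046875
```

Clipping of this kind, together with the coarser grid when a scale is rounded up, is one plausible contributor to an accuracy change after the scales are replaced.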

In [41]:
from src.evaluate import ModelEvaluator
quant_test_instance_2 = ModelEvaluator(
    po2_quantized_model, 
    test_loader,
    device
)

In [42]:
quant_test_instance_2.evaluate()

  cx_tensor = torch.stack(cx_list)



 Accuracy on test data: 83.17%

 Classification Report:
              precision    recall  f1-score   support

           0     0.8072    0.9306    0.8645        72
           1     0.8776    0.6232    0.7288        69
           2     0.9231    0.6316    0.7500        57
           3     0.7397    0.9643    0.8372        56
           4     0.9016    0.9322    0.9167        59
           5     0.9492    0.8889    0.9180        63
           6     0.8571    0.7500    0.8000        56
           7     0.7059    0.8727    0.7805        55
           8     0.8571    0.8421    0.8496        57
           9     0.7937    0.8929    0.8403        56

    accuracy                         0.8317       600
   macro avg     0.8412    0.8328    0.8286       600
weighted avg     0.8430    0.8317    0.8288       600


 Confusion Matrix:
[[67  0  3  1  0  0  0  1  0  0]
 [ 7 43  0  0  5  0  1 13  0  0]
 [ 8  1 36  4  0  0  0  1  0  7]
 [ 0  0  0 54  0  0  0  0  1  1]
 [ 1  2  0  0 55  0  0  1  0  0]

### Compute Inference Time

In [44]:
utils.compute_inference_time(po2_quantized_model, test_loader)

Median inference time: 20.1323 ms


### Save the model

In [45]:
po2_quantized_model.eval()

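# Assumed filename: saving to a separate file avoids overwriting the Task B INT8 model
# stored earlier as 'task-b-part-2_weights_jit.pth'.
po2_quantized_model_save_path = os.path.join(project_root_dir, 'outputs', 'models', 'task-c_weights_jit.pth')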
script_model = torch.jit.script(po2_quantized_model)
torch.jit.save(script_model, po2_quantized_model_save_path)

## Do the Model's Layer Parameters Meet the 36 KB Memory Constraint?

In [46]:
utils.print_quantized_layer_analysis(po2_quantized_model)


 Model - Layer Analysis
Layer Name                               | Num Parameters | Size (INT8) | Fits in 36KB?
---------------------------------------------------------------------------
model.lstm.layers.0.layer_fw.cell.igates |          1248 | 1.219 KB     | ✅
model.lstm.layers.0.layer_fw.cell.hgates |          2304 | 2.250 KB     | ✅
model.lstm.layers.1.layer_fw.cell.igates |          2304 | 2.250 KB     | ✅
model.lstm.layers.1.layer_fw.cell.hgates |          2304 | 2.250 KB     | ✅
model.fc                                 |           240 | 0.234 KB     | ✅

📦 Total Estimated Memory Usage
Total number of parameters:      8400
Estimated total size (INT8):     8.203 KB
Memory per parameter (INT8):     1 byte
Meets 36KB per-layer limit?      ✅ Yes


## Does the Model use Power-of-Two Scale factors?

In [47]:
utils.verify_power_of_two_scales(po2_quantized_model)

Layer Name                               | Scale Value                                 | Is Power of Two?
---------------------------------------------------------------------------------------------------------
model.lstm.layers.0.layer_fw.cell.igates | [0.001953125, 0.001953125, 0.001953125, ... | ✅
model.lstm.layers.0.layer_fw.cell.hgates | [0.00390625, 0.001953125, 0.001953125, 0... | ✅
model.lstm.layers.1.layer_fw.cell.igates | [0.001953125, 0.001953125, 0.001953125, ... | ✅
model.lstm.layers.1.layer_fw.cell.hgates | [0.001953125, 0.001953125, 0.001953125, ... | ✅
model.fc                                 | [0.0078125, 0.00390625, 0.0078125, 0.003... | ✅

 Final Result
✅ All layers use power-of-two scale values.


True
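
One common motivation for power-of-two scales is that multiplying an integer by 2^-k can be done with a rounding right shift instead of a multiplication, which suits hardware without fast multipliers. The sketch below demonstrates that equivalence in plain Python. Whether the PyTorch CPU backend used in this notebook exploits it is not shown here, and `rescale_by_shift` is a hypothetical helper.

```python
import math

def rescale_by_shift(acc, k):
    """Compute round(acc * 2**-k) with integer operations only (ties round up)."""
    if k == 0:
        return acc
    return (acc + (1 << (k - 1))) >> k   # add half a step, then arithmetic shift

k = 9                                    # scale 2**-9 = 0.001953125, as in the table above
for acc in (-1000, -257, -256, 0, 255, 256, 1000, 123456):
    by_shift = rescale_by_shift(acc, k)
    by_float = math.floor(acc * 2.0 ** -k + 0.5)
    print(acc, by_shift, by_float)       # the last two columns agree
```

Python's `>>` on negative integers is an arithmetic shift (it floors), which is why the same expression also works for negative accumulators.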