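# Task B, Part 2: INT8 Quantization

This notebook applies post-training static INT8 quantization to the recurrent neural network (LSTM) trained in Task B, Part 1 to classify spoken digits (0–9) from audio recordings, and then evaluates the quantized model on the test set.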

- Dataset: [Free Spoken Digit Dataset (FSDD)](https://github.com/Jakobovski/free-spoken-digit-dataset)
- Framework: PyTorch
- Architecture: RNN with LSTM layers

In [1]:
import sys
import os

# Add the project root (parent of current folder) to Python path
project_root_dir = os.path.abspath(os.path.join(os.getcwd(), '..'))
sys.path.append(project_root_dir)


## Load Model Configuration from YAML

To keep the pipeline configurable and modular, model parameters such as the number of LSTM layers, the hidden size, and the learning rate are stored in a YAML file. This makes it quick to adapt the setup for the related Tasks B and C.

This section loads the model configuration using a custom utility function.

In [2]:
import src.utils as utils

In [3]:
utils.set_seed(42)

[INFO] Random seed set to: 42


In [4]:
import yaml
import json

model_config_path = os.path.join(project_root_dir, 'config', 'model_config.yaml')
model_config = utils.read_yaml_file(model_config_path)
# print(json.dumps(model_config, indent=2))
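
The keys read in the cells that follow suggest a configuration shaped roughly like the dictionary below. This is only a sketch, written as a Python dict rather than the actual YAML file: `input_dim`, `num_layers`, and `output_dim` match the model printed later in this notebook, while the dataset path, `test_size`, and `seed` values are placeholders, and `get_nested` is a hypothetical helper that is not part of `src.utils`.

```python
# Hypothetical stand-in for config/model_config.yaml, written as a Python dict.
example_config = {
    "dataset": {"path": "data/recordings"},   # placeholder path (assumption)
    "data_splitting": {"test_size": 0.2},     # assumed value
    "experiment": {"seed": 42},               # assumed value
    "model": {
        "input_dim": 13,    # features per frame, as in the printed LSTM
        "hidden_dim": 24,   # value this notebook sets before building the model
        "num_layers": 2,
        "output_dim": 10,   # one class per digit 0-9
    },
}


def get_nested(config, *keys):
    """Walk nested dict keys, raising a clear error when one is missing."""
    node = config
    for key in keys:
        if key not in node:
            raise KeyError(f"Missing config key: {'.'.join(keys)}")
        node = node[key]
    return node


print(get_nested(example_config, "model", "input_dim"))
```

Failing early on a missing key tends to be easier to debug than a `KeyError` raised deep inside the training or quantization code.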

## Load and Split Dataset for Training and Evaluation

In this section, we load the recordings from disk, generate data-label pairs, set aside 50 calibration samples per class for quantization, and split the pairs into training and test sets according to the `test_size` defined in the YAML file.

Using `test_size` and `seed` from the YAML config ensures that experiments are reproducible and easily tunable for other tasks by simply updating the configuration.


In [5]:
data_path = model_config['dataset']['path']
test_data_size = model_config['data_splitting']['test_size']
seed = model_config['experiment']['seed']

In [6]:
data_label_pairs, calibration_samples = utils.prepare_data_label_pairs(data_path, calibration_samples_per_class=50)

In [7]:
print(len(calibration_samples))

500


In [8]:
from sklearn.model_selection import train_test_split
train_data, test_data = train_test_split(data_label_pairs, test_size=test_data_size, random_state=seed)

## Transform Raw Data into PyTorch Dataset Objects

The `AudioFeaturesDataset` class converts raw data-label pairs into PyTorch-compatible datasets that provide easy access to samples and labels.

`AudioFeaturesDataset` is a custom dataset class that:

- Loads audio recordings of spoken digits along with their labels.
- Optionally cleans the audio by filtering out noise.
- Extracts MFCC features (a common speech feature).
- Pads or trims these features to a fixed length so all inputs have the same shape.
- Works with PyTorch to provide samples one at a time when training or testing a model.

Together, these steps put the audio data into the format the network expects.
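
The pad-or-trim step can be illustrated with plain Python lists. The actual `AudioFeaturesDataset` implementation is not shown in this notebook, so the function name `pad_or_trim`, the `target_len` parameter, and the zero padding value below are assumptions made for the sketch.

```python
def pad_or_trim(frames, target_len, pad_value=0.0):
    """Return exactly target_len frames: drop extra frames or append padding frames."""
    if not frames:
        raise ValueError("need at least one frame to infer the feature size")
    n_features = len(frames[0])
    # Keep at most target_len frames (copying so the input is not modified).
    trimmed = [list(frame) for frame in frames[:target_len]]
    # Append constant frames until the sequence reaches target_len.
    padding = [[pad_value] * n_features for _ in range(target_len - len(trimmed))]
    return trimmed + padding


short_clip = [[1.0, 2.0], [3.0, 4.0]]            # 2 frames, 2 features each
long_clip = [[float(i), 0.0] for i in range(6)]  # 6 frames

print(pad_or_trim(short_clip, 4))
print(len(pad_or_trim(long_clip, 4)))
```

A fixed sequence length lets a batch be stacked into a single tensor of shape `(batch, time, features)`.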


In [9]:
from src.data_preprocessor import AudioFeaturesDataset

train_dataset = AudioFeaturesDataset(train_data)
test_dataset = AudioFeaturesDataset(test_data)

In [10]:
print(f"Train size: {len(train_dataset)}")
print(f"Test size: {len(test_dataset)}")


Train size: 2400
Test size: 600


## Create DataLoaders for Batch Processing

Using PyTorch DataLoaders, we enable efficient loading, batching, and shuffling of data during training and evaluation.

In [11]:
import torch

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=32, shuffle=False)
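
As a quick sanity check on the loaders above, the number of batches per pass follows from the dataset size and `batch_size`. The sketch below reproduces that arithmetic in plain Python, assuming the `DataLoader` default of keeping the final partial batch; `num_batches` is a hypothetical helper.

```python
import math


def num_batches(num_samples, batch_size, drop_last=False):
    """Number of batches per pass over the data."""
    if drop_last:
        return num_samples // batch_size        # the partial batch is discarded
    return math.ceil(num_samples / batch_size)  # the partial batch is kept


print(num_batches(2400, 32))  # training set: 75 full batches
print(num_batches(600, 32))   # test set: 18 full batches plus one of 24 samples
```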

In [12]:
input_dim = model_config['model']['input_dim']
hidden_dim = model_config['model']['hidden_dim']
num_layers = model_config['model']['num_layers']
output_dim = model_config['model']['output_dim']

In [13]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

## Static Quantization

In [14]:
import src.model as model

hidden_dim = 24
quant_model = model.LSTMClassifier(input_dim=input_dim,
                       hidden_dim=hidden_dim,
                       num_layers=num_layers,
                       output_dim=output_dim).to(device)

In [15]:
memory_constraint_model_load_path = os.path.join(project_root_dir, 'outputs', 'models', 'task-b-part-1_weights.pth')
quant_model.load_state_dict(torch.load(memory_constraint_model_load_path))

<All keys matched successfully>

In [16]:
quant_model.eval()

LSTMClassifier(
  (lstm): LSTM(13, 24, num_layers=2, batch_first=True, dropout=0.3)
  (dropout): Dropout(p=0.3, inplace=False)
  (fc): Linear(in_features=24, out_features=10, bias=True)
)

In [17]:
import src.quantize as quantize
static_quant_model = quantize.StaticQuantizableModel(quant_model)

In [18]:
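# 'x86' selects the default qconfig for x86 CPUs; 'qnnpack' is the usual backend for ARM/edge devices
static_quant_model.qconfig = torch.quantization.get_default_qconfig('x86')
# torch.backends.quantized.engine = 'fbgemm'

from torch.ao.quantization import MinMaxObserver, PerChannelMinMaxObserver, QConfig, MovingAverageMinMaxObserver, MovingAveragePerChannelMinMaxObserver
# Alternative custom qconfig, kept for reference only: it is never assigned to the model,
# so the default qconfig above is the one that takes effect.
# A per-tensor qscheme needs a per-tensor observer, hence MovingAverageMinMaxObserver for activations.
qconfig = QConfig(activation=MovingAverageMinMaxObserver.with_args(qscheme=torch.per_tensor_affine, dtype=torch.quint8),
       weight=PerChannelMinMaxObserver.with_args(dtype=torch.qint8, qscheme=torch.per_channel_symmetric))

# Insert observers that record activation and weight ranges during calibration
static_quant_model_prepared = torch.quantization.prepare(static_quant_model)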



In [21]:
calibration_sampleset = AudioFeaturesDataset(calibration_samples)
calibration_loader = torch.utils.data.DataLoader(calibration_sampleset, batch_size=32, shuffle=True)
print(f"Calibration set size : {len(calibration_sampleset)}")

Calibration set size : 500


In [22]:
# The quantized kernels run on CPU, so move the prepared model there before calibrating
static_quant_model_prepared = static_quant_model_prepared.to('cpu')
device = torch.device("cpu")

quantize.calibrate(static_quant_model_prepared, calibration_loader, device)
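
Calibration feeds representative samples through the prepared model so that the observers can record value ranges, from which a scale and zero point are derived. The implementation of `quantize.calibrate` is not shown here, and the observers PyTorch inserts may use a more elaborate strategy (for example histograms), so the following is only a plain-Python sketch of the basic min/max idea for unsigned 8-bit activations. All names in it are hypothetical.

```python
class MinMaxTracker:
    """Tracks the running minimum and maximum of every value it observes."""

    def __init__(self):
        self.min_val = float("inf")
        self.max_val = float("-inf")

    def observe(self, values):
        self.min_val = min(self.min_val, min(values))
        self.max_val = max(self.max_val, max(values))

    def qparams(self, qmin=0, qmax=255):
        # Extend the range to include 0.0 so that zero is exactly representable.
        lo = min(self.min_val, 0.0)
        hi = max(self.max_val, 0.0)
        scale = (hi - lo) / (qmax - qmin)
        if scale == 0.0:
            scale = 1.0  # degenerate case: only zeros were observed
        zero_point = int(round(qmin - lo / scale))
        return scale, max(qmin, min(qmax, zero_point))


def quantize_value(x, scale, zero_point, qmin=0, qmax=255):
    """Affine quantization: q = clamp(round(x / scale) + zero_point)."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize_value(q, scale, zero_point):
    """Inverse mapping: x is approximately (q - zero_point) * scale."""
    return (q - zero_point) * scale


tracker = MinMaxTracker()
for batch in ([-1.0, 0.5], [2.0, -3.0]):  # stand-in for calibration batches
    tracker.observe(batch)

scale, zero_point = tracker.qparams()
q = quantize_value(0.5, scale, zero_point)
print(scale, zero_point, q, dequantize_value(q, scale, zero_point))
```

Values outside the calibrated range saturate at the ends of the integer range, which is why the calibration set should resemble the data seen at inference time.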

In [23]:
static_quantized_model_converted = torch.quantization.convert(static_quant_model_prepared)

In [24]:
from src.evaluate import ModelEvaluator
print(device)
quant_test_instance = ModelEvaluator(
    static_quantized_model_converted, 
    test_loader,
    device
)

cpu


In [25]:
quant_test_instance.evaluate()




 Accuracy on test data: 85.83%

 Classification Report:
              precision    recall  f1-score   support

           0     0.8272    0.9306    0.8758        72
           1     0.9219    0.8551    0.8872        69
           2     0.8511    0.7018    0.7692        57
           3     0.7179    1.0000    0.8358        56
           4     1.0000    0.9492    0.9739        59
           5     0.8393    0.7460    0.7899        63
           6     0.9600    0.8571    0.9057        56
           7     0.8070    0.8364    0.8214        55
           8     0.9792    0.8246    0.8952        57
           9     0.7778    0.8750    0.8235        56

    accuracy                         0.8583       600
   macro avg     0.8681    0.8576    0.8578       600
weighted avg     0.8688    0.8583    0.8587       600


 Confusion Matrix:
[[67  0  2  2  0  0  0  0  0  1]
 [ 1 59  0  0  0  2  0  6  0  1]
 [12  0 40  4  0  0  0  1  0  0]
 [ 0  0  0 56  0  0  0  0  0  0]
 [ 0  3  0  0 56  0  0  0  0  0]

In [29]:
for name, module in static_quantized_model_converted.named_modules():
    if isinstance(module, torch.nn.quantized.Linear):
        print(f"\n{name} is a quantized Linear layer")
        weight = module.weight()
        print("Weight type:", type(weight))
        print("Weight dtype:", weight.dtype)
        print("Weight shape:", weight.shape)
        print("---")


model.lstm.layers.0.layer_fw.cell.igates is a quantized Linear layer
Weight type: <class 'torch.Tensor'>
Weight dtype: torch.qint8
Weight shape: torch.Size([96, 13])
---

model.lstm.layers.0.layer_fw.cell.hgates is a quantized Linear layer
Weight type: <class 'torch.Tensor'>
Weight dtype: torch.qint8
Weight shape: torch.Size([96, 24])
---

model.lstm.layers.1.layer_fw.cell.igates is a quantized Linear layer
Weight type: <class 'torch.Tensor'>
Weight dtype: torch.qint8
Weight shape: torch.Size([96, 24])
---

model.lstm.layers.1.layer_fw.cell.hgates is a quantized Linear layer
Weight type: <class 'torch.Tensor'>
Weight dtype: torch.qint8
Weight shape: torch.Size([96, 24])
---

model.fc is a quantized Linear layer
Weight type: <class 'torch.Tensor'>
Weight dtype: torch.qint8
Weight shape: torch.Size([10, 24])
---
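
Each gate matrix above has 96 rows (4 gates × 24 hidden units), and per-channel quantization gives every row its own scale. The sketch below shows why that helps, using plain Python and one common symmetric convention (`scale = max|w| / 127`); the exact formula used by PyTorch's observers may differ, and the function names are hypothetical.

```python
def per_channel_symmetric_scales(weight_rows, qmax=127):
    """One scale per output row, chosen so the largest |w| in the row maps to qmax."""
    scales = []
    for row in weight_rows:
        max_abs = max(abs(w) for w in row)
        scales.append(max_abs / qmax if max_abs > 0 else 1.0)
    return scales


def quantize_row(row, scale, qmin=-128, qmax=127):
    """Symmetric quantization (zero point 0) of one row of weights."""
    return [max(qmin, min(qmax, round(w / scale))) for w in row]


rows = [[0.5, -0.25], [0.02, 0.01]]  # second row is much smaller in magnitude
scales = per_channel_symmetric_scales(rows)

# With its own scale, the small row uses the full int8 range...
print(quantize_row(rows[1], scales[1]))
# ...whereas sharing the first row's scale leaves it only a few integer levels.
print(quantize_row(rows[1], scales[0]))
```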


In [49]:
total_fp32_bytes = 0
total_int8_bytes = 0

for name, module in static_quantized_model_converted.named_modules():
    if isinstance(module, torch.nn.quantized.Linear):
        weight = module.weight()
        if weight.is_quantized:
            num_elements = weight.numel()
            total_fp32_bytes += num_elements * 4  # FP32
            total_int8_bytes += num_elements * 1  # INT8

print(f"Estimated FP32 size: {total_fp32_bytes / 1024:.2f} KB")
print(f"Estimated INT8 size: {total_int8_bytes / 1024:.2f} KB")
print(f"Compression ratio: {total_fp32_bytes / total_int8_bytes:.2f}x")

Estimated FP32 size: 32.81 KB
Estimated INT8 size: 8.20 KB
Compression ratio: 4.00x
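
The totals above can be cross-checked directly from the layer dimensions. This sketch recomputes the weight count for a 2-layer LSTM with 13 inputs and 24 hidden units plus the 24-to-10 output layer. Like the estimate above, it counts weight matrices only and leaves out biases and the per-channel scales and zero points; `lstm_weight_count` is a hypothetical helper.

```python
def lstm_weight_count(input_dim, hidden_dim, num_layers):
    """Weights in the input-to-hidden and hidden-to-hidden matrices of a stacked LSTM."""
    total = 0
    for layer in range(num_layers):
        in_features = input_dim if layer == 0 else hidden_dim
        total += 4 * hidden_dim * in_features  # igates: 4 gates stacked row-wise
        total += 4 * hidden_dim * hidden_dim   # hgates
    return total


lstm_weights = lstm_weight_count(input_dim=13, hidden_dim=24, num_layers=2)
fc_weights = 24 * 10
total_weights = lstm_weights + fc_weights

print(total_weights)             # 8400
print(total_weights * 4 / 1024)  # FP32 size in KB: 32.8125
print(total_weights * 1 / 1024)  # INT8 size in KB: 8.203125
```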


In [58]:
def print_full_layer_analysis(model):
    print("Layer Name".ljust(40) + " | Num Parameters | Size (INT8) | Fits in 36KB?")
    print("-" * 75)

    total_params = 0
    total_kb = 0

    for name, module in model.named_modules():
        if isinstance(module, torch.nn.quantized.Linear):
            weight = module.weight()
            if weight.is_quantized:
                num_params = weight.numel()
                kb = (num_params * 1) / 1024  # int8 → 1 byte per param
                total_params += num_params
                total_kb += kb
                fits = kb <= 36.0
                print(f"{name.ljust(40)} | {str(num_params).rjust(13)} | {kb:.3f} KB     | {'✅' if fits else '❌'}")

    print("\n📊 Total Model Summary")
    print(f"Total number of parameters:      {total_params}")
    print(f"Estimated total size (INT8):     {total_kb:.3f} KB")
    print(f"Memory per parameter (INT8):     1 byte")

print_full_layer_analysis(static_quantized_model_converted)

Layer Name                               | Num Parameters | Size (INT8) | Fits in 36KB?
---------------------------------------------------------------------------
model.lstm.layers.0.layer_fw.cell.igates |          1248 | 1.219 KB     | ✅
model.lstm.layers.0.layer_fw.cell.hgates |          2304 | 2.250 KB     | ✅
model.lstm.layers.1.layer_fw.cell.igates |          2304 | 2.250 KB     | ✅
model.lstm.layers.1.layer_fw.cell.hgates |          2304 | 2.250 KB     | ✅
model.fc                                 |           240 | 0.234 KB     | ✅

📊 Total Model Summary
Total number of parameters:      8400
Estimated total size (INT8):     8.203 KB
Memory per parameter (INT8):     1 byte


In [59]:
def count_quantized_weights(model):
    total_params = 0
    for name, module in model.named_modules():
        if isinstance(module, torch.nn.quantized.Linear):
            weight = module.weight()
            if weight.is_quantized:
                num_params = weight.numel()
                print(f"{name} → {num_params} params")
                total_params += num_params
    print(f"\n✅ Total quantized parameters: {total_params}")

count_quantized_weights(static_quantized_model_converted)

model.lstm.layers.0.layer_fw.cell.igates → 1248 params
model.lstm.layers.0.layer_fw.cell.hgates → 2304 params
model.lstm.layers.1.layer_fw.cell.igates → 2304 params
model.lstm.layers.1.layer_fw.cell.hgates → 2304 params
model.fc → 240 params

✅ Total quantized parameters: 8400


In [41]:
static_quantized_model_converted.eval()
static_quant_model_save_path = os.path.join(project_root_dir, 'outputs', 'models', 'task-b-part-2_weights_no_jit.pth')
torch.save(static_quantized_model_converted.state_dict(), static_quant_model_save_path)

In [32]:
# static_quantized_model_converted.eval()

# static_quant_model_save_path = os.path.join(project_root_dir, 'outputs', 'models', 'task-b-part-2_weights_jit.pth')

# script_model = torch.jit.script(static_quantized_model_converted)
# torch.jit.save(script_model, static_quant_model_save_path)

In [35]:
import copy

copied_model = copy.deepcopy(static_quantized_model_converted)

print(copied_model)

StaticQuantizableModel(
  (quant): Quantize(scale=tensor([2.2913]), zero_point=tensor([87]), dtype=torch.quint8)
  (dequant): DeQuantize()
  (model): LSTMClassifier(
    (lstm): QuantizedLSTM(
      (layers): ModuleList(
        (0): _LSTMLayer(
          (layer_fw): _LSTMSingleLayer(
            (cell): QuantizableLSTMCell(
              (igates): QuantizedLinear(in_features=13, out_features=96, scale=0.82073974609375, zero_point=67, qscheme=torch.per_channel_affine)
              (hgates): QuantizedLinear(in_features=24, out_features=96, scale=0.04487304016947746, zero_point=55, qscheme=torch.per_channel_affine)
              (gates): QFunctional(
                scale=0.7883639335632324, zero_point=64
                (activation_post_process): Identity()
              )
              (input_gate): Sigmoid()
              (forget_gate): Sigmoid()
              (cell_gate): Tanh()
              (output_gate): Sigmoid()
              (fgate_cx): QFunctional(
                scale=0.476

## Task C: Power of 2 Quantization

In [36]:
import torch
import torch.nn as nn
import math

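def round_scale_to_power_of_two(scale):
    """Round a positive scale to the nearest power of two (nearest in log2 space)."""
    if scale <= 0:
        return scale  # leave non-positive scales untouched
    return 2.0 ** round(math.log2(scale))  # float base keeps the result a float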

def update_quantized_weights_and_show_scales(module: nn.Module):
    for name, submodule in module.named_children():
        if isinstance(submodule, torch.nn.quantized.Linear):
            old_wt = submodule.weight()
            qscheme = old_wt.qscheme()

            if qscheme == torch.per_tensor_affine:
                old_scale = old_wt.q_scale()
                old_zp = old_wt.q_zero_point()
                float_wt = old_wt.dequantize()
                new_scale = round_scale_to_power_of_two(old_scale)
                new_qweight = torch.quantize_per_tensor(
                    float_wt, scale=new_scale, zero_point=old_zp, dtype=torch.qint8
                )
                submodule.set_weight_bias(new_qweight, submodule.bias())
                print(f"[Per-tensor] {name}:")
                print(f"  Old scale: {old_scale:.8f}")
                print(f"  New scale: {new_scale:.8f}")
                print(f"  Zero point: {old_zp}")

            elif qscheme == torch.per_channel_affine:
                old_scales = old_wt.q_per_channel_scales()
                old_zps = old_wt.q_per_channel_zero_points()
                axis = old_wt.q_per_channel_axis()
                float_wt = old_wt.dequantize()
                new_scales = torch.tensor(
                    [round_scale_to_power_of_two(s.item()) for s in old_scales],
                    dtype=old_scales.dtype,
                    device=old_scales.device
                )
                new_qweight = torch.quantize_per_channel(
                    float_wt, new_scales, old_zps, axis=axis, dtype=torch.qint8
                )
                submodule.set_weight_bias(new_qweight, submodule.bias())
                print(f"[Per-channel] {name}:")
                print(f"  Old scales: {old_scales.tolist()}")
                print(f"  New scales: {new_scales.tolist()}")
                print(f"  Zero points: {old_zps.tolist()}")

            else:
                print(f"[Skipped] {name}: Unsupported qscheme {qscheme}")

        else:
            update_quantized_weights_and_show_scales(submodule)
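
To make the effect of the rounding concrete, the sketch below applies the same nearest-in-log2 rule to two of the per-channel weight scales that appear in this notebook's output. One rounds up (coarser steps, wider range) and one rounds down (finer steps, but the largest weights may saturate at the int8 limits). It also shows why power-of-two scales are attractive: rescaling an integer by `2**-k` reduces to a bit shift. `nearest_power_of_two` is a hypothetical helper mirroring `round_scale_to_power_of_two`.

```python
import math


def nearest_power_of_two(scale):
    """Return (2**k, k) where k is the integer nearest to log2(scale)."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    k = round(math.log2(scale))
    return 2.0 ** k, k


for old_scale in (0.0014680278254672885, 0.0026737854350358248):
    new_scale, k = nearest_power_of_two(old_scale)
    direction = "up (coarser steps)" if new_scale > old_scale else "down (may clip)"
    # 127 * scale is the largest weight magnitude an int8 value can represent.
    print(f"{old_scale:.8f} -> 2^{k} = {new_scale:.8f}, rounded {direction}, "
          f"max representable {127 * old_scale:.4f} -> {127 * new_scale:.4f}")

# With a scale of 2**-9, rescaling a non-negative integer accumulator is a right shift.
acc = 5120
print(acc * 2.0 ** -9, acc >> 9)
```

Note that the function in this notebook rescales only the weight scales of the quantized `Linear` modules; activation scales are left as calibrated.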


In [37]:
update_quantized_weights_and_show_scales(static_quantized_model_converted)

[Per-channel] igates:
  Old scales: [0.0014680278254672885, 0.0020057877991348505, 0.0019790276419371367, 0.0018683441448956728, 0.002030798699706793, 0.002097557531669736, 0.0016376471612602472, 0.0017878131475299597, 0.002021649619564414, 0.0020765841472893953, 0.0016087722033262253, 0.001678281114436686, 0.0015331507893279195, 0.0019961087964475155, 0.0014595008688047528, 0.0021228939294815063, 0.0019447437953203917, 0.002361265243962407, 0.0015113336266949773, 0.0015943407779559493, 0.0021128575317561626, 0.0018155161524191499, 0.0019345361506566405, 0.0015810549957677722, 0.002372996648773551, 0.0020744006615132093, 0.0020200232975184917, 0.0026737854350358248, 0.002018067752942443, 0.002467712387442589, 0.0017417334020137787, 0.0023375835735350847, 0.0015620372723788023, 0.0016133295139297843, 0.0017817617626860738, 0.0017840875079855323, 0.001950323348864913, 0.00205528037622571, 0.0019604831468313932, 0.0017279349267482758, 0.001883684890344739, 0.001897631329484284, 0.00237500

In [38]:
from src.evaluate import ModelEvaluator
print(device)
quant_test_instance_2 = ModelEvaluator(
    static_quantized_model_converted, 
    test_loader,
    device
)

cpu


In [39]:
quant_test_instance_2.evaluate()




 Accuracy on test data: 83.83%

 Classification Report:
              precision    recall  f1-score   support

           0     0.7174    0.9167    0.8049        72
           1     0.8923    0.8406    0.8657        69
           2     0.7442    0.5614    0.6400        57
           3     0.7778    1.0000    0.8750        56
           4     1.0000    0.9153    0.9558        59
           5     0.8824    0.7143    0.7895        63
           6     0.9592    0.8393    0.8952        56
           7     0.7619    0.8727    0.8136        55
           8     0.9615    0.8772    0.9174        57
           9     0.7966    0.8393    0.8174        56

    accuracy                         0.8383       600
   macro avg     0.8493    0.8377    0.8374       600
weighted avg     0.8480    0.8383    0.8371       600


 Confusion Matrix:
[[66  0  3  1  0  1  0  0  0  1]
 [ 1 58  0  0  0  1  0  9  0  0]
 [22  0 32  2  0  0  0  1  0  0]
 [ 0  0  0 56  0  0  0  0  0  0]
 [ 0  4  0  0 54  0  0  1  0  0]

In [42]:
static_quantized_model_converted.eval()

static_quant_model_save_path = os.path.join(project_root_dir, 'outputs', 'models', 'task-b-part-2_weights_jit.pth')

script_model = torch.jit.script(static_quantized_model_converted)
torch.jit.save(script_model, static_quant_model_save_path)