# LSTM for Spoken Digit Classification

This notebook builds and trains a recurrent neural network (LSTM) to classify spoken digits (0–9) from audio recordings.

- Dataset: [Free Spoken Digit Dataset (FSDD)](https://github.com/Jakobovski/free-spoken-digit-dataset)
- Framework: PyTorch
- Architecture: RNN with LSTM layers

In [58]:
import sys
import os

# Add the project root (parent of current folder) to Python path
project_root_dir = os.path.abspath(os.path.join(os.getcwd(), '..'))
sys.path.append(project_root_dir)


## Load Model Configuration from YAML

To make the training pipeline configurable and modular, we store model parameters like number of LSTM layers, hidden size, and learning rate etc in a YAML file. This structure enables quick adaptation to related tasks B, and C.

This section loads the model configuration using a custom utility function.

In [59]:
import src.utils as utils

In [60]:
utils.set_seed(42)

[INFO] Random seed set to: 42


In [61]:
import yaml
import json

model_config_path = os.path.join(project_root_dir, 'config', 'model_config.yaml')
model_config = utils.read_yaml_file(model_config_path)
# print(json.dumps(model_config, indent=2))

## Load and Split Dataset for Training and Evaluation

In this section, we load the recordings data from disk, generate data-label pairs, and split them into training and test sets according to the `test_size` defined in the YAML file.

Using `test_size` and `seed` from the YAML config ensures that experiments are reproducible and easily tunable for other tasks by simply updating the configuration.


In [62]:
data_path = model_config['dataset']['path']
test_data_size = model_config['data_splitting']['test_size']
seed = model_config['experiment']['seed']

In [63]:
data_label_pairs = utils.prepare_data_label_pairs(data_path)

In [64]:
from sklearn.model_selection import train_test_split
train_data, test_data = train_test_split(data_label_pairs, test_size=test_data_size, random_state=seed)

## Transform Raw Data into PyTorch Dataset Objects

The `AudioFeaturesDataset` class converts raw data-label pairs into PyTorch-compatible datasets that provide easy access to samples and labels.

AudioFeaturesDataset is a custom dataset class that:

- Loads audio recordings of spoken digits along with their labels.
- Optionally cleans the audio by filtering out noise.
- Extracts MFCC features (a common speech feature).
- Pads or trims these features to a fixed length so all inputs have the same shape.
- Works with PyTorch to provide samples one-by-one when training or testing a model.
- It helps prepare your audio data in the right format for training neural networks efficiently.


In [65]:
from src.data_preprocessor import AudioFeaturesDataset

train_dataset = AudioFeaturesDataset(train_data)
test_dataset = AudioFeaturesDataset(test_data)

In [66]:
print(f"Train size: {len(train_dataset)}")
print(f"Test size: {len(test_dataset)}")

Train size: 2400
Test size: 600


## Create DataLoaders for Batch Processing

Using PyTorch DataLoaders, we enable efficient loading, batching, and shuffling of data during training and evaluation.

In [67]:
import torch

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=32, shuffle=False)

## LSTM Model Definition

A simple `n`-layer LSTM followed by a fully connected output layer. Variable `n` is defined in the configuration YAML file

In [68]:
input_dim = model_config['model']['input_dim']
hidden_dim = model_config['model']['hidden_dim']
num_layers = model_config['model']['num_layers']
output_dim = model_config['model']['output_dim']

In [69]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [70]:
import src.model as model
import torch.nn as nn
import torch.optim as optim

float_model = model.LSTMClassifier(input_dim=input_dim,
                       hidden_dim=hidden_dim,
                       num_layers=num_layers,
                       output_dim=output_dim).to(device)

## Training Loop

In [71]:
learning_rate = model_config['training']['learning_rate']
epochs = model_config['training']['epochs']

In [72]:
from src.train import ModelTrainer
trainer_instance = ModelTrainer(
    float_model, 
    epochs,
    train_loader,
    device,
    learning_rate
)

In [73]:
trainer_instance.train()

Epoch [1/20], Loss: 134.0127, Accuracy: 57.67%
Epoch [2/20], Loss: 49.1018, Accuracy: 86.17%
Epoch [3/20], Loss: 21.3893, Accuracy: 96.75%
Epoch [4/20], Loss: 12.2558, Accuracy: 96.42%
Epoch [5/20], Loss: 9.3282, Accuracy: 98.12%
Epoch [6/20], Loss: 6.7593, Accuracy: 96.42%
Epoch [7/20], Loss: 5.2892, Accuracy: 98.58%
Epoch [8/20], Loss: 4.0651, Accuracy: 99.17%
Epoch [9/20], Loss: 1.7189, Accuracy: 99.33%
Epoch [10/20], Loss: 1.9559, Accuracy: 99.79%
Epoch [11/20], Loss: 3.8065, Accuracy: 95.79%
Epoch [12/20], Loss: 3.7462, Accuracy: 99.79%
Epoch [13/20], Loss: 1.1268, Accuracy: 99.79%
Epoch [14/20], Loss: 1.7250, Accuracy: 99.08%
Epoch [15/20], Loss: 2.5933, Accuracy: 99.62%
Epoch [16/20], Loss: 2.1036, Accuracy: 99.25%
Epoch [17/20], Loss: 1.4843, Accuracy: 99.67%
Epoch [18/20], Loss: 0.4845, Accuracy: 99.92%
Epoch [19/20], Loss: 0.8350, Accuracy: 100.00%
Epoch [20/20], Loss: 0.1467, Accuracy: 100.00%


In [75]:
_, _ = utils.get_model_params_size(float_model)


Layer-wise parameter counts:
lstm.weight_ih_l0              -> 6,656 params
lstm.weight_hh_l0              -> 65,536 params
lstm.bias_ih_l0                -> 512 params
lstm.bias_hh_l0                -> 512 params
lstm.weight_ih_l1              -> 65,536 params
lstm.weight_hh_l1              -> 65,536 params
lstm.bias_ih_l1                -> 512 params
lstm.bias_hh_l1                -> 512 params
fc.weight                      -> 1,280 params
fc.bias                        -> 10 params


 Total Parameters: 206,602
Estimated Memory: 807.04 KB (0.79 MB)


## Evaluation & Visualization

In [76]:
from src.evaluate import ModelEvaluator

In [77]:
test_instance = ModelEvaluator(
    float_model, 
    test_loader,
    device
)


In [78]:
test_instance.evaluate()


 Accuracy on test data: 99.17%

 Classification Report:
              precision    recall  f1-score   support

           0     1.0000    1.0000    1.0000        52
           1     1.0000    1.0000    1.0000        65
           2     0.9692    1.0000    0.9844        63
           3     1.0000    0.9667    0.9831        60
           4     1.0000    1.0000    1.0000        55
           5     1.0000    0.9811    0.9905        53
           6     1.0000    0.9846    0.9922        65
           7     1.0000    0.9831    0.9915        59
           8     0.9683    1.0000    0.9839        61
           9     0.9853    1.0000    0.9926        67

    accuracy                         0.9917       600
   macro avg     0.9923    0.9915    0.9918       600
weighted avg     0.9919    0.9917    0.9917       600


 Confusion Matrix:
[[52  0  0  0  0  0  0  0  0  0]
 [ 0 65  0  0  0  0  0  0  0  0]
 [ 0  0 63  0  0  0  0  0  0  0]
 [ 0  0  1 58  0  0  0  0  1  0]
 [ 0  0  0  0 55  0  0  0  0  0]

In [79]:
float_model_save_path = os.path.join(project_root_dir, 'outputs', 'models', 'float_model_weights.pth')
torch.save(float_model.state_dict(), float_model_save_path)

In [80]:
for name, param in float_model.named_parameters():
    print(f"{name}: dtype={param.dtype}")

lstm.weight_ih_l0: dtype=torch.float32
lstm.weight_hh_l0: dtype=torch.float32
lstm.bias_ih_l0: dtype=torch.float32
lstm.bias_hh_l0: dtype=torch.float32
lstm.weight_ih_l1: dtype=torch.float32
lstm.weight_hh_l1: dtype=torch.float32
lstm.bias_ih_l1: dtype=torch.float32
lstm.bias_hh_l1: dtype=torch.float32
fc.weight: dtype=torch.float32
fc.bias: dtype=torch.float32


## Task B — Retrain Under Memory Constraints

- All parameters of any one layer must fit into memory simultaneously
- Maximum memory available for layer parameters is 36 KB

Since Pytorch stores all the layer parameters as floating point values, as 32-bit floats(4 bytes per parameter), this implies that the 
maximum number of parameters should not exceed

$$
\text{Max total number of parameters} = \frac{36\,\text{KB}}{4\,\text{bytes}} = \frac{36 \times 1024}{4} = 9,216 \text{ parameters}
$$

## Model Parameter Breakdown for 2 Layers LSTM
The following calculations are based on the parameter definitions from PyTorch's LSTM implementation, as described in the \href{https://docs.pytorch.org/docs/stable/generated/torch.nn.LSTM.html\#torch.nn.LSTM}{official documentation}. 

Let’s define the following variables:

$$
\begin{aligned}
\text{Input dimension} &= i \\
\text{Hidden dimension} &= h \\
\text{Output dimension} &= o \\
\text{Number of LSTM layers} &= 2 \\
\text{Fully Connected (Linear) layer count} &= 1 \\
\end{aligned}
$$

These will be used to calculate the total number of parameters in the model.

---

## LSTM Layer Parameters

Each LSTM layer has 4 internal gates (input, forget, cell, output).  
So each layer has:

$$
\begin{aligned}
W\_{ih} &: \text{Weights from input to hidden} \rightarrow \text{shape: } (4 \times h, i) \\
W\_{hh} &: \text{Weights from hidden to hidden} \rightarrow \text{shape: } (4 \times h, h) \\
b\_{ih}, b\_{hh} &: \text{Biases for each gate} \rightarrow \text{shape: } (4 \times h)
\end{aligned}
$$

### Calculations:

$$
\begin{aligned}
&\textbf{Layer 1 Parameters} \\
&W\_ih: 4 \times h \times i \\
&W\_hh: 4 \times h \times h \\
&b\_ih: 4 \times h \\
&b\_hh: 4 \times h \\
&\\
&\textbf{Layer 2 Parameters} \\
&W\_ih: 4 \times h \times h \\
&W\_hh: 4 \times h \times h \\
&b\_ih: 4 \times h \\
&b\_hh: 4 \times h \\
&\\
&\text{Total LSTM parameters} = (4 \times h \times i) + (4 \times h \times h) + (8 \times h) = 4hi + 4h^2 + 8h
\end{aligned}
$$

---

## Fully Connected (Linear) Layer

- Input features = $$\text{hidden\_dim} = h$$  
- Output features = $$\text{output\_dim} = o$$

### Calculations:
$$
\begin{aligned}
&\text{Weights}: \quad h \times o \\
&\text{Bias}: \quad o \\
&\textbf{Total Linear parameters} = h \cdot o + o = o(h + 1)
\end{aligned}
$$

---

## Total Parameters

The total number of parameters in the model is the sum of the LSTM and Linear layer parameters:

$$
\text{Total Parameters} = \text{Total LSTM Parameters} + \text{Total Linear Parameters}
$$

From earlier calculations:

- $\text{Total LSTM Parameters} = 4hi + 4h^2 + 8h$
- $\text{Total Linear Parameters} = h \cdot o + o = o(h + 1)$

Therefore:

$$
\text{Total Parameters} = (4hi + 4h^2 + 8h) + o(h + 1)
$$

Or more compactly:

$$
\boxed{\text{Total Parameters} = 4hi + 4h^2 + 8h + o(h + 1)}
$$

---


## Designing a 2-Layer LSTM Under a 36 KB Memory Constraint

Given an output dimension of $10$ (representing 10 classes or digits) and an input dimension of $13$ (corresponding to 13 MFCC coefficients per time step), the total number of parameters in the model reduces to the following quadratic expression:

$$
\text{Total Parameters} = 12h^2 + 78h + 10
$$

Here, $ h $ (the hidden dimension) remains the only variable we need to solve for.

Since the memory constraint allows for a maximum of \textbf{9,216 parameters}, the hidden dimension must satisfy:

$$
12h^2 + 78h + 10 \leq 9216
$$

The maximum valid integer value of $ h $ that satisfies the inequality is:
$$
h = 24
$$

---

In [82]:
import src.model as model_with_constraints

hidden_dim = 24
memory_constraint_model = model_with_constraints.LSTMClassifier(input_dim=input_dim,
                                           hidden_dim=hidden_dim,
                                           num_layers=num_layers,
                                           output_dim=output_dim).to(device)

In [83]:
from src.train import ModelTrainer
trainer_instance_2 = ModelTrainer(
    memory_constraint_model, 
    epochs,
    train_loader,
    device,
    learning_rate
)

In [84]:
trainer_instance_2.train()

Epoch [1/20], Loss: 172.7911, Accuracy: 14.88%
Epoch [2/20], Loss: 161.7900, Accuracy: 32.71%
Epoch [3/20], Loss: 122.7087, Accuracy: 58.21%
Epoch [4/20], Loss: 93.8751, Accuracy: 70.17%
Epoch [5/20], Loss: 71.3963, Accuracy: 77.58%
Epoch [6/20], Loss: 58.7306, Accuracy: 84.46%
Epoch [7/20], Loss: 54.2336, Accuracy: 83.71%
Epoch [8/20], Loss: 40.8367, Accuracy: 89.00%
Epoch [9/20], Loss: 34.6832, Accuracy: 91.25%
Epoch [10/20], Loss: 30.1345, Accuracy: 85.88%
Epoch [11/20], Loss: 31.7366, Accuracy: 92.08%
Epoch [12/20], Loss: 23.9289, Accuracy: 94.58%
Epoch [13/20], Loss: 20.8943, Accuracy: 94.79%
Epoch [14/20], Loss: 18.5835, Accuracy: 96.04%
Epoch [15/20], Loss: 16.9845, Accuracy: 95.83%
Epoch [16/20], Loss: 14.5756, Accuracy: 96.62%
Epoch [17/20], Loss: 13.4785, Accuracy: 96.67%
Epoch [18/20], Loss: 15.8476, Accuracy: 97.04%
Epoch [19/20], Loss: 11.0573, Accuracy: 96.92%
Epoch [20/20], Loss: 10.5518, Accuracy: 97.96%


In [85]:
_, _ = utils.get_model_params_size(memory_constraint_model)


Layer-wise parameter counts:
lstm.weight_ih_l0              -> 1,248 params
lstm.weight_hh_l0              -> 2,304 params
lstm.bias_ih_l0                -> 96 params
lstm.bias_hh_l0                -> 96 params
lstm.weight_ih_l1              -> 2,304 params
lstm.weight_hh_l1              -> 2,304 params
lstm.bias_ih_l1                -> 96 params
lstm.bias_hh_l1                -> 96 params
fc.weight                      -> 240 params
fc.bias                        -> 10 params


 Total Parameters: 8,794
Estimated Memory: 34.35 KB (0.03 MB)


In [86]:
from src.evaluate import ModelEvaluator

In [87]:
test_instance_2 = ModelEvaluator(
    memory_constraint_model, 
    test_loader,
    device
)

In [88]:
test_instance_2.evaluate()


 Accuracy on test data: 96.50%

 Classification Report:
              precision    recall  f1-score   support

           0     0.9811    1.0000    0.9905        52
           1     1.0000    1.0000    1.0000        65
           2     0.9839    0.9683    0.9760        63
           3     0.9333    0.9333    0.9333        60
           4     0.9818    0.9818    0.9818        55
           5     0.9636    1.0000    0.9815        53
           6     0.8939    0.9077    0.9008        65
           7     0.9831    0.9831    0.9831        59
           8     0.9333    0.9180    0.9256        61
           9     1.0000    0.9701    0.9848        67

    accuracy                         0.9650       600
   macro avg     0.9654    0.9662    0.9657       600
weighted avg     0.9652    0.9650    0.9650       600


 Confusion Matrix:
[[52  0  0  0  0  0  0  0  0  0]
 [ 0 65  0  0  0  0  0  0  0  0]
 [ 1  0 61  0  1  0  0  0  0  0]
 [ 0  0  1 56  0  0  2  1  0  0]
 [ 0  0  0  0 54  0  1  0  0  0]

In [30]:
memory_constraint_model_save_path = os.path.join(project_root_dir, 'outputs', 'models', 'memory_constraint_model_weights.pth')
torch.save(memory_constraint_model.state_dict(), memory_constraint_model_save_path)

In [31]:
import tempfile

with tempfile.NamedTemporaryFile(delete=True) as tmp:
    torch.save(memory_constraint_model.state_dict(), tmp.name)
    memory_model_size_kb = os.path.getsize(tmp.name) / 1024
    print(f"Memory Constraint model size: {memory_model_size_kb:.2f} KB")

Memory Constraint model size: 37.13 KB


## Floating Point Restriction (Task B Continued)

In addition to the 36 KB memory constraint, the hardware for Task B is limited in computational capability and **does not support floating point operations**. It means the network must operate using **integer or fixed-point arithmetic** only. **Weights, activations, and computations** should be **quantized** to lower-precision formats such as 8-bit integers (INT8)


### Dynamic Quantization

In [32]:
memory_constraint_model_cpu = memory_constraint_model.to('cpu') 
# Apply dynamic quantization to the entire model or just LSTM/Linear layers
dynamic_quantized_model = torch.quantization.quantize_dynamic(
    memory_constraint_model_cpu,  # the model instance
    {nn.LSTM, nn.Linear},  # layers to quantize
    dtype=torch.qint8  # quantize to 8-bit integers
)

In [33]:
from src.evaluate import ModelEvaluator

In [34]:
device = torch.device("cpu")
test_instance = ModelEvaluator(
    memory_constraint_model_cpu, 
    test_loader,
    device
)

In [35]:
test_instance.evaluate()


 Accuracy on test data: 58.50%

 Classification Report:
              precision    recall  f1-score   support

           0     0.5882    0.7692    0.6667        52
           1     0.0000    0.0000    0.0000        65
           2     0.5625    0.4286    0.4865        63
           3     0.7931    0.7667    0.7797        60
           4     0.6250    0.4545    0.5263        55
           5     0.7193    0.7736    0.7455        53
           6     0.7424    0.7538    0.7481        65
           7     0.3617    0.8644    0.5100        59
           8     0.6716    0.7377    0.7031        61
           9     0.4909    0.4030    0.4426        67

    accuracy                         0.5850       600
   macro avg     0.5555    0.5952    0.5608       600
weighted avg     0.5493    0.5850    0.5530       600


 Confusion Matrix:
[[40  0  4  1  2  0  0  1  0  4]
 [ 0  0  0  0  0  9  0 46  0 10]
 [20  0 27  1  7  1  1  0  1  5]
 [ 0  0  2 46  4  0  1  0  7  0]
 [ 8  0 14  4 25  0  0  0  0  4]

  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


#### Compare the Inference times for Floating point and Quantised model

#### Compare the model sizes of Floating point and Quantised model

### Static Quantization

In [36]:
import src.model as model

hidden_dim = 24
quant_model = model.LSTMClassifier(input_dim=input_dim,
                       hidden_dim=hidden_dim,
                       num_layers=num_layers,
                       output_dim=output_dim).to(device)

In [37]:
memory_constraint_model_load_path = os.path.join(project_root_dir, 'outputs', 'models', 'memory_constraint_model_weights.pth')
quant_model.load_state_dict(torch.load(memory_constraint_model_load_path))

<All keys matched successfully>

In [38]:
quant_model.eval()

LSTMClassifier(
  (lstm): LSTM(13, 24, num_layers=2, batch_first=True, dropout=0.3)
  (dropout): Dropout(p=0.3, inplace=False)
  (fc): Linear(in_features=24, out_features=10, bias=True)
)

In [39]:
import src.quantize as quantize
static_quant_model = quantize.StaticQuantizableModel(quant_model)

In [40]:
print(static_quant_model)

StaticQuantizableModel(
  (quant): QuantStub()
  (dequant): DeQuantStub()
  (model): LSTMClassifier(
    (lstm): LSTM(13, 24, num_layers=2, batch_first=True, dropout=0.3)
    (dropout): Dropout(p=0.3, inplace=False)
    (fc): Linear(in_features=24, out_features=10, bias=True)
  )
)


In [1]:
static_quant_model.qconfig = torch.quantization.get_default_qconfig('fbgemm')  # For edge devices
torch.backends.quantized.engine = 'fbgemm'

static_quant_model_prepared = torch.quantization.prepare(static_quant_model)

NameError: name 'torch' is not defined

In [42]:
calibration_indices = list(range(100))
calibration_dataset = torch.utils.data.Subset(test_dataset, calibration_indices)

calibration_dataloader = torch.utils.data.DataLoader(calibration_dataset, batch_size=32)

In [43]:
static_quant_model_prepared = static_quant_model_prepared.to('cpu')
device = torch.device("cpu")

quantize.calibrate(static_quant_model_prepared, calibration_dataloader, device)

In [44]:
static_quantized_model = torch.quantization.convert(static_quant_model_prepared)

In [45]:
print(device)

cpu


In [46]:
from src.evaluate import ModelEvaluator
print(device)
quant_test_instance = ModelEvaluator(
    static_quantized_model, 
    test_loader,
    device
)



cpu


In [47]:
quant_test_instance.evaluate()

  hx_tensor = torch.stack(hx_list)
  cx_tensor = torch.stack(cx_list)



 Accuracy on test data: 51.83%

 Classification Report:
              precision    recall  f1-score   support

           0     0.4271    0.7885    0.5541        52
           1     0.3077    0.0615    0.1026        65
           2     0.1000    0.0159    0.0274        63
           3     0.7121    0.7833    0.7460        60
           4     0.7500    0.3818    0.5060        55
           5     0.7674    0.6226    0.6875        53
           6     0.8000    0.4923    0.6095        65
           7     0.3535    0.5932    0.4430        59
           8     0.5600    0.9180    0.6957        61
           9     0.3905    0.6119    0.4767        67

    accuracy                         0.5183       600
   macro avg     0.5168    0.5269    0.4849       600
weighted avg     0.5106    0.5183    0.4773       600


 Confusion Matrix:
[[41  0  1  1  3  0  0  3  1  2]
 [ 0  4  0  1  0  6  0 25  0 29]
 [36  0  1  6  4  2  0  2  1 11]
 [ 1  0  0 47  0  0  0  0 11  1]
 [18  0  8  4 21  0  0  1  0  3]

In [48]:
static_quantized_model.eval()
correct = 0
total = 0

all_preds = []
all_labels = []

with torch.no_grad():

    for inputs, labels in test_loader:
        # inputs = inputs.to(device)
        labels = labels.to(device)

        outputs = static_quantized_model(inputs)
        _, predicted = torch.max(outputs, 1)

  hx_tensor = torch.stack(hx_list)
  cx_tensor = torch.stack(cx_list)


In [49]:
_, _ = utils.get_model_params_size(static_quantized_model)


Layer-wise parameter counts:


 Total Parameters: 0
Estimated Memory: 0.00 KB (0.00 MB)


In [50]:
import tempfile

with tempfile.NamedTemporaryFile(delete=True) as tmp:
    torch.save(static_quantized_model.state_dict(), tmp.name)
    quantized_model_size_kb = os.path.getsize(tmp.name) / 1024
    print(f"Quantized model size: {quantized_model_size_kb:.2f} KB")

Quantized model size: 35.48 KB


In [51]:
for name, module in static_quantized_model.named_modules():
    print(f"Layer: {name} | Type: {type(module)}")

Layer:  | Type: <class 'src.quantize.StaticQuantizableModel'>
Layer: quant | Type: <class 'torch.ao.nn.quantized.modules.Quantize'>
Layer: dequant | Type: <class 'torch.ao.nn.quantized.modules.DeQuantize'>
Layer: model | Type: <class 'src.model.LSTMClassifier'>
Layer: model.lstm | Type: <class 'torch.ao.nn.quantized.modules.rnn.LSTM'>
Layer: model.lstm.layers | Type: <class 'torch.nn.modules.container.ModuleList'>
Layer: model.lstm.layers.0 | Type: <class 'torch.ao.nn.quantizable.modules.rnn._LSTMLayer'>
Layer: model.lstm.layers.0.layer_fw | Type: <class 'torch.ao.nn.quantizable.modules.rnn._LSTMSingleLayer'>
Layer: model.lstm.layers.0.layer_fw.cell | Type: <class 'torch.ao.nn.quantizable.modules.rnn.LSTMCell'>
Layer: model.lstm.layers.0.layer_fw.cell.igates | Type: <class 'torch.ao.nn.quantized.modules.linear.Linear'>
Layer: model.lstm.layers.0.layer_fw.cell.igates._packed_params | Type: <class 'torch.ao.nn.quantized.modules.linear.LinearPackedParams'>
Layer: model.lstm.layers.0.layer

In [52]:
for name, module in static_quantized_model.named_modules():
    if isinstance(module, torch.nn.quantized.Linear):
        print(f"\n{name} is a quantized Linear layer")
        weight = module.weight()
        print("Weight type:", type(weight))
        print("Weight dtype:", weight.dtype)
        print("Weight shape:", weight.shape)
        print("---")


model.lstm.layers.0.layer_fw.cell.igates is a quantized Linear layer
Weight type: <class 'torch.Tensor'>
Weight dtype: torch.qint8
Weight shape: torch.Size([96, 13])
---

model.lstm.layers.0.layer_fw.cell.hgates is a quantized Linear layer
Weight type: <class 'torch.Tensor'>
Weight dtype: torch.qint8
Weight shape: torch.Size([96, 24])
---

model.lstm.layers.1.layer_fw.cell.igates is a quantized Linear layer
Weight type: <class 'torch.Tensor'>
Weight dtype: torch.qint8
Weight shape: torch.Size([96, 24])
---

model.lstm.layers.1.layer_fw.cell.hgates is a quantized Linear layer
Weight type: <class 'torch.Tensor'>
Weight dtype: torch.qint8
Weight shape: torch.Size([96, 24])
---

model.fc is a quantized Linear layer
Weight type: <class 'torch.Tensor'>
Weight dtype: torch.qint8
Weight shape: torch.Size([10, 24])
---


In [53]:
import numpy as np

for name, module in static_quantized_model.named_modules():
    print(module)
    if hasattr(module, 'weight') and isinstance(module.weight, torch.Tensor):
        weight_tensor = module.weight()
        if weight_tensor.is_quantized:
            byte_size = weight_tensor.int_repr().element_size() * weight_tensor.int_repr().nelement()
            print(f"{name} quantized weight size: {byte_size / 1024:.4f} KB")

StaticQuantizableModel(
  (quant): Quantize(scale=tensor([3.1986]), zero_point=tensor([95]), dtype=torch.quint8)
  (dequant): DeQuantize()
  (model): LSTMClassifier(
    (lstm): QuantizedLSTM(
      (layers): ModuleList(
        (0): _LSTMLayer(
          (layer_fw): _LSTMSingleLayer(
            (cell): QuantizableLSTMCell(
              (igates): QuantizedLinear(in_features=13, out_features=96, scale=1.1903992891311646, zero_point=63, qscheme=torch.per_channel_affine)
              (hgates): QuantizedLinear(in_features=24, out_features=96, scale=0.039823323488235474, zero_point=48, qscheme=torch.per_channel_affine)
              (gates): QFunctional(
                scale=1.1539145708084106, zero_point=63
                (activation_post_process): Identity()
              )
              (input_gate): Sigmoid()
              (forget_gate): Sigmoid()
              (cell_gate): Tanh()
              (output_gate): Sigmoid()
              (fgate_cx): QFunctional(
                scale=0.

In [54]:
total_fp32_bytes = 0
total_int8_bytes = 0

for name, module in static_quantized_model.named_modules():
    if isinstance(module, torch.nn.quantized.Linear):
        weight = module.weight()
        if weight.is_quantized:
            num_elements = weight.numel()
            total_fp32_bytes += num_elements * 4  # FP32
            total_int8_bytes += num_elements * 1  # INT8

print(f"Estimated FP32 size: {total_fp32_bytes / 1024:.2f} KB")
print(f"Estimated INT8 size: {total_int8_bytes / 1024:.2f} KB")
print(f"Compression ratio: {total_fp32_bytes / total_int8_bytes:.2f}x")

Estimated FP32 size: 32.81 KB
Estimated INT8 size: 8.20 KB
Compression ratio: 4.00x


In [55]:
print(static_quantized_model)

StaticQuantizableModel(
  (quant): Quantize(scale=tensor([3.1986]), zero_point=tensor([95]), dtype=torch.quint8)
  (dequant): DeQuantize()
  (model): LSTMClassifier(
    (lstm): QuantizedLSTM(
      (layers): ModuleList(
        (0): _LSTMLayer(
          (layer_fw): _LSTMSingleLayer(
            (cell): QuantizableLSTMCell(
              (igates): QuantizedLinear(in_features=13, out_features=96, scale=1.1903992891311646, zero_point=63, qscheme=torch.per_channel_affine)
              (hgates): QuantizedLinear(in_features=24, out_features=96, scale=0.039823323488235474, zero_point=48, qscheme=torch.per_channel_affine)
              (gates): QFunctional(
                scale=1.1539145708084106, zero_point=63
                (activation_post_process): Identity()
              )
              (input_gate): Sigmoid()
              (forget_gate): Sigmoid()
              (cell_gate): Tanh()
              (output_gate): Sigmoid()
              (fgate_cx): QFunctional(
                scale=0.

In [1]:
def round_scale_to_power_of_two(scale):
    return float(2 ** round(np.log2(scale)))

# Function to update quantized weights with new scale
def update_quantized_weights(module: nn.Module):
    for name, submodule in module.named_children():
        if isinstance(submodule, torch.nn.quantized.Linear):
            old_wt = submodule.weight()
            old_scale = old_wt.q_scale()
            old_zp = old_wt.q_zero_point()

            # Dequantize
            float_wt = old_wt.dequantize()

            # Round scale to nearest power of 2
            new_scale = round_scale_to_power_of_two(old_scale)

            # Requantize with new scale
            new_qweight = torch.quantize_per_tensor(float_wt, scale=new_scale, zero_point=old_zp, dtype=torch.qint8)

            # Replace weight
            submodule.set_weight_bias(new_qweight, submodule.bias())

            print(f"[Updated] {name}: scale {old_scale:.6f} → {new_scale:.6f}, zero_point = {old_zp}")

        else:
            update_quantized_weights(submodule)

In [2]:
# --- Step 1: Extract and update all quantized scales in-place ---
print("🔍 Updating quantized weights to use power-of-2 scales...")
update_quantized_weights(static_quantized_model)

🔍 Updating quantized weights to use power-of-2 scales...


NameError: name 'static_quantized_model' is not defined

In [None]:
from src.evaluate import ModelEvaluator


static_quantized_model.eval()
device = torch.device("cpu" if torch.cuda.is_available() else "cpu")

static_quantized_model.to(device)


quant_test_instance_2 = ModelEvaluator(
    static_quantized_model, 
    test_loader,
    device
)


In [None]:
quant_test_instance_2.evaluate()

In [None]:
static_quantized_model.eval()

int8_model_save_path = os.path.join(project_root_dir, 'outputs', 'models', 'int8_model_weights.pth')

script_model = torch.jit.script(static_quantized_model)
torch.jit.save(script_model, int8_model_save_path)

## Total Parameters for 1-Layer LSTM

For a single LSTM layer, with the same input and output dimensions (13 MFCC coefficients and 10 output classes), the total number of parameters simplifies to:

$$
\text{Total Parameters} = 4h^2 + 70h + 10
$$