# LSTM for Spoken Digit Classification

This notebook builds and trains a recurrent neural network (LSTM) to classify spoken digits (0–9) from audio recordings.

- Dataset: [Free Spoken Digit Dataset (FSDD)](https://github.com/Jakobovski/free-spoken-digit-dataset)
- Framework: PyTorch
- Architecture: RNN with LSTM layers

In [1]:
import sys
import os

# Add the project root (parent of current folder) to Python path
project_root_dir = os.path.abspath(os.path.join(os.getcwd(), '..'))
sys.path.append(project_root_dir)


## Load Model Configuration from YAML

To keep the training pipeline configurable and modular, we store model parameters such as the number of LSTM layers, the hidden size, and the learning rate in a YAML file. This structure makes it quick to adapt the pipeline to the related tasks B and C.

This section loads the model configuration using a custom utility function.

In [2]:
import src.utils as utils

In [3]:
utils.set_seed(42)

[INFO] Random seed set to: 42


In [4]:
import yaml
import json

model_config_path = os.path.join(project_root_dir, 'config', 'model_config.yaml')
model_config = utils.read_yaml_file(model_config_path)
# print(json.dumps(model_config, indent=2))
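
The later cells index into `model_config` with keys such as `dataset.path`, `data_splitting.test_size`, `experiment.seed`, and `model.*`. As a minimal sketch, the check below verifies that a loaded configuration has that shape before training starts. The helper `missing_keys` and every value in `example_config` are hypothetical placeholders, not the contents of the project's `model_config.yaml`; only the key names are taken from this notebook.

```python
# Key paths that the cells in this notebook read from the config.
REQUIRED_KEYS = [
    ("dataset", "path"),
    ("data_splitting", "test_size"),
    ("experiment", "seed"),
    ("model", "input_dim"),
    ("model", "hidden_dim"),
    ("model", "num_layers"),
    ("model", "output_dim"),
]


def missing_keys(config, required=REQUIRED_KEYS):
    """Return the dotted key paths that are absent from a nested dict."""
    missing = []
    for path in required:
        node = config
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing


# Hypothetical values for illustration only.
example_config = {
    "dataset": {"path": "path/to/recordings"},
    "data_splitting": {"test_size": 0.2},
    "experiment": {"seed": 42},
    "model": {"input_dim": 13, "hidden_dim": 64, "num_layers": 2, "output_dim": 10},
}

print(missing_keys(example_config))  # → []
```

Running a check like this right after loading the file surfaces a misspelled or missing key immediately, rather than as a `KeyError` several cells later.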

## Load and Split Dataset for Training and Evaluation

In this section, we load the recordings from disk, generate data-label pairs, and split them into training and test sets according to the `test_size` defined in the YAML file.

Using `test_size` and `seed` from the YAML config ensures that experiments are reproducible and easily tunable for other tasks by simply updating the configuration.


In [5]:
data_path = model_config['dataset']['path']
test_data_size = model_config['data_splitting']['test_size']
seed = model_config['experiment']['seed']

In [6]:
data_label_pairs, calibration_samples = utils.prepare_data_label_pairs(data_path, calibration_samples_per_class=50)

In [7]:
print(len(calibration_samples))

500


In [8]:
from sklearn.model_selection import train_test_split
train_data, test_data = train_test_split(data_label_pairs, test_size=test_data_size, random_state=seed)
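
To illustrate why fixing `random_state` makes the split reproducible, here is a minimal pure-Python sketch of a seeded shuffle-and-split. It is a stand-in for the idea, not scikit-learn's implementation; `seeded_split` and the toy `pairs` list are hypothetical.

```python
import random


def seeded_split(items, test_size, seed):
    """Shuffle a copy of `items` with a fixed seed, then cut off a test fraction."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


# Toy (sample_id, label) pairs standing in for the real data-label pairs.
pairs = [(f"rec_{i}", i % 10) for i in range(100)]

train_a, test_a = seeded_split(pairs, test_size=0.2, seed=42)
train_b, test_b = seeded_split(pairs, test_size=0.2, seed=42)

print(len(train_a), len(test_a))  # → 80 20
print(test_a == test_b)           # → True: same seed, same split
```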

## Transform Raw Data into PyTorch Dataset Objects

The `AudioFeaturesDataset` class converts raw data-label pairs into PyTorch-compatible datasets that provide easy access to samples and labels.

`AudioFeaturesDataset` is a custom dataset class that:

- Loads audio recordings of spoken digits along with their labels.
- Optionally cleans the audio by filtering out noise.
- Extracts MFCC (Mel-frequency cepstral coefficient) features, a common speech representation.
- Pads or trims these features to a fixed length so all inputs have the same shape.
- Serves samples one at a time through the PyTorch `Dataset` interface during training and testing.

Together, these steps put the audio data into the format the network expects.
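
The pad-or-trim step can be sketched as follows. This is a plain-Python illustration under assumed shapes (a list of frames, each a list of MFCC coefficients); the function name `pad_or_trim` and the zero-padding choice are assumptions, and the actual `AudioFeaturesDataset` implementation may differ.

```python
def pad_or_trim(frames, target_len, pad_value=0.0):
    """Return exactly `target_len` frames: trim extras or append padded frames."""
    if not frames:
        raise ValueError("need at least one frame to infer the feature width")
    n_coeffs = len(frames[0])
    trimmed = [list(frame) for frame in frames[:target_len]]
    padding = [[pad_value] * n_coeffs for _ in range(target_len - len(trimmed))]
    return trimmed + padding


short_clip = [[1.0, 2.0], [3.0, 4.0]]                  # 2 frames, 2 coefficients
long_clip = [[float(i), float(i)] for i in range(6)]   # 6 frames

print(pad_or_trim(short_clip, target_len=4))
# → [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0]]
print(len(pad_or_trim(long_clip, target_len=4)))  # → 4
```

A fixed number of frames per sample is what allows the default batching logic to stack samples into a single tensor.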


In [9]:
from src.data_preprocessor import AudioFeaturesDataset

train_dataset = AudioFeaturesDataset(train_data)
test_dataset = AudioFeaturesDataset(test_data)

In [10]:
print(f"Train size: {len(train_dataset)}")
print(f"Test size: {len(test_dataset)}")


Train size: 2400
Test size: 600


## Create DataLoaders for Batch Processing

Using PyTorch DataLoaders, we enable efficient loading, batching, and shuffling of data during training and evaluation.

In [11]:
import torch

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=32, shuffle=False)
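
With the dataset sizes printed above and `batch_size=32`, the number of batches per epoch follows from a ceiling division, since `DataLoader` keeps the final partial batch by default (`drop_last=False`). A quick arithmetic check, with hypothetical helper names:

```python
import math


def batches_per_epoch(num_samples, batch_size):
    """Number of batches when the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)


def last_batch_size(num_samples, batch_size):
    """Size of the final batch (equals batch_size when it divides evenly)."""
    return num_samples % batch_size or batch_size


print(batches_per_epoch(2400, 32), last_batch_size(2400, 32))  # → 75 32
print(batches_per_epoch(600, 32), last_batch_size(600, 32))    # → 19 24
```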

In [15]:
input_dim = model_config['model']['input_dim']
hidden_dim = model_config['model']['hidden_dim']
num_layers = model_config['model']['num_layers']
output_dim = model_config['model']['output_dim']

In [16]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

## Power-of-2 Quantization

In [59]:
import src.model as model

# Override the configured hidden size for this experiment
hidden_dim = 24
po2_model = model.LSTMClassifier(
    input_dim=input_dim,
    hidden_dim=hidden_dim,
    num_layers=num_layers,
    output_dim=output_dim,
).to(device)

In [76]:
static_quant_model_load_path = os.path.join(project_root_dir, 'outputs', 'models', 'task-b-part-2_weights_jit.pth')

po_model = torch.jit.load(static_quant_model_load_path)
# po_model = torch.jit.freeze(po_model)

# po2_model.load_state_dict(torch.load(static_quant_model_load_path))

In [77]:
po_model.eval()  # Set model to evaluation mode
# print(po_model.model.lstm.layers.0.layer_fw.cell.igates.scale)


RecursiveScriptModule(
  original_name=StaticQuantizableModel
  (quant): RecursiveScriptModule(original_name=Quantize)
  (dequant): RecursiveScriptModule(original_name=DeQuantize)
  (model): RecursiveScriptModule(
    original_name=LSTMClassifier
    (lstm): RecursiveScriptModule(
      original_name=LSTM
      (layers): RecursiveScriptModule(
        original_name=ModuleList
        (0): RecursiveScriptModule(
          original_name=_LSTMLayer
          (layer_fw): RecursiveScriptModule(
            original_name=_LSTMSingleLayer
            (cell): RecursiveScriptModule(
              original_name=LSTMCell
              (igates): RecursiveScriptModule(
                original_name=Linear
                (_packed_params): RecursiveScriptModule(original_name=LinearPackedParams)
              )
              (hgates): RecursiveScriptModule(
                original_name=Linear
                (_packed_params): RecursiveScriptModule(original_name=LinearPackedParams)
              )
  

In [74]:
import numpy as np  # needed by round_scale_to_power_of_two below
import torch.nn as nn

def round_scale_to_power_of_two(scale):
    return float(2 ** round(np.log2(scale)))

# Requantize every quantized Linear layer so its weight scale is a power of two
def update_quantized_weights(module: nn.Module):
    for name, submodule in module.named_children():
        # NOTE: this check matches eager-mode quantized modules only. Children of a
        # model loaded with torch.jit.load are RecursiveScriptModule instances
        # (see the printout above), so they will not pass this isinstance test.
        if isinstance(submodule, torch.nn.quantized.Linear):
            old_wt = submodule.weight()
            # q_scale() / q_zero_point() assume per-tensor quantized weights
            old_scale = old_wt.q_scale()
            old_zp = old_wt.q_zero_point()

            # Dequantize back to float
            float_wt = old_wt.dequantize()

            # Round scale to nearest power of 2
            new_scale = round_scale_to_power_of_two(old_scale)

            # Requantize with new scale, keeping the original zero point
            new_qweight = torch.quantize_per_tensor(float_wt, scale=new_scale, zero_point=old_zp, dtype=torch.qint8)

            # Replace weight, leaving the bias unchanged
            submodule.set_weight_bias(new_qweight, submodule.bias())

            print(f"[Updated] {name}: scale {old_scale:.6f} → {new_scale:.6f}, zero_point = {old_zp}")

        else:
            # Recurse into containers and other non-Linear modules
            update_quantized_weights(submodule)
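
The effect of rounding a scale to a power of two can be seen on plain numbers. The sketch below uses only the standard library and a simplified symmetric int8 scheme (zero point 0); the helper names `quantize` and `dequantize` and the sample weights are hypothetical, and PyTorch's quantized kernels handle rounding and packing internally. With a scale of the form `2**k`, rescaling reduces to a bit shift in fixed-point hardware, which is the usual motivation for this constraint.

```python
import math


def round_scale_to_power_of_two(scale):
    """Nearest power of two in the log2 domain, e.g. 0.013 -> 2**-6."""
    return 2.0 ** round(math.log2(scale))


def quantize(values, scale, qmin=-128, qmax=127):
    """Symmetric int8 quantization with zero point 0."""
    return [max(qmin, min(qmax, round(v / scale))) for v in values]


def dequantize(qvalues, scale):
    """Map integer codes back to floats."""
    return [q * scale for q in qvalues]


weights = [0.50, -0.25, 0.13, -1.00]   # hypothetical float weights
old_scale = 0.013                      # hypothetical calibrated scale
new_scale = round_scale_to_power_of_two(old_scale)

print(new_scale)                       # → 0.015625 (2**-6)
q = quantize(weights, new_scale)
print(q)                               # → [32, -16, 8, -64]
print(dequantize(q, new_scale))        # → [0.5, -0.25, 0.125, -1.0]
```

Note that 0.13 comes back as 0.125: changing the scale changes the quantization grid, so accuracy should be re-evaluated after the scales are rounded.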

In [75]:
print("🔍 Updating quantized weights to use power-of-2 scales...")
update_quantized_weights(po_model)

🔍 Updating quantized weights to use power-of-2 scales...
