## Week 9: GMM, HMM for Speech Regconition

Reference: https://github.com/desh2608/gmm-hmm-asr

1. Single Gaussian: Each digit is modeled using a single Gaussian with diagonal 
covariance.
2. Gaussian Mixture Model (GMM): Each digit is modeled using a mixture of Gaussians, 
initialized by perturbing the single Gaussian model.
3. Hidden Markov Model (HMM): Each digit is modeled by an HMM consisting of N states, 
where the emission probability of each state is a single Gaussian with diagonal covariance.

In [13]:
pip install git+https://github.com/desh2608/gmm-hmm-asr.git

Collecting git+https://github.com/desh2608/gmm-hmm-asr.git
  Cloning https://github.com/desh2608/gmm-hmm-asr.git to /private/var/folders/gc/49grx5f90nv3f12yh4jhhf740000gn/T/pip-req-build-yuvuu4qk
  Running command git clone --filter=blob:none --quiet https://github.com/desh2608/gmm-hmm-asr.git /private/var/folders/gc/49grx5f90nv3f12yh4jhhf740000gn/T/pip-req-build-yuvuu4qk
  Resolved https://github.com/desh2608/gmm-hmm-asr.git to commit ee7f9c319e2da9bbcfa9d665a91b8c93ac2fa60e
  Preparing metadata (setup.py) ... [?25ldone

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


To train, first create train_data which should be a list of DataTuple(key,feats,label) objects.

In [14]:
import os
import librosa
import numpy as np
from gmm_hmm_asr.data import DataTuple

data_dir = "digits_number"  # path to your folder

# Map folder names to digit labels
label_map = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"
}

train_data = []
test_data = []

train_ratio = 0.8
ndim = 40  # feature dimensionality

for folder_name in os.listdir(data_dir):
    folder_path = os.path.join(data_dir, folder_name)
    if not os.path.isdir(folder_path):
        continue

    label = label_map[folder_name.lower()]
    files = [f for f in os.listdir(folder_path) if f.endswith('.wav')]
    files.sort()

    n_train = int(len(files) * train_ratio)
    train_files = files[:n_train]
    test_files = files[n_train:]

    # Function to extract features from wav
    def extract_features(file_path):
        y, sr = librosa.load(file_path, sr=None)
        # Extract MFCCs
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=ndim)
        return mfcc.T  # transpose to shape (T, ndim)

    # Load train data
    for fname in train_files:
        feats = extract_features(os.path.join(folder_path, fname))
        key = f"{folder_name}_{fname}"
        train_data.append(DataTuple(key, feats, label))

    # Load test data
    for fname in test_files:
        feats = extract_features(os.path.join(folder_path, fname))
        key = f"{folder_name}_{fname}"
        test_data.append(DataTuple(key, feats, label))

print(f"Loaded {len(train_data)} training samples and {len(test_data)} testing samples.")

Loaded 2400 training samples and 600 testing samples.


1. Single Gaussian

In [15]:
from gmm_hmm_asr.data import DataTuple
from gmm_hmm_asr.trainers import SingleGaussTrainer

ndim = 40 # dimensionality of features
DIGITS = ['1','2','3','4','5', '6', '7', '8', '9'] # digits to be recognized

sg_model = SingleGaussTrainer(ndim, DIGITS)
sg_model.train(train_data)

preds = sg_model.predict(test_data)
y_pred = [pred[0] for pred in preds]  # predicted labels
y_ll = [pred[1] for pred in preds]  # maximum log-likelihood values

2. Gaussian Mixture Model

In [16]:
from gmm_hmm_asr.data import DataTuple
from gmm_hmm_asr.trainers import GMMTrainer

ndim = 40 # dimensionality of features
ncomp = 8 # number of Gaussian components
niter = 10 # number of training iterations
DIGITS = ['1','2','3','4','5'] # digits to be recognized

gmm_model = GMMTrainer(ndim, ncomp, DIGITS)
gmm_model.train(train_data, niter)

preds = gmm_model.predict(test_data)

Initializing from single gaussian model
Iteration: 0
log likelihood: -2814204.8368212963
Iteration: 1
log likelihood: -2750674.272555636
Iteration: 2
log likelihood: -2681388.3367618346
Iteration: 3
log likelihood: -2617692.665497031
Iteration: 4
log likelihood: -2588365.863257302
Iteration: 5
log likelihood: -2574946.4070767653
Iteration: 6
log likelihood: -2567989.626792239
Iteration: 7
log likelihood: -2564449.922053358
Iteration: 8
log likelihood: -2561365.9552560323
Iteration: 9
log likelihood: -2559479.1392793776


3. Hidden Markov Model

In [17]:
from gmm_hmm_asr.data import DataTuple
from gmm_hmm_asr.trainers import HMMTrainer

ndim = 40 # dimensionality of features
nstate = 5 # number of HMM states
niter = 10 # number of training iterations
DIGITS = ['1','2','3','4','5'] # digits to be recognized

hmm_model = GMMTrainer(ndim, nstate, DIGITS)
hmm_model.train(train_data, niter)

preds = hmm_model.predict(test_data)

Initializing from single gaussian model
Iteration: 0
log likelihood: -2816413.7226816127
Iteration: 1
log likelihood: -2765856.2520319596
Iteration: 2
log likelihood: -2708588.502998351
Iteration: 3
log likelihood: -2663293.8403043924
Iteration: 4
log likelihood: -2633856.5842787907
Iteration: 5
log likelihood: -2622042.5255488497
Iteration: 6
log likelihood: -2615079.6448175153
Iteration: 7
log likelihood: -2609450.8366513983
Iteration: 8
log likelihood: -2606530.362806533
Iteration: 9
log likelihood: -2604258.3284670617
