
**Implementation: HMM for Word Recognition**

* **Objective:**

  * Implement the Forward and Viterbi algorithms for word recognition using HMM.
  * Gain experience with dynamic programming for sequence modeling.

* **Dataset:**

  * Use the **TIMIT Acoustic-Phonetic Continuous Speech Corpus**.
  * Contains recordings from 630 speakers across 8 major dialect regions of American English.
  * Provides phoneme-level alignments (start and end times) for training HMMs.
  * Includes predefined training and test sets; a development set is typically held out from the test data.

* **Experiment Setup:**

  * **Observations:** Acoustic feature vectors extracted from audio files (each file = one spoken word).
  * **Hidden states:** Phonemes.

    * Note: Multiple phonemes can map to a single character, and a phoneme can correspond to multiple characters.
  * **HMM:** Models the probabilistic relationship between phonemes (hidden) and acoustic features (observed).
  * **Training:** Align audio with phoneme transcriptions to learn HMM parameters.

* **Tasks:**

  * Download the TIMIT dataset.
  * Extract features (MFCCs or Mel filter bank features) from raw audio.
  * Train an HMM using the `hmmlearn` library.
  * Inspect the HMM parameters:

    * Transition matrix
    * Emission parameters (for a Gaussian HMM: per-state means and covariances)
    * Initial state probabilities
  * Implement your own **Forward algorithm** (do **not** use `hmmlearn`) to compute:

    * Likelihood that a given audio corresponds to a word.
  * Implement your own **Viterbi algorithm** (do **not** use `hmmlearn`) to compute:

    * Most likely hidden sequence of phonemes for a given audio word.


In [None]:
!pip install hmmlearn


Collecting hmmlearn
  Downloading hmmlearn-0.3.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.0 kB)
Downloading hmmlearn-0.3.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (165 kB)
Installing collected packages: hmmlearn
Successfully installed hmmlearn-0.3.3


In [None]:
import kagglehub
import os
import numpy as np
import pandas as pd
import librosa
import glob
from hmmlearn import hmm
from scipy.stats import multivariate_normal
from scipy.special import logsumexp
import matplotlib.pyplot as plt

# Download the TIMIT dataset


In [None]:
dataset_path = kagglehub.dataset_download("mfekadu/darpa-timit-acousticphonetic-continuous-speech")
print("TIMIT dataset path:", dataset_path)

csv_path = os.path.join(dataset_path, "train_data.csv")
df = pd.read_csv(csv_path)

Using Colab cache for faster access to the 'darpa-timit-acousticphonetic-continuous-speech' dataset.
TIMIT dataset path: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech


In [None]:
csv_path = os.path.join(dataset_path, "train_data.csv")
df = pd.read_csv(csv_path)

print(df.head())
print(df.columns)


   index test_or_train dialect_region speaker_id       filename  \
0    1.0         TRAIN            DR4      MMDM0  SI681.WAV.wav   
1    2.0         TRAIN            DR4      MMDM0     SI1311.PHN   
2    3.0         TRAIN            DR4      MMDM0     SI1311.WRD   
3    4.0         TRAIN            DR4      MMDM0      SX321.PHN   
4    5.0         TRAIN            DR4      MMDM0      SX321.WRD   

              path_from_data_dir        path_from_data_dir_windows  \
0  TRAIN/DR4/MMDM0/SI681.WAV.wav  TRAIN\\DR4\\MMDM0\\SI681.WAV.wav   
1     TRAIN/DR4/MMDM0/SI1311.PHN     TRAIN\\DR4\\MMDM0\\SI1311.PHN   
2     TRAIN/DR4/MMDM0/SI1311.WRD     TRAIN\\DR4\\MMDM0\\SI1311.WRD   
3      TRAIN/DR4/MMDM0/SX321.PHN      TRAIN\\DR4\\MMDM0\\SX321.PHN   
4      TRAIN/DR4/MMDM0/SX321.WRD      TRAIN\\DR4\\MMDM0\\SX321.WRD   

  is_converted_audio is_audio is_word_file is_phonetic_file is_sentence_file  
0               True     True        False            False            False  
1              Fal

# 1. Feature Extraction Function


   Returns: a `(T, 13)` NumPy array of MFCCs, where `T` is the number of frames (time steps).

In [None]:
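def extract_mfcc(audio_path):
    """Return per-utterance normalized MFCCs with shape (T, 13)."""
    # 16 kHz audio, 512-sample (32 ms) window, 160-sample (10 ms) hop
    y, sr = librosa.load(audio_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=512, hop_length=160)
    mfcc = mfcc.T  # (13, T) -> (T, 13)
    # Normalize each coefficient to zero mean and unit variance; epsilon avoids division by zero
    mfcc = (mfcc - np.mean(mfcc, axis=0)) / (np.std(mfcc, axis=0) + 1e-8)
    return mfcc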
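
With `sr=16000` and `hop_length=160`, consecutive frames are 10 ms apart, and `n_fft=512` gives a 32 ms analysis window. The sketch below estimates the number of rows `T` from the clip length. It assumes librosa's default centered framing, under which the frame count is `1 + num_samples // hop_length`; the helper name `expected_num_frames` is hypothetical and not part of the notebook.

```python
def expected_num_frames(num_samples: int, hop_length: int = 160) -> int:
    """Frame count for centered framing: one frame at sample 0, then one per full hop."""
    return 1 + num_samples // hop_length


sr = 16000
hop_length = 160
n_fft = 512

frame_shift_ms = 1000 * hop_length / sr  # spacing between consecutive frames
window_ms = 1000 * n_fft / sr            # duration of each analysis window
frames_per_second = expected_num_frames(sr, hop_length)

print(f"frame shift: {frame_shift_ms} ms, window: {window_ms} ms")
print(f"a 1-second clip yields about {frames_per_second} frames")
```

Under the same assumption, a sequence of 250 frames corresponds to roughly 2.5 seconds of audio.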



In [None]:
print("CSV Columns:", df.columns.tolist())
file_col = "path_from_data_dir" if "path_from_data_dir" in df.columns else df.columns[3]
print("Using:", file_col)

CSV Columns: ['index', 'test_or_train', 'dialect_region', 'speaker_id', 'filename', 'path_from_data_dir', 'path_from_data_dir_windows', 'is_converted_audio', 'is_audio', 'is_word_file', 'is_phonetic_file', 'is_sentence_file']
Using: path_from_data_dir


In [None]:

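def extract_features(audio_path):
    # Variant of extract_mfcc that uses librosa's default n_fft and hop_length
    y, sr = librosa.load(audio_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T
    # Small epsilon guards against division by zero for constant coefficients
    mfcc = (mfcc - np.mean(mfcc, axis=0)) / (np.std(mfcc, axis=0) + 1e-8)

    return mfcc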


In [None]:
sample_csv_path = df.iloc[0][file_col]
print(f"Sample path in CSV: {sample_csv_path}")

Sample path in CSV: TRAIN/DR4/MMDM0/SI681.WAV.wav


In [None]:
found_one = False
for root, dirs, files in os.walk(dataset_path):
    for file in files:
        if file.lower().endswith(".wav"):
            print(f"Found actual file at: {os.path.join(root, file)}")
            found_one = True
            break
    if found_one:
        break

Found actual file at: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TEST/DR4/FLBW0/SA1.WAV.wav


# Load the First 100 Files

In [None]:
X, lengths = [], []
subset_df = df.head(100)
valid_files = 0

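# The first 100 rows mix audio files with .PHN/.WRD/.TXT annotation files;
# the annotation files fail to load and are reported by the except branch below.
for _, row in subset_df.iterrows():
    rel = row[file_col].lstrip("./").lstrip("/")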

    possible_paths = [
        os.path.join(dataset_path, rel),
        os.path.join(dataset_path, "data", rel),
        os.path.join(dataset_path, rel.replace("data/", ""))
    ]

    actual_path = None
    for p in possible_paths:
        if os.path.exists(p):
            actual_path = p
            break

    # try uppercase
    if not actual_path:
        for p in possible_paths:
            p_up = p[:-4] + ".WAV"
            if os.path.exists(p_up):
                actual_path = p_up
                break

    if actual_path:
        try:
            mfcc = extract_mfcc(actual_path)
            X.append(mfcc)
            lengths.append(len(mfcc))
            valid_files += 1
        except Exception as e:
            print("Error with file:", actual_path, e)

print(f"Extracted from {valid_files} files.")

  y, sr = librosa.load(audio_path, sr=16000)
	Deprecated as of librosa version 0.10.0.
	It will be removed in librosa version 1.0.
  y, sr_native = __audioread_load(path, offset, duration, dtype)

Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MMDM0/SI1311.PHN
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MMDM0/SI1311.WRD
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MMDM0/SX321.PHN
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MMDM0/SX321.WRD
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MMDM0/SI681.TXT
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MMDM0/SX51.PHN
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MMDM0/SX231.WRD
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MMDM0/SX51.WRD
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MMDM0/SX231.PHN
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MMDM0/SI1941.WRD
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MMDM0/SX141.WRD
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MMDM0/SA1.TXT
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MMDM0/SX141.PHN
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MMDM0/SI1941.PHN
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MMDM0/SA2.TXT
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MMDM0/SX411.TXT
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MMDM0/SX231.TXT
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MMDM0/SX51.TXT
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MMDM0/SA2.PHN
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MMDM0/SX411.PHN
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MMDM0/SX411.WRD
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MMDM0/SA2.WRD
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MMDM0/SX141.TXT
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MMDM0/SA1.PHN
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MMDM0/SI1941.TXT
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MMDM0/SA1.WRD
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MMDM0/SI1311.TXT
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MMDM0/SI681.PHN
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MMDM0/SI681.WRD
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MMDM0/SX321.TXT
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MCSS0/SX30.WRD
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MCSS0/SX30.PHN
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MCSS0/SX120.PHN
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MCSS0/SI688.PHN
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MCSS0/SI688.WRD
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MCSS0/SX120.WRD
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MCSS0/SX300.TXT
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MCSS0/SX210.TXT
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MCSS0/SX390.PHN
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MCSS0/SX390.WRD
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MCSS0/SI750.TXT
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MCSS0/SA1.TXT
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MCSS0/SI1380.TXT
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MCSS0/SA2.TXT
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MCSS0/SI750.PHN
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MCSS0/SX210.WRD
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MCSS0/SX390.TXT
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MCSS0/SI750.WRD
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MCSS0/SX210.PHN
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MCSS0/SA2.PHN
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MCSS0/SI1380.WRD
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MCSS0/SI1380.PHN
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MCSS0/SA2.WRD
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MCSS0/SA1.PHN
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MCSS0/SA1.WRD
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MCSS0/SX30.TXT
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MCSS0/SX300.PHN
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MCSS0/SX300.WRD
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MCSS0/SI688.TXT
Error with file: /kaggle/input/darpa-timit-acousticphonetic-continuous-speech/data/TRAIN/DR4/MCSS0/SX120.TXT
Extracted from 40 files.


In [None]:
X_concat = np.concatenate(X)
model = hmm.GaussianHMM(
    n_components=5,
    covariance_type="diag",
    n_iter=10,
    random_state=42
)
model.fit(X_concat, lengths)

print("Training complete.\n")

print("Start Probabilities:", model.startprob_)
print("Transition Matrix:", model.transmat_)

Training complete.

Start Probabilities: [4.91167259e-10 4.58130142e-30 3.05157599e-77 5.05146399e-99
 1.00000000e+00]
Transition Matrix: [[6.98297955e-02 2.62400608e-01 5.65764715e-03 7.14331274e-02
  5.90678822e-01]
 [1.29906296e-01 7.71842397e-01 6.53546736e-02 3.10036262e-02
  1.89300683e-03]
 [2.42962097e-08 8.50650030e-02 8.89985232e-01 2.49497405e-02
  5.95275284e-12]
 [1.12285605e-02 2.91871734e-02 3.42294318e-02 9.24437059e-01
  9.17774881e-04]
 [1.92469412e-01 4.51118827e-04 7.33238117e-04 1.81839930e-07
  8.06346049e-01]]
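
One way to read the transition matrix is through its diagonal: a state with self-transition probability `a_ii` has a geometrically distributed dwell time with mean `1 / (1 - a_ii)` frames. The sketch below applies that formula to the diagonal values printed above. The 10 ms frame shift is an assumption taken from `hop_length=160` at 16 kHz in `extract_mfcc`, and the helper name `expected_duration_frames` is hypothetical.

```python
# Diagonal of the transition matrix printed above (self-transition probabilities).
self_transitions = [
    6.98297955e-02,
    7.71842397e-01,
    8.89985232e-01,
    9.24437059e-01,
    8.06346049e-01,
]

# Assumed frame shift: hop_length / sr = 160 / 16000 seconds.
frame_shift_s = 160 / 16000


def expected_duration_frames(a_ii: float) -> float:
    """Mean of the geometric dwell-time distribution for a state with self-loop a_ii."""
    return 1.0 / (1.0 - a_ii)


durations = [expected_duration_frames(a) for a in self_transitions]
for state, frames in enumerate(durations):
    ms = frames * frame_shift_s * 1000
    print(f"state {state}: {frames:.2f} frames (about {ms:.0f} ms)")
```

States with self-transition probabilities close to 1 are "sticky" and tend to cover long stretches of audio, while state 0 is usually left after about one frame.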


# Forward Algorithm

In [None]:
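def manual_forward(obs_seq, model):
    """Return log P(obs_seq | model) using the forward algorithm in the log domain."""
    T = obs_seq.shape[0]
    N = model.n_components
    # The 1e-10 offset guards against log(0) for zero-probability entries
    log_startprob = np.log(model.startprob_ + 1e-10)
    log_transmat = np.log(model.transmat_ + 1e-10)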
    means = model.means_
    covars = model.covars_


    log_emission = np.zeros((T, N))
    for n in range(N):
        log_emission[:, n] = multivariate_normal.logpdf(
            obs_seq, mean=means[n], cov=covars[n]
        )

    log_alpha = np.zeros((T, N))

    # Initialization
    log_alpha[0] = log_startprob + log_emission[0]

    # Recursion
    for t in range(1, T):
        for j in range(N):
            prev_step = log_alpha[t-1] + log_transmat[:, j]
            log_alpha[t, j] = logsumexp(prev_step) + log_emission[t, j]

    # Termination
    log_likelihood = logsumexp(log_alpha[-1])

    return log_likelihood
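
The forward recursion can be sanity-checked on a model small enough to enumerate every hidden path. The sketch below uses a hypothetical 2-state HMM with discrete emissions (all numbers are illustrative, not taken from the trained model) and only the standard library, so it runs without `hmmlearn`. It compares the forward log-likelihood with a brute-force sum over all paths.

```python
import itertools
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


# Hypothetical toy HMM: 2 states, 3 discrete symbols (illustrative numbers only).
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2, 2]


def forward_loglik(obs):
    n = len(start)
    # Initialization: log pi_i + log b_i(o_1)
    log_alpha = [math.log(start[i]) + math.log(emit[i][obs[0]]) for i in range(n)]
    # Recursion: sum over previous states, carried out in the log domain
    for o in obs[1:]:
        log_alpha = [
            logsumexp([log_alpha[i] + math.log(trans[i][j]) for i in range(n)])
            + math.log(emit[j][o])
            for j in range(n)
        ]
    # Termination: sum over final states
    return logsumexp(log_alpha)


def brute_force_loglik(obs):
    n = len(start)
    total = 0.0
    # Add up the joint probability of the observations with every possible path
    for path in itertools.product(range(n), repeat=len(obs)):
        p = start[path[0]] * emit[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
        total += p
    return math.log(total)


fw = forward_loglik(obs)
bf = brute_force_loglik(obs)
print(f"forward: {fw:.6f}  brute force: {bf:.6f}")
```

The brute-force sum visits `N**T` paths, whereas the recursion needs on the order of `N**2 * T` operations, which is why dynamic programming is required for sequences of realistic length.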

# Viterbi

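  Finds the most likely sequence of hidden states for an observation sequence. The recursion mirrors the forward algorithm but takes a max over previous states instead of a sum, and stores back-pointers so the best path can be recovered.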

In [None]:
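def manual_viterbi(obs_seq, model):
    """Return the most likely hidden-state path for obs_seq as a list of state indices."""
    T = obs_seq.shape[0]
    N = model.n_components

    # The 1e-10 offset guards against log(0) for zero-probability entries
    log_startprob = np.log(model.startprob_ + 1e-10)
    log_transmat = np.log(model.transmat_ + 1e-10)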
    means = model.means_
    covars = model.covars_


    log_emission = np.zeros((T, N))
    for n in range(N):
        log_emission[:, n] = multivariate_normal.logpdf(
            obs_seq, mean=means[n], cov=covars[n]
        )

    log_delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)

    # Initialization
    log_delta[0] = log_startprob + log_emission[0]

    # Recursion
    for t in range(1, T):
        for j in range(N):
            trans_probs = log_delta[t-1] + log_transmat[:, j]
            best_prev = np.argmax(trans_probs)

            log_delta[t, j] = trans_probs[best_prev] + log_emission[t, j]
            psi[t, j] = best_prev

    # Termination
    best_path_score = np.max(log_delta[-1])
    best_last_state = np.argmax(log_delta[-1])

    # Backtracking
    path = [best_last_state]
    for t in range(T-1, 0, -1):
        prev = psi[t, path[-1]]
        path.append(prev)

    return path[::-1]
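
As with the forward pass, the Viterbi recursion can be checked against exhaustive search on a tiny model. The sketch below assumes a hypothetical 2-state HMM with discrete emissions (illustrative numbers only, not the trained Gaussian model) and confirms that back-tracking returns the same path as scoring every possible state sequence.

```python
import itertools
import math

# Hypothetical toy HMM: 2 states, 3 discrete symbols (illustrative numbers only).
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2, 2]


def path_logprob(path, obs):
    """Joint log-probability of one hidden path together with the observations."""
    lp = math.log(start[path[0]]) + math.log(emit[path[0]][obs[0]])
    for t in range(1, len(obs)):
        lp += math.log(trans[path[t - 1]][path[t]]) + math.log(emit[path[t]][obs[t]])
    return lp


def viterbi(obs):
    n = len(start)
    delta = [math.log(start[i]) + math.log(emit[i][obs[0]]) for i in range(n)]
    backptr = []  # backptr[t - 1][j] = best predecessor of state j at time t
    for o in obs[1:]:
        step, new_delta = [], []
        for j in range(n):
            scores = [delta[i] + math.log(trans[i][j]) for i in range(n)]
            best_i = max(range(n), key=scores.__getitem__)
            step.append(best_i)
            new_delta.append(scores[best_i] + math.log(emit[j][o]))
        backptr.append(step)
        delta = new_delta
    # Termination: best final state, then follow the back-pointers in reverse
    last = max(range(n), key=delta.__getitem__)
    path = [last]
    for step in reversed(backptr):
        path.append(step[path[-1]])
    return path[::-1], delta[last]


best_path, best_score = viterbi(obs)
brute_path = max(
    itertools.product(range(len(start)), repeat=len(obs)),
    key=lambda p: path_logprob(p, obs),
)
print("Viterbi path:", best_path, "brute-force path:", list(brute_path))
```

For this toy model both approaches select the path `[0, 0, 1, 1]`. The Viterbi score is the log-probability of that single path, so it can never exceed the forward log-likelihood, which sums over all paths.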

In [None]:
test_seq = X[0]

print(f"Testing on sequence with {len(test_seq)} frames...")

# TEST FORWARD
manual_ll = manual_forward(test_seq, model)
lib_ll = model.score(test_seq)

print("\n--- Forward Algorithm Comparison ---")
print(f"Manual Log-Likelihood: {manual_ll:.4f}")
print(f"Library Log-Likelihood: {lib_ll:.4f}")
print(f"Difference: {abs(manual_ll - lib_ll):.6f}")


# TEST VITERBI
manual_path = manual_viterbi(test_seq, model)
_, lib_path = model.decode(test_seq)

print("\n--- Viterbi Algorithm Comparison ---")
print(f"Manual Path : {manual_path[:20]}")
print(f"Library Path : {list(lib_path[:20])}")

# Calculate match accuracy
matches = np.sum(np.array(manual_path) == np.array(lib_path))
acc = matches / len(manual_path) * 100
print(f"Path Match Accuracy: {acc:.2f}%")

Testing on sequence with 250 frames...

--- Forward Algorithm Comparison ---
Manual Log-Likelihood: -4186.6862
Library Log-Likelihood: -4186.6862
Difference: 0.000000

--- Viterbi Algorithm Comparison ---
Manual Path : [np.int64(4), np.int64(4), np.int64(4), np.int64(4), np.int64(4), np.int64(4), np.int64(4), np.int64(4), np.int64(4), np.int64(4), np.int64(4), np.int64(4), np.int64(0), np.int64(1), np.int64(3), np.int64(3), np.int64(3), np.int64(3), np.int64(3), np.int64(3)]
Library Path : [np.int64(4), np.int64(4), np.int64(4), np.int64(4), np.int64(4), np.int64(4), np.int64(4), np.int64(4), np.int64(4), np.int64(4), np.int64(4), np.int64(4), np.int64(0), np.int64(1), np.int64(3), np.int64(3), np.int64(3), np.int64(3), np.int64(3), np.int64(3)]
Path Match Accuracy: 100.00%


| Aspect       | Forward                                                     | Viterbi                                           |
| ------------ | ----------------------------------------------------------- | ------------------------------------------------- |
| Goal         | Likelihood of the sequence                                  | Most likely hidden state sequence                 |
| Recursion    | Sum over all previous states                                | Max over previous states                          |
| Termination  | logsumexp of final alpha                                    | Max of final delta                                |
| Backtracking | None                                                        | Required to get path                              |
| Output       | Single number                                               | Sequence of states                                |
| Example      | “What’s the probability that this audio is the word ‘cat’?” | “Which phonemes did the speaker most likely say?” |
