# Lab Exercise 8 – Hidden Markov Model (HMM)

**Aim:** 
The aim of this lab experiment is to implement a Hidden Markov Model (HMM) to simulate phoneme transitions for the word ‘speech’ in speech processing.

**Problem Statement:**
The word "speech" can be divided into the following phonemes, which serve as the hidden states:
* /s/
* /p/
* /ie:/ (long ee)
* /tS/ (ch sound)

Observations represent measurable acoustic properties: **Energy**, **Pitch**, and **Duration**.

## Task (a) & (b): Represent and Display HMM Parameters

We will define the initial, transition, and emission probabilities as NumPy arrays (with plain Python lists for the state and observation labels) and display them as Pandas DataFrames for clarity.

In [1]:
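import numpy as np
import pandas as pd
from IPython.display import display  # explicit import so display() also works outside a notebook cell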

# 1. Define States (Phonemes) and Observations
states = ['/s/', '/p/', '/ie:/', '/tS/']
observations = ['Energy', 'Pitch', 'Duration']

# 2. Initial Probabilities (the word always starts in /s/, so P(/s/) = 1.0)
start_probability = np.array([1.0, 0.0, 0.0, 0.0])

# 3. Transition Probabilities
# Note: The specific table was not provided in the text, so we define a 
# deterministic Left-to-Right model to generate the word 'speech' cleanly.
# Logic: /s/ -> /p/ -> /ie:/ -> /tS/
transition_matrix = np.array([
    [0.0, 1.0, 0.0, 0.0],  # /s/ transitions to /p/
    [0.0, 0.0, 1.0, 0.0],  # /p/ transitions to /ie:/
    [0.0, 0.0, 0.0, 1.0],  # /ie:/ transitions to /tS/
    [0.0, 0.0, 0.0, 1.0]   # /tS/ stays at /tS/ (End state)
])

# 4. Emission Probabilities (Given in the problem statement)
# Rows: States, Columns: Observations [Energy, Pitch, Duration]
emission_matrix = np.array([
    [0.7, 0.2, 0.1],  # /s/
    [0.5, 0.3, 0.2],  # /p/
    [0.3, 0.5, 0.2],  # /ie:/
    [0.4, 0.4, 0.2]   # /tS/
])

# Displaying the matrices nicely using Pandas
print("--- Initial Probabilities ---")
df_start = pd.DataFrame(start_probability, index=states, columns=['Probability']).T
display(df_start)

print("\n--- Transition Probabilities (A) ---")
df_trans = pd.DataFrame(transition_matrix, index=states, columns=states)
display(df_trans)

print("\n--- Emission Probabilities (B) ---")
df_emit = pd.DataFrame(emission_matrix, index=states, columns=observations)
display(df_emit)

--- Initial Probabilities ---


| | /s/ | /p/ | /ie:/ | /tS/ |
|---|---|---|---|---|
| Probability | 1.0 | 0.0 | 0.0 | 0.0 |



--- Transition Probabilities (A) ---


| | /s/ | /p/ | /ie:/ | /tS/ |
|---|---|---|---|---|
| /s/ | 0.0 | 1.0 | 0.0 | 0.0 |
| /p/ | 0.0 | 0.0 | 1.0 | 0.0 |
| /ie:/ | 0.0 | 0.0 | 0.0 | 1.0 |
| /tS/ | 0.0 | 0.0 | 0.0 | 1.0 |



--- Emission Probabilities (B) ---


| | Energy | Pitch | Duration |
|---|---|---|---|
| /s/ | 0.7 | 0.2 | 0.1 |
| /p/ | 0.5 | 0.3 | 0.2 |
| /ie:/ | 0.3 | 0.5 | 0.2 |
| /tS/ | 0.4 | 0.4 | 0.2 |
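
Before sampling from these tables, it can be worth confirming that each one is a valid probability distribution: all entries non-negative, and every row summing to 1. The following is a minimal sketch of such a check. The helper name `is_valid_distribution` and its tolerance are illustrative assumptions, not part of the lab handout; the arrays are copied from the cell above so the block runs on its own.

```python
import numpy as np

# Parameters copied from the cell above so this block is self-contained.
start_probability = np.array([1.0, 0.0, 0.0, 0.0])
transition_matrix = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
])
emission_matrix = np.array([
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.3, 0.5, 0.2],
    [0.4, 0.4, 0.2],
])

def is_valid_distribution(p, atol=1e-9):
    """Hypothetical helper: True if all entries are >= 0 and each row sums to 1."""
    p = np.asarray(p, dtype=float)
    # Summing over the last axis handles both a 1-D vector and a 2-D matrix of rows.
    return bool(np.all(p >= 0) and np.allclose(p.sum(axis=-1), 1.0, atol=atol))

checks = {
    "initial": is_valid_distribution(start_probability),
    "transition": is_valid_distribution(transition_matrix),
    "emission": is_valid_distribution(emission_matrix),
}
print(checks)
```

A failed check here would also surface later as a `ValueError` from `np.random.choice`, which requires its `p` argument to sum to 1.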


## Task (c): Generate Sequence

We will write a program to generate a sequence of phonemes and their corresponding acoustic observations based on the probabilities defined above.

In [2]:
def generate_sequence(length=4):
    """Sample `length` phonemes and one acoustic observation per phoneme from the HMM."""
    # Store results
    phoneme_sequence = []
    observation_sequence = []
    
    # Step 1: Choose initial state based on start_probability
    # np.random.choice(n, p=...) samples an integer from range(n) with weights p,
    # so the result is the index of the chosen state
    current_state_index = np.random.choice(len(states), p=start_probability)
    
    for _ in range(length):
        # Record current phoneme
        phoneme_sequence.append(states[current_state_index])
        
        # Step 2: Generate Observation based on Emission Probability for current state
        obs_index = np.random.choice(len(observations), p=emission_matrix[current_state_index])
        observation_sequence.append(observations[obs_index])
        
        # Step 3: Transition to next state
        current_state_index = np.random.choice(len(states), p=transition_matrix[current_state_index])

    return phoneme_sequence, observation_sequence

# Run the simulation
generated_phonemes, generated_obs = generate_sequence(length=4)

print("Generated phoneme sequence:", generated_phonemes)
print("Generated observation sequence:", generated_obs)

Generated phoneme sequence: ['/s/', '/p/', '/ie:/', '/tS/']
Generated observation sequence: ['Energy', 'Duration', 'Pitch', 'Pitch']
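
To see how likely this particular run was, the joint probability of a state sequence and an observation sequence can be computed as the initial probability times the product of the transition and emission probabilities along the path. The sketch below assumes the parameters defined in Task (a) and (b), repeated here so the block is self-contained; the function name `joint_probability` is an illustrative choice.

```python
import numpy as np

states = ['/s/', '/p/', '/ie:/', '/tS/']
observations = ['Energy', 'Pitch', 'Duration']
start_probability = np.array([1.0, 0.0, 0.0, 0.0])
transition_matrix = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
])
emission_matrix = np.array([
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.3, 0.5, 0.2],
    [0.4, 0.4, 0.2],
])

def joint_probability(phonemes, obs):
    """P(states, observations) = pi[s0] * b[s0, o0] * prod_t a[s(t-1), s(t)] * b[s(t), o(t)]."""
    s = [states.index(x) for x in phonemes]
    o = [observations.index(x) for x in obs]
    prob = start_probability[s[0]] * emission_matrix[s[0], o[0]]
    for t in range(1, len(s)):
        prob *= transition_matrix[s[t - 1], s[t]] * emission_matrix[s[t], o[t]]
    return float(prob)

# The run shown above: 1.0*0.7 * 1.0*0.2 * 1.0*0.5 * 1.0*0.4
p = joint_probability(
    ['/s/', '/p/', '/ie:/', '/tS/'],
    ['Energy', 'Duration', 'Pitch', 'Pitch'],
)
print(round(p, 4))  # 0.028
```

Because the transitions are deterministic, every factor from the transition matrix is 1.0 and the result depends only on the emission probabilities; any phoneme order other than `/s/`, `/p/`, `/ie:/`, `/tS/` has probability 0.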


## Task (d): Inference

**Inference:**

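1.  **Stochastic Nature:** An HMM models speech generation as a doubly stochastic process: a hidden state sequence (the phonemes) evolves according to the transition probabilities, and each state emits an observable output (Energy, Pitch, or Duration) according to its emission probabilities. In this lab the transitions are deterministic, so all of the randomness in the generated output comes from the emissions.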
2.  **Sequential Modeling:** The transition matrix enforces the temporal structure of the word "speech". Because each transition `/s/` $\to$ `/p/` $\to$ `/ie:/` $\to$ `/tS/` has probability 1.0 and `/tS/` loops back to itself, a sequence of length 4 always reproduces the phonetic order of the word.
3.  **Variability:** While the phoneme sequence is fixed for this word, the emission probabilities allow the acoustic observations to vary each time the word is generated. For example, the phoneme `/s/` is observed as 'Energy' with probability 0.7, but as 'Pitch' with probability 0.2 and as 'Duration' with probability 0.1, simulating real-world noise or speech variation.
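
The variability described in point 3 can be checked empirically by drawing many observations from the `/s/` emission row and comparing the observed frequencies with the model probabilities. This is a minimal sketch: it uses NumPy's `default_rng` generator with an arbitrarily chosen seed and sample size (both assumptions made here for reproducibility) rather than the global `np.random.choice` used in the lab code.

```python
import numpy as np

observations = ['Energy', 'Pitch', 'Duration']
s_emission = np.array([0.7, 0.2, 0.1])  # emission row for /s/ from Task (a) & (b)

rng = np.random.default_rng(0)  # seed 0 is an arbitrary choice for reproducibility
n_samples = 20_000              # sample size is likewise an arbitrary choice

# Draw observation indices for /s/ and count how often each one occurs.
draws = rng.choice(len(observations), size=n_samples, p=s_emission)
counts = np.bincount(draws, minlength=len(observations))
empirical = counts / n_samples

for name, p_model, p_emp in zip(observations, s_emission, empirical):
    print(f"{name:<9} model={p_model:.2f}  empirical={p_emp:.3f}")
```

With this many samples the empirical frequencies should sit close to 0.7, 0.2, and 0.1, while any single short run (such as the four-step sequence in Task (c)) can easily differ from the most likely observation.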