
---

##### Intro to Hidden Markov Models 

You will start by building a simple HMM network based on an example from the textbook [Artificial Intelligence: A Modern Approach](http://aima.cs.berkeley.edu/).

> You are the security guard stationed at a secret underground installation. Each day, you try to guess whether it's raining today, but your only access to the outside world occurs each morning when you see the director coming in with, or without, an umbrella.

A simplified diagram of the required network topology is shown below.

<div align="center">
    <img src="images/example.png" width="250" height=auto>
</div>

### Understanding the Hidden Markov Model Framework

$$
\lambda = (A, B)
$$

where $\lambda$ specifies a Hidden Markov Model in terms of a state transition probability distribution $A$ and an emission probability distribution $B$.

#### Core HMM Components

Hidden Markov Models operate on two fundamental probability distributions that capture different aspects of sequential behavior:

**Emission Probabilities (B)**: These quantify the likelihood of observing specific evidence given each possible hidden state. In our weather example, this represents how likely the director is to carry an umbrella on rainy versus sunny days.

**Transition Probabilities (A)**: These capture how the hidden states evolve over time by specifying the probability of moving from one state to another between consecutive time steps. For weather, this models how likely rain is to follow sunshine, or sunshine to follow rain.

**Initial State Distribution**: This optional third component specifies the probability of starting in each possible hidden state, representing our prior beliefs about the system's initial condition.
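As a minimal sketch, the three components can be held as plain NumPy arrays. The values below are the emission and transition tables that appear later in this notebook, plus a uniform start distribution; the names `start`, `transition`, `emission`, and `sample_week` are illustrative and not part of any library. Sampling a week shows how the transition matrix drives the hidden weather while the emission matrix drives the observed umbrellas.

```python
import numpy as np

STATES = ["Sunny", "Rainy"]
OBSERVATIONS = ["no", "yes"]

# Initial distribution P(X_0), indexed by state
start = np.array([0.5, 0.5])
# Transition matrix P(X_t | X_{t-1}): row = previous state, column = next state
transition = np.array([[0.8, 0.2],
                       [0.4, 0.6]])
# Emission matrix P(Y_t | X_t): row = state, column = observation [no, yes]
emission = np.array([[0.9, 0.1],
                     [0.2, 0.8]])

# Every distribution must sum to 1 (each row, for the conditional tables)
assert np.isclose(start.sum(), 1.0)
assert np.allclose(transition.sum(axis=1), 1.0)
assert np.allclose(emission.sum(axis=1), 1.0)


def sample_week(n_days=5, seed=0):
    """Draw (weather, umbrella) sequences by running the HMM forward in time."""
    rng = np.random.default_rng(seed)
    weather, umbrella = [], []
    state = rng.choice(2, p=start)
    for _ in range(n_days):
        weather.append(STATES[state])
        umbrella.append(OBSERVATIONS[rng.choice(2, p=emission[state])])
        # Tomorrow's weather depends only on today's weather
        state = rng.choice(2, p=transition[state])
    return weather, umbrella


weather, umbrella = sample_week()
print(weather)
print(umbrella)
```

Only the `umbrella` list would be visible to the security guard; the `weather` list plays the role of the hidden states.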

#### Temporal Structure and Notation

At each discrete time step $t$, the model involves two key variables:
- $X_t$: The hidden state (unobservable weather condition)
- $Y_t$: The observable evidence (umbrella presence)
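Put together, these pieces define the joint probability of a hidden state sequence and an observation sequence. Written in the notation above, and indexing time from $t = 0$ as the later sections do, this is the standard first-order HMM factorization:

$$
P(X_{0:T}, Y_{0:T}) = P(X_0)\,P(Y_0 \mid X_0) \prod_{t=1}^{T} P(X_t \mid X_{t-1})\,P(Y_t \mid X_t)
$$

The first factor is the initial state distribution, each $P(X_t \mid X_{t-1})$ is a transition probability from $A$, and each $P(Y_t \mid X_t)$ is an emission probability from $B$.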

#### Concrete Example Walkthrough

Consider a specific week where you observe the following umbrella pattern: $Y = [\text{yes}, \text{no}, \text{yes}, \text{no}, \text{yes}]$ from Monday through Friday, while the actual weather conditions are $X = [\text{Rainy}, \text{Sunny}, \text{Sunny}, \text{Sunny}, \text{Rainy}]$.

For Wednesday specifically:
- Time step: $t = \text{Wednesday}$ 
- Observation: $Y_{\text{Wednesday}} = \text{yes}$ (umbrella present)
- Hidden state: $X_{\text{Wednesday}} = \text{Sunny}$ (actual weather)

This illustrates an important HMM characteristic: observations don't perfectly correlate with hidden states. The director might carry an umbrella on a sunny day due to weather forecasts, personal preference, or other factors not captured in our simplified model.
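To make the example concrete, the sketch below scores this exact week. It assumes the emission and transition tables given later in the notebook and a uniform start distribution; `joint_probability` is a hypothetical helper, not a Pomegranate function. Wednesday contributes the small factor $P(\text{Sunny} \mid \text{Sunny}) \times P(\text{yes} \mid \text{Sunny}) = 0.8 \times 0.1$, which is how the model accommodates an umbrella on a sunny day without ruling it out.

```python
import numpy as np

STATES = {"Sunny": 0, "Rainy": 1}
OBS = {"no": 0, "yes": 1}

start = np.array([0.5, 0.5])
transition = np.array([[0.8, 0.2],   # from Sunny: [to Sunny, to Rainy]
                       [0.4, 0.6]])  # from Rainy: [to Sunny, to Rainy]
emission = np.array([[0.9, 0.1],     # Sunny: [no, yes]
                     [0.2, 0.8]])    # Rainy: [no, yes]


def joint_probability(weather, umbrella):
    """P(X, Y) for one hidden path and its observations (first-order HMM)."""
    x = [STATES[w] for w in weather]
    y = [OBS[u] for u in umbrella]
    prob = start[x[0]] * emission[x[0], y[0]]
    for t in range(1, len(x)):
        prob *= transition[x[t - 1], x[t]] * emission[x[t], y[t]]
    return prob


week_x = ["Rainy", "Sunny", "Sunny", "Sunny", "Rainy"]
week_y = ["yes", "no", "yes", "no", "yes"]
print(f"P(X, Y) = {joint_probability(week_x, week_y):.6f}")  # P(X, Y) = 0.001327
```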

#### Key Modeling Insights

The HMM framework captures the uncertainty inherent in real-world systems:

1. **Partial Observability**: We cannot directly observe the weather, only indirect evidence through the umbrella.

2. **Probabilistic Relationships**: The connection between weather and umbrella use is uncertain rather than deterministic.

3. **Temporal Dependencies**: Today's weather influences tomorrow's weather, creating sequential patterns the model can learn and exploit.

### Initializing an HMM Network with Pomegranate

The Pomegranate library supports [two initialization methods](http://pomegranate.readthedocs.io/en/latest/HiddenMarkovModel.html#initialization). You can either explicitly provide the three distributions, or you can build the network line-by-line. This notebook uses the first approach with pomegranate's `DenseHMM`, passing the emission distributions, transition matrix, and starting probabilities directly to the constructor, but you're free to use either method for the part of speech tagger.

This foundational understanding prepares you to tackle the more complex part-of-speech tagging problem, where words serve as observations and grammatical categories represent the hidden states.



In [3]:
# Jupyter "magic methods" -- only need to be run once per kernel restart
%load_ext autoreload
%aimport helpers
%autoreload 1



In [8]:
# import python modules -- this cell needs to be run again if you make changes to any of the files
import matplotlib.pyplot as plt
import numpy as np
import torch

from helpers import show_model
from pomegranate.hmm import DenseHMM
from pomegranate.distributions import Categorical


---

##### Add the Hidden States

The first step in building the model is to define the hidden states and specify their emission probability distributions. With the line-by-line approach you would start from an empty model and add each component in turn; the code below instead defines every component up front and passes them all to the `DenseHMM` constructor.

#### Understanding Emission Probabilities: $P(Y_t | X_t)$

Emission probabilities quantify the relationship between hidden states and observable evidence. These probabilities answer the fundamental question: "Given that the system is in a particular hidden state, what is the likelihood of observing each possible piece of evidence?"

In our weather-umbrella scenario, the emission probabilities capture the director's behavioral patterns under different weather conditions. These represent the core observational model that connects unobservable weather states to observable umbrella decisions.

#### Estimating Emission Parameters

Real-world HMM applications typically derive emission probabilities through empirical analysis of training data. For our illustrative example, we assume prior knowledge about the director's behavior has been collected, possibly through:

- Historical observation logs
- Survey data about umbrella-carrying habits
- Weather service correlations
- Behavioral pattern analysis

This data-driven approach mirrors what we'll implement for the part-of-speech tagger, where emission probabilities will be estimated from tagged corpus statistics.
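The count-based estimate is simple: divide how often each (weather, umbrella) pair occurs by how often that weather occurs. The sketch below applies it to a small hypothetical log, made up so that it reproduces the table in the next subsection; `observation_log` and `estimate_emissions` are illustrative names. The same relative-frequency idea applies to (tag, word) counts in a tagged corpus.

```python
from collections import Counter

# Hypothetical observation log of (weather, umbrella) pairs; illustrative only
observation_log = ([("Sunny", "no")] * 9 + [("Sunny", "yes")] * 1 +
                   [("Rainy", "yes")] * 4 + [("Rainy", "no")] * 1)


def estimate_emissions(records):
    """Maximum-likelihood estimate of P(umbrella | weather) from counts."""
    pair_counts = Counter(records)
    state_counts = Counter(weather for weather, _ in records)
    return {
        (weather, umbrella): count / state_counts[weather]
        for (weather, umbrella), count in pair_counts.items()
    }


emissions = estimate_emissions(observation_log)
for (weather, umbrella), p in sorted(emissions.items()):
    print(f"P({umbrella} | {weather}) = {p:.2f}")
```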

#### Conditional Probability Structure

The emission probability table below represents our assumed behavioral model:

| Weather State | $P(\text{umbrella} = yes)$ | $P(\text{umbrella} = no)$ |
|---------------|---------------------------|--------------------------|
| $Sunny$       | 0.10                      | 0.90                     |
| $Rainy$       | 0.80                      | 0.20                     |

#### Interpreting the Probability Values

These probabilities encode intuitive behavioral expectations:

**Sunny Day Behavior**: When weather is sunny, the director carries an umbrella only 10% of the time. This reflects practical behavior where umbrellas are less necessary on clear days, though some individuals might carry them for sun protection or due to uncertain forecasts.

**Rainy Day Behavior**: During rainy weather, the director carries an umbrella 80% of the time. The 20% probability of no umbrella accounts for scenarios like forgotten umbrellas, short trips, or personal preference variations.

#### Mathematical Constraints

Notice that each row sums to 1.0, as every probability distribution must. For a given weather state, the director either carries an umbrella or doesn't; the two outcomes are mutually exclusive and exhaustive, so their probabilities sum to one.

This probability structure forms the foundation for the HMM's ability to make probabilistic inferences about hidden weather states based on observable umbrella evidence.
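For a single day in isolation, this inference is just Bayes' rule. The sketch below assumes a 50/50 prior over the weather (the uniform start distribution used later) and the emission table above; `posterior_given_umbrella` is a hypothetical helper. Seeing an umbrella raises the probability of rain from 0.5 to $0.4 / 0.45 \approx 0.889$.

```python
import numpy as np

prior = np.array([0.5, 0.5])        # P(weather): [Sunny, Rainy]
emission = np.array([[0.9, 0.1],    # Sunny: [no, yes]
                     [0.2, 0.8]])   # Rainy: [no, yes]


def posterior_given_umbrella(obs_index, prior=prior):
    """P(weather | one umbrella observation) via Bayes' rule."""
    unnormalized = prior * emission[:, obs_index]  # P(weather) * P(obs | weather)
    return unnormalized / unnormalized.sum()       # divide by P(obs)


post_yes = posterior_given_umbrella(1)
print(f"P(Rainy | umbrella) = {post_yes[1]:.3f}")  # P(Rainy | umbrella) = 0.889
```

The full HMM goes further by using the transition probabilities to carry this evidence from one day to the next.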

In [26]:
def create_weather_hmm_model():
    """
    Create a Hidden Markov Model for the weather-umbrella problem.
    
    Builds a pomegranate 1.0+ DenseHMM with fixed (not learned) parameters:
    - Two hidden states representing weather conditions (Sunny, Rainy)
    - Two observable outcomes representing umbrella presence (no, yes)
    - Emission probabilities P(umbrella | weather)
    - Transition probabilities P(weather_t+1 | weather_t)
    - A uniform initial distribution P(weather_0)
    
    The resulting model can infer hidden weather states from umbrella observations.
    
    Returns:
        DenseHMM: The weather HMM, ready for likelihood evaluation and decoding
        
    Example:
        >>> model = create_weather_hmm_model()
    """
    # Emission matrix P(observation | hidden_state).
    # Rows are weather states [Sunny, Rainy]; columns are observations [no, yes].
    # Sunny: 90% no umbrella. Rainy: 80% umbrella.
    emission_probs = torch.tensor([
        [0.9, 0.1],  # Sunny state: [no, yes]
        [0.2, 0.8]   # Rainy state: [no, yes]
    ])

    # Transition matrix P(next_state | current_state).
    # Rows are today's weather, columns are tomorrow's weather.
    # Sunny persists 80% of the time, Rainy persists 60% of the time.
    transition_probs = torch.tensor([
        [0.8, 0.2],  # From Sunny: [to Sunny, to Rainy]
        [0.4, 0.6]   # From Rainy: [to Sunny, to Rainy]
    ])

    # Initial distribution P(X_0): no prior preference for either state.
    start_probs = torch.tensor([0.5, 0.5])  # [Sunny, Rainy]

    # One Categorical emission distribution per hidden state. Categorical
    # expects probs of shape (n_features, n_categories); unsqueeze(0) turns
    # each 2-element row into shape (1, 2) for the single umbrella feature.
    sunny_dist = Categorical(probs=emission_probs[0].unsqueeze(0))
    rainy_dist = Categorical(probs=emission_probs[1].unsqueeze(0))

    # Combine emissions, transitions ('edges'), and initial probabilities
    # ('starts'). DenseHMM stores edges and starts internally as log-probabilities.
    model = DenseHMM(
        distributions=[sunny_dist, rainy_dist],
        edges=transition_probs,
        starts=start_probs
    )
    
    return model

# Create the HMM model
model = create_weather_hmm_model()
print("Looks good so far!")

Looks good so far!


### **IMPLEMENTATION:** Adding Transitions
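With the states defined, the next step is the model's transition topology: the starting probabilities and the state-to-state transition probabilities. In this notebook both were already passed to the `DenseHMM` constructor above (as `starts` and `edges`); this section explains where those values come from.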

#### Initial Probability $P(X_0)$:
We will assume that we don't know anything useful about the likelihood of a sequence starting in either state. If each sequence starts on Monday and ends on Friday (so each week is a new sequence), this assumption means the weather on any Monday is equally likely to be Rainy or Sunny. We can assign equal probability to each starting state by setting $P(X_0=Rainy) = 0.5$ and $P(X_0=Sunny)=0.5$:

| $Sunny$ | $Rainy$ |
| --- | --- |
| 0.5 | 0.5 |

#### State transition probabilities $P(X_{t} | X_{t-1})$
Finally, we will assume for this example that we can estimate transition probabilities from something like historical weather data for the area. In real problems you can often use the structure of the problem (like a language grammar) to impose restrictions on the transition probabilities, then re-estimate the parameters with the same training data used to estimate the emission probabilities. Under this assumption, we get the conditional probability table below. (Note that the rows sum to 1.0)

| | $Sunny$ | $Rainy$ |
| --- | --- | --- |
|$Sunny$| 0.80 | 0.20 |
|$Rainy$| 0.40 | 0.60 |
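Estimating transitions from historical data works the same way as estimating emissions from counts: tally consecutive-day pairs and normalize by how often each starting state appears. The sketch below uses a short, made-up ten-day history (`S` = Sunny, `R` = Rainy), so its estimates (5/7 and 2/7 from Sunny, 1/2 and 1/2 from Rainy) differ from the table above; `estimate_transitions` is a hypothetical helper, and a much longer real record would be needed to pin the values down.

```python
from collections import Counter

# Hypothetical daily weather history; illustrative only (S = Sunny, R = Rainy)
history = list("SSSSRRSSSR")


def estimate_transitions(sequence):
    """Relative-frequency estimate of P(X_t | X_{t-1}) from consecutive pairs."""
    pair_counts = Counter(zip(sequence, sequence[1:]))
    from_counts = Counter(sequence[:-1])  # how often each state has a successor
    return {
        (prev, nxt): count / from_counts[prev]
        for (prev, nxt), count in pair_counts.items()
    }


estimated = estimate_transitions(history)
for (prev, nxt), p in sorted(estimated.items()):
    print(f"P({nxt} | {prev}) = {p:.2f}")
```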

In [14]:
import torch

def validate_weather_model(model):
    """
    Validate the weather HMM model structure using pomegranate 1.0+ API.
    
    Args:
        model (DenseHMM): The weather HMM model to validate
        
    Returns:
        bool: True if model passes all validation checks
    """
    # Check that the model exposes one emission distribution per state
    assert hasattr(model, 'distributions'), "Model should have distributions attribute"
    assert len(model.distributions) == 2, f"Expected 2 states, got {len(model.distributions)}"
    
    # DenseHMM stores transitions as log-probabilities; each row of the
    # exponentiated matrix must sum to 1
    row_sums = torch.exp(model.edges).sum(dim=1)
    assert torch.allclose(row_sums, torch.ones(2)), f"Transition rows should sum to 1, got {row_sums}"
    
    print("Model validation successful! Structure matches expected configuration.")
    return True

# Simple validation that works with the new API
validate_weather_model(model)
print("Great! You've finished the model.")

Model validation successful! Structure matches expected configuration.
Great! You've finished the model.


## Visualize the Network
---
We have provided a helper function called `show_model()` that generates a PNG image from a Pomegranate HMM network. You can specify an optional filename to save the file to disk. Setting the `show_ends` argument to True adds the model start and end states that are included in every Pomegranate network. The cell below prints a short text summary of the `DenseHMM` with `show_model_info()` instead.

In [17]:
def show_model_info(model):
    """Display basic model information for pomegranate 1.0+ models."""
    print(f"HMM Model Information:")
    print(f"- Number of states: {len(model.distributions)}")
    print(f"- Model type: {type(model).__name__}")
    print("Model created successfully!")

show_model_info(model)

HMM Model Information:
- Number of states: 2
- Model type: DenseHMM
Model created successfully!


### Checking the Model
In pomegranate 1.0+, the emission distribution for each state is stored in the `model.distributions` list, and the transition matrix in `model.edges`. Element $(i, j)$ encodes the probability of transitioning from state $i$ to state $j$, in the order the distributions were passed to the constructor (Sunny, then Rainy). For example, the zero-indexed element `[1, 0]` gives the probability of transitioning from "Rainy" to "Sunny", which we specified as 0.4. `DenseHMM` stores these values as log-probabilities, so they must be exponentiated before they can be read as probabilities.

Run the next cell to inspect the full state transition matrix.

In [19]:
import torch

# DenseHMM stores the transition matrix as log-probabilities,
# so exponentiate to recover P(Xt|Xt-1)
transitions = torch.exp(model.edges)
state_names = ["Sunny", "Rainy"]

print("The state transition matrix, P(Xt|Xt-1):\n")
print("       " + "".join(f"{name:>8}" for name in state_names))
for i, from_state in enumerate(state_names):
    print(f"{from_state:>6} " + "".join(f"{transitions[i,j]:>8.2f}" for j in range(len(state_names))))

print(f"\nThe transition probability from Rainy to Sunny is {100 * transitions[1, 0]:.0f}%")

The state transition matrix, P(Xt|Xt-1):

          Sunny   Rainy
 Sunny     0.80    0.20
 Rainy     0.40    0.60

The transition probability from Rainy to Sunny is 40%


## Inference in Hidden Markov Models
---
Before moving on, we'll use this simple network to quickly go over the Pomegranate API to perform the three most common HMM tasks:

<div class="alert alert-block alert-info">
**Likelihood Evaluation**<br>
Given a model $\lambda=(A,B)$ and a set of observations $Y$, determine $P(Y|\lambda)$, the likelihood of observing that sequence from the model
</div>

We can use the weather prediction model to evaluate the likelihood of the sequence [yes, yes, yes, yes, yes] (or any other observation sequence). The likelihood is often used in problems like machine translation to weight interpretations in conjunction with a statistical language model.

<div class="alert alert-block alert-info">
**Hidden State Decoding**<br>
Given a model $\lambda=(A,B)$ and a set of observations $Y$, determine $Q$, the most likely sequence of hidden states in the model to produce the observations
</div>

We can use the weather prediction model to determine the most likely sequence of Rainy/Sunny states for a known observation sequence, like [yes, no] -> [Rainy, Sunny]. We will use decoding in the part of speech tagger to determine the tag for each word of a sentence. More generally, inferring hidden states splits into "smoothing" (estimating past states from all observations), "filtering" (estimating the current state from the observations so far), and "prediction" (estimating future states).
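Filtering and prediction can be sketched directly from the model's parameters. The code below is a minimal NumPy illustration, not a Pomegranate call, and `filter_states` and `predict_next` are hypothetical names: filtering normalizes the forward recursion at every step to get $P(X_t \mid Y_0, \dots, Y_t)$, and one-step prediction pushes that belief through the transition matrix. Smoothing would additionally need a backward pass.

```python
import numpy as np

STATES = ["Sunny", "Rainy"]
OBS = {"no": 0, "yes": 1}
start = np.array([0.5, 0.5])
transition = np.array([[0.8, 0.2], [0.4, 0.6]])
emission = np.array([[0.9, 0.1], [0.2, 0.8]])  # columns: [no, yes]


def filter_states(observations):
    """Filtering: P(X_t | Y_0..Y_t) for each t, by normalizing forward messages."""
    belief = start * emission[:, OBS[observations[0]]]
    belief /= belief.sum()
    beliefs = [belief]
    for obs in observations[1:]:
        # Predict with the transition model, then weight by the new evidence
        belief = (belief @ transition) * emission[:, OBS[obs]]
        belief /= belief.sum()
        beliefs.append(belief)
    return beliefs


def predict_next(belief):
    """Prediction: P(X_t+1 | Y_0..Y_t) from the latest filtered belief."""
    return belief @ transition


beliefs = filter_states(["yes", "no"])
tomorrow = predict_next(beliefs[-1])
print("Filtered P(X_t | evidence):", np.round(beliefs[-1], 3))
print("Predicted P(X_t+1 | evidence):", np.round(tomorrow, 3))
```

After [yes, no], the filtered belief is about 78% Sunny, consistent with the decoded [Rainy, Sunny] above.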

<div class="alert alert-block alert-info">
**Parameter Learning**<br>
Given a model topology (set of states and connections) and a set of observations $Y$, learn the transition probabilities $A$ and emission probabilities $B$ of the model, $\lambda=(A,B)$
</div>

We don't need to learn the model parameters for the weather problem or POS tagging, but it is supported by Pomegranate.

### IMPLEMENTATION: Calculate Sequence Likelihood

Calculating the likelihood of an observation sequence from an HMM network is performed with the [forward algorithm](https://en.wikipedia.org/wiki/Forward_algorithm). Pomegranate provides the `HMM.forward()` method to calculate the full matrix of forward probabilities (in log space), with one entry for each observation and each state in the HMM, and the `HMM.log_probability()` method to calculate the log-likelihood that the model generated the observation sequence, summed over all possible hidden state paths.

Fill in the code in the next section with a sample observation sequence and then use the `forward()` and `log_probability()` methods to evaluate the sequence.

In [21]:
# input a sequence of 'yes'/'no' values in the list below for testing
observations = ['yes', 'no', 'yes']

assert len(observations) > 0, "You need to choose a sequence of 'yes'/'no' observations to test"

# Convert string observations to numerical indices
# Based on our emission_probs: [no, yes] = [0, 1]
obs_to_idx = {'no': 0, 'yes': 1}
obs_indices = [obs_to_idx[obs] for obs in observations]

# Convert to 3D tensor: (batch_size, sequence_length, feature_dim)
obs_tensor = torch.tensor(obs_indices).unsqueeze(0).unsqueeze(-1)

# Forward matrix (log space), shape (batch, sequence_length, n_states)
forward_matrix = model.forward(obs_tensor)

# Calculate log probability
log_prob = model.log_probability(obs_tensor)
likelihood = torch.exp(log_prob).item()

# Display results
state_names = ["Sunny", "Rainy"]
print("Forward algorithm results:")
print(f"Observations: {observations}")
print(f"Converted to indices: {obs_indices}")
print(f"Forward matrix shape: {forward_matrix.shape}")
print(f"\nThe likelihood over all possible paths of this model producing the sequence {observations} is {100 * likelihood:.2f}%")

Forward algorithm results:
Observations: ['yes', 'no', 'yes']
Converted to indices: [1, 0, 1]
Forward matrix shape: torch.Size([1, 3, 2])

The likelihood over all possible paths of this model producing the sequence ['yes', 'no', 'yes'] is 3.46%
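To see what the forward algorithm computes, here is a minimal NumPy sketch (an illustration, not Pomegranate's implementation; `forward_likelihood` is a hypothetical name). It gives 6.92% for ['yes', 'no', 'yes'], exactly twice the 3.46% reported above. A plausible explanation, which is an assumption about pomegranate 1.0 internals worth checking in its documentation, is that `DenseHMM.log_probability()` also multiplies in an end probability for the final state, and because no `ends` were passed to the constructor these default to a uniform 1/2 per state.

```python
import numpy as np

OBS = {"no": 0, "yes": 1}
start = np.array([0.5, 0.5])                    # [Sunny, Rainy]
transition = np.array([[0.8, 0.2], [0.4, 0.6]])
emission = np.array([[0.9, 0.1], [0.2, 0.8]])   # columns: [no, yes]


def forward_likelihood(observations):
    """Forward algorithm: returns the alpha matrix and P(Y) summed over all paths."""
    y = [OBS[o] for o in observations]
    alpha = np.zeros((len(y), len(start)))
    alpha[0] = start * emission[:, y[0]]
    for t in range(1, len(y)):
        # alpha[t, j] = sum_i alpha[t-1, i] * P(j | i) * P(y_t | j)
        alpha[t] = (alpha[t - 1] @ transition) * emission[:, y[t]]
    return alpha, alpha[-1].sum()


alpha, likelihood = forward_likelihood(["yes", "no", "yes"])
print(f"P(Y) = {100 * likelihood:.2f}%")  # P(Y) = 6.92%
```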


### IMPLEMENTATION: Decoding the Most Likely Hidden State Sequence

The [Viterbi algorithm](https://en.wikipedia.org/wiki/Viterbi_algorithm) calculates the single path with the highest likelihood to produce a specific observation sequence. Older (pre-1.0) Pomegranate releases exposed it as `HMM.viterbi()`, which returned both the hidden state sequence and the likelihood of the Viterbi path. The cell below uses `DenseHMM.predict()` instead, which returns the most probable state at each position based on the forward-backward posteriors; this per-step answer can differ from the single best Viterbi path.

This is called "decoding" because we use the observation sequence to decode the corresponding hidden state sequence. In the part of speech tagging problem, the hidden states map to parts of speech and the observations map to the words of a sentence. Given a sentence, Viterbi decoding finds the most likely sequence of part of speech tags corresponding to the sentence.

Fill in the code in the next section with the same sample observation sequence you used above, and then use `model.predict()` to find the most likely weather at each step. Note that `log_probability()` still reports the forward algorithm likelihood summed over all paths, not the likelihood of the decoded path alone.

In [23]:
# input a sequence of 'yes'/'no' values in the list below for testing
observations = ['yes', 'no', 'yes']

# Convert string observations to numerical indices
obs_to_idx = {'no': 0, 'yes': 1}
obs_indices = [obs_to_idx[obs] for obs in observations]

# Convert to 3D tensor: (batch_size, sequence_length, feature_dim)
obs_tensor = torch.tensor(obs_indices).unsqueeze(0).unsqueeze(-1)

# DenseHMM.predict() returns the most probable state at each position
# (argmax of the forward-backward posteriors); this can differ from
# the single best Viterbi path
predicted_states = model.predict(obs_tensor)

# Convert predictions back to state names
state_names = ["Sunny", "Rainy"]
predicted_weather = [state_names[idx.item()] for idx in predicted_states[0]]

# log_probability() sums over ALL hidden paths; it is not the
# likelihood of the decoded path alone
log_prob = model.log_probability(obs_tensor)
likelihood_percentage = torch.exp(log_prob).item() * 100

print("The most likely weather at each step for these observations is {}; "
      "the total likelihood of the observations is {:.2f}%."
      .format(predicted_weather, likelihood_percentage))

The most likely weather at each step for these observations is ['Rainy', 'Sunny', 'Rainy']; the total likelihood of the observations is 3.46%.
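Below is a minimal NumPy sketch of posterior decoding, assuming this is what `predict()` does (it reproduces the result above, but it is not Pomegranate's own code, and `posterior_decode` is a hypothetical name). It runs the forward and backward recursions, combines them into $P(X_t \mid Y)$, and takes the most probable state at each step. For this sequence the distinction matters: the full paths (Rainy, Sunny, Rainy) and (Rainy, Rainy, Rainy) tie as Viterbi paths at $0.5 \times 0.8 \times 0.4 \times 0.9 \times 0.2 \times 0.8 = 0.5 \times 0.8 \times 0.6 \times 0.2 \times 0.6 \times 0.8 = 2.304\%$, so a Viterbi decoder could return either, while the per-step posteriors single out (Rainy, Sunny, Rainy).

```python
import numpy as np

STATES = ["Sunny", "Rainy"]
OBS = {"no": 0, "yes": 1}
start = np.array([0.5, 0.5])
transition = np.array([[0.8, 0.2], [0.4, 0.6]])
emission = np.array([[0.9, 0.1], [0.2, 0.8]])  # columns: [no, yes]


def posterior_decode(observations):
    """Forward-backward posteriors P(X_t | all Y) and their per-step argmax."""
    y = [OBS[o] for o in observations]
    T, N = len(y), len(start)
    alpha, beta = np.zeros((T, N)), np.ones((T, N))
    alpha[0] = start * emission[:, y[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ transition) * emission[:, y[t]]
    for t in range(T - 2, -1, -1):
        # beta[t, i] = sum_j P(j | i) * P(y_t+1 | j) * beta[t+1, j]
        beta[t] = transition @ (emission[:, y[t + 1]] * beta[t + 1])
    posterior = alpha * beta
    posterior /= posterior.sum(axis=1, keepdims=True)
    return posterior, [STATES[i] for i in posterior.argmax(axis=1)]


posterior, path = posterior_decode(["yes", "no", "yes"])
print(np.round(posterior, 3))
print(path)  # ['Rainy', 'Sunny', 'Rainy']
```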


### Forward likelihood vs Viterbi likelihood
Run the cell below to see the likelihood of a length-3 observation sequence under each of the eight possible weather sequences, and compare the single most likely (Viterbi) path against the total over all paths.

In [25]:
from itertools import product

observations = ['no', 'no', 'yes']

# Convert observations to indices for the model
obs_to_idx = {'no': 0, 'yes': 1}
obs_indices = [obs_to_idx[obs] for obs in observations]
obs_tensor = torch.tensor(obs_indices).unsqueeze(0).unsqueeze(-1)

# Total sequence likelihood reported by the model (sums over all paths)
log_prob = model.log_probability(obs_tensor)
model_prob = torch.exp(log_prob).item()

# Manual calculation: log transition (p) and log emission (e) probabilities
p = {'Sunny': {'Sunny': np.log(.8), 'Rainy': np.log(.2)}, 'Rainy': {'Sunny': np.log(.4), 'Rainy': np.log(.6)}}
e = {'Sunny': {'yes': np.log(.1), 'no': np.log(.9)}, 'Rainy':{'yes':np.log(.8), 'no':np.log(.2)}}
o = observations
path_probs = {}

print("The likelihood of observing {} if the weather sequence is...".format(o))
for s in product(*[['Sunny', 'Rainy']]*3):
    # P(X_0) P(Y_0|X_0) P(X_1|X_0) P(Y_1|X_1) P(X_2|X_1) P(Y_2|X_2), with P(X_0) = 0.5
    prob = np.exp(np.log(.5)+e[s[0]][o[0]] + p[s[0]][s[1]] + e[s[1]][o[1]] + p[s[1]][s[2]] + e[s[2]][o[2]])
    path_probs[s] = prob
    print("\t{} is {:.2f}%".format(s, 100 * prob))

# The Viterbi path is the single path with the highest likelihood
best_path = max(path_probs, key=path_probs.get)
print("\nThe Viterbi (most likely) path is {} at {:.2f}%".format(best_path, 100 * path_probs[best_path]))
print("The total likelihood of observing {} over all possible paths is {:.2f}%".format(o, 100*sum(path_probs.values())))
# The model's value is exactly half the manual total. Likely cause (an assumption
# about pomegranate internals): with no `ends` passed to DenseHMM, a uniform end
# probability of 1/2 per state is multiplied in.
print("Model calculated likelihood: {:.2f}%".format(100 * model_prob))

The likelihood of observing ['no', 'no', 'yes'] if the weather sequence is...
	('Sunny', 'Sunny', 'Sunny') is 2.59%
	('Sunny', 'Sunny', 'Rainy') is 5.18%
	('Sunny', 'Rainy', 'Sunny') is 0.07%
	('Sunny', 'Rainy', 'Rainy') is 0.86%
	('Rainy', 'Sunny', 'Sunny') is 0.29%
	('Rainy', 'Sunny', 'Rainy') is 0.58%
	('Rainy', 'Rainy', 'Sunny') is 0.05%
	('Rainy', 'Rainy', 'Rainy') is 0.58%

The Viterbi (most likely) path is ('Sunny', 'Sunny', 'Rainy') at 5.18%
The total likelihood of observing ['no', 'no', 'yes'] over all possible paths is 10.20%
Model calculated likelihood: 5.10%
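Enumerating all $2^T$ weather paths, as the cell above does, stops being practical for longer sequences. The Viterbi recursion finds the same best path in $O(T N^2)$ time by keeping, for each state, only the best-scoring path that ends there, plus a back-pointer to its predecessor. The minimal NumPy sketch below is an illustration rather than Pomegranate code (`viterbi` here is a hypothetical helper); it recovers ('Sunny', 'Sunny', 'Rainy') at 5.18%, the largest entry in the enumeration above and well below the 10.20% total over all paths.

```python
import numpy as np

STATES = ["Sunny", "Rainy"]
OBS = {"no": 0, "yes": 1}
start = np.array([0.5, 0.5])
transition = np.array([[0.8, 0.2], [0.4, 0.6]])
emission = np.array([[0.9, 0.1], [0.2, 0.8]])  # columns: [no, yes]


def viterbi(observations):
    """Most likely hidden path and its probability, via dynamic programming."""
    y = [OBS[o] for o in observations]
    T, N = len(y), len(start)
    delta = np.zeros((T, N))               # best path probability ending in each state
    backptr = np.zeros((T, N), dtype=int)  # predecessor state on that best path
    delta[0] = start * emission[:, y[0]]
    for t in range(1, T):
        # scores[i, j] = delta[t-1, i] * P(j | i)
        scores = delta[t - 1][:, None] * transition
        backptr[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * emission[:, y[t]]
    # Trace the best final state back to the start
    state = int(delta[-1].argmax())
    path = [state]
    for t in range(T - 1, 0, -1):
        state = int(backptr[t, state])
        path.append(state)
    path.reverse()
    return [STATES[s] for s in path], delta[-1].max()


path, prob = viterbi(["no", "no", "yes"])
print(path, f"{100 * prob:.2f}%")  # ['Sunny', 'Sunny', 'Rainy'] 5.18%
```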
