# Viterbi Heuristic

## Introduction
- The **Viterbi Heuristic** is a simplified, approximate variant of the **Viterbi Algorithm**, which finds the most probable sequence of hidden states in a **Hidden Markov Model (HMM)**.
- It is typically used in tasks like **part-of-speech tagging**, **speech recognition**, and **machine translation**.

## Viterbi Algorithm
- The Viterbi algorithm is a dynamic programming algorithm that decodes the most probable sequence of hidden states in a Hidden Markov Model (HMM), given a sequence of observed events. For T observations and N states, it runs in O(T·N²) time.
- It is commonly used in speech recognition, POS tagging, and bioinformatics.
### Key Components:
1. **States**: Possible hidden states (e.g., POS tags like Noun, Verb).
2. **Observations**: Observed events (e.g., words in a sentence).
3. **Transition Probabilities (A)**: Probability of transitioning from one state to another.
4. **Emission Probabilities (B)**: Probability of observing a particular event given a state.
5. **Initial Probabilities (π)**: Probability of starting in a particular state.
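The five components above combine into the joint probability of one tag sequence and one sentence: the initial probability of the first state, then one transition and one emission factor per word. A minimal sketch, assuming a made-up two-state model (the names `path_probability`, `pi`, `A`, `B` and all numbers are illustrative, not taken from the notes):

```python
def path_probability(tags, words, pi, A, B):
    """Joint probability P(tags, words) under an HMM with parameters pi, A, B."""
    # First word: initial probability times emission probability
    prob = pi[tags[0]] * B[tags[0]].get(words[0], 0.0)
    # Every later word: one transition factor and one emission factor
    for prev, curr, word in zip(tags, tags[1:], words[1:]):
        prob *= A[prev][curr] * B[curr].get(word, 0.0)
    return prob

# Hypothetical toy model (assumed values, for illustration only)
pi = {'NN': 0.6, 'VB': 0.4}
A = {'NN': {'NN': 0.3, 'VB': 0.7}, 'VB': {'NN': 0.8, 'VB': 0.2}}
B = {'NN': {'dogs': 0.7, 'bark': 0.3}, 'VB': {'dogs': 0.1, 'bark': 0.9}}

# 0.6 * 0.7 * 0.7 * 0.9, roughly 0.2646
p = path_probability(['NN', 'VB'], ['dogs', 'bark'], pi, A, B)
print(p)
```

Decoding amounts to finding the tag sequence that maximizes this quantity without enumerating every possible sequence.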

### Steps:
1. **Initialization**: For the first observation, score each state as its initial probability times the probability of emitting that observation.
2. **Recursion**: For each subsequent observation, compute for every state the highest probability of arriving there from any previous state, and record that best previous state as a backpointer.
3. **Termination**: Once the last observation is processed, select the state with the highest final probability.
4. **Backtracking**: Follow the backpointers from that state back to the first observation to recover the most probable sequence of states.
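The four steps above can be sketched as a single function. This is an illustrative version of exact Viterbi decoding, not a reference implementation; the function name `viterbi_decode` and the two-state toy model are assumptions made for the example.

```python
def viterbi_decode(words, states, pi, A, B):
    """Return (best_path, best_prob) for an HMM with parameters pi, A, B."""
    # Initialization: delta[t][s] is the probability of the best path
    # that ends in state s after the first t + 1 words
    delta = [{s: pi[s] * B[s].get(words[0], 0.0) for s in states}]
    back = [{}]
    # Recursion: extend the best path into each state, one word at a time
    for t in range(1, len(words)):
        delta.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: delta[t - 1][p] * A[p][s])
            delta[t][s] = delta[t - 1][prev] * A[prev][s] * B[s].get(words[t], 0.0)
            back[t][s] = prev
    # Termination: pick the most probable final state
    last = max(states, key=lambda s: delta[-1][s])
    # Backtracking: follow the backpointers to the first word
    path = [last]
    for t in range(len(words) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, delta[-1][last]

# Hypothetical toy model (assumed values, for illustration only)
pi = {'NN': 0.6, 'VB': 0.4}
A = {'NN': {'NN': 0.3, 'VB': 0.7}, 'VB': {'NN': 0.8, 'VB': 0.2}}
B = {'NN': {'dogs': 0.7, 'bark': 0.3}, 'VB': {'dogs': 0.1, 'bark': 0.9}}

best_path, best_prob = viterbi_decode(['dogs', 'bark'], ['NN', 'VB'], pi, A, B)
print(best_path, best_prob)  # path is ['NN', 'VB'], probability roughly 0.2646
```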

## Viterbi Heuristic
The **Viterbi Heuristic** is a simplified or approximate version of the Viterbi algorithm. It gives up the guarantee of finding the most probable sequence in exchange for speed. It:
- **Limits the search space** to likely states.
- **Prunes unlikely paths**.
- **Uses approximation techniques** to speed up the computation.

### Common Techniques:
- **Pruning**: Discarding unlikely paths early on.
- **Greedy Decisions**: Committing to the single most probable state at each step, without revisiting earlier choices.
- **Beam Search**: Keeping only the top **k** most likely paths at each step; with k = 1 this reduces to greedy decoding.
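To make the trade-off concrete, here is a hedged sketch of beam search over partial paths, where a beam width of 1 behaves like greedy decoding. The function name `beam_search_decode`, the states `X` and `Y`, the words `a` and `b`, and all probabilities are invented for the example; they are chosen so that the greedy choice at the first word leads to a worse overall path.

```python
def beam_search_decode(words, states, pi, A, B, k):
    """Approximate decoding: keep only the k best partial paths per step."""
    # Each hypothesis is a (probability, path) pair
    beam = [(pi[s] * B[s].get(words[0], 0.0), [s]) for s in states]
    beam = sorted(beam, key=lambda h: h[0], reverse=True)[:k]
    for word in words[1:]:
        # Extend every surviving path by every state, then prune to k
        candidates = [
            (prob * A[path[-1]][s] * B[s].get(word, 0.0), path + [s])
            for prob, path in beam
            for s in states
        ]
        beam = sorted(candidates, key=lambda h: h[0], reverse=True)[:k]
    best_prob, best_path = beam[0]
    return best_path, best_prob

# Hypothetical toy model (assumed values, for illustration only)
states = ['X', 'Y']
pi = {'X': 0.5, 'Y': 0.5}
A = {'X': {'X': 0.5, 'Y': 0.5}, 'Y': {'X': 0.1, 'Y': 0.9}}
B = {'X': {'a': 0.6, 'b': 0.4}, 'Y': {'a': 0.4, 'b': 0.6}}
words = ['a', 'b']

greedy_path, greedy_prob = beam_search_decode(words, states, pi, A, B, k=1)
beam_path, beam_prob = beam_search_decode(words, states, pi, A, B, k=2)
print(greedy_path, greedy_prob)  # ['X', 'Y'], roughly 0.09
print(beam_path, beam_prob)      # ['Y', 'Y'], roughly 0.108
```

With k = 1 the decoder commits to `X` for the first word and can no longer reach the higher-scoring path `Y, Y`; with k = 2 both first-word states survive and the better path is found.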

## Example Use Case
### Part-of-Speech Tagging:
- Input: "The cat sleeps."
- Hidden states: "DT" (Determiner), "NN" (Noun), "VBZ" (Verb, 3rd-person singular present).
- The Viterbi algorithm finds the most likely sequence of tags for this sentence. The heuristic might simplify by:
  - Keeping only the top **k** most likely sequences.
  - Pruning unlikely tag transitions.




```python
# Transition probabilities: transition_probs[prev][curr] = P(curr | prev)
transition_probs = {
    'DT': {'DT': 0.1, 'NN': 0.7, 'VBZ': 0.2},
    'NN': {'DT': 0.6, 'NN': 0.3, 'VBZ': 0.1},
    'VBZ': {'DT': 0.2, 'NN': 0.3, 'VBZ': 0.5}
}

# Emission probabilities: emission_probs[state][word] = P(word | state)
# Toy values for illustration; the rows are not normalized.
emission_probs = {
    'DT': {'The': 0.9, 'cat': 0.1, 'sleeps': 0.05},
    'NN': {'The': 0.05, 'cat': 0.8, 'sleeps': 0.05},
    'VBZ': {'The': 0.05, 'cat': 0.1, 'sleeps': 0.9}
}

# Observations (words in the sentence)
observations = ['The', 'cat', 'sleeps']

# Hidden states (POS tags)
states = ['DT', 'NN', 'VBZ']

# Beam width: number of states kept as possible predecessors at each step
k = 2

# Viterbi matrix: viterbi[state][t] is the best path probability ending in state at step t
viterbi = {state: [0.0] * len(observations) for state in states}

# Backpointer matrix: backpointer[state][t] is the best previous state
backpointer = {state: [None] * len(observations) for state in states}

# Step 1: Initialization (first observation)
# No initial distribution is defined, so every state gets the same start weight of 1.
for state in states:
    viterbi[state][0] = emission_probs[state].get(observations[0], 0)

# Step 2: Recursion with pruning (for subsequent observations)
for t in range(1, len(observations)):
    word = observations[t]

    # Heuristic: only the k most probable states of the previous step are
    # considered as predecessors (exact Viterbi would consider all of them)
    beam = sorted(states, key=lambda s: viterbi[s][t-1], reverse=True)[:k]

    for current_state in states:
        emission_prob = emission_probs[current_state].get(word, 0)
        candidates = []

        for previous_state in beam:
            transition_prob = transition_probs[previous_state].get(current_state, 0)
            prob = viterbi[previous_state][t-1] * transition_prob * emission_prob
            candidates.append((prob, previous_state))

        # Store the most probable way of reaching current_state
        best_prob, best_state = max(candidates, key=lambda x: x[0])
        viterbi[current_state][t] = best_prob
        backpointer[current_state][t] = best_state

# Step 3: Termination (most probable state at the last word)
best_final_state = max(states, key=lambda state: viterbi[state][-1])
best_path = [best_final_state]

# Step 4: Backtracking (follow the backpointers to the first word)
for t in range(len(observations)-1, 0, -1):
    best_path.insert(0, backpointer[best_path[0]][t])

# Output the best path (POS tag sequence)
print("Best POS tag sequence:", best_path)
```


Output: `Best POS tag sequence: ['DT', 'NN', 'VBZ']`
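The script above multiplies raw probabilities, which is fine for a three-word sentence but underflows to zero on long inputs. A common remedy is to add log-probabilities instead of multiplying probabilities, since the ordering of paths is unchanged. The snippet below is a small demonstration of the effect with made-up per-step values, not a change to the script itself.

```python
import math

# Hypothetical per-step probabilities along one long path (assumed values)
step_probs = [0.1] * 400

# Multiplying directly: the product drops below the smallest float and becomes 0.0
product = 1.0
for p in step_probs:
    product *= p

# Summing logarithms: the score stays finite and comparable between paths
log_score = sum(math.log(p) for p in step_probs)

print(product)    # 0.0
print(log_score)  # roughly -921.03
```

In log space a zero probability has no finite logarithm (`math.log(0)` raises `ValueError`), so impossible transitions or emissions need explicit handling, for example by using `float('-inf')`.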
