The Viterbi Algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden states given a sequence of observations. In POS tagging, the hidden states are the parts of speech (POS) and the observations are the words of the sentence. The algorithm requires two main inputs, which are typically expressed in matrix form:

- **Transition probabilities**, which provide the likelihood of moving from one hidden state to another.
- **Emission probabilities**, which provide the likelihood of an observed value given a specific hidden state.
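
As a rough illustration of the "matrix form" mentioned above, the sketch below lays out both inputs as nested lists for a two-tag, three-word vocabulary. The numbers are the same ones used in the worked example later in this section; the names `trans_matrix`, `emit_matrix`, `vocab`, and `rows_are_distributions` are hypothetical and chosen only for this sketch. Each row is a probability distribution, so a quick sanity check is that every row sums to 1.

```python
import math

tags = ['noun', 'verb']
vocab = ['they', 'can', 'fish']

# trans_matrix[i][j] = P(tags[j] at the next position | tags[i] at the current one)
trans_matrix = [
    [0.3, 0.7],  # from noun
    [0.4, 0.6],  # from verb
]

# emit_matrix[i][k] = P(vocab[k] | tags[i])
emit_matrix = [
    [0.5, 0.4, 0.1],  # noun
    [0.1, 0.3, 0.6],  # verb
]

def rows_are_distributions(matrix):
    """Return True if every row sums to 1 (within floating-point tolerance)."""
    return all(math.isclose(sum(row), 1.0) for row in matrix)

print(rows_are_distributions(trans_matrix), rows_are_distributions(emit_matrix))
```

A nested dictionary keyed by tag and word carries the same information and avoids having to keep track of row and column indices.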

The Viterbi Algorithm uses these probabilities, together with the start probability of each state, to compute the most likely sequence of hidden states for the observed data.

The algorithm consists of two steps: a forward step (or recursion) and a backward step (or backtrace).

1. **Forward step**: It goes through the sequence from start to end, storing at each position the maximum probability of each state and the state that preceded it.
2. **Backward step**: It backtracks from the end of the sequence to the beginning using the stored information to find the most probable path.
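
The two steps can also be written as a recurrence. The notation is an assumption made for this sketch: $w_1, \dots, w_T$ are the observed words, $\pi_s$ is the start probability of state $s$, $a_{s's}$ is the transition probability from $s'$ to $s$, $b_s(w)$ is the emission probability of $w$ in state $s$, $V_t(s)$ is the probability of the best path ending in state $s$ at position $t$, and $\mathrm{bp}_t(s)$ is the stored predecessor.

$$
\begin{aligned}
V_1(s) &= \pi_s \, b_s(w_1) \\
V_t(s) &= \max_{s'} \; V_{t-1}(s') \, a_{s's} \, b_s(w_t), \qquad t = 2, \dots, T \\
\mathrm{bp}_t(s) &= \arg\max_{s'} \; V_{t-1}(s') \, a_{s's} \\
\hat{s}_T &= \arg\max_{s} \; V_T(s), \qquad \hat{s}_{t-1} = \mathrm{bp}_t(\hat{s}_t)
\end{aligned}
$$

The first three lines are the forward step. The last line is the backward step, which reads the best path off the stored predecessors from position $T$ back to position 1.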

Below is an implementation of the Viterbi Algorithm in Python:


In [3]:
import numpy as np

def viterbi(words, tags, start_p, trans_p, emit_p):
    """
    Parameters:
    words : list of observations (e.g., words)
    tags : list of hidden states (e.g., POS tags)
    start_p : dict mapping each tag to its start probability (prior)
    trans_p : nested dict, trans_p[prev_tag][tag] is the transition probability
    emit_p : nested dict, emit_p[tag][word] is the emission probability

    Return:
    A tuple (probability, path): the probability of the best path and the path itself.
    """

    # Initialization
    # List of Viterbi variables: V[m][tag] is the probability of the best path ending in tag at position m
    V = [{}]
    # Dictionary of best paths.
    # The key in this dictionary is the state, and the value is the optimal path leading to this state.
    path = {}

    # Step 1. Initial probability = start probability x emission probability
    for tag in tags:
        V[0][tag] = start_p[tag] * emit_p[tag][words[0]]
        path[tag] = [tag]

    # Step 2. Forward
    for m in range(1, len(words)):
        V.append({})
        newpath = {}

        for tag in tags:
            # Best previous tag y0: maximize V[m-1][y0] x transition probability x emission probability
            (prob, state) = max((V[m-1][y0] * trans_p[y0][tag] * emit_p[tag][words[m]], y0)
                                for y0 in tags)
            V[m][tag] = prob
            # Store the path
            newpath[tag] = path[state] + [tag]

        path = newpath

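    # Step 3. Maximum probability for final state
    # V[-1] is the last position; unlike V[m], it also works for a one-word input
    (prob, state) = max((V[-1][tag], tag) for tag in tags)

    # Step 4. Return the most probable path (already stored in full, so no explicit backtrace is needed)
    return (prob, path[state])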


In the above code, words represents the observed values (e.g., words in a sentence), tags are the hidden states (e.g., POS tags), start_p are the start probabilities (the probability of a tag appearing at the beginning of a sentence), trans_p are the transition probabilities (the probability of moving from one tag to another), and emit_p are the emission probabilities (the probability of a word given a tag).

The algorithm proceeds through the observed data, calculating the maximum probability for each state at each step and keeping a record of the path that led to that state. Finally, it returns the maximum probability together with the path that achieves it.


In [4]:
# Example inputs
words = ['they', 'can', 'fish']
tags = ['noun', 'verb']
start_p = {'noun': 0.5, 'verb': 0.5}
trans_p = {
    'noun': {'noun': 0.3, 'verb': 0.7},
    'verb': {'noun': 0.4, 'verb': 0.6},
}
emit_p = {
    'noun': {'they': 0.5, 'can': 0.4, 'fish': 0.1},
    'verb': {'they': 0.1, 'can': 0.3, 'fish': 0.6},
}

# Test the Viterbi algorithm
prob, path = viterbi(words, tags, start_p, trans_p, emit_p)

# Print the result
print("Most probable path:", path)
print("Probability:", prob)


Most probable path: ['noun', 'verb', 'verb']
Probability: 0.0189
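
The implementation above keeps a full path for every tag at every step. A variant closer to the two-step description stores only the preceding tag and recovers the path with an explicit backtrace. The sketch below is one possible way to write that, not a replacement for the function above. It also works with log-probabilities, an assumption added here because products of many small probabilities can underflow on long sentences. The names `viterbi_backpointers`, `scores`, and `backptr` are hypothetical.

```python
import math

def viterbi_backpointers(words, tags, start_p, trans_p, emit_p):
    """Log-space Viterbi with explicit back-pointers (illustrative sketch)."""
    def log(p):
        # Assumption: a zero probability maps to log(0) = -infinity
        return math.log(p) if p > 0 else float('-inf')

    # scores[m][tag]: best log-probability of any path ending in tag at position m
    scores = [{tag: log(start_p[tag]) + log(emit_p[tag][words[0]]) for tag in tags}]
    # backptr[m][tag]: tag at position m-1 on that best path (unused for m = 0)
    backptr = [{}]

    # Forward step: fill in scores and back-pointers from left to right
    for m in range(1, len(words)):
        scores.append({})
        backptr.append({})
        for tag in tags:
            best_prev = max(
                tags, key=lambda prev: scores[m - 1][prev] + log(trans_p[prev][tag])
            )
            scores[m][tag] = (scores[m - 1][best_prev]
                              + log(trans_p[best_prev][tag])
                              + log(emit_p[tag][words[m]]))
            backptr[m][tag] = best_prev

    # Backward step: start from the best final tag and follow the back-pointers
    last = max(tags, key=lambda tag: scores[-1][tag])
    path = [last]
    for m in range(len(words) - 1, 0, -1):
        path.append(backptr[m][path[-1]])
    path.reverse()

    return math.exp(scores[-1][last]), path

# Same inputs as the example above
words = ['they', 'can', 'fish']
tags = ['noun', 'verb']
start_p = {'noun': 0.5, 'verb': 0.5}
trans_p = {
    'noun': {'noun': 0.3, 'verb': 0.7},
    'verb': {'noun': 0.4, 'verb': 0.6},
}
emit_p = {
    'noun': {'they': 0.5, 'can': 0.4, 'fish': 0.1},
    'verb': {'they': 0.1, 'can': 0.3, 'fish': 0.6},
}

bp_prob, bp_path = viterbi_backpointers(words, tags, start_p, trans_p, emit_p)
print("Most probable path:", bp_path)
print("Probability:", round(bp_prob, 4))
```

On these inputs the sketch finds the same path as the function above. Because the probability is recovered by exponentiating a sum of logs, it may differ from 0.0189 in the last few floating-point digits, which is why the printed value is rounded. When two predecessors tie, this version picks the one that comes first in tags, which may differ from the tie-breaking of the tuple comparison used above.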
