# CTiB E2024 - Week 12 - Exercises

# Theoretical exercises

***Exercise 1***: How many terms are there in the sum on slide 13 from the lecture on Nov 18 for computing $P({\bf X}|\Theta)$? Why?

***Exercise 2***: How many terms are there in the maximization on slide 68 in the Viterbi decoding slides from the lecture on Nov 18 for computing the Viterbi decoding ${\bf Z}^*$? Why?

# Practical exercises

You are given the same 7-state HMM and helper functions that you used last week:

In [355]:
class hmm:
    def __init__(self, init_probs, trans_probs, emission_probs):
        self.init_probs = init_probs
        self.trans_probs = trans_probs
        self.emission_probs = emission_probs

In [356]:
init_probs_7_state = [0.00, 0.00, 0.00, 1.00, 0.00, 0.00, 0.00]

trans_probs_7_state = [
    [0.00, 0.00, 0.90, 0.10, 0.00, 0.00, 0.00],
    [1.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
    [0.00, 1.00, 0.00, 0.00, 0.00, 0.00, 0.00],
    [0.00, 0.00, 0.05, 0.90, 0.05, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.00, 0.00, 1.00, 0.00],
    [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 1.00],
    [0.00, 0.00, 0.00, 0.10, 0.90, 0.00, 0.00],
]

emission_probs_7_state = [
    #   A     C     G     T
    [0.30, 0.25, 0.25, 0.20],
    [0.20, 0.35, 0.15, 0.30],
    [0.40, 0.15, 0.20, 0.25],
    [0.25, 0.25, 0.25, 0.25],
    [0.20, 0.40, 0.30, 0.10],
    [0.30, 0.20, 0.30, 0.20],
    [0.15, 0.30, 0.20, 0.35],
]

hmm_7_state = hmm(init_probs_7_state, trans_probs_7_state, emission_probs_7_state)

In [357]:
def translate_observations_to_indices(obs):
    mapping = {'a': 0, 'c': 1, 'g': 2, 't': 3}
    return [mapping[symbol.lower()] for symbol in obs]

def translate_indices_to_observations(indices):
    mapping = ['a', 'c', 'g', 't']
    return ''.join(mapping[idx] for idx in indices)

def translate_path_to_indices(path):
    return list(map(lambda x: int(x), path))

def translate_indices_to_path(indices):
    return ''.join([str(i) for i in indices])

# 1 - Viterbi Decoding

Below you will implement and experiment with the Viterbi algorithm. The implementation has been split into three parts:

1. Fill out the $\omega$ table using the recursion presented at the lecture.
2. Find the state with the highest probability after observing the entire sequence of observations.
3. Backtrack from the state found in the previous step to obtain the optimal path.
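The three steps above can be sketched on a tiny model before tackling the 7-state one. The 2-state model, its probabilities, and the name `viterbi_toy` are made up for illustration only; this is a minimal sketch without log-transformation, not the reference solution.

```python
def viterbi_toy(init, trans, emit, obs):
    """Viterbi decoding on observation indices; returns (path, probability)."""
    k, n = len(init), len(obs)

    # Step 1: fill out the omega table column by column.
    omega = [[0.0] * n for _ in range(k)]
    back = [[None] * n for _ in range(k)]
    for s in range(k):
        omega[s][0] = init[s] * emit[s][obs[0]]
    for t in range(1, n):
        for s in range(k):
            best_prev = max(range(k), key=lambda j: omega[j][t - 1] * trans[j][s])
            omega[s][t] = omega[best_prev][t - 1] * trans[best_prev][s] * emit[s][obs[t]]
            back[s][t] = best_prev

    # Step 2: find the state with the highest probability in the last column.
    last = max(range(k), key=lambda s: omega[s][n - 1])

    # Step 3: backtrack from that state to the first column.
    path = [last]
    for t in range(n - 1, 0, -1):
        path.append(back[path[-1]][t])
    path.reverse()
    return path, omega[last][n - 1]


# Hypothetical 2-state, 2-symbol model (not the 7-state model used in the exercises).
toy_init = [0.6, 0.4]
toy_trans = [[0.7, 0.3], [0.4, 0.6]]
toy_emit = [[0.9, 0.1], [0.2, 0.8]]

toy_path, toy_prob = viterbi_toy(toy_init, toy_trans, toy_emit, [0, 0, 1])
print(toy_path)  # [0, 0, 1]
print(toy_prob)  # approximately 0.081648
```

The toy example is small enough to check by hand, which is a useful sanity test before running the real model.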

We'll be working with the 7-state model (`hmm_7_state`) and the helper functions for translating between observations, hidden states, and indices, as introduced above (and also used last week).

Additionally, you're given the function below that constructs a table of a specific size filled with zeros.

In [358]:
def make_table(m, n):
    """Make a table with `m` rows and `n` columns filled with zeros."""
    return [[0] * n for _ in range(m)]

You'll be testing your code with the same two sequences as last week:

In [359]:
x_short = 'GTTTCCCAGTGTATATCGAGGGATACTACGTGCATAGTAACATCGGCCAA'
z_short = '33333333333321021021021021021021021021021021021021'

In [360]:
x_long = 'TGAGTATCACTTAGGTCTATGTCTAGTCGTCTTTCGTAATGTTTGGTCTTGTCACCAGTTATCCTATGGCGCTCCGAGTCTGGTTCTCGAAATAAGCATCCCCGCCCAAGTCATGCACCCGTTTGTGTTCTTCGCCGACTTGAGCGACTTAATGAGGATGCCACTCGTCACCATCTTGAACATGCCACCAACGAGGTTGCCGCCGTCCATTATAACTACAACCTAGACAATTTTCGCTTTAGGTCCATTCACTAGGCCGAAATCCGCTGGAGTAAGCACAAAGCTCGTATAGGCAAAACCGACTCCATGAGTCTGCCTCCCGACCATTCCCATCAAAATACGCTATCAATACTAAAAAAATGACGGTTCAGCCTCACCCGGATGCTCGAGACAGCACACGGACATGATAGCGAACGTGACCAGTGTAGTGGCCCAGGGGAACCGCCGCGCCATTTTGTTCATGGCCCCGCTGCCGAATATTTCGATCCCAGCTAGAGTAATGACCTGTAGCTTAAACCCACTTTTGGCCCAAACTAGAGCAACAATCGGAATGGCTGAAGTGAATGCCGGCATGCCCTCAGCTCTAAGCGCCTCGATCGCAGTAATGACCGTCTTAACATTAGCTCTCAACGCTATGCAGTGGCTTTGGTGTCGCTTACTACCAGTTCCGAACGTCTCGGGGGTCTTGATGCAGCGCACCACGATGCCAAGCCACGCTGAATCGGGCAGCCAGCAGGATCGTTACAGTCGAGCCCACGGCAATGCGAGCCGTCACGTTGCCGAATATGCACTGCGGGACTACGGACGCAGGGCCGCCAACCATCTGGTTGACGATAGCCAAACACGGTCCAGAGGTGCCCCATCTCGGTTATTTGGATCGTAATTTTTGTGAAGAACACTGCAAACGCAAGTGGCTTTCCAGACTTTACGACTATGTGCCATCATTTAAGGCTACGACCCGGCTTTTAAGACCCCCACCACTAAATAGAGGTACATCTGA'
z_long = '3333321021021021021021021021021021021021021021021021021021021021021021033333333334564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564563210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210321021021021021021021021033334564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564563333333456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456332102102102102102102102102102102102102102102102102102102102102102102102102102102102102102102102103210210210210210210210210210210210210210210210210210210210210210'

Remember to translate these sequences to indices before using them with your algorithms.
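As a quick reminder of what the translation does, here is a small self-contained sketch. It repeats two of the helpers from above so that it runs on its own; the short strings are arbitrary examples, not the exercise data.

```python
def translate_observations_to_indices(obs):
    mapping = {'a': 0, 'c': 1, 'g': 2, 't': 3}
    return [mapping[symbol.lower()] for symbol in obs]

def translate_path_to_indices(path):
    return [int(state) for state in path]

# Arbitrary short examples, chosen only to show the mapping.
x_idx = translate_observations_to_indices('GTTTCC')
z_idx = translate_path_to_indices('333321')

print(x_idx)  # [2, 3, 3, 3, 1, 1]
print(z_idx)  # [3, 3, 3, 3, 2, 1]
```

The indices can then be used directly to look up rows and columns in `emission_probs` and `trans_probs`.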

## Implementing without log-transformation

First, we will implement the algorithm without log-transformation. This causes numerical stability issues for long sequences (as you saw last week when computing the joint probability), so we will use the log-transformation trick to fix this in the next section.
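The stability issue can be seen without any HMM at all. The sketch below multiplies an assumed per-position factor of 0.25 a thousand times and compares the result with the corresponding sum of logarithms; the factor and the length are illustrative assumptions, not values taken from the model.

```python
import math

p = 0.25        # assumed per-position factor, purely for illustration
n = 1000        # assumed sequence length

prob = 1.0      # running product of probabilities
log_prob = 0.0  # running sum of log-probabilities
for _ in range(n):
    prob *= p
    log_prob += math.log(p)

print(prob)      # 0.0: the product has underflowed
print(log_prob)  # roughly -1386.29: still an ordinary float
```

Once the product reaches 0.0, every candidate path looks equally (im)probable, so a maximization over such values can no longer distinguish between them.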

### Computation of the $\omega$ table

In [361]:
class ViterbiResult:
    def __init__(self, w, back_pointer):
        self.w = w
        self.back_pointer = back_pointer

In [362]:
def compute_w(model, x):
    k = len(model.init_probs)
    n = len(x)
    x = translate_observations_to_indices(x)
    
    # Step 1: Initialize the w and backpointer table
    w = make_table(k, n)
    back_pointer = make_table(k, n)  # Backpointer to reconstruct the path

    # Initialize base cases (t=0)
    for state in range(k):
        w[state][0] = (
            model.init_probs[state] * model.emission_probs[state][x[0]]
        )
        back_pointer[state][0] = None

    # Step 2: Fill the w table for t > 0
    for t in range(1, n):
        for state in range(k):
            max_prob, prev_state = max(
                (
                    w[prev_state][t-1] * model.trans_probs[prev_state][state],
                    prev_state
                )
                for prev_state in range(k)
            )
            w[state][t] = max_prob * model.emission_probs[state][x[t]]
            back_pointer[state][t] = prev_state
    
    return ViterbiResult(w, back_pointer)


### Finding the joint probability of an optimal path

Now, write a function that given the $\omega$-table, returns the probability of an optimal path through the HMM. As explained in the lecture, this corresponds to finding the highest probability in the last column of the table.

In [363]:
def opt_path_prob(w):
    # Find the maximum value in the last column
    last_column = [row[-1] for row in w]  # Extract the last column
    return max(last_column)  # Find the maximum value
    
#function for returning the index of the maximum probability in the last column
def opt_path_prob_index(w):
    return max(enumerate(w),key=lambda x: x[1][-1])[0]   

Now test your implementation in the box below:

In [364]:
result = compute_w(hmm_7_state, x_short).w
opt_path_prob(result)

1.9114255184318858e-31

Now do the same for `x_long`. What happens?

In [365]:
result = compute_w(hmm_7_state, x_long).w
opt_path_prob(result)

0.0

### Obtaining an optimal path through backtracking

Implement backtracking to find a most probable path of hidden states given the $\omega$-table.

In [366]:
import math # REMEMBER TO USE math.isclose(a, b) when comparing floats!

In [367]:
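def backtrack(model, x, w, back_pointer):
    # `model` is not needed here; the back pointers already encode the choices.
    n = len(x)
    # Start from the most probable state in the last column ...
    path = [opt_path_prob_index(w)]
    # ... and follow the back pointers from the last column down to the second.
    for t in range(n - 1, 0, -1):
        path.append(back_pointer[path[-1]][t])
    # The path was collected back to front, so reverse it once at the end.
    path.reverse()
    return path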

In [368]:
viterbi_result = compute_w(hmm_7_state, x_short)
z_viterbi = backtrack(hmm_7_state, x_short, viterbi_result.w, viterbi_result.back_pointer)
z_viterbi

[3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1]

Now do the same for `x_long`. What happens?

In [369]:
# Your code here

In [370]:
import math

def log(x):
    if x == 0:
        return float('-inf')
    return math.log(x)

## Implementing with log-transformation

Now implement the Viterbi algorithm with log-transformation. The steps are the same as above.
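For reference, the recursion in log-space can be written as follows. The notation is an assumption made for this note: $\pi_k$ stands for `init_probs[k]`, $A_{jk}$ for `trans_probs[j][k]`, $\phi_k(x)$ for `emission_probs[k][x]`, and $\hat\omega$ for the log-transformed table; the lecture slides may use different symbols.

$$
\begin{aligned}
\hat\omega[k][1] &= \log \pi_k + \log \phi_k(x_1), \\
\hat\omega[k][n] &= \log \phi_k(x_n) + \max_{j}\left(\hat\omega[j][n-1] + \log A_{jk}\right), \quad n > 1,
\end{aligned}
$$

with the convention $\log 0 = -\infty$, which is what the `log` helper above returns. Products become sums, and because $\log$ is increasing, the maximizing $j$ is the same as in the untransformed recursion.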

### Computation of the (log-transformed) $\omega$ table

In [371]:
def compute_w_log(model, x):
    k = len(model.init_probs)
    n = len(x)
    x = translate_observations_to_indices(x)
    # Step 1: Initialize the w and backpointer tables
    w = make_table(k, n)
    back_pointer = make_table(k, n)  # Backpointer to reconstruct the path

    # Initialize base cases (t=0)
    for state in range(k):
        w[state][0] = (
            log(model.init_probs[state]) + log(model.emission_probs[state][x[0]])
        )
        back_pointer[state][0] = None

    # Step 2: Fill the w table for t > 0
    for t in range(1, n):
        for state in range(k):
            max_prob, prev_state = max(
                (
                    w[prev_state][t-1] + log(model.trans_probs[prev_state][state]),
                    prev_state
                )
                for prev_state in range(k)
            )
            w[state][t] = max_prob + log(model.emission_probs[state][x[t]])
            back_pointer[state][t] = prev_state
    
    return ViterbiResult(w, back_pointer)


### Finding the (log-transformed) joint probability of an optimal path

In [372]:
# The optimal log-probability is still the maximum of the last column,
# because log is increasing and therefore preserves the ordering of the entries.
def opt_path_prob_log(w):
    return opt_path_prob(w)

In [373]:
viterbi_result = compute_w_log(hmm_7_state, x_short)
opt_path_prob_log(viterbi_result.w)

-70.73228857440488

Now do the same for `x_long`. What happens?

In [374]:
viterbi_result = compute_w_log(hmm_7_state, x_long)
opt_path_prob_log(viterbi_result.w)

-1406.7209253880144

### Obtaining an optimal path through backtracking

In [375]:
# Backtracking only follows the back pointers, so it is the same in log-space.
def backtrack_log(model, x, w, back_pointer):
    return backtrack(model, x, w, back_pointer)

In [376]:
viterbi_result = compute_w_log(hmm_7_state, x_short)
z_viterbi_log = backtrack_log(hmm_7_state, x_short, viterbi_result.w, viterbi_result.back_pointer)
z_viterbi_log

[3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1]

Now do the same for `x_long`. What happens?

In [377]:
viterbi_result = compute_w_log(hmm_7_state, x_long)
z_viterbi_long_log = backtrack_log(hmm_7_state, x_long, viterbi_result.w, viterbi_result.back_pointer)
z_viterbi_long_log

[3,
 3,
 3,
 3,
 3,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 4,
 5,
 6,
 3,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,
 2,
 1,
 0,


### Does it work?

Think about how to verify that your implementations of Viterbi (i.e. `compute_w`, `opt_path_prob`, `backtrack`, and their log-transformed variants `compute_w_log`, `opt_path_prob_log`, `backtrack_log`) are correct.

One thing that should hold is that the probability of a most likely path as computed by `opt_path_prob` (or `opt_path_prob_log`) for a given sequence of observables (e.g. `x_short` or `x_long`) should be equal to the joint probability of a corresponding most probable path as found by `backtrack` (or `backtrack_log`) and the given sequence of observables. Why?

Make an experiment that validates that this is the case for your implementations of Viterbi and `x_short` and `x_long`.

In [378]:
# To access joint_prob and joint_prob_log, you must copy your implementations from last week here ...

def joint_prob(model, x, z):
    x = translate_observations_to_indices(x)
    # z is expected to be a list of state indices (as returned by backtrack), so it is not translated
    
    result = model.init_probs[z[0]] * model.emission_probs[z[0]][x[0]]
    
    for i in range(1, len(z)):
        prev_state = z[i - 1]
        state = z[i]
        x_state = x[i]

        result *= model.trans_probs[prev_state][state] * model.emission_probs[state][x_state]
    
    return result

def joint_prob_log(model, x, z):
    x = translate_observations_to_indices(x)
    # z is expected to be a list of state indices (as returned by backtrack_log), so it is not translated
    
    result = log(model.init_probs[z[0]]) + log(model.emission_probs[z[0]][x[0]])

    for i in range(1, len(z)):
        prev_state = z[i - 1]
        state = z[i]
        x_state = x[i]
        
        result += log(model.trans_probs[prev_state][state]) + log(model.emission_probs[state][x_state])

    return result

# Check that opt_path_prob is equal to joint_prob(hmm_7_state, x_short, z_viterbi)

if math.isclose(opt_path_prob(compute_w(hmm_7_state, x_short).w),
                joint_prob(hmm_7_state, x_short, z_viterbi)):
    print("opt_path_prob is equal to joint_prob(hmm_7_state, x_short, z_viterbi)")

# Check that opt_path_prob_log is equal to joint_prob_log(hmm_7_state, x_short, z_viterbi_log)

if math.isclose(opt_path_prob_log(compute_w_log(hmm_7_state, x_short).w),
                joint_prob_log(hmm_7_state, x_short, z_viterbi_log)):
    print("opt_path_prob_log is equal to joint_prob_log(hmm_7_state, x_short, z_viterbi_log)")
# Do the above checks for x_long ...

# Check that opt_path_prob is equal to joint_prob(hmm_7_state, x_long, z_viterbi_long)
# Note: without log-transformation both values are 0.0 for x_long (underflow),
# so this check passes trivially and says nothing about the path itself.
viterbi_result = compute_w(hmm_7_state, x_long)
z_viterbi_long = backtrack(hmm_7_state, x_long, viterbi_result.w, viterbi_result.back_pointer)

if math.isclose(opt_path_prob(viterbi_result.w),
                joint_prob(hmm_7_state, x_long, z_viterbi_long)):
    print("opt_path_prob is equal to joint_prob(hmm_7_state, x_long, z_viterbi)")

# Check that opt_path_prob_log is equal to joint_prob_log(hmm_7_state, x_long, z_viterbi_long_log)
viterbi_result = compute_w_log(hmm_7_state, x_long)
z_viterbi_long_log = backtrack_log(hmm_7_state, x_long, viterbi_result.w, viterbi_result.back_pointer)

if math.isclose(opt_path_prob_log(viterbi_result.w),
                joint_prob_log(hmm_7_state, x_long, z_viterbi_long_log)):
    print("opt_path_prob_log is equal to joint_prob_log(hmm_7_state, x_long, z_viterbi_long_log)")
# Your code here ...

opt_path_prob is equal to joint_prob(hmm_7_state, x_short, z_viterbi)
opt_path_prob_log is equal to joint_prob_log(hmm_7_state, x_short, z_viterbi_log)
opt_path_prob is equal to joint_prob(hmm_7_state, x_long, z_viterbi)
opt_path_prob_log is equal to joint_prob_log(hmm_7_state, x_long, z_viterbi_long_log)


Do your implementations pass the above checks?

### Does log-transformation matter?

Make an experiment that investigates how long the input string can be before `backtrack` and `backtrack_log` start to disagree on a most likely path and its probability.
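One possible shape for such an experiment is a scan over prefix lengths that stops at the first disagreement. The sketch below is only a harness: `first_disagreement`, `toy_plain`, and `toy_log` are hypothetical names, and the two toy decoders stand in for wrappers around `compute_w` / `backtrack` / `opt_path_prob` and their log-transformed variants. The toy decoders model a single state emitting every symbol with probability 0.25, so the number they produce is not the answer for the 7-state model.

```python
import math

def first_disagreement(x, decode_plain, decode_log):
    """Return the smallest prefix length where the decoders disagree, else None.

    decode_plain(prefix) must return (path, probability);
    decode_log(prefix) must return (path, log-probability).
    """
    for n in range(1, len(x) + 1):
        prefix = x[:n]
        path_plain, prob_plain = decode_plain(prefix)
        path_log, logprob_log = decode_log(prefix)
        # Compare probabilities in log-space; an underflowed 0.0 becomes -inf.
        logprob_plain = math.log(prob_plain) if prob_plain > 0 else float('-inf')
        same_prob = math.isclose(logprob_plain, logprob_log, rel_tol=1e-9)
        if path_plain != path_log or not same_prob:
            return n
    return None

# Toy stand-ins (assumptions): one state, every symbol emitted with probability 0.25.
def toy_plain(prefix):
    prob = 1.0
    for _ in prefix:
        prob *= 0.25
    return [0] * len(prefix), prob

def toy_log(prefix):
    return [0] * len(prefix), len(prefix) * math.log(0.25)

n_first = first_disagreement('A' * 1000, toy_plain, toy_log)
print(n_first)
```

To run the real experiment, replace the two toy decoders with functions that call your own implementations on the prefix and return the path together with its (log-)probability. A linear scan is simple but recomputes every prefix from scratch, which is acceptable for sequences of this size.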

In [None]:
# Your code here

**Your answer here:**

For the 7-state model, `backtrack` and `backtrack_log` start to disagree on a most likely path and its probability for **i = ?** .