In [1]:
import math

def log(x):
    """Natural log that returns -inf for 0 instead of raising a ValueError (needed for the log-space Viterbi)."""
    if x == 0:
        return float('-inf')
    return math.log(x)

# CTiB - Week 10 - Practical Exercises

In the exercises below, you will implement and experiment with Viterbi decoding as explained in the week 10 lectures.

# 1 - Viterbi Decoding

Below you will implement and experiment with the Viterbi algorithm. The implementation has been split into three parts:

1. Fill out the $\omega$ table using the recursion presented in the lecture.
2. Find the state with the highest probability after observing the entire sequence of observations.
3. Backtrack from the state found in the previous step to obtain the optimal path.
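
For reference, the recursion of step 1 and the backtracking of step 3 can be written as follows, with $\pi_i$ = `init_probs[i]`, $a_{t,i}$ = `trans_probs[t][i]` (the probability of moving from state $t$ to state $i$), $e_i(x_j)$ = `emission_probs[i][x[j]]`, and 0-indexed positions $j = 0, \dots, n-1$. This is one standard way of writing it; the lecture's notation may differ.

$$
\begin{aligned}
\omega[i][0] &= \pi_i \, e_i(x_0), \\
\omega[i][j] &= e_i(x_j) \, \max_{t} \bigl( \omega[t][j-1] \, a_{t,i} \bigr), \quad j = 1, \dots, n-1, \\
z^*_{n-1} &= \arg\max_{i} \, \omega[i][n-1], \\
z^*_{j-1} &= \arg\max_{t} \bigl( \omega[t][j-1] \, a_{t, z^*_j} \bigr), \quad j = n-1, \dots, 1.
\end{aligned}
$$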

We'll be working with the same two models as last time: the 7-state model `hmm_7_state` and the 3-state model `hmm_3_state`. Both are included below.

In [2]:
class hmm:
    def __init__(self, init_probs, trans_probs, emission_probs):
        self.init_probs = init_probs
        self.trans_probs = trans_probs
        self.emission_probs = emission_probs

In [3]:
init_probs_7_state = [0.00, 0.00, 0.00, 1.00, 0.00, 0.00, 0.00]

trans_probs_7_state = [
    [0.00, 0.00, 0.90, 0.10, 0.00, 0.00, 0.00],
    [1.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
    [0.00, 1.00, 0.00, 0.00, 0.00, 0.00, 0.00],
    [0.00, 0.00, 0.05, 0.90, 0.05, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.00, 0.00, 1.00, 0.00],
    [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 1.00],
    [0.00, 0.00, 0.00, 0.10, 0.90, 0.00, 0.00],
]

emission_probs_7_state = [
    #   A     C     G     T
    [0.30, 0.25, 0.25, 0.20],
    [0.20, 0.35, 0.15, 0.30],
    [0.40, 0.15, 0.20, 0.25],
    [0.25, 0.25, 0.25, 0.25],
    [0.20, 0.40, 0.30, 0.10],
    [0.30, 0.20, 0.30, 0.20],
    [0.15, 0.30, 0.20, 0.35],
]

hmm_7_state = hmm(init_probs_7_state, trans_probs_7_state, emission_probs_7_state)

In [4]:
init_probs_3_state = [0.10, 0.80, 0.10]

trans_probs_3_state = [
    [0.90, 0.10, 0.00],
    [0.05, 0.90, 0.05],
    [0.00, 0.10, 0.90],
]

emission_probs_3_state = [
    #   A     C     G     T
    [0.40, 0.15, 0.20, 0.25],
    [0.25, 0.25, 0.25, 0.25],
    [0.20, 0.40, 0.30, 0.10],
]

hmm_3_state = hmm(init_probs_3_state, trans_probs_3_state, emission_probs_3_state)
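
A quick sanity check can catch typos in the model tables before they turn into confusing Viterbi results. The sketch below is an assumption about what "well-formed" should mean here: every initial, transition and emission row is non-negative and sums to 1 (up to rounding), and `trans_probs` is $k \times k$. `check_hmm` is a hypothetical helper, and the 3-state model is copied into the call so the block runs on its own.

```python
import math

def check_hmm(init_probs, trans_probs, emission_probs):
    """Return True if the model's probability tables are well-formed distributions."""
    k = len(init_probs)

    def is_distribution(row):
        # Non-negative entries that sum to 1, allowing for floating-point rounding
        return all(p >= 0 for p in row) and math.isclose(sum(row), 1.0)

    return (
        is_distribution(init_probs)
        and len(trans_probs) == k and all(len(row) == k for row in trans_probs)
        and len(emission_probs) == k
        and all(is_distribution(row) for row in trans_probs)
        and all(is_distribution(row) for row in emission_probs)
    )

# The 3-state model from above, copied so that this block runs on its own
print(check_hmm(
    [0.10, 0.80, 0.10],
    [[0.90, 0.10, 0.00], [0.05, 0.90, 0.05], [0.00, 0.10, 0.90]],
    [[0.40, 0.15, 0.20, 0.25], [0.25, 0.25, 0.25, 0.25], [0.20, 0.40, 0.30, 0.10]],
))  # True
```

In the notebook, `check_hmm(hmm_7_state.init_probs, hmm_7_state.trans_probs, hmm_7_state.emission_probs)` should return `True` as well.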

We also need the helper functions for translating between observations/paths and indices.

In [5]:
def translate_path_to_indices(path):
    return [int(c) for c in path]

def translate_indices_to_path(indices):
    return ''.join([str(i) for i in indices])

def translate_observations_to_indices(obs):
    mapping = {'a': 0, 'c': 1, 'g': 2, 't': 3}
    return [mapping[symbol.lower()] for symbol in obs]

def translate_indices_to_observations(indices):
    mapping = ['a', 'c', 'g', 't']
    return ''.join(mapping[idx] for idx in indices)

Additionally, you're given the function below that constructs a table of a specific size filled with zeros.

In [6]:
def make_table(m, n):
    """Make a table with `m` rows and `n` columns filled with zeros."""
    return [[0] * n for _ in range(m)]
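
`make_table` builds a fresh list for every row on purpose. The shorter-looking `[[0] * n] * m` would repeat a reference to one and the same row list `m` times, so writing to a cell would change that column in every row, silently corrupting the $\omega$ table. A small demonstration using plain lists:

```python
# The comprehension used by make_table creates a new row list on every iteration
good = [[0] * 3 for _ in range(2)]
good[0][0] = 1
print(good)  # [[1, 0, 0], [0, 0, 0]]

# Multiplying the outer list repeats a reference to ONE row list
bad = [[0] * 3] * 2
bad[0][0] = 1
print(bad)               # [[1, 0, 0], [1, 0, 0]]
print(bad[0] is bad[1])  # True
```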

You'll be testing your code with the same two sequences as last time, i.e.:

In [7]:
x_short = 'GTTTCCCAGTGTATATCGAGGGATACTACGTGCATAGTAACATCGGCCAA'
z_short = '33333333333321021021021021021021021021021021021021'

In [8]:
x_long = 'TGAGTATCACTTAGGTCTATGTCTAGTCGTCTTTCGTAATGTTTGGTCTTGTCACCAGTTATCCTATGGCGCTCCGAGTCTGGTTCTCGAAATAAGCATCCCCGCCCAAGTCATGCACCCGTTTGTGTTCTTCGCCGACTTGAGCGACTTAATGAGGATGCCACTCGTCACCATCTTGAACATGCCACCAACGAGGTTGCCGCCGTCCATTATAACTACAACCTAGACAATTTTCGCTTTAGGTCCATTCACTAGGCCGAAATCCGCTGGAGTAAGCACAAAGCTCGTATAGGCAAAACCGACTCCATGAGTCTGCCTCCCGACCATTCCCATCAAAATACGCTATCAATACTAAAAAAATGACGGTTCAGCCTCACCCGGATGCTCGAGACAGCACACGGACATGATAGCGAACGTGACCAGTGTAGTGGCCCAGGGGAACCGCCGCGCCATTTTGTTCATGGCCCCGCTGCCGAATATTTCGATCCCAGCTAGAGTAATGACCTGTAGCTTAAACCCACTTTTGGCCCAAACTAGAGCAACAATCGGAATGGCTGAAGTGAATGCCGGCATGCCCTCAGCTCTAAGCGCCTCGATCGCAGTAATGACCGTCTTAACATTAGCTCTCAACGCTATGCAGTGGCTTTGGTGTCGCTTACTACCAGTTCCGAACGTCTCGGGGGTCTTGATGCAGCGCACCACGATGCCAAGCCACGCTGAATCGGGCAGCCAGCAGGATCGTTACAGTCGAGCCCACGGCAATGCGAGCCGTCACGTTGCCGAATATGCACTGCGGGACTACGGACGCAGGGCCGCCAACCATCTGGTTGACGATAGCCAAACACGGTCCAGAGGTGCCCCATCTCGGTTATTTGGATCGTAATTTTTGTGAAGAACACTGCAAACGCAAGTGGCTTTCCAGACTTTACGACTATGTGCCATCATTTAAGGCTACGACCCGGCTTTTAAGACCCCCACCACTAAATAGAGGTACATCTGA'
z_long = '3333321021021021021021021021021021021021021021021021021021021021021021033333333334564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564563210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210321021021021021021021021033334564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564563333333456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456332102102102102102102102102102102102102102102102102102102102102102102102102102102102102102102102103210210210210210210210210210210210210210210210210210210210210210'

Remember to translate these sequences to indices before using them with your algorithms.

## Implementing without log-transformation

First, we will implement the algorithm without log-transformation. For long sequences this runs into numerical underflow (as when computing the joint probability last week), so in the next section we will use the log-transformation trick to fix this.

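To see the underflow in isolation: the probability of a path is a product of many numbers below 1, and a Python float (an IEEE 754 double) cannot represent positive values below roughly 2.2e-308 at full precision, or anything below about 5e-324 at all. The sketch below multiplies 1000 factors of 0.25 (an arbitrary illustrative value) and compares this with summing their logarithms.

```python
import math
import sys

p = 1.0
log_p = 0.0
for _ in range(1000):
    p *= 0.25                 # plain space: multiply 1000 probabilities of 0.25
    log_p += math.log(0.25)   # log space: add their logarithms instead

print(sys.float_info.min)  # 2.2250738585072014e-308, smallest positive normal double
print(p)                   # 0.0, the product has underflowed
print(log_p)               # roughly -1386.29, no problem at all
```
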
### Computation of the $\omega$ table

In [9]:
def compute_w(model, x):
    k = len(model.init_probs)
    n = len(x)

    w = make_table(k, n)

    # Base case: fill out w[i][0] for i = 0..k-1
    for i in range(k):
        w[i][0] = model.init_probs[i] * model.emission_probs[i][x[0]]

    # Inductive case: fill out w[i][j] for i = 0..k-1, j = 1..n-1
    for j in range(1, n):
        for i in range(k):
            # Maximise over the previous state t (w starts at 0 and probabilities are >= 0)
            for t in range(k):
                w[i][j] = max(w[i][j], model.emission_probs[i][x[j]] * w[t][j-1] * model.trans_probs[t][i])

    return w

w = compute_w(hmm_7_state, translate_observations_to_indices(x_short))
for row in w:
    print(row)

[0.0, 0, 0, 0.0001875, 5.2734375000000006e-05, 1.3842773437500002e-05, 2.2148437499999996e-06, 7.475097656250001e-07, 9.343872070312501e-08, 2.39203125e-08, 1.0091381835937502e-08, 6.307113647460939e-10, 3.875090625e-10, 9.082243652343754e-11, 3.366822439205649e-11, 3.4875815625000005e-12, 2.452205786132814e-12, 6.628431677186121e-13, 2.1187057992187502e-14, 2.2069852075195328e-14, 8.948382764201263e-15, 2.0476541335074109e-16, 1.7876580180908215e-16, 6.44283559022491e-17, 1.119554897495177e-17, 2.011115270352174e-18, 1.6235945687366777e-18, 1.36025920045664e-19, 2.2625046791461958e-20, 5.114322891520534e-20, 5.509049761849393e-22, 3.0543813168473645e-22, 4.315209939720451e-22, 1.041210404989535e-23, 1.6493659110975769e-24, 1.398128020469426e-23, 1.171361705613227e-25, 1.781315183985383e-26, 2.2649673931604703e-25, 1.5813383025778564e-27, 3.20636733117369e-28, 8.561576746146578e-27, 8.539226833920425e-30, 8.657191794168963e-30, 1.6855604218976076e-28, 6.019858649634314e-32, 5.843604461

### Finding the joint probability of an optimal path

Now write a function that, given the $\omega$-table, returns the probability of an optimal path through the HMM. As explained in the lecture, this is the highest probability in the last column of the table.

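One possible sketch, assuming `w` is a list of rows (one row per state, one column per position) as returned by `compute_w`. The name `opt_path_prob_sketch` is hypothetical, and the toy table uses arbitrary values for illustration, not output from a real model.

```python
def opt_path_prob_sketch(w):
    """Probability of a most likely path: the largest entry in the last column of w."""
    return max(row[-1] for row in w)

# Arbitrary 2-state, 3-column table (rows = states, columns = positions)
w_toy = [
    [0.20, 0.030, 0.0045],
    [0.10, 0.060, 0.0120],
]
print(opt_path_prob_sketch(w_toy))  # 0.012
```

Because $\log$ is strictly increasing, the same one-liner also works on a log-transformed table.
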
In [None]:
def opt_path_prob(w):
    pass

Now test your implementation in the box below:

In [None]:
w = compute_w(hmm_7_state, translate_observations_to_indices(x_short))
opt_path_prob(w)

Now do the same for `x_long`. What happens?

In [26]:
# Your code here ...

### Obtaining an optimal path through backtracking

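Implement backtracking to find a most probable path of hidden states given the $\omega$-table. Besides the table, backtracking needs the model's transition probabilities (to decide which predecessor state led to the current one), so the function below also takes the model as an argument.

In [None]:
def backtrack(model, w):
    # The model is needed for its transition probabilities
    pass

In [None]:
w = compute_w(hmm_7_state, translate_observations_to_indices(x_short))
z_viterbi = backtrack(hmm_7_state, w)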

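A minimal sketch of backtracking on a hypothetical 2-state toy model over a 2-symbol alphabet. The $\omega$-table alone is not enough: the predecessor of state $z_j$ is the $t$ maximising $\omega[t][j-1] \cdot a_{t, z_j}$, so this sketch takes `trans_probs` as an extra argument. `compute_w_sketch` and `backtrack_sketch` are hypothetical names; the first repeats the plain-space recursion so the block runs on its own.

```python
def compute_w_sketch(init_probs, trans_probs, emission_probs, x):
    """Plain-space Viterbi table: rows are states, columns are positions in x."""
    k, n = len(init_probs), len(x)
    w = [[0.0] * n for _ in range(k)]
    for i in range(k):
        w[i][0] = init_probs[i] * emission_probs[i][x[0]]
    for j in range(1, n):
        for i in range(k):
            w[i][j] = emission_probs[i][x[j]] * max(
                w[t][j - 1] * trans_probs[t][i] for t in range(k))
    return w

def backtrack_sketch(trans_probs, w):
    """Recover a most likely path (list of state indices) from a filled-in table w."""
    k, n = len(w), len(w[0])
    # End in the state with the highest value in the last column
    z = [max(range(k), key=lambda i: w[i][n - 1])]
    # Step back: the predecessor t of state z_j maximises w[t][j-1] * trans_probs[t][z_j].
    # The emission factor for position j is the same for every t, so it can be left out.
    for j in range(n - 1, 0, -1):
        nxt = z[-1]
        z.append(max(range(k), key=lambda t: w[t][j - 1] * trans_probs[t][nxt]))
    z.reverse()
    return z

# Hypothetical 2-state toy model over a 2-symbol alphabet
init = [0.5, 0.5]
trans = [[0.9, 0.1],
         [0.2, 0.8]]
emit = [[0.7, 0.3],   # state 0 prefers symbol 0
        [0.1, 0.9]]   # state 1 prefers symbol 1
x = [0, 0, 1, 1, 1]
w = compute_w_sketch(init, trans, emit, x)
print(backtrack_sketch(trans, w))  # [0, 0, 1, 1, 1]
```

In the notebook, the returned index list can be turned into a path string with `translate_indices_to_path`. Note that once a plain-space table has underflowed to 0, every comparison is a tie and `max` simply returns the first state, which is one way the plain and log versions end up disagreeing.
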
Now do the same for `x_long`. What happens?

In [None]:
# Your code here ...

## Implementing with log-transformation

Now implement the Viterbi algorithm with log transformation. The steps are the same as above.
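
In log space the products of the recursion become sums. Writing $\hat\omega[i][j]$ for the log-transformed table, $\pi_i$ for `init_probs[i]`, $a_{t,i}$ for `trans_probs[t][i]` and $e_i(x_j)$ for `emission_probs[i][x[j]]`, the recursion used in `compute_w` becomes the following (with $\log 0 = -\infty$, as returned by the `log` helper at the top of the notebook):

$$
\begin{aligned}
\hat\omega[i][0] &= \log \pi_i + \log e_i(x_0), \\
\hat\omega[i][j] &= \log e_i(x_j) + \max_{t} \bigl( \hat\omega[t][j-1] + \log a_{t,i} \bigr), \quad j = 1, \dots, n-1.
\end{aligned}
$$

Since $\log$ is strictly increasing, the maximising states are the same as without the transformation, so finding the optimal probability and backtracking work as before, with $+$ in place of $\times$.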

### Computation of the $\omega$ table

In [None]:
def compute_w_log(model, x):
    k = len(model.init_probs)
    n = len(x)
    
    w = make_table(k, n)
    
    # Base case: fill out w[i][0] for i = 0..k-1
    # ...
    
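    # Inductive case: fill out w[i][j] for i = 0..k-1, j = 1..n-1
    # Note: make_table fills w with 0, which is log(1) in log space, so a 0 must
    # not be used as the starting value of a max here (use float('-inf') instead).
    # ...

    return w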

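A sketch of the log-space table, assuming the same list-of-rows layout as `compute_w`. It takes the three probability tables directly instead of an `hmm` object and redefines the `log` helper so that it runs on its own; `compute_w_log_sketch` is a hypothetical name and the toy model is made up for illustration.

```python
import math

def log(x):
    """Natural log with log(0) = -inf, like the helper at the top of the notebook."""
    return float('-inf') if x == 0 else math.log(x)

def compute_w_log_sketch(init_probs, trans_probs, emission_probs, x):
    """Log-space Viterbi table: rows are states, columns are positions in x."""
    k, n = len(init_probs), len(x)
    # Fill with -inf (= log 0) rather than 0, which would mean probability 1 in
    # log space; every cell is overwritten below anyway.
    w = [[float('-inf')] * n for _ in range(k)]
    for i in range(k):
        w[i][0] = log(init_probs[i]) + log(emission_probs[i][x[0]])
    for j in range(1, n):
        for i in range(k):
            best = max(w[t][j - 1] + log(trans_probs[t][i]) for t in range(k))
            w[i][j] = log(emission_probs[i][x[j]]) + best
    return w

# Hypothetical 2-state toy model; state 1 can never be the first state
init = [1.0, 0.0]
trans = [[0.9, 0.1],
         [0.2, 0.8]]
emit = [[0.7, 0.3],
        [0.1, 0.9]]

w = compute_w_log_sketch(init, trans, emit, [0, 0, 1, 1, 1])
print(w[1][0])  # -inf

w_long = compute_w_log_sketch(init, trans, emit, [0, 1] * 1000)
print(max(row[-1] for row in w_long))  # a large negative but finite number
```

Adding `-inf` to a finite number gives `-inf` again, so impossible states stay impossible without any special-casing.
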
### Finding the (log transformed) joint probability of an optimal path

In [None]:
def opt_path_prob_log(w):
    pass

In [None]:
w = compute_w_log(hmm_7_state, translate_observations_to_indices(x_short))
opt_path_prob_log(w)

Now do the same for `x_long`. What happens?

In [None]:
# Your code here ...

### Obtaining an optimal path through backtracking

In [None]:
def backtrack_log(model, w):
    # The model is needed for its transition probabilities
    pass

In [None]:
w = compute_w_log(hmm_7_state, translate_observations_to_indices(x_short))
z_viterbi_log = backtrack_log(hmm_7_state, w)

Now do the same for `x_long`. What happens?

In [None]:
# Your code here ...

### Does it work?

Think about how to verify that your implementations of Viterbi (i.e. `compute_w`, `opt_path_prob`, `backtrack`, and their log-transformed variants `compute_w_log`, `opt_path_prob_log`, `backtrack_log`) are correct.

One property that should hold: for a given sequence of observables (e.g. `x_short` or `x_long`), the probability of a most likely path as computed by `opt_path_prob` (or `opt_path_prob_log`) should equal the joint probability of the observables and the most probable path found by `backtrack` (or `backtrack_log`). Why?

Make an experiment that validates that this is the case for your implementations of Viterbi on `x_short` and `x_long`. You can use your code from last week to compute the joint probability.

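An independent check that needs no Viterbi code at all is brute force: for a very short input, enumerate every state path, compute its joint probability, and take the best. Its result should match both `opt_path_prob` and the joint probability of the backtracked path. The sketch below assumes index lists for `x` and `z`; `joint_prob_sketch` is a hypothetical stand-in for last week's `joint_prob`, whose signature may differ, and the toy model is made up.

```python
import itertools

def joint_prob_sketch(init_probs, trans_probs, emission_probs, x, z):
    """Joint probability P(x, z) for observation indices x and state indices z."""
    p = init_probs[z[0]] * emission_probs[z[0]][x[0]]
    for j in range(1, len(x)):
        p *= trans_probs[z[j - 1]][z[j]] * emission_probs[z[j]][x[j]]
    return p

def brute_force_viterbi(init_probs, trans_probs, emission_probs, x):
    """Most likely path and its joint probability, found by trying all k**n paths."""
    k = len(init_probs)
    best = max(itertools.product(range(k), repeat=len(x)),
               key=lambda z: joint_prob_sketch(init_probs, trans_probs, emission_probs, x, z))
    return list(best), joint_prob_sketch(init_probs, trans_probs, emission_probs, x, best)

# Hypothetical 2-state toy model over a 2-symbol alphabet
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.7, 0.3], [0.1, 0.9]]
z_best, p_best = brute_force_viterbi(init, trans, emit, [0, 0, 1, 1, 1])
print(z_best)  # [0, 0, 1, 1, 1]
print(p_best)  # approximately 0.010287648
```

This is only feasible for tiny inputs: with the 7-state model a 6-symbol prefix of `x_short` already has $7^6 = 117{,}649$ paths. If several paths tie for the maximum, compare probabilities rather than paths.
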
In [2]:
# Check that opt_path_prob is equal to joint_prob(hmm_7_state, x_short, z_viterbi)

# Your code here ...

# Check that opt_path_prob_log is equal to joint_prob_log(hmm_7_state, x_short, z_viterbi_log)

# Your code here ...

# Do the above checks for x_long ...

# Your code here ...

# Do the above checks using hmm_3_state

# Your code here ...

Do your implementations pass the above checks?

### Does log transformation matter?

Make an experiment that investigates how long the input string can be before `backtrack` and `backtrack_log` start to disagree on a most likely path and its probability.
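
One way to set up the experiment is to run the plain and log-space recursions side by side and stop at the first prefix length where the best probabilities no longer agree. The sketch below compares probabilities only (comparing paths would additionally need backtracking at each length); `first_disagreement` is a hypothetical helper, and the tolerance `rel_tol=1e-9` and the toy model are assumptions.

```python
import math

def log(p):
    """Natural log with log(0) = -inf, like the helper at the top of the notebook."""
    return float('-inf') if p == 0 else math.log(p)

def first_disagreement(init, trans, emit, x, rel_tol=1e-9):
    """Run plain-space and log-space Viterbi side by side and return the first
    prefix length n at which the best probabilities for x[:n] disagree
    (compared in log space), or None if they agree for every prefix."""
    k = len(init)
    for n, sym in enumerate(x, start=1):
        if n == 1:
            col = [init[i] * emit[i][sym] for i in range(k)]
            lcol = [log(init[i]) + log(emit[i][sym]) for i in range(k)]
        else:
            col = [emit[i][sym] * max(col[t] * trans[t][i] for t in range(k))
                   for i in range(k)]
            lcol = [log(emit[i][sym]) + max(lcol[t] + log(trans[t][i]) for t in range(k))
                    for i in range(k)]
        # The maximum of the current column is the best joint probability of x[:n]
        if not math.isclose(log(max(col)), max(lcol), rel_tol=rel_tol):
            return n
    return None

# Hypothetical 2-state toy model; a long run of symbol 1
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.7, 0.3], [0.1, 0.9]]
print(first_disagreement(init, trans, emit, [0, 0, 1, 1, 1]))  # None
print(first_disagreement(init, trans, emit, [1] * 3000))
```

For this toy input the best path stays in state 1, with probability $0.45 \cdot 0.72^{n-1}$, which drops below the normal double range (about 2.2e-308) after roughly 2150 symbols; from there the plain value loses precision, so expect an answer somewhat above that. To try the notebook's models, pass e.g. `hmm_3_state.init_probs`, `hmm_3_state.trans_probs`, `hmm_3_state.emission_probs` and `translate_observations_to_indices(x_long)`; `None` then means no disagreement within `x_long`.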

In [None]:
# Your code here ...

**Your answer here:**

For the 3-state model, `backtrack` and `backtrack_log` start to disagree on a most likely path and its probability
for input length **i = ?**.

For the 7-state model, `backtrack` and `backtrack_log` start to disagree on a most likely path and its probability
for input length **i = ?**.

