In [154]:
import math  # Just ignore this :-)

def log(x):
    if x == 0:
        return float('-inf')
    return math.log(x)

# CTiB E2023 - Week 11 - Exercises

# Theoretical exercises

***Exercise 1***: How many terms are there in the sum on slide 37 from the lecture on Nov 13 for computing $P({\bf X}|\Theta)$? Why?



***Exercise 2***: How many terms are there in the maximization on slide 4 from the lecture on Nov 20 for computing the Viterbi decoding ${\bf Z}^*$? Why?



***Exercise 3***: Where in the derivation of $\omega({\bf z}_n)$ on slide 8 do we use the fact that we are working with hidden Markov models? And how do we use it?

# Practical exercises

You are given the same 7-state HMM and helper functions that you used last week:

In [155]:
class hmm:
    def __init__(self, init_probs, trans_probs, emission_probs):
        self.init_probs = init_probs
        self.trans_probs = trans_probs
        self.emission_probs = emission_probs

In [156]:
init_probs_7_state = [0.00, 0.00, 0.00, 1.00, 0.00, 0.00, 0.00]

trans_probs_7_state = [
    [0.00, 0.00, 0.90, 0.10, 0.00, 0.00, 0.00],
    [1.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
    [0.00, 1.00, 0.00, 0.00, 0.00, 0.00, 0.00],
    [0.00, 0.00, 0.05, 0.90, 0.05, 0.00, 0.00],
    [0.00, 0.00, 0.00, 0.00, 0.00, 1.00, 0.00],
    [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 1.00],
    [0.00, 0.00, 0.00, 0.10, 0.90, 0.00, 0.00],
]

emission_probs_7_state = [
    #   A     C     G     T
    [0.30, 0.25, 0.25, 0.20],
    [0.20, 0.35, 0.15, 0.30],
    [0.40, 0.15, 0.20, 0.25],
    [0.25, 0.25, 0.25, 0.25],
    [0.20, 0.40, 0.30, 0.10],
    [0.30, 0.20, 0.30, 0.20],
    [0.15, 0.30, 0.20, 0.35],
]

hmm_7_state = hmm(init_probs_7_state, trans_probs_7_state, emission_probs_7_state)

In [157]:
def translate_observations_to_indices(obs):
    mapping = {'a': 0, 'c': 1, 'g': 2, 't': 3}
    return [mapping[symbol.lower()] for symbol in obs]

def translate_indices_to_observations(indices):
    mapping = ['a', 'c', 'g', 't']
    return ''.join(mapping[idx] for idx in indices)

def translate_path_to_indices(path):
    return [int(state) for state in path]

def translate_indices_to_path(indices):
    return ''.join([str(i) for i in indices])

# 1 - Viterbi Decoding

Below you will implement and experiment with the Viterbi algorithm. The implementation has been split into three parts:

1. Fill out the $\omega$ table using the recursion presented at the lecture.
2. Find the state with the highest probability after observing the entire sequence of observations.
3. Backtrack from the state found in the previous step to obtain the optimal path.

We'll be working with the 7-state model (`hmm_7_state`) and the helper functions for translating between observations, hidden states, and indices, as introduced above (and also used last week).

Additionally, you're given the function below that constructs a table of a specific size filled with zeros.

In [158]:
def make_table(m, n):
    """Make a table with `m` rows and `n` columns filled with zeros."""
    return [[0] * n for _ in range(m)]

def pretty_table(table):
    for row in table:
        print('| {:^10} | {:^10} | {:^10} | {:^10} |'.format(*row))
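
A note on why `make_table` uses a list comprehension rather than the shorter `[[0] * n] * m`: the shorter form repeats a reference to one and the same row, so writing to one cell changes every row. The following is a small illustrative sketch (the variable names `aliased` and `independent` are made up for this example):

```python
def make_table(m, n):
    """Same helper as above: m independent rows of n zeros."""
    return [[0] * n for _ in range(m)]

aliased = [[0] * 3] * 2          # two references to the SAME row list
independent = make_table(2, 3)   # two separate row lists

aliased[0][0] = 1
independent[0][0] = 1

print(aliased)       # both rows show the write
print(independent)   # only the first row shows the write
```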

You'll be testing your code with the same two sequences as last week, i.e.:

In [159]:
x_short = 'GTTTCCCAGTGTATATCGAGGGATACTACGTGCATAGTAACATCGGCCAA'
z_short = '33333333333321021021021021021021021021021021021021'

In [160]:
x_long = 'TGAGTATCACTTAGGTCTATGTCTAGTCGTCTTTCGTAATGTTTGGTCTTGTCACCAGTTATCCTATGGCGCTCCGAGTCTGGTTCTCGAAATAAGCATCCCCGCCCAAGTCATGCACCCGTTTGTGTTCTTCGCCGACTTGAGCGACTTAATGAGGATGCCACTCGTCACCATCTTGAACATGCCACCAACGAGGTTGCCGCCGTCCATTATAACTACAACCTAGACAATTTTCGCTTTAGGTCCATTCACTAGGCCGAAATCCGCTGGAGTAAGCACAAAGCTCGTATAGGCAAAACCGACTCCATGAGTCTGCCTCCCGACCATTCCCATCAAAATACGCTATCAATACTAAAAAAATGACGGTTCAGCCTCACCCGGATGCTCGAGACAGCACACGGACATGATAGCGAACGTGACCAGTGTAGTGGCCCAGGGGAACCGCCGCGCCATTTTGTTCATGGCCCCGCTGCCGAATATTTCGATCCCAGCTAGAGTAATGACCTGTAGCTTAAACCCACTTTTGGCCCAAACTAGAGCAACAATCGGAATGGCTGAAGTGAATGCCGGCATGCCCTCAGCTCTAAGCGCCTCGATCGCAGTAATGACCGTCTTAACATTAGCTCTCAACGCTATGCAGTGGCTTTGGTGTCGCTTACTACCAGTTCCGAACGTCTCGGGGGTCTTGATGCAGCGCACCACGATGCCAAGCCACGCTGAATCGGGCAGCCAGCAGGATCGTTACAGTCGAGCCCACGGCAATGCGAGCCGTCACGTTGCCGAATATGCACTGCGGGACTACGGACGCAGGGCCGCCAACCATCTGGTTGACGATAGCCAAACACGGTCCAGAGGTGCCCCATCTCGGTTATTTGGATCGTAATTTTTGTGAAGAACACTGCAAACGCAAGTGGCTTTCCAGACTTTACGACTATGTGCCATCATTTAAGGCTACGACCCGGCTTTTAAGACCCCCACCACTAAATAGAGGTACATCTGA'
z_long = '3333321021021021021021021021021021021021021021021021021021021021021021033333333334564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564563210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210321021021021021021021021033334564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564564563333333456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456332102102102102102102102102102102102102102102102102102102102102102102102102102102102102102102102103210210210210210210210210210210210210210210210210210210210210210'

Remember to translate these sequences to indices before using them with your algorithms.

## Implementing without log-transformation

First, we will implement the algorithm without log-transformation. This causes problems with numerical stability (as when computing the joint probability last week), so we will use the log-transformation trick to fix this in the next section.
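
To see why stability is a concern, consider the following minimal sketch. It assumes a constant per-step probability of 0.25 and 2000 steps; both numbers are chosen only for illustration and are not taken from the model above. The running product drops below the smallest representable double and becomes exactly zero, while the sum of logarithms stays finite.

```python
import math

p = 0.25   # assumed per-step probability, for illustration only
n = 2000   # assumed number of steps

product = 1.0
log_sum = 0.0
for _ in range(n):
    product *= p            # shrinks towards 0 and eventually underflows
    log_sum += math.log(p)  # grows linearly in magnitude, no underflow

print(product)   # 0.0
print(log_sum)   # a finite negative number, about n * log(p)
```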

### Computation of the $\omega$ table

In [161]:
def compute_w(model, x):
    x = translate_observations_to_indices(x)
    k_count = len(model.init_probs)
    n = len(x)
    
    w = make_table(k_count, n)
    
    # Base case: fill out w[j][0] for j = 0..k_count-1
    for j in range(k_count):
        ip = model.init_probs[j]
        ep = model.emission_probs[j][x[0]]
        w[j][0] = ip * ep
    
    # Inductive case: fill out w[j][i] for j = 0..k_count-1, i = 1..n-1
    for i in range(1, n):
        for j in range(k_count):
            max_prob = 0
            
            for k in range(k_count):
                tp = model.trans_probs[k][j]
                ep = model.emission_probs[j][x[i]]
                prob = w[k][i-1] * tp * ep
                
                if prob > max_prob:
                    max_prob = prob
            
            w[j][i] = max_prob
    return w

### Finding the joint probability of an optimal path

Now, write a function that, given the $\omega$ table, returns the probability of an optimal path through the HMM. As explained in the lecture, this corresponds to finding the highest probability in the last column of the table.
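
As a small illustration of "the highest probability in the last column", here is a sketch on a hand-made 2-state table with 3 columns. The values in `w_toy` are invented for the example and do not come from the 7-state model.

```python
# Rows are states, columns are positions in the observed sequence.
w_toy = [
    [0.30, 0.060, 0.0120],
    [0.20, 0.090, 0.0054],
]

# row[-1] is the entry of each state in the last column.
best = max(row[-1] for row in w_toy)
print(best)  # 0.012
```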

In [162]:
def opt_path_prob(w):
    k_count = len(w)
    max_prob = 0
    
    for j in range(k_count):
        if w[j][-1] > max_prob:
            max_prob = w[j][-1]
    return max_prob

Now test your implementation in the box below:

In [163]:
w = compute_w(hmm_7_state, x_short)
opt_path_prob(w)

1.9114255184318858e-31

Now do the same for `x_long`. What happens?

In [164]:
w = compute_w(hmm_7_state, x_long)
opt_path_prob(w)

0

### Obtaining an optimal path through backtracking

Implement backtracking to find a most probable path of hidden states given the $\omega$-table.
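
One common alternative to recomputing the recursion while backtracking is to store, for every cell, which predecessor state achieved the maximum (a "back-pointer"). The sketch below shows that variant on a toy 2-state HMM. It is not the approach the exercise asks for; the function name `viterbi_with_backpointers` and the `toy_*` parameters are hypothetical, and it works with plain probabilities, which is only safe for short sequences.

```python
def viterbi_with_backpointers(init, trans, emit, x):
    """Return a most probable state path for the index sequence x (toy sketch)."""
    k_count, n = len(init), len(x)
    w = [[0.0] * n for _ in range(k_count)]
    # back[j][i] is the best predecessor of state j at column i.
    back = [[None] * n for _ in range(k_count)]

    for j in range(k_count):
        w[j][0] = init[j] * emit[j][x[0]]

    for i in range(1, n):
        for j in range(k_count):
            best_k = max(range(k_count), key=lambda k: w[k][i - 1] * trans[k][j])
            back[j][i] = best_k
            w[j][i] = w[best_k][i - 1] * trans[best_k][j] * emit[j][x[i]]

    # Start in the best final state and follow the stored pointers backwards.
    state = max(range(k_count), key=lambda j: w[j][n - 1])
    path = [state]
    for i in range(n - 1, 0, -1):
        state = back[state][i]
        path.append(state)
    path.reverse()
    return path

# Toy parameters, assumed for illustration only.
toy_init = [0.6, 0.4]
toy_trans = [[0.7, 0.3], [0.4, 0.6]]
toy_emit = [[0.5, 0.5], [0.1, 0.9]]
toy_x = [0, 1, 1]

toy_path = viterbi_with_backpointers(toy_init, toy_trans, toy_emit, toy_x)
print(toy_path)  # [0, 1, 1]
```

The extra table costs as much memory as the $\omega$ table itself, but backtracking then needs no floating-point comparisons at all.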

In [165]:
def backtrack(model, x, w):
    x = translate_observations_to_indices(x)
    n = len(w[0])
    k_count = len(w)

    # Find the state with the highest probability in the last column.
    max_prob = 0
    max_prob_k = None
    for k in range(k_count):
        if w[k][-1] > max_prob:
            max_prob = w[k][-1]
            max_prob_k = k

    # States are collected from the last column to the first.
    path = [max_prob_k]

    for i in range(n - 1, 0, -1):
        current_k = path[-1]
        ep = model.emission_probs[current_k][x[i]]

        # Pick the predecessor that maximizes the Viterbi recursion for
        # the current state, instead of comparing floats for equality.
        best_prob = 0
        best_k = None
        for k in range(k_count):
            tp = model.trans_probs[k][current_k]
            prob = w[k][i-1] * tp * ep
            if prob > best_prob:
                best_prob = prob
                best_k = k
        path.append(best_k)

    path.reverse()
    return translate_indices_to_path(path)

In [166]:
w = compute_w(hmm_7_state, x_short)
print(opt_path_prob(w))
z_viterbi = backtrack(hmm_7_state, x_short, w)

z_viterbi

1.9114255184318858e-31


'33333333333321021021021021021021021021021021021021'

Now do the same for `x_long`. What happens?

In [167]:
w = compute_w(hmm_7_state, x_long)
print(opt_path_prob(w))
z_viterbi = backtrack(hmm_7_state, x_long, w)

z_viterbi

0


TypeError: list indices must be integers or slices, not NoneType

## Implementing with log-transformation

Now implement the Viterbi algorithm with log-transformation. The steps are the same as above.
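
In log-space the products in the recursion become sums. As a sketch, writing $\hat\omega$ for the log-transformed table (the hat is notation assumed here, not necessarily that of the slides) and using the convention $\log 0 = -\infty$ implemented by `log` at the top of this notebook:

$$\hat\omega({\bf z}_1) = \log p({\bf z}_1) + \log p({\bf x}_1 | {\bf z}_1)$$

$$\hat\omega({\bf z}_n) = \log p({\bf x}_n | {\bf z}_n) + \max_{{\bf z}_{n-1}} \left( \hat\omega({\bf z}_{n-1}) + \log p({\bf z}_n | {\bf z}_{n-1}) \right)$$

Since $\log$ is monotonically increasing, the maximizing ${\bf z}_{n-1}$ is the same as in the untransformed recursion.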

### Computation of the (log-transformed) $\omega$ table

In [170]:
def compute_w_log(model, x):
    x = translate_observations_to_indices(x)
    k_count = len(model.init_probs)
    n = len(x)
    
    w = make_table(k_count, n)
    
    # Base case: fill out w[j][0] for j = 0..k_count-1
    for j in range(k_count):
        ip = model.init_probs[j]
        ep = model.emission_probs[j][x[0]]
        w[j][0] = log(ip) + log(ep)
    
    # Inductive case: fill out w[j][i] for j = 0..k_count-1, i = 1..n-1
    for i in range(1, n):
        for j in range(k_count):
            max_prob = float('-inf')
            
            for k in range(k_count):
                tp = model.trans_probs[k][j]
                ep = model.emission_probs[j][x[i]]
                prob = w[k][i-1] + log(tp) + log(ep)
                
                if prob > max_prob:
                    max_prob = prob
            
            w[j][i] = max_prob
    return w

### Finding the (log-transformed) joint probability of an optimal path

In [None]:
def opt_path_prob_log(w):
    k_count = len(w)
    max_prob = float('-inf')
    
    for j in range(k_count):
        if w[j][-1] > max_prob:
            max_prob = w[j][-1]
    return max_prob

In [None]:
w = compute_w_log(hmm_7_state, x_short)
opt_path_prob_log(w)

-70.73228857440488

Now do the same for `x_long`. What happens?

In [None]:
w = compute_w_log(hmm_7_state, x_long)
opt_path_prob_log(w)

-1406.7209253880144

### Obtaining an optimal path through backtracking

In [None]:
def backtrack_log(model, x, w):
    x = translate_observations_to_indices(x)
    n = len(w[0])
    k_count = len(w)

    # Find the state with the highest log-probability in the last column.
    max_prob = float('-inf')
    max_prob_k = None
    for k in range(k_count):
        if w[k][-1] > max_prob:
            max_prob = w[k][-1]
            max_prob_k = k

    # States are collected from the last column to the first.
    path = [max_prob_k]

    for i in range(n - 1, 0, -1):
        current_k = path[-1]
        ep = model.emission_probs[current_k][x[i]]

        # Pick the predecessor that maximizes the log-space recursion for
        # the current state, instead of comparing floats for equality.
        best_prob = float('-inf')
        best_k = None
        for k in range(k_count):
            tp = model.trans_probs[k][current_k]
            prob = w[k][i-1] + log(tp) + log(ep)
            if prob > best_prob:
                best_prob = prob
                best_k = k
        path.append(best_k)

    path.reverse()
    return translate_indices_to_path(path)

In [None]:
w = compute_w_log(hmm_7_state, x_short)
z_viterbi_log = backtrack_log(hmm_7_state, x_short, w)

z_viterbi_log

'33333333333321021021021021021021021021021021021021'

Now do the same for `x_long`. What happens?

In [None]:
w = compute_w_log(hmm_7_state, x_long)
z_viterbi_log = backtrack_log(hmm_7_state, x_long, w)

z_viterbi_log

'333332102102102102102102102102102102102102102102102102102102102102102103333333333456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456321021021021021021021021021021021021021021021021021021021021021021021021021021021021021021021021021021021021021021021021021021021032102102102102102102102103333456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456456333333345645645645645645645645645645645645645645645645645645645645645645645645645645645645645645645645645645645645645645633210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210210321021021021021021021021021021021021021021021021021021021021021

### Does it work?

Think about how to verify that your implementations of Viterbi (i.e. `compute_w`, `opt_path_prob`, `backtrack`, and their log-transformed variants `compute_w_log`, `opt_path_prob_log`, `backtrack_log`) are correct.

One thing that should hold is that the probability of a most likely path as computed by `opt_path_prob` (or `opt_path_prob_log`) for a given sequence of observables (e.g. `x_short` or `x_long`) should be equal to the joint probability of a corresponding most probable path as found by `backtrack` (or `backtrack_log`) and the given sequence of observables. Why?

Make an experiment that validates that this is the case for your implementations of Viterbi and `x_short` and `x_long`.
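
A complementary sanity check, sketched here on a toy 2-state HMM, is to compare the Viterbi probability against a brute-force search over all state paths. This is only feasible for very short sequences. The helper names `joint` and `viterbi_prob` and the `toy_*` parameters are hypothetical and chosen for this example; they are not the notebook's own functions.

```python
import itertools
import math

def joint(init, trans, emit, x, z):
    """Joint probability of the index sequence x and the state path z."""
    p = init[z[0]] * emit[z[0]][x[0]]
    for i in range(1, len(x)):
        p *= trans[z[i - 1]][z[i]] * emit[z[i]][x[i]]
    return p

def viterbi_prob(init, trans, emit, x):
    """Probability of a most likely path, keeping only one column at a time."""
    k_count = len(init)
    col = [init[j] * emit[j][x[0]] for j in range(k_count)]
    for i in range(1, len(x)):
        col = [max(col[k] * trans[k][j] for k in range(k_count)) * emit[j][x[i]]
               for j in range(k_count)]
    return max(col)

# Toy parameters, assumed for illustration only.
toy_init = [0.6, 0.4]
toy_trans = [[0.7, 0.3], [0.4, 0.6]]
toy_emit = [[0.5, 0.5], [0.1, 0.9]]
toy_x = [0, 1, 1]

# Enumerate every possible state path and keep the most probable one.
all_paths = itertools.product(range(len(toy_init)), repeat=len(toy_x))
best_path = max(all_paths, key=lambda z: joint(toy_init, toy_trans, toy_emit, toy_x, z))
best_prob = joint(toy_init, toy_trans, toy_emit, toy_x, best_path)

print(best_path)
print(math.isclose(best_prob, viterbi_prob(toy_init, toy_trans, toy_emit, toy_x)))
```

The comparison uses `math.isclose` rather than `==` because the two computations multiply the same factors in a different order and may differ in the last few digits.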

In [171]:
# To access joint_prob and joint_prob_log, you must copy your implementations from last week here ...

def joint_prob(model, x, z):
    x = translate_observations_to_indices(x)
    z = translate_path_to_indices(z)
    acc_prob = None
    
    if len(x) > 0:
        acc_prob = model.init_probs[z[0]] * model.emission_probs[z[0]][x[0]]
        
        for i in range(1, len(x)):
            acc_prob *= model.trans_probs[z[i-1]][z[i]] * model.emission_probs[z[i]][x[i]]
    
    return acc_prob

def joint_prob_log(model, x, z):
    x = translate_observations_to_indices(x)
    z = translate_path_to_indices(z)
    acc_prob = None
    
    if len(x) > 0:
        ip = model.init_probs[z[0]]
        ep = model.emission_probs[z[0]][x[0]]
        acc_prob = log(ip) + log(ep)
        
        for i in range(1, len(x)):
            tp = model.trans_probs[z[i-1]][z[i]]
            ep = model.emission_probs[z[i]][x[i]]
            acc_prob += log(tp) + log(ep)
    
    return acc_prob


# Check that opt_path_prob is equal to joint_prob(hmm_7_state, x_short, z_viterbi)

w = compute_w(hmm_7_state, x_short)
z_viterbi = backtrack(hmm_7_state, x_short, w)

print(opt_path_prob(w))
print(joint_prob(hmm_7_state, x_short, z_viterbi))

# Check that opt_path_prob_log is equal to joint_prob_log(hmm_7_state, x_short, z_viterbi_log)
w = compute_w_log(hmm_7_state, x_short)
z_viterbi_log = backtrack_log(hmm_7_state, x_short, w)

print(opt_path_prob_log(w))
# print(joint_prob_log(hmm_7_state, x_short, z_viterbi_log))

# # Do the above checks for x_long ...

# w = compute_w(hmm_7_state, x_long)
# z_viterbi = backtrack(hmm_7_state, x_long, w)

# print(opt_path_prob(w))
# print(joint_prob(hmm_7_state, x_long, z_viterbi))

# w = compute_w_log(hmm_7_state, x_long)
# z_viterbi_log = backtrack_log(hmm_7_state, x_long, w)

# print(opt_path_prob_log(w))
# print(joint_prob_log(hmm_7_state, x_long, z_viterbi_log))

1.9114255184318858e-31
1.9114255184318882e-31


TypeError: list indices must be integers or slices, not str

Do your implementations pass the above checks?

### Does log-transformation matter?

Make an experiment that investigates how long the input string can be before `backtrack` and `backtrack_log` start to disagree on a most likely path and its probability.
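
One possible shape for such an experiment is a loop over prefixes of increasing length that stops at the first prefix where two decoders differ. The sketch below is generic: `first_disagreement` and the two stub decoders are hypothetical names used for illustration, and an exception raised by either decoder is counted as a disagreement.

```python
def first_disagreement(decode_a, decode_b, x, start=1, step=1):
    """Return the smallest tested prefix length where the decoders differ, or None."""
    for length in range(start, len(x) + 1, step):
        prefix = x[:length]
        try:
            same = decode_a(prefix) == decode_b(prefix)
        except Exception:
            same = False  # a crash, e.g. after underflow, counts as disagreement
        if not same:
            return length
    return None

# Stub decoders standing in for the real ones: they agree on prefixes
# shorter than 5 symbols and differ from length 5 onwards.
def stub_a(s):
    return s.upper()

def stub_b(s):
    return s.upper() if len(s) < 5 else None

print(first_disagreement(stub_a, stub_b, 'acgtacgt'))  # 5
```

In the notebook, the two callables could wrap the functions defined earlier, for example `lambda s: backtrack(hmm_7_state, s, compute_w(hmm_7_state, s))` and `lambda s: backtrack_log(hmm_7_state, s, compute_w_log(hmm_7_state, s))`, applied to `x_long`.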

In [None]:
# Your code here ...

**Your answer here:**

For the 7-state model, `backtrack` and `backtrack_log` start to disagree on a most likely path and its probability for **i = ?**.