# ASR Lab 2 - Computing HMM probabilities

To begin with, we'll use your function to generate a word WFST for the word "*peppers*", using `generate_word_wfst('peppers')`.  By viewing this as an HMM, you'll be able to sample possible paths through the model and also compute the likelihood of an observation sequence $(x_1, \dotsc, x_T)$.

We'll build on this to implement the basics of the Viterbi algorithm, which can later be used for word recognition.

First, copy your code from Lab 1 into the space below.  You can use the official solutions if you like.
If you want to extract the code-only parts of your previous notebook, on the terminal command line you can type:

```bash
jupyter nbconvert --to python <notebook-name.ipynb>
```

where `<notebook-name.ipynb>` is the path to the notebook file.

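Now that the WFST has been constructed, we can traverse its states and arcs.  The examples below assume the WFST is stored in a variable `f`, for instance `f = generate_word_wfst('peppers')`.  This example (taken from [OpenFst](http://www.openfst.org/twiki/bin/view/FST/PythonExtension)) shows how you can do this: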


In [None]:
for state in f.states():

    # iterate over all arcs leaving this state
    for arc in f.arcs(state):
        print(state, arc.ilabel, arc.olabel, arc.weight, arc.nextstate)

Alternatively, we could begin at the start state, and traverse in a depth-first manner.  **Warning**: the code below specifically handles self-loops, but won't work if your WFST has larger cycles in it!

In [None]:
def traverse_arcs(state):
    """Traverse every arc leaving a particular state
    """
    for arc in f.arcs(state):
        print(state, arc.ilabel, arc.olabel, arc.weight, arc.nextstate)
        
        if arc.nextstate != state:   # don't follow the self-loops or we'll get stuck forever!
            traverse_arcs(arc.nextstate)

s = f.start()
traverse_arcs(s)
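If your WFST does contain larger cycles, one common workaround is to remember which states have already been expanded.  The sketch below is only an illustration: the function name `traverse_arcs_visited` is hypothetical, and it collects the rows in a list instead of printing them.  It relies only on the `start()` and `arcs()` methods used above.

```python
def traverse_arcs_visited(f):
    """Collect every arc reachable from the start state, expanding each state once.

    Keeping a set of visited states means that self-loops and larger
    cycles cannot trap the traversal.
    """
    visited = set()
    stack = [f.start()]   # states still waiting to be expanded
    rows = []

    while stack:
        state = stack.pop()
        if state in visited:
            continue
        visited.add(state)

        for arc in f.arcs(state):
            rows.append((state, arc.ilabel, arc.olabel, str(arc.weight), arc.nextstate))
            if arc.nextstate not in visited:
                stack.append(arc.nextstate)

    return rows
```

Calling `traverse_arcs_visited(f)` and printing each row should list the same arcs as the recursive version on a WFST without larger cycles, although possibly in a different order.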

For a more readable table, you could look up the input and output label indices in your symbol tables and print the corresponding strings instead.
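A minimal sketch of that lookup is shown below.  The helper name `readable_arc_rows` is hypothetical, and it assumes that your symbol tables provide a `find()` method that maps a label index to its string, as OpenFst symbol tables do.

```python
def readable_arc_rows(f, in_syms, out_syms):
    """Return one row per arc, with label indices replaced by their strings.

    in_syms and out_syms are assumed to be symbol tables whose find()
    method maps an integer label to the corresponding string.
    """
    rows = []
    for state in f.states():
        for arc in f.arcs(state):
            rows.append((state,
                         in_syms.find(arc.ilabel),
                         out_syms.find(arc.olabel),
                         str(arc.weight),
                         arc.nextstate))
    return rows
```

If you attached the tables to the WFST in Lab 1, you could try `readable_arc_rows(f, f.input_symbols(), f.output_symbols())`; otherwise, pass in the tables you created yourself.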

## Exercises

1. Write code to randomly generate (sample) a path through your word HMM for "*peppers*".  You should output the sequence of input and output labels along the path.  To sample from a list of arcs, you can use code like

```python
import random

arc_list = list(f.arcs(state))
sampled_arc = random.sample(arc_list,1)[0]
```

  Notice that if you repeat your random sampling by running the code multiple times, you'll get paths of different lengths due to the self-loops.


In [None]:
import random

def sample_random_path(f):
    '''Given an FST, randomly sample a path through it.
    
        Args:
            f (fst.Fst()): an FST
        
        Returns:
            input_label_seq (list(str)): the list of input labels from the arcs that were sampled
            output_label_seq (list(str)): the list of output labels from the arcs that were sampled
        '''
    curr_state = f.start() # start from beginning
    weight_type = f.weight_type() # type of weights used in the fst
    input_label_seq = []
    output_label_seq = []

    # f.final(state) returns the final weight of a state (a negative log probability);
    # it is infinite, i.e. probability zero, when the state is NOT final
    while f.final(curr_state) == fst.Weight(weight_type, 'inf'):

        # your code here
            
    return input_label_seq, output_label_seq

input_label_seq, output_label_seq = sample_random_path(f)

print('\n'.join(['{} {}'.format(input_label_seq[i], output_label_seq[i]) for i in range(len(input_label_seq))]))

2. Now it's time to add probabilities to your WFST.  As mentioned at the end of Lab 1, probabilities in WFSTs are traditionally expressed in negative log format, that is, the weight $w$ on an arc transitioning between states $i$ and $j$ is given by $w=-\log a_{ij}$, where $a_{ij}$ is the HMM transition probability.  Remember that you can add weights using the third argument to `fst.Arc()`.

  You should now modify your code above to add weights, corresponding to transition probabilities, to your word and phone recognition WFSTs from Lab 1.  Assume that the probability of a self-loop is $0.1$, and that where a state has transitions into several different phones (or words), the probabilities are uniform over those transitions.

  Remember to create your FST with the `'log'` weight type and to build its arc weights as log weights:

```python
import math
f = fst.Fst('log')

s1 = f.add_state()
s2 = f.add_state()
weight = fst.Weight('log', -math.log(0.1))
f.add_arc(s1, fst.Arc(0, 0, weight, s2))
```
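As a rough illustration of the weighting scheme described above, the sketch below computes the negative log weights for the arcs leaving one state.  The helper name `outgoing_neg_log_weights` is hypothetical, and it assumes that whatever probability mass is left after the self-loop (here $0.9$) is shared equally among the remaining arcs; check this reading against your own model before relying on it.

```python
import math

SELF_LOOP_PROB = 0.1  # self-loop probability given in the exercise

def outgoing_neg_log_weights(n_forward, has_self_loop=True):
    """Return (self_loop_weight, forward_weight) as negative log probabilities.

    Assumption: the mass not used by the self-loop is split uniformly
    over the n_forward arcs that leave the state.
    """
    remaining = 1.0 - SELF_LOOP_PROB if has_self_loop else 1.0
    forward_weight = -math.log(remaining / n_forward)
    loop_weight = -math.log(SELF_LOOP_PROB) if has_self_loop else None
    return loop_weight, forward_weight

# An emitting state with a self-loop and a single forward arc
loop_w, forward_w = outgoing_neg_log_weights(1)

# A start state with no self-loop that branches into three words
_, branch_w = outgoing_neg_log_weights(3, has_self_loop=False)
```

Either value can then be wrapped as `fst.Weight('log', value)` when adding the arc.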

3. Modify your answer to exercise 1 to sample a path through the word HMM *and* also compute the negative log probability of the path.  This gives you $-\log p(Q)$ in the lecture notation.  (Recall that $\log ab = \log a + \log b$, so the negative log probabilities along a path add up.)

  **Note**: Internally OpenFst stores weights in a special object that you will need to convert to a float, using the `float()` function, before adding your negative log probabilities.
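The arithmetic can be checked on plain floats before touching the WFST.  The transition probabilities below are made up for illustration: two self-loops at $0.1$ and two forward transitions assumed to be $0.9$.

```python
import math

# Hypothetical transition probabilities along one sampled path
path_probs = [0.1, 0.9, 0.1, 0.9]

# The same values in negative log form, as they would be stored on the arcs
arc_weights = [-math.log(p) for p in path_probs]

# Adding negative logs corresponds to multiplying the probabilities
neg_log_prob = sum(arc_weights)
direct = -math.log(math.prod(path_probs))

print(neg_log_prob, direct)  # the two values agree up to rounding
```

In your own function, each `arc.weight` would be converted with `float()` before being added to the running total.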


In [None]:
def sample_random_path_prob(f):
    '''Given an FST, randomly sample a path through it and compute the negative log probability.
    
        Args:
            f (fst.Fst()): an FST
        
        Returns:
            input_label_seq (list(str)): the list of input labels from the arcs that were sampled
            output_label_seq (list(str)): the list of output labels from the arcs that were sampled
            neg_log_prob (float): negative log probability of the sampled path
        '''
    curr_state = f.start() # start from beginning
    weight_type = f.weight_type() # type of weights used in the fst
    input_label_seq = []
    output_label_seq = []
    neg_log_prob = 0.0 # log(1) = 0

    # f.final(state) returns the final weight of a state (a negative log probability);
    # it is infinite, i.e. probability zero, when the state is NOT final
    while f.final(curr_state) == fst.Weight(weight_type, 'inf'):

        # your code here
            
    return input_label_seq, output_label_seq, neg_log_prob

input_label_seq, output_label_seq, neg_log_prob = sample_random_path_prob(f)

print('\n'.join(['{} {}'.format(input_label_seq[i], output_label_seq[i]) for i in range(len(input_label_seq))]))
print(neg_log_prob)

4. You are now given a sequence of observations $(x_1, \dotsc, x_t, \dotsc)$.  Can you use your WFST for the word "*peppers*" to compute $p(X,Q)$ for a randomly sampled path $Q$ through the HMM?  For now, we won't use real samples $x_t$, and will instead assume that you already have a function `observation_probability(hmm_label, t)` that computes $b_j(t) = p(x_t|q_t=j)$, provided here:

In [None]:
def observation_probability(hmm_label, t):
    """ Computes b_j(t) where j is the current state
    
    This is just a dummy version!  In later labs we'll generate 
    probabilities for real speech frames.
    
    You don't need to look at this function in detail.
    
    Args: hmm_label (str): the HMM state label, j.  We'll use string form: "p_1", "p_2", "eh_1" etc  
          t (int) : current time step, starting at 1
          
    Returns: 
          p (float): the observation probability p(x_t | q_t = hmm_label)
    """
    
    p = {} # dictionary of probabilities
    
    assert(t>0)
    
    # this is just a simulation!
    if t < 4:
        p = {'p_1': 1.0, 'p_2':1.0, 'p_3': 1.0, 'eh_1':0.2}
    elif t < 9:
        p = {'p_3': 0.5, 'eh_1':1.0, 'eh_2': 1.0, 'eh_3': 1.0}
    elif t < 13:
        p = {'eh_3': 1.0, 'p_1': 1.0, 'p_2': 1.0, 'p_3':1.0, 'er_1':0.5}
    elif t < 18:
        p = {'p_3': 1.0, 'er_1': 1.0, 'er_2': 1.0, 'er_3':0.7}
    elif t < 25:
        p = {'er_3': 1.0, 'z_1': 1.0, 'z_2': 1.0, 'z_3':1.0}
    else:
        p = {'z_2': 0.5, 'z_3': 1.0}
        
    for label in ['p_1', 'p_2', 'p_3', 'eh_1', 'eh_2', 'eh_3', 'er_1', 'er_2', 'er_3', 'z_1', 'z_2', 'z_3']:        
        if label not in p:
            p[label] = 0.01  # give all other states a small probability to avoid zero probability
            
    # normalise the probabilities:
    scale = sum(p.values())
    for k in p:
        p[k] = p[k]/scale
        
    return p[hmm_label]
    

Enter your code below.  You might want to convert the observation probabilities into negative log probabilities.
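One way to write the quantity you are accumulating, assuming that the path $Q = (q_1, \dotsc, q_T)$ leaves a non-emitting start state $q_0$, that each arc's input label names the HMM state $q_t$ it enters, and ignoring any weight on the final state, is

$$-\log p(X,Q) = \sum_{t=1}^{T} \left[ -\log a_{q_{t-1} q_t} - \log b_{q_t}(t) \right]$$

Under those assumptions, each sampled arc contributes its own weight plus the negative log of the observation probability for its input label at the current time step.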


In [None]:
def sample_random_path_obs_prob(f):
    '''Given an FST and observation probabilities, randomly sample a path
        through it and compute the negative log probability.
    
        Args:
            f (fst.Fst()): an FST
        
        Returns:
            input_label_seq (list(str)): the list of input labels from the arcs that were sampled
            output_label_seq (list(str)): the list of output labels from the arcs that were sampled
            neg_log_prob (float): negative log probability of the sampled path
        '''
    t = 1
    curr_state = f.start() # start from beginning
    weight_type = f.weight_type() # type of weights used in the fst
    input_label_seq = []
    output_label_seq = []
    neg_log_prob = 0.0 # log(1) = 0

    while f.final(curr_state) == fst.Weight(weight_type, 'inf'):
        
        # your code here
    
    
    return input_label_seq, output_label_seq, neg_log_prob

input_label_seq, output_label_seq, neg_log_prob = sample_random_path_obs_prob(f)
print('\n'.join(['{} {}'.format(input_label_seq[i], output_label_seq[i]) for i in range(len(input_label_seq))]))
print(neg_log_prob)

You might have noticed that the dummy observation probability function above effectively allows the observation sequence $x_t$ to be arbitrarily long.  This is simply to allow it to match the length of your sampled path $Q$.  In real use, the observation sequence will have a fixed length $T$, and any matching path through the HMM must have the same length.  We'll explore this more when writing the Viterbi decoder in the next lab.

## If you have more time

You might like to start thinking about how to implement the Viterbi algorithm over HMMs in WFST form.  Try working with the "*peppers*" example above.  You'll need to write functions to compute and store the probabilities $V_j(t)$, giving the probability up to time step $t$ of the observation sequence $(x_1, \dotsc, x_t)$ along the most likely path $(q_1, \dotsc, q_t)$.
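As a starting point, the sketch below fills a table of $-\log V_j(t)$ for a toy three-state left-to-right HMM stored in plain Python dictionaries, not in a WFST.  It uses the usual recursion $V_j(t) = \max_i V_i(t-1)\, a_{ij}\, b_j(t)$, written in the negative log domain so that the max becomes a min.  All of the names (`viterbi_costs`, `toy_obs_prob`), the transition values, and the assumption that every path starts in state 0 are illustrative only.

```python
import math

INF = float('inf')

def viterbi_costs(trans, obs_prob, n_states, T):
    """Return cost[t][j] = -log V_j(t) for t = 1..T (row 0 is unused).

    trans maps (i, j) to the transition probability a_ij, and
    obs_prob(j, t) returns b_j(t).
    """
    cost = [[INF] * n_states for _ in range(T + 1)]
    cost[1][0] = -math.log(obs_prob(0, 1))  # assumption: paths start in state 0

    for t in range(2, T + 1):
        for (i, j), a_ij in trans.items():
            candidate = cost[t - 1][i] - math.log(a_ij) - math.log(obs_prob(j, t))
            if candidate < cost[t][j]:  # keep the cheapest (most likely) predecessor
                cost[t][j] = candidate
    return cost

# Toy model: self-loops at 0.1, forward transitions assumed to be 0.9
trans = {(0, 0): 0.1, (0, 1): 0.9, (1, 1): 0.1, (1, 2): 0.9, (2, 2): 0.1}

def toy_obs_prob(j, t):
    """Made-up observation probability, identical for every state and frame."""
    return 0.5

cost = viterbi_costs(trans, toy_obs_prob, n_states=3, T=3)
print(cost[3][2])  # -log probability of the best path ending in state 2 at t = 3
```

To recover the best path and not only its probability, you would also need to store which predecessor $i$ achieved the minimum for each $(t, j)$.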