# Class : Hidden Markov Models - Forward Backward

---

## Before Class
In class today we will be implementing the Forward, Backward, and Forward-Backward algorithms. Forward and Backward are very related to viterbi with minor differences in the calculations. 

Prior to class, please do the following:
1. Review these three algorithms in detail

---
## Learning Objectives

1. Forward algorithm
* Backward algorithm
* Forward-Backward algorithm

---
## Background

In the last class we described Markov chains. Here we expand this idea to the concept of a hidden state variable along with observed emissions from the model. We will be using the example of CpG islands from the lecture slides. I have provided the class structure of a simple HMM below. All parameters to this model must be provided as inputs, so essentially this is a class containing the parameters described below:

We define a categorical Hidden Markov Model as $M = (\Sigma, Q, \Theta)$ with the following parameters:

* $\Sigma$ : Finite alphabet of symbols (eg. A, C, G, T)

* $Q$ : Finite discrete hidden states

* $\Theta$: set of probabilities containing: $A$ as transition probabilites $a_{kl}$ for all $k,l \in Q$ and $E$ as emission probabilities $e_k(\sigma)$ for all $k \in Q$ and $\sigma \in \Sigma$ and $B$ as starting probabilities $b_k$ for all $k \in Q$.

We also define a number of $T$ emissions as $y_t = 1 \dots T$ that are drawn from $\Sigma$ and hidden states as $\pi_t = 1 \dots T$ that are drawn from $Q$.

---
## Imports

In [3]:
import numpy as np
import json

---
## Forward algorithm

The Forward algorithm can be used to estimate the probability of the sequence given our HMM. In general, this is the same as Viterbi except that we sum probablilities instead of taking the max, and we do not need traceback. This is described as bellow:

To estimate $P(x)$, the probability of sequence $x$ given our HMM, calculate forward algorithm values as $f()$ using:
> Initialization ($i = 0$): $f_{k}(0) = e_{k}(\sigma_{0})b_{k}$.<br>
> Recursion ($i = 1 \dots T$): $f_{l}(i) = e_{l}(\sigma_{i})\sum_{k}(f_{k}(i-1)a_{kl})$<br>
> Termination: $P(x) = \sum_{k}f_{k}(T)$

## Backward algorithm
The backward algorithm is essentially the reverse of the forward algorithm. To estimate $P(x)$, the probability of sequence $x$ given our HMM, calculate the backward algorithm values as $r()$ using:
>Initialization ($i = T$): $r_{k}(T) = 1$.<br>
>Recursion ($i = T-1 \dots 1$): $r_{k}(i) = \sum_{l}r_{l}(i+1)a_{kl}e_{l}(\sigma_{i+1})$<br>
>Termination: $P(x) = \sum_{l}r_{k}(1)e_l(\sigma_{1})b_{l}$

## Forward-Backward algorithm
The Forward-Backward algorithm is an extension of both the Forward and Backward algorithms and can be used to estimate the marginal posterior probability of our sequence $x$ being in a state at a specific time. We can calculate this value at every position $i$ and state $k$ as:
>$P(\pi_{i} = k | x) = f_{k}(i)b_{k}(i) / P(x)$


In [188]:
class HMM(object):
    """Main class for HMM objects
    
    Class for holding HMM parameters and to allow for implementation of
    functions associated with HMMs
    
    Private Attributes:
        _alphabet (set): The alphabet of emissions
        _hidden_states (set): Hidden states in the model
        _transitions (dict(dict)): A dictionary of transition probabilities
        _emissions (dict(dict)): A dictionary of emission probabilities
        _initial (dict): A dictionary of initial state probabilities

    """
    
    __all__ = ['viterbi', 'forward', 'backward', 'forward_backward']

    def __init__(self, alphabet, hidden_states, A=None, E=None, B=None):
        self._alphabet = set(alphabet)
        self._hidden_states = set(hidden_states)
        self._transitions = A
        self._emissions = E
        self._initial = B
        if(self._transitions == None):
            self._initialize_random(self._alphabet, self._hidden_states)
            
    def __str__(self):
        out_text = [f'Alphabet: {self._alphabet}',
                    f'Hidden States: {self._hidden_states}',
                    f'Initial Probabilities: {json.dumps(self._initial, sort_keys = True, indent=4)}',
                    f'Transition Probabilities: {json.dumps(self._transitions, sort_keys = True, indent=4)}',
                    f'Emission Probabilities: {json.dumps(self._emissions, sort_keys = True, indent=4)}']
        return '\n'.join(out_text)
    
    @classmethod
    def __dir__(cls):
        return cls.__all__
        
    def _emit(self, cur_state, symbol):
        return self._emissions[cur_state][symbol]
    
    def _transition(self, cur_state, next_state):
        return self._transitions[cur_state][next_state]
    
    def _init(self, cur_state):
        return self._initial[cur_state]

    def _states(self):
        for k in self._hidden_states:
            yield k
    
    def _get_alphabet(self):
        for sigma in self._alphabet:
            yield sigma
            
    def _initialize_random(self, alphabet, states):
        self._alphabet = set(alphabet)
        self._hidden_states = set(hidden_states)

        #Initialize empty matrices A and E with pseudocounts
        A = {}
        E = {}
        I = {}
        I_rand = np.random.dirichlet(np.ones(len(self._hidden_states)))
        for i, state in enumerate(self._states()):
            E[state] = {}
            A[state] = {}
            I[state] = I_rand[i]
            E_rand = np.random.dirichlet(np.ones(len(self._alphabet)))
            A_rand = np.random.dirichlet(np.ones(len(self._hidden_states)))
            for j, sigma in enumerate(self._get_alphabet()):
                E[state][sigma] = E_rand[j]
            for j, next_state in enumerate(self._states()):
                A[state][next_state] = A_rand[j]
                
        self._transitions = A
        self._emissions = E
        self._initial = I
        return
        
    def viterbi(self, sequence):
        """ The viterbi algorithm for decoding a string using a HMM

        Args:
            sequence (list): a list of valid emissions from the HMM

        Returns:
            result (list): optimal path through HMM given the model parameters
                           using the Viterbi algorithm
        
        Pseudocode for Viterbi:
            Initialization (𝑖=0): 𝑣𝑘(𝑖)=𝑒𝑘(𝜎)𝑏𝑘.
            Recursion (𝑖=1…𝑇): 𝑣𝑙(𝑖)=𝑒𝑙(𝑥𝑖) max𝑘(𝑣𝑘(𝑖−1)𝑎𝑘𝑙); 
                                ptr𝑖(𝑙)= argmax𝑘(𝑣𝑘(𝑖−1)𝑎𝑘𝑙).
            Termination: 𝑃(𝑥,𝜋∗)= max𝑘(𝑣𝑘(𝑙)𝑎𝑘0); 
                             𝜋∗𝑙= argmax𝑘(𝑣𝑘(𝑙)𝑎𝑘0).
            Traceback: (𝑖=𝑇…1): 𝜋∗𝑖−1= ptr𝑖(𝜋∗𝑖).
        """

        # Initialization (𝑖=0): 𝑣𝑘(𝑖)=𝑒𝑘(𝜎)𝑏𝑘.
        # Initialize trellis and traceback matrices
        # trellis will hold the vi data as defined by Durbin et al.
        # and trackback will hold back pointers
        trellis = {} # This only needs to keep the previous column probabilities
        traceback = [] # This will need to hold all of the traceback data so will be an array of dicts()
        for state in self._states():
            trellis[state] = np.log10(self._init(state)) + np.log10(self._emit(state, sequence[0])) # b * e(0) for all k
            
        # Next we do the recursion step:
        # Recursion (𝑖=1…𝑇): 𝑣𝑙(𝑖)=𝑒𝑙(𝑥𝑖) max𝑘(𝑣𝑘(𝑖−1)𝑎𝑘𝑙); 
        #                 ptr𝑖(𝑙)= argmax𝑘(𝑣𝑘(𝑖−1)𝑎𝑘𝑙).
        for t in range(1, len(sequence)):  # For each position in the sequence
            trellis_next = {}
            traceback_next = {}

            for next_state in self._states():    # Calculate maxk and argmaxk
                k={}
                for cur_state in self._states():
                    k[cur_state] = trellis[cur_state] + np.log10(self._transition(cur_state, next_state)) # k(t-1) * a
                argmaxk = max(k, key=k.get)
                trellis_next[next_state] =  np.log10(self._emit(next_state, sequence[t])) + k[argmaxk] # k * e(t)
                traceback_next[next_state] = argmaxk
                
            #Overwrite trellis 
            trellis = trellis_next
            #Keep trackback pointer matrix
            traceback.append(traceback_next)
            
        # Termination: 𝑃(𝑥,𝜋∗)= max𝑘(𝑣𝑘(𝑙)𝑎𝑘0); 
        #                  𝜋∗𝑙= argmax𝑘(𝑣𝑘(𝑙)𝑎𝑘0).
        max_final_state = max(trellis, key=trellis.get)
        max_final_prob = trellis[max_final_state]
                
        # Traceback: (𝑖=𝑇…1): 𝜋∗𝑖−1= ptr𝑖(𝜋∗𝑖).
        result = [max_final_state]
        for t in reversed(range(len(sequence)-1)):
            result.append(traceback[t][max_final_state])
            max_final_state = traceback[t][max_final_state]

        return result[::-1]

    def forward(self, sequence):
        """ The forward algorithm for calculating probability of sequence given HMM

        Args:
            sequence (list): a list of valid emissions from the HMM

        Returns:
            result (float, list of dicts): P(x) and the f matrix as a list
        
        Pseudocode for Forward:
            Initialization (𝑖=0): 𝑓𝑘(0)=𝑒𝑘(𝜎0)𝑏𝑘.
            Recursion (𝑖=1…𝑇): 𝑓𝑙(𝑖)=𝑒𝑙(𝜎𝑖)∑𝑘(𝑓𝑘(𝑖−1)𝑎𝑘𝑙)
            Termination: 𝑃(𝑥)=∑𝑘𝑓𝑘(𝑇)
        """
        
        #init
        states = list( self._states() ) #define deterministic ordering of states (get around `set`)
        trace = [ {s:self._init(s)*self._emit(s,sequence[0]) for s in states} ] #calc init probs
        
        #recurse
        for b in sequence[1:]: #loop over remainder of sequence
            trace.append( dict() ) #add a new dictionary
            for s in states:
                pos = ( self._emit(s,b)*self._transition(ls,s)*trace[-2][ls] for ls in states ) #calc lambda probs
                trace[-1][s] = sum( pos ) #sum lambda probs
                
        #term
        total = sum( trace[-1].values() )
                
        return total, trace

    def backward(self, sequence):
        """ The backward algorithm for calculating probability of sequence given HMM

        Args:
            sequence (list): a list of valid emissions from the HMM

        Returns:
            result (float, list of dicts): P(x) and the r matrix as a list
        
        Pseudocode for Backward:
            Initialization (𝑖=T): 𝑟𝑘(𝑇)=1.
            Recursion (𝑖=𝑇−1…1): 𝑟𝑘(𝑖)=∑𝑙𝑟𝑙(𝑖+1)𝑎𝑘𝑙𝑒𝑙(𝜎𝑖+1)
            Termination: 𝑃(𝑥)=∑𝑙𝑟𝑘(1)𝑒𝑙(𝜎1)𝑏𝑙
        """
        
        #init
        states = list( self._states() ) #define deterministic ordering of states (get around `set`)
        trace = [ {s:1.0 for s in states} ] #calc init probs
        
        #recurse
        for b in sequence[::-1]: #loop over remainder of sequence
            trace.append( dict() )
            for s in states:
                pos = ( self._emit(ls,b)*self._transition(s,ls)*trace[-2][ls] for ls in states ) #calc lambda probs
                trace[-1][s] = sum( pos ) #sum lambda probs
        
        #term
        total = sum( self._init(s)*trace[-1][s] for s in states ) 
                
        return total, trace[::-1][1:]
    
    def forward_backward(self, sequence):
        """ The forward-backward algorithm for calculating marginal posteriors given HMM

        Args:
            sequence (list): a list of valid emissions from the HMM

        Returns:
            posterior (list of dicts): all posteriors as a list
        
        Pseudocode for Forward-Backward:
            Calculate f[] as forward algorithm
            Calculate r[] as backward algorithm
            for all i in sequence
                for all states
                    posterior[i][state] = f[i][state] * r[i][state] / Px
        """
                
        p, fi = self.forward( sequence )
        p, ri = self.backward( sequence )
                
        return [ {s:f[s]*r[s]/p for s in self._states()} for f,r in zip(fi,ri) ]                        



In [189]:
# This section of code will initialize your HMM with parameters as defined in the lecture slides
# for the identification of CpG Islands.
# All of this should be able to run whether or not you implement the functions!

hidden_states = ('I', 'G') # CpG Island or Genome
alphabet = ('A', 'C', 'G', 'T') # DNA Alphabet

# These are the initial probabilities as defined in the lecture slides
initial_probabilities = {
    'I' : 0.1,
    'G' : 0.9
}

# These are the probabilities of transitioning from outer state to inner state
#  as defined in the lecture slides
transition_probabilities = {
    'I': { 'I' : 0.6, 'G' : 0.4 },
    'G': { 'I' : 0.1, 'G' : 0.9 }
}

# These are the probabilites of each state emmitting each alphabet character
emission_probabilities = {
    'I': { 'A' : 0.1, 'C' : 0.4, 'G' : 0.4, 'T' : 0.1 },
    'G': { 'A' : 0.4, 'C' : 0.1, 'G' : 0.1, 'T' : 0.4 }
}

# Build the model
model = HMM(alphabet, hidden_states, transition_probabilities, emission_probabilities, initial_probabilities)
print(model)

Alphabet: {'T', 'G', 'C', 'A'}
Hidden States: {'G', 'I'}
Initial Probabilities: {
    "G": 0.9,
    "I": 0.1
}
Transition Probabilities: {
    "G": {
        "G": 0.9,
        "I": 0.1
    },
    "I": {
        "G": 0.4,
        "I": 0.6
    }
}
Emission Probabilities: {
    "G": {
        "A": 0.4,
        "C": 0.1,
        "G": 0.1,
        "T": 0.4
    },
    "I": {
        "A": 0.1,
        "C": 0.4,
        "G": 0.4,
        "T": 0.1
    }
}


In [190]:
# Exact example from slides
sequence = "ACGCGATC"

print ("Forward:")
f_Px, f_matrix = model.forward(list(sequence))
print (f_Px, f_matrix)

print ("\nBackward:")
r_Px, r_matrix = model.backward(list(sequence))
print (r_Px, r_matrix)

print ("\nPosterior:")
posterior = model.forward_backward(list(sequence))
print (posterior)

Forward:
5.638948422400004e-06 [{'G': 0.36000000000000004, 'I': 0.010000000000000002}, {'G': 0.0328, 'I': 0.016800000000000006}, {'G': 0.003624000000000001, 'I': 0.0053440000000000015}, {'G': 0.0005399200000000003, 'I': 0.0014275200000000003}, {'G': 0.00010569360000000005, 'I': 0.0003642016000000001}, {'G': 9.632195200000003e-05, 'I': 2.2909032000000005e-05}, {'G': 3.834134784000002e-05, 'I': 2.3377614400000006e-06}, {'G': 3.5442317632000023e-06, 'I': 2.0947166592000014e-06}]

Backward:
5.626100089600004e-06 [{'G': 1.416161272000001e-05, 'I': 5.407678432000003e-05}, {'G': 6.178578400000004e-05, 'I': 0.0002150223040000001}, {'G': 0.0003113848000000002, 'I': 0.0008440288000000004}, {'G': 0.0020485600000000014, 'I': 0.0031753600000000016}, {'G': 0.01823200000000001, 'I': 0.010192000000000005}, {'G': 0.04960000000000001, 'I': 0.03760000000000001}, {'G': 0.13, 'I': 0.28}, {'G': 1.0, 'I': 1.0}]

Posterior:
[{'G': 0.9061659938514295, 'I': 0.09611770757502593}, {'G': 0.36020932491872604, 'I': 