# Viterbi Algorithm

## Problem description

Consider a village where all villagers are either healthy or have a fever and only the village doctor can determine whether each has a fever. The doctor diagnoses fever by asking patients how they feel. The villagers may only answer that they feel normal, dizzy, or cold.

The doctor believes that the health condition of his patients operates as a discrete Markov chain. There are two states, "Healthy" and "Fever", but the doctor cannot observe them directly; they are hidden from him. On each day, there is a certain chance that the patient will tell the doctor she feels "normal", "cold", or "dizzy", depending on her health condition.

The observations (normal, cold, dizzy) together with the hidden states (Healthy, Fever) form a hidden Markov model (HMM), which can be represented as follows:

<img src="An_example_of_HMM.png" Alt="HMM graphical model"></img>

In [18]:
obs = ('normal', 'cold', 'dizzy', 'dizzy', 'dizzy')
states = ('Healthy', 'Fever')
start_p = {'Healthy': 0.6, 'Fever': 0.4}
trans_p = {
   'Healthy' : {'Healthy': 0.7, 'Fever': 0.3},
   'Fever' : {'Healthy': 0.4, 'Fever': 0.6}
   }

emit_p = {
   'Healthy' : {'normal': 0.5, 'cold': 0.4, 'dizzy': 0.1},
   'Fever' : {'normal': 0.1, 'cold': 0.3, 'dizzy': 0.6}
}

The patient visits five days in a row and the doctor discovers that:

* on the first day she feels normal,
* on the second day she feels cold,
* on the third, fourth, and fifth days she feels dizzy.

The doctor has a question: 
**what is the most likely sequence of health conditions of the patient that would explain these observations?**

This is answered by the Viterbi algorithm.
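
Before turning to the algorithm, it can help to see what "most likely sequence" means by scoring every possible sequence directly. The cell below is a minimal sketch, not part of the original notebook: it repeats the model parameters so that it runs on its own, and the helper names `joint_prob` and `brute_force_best` are introduced here purely for illustration.

```python
from itertools import product

# Model parameters repeated from the cell above so this sketch is self-contained.
obs = ('normal', 'cold', 'dizzy', 'dizzy', 'dizzy')
states = ('Healthy', 'Fever')
start_p = {'Healthy': 0.6, 'Fever': 0.4}
trans_p = {
    'Healthy': {'Healthy': 0.7, 'Fever': 0.3},
    'Fever': {'Healthy': 0.4, 'Fever': 0.6},
}
emit_p = {
    'Healthy': {'normal': 0.5, 'cold': 0.4, 'dizzy': 0.1},
    'Fever': {'normal': 0.1, 'cold': 0.3, 'dizzy': 0.6},
}


def joint_prob(path, obs):
    """Joint probability of one hidden-state path and the observations."""
    p = start_p[path[0]] * emit_p[path[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= trans_p[path[t - 1]][path[t]] * emit_p[path[t]][obs[t]]
    return p


def brute_force_best(obs):
    """Score all len(states) ** len(obs) paths and return the most probable one."""
    return max(product(states, repeat=len(obs)),
               key=lambda path: joint_prob(path, obs))


best_path = brute_force_best(obs)
best_prob = joint_prob(best_path, obs)
```

With two states and five observations this scores only 2^5 = 32 paths, but the number of paths grows exponentially with the number of observations, which is why a dynamic-programming approach is preferable.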

### Algorithm Pseudo-Code

<img src="vitebri_algorithm_code.PNG">

In [15]:
def viterbi(obs, states, start_p, trans_p, emit_p):
    V = [{}]
    for st in states:
        # initialize with the start probability times the emission probability of the first observation
        V[0][st] = {"prob": start_p[st] * emit_p[st][obs[0]], "prev": None}

    # iterate over the remaining observations
    for t in range(1, len(obs)):
        # V[t] maps each state to the probability of the best path ending there and its back pointer
        V.append({})
        for st in states:
            # Best probability of arriving in `st`: the best-path probability of each
            # previous state times the transition probability from that state to `st`.
            max_tr_prob = max(V[t-1][prev_st]["prob"] * trans_p[prev_st][st] for prev_st in states)

            for prev_st in states:
                if V[t-1][prev_st]["prob"] * trans_p[prev_st][st] == max_tr_prob:
                    # prev_st is the most likely previous state; store it as the back pointer
                    max_prob = max_tr_prob * emit_p[st][obs[t]]
                    V[t][st] = {"prob": max_prob, "prev": prev_st}
                    break
    for line in dptable(V):
        print(line)
    opt = []
    # The highest probability
    max_prob = max(value["prob"] for value in V[-1].values())
    previous = None
    # Get most probable state and its backtrack
    for st, data in V[-1].items():
        if data["prob"] == max_prob:
            opt.append(st)
            previous = st
            break
    # Follow the backtrack till the first observation
    for t in range(len(V) - 2, -1, -1):
        opt.insert(0, V[t + 1][previous]["prev"])
        previous = V[t + 1][previous]["prev"]
    print('The steps of states are ' + ' '.join(opt) + ' with highest probability of %s' % max_prob)

def dptable(V):
    # Yield the lines of a table of step probabilities, one row per state
    yield " ".join(("%10d" % i) for i in range(len(V)))
    for state in V[0]:
        yield "%.7s: " % state + " ".join("%.7s" % ("%.7f" % v[state]["prob"]) for v in V)

In [19]:
viterbi(obs, states, start_p, trans_p, emit_p)

         0          1          2          3          4
Healthy: 0.30000 0.08400 0.00588 0.00060 0.00021
Fever: 0.04000 0.02700 0.01512 0.00544 0.00195
The steps of states are Healthy Healthy Fever Fever Fever with highest probability of 0.001959552


### Explanation
Suppose we are given a hidden Markov model (HMM) with state space $S$, initial probabilities $\pi _{i}$ of being in state $i$, transition probabilities $a_{i,j}$ of transitioning from state $i$ to state $j$, and emission probabilities $\mathrm {P} {\big (}y\ |\ k{\big )}$ of observing output $y$ in state $k$. Say we observe outputs $y_{1},\dots ,y_{T}$. The most likely state sequence $x_{1},\dots ,x_{T}$ that produces the observations is given by the recurrence relations:
$${\begin{array}{rcl}V_{1,k}&=&\mathrm {P} {\big (}y_{1}\ |\ k{\big )}\cdot \pi _{k}\\V_{t,k}&=&\max _{x\in S}\left(\mathrm {P} {\big (}y_{t}\ |\ k{\big )}\cdot a_{x,k}\cdot V_{t-1,x}\right)\end{array}}$$
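
As a worked instance of these relations, the first two columns of the table printed above can be reproduced by hand from the example's probabilities. The subscripts $H$ and $F$ are shorthand introduced here for Healthy and Fever, with $y_{1}$ = normal and $y_{2}$ = cold:

$${\begin{array}{rcl}V_{1,H}&=&0.5\cdot 0.6=0.3\\V_{1,F}&=&0.1\cdot 0.4=0.04\\V_{2,H}&=&\max {\big (}0.4\cdot 0.7\cdot 0.3,\ 0.4\cdot 0.4\cdot 0.04{\big )}=\max(0.084,\ 0.0064)=0.084\\V_{2,F}&=&\max {\big (}0.3\cdot 0.3\cdot 0.3,\ 0.3\cdot 0.6\cdot 0.04{\big )}=\max(0.027,\ 0.0072)=0.027\end{array}}$$

In both second-day cases the maximum is attained at $x=H$, so Healthy is the best predecessor of either state on day two.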

Here $V_{t,k}$ is the probability $\mathrm {P} {\big (}x_{1},\dots ,x_{t},y_{1},\dots ,y_{t}{\big )}$ of the most probable state sequence responsible for the first $t$ observations that has $k$ as its final state. The Viterbi path can be retrieved by saving back pointers that remember which state $x$ was used in the second equation. Let $\mathrm {Ptr} (k,t)$ be the function that returns the value of $x$ used to compute $V_{t,k}$ if $t>1$, or $k$ if $t=1$. Then:

$${\begin{array}{rcl}x_{T}&=&\arg \max _{x\in S}(V_{T,x})\\x_{t-1}&=&\mathrm {Ptr} (x_{t},t)\end{array}}$$

Here we're using the standard definition of arg max.
The time complexity of this algorithm is $O(T\times \left|{S}\right|^{2})$, since each of the $T$ steps considers every pair of previous and current states.
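
The recurrence, the back pointers, and the cost can be tied together in a compact variant that returns its result instead of printing it. This is a sketch under stated assumptions rather than a replacement for the `viterbi` function above: the name `viterbi_path`, the separate `ptr` table, and the `n_candidates` counter are all introduced here for illustration, and the model parameters are repeated so the cell runs on its own.

```python
def viterbi_path(obs, states, start_p, trans_p, emit_p):
    """Return (path, probability, n_candidates) for the most likely state path.

    n_candidates counts the (previous state, current state) pairs examined,
    which is (T - 1) * |S|^2 for T observations and |S| states.
    """
    # V[t][k]: probability of the best path that ends in state k at step t
    V = [{k: emit_p[k][obs[0]] * start_p[k] for k in states}]
    # ptr[t][k]: the previous state x that achieved the maximum for V[t][k]
    ptr = [{}]
    n_candidates = 0
    for t in range(1, len(obs)):
        V.append({})
        ptr.append({})
        for k in states:
            # arg max over previous states x of a_{x,k} * V_{t-1,x}
            best_x = max(states, key=lambda x: V[t - 1][x] * trans_p[x][k])
            n_candidates += len(states)
            V[t][k] = emit_p[k][obs[t]] * trans_p[best_x][k] * V[t - 1][best_x]
            ptr[t][k] = best_x
    # x_T = arg max over x of V_{T,x}
    last = max(states, key=lambda x: V[-1][x])
    path = [last]
    # x_{t-1} = Ptr(x_t, t): follow the back pointers to the first step
    for t in range(len(obs) - 1, 0, -1):
        path.insert(0, ptr[t][path[0]])
    return path, V[-1][last], n_candidates


# Model parameters repeated from the example above.
obs = ('normal', 'cold', 'dizzy', 'dizzy', 'dizzy')
states = ('Healthy', 'Fever')
start_p = {'Healthy': 0.6, 'Fever': 0.4}
trans_p = {
    'Healthy': {'Healthy': 0.7, 'Fever': 0.3},
    'Fever': {'Healthy': 0.4, 'Fever': 0.6},
}
emit_p = {
    'Healthy': {'normal': 0.5, 'cold': 0.4, 'dizzy': 0.1},
    'Fever': {'normal': 0.1, 'cold': 0.3, 'dizzy': 0.6},
}

path, prob, n_candidates = viterbi_path(obs, states, start_p, trans_p, emit_p)
```

For this example the loop examines (5 - 1) x 2 x 2 = 16 candidate transitions, whereas exhaustively scoring every complete path would require 2^5 = 32 paths of five steps each, and that gap widens exponentially as the observation sequence grows.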