<h1>Introduction to HMM</h1>

This is a short guide to hidden Markov models (HMMs). HMMs are widely used, and the concepts in this workbook are adapted from the following resources:
<ul>
    <li>https://web.ece.ucsb.edu/Faculty/Rabiner/ece259/Reprints/tutorial%20on%20hmm%20and%20applications.pdf</li>
    <li>Example: https://medium.com/@kangeugine/hidden-markov-model-7681c22f5b9</li>
    <li>API: https://hmmlearn.readthedocs.io/en/latest/api.html#hmmlearn.hmm.GaussianHMM</li>
    <li>Code adapted from: https://github.com/jiaeyan/Hidden-Markov-Model</li>
</ul>

In an HMM, we observe a sequence of outcomes (<b>$O_1,O_2,\ldots,O_T$</b>) that is driven by latent (hidden) states. The probability of observing a given outcome from a given hidden state is called the emission probability (denoted by <b>B</b>). At each time step the hidden state may change, and this change depends only on the previous hidden state (hence the name Markov model). The transition probabilities between hidden states are given by <b>A</b>. The initial hidden state is drawn from the initial state distribution, denoted by <b>$\pi$</b>.

The model has two size parameters: the number of hidden states (denoted by <b>N</b>) and, in the discrete setting, the number of possible outcomes (denoted by <b>M</b>). An HMM is then denoted by the tuple $\lambda = (A,B,\pi)$, where <b>A</b> is an N×N matrix, <b>B</b> is an N×M matrix and <b>$\pi$</b> is a vector of length N. The notation is explained in the figure below.

<img src="HMM.png" alt="HMM" width="628" height="628">
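As a quick sanity check on the notation, the sketch below verifies that a candidate $\lambda = (A,B,\pi)$ has consistent shapes and that every row is a probability distribution. The helper name `check_hmm` is hypothetical (it is not part of any library), and the numbers are the weather example used later in this workbook.

```python
import numpy as np

def check_hmm(A, B, pi, atol=1e-8):
    """Return (N, M) if (A, B, pi) is a valid discrete HMM, else raise ValueError.

    Hypothetical helper for illustration only.
    """
    A, B, pi = (np.asarray(x, dtype=float) for x in (A, B, pi))
    N = pi.shape[0]
    if A.shape != (N, N):
        raise ValueError("A must be N x N")
    if B.ndim != 2 or B.shape[0] != N:
        raise ValueError("B must be N x M")
    # every row of A and B, and pi itself, must be a probability distribution
    for name, mat in (("A", A), ("B", B), ("pi", pi[None, :])):
        if (mat < 0).any() or not np.allclose(mat.sum(axis=1), 1.0, atol=atol):
            raise ValueError(f"rows of {name} must be non-negative and sum to 1")
    return N, B.shape[1]

pi = np.array([0, 0.2, 0.8])
A = np.array([[0.3, 0.3, 0.4], [0.1, 0.45, 0.45], [0.2, 0.3, 0.5]])
B = np.array([[1, 0], [0.8, 0.2], [0.3, 0.7]])

N, M = check_hmm(A, B, pi)
print('N =', N, 'M =', M)
```

Running such a check before any of the algorithms helps catch a transposed matrix or a row that does not sum to one.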

<h2>Three HMM problems</h2>

HMMs are used to address three types of problems:
<ol>
    <li>Evaluation problem: If we know the model <b>(A,B,$\pi$)</b>, what is the probability of observing a given sequence?</li>
    <li>Decoding problem: If we know the model <b>(A,B,$\pi$)</b>, what is the best sequence of hidden states that explains the sequence of observations?</li>
    <li>Learning problem: How do we estimate <b>(A,B,$\pi$)</b> from a given sequence of observations (that is, which model most likely generated the sequence)? Because the hidden states are not observed, this is an unsupervised learning problem.</li>
</ol>

In [48]:
# Consider a problem from https://towardsdatascience.com/introduction-to-hidden-markov-models-cd2c93e6b781
# outcome: hot (0) or cold (1)
# hidden states: snow (0), rain (1), sunshine (2)
import numpy as np

pi = np.array([0,0.2,0.8])                                      # initial state distribution
A  = np.array([[0.3,0.3,0.4],[0.1,0.45,0.45],[0.2,0.3,0.5]])    # transition probabilities (N x N)
B  = np.array([[1,0],[0.8,0.2],[0.3,0.7]])                      # emission probabilities (N x M)
M  = 2  # number of outcomes (hot or cold)
N  = 3  # number of hidden states (snow, rain, sunshine)
T  = 20 # length of the simulated sequence

# simulate the walk based on the initial, transition and emission probabilities
# since the sequence is generated from the model itself, it should be relatively likely under the model
s = np.random.choice(N,1,p=pi)[0]
o = np.random.choice(M,1,p=B[s])[0]
S = [s]
O = [o]

for t in range(T-1):
    s = np.random.choice(N,1,p=A[s])[0]   # next hidden state depends only on the current one
    o = np.random.choice(M,1,p=B[s])[0]   # outcome depends only on the current hidden state
    S.append(s)
    O.append(o)
print('observation  :',O)
print('hidden states:',S)

observation  : [1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0]
hidden states: [2, 2, 1, 2, 2, 2, 1, 1, 1, 0, 2, 0, 2, 0, 2, 0, 0, 2, 1, 1]
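The simulation draws from NumPy's global random state, so each run produces a different sequence. A minimal sketch of a reproducible variant is shown below, assuming NumPy's `default_rng` generator. The function name `sample_hmm` and the `seed` argument are hypothetical and introduced only for illustration.

```python
import numpy as np

def sample_hmm(A, B, pi, T, seed=0):
    """Draw T hidden states and observations from a discrete HMM (illustrative helper)."""
    rng = np.random.default_rng(seed)      # seeded generator, so results are repeatable
    N, M = B.shape
    states = np.empty(T, dtype=int)
    obs = np.empty(T, dtype=int)
    states[0] = rng.choice(N, p=pi)        # initial state from pi
    obs[0] = rng.choice(M, p=B[states[0]])
    for t in range(1, T):
        states[t] = rng.choice(N, p=A[states[t - 1]])  # Markov transition
        obs[t] = rng.choice(M, p=B[states[t]])         # emission from the current state
    return states, obs

pi = np.array([0, 0.2, 0.8])
A = np.array([[0.3, 0.3, 0.4], [0.1, 0.45, 0.45], [0.2, 0.3, 0.5]])
B = np.array([[1, 0], [0.8, 0.2], [0.3, 0.7]])

states, obs = sample_hmm(A, B, pi, T=20, seed=42)
print('observation  :', obs.tolist())
print('hidden states:', states.tolist())
```

Calling the function twice with the same seed returns the same sequence, which makes later results easier to compare across runs.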


<h2>Evaluation problem: Forward algorithm and Backward algorithm</h2>

In this problem, we compute the probability of a given observation sequence under a known HMM. We first use the forward algorithm and then the backward algorithm, and show that both give the same probability.


<img src="Evaluation.png" alt="forward-backward algorithm" width="828" height="628">

<h3>Forward algorithm</h3>

In [60]:
# alpha[i,t] is the probability of observing the partial sequence O_1..O_t and being in state i at time t
# initialization
alpha      = np.zeros((N,T))
alpha[:,0] = B[:,O[0]]*pi

# recursion            (s2 is the next state and s1 is the previous state)
for t in range(1,T):
    for s2 in range(N):
        for s1 in range(N):
            alpha[s2,t] += alpha[s1,t-1]*A[s1,s2]*B[s2,O[t]]
            
# final probability
prob_of_observing = np.sum(alpha[:,-1])
prob_of_observing

1.7890062156250946e-06
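The probability is already of order $10^{-6}$ for 20 observations, and it keeps shrinking as the sequence grows, so the unscaled recursion eventually underflows. A common remedy, discussed in the Rabiner tutorial, is to rescale alpha at every step and accumulate the logarithms of the scaling factors. The sketch below is one way to do this. The function name `forward_scaled` is hypothetical, and the inner loops are replaced by a matrix product.

```python
import numpy as np

def forward_scaled(A, B, pi, obs):
    """Return log P(O | A, B, pi) using a rescaled forward recursion (illustrative sketch)."""
    log_prob = 0.0
    alpha = pi * B[:, obs[0]]                    # initialization
    for t in range(len(obs)):
        if t > 0:
            alpha = (alpha @ A) * B[:, obs[t]]   # same recursion as the nested loops
        c = alpha.sum()                          # scaling factor at time t
        if c == 0:
            return -np.inf                       # sequence is impossible under the model
        log_prob += np.log(c)
        alpha = alpha / c                        # keep alpha normalised to avoid underflow
    return log_prob

pi = np.array([0, 0.2, 0.8])
A = np.array([[0.3, 0.3, 0.4], [0.1, 0.45, 0.45], [0.2, 0.3, 0.5]])
B = np.array([[1, 0], [0.8, 0.2], [0.3, 0.7]])
O = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0]

log_p = forward_scaled(A, B, pi, O)
print('log-likelihood:', log_p)
print('probability   :', np.exp(log_p))
```

For a short sequence such as this one, `np.exp(log_p)` should agree with the unscaled forward result up to floating point error. For long sequences only the log-likelihood remains usable.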

<h3>Backward algorithm</h3>

In [61]:
# beta[i,t] is the probability of observing the partial sequence O_{t+1}..O_T given state i at time t
# initialization
beta       = np.zeros((N,T))
beta[:,-1] = 1   # no observations remain after time T, so the probability is 1

# recursion            (s2 is the next state and s1 is the previous state)
for t in reversed(range(T-1)):
    for s1 in range(N):
        for s2 in range(N):
            beta[s1,t] += beta[s2,t+1]*A[s1,s2]*B[s2,O[t+1]]
            
# final probability
prob_of_observing = np.sum(beta[:,0]*B[:,O[0]]*pi.T)
prob_of_observing

1.7890062156250956e-06
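The two results differ only in the last digits, which is floating point rounding. More generally, the forward and backward quantities can be combined at any time step: $\sum_i \alpha_t(i)\,\beta_t(i)$ equals the sequence probability for every $t$, and dividing $\alpha_t(i)\,\beta_t(i)$ by that probability gives the posterior probability of being in state $i$ at time $t$. The sketch below illustrates this with vectorised recursions. The names `per_step` and `gamma` are chosen here for illustration.

```python
import numpy as np

pi = np.array([0, 0.2, 0.8])
A = np.array([[0.3, 0.3, 0.4], [0.1, 0.45, 0.45], [0.2, 0.3, 0.5]])
B = np.array([[1, 0], [0.8, 0.2], [0.3, 0.7]])
O = np.array([1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0])
N, T = len(pi), len(O)

# forward pass: alpha[i, t] = P(O_1..O_t, q_t = i)
alpha = np.zeros((N, T))
alpha[:, 0] = pi * B[:, O[0]]
for t in range(1, T):
    alpha[:, t] = (alpha[:, t - 1] @ A) * B[:, O[t]]

# backward pass: beta[i, t] = P(O_{t+1}..O_T | q_t = i)
beta = np.ones((N, T))
for t in range(T - 2, -1, -1):
    beta[:, t] = A @ (B[:, O[t + 1]] * beta[:, t + 1])

per_step = (alpha * beta).sum(axis=0)   # one value per time step, each equal to P(O)
gamma = alpha * beta / per_step         # gamma[i, t] = P(q_t = i | O)

print('same probability at every t:', np.allclose(per_step, per_step[0]))
print('most likely state at each t:', gamma.argmax(axis=0).tolist())
```

Note that picking the most likely state separately at each time step is not the same as finding the single most likely state sequence, which is the decoding problem.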

<h2>Decoding problem</h2>

In the decoding problem, we aim to find the most likely sequence of hidden states that generated the observed sequence. The outline is shown below, and the Viterbi algorithm is coded next. It is very similar to the forward algorithm, except that the sum over previous states is replaced by a maximum, and the best previous state is stored so that the path can be recovered by backtracking.

<img src="Decoding.png" alt="viterbi algorithm" width="828" height="628">

<h3>Viterbi algorithm</h3>

In [83]:
# In the Viterbi algorithm we still know the HMM (A,B,pi); only the hidden state sequence is unknown.
# The objective is to infer the most likely hidden state sequence.

# delta[i,t]: probability of the best path that ends in state i at time t
# psi[i,t]  : best previous state for state i at time t (used for backtracking)
delta      = np.zeros((N,T))
delta[:,0] = B[:,O[0]]*pi
psi        = np.zeros((N,T), dtype=int)

# recursion            (s2 is the next state and s1 is the previous state)
for t in range(1,T):
    for s2 in range(N):
        vals = [0]*N
        for s1 in range(N):
            vals[s1] = delta[s1,t-1]*A[s1,s2]*B[s2,O[t]]
        psi[s2,t]   = np.argmax(vals)
        delta[s2,t] = vals[psi[s2,t]]

# backtracking: start from the most likely final state and follow psi backwards
path     = [0]*T
path[-1] = int(np.argmax(delta[:,-1]))
for t in reversed(range(T-1)):
    path[t] = int(psi[path[t+1],t+1])

# optimal hidden state sequence
print('original hidden state:    ', S)
print('hidden state from viterbi:',path)

original hidden state:     [2, 2, 1, 2, 2, 2, 1, 1, 1, 0, 2, 0, 2, 0, 2, 0, 0, 2, 1, 1]
hidden state from viterbi: [2, 2, 1, 2, 1, 2, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 2, 1, 1]
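One way to gain confidence in a Viterbi implementation is to compare it with brute-force enumeration on a short sequence, where all $N^T$ state paths can still be listed. The sketch below does this for the first six observations ($3^6 = 729$ paths). The functions `viterbi` and `joint_prob` are hypothetical helpers written for this check, not taken from the referenced repositories.

```python
import itertools
import numpy as np

def viterbi(A, B, pi, obs):
    """Return the most likely state path and its joint probability P(Q, O)."""
    N, T = len(pi), len(obs)
    delta = np.zeros((N, T))            # best path probability ending in each state
    psi = np.zeros((N, T), dtype=int)   # best previous state, kept for backtracking
    delta[:, 0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[:, t - 1, None] * A   # scores[i, j] = delta[i, t-1] * A[i, j]
        psi[:, t] = scores.argmax(axis=0)
        delta[:, t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta[:, -1].argmax())]      # most likely final state
    for t in range(T - 1, 0, -1):
        path.append(int(psi[path[-1], t]))   # follow the stored predecessors
    return path[::-1], delta[:, -1].max()

def joint_prob(A, B, pi, states, obs):
    """Joint probability of one state path and the observation sequence."""
    p = pi[states[0]] * B[states[0], obs[0]]
    for t in range(1, len(obs)):
        p *= A[states[t - 1], states[t]] * B[states[t], obs[t]]
    return p

pi = np.array([0, 0.2, 0.8])
A = np.array([[0.3, 0.3, 0.4], [0.1, 0.45, 0.45], [0.2, 0.3, 0.5]])
B = np.array([[1, 0], [0.8, 0.2], [0.3, 0.7]])
obs = [1, 1, 0, 1, 0, 1]   # first six observations of the simulated sequence

path, p_best = viterbi(A, B, pi, obs)
brute = max(itertools.product(range(len(pi)), repeat=len(obs)),
            key=lambda q: joint_prob(A, B, pi, q, obs))
print('viterbi path    :', path)
print('brute-force path:', list(brute))
```

The two paths should have the same joint probability. If several paths tie for the maximum, the two methods may legitimately return different ones.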


<h2>Learning problem: EM algorithm</h2>
