### Introduction
In simpler Markov models (like a Markov chain), the state is directly visible to the observer, so the state transition probabilities are the only parameters. In a hidden Markov model (HMM), the state is not directly visible, but the output (in the form of data, or "tokens" in what follows), which depends on the state, is visible. Each state has a probability distribution over the possible output tokens. Therefore, the sequence of tokens generated by an HMM gives some information about the sequence of states; this is also known as pattern theory. 

The adjective hidden refers to the state sequence through which the model passes, not to the parameters of the model; the model is still referred to as a hidden Markov model even if these parameters are known exactly. 

In other words, you use what you can observe, together with your knowledge of Markov models, to make an educated guess about the hidden part of the model. 

### Description with an Example

Here, let's look at a simple example using two items that are very familiar in probability: dice and bags of colored balls.

The model components, which you’ll use to create the random model, are:  
A six-sided red die.  
A ten-sided black die.  
A red bag with ten balls. Nine balls are red, one is black.   
A black bag with twenty balls. One ball is red, nineteen are black.   

“Black” and “Red” are the two states in this model (in other words, at every step the model is either in the black state or in the red state).
Now create the model by following these steps:

![alt text](https://www.statisticshowto.datasciencecentral.com/wp-content/uploads/2015/09/hidden-markov-model.png)

##### Step 1: 
*EMISSION STEP:* Roll a die and note the number that comes up; this is the emission. In the graphic above, I chose the red die to start (an arbitrary choice; I could have chosen black) and rolled a 2.   
##### Step 2:
*TRANSITION STEP*: Randomly choose a ball from the bag with the color that matches the die you rolled in step 1. I rolled a red die, so I’m going to choose a ball from the red bag. I pulled out a black ball, so I’m going to transition to the black die for the next emission. 

You can then repeat these steps for as many emissions as you like. For example, repeating them 10 times might give you the sequence (2, 3, 6, 1, 1, 4, 5, 3, 4, 1). The process of transitioning from one state to the next is a Markov process. The Markov process itself cannot be observed, only the sequence of numbers rolled on the coloured dice; this is why the arrangement is called a "hidden Markov process".
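The generative process above can be sketched in Python. This is a minimal simulation, assuming fair dice and draws with replacement (the ball goes back into its bag after each draw); the function name `simulate` and its `seed` parameter are illustrative and not part of the original description.

```python
import random

# Ball counts in each bag, from the model description above.
BAGS = {
    "red": {"red": 9, "black": 1},
    "black": {"red": 1, "black": 19},
}
# Faces on each die: the red die is six-sided, the black die ten-sided.
DIE_FACES = {"red": 6, "black": 10}


def simulate(n_steps, start_state="red", seed=None):
    """Return (hidden die colours, observed rolls) for n_steps emissions."""
    rng = random.Random(seed)
    state = start_state
    hidden, observed = [], []
    for _ in range(n_steps):
        # Emission step: roll the die whose colour matches the current state.
        hidden.append(state)
        observed.append(rng.randint(1, DIE_FACES[state]))
        # Transition step: draw a ball (with replacement) from the matching bag;
        # its colour decides which die is rolled next.
        colours = list(BAGS[state])
        weights = [BAGS[state][c] for c in colours]
        state = rng.choices(colours, weights=weights)[0]
    return hidden, observed


hidden, observed = simulate(10, seed=0)
print(observed)  # the sequence of numbers an observer sees
print(hidden)    # the die colours, which stay hidden
```

Only `observed` would be available to someone watching the rolls; `hidden` is the sequence of die colours that the model keeps hidden.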

Transitioning from red to black or black to red carries different probabilities as there are different numbers of black and red balls in the bags. The following diagram shows the probabilities for this particular model, which has two states (black and red):

![alt text](https://www.statisticshowto.datasciencecentral.com/wp-content/uploads/2015/09/TRANSITIONS-2.png)
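The transition probabilities follow directly from the ball counts: the chance of moving from the current state to a given state is the share of balls of that colour in the current state's bag. A small sketch using exact fractions (the helper name `transition_matrix` is hypothetical):

```python
from fractions import Fraction

# Ball counts in each bag, from the model description above.
BAGS = {
    "red": {"red": 9, "black": 1},
    "black": {"red": 1, "black": 19},
}


def transition_matrix(bags):
    """P(next state | current state) = share of balls of that colour in the current bag."""
    matrix = {}
    for current, counts in bags.items():
        total = sum(counts.values())
        matrix[current] = {nxt: Fraction(n, total) for nxt, n in counts.items()}
    return matrix


transitions = transition_matrix(BAGS)
for current, row in transitions.items():
    print(current, {nxt: str(p) for nxt, p in row.items()})
# red {'red': '9/10', 'black': '1/10'}
# black {'red': '1/20', 'black': '19/20'}
```

Each row sums to 1, as it must for a set of transition probabilities out of one state.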

### Structural Architecture

The diagram below shows the general architecture of an instantiated HMM. Each oval shape represents a random variable that can take any of a number of values. The random variable x(t) is the hidden state at time t (for example, in a model with three hidden states, x(t) ∈ { x1, x2, x3 }). The random variable y(t) is the observation at time t (for example, with four possible observations, y(t) ∈ { y1, y2, y3, y4 }). The arrows in the diagram (often called a trellis diagram) denote conditional dependencies. 

![alt text](https://upload.wikimedia.org/wikipedia/commons/8/83/Hmm_temporal_bayesian_net.svg)



From the diagram, the conditional probability distribution of the hidden variable x(t) at time t, given the values of the hidden variable x at all earlier times, depends only on the value of x(t − 1); the values at time t − 2 and before have no influence. This is called the Markov property. Similarly, the value of the observed variable y(t) depends only on the value of the hidden variable x(t) (both at time t). 
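Taken together, these two conditional independence assumptions mean that the joint probability of a hidden state sequence and its observations factorises into one initial term, one transition term per step and one emission term per step. This is the $P(Y \mid X)\,P(X)$ product that appears inside the sum for the probability of an observed sequence:

$$
P\big(x(0), \dots, x(L-1), y(0), \dots, y(L-1)\big) = P\big(x(0)\big)\, P\big(y(0) \mid x(0)\big) \prod_{t=1}^{L-1} P\big(x(t) \mid x(t-1)\big)\, P\big(y(t) \mid x(t)\big)
$$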

In the standard type of hidden Markov model considered here, the state space of the hidden variables is discrete, while the observations themselves can either be discrete (typically generated from a categorical distribution) or continuous (typically from a Gaussian distribution). The parameters of a hidden Markov model are of two types: transition probabilities and emission probabilities (also known as output probabilities). The transition probabilities control the way the hidden state at time t is chosen given the hidden state at time t − 1.

The hidden state space is assumed to consist of one of N possible values, modelled as a categorical distribution. This means that for each of the N possible states that a hidden variable at time t can be in, there is a transition probability from this state to each of the N possible states of the hidden variable at time t + 1, for a total of $N^2$ transition probabilities. The transition probabilities out of any given state must sum to 1, so the N × N matrix of transition probabilities is a Markov (row-stochastic) matrix.

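In addition, for each of the N possible states, there is a set of emission probabilities governing the distribution of the observed variable at a particular time given the state of the hidden variable at that time. The size of this set depends on the nature of the observed variable: for example, if the observed variable is discrete with M possible values, each state has M emission probabilities (summing to 1), which can be collected into an N × M emission matrix. 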
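As a concrete illustration of these parameter sets, the dice-and-bags model from earlier can be written as an initial distribution $\pi$, an $N \times N$ transition matrix and an $N \times M$ emission matrix, with $N = 2$ states and $M = 10$ possible die faces. The sketch below assumes fair dice and an initial distribution that always starts with the red die (as in the graphic); the helper names `dice_hmm` and `is_valid_hmm` are hypothetical.

```python
import numpy as np


def dice_hmm():
    """Parameters of the dice-and-bags model; state 0 = red, state 1 = black."""
    # Initial distribution (assumption): always start with the red die.
    pi = np.array([1.0, 0.0])
    # Transition matrix from the ball counts: row = current die, column = next die.
    A = np.array([[9 / 10, 1 / 10],
                  [1 / 20, 19 / 20]])
    # Emission matrix over the faces 1..10 (column f-1 holds face f), assuming fair dice.
    B = np.zeros((2, 10))
    B[0, :6] = 1 / 6   # the red six-sided die never shows 7-10
    B[1, :] = 1 / 10   # the black ten-sided die
    return pi, A, B


def is_valid_hmm(pi, A, B):
    """Check shapes (N, N x N, N x M) and that every distribution is non-negative and sums to 1."""
    n = A.shape[0]
    shapes_ok = A.shape == (n, n) and pi.shape == (n,) and B.shape[0] == n
    rows = [pi, *A, *B]
    probs_ok = all(np.all(r >= 0) and np.isclose(r.sum(), 1.0) for r in rows)
    return bool(shapes_ok and probs_ok)


pi, A, B = dice_hmm()
print(A.shape, B.shape, is_valid_hmm(pi, A, B))  # (2, 2) (2, 10) True
```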

### Inference Problems
Several inference problems are associated with hidden Markov models, as outlined below. 

#### Probability of an observed sequence
The task is to compute efficiently, given the parameters of the model, the probability of a particular output sequence. This requires summation over all possible state sequences.

The probability of observing a sequence $Y = y(0), y(1), \dots, y(L-1)$ of length $L$ is given by

$$P(Y) = \sum_{X} P(Y \mid X)\, P(X),$$

where the sum runs over all possible hidden-node sequences $X = x(0), x(1), \dots, x(L-1)$.
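To make the cost of this sum concrete, here is a brute-force sketch that enumerates every hidden path. It needs $N^L$ terms, which is what the forward algorithm below avoids. The parameter values are borrowed from the forward-algorithm example further down; the observation sequence `[0, 1, 2]` and the function name are made up for illustration.

```python
import itertools

import numpy as np


def sequence_probability_brute_force(Y, pi, A, B):
    """P(Y) = sum over every hidden path X of P(Y | X) P(X); O(N^L * L) work."""
    N = A.shape[0]
    total = 0.0
    for X in itertools.product(range(N), repeat=len(Y)):
        # P(X): initial probability times one transition probability per step.
        p_x = pi[X[0]]
        for prev, curr in zip(X, X[1:]):
            p_x *= A[prev, curr]
        # P(Y | X): each observation depends only on the hidden state at the same time.
        p_y_given_x = np.prod([B[x, y] for x, y in zip(X, Y)])
        total += p_y_given_x * p_x
    return total


pi = np.array([0.5, 0.5])
A = np.array([[0.54, 0.46], [0.49, 0.51]])
B = np.array([[0.16, 0.26, 0.58], [0.25, 0.28, 0.47]])
print(sequence_probability_brute_force([0, 1, 2], pi, A, B))
```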

#### Filtering
The task is to compute, given the model's parameters and a sequence of observations, the distribution over hidden states of the last hidden variable at the end of the sequence, i.e. to compute ${\displaystyle P(x(t)\ |\ y(1),\dots ,y(t))}$. This task is normally used when the sequence of hidden variables is thought of as the underlying states that a process moves through at a sequence of points of time, with corresponding observations at each point in time. Then, it is natural to ask about the state of the process at the end. 

These problems can be handled efficiently using the forward algorithm. 

### Forward Algorithm
In the forward algorithm (as the name suggests), we use the probabilities computed at the current time step to derive those of the next time step. Because each step reuses the previous one instead of enumerating every hidden path, the algorithm is computationally efficient: $O(N^2 T)$ for $N$ hidden states and $T$ time steps. To make the algorithm recursive, we need to answer the following question:  
Given a sequence of visible states $V^T$, what is the probability that the hidden Markov model is in a particular hidden state $j$ at time step $t$, having emitted the first $t$ visible symbols?  
Written mathematically, this joint probability is

$$
\alpha_j(t) = p\big(v(1), \dots, v(t),\, s(t) = j\big).
$$

First, we will derive the recursion using just probability.

##### Solution using Probabilities
###### When t = 1:
Rewrite the above equation for t = 1:

$
\begin{align}
\alpha_j(1) &= p(v_k(1), s(1) = j) \\
&= p(v_k(1) \mid s(1) = j)\, p(s(1) = j) \\
&= \pi_j\, p(v_k(1) \mid s(1) = j) \\
&= \pi_j\, b_{jk}
\end{align}
$

where $\pi_j = p(s(1) = j)$ is the initial distribution and $b_{jk} = p(v_k \mid s = j)$ is the emission probability of the visible symbol $v_k$ observed at $t = 1$ from hidden state $j$.

##### Generalised Equation
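Let’s generalize the equation now for any time step t+1:

$
\begin{align}
\alpha_j(t+1) &= p\Big(v_k(1) \dots v_k(t+1), s(t+1) = j\Big) \\
&= \color{Blue}{\sum_{i=1}^M} p\Big(v_k(1) \dots v_k(t+1), \color{Blue}{s(t) = i}, s(t+1) = j\Big) \\
&= \sum_{i=1}^M p\Big(v_k(t+1) \mid s(t+1) = j, v_k(1) \dots v_k(t), s(t) = i\Big) \\
&\qquad p\Big(v_k(1) \dots v_k(t), s(t+1) = j, s(t) = i\Big) \\
&= \sum_{i=1}^M p\Big(v_k(t+1) \mid s(t+1) = j, \color{Red}{v_k(1) \dots v_k(t), s(t) = i}\Big) \\
&\qquad p\Big(s(t+1) = j \mid \color{Red}{v_k(1) \dots v_k(t),}\, s(t) = i\Big)\, p\Big(v_k(1) \dots v_k(t), s(t) = i\Big) \\
&= \sum_{i=1}^M p\Big(v_k(t+1) \mid s(t+1) = j\Big)\, p\Big(s(t+1) = j \mid s(t) = i\Big)\, p\Big(v_k(1) \dots v_k(t), s(t) = i\Big) \\
&= \color{DarkRed}{p\Big(v_k(t+1) \mid s(t+1) = j\Big)} \sum_{i=1}^M p\Big(s(t+1) = j \mid s(t) = i\Big)\, \color{Blue}{p\Big(v_k(1) \dots v_k(t), s(t) = i\Big)} \\
&= \color{DarkRed}{b_{jk}} \sum_{i=1}^M a_{ij}\, \color{Blue}{\alpha_i(t)}
\end{align}
$

Here $M$ is the number of hidden states, $a_{ij} = p(s(t+1) = j \mid s(t) = i)$ is the transition probability from state $i$ to state $j$, and $b_{jk}$ is the emission probability of the symbol $v_k$ observed at time $t+1$. The red terms can be dropped from the conditioning because $v_k(t+1)$ depends only on $s(t+1)$, and $s(t+1)$ depends only on $s(t)$ (the Markov property).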


$\alpha_1(t)$ is the joint probability that the first $t$ visible symbols were observed and the system is in hidden state $s_1$ at time $t$. Our aim is to calculate $\alpha$ for all hidden states at every time step, given a sequence of observed values. Consider the [Hmmdata_python dataset](https://github.com/ebi-byte/kt/blob/master/data/Hmmdata_python.xlsx). It has 2 hidden states, A and B (say, sunny and rainy), and three visible (observable) states, 0, 1 and 2 (say, happy, sad and neutral feelings of a person towards the weather). The values of the transition probabilities, emission probabilities and initial distribution are assumed to be the following. V is the visible sequence observed at the different time steps. 

```python
import pandas as pd
import numpy as np

# Requires Hmmdata_python.xlsx (and an Excel reader such as openpyxl) in the working directory.
data = pd.read_excel("Hmmdata_python.xlsx")

V = data['Visible'].values

# Transition Probabilities
a = np.array(((0.54, 0.46), (0.49, 0.51)))

# Emission Probabilities
b = np.array(((0.16, 0.26, 0.58), (0.25, 0.28, 0.47)))

# Equal Probabilities for the initial distribution
initial_distribution = np.array((0.5, 0.5))

def forward(V, a, b, initial_distribution):
    # alpha[t, j] = p(v(1..t), s(t) = j); one row per time step, one column per hidden state
    alpha = np.zeros((V.shape[0], a.shape[0]))
    # Base case: alpha_j(1) = pi_j * b_jk
    alpha[0, :] = initial_distribution * b[:, V[0]]

    for t in range(1, V.shape[0]):
        for j in range(a.shape[0]):
            # Recursion: alpha_j(t) = b_jk * sum_i alpha_i(t-1) * a_ij
            #                  ((1x2) . (2x1))      *     (1)
            #                        (1)            *     (1)
            alpha[t, j] = alpha[t - 1].dot(a[:, j]) * b[j, V[t]]

    return alpha

alpha = forward(V, a, b, initial_distribution)
print(alpha)
```

```
[[8.00000000e-02 1.25000000e-01]
 [2.71570000e-02 2.81540000e-02]
 [1.65069392e-02 1.26198572e-02]
 [8.75653677e-03 6.59378003e-03]
 [2.06946534e-03 2.06943372e-03]
 [3.41045409e-04 5.01841314e-04]
 [1.11817359e-04 1.15589588e-04]
 [3.04252707e-05 3.09082690e-05]
 [1.83133248e-05 1.39866556e-05]]
```
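Because each $\alpha_j(t)$ is a joint probability over a growing number of observations, the values shrink roughly geometrically with $t$ (compare the first and last rows above) and will underflow to zero on long sequences. A common remedy, sketched below as a variant not used in the code above, is to normalise $\alpha$ at every step (the "scaled" forward algorithm). The normalised rows are exactly the filtering distribution $P(s(t) \mid v(1), \dots, v(t))$, and the logarithms of the normalising constants add up to $\log P(V)$. The function name `forward_filter` and the short sequence `V_demo` are illustrative assumptions, since the spreadsheet is not reproduced here.

```python
import numpy as np

# Parameters from the example above; V_demo is a short made-up visible sequence.
a = np.array(((0.54, 0.46), (0.49, 0.51)))
b = np.array(((0.16, 0.26, 0.58), (0.25, 0.28, 0.47)))
initial_distribution = np.array((0.5, 0.5))
V_demo = np.array([0, 2, 1, 1, 0])


def forward_filter(V, a, b, initial_distribution):
    """Scaled forward pass: filtered P(s(t) | v(1..t)) for every t, plus log P(V)."""
    T, N = len(V), a.shape[0]
    filtered = np.zeros((T, N))
    log_likelihood = 0.0
    for t in range(T):
        if t == 0:
            alpha = initial_distribution * b[:, V[0]]
        else:
            # Same recursion as before, applied to the normalised previous row.
            alpha = (filtered[t - 1] @ a) * b[:, V[t]]
        c = alpha.sum()               # c = p(v(t) | v(1..t-1))
        filtered[t] = alpha / c
        log_likelihood += np.log(c)   # log P(V) = sum over t of log c
    return filtered, log_likelihood


filtered, log_likelihood = forward_filter(V_demo, a, b, initial_distribution)
print(filtered)        # each row sums to 1
print(log_likelihood)  # log P(V_demo)
```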


### Smoothing

This is similar to filtering but asks about the distribution of a latent variable somewhere in the middle of a sequence, i.e. to compute $\displaystyle P(x(k)\ |\ y(1),\dots ,y(t))$ for some $\displaystyle k<t$. From the perspective described above, this can be thought of as the probability distribution over hidden states for a point in time k in the past, relative to time t.

The forward-backward algorithm is an efficient method for computing the smoothed values for all hidden state variables. 

### Forward-Backward Algorithm

The forward–backward algorithm is an inference algorithm for hidden Markov models which computes the posterior marginals of all hidden state variables given a sequence of observations/emissions $\displaystyle o_{1:T}:=o_{1},\dots ,o_{T}$, i.e. it computes, for all hidden state variables $\displaystyle X_{t}\in \{X_{1},\dots ,X_{T}\}$, the distribution $\displaystyle P(X_{t}\ |\ o_{1:T})$.

The algorithm makes use of the principle of dynamic programming to efficiently compute the values that are required to obtain the posterior marginal distributions in two passes. The first pass goes forward in time while the second goes backward in time; hence the name forward–backward algorithm.

In the first pass, the forward–backward algorithm computes a set of forward probabilities which provide, for all $\displaystyle t\in \{1,\dots ,T\}$, the probability of ending up in any particular state given the first $\displaystyle t$ observations in the sequence, i.e. $\displaystyle P(X_{t}\ |\ o_{1:t})$. In the second pass, the algorithm computes a set of backward probabilities which provide the probability of observing the remaining observations given any starting point $\displaystyle t$, i.e. $\displaystyle P(o_{t+1:T}\ |\ X_{t})$. These two sets of probability distributions can then be combined to obtain the distribution over states at any specific point in time given the entire observation sequence: 

$$\displaystyle P(X_{t}\ |\ o_{1:T})=P(X_{t}\ |\ o_{1:t},o_{t+1:T})\propto P(o_{t+1:T}\ |\ X_{t})P(X_{t}|o_{1:t})$$

The last step follows from an application of Bayes' rule and the conditional independence of $\displaystyle o_{t+1:T}$ and $\displaystyle o_{1:t}$ given $\displaystyle X_{t}$. 
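In terms of the unnormalised forward quantities $\alpha_i(t) = P(o_{1:t}, X_t = i)$ from the forward algorithm, the backward probabilities satisfy a recursion that runs from the end of the sequence, and the posterior is their normalised product. Written here for a model without an explicit end state, where the base case is $\beta_i(T) = 1$:

$$
\beta_i(t) = P(o_{t+1:T} \mid X_t = i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_j(t+1), \qquad \beta_i(T) = 1
$$

$$
P(X_t = i \mid o_{1:T}) = \frac{\alpha_i(t)\, \beta_i(t)}{\sum_{j=1}^{N} \alpha_j(T)}
$$

Here $a_{ij}$ is the transition probability from state $i$ to state $j$ and $b_j(o_{t+1})$ is the probability of emitting $o_{t+1}$ from state $j$. In the example below, which has an explicit end state E, the base case becomes $\beta_i(T) = a_{iE}$ instead.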

#### Example

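Consider the following hidden Markov model. A person is assumed to be in one of only two states: Healthy or Fever. The observable states are normal, cold and dizzy. The transition probabilities and the emission probabilities are listed below; the transition table also includes a small probability of moving to the end state 'E', which terminates the sequence.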

```python
states = ('Healthy', 'Fever')
end_state = 'E'

observations = ('normal', 'cold', 'dizzy')

start_probability = {'Healthy': 0.6, 'Fever': 0.4}

transition_probability = {
   'Healthy' : {'Healthy': 0.69, 'Fever': 0.3, 'E': 0.01},
   'Fever' : {'Healthy': 0.4, 'Fever': 0.59, 'E': 0.01},
   }

emission_probability = {
   'Healthy' : {'normal': 0.5, 'cold': 0.4, 'dizzy': 0.1},
   'Fever' : {'normal': 0.1, 'cold': 0.3, 'dizzy': 0.6},
   }
```

We can write the implementation of the forward-backward algorithm like this: 

```python
import math

def fwd_bkw(observations, states, start_prob, trans_prob, emm_prob, end_st):
    # forward part of the algorithm
    fwd = []
    f_prev = {}
    for i, observation_i in enumerate(observations):
        f_curr = {}
        for st in states:
            if i == 0:
                # base case for the forward part
                prev_f_sum = start_prob[st]
            else:
                prev_f_sum = sum(f_prev[k]*trans_prob[k][st] for k in states)

            f_curr[st] = emm_prob[st][observation_i] * prev_f_sum

        fwd.append(f_curr)
        f_prev = f_curr

    # total probability of the observations, ending in the end state
    p_fwd = sum(f_curr[k] * trans_prob[k][end_st] for k in states)

    # backward part of the algorithm
    bkw = []
    b_prev = {}
    for i, observation_i_plus in enumerate(reversed(observations[1:]+(None,))):
        b_curr = {}
        for st in states:
            if i == 0:
                # base case for backward part
                b_curr[st] = trans_prob[st][end_st]
            else:
                b_curr[st] = sum(trans_prob[st][l] * emm_prob[l][observation_i_plus] * b_prev[l] for l in states)

        bkw.insert(0,b_curr)
        b_prev = b_curr

    p_bkw = sum(start_prob[l] * emm_prob[l][observations[0]] * b_curr[l] for l in states)

    # merging the two parts
    posterior = []
    for i in range(len(observations)):
        posterior.append({st: fwd[i][st] * bkw[i][st] / p_fwd for st in states})

    # both passes must give the same total probability (compare with a float tolerance)
    assert math.isclose(p_fwd, p_bkw)
    return fwd, bkw, posterior
```

```python
def example():
    return fwd_bkw(observations,
                   states,
                   start_probability,
                   transition_probability,
                   emission_probability,
                   end_state)

for line in example():
    print(*line)
```

```
{'Healthy': 0.3, 'Fever': 0.04000000000000001} {'Healthy': 0.0892, 'Fever': 0.03408} {'Healthy': 0.007518, 'Fever': 0.028120319999999997}
{'Healthy': 0.0010418399999999998, 'Fever': 0.00109578} {'Healthy': 0.00249, 'Fever': 0.00394} {'Healthy': 0.01, 'Fever': 0.01}
{'Healthy': 0.8770110375573259, 'Fever': 0.1229889624426741} {'Healthy': 0.623228030950954, 'Fever': 0.3767719690490461} {'Healthy': 0.2109527048413057, 'Fever': 0.7890472951586943}
```


The three output lines give the forward probabilities, the backward probabilities and the posterior probabilities for each time step. The posterior probabilities are normalised by the total probability of the observation sequence, so that the probabilities of being in the Healthy and Fever states sum to 1 at every time step t.
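The same computation can be expressed with NumPy arrays, where each forward and backward step becomes one matrix-vector product. This is a sketch, assuming states ordered (Healthy, Fever) and symbols ordered (normal, cold, dizzy); the end state E is handled through a separate vector of end probabilities, and the function name `forward_backward_np` is hypothetical.

```python
import numpy as np

# States ordered (Healthy, Fever); symbols ordered (normal, cold, dizzy).
pi = np.array([0.6, 0.4])
A = np.array([[0.69, 0.30],          # transitions between Healthy and Fever only
              [0.40, 0.59]])
end_probs = np.array([0.01, 0.01])   # probability of moving to the end state E
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])


def forward_backward_np(obs_idx, pi, A, B, end_probs):
    """Posterior P(X_t = i | observations) for every t, as a T x N array."""
    T, N = len(obs_idx), len(pi)
    fwd = np.zeros((T, N))
    bkw = np.zeros((T, N))
    fwd[0] = pi * B[:, obs_idx[0]]
    for t in range(1, T):
        fwd[t] = (fwd[t - 1] @ A) * B[:, obs_idx[t]]
    bkw[-1] = end_probs              # base case: move to E after the last observation
    for t in range(T - 2, -1, -1):
        bkw[t] = A @ (B[:, obs_idx[t + 1]] * bkw[t + 1])
    p_obs = fwd[-1] @ end_probs      # total probability of the observation sequence
    return fwd * bkw / p_obs


posterior = forward_backward_np([0, 1, 2], pi, A, B, end_probs)
print(posterior)  # row t: [P(Healthy), P(Fever)] at time step t
```

The first column should match the Healthy posteriors printed by the dictionary-based version above.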

### Most likely Explanation

Consider the following figure.
<img src="https://upload.wikimedia.org/wikipedia/commons/1/13/HMMsequence.svg" style="height: 500px;"/>



The state transition and output probabilities of an HMM are indicated by the line opacity in the upper part of the diagram. Given that we have observed the output sequence in the lower part of the diagram, we may be interested in the most likely sequence of states that could have produced it. Based on the arrows that are present in the diagram, the following state sequences are candidates:  
5 3 2 5 3 2  
4 3 2 5 3 2  
3 1 2 5 3 2  
We can find the most likely sequence by evaluating the joint probability of both the state sequence and the observations for each case (simply by multiplying the probability values, which here correspond to the opacities of the arrows involved). In general, this type of problem (i.e. finding the most likely explanation for an observation sequence) can be solved efficiently using the Viterbi algorithm.
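For a small model, this evaluate-every-candidate approach can be written directly. The figure does not give numeric probabilities, so the sketch below borrows the Healthy/Fever model from the Viterbi example further down and scores every one of the $2^3$ possible state sequences; the function names are hypothetical. Its cost grows exponentially with the sequence length, which is exactly what the Viterbi algorithm avoids.

```python
import itertools

# Model borrowed from the Viterbi example below.
states = ('Healthy', 'Fever')
start_p = {'Healthy': 0.6, 'Fever': 0.4}
trans_p = {'Healthy': {'Healthy': 0.7, 'Fever': 0.3},
           'Fever': {'Healthy': 0.4, 'Fever': 0.6}}
emit_p = {'Healthy': {'normal': 0.5, 'cold': 0.4, 'dizzy': 0.1},
          'Fever': {'normal': 0.1, 'cold': 0.3, 'dizzy': 0.6}}


def joint_probability(path, obs):
    """P(state sequence, observations): start * emission, then transition * emission per step."""
    p = start_p[path[0]] * emit_p[path[0]][obs[0]]
    for prev, curr, o in zip(path, path[1:], obs[1:]):
        p *= trans_p[prev][curr] * emit_p[curr][o]
    return p


def most_likely_explanation(obs):
    """Score every candidate state sequence and keep the best one."""
    candidates = itertools.product(states, repeat=len(obs))
    return max(candidates, key=lambda path: joint_probability(path, obs))


obs = ('normal', 'cold', 'dizzy')
best = most_likely_explanation(obs)
print(best, joint_probability(best, obs))
```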

### Viterbi Algorithm

The Viterbi algorithm is a dynamic programming algorithm for finding the most likely sequence of hidden states—called the Viterbi path—that results in a sequence of observed events, especially in the context of Markov information sources and hidden Markov models (HMM). 

Suppose we are given a hidden Markov model (HMM) with state space $S$, initial probabilities $\pi_i$ of being in state $i$ and transition probabilities $a_{i,j}$ of transitioning from state $i$ to state $j$. Say we observe outputs $y_1, \dots, y_T$. The most likely state sequence ${\displaystyle x_{1},\dots ,x_{T}}$ that produces the observations is given by the recurrence relations

$ {\displaystyle {\begin{array}{rcl}V_{1,k}&=&\mathrm {P} {\big (}y_{1}\ |\ k{\big )}\cdot \pi _{k}\\V_{t,k}&=&\max _{x\in S}\left(\mathrm {P} {\big (}y_{t}\ |\ k{\big )}\cdot a_{x,k}\cdot V_{t-1,x}\right)\end{array}}} $

Here $ {\displaystyle V_{t,k}} $ is the probability of the most probable state sequence $ {\displaystyle \mathrm {P} {\big (}x_{1},\dots ,x_{t},y_{1},\dots ,y_{t}{\big )}} $ responsible for the first $ {\displaystyle t} $ observations that has ${\displaystyle k}$ as its final state. The Viterbi path can be retrieved by saving back pointers that remember which state $ {\displaystyle x} $ was used in the second equation. Let $ {\displaystyle \mathrm {Ptr} (k,t)} $ be the function that returns the value of $ {\displaystyle x} $ used to compute $ {\displaystyle V_{t,k}} $ if $ {\displaystyle t>1} $, or $ {\displaystyle k} $ if $ {\displaystyle t=1}$. Then:

${\displaystyle {\begin{array}{rcl}x_{T}&=&\arg \max _{x\in S}(V_{T,x})\\x_{t-1}&=&\mathrm {Ptr} (x_{t},t)\end{array}}} $

Here we're using the standard definition of arg max.
The complexity of this implementation is $ {\displaystyle O(T\times \left|{S}\right|^{2})} $.

#### Example:
Consider a village where all villagers are either healthy or have a fever and only the village doctor can determine whether each has a fever. The doctor diagnoses fever by asking patients how they feel. The villagers may only answer that they feel normal, dizzy, or cold. 

The doctor believes that the health condition of his patients operates as a discrete Markov chain. There are two states, "Healthy" and "Fever", but the doctor cannot observe them directly; they are hidden from him. On each day, there is a certain chance that the patient will tell the doctor he/she feels "normal", "cold", or "dizzy", depending on their health condition. 

The observations (normal, cold, dizzy) along with a hidden state (healthy, fever) form a hidden Markov model (HMM), and can be represented as follows:

```python
obs = ('normal', 'cold', 'dizzy')
states = ('Healthy', 'Fever')
start_p = {'Healthy': 0.6, 'Fever': 0.4}
trans_p = {
   'Healthy' : {'Healthy': 0.7, 'Fever': 0.3},
   'Fever' : {'Healthy': 0.4, 'Fever': 0.6}
   }
emit_p = {
   'Healthy' : {'normal': 0.5, 'cold': 0.4, 'dizzy': 0.1},
   'Fever' : {'normal': 0.1, 'cold': 0.3, 'dizzy': 0.6}
   }
```

In this piece of code, start_p represents the doctor's belief about which state the HMM is in when the patient first visits (all he knows is that the patient tends to be healthy). The particular probability distribution used here is not the equilibrium one, which (given the transition probabilities) is approximately {'Healthy': 0.57, 'Fever': 0.43}. trans_p represents the change of the health condition in the underlying Markov chain. In this example, there is only a 30% chance that the patient will have a fever tomorrow if he is healthy today. emit_p represents how likely each possible observation (normal, cold or dizzy) is, given the underlying condition (healthy or fever). If the patient is healthy, there is a 50% chance that he feels normal; if he has a fever, there is a 60% chance that he feels dizzy.
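The equilibrium distribution quoted above can be checked numerically: it is the distribution $\pi$ with $\pi P = \pi$ for the transition matrix $P$, i.e. a left eigenvector of $P$ with eigenvalue 1, rescaled to sum to 1. A minimal NumPy sketch (the helper name `stationary_distribution` is hypothetical):

```python
import numpy as np

trans_matrix = np.array([[0.7, 0.3],    # rows/columns ordered (Healthy, Fever)
                         [0.4, 0.6]])


def stationary_distribution(P):
    """Solve pi P = pi with sum(pi) = 1 via the eigenvector of P.T for eigenvalue 1."""
    eigvals, eigvecs = np.linalg.eig(P.T)
    v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    return v / v.sum()


print(stationary_distribution(trans_matrix))  # approximately 4/7 and 3/7
```

Solving by hand gives the same result: $0.3\,\pi_H = 0.4\,\pi_F$, so $\pi_H = 4/7 \approx 0.57$ and $\pi_F = 3/7 \approx 0.43$.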

![alt text](https://upload.wikimedia.org/wikipedia/commons/0/0c/An_example_of_HMM.png)

The patient visits three days in a row and the doctor discovers that on the first day he feels normal, on the second day he feels cold, on the third day he feels dizzy. The doctor has a question: what is the most likely sequence of health conditions of the patient that would explain these observations? This is answered by the Viterbi algorithm. 

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    V = [{}]
    for st in states:
        V[0][st] = {"prob": start_p[st] * emit_p[st][obs[0]], "prev": None}
    # Run Viterbi when t > 0
    for t in range(1, len(obs)):
        V.append({})
        for st in states:
            max_tr_prob = V[t-1][states[0]]["prob"]*trans_p[states[0]][st]
            prev_st_selected = states[0]
            for prev_st in states[1:]:
                tr_prob = V[t-1][prev_st]["prob"]*trans_p[prev_st][st]
                if tr_prob > max_tr_prob:
                    max_tr_prob = tr_prob
                    prev_st_selected = prev_st

            max_prob = max_tr_prob * emit_p[st][obs[t]]
            V[t][st] = {"prob": max_prob, "prev": prev_st_selected}

    for line in dptable(V):
        print(line)
    opt = []
    # The highest probability
    max_prob = max(value["prob"] for value in V[-1].values())
    previous = None
    # Get most probable state and its backtrack
    for st, data in V[-1].items():
        if data["prob"] == max_prob:
            opt.append(st)
            previous = st
            break
    # Follow the backtrack till the first observation
    for t in range(len(V) - 2, -1, -1):
        opt.insert(0, V[t + 1][previous]["prev"])
        previous = V[t + 1][previous]["prev"]

    print('The steps of states are ' + ' '.join(opt) + ' with highest probability of %s' % max_prob)

def dptable(V):
    # Print a table of steps from dictionary
    yield " ".join(("%12d" % i) for i in range(len(V)))
    for state in V[0]:
        yield "%.7s: " % state + " ".join("%.7s" % ("%f" % v[state]["prob"]) for v in V)
```

The function viterbi takes the following arguments: obs is the sequence of observations, e.g. ['normal', 'cold', 'dizzy']; states is the set of hidden states; start_p is the start probability; trans_p are the transition probabilities; and emit_p are the emission probabilities. For simplicity of code, we assume that the observation sequence obs is non-empty, that trans_p[i][j] is defined for all states i, j, and that emit_p[i][o] is defined for every state i and observation o. 

In the running example, the Viterbi algorithm is used as follows: 

```python
viterbi(obs,
        states,
        start_p,
        trans_p,
        emit_p)
```

```
           0            1            2
Healthy: 0.30000 0.08400 0.00588
Fever: 0.04000 0.02700 0.01512
The steps of states are Healthy Healthy Fever with highest probability of 0.01512
```


This reveals that the observations ['normal', 'cold', 'dizzy'] were most likely generated by the states ['Healthy', 'Healthy', 'Fever']. In other words, given the reported symptoms, the patient was most likely healthy both on the first day, when he felt normal, and on the second day, when he felt cold, and then contracted a fever on the third day. 
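The recurrence for $V_{t,k}$ and the back pointers $\mathrm{Ptr}(k,t)$ can also be written with NumPy arrays. The sketch below works with log-probabilities, a common variant (not the code above) that keeps long observation sequences from underflowing; it assumes all probabilities are strictly positive so every logarithm is finite. States are ordered (Healthy, Fever) and symbols (normal, cold, dizzy); the name `viterbi_log` is hypothetical.

```python
import numpy as np


def viterbi_log(obs_idx, pi, A, B):
    """Most likely state path (as indices) and its joint probability, computed in log space."""
    T, N = len(obs_idx), A.shape[0]
    logA, logB = np.log(A), np.log(B)
    V = np.zeros((T, N))               # V[t, k] = log V_{t,k} from the recurrence
    ptr = np.zeros((T, N), dtype=int)  # ptr[t, k] = Ptr(k, t)
    V[0] = np.log(pi) + logB[:, obs_idx[0]]
    ptr[0] = np.arange(N)              # Ptr(k, 1) = k by convention
    for t in range(1, T):
        # scores[x, k] = log V_{t-1,x} + log a_{x,k}; maximise over the previous state x
        scores = V[t - 1][:, None] + logA
        ptr[t] = scores.argmax(axis=0)
        V[t] = scores.max(axis=0) + logB[:, obs_idx[t]]
    # Backtrack: x_T = argmax_x V_{T,x}, then x_{t-1} = Ptr(x_t, t)
    path = [int(V[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.insert(0, int(ptr[t, path[0]]))
    return path, float(np.exp(V[-1].max()))


state_names = ['Healthy', 'Fever']
symbols = ['normal', 'cold', 'dizzy']
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
path, prob = viterbi_log([symbols.index(o) for o in ('normal', 'cold', 'dizzy')], pi, A, B)
print([state_names[i] for i in path], prob)
```

It should recover the same path, Healthy Healthy Fever, with probability 0.01512 up to floating-point rounding.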


### Types

##### Gaussian HMM
Hidden Markov models can model complex Markov processes where the states emit the observations according to some probability distribution. One such example is the Gaussian distribution; in such a hidden Markov model, the output of each state is drawn from a Gaussian distribution. 

Moreover, it can represent even more complex behavior when the output of each state is a mixture of two or more Gaussians. In that case, the probability of generating an observation is the sum, over the Gaussians, of the probability of first selecting one of the Gaussians multiplied by the probability of generating that observation from that Gaussian.
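As a sketch of how continuous emissions plug into the same machinery, the code below replaces the discrete emission matrix with per-state densities: a single Gaussian is just a one-component mixture, and a mixture density is the weighted sum of its component densities. The forward recursion itself is unchanged, but its result is a likelihood density rather than a probability. All parameter values and function names here are hypothetical, chosen only for illustration.

```python
import math

import numpy as np


def gaussian_pdf(y, mean, std):
    """Density of a 1-D Gaussian at y."""
    return math.exp(-0.5 * ((y - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def mixture_pdf(y, weights, means, stds):
    """Sum over components m of P(choose m) * p(y | component m)."""
    return sum(w * gaussian_pdf(y, m, s) for w, m, s in zip(weights, means, stds))


def forward_continuous(ys, pi, A, state_mixtures):
    """Forward recursion with density emissions; returns the joint density p(y_1..y_T)."""
    # B[t, j] = p(y_t | state j), replacing the discrete emission matrix.
    B = np.array([[mixture_pdf(y, *mix) for mix in state_mixtures] for y in ys])
    alpha = pi * B[0]
    for t in range(1, len(ys)):
        alpha = (alpha @ A) * B[t]
    return alpha.sum()


# Hypothetical two-state model: state 0 emits from one Gaussian, state 1 from a mixture.
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
state_mixtures = [
    ((1.0,), (0.0,), (1.0,)),               # (weights, means, stds) for state 0
    ((0.3, 0.7), (-2.0, 3.0), (0.5, 1.5)),  # two-component mixture for state 1
]
print(forward_continuous([0.1, -0.4, 2.7, 3.3], pi, A, state_mixtures))
```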

##### Poisson HMM
Poisson hidden Markov models (PHMM) are special cases of hidden Markov models where a Poisson process has a rate which varies in association with changes between the different states of a Markov model. PHMMs are not necessarily Markovian processes themselves because the underlying Markov chain or Markov process cannot be observed and only the Poisson signal is observed. 
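A Poisson HMM can be simulated in a few lines: the hidden chain evolves as before, and at each time step the current state emits a count drawn from a Poisson distribution whose rate depends on that state. The rates, transition matrix and the function name `simulate_poisson_hmm` below are hypothetical.

```python
import numpy as np


def simulate_poisson_hmm(T, pi, A, rates, seed=None):
    """Sample T hidden states and the Poisson counts they emit."""
    rng = np.random.default_rng(seed)
    n_states = len(pi)
    states = np.empty(T, dtype=int)
    counts = np.empty(T, dtype=int)
    for t in range(T):
        # Hidden Markov chain: initial distribution first, then one row of A per step.
        probs = pi if t == 0 else A[states[t - 1]]
        states[t] = rng.choice(n_states, p=probs)
        # Observation: a count whose Poisson rate depends on the current hidden state.
        counts[t] = rng.poisson(rates[states[t]])
    return states, counts


# Hypothetical parameters: a "quiet" state (rate 1) and a "busy" state (rate 10).
pi = np.array([0.5, 0.5])
A = np.array([[0.95, 0.05],
              [0.10, 0.90]])
rates = np.array([1.0, 10.0])
states, counts = simulate_poisson_hmm(20, pi, A, rates, seed=42)
print(counts)  # only these counts are observed; `states` stays hidden
```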


### Questionnaire

#### 1. Consider two friends, Alice and Bob, who live far apart from each other and who talk together daily over the telephone about what they did that day. Bob is only interested in three activities: walking in the park, shopping, and cleaning his apartment. The choice of what to do is determined exclusively by the weather on a given day. Alice has no definite information about the weather, but she knows general trends. Based on what Bob tells her he did each day, Alice tries to guess what the weather must have been like.  How would Alice go about solving the problem at hand?

![alt text](https://upload.wikimedia.org/wikipedia/commons/4/43/HMMGraph.svg)

[Solutions](https://github.com/ebi-byte/kt/blob/master/supervised_ML/HMM%20Solutions.ipynb)