### Introduction
In simpler Markov models (such as a Markov chain), the state is directly visible to the observer, so the state transition probabilities are the only parameters. In a hidden Markov model (HMM), the state is not directly visible, but the output (data, called a "token" in the following), which depends on the state, is visible. Each state has a probability distribution over the possible output tokens. Therefore, the sequence of tokens generated by an HMM gives some information about the sequence of states; this is also known as pattern theory. 

The adjective hidden refers to the state sequence through which the model passes, not to the parameters of the model; the model is still referred to as a hidden Markov model even if these parameters are known exactly. 

In other words, you use what you know about Markov models to make an educated guess about the structure of a model you cannot observe directly. 

### Description with an Example

Here, let's look at a simple example using two items that are very familiar in probability: dice and bags of colored balls.

The model components, which you’ll use to create the random model, are:  
A six-sided red die.  
A ten-sided black die.  
A red bag with ten balls. Nine balls are red, one is black.   
A black bag with twenty balls. One ball is red, nineteen are black.   

“Black” and “Red” are the two states in this model (in other words, you can be black, or you can be red).
Now create the model by following these steps:

![alt text](https://www.statisticshowto.datasciencecentral.com/wp-content/uploads/2015/09/hidden-markov-model.png)

##### Step 1: 
*EMISSION STEP:* Roll a die. Note the number that comes up. This is the emission. In the above graphic, I chose a red die to start (arbitrary — I could have chosen black) and rolled 2.   
##### Step 2:
*TRANSITION STEP*: Randomly choose a ball from the bag with the color that matches the die you rolled in step 1. I rolled a red die, so I’m going to choose a ball from the red bag. I pulled out a black ball, so I’m going to transition to the black die for the next emission. 

You can repeat these steps for any number of emissions. For example, repeating them 10 times might give the sequence {2,3,6,1,1,4,5,3,4,1}. The process of transitioning from one state to the next is called a Markov process. An observer sees only the sequence of numbers rolled, not the colour of the die that produced each one, so the Markov process itself cannot be observed; this is why the arrangement is called a "hidden Markov process".

Transitioning from red to black or black to red carries different probabilities as there are different numbers of black and red balls in the bags. The following diagram shows the probabilities for this particular model, which has two states (black and red):

![alt text](https://www.statisticshowto.datasciencecentral.com/wp-content/uploads/2015/09/TRANSITIONS-2.png)
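The two steps above can be sketched as a short simulation. This is a minimal illustration rather than a reference implementation: it assumes fair dice and uniformly random draws from each bag, and the names `DIE_FACES`, `BAGS` and `simulate` are made up for this example.

```python
import random

# Emission: number of faces on each die (the state is the die colour).
DIE_FACES = {"red": 6, "black": 10}

# Transition: contents of each bag, taken from the component list above.
BAGS = {
    "red": ["red"] * 9 + ["black"] * 1,
    "black": ["red"] * 1 + ["black"] * 19,
}


def simulate(n_emissions, start_state="red", seed=None):
    """Return (hidden_states, emissions) after n_emissions rolls."""
    rng = random.Random(seed)
    state = start_state
    states, emissions = [], []
    for _ in range(n_emissions):
        states.append(state)
        # Emission step: roll the die of the current colour (assumed fair).
        emissions.append(rng.randint(1, DIE_FACES[state]))
        # Transition step: draw a ball from the bag of the same colour.
        state = rng.choice(BAGS[state])
    return states, emissions


states, emissions = simulate(10, seed=0)
print(emissions)  # the only thing an observer gets to see
print(states)     # the hidden path, printed here for illustration only
```

Drawing from the red bag yields red with probability 9/10 and black with probability 1/10; drawing from the black bag yields red with probability 1/20 and black with probability 19/20.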

### Structural Architecture

The diagram below shows the general architecture of an instantiated HMM. Each oval represents a random variable that can take any of a number of values. The random variable x(t) is the hidden state at time t (for example, x(t) ∈ { x1, x2, x3 } in a model with three hidden states). The random variable y(t) is the observation at time t (for example, y(t) ∈ { y1, y2, y3, y4 } when there are four possible observations). The arrows in the diagram (often called a trellis diagram) denote conditional dependencies. 

![alt text](https://upload.wikimedia.org/wikipedia/commons/8/83/Hmm_temporal_bayesian_net.svg)



From the diagram, it is clear that the conditional probability distribution of the hidden variable x(t) at time t, given the values of the hidden variable x at all earlier times, depends only on the value of the hidden variable x(t − 1); the values at time t − 2 and before have no influence. This is called the Markov property. Similarly, the value of the observed variable y(t) depends only on the value of the hidden variable x(t) (both at time t). 
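Taken together, these two independence statements imply that the joint distribution of states and observations factorises into one term per arrow in the diagram. As a sketch, assuming time is indexed from 1 to $T$:

$
\begin{align}
P\big(x(1), \dots, x(T),\ y(1), \dots, y(T)\big) = P\big(x(1)\big) \prod_{t=2}^{T} P\big(x(t) \mid x(t-1)\big) \prod_{t=1}^{T} P\big(y(t) \mid x(t)\big)
\end{align}
$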

In the standard type of hidden Markov model considered here, the state space of the hidden variables is discrete, while the observations themselves can either be discrete (typically generated from a categorical distribution) or continuous (typically from a Gaussian distribution). The parameters of a hidden Markov model are of two types, transition probabilities and emission probabilities (also known as output probabilities). The transition probabilities control the way the hidden state at time t is chosen given the hidden state at time t-1.

Each hidden variable is assumed to take one of N possible values, modelled as a categorical distribution. This means that for each of the N possible states that a hidden variable at time t can be in, there is a transition probability from this state to each of the N possible states of the hidden variable at time t + 1, for a total of $N^2$ transition probabilities. Note that the transition probabilities out of any given state must sum to 1. Thus, the N × N matrix of transition probabilities is a Markov matrix.

In addition, for each of the N possible states, there is a set of emission probabilities governing the distribution of the observed variable at a particular time given the state of the hidden variable at that time. The size of this set depends on the nature of the observed variable. 
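For the dice-and-bags example, both parameter sets can be written out explicitly. The sketch below assumes fair dice (the text does not say so explicitly), and the names `A`, `B`, `STATES` and `FACES` are chosen for illustration only.

```python
import math

STATES = ["red", "black"]
FACES = list(range(1, 11))  # possible observations: die faces 1..10

# Transition probabilities A[i][j] = P(next state j | current state i),
# read off the bag contents (9 red + 1 black, 1 red + 19 black).
A = {
    "red": {"red": 9 / 10, "black": 1 / 10},
    "black": {"red": 1 / 20, "black": 19 / 20},
}

# Emission probabilities B[i][k] = P(face k | state i), assuming fair dice.
# The six-sided red die can never show 7..10.
B = {
    "red": {k: (1 / 6 if k <= 6 else 0.0) for k in FACES},
    "black": {k: 1 / 10 for k in FACES},
}

# Every row of A and of B is a probability distribution.
for s in STATES:
    assert math.isclose(sum(A[s].values()), 1.0)
    assert math.isclose(sum(B[s].values()), 1.0)

n_transition_params = sum(len(row) for row in A.values())
print(n_transition_params)  # N^2 = 4 for N = 2 states
```

With a discrete observation that takes one of ten values, each state carries ten emission probabilities, so the emission table here has 2 × 10 entries.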

### Inference Problems
Several inference problems are associated with hidden Markov models, as outlined below. 

#### Probability of an observed sequence
The task is to compute efficiently, given the parameters of the model, the probability of a particular output sequence. By definition, this requires summation over all possible state sequences:

The probability of observing a sequence $ Y = y(0), y(1), \dots, y(L-1) $ of length $L$ is given by

$ P(Y) =\sum_{X}P(Y\mid X)P(X) $ where the sum runs over all possible hidden-node sequences $ X = x(0), x(1), ..., x(L-1) $
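The sum can be evaluated literally by enumerating every hidden sequence, which is useful as a check on small inputs even though the number of terms grows as $N^L$. The sketch below uses the dice-and-bags model and assumes a uniform initial distribution and fair dice; `PI`, `A`, `emission` and `sequence_probability_bruteforce` are illustrative names, not part of any library.

```python
from itertools import product

PI = [0.5, 0.5]                  # ASSUMED initial distribution (red, black)
A = [[0.9, 0.1], [0.05, 0.95]]   # transition probabilities from the bags


def emission(state, face):
    """P(face | state) for fair dice: red (0) has 6 faces, black (1) has 10."""
    n_faces = 6 if state == 0 else 10
    return 1.0 / n_faces if 1 <= face <= n_faces else 0.0


def sequence_probability_bruteforce(observations):
    """Sum P(Y | X) P(X) over every hidden sequence X (N**L terms)."""
    total = 0.0
    for path in product(range(len(PI)), repeat=len(observations)):
        # P(X): initial probability times one transition per step.
        p_x = PI[path[0]]
        for prev, cur in zip(path, path[1:]):
            p_x *= A[prev][cur]
        # P(Y | X): one emission probability per time step.
        p_y_given_x = 1.0
        for state, face in zip(path, observations):
            p_y_given_x *= emission(state, face)
        total += p_y_given_x * p_x
    return total


print(sequence_probability_bruteforce([2, 3, 6]))
```

The function expects a non-empty observation list; for two states and ten observations it already visits 1,024 paths.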

#### Filtering
The task is to compute, given the model's parameters and a sequence of observations, the distribution over hidden states of the last hidden variable at the end of the sequence, i.e. to compute $P(x(t) \mid y(1), \dots, y(t))$. This task is normally used when the sequence of hidden variables is thought of as the underlying states that a process moves through at a sequence of points in time, with corresponding observations at each point in time. Then, it is natural to ask about the state of the process at the end. 

These problems can be handled efficiently using the forward algorithm. 
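Both quantities can be read off the forward variable $\alpha_j(t)$ derived in the next section, which is the joint probability of the first $t$ observations and of being in hidden state $j$ at time $t$. As a brief sketch, written in the $x$, $y$ notation of this section and assuming time runs from 1 to $T$:

$
\begin{align}
P\big(y(1), \dots, y(T)\big) &= \sum_{j=1}^{N} \alpha_j(T) \\
P\big(x(t) = j \mid y(1), \dots, y(t)\big) &= \frac{\alpha_j(t)}{\sum_{i=1}^{N} \alpha_i(t)}
\end{align}
$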

#### Forward Algorithm
The forward algorithm, as the name suggests, uses the probabilities computed at the current time step to derive the probabilities at the next time step. This makes it far more efficient than enumerating every state sequence: its cost is $O(N^2 T)$ for $N$ hidden states and $T$ time steps. To make the algorithm recursive, we need to answer the following question:   
Given a sequence of visible states $V^T$, what is the probability of observing its first $t$ symbols and being in a particular hidden state $j$ at time step $t$?  
Written mathematically, the question is easier to work with:  
$ \alpha_j(t) = p\big(v(1) \dots v(t),\ s(t) = j\big) $  
First, we will derive the equation using only the rules of probability.

##### Solution using Probabilities
###### When t = 1:
Rewrite the above equation when t = 1,

$
\begin{align}
\alpha_j(1) &= p(v_k(1), s(1) = j) \\
&= p(v_k(1) \mid s(1) = j)\, p(s(1) = j) \\
&= \pi_j\, p(v_k(1) \mid s(1) = j) \\
&= \pi_j\, b_{jk v(1)}
\end{align}
$ 

where $\pi$ is the initial distribution and $b_{jk v(1)}$ is the emission probability at $t = 1$.
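As a worked instance, take the dice-and-bags model and suppose the first emission is 2. Assume (this is not stated in the example above) that the starting die is equally likely to be red or black, so $\pi_{\text{red}} = \pi_{\text{black}} = 1/2$, and that both dice are fair:

$
\begin{align}
\alpha_{\text{red}}(1) &= \pi_{\text{red}}\, p(2 \mid \text{red}) = \tfrac{1}{2} \cdot \tfrac{1}{6} = \tfrac{1}{12} \\
\alpha_{\text{black}}(1) &= \pi_{\text{black}}\, p(2 \mid \text{black}) = \tfrac{1}{2} \cdot \tfrac{1}{10} = \tfrac{1}{20}
\end{align}
$

Their sum, $\tfrac{1}{12} + \tfrac{1}{20} = \tfrac{2}{15}$, is the probability that the first roll shows a 2 under these assumptions.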

##### Generalised Equation
Let’s generalize the equation now for any time step t+1:

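$
\begin{align}
\alpha_j(t+1) &= p \Big( v_k(1) \dots v_k(t+1), s(t+1) = j \Big) \\
&= \color{Blue}{\sum_{i=1}^N} p\Big(v_k(1) \dots v_k(t+1), \color{Blue}{s(t) = i}, s(t+1) = j \Big) \\
&= \sum_{i=1}^N p\Big(v_k(t+1) \mid s(t+1) = j, v_k(1) \dots v_k(t), s(t) = i\Big) \\
& \qquad p\Big(v_k(1) \dots v_k(t), s(t+1) = j, s(t) = i \Big) \\
&= \sum_{i=1}^N p\Big(v_k(t+1) \mid s(t+1) = j, \color{Red}{v_k(1) \dots v_k(t), s(t) = i}\Big) \\
& \qquad p\Big(s(t+1) = j \mid \color{Red}{v_k(1) \dots v_k(t),} s(t) = i\Big) p\Big(v_k(1) \dots v_k(t), s(t) = i\Big) \\
&= \sum_{i=1}^N p\Big(v_k(t+1) \mid s(t+1) = j\Big) p\Big(s(t+1) = j \mid s(t) = i\Big) p\Big(v_k(1) \dots v_k(t), s(t) = i\Big) \\
&= \color{DarkRed}{p\Big(v_k(t+1) \mid s(t+1) = j\Big)} \sum_{i=1}^N p\Big(s(t+1) = j \mid s(t) = i\Big) \color{Blue}{p\Big(v_k(1) \dots v_k(t), s(t) = i\Big)} \\
&= \color{DarkRed}{b_{jk v(t+1)}} \sum_{i=1}^N a_{ij} \color{Blue}{\alpha_i(t)}
\end{align}
$

Here $N$ is the number of hidden states and $a_{ij} = p(s(t+1) = j \mid s(t) = i)$ is the transition probability. The terms in red drop out because, given the current hidden state, the observation does not depend on anything earlier, and, given $s(t)$, the next state does not depend on past observations.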
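The recursion translates almost line for line into code. The following is a minimal sketch on the dice-and-bags model, assuming a uniform initial distribution and fair dice; `PI`, `A`, `b` and `forward` are illustrative names. It keeps the unnormalised $\alpha$ values, which is fine for short sequences but would need rescaling or log-probabilities to avoid underflow on long ones.

```python
PI = [0.5, 0.5]                  # ASSUMED initial distribution (red, black)
A = [[0.9, 0.1], [0.05, 0.95]]   # a_ij, read off the bag contents
N_FACES = [6, 10]                # red die, black die


def b(j, face):
    """Emission probability of `face` in state j, assuming fair dice."""
    return 1.0 / N_FACES[j] if 1 <= face <= N_FACES[j] else 0.0


def forward(observations):
    """Return one alpha vector per time step for a non-empty sequence."""
    n = len(PI)
    # Base case, t = 1: alpha_j(1) = pi_j * b_j(v(1)).
    alpha = [[PI[j] * b(j, observations[0]) for j in range(n)]]
    # Recursion: alpha_j(t+1) = b_j(v(t+1)) * sum_i a_ij * alpha_i(t).
    for face in observations[1:]:
        prev = alpha[-1]
        alpha.append([
            b(j, face) * sum(prev[i] * A[i][j] for i in range(n))
            for j in range(n)
        ])
    return alpha


obs = [2, 3, 6, 1, 1, 4, 5, 3, 4, 1]   # the example emissions from the text
alpha = forward(obs)
likelihood = sum(alpha[-1])                      # P(v(1), ..., v(T))
filtered = [a / likelihood for a in alpha[-1]]   # P(s(T) = j | v(1..T))
print(likelihood, filtered)
```

Each time step does $N^2$ multiply-adds, which gives the $O(N^2 T)$ cost quoted above. Summing the final $\alpha$ vector gives the sequence probability, and normalising it gives the filtering distribution over the last hidden state.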

