<h1 align="center">Hidden Markov Model and Application to Automatic Speech Recognition</h1>

HMMs are probabilistic sequence classifiers. A sequence classifier is a model whose job is to assign a label or class to each unit in a sequence. Given a sequence of units (words, letters, morphemes, or sentences), an HMM computes a probability distribution over the possible label sequences and chooses the best one.

<img src="http://www.cs.virginia.edu/%7Ehw5x/Course/CS6501-Text-Mining/_site/docs/codes/HMM.PNG" width="400" height="400" />

* The $t_i$s are instances of the hidden or latent variable. 
* The $w_i$s are instances of the tangible 'output' variable.

For example, in a part of speech tagger, the $w_i$s are the words and the $t_i$s are the parts of speech we wish to assign to these words.

**Markov Assumption:** In a k-th order HMM, the current state $t_i$ depends only on the previous k states. For an HMM of order 1,

$$\begin{equation}
P(t_i \mid t_1, t_2, \ldots t_{i-1}) = P(t_i  \mid t_{i-1})
\end{equation}$$

**Output Independence Assumption:** The current output $w_i$ depends only on the current hidden state $t_i$, not on any earlier state or output:

$$\begin{equation}
P(w_i \mid w_1, \ldots, w_{i-1}, t_1, \ldots, t_i) = P(w_i \mid t_i)
\end{equation}$$
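
Taken together, the two assumptions let the joint probability of a tag sequence and a word sequence factor into local terms. The following is a sketch of that factorization for a first-order HMM; the sequence length $n$ and the start symbol $t_0$ (so that $P(t_1 \mid t_0)$ is the probability of starting in $t_1$) are notation assumed here, not defined above.

$$\begin{equation}
P(w_1, \ldots, w_n, t_1, \ldots, t_n) = P(t_1, \ldots, t_n) \, P(w_1, \ldots, w_n \mid t_1, \ldots, t_n) = \prod_{i=1}^{n} P(t_i \mid t_{i-1}) \, P(w_i \mid t_i)
\end{equation}$$

The first factor uses the Markov assumption on the tags, and the second uses output independence (each $w_i$ depends only on its own $t_i$).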

## Components of a Hidden Markov Model

Next, we'll describe the different components of an HMM using the Ice Cream task. 

    Imagine that you are a climatologist in the year 2799 studying the history of global warming. You cannot find any records of the weather for the summer of 2007, but you do find Jason Eisner’s diary, which lists how many ice creams Jason ate every day that summer. Our goal is to use these observations to estimate the temperature every day. 

We’ll simplify this weather task by assuming there are only two kinds of days: cold (C) and hot (H). Also, Jason cares about his health, so he eats only 1 to 3 ice creams a day. So the Eisner task is as follows:

Given a sequence of observations, each an integer from 1 to 3 corresponding to the number of ice creams eaten on a given day, figure out the correct ‘hidden’ sequence of weather states (H or C) which caused Jason to eat the ice cream.

<img src="https://qph.ec.quoracdn.net/main-qimg-8fce62d562ac08766c168507f194956c" />
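
The components of the ice-cream HMM can be written down as plain Python dictionaries. This is a minimal sketch, not a definitive parameterization: only $P(H) = 0.5$, $P(H \mid H) = 0.7$, $P(3 \mid H) = 0.4$ and $P(1 \mid H) = 0.2$ appear in the worked example in this document; every other number, and the names `INITIAL`, `TRANSITION`, `EMISSION`, `is_distribution` and `validate_hmm`, are assumptions chosen for illustration.

```python
# Hidden states and observable symbols for the ice-cream task.
STATES = ("H", "C")   # hot, cold
SYMBOLS = (1, 2, 3)   # ice creams eaten in a day

# Initial state distribution. P(H) = 0.5 is from the worked example;
# P(C) = 0.5 follows because the distribution must sum to 1.
INITIAL = {"H": 0.5, "C": 0.5}

# Transition probabilities A[previous][current].
# P(H | H) = 0.7 is from the worked example; the "C" row is ASSUMED.
TRANSITION = {
    "H": {"H": 0.7, "C": 0.3},
    "C": {"H": 0.4, "C": 0.6},
}

# Emission probabilities B[state][symbol].
# P(3 | H) = 0.4 and P(1 | H) = 0.2 are from the worked example;
# the "C" row is ASSUMED for illustration.
EMISSION = {
    "H": {1: 0.2, 2: 0.4, 3: 0.4},
    "C": {1: 0.5, 2: 0.4, 3: 0.1},
}


def is_distribution(probs, tol=1e-9):
    """Return True if the values are non-negative and sum to 1."""
    values = list(probs.values())
    return all(v >= 0 for v in values) and abs(sum(values) - 1.0) < tol


def validate_hmm(initial, transition, emission):
    """Check that the initial vector and every row of A and B are distributions."""
    return (
        is_distribution(initial)
        and all(is_distribution(transition[s]) for s in initial)
        and all(is_distribution(emission[s]) for s in initial)
    )


print(validate_hmm(INITIAL, TRANSITION, EMISSION))
```

Checking that each row sums to 1 is a cheap way to catch typos before running any of the three HMM problems on the model.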

## The 3 HMM Problems

Hidden Markov Models are characterized by three fundamental problems:

**Problem 1 (Computing Likelihood):** Given an HMM $\lambda = (A, B)$ and an observation sequence O, determine the likelihood $P(O \mid \lambda)$.

**Problem 2 (Decoding):** Given an HMM $\lambda = (A, B)$ and an observation sequence O, find the most likely hidden state sequence Q.

**Problem 3 (Learning):** Given an observation sequence O and the set of states in the HMM, learn the HMM parameters A and B.

## Computing Likelihoods

Let's calculate the probability of the observation sequence {3, 1} under the ice-cream HMM. This is the marginal probability, obtained by summing the joint probability over every possible hidden state sequence Q:

$P(O) = \sum_Q P(O, Q) = \sum_Q P(O \mid Q) \times P(Q) = \sum_Q \left[\prod_{i=1}^T P(o_i \mid q_i) \times \prod_{i=1}^T P(q_i \mid q_{i-1})\right]$

Here $P(q_1 \mid q_0)$ stands for the initial state probability $P(q_1)$.

$P(3, 1) = P(3, 1, H, H) + P(3, 1, H, C) + P(3, 1, C, H) + P(3, 1, C, C)$

$P(3, 1, H, H) = P(3 \mid H) \times P(1 \mid H) \times P(H \mid H) \times P(H) = 0.4 \times 0.2 \times 0.7 \times 0.5 = 0.028$

While computing $P(O \mid \lambda)$ this way is simple, it is expensive: an HMM with T time steps and N values for the hidden state has $N^T$ possible hidden sequences, so this computation takes $O(T \times N^T)$ time.
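
The sum over all hidden sequences can be sketched directly in Python. This is a brute-force illustration, not an efficient method. The values $P(H) = 0.5$, $P(H \mid H) = 0.7$, $P(3 \mid H) = 0.4$ and $P(1 \mid H) = 0.2$ come from the worked term above; the remaining probabilities and the function names `joint_probability` and `brute_force_likelihood` are assumptions made for this example.

```python
from itertools import product

# Parameters: the "C" rows are ASSUMED for illustration; the values
# used in P(3, 1, H, H) above are taken from the text.
INITIAL = {"H": 0.5, "C": 0.5}
TRANSITION = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}
EMISSION = {"H": {1: 0.2, 2: 0.4, 3: 0.4}, "C": {1: 0.5, 2: 0.4, 3: 0.1}}


def joint_probability(observations, state_seq, initial, transition, emission):
    """P(O, Q) for one specific hidden state sequence Q."""
    prob = initial[state_seq[0]] * emission[state_seq[0]][observations[0]]
    for t in range(1, len(observations)):
        prev, cur = state_seq[t - 1], state_seq[t]
        prob *= transition[prev][cur] * emission[cur][observations[t]]
    return prob


def brute_force_likelihood(observations, initial, transition, emission):
    """P(O) by enumerating all N^T hidden sequences."""
    if not observations:
        return 1.0  # the empty sequence has probability 1
    states = list(initial)
    return sum(
        joint_probability(observations, q, initial, transition, emission)
        for q in product(states, repeat=len(observations))
    )


obs = [3, 1]
print(joint_probability(obs, ("H", "H"), INITIAL, TRANSITION, EMISSION))  # about 0.028
print(brute_force_likelihood(obs, INITIAL, TRANSITION, EMISSION))         # about 0.077 with these assumed values
```

The call to `product(states, repeat=T)` is where the $N^T$ blow-up happens: each extra day multiplies the number of sequences by N.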

The dynamic programming solution to this problem is called the Forward Algorithm. It shares partial sums across hidden sequences and runs in $O(N^2 T)$ time.

<img src="https://danieltakeshi.github.io/assets/forward_trellis.png" />
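
A minimal sketch of the Forward Algorithm for the ice-cream HMM follows. At each time step it keeps one value per state, $\alpha_t(s)$, the probability of the observations so far ending in state $s$. As before, only the probabilities used in $P(3, 1, H, H)$ come from the text; the other parameter values and the name `forward_likelihood` are assumptions for illustration.

```python
# Parameters: the "C" rows are ASSUMED for illustration.
INITIAL = {"H": 0.5, "C": 0.5}
TRANSITION = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}
EMISSION = {"H": {1: 0.2, 2: 0.4, 3: 0.4}, "C": {1: 0.5, 2: 0.4, 3: 0.1}}


def forward_likelihood(observations, initial, transition, emission):
    """P(O) computed with the forward recursion in O(N^2 * T) time."""
    if not observations:
        return 1.0  # the empty sequence has probability 1
    states = list(initial)

    # Initialization: alpha_1(s) = P(s) * P(o_1 | s)
    alpha = {s: initial[s] * emission[s][observations[0]] for s in states}

    # Recursion: alpha_t(s) = P(o_t | s) * sum_r alpha_{t-1}(r) * P(s | r)
    for obs in observations[1:]:
        alpha = {
            s: emission[s][obs] * sum(alpha[r] * transition[r][s] for r in states)
            for s in states
        }

    # Termination: sum over the final states
    return sum(alpha.values())


print(forward_likelihood([3, 1], INITIAL, TRANSITION, EMISSION))  # about 0.077 with these assumed values
```

Only the previous column of the trellis is stored, so memory use is O(N) regardless of sequence length.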

## Most Likely State Sequence

<img src="https://danieltakeshi.github.io/assets/backward_probability.png">
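
The decoding problem (Problem 2) is commonly solved with the Viterbi algorithm, which has the same trellis structure as the forward recursion but takes a maximum over previous states instead of a sum. The sketch below is one possible implementation under stated assumptions: the "C" rows of the parameter tables and the function name `viterbi` are illustrative and not taken from this document.

```python
# Parameters: the "C" rows are ASSUMED for illustration.
INITIAL = {"H": 0.5, "C": 0.5}
TRANSITION = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}
EMISSION = {"H": {1: 0.2, 2: 0.4, 3: 0.4}, "C": {1: 0.5, 2: 0.4, 3: 0.1}}


def viterbi(observations, initial, transition, emission):
    """Return (best hidden state path, joint probability of that path).

    Expects a non-empty observation sequence.
    """
    states = list(initial)

    # delta[s]: probability of the best path so far that ends in state s
    delta = {s: initial[s] * emission[s][observations[0]] for s in states}
    backpointers = []  # one dict per time step after the first

    for obs in observations[1:]:
        new_delta, pointers = {}, {}
        for s in states:
            # Best predecessor for s: max instead of the forward sum
            best_prev = max(states, key=lambda r: delta[r] * transition[r][s])
            pointers[s] = best_prev
            new_delta[s] = delta[best_prev] * transition[best_prev][s] * emission[s][obs]
        backpointers.append(pointers)
        delta = new_delta

    # Pick the best final state, then follow backpointers to recover the path
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, delta[last]


best_path, best_prob = viterbi([3, 1], INITIAL, TRANSITION, EMISSION)
print(best_path, best_prob)  # ['H', 'C'] with probability about 0.03 under these assumed values
```

With these assumed parameters, three ice creams followed by one is best explained by a hot day followed by a cold day.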


<img src="speech_example.PNG" />