# CS486 - Artificial Intelligence
## Lesson 23 - Hidden Markov Models 

Markov Chains gave us a way to estimate the probability distribution over a set of random variables over time. **Hidden Markov Models (HMMs)** are Markov Chains that incorporate evidence about the **hidden variables** we cannot directly observe. HMMs have the following components:

* An initial (prior) distribution: $P(X_0)$. Usually uniform.
* A **stationary transition model**: $P(X_t\mid{X_{t-1}})$.
* An **emission model** that gives the probability of seeing evidence in a given state: $P(e_t\mid{X_t})$.


In [6]:
import helpers
from aima.probability import *
from aima.notebook import psource


### The Sad Grad

First, let's walk through the example from the lecture. You are a grad student who never goes outside. The only evidence you have of rain is seeing someone carrying an umbrella. Here is what we are given:

![Sad Grad Models](images/sad_grad_models.png)

Now, here are the first three states if an umbrella is always observed:

![Sad Grad](images/sad_grad.png)

### Time Diminishes Beliefs

If you get no evidence at all (you see nobody, with or without an umbrella), you will eventually return to the 50/50 belief that it might be raining. Each time step without evidence brings your probability vector closer to the stationary distribution of the transition model.

Your current probability vector always includes whatever evidence you've seen to date. At each time step, you compute the new vector from the transition model and the current vector:

$$ P(X_{t+1}\mid{e_{1:t}})=\sum_{x_t}P(X_{t+1}\mid{x_t})P(x_t\mid{e_{1:t}}) $$

It gets cumbersome to carry the evidence everywhere we go, so we'll call a **belief** the probability given all of the evidence to date:

$$ B(X_t) = P(X_t\mid{e_{1:t}}) $$
$$ B'(X_{t+1})=\sum_{x_t}P(X_{t+1}\mid{x_t})B(x_t) $$
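
As a rough illustration, the time update can be sketched in a few lines of Python. The two-state rain/sun model and the 0.7/0.3 transition probabilities are assumptions chosen for illustration (they are not read from the figure), and `elapse_time` is a hypothetical helper name.

```python
# Assumed transition model P(X_{t+1} | x_t): the weather persists with probability 0.7.
TRANSITION = {
    "rain": {"rain": 0.7, "sun": 0.3},
    "sun": {"rain": 0.3, "sun": 0.7},
}


def elapse_time(belief, transition):
    """Return B'(X_{t+1}) = sum over x_t of P(X_{t+1} | x_t) * B(x_t)."""
    return {
        nxt: sum(transition[cur][nxt] * belief[cur] for cur in belief)
        for nxt in belief
    }


# Start from a confident belief and apply only time updates (no evidence).
belief = {"rain": 0.9, "sun": 0.1}
history = [belief]
for _ in range(20):
    belief = elapse_time(belief, TRANSITION)
    history.append(belief)

print(history[1])   # belief after one step
print(history[-1])  # belief after twenty steps
```

Under these assumed numbers, the gap between the belief and 50/50 shrinks by a constant factor at every step, which is the drift toward the stationary distribution described above.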

### Evidence Strengthens Beliefs

Every time someone walks by with an umbrella, your belief that it is raining goes up:

$$ B'(X_{t+1}) = P(X_{t+1}\mid{e_{1:t}}) $$
$$ P(X_{t+1}\mid{e_{1:t+1}}) \propto  P(e_{t+1}\mid{X_{t+1}})P(X_{t+1}\mid{e_{1:t}}) $$

So our new beliefs are:

$$ B(X_{t+1}) \propto P(e_{t+1}\mid{X_{t+1}})B'(X_{t+1}) $$

Multiplying each entry by the likelihood of the evidence leaves values that no longer sum to 1, so you need to normalize to get back to a proper probability distribution.
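
A minimal sketch of the evidence update in Python follows. The emission probabilities (0.9 for an umbrella when it rains, 0.2 when it is sunny) are assumed for illustration rather than taken from the figure, and `observe` is a hypothetical helper name.

```python
# Assumed emission model P(e | X): umbrellas are much more likely when it rains.
EMISSION = {
    "rain": {"umbrella": 0.9, "none": 0.1},
    "sun": {"umbrella": 0.2, "none": 0.8},
}


def observe(predicted, emission, evidence):
    """Return B(X) proportional to P(e | X) * B'(X), normalized to sum to 1."""
    unnormalized = {x: emission[x][evidence] * p for x, p in predicted.items()}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("evidence has zero probability under the predicted belief")
    return {x: value / total for x, value in unnormalized.items()}


predicted = {"rain": 0.5, "sun": 0.5}  # B'(X) before seeing the evidence
posterior = observe(predicted, EMISSION, "umbrella")
print(posterior)
```

With these assumed numbers, the unnormalized values are 0.45 for rain and 0.10 for sun; dividing by their sum, 0.55, raises the belief in rain from 0.5 to about 0.82.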

### Forward Algorithm

The **Forward Algorithm** incorporates both time steps and evidence into one update:

$$P(x_t\mid{e_{1:t}}) \propto_{X} P(e_t\mid{x_t})\sum_{x_{t-1}}P(x_t\mid{x_{t-1}})P(x_{t-1},e_{1:t-1})$$

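Note that the right-hand side is the joint $P(x_t, e_{1:t})$, so the result is not normalized; divide by its sum over $x_t$ to recover $P(x_t\mid{e_{1:t}})$.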
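
The combined update can be sketched in Python as follows. The transition and emission probabilities are assumptions chosen for illustration (not read from the figure), and `forward` is a hypothetical function name. This sketch applies one transition before each observation and normalizes at every step, which yields the same final distribution as normalizing once at the end while keeping the numbers from shrinking toward zero on long sequences.

```python
# Assumed models for a two-state rain/sun HMM (illustrative values only).
TRANSITION = {
    "rain": {"rain": 0.7, "sun": 0.3},
    "sun": {"rain": 0.3, "sun": 0.7},
}
EMISSION = {
    "rain": {"umbrella": 0.9, "none": 0.1},
    "sun": {"umbrella": 0.2, "none": 0.8},
}


def forward(prior, transition, emission, evidence_seq):
    """Filter a sequence of evidence, returning P(X_t | e_{1:t})."""
    belief = dict(prior)
    for evidence in evidence_seq:
        # Time update: push the belief through the transition model.
        predicted = {
            nxt: sum(transition[cur][nxt] * belief[cur] for cur in belief)
            for nxt in belief
        }
        # Evidence update: weight each state by the likelihood of the evidence.
        unnormalized = {x: emission[x][evidence] * predicted[x] for x in predicted}
        total = sum(unnormalized.values())
        belief = {x: value / total for x, value in unnormalized.items()}
    return belief


prior = {"rain": 0.5, "sun": 0.5}
result = forward(prior, TRANSITION, EMISSION, ["umbrella", "umbrella"])
print(result)
```

Under these assumptions, two umbrella sightings in a row push the belief in rain to roughly 0.88.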

### Particle Filtering

It's not always practical to keep track of every possible value a variable can take on in our HMM. Instead, generate a bunch of samples and let each sample cast a vote toward a particular value for the variable.

For example, consider a robot in a 9x9 grid. A particle filter that approximates the robot's location might look like this:

![Particle Filtering](images/particles.png)

Samples are referred to as *particles* and the more particles you have, the more accurate your approximation will be. 

Particle filtering can be divided into four steps:

1. __Initialization__
If we have some idea about the prior probability distribution, we drop the initial particles accordingly, or else we just drop them uniformly over the state space.

2. __Forward pass__: 
Every time step, loop through all our particles and simulate what could happen to each one by sampling its next position from the transition model. Since each particle is a single sample rather than a probability distribution, it can only make one transition, so pick that transition at random according to the transition probabilities. Some particles will move, some may stay.

3. __Reweight__:
Assign weights to each particle according to the evidence:
<br>
$$w(x) = P(e\mid{x})$$
$$B(X) \propto P(e\mid{X})B'(X)$$
<br>

4. __Resample__:
Instead of trying to keep track of weighted samples, we _resample_. Replace the weighted particles with the same number of new particles, placing the new particles according to the weights of the previous particles. A highly probable particle will be replaced by numerous new particles, and unlikely particles might not be replaced at all. Since each new particle carries a weight of $1/n$ toward the probability of its assignment, this is like re-normalizing the distribution.
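
The four steps above can be sketched in Python as follows. This is a minimal sketch, not AIMA's implementation: the two-state rain/sun model and its probabilities are assumptions chosen for illustration, and `sample_from`, `particle_filter_step`, and `estimate` are hypothetical helper names.

```python
import random
from collections import Counter

# Assumed models for a two-state rain/sun HMM (illustrative values only).
STATES = ["rain", "sun"]
TRANSITION = {
    "rain": {"rain": 0.7, "sun": 0.3},
    "sun": {"rain": 0.3, "sun": 0.7},
}
EMISSION = {
    "rain": {"umbrella": 0.9, "none": 0.1},
    "sun": {"umbrella": 0.2, "none": 0.8},
}


def sample_from(dist, rng):
    """Draw one state from a {state: probability} distribution."""
    states = list(dist)
    return rng.choices(states, weights=[dist[s] for s in states], k=1)[0]


def particle_filter_step(particles, evidence, rng):
    """Run one forward pass, reweight, and resample cycle."""
    # Forward pass: each particle samples its own transition.
    moved = [sample_from(TRANSITION[p], rng) for p in particles]
    # Reweight: w(x) = P(e | x).
    weights = [EMISSION[p][evidence] for p in moved]
    # Resample: draw the same number of particles in proportion to the weights.
    return rng.choices(moved, weights=weights, k=len(moved))


def estimate(particles):
    """Approximate the belief: each particle votes with weight 1/n."""
    counts = Counter(particles)
    return {s: counts[s] / len(particles) for s in STATES}


rng = random.Random(0)  # seeded so the run is repeatable
particles = [rng.choice(STATES) for _ in range(5000)]  # Initialization: uniform
for evidence in ["umbrella", "umbrella"]:
    particles = particle_filter_step(particles, evidence, rng)

belief = estimate(particles)
print(belief)
```

With enough particles, the estimate should land close to what exact filtering gives for the same assumed model (about 0.88 for rain after two umbrella sightings); fewer particles make the estimate noisier.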

We can take a look at AIMA's `particle_filtering` function, but it only handles a single state variable with two possible values, which limits its utility.

In [9]:
psource(probability) 
psource(particle_filtering)