# Continuous HMM and change detection

Recall that in the HMM practice session we tried to detect the wet and dry seasons of Singapore in the notebook

* [03_change_detection_with_hidden_markov_models.ipynb](../03/03_change_detection_with_hidden_markov_models.ipynb)

but at that time we did not know how to fit the model parameters. The aim of this exercise session is to complete the task again but now with parameter fitting. 

```python
import numpy as np
import pandas as pd
import scipy.stats as stats
from matplotlib import pyplot as plt
import sklearn

from pandas import Series
from pandas import DataFrame
from typing import Tuple

from tqdm import tnrange#, tqdm_notebook
from plotnine import *

from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering

# Local imports
from common import *
from convenience import *
```

## I. E-step formulae for clustering and HMM

For the clustering, we must assign a weight to each character image by rescaling the rows of the likelihood matrix. More precisely, let $p_{ij}$ be the likelihood that the $i$th image belongs to the $j$th cluster. Then the weights are
\begin{align*}
 w_{ij} = \frac{p_{ij}}{\sum_{k} p_{ik}}\,.
\end{align*}
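The rescaling above amounts to normalising each row of the likelihood matrix. A minimal sketch, assuming the likelihoods are stored in a NumPy array with one row per image and one column per cluster (the function name `normalise_rows` is hypothetical):

```python
import numpy as np

def normalise_rows(p: np.ndarray) -> np.ndarray:
    """Rescale each row of a likelihood matrix so that it sums to one."""
    return p / p.sum(axis=1, keepdims=True)

# Toy likelihood matrix: 2 images, 2 clusters (made-up numbers).
p = np.array([[0.2, 0.6],
              [0.03, 0.01]])
w = normalise_rows(p)
print(w)
```

Note that the second row has much smaller likelihoods overall, yet after rescaling its weights are just as decisive as those of the first row.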
For the HMM we need to compute two marginal probabilities:
* $\gamma_{j}(i)$ – probability that the $i$th internal state is $j$ given the observation vector and HMM parameters.
* $\xi_{jk}(i)$ – probability that the $i$th and $(i+1)$th internal states are $j$ and $k$ given the observation vector and HMM parameters.

More formally,

\begin{align*}
\gamma_{j}(i)&=\Pr[x_i=j|\boldsymbol{y},\boldsymbol{\Theta}]\,,\\
\xi_{jk}(i)&=\Pr[x_i=j, x_{i+1}=k|\boldsymbol{y},\boldsymbol{\Theta}]\,.
\end{align*}
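One common way to obtain both marginals is the forward-backward scheme. The sketch below is only an illustration under stated assumptions: emission likelihoods are precomputed in an array `lik`, and the names `forward_backward`, `trans` and `init` are hypothetical rather than taken from the course code. Each pass is renormalised row by row to avoid numerical underflow.

```python
import numpy as np

def forward_backward(lik, trans, init):
    """
    lik[i, j]   : emission likelihood p(y_i | x_i = j), shape (n, m)
    trans[j, k] : transition probability Pr[x_{i+1} = k | x_i = j]
    init[j]     : initial probability Pr[x_1 = j]
    Returns gamma of shape (n, m) and xi of shape (n - 1, m, m).
    """
    n, m = lik.shape
    fwd = np.zeros((n, m))
    bwd = np.ones((n, m))

    # Forward pass, normalised at every step.
    fwd[0] = init * lik[0]
    fwd[0] /= fwd[0].sum()
    for i in range(1, n):
        fwd[i] = (fwd[i - 1] @ trans) * lik[i]
        fwd[i] /= fwd[i].sum()

    # Backward pass, normalised at every step.
    for i in range(n - 2, -1, -1):
        bwd[i] = trans @ (lik[i + 1] * bwd[i + 1])
        bwd[i] /= bwd[i].sum()

    # Marginal state probabilities.
    gamma = fwd * bwd
    gamma /= gamma.sum(axis=1, keepdims=True)

    # Pairwise marginals: fwd[i, j] * trans[j, k] * lik[i+1, k] * bwd[i+1, k].
    xi = fwd[:-1, :, None] * trans[None, :, :] * (lik[1:] * bwd[1:])[:, None, :]
    xi /= xi.sum(axis=(1, 2), keepdims=True)
    return gamma, xi
```

Because every row is normalised separately, the scaling constants cancel in the final normalisation of `gamma` and `xi`.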

## II. M-step formulae for clustering and HMM

For the clustering and HMM emission probabilities, we can use Gaussian mixtures as discussed in the previous exercise session:

* [03_concepts_behind_expectation_maximisation_algorithm](../07/03_concepts_behind_expectation_maximisation_algorithm.ipynb)
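For the HMM with univariate normal emissions, the emission update reduces to weighted means and standard deviations, with the state marginals acting as weights. A minimal sketch under that assumption (the function name `update_normal_emissions` and the array layout are hypothetical):

```python
import numpy as np

def update_normal_emissions(y, gamma):
    """
    y     : observations, shape (n,)
    gamma : state weights, shape (n, m)
    Returns weighted maximum-likelihood estimates (mu, sigma), each of shape (m,).
    """
    # Normalise the weights of each state so that its column sums to one.
    weights = gamma / gamma.sum(axis=0, keepdims=True)
    mu = weights.T @ y
    var = (weights * (y[:, None] - mu[None, :]) ** 2).sum(axis=0)
    return mu, np.sqrt(var)

# Toy example with hard assignments: two points per state.
y = np.array([0.0, 2.0, 10.0, 12.0])
gamma = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
mu, sigma = update_normal_emissions(y, gamma)
```

In practice a state with near-zero total weight would need special handling, which this sketch omits.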

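For a single observation sequence, the initial state probabilities $\beta_j$ and the HMM transition probabilities $\alpha_{jk}$ can be calculated using the following formulae:

\begin{align*}
\beta_j&=\gamma_{j}(1)\,,\\
\alpha_{jk}&=\frac{\sum_{i} \xi_{jk}(i)}{\sum_i \gamma_j(i)}\,.
\end{align*}

Both sums in the second formula run over all positions $i$ except the last one, since $\xi_{jk}(i)$ is defined only for pairs of consecutive states.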

The generalisation to multiple observation sequences, as in our problem, is straightforward: we compute the sums over all sequences and normalise accordingly.
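The update for several sequences can be sketched as follows. This is an illustration only: it assumes the marginals of each sequence are stored as NumPy arrays in two lists, and the function name `update_transitions` is hypothetical.

```python
import numpy as np

def update_transitions(gammas, xis):
    """
    gammas : list of arrays of shape (n_s, m), one per observation sequence
    xis    : list of arrays of shape (n_s - 1, m, m), one per observation sequence
    Returns the initial probabilities (m,) and the transition matrix (m, m).
    """
    m = gammas[0].shape[1]
    init = np.zeros(m)
    counts = np.zeros((m, m))
    for gamma, xi in zip(gammas, xis):
        init += gamma[0]            # marginal of the first state
        counts += xi.sum(axis=0)    # expected number of j -> k transitions
    init /= init.sum()
    # Row sums of counts equal the summed gamma over all non-final positions.
    trans = counts / counts.sum(axis=1, keepdims=True)
    return init, trans

# Toy marginals for one sequence of length 3 with 2 states (made-up numbers).
xi = np.array([[[0.5, 0.5], [0.0, 0.0]],
               [[0.25, 0.25], [0.25, 0.25]]])
gamma = np.array([[1.0, 0.0], [0.5, 0.5], [0.5, 0.5]])
init, trans = update_transitions([gamma], [xi])
```

Normalising the rows of the summed pairwise marginals is equivalent to dividing by the summed state marginals, provided the two are consistent.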

# Homework

## 2.1 Implement the EM-algorithm for the HMM with normal emissions (<font color='red'>3p</font>)

Implement weight computation for the HMM. The computation of $\gamma_{j}(i)$ has already been done in the HMM exercise session, as this is a marginal state probability given all observations. For the weights $\xi_{jk}(i)$, you need to define a computation scheme that is analogous. Check the consistency of your derivations through the following formula:

\begin{align*}
  \gamma_{j}(i)=\sum_k\xi_{jk}(i)\,.
\end{align*}
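The consistency condition can be verified numerically once both sets of weights are available. A minimal sketch, assuming `gamma` has shape `(n, m)` and `xi` has shape `(n - 1, m, m)` (the function name `marginals_consistent` is hypothetical):

```python
import numpy as np

def marginals_consistent(gamma, xi, tol=1e-8):
    """Check that summing xi over the successor state recovers gamma."""
    # xi is defined for positions 1..n-1 only, so the last row of gamma is skipped.
    return bool(np.allclose(xi.sum(axis=2), gamma[:-1], atol=tol))

# Toy marginals for a sequence of length 3 with 2 states (made-up numbers).
xi = np.array([[[0.5, 0.5], [0.0, 0.0]],
               [[0.25, 0.25], [0.25, 0.25]]])
gamma_ok = np.array([[1.0, 0.0], [0.5, 0.5], [0.5, 0.5]])
gamma_bad = np.array([[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]])
```

A failed check usually points to an indexing or normalisation error in the computation of $\xi_{jk}(i)$.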

Implement the M-step by updating the parameters and then assemble the entire algorithm. As a simple stopping criterion, run a fixed number of 100 iterations. Run the algorithm on the dataset and output the model parameters.
Redo the visual annotations by showing the four types of state assignments as in the notebook
[03_change_detection_with_hidden_markov_models.ipynb](../03/03_change_detection_with_hidden_markov_models.ipynb).

## 2.2 Experiment with the model structure (<font color='red'>3p</font>)

By default, the EM-algorithm adjusts all model parameters. Try different models:

* a model where the transition matrix is fixed and only the emission distributions are learned
* a model where the number of states is three and all parameters are free
* a model with cascading states as described in exercise 3.1 in the notebook [03_change_detection_with_hidden_markov_models.ipynb](../03/03_change_detection_with_hidden_markov_models.ipynb)