# Dynamic Latent Variable Models



### Preliminaries

- Goal 
  - Introduction to dynamic (=temporal) Latent Variable Models, including the Hidden Markov Model and Kalman filter.   
- Materials
  - Mandatory
    - These lecture notes
  - Optional 
    - Bishop pp.605-615 on Hidden Markov Models
    - Bishop pp.635-641 on Kalman filters
    - Faragher (2012), [Understanding the Basis of the Kalman Filter](./files/Faragher-2012-Understanding-the-Basis-of-the-Kalman-Filter.pdf), and [video](http://www.brainshark.com/brainshark/brainshark.net/salesportal/title.aspx?pi=zCJz8Qu7fz0z0) 
    - Minka (1999), [From Hidden Markov Models to Linear Dynamical Systems](./files/Minka-1999-from-HMM-to-LDS.pdf)
      


### Example Problem

<div class="exercise">
- We consider a one-dimensional cart position tracking problem, see  [Faragher 2012](./files/Faragher-2012-Understanding-the-Basis-of-the-Kalman-Filter.pdf).  

- The hidden states are the position $z_t$ and velocity $\dot z_t$. We can apply an external braking force $u_t$. (Noisy) observations are represented by $x_t$.

- The equations of motion are given by
$$\begin{align*}
\begin{bmatrix} z_t \\ \dot z_t\end{bmatrix} &=  \begin{bmatrix} 1 & \Delta t \\ 0 & 1\end{bmatrix} \begin{bmatrix} z_{t-1} \\ \dot z_{t-1}\end{bmatrix} + \begin{bmatrix} (\Delta t)^2/2 \\ \Delta t\end{bmatrix} u_t + \mathcal{N}(0,\Sigma_z) \\
x_t &= H \begin{bmatrix} z_t \\ \dot z_t\end{bmatrix} + \mathcal{N}(0,\sigma_x^2) 
\end{align*}$$

- Infer the position after 10 time steps. 
</div>
<img src="./figures/Faragher-2012-cart-1.png" width="600px">
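
The equations of motion above can be simulated directly. The following is a minimal sketch, not the setup used by Faragher: the values of $\Delta t$, $u_t$, the noise levels and the initial state are hypothetical, $H = [1, 0]$ is assumed (only the position is observed), and $\Sigma_z$ is assumed to be diagonal.

```python
import random


def simulate_cart(n_steps, dt=1.0, u=-0.2, z0=(0.0, 5.0),
                  q_std=(0.05, 0.05), sigma_x=1.0, seed=0):
    """Sample states and observations from the cart model.

    Assumptions (not from the source): H = [1, 0], a constant braking
    force u, and a diagonal Sigma_z with standard deviations q_std.
    """
    rng = random.Random(seed)
    pos, vel = z0
    states, observations = [], []
    for _ in range(n_steps):
        # State transition: [[1, dt], [0, 1]] z + [dt^2/2, dt] u + noise.
        # The right-hand side is evaluated with the previous pos and vel.
        pos, vel = (pos + dt * vel + 0.5 * dt ** 2 * u + rng.gauss(0.0, q_std[0]),
                    vel + dt * u + rng.gauss(0.0, q_std[1]))
        # Observation: x_t = H z_t + noise, with H = [1, 0].
        x = pos + rng.gauss(0.0, sigma_x)
        states.append((pos, vel))
        observations.append(x)
    return states, observations


states, observations = simulate_cart(10)
for t, ((pos, vel), x) in enumerate(zip(states, observations), start=1):
    print(f"t={t:2d}  position={pos:7.3f}  velocity={vel:6.3f}  observed={x:7.3f}")
```

Only the `observed` column would be available in the tracking problem; the position and velocity columns are the hidden states that we want to infer.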


### Dynamical Models

- In this lesson, we consider models where the sequence order of observations matters. 

- Consider the _ordered_ observation sequence $$x^N \triangleq \left(x_1,x_2,\ldots,x_N\right)\,.$$

- We wish to develop a generative model
    $$ p( x^N \,|\, \theta)$$
that 'explains' the time series $x^N$.

- We cannot use the IID assumption $p( x^N  | \theta) = \prod_n p(x_n \,|\, \theta)$, since it ignores the order of the observations. In general, we _can_ use the [**chain rule**](https://en.wikipedia.org/wiki/Chain_rule_(probability)):

$$\begin{align*}
p(x^N) &= p(x_N|x^{N-1}) \,p(x^{N-1}) \\
  &=  p(x_N|x^{N-1}) \,p(x_{N-1}|x^{N-2}) \cdots p(x_2|x_1)\,p(x_1) \\
  &= p(x_1)\prod_{n=2}^N p(x_n\,|\,x^{n-1})
\end{align*}$$

- Generally, we will want to limit the depth of dependencies on previous observations. For example, the $K$th-order linear **Auto-Regressive** (AR) model
    $$\begin{align*}
  p(x_n|x^{n-1}) = \mathcal{N}\left( \sum_{k=1}^K a_k x_{n-k}\,,\sigma^2\,\right)  
    \end{align*}$$
    limits the dependencies to the past $K$ samples.
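
As an illustration, the AR model can be sampled and evaluated in a few lines. This is a sketch under stated assumptions: the function names, the coefficients and the choice to condition on the first $K$ samples (rather than model them) are hypothetical and not part of these notes.

```python
import math
import random


def sample_ar(a, sigma, x_init, n_samples, seed=0):
    """Draw a sequence from a K-th order AR model, K = len(a).

    x_init holds at least K starting samples; a[k] plays the role of a_{k+1}.
    """
    rng = random.Random(seed)
    x = list(x_init)
    while len(x) < n_samples:
        # Mean is the weighted sum of the K most recent samples.
        mean = sum(a_k * x[-1 - k] for k, a_k in enumerate(a))
        x.append(rng.gauss(mean, sigma))
    return x


def ar_log_likelihood(x, a, sigma):
    """Sum of log N(x_n | sum_k a_k x_{n-k}, sigma^2), conditioned on the first K samples."""
    K = len(a)
    total = 0.0
    for n in range(K, len(x)):
        mean = sum(a_k * x[n - 1 - k] for k, a_k in enumerate(a))
        total += (-0.5 * math.log(2.0 * math.pi * sigma ** 2)
                  - (x[n] - mean) ** 2 / (2.0 * sigma ** 2))
    return total


a = [0.6, -0.3]                       # hypothetical AR(2) coefficients
x = sample_ar(a, sigma=0.5, x_init=[1.0, 0.5], n_samples=50)
print("log-likelihood:", ar_log_likelihood(x, a, sigma=0.5))
```

Note that the number of parameters grows linearly with the order $K$: each additional sample of memory costs one more coefficient.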


### State-space Models

- A limitation of AR models is that a deep temporal memory requires a large number of parameters. Can we obtain a deep memory without a high model order?  

- Yes: we can store the past in a set of _latent_ (unobserved) variables $z^N \triangleq \left(z_1,z_2,\dots,z_N\right)$, which are called _state variables_ in dynamical systems.

- A general **state space model** is defined by
$$\begin{align*}
    \text{a state transition model}:&\,\, p(z_n\,|\,z^{n-1}) \,,  \\
    \text{an observation model}:&\,\, p(x_n\,|\,z_n) \,,  \\
    \text{and an initial state}:&\,\, p(z_1) 
\end{align*}$$

- A very common computational assumption is to let state transitions be ruled by a _first-order Markov chain_ as
$$
 p(z_n\,|\,z^{n-1}) = p(z_n\,|\,z_{n-1})
$$

- The Markov assumption leads to the following joint probability distribution for the state-space model:
$$
 p(x^N,z^N) = \underbrace{p(z_1)}_{\text{initial state}} \prod_{n=2}^N \underbrace{p(z_n\,|\,z_{n-1})}_{\text{state transitions}}\,\prod_{n=1}^N \underbrace{p(x_n\,|\,z_n)}_{\text{observations}}
$$

- The Forney-style factor graph for a state-space model:

<img src="./figures/ffg-state-space.png" width="600px">

- <span class="exercise">Exercise: Show that in a state-space model $x_n$ is not a first-order Markov chain in the observations, i.e., show that $$p(x_n\,|\,x_{n-1},x_{n-2}) \neq p(x_n\,|\,x_{n-1})\,.$$</span>



### Hidden Markov Models, Kalman filters etc.

- A **Hidden Markov Model** (HMM) is a state-space model with <span class="emphasis">discrete-valued</span> state variables $z_n$.
  - E.g., $z_n$ is a $K$-dimensional hidden binary 'class indicator' (one-hot coded) with transition probabilities $A_{jk} \triangleq p(z_{nk}=1\,|\,z_{n-1,j}=1)$, or equivalently
  $$p(z_n|z_{n-1}) = \prod_{k=1}^K \prod_{j=1}^K A_{jk}^{z_{n-1,j}z_{nk}}$$
which is usually accompanied by an initial state distribution $\pi_k \triangleq p(z_{1k}=1)$.
  - The classical HMM also has discrete-valued observations, but in practice any (probabilistic) observation model $p(x_n|z_n)$ may be coupled to the hidden Markov chain.

<img src="./figures/Figure13.7.png" width="500px">
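
To make the one-hot notation concrete, the sketch below evaluates the product formula for $p(z_n|z_{n-1})$ and samples a hidden state path. The transition matrix, the initial distribution and the function names are hypothetical examples and are not taken from these notes.

```python
import random


def one_hot(index, K):
    """K-dimensional binary indicator with a single 1 at position index."""
    return [1 if i == index else 0 for i in range(K)]


def transition_prob(A, z_prev, z_next):
    """Evaluate prod_k prod_j A[j][k] ** (z_prev[j] * z_next[k])."""
    p = 1.0
    for j, row in enumerate(A):
        for k, a_jk in enumerate(row):
            # The exponent is 1 only for the active (j, k) pair, else 0.
            p *= a_jk ** (z_prev[j] * z_next[k])
    return p


def sample_states(pi, A, n_steps, seed=0):
    """Sample state indices z_1, ..., z_N from the hidden Markov chain."""
    rng = random.Random(seed)
    K = len(pi)
    state = rng.choices(range(K), weights=pi)[0]
    path = [state]
    for _ in range(n_steps - 1):
        state = rng.choices(range(K), weights=A[state])[0]
        path.append(state)
    return path


# Hypothetical 3-state chain; each row of A sums to one.
A = [[0.8, 0.1, 0.1],
     [0.2, 0.7, 0.1],
     [0.1, 0.3, 0.6]]
pi = [0.5, 0.3, 0.2]

print(transition_prob(A, one_hot(0, 3), one_hot(2, 3)))  # picks out A[0][2]
print(sample_states(pi, A, n_steps=15))
```

The product over all $(j,k)$ pairs simply selects the single entry $A_{jk}$ for which both indicators are active; all other factors equal one.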

- Another well-known state-space model with <span class="emphasis">continuous-valued</span> state variables $z_n$ is the **(Linear) Dynamical System** (LDS), which is defined as

$$\begin{align*}
p(z_n\,|\,z_{n-1}) &= \mathcal{N}\left(\, A z_{n-1}\,,\,\Sigma_z\,\right) \\ 
p(x_n\,|\,z_n) &= \mathcal{N}\left(\, C z_n\,,\,\Sigma_x\,\right) \\
p(z_1) &= \mathcal{N}\left(\, \mu_1\,,\,\Sigma_1\,\right)
\end{align*}$$
<!---or, equivalently (in the usual state-space notation)
$$\begin{align*}
z_k &= A z_{k-1} + \mathcal{N}\left(0,\Sigma_z \right) \\ 
x_k &= C z_k + \mathcal{N}\left( 0, \Sigma_x \right) \\
z_1 &= \mu_1 + \mathcal{N}\left( 0, \Sigma_1\right)
\end{align*}$$
--->
- Technically, a [**Kalman filter**](https://en.wikipedia.org/wiki/Kalman_filter) is the solution to the recursive estimation (=inference) of the hidden state in an LDS, i.e., Kalman filtering computes the posterior $p(z_n\,|\,x^n)$ recursively as new observations arrive.

- Kalman filtering and hidden Markov models (and variants thereof) are at the basis of a wide range of complex information processing systems, such as speech and language recognition, robotics and automatic car navigation, and even processing of DNA sequences.    
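
For reference, the recursion that Kalman filtering computes for the LDS can be sketched in the notation of this section. The symbols $m_n$, $V_n$, $\hat m_n$, $P_n$ and $K_n$ are introduced here for illustration only and follow the standard textbook form; $K_n$ (the Kalman gain) is unrelated to the dimension $K$ used for the HMM. Writing the filtered posterior as $p(z_n\,|\,x^n) = \mathcal{N}(z_n\,|\,m_n, V_n)$, and starting from $\hat m_1 = \mu_1$, $P_1 = \Sigma_1$:

$$\begin{align*}
\hat m_n &= A\, m_{n-1}\,, \quad P_n = A V_{n-1} A^T + \Sigma_z \quad (n \geq 2) \\
K_n &= P_n C^T \left( C P_n C^T + \Sigma_x \right)^{-1} \\
m_n &= \hat m_n + K_n \left( x_n - C \hat m_n \right) \\
V_n &= \left( I - K_n C \right) P_n
\end{align*}$$

The first line is the prediction step (propagate the previous posterior through the state transition model); the remaining lines are the update step (correct the prediction with the new observation $x_n$).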

### Message Passing in State-space Models

- Once the (state-space) models have been specified, we can define state and parameter estimation problems as inference tasks on the generative model. 

- In principle, for linear Gaussian models these inference tasks can be solved analytically, see e.g. [Faragher, 2012](./files/Faragher-2012-Understanding-the-Basis-of-the-Kalman-Filter.pdf).
  - These derivations quickly become quite laborious.

- Alternatively, we could specify the generative model in a (Forney-style) factor graph and use automated message passing to infer the posteriors over the hidden variables. E.g., the message passing schedule for Kalman filtering looks like this: 

<img src="./figures/ffg-state-space-with-state-estimation.png" width="600">

### Example Problem Revisited


<img src="./figures/Faragher-2012-cart-2.png" width="600">
<img src="./figures/Faragher-2012-cart-3.png" width="600">
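
As a plain-Python illustration of state estimation for the cart tracking problem, the sketch below runs the standard Kalman predict/update recursion on simulated data. It does not use ForneyLab or message passing on a factor graph. All numerical values are hypothetical, $H = [1, 0]$ is assumed (only the position is observed), and the prior is placed on the state just before the first time step.

```python
import random


def matmul(X, Y):
    """Product of two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def add(X, Y, sign=1.0):
    """Elementwise X + sign * Y."""
    return [[x + sign * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def kalman_step(m, P, x, u, F, B, H, Q, r):
    """One predict/update step; returns the filtered mean and covariance.

    m is a 2x1 column, P is 2x2, x and u are scalars and r is the
    observation noise variance. H is 1x2, so the innovation variance s
    is a scalar and no matrix inverse is needed.
    """
    # Predict: push the previous posterior through the equations of motion.
    m_pred = add(matmul(F, m), [[b[0] * u] for b in B])
    P_pred = add(matmul(matmul(F, P), transpose(F)), Q)
    # Update: correct the prediction with the new observation x.
    s = matmul(matmul(H, P_pred), transpose(H))[0][0] + r
    K = [[row[0] / s] for row in matmul(P_pred, transpose(H))]
    residual = x - matmul(H, m_pred)[0][0]
    m_new = add(m_pred, [[k[0] * residual] for k in K])
    P_new = add(P_pred, matmul(matmul(K, H), P_pred), sign=-1.0)
    return m_new, P_new


# Hypothetical model parameters (not taken from Faragher's paper).
dt, u, sigma_x = 1.0, -0.2, 1.0
F = [[1.0, dt], [0.0, 1.0]]
B = [[0.5 * dt ** 2], [dt]]
H = [[1.0, 0.0]]                       # assumption: only position is observed
Q = [[0.01, 0.0], [0.0, 0.01]]         # assumed diagonal Sigma_z

rng = random.Random(1)
z = [[0.0], [5.0]]                     # true (hidden) initial state
m = [[0.0], [0.0]]                     # vague prior mean
P = [[10.0, 0.0], [0.0, 10.0]]         # vague prior covariance

for t in range(1, 11):
    # Simulate the true cart and a noisy position measurement.
    z = add(matmul(F, z), [[b[0] * u + rng.gauss(0.0, 0.1)] for b in B])
    x = z[0][0] + rng.gauss(0.0, sigma_x)
    m, P = kalman_step(m, P, x, u, F, B, H, Q, sigma_x ** 2)
    print(f"t={t:2d}  observed={x:7.3f}  estimate={m[0][0]:7.3f} "
          f"+/- {P[0][0] ** 0.5:.3f}  true={z[0][0]:7.3f}")
```

After the last iteration, `m` and `P` hold the mean and covariance of the posterior over position and velocity after 10 time steps. The posterior position variance is always smaller than the measurement variance $\sigma_x^2$, since the filter fuses the prediction with the observation.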

### Extensions

<img src="./figures/fig-generative-Gaussian-models.png" width="550px">
