# 6. Sequence Modelling

## 6.1. Markov Models

* **1st Order**

>$$y_{t+1} \perp y_{1:t-1}|y_t$$

>$$p(y_{1:T}) = p(y_1)p(y_2|y_1)p(y_3|y_2)...p(y_T|y_{T-1})$$

* **2nd Order**

>$$p(y_{1:T}) = p(y_1)p(y_2|y_1)p(y_3|y_2,y_1)...p(y_T|y_{T-1},y_{T-2})$$

## 6.2. N-gram Models (discrete data)

* **Bi-gram** (1st Order)

>* Discrete States: $y_t \in \{1,...,K\}$
>* Initial state probabilities: $p(y_1=k)=\pi_k^0$
>* Transition probabilities: $p(y_t=k|y_{t-1}=l)=T_{k,l}$, with each column normalised: $\sum^K_{k=1} T_{k,l} = 1$

* **Marginal Distribution**

>$$p(y_2=k)=\sum^K_{l=1} p(y_2=k|y_1=l)p(y_1=l)=\sum^K_{l=1} T_{k,l} \pi_l^0$$
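
In matrix form the marginal is just a matrix-vector product, $p(y_2) = T\pi^0$. A minimal sketch, assuming NumPy and a made-up 3-state transition matrix (the numbers are illustrative, not from these notes):

```python
import numpy as np

# Hypothetical 3-state chain. T[k, l] = p(y_t = k | y_{t-1} = l),
# so every COLUMN of T sums to 1.
T = np.array([[0.90, 0.2, 0.1],
              [0.05, 0.7, 0.3],
              [0.05, 0.1, 0.6]])
pi0 = np.array([1.0, 0.0, 0.0])  # start in state 0 with certainty

assert np.allclose(T.sum(axis=0), 1.0)

# p(y_2 = k) = sum_l T[k, l] * pi0[l]
p_y2 = T @ pi0
print(p_y2)
```

Because the chain starts in state 0 with certainty, `p_y2` is simply the first column of `T`.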

* **Stationary Distribution**: eigenvector of $T$ with eigenvalue 1, normalised so that $\sum_k \pi^\infty_k = 1$

>\begin{align}
p(y_t=k) &= \sum^K_{l=1} p(y_t=k|y_{t-1}=l)p(y_{t-1}=l)\\
\pi^\infty_k &= \sum^K_{l=1} T_{k,l} \pi^\infty_l
\end{align}
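
The fixed-point condition $\pi^\infty = T\pi^\infty$ can be solved numerically with an eigendecomposition. A minimal sketch, assuming NumPy; the helper name `stationary_distribution` and the 2-state matrix are illustrative choices, not part of these notes:

```python
import numpy as np

def stationary_distribution(T):
    """Eigenvector of T with eigenvalue closest to 1, rescaled to sum to 1.

    Assumes T[k, l] = p(y_t = k | y_{t-1} = l) (columns sum to 1) and that
    the stationary distribution is unique.
    """
    eigvals, eigvecs = np.linalg.eig(T)
    idx = np.argmin(np.abs(eigvals - 1.0))
    v = np.real(eigvecs[:, idx])
    return v / v.sum()  # fixes both the scale and the sign

T = np.array([[0.9, 0.5],
              [0.1, 0.5]])
pi_inf = stationary_distribution(T)
print(pi_inf)  # analytically [5/6, 1/6]
```

For this matrix the balance condition $0.1\,\pi^\infty_1 = 0.5\,\pi^\infty_2$ gives $\pi^\infty = (5/6, 1/6)$, which the eigenvector reproduces.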

* **Tri-gram** (2nd Order)

>$$p(y_t=k|y_{t-1}=l, y_{t-2}=m)=T_{k,l,m}$$
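
The tri-gram table is a $K \times K \times K$ tensor that is normalised over its first index. A rough sketch of how it could be used to get $p(y_3)$, assuming NumPy and randomly generated (hypothetical) tables `T3`, `T2` and `pi0`:

```python
import numpy as np

K = 3
rng = np.random.default_rng(0)

# Hypothetical tri-gram tensor: T3[k, l, m] = p(y_t = k | y_{t-1} = l, y_{t-2} = m).
T3 = rng.random((K, K, K))
T3 /= T3.sum(axis=0, keepdims=True)  # normalise over the predicted state k

# Hypothetical bi-gram table and initial distribution for the first two steps.
T2 = rng.random((K, K))
T2 /= T2.sum(axis=0, keepdims=True)
pi0 = np.full(K, 1.0 / K)

# Joint over the first pair: joint[l, m] = p(y_2 = l, y_1 = m).
joint = T2 * pi0[np.newaxis, :]

# p(y_3 = k) = sum_{l, m} T3[k, l, m] * joint[l, m]
p_y3 = np.einsum("klm,lm->k", T3, joint)
print(p_y3, p_y3.sum())
```

Note that the table has $K^2(K-1)$ free parameters, so its size grows quickly with the order of the model.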

## 6.3. AR Gaussian Models (continuous data)

* **Multivariate Gaussian** ($y \in \mathbb{R}^D$)

>\begin{align}
\mathcal{G}(y;\mu,\Sigma) &= \frac{1}{(2\pi)^{D/2} (\det{\Sigma})^{1/2}} \exp \left( -\frac{1}{2} (y-\mu)^T \Sigma^{-1} (y-\mu) \right)
\end{align}

* **AR Models**

>\begin{align}
p(y_1) &= \mathcal{G}(y_1;\mu_0, \Sigma_0) \\
p(y_t|y_{t-1}) &= \mathcal{G}(y_t;\Lambda y_{t-1}, \Sigma) \Rightarrow \textbf{AR(1)}\\
p(y_t|y_{t-1},y_{t-2}) &= \mathcal{G}(y_t;\Lambda_1 y_{t-1}+\Lambda_2 y_{t-2}, \Sigma)  \Rightarrow \textbf{AR(2)}
\end{align}
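
Sampling from the AR(1) model follows the two densities directly: draw $y_1$ from the initial Gaussian, then draw each $y_t$ from a Gaussian centred at $\Lambda y_{t-1}$. A minimal sketch, assuming NumPy; the function name `sample_ar1` and the parameter values are hypothetical:

```python
import numpy as np

def sample_ar1(Lambda, Sigma, mu0, Sigma0, n_steps, rng):
    """Draw one trajectory y_1, ..., y_T from a multivariate AR(1) model."""
    D = mu0.shape[0]
    y = np.empty((n_steps, D))
    y[0] = rng.multivariate_normal(mu0, Sigma0)            # p(y_1)
    for t in range(1, n_steps):
        y[t] = rng.multivariate_normal(Lambda @ y[t - 1], Sigma)  # p(y_t | y_{t-1})
    return y

# Illustrative 2-D parameters (not from these notes).
Lambda = np.array([[0.9, 0.1],
                   [0.0, 0.8]])
Sigma = 0.1 * np.eye(2)
mu0, Sigma0 = np.zeros(2), np.eye(2)

traj = sample_ar1(Lambda, Sigma, mu0, Sigma0, n_steps=50,
                  rng=np.random.default_rng(0))
print(traj.shape)
```

An AR(2) sampler would look the same, with the mean replaced by $\Lambda_1 y_{t-1} + \Lambda_2 y_{t-2}$ and an extra rule for drawing $y_2$.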

* **Stationary Distribution** (scalar AR(1); exists only for $|\lambda|<1$)

>\begin{align}
y_t &= \lambda y_{t-1} + \sigma \epsilon_t \;\;\;,\;\;\; \epsilon_t \sim \mathcal{G}(0,1) \\
\langle y_t \rangle &= \lambda \langle y_{t-1} \rangle + \sigma \langle \epsilon_t \rangle \\
\langle y^2_t \rangle &= \lambda^2 \langle y^2_{t-1} \rangle + 2\lambda\sigma \langle y_{t-1} \epsilon_t \rangle + \sigma^2 \langle \epsilon^2_t \rangle \\
&= \lambda^2 \langle y^2_{t-1} \rangle + \sigma^2
\end{align}

>$$\mu_\infty=0 \;\;\;,\;\;\; \sigma_\infty^2 = \frac{\sigma^2}{1-\lambda^2}$$
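
The stationary variance is the fixed point of the second-moment recursion $\langle y_t^2 \rangle = \lambda^2 \langle y_{t-1}^2 \rangle + \sigma^2$. A small numerical check, with illustrative values for $\lambda$ and $\sigma$ and assuming $|\lambda| < 1$ so that the recursion converges:

```python
lam, sigma = 0.8, 0.5   # illustrative values with |lam| < 1

mean = 3.0              # arbitrary starting mean
second_moment = 4.0 + mean**2  # arbitrary starting second moment <y^2>

for _ in range(200):
    # The second moment must be updated before the mean is overwritten.
    second_moment = lam**2 * second_moment + 2 * lam * 0.0 + sigma**2  # <y eps> = 0
    mean = lam * mean   # <y_t> = lam * <y_{t-1}>, since <eps_t> = 0

var_inf = sigma**2 / (1 - lam**2)
print(mean, second_moment, var_inf)
```

Both quantities forget their starting values at a geometric rate, $\lambda^t$ for the mean and $\lambda^{2t}$ for the second moment, which is why no stationary distribution exists when $|\lambda| \geq 1$.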

## 6.4. HMM (discrete hidden state)

* **Discrete Hidden State**

>$$x_t \in \{1,...,K\} \;\;\;,\;\;\; p(x_1=k)=\pi_k^0 \;\;\;,\;\;\; p(x_t=k|x_{t-1}=l)=T_{k,l} $$

* **Discrete Observed State**

>$$p(y_t=l|x_t=k) = S_{l,k}$$

* **Continuous Observed State**

>\begin{align}
p(y_t|x_t=k) &= \mathcal{G}(y_t;\mu_k,\Sigma_k) \\
p(y_1) &= {\sum}_k \pi_k^0 \mathcal{G}(y_1;\mu_k,\Sigma_k)
\end{align}
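
The generative process of an HMM with Gaussian emissions can be sketched by ancestral sampling: draw the hidden state from the Markov chain, then draw the observation from the Gaussian attached to that state. The sketch below assumes NumPy and, for brevity, scalar observations with standard deviations `sigmas` in place of the covariances $\Sigma_k$; the function name and all parameter values are hypothetical:

```python
import numpy as np

def sample_gaussian_hmm(pi0, T, mus, sigmas, n_steps, rng):
    """Sample hidden states x_t and 1-D observations y_t from a Gaussian HMM.

    T[k, l] = p(x_t = k | x_{t-1} = l), so the next-state distribution
    given x_{t-1} = l is the column T[:, l].
    """
    K = len(pi0)
    x = np.empty(n_steps, dtype=int)
    y = np.empty(n_steps)
    x[0] = rng.choice(K, p=pi0)
    y[0] = rng.normal(mus[x[0]], sigmas[x[0]])
    for t in range(1, n_steps):
        x[t] = rng.choice(K, p=T[:, x[t - 1]])
        y[t] = rng.normal(mus[x[t]], sigmas[x[t]])
    return x, y

pi0 = np.array([0.5, 0.5])
T = np.array([[0.9, 0.5],
              [0.1, 0.5]])
mus, sigmas = np.array([0.0, 3.0]), np.array([1.0, 0.5])

x, y = sample_gaussian_hmm(pi0, T, mus, sigmas, n_steps=100,
                           rng=np.random.default_rng(0))
print(x[:10], y[:3])
```

Marginalising out $x_1$ in this process is what turns $p(y_1)$ into a mixture of Gaussians with weights $\pi^0_k$.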

* **Convergence of $p(y_t)$**

>\begin{align}
\pi^\infty_k &= \sum^K_{l=1} T_{k,l} \pi^\infty_l \\
p(y_t) &= {\sum}_k p(y_t|x_t=k)p(x_t=k) \rightarrow {\sum}_k \pi^\infty_k \mathcal{G}(y_t;\mu_k,\Sigma_k)
\end{align}
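
The convergence can be checked numerically by propagating the hidden-state marginal, $p(x_t) = T\,p(x_{t-1})$, and plugging the result into the mixture. A minimal sketch, assuming NumPy, scalar observations, and made-up parameters; `gauss_pdf` and `marginal_density` are hypothetical helper names:

```python
import numpy as np

def gauss_pdf(y, mu, sigma):
    """Density of a 1-D Gaussian with mean mu and standard deviation sigma."""
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def marginal_density(y, weights, mus, sigmas):
    """p(y_t) = sum_k p(x_t = k) * G(y_t; mu_k, sigma_k^2)."""
    return sum(w * gauss_pdf(y, m, s) for w, m, s in zip(weights, mus, sigmas))

T = np.array([[0.9, 0.5],
              [0.1, 0.5]])
mus, sigmas = [0.0, 3.0], [1.0, 1.0]

pi_t = np.array([0.0, 1.0])   # start far from stationarity
for _ in range(100):
    pi_t = T @ pi_t           # p(x_t) = T p(x_{t-1})

print(pi_t)                   # approaches [5/6, 1/6]
print(marginal_density(0.0, pi_t, mus, sigmas))
```

The speed of convergence is set by the second-largest eigenvalue of $T$ in magnitude (0.4 for this matrix).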

## 6.5. HMM (continuous hidden state)

* **Continuous Hidden State**

>$$x_t \in \mathbb{R}^K \;\;\;,\;\;\; p(x_t|x_{t-1})=\mathcal{G}(x_t;Ax_{t-1},Q)$$

* **Continuous Observed State**

>$$y_t \in \mathbb{R}^D \;\;\;,\;\;\; p(y_t|x_t) = \mathcal{G}(y_t;Cx_t,R)$$

>$$p(x_{1:T},y_{1:T}) = p(x_1)p(y_1|x_1)\prod^T_{t=2} p(x_t|x_{t-1})p(y_t|x_t)$$
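
The transition and emission densities above define a generative process that is easy to simulate. A minimal sketch, assuming NumPy and a Gaussian prior $x_1 \sim \mathcal{G}(m_0, P_0)$; the prior, the function name `sample_lgssm`, and the parameter values are assumptions, since the notes do not specify $p(x_1)$:

```python
import numpy as np

def sample_lgssm(A, Q, C, R, m0, P0, n_steps, rng):
    """Sample (x_{1:T}, y_{1:T}) from a linear Gaussian state space model."""
    K, D = A.shape[0], C.shape[0]
    x = np.empty((n_steps, K))
    y = np.empty((n_steps, D))
    x[0] = rng.multivariate_normal(m0, P0)                 # assumed prior p(x_1)
    y[0] = rng.multivariate_normal(C @ x[0], R)            # p(y_1 | x_1)
    for t in range(1, n_steps):
        x[t] = rng.multivariate_normal(A @ x[t - 1], Q)    # p(x_t | x_{t-1})
        y[t] = rng.multivariate_normal(C @ x[t], R)        # p(y_t | x_t)
    return x, y

# Illustrative model: 2-D hidden state (position, velocity), 1-D observation.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
Q = 0.01 * np.eye(2)
C = np.array([[1.0, 0.0]])
R = np.array([[0.5]])
m0, P0 = np.zeros(2), np.eye(2)

xs, ys = sample_lgssm(A, Q, C, R, m0, P0, n_steps=30,
                      rng=np.random.default_rng(0))
print(xs.shape, ys.shape)
```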

* **Distributional Estimates**

><img src = 'images\image08.png' width=400>

* **Point Estimates**

>\begin{align}
x^*_t &= \underset{x_t}{\operatorname{argmax}} p(x_t|y_{1:T}) \\
x'_{1:T} &= \underset{x_{1:T}}{\operatorname{argmax}} p(x_{1:T}|y_{1:T}) \\
x^*_{1:T} &= x'_{1:T} \text{ for Linear Gaussian State Space Models}
\end{align}

## 6.6. Kalman Filter

><img src = 'images\image09.png' width=400>

><img src = 'images\image10.png' width=400>

><img src = 'images\image11.png' width=400>
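
For reference, the standard textbook predict/update recursion for the model of Section 6.5 can be sketched as follows. This is not a transcription of the figures above; it assumes NumPy and a Gaussian prior $x_1 \sim \mathcal{G}(m_0, P_0)$, and the function name `kalman_filter` is a hypothetical choice:

```python
import numpy as np

def kalman_filter(y, A, Q, C, R, m0, P0):
    """Filtered means and covariances of p(x_t | y_{1:t}) for t = 1..T.

    Model: x_t ~ G(A x_{t-1}, Q), y_t ~ G(C x_t, R), x_1 ~ G(m0, P0) (assumed).
    """
    n_steps, K = y.shape[0], A.shape[0]
    means = np.empty((n_steps, K))
    covs = np.empty((n_steps, K, K))
    m_pred, P_pred = m0, P0                      # prediction for x_1 is the prior
    for t in range(n_steps):
        if t > 0:                                # predict step
            m_pred = A @ means[t - 1]
            P_pred = A @ covs[t - 1] @ A.T + Q
        S = C @ P_pred @ C.T + R                 # innovation covariance
        G = np.linalg.solve(S, C @ P_pred).T     # gain: P_pred C^T S^{-1}
        means[t] = m_pred + G @ (y[t] - C @ m_pred)   # update step
        covs[t] = P_pred - G @ S @ G.T
    return means, covs

# Scalar sanity check: a constant state (A = 1, Q = 0) observed twice with unit noise.
A = np.array([[1.0]]); Q = np.array([[0.0]])
C = np.array([[1.0]]); R = np.array([[1.0]])
m0, P0 = np.array([0.0]), np.array([[1.0]])
y_obs = np.array([[2.0], [2.0]])

means, covs = kalman_filter(y_obs, A, Q, C, R, m0, P0)
print(means.ravel(), covs.ravel())
```

In this check the filter must agree with direct Bayesian updating of a Gaussian mean: after two observations the posterior precision is $1 + 2 = 3$ and the posterior mean is $4/3$.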