# Homework #9 (Due 11/20/2019, 11:59pm)
## Sequential Data and Kalman Filtering

**AM 207: Advanced Scientific Computing**<br>
**Instructor: Weiwei Pan**<br>
**Fall 2019**

**Name:**

**Student collaborators:**

### Instructions:

**Submission Format:** Use this notebook as a template to complete your homework. Please intersperse text blocks (using Markdown cells) amongst `python` code and results, and format your submission for maximum readability. Your assignments will be graded for correctness as well as clarity of exposition and presentation: a “right” answer without an explanation, or one presented in a difficult-to-follow format, will receive no credit.

**Code Check:** Before submitting, you must do a "Restart and Run All" under "Kernel" in the Jupyter or Colab menu. Portions of your submission that contain syntactic or run-time errors will not be graded.

**Libraries and packages:** Unless a problem specifically asks you to implement something from scratch, you are welcome to use any `python` library or package in the standard Anaconda distribution.

```python
from autograd import numpy as np
from autograd import grad
from autograd.misc.optimizers import adam, sgd
from autograd import scipy as sp
import pandas as pd
import numpy
import matplotlib.pyplot as plt
from nn_models import Feedforward
from bayesian_regression import Bayesian_Regression
import sys
%matplotlib inline
```

## Problem Description: Kalman Filters for Sequential Data
In Lecture #19, we've argued that by assuming linear dynamics for both the state and observation models and by assuming Gaussian noise, we can inductively infer the distribution $p(X_n|Y_{1:n})$ -- i.e. the distribution over the true state at time $n$, $X_n$, given observations $Y_{1}, \ldots, Y_{n}$. In this homework, you will explore the properties of Kalman filters (what are the design choices, how do they impact your estimate of $p(X_n|Y_{1:n})$?) when applied to data generated by linear Gaussian models (LGMs) and also when they are applied to data generated by non-LGMs.

Recall that the general form of a 1-D linear Gaussian model is given by

\begin{align}
&X_0 \sim \mathcal{N}(0, \Sigma) \quad \mathbf{(Initial\; Distribution)}\\
&X_{n+1} = aX_n + b + c\xi,\;\; \xi\sim \mathcal{N}(0, 1) \quad \mathbf{(State\;Model)}\\
&Y_{n+1} = dX_{n+1} + e + f\epsilon,\;\; \epsilon\sim \mathcal{N}(0, 1) \quad \mathbf{(Observation\;Model)}
\end{align}

where $a, b, c, d, e, f$ are scalar constants (they do not change over time).
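
To make the model concrete, here is a minimal simulation sketch. It is an illustration rather than part of the assignment template: the function name `simulate_lgm`, the `var0` and `seed` arguments, and the use of plain `numpy` are all assumptions, and $\Sigma$ is taken to be the variance of $X_0$.

```python
import numpy as np

def simulate_lgm(a, b, c, d, e, f, n_steps, var0=1.0, seed=0):
    """Draw one trajectory of states x and observations y from the 1-D LGM.

    var0 plays the role of Sigma (assumed here to be a variance).
    """
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1)
    y = np.full(n_steps + 1, np.nan)  # the model defines no observation for X_0
    x[0] = np.sqrt(var0) * rng.normal()  # X_0 ~ N(0, var0)
    for n in range(n_steps):
        # State model: X_{n+1} = a X_n + b + c * xi
        x[n + 1] = a * x[n] + b + c * rng.normal()
        # Observation model: Y_{n+1} = d X_{n+1} + e + f * epsilon
        y[n + 1] = d * x[n + 1] + e + f * rng.normal()
    return x, y

# Example call with arbitrary illustrative constants.
x, y = simulate_lgm(a=0.9, b=0.0, c=1.0, d=1.0, e=0.0, f=0.5, n_steps=200)
```

Plotting `x` and `y` for several choices of the constants is one way to start exploring how each constant shapes the trajectories.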

### Part I: Properties of Linear Gaussian Models

1. By empirical or theoretical analysis, qualitatively describe the types of trajectory that can be generated by the state model $X_{n+1} = aX_n + b + c\xi,\;\; \xi\sim \mathcal{N}(0, 1)$. Concretely describe the impact of each constant $a$, $b$ and $c$ on trajectories generated by this state model.<br><br>

2. Given a fixed state model, by empirical or theoretical analysis, qualitatively describe the distribution of $Y_n - X_n$ (the difference between observation and true state value). Concretely describe the impact of each constant $d$, $e$ and $f$ on $Y_n - X_n$. <br><br>

### Part II: Deriving the Kalman Filtering Algorithm
In filtering, we assume that the dynamics of the LGM are known (either you've estimated $a, b, c, d, e, f$ from the data in a previous stage, or they were provided to you).

**Inductive Hypothesis**: In the following, suppose we have $p(X_{n-1} | Y_{1:n-1}) = \mathcal{N}(\hat{x}_{n-1}, \hat{\sigma}^2_{n-1})$.

1. (**Deriving the Prediction Step**) In the prediction step of the Kalman Filtering algorithm, we compute 
$$p(X_n | Y_{1:n-1}) = \int p(X_n|X_{n-1})p(X_{n-1}|Y_{1:n-1})dX_{n-1} = \mathcal{N}(a\hat{x}_{n-1} + b, r_{n}),$$
where $r_n = a^2 \hat{\sigma}^2_{n-1} + c^2$. Derive this formula for $p(X_n | Y_{1:n-1})$ in the Prediction Step.
<br><br>

2. (**Deriving the Update Step**) In the update step of the Kalman Filtering algorithm, we compute 

$$p(X_n| Y_{1:n}) = \frac{p(Y_n|X_n)p(X_n|Y_{1:n-1})}{p(Y_n | Y_{1:n-1})} = \mathcal{N}(\hat{x}_n, \hat{\sigma}^2_n),$$ 
where 
\begin{align}
\hat{x}_n &= (a\hat{x}_{n-1} + b) + K_n\left(y_n - d(a\hat{x}_{n-1} + b) - e \right),\\
\hat{\sigma}^2_n &= r_n  (1 - K_n  d),\\
K_n &= \frac{r_n  d}{f^2 + d^2  r_n}.
\end{align}
Derive this formula for $p(X_n| Y_{1:n})$ in the Update Step.
<br><br>
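
The two steps can be sketched in code as follows. This is a minimal illustration under stated assumptions, not a reference solution: the function name `kalman_filter` and the arguments `x0_mean` and `x0_var` (the mean and variance of the initial Gaussian) are hypothetical, and the update is written as the predicted mean $a\hat{x}_{n-1} + b$ plus the gain times the innovation $y_n - d(a\hat{x}_{n-1} + b) - e$. It also assumes $f^2 + d^2 r_n > 0$ so that the gain is well defined.

```python
import numpy as np

def kalman_filter(y, a, b, c, d, e, f, x0_mean=0.0, x0_var=1.0):
    """Run a 1-D Kalman filter over observations y_1, ..., y_N.

    Returns arrays of filtered means, filtered variances, and Kalman gains.
    """
    y = np.asarray(y, dtype=float)
    means = np.empty(len(y))
    variances = np.empty(len(y))
    gains = np.empty(len(y))
    mean, var = x0_mean, x0_var
    for n, y_n in enumerate(y):
        # Prediction step: p(X_n | Y_{1:n-1}) = N(pred_mean, r)
        pred_mean = a * mean + b
        r = a**2 * var + c**2
        # Update step: condition the prediction on the new observation y_n
        K = r * d / (f**2 + d**2 * r)
        mean = pred_mean + K * (y_n - d * pred_mean - e)
        var = r * (1.0 - K * d)
        means[n], variances[n], gains[n] = mean, var, K
    return means, variances, gains

# Example call on two observations with arbitrary illustrative constants.
means, variances, gains = kalman_filter(
    [1.0, 1.0], a=1.0, b=0.0, c=0.0, d=1.0, e=0.0, f=1.0
)
```

With $a = 1$, $b = c = 0$, $d = 1$, $e = 0$, the state is a fixed unknown constant, so the filter output can be checked by hand against the usual conjugate Gaussian posterior for a mean.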

### Part III: Properties of the Kalman Filter

1. (**The Kalman Gain**) The constant $K_n$ in the update step is called the ***Kalman gain***. Show that the Kalman gain is a number between 0 and 1 when $d> 1$. Describe the relationship between the variance of $p(X_n| Y_{1:n})$ and the value of the Kalman gain. Then describe which factors affect the value of the Kalman gain (when is the gain large? When is it small?).
<br><br>

2. (**Kalman Filter Asymptotics**) From Part II, we see that $p(X_n| Y_{1:n})$ is a Gaussian $\mathcal{N}(\hat{x}_n, \hat{\sigma}^2_n)$ for each $n$. By empirical or theoretical analysis, describe what happens to the mean and variance of this Gaussian as $n$ approaches infinity. Do the asymptotic behaviours of the mean and variance depend on your choice of initialization for the Kalman filtering algorithm? If so, how? Do the asymptotic behaviours depend on your choices of $a, b, c, d, e, f$? If so, how? 
<br><br>
  
3. (**Applying Kalman Filters Under Model Misspecification**) Apply a Kalman filter with your choices of $a, b, c, d, e, f$ to data generated from the stochastic volatility model
$$
\begin{align}
X_0 &\sim \mathcal{N}\left(0, \frac{\sigma^2}{1 - \alpha^2} \right)\\
X_{n+1} | X_n &\sim \mathcal{N}(\alpha \cdot X_{n}, \sigma^2)\\
Y_{n+1} | X_{n+1} &\sim \mathcal{N}(0, \beta^2 \cdot \mathrm{exp}(X_{n+1}))
\end{align}
$$
for $\alpha = 0.91, \sigma = 1., \beta=0.5$.

  Describe the quality of your Kalman filter fit to the observed data and the true state values. Empirically speaking, can you produce a better fit to the data by choosing $a, b, c, d, e, f$ more strategically?
<br><br>
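
As a starting point for the misspecification experiment, data from the stochastic volatility model can be generated along the following lines. This is a hedged sketch: the function name `simulate_sv` and the `seed` argument are assumptions, and plain `numpy` is used for the random draws. Since the observation variance is $\beta^2 \exp(X_{n+1})$, the corresponding standard deviation is $\beta \exp(X_{n+1}/2)$.

```python
import numpy as np

def simulate_sv(n_steps, alpha=0.91, sigma=1.0, beta=0.5, seed=0):
    """Draw one trajectory of states x and observations y from the SV model."""
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1)
    y = np.full(n_steps + 1, np.nan)  # the model defines no observation for X_0
    # X_0 ~ N(0, sigma^2 / (1 - alpha^2)), the stationary distribution of the states
    x[0] = sigma / np.sqrt(1.0 - alpha**2) * rng.normal()
    for n in range(n_steps):
        # X_{n+1} | X_n ~ N(alpha * X_n, sigma^2)
        x[n + 1] = alpha * x[n] + sigma * rng.normal()
        # Y_{n+1} | X_{n+1} ~ N(0, beta^2 * exp(X_{n+1}))
        y[n + 1] = beta * np.exp(x[n + 1] / 2.0) * rng.normal()
    return x, y

# Example call using the constants given in the problem statement.
x_sv, y_sv = simulate_sv(n_steps=500)
```

The observations `y_sv[1:]` can then be passed to whatever Kalman filter implementation you have written, and the filtered means compared against `x_sv[1:]`.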