# The (Simplified) Kalman Filter Model
The following is a simplified representation of the state space and measurement models underlying the Kalman Filter:

$$
\begin{align*}
x_t &= Ax_{t - 1} + q_t,\\
y_t &= H x_t + r_t,
\end{align*}
$$

where $q_t \sim N(0, Q)$ and $r_t \sim N(0, R)$. It is simplified in that it uses time subscripts only where absolutely necessary and omits a control input; neither simplification changes the argument to come.
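To make the model concrete, here is a minimal simulation sketch in Python with NumPy. The function name `simulate`, the initial state `x0`, and the position/velocity example matrices are assumptions chosen for illustration; they are not part of the model above.

```python
import numpy as np

def simulate(A, H, Q, R, x0, n_steps, rng):
    """Draw states x_1..x_n and observations y_1..y_n from the model."""
    dim_x, dim_y = A.shape[0], H.shape[0]
    xs = np.empty((n_steps, dim_x))
    ys = np.empty((n_steps, dim_y))
    x = x0
    for t in range(n_steps):
        # State equation: x_t = A x_{t-1} + q_t, with q_t ~ N(0, Q)
        x = A @ x + rng.multivariate_normal(np.zeros(dim_x), Q)
        # Measurement equation: y_t = H x_t + r_t, with r_t ~ N(0, R)
        ys[t] = H @ x + rng.multivariate_normal(np.zeros(dim_y), R)
        xs[t] = x
    return xs, ys

# Hypothetical example: position/velocity state, position-only measurement.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[0.25]])
xs, ys = simulate(A, H, Q, R, x0=np.zeros(2), n_steps=50,
                  rng=np.random.default_rng(0))
```

Only `ys` would be available to a filter; `xs` is the hidden quantity that filtering tries to recover.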

# Bayesian Setup for Filtering
The goal of "filtering" in
[Recursive Bayesian Estimation](https://en.wikipedia.org/wiki/Recursive_Bayesian_estimation#Sequential_Bayesian_filtering)
is "estimating the current value given past and current observations," and thus it is helpful to create notation that separates the past and current observations. At any time $t > 1$, let $y_t$ be the current observation and $\mathcal{Y}_{t-1}$ represent $(y_{t-1}, y_{t-2}, \ldots, y_1)$.

The following is the setup for inference about the unknown state $x_t$ given $y_t$ and $\mathcal{Y}_{t-1}$.

## Prior
$$x_t | \mathcal{Y}_{t-1}$$

## Likelihood
$$y_t | x_t, \mathcal{Y}_{t-1} \text{ which simplifies to } y_t | x_t$$

## Posterior
$$x_t | y_t, \mathcal{Y}_{t - 1}$$



# Prior derivation
The conditional "prior" distribution $x_t | \mathcal{Y}_{t - 1}$ depends on both knowledge of the state at the previous time point (given all *past* observations) and of the movement of the state since that time point. The following expansion of the prior pdf illustrates this:

$$
\begin{align}
f(x_t | \mathcal{Y}_{t - 1}) &= \int f(x_t | x_{t - 1}, \mathcal{Y}_{t - 1}) f(x_{t-1} | \mathcal{Y}_{t - 1}) \, d x_{t - 1} \\
&= \int f(x_t | x_{t - 1}) f(x_{t-1} | \mathcal{Y}_{t - 1}) \, d x_{t - 1}
\end{align}
$$
                           
This expansion is known as the Chapman-Kolmogorov equation. Note that the distribution of $x_t | x_{t - 1}$ is specified by the state-space model and is simply $N(A x_{t - 1}, Q)$. The distribution of $x_{t - 1} | \mathcal{Y}_{t - 1}$ *is* the posterior distribution if we step back one unit in time. This is where the recursion comes in: since it is the posterior from the last time step, you can assume that you *have it*. In this article, we will call the previous time point's posterior mean and covariance $\mu_{t - 1}$ and $\Sigma_{t - 1}$, respectively. We will show later that the posterior is also gaussian, and hence $x_{t-1} | \mathcal{Y}_{t - 1} \sim N(\mu_{t - 1}, \Sigma_{t - 1})$.

The Chapman-Kolmogorov equation involves a potentially high-dimensional integral. Fortunately, because both densities in the integrand are multivariate gaussians, we won't actually have to compute it. Consider the following result about multivariate gaussians:

$$
\begin{align*}
&\textbf{If } r \sim N(\mu_r, \Sigma_r) \text{ and } s|r \sim N(W r, \Sigma_{s|r}), \\
 & \\
& \textbf{Then } \left(\begin{matrix} 
r \\
s 
\end{matrix}\right) \sim \left[ 
N\left(\begin{matrix} 
\mu_r \\
W \mu_r 
\end{matrix}\right),
\left(\begin{matrix} 
\Sigma_r & \Sigma_r W^T \\
W \Sigma_r & W \Sigma_r W^T + \Sigma_{s | r} 
\end{matrix}\right) \right]. \; \; (*)
\end{align*}
$$
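To see where the blocks in $(*)$ come from, it may help to read the conditional as the linear relation $s = W r + e$, with $e \sim N(0, \Sigma_{s|r})$ independent of $r$. The symbol $e$ is introduced here only for illustration. Under that reading, the moments follow directly:

$$
\begin{align*}
E[s] &= W E[r] + E[e] = W \mu_r, \\
\text{Cov}(r, s) &= \text{Cov}(r, W r + e) = \Sigma_r W^T, \\
\text{Var}(s) &= W \Sigma_r W^T + \Sigma_{s | r}.
\end{align*}
$$

Joint gaussianity holds because $(r, s)$ is a linear transformation of the jointly gaussian pair $(r, e)$.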

With $r$ corresponding to $x_{t-1} | \mathcal{Y}_{t-1}$ and $s | r$ to $x_t | x_{t - 1}, \mathcal{Y}_{t-1}$, so that $W = A$ and $\Sigma_{s|r} = Q$, we have

$$
\left(\begin{matrix} 
x_{t-1}| \mathcal{Y}_{t-1} \\
x_{t}| \mathcal{Y}_{t-1}
\end{matrix}\right) \sim \left[ 
N\left(\begin{matrix} 
\mu_{t - 1} \\
A \mu_{t - 1} 
\end{matrix}\right),
\left(\begin{matrix} 
\Sigma_{t - 1} & \Sigma_{t - 1} A^T \\
A \Sigma_{t - 1} & A \Sigma_{t - 1} A^T + Q
\end{matrix}\right) \right].
$$

The marginal distribution $x_t | \mathcal{Y}_{t-1}$ is thus
$$
x_t | \mathcal{Y}_{t-1} \sim N(A \mu_{t - 1}, A \Sigma_{t - 1} A^T + Q)
$$
and is a coherent prior for the current state given past measurements and knowledge of the system dynamics. It does assume we have a posterior in hand from the time step before. Since the covariance matrix of this prior will be used in expressions to come, we will give it a name: $P_t = A\Sigma_{t - 1} A^T + Q$.
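The prior computation can be sketched in a few lines of NumPy. The function name `predict`, the helper `normal_pdf`, and all numeric values are assumptions made for illustration. The second half of the sketch evaluates the Chapman-Kolmogorov integral on a grid in the scalar case, to check that it agrees with the closed-form gaussian density.

```python
import numpy as np

def predict(mu_prev, Sigma_prev, A, Q):
    """Prior for x_t given Y_{t-1}: mean A mu_{t-1}, covariance P_t."""
    mu_prior = A @ mu_prev
    P = A @ Sigma_prev @ A.T + Q
    return mu_prior, P

def normal_pdf(x, mean, var):
    """Density of a scalar N(mean, var) evaluated at x."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

# Scalar sanity check (all numbers are arbitrary illustration values).
a, q, mu_prev, var_prev = 0.9, 0.5, 1.0, 2.0
mu_prior, P = predict(np.array([mu_prev]), np.array([[var_prev]]),
                      np.array([[a]]), np.array([[q]]))

# Chapman-Kolmogorov integral approximated by a Riemann sum for one x_t.
x_t = 0.3
grid = np.linspace(-30.0, 30.0, 20001)  # candidate values of x_{t-1}
dx = grid[1] - grid[0]
integrand = normal_pdf(x_t, a * grid, q) * normal_pdf(grid, mu_prev, var_prev)
numeric = np.sum(integrand) * dx

# Closed form: density of N(A mu_{t-1}, P_t) at the same x_t.
closed_form = normal_pdf(x_t, mu_prior[0], P[0, 0])
```

In this scalar setting `numeric` and `closed_form` should agree to many decimal places, which is the integral-free shortcut the gaussian result provides.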


# Likelihood derivation

From the measurement model,
 
$$y_t | x_t, \mathcal{Y}_{t - 1} = y_t | x_t \sim N(H x_t, R).$$

That $y_t$ is conditionally independent of $\mathcal{Y}_{t - 1}$ given $x_t$ is a reminder of the importance of getting the state formulation right: given the state's value, measurements from the past carry no additional information.


# Posterior derivation

Reusing the result $(*)$ from the prior derivation, with the prior $x_t | \mathcal{Y}_{t-1}$ as the marginal gaussian and $y_t | x_t, \mathcal{Y}_{t-1}$ as the conditional gaussian (so $W = H$ and $\Sigma_{s|r} = R$), we arrive at the joint distribution:

$$
\left(\begin{matrix} 
x_t | \mathcal{Y}_{t-1} \\
y_t | \mathcal{Y}_{t-1} 
\end{matrix}\right) \sim \left[ 
N\left(\begin{matrix} 
A \mu_{t - 1} \\
H A \mu_{t - 1}
\end{matrix}\right),
\left(\begin{matrix} 
P_t & P_t H^T \\
H P_t & H P_t H^T + R 
\end{matrix}\right) \right].$$

Now that we have a joint multivariate gaussian, we can use the following result for conditional distributions to get $x_t | y_t, \mathcal{Y}_{t-1}$:

$$
\begin{align*}
&\textbf{If} \quad \quad
\left(\begin{matrix} 
r \\
s 
\end{matrix}\right) \sim \left[ 
N\left(\begin{matrix} 
\mu_r \\
\mu_s 
\end{matrix}\right),
\left(\begin{matrix} 
\Sigma_r & C \\
C^T &  \Sigma_s 
\end{matrix}\right) \right], \\
& \\
&\textbf{Then} \quad r|s \sim N(\mu_r + C \Sigma_s^{-1} (s - \mu_s), \; \Sigma_r - C \Sigma_s^{-1}C^T).
\end{align*}
$$
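The conditioning result translates almost symbol for symbol into code. The following is a minimal sketch; the function name `condition_gaussian` and the example numbers are assumptions. It uses a linear solve instead of an explicit inverse, which is a common numerical preference rather than something the result requires.

```python
import numpy as np

def condition_gaussian(mu_r, mu_s, Sigma_r, C, Sigma_s, s_obs):
    """Mean and covariance of r | s = s_obs for jointly gaussian (r, s)."""
    # Solve Sigma_s X = C^T rather than forming the inverse explicitly;
    # X^T equals C Sigma_s^{-1} because Sigma_s is symmetric.
    gain = np.linalg.solve(Sigma_s, C.T).T
    mean = mu_r + gain @ (s_obs - mu_s)
    cov = Sigma_r - gain @ C.T
    return mean, cov

# Illustrative one-dimensional blocks (arbitrary numbers).
mean, cov = condition_gaussian(
    mu_r=np.array([0.0]), mu_s=np.array([1.0]),
    Sigma_r=np.array([[2.0]]), C=np.array([[1.0]]),
    Sigma_s=np.array([[4.0]]), s_obs=np.array([3.0]),
)
# mean is [0.5] and cov is [[1.75]]
```

Observing $s$ above its mean pulls the estimate of $r$ upward in proportion to the cross-covariance $C$, and the covariance of $r$ shrinks regardless of the observed value.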

Renaming the covariance of $y_t | \mathcal{Y}_{t-1}$ as

$$
S_t = H P_t H^T + R,
$$

and applying the result with $r = x_t$, $s = y_t$, and $C = P_t H^T$, we arrive at the posterior

$$
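x_t | y_t, \mathcal{Y}_{t-1} \sim N(A \mu_{t - 1} + P_t H^T S_t^{-1} (y_t - H A \mu_{t - 1}), \;
                                       P_t - P_t H^T S_t^{-1} H P_t).$$

This posterior is again gaussian, as promised in the prior derivation. Its mean and covariance serve as $\mu_t$ and $\Sigma_t$ when the recursion moves on to time $t + 1$.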
                                       