Important point. Never make the order of your filter higher than the order of the dynamics in the system. That's like fitting a higher order polynomial to a scatter plot of a straight line.

**Gating** is a mechanism for rejecting outlier measurements. After all, the Kalman Filter is proven to be optimal in the least squares sense, so it makes sense to remove outliers the same way you would if you were doing RANSAC with linear least squares on a scatter plot. So for example, consider the radar example with x-y co-ordinates of a plane. If a new measurement comes in that's 500 km from the prediction and our time step is 3 seconds, it's clear that the measurement is a dud. But where do we set the line?

One basic approach would be to do a cutoff at 3 standard deviations from the prediction. We can apply that cutoff to each of x and y independently, which gives a rectangular gate, or we can use [Mahalanobis distance](https://en.wikipedia.org/wiki/Mahalanobis_distance) to get the distance of the measurement from the prediction in terms of standard deviations, which gives an elliptical gate.
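A minimal sketch of both gates, assuming we already have the predicted measurement and its covariance `S` from the filter. The function names, the 100 m standard deviations, and the sample measurements are made up for illustration. Note that for more than one dimension a Mahalanobis distance of 3 does not correspond exactly to the 1-D "3 sigma" probability, so the threshold is a tuning choice.

```python
import numpy as np

def rectangular_gate(z, z_pred, S, n_sigma=3.0):
    """Accept z if every component is within n_sigma std devs of the prediction."""
    y = np.abs(np.asarray(z, dtype=float) - np.asarray(z_pred, dtype=float))
    return bool(np.all(y <= n_sigma * np.sqrt(np.diag(S))))

def mahalanobis_gate(z, z_pred, S, n_sigma=3.0):
    """Accept z if its Mahalanobis distance from the prediction is <= n_sigma.

    Returns (accepted, distance).
    """
    y = np.asarray(z, dtype=float) - np.asarray(z_pred, dtype=float)
    d = float(np.sqrt(y @ np.linalg.solve(S, y)))
    return d <= n_sigma, d

# Hypothetical numbers: 100 m standard deviation in both x and y.
S = np.diag([100.0**2, 100.0**2])
z_pred = np.array([1000.0, 2000.0])

ok_near, d_near = mahalanobis_gate([1150.0, 2100.0], z_pred, S)   # plausible
ok_far, d_far = mahalanobis_gate([501000.0, 2000.0], z_pred, S)   # 500 km off
rect_near = rectangular_gate([1150.0, 2100.0], z_pred, S)
rect_far = rectangular_gate([501000.0, 2000.0], z_pred, S)
```

The rectangular gate ignores any correlation between x and y, while the Mahalanobis gate uses the full covariance, so the two can disagree near the corners of the rectangle.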

### Evaluating filter performance

Rule of thumb. A good way to evaluate filter performance is by plotting residuals vs variance (or more specifically, some number of standard deviations). We want approximately 68% of the residuals to fall within 1 standard deviation, 95% to fall within 2 standard deviations, and 99.7% to fall within 3 standard deviations.
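As a rough sketch, the same check can be done numerically instead of by eye. The helper name `fraction_within` and the simulated residuals are assumptions for illustration; in practice the residuals and variances would come from your own filter run.

```python
import numpy as np

def fraction_within(residuals, variances, k):
    """Fraction of residuals lying within k standard deviations of zero."""
    residuals = np.asarray(residuals, dtype=float)
    std = np.sqrt(np.asarray(variances, dtype=float))
    return float(np.mean(np.abs(residuals) <= k * std))

# Simulated stand-in for a well-tuned filter: Gaussian residuals whose
# spread matches the variance the filter reports (std = 2, variance = 4).
rng = np.random.default_rng(0)
variances = np.full(10_000, 4.0)
residuals = rng.normal(0.0, 2.0, size=10_000)

fractions = {k: fraction_within(residuals, variances, k) for k in (1, 2, 3)}
```

If the fractions come out much lower than the Gaussian rule of thumb, the filter is probably reporting a variance that is too small; much higher suggests it is too pessimistic.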

But more formally there is:

#### Normalized Estimated Error Squared

If we have access to the ground truth ($\bold{x}$) and the estimate ($\bold{\hat{x}}$) then we can compute the error:

$$
\bold{\tilde{x}} = \bold{x} - \bold{\hat{x}} 
$$

and then NEES is defined:

$$
\epsilon = \tilde{\mathbf x}^\mathsf T\mathbf P^{-1}\tilde{\mathbf x}
$$

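Some out-of-scope statistics theory says that $\epsilon$ is a random variable that is chi-squared distributed with $n$ degrees of freedom (where $n$ is the dimension of $\bold{x}$), and so its expected value is $n$. So if we take the mean of the NEES values over all time steps, it should be no greater than roughly $n$. A mean well above $n$ suggests the filter is overconfident, i.e. $\mathbf P$ is too small for the errors it is actually making.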
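A minimal sketch of the NEES computation, assuming we have arrays of ground-truth states, estimates and covariances for every time step. The function name `nees` and the simulated data are illustrative assumptions, not taken from any particular library.

```python
import numpy as np

def nees(xs, x_hats, Ps):
    """Normalized Estimated Error Squared for each time step."""
    errs = np.asarray(xs, dtype=float) - np.asarray(x_hats, dtype=float)
    # e^T P^{-1} e, using solve() rather than forming the inverse explicitly
    return np.array([e @ np.linalg.solve(P, e) for e, P in zip(errs, Ps)])

# Simulated consistent filter: errors really are drawn from N(0, P).
rng = np.random.default_rng(1)
P = np.diag([4.0, 9.0])          # hypothetical 2-state covariance
steps = 5000
truth = np.zeros((steps, 2))
estimates = truth - rng.multivariate_normal(np.zeros(2), P, size=steps)

mean_nees = float(np.mean(nees(truth, estimates, [P] * steps)))
n = truth.shape[1]               # dimension of x; mean_nees should be near n
```

Because the simulated errors match `P` exactly, the mean lands close to the state dimension. With a real filter, comparing `mean_nees` against `n` gives a quick consistency check.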

#### Likelihood function

In one of the steps of the Kalman Filter we compute the residual $\mathbf y$ and the system uncertainty $\mathbf S$:


$$\begin{aligned}
\mathbf y &= \mathbf z - \mathbf{H \bar x}\\
\mathbf S &= \mathbf{H\bar{P}H}^\mathsf T + \mathbf R
\end{aligned}
$$

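from which we can compute a likelihood function (with $m$ the dimension of $\mathbf y$ and $|\mathbf S|$ the determinant of $\mathbf S$):

$$
\mathcal{L} = \frac{1}{\sqrt{(2\pi)^m |\mathbf S|}}\exp \left[-\frac{1}{2}\mathbf y^\mathsf T\mathbf S^{-1}\mathbf y\right]
$$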

i.e. how likely is a residual $\bold{y}$ given our modelled system uncertainty? Wherever the likelihood is close to zero (or, as is more common in practice, the log-likelihood is very negative) we would say that the model is not a "good" one.
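A short sketch of the log-likelihood, assuming the standard multivariate Gaussian density with zero mean and covariance $\mathbf S$ (normalizer $\sqrt{(2\pi)^m |\mathbf S|}$, where $m$ is the dimension of $\mathbf y$). The function name and the example numbers are made up for illustration.

```python
import numpy as np

def log_likelihood(y, S):
    """Log of the Gaussian likelihood of residual y under covariance S."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    S = np.atleast_2d(np.asarray(S, dtype=float))
    m = y.shape[0]
    sign, logdet = np.linalg.slogdet(S)   # numerically stable log|S|
    if sign <= 0:
        raise ValueError("S must be positive definite")
    maha2 = y @ np.linalg.solve(S, y)     # y^T S^{-1} y
    return float(-0.5 * (m * np.log(2.0 * np.pi) + logdet + maha2))

S = np.diag([4.0, 4.0])                        # hypothetical system uncertainty
ll_small = log_likelihood([1.0, -1.0], S)      # residual about as large as S predicts
ll_large = log_likelihood([30.0, 40.0], S)     # residual far larger than S predicts
```

Working in log space avoids the underflow you would get from exponentiating a large negative number, which is one reason the log-likelihood is the version used in practice.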

Fun fact: Putting the output of a Kalman filter through another Kalman filter will at best replicate the results and in any other case degrade them (unless you introduce new information, that is). Two ways to reason about this:

1) A Kalman Filter is provably optimal in the least squares sense. So the output is already optimal and can't be made "more optimal".
2) A Kalman Filter works theoretically with white noise or Markov processes, i.e. the noise can't be correlated in time. But the output of a Kalman filter is correlated in time (i.e. non-Markovian) because each output is based on the filter's state, which is based on a history of measurements. The visualizations here are very useful for grokking why this can be a problem:

![](.images/2022-06-27-10-52-28.png)



TO BE CONTINUED. I'm finding it hard to stay focussed on this chapter without having any practical use case of my own to deal with. I will come back to it when I need to. For now I will continue on concepts.