## Summary Goal ##
This is a very high-level overview of the Kalman Filter.
I will focus on examples, intuition, and high-level understanding rather than on the statistics behind the filter. We do not need that theory in order to use it (although, from the little I have read, it is very interesting!).


## What is the Kalman Filter?

The Kalman Filter (KF) is an algorithm that allows us to estimate the hidden state of a system over time.  
It combines two sources of information:

1. **The model** – how we believe the system evolves (the state-space equations).  
2. **The measurements** – noisy observations of the system.  

By recursively blending these two, the KF produces estimates that are usually more accurate than relying on either the model or the measurements alone.
The Kalman Filter also tracks its own uncertainty, usually denoted by the matrix $P$: the covariance of the state estimate. The idea is that as $k$ grows (that is, as time advances and the filter sees more measurements), $P$ should shrink from its initial value and settle at a small steady-state level. When that happens, the filter is doing its job: it is recovering the underlying signal of the system (without the noise) well.
<br><br>

## How does it work?
The Kalman Filter has two main methods: predict and update.

**Predict step**
We use "predict" to estimate the next state before observing the measurement that corresponds to it.
We also predict the next state's covariance matrix. A single predict step typically increases the uncertainty (the process noise $Q$ is added), and the following update step reduces it again. As stated before, if the Kalman Filter's model fits the true motion, we expect the covariance to shrink over successive predict/update cycles until it levels off.

$$
\hat{x}_{k|k-1} = A \hat{x}_{k-1|k-1}
$$

$$
P_{k|k-1} = A P_{k-1|k-1} A^\top + Q
$$

$$
\hat{y}_{k|k-1} = C \hat{x}_{k|k-1}
$$

- $\hat{x}_{k|k-1}$: predicted state estimate at time $k$ (before seeing $y_k$)  
- $A$: state transition matrix. In our case, $A$ will represent a physical model of movement (for example, a constant-velocity model)  
- $Q$: process noise covariance  
- $u_k$: process noise, the random disturbance in the true dynamics $x_k = A x_{k-1} + u_k$. It is a Gaussian random variable, distributed $\mathcal{N}(0, Q)$; because its mean is zero, it does not appear in the prediction of the estimate  
- $P_{k|k-1}$: predicted state covariance (uncertainty) for step $k$, given the information from the first $k-1$ steps  
- $\hat{y}_{k|k-1}$: predicted measurement, i.e. what the model expects to observe at time $k$  
- $C$: observation matrix  
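
The three predict equations above can be sketched in NumPy as follows. This is only an illustration: the function name `kf_predict` and the constant-velocity numbers (`dt`, `Q`, the initial `x` and `P`) are assumptions chosen for the example, not values from this project.

```python
import numpy as np

def kf_predict(x, P, A, Q, C):
    """One predict step; returns (x_pred, P_pred, y_pred)."""
    x_pred = A @ x               # predicted state  x_{k|k-1}
    P_pred = A @ P @ A.T + Q     # predicted covariance  P_{k|k-1}
    y_pred = C @ x_pred          # predicted measurement  y_{k|k-1}
    return x_pred, P_pred, y_pred

# Assumed example: constant-velocity model, state = [position, velocity]
dt = 1.0
A = np.array([[1.0, dt],
              [0.0, 1.0]])
Q = 0.01 * np.eye(2)             # assumed process noise covariance
C = np.array([[1.0, 0.0]])       # only the position is measured

x = np.array([0.0, 1.0])         # previous estimate x_{k-1|k-1}
P = np.eye(2)                    # previous covariance P_{k-1|k-1}

x_pred, P_pred, y_pred = kf_predict(x, P, A, Q, C)
print(x_pred, y_pred)
print(P_pred)
```

Note that `P_pred` is larger than `P` here: predicting without a new measurement adds uncertainty.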


**Update step**
When the new measurement arrives, we update the prediction:
$$K_k = P_{k|k-1} C^\top (C P_{k|k-1} C^\top + R)^{-1}$$
$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k (y_k - C \hat{x}_{k|k-1})$$
$$P_{k|k} = (I - K_k C) P_{k|k-1}$$

- $K_k$: Kalman gain, the balance between trusting the model and trusting the measurement (analogous to the alpha in a low-pass filter)  
- $y_k$: observation at time $k$  
- $C$: observation matrix  
- $R$: observation/measurement noise covariance  
- $\hat{x}_{k|k}$: updated state estimate after seeing $y_k$  
- $P_{k|k}$: updated covariance for step k
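
As a rough sketch, the update equations translate to NumPy almost line by line. The function name `kf_update` and the numbers below are illustrative assumptions, picked so the result can be checked by hand.

```python
import numpy as np

def kf_update(x_pred, P_pred, y, C, R):
    """One update step; returns (x_upd, P_upd, K)."""
    S = C @ P_pred @ C.T + R                  # covariance of the prediction error in measurement space
    K = P_pred @ C.T @ np.linalg.inv(S)       # Kalman gain K_k
    x_upd = x_pred + K @ (y - C @ x_pred)     # corrected state x_{k|k}
    P_upd = (np.eye(len(x_pred)) - K @ C) @ P_pred   # corrected covariance P_{k|k}
    return x_upd, P_upd, K

# Assumed toy numbers: state = [position, velocity], only position is measured
x_pred = np.array([1.0, 1.0])
P_pred = np.array([[2.0, 1.0],
                   [1.0, 1.0]])
C = np.array([[1.0, 0.0]])
R = np.array([[2.0]])
y = np.array([3.0])

x_upd, P_upd, K = kf_update(x_pred, P_pred, y, C, R)
print(K.ravel())   # gain for position and velocity
print(x_upd)
print(P_upd)
```

In this example the position gain is 0.5, so the updated position lands halfway between the prediction (1.0) and the measurement (3.0). A larger $R$ would pull the gain toward 0 (trust the model), a smaller $R$ toward 1 (trust the measurement).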


We call these two methods iteratively:
**predict** -> **update** -> **predict** -> **update** ...

- Call predict when we want to estimate the next state.
- Call update when the actual next measurement is available; it "calibrates" the KF according to the prediction error.
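
To make the predict/update cycle concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: a constant-velocity target, the noise levels `Q` and `R`, the deliberately vague initial covariance, and the variable names.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed constant-velocity model: state = [position, velocity], dt = 1
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
C = np.array([[1.0, 0.0]])       # only the position is measured
Q = 1e-3 * np.eye(2)             # assumed process noise covariance
R = np.array([[1.0]])            # assumed measurement noise covariance

x_true = np.array([0.0, 1.0])    # simulated hidden state
x_est = np.array([0.0, 0.0])     # initial guess (velocity unknown)
P = 10.0 * np.eye(2)             # large initial uncertainty
trace_history = []

for k in range(50):
    # Simulate the hidden system and a noisy measurement of it
    x_true = A @ x_true + rng.multivariate_normal(np.zeros(2), Q)
    y = C @ x_true + rng.normal(0.0, np.sqrt(R[0, 0]), size=1)

    # Predict
    x_est = A @ x_est
    P = A @ P @ A.T + Q

    # Update
    S = C @ P @ C.T + R
    K = P @ C.T @ np.linalg.inv(S)
    x_est = x_est + K @ (y - C @ x_est)
    P = (np.eye(2) - K @ C) @ P

    trace_history.append(np.trace(P))

print("trace(P) after first step:", trace_history[0])
print("trace(P) after last step: ", trace_history[-1])
print("true state:", x_true, "estimate:", x_est)
```

The trace of $P$ is used here as a simple scalar summary of the uncertainty; it drops quickly during the first steps and then levels off.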

<br><br>
## Usages of the Kalman Filter
As stated before, the main goal of the KF is to produce estimates of the underlying unobserved signal over time.
Suppose the underlying signal at time $t$ is $x_t$, and we are using the measurements $Y_s = \{y_1, y_2, \dots, y_s\}$ to estimate it.

- When $s < t$, the problem is called forecasting.
- When $s = t$, the problem is called filtering.
- When $s > t$, the problem is called smoothing.
<br>

### 1. Forecasting (Prediction)
Estimate a future underlying signal given current information:

$$p(x_{k+1} \mid y_{1:k}) = \mathcal{N}(\hat{x}_{k+1|k}, P_{k+1|k})$$

- Produces a forecast distribution for the next hidden state.  
- Useful when planning or simulating ahead.
- At time $k$, you only know the measurements $y_1, \dots, y_k$.  
- Using those, you predict where the system will be at $k+1$.  
- The estimate is more uncertain than in filtering, because we have yet to see $y_{k+1}$.
<br>

In our project, we actually want to check the forecast $t$ steps ahead ($t$-lag prediction); that is, we want to compute $\hat{x}_{k+t|k}$. In the code we call it k-lag prediction, but it is the same idea.
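
A $t$-step forecast is just the predict step applied $t$ times in a row with no update in between. The sketch below assumes that reading; the function name `kf_forecast` and the example matrices are hypothetical and not taken from the project code.

```python
import numpy as np

def kf_forecast(x_filt, P_filt, A, Q, steps):
    """Propagate the filtered estimate `steps` steps ahead without any update."""
    x, P = x_filt.copy(), P_filt.copy()
    for _ in range(steps):
        x = A @ x
        P = A @ P @ A.T + Q
    return x, P

# Assumed constant-velocity example
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
Q = 0.01 * np.eye(2)
x_filt = np.array([0.0, 1.0])    # filtered estimate x_{k|k}
P_filt = np.eye(2)               # filtered covariance P_{k|k}

x_fc, P_fc = kf_forecast(x_filt, P_filt, A, Q, steps=3)
print(x_fc)   # forecast x_{k+3|k}
print(P_fc)   # forecast covariance P_{k+3|k}
```

The forecast covariance grows with every extra step, which is why long-horizon forecasts are less certain than one-step ones.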

### 2. Filtering
Estimate the current underlying signal given all observations up to now:

$$p(x_k \mid y_{1:k}) = \mathcal{N}(\hat{x}_{k|k}, P_{k|k})$$
- At time $k$, you’ve just received measurement $y_k$.  
- The filter combines the prior prediction $\hat{x}_{k|k-1}$ with $y_k$ to give the best estimate of the current state.  
- This is the real-time mode of the KF: at each step, the filter provides the most up-to-date estimate of the system.
- Balances the model’s prediction with the latest measurement.  
- The most common KF use. Used for real-time tracking.  
<br>

### 3. Smoothing
Estimate a past underlying signal using both past and future data:

$$k < T$$
$$p(x_k \mid y_{1:T}) = \mathcal{N}(\hat{x}_{k|T}, P_{k|T})$$


- Suppose you want to know what the state was at time $k$, but now you also have measurements up to time $T > k$.  
- By using future data, you can refine your estimate of past states.  
- This requires running a backward pass after the forward filtering. Many KF frameworks provide such a function.  
- Smoothing provides the most accurate trajectory but can only be done offline, once the entire dataset is available.
- We will mostly use forecasting and filtering, but smoothing is nice to know.
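
One common way to implement the backward pass is the Rauch-Tung-Striebel (RTS) smoother. The text above does not name a specific algorithm, so treat the choice of RTS, the helper names, and the simulated data below as assumptions made for illustration.

```python
import numpy as np

def kalman_filter(ys, A, C, Q, R, x0, P0):
    """Forward pass; keeps predicted and filtered moments for the smoother."""
    x, P = x0, P0
    x_pred, P_pred, x_filt, P_filt = [], [], [], []
    for y in ys:
        # Predict
        x = A @ x
        P = A @ P @ A.T + Q
        x_pred.append(x)
        P_pred.append(P)
        # Update
        K = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)
        x = x + K @ (y - C @ x)
        P = (np.eye(len(x)) - K @ C) @ P
        x_filt.append(x)
        P_filt.append(P)
    return x_pred, P_pred, x_filt, P_filt

def rts_smoother(x_pred, P_pred, x_filt, P_filt, A):
    """Backward pass: refine each filtered estimate using later measurements."""
    x_s, P_s = list(x_filt), list(P_filt)
    for k in range(len(x_filt) - 2, -1, -1):
        G = P_filt[k] @ A.T @ np.linalg.inv(P_pred[k + 1])   # smoother gain
        x_s[k] = x_filt[k] + G @ (x_s[k + 1] - x_pred[k + 1])
        P_s[k] = P_filt[k] + G @ (P_s[k + 1] - P_pred[k + 1]) @ G.T
    return x_s, P_s

# Assumed constant-velocity simulation
rng = np.random.default_rng(0)
A = np.array([[1.0, 1.0], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[1.0]])

x_true = np.array([0.0, 1.0])
ys = []
for _ in range(30):
    x_true = A @ x_true + rng.multivariate_normal(np.zeros(2), Q)
    ys.append(C @ x_true + rng.normal(0.0, 1.0, size=1))

x_pred, P_pred, x_filt, P_filt = kalman_filter(
    ys, A, C, Q, R, x0=np.zeros(2), P0=10.0 * np.eye(2)
)
x_smooth, P_smooth = rts_smoother(x_pred, P_pred, x_filt, P_filt, A)

print("filtered trace at first step:", np.trace(P_filt[0]))
print("smoothed trace at first step:", np.trace(P_smooth[0]))
```

At the last time step the smoothed and filtered estimates coincide, because there is no future data left to use; the gain is largest for the early steps.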

## Evaluation of Kalman Filter

<br><br>
### Innovations

The **innovation** (also called the residual) is the difference between the actual measurement and the predicted measurement:

$$r_k = y_k - \hat{y}_{k|k-1} = y_k - C \hat{x}_{k|k-1}$$
Or in the t-lag prediction:
$$r_{k+t} = y_{k+t} - \hat{y}_{k+t|k} = y_{k+t} - C \hat{x}_{k+t|k}$$


- $r_k$: innovation (residual) at time $k$  
- $y_k$: the real measurement at time $k$  
- $\hat{y}_{k|k-1} = C \hat{x}_{k|k-1}$: predicted measurement from the model  

Intuitively, the innovation measures how “surprising” the new observation is compared to what the model expected.  
It is the direct quantity used in the update step to correct the state estimate.

The distribution of the innovation is also Gaussian:

$$r_k \sim \mathcal{N}(0, S_k), \quad S_k = C P_{k|k-1} C^\top + R$$

- $S_k$: innovation covariance, representing how uncertain the model was about this measurement  

If the Kalman Filter is well-specified, the innovations should look like zero-mean white noise.  
Large biases or unusually large residuals usually indicate that the motion or noise models ($A$, $Q$, $C$, $R$) are not well matched to the true system.
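
A quick way to inspect the innovations is to scale each one by its predicted standard deviation and look at the sample mean and the lag-1 autocorrelation. The sketch below does this on simulated data where the filter uses exactly the model that generated the data; the model, the noise levels, and the choice of these two statistics are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[1.0, 1.0], [0.0, 1.0]])   # assumed constant-velocity model, dt = 1
C = np.array([[1.0, 0.0]])               # only the position is measured
Q = 0.01 * np.eye(2)
R = np.array([[1.0]])

x_true = np.array([0.0, 1.0])
x_est, P = x_true.copy(), np.eye(2)
normalized = []                          # r_k / sqrt(S_k) for a scalar measurement

for k in range(500):
    # Simulate the true system and a noisy measurement of it
    x_true = A @ x_true + rng.multivariate_normal(np.zeros(2), Q)
    y = C @ x_true + rng.normal(0.0, np.sqrt(R[0, 0]), size=1)
    # Predict
    x_est = A @ x_est
    P = A @ P @ A.T + Q
    # Innovation and its covariance
    r = y - C @ x_est
    S = C @ P @ C.T + R
    normalized.append(r[0] / np.sqrt(S[0, 0]))
    # Update
    K = P @ C.T @ np.linalg.inv(S)
    x_est = x_est + K @ r
    P = (np.eye(2) - K @ C) @ P

e = np.array(normalized)
mean_e = e.mean()
lag1 = np.corrcoef(e[:-1], e[1:])[0, 1]  # lag-1 autocorrelation
print(f"mean = {mean_e:.3f}, variance = {e.var():.3f}, lag-1 autocorrelation = {lag1:.3f}")
```

For a well-matched filter the mean and the autocorrelation should be close to 0 and the variance close to 1. Changing the filter's `A` or `Q` so that it no longer matches the simulation is an easy way to see these statistics drift.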


Once the filter is running, we want to know if it is performing well and if the assumed models are consistent with the data.  
Two common ways to check this are error metrics and statistical tests.

<br>
 
### 1. Mean Squared Error (MSE) and Normalized Mean Squared Error (NMSE)
**a. with true values of the underlying signal**
If the true hidden state $x_k$ is known (e.g. in simulation), we can compute the average squared error:
$$
\text{MSE} = \frac{1}{T} \sum_{k=1}^T \| x_k - \hat{x}_{k|k} \|^2
$$

- Measures how close the estimated states are to the true states.  
- Useful for benchmarking in experiments, but not available in real-world data where $x_k$ is hidden.

**b. with values of the measurements**
$$
\text{MSE} = \frac{1}{T} \sum_{k=1}^T \| y_k - C \hat{x}_{k|k} \|^2
$$

- This fits real-world applications, where only the measurements are available. The state estimate is mapped through $C$ so that it is compared with $y_k$ in measurement space. However, the measurements are noisy, so it is better to use this metric alongside NIS, which is discussed next.

**c. as a normalized version**
Sometimes the error is normalized by the scale of the measurements to make it comparable across systems.
Note that the noise in the measurements still affects the NMSE; what the normalization removes is the dependence on the signal's scale.

$$
\text{NMSE} = \frac{\text{MSE}}{\frac{1}{T}\sum_{k=1}^T \| y_k - \bar{y} \|^2}
$$

- We normalize by the variance (second central moment) of the measurements.
- $\bar{y}$: mean of the measurements  
- This gives a relative error measure that accounts for signal energy or variance.
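
Both metrics are a few lines of NumPy. In this sketch the helper names `mse` and `nmse` are hypothetical, the arrays are assumed to have shape `(T, dim)`, and `y_hat` stands for the estimate already expressed in measurement space; the numbers are made up so the result can be checked by hand.

```python
import numpy as np

def mse(target, estimate):
    """Mean over time of the squared Euclidean error; inputs have shape (T, dim)."""
    diff = np.asarray(target, dtype=float) - np.asarray(estimate, dtype=float)
    return float(np.mean(np.sum(diff ** 2, axis=1)))

def nmse(y, y_hat):
    """MSE divided by the variance of the measurements around their mean."""
    y = np.asarray(y, dtype=float)
    variance = np.mean(np.sum((y - y.mean(axis=0)) ** 2, axis=1))
    return mse(y, y_hat) / variance

# Made-up scalar measurements and estimates, T = 4
y = np.array([[1.0], [2.0], [3.0], [4.0]])
y_hat = np.array([[1.5], [2.0], [2.5], [4.0]])

print(mse(y, y_hat))    # (0.25 + 0 + 0.25 + 0) / 4 = 0.125
print(nmse(y, y_hat))   # 0.125 / 1.25 = 0.1
```

The same `mse` helper covers case (a) when the true states and the state estimates are passed in instead of the measurements.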


### 2. Normalized Innovation Squared (NIS)
When the true state is not available, we can check if the innovations behave like white Gaussian noise with the right variance.  
For each step:

$$
\text{NIS}_k = r_k^\top S_k^{-1} r_k
$$

- $r_k$: innovation at step $k$  
- $S_k$: innovation covariance  

If the filter is consistent, $\text{NIS}_k$ should follow a chi-squared distribution with degrees of freedom equal to the measurement dimension.  
Large or biased values indicate a mismatch between the assumed model and the actual system.
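
The NIS itself is a one-liner. A chi-squared variable with $m$ degrees of freedom has mean $m$, so a simple consistency check is that the time-averaged NIS stays close to the measurement dimension. The sketch below uses a hypothetical helper `nis` and synthetic innovations drawn directly from $\mathcal{N}(0, S)$ rather than real filter output.

```python
import numpy as np

def nis(r, S):
    """Normalized innovation squared r^T S^{-1} r for a single step."""
    r = np.atleast_1d(r).astype(float)
    return float(r @ np.linalg.solve(S, r))

# Hand-checkable example with a 2-dimensional measurement
r = np.array([1.0, 2.0])
S = np.diag([1.0, 4.0])
value = nis(r, S)          # 1^2 / 1 + 2^2 / 4 = 2.0
print(value)

# Synthetic innovations that are consistent with S by construction
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(2), S, size=2000)
mean_nis = np.mean([nis(s, S) for s in samples])
print(mean_nis)            # should be close to the measurement dimension, 2
```

With real data, an average NIS well above $m$ suggests the filter is overconfident ($Q$ or $R$ too small), and an average well below $m$ suggests it is too conservative.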


### Summary
- **MSE**: Best used when true state $x_k$ is known (simulation).
- **NMSE**: A normalized version for comparing across datasets.    
- **NIS**: Best used when true state is unknown (real sensor data).  

These tools help verify that the Kalman Filter is not just producing estimates, but that those estimates are statistically consistent with the assumed system and noise models.
