# Kalman Filter on the Stationary Dynamics

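The inverse problem of recovering the unknown parameter $\theta$ from the noisy data $y$,

$$y = \mathcal{G}(\theta) + \eta, \qquad \eta \sim \mathcal{N}(0, \Sigma_{\eta}),$$

where $\mathcal{G}$ is the parameter-to-data (forward) map, can be solved by first introducing a (mean-field) stochastic dynamical system in which $\mathcal{G}$ is embedded and then employing techniques from nonlinear Kalman filtering.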

Consider a family of stochastic dynamical systems

$$\begin{align}
  &\textrm{evolution:}    &&\theta_{n+1} = r + \alpha (\theta_{n}  - r) +  \omega_{n+1}, &&\omega_{n+1} \sim \mathcal{N}(0,\Sigma_{\omega}),\\
  &\textrm{observation:}  &&x_{n+1} = \mathcal{F}(\theta_{n+1}) + \nu_{n+1}, &&\nu_{n+1} \sim \mathcal{N}(0,\Sigma_{\nu}).
\end{align}$$
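To see what the evolution equation does on its own, the sketch below propagates many independent trajectories of the system. It assumes NumPy; the helper `simulate_dynamics` and all numbers are hypothetical. For $\alpha < 1$ the ensemble forgets $\theta_0$: its mean relaxes to $r$ and its covariance to the stationary value $\Sigma_{\omega}/(1-\alpha^2)$, which solves $S = \alpha^2 S + \Sigma_{\omega}$.

```python
import numpy as np


def simulate_dynamics(forward, r, alpha, Sigma_omega, Sigma_nu, theta0,
                      n_steps, n_samples, seed=0):
    """Propagate n_samples independent trajectories of the evolution/observation model.

    forward maps an (n_samples, N_theta) array of states to an (n_samples, N_x) array.
    Returns the states theta_n and observations x_n after n_steps steps.
    """
    rng = np.random.default_rng(seed)
    n_theta, n_x = Sigma_omega.shape[0], Sigma_nu.shape[0]
    theta = np.tile(np.asarray(theta0, dtype=float), (n_samples, 1))
    x = None
    for _ in range(n_steps):
        omega = rng.multivariate_normal(np.zeros(n_theta), Sigma_omega, size=n_samples)
        theta = r + alpha * (theta - r) + omega      # evolution
        nu = rng.multivariate_normal(np.zeros(n_x), Sigma_nu, size=n_samples)
        x = forward(theta) + nu                       # observation
    return theta, x


# Scalar example: alpha = 0.5, r = 2, Sigma_omega = 0.3, starting far away at theta_0 = 10.
theta, x = simulate_dynamics(lambda th: 3.0 * th, r=np.array([2.0]), alpha=0.5,
                             Sigma_omega=np.array([[0.3]]), Sigma_nu=np.array([[0.1]]),
                             theta0=[10.0], n_steps=50, n_samples=20000)
print(theta.mean(), theta.var())   # close to 2.0 and 0.3 / (1 - 0.5**2) = 0.4
```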

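Different choices of $r$, $\alpha$, $\Sigma_{\omega}$, $x_{n+1}$, $\mathcal{F}$ and $\Sigma_{\nu}$, together with different Kalman filters applied to the resulting stochastic dynamical system, lead to different Kalman inversion algorithms; three such choices are described below.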


# Optimization approach [1]

This approach aims to find the *best* $\theta$ for ill-posed inverse problems by incorporating an effective regularization.


Consider the case:
* $\alpha \in [0, 1]$, $r = r_0$ is the prior mean, $\Sigma_{\omega} = \gamma \Sigma_{0}$, where  $\Sigma_{0}$ is the prior covariance.  
* $x_{n+1} = y$, $\mathcal{F} = \mathcal{G}$, and $\Sigma_{\nu} = \frac{\gamma}{\gamma - 1} \Sigma_{\eta}$

where the hyperparameter $\gamma > 1$ is set to $2$. When $\alpha = 1$, the evolution model is the identity map; when $\alpha \in [0, 1)$, the model has a stationary point $r$, and therefore regularization toward $r$ is added:
* When the observation noise is negligible and there are more observations than parameters (an identifiable inverse problem), choose $\alpha = 1$ (no regularization).
* Otherwise, choose $\alpha < 1$. The smaller $\alpha$ is, the closer the Kalman inversion estimate converges to the prior mean.
    
    

## Linear Analysis
In the linear setting, $\mathcal{G}(\theta) = G\cdot \theta$, and the update equations consist of the prediction step
$$
\begin{align*}
    \hat{m}_{n+1} &= \alpha m_n + (1-\alpha)r_0,\\
    \hat{C}_{n+1} &=  \alpha^2 C_{n} + \Sigma_{\omega},
\end{align*}
$$
and the analysis step
$$
\begin{align*}
        m_{n+1} &= \hat{m}_{n+1} + \hat{C}_{n+1} G^T (G  \hat{C}_{n+1} G^T + \Sigma_{\nu})^{-1} \Big(y - G\hat{m}_{n+1} \Big), \\
        C_{n+1}&= \hat{C}_{n+1} - \hat{C}_{n+1} G^T(G  \hat{C}_{n+1} G^T + \Sigma_{\nu})^{-1} G \hat{C}_{n+1}. \end{align*}
$$
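The two steps above translate directly into code. The following is a minimal sketch assuming NumPy; the function name `optimization_ki_linear` and the default initialization $(m_0, C_0) = (r_0, \Sigma_0)$ are illustrative choices, not taken from the source. It uses the covariances $\Sigma_{\omega} = \gamma \Sigma_0$ and $\Sigma_{\nu} = \frac{\gamma}{\gamma-1}\Sigma_{\eta}$ of this section and forms the Kalman gain with `np.linalg.solve` rather than an explicit inverse.

```python
import numpy as np


def optimization_ki_linear(G, y, r0, Sigma0, Sigma_eta, alpha, gamma=2.0,
                           m0=None, C0=None, n_iter=500):
    """Iterate the linear prediction/analysis updates of the optimization approach."""
    Sigma_omega = gamma * Sigma0
    Sigma_nu = gamma / (gamma - 1.0) * Sigma_eta
    m = r0.copy() if m0 is None else m0.copy()
    C = Sigma0.copy() if C0 is None else C0.copy()
    for _ in range(n_iter):
        # prediction step
        m_hat = alpha * m + (1.0 - alpha) * r0
        C_hat = alpha**2 * C + Sigma_omega
        # analysis step: K = C_hat G^T (G C_hat G^T + Sigma_nu)^{-1}
        K = np.linalg.solve(G @ C_hat @ G.T + Sigma_nu, G @ C_hat).T
        m = m_hat + K @ (y - G @ m_hat)
        C = C_hat - K @ G @ C_hat
        C = 0.5 * (C + C.T)                 # guard against round-off asymmetry
    return m, C


# With alpha = 1 (no regularization) and an invertible G, noise-free data y = G theta
# should be fitted exactly, i.e. m approaches G^{-1} y.
G = np.array([[2.0, 1.0], [0.0, 1.0]])
y = G @ np.array([1.0, -1.0])
m, C = optimization_ki_linear(G, y, r0=np.zeros(2), Sigma0=np.eye(2),
                              Sigma_eta=0.01 * np.eye(2), alpha=1.0)
print(m)   # approximately [1, -1]
```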

We have the following theorem about the convergence of the 
algorithm in the setting of the linear forward model:

**Theorem**
Assume that $\Sigma_{\omega}\succ 0$ and $\Sigma_{\nu}\succ 0.$ Consider the iteration mapping
$(m_n,C_n)$ into $(m_{n+1},C_{n+1})$. Assume further 
that $\alpha \in (0,1)$ or that
$\alpha=1$ and
$\text{Range}(G^T)=\mathbb{R}^{N_{\theta}}$.
Then the steady state equation of the covariance

$$
C_{\infty}^{-1} =  G^T\Sigma_{\nu}^{-1}G + (\alpha^2 C_{\infty} + \Sigma_{\omega})^{-1}
$$

has a unique solution $C_{\infty} \succ 0.$ 
The pair $(m_n,C_n)$
converges exponentially fast to the limit $(m_{\infty},C_{\infty})$.
Furthermore the limiting mean $m_{\infty}$ is the minimizer
of the Tikhonov regularized least squares
functional $\Phi_R$ given by

$$
\Phi_R(\theta) := \frac{1}{2}\lVert\Sigma_{\nu}^{-\frac{1}{2}}(y - G\theta) \rVert^2 +
\frac{1 - \alpha}{2}\lVert \hat{C}_{\infty}^{-\frac{1}{2}}(\theta - r_0) \rVert^2,
$$

where 

$$
\hat{C}_{\infty} =\alpha^2 C_{\infty} + \Sigma_{\omega}.
$$
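A numerical sanity check of the theorem, sketched under assumptions: NumPy, a small randomly generated under-determined problem, and illustrative sizes, seed and names. It iterates the updates with the covariance choices of this section, then checks the steady-state equation and compares $m_n$ with the minimizer of $\Phi_R$, obtained by setting $\nabla\Phi_R(\theta) = -G^T\Sigma_{\nu}^{-1}(y - G\theta) + (1-\alpha)\hat{C}_{\infty}^{-1}(\theta - r_0)$ to zero.

```python
import numpy as np

# Hypothetical under-determined test problem (N_y < N_theta), alpha in (0, 1).
rng = np.random.default_rng(1)
N_theta, N_y = 3, 2
G = rng.standard_normal((N_y, N_theta))
y = rng.standard_normal(N_y)
r0, Sigma0, Sigma_eta = np.zeros(N_theta), np.eye(N_theta), 0.1 * np.eye(N_y)
alpha, gamma = 0.5, 2.0
Sigma_omega = gamma * Sigma0
Sigma_nu = gamma / (gamma - 1.0) * Sigma_eta

# Iterate the prediction/analysis updates until they stop changing.
m, C = r0.copy(), Sigma0.copy()
for _ in range(1000):
    m_hat = alpha * m + (1.0 - alpha) * r0
    C_hat = alpha**2 * C + Sigma_omega
    K = np.linalg.solve(G @ C_hat @ G.T + Sigma_nu, G @ C_hat).T
    m = m_hat + K @ (y - G @ m_hat)
    C = C_hat - K @ G @ C_hat
    C = 0.5 * (C + C.T)

# 1) Steady-state equation: C^{-1} = G^T Sigma_nu^{-1} G + (alpha^2 C + Sigma_omega)^{-1}.
C_hat_inf = alpha**2 * C + Sigma_omega
Sn_inv = np.linalg.inv(Sigma_nu)
residual = np.linalg.inv(C) - (G.T @ Sn_inv @ G + np.linalg.inv(C_hat_inf))

# 2) Minimizer of Phi_R: grad Phi_R = 0 is a linear system in theta.
P = np.linalg.inv(C_hat_inf)
A = G.T @ Sn_inv @ G + (1.0 - alpha) * P
b = G.T @ Sn_inv @ y + (1.0 - alpha) * P @ r0
m_tikhonov = np.linalg.solve(A, b)

print(np.abs(residual).max(), np.abs(m - m_tikhonov).max())   # both at round-off level
```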

**Remark**
Despite the clear parallels between $\Phi_R$ and Tikhonov regularization, there is an important
difference: the matrix $\hat{C}_{\infty}$ defining the implied
prior covariance in the regularization term
depends on the forward model. 
To get some insight into the implications of this, we consider the over-determined linear system in which $G^T\Sigma_{\eta}^{-1}G$ is invertible
and we may define
$$
{C_{*}} = (G^T\Sigma_{\eta}^{-1}G)^{-1}.
$$
If we choose the artificial evolution and observation error covariances
$$
\begin{align*}
\Sigma_\nu &= \gamma \Sigma_{\eta},\\
\Sigma_{\omega} &= \bigl(\frac{\gamma}{\gamma-1} - \alpha^{2}\bigr) C_{*},
\end{align*}
$$
then straightforward calculation shows that
$$C_{\infty}=C_{*}, \quad \hat{C}_{\infty}=\frac{\gamma}{\gamma-1}C_{*}.$$
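Spelled out, the calculation substitutes the candidate $C_{\infty} = C_{*}$ into the steady-state equation of the theorem (note that $\Sigma_{\omega} \succ 0$, since $\frac{\gamma}{\gamma-1} > 1 \geq \alpha^2$ for $\gamma > 1$):

$$
\begin{align*}
\hat{C}_{\infty} &= \alpha^2 C_{*} + \Bigl(\frac{\gamma}{\gamma-1} - \alpha^{2}\Bigr) C_{*} = \frac{\gamma}{\gamma-1} C_{*},\\
G^T\Sigma_{\nu}^{-1}G + \hat{C}_{\infty}^{-1} &= \frac{1}{\gamma} G^T\Sigma_{\eta}^{-1}G + \frac{\gamma-1}{\gamma} C_{*}^{-1} = \frac{1}{\gamma} C_{*}^{-1} + \frac{\gamma-1}{\gamma} C_{*}^{-1} = C_{*}^{-1}.
\end{align*}
$$

Since the theorem guarantees that the positive definite steady-state solution is unique, $C_{\infty} = C_{*}$.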
It follows that

$$
    \Phi_R(\theta)= 
     \frac{1}{2\gamma} \left\lVert \Sigma_{\eta}^{-\frac{1}{2}}(y-G\theta) \right\rVert^2 +
    \frac{(1-\alpha)(\gamma-1)}{2 \gamma} \left\lVert \Sigma_{\eta}^{-\frac{1}{2}}(Gr_0 - G \theta) \right\rVert^2.
$$

This calculation clearly demonstrates the dependence of the second (regularization) term on the forward model and that choosing
$\alpha \in (0,1]$ allows different weights on the regularization term.
In contrast to Tikhonov regularization, the regularization term scales with respect to $G$
in the same way as the data misfit, so the estimate is balanced between
the prior mean $r_0$ and an overfitted parameter $\theta^*$ satisfying $y = G\theta^{*}$. Therefore, despite the differences from
standard Tikhonov regularization, the implied regularization
resulting from the proposed stochastic dynamical system
is both interpretable and controllable; in particular, the
single parameter $\alpha$ controls the balance between the prior mean
and the overfitted solution.
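Setting the gradient of this $\Phi_R$ to zero makes the balance explicit: with $\beta = (1-\alpha)(\gamma-1)$ and the weighted least-squares fit $\theta_{\rm ls} = C_{*} G^T \Sigma_{\eta}^{-1} y$ (which equals $\theta^*$ when $y = G\theta^*$ is solvable), the minimizer is $m_{\infty} = (\theta_{\rm ls} + \beta r_0)/(1 + \beta)$, a convex combination that moves toward $r_0$ as $\alpha$ decreases. The sketch below assumes NumPy and a hypothetical over-determined test problem (names, sizes and seed are illustrative); it runs the linear updates with the covariance choices of the remark and compares the limit with this formula and with $C_{*}$.

```python
import numpy as np

# Hypothetical over-determined test problem (N_y > N_theta), so C_* exists.
rng = np.random.default_rng(4)
N_theta, N_y, gamma = 2, 5, 2.0
G = rng.standard_normal((N_y, N_theta))
y = rng.standard_normal(N_y)
r0 = np.array([1.0, 1.0])
Sigma_eta = 0.5 * np.eye(N_y)

Se_inv = np.linalg.inv(Sigma_eta)
C_star = np.linalg.inv(G.T @ Se_inv @ G)
theta_ls = C_star @ G.T @ Se_inv @ y          # weighted least-squares ("overfitted") fit


def run_linear_updates(alpha, n_iter=500):
    """Prediction/analysis updates with Sigma_nu = gamma Sigma_eta and
    Sigma_omega = (gamma / (gamma - 1) - alpha^2) C_*, as in the remark."""
    Sigma_nu = gamma * Sigma_eta
    Sigma_omega = (gamma / (gamma - 1.0) - alpha**2) * C_star
    m, C = r0.copy(), np.eye(N_theta)
    for _ in range(n_iter):
        m_hat = alpha * m + (1.0 - alpha) * r0
        C_hat = alpha**2 * C + Sigma_omega
        K = np.linalg.solve(G @ C_hat @ G.T + Sigma_nu, G @ C_hat).T
        m = m_hat + K @ (y - G @ m_hat)
        C = C_hat - K @ G @ C_hat
        C = 0.5 * (C + C.T)                   # guard against round-off asymmetry
    return m, C


results = {}
for alpha in (0.9, 0.5, 0.1):
    m_inf, C_inf = run_linear_updates(alpha)
    beta = (1.0 - alpha) * (gamma - 1.0)
    m_pred = (theta_ls + beta * r0) / (1.0 + beta)
    results[alpha] = (np.abs(m_inf - m_pred).max(), np.abs(C_inf - C_star).max())
    print(alpha, results[alpha])              # both errors at round-off level
```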


1. [Iterated Kalman Methodology For Inverse Problems](https://arxiv.org/abs/2102.01580)

# Probabilistic approach [2]

This approach aims to find the *best* Gaussian approximation of the posterior distribution of $\theta$ for ill-posed inverse problems, where the prior is a Gaussian $\mathcal{N}(r_0, \Sigma_0)$.


Consider the case:
* $\alpha = 1$,  $\Sigma_{\omega} = \gamma C_{n}$, where  $C_{n}$ is the current covariance estimate.  
* $x_{n+1} = \begin{bmatrix} y \\ r_0 \end{bmatrix}, \quad 
\mathcal{F}(\theta) = \begin{bmatrix} \mathcal{G}(\theta) \\ \theta  \end{bmatrix},\quad 
\textrm{and}\quad \Sigma_{\nu} = \frac{\gamma + 1}{\gamma} \begin{bmatrix} \Sigma_{\eta} & 0 \\ 0 & \Sigma_0\end{bmatrix}
$ 

where $r_0$ and $\Sigma_0$ are the prior mean and covariance, and the hyperparameter $\gamma > 0$ is set to $1$.

## Linear Analysis
In the linear setting, 

$$\mathcal{G}(\theta) = G\cdot \theta \qquad F = \begin{bmatrix} G \\ I  \end{bmatrix}$$

The update equations become
$$
\begin{align*}
    \hat{m}_{n+1} &=  m_n\\
    \hat{C}_{n+1} &=  (\gamma + 1) C_{n}
\end{align*}
$$
and
$$
\begin{align*}
        m_{n+1} &= \hat{m}_{n+1} + \hat{C}_{n+1} F^T (F  \hat{C}_{n+1} F^T + \Sigma_{\nu})^{-1} (x_{n+1} - F \hat{m}_{n+1}), \\
         C_{n+1}&= \hat{C}_{n+1} - \hat{C}_{n+1} F^T(F  \hat{C}_{n+1} F^T + \Sigma_{\nu})^{-1} F \hat{C}_{n+1}.
\end{align*}
$$
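A minimal NumPy sketch of these updates for a linear $\mathcal{G}$ (the function name `probabilistic_ki_linear` and the test problem are hypothetical). It assembles the augmented $x$, $F$ and $\Sigma_{\nu}$ of this section, and its output can be compared with the closed-form Gaussian posterior $C_{\rm post} = (G^T\Sigma_{\eta}^{-1}G + \Sigma_0^{-1})^{-1}$, $m_{\rm post} = C_{\rm post}(G^T\Sigma_{\eta}^{-1}y + \Sigma_0^{-1}r_0)$.

```python
import numpy as np


def probabilistic_ki_linear(G, y, r0, Sigma0, Sigma_eta, m0, C0, gamma=1.0, n_iter=100):
    """Linear Kalman inversion with the augmented system x = [y; r0], F = [G; I]."""
    N_y, N_theta = G.shape
    F = np.vstack([G, np.eye(N_theta)])
    x = np.concatenate([y, r0])
    Sigma_nu = (gamma + 1.0) / gamma * np.block([
        [Sigma_eta, np.zeros((N_y, N_theta))],
        [np.zeros((N_theta, N_y)), Sigma0],
    ])
    m, C = m0.copy(), C0.copy()
    for _ in range(n_iter):
        # prediction step: alpha = 1, Sigma_omega = gamma * C_n
        m_hat, C_hat = m, (gamma + 1.0) * C
        # analysis step
        K = np.linalg.solve(F @ C_hat @ F.T + Sigma_nu, F @ C_hat).T
        m = m_hat + K @ (x - F @ m_hat)
        C = C_hat - K @ F @ C_hat
        C = 0.5 * (C + C.T)                  # guard against round-off asymmetry
    return m, C


rng = np.random.default_rng(5)
G = rng.standard_normal((2, 3))             # under-determined: the prior matters
y = rng.standard_normal(2)
r0, Sigma0, Sigma_eta = np.zeros(3), np.eye(3), 0.1 * np.eye(2)
m, C = probabilistic_ki_linear(G, y, r0, Sigma0, Sigma_eta,
                               m0=np.ones(3), C0=5.0 * np.eye(3))

Se_inv, S0_inv = np.linalg.inv(Sigma_eta), np.linalg.inv(Sigma0)
C_post = np.linalg.inv(G.T @ Se_inv @ G + S0_inv)
m_post = C_post @ (G.T @ Se_inv @ y + S0_inv @ r0)
print(np.abs(m - m_post).max(), np.abs(C - C_post).max())   # both at round-off level
```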

We have the following theorem about the convergence of the 
algorithm in the setting of the linear forward model:

**Theorem**
Assume that the prior covariance matrix $\Sigma_{0} \succ 0$ and the initial covariance matrix $C_{0} \succ 0.$
The iteration for the conditional mean $m_n$ and covariance matrix $C_{n}$ characterizing the distribution of $\theta_n|Y_n$, where $Y_n = \{x_1, \ldots, x_n\}$ denotes the observations up to step $n$,
converges exponentially fast to the posterior mean $m_{\rm post}$ and covariance $C_{\rm post}.$
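The exponential rate can be read off in information form. By the Woodbury identity, the analysis step is equivalent to $C_{n+1}^{-1} = \hat{C}_{n+1}^{-1} + F^T\Sigma_{\nu}^{-1}F$ and $C_{n+1}^{-1}m_{n+1} = \hat{C}_{n+1}^{-1}\hat{m}_{n+1} + F^T\Sigma_{\nu}^{-1}x_{n+1}$. With the choices above, $F^T\Sigma_{\nu}^{-1}F = \frac{\gamma}{\gamma+1}C_{\rm post}^{-1}$ and $F^T\Sigma_{\nu}^{-1}x_{n+1} = \frac{\gamma}{\gamma+1}C_{\rm post}^{-1}m_{\rm post}$, where $C_{\rm post}^{-1} = G^T\Sigma_{\eta}^{-1}G + \Sigma_0^{-1}$ and $m_{\rm post} = C_{\rm post}\bigl(G^T\Sigma_{\eta}^{-1}y + \Sigma_0^{-1}r_0\bigr)$. Using $\hat{C}_{n+1} = (\gamma+1)C_n$ and $\hat{m}_{n+1} = m_n$:

$$
\begin{align*}
C_{n+1}^{-1} - C_{\rm post}^{-1} &= \frac{1}{\gamma+1}\bigl(C_{n}^{-1} - C_{\rm post}^{-1}\bigr),\\
C_{n+1}^{-1}m_{n+1} - C_{\rm post}^{-1}m_{\rm post} &= \frac{1}{\gamma+1}\bigl(C_{n}^{-1}m_{n} - C_{\rm post}^{-1}m_{\rm post}\bigr).
\end{align*}
$$

Both errors therefore contract by the factor $1/(\gamma+1)$ at every iteration, i.e. $1/2$ for $\gamma = 1$.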
            
            
            
2. [Efficient Derivative-free Bayesian Inference for Large-Scale Inverse Problems](https://arxiv.org/abs/2204.04386)
            
            
    

# Probabilistic approach [3]

This approach aims to find the *best* Gaussian approximation of the posterior distribution of $\theta$ for well-posed inverse problems (the number of observations is larger than the number of unknowns), where the prior is an improper uniform distribution over the whole space.


 
 
Consider the case:
* $\alpha = 1$,  $\Sigma_{\omega} = \gamma C_{n}$, where  $C_{n}$ is the current covariance estimate.  
* $x_{n+1} = y, \quad 
\mathcal{F}(\theta) =  \mathcal{G}(\theta),\quad \textrm{and}\quad 
\Sigma_{\nu} = \frac{\gamma + 1}{\gamma} \Sigma_{\eta}
$ 

where the hyperparameter $\gamma > 0$ is set to $1$.

## Linear Analysis
In the linear setting, 

$$\mathcal{G}(\theta) = G\cdot \theta \qquad F = G $$

The update equations become
$$
\begin{align*}
    \hat{m}_{n+1} &=  m_n\\
    \hat{C}_{n+1} &=  (\gamma + 1) C_{n}
\end{align*}
$$
and
$$
\begin{align*}
        m_{n+1} &= \hat{m}_{n+1} + \hat{C}_{n+1} G^T (G  \hat{C}_{n+1} G^T + \Sigma_{\nu})^{-1} (x_{n+1} - G \hat{m}_{n+1}), \\
         C_{n+1}&= \hat{C}_{n+1} - \hat{C}_{n+1} G^T(G  \hat{C}_{n+1} G^T + \Sigma_{\nu})^{-1} G \hat{C}_{n+1}.
\end{align*}
$$
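Analogously, a minimal NumPy sketch for this section (the function `uniform_prior_ki_linear` and the test problem are hypothetical). Its limit can be compared with the weighted least-squares solution, computed here by whitening with $\Sigma_{\eta}^{-1/2}$ (assuming a diagonal $\Sigma_{\eta}$) and calling `np.linalg.lstsq`, and with $(G^T\Sigma_{\eta}^{-1}G)^{-1}$.

```python
import numpy as np


def uniform_prior_ki_linear(G, y, Sigma_eta, m0, C0, gamma=1.0, n_iter=100):
    """Linear Kalman inversion with x = y, F = G and Sigma_nu = (gamma + 1) / gamma * Sigma_eta."""
    Sigma_nu = (gamma + 1.0) / gamma * Sigma_eta
    m, C = m0.copy(), C0.copy()
    for _ in range(n_iter):
        # prediction step: alpha = 1, Sigma_omega = gamma * C_n
        m_hat, C_hat = m, (gamma + 1.0) * C
        # analysis step
        K = np.linalg.solve(G @ C_hat @ G.T + Sigma_nu, G @ C_hat).T
        m = m_hat + K @ (y - G @ m_hat)
        C = C_hat - K @ G @ C_hat
        C = 0.5 * (C + C.T)                  # guard against round-off asymmetry
    return m, C


rng = np.random.default_rng(6)
G = rng.standard_normal((6, 2))             # well-posed: Range(G^T) = R^2
y = rng.standard_normal(6)
Sigma_eta = np.diag(rng.uniform(0.1, 1.0, 6))
m, C = uniform_prior_ki_linear(G, y, Sigma_eta, m0=np.zeros(2), C0=np.eye(2))

W = np.diag(1.0 / np.sqrt(np.diag(Sigma_eta)))   # Sigma_eta^{-1/2} for diagonal Sigma_eta
m_post = np.linalg.lstsq(W @ G, W @ y, rcond=None)[0]
C_post = np.linalg.inv(G.T @ np.linalg.inv(Sigma_eta) @ G)
print(np.abs(m - m_post).max(), np.abs(C - C_post).max())   # both at round-off level
```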

We have the following theorem about the convergence of the 
algorithm in the setting of the linear forward model:

**Theorem**
Assume the inverse problem is well-posed, namely $\text{Range}(G^T)=\mathbb{R}^{N_{\theta}}$, and the initial covariance matrix is strictly positive definite, $C_{0} \succ 0$. 

* The iteration for the conditional mean $m_n$ and covariance matrix $C_{n}$ characterizing the distribution of $\theta_n|Y_n$
converges exponentially fast to the posterior mean and covariance under an improper uniform prior,

$$
m_{\rm post} = \arg\min_{\theta} \frac{1}{2}\lVert \Sigma_{\eta}^{-\frac{1}{2}}\Bigl(y - G\theta \Bigr) \rVert^2\quad \textrm{and} \quad  C_{\rm post} = \Bigl(G^T \Sigma_{\eta}^{-1} G\Bigr)^{-1}
$$

* The posterior covariance yields an error bound for the parameter estimate:

$$
    P\Big( |{\theta_{ref}}_{(i)} - {m_{\rm post}}_{(i)}| \leq 3\sqrt{{C_{\rm post}}_{(i,i)}} \Big) \geq 99.7\%,
$$

where the subscript $(i)$ denotes the $i$-th vector entry and $(i,i)$ the $i$-th diagonal matrix entry, $\theta_{ref}$ denotes the reference parameter, which satisfies $y - G\theta_{ref} \sim \mathcal{N}(0, \Sigma_{\eta})$, and the probability is taken over the observation noise.
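The bound can be checked by Monte Carlo, since $m_{\rm post} - \theta_{ref} = C_{\rm post}G^T\Sigma_{\eta}^{-1}\eta \sim \mathcal{N}(0, C_{\rm post})$. The sketch below assumes NumPy and a hypothetical test problem (sizes, seed and $\theta_{ref}$ are illustrative); it draws many noisy data sets, computes $m_{\rm post} = C_{\rm post}G^T\Sigma_{\eta}^{-1}y$ for each, and counts how often each component lies within three posterior standard deviations of $\theta_{ref}$.

```python
import numpy as np

# Hypothetical well-posed test problem.
rng = np.random.default_rng(3)
N_theta, N_y, n_trials = 2, 6, 20000
G = rng.standard_normal((N_y, N_theta))
Sigma_eta = np.diag(rng.uniform(0.1, 1.0, N_y))
theta_ref = np.array([1.0, -2.0])

Se_inv = np.linalg.inv(Sigma_eta)
C_post = np.linalg.inv(G.T @ Se_inv @ G)
gain = C_post @ G.T @ Se_inv                  # maps data y to m_post

# One noisy data set y = G theta_ref + eta per row.
eta = rng.multivariate_normal(np.zeros(N_y), Sigma_eta, size=n_trials)
Y = theta_ref @ G.T + eta                     # shape (n_trials, N_y)
M_post = Y @ gain.T                           # shape (n_trials, N_theta)

inside = np.abs(M_post - theta_ref) <= 3.0 * np.sqrt(np.diag(C_post))
coverage = inside.mean(axis=0)
print(coverage)   # each entry should be close to 0.997
```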
            
            
            
            
3. [Bayesian Calibration for Large‐Scale Fluid Structure Interaction Problems Under Embedded/Immersed Boundary Framework](https://arxiv.org/pdf/2105.09497.pdf)