<h1>Mathematical Theory of Data Assimilation with Applications:<br>

<p class="fragment">Tutorial part 3 of 4 --- introducing flow dependence of the errors</p></h1>


<h3>Letting the prior vary</h3>
<ul>
    <li class="fragment">A weakness of the earlier approach was that the prior did not accumulate new information over the observation-analysis-forecast cycle.</li>
    <li class="fragment">We can accumulate more information in the DA scheme by making the learning recursive.</li>
    <li class="fragment">This is again a strength of the Bayesian framework for DA, in which there is a natural recursive formulation:</li>
    <ul>
        <li class="fragment">particularly, evolving the last posterior forward in time gives a natural choice for the subsequent prior.</li>
    </ul>
    <li class="fragment">We will consider how to formulate this in the scalar, linear-Gaussian example studied earlier.</li>
</ul>

<h3>The simple example</h3>
<ul>
    <li class="fragment">We consider once again the simple estimation of the true air temperature $T_t$ at the ground in Reading from:</li>
    <ol>
        <li class="fragment">a background state $T_b$ generated via a numerical forecast model; and</li>
        <li class="fragment">an observed state $T_o$.</li>
    </ol>
    <li class="fragment"> From the frequentist perspective, we had assumed that these are independent and
        \begin{align}
        T_b &= T_t + \epsilon_b & & \epsilon_b \sim N(0, \sigma_b^2)\\
        T_o &= T_t + \epsilon_o & & \epsilon_o \sim N(0, \sigma_o^2)
        \end{align}
    </li>
    <li class="fragment"> This is equivalent to re-writing the statement as,
             \begin{align}
        T_t &\sim N(T_b, \sigma_b^2)\\
        T_o &\sim N(T_t, \sigma_o^2)
        \end{align}
        following a Bayesian perspective.
    </li>   
</ul>

<h3>The simple example continued</h3>
<ul>
    <li class="fragment"> We found that the minimum mean-square-error (minimum variance) analysis state can be derived by a linear combination of the two sources of information,
    \begin{align}
     T_a = a_bT_b + a_o T_o
     \end{align}
    </li>
    <li class="fragment">These weights $a_b, a_o$ take the form,
        \begin{align}
        a_b &= \frac{\sigma_o^2}{\sigma_b^2 + \sigma_o^2} & & a_o = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_o^2}
        \end{align}
    </li>
</ul>

<h3>The innovation</h3>
<ul>
    <li class="fragment"> However, we will find it convenient to change this relationship in order to mirror the Bayesian update step.</li>
    <li class="fragment">Specifically, we treat the $T_b$ as a prior estimate that we would like to "update" with its weighted difference from the observation:</li>
        <li class="fragment">
        \begin{align}
        T_a &= T_b - T_b + a_bT_b + a_o T_o \\
            &= T_b + \frac{T_b \sigma_o^2 - T_b(\sigma_b^2 + \sigma_o^2) + \sigma_b^2 T_o}{\sigma_b^2 + \sigma_o^2} \\
            &= T_b + W\left(T_o - T_b\right)
        \end{align}
        where $W = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_o^2}$.
    </li>
    <li class="fragment">The value $T_o - T_b$ is called the observational "innovation".</li>
</ul>
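<p>A minimal sketch of the two equivalent forms of the scalar analysis, assuming hypothetical values for the temperatures and error variances (the function name <code>scalar_analysis</code> is introduced here for illustration only):</p>

```python
def scalar_analysis(T_b, T_o, var_b, var_o):
    """Return the analysis in weighted-average form, in innovation form, and the weight W."""
    total = var_b + var_o

    # weighted-average form: each source is weighted by the *other* source's variance
    a_b = var_o / total
    a_o = var_b / total
    T_a_weighted = a_b * T_b + a_o * T_o

    # innovation form: update the background with the weighted innovation T_o - T_b
    W = var_b / total
    T_a_innovation = T_b + W * (T_o - T_b)

    return T_a_weighted, T_a_innovation, W


# hypothetical values: background 15.0, observation 17.0, variances 1.0 and 3.0
T_a_w, T_a_i, W = scalar_analysis(15.0, 17.0, var_b=1.0, var_o=3.0)
print(T_a_w, T_a_i, W)
```

<p>Both forms return the same analysis; the less certain observation (larger variance) pulls the background only a quarter of the way towards it.</p>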

<h3>Deriving the analysis error variance</h3>
<ul>
    <li class="fragment"> We will define the analysis error in the same way as we defined the error of the other states,
    \begin{align}
    \epsilon_a = T_a - T_t    & & T_t \sim N(T_a, \sigma_a^2)
    \end{align}
     which is Gaussian distributed because it is a linear combination of the independent Gaussian errors $\epsilon_b$ and $\epsilon_o$.
    </li>
    <li class="fragment"><b>Exercise (6 minutes):</b> use the statement
        \begin{align}
        T_a = T_b + W\left(T_o - T_b\right)
        \end{align}</li>
        <li class="fragment">and the statement
        \begin{align}
        W = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_o^2}
        \end{align}
    </li>
    <li class="fragment">to derive the analysis error variance $\sigma_a^2$ as a function of the optimal weight $W$ and the background error variance $\sigma_b^2$.</li>
</ul>

<h3>Deriving the analysis error variance continued</h3>
<ul>
    <li class="fragment"><b>Solution:</b> consider that,
    \begin{align}
        \epsilon_a &= T_a - T_t \\
                   &= T_t + \epsilon_b - T_t + W(T_t + \epsilon_o - T_t - \epsilon_b)\\
                   &= \epsilon_b + W(\epsilon_o - \epsilon_b)\\
                   &= (1 - W)\epsilon_b + W\epsilon_o.
    \end{align}
    </li>
    <li class="fragment">Therefore, because $\epsilon_b$ and $\epsilon_o$ are independent with zero mean, the cross term vanishes and we find that,
        \begin{align}
        \sigma_a^2 &= \mathbb{E}[\epsilon_a^2] \\
        &=(1-W)^2 \sigma_b^2 + W^2 \sigma_o^2 \\
        &= \sigma_b^2 - 2 \sigma_b^2 W  + W^2 \left(\sigma_b^2 +\sigma_o^2\right) \\
        &= \sigma_b^2 - 2 \sigma_b^2 W  + \sigma_b^2 W\\
        &= (1 - W)\sigma_b^2
        \end{align}
    </li>
</ul>
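<p>The identity can also be checked numerically. The sketch below assumes hypothetical error variances and draws independent Gaussian samples of $\epsilon_b$ and $\epsilon_o$; the sampled variance of $\epsilon_a$ should be close to $(1 - W)\sigma_b^2$:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical error variances
var_b, var_o = 2.0, 1.0
W = var_b / (var_b + var_o)

# independent background and observation errors
n = 200_000
eps_b = rng.normal(0.0, np.sqrt(var_b), size=n)
eps_o = rng.normal(0.0, np.sqrt(var_o), size=n)

# analysis error as derived above
eps_a = (1 - W) * eps_b + W * eps_o

var_a_sampled = eps_a.var()
var_a_formula = (1 - W) * var_b
var_a_expanded = (1 - W) ** 2 * var_b + W ** 2 * var_o

print(var_a_sampled, var_a_formula, var_a_expanded)
```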

<h3>The analysis error variance</h3>
<ul>
    <li class="fragment">We found that the posterior error variance can be computed recursively from the prior error variance as,
    \begin{align}
    \sigma_a^2 = (1 - W) \sigma_b^2 & & W = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_o^2}.
    \end{align}
    </li>
    <li class="fragment">By definition, $0 \leq W\leq 1$, so that $\sigma_a^2 \leq \sigma_b^2$ and the assimilation step has the effect of reducing the variance of the background error.</li>
    <li class="fragment">With the analysis error variance in hand, we can <em>forecast the entire posterior</em>, just as we forecast the optimal analysis state earlier;</li>
    <li class="fragment">this forecasted posterior becomes the obvious choice for the next prior, if we have a reasonable way to compute it.</li>
    <li class="fragment">We will expand our consideration slightly to a more realistic model for this as well.</li>
</ul>

<h3>Re-introducing the forecast and observational models</h3>
<ul>
    <li class="fragment"> Let's re-introduce a simple dynamical model and observational model,
        \begin{align}
        \mathbf{x}_{k} &= \mathbf{M} \mathbf{x}_{k-1} + \mathbf{w}_k & & \mathbf{M} \in \mathbb{R}^{n\times n} & & \mathbf{x}_k,\mathbf{w}_k \in \mathbb{R}^n\\
        \mathbf{y}_{k} &= \mathbf{H} \mathbf{x}_k + \mathbf{v}_k & & \mathbf{H} \in \mathbb{R}^{d \times n} & & \mathbf{y}_k, \mathbf{v}_k \in \mathbb{R}^{d} 
        \end{align}
    </li>
    <li class="fragment"> The new term $\mathbf{w}_k$ will represent the assumption that our model is inherently stochastic, or that there are errors in our model of the physical process;</li>
    <li class="fragment"> this may arise due to, e.g., physics on small scales that we cannot account for in the simple model $\mathbf{M}$.</li>
    <li class="fragment"> While the actual errors will be unknown, we will assume that $\mathbf{w}_k \sim N(0,\mathbf{Q})$ where $\mathbf{Q}$ is some known covariance matrix.</li>
    <ul>
        <li class="fragment"> This assumes implicitly that the model $\mathbf{M}$ is unbiased;</li>
        <li class="fragment"> if this isn't the case, post-processing of the forecast for bias correction or model redesign may be necessary.</li>
    </ul>
</ul>

<h3>The forecast distribution</h3>
<ul>
    <li class="fragment"> It can be shown that if</li>
    <ol>
        <li class="fragment"> $\mathbf{M}\in\mathbb{R}^{n \times n}$ is a linear operator;</li>
        <li class="fragment"> $\mathbf{w}_k\sim N(0 ,\mathbf{Q})$; and</li>
        <li class="fragment"> the prior distribution $\mathbf{P}_{\mathbf{x}_b, \mathbf{B}}(\mathbf{x}_{k-1})$ is Gaussian, with mean $\mathbf{x}_b$ and covariance $\mathbf{B}$;</li>
    </ol>
     <li class="fragment"> then the forward-evolved state $\mathbf{x}_{k} = \mathbf{M}\mathbf{x}_{k-1}+\mathbf{w}_{k}$ is also Gaussian distributed.</li>
    <li class="fragment"> Gaussian distributions are entirely characterized by their first two moments, such that we can describe the forecast distribution by the forward-evolved mean and covariance.</li>
    <li class="fragment"> We will derive this in the simple example.</li>
</ul>

<h3>Describing the forecast mean</h3>
<ul>
        <li class="fragment">In the Bayesian perspective, we will again treat the true temperature $T_t \sim N\left(T_b,\sigma_b^2\right)$ as a <em>random variable</em> whose mean is the background prior state.</li> 
    <li class="fragment">  Recall that the minimum variance analysis state was defined by the <em>update to the background state</em> by
    \begin{align}
        T_a = T_b + W\left(T_o - T_b\right)
    \end{align}
    </li>
    <li class="fragment"> This analysis state represents the mean (or expected value) for the unknown true temperature conditioned on the observation $T_o \sim N(T_t, \sigma_o^2)$.</li>
    <ul>
        <li class="fragment"> Before the observation $T_t \sim N(T_b, \sigma^2_b)$, but $T_t \sim N(T_a , \sigma_a^2)$ once conditioning on $T_o$ according to the Bayesian update.</li>
    </ul>
    <li class="fragment"> Noting that $0 \leq W \leq 1$, the recursion,
            \begin{align}
            \sigma_a^2 = (1 - W) \sigma_b^2,
            \end{align}
            implies that the variance of the distribution around the analysis mean is less than or equal to the variance of the distribution around the background.</li>
        <li class="fragment">Particularly, conditioning on the observation reduces the overall uncertainty of the true temperature $T_t \sim N(T_a, \sigma^2_a)$.</li>
</ul>

<h3>Describing the forecast mean continued</h3>
<ul>
     <li class="fragment"> The true temperature evolves under the forecast model
        \begin{align}
        T_t(k+1) = \mathbf{M} T_t(k) + \mathbf{w}_{k+1}
        \end{align}
        </li>
    <li class="fragment"> <b>Q:</b> what should the mean of the forward-evolved state look like in this example?</li>
    <li class="fragment"><b>A:</b> the mean for this state is derived as,
            \begin{align}
           T_b(k+1) &\triangleq \mathbb{E}\left[T_t(k+1)\right] \\
        &= \mathbb{E}\left[ \mathbf{M}T_t(k) + \mathbf{w}_{k+1}\right] \\
        &= \mathbb{E}\left[ \mathbf{M}T_t(k)\right] + \mathbb{E}\left[\mathbf{w}_{k+1}\right] \\
            &=\mathbf{M}T_a(k) 
            \end{align}</li>
    <li class="fragment">That is, the mean of the new prior is derived exactly as the deterministic evolution of the analysis mean.</li>
</ul>

<h3>Describing the forecast (co)-variance</h3>
<ul>
    <li class="fragment">  In the simple example, the forecast error can thus be derived directly as,
    \begin{align}
    \epsilon_{b}(k+1)& = T_t(k+1) - T_b(k+1) \\
        &= \mathbf{M}T_t(k) + \mathbf{w}_{k+1} - \mathbf{M}T_a (k) \\
        &= \mathbf{M}\left( T_t(k) - T_a(k)\right) + \mathbf{w}_{k+1}\\
        &= \mathbf{M}\left( \epsilon_a(k) \right) + \mathbf{w}_{k+1}
    \end{align}
    </li>
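    <li class="fragment">In the simple scalar example, assuming that the analysis error $\epsilon_a(k)$ and the model error $\mathbf{w}_{k+1}$ are uncorrelated, we thus recover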
        \begin{align}
        \sigma_b^2(k+1) &= \mathbb{E}\left[\left(\mathbf{M}\left( \epsilon_a(k) \right) + \mathbf{w}_{k+1}\right)^2\right]\\
        & = \mathbf{M} \sigma_a^2(k)\mathbf{M} + \mathbf{Q}
        \end{align}</li>
</ul>

<h3>The observation-analysis-forecast cycle</h3>
<ul>
    <li class="fragment">The two-step process we have now derived describes the evolution of the posterior into the next prior at all times.</li>
    <li class="fragment">Using the analysis update step, we can define the next posterior whenever new information comes in the form of an observation.</li>
    <li class="fragment">This new posterior is then evolved forward in time by the numerical model (with errors) to define the <b>next prior</b>.</li>
    <li class="fragment">The cycle can continue ad infinitum;</li>
    <li class="fragment">this simple example explains the basis of the <em>Kalman filter</em>, which we have derived in one dimension.</li>
</ul>
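<p>The cycle above can be sketched as a short loop for the scalar case. This is an illustrative sketch only: the function name <code>scalar_kalman_filter</code> and all numerical values are assumptions, and the observations are taken to be of the state itself.</p>

```python
import numpy as np


def scalar_kalman_filter(obs, T_b0, var_b0, M, Q, var_o):
    """Run the scalar observation-analysis-forecast cycle over a sequence of observations."""
    T_b, var_b = T_b0, var_b0
    analyses, variances = [], []

    for T_o in obs:
        # analysis step: update the prior with the weighted innovation
        W = var_b / (var_b + var_o)
        T_a = T_b + W * (T_o - T_b)
        var_a = (1 - W) * var_b
        analyses.append(T_a)
        variances.append(var_a)

        # forecast step: the evolved posterior becomes the next prior
        T_b = M * T_a
        var_b = M * var_a * M + Q

    return np.array(analyses), np.array(variances)


# hypothetical setup: persistence model (M = 1), no model error, unit variances
analyses, variances = scalar_kalman_filter(
    obs=[1.0, 1.0, 1.0], T_b0=0.0, var_b0=1.0, M=1.0, Q=0.0, var_o=1.0
)
print(analyses)
print(variances)
```

<p>With a perfect persistence model, the analysis variance shrinks with every new observation; a nonzero $\mathbf{Q}$ would re-inflate the prior variance at each forecast step.</p>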

<h3>The Kalman filter in multiple dimensions</h3>
<ul>
    <li class="fragment">The recursion we described in terms of the one dimensional posterior extends to multiple dimensions as follows:</li>
    <ol>
        <li class="fragment"> suppose at time $t_0$ the model prior state is distributed $\mathbf{x}_0 \sim N\left(\overline{\mathbf{x}}_0, \mathbf{P}_0\right) $;</li>
        <li class="fragment"> suppose at each time $t_k$ for $k=1,\cdots$ the dynamical and observational models are given,    
          \begin{align}
        \mathbf{x}_{k} &= \mathbf{M} \mathbf{x}_{k-1} + \mathbf{w}_k & & \mathbf{M} \in \mathbb{R}^{n\times n} & & \mathbf{x}_k,\mathbf{w}_k \in \mathbb{R}^n & & \mathbf{w}_k \sim N(0, \mathbf{Q})\\
        \mathbf{y}_{k} &= \mathbf{H} \mathbf{x}_k + \mathbf{v}_k & & \mathbf{H} \in \mathbb{R}^{d \times n} & & \mathbf{y}_k, \mathbf{v}_k \in \mathbb{R}^{d} & & \mathbf{v}_k \sim N(0, \mathbf{R}) 
        \end{align}
        </li>
        <li class="fragment">Then, the model forecast is distributed $\mathbf{x}_k \sim N\left(\overline{\mathbf{x}}_k^f, \mathbf{P}^f_k\right)$, where
            \begin{align}
            \overline{\mathbf{x}}^f_k = \mathbf{M} \overline{\mathbf{x}}^a_{k-1} & & \mathbf{P}_k^f = \mathbf{M} \mathbf{P}^a_{k-1} \mathbf{M}^\mathrm{T} + \mathbf{Q}.
            \end{align}
        </li>
        <li class="fragment"> Conditioned on the observation $\mathbf{y}_k$, the posterior for $\mathbf{x}_k$ is given by $N\left(\overline{\mathbf{x}}^a_k, \mathbf{P}^a_k\right)$ where,
            \begin{align}
            \overline{\mathbf{x}}_k^a = \overline{\mathbf{x}}_k^f + \mathbf{K}_k\left(\mathbf{y}_k - \mathbf{H} \overline{\mathbf{x}}_k^f\right) & & \mathbf{P}_k^a = \left(\mathbf{I} - \mathbf{K}_k\mathbf{H}\right) \mathbf{P}_k^f
            \end{align}
        </li>
    </ol>
</ul>

<h3>The Kalman filter in multiple dimensions continued</h3>
<ul>
    <li class="fragment"> The optimal weight $W$ derived earlier generalizes to the <em>Kalman gain</em> matrix, defined as
    \begin{align}
        \mathbf{K}_k &\triangleq \mathbf{P}_k^f \mathbf{H}^\mathrm{T}\left( \mathbf{R} + \mathbf{H} \mathbf{P}_k^f \mathbf{H}^\mathrm{T}\right)^{-1}.
    \end{align}
    </li>
    <li class="fragment">Once again, the update solution is a combination of the background and the observation, weighted in inverse proportion to their uncertainties.</li>
    <li class="fragment">However, in this case, the background uncertainty <em>varies in time</em> as the forecast of the last posterior covariance.</li> 
        <li class="fragment">It can be shown like earlier that $\mathbf{K}_k$ is a multi-dimensional form for combining the model forecast mean and the observation to construct the minimum mean-square-error (minimum variance) analysis state.</li>
    <li class="fragment">We will discuss this in the following.</li>
</ul>
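<p>A minimal NumPy sketch of one forecast and one analysis step of the multi-dimensional recursion. The function names <code>kf_forecast</code> and <code>kf_analysis</code> and the matrices below are hypothetical choices for illustration; the gain is computed with a linear solve rather than an explicit inverse, which relies on the covariances being symmetric.</p>

```python
import numpy as np


def kf_forecast(x_a, P_a, M, Q):
    """Evolve the analysis mean and covariance to the next prior."""
    return M @ x_a, M @ P_a @ M.T + Q


def kf_analysis(x_f, P_f, y, H, R):
    """Condition the forecast on the observation y."""
    S = H @ P_f @ H.T + R
    # K = P_f H^T S^{-1}; for symmetric S and P_f, K^T solves S K^T = H P_f
    K = np.linalg.solve(S, H @ P_f).T
    x_a = x_f + K @ (y - H @ x_f)
    P_a = (np.eye(x_f.size) - K @ H) @ P_f
    return x_a, P_a, K


# hypothetical two-variable model in which only the first variable is observed
M = np.array([[1.0, 1.0], [0.0, 1.0]])
Q = 0.01 * np.eye(2)
H = np.array([[1.0, 0.0]])
R = np.array([[0.5]])

x_f, P_f = kf_forecast(np.array([0.0, 1.0]), np.eye(2), M, Q)
x_a, P_a, K = kf_analysis(x_f, P_f, np.array([1.5]), H, R)
print(x_a)
print(P_a)
```

<p>Even though only the first variable is observed, the off-diagonal entries of $\mathbf{P}^f_k$ spread the correction to the second variable through the gain.</p>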

<h3>Best Linear Unbiased Estimation (BLUE) </h3>
<ul>
    <li class="fragment"> The Kalman gain can be derived in the framework of the Gauss-Markov theorem, which provides the "BLUE".</li>
    <li class="fragment"> We suppose we have two time series of vectors,
        \begin{align}
        \mathbf{x}(t) = \begin{pmatrix}x_1(t)&\cdots & x_n(t)\end{pmatrix}^\mathrm{T}; & & \mathbf{y}(t) = \begin{pmatrix}y_1(t)&\cdots & y_n(t)\end{pmatrix}^\mathrm{T};
        \end{align}
        where each has been re-centered at zero by subtracting their respective means.</li>
    <li class="fragment">This is to say that $\mathbb{E}[\mathbf{x}] = \mathbb{E}[\mathbf{y}] = 0 $, i.e., these are vectors of <em>anomalies</em>.</li>
 </ul>   
    

<h3>Best Linear Unbiased Estimation (BLUE) continued</h3>
<ul>
<li class="fragment">We will assume that there is some <em>linear relationship</em> between $\mathbf{x}$ and $\mathbf{y}$ that is represented by,
        \begin{align}
        \mathbf{y} = \mathbf{W} \mathbf{x} + \boldsymbol{\epsilon}
        \end{align}</li>
    <li class="fragment"> As a multiple regression, we will write the estimated value for this relationship by,
        \begin{align}
        \mathbf{y}_a = \hat{\mathbf{W}} \mathbf{x}
        \end{align}</li>
    <li class="fragment">
        such that 
        \begin{align}
        \mathbf{y} - \mathbf{y}_a &= \mathbf{y} - \hat{\mathbf{W}} \mathbf{x} \\
                                  &= \hat{\boldsymbol{\epsilon}}
        \end{align}
    </li>
    <li class="fragment">The Gauss-Markov theorem loosely states that the weights $\hat{\mathbf{W}}$ found by least-squares, i.e., minimizing the expected residual sum of squares
    \begin{align}
    RSS =  \hat{\boldsymbol{\epsilon}}^\mathrm{T} \hat{\boldsymbol{\epsilon}},
    \end{align}
    are the best linear unbiased estimator of the true relationship $\mathbf{W}$.</li>
    <li class="fragment">"Best" in the sense of the Gauss-Markov theorem is to say that the weights $\hat{\mathbf{W}}$ will be the minimum-variance estimate, as compared with other unbiased estimates of the true relationship $\mathbf{W}$.</li>
    </ul>

<h3>Best Linear Unbiased Estimation (BLUE) continued</h3>
<ul>
    <li class="fragment"> To find the minimizing $\hat{\mathbf{W}}$, we differentiate the expected RSS with respect to each entry of $\hat{\mathbf{W}}$, i.e.,
        \begin{align}
        \frac{\partial}{\partial \hat{W}_{ij}} \mathbb{E}\left[ \hat{\boldsymbol{\epsilon}}^\mathrm{T} \hat{\boldsymbol{\epsilon}}\right] & = 2\,\mathbb{E}\left[ \left\{\hat{\mathbf{W}} \mathbf{x}\mathbf{x}^\mathrm{T}\right\}_{ij} - \left\{\mathbf{y}\mathbf{x}^\mathrm{T}\right\}_{ij} \right]
        \end{align}
    </li>
    <li class="fragment">Setting the derivative to zero, we obtain the normal equation
    \begin{align}
     & &\hat{\mathbf{W}}\mathbb{E}\left[\mathbf{x}\mathbf{x}^\mathrm{T}\right] - \mathbb{E}\left[ \mathbf{y}\mathbf{x}^\mathrm{T}\right]&= 0\\
    \Leftrightarrow & & \hat{\mathbf{W}}=  \mathbb{E}\left[\mathbf{y}\mathbf{x}^\mathrm{T}\right] \mathbb{E}\left[\mathbf{x}\mathbf{x}^\mathrm{T}\right]^{-1}   
    \end{align}
    </li>
</ul>
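<p>As a rough numerical illustration, the least-squares weights for $\mathbf{y} = \mathbf{W}\mathbf{x} + \boldsymbol{\epsilon}$ can be estimated from sample moments as $\mathbb{E}\left[\mathbf{y}\mathbf{x}^\mathrm{T}\right]\mathbb{E}\left[\mathbf{x}\mathbf{x}^\mathrm{T}\right]^{-1}$. The true weights <code>W_true</code>, the noise level and the sample size below are hypothetical:</p>

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples = 50_000

# hypothetical true linear relationship
W_true = np.array([[1.0, -0.5], [0.3, 2.0]])

# zero-mean anomalies; each column is one sample
x = rng.normal(size=(2, n_samples))
eps = 0.1 * rng.normal(size=(2, n_samples))
y = W_true @ x + eps

# sample estimates of the second moments
E_yx = y @ x.T / n_samples
E_xx = x @ x.T / n_samples

# normal-equation solution
W_hat = E_yx @ np.linalg.inv(E_xx)

# the same weights from a generic least-squares solver, for comparison
W_lstsq = np.linalg.lstsq(x.T, y.T, rcond=None)[0].T

print(W_hat)
```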
    

<h3>BLUE in data assimilation</h3>

<ul>
    <li class="fragment">With the background and analysis errors defined as before,
        \begin{align}
        \boldsymbol{\epsilon}_b= \mathbf{x}_b - \mathbf{x}_t ; & &
        \boldsymbol{\epsilon}_a= \mathbf{x}_a - \mathbf{x}_t;
        \end{align}
    </li>
    <li class="fragment">we also define the observation error as,
        \begin{align}
        \boldsymbol{\epsilon}_o &= \mathbf{y} - \mathbf{H}\mathbf{x}_t,
        \end{align}
        where $\mathbf{H}$ is the observation operator.</li>
    <li class="fragment">Using the above, we re-write the observational innovation as,
        \begin{align}
        \boldsymbol{\delta} &= \mathbf{y} - \mathbf{H}\mathbf{x}_b = \mathbf{y} - \mathbf{H}\left[\mathbf{x}_t + \left(\mathbf{x}_b - \mathbf{x}_t\right)\right] \\
        & = \mathbf{y} - \mathbf{H}\mathbf{x}_t  - \mathbf{H}\left(\mathbf{x}_b - \mathbf{x}_t\right)= \boldsymbol{\epsilon}_o - \mathbf{H} \boldsymbol{\epsilon}_b
        \end{align}</li>
</ul>

<h3>BLUE in data assimilation continued</h3>

<ul>
    <li class="fragment">We will require that the analysis state estimate is once again unbiased;</li>
    <li class="fragment">recall that $\overline{\mathbf{x}}_a = \overline{\mathbf{x}}_b + \mathbf{W}\boldsymbol{\delta}$ so that we can obtain
        \begin{align}
        &\overline{\mathbf{x}}_a = \mathbf{x}_t + \boldsymbol{\epsilon}_a \\
        \Leftrightarrow & \mathbf{x}_t - \overline{\mathbf{x}}_b = \mathbf{W}\boldsymbol{\delta} -\boldsymbol{\epsilon}_a\\
        \Leftrightarrow & -\boldsymbol{\epsilon}_b = \mathbf{W}\boldsymbol{\delta} - \boldsymbol{\epsilon}_a
        \end{align}
    </li>
        </ul>
    
   

 
<h3>BLUE in data assimilation continued</h3>
<ul>
<li class="fragment">Assuming that the background and observation errors are uncorrelated, the choice of $\hat{\mathbf{W}}$ that minimizes the expected value of $\boldsymbol{\epsilon}^\mathrm{T}_a \boldsymbol{\epsilon}_a$ is given as
            \begin{align}
            \hat{\mathbf{W}} &= \mathbb{E}\left[\left(-\boldsymbol{\epsilon}_b\right)\boldsymbol{\delta}^\mathrm{T}\right]\mathbb{E}\left[\boldsymbol{\delta} \boldsymbol{\delta}^\mathrm{T}\right]^{-1}\\
            &= \mathbb{E}\left[ \left(-\boldsymbol{\epsilon}_b\right)\left(\boldsymbol{\epsilon}_o - \mathbf{H}\boldsymbol{\epsilon}_b\right)^\mathrm{T}\right]\mathbb{E}\left[\left(\boldsymbol{\epsilon}_o - \mathbf{H}\boldsymbol{\epsilon}_b\right) \left(\boldsymbol{\epsilon}_o - \mathbf{H}\boldsymbol{\epsilon}_b\right)^\mathrm{T}\right]^{-1} \\
            &= \mathbf{B}\mathbf{H}^\mathrm{T}\left(\mathbf{H}\mathbf{B}\mathbf{H}^\mathrm{T}+\mathbf{R}\right)^{-1}
            \end{align}</li>
    <li class="fragment">With the above derivation, we find that the BLUE weights coincide with the Kalman gain, so that the mean of the posterior (the BLUE estimate) is given by the Kalman filter analysis update.</li> 
</ul>
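<p>For reference, the two expectations in the derivation can be expanded term by term. This sketch assumes the usual definitions $\mathbf{B} = \mathbb{E}\left[\boldsymbol{\epsilon}_b\boldsymbol{\epsilon}_b^\mathrm{T}\right]$ and $\mathbf{R} = \mathbb{E}\left[\boldsymbol{\epsilon}_o\boldsymbol{\epsilon}_o^\mathrm{T}\right]$, together with the uncorrelated-errors assumption $\mathbb{E}\left[\boldsymbol{\epsilon}_b\boldsymbol{\epsilon}_o^\mathrm{T}\right] = 0$:</p>

```latex
\begin{align}
\mathbb{E}\left[\left(-\boldsymbol{\epsilon}_b\right)\left(\boldsymbol{\epsilon}_o - \mathbf{H}\boldsymbol{\epsilon}_b\right)^\mathrm{T}\right]
  &= -\mathbb{E}\left[\boldsymbol{\epsilon}_b\boldsymbol{\epsilon}_o^\mathrm{T}\right]
     + \mathbb{E}\left[\boldsymbol{\epsilon}_b\boldsymbol{\epsilon}_b^\mathrm{T}\right]\mathbf{H}^\mathrm{T}
   = \mathbf{B}\mathbf{H}^\mathrm{T} \\
\mathbb{E}\left[\left(\boldsymbol{\epsilon}_o - \mathbf{H}\boldsymbol{\epsilon}_b\right)\left(\boldsymbol{\epsilon}_o - \mathbf{H}\boldsymbol{\epsilon}_b\right)^\mathrm{T}\right]
  &= \mathbb{E}\left[\boldsymbol{\epsilon}_o\boldsymbol{\epsilon}_o^\mathrm{T}\right]
     - \mathbb{E}\left[\boldsymbol{\epsilon}_o\boldsymbol{\epsilon}_b^\mathrm{T}\right]\mathbf{H}^\mathrm{T}
     - \mathbf{H}\mathbb{E}\left[\boldsymbol{\epsilon}_b\boldsymbol{\epsilon}_o^\mathrm{T}\right]
     + \mathbf{H}\mathbb{E}\left[\boldsymbol{\epsilon}_b\boldsymbol{\epsilon}_b^\mathrm{T}\right]\mathbf{H}^\mathrm{T} \\
  &= \mathbf{R} + \mathbf{H}\mathbf{B}\mathbf{H}^\mathrm{T}
\end{align}
```

<p>The cross terms vanish under the uncorrelated-errors assumption, which leaves exactly the two factors of the Kalman gain.</p>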
            

<h3>The extended Kalman filter</h3>
<ul>
    <li class="fragment">The Kalman filter provides a parametric recursion for the Bayesian posterior when the dynamic and observation models are linear, and all error distributions are Gaussian.</li> 
    <li class="fragment"> In most cases, however, the numerical model will not be linear and so $\mathbf{M}$ will represent the <em>linearized</em> numerical model along some nonlinear trajectory.</li>
        <li class="fragment">The process of:</li>
    <ol>
        <li class="fragment">evolving the estimated mean state with the fully nonlinear model; </li>
        <li class="fragment">while approximating the evolution of the covariance with the linearized equations about this trajectory; and</li>
        <li class="fragment">linearizing the relationship between the model variables and the observations;</li>
    </ol>
    <li class="fragment">is known as <em>extended Kalman filtering</em>.</li>
    <li class="fragment">We will demonstrate this technique in the Ikeda model.</li>
</ul>

<h3> The extended Kalman filter continued</h3>

<ul>
  <li class="fragment"> In the following slide, we will apply the extended Kalman filter to the Ikeda map.</li>
  <li class="fragment"> The code chunk below defines the map and its Jacobian, which is used to propagate the covariance in the forecast step.</li>
  <li class="fragment"><b>Exercise (2 minutes):</b> use the sliders in the following slide to examine how the covariance changes due to the flow dependence.  Then consider the following questions:</li>
    <ol>
            <li class="fragment">How does the analysis covariance differ from the fixed background prior?</li>
            <li class="fragment">How does the analysis covariance change with respect to the forecast covariance at each step?</li>
            <li class="fragment">How does the analysis covariance change with respect to different observation error variances?</li>
    </ol>
</ul>

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interactive
from IPython.display import display
from matplotlib.patches import Ellipse


def Ikeda(X_0, u):
    """The array X_0 will define the initial condition and the parameter u controls the chaos of the map
    
    This should return X_1 as the forward state."""
    
    t_1 = 0.4 - 6 / (1 + X_0.dot(X_0) )
    
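    # NOTE: the second term uses sin(t_1), matching the derivatives in Ikeda_jacobian below;
    # the textbook Ikeda map has a minus sign on this term
    x_1 = 1 + u * (X_0[0] * np.cos(t_1) + X_0[1] * np.sin(t_1))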
    y_1 = u * (X_0[0] * np.sin(t_1) + X_0[1] * np.cos(t_1))
                 
    X_1 = np.array([x_1, y_1])
    
    return X_1

def Ikeda_jacobian(X_0, u):
    
    # define the partial derivative of t with respect to v
    def dt_dv(v,w):
        return 12 * v / ( (1 + w**2 + v**2) ** (2) )

    # unpack the values for x and y
    x, y = X_0
    
    # compute t
    t = 0.4 - 6 / (1 + x**2 + y**2)
    
    # evaluate the partial derivatives
    df1_dx = u * (np.cos(t) - x * np.sin(t) * dt_dv(x,y) + y * np.cos(t) * dt_dv(x,y))
    df1_dy = u * (-x * np.sin(t) * dt_dv(y,x) + np.sin(t) + y * np.cos(t) * dt_dv(y,x))
    df2_dx = u * (np.sin(t) + x * np.cos(t) * dt_dv(x,y) - y * np.sin(t) * dt_dv(x,y))
    df2_dy = u * (x * np.cos(t) * dt_dv(y,x) + np.cos(t) - y * np.sin(t) * dt_dv(y,x))
    
    return np.array([[df1_dx, df1_dy], [df2_dx, df2_dy]])
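
<p>A quick way to gain confidence in a hand-derived Jacobian is to compare it with central finite differences. The sketch below is self-contained and re-implements, under hypothetical names, the form of the map whose derivatives match <code>Ikeda_jacobian</code> (with a plus sign on the sine term, which is an assumption about the intended map); the test point and step size are arbitrary choices.</p>

```python
import numpy as np


def ikeda_variant(X, u):
    """Map consistent with the analytic derivatives used for the covariance forecast."""
    x, y = X
    t = 0.4 - 6 / (1 + x**2 + y**2)
    return np.array([1 + u * (x * np.cos(t) + y * np.sin(t)),
                     u * (x * np.sin(t) + y * np.cos(t))])


def ikeda_variant_jacobian(X, u):
    """Analytic Jacobian of ikeda_variant."""
    x, y = X
    r = 1 + x**2 + y**2
    t = 0.4 - 6 / r
    t_x, t_y = 12 * x / r**2, 12 * y / r**2
    c, s = np.cos(t), np.sin(t)
    return u * np.array([
        [c + (-x * s + y * c) * t_x, s + (-x * s + y * c) * t_y],
        [s + (x * c - y * s) * t_x, c + (x * c - y * s) * t_y],
    ])


def finite_difference_jacobian(f, X, u, h=1e-6):
    """Approximate the Jacobian of f at X column by column with central differences."""
    X = np.asarray(X, dtype=float)
    J = np.zeros((X.size, X.size))
    for j in range(X.size):
        dX = np.zeros_like(X)
        dX[j] = h
        J[:, j] = (f(X + dX, u) - f(X - dX, u)) / (2 * h)
    return J


# arbitrary test point
X_test = np.array([0.3, -0.7])
J_analytic = ikeda_variant_jacobian(X_test, 0.75)
J_numeric = finite_difference_jacobian(ikeda_variant, X_test, 0.75)
print(np.max(np.abs(J_analytic - J_numeric)))
```

<p>A large discrepancy between the two matrices would indicate that the map and its Jacobian are out of sync, which silently corrupts the covariance forecast.</p>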


In [None]:
def animate_ext_kf(B_var = 0.1, R_var = 0.1, k=2):

    # define the static background and observational error covariances
    P_0 = B_var * np.eye(2)
    R = R_var * np.eye(2)

    # set a random seed for the reproducibility
    np.random.seed(1)
    
    # we define the mean for the background
    x_b = np.array([0,0])
    
    # and the initial condition of the real state as a random draw from the prior
    x_t = np.random.multivariate_normal([0,0], P_0)

    # define the Ikeda map parameter
    u = 0.75
    for i in range(k-1):
        
        # we forward propagate the true state
        x_t = Ikeda(x_t, u)
        
        # and generate a noisy observation
        y_obs = x_t + np.random.multivariate_normal([0,0], R)
        
        # forward propagate the last analysis
        x_b_f = Ikeda(x_b, u)
        
        # forward propagate the covariance
        J = Ikeda_jacobian(x_b, u)
        P_1 = J @ P_0 @ J.transpose()
        
        # analyze the observation
        K = P_1 @ np.linalg.inv(P_1 + R)
        x_b = x_b_f + K @ (y_obs - x_b_f)
        P_0 = (np.eye(2) - K) @ P_1
    
    
    fig = plt.figure(figsize=(16,8))
    ax = fig.add_axes([.1, .1, .8, .8])
    
    l1 = ax.scatter(x_b_f[0], x_b_f[1], c='k', s=40)
    w, v = np.linalg.eigh(P_1)
    # orientation of the first eigenvector (first column of v), in degrees
    ANGLE = np.degrees(np.arctan2(v[1, 0], v[0, 0]))
    ax.add_patch(Ellipse(x_b_f, w[0], w[1], angle=ANGLE, ec='k', fc='none'))
    
    l2 = ax.scatter(y_obs[0], y_obs[1], c='r', s=40)
    ax.add_patch(Ellipse(y_obs, R_var, R_var, ec='r', fc='none'))
    
    
    l3 = ax.scatter(x_b[0], x_b[1], c='b', s=40)
    w, v = np.linalg.eigh(P_0)
    # orientation of the first eigenvector (first column of v), in degrees
    ANGLE = np.degrees(np.arctan2(v[1, 0], v[0, 0]))
    ax.add_patch(Ellipse(x_b, w[0], w[1], angle=ANGLE, ec='b', fc='none'))
    
    
    ax.set_xlim([-1, 4])
    ax.set_ylim([-3,1])
    
    labels = ['Forecast', 'Observation', 'Analysis']
    plt.legend([l1,l2,l3],labels, loc='upper right', fontsize=26)
    plt.show()
    
w = interactive(animate_ext_kf,B_var=(0.01,1.0,0.01), R_var=(0.001,1.0,0.001), k=(2, 50, 1))
display(w)

<h3>Issues with extended Kalman filters</h3>

<ul>
    <li class="fragment">Although the extended Kalman filter is able to introduce flow dependence in the forecasted prior, it suffers from several issues:</li>
    <ol>
        <li class="fragment">the linear assumption that is enforced in the evolution of the posterior is extremely unrealistic, especially for the Ikeda map (which is highly nonlinear);</li>
        <li class="fragment">moreover, computing the Jacobian and the tangent-linear evolution is very computationally heavy and is not feasible for operational models;</li>
        <li class="fragment">finally, using entirely a parametric form for the evolution leads to catastrophic divergence in the estimates when the parametric equations are not well satisfied, see e.g., <a href="https://journals.ametsoc.org/doi/abs/10.1175/1520-0469%281994%29051%3C1037%3AADAISN%3E2.0.CO%3B2" target="blank">one classic paper on filter divergence and the extended Kalman filter</a>.</li>
    </ol>
    <li class="fragment">Particularly in the previous example, we saw how the over-confidence of the extended Kalman filter covariance (being flat in one direction) meant that it wasn't receptive to observations which had almost no uncertainty.</li>
    </ul>
    

<h3>Issues with extended Kalman filters continued</h3>

<ul>    
    <li class="fragment"> Catastrophic divergence occurs when the extended Kalman filter no longer tracks the observations and the computation itself becomes singular;</li> 
    <li class="fragment"> this is particularly problematic because it wouldn't be solved with infinite computational resources.</li>
    <li class="fragment"> Indeed, a purely parametric approach to computing the (non-Gaussian) posterior via the tangent-linear evolution of the second moment is too rigid to handle severe nonlinearities.</li>
</ul>

<h3>Sampling</h3>

<ul>
    <li class="fragment"> For these reasons, we can consider following a more fully Bayesian analysis, by sampling the posterior and forecasting the samples to estimate the next prior directly.</li>
    <li class="fragment">This also allows us to sample from highly non-Gaussian distributions, possibly eliminating this unrealistic assumption.</li>
    <li class="fragment"> This philosophy is the basis of the particle filter and ensemble Kalman filter, which we will discuss next.</li>
    <li class="fragment"> Each of these represents a more direct Bayesian approach to the data assimilation problem; 
        <ul>
        <li class="fragment"> a central difference, however, will be in how each learning scheme handles the bias-variance tradeoff in estimating the true relationship.</li>
        </ul>
    </li>
</ul>
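<p>A minimal sketch of the sampling idea, assuming for illustration that the posterior is Gaussian so that it can be sampled directly (the function name <code>sample_forecast</code>, the model and all values are hypothetical). Each sample is evolved with the full model and the next prior is estimated from the forecasted samples; with a linear model the result should approach $\mathbf{M}\overline{\mathbf{x}}^a$ and $\mathbf{M}\mathbf{P}^a\mathbf{M}^\mathrm{T}$.</p>

```python
import numpy as np


def sample_forecast(model, x_a, P_a, n_samples, rng):
    """Estimate the forecast mean and covariance by evolving samples of the posterior."""
    # draw samples from the (assumed Gaussian) posterior; shape (n_samples, n)
    samples = rng.multivariate_normal(x_a, P_a, size=n_samples)

    # evolve every sample with the model, which may be nonlinear
    forecasts = np.array([model(s) for s in samples])

    # sample mean and sample covariance of the forecasted ensemble
    x_f = forecasts.mean(axis=0)
    P_f = np.cov(forecasts, rowvar=False)
    return x_f, P_f


# hypothetical linear model, so the exact answer is known for comparison
M = np.array([[1.0, 0.5], [0.0, 1.0]])
x_a = np.array([1.0, -1.0])
P_a = 0.1 * np.eye(2)

rng = np.random.default_rng(2)
x_f, P_f = sample_forecast(lambda x: M @ x, x_a, P_a, n_samples=20_000, rng=rng)
print(x_f)
print(P_f)
```

<p>No Jacobian is needed here, since the same function accepts a nonlinear <code>model</code>; the price is sampling error that shrinks only slowly with the number of samples.</p>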