<h1>Mathematical Theory of Data Assimilation with Applications:<br>

<p class="fragment">Tutorial part 2 of 4 --- 3D-VAR as a naive Bayesian filter</p></h1>


<h3> Jupyter notebooks</h3>

<ul>
    <li class="fragment"> In the following, we will use Jupyter/IPython notebooks to explore a computational example.</li>
    <li class="fragment"> These tutorials are made in "Jupyter notebooks" which conveniently combine Python (code) with text (markdown). </li>
    <li class="fragment"> Notebooks live in the web browser to allow for a modifiable graphical interface for interactive code development, data analysis and visualization.</li> 
     <li class="fragment"> A notebook consists of "cells", which you can work with using your mouse, or more efficiently, your keyboard:
         
| Navigate                      | Edit              | Exit           | Run                              |
| ------------- | :-------------: | ------------- | :-------------: |
| <kbd>↓</kbd> and <kbd>↑</kbd> | <kbd>Enter</kbd>  | <kbd>Esc</kbd> | <kbd>Ctrl</kbd>+<kbd>Enter</kbd> |</li>
</ul>

<h3> Jupyter notebooks continued</h3>

<ul>
    <li class="fragment"> When you open a notebook it starts a session of Python in the background.</li>
    <li class="fragment"> All of the Python code cells (in a given notebook) are connected -- they use the same Python session and thus share variables, functions, and classes.</li> 
    <li class="fragment">  For this reason, the order in which you run the cells matters. </li>
    <li class="fragment"> We will begin our coding exercises by importing basic scientific libraries.</li>
</ul>

<h3>Pythonic programming</h3>
<ul>
    <li class="fragment"> Python uses several standard scientific libraries for numerical computing, data processing and visualization.</li>
    <li class="fragment"> At the core, there is a Python kernel and interpreter that takes human-readable input and turns it into instructions the machine can execute.</li>
    <li class="fragment"> This is the basic Python functionality, but there are extensive specialized libraries for purpose-oriented computing.</li>
    <li class="fragment">The most important of these for scientific computing are the following:</li>
    <ol>
        <li class="fragment">Numpy -- large array manipulation;</li>
        <li class="fragment">Scipy -- library of numerical routines and scientific computing ecosystem;</li>
        <li class="fragment">Pandas -- R dataframe inspired, data structures and analysis;</li>
        <li class="fragment">Scikit-learn -- machine learning libraries;</li>
        <li class="fragment">Matplotlib -- Matlab inspired, object oriented plotting library.</li>
    </ol>
</ul>

<h3>Pythonic programming continued</h3>
<ul>
    <li class="fragment">To accommodate the flexibility of the Python programming environment, conventions around methods, namespaces, and scope have been adopted.</li>
    <li class="fragment">The convention is to use import statements to make the methods of a library available under a chosen name.</li>
    <li class="fragment">For example, we will import the library "numpy" as a new object to call methods from:</li>
</ul>
    

In [None]:
import numpy as np

<ul>
    <li class="fragment">The tools we use from numpy will now be called from numpy as an object, with the form of the call looking like "np.method()".</li>
</ul>

<h3> Numpy arrays</h3>
<ul>
    <li class="fragment">Numpy has a method known as "array";</li>
</ul>
    

In [None]:
foo = np.array([[1,2,3], [3,4,5], [5,6,7]])
foo

<h3>Numpy arrays continued</h3>
<ul>
     <li class="fragment">Arrays function as mathematical multi-linear matrices in arbitrary dimensions.</li>
     <li class="fragment">Because arrays are understood as mathematical objects, they have inherent methods for mathematical computation, such as:
        <ul>
            <li class="fragment">the transpose</li>
         </ul>
         
         

In [None]:
foo.transpose()

</ul>         
         <ul>  
           <li class="fragment">the dot or matrix product:</li>
        </ul>    
</ul>
    

In [None]:
foo.dot(foo)

<h3>Mathematical functions</h3>
   <ul>
    <li class="fragment">Mathematical functions also appear as methods in numpy, such as:</li>
        <ul>
            <li class="fragment">cosine</li>
            

In [None]:
np.cos(np.pi)

<ul><ul>         
         <li class="fragment">sine</li>
    

In [None]:
np.sin(np.pi/2)

<ul><ul>        
        <li class="fragment"> natural logarithm</li>
    

In [None]:
np.log(1)

<h3>Mathematical functions continued</h3>
   <ul>
   <ul>
       <li class="fragment">exponential</li>
            

In [None]:
np.exp(1)

<ul><ul>
       <li class="fragment"> square root </li>
        

In [None]:
np.sqrt(4)

<ul>
    <li class="fragment">Elementwise multiplication is given by "*", elementwise exponentiation is given by "**", and matrix multiplication is given by "@".</li>
</ul>
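<ul>
    <li class="fragment">As a quick illustration of the three operators, the sketch below applies them to a small array. The name <code>A</code> and its values are chosen here only for illustration.</li>
</ul>

```python
import numpy as np

# a small example matrix, chosen only for illustration
A = np.array([[1, 2], [3, 4]])

elementwise_product = A * A   # multiplies matching entries
elementwise_power = A ** 2    # raises each entry to the power 2
matrix_product = A @ A        # row-by-column matrix multiplication

print(elementwise_product)
print(elementwise_power)
print(matrix_product)
```

<ul>
    <li class="fragment">The first two results coincide, while the matrix product generally differs from both.</li>
</ul>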

<h3>(Re)-introducing the forecast and observational models</h3>
<ul>
    <li class="fragment"> Last time we formally introduced a simple dynamical model and observational model,
        \begin{align}
        \mathbf{x}_{k} &= \mathbf{M} \mathbf{x}_{k-1} & & \mathbf{M} \in \mathbb{R}^{n\times n} & & \mathbf{x}_k \in \mathbb{R}^n\\
        \mathbf{y}_{k} &= \mathbf{H} \mathbf{x}_k + \mathbf{v}_k & & \mathbf{H} \in \mathbb{R}^{d \times n} & & \mathbf{y}_k, \mathbf{v}_k \in \mathbb{R}^{d} 
        \end{align}
    </li>
    <li class="fragment">Here, the <em>vector</em> $\mathbf{x}_k$ corresponds to all physical states we study with our model at time $t_k$ --- we suppose that the initial state $\mathbf{x}_0 \sim N\left(\overline{x}_0, \mathbf{B}\right)$.</li>
    <li class="fragment"> The <em>matrix</em> $\mathbf{M}$ defines the time evolution of these states from time $t_{k-1}$ to time $t_{k}$ for all values $k$, corresponding to some numerical model.</li>
    <li class="fragment">The <em>vector</em> $\mathbf{y}_k$ represents the values of the physical state we observe.</li>
    <li class="fragment">The <em>vector</em> $\mathbf{v}_k \sim N(0, \mathbf{R})$ is noise in the observation.</li>
    <li class="fragment">Note that we may include stochasticity in the evolution of the state $\mathbf{x}_k$, but we neglect this at the moment.</li>
</ul>
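<ul>
    <li class="fragment">To make the two equations concrete, the following is a minimal sketch that simulates them for a few steps. The specific choices here (a rotation matrix for $\mathbf{M}$, the identity for $\mathbf{H}$, and the values of $\mathbf{B}$ and $\mathbf{R}$) are assumptions made only for illustration.</li>
</ul>

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 2, 2                      # state and observation dimensions (assumed)
theta = 0.1                      # rotation angle for the illustrative model
M = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
H = np.eye(d, n)                 # observe the full state
B = 0.5 * np.eye(n)              # background covariance (assumed value)
R = 0.1 * np.eye(d)              # observation error covariance (assumed value)

# draw the initial state from the prior N(0, B)
x_0 = rng.multivariate_normal(np.zeros(n), B)

states, observations = [], []
x = x_0
for k in range(1, 6):
    x = M @ x                                      # x_k = M x_{k-1}
    v = rng.multivariate_normal(np.zeros(d), R)    # v_k ~ N(0, R)
    y = H @ x + v                                  # y_k = H x_k + v_k
    states.append(x)
    observations.append(y)

states = np.array(states)
observations = np.array(observations)
print(states.shape, observations.shape)
```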
    

<h3>A naive update with fixed prior</h3>
<ul>
    <li class="fragment">Let's suppose (naively) that we will always use the same form for the background prior for the true state;</li>
    <ul>
    <li class="fragment"> that is, let us assume a prior for the true state given as
        \begin{align}
        P_{\mathbf{x}_b,\mathbf{B}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}\vert \mathbf{B}\vert^{1/2}}e^{-\frac{1}{2}\left(\mathbf{x} - \mathbf{x}_b \right)^\mathrm{T} \mathbf{B}^{-1} \left(\mathbf{x} - \mathbf{x}_b \right)}.
        \end{align}</li>
    <li class="fragment">The vector $\mathbf{x}_b$ will be the forward time evolution of the last "optimal" analysis state, $\mathbf{M}\mathbf{x}_a$, defining the prior for the true state with the fixed background covariance $\mathbf{B}$.</li>
    <li class="fragment">We then define the likelihood of the observation $\mathbf{y}_k$ depending on the state $\mathbf{x}$ as before with 
        \begin{align}
        L_\mathbf{R}( \mathbf{y}_k \vert \mathbf{x}) = \frac{1}{(2\pi)^{d/2}\vert \mathbf{R}\vert^{1/2}}e^{-\frac{1}{2}\left( \mathbf{y}_k - \mathbf{H}\mathbf{x} \right)^\mathrm{T} \mathbf{R}^{-1} \left( \mathbf{y}_k - \mathbf{H}\mathbf{x} \right)},
        \end{align}
        where the likelihood is measured in the observational space $\mathbb{R}^d$ by transferring the state $\mathbf{x}$ with $\mathbf{H}$.</li>
    </ul>
</ul>

<h3>3D-VAR as a naive Bayesian filter</h3>
<ul>
    <li class="fragment"> With the definitions on the last slide, we can write a naive Bayesian update (with fixed form for the prior) as
    \begin{align}
    P_{\mathbf{x}_b, \mathbf{B}}(\mathbf{x}\vert \mathbf{y}_k )\triangleq \frac{L_\mathbf{R}(\mathbf{y}_k\vert \mathbf{x})P_{\mathbf{x}_b,\mathbf{B}}(\mathbf{x}) }{P_{\mathbf{R}}(\mathbf{y}_k)} .   
    \end{align}
    </li>
    <li class="fragment"> Similar to the one dimensional version, we can recover the (naive) maximum a posteriori state $\mathbf{x}_a$ by minimizing the cost function
    \begin{align}
    J(\mathbf{x}) &= \frac{1}{2}\left[\left(\mathbf{x} - \mathbf{x}_b\right)^\mathrm{T} \mathbf{B}^{-1}\left(\mathbf{x} - \mathbf{x}_b\right) + \left(\mathbf{H}\mathbf{x} - \mathbf{y}_k\right)^\mathrm{T} \mathbf{R}^{-1} \left(\mathbf{H}\mathbf{x} - \mathbf{y}_k\right)\right]  
        \end{align}</li>
    <li class="fragment"> The solution to the DA problem by minimizing the above cost function is the classical method known as "3D-VAR";</li>
    <li class="fragment">this stands for the variational solution to the three-physical-state-dimension, maximum likelihood/maximum a posteriori formulation.</li>
    <li class="fragment">Once again, each piece of information is weighted in inverse proportion to its relative uncertainty.</li>
</ul>
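<ul>
    <li class="fragment">When $\mathbf{H}$ is linear, setting the gradient of $J$ to zero gives the minimizer in closed form, $\mathbf{x}_a = \mathbf{x}_b + \mathbf{K}\left(\mathbf{y}_k - \mathbf{H}\mathbf{x}_b\right)$ with $\mathbf{K} = \mathbf{B}\mathbf{H}^\mathrm{T}\left(\mathbf{H}\mathbf{B}\mathbf{H}^\mathrm{T} + \mathbf{R}\right)^{-1}$. This is a standard result that is not derived on these slides; the sketch below only checks it numerically. The function names and the test values are hypothetical.</li>
</ul>

```python
import numpy as np

def analysis_closed_form(x_b, B, y, H, R):
    """Closed-form minimizer of the 3D-VAR cost for a linear H."""
    K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
    return x_b + K @ (y - H @ x_b)

def grad_J(x, x_b, B, y, H, R):
    """Gradient of J: B^{-1}(x - x_b) + H^T R^{-1}(H x - y)."""
    return np.linalg.solve(B, x - x_b) + H.T @ np.linalg.solve(R, H @ x - y)

# illustrative values (assumptions, not taken from the tutorial)
x_b = np.array([0.0, 0.0])
y = np.array([1.0, -1.0])
B = 0.5 * np.eye(2)
R = 0.1 * np.eye(2)
H = np.eye(2)

x_a = analysis_closed_form(x_b, B, y, H, R)
gradient_at_x_a = grad_J(x_a, x_b, B, y, H, R)
print(x_a, gradient_at_x_a)
```

<ul>
    <li class="fragment">The gradient vanishes at $\mathbf{x}_a$ up to rounding error, which is the condition a numerical minimizer searches for.</li>
</ul>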

<h3>The Ikeda map</h3>
<ul>
    <li class="fragment"> In real, two-dimensional coordinates the Ikeda map is given by
    \begin{align}
    x_{{k+1}}&=1 + u(x_{k}\cos t_{k}-y_{k}\sin t_{k})\\
    y_{{k+1}}&=u(x_{k}\sin t_{k}+y_{k}\cos t_{k})
    \end{align}
    </li>
    <li class="fragment">Here, $u$ is a parameter and we define
    \begin{align}
    t_k = 0.4 - \frac{6}{1 + x_k^2 + y_k^2}.
    \end{align}
    </li>
    <li class="fragment">When the parameter $u>0.6$, the map above defines a (computationally) simple dynamical system with a chaotic attractor.</li>
</ul>

<h3>Coding the Ikeda map</h3>
<ul>
    <li class="fragment"><b>Exercise (5 minutes):</b> complete the code chunk below to define the Ikeda map
         \begin{align}
    x_{{k+1}}=1 + u(x_{k}\cos t_{k}-y_{k}\sin t_{k}) & &
    y_{{k+1}}=u(x_{k}\sin t_{k}+y_{k}\cos t_{k})\\
    t_k = 0.4 - \frac{6}{1 + x_k^2 + y_k^2}
    \end{align}
        of a point defined as a 1-D array of length 2.</li>
</ul>

In [None]:
def Ikeda(X_0, u):
    """The array X_0 will define the initial condition and the parameter u controls the chaos of the map
    
    This should return X_1 as the forward state."""

    t_1 =  # define the t1 here
    
    x_1 = # define the forward x state here
    y_1 = # define the forward y state here
                 
    X_1 = np.array([x_1, y_1])
    
    return X_1

<h3> One example solution</h3>

In [None]:
def Ikeda(X_0, u):
    """The array X_0 will define the initial condition and the parameter u controls the chaos of the map
    
    This should return X_1 as the forward state."""
    
    t_1 = 0.4 - 6 / (1 + X_0.dot(X_0) )
    
    x_1 = 1 + u * (X_0[0] * np.cos(t_1) - X_0[1] * np.sin(t_1))
    y_1 = u * (X_0[0] * np.sin(t_1) + X_0[1] * np.cos(t_1))
                 
    X_1 = np.array([x_1, y_1])
    
    return X_1
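
<ul>
    <li class="fragment">One way to sanity-check an implementation: in the standard form of the Ikeda map, each step rotates the point by the angle $t_k$, scales it by $u$ and shifts it by $(1, 0)$, so the distance of the new point from $(1, 0)$ must equal $u$ times the norm of the old point. The sketch below uses a hypothetical helper <code>ikeda_step</code> and an arbitrary test point.</li>
</ul>

```python
import numpy as np

def ikeda_step(X, u):
    """One step of the Ikeda map in its standard form (hypothetical helper)."""
    t = 0.4 - 6.0 / (1.0 + X @ X)
    c, s = np.cos(t), np.sin(t)
    return np.array([1.0 + u * (X[0] * c - X[1] * s),
                     u * (X[0] * s + X[1] * c)])

# arbitrary test point and parameter, chosen only for illustration
X_0 = np.array([0.5, -0.3])
u = 0.9
X_1 = ikeda_step(X_0, u)

# rotation preserves length, so |X_1 - (1, 0)| should equal u * |X_0|
distance_from_shift = np.linalg.norm(X_1 - np.array([1.0, 0.0]))
scaled_norm = u * np.linalg.norm(X_0)
print(distance_from_shift, scaled_norm)
```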

<h3>The computational example</h3>
<ul>
    <li class="fragment">In this example, to make plots, we will import the basic plotting module "pyplot" from the Matplotlib library as a new object.</li>
    <li class="fragment">This object will, by convention, be called "plt".</li>
  

In [None]:
import matplotlib.pyplot as plt

<ul>    
    <li class="fragment">One slider in the example changes the parameter value $u$, which alters the dynamics.</li>
    <li class="fragment">The other slider changes the number of iterations of the initial condition that are plotted in the figure.</li>
    <li class="fragment"><b>Q:</b> what do you notice about the differences in the asymptotic trajectory as the value of $u$ is changed?</li>
</ul>

In [None]:
from ipywidgets import interactive
from IPython.display import display

def animate_ikeda(u=0.9, k=2):
    
    X_traj = np.zeros([k, 2])
    X_traj[0,:] = [0,0]
    for i in range(k-1):
        tmp = Ikeda(X_traj[i, :], u)
        X_traj[i+1, :] = tmp

    fig = plt.figure(figsize=(16,8))
    ax = fig.add_axes([.1, .1, .8, .8])
    ax.scatter(X_traj[:,0], X_traj[:, 1])
    
    plt.show()
    
w = interactive(animate_ikeda,u=(0,.95,0.01), k=(2, 2002, 50))
display(w)

<h3>3D-VAR in the Ikeda model</h3>
<ul>
    <li class="fragment">  We will now consider the problem of finding the (naive) maximum a posteriori state of the Ikeda model from</li>
    <ol>
        <li class="fragment"> a background state, generated by a model forecast; and</li>
        <li class="fragment"> an observation of the "true" state with noise.</li>
    </ol>
    <li class="fragment"> The 3D-VAR cost function once again takes the form of the weighted-least-squares difference between the two sources of information:
    \begin{align}
    J(\mathbf{x}) &= \frac{1}{2}\left[\left(\mathbf{x} - \mathbf{x}_b\right)^\mathrm{T} \mathbf{B}^{-1}\left(\mathbf{x} - \mathbf{x}_b\right) + \left(\mathbf{H}\mathbf{x} - \mathbf{y}_k\right)^\mathrm{T} \mathbf{R}^{-1} \left(\mathbf{H}\mathbf{x} - \mathbf{y}_k\right)\right]  
        \end{align}
    </li>
</ul>
 

<h3>Twin experiments</h3>
<ul>
    <li class="fragment"> This type of experiment is known as a "twin-experiment", in which we will generate both the "model-twin" and the "truth-twin", to evaluate the strengths and the limitations of the DA method.</li>
    <li class="fragment"> The "truth-twin" is the sequence of model states that generates the "observed" pseudo-data;</li>
        <ul>
            <li class="fragment"> this pseudo-data is given to the DA method (possibly sequentially or all at once) to estimate the true sequence of states.</li>
    </ul>
    <li class="fragment"> The "model-twin" is the sequence of model states that are generated by the DA cycle;</li>
    <ul>
        <li class="fragment">the model twin is produced by using the numerical model to make a forecast and by analyzing the observations to produce analyses.</li>
    </ul>
</ul>
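<ul>
    <li class="fragment">As a minimal sketch of the first half of a twin experiment, the code below generates a truth-twin with the Ikeda map and the corresponding noisy pseudo-observations. The helper <code>ikeda_step</code>, the seed, the number of steps and the noise level are all assumptions made for illustration.</li>
</ul>

```python
import numpy as np

def ikeda_step(X, u):
    """One step of the Ikeda map in its standard form (hypothetical helper)."""
    t = 0.4 - 6.0 / (1.0 + X @ X)
    c, s = np.cos(t), np.sin(t)
    return np.array([1.0 + u * (X[0] * c - X[1] * s),
                     u * (X[0] * s + X[1] * c)])

rng = np.random.default_rng(1)
u = 0.75          # map parameter (assumed)
R_var = 0.1       # observation error variance (assumed)
n_steps = 2000    # length of the experiment (assumed)

truth = np.zeros((n_steps, 2))
obs = np.zeros((n_steps, 2))
x_t = np.array([0.0, 0.0])
for k in range(n_steps):
    x_t = ikeda_step(x_t, u)                   # evolve the truth-twin
    truth[k] = x_t
    # pseudo-observation: truth plus Gaussian noise with variance R_var
    obs[k] = x_t + rng.normal(scale=np.sqrt(R_var), size=2)

# the sample mean-square observation error should be close to R_var
obs_mse = np.mean((obs - truth) ** 2)
print(obs_mse)
```

<ul>
    <li class="fragment">Only <code>obs</code> would be handed to the DA method; <code>truth</code> is kept aside to score the analyses afterwards.</li>
</ul>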

<h3>Coding 3D-VAR</h3>
<ul>
    <li class="fragment"> <b>Exercise (3 minutes):</b> complete the code chunk below to define the 3D-VAR cost function:
        \begin{align}
    J(\mathbf{x}) &= \frac{1}{2}\left[\left(\mathbf{x} - \mathbf{x}_b\right)^\mathrm{T} \mathbf{B}^{-1}\left(\mathbf{x} - \mathbf{x}_b\right) + \left(\mathbf{H}\mathbf{x} - \mathbf{y}_k\right)^\mathrm{T} \mathbf{R}^{-1} \left(\mathbf{H}\mathbf{x} - \mathbf{y}_k\right)\right]  
        \end{align}
    </li>
    <li class="fragment">Note that the inverse of a matrix can be computed with a numpy method as follows:</li>        
</ul>

In [None]:
A = np.array([[1, 2], [3, 4]])
A_inverse = np.linalg.inv(A)
A_inverse

In [None]:
def D3_var(X, args):
    """This function defines the 3D-VAR cost function
    
    For simplicity, we will assume that the observation operator H is the identity operator"""
    
    # we unpack the extra arguments
    [x_b, B, y_obs, R] = args
    
    b_diff = # define the weighted difference of the state from the background

    W_innovation = # define the weighted difference of the state from the observation
    
    # the factor 1/2 matches J(x) above; it does not change the minimizer
    return 0.5 * (b_diff + W_innovation)

<h3>Coding 3D-VAR</h3>
<ul>
    <li class="fragment"> <b>Example solution:</b> </li>
</ul>

In [None]:
def D3_var(X, args):
    """This function defines the 3D-VAR cost function
    
    For simplicity, we will assume that the observation operator H is the identity operator"""
    
    # we unpack the extra arguments
    [x_b, B, y_obs, R] = args
    
    # define the weighted difference of the state from the background
    b_diff = (X - x_b).transpose() @ np.linalg.inv(B) @ (X - x_b)

    # define the weighted difference of the state from the observation
    W_innovation = (y_obs - X).transpose() @ np.linalg.inv(R) @ (y_obs - X)
    
    # the factor 1/2 matches J(x) above; it does not change the minimizer
    return 0.5 * (b_diff + W_innovation)

<h3>Coding 3D-VAR continued</h3>

<ul>
    <li class="fragment">In order to implement the 3D-VAR method, we need to perform a numerical optimization.</li>
    <li class="fragment">In this case, we need to call a method in order to find the minimizer of the cost function $J(\mathbf{x})$, i.e., the state at which its gradient is zero.</li>
    <li class="fragment">Scipy has a built-in module called "optimize", from which we will import the general-purpose minimization routine "minimize".</li>
</ul>

In [None]:
from scipy.optimize import minimize
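
<ul>
    <li class="fragment">As a hedged sketch of how "minimize" is called, the example below minimizes a simple quadratic whose minimizer is known in advance. The cost function <code>quadratic_cost</code> and the vector <code>center</code> are hypothetical; extra arguments are passed to the cost function through the tuple <code>args</code>.</li>
</ul>

```python
import numpy as np
from scipy.optimize import minimize

def quadratic_cost(x, center):
    """Half the squared distance from x to center; minimized at x = center."""
    diff = x - center
    return 0.5 * diff @ diff

# illustrative target, chosen only for this example
center = np.array([1.0, -2.0])

# start the search from the origin; the default method is used
result = minimize(quadratic_cost, x0=np.zeros(2), args=(center,))
print(result.x)
```

<ul>
    <li class="fragment">The returned object stores the estimated minimizer in <code>result.x</code>, which is the attribute read off after each analysis in the 3D-VAR cycle.</li>
</ul>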

<ul>
    <li class="fragment">Additionally, for graphical tools in visualizing the covariances, we will import "Ellipse" from the module "patches" in matplotlib.</li>
</ul>

In [None]:
from matplotlib.patches import Ellipse

<h3>Analyzing 3D-VAR</h3>
<ul>
    <li class="fragment"><b>Exercise (2 minutes):</b> In the following cell, use the sliders to analyze the performance of the 3D-VAR estimator. Specifically consider:</li>
    <ol>
        <li class="fragment">What is the effect on the analysis solution by changing the variance of the background covariance $\mathbf{B}\triangleq B_{var} * \mathbf{I}_2$?</li>
                <li class="fragment">What is the effect on the analysis solution by changing the variance of the observation error covariance $\mathbf{R}\triangleq R_{var} * \mathbf{I}_2$?</li>
                <li class="fragment">What is the effect on the analysis solution by changing the number of analyses, set by the slider $k$?</li>
    </ol>
</ul>

In [None]:
def animate_D3(B_var = 0.1, R_var = 0.1, k=2):

    # define the static background and observational error covariances
    B = B_var * np.eye(2)
    R = R_var * np.eye(2)

    # set a random seed for the reproducibility
    np.random.seed(1)
    
    # we define the mean for the background
    x_b = np.array([0,0])
    
    # and the initial condition of the real state as a random draw from the prior
    x_t = np.random.multivariate_normal([0,0], np.eye(2) * B_var)

    # define the Ikeda map parameter
    u = 0.75
    for i in range(k-1):
        
        # we forward propagate the true state
        x_t = Ikeda(x_t, u)
        
        # and generate a noisy observation
        y_obs = x_t + np.random.multivariate_normal([0,0], R)
        
        # forward propagate the last analysis
        x_b_f = Ikeda(x_b, u)
        
        # define the arguments necessary for the 3D-VAR
        ARGS = [x_b_f, B, y_obs, R]

        analys = minimize(D3_var, x_b_f, args=ARGS)
        x_b = analys.x
    
    fig = plt.figure(figsize=(16,8))
    ax = fig.add_axes([.1, .1, .8, .8])
    
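    # Ellipse takes the full width and height (diameters), so a one-standard-deviation
    # circle has diameter 2 * sqrt(variance)
    l1 = ax.scatter(x_b_f[0], x_b_f[1], c='k', s=40)
    ax.add_patch(Ellipse(x_b_f, 2 * np.sqrt(B_var), 2 * np.sqrt(B_var), ec='k', fc='none'))
    
    l2 = ax.scatter(y_obs[0], y_obs[1], c='r', s=40)
    ax.add_patch(Ellipse(y_obs, 2 * np.sqrt(R_var), 2 * np.sqrt(R_var), ec='r', fc='none'))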
    
    l3 = ax.scatter(x_b[0], x_b[1], c='b', s=40)
    ax.set_xlim([-1, 4])
    ax.set_ylim([-3,1])
    
    labels = ['Forecast', 'Observation', 'Analysis']
    plt.legend([l1,l2,l3],labels, loc='upper right', fontsize=26)
    plt.show()
    
w = interactive(animate_D3,B_var=(0.01,1.0,0.01), R_var=(0.01,1.0,0.01), k=(2, 50, 1))
display(w)

<h3>Analyzing 3D-VAR continued</h3>
<ul>
    <li class="fragment">We can see that the analysis is closer to the background or the observation depending on the weights we give them, based on the relative uncertainty described in the naive Bayesian update.</li>
    <li class="fragment">However, there is no real performance gain from one analysis to the next, as the form of the naive cost function doesn't accumulate new information;</li>
    <ul>
        <li class="fragment">indeed, by re-using the same background prior at every step, we forget all information we gained in the posterior except for the last analysis state.</li>
    </ul>
    <li class="fragment"> This is the greatest weakness of the 3D-VAR approach: it doesn't take into account the "flow-dependence" of the last posterior when generating a new prior.</li>
</ul>
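<ul>
    <li class="fragment">In the special case used in the sliders, where $\mathbf{H}$ is the identity and both covariances are multiples of the identity, the weighting can be written out explicitly: the analysis is $\mathbf{x}_b + w\left(\mathbf{y} - \mathbf{x}_b\right)$ with $w = B_{var} / (B_{var} + R_{var})$. The sketch below evaluates this for a few values; the function name and the test points are hypothetical.</li>
</ul>

```python
import numpy as np

def analysis_isotropic(x_b, y, B_var, R_var):
    """Analysis for H = I, B = B_var * I, R = R_var * I."""
    weight_on_observation = B_var / (B_var + R_var)
    return x_b + weight_on_observation * (y - x_b)

# illustrative background and observation (assumed values)
x_b = np.array([0.0, 0.0])
y = np.array([1.0, 1.0])

# a larger background variance pulls the analysis toward the observation
for B_var in (0.01, 0.1, 1.0):
    print(B_var, analysis_isotropic(x_b, y, B_var, R_var=0.1))

midpoint = analysis_isotropic(x_b, y, B_var=0.1, R_var=0.1)
```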

<h3>An extension --- 4D-VAR</h3>
<ul>
    <li class="fragment">One successful means of adding information to the cost function is to require that the maximum a posteriori solution doesn't deviate from a <em>sequence of observations</em>, under the constraint of the model evolution.</li>
    <li class="fragment">This approach is the basis of 4D-VAR, where the 4th dimension stands for time.</li>
    <li class="fragment">Let us suppose that we have a sequence of observations at times $t_1$ to $t_N$.</li>
    <li class="fragment">We will define this sequence as $\{\mathbf{y}_k\}_{k=1}^N$, and we will assume we have an initial background state at time $t_0$ defined as $\mathbf{x}_b$.</li>
    <li class="fragment">The 4D-VAR cost function is defined as
        \begin{align}
       J\left(\mathbf{x}_0\right)& = \frac{1}{2}\left(\mathbf{x}_0 - \mathbf{x}_b\right)^\mathrm{T} \mathbf{B}^{-1}\left(\mathbf{x}_0 - \mathbf{x}_b\right) +  \\
        &\frac{1}{2}\sum_{k=1}^N  \left(\mathbf{H}\mathbf{x}_k - \mathbf{y}_k\right)^\mathrm{T} \mathbf{R}^{-1} \left(\mathbf{H}\mathbf{x}_k - \mathbf{y}_k\right)
        \end{align}
        where each state $\mathbf{x}_k = \mathbf{M}\mathbf{x}_{k-1}$ is obtained by evolving the initial condition $\mathbf{x}_0$ forward with the model.
    </li>
</ul>
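<ul>
    <li class="fragment">The cost function above can be sketched in code for the linear model $\mathbf{x}_k = \mathbf{M}\mathbf{x}_{k-1}$. This is an illustration under stated assumptions, not an operational implementation: the function name <code>cost_4dvar</code>, the matrices and the perfect observations used in the check are all hypothetical.</li>
</ul>

```python
import numpy as np

def cost_4dvar(x_0, x_b, B, observations, M, H, R):
    """4D-VAR cost of the initial condition x_0 for a linear model M."""
    diff_b = x_0 - x_b
    cost = 0.5 * diff_b @ np.linalg.solve(B, diff_b)   # background term
    x_k = x_0
    for y_k in observations:                           # observations at t_1..t_N
        x_k = M @ x_k                                  # evolve the state one step
        innovation = H @ x_k - y_k
        cost += 0.5 * innovation @ np.linalg.solve(R, innovation)
    return cost

# illustrative setup (assumed values)
M = np.array([[0.9, 0.2], [-0.1, 0.8]])
H = np.eye(2)
B = 0.5 * np.eye(2)
R = 0.1 * np.eye(2)
x_b = np.array([1.0, -1.0])

# noise-free observations of the trajectory started at x_b
observations = []
x_k = x_b
for _ in range(3):
    x_k = M @ x_k
    observations.append(H @ x_k)

cost_at_background = cost_4dvar(x_b, x_b, B, observations, M, H, R)
cost_elsewhere = cost_4dvar(x_b + 0.5, x_b, B, observations, M, H, R)
print(cost_at_background, cost_elsewhere)
```

<ul>
    <li class="fragment">With noise-free observations generated from $\mathbf{x}_b$ itself, the cost is zero at $\mathbf{x}_b$ and positive for any other initial condition.</li>
</ul>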

<h3>4D-VAR continued</h3>
<ul>
    <li class="fragment">This is the approach to DA which has been adopted as the primary method at the European Centre for Medium-Range Weather Forecasts (ECMWF).</li>
    <li class="fragment"> This approach has been extremely successful, but has limitations due to the delicate nature of taking the derivative $\partial_{\mathbf{x}_0}$ of the 4D-VAR cost function
        \begin{align}
       J\left(\mathbf{x}_0\right)& = \frac{1}{2}\left(\mathbf{x}_0 - \mathbf{x}_b\right)^\mathrm{T} \mathbf{B}^{-1}\left(\mathbf{x}_0 - \mathbf{x}_b\right) +  \\
        &\frac{1}{2}\sum_{k=1}^N  \left(\mathbf{H}\mathbf{x}_k - \mathbf{y}_k\right)^\mathrm{T} \mathbf{R}^{-1} \left(\mathbf{H}\mathbf{x}_k - \mathbf{y}_k\right)
        \end{align}
    </li>
    <li class="fragment">Particularly, taking the derivative with respect to the initial condition means that we must take the derivative of the equations of motion of the physics-based model with respect to the evolution of the initial condition.</li>
</ul>

<h3>4D-VAR continued</h3>
<ul>
    <li class="fragment">It can be shown that if: </li>
        <ol>
            <li class="fragment"> the matrix $\mathbf{A}\in \mathbb{R}^{n\times n}$ is symmetric; </li>
                <li class="fragment">the functional $J$ is defined as $J(\mathbf{x})\triangleq \frac{1}{2} \mathbf{y}^\mathrm{T}\mathbf{A}\mathbf{y}$; and</li>
            <li class="fragment">$\mathbf{y}= \mathbf{y}(\mathbf{x})$;</li>
    </ol>
        <li class="fragment">then, we have the partial derivative
            \begin{align}
            \frac{\partial J}{\partial \mathbf{x}} = \left(\frac{\partial \mathbf{y}}{\partial \mathbf{x}}\right)^\mathrm{T} \mathbf{A}\mathbf{y}.
            \end{align}</li>
</ul>
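<ul>
    <li class="fragment">The rule can be checked numerically by comparing it with a finite-difference gradient. In the sketch below, the symmetric matrix <code>A</code>, the nonlinear map <code>y_of_x</code> and its hand-written Jacobian are hypothetical choices made only for the check.</li>
</ul>

```python
import numpy as np

# a symmetric matrix, chosen only for illustration
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])

def y_of_x(x):
    """A nonlinear map y(x) used for the check (hypothetical)."""
    return np.array([np.sin(x[0]), x[0] * x[1]])

def jacobian(x):
    """Analytic Jacobian dy/dx of y_of_x."""
    return np.array([[np.cos(x[0]), 0.0],
                     [x[1],         x[0]]])

def J(x):
    y = y_of_x(x)
    return 0.5 * y @ A @ y

def grad_by_rule(x):
    """Gradient from the rule: (dy/dx)^T A y."""
    return jacobian(x).T @ A @ y_of_x(x)

def grad_by_finite_differences(x, eps=1e-6):
    """Central finite-difference approximation of the gradient of J."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (J(x + e) - J(x - e)) / (2 * eps)
    return g

x = np.array([0.3, -1.2])
rule_gradient = grad_by_rule(x)
fd_gradient = grad_by_finite_differences(x)
print(rule_gradient, fd_gradient)
```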
  

  
<h3>4D-VAR continued</h3>
<ul>
        <li class="fragment"> Using the previous rule as motivation, the full gradient is approximated by
        \begin{align}
        \nabla_{\mathbf{x}_0} J \approx - \mathbf{B}^{-1} \left(\mathbf{x}_b - \mathbf{x}_0 \right) - \sum_{k=1}^N \left(\mathbf{M}_k \mathbf{M}_{k-1} \cdots \mathbf{M}_1 \right)^\mathrm{T} \left( \frac{\partial \mathbf{H}_k}{\partial\mathbf{x}_k}\right)^\mathrm{T}\mathbf{R}^{-1}\left(\mathbf{y}_k - \mathbf{H}_k\mathbf{x}_k\right),
            \end{align}
        where $\mathbf{M}_k$ denotes the (tangent-linear) model evolution from time $t_{k-1}$ to time $t_k$.</li>
<li class="fragment"> To solve for the minimum of the cost function by this approximation, the increments between the forward state and the associated observation at each time are iteratively minimized by the "adjoint method".</li> 
    <li class="fragment">The adjoint model takes the future sensitivities back-in-time to earlier times, contra-variantly to the tangent-linear model.</li>
     <li class="fragment">For a useful discussion on the adjoint method, and its use in 4D-VAR, see e.g., the following  <a href="http://www.met.reading.ac.uk/~ross/Documents/Var4d.html" target="blank">tutorial on 4D-VAR</a>.</li>
    </ul>
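<ul>
    <li class="fragment">The backward-in-time idea can be sketched for a linear model with a fixed matrix $\mathbf{M}$ and a linear $\mathbf{H}$. Writing the observation part of the gradient as $\sum_{k=1}^N \left(\mathbf{M}^k\right)^\mathrm{T}\mathbf{f}_k$ with $\mathbf{f}_k = \mathbf{H}^\mathrm{T}\mathbf{R}^{-1}\left(\mathbf{H}\mathbf{x}_k - \mathbf{y}_k\right)$, a single backward sweep with $\mathbf{M}^\mathrm{T}$ gives the same sum without forming any matrix powers. All names and values below are hypothetical, and this is only an illustration of the recursion, not of an operational adjoint model.</li>
</ul>

```python
import numpy as np

def forcing_terms(x_0, observations, M, H, R):
    """Compute f_k = H^T R^{-1} (H x_k - y_k) along the forward trajectory."""
    terms = []
    x_k = x_0
    for y_k in observations:
        x_k = M @ x_k
        terms.append(H.T @ np.linalg.solve(R, H @ x_k - y_k))
    return terms

def obs_gradient_direct(terms, M):
    """Sum of (M^k)^T f_k using explicit matrix powers."""
    return sum(np.linalg.matrix_power(M, k).T @ f_k
               for k, f_k in enumerate(terms, start=1))

def obs_gradient_adjoint(terms, M):
    """Same sum from one backward sweep: adj <- M^T (adj + f_k)."""
    adj = np.zeros(M.shape[0])
    for f_k in reversed(terms):
        adj = M.T @ (adj + f_k)
    return adj

# illustrative setup (assumed values)
rng = np.random.default_rng(0)
M = np.array([[0.9, 0.2], [-0.1, 0.8]])
H = np.eye(2)
R = 0.1 * np.eye(2)
x_0 = np.array([1.0, -1.0])
observations = [rng.normal(size=2) for _ in range(5)]

terms = forcing_terms(x_0, observations, M, H, R)
direct = obs_gradient_direct(terms, M)
adjoint = obs_gradient_adjoint(terms, M)
print(direct, adjoint)
```

<ul>
    <li class="fragment">Adding the background term $\mathbf{B}^{-1}\left(\mathbf{x}_0 - \mathbf{x}_b\right)$ to either result gives the full gradient in this linear setting.</li>
</ul>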

<h3>4D-VAR continued</h3>
<div style="float:left; width:60%">
<ul>
    <li class="fragment">The difficulty of implementing the 4D-VAR formulation has limited its adoption in operational DA, and makes this approach beyond the scope of this tutorial.</li>
    <li class="fragment"> We mention this approach here because of the theoretical and historical importance of this approach;</li>
    <ul>
        <li class="fragment"> also, these techniques are increasingly being merged with statistical techniques into "hybrid" schemes in state-of-the-art methods.</li>
    </ul>
    <li class="fragment"> Understanding the statistical approach will be the subject of the remainder of these tutorials.</li>
</ul>
</div>

<div style="float:left; width:40%">
    <img src="./4D-Var.jpg"/>
    <b>Image courtesy of <a href="https://www.ecmwf.int/en/learning/seminars/symposium-20-years-4dvar">ECMWF</a></b>
</div>