# TUTORIAL: 
# Data assimilation using real experimental data

We can now put everything we have learned together. 

We can investigate two scenarios:

A) Assume that we have access to the post-processed data and assimilate it. This simplifies the problem because the post-processed data is not biased (see the tutorial TA_azimuthal_data for how the raw data is biased).
-   Truth: post-processed data 
-   Observations: post-processed data + noise (possibly coloured noise)

B) Assume a realistic setting in which the post-processed data is not available on the fly to feed into the data assimilation algorithm. Here, we need to address the issue of biased observations.
-   Truth: post-processed data
-   Observations: raw data
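In scenario A the observations are synthetic: the post-processed data plus noise. The sketch below shows one way to build such observations with either white or coloured noise. Here, coloured noise means first-order autoregressive (AR(1)) noise. `make_observations`, the AR(1) noise model, and the scaling of the noise by each observable's standard deviation are assumptions made for illustration. They are not the tutorial's own functions. In scenario B no noise is added, because the raw data are used directly as (biased) observations.

```python
import numpy as np


def make_observations(y_true, std_obs, rng, phi=0.0):
    """Hypothetical helper: corrupt a truth signal (Nt, Nq) with zero-mean noise.

    phi = 0 gives white Gaussian noise; 0 < phi < 1 gives AR(1) (coloured)
    noise. In both cases the noise standard deviation is std_obs times the
    standard deviation of each observable (an assumed scaling).
    """
    y_true = np.asarray(y_true, dtype=float)
    scale = std_obs * np.std(y_true, axis=0)
    white = rng.standard_normal(y_true.shape)
    noise = np.empty_like(white)
    noise[0] = white[0]
    for k in range(1, len(white)):
        # AR(1) recursion; sqrt(1 - phi**2) keeps the noise variance equal to 1
        noise[k] = phi * noise[k - 1] + np.sqrt(1.0 - phi ** 2) * white[k]
    return y_true + scale * noise


rng = np.random.default_rng(0)
t = np.linspace(0.0, 0.05, 2001)
y_true = np.stack([np.sin(2 * np.pi * 1095 * t), np.cos(2 * np.pi * 1095 * t)], axis=1)
y_obs_white = make_observations(y_true, 0.1, rng)               # white noise
y_obs_coloured = make_observations(y_true, 0.1, rng, phi=0.95)  # coloured (AR(1)) noise
```

Coloured noise is correlated in time. As a result, neighbouring observation errors do not average out as quickly as white noise does.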

In this tutorial we will work with option B. For option A go to the tutorial ```10_DA_annular_ideal.ipynb```.

In [None]:
import numpy as np
import os

%matplotlib notebook

rng = np.random.default_rng(0)


if os.path.isdir('/mscott/'):
    data_folder = '/mscott/an553/data/'  # data folder on the /mscott/ file system
else:
    data_folder = "../data/"  # local data folder, relative to the working directory

## 1. Load data 
Create the reference truth and the observations.

*The cell below creates both with the function ```create_truth```, which outputs a dictionary:*
```
from essentials.create import create_truth
truth = create_truth(filename, t_start, t_stop, Nt_obs, post_processed=False)
```
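The following sketch makes the roles of `t_start`, `t_stop` and `Nt_obs` concrete. It shows one way to pick observations from a uniformly sampled time series. The sketch assumes that `Nt_obs` is the number of time steps between consecutive observations, which is consistent with how it is used later to set the washout start. `select_observations` is a hypothetical helper and not part of `essentials`, and `create_truth` may differ in detail.

```python
import numpy as np


def select_observations(t, y, t_start, t_stop, Nt_obs):
    """Hypothetical helper: keep every Nt_obs-th sample with t_start <= t < t_stop."""
    i0 = int(np.searchsorted(t, t_start))    # first index with t >= t_start
    i1 = int(np.searchsorted(t, t_stop))     # first index with t >= t_stop
    obs_idx = np.arange(i0, i1, Nt_obs)
    return t[obs_idx], y[obs_idx]


dt = 1.953125e-05          # close to the Annular model's dt (printed as 1.95313e-05)
t = np.arange(20000) * dt
y = np.sin(2 * np.pi * 1095 * t)[:, None]
t_obs, y_obs = select_observations(t, y, t_start=0.1, t_stop=0.15, Nt_obs=55)
```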

In [3]:
from essentials.create import create_truth
from essentials.physical_models import Annular
from essentials.plotResults import plot_truth

ER = 0.5125
filename = data_folder + 'annular/ER_{}'.format(ER)

# Select the observations time window
t_start = 2 * Annular.t_transient
t_stop = t_start + Annular.t_CR * 5
Nt_obs = 55  # number of time steps between consecutive observations

truth = create_truth(filename, t_start, t_stop, Nt_obs, post_processed=False)
# plot_truth(**truth)


## 2. Define the forecast model
This is the physical model that we use to model the true data, here the `Annular` model.
Here, we select the filter parameters and create the ensemble.


In [4]:
from essentials.create import create_ensemble

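# m: ensemble size; inflation: inflation factor applied to the ensemble;
# std_psi: spread of the initial states; std_a: (min, max) range of the
# initial ensemble for each inferred parameter
filter_params = {'m': 10,
                 'inflation': 1.002,
                 'std_psi': 0.3,
                 'std_a': dict(nu=(40., 50.),
                               c2beta=(5, 20),
                               kappa=(1.E-4, 1.3E-4),
                               epsilon=(0.0001, 0.03),
                               omega=(1090 * 2 * np.pi, 1100 * 2 * np.pi),
                               theta_b=(0.5, 0.7),
                               theta_e=(0.5, 0.8)
                               )}

ensemble = create_ensemble(model=Annular, **filter_params)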

ensemble_no_bias = ensemble.copy()
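The parameters in `filter_params` suggest how the initial ensemble is built. The minimal sketch below rests on three assumptions:

- Each state entry is perturbed multiplicatively with standard deviation `std_psi`.
- Each inferred parameter is drawn uniformly from its `std_a` range.
- `inflation` scales the ensemble anomalies about the mean.

`init_ensemble` and `inflate` are hypothetical helpers, and `create_ensemble` may use different distributions. The toy sizes (8 states, 3 parameters) are arbitrary.

```python
import numpy as np


def init_ensemble(psi0, param_ranges, m, std_psi, rng):
    """Hypothetical sketch: m members with perturbed states and sampled parameters.

    Assumptions (not necessarily what create_ensemble does):
    - each state entry is perturbed as psi * (1 + std_psi * N(0, 1));
    - each parameter is drawn uniformly from its (min, max) range.
    """
    psi0 = np.asarray(psi0, dtype=float)
    states = psi0[:, None] * (1.0 + std_psi * rng.standard_normal((psi0.size, m)))
    params = np.array([rng.uniform(lo, hi, size=m) for lo, hi in param_ranges.values()])
    return np.vstack([states, params])   # augmented state, shape (N_psi + N_alpha, m)


def inflate(Af, rho):
    """Multiplicative inflation: scale the anomalies about the ensemble mean by rho."""
    mean = Af.mean(axis=-1, keepdims=True)
    return mean + rho * (Af - mean)


rng = np.random.default_rng(0)
ranges = dict(nu=(40., 50.), c2beta=(5, 20), theta_b=(0.5, 0.7))
Af = init_ensemble(np.ones(8), ranges, m=10, std_psi=0.3, rng=rng)
Af_infl = inflate(Af, 1.002)
```

Inflation leaves the ensemble mean unchanged and widens the spread slightly. This counteracts the tendency of small ensembles to become overconfident.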


## 4. Train an echo state network (ESN) to model the bias
The procedure is the following:

&emsp; i. Initialise the ESN bias class object
&emsp; ii. Create a synthetic bias to use as training data
&emsp; iii. Train the ESN
&emsp; iv. Create washout data
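For readers new to echo state networks, the sketch below is a generic, minimal ESN in NumPy. It has the following parts:

- a fixed random reservoir with spectral radius `rho`, input scaling `sigma_in` and roughly `connect` connections per unit;
- a washout of `N_wash` steps;
- a linear readout trained by ridge regression with regularisation `tikh`.

The names mirror the tutorial's hyperparameters. However, `MinimalESN` is hypothetical and is not the `essentials` `ESN` class. It omits upsampling, data augmentation, the Bayesian update and the bias-specific training set.

```python
import numpy as np


class MinimalESN:
    """Hypothetical minimal echo state network (not the essentials ESN class).

    rho: spectral radius of the reservoir, sigma_in: input scaling,
    connect: average number of nonzero connections per reservoir unit,
    tikh: Tikhonov (ridge) regularisation of the linear readout.
    """

    def __init__(self, N_in, N_units=50, rho=0.9, sigma_in=0.1, connect=3, seed=0):
        rng = np.random.default_rng(seed)
        self.N_units = N_units
        # Input weights; the extra column multiplies a constant bias input of 1
        self.W_in = sigma_in * rng.uniform(-1, 1, (N_units, N_in + 1))
        # Sparse random reservoir rescaled so that its spectral radius equals rho
        W = rng.uniform(-1, 1, (N_units, N_units))
        W *= rng.random((N_units, N_units)) < connect / N_units
        self.W = W * rho / np.max(np.abs(np.linalg.eigvals(W)))
        self.W_out = None

    def step(self, r, u):
        """One reservoir update driven by the input u."""
        return np.tanh(self.W_in @ np.append(u, 1.0) + self.W @ r)

    def run(self, U):
        """Open loop: drive the reservoir (from r = 0) with inputs U of shape (Nt, N_in)."""
        r = np.zeros(self.N_units)
        R = np.empty((len(U), self.N_units))
        for k, u in enumerate(U):
            r = self.step(r, u)
            R[k] = r
        return R

    def train(self, U, N_wash=3, tikh=1e-8):
        """Fit the readout by ridge regression so that [r_k, 1] @ W_out ~ U[k + 1]."""
        R = self.run(U[:-1])[N_wash:]     # discard the washout transient
        Y = U[1 + N_wash:]
        A = np.hstack([R, np.ones((len(R), 1))])
        self.W_out = np.linalg.solve(A.T @ A + tikh * np.eye(A.shape[1]), A.T @ Y)
        return R[-1]                      # reservoir state after the input U[-2]

    def forecast(self, r, u, Nt):
        """Closed loop: each prediction is fed back as the next input."""
        out = np.empty((Nt, len(u)))
        for k in range(Nt):
            r = self.step(r, u)
            u = np.append(r, 1.0) @ self.W_out
            out[k] = u
        return out


# Usage on a toy periodic signal (not the annular data)
t = np.arange(0.0, 60.0, 0.1)
U = np.stack([np.sin(t), np.cos(t)], axis=1)
esn = MinimalESN(N_in=2, N_units=50, rho=0.9, sigma_in=0.5)
r_last = esn.train(U[:500], N_wash=50, tikh=1e-8)   # longer washout for this toy case
pred = esn.forecast(r_last, U[499], Nt=20)           # predicts U[500:520]

# Teacher-forced one-step error on the training data
R = esn.run(U[:499])[50:]
fit = np.hstack([R, np.ones((len(R), 1))]) @ esn.W_out
one_step_rmse = np.sqrt(np.mean((fit - U[51:500]) ** 2))
```

In closed loop, each prediction is fed back as the next input, so the network can run autonomously when no data are available. How well this works depends strongly on `rho`, `sigma_in` and `tikh`, which is why these are tuned in a hyperparameter search.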

In [5]:
from essentials.create import create_bias_training_dataset, create_washout
from essentials.bias_models import ESN

train_params = dict(bias_model=ESN, 
                    upsample=5,
                    N_units=50,
                    N_wash=3,
                    t_train=ensemble.t_CR * 10,
                    bayesian_update=True,
                    biased_observations=True,
                    m=ensemble.m,
                    # Training data generation options
                    augment_data=True,
                    L=10,
                    noise=0.1, 
                    # Hyperparameter search ranges
                    rho_range=(0.5, 1.1),
                    sigma_in_range=(np.log10(1e-5), np.log10(1e1)),
                    tikh_range=[1e-16]
                    )

# 4.1. Initialise the ESN
ensemble.init_bias(**train_params)

# 4.2. Create the training data
train_data = create_bias_training_dataset(truth['y_raw'], truth['y_true'], ensemble,
                                          **train_params)

# 4.3. Train the ESN
# The training convergence, hyperparameter optimization and test results are saved
# as a PDF file in the figs_ESN folder.
ensemble.bias.train_bias_model(**train_data)

# 4.4. Create the washout data
# The ESN washout starts two observation intervals before the first analysis time
ensemble.t_init = truth['t_obs'][0]
ensemble.bias.t_init = ensemble.t_init - 2 * Nt_obs * truth['dt']
wash_t, wash_obs = create_washout(bias_case=ensemble.bias, **truth)

observed_idx [0 1 2 3 4 5 6 7] -> [4 5 6 7]
t_train 0.1 -> 0.1
t_val 0.1 -> 0.01
augment_data True -> True
bayesian_update True -> True
biased_observations True -> True
L 10 -> 10

 ----------------- HYPERPARAMETER SEARCH ------------------
 4x4 grid and 4 points with Bayesian Optimization
		 rho	 sigma_in	 tikh	 MSE val 
1	 5.000e-01	 1.000e-05	 1.000e-16	 -4.1607
2	 5.000e-01	 1.000e-03	 1.000e-16	 -3.9115
3	 5.000e-01	 1.000e-01	 1.000e-16	 -3.9295
4	 5.000e-01	 1.000e+01	 1.000e-16	 -4.8230
5	 7.000e-01	 1.000e-05	 1.000e-16	 -4.1702
6	 7.000e-01	 1.000e-03	 1.000e-16	 -4.0509
7	 7.000e-01	 1.000e-01	 1.000e-16	 -4.0148
8	 7.000e-01	 1.000e+01	 1.000e-16	 -4.4151
9	 9.000e-01	 1.000e-05	 1.000e-16	 -4.0827
10	 9.000e-01	 1.000e-03	 1.000e-16	 -3.9760
11	 9.000e-01	 1.000e-01	 1.000e-16	 -3.9478
12	 9.000e-01	 1.000e+01	 1.000e-16	 -4.0736
13	 1.100e+00	 1.000e-05	 1.000e-16	 -2.4807
14	 1.100e+00	 1.000e-03	 1.000e-16	 -1.1193
15	 1.100e+00	 1.000e-01	 1.000e-16	 -2.0071
16	 1.100e


Median and max error in 120 tests: -1.4001855453750935 -0.7867074175501121
VAAR 234860.176096294
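Before the Bayesian optimization points, the log above reports a 4x4 grid over `rho_range` and `sigma_in_range`. The latter is given as log10 bounds. The sketch below reproduces that grid with NumPy and picks the smallest validation value among the 15 fully printed rows. The negative values suggest that the `MSE val` column is the log10 of the validation MSE, but that is an inference. The selected row (rho = 0.5, sigma_in = 10) matches the `rho` and `sigma_in` printed in the ESN parameters further down. The Bayesian optimization step itself is not sketched.

```python
import numpy as np

rho_range = (0.5, 1.1)
sigma_in_range = (np.log10(1e-5), np.log10(1e1))   # log10 bounds, as in train_params
n_grid = 4                                          # "4x4 grid" in the log above

rho_grid = np.linspace(*rho_range, n_grid)
sigma_in_grid = 10.0 ** np.linspace(*sigma_in_range, n_grid)

# Every (rho, sigma_in) pair, rho varying slowest, in the order of the printed table
grid = [(rho, s) for rho in rho_grid for s in sigma_in_grid]

# Validation values copied from rows 1-15 of the log (row 16 is cut off)
mse_val = [-4.1607, -3.9115, -3.9295, -4.8230, -4.1702, -4.0509, -4.0148, -4.4151,
           -4.0827, -3.9760, -3.9478, -4.0736, -2.4807, -1.1193, -2.0071]
best = int(np.argmin(mse_val))
rho_best, sigma_in_best = grid[best]
print(f"row {best + 1}: rho = {rho_best:.3e}, sigma_in = {sigma_in_best:.3e}")
```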


## 5. Apply data assimilation
We now have all the ingredients to start our data assimilation algorithm.
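At each observation time, the filter updates the ensemble with a Kalman-type analysis. As a simplified illustration, the sketch below implements a standard stochastic EnKF analysis in which the predicted observables are `M @ Af + b`, where `b` is a given estimate of the bias. In the tutorial, the ESN plays the role of `b`. The sketch is neither the `rBA_EnKF` nor the `EnSRKF` used below, since it has no regularisation and no square-root formulation. `enkf_analysis`, `M` and `Cdd` are hypothetical names.

```python
import numpy as np


def enkf_analysis(Af, d, Cdd, M, b=None, rng=None):
    """Stochastic EnKF analysis with an optional, fixed additive bias estimate.

    Af: forecast ensemble (N, m); d: observations (q,); Cdd: observation
    error covariance (q, q); M: linear observation operator (q, N);
    b: estimate of the bias in the observables (q,), so that the model
    predicts d as M @ Af + b. Not the rBA_EnKF or EnSRKF of the tutorial.
    """
    rng = np.random.default_rng() if rng is None else rng
    m = Af.shape[1]
    b = np.zeros(len(d)) if b is None else np.asarray(b, dtype=float)
    Y = M @ Af + b[:, None]                            # predicted observables (q, m)
    D = rng.multivariate_normal(d, Cdd, size=m).T      # perturbed observations (q, m)
    A_anom = Af - Af.mean(axis=1, keepdims=True)
    Y_anom = Y - Y.mean(axis=1, keepdims=True)
    Cpy = A_anom @ Y_anom.T / (m - 1)                  # state-observable covariance (N, q)
    Cyy = Y_anom @ Y_anom.T / (m - 1)                  # observable covariance (q, q)
    K = np.linalg.solve(Cyy + Cdd, Cpy.T).T            # Kalman gain (N, q); Cyy + Cdd is symmetric
    return Af + K @ (D - Y)


rng = np.random.default_rng(1)
psi_true = np.array([1.0, -0.5, 2.0])
Af = psi_true[:, None] + 1.0 + 0.5 * rng.standard_normal((3, 50))   # offset prior
M = np.eye(3)[:2]                                                   # observe first two entries
Cdd = 0.01 * np.eye(2)

# Unbiased observations
Aa = enkf_analysis(Af, M @ psi_true, Cdd, M, rng=rng)
# Observations offset by a known bias of 0.3, passed to the filter as b
Aa_b = enkf_analysis(Af, M @ psi_true + 0.3, Cdd, M, b=np.full(2, 0.3), rng=rng)
```

If `b` were ignored in the second call, the analysis would pull the state towards the biased observations. This is the issue that scenario B has to address.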

In [6]:
ensemble_ESN = ensemble.copy()


In [7]:
from essentials.DA import dataAssimilation

std_obs = 0.1

kwargs = dict(y_obs=truth['y_obs'], t_obs=truth['t_obs'], std_obs=std_obs,
              wash_obs=wash_obs, wash_t=wash_t)

# Regularized bias-aware EnKF for the ensemble with the ESN bias model;
# ensemble square-root Kalman filter for the ensemble without it
ensemble_ESN.filter = 'rBA_EnKF'
ensemble_ESN.regularization_factor = 2.
ensemble_no_bias.filter = 'EnSRKF'

out = []
# To also run the ensemble without the ESN, use this loop instead:
# for ens in [ensemble_no_bias, ensemble_ESN]:
for ens in [ensemble_ESN]:
    ens = ens.copy()
    ens.bias.update_reservoir = False
    filter_ens = dataAssimilation(ens, **kwargs.copy())

    # Forecast the ensemble for one more t_CR without assimilation
    Nt_extra = int(filter_ens.t_CR / filter_ens.dt) + 1

    psi, t = filter_ens.time_integrate(Nt_extra)
    filter_ens.update_history(psi, t)

    # Forecast the ESN bias along the extended observable history
    y = filter_ens.get_observable_hist(Nt_extra)
    b, t_b = filter_ens.bias.time_integrate(t=t, y=y)
    filter_ens.bias.update_history(b, t_b)

    out.append(filter_ens)



 ------------------ Annular Model Parameters ------------------ 
	 ER = 0.5
	 Nq = 4
	 c2beta = 12.5
	 dt = 1.95313e-05
	 epsilon = 0.01505
	 kappa = 0.000115
	 n = 1.0
	 nu = 45.0
	 omega = 6880.09
	 theta_b = 0.6
	 theta_e = 0.65

 ---------------- ESN bias model parameters --------------- 
	 L = 10
	 N_units = 50
	 N_wash = 3
	 augment_data = True
	 bayesian_update = True
	 connect = 3
	 observed_idx = [4 5 6 7]
	 perform_test = True
	 rho = 0.5
	 sigma_in = 10.0
	 t_train = 0.1
	 t_val = 0.01
	 tikh = 1e-16
	 upsample = 5

 -------------------- ASSIMILATION PARAMETERS -------------------- 
 	 Filter = rBA_EnKF  
	 bias = ESN 
 	 m = 10 
 	 Time steps between analysis = None 
 	 Inferred params = ['nu', 'c2beta', 'kappa', 'epsilon', 'omega', 'theta_b', 'theta_e'] 
 	 Inflation = 1.002 
 	 Ensemble std(psi0) = 0.3
 	 Ensemble std(alpha0) = {'nu': (40.0, 50.0), 'c2beta': (5, 20), 'kappa': (0.0001, 0.00013), 'epsilon': (0.0001, 0.03), 'omega': (6848.671984825749, 6911.503837897545), '

In [8]:
from essentials.plotResults import plot_timeseries, plot_parameters
# Sketch of the truth dictionary used by the plotting functions:
# truth = dict(y_raw=y_raw, y_true=y_true, t=t_true, dt=dt_t,
#              t_obs=t_true[obs_idx], y_obs=y_raw[obs_idx], dt_obs=dt_obs * dt_t,
#              std_obs=std_obs, wash_t=wash_t, wash_obs=wash_obs)

for filter_ens in out:
    plot_timeseries(filter_ens, truth)
    plot_parameters(filter_ens, truth)

[(10851,), (10851, 4, 10), (3556,), (3556, 4, 10)]



In [27]:
from essentials.plotResults import plot_states_PDF, plot_RMS_pdf
plot_states_PDF(out, truth)
plot_RMS_pdf(out, truth)
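The definition of the error used by `plot_RMS_pdf` is not shown here. One common choice is the root-mean-square error over the observables for each ensemble member and time step, normalised by the RMS of the truth. The sketch below assumes that definition and obtains its empirical PDF from a density-normalised histogram. `rms_error` is a hypothetical helper, and the data are synthetic.

```python
import numpy as np


def rms_error(y_true, y_est):
    """Normalised RMS error of every ensemble member at every time step.

    y_true: reference observables (Nt, Nq); y_est: ensemble observables (Nt, Nq, m).
    Normalising by the RMS of the truth is an assumed convention.
    """
    err = y_est - y_true[..., None]                  # broadcast over the ensemble axis
    rms = np.sqrt(np.mean(err ** 2, axis=1))         # average over observables -> (Nt, m)
    return rms / np.sqrt(np.mean(y_true ** 2))


rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 500)
y_true = np.stack([np.sin(2 * np.pi * 5 * t), np.cos(2 * np.pi * 5 * t)], axis=1)
y_est = y_true[..., None] + 0.05 * rng.standard_normal((500, 2, 10))

rms = rms_error(y_true, y_est)
pdf, edges = np.histogram(rms, bins=30, density=True)   # empirical PDF of the error
```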



In [22]:
import matplotlib.pyplot as plt
from essentials.plotResults import interpolate, bias_obs_props, bias_obs_noisy_props

for filter_ens in [out[-1]]:

    fig, axs = plt.subplots(filter_ens.Nq, 2, figsize=(10, 5), sharey=True, sharex=True, layout='tight')

    # Time window: from half a t_CR before the first observation up to the sixth observation
    t_ref = truth['t']  # reference time axis (assumed here to be the truth time vector)
    j0 = np.argmin(abs(t_ref - (truth['t_obs'][0] - ensemble.t_CR * .5)))
    j1s = [np.argmin(abs(t_ref - truth['t_obs'][idx])) for idx in [0, 5]]
    t = t_ref[j0:j1s[-1]]

    # Raw and post-processed (true) data on the plotting window
    y_raw = interpolate(truth['t'], truth['y_raw'], t)
    y_true = interpolate(truth['t'], truth['y_true'], t)

    t_obs, obs = truth['t_obs'], truth['y_obs']

    # Ensemble observables and ESN bias estimate on the same window
    y_est = interpolate(filter_ens.hist_t, filter_ens.get_observable_hist(), t)
    b_est = interpolate(filter_ens.bias.hist_t, filter_ens.bias.hist, t)

    for qi, ax in enumerate(axs):
        # Left column: ensemble mean of y_raw - y_est against the bias estimate b_est[:, qi]
        ax[0].plot(t, np.mean(np.expand_dims(y_raw[:, qi], -1) - y_est[:, qi], axis=-1), label='t', **bias_obs_props)
        ax[0].plot(t, b_est[:, qi], **bias_obs_noisy_props)
        # Right column: ensemble mean of y_true - y_est against b_est[:, Nq + qi]
        ax[1].plot(t, np.mean(np.expand_dims(y_true[:, qi], -1) - y_est[:, qi], axis=-1), label='t', **bias_obs_props)
        ax[1].plot(t, b_est[:, filter_ens.Nq + qi], **bias_obs_noisy_props)
        # Vertical lines: first two washout times and first observation time
        for ax_ in ax:
            for x in [wash_t[0], wash_t[1], t_obs[0]]:
                ax_.axvline(x, color='b', lw=.5)
    axs[0,0].set(xlim=(wash_t[0]-truth['dt_obs'],t_obs[2] ))
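The cell above first interpolates everything onto a common time axis. It then compares ensemble-mean residuals with the ESN bias estimate: raw minus estimated observables, and true minus estimated observables. The sketch below isolates these two operations using synthetic data. `interp_time` is a hypothetical stand-in for the library's `interpolate`, whose behaviour may differ, for example outside the data range. In that region `np.interp` holds the end values constant.

```python
import numpy as np


def interp_time(t_src, y_src, t_new):
    """Linear interpolation along axis 0 for arrays of shape (Nt, ...).

    np.interp holds the end values constant outside [t_src[0], t_src[-1]].
    """
    y_src = np.asarray(y_src, dtype=float)
    y_flat = y_src.reshape(len(t_src), -1)
    out = np.column_stack([np.interp(t_new, t_src, col) for col in y_flat.T])
    return out.reshape((len(t_new),) + y_src.shape[1:])


def mean_residual(y_ref, y_ens):
    """Ensemble-mean residual of y_ref (Nt, Nq) minus y_ens (Nt, Nq, m)."""
    return np.mean(y_ref[..., None] - y_ens, axis=-1)


t_hist = np.linspace(0.0, 1.0, 101)
y_hist = np.stack([np.sin(2 * np.pi * t_hist), np.cos(2 * np.pi * t_hist)], axis=1)
ens_hist = y_hist[..., None] + np.linspace(-0.1, 0.1, 5)     # (Nt, Nq, m), zero-mean spread
t = np.linspace(0.2, 0.8, 31)

y_ref = interp_time(t_hist, y_hist, t) + 0.3                 # reference with a constant offset
res = mean_residual(y_ref, interp_time(t_hist, ens_hist, t))
```

Here the members are spread symmetrically about the reference signal. The ensemble-mean residual therefore recovers the constant offset of 0.3, which plays the role of the bias that the ESN is trained to estimate.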
        


