<img src='https://github.com/LinkedEarth/Logos/raw/master/PYLEOCLIM_logo_HORZ-01.png' width="800">

# 8. Model-Data Confrontation in the time domain

In this notebook, we demonstrate how to use `Pyleoclim` to load LiPD files and compare proxy records with the [Last Millennium Reanalysis (LMR)](https://cpo.noaa.gov/News/News-Article/ArtMID/6226/ArticleID/1807/Last-Millennium-Reanalysis-now-at-NOAAs-National-Centers-for-Environmental-Information-marking-major-milestone) at proxy locales.

In [None]:
# load essential packages
%load_ext autoreload
%autoreload 2
    
import os
import pickle

import numpy as np
import pandas as pd
from tqdm import tqdm
import xarray as xr

import pyleoclim as pyleo  # make an alias name for "pyleoclim"

## Load proxy data

The proxy record we'd like to load is [this one](http://wiki.linked.earth/LPD81e53153.temperature), attached to [Tierney et al. (2015)](http://dx.doi.org/10.1126/sciadv.1500682). It is an SST reconstruction based on the TEX86 proxy, measured in two cores from the Horn of Africa.

In [None]:
d = pyleo.Lipd(usr_path='../data/Afr-P178-15P.Tierney.2015.lpd')
Ocn_136 = d.to_LipdSeries(0) 
Ocn_137 = d.to_LipdSeries(2)   
Ocn_136.label = 'Ocn_136'
Ocn_137.label = 'Ocn_137'

Let's plot the two cores on the same graph:

In [None]:
fig, ax = Ocn_137.plot(mute=True)
Ocn_136.plot(ax=ax)
pyleo.showfig(fig)

We'd like to see how this compares to the [last millennium reanalysis](https://cpo.noaa.gov/News/News-Article/ArtMID/6226/ArticleID/1807/Last-Millennium-Reanalysis-now-at-NOAAs-National-Centers-for-Environmental-Information-marking-major-milestone) (LMR, [Hakim et al. 2016](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/2016JD024751), [Tardif et al. 2019](https://cp.copernicus.org/articles/15/1251/2019/)) at the same location. Note that LMR knows nothing of this dataset, as it (currently) only uses annually-resolved records. Thus, this exercise can serve as independent validation of LMR. Let us first extract the geographical coordinates of the core:

In [None]:
tslist = d.to_tso()
plat = tslist[0]['geo_meanLat']
plon = tslist[0]['geo_meanLon']

Now, let's move on to extract the LMR-reconstructed temperature series.

## Extract LMR-reconstructed temperature series

We will use the sea-surface temperature full grid ensemble [mean](https://atmos.washington.edu/%7Ehakim/lmr/LMRv2/sst_MCruns_ensemble_mean_LMRv2.1.nc) and [spread](https://atmos.washington.edu/%7Ehakim/lmr/LMRv2/sst_MCruns_ensemble_spread_LMRv2.1.nc).

In [None]:
mean_url = 'https://atmos.washington.edu/%7Ehakim/lmr/LMRv2/sst_MCruns_ensemble_mean_LMRv2.1.nc'
spread_url = 'https://atmos.washington.edu/%7Ehakim/lmr/LMRv2/sst_MCruns_ensemble_spread_LMRv2.1.nc'

In [None]:
# download the files
! wget $mean_url
! wget $spread_url

In [None]:
with xr.open_dataset('sst_MCruns_ensemble_mean_LMRv2.1.nc') as ds:
    print(ds)
    sst_lat = ds['lat']
    sst_lon = ds['lon']
    sst_time = ds['time']
    sst_mean = ds['sst']
    
with xr.open_dataset('sst_MCruns_ensemble_spread_LMRv2.1.nc') as ds:
    print(ds)
    sst_spread = ds['sst']

In [None]:
print(np.shape(sst_mean))  # check the shape of the LMR ensemble mean
print(np.shape(sst_spread))  # check the shape of the LMR ensemble spread

In [None]:
print(sst_time) # check the time axis from LMR

Note that the LMR time axis is stored as `cftime.DatetimeNoLeap` objects, which we need to convert to an array of floats.
Since the axis is just the sequence of years from 0 to 2000, we can define it directly with a NumPy function.

In [None]:
# define time axis in an array of floats
sst_time = np.arange(0, 2001)
print(sst_time)
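
As an alternative to hard-coding the range, the years can be read off the date objects themselves. The sketch below assumes only that each time stamp exposes a `.year` attribute, as `cftime.DatetimeNoLeap` objects do; `FakeDate` and `years_from_dates` are hypothetical names used here to keep the example self-contained.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class FakeDate:
    """Hypothetical stand-in for a cftime.DatetimeNoLeap object."""
    year: int
    month: int = 1
    day: int = 1


def years_from_dates(dates):
    """Return the calendar year of each date as an array of floats."""
    return np.array([d.year for d in dates], dtype=float)


# Mimic an annual axis running from year 0 to year 2000
dates = [FakeDate(year=y) for y in range(0, 2001)]
years = years_from_dates(dates)
print(years[:3], years[-1], years.shape)
```

With the real data, the same list comprehension could be applied to the values of the LMR time coordinate, which avoids silently mislabelling the axis if the file ever covers a different period.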

Now we need to locate the nearest gridpoint in the LMR grid for each proxy record.
The function below does this with a nearest-neighbor search over the grid coordinates.

In [None]:
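def find_nearest_loc(lat, lon, target_lat, target_lon, mode=None, verbose=False):
    ''' Find the nearest model grid point for each target location

    Args:
        lat, lon (array): the model latitude and longitude arrays
        target_lat, target_lon (array): the target latitude and longitude arrays
        mode (str): layout of the model lat/lon; inferred from the shape of lat if None
        + latlon: the model lat/lon is a 1-D array
        + mesh: the model lat/lon is a 2-D array
        + list: the model lat/lon is a list of paired site coordinates
        verbose (bool): if True, show a progress bar while searching

    Returns:
        lat_ind, lon_ind (array): the indices of the found closest model sites

    '''
    from scipy import spatial  # imported after the docstring so the docstring is recognized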
    if mode is None:
        if len(np.shape(lat)) == 1:
            mode = 'latlon'
        elif len(np.shape(lat)) == 2:
            mode = 'mesh'
        else:
            raise ValueError('ERROR: The shape of the lat/lon cannot be processed !!!')

    if mode == 'latlon':
        # model locations
        mesh = np.meshgrid(lon, lat)

        list_of_grids = list(zip(*(grid.flat for grid in mesh)))
        model_lon, model_lat = zip(*list_of_grids)

    elif mode == 'mesh':
        model_lat = lat.flatten()
        model_lon = lon.flatten()

    elif mode == 'list':
        model_lat = lat
        model_lon = lon

    model_locations = []

    for m_lat, m_lon in zip(model_lat, model_lon):
        model_locations.append((m_lat, m_lon))

    # target locations
    if np.size(target_lat) > 1:
        #  target_locations_dup = list(zip(target_lat, target_lon))
        #  target_locations = list(set(target_locations_dup))  # remove duplicated locations
        target_locations = list(zip(target_lat, target_lon))
        n_loc = np.shape(target_locations)[0]
    else:
        target_locations = [(target_lat, target_lon)]
        n_loc = 1

    lat_ind = np.zeros(n_loc, dtype=int)
    lon_ind = np.zeros(n_loc, dtype=int)

    # build the KD-tree once, then query the closest grid point for each target
    # (distances are Euclidean in lat/lon space)
    tree = spatial.KDTree(model_locations)
    nlon = np.shape(lon)[-1]
    for i, target_loc in (enumerate(tqdm(target_locations)) if verbose else enumerate(target_locations)):
        distance, index = tree.query(target_loc)

        if mode == 'list':
            lat_ind[i] = index % nlon
        else:
            lat_ind[i] = index // nlon
        lon_ind[i] = index % nlon

        #  if np.size(target_lat) > 1:
            #  df_ind[i] = target_locations_dup.index(target_loc)

    if np.size(target_lat) > 1:
        #  return lat_ind, lon_ind, df_ind
        return lat_ind, lon_ind
    else:
        return lat_ind[0], lon_ind[0]

In [None]:
print(plat, plon)  # check the proxy coordinates extracted so far

Now we use the `find_nearest_loc()` function to search for the nearest grid point in the LMR grid.

In [None]:
lat_idx = {}
lon_idx = {}
pid = 'Ocn_136'
lat_idx[pid], lon_idx[pid] = find_nearest_loc(sst_lat, sst_lon, plat, plon)
print(f'Target: {plat, plon}; Found: {sst_lat[lat_idx[pid]], sst_lon[lon_idx[pid]]}')
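
For a regular grid described by 1-D latitude and longitude arrays, the same lookup can be sketched without a KD-tree. The example below is a simplified illustration, not a replacement for `find_nearest_loc()`: `nearest_index_1d` is a hypothetical helper, and the grid and target coordinates are made up. It also compares longitudes on the circle, which matters when the grid uses 0 to 360 degrees and the site is given in -180 to 180.

```python
import numpy as np


def nearest_index_1d(lat, lon, target_lat, target_lon):
    """Nearest (lat, lon) indices on a regular grid given 1-D coordinate arrays."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    lat_ind = int(np.abs(lat - target_lat).argmin())
    # Measure longitude differences on the circle so that 358 and -2 coincide
    dlon = np.abs((lon - target_lon + 180.0) % 360.0 - 180.0)
    lon_ind = int(dlon.argmin())
    return lat_ind, lon_ind


# Hypothetical 2-degree grid and target site (not the LMR grid or the core location)
grid_lat = np.arange(-90.0, 91.0, 2.0)
grid_lon = np.arange(0.0, 360.0, 2.0)
i, j = nearest_index_1d(grid_lat, grid_lon, 11.9, 44.3)
print(grid_lat[i], grid_lon[j])
```

Treating latitude and longitude separately only works because the grid is rectangular; for curvilinear (2-D) grids a joint search such as the KD-tree above is still needed.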

Now that the grid point is located, we can define a `pyleoclim.EnsembleSeries` for the LMR data.
Note that a `pyleoclim.EnsembleSeries` is simply a list of `pyleoclim.Series`.

In [None]:
# get the dimension sizes
nt, nEns, nlat, nlon = np.shape(sst_mean)

# the dictionary to store pyleoclim.EnsembleSeries
ms_mean = {}
ms_spread = {}

pid = 'Ocn_136'
ts_mean_list = []
ts_spread_list = []
for i in range(nEns):
    ts_mean_tmp = pyleo.Series(
            time=sst_time,
            value=sst_mean[:, i, lat_idx[pid], lon_idx[pid]],
            time_name='Time',
            value_name='LMR-temp.',
            time_unit='AD',
            value_unit='K',
        )
    ts_spread_tmp = pyleo.Series(
            time=sst_time,
            value=sst_spread[:, i, lat_idx[pid], lon_idx[pid]],
            time_name='Time',
            value_name='LMR-temp. spread',
            time_unit='AD',
            value_unit='K',
        )
    ts_mean_list.append(ts_mean_tmp)
    ts_spread_list.append(ts_spread_tmp)
    
# define pyleoclim.EnsembleSeries
ms_mean[pid] = pyleo.EnsembleSeries(series_list=ts_mean_list)
ms_spread[pid] = pyleo.EnsembleSeries(series_list=ts_spread_list)

Now let's do a quick visualization of the data with two available plotting methods:
1. `.plot_traces()`: display several example members
2. `.plot_envelope()`: display all members as an envelope plot

In [None]:
fig, ax = ms_mean['Ocn_136'].plot_traces() # display several example members
fig, ax = ms_mean['Ocn_136'].plot_envelope() # display all members as an envelope plot

fig, ax = ms_spread['Ocn_136'].plot_traces() # display several example members
fig, ax = ms_spread['Ocn_136'].plot_envelope() # display all members as an envelope plot
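
To make the idea of an envelope concrete, the sketch below computes quantile bands across members at each time step with NumPy. It assumes the envelope is a set of quantiles around the ensemble median; the quantile levels and the synthetic ensemble are illustrative choices and may not match what `.plot_envelope()` draws by default.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ensemble (illustrative values only): 200 time steps, 20 members
n_time, n_members = 200, 20
members = rng.normal(loc=0.0, scale=0.3, size=(n_time, n_members))

# Quantiles across the member axis give the envelope bounds at each time step
q_levels = [0.025, 0.25, 0.5, 0.75, 0.975]
lower95, lower50, median, upper50, upper95 = np.quantile(members, q_levels, axis=1)
print(median.shape)
```

The outer pair of curves bounds the central 95% of members and the inner pair the central 50%, so the shaded bands widen wherever the members disagree.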

Note, however, that this ensemble of Monte Carlo means differs from the full ensemble of reconstructed temperature series, which also includes the members within each Monte Carlo run.
To get a flavor of the full ensemble, we plot the ensemble of global mean surface temperature (GMST) below.

In [None]:
# download LMR GMST ensembles
!wget https://atmos.washington.edu/%7Ehakim/lmr/LMRv2/gmt_MCruns_ensemble_full_LMRv2.1.nc

In [None]:
with xr.open_dataset('gmt_MCruns_ensemble_full_LMRv2.1.nc') as ds:
    print(ds)
    gmt = ds['gmt'].values
    gmt_time = np.arange(2001)

In [None]:
# extract data and define EnsembleSeries object
ts_gmt_list = []
nt, nMC, nM = np.shape(gmt)
for i in range(nMC):
    for j in range(nM):
        ts_gmt_tmp = pyleo.Series(
                time=gmt_time,
                value=gmt[:,i,j],
                time_name='Time',
                value_name='LMR-GMST',
                time_unit='AD',
                value_unit='K',
            )
        ts_gmt_list.append(ts_gmt_tmp)  # append inside the inner loop so every member is kept

ms_gmt = pyleo.EnsembleSeries(ts_gmt_list)

In [None]:
# visualization
fig, ax = ms_gmt.plot_traces()
fig, ax = ms_gmt.plot_envelope()
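
The difference between an ensemble of means and a full ensemble can be illustrated with synthetic numbers. The sketch below builds an array with the same `(time, MC run, member)` layout as the GMST array, then compares the spread across all members with the spread across per-run means. All values are invented for illustration and say nothing about the actual LMR spread.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in with the same layout as the GMST array: (time, MC run, member)
nt, nMC, nM = 50, 20, 100
run_offset = rng.normal(0.0, 0.1, size=(nt, nMC, 1))     # differences between MC runs
member_noise = rng.normal(0.0, 0.5, size=(nt, nMC, nM))  # scatter within each run
full = run_offset + member_noise

# Full ensemble: every member of every run, flattened to (time, nMC * nM)
full_flat = full.reshape(nt, nMC * nM)
# Ensemble of means: one series per MC run, averaged over its members
run_means = full.mean(axis=2)

spread_full = full_flat.std(axis=1).mean()
spread_means = run_means.std(axis=1).mean()
print(f"full ensemble spread: {spread_full:.2f}, spread of run means: {spread_means:.2f}")
```

Averaging within each run removes most of the member-to-member scatter, so an envelope drawn from run means is narrower than one drawn from the full ensemble and should not be read as the total reconstruction uncertainty.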

## Comparing the two reconstructions

Now, back to the ensemble means and spreads, we are ready to perform model-data comparison.
Since the LMR reconstruction is expressed as anomalies, we need to first calculate the anomaly series from the proxy record before the comparison. To do so, we simply call the `pyleoclim.Series.anomaly()` method:

In [None]:
fig, ax = ms_mean['Ocn_136'].plot_envelope(mute=True,curve_lw=0.5,curve_clr='black',shade_clr='gray')
Ocn_137.anomaly().plot(ax=ax, zorder=100)  # adjust zorder to reveal the curve
Ocn_136.anomaly().plot(ax=ax, zorder=100)
pyleo.showfig(fig)
pyleo.closefig(fig)
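
For readers unfamiliar with anomalies, the operation amounts to subtracting a reference mean from the series. The sketch below is a NumPy illustration with made-up values; the `anomaly` function and its `ref_period` argument are hypothetical, and Pyleoclim's own method may use different defaults.

```python
import numpy as np


def anomaly(values, time=None, ref_period=None):
    """Subtract the mean of `values`, optionally computed over a reference period."""
    values = np.asarray(values, dtype=float)
    if ref_period is None:
        return values - np.nanmean(values)
    time = np.asarray(time, dtype=float)
    in_ref = (time >= ref_period[0]) & (time <= ref_period[1])
    return values - np.nanmean(values[in_ref])


# Illustrative values only, not the core data
time = np.array([1850.0, 1900.0, 1950.0, 2000.0])
sst = np.array([26.0, 26.2, 26.4, 27.0])
print(anomaly(sst))
print(anomaly(sst, time=time, ref_period=(1850, 1950)))
```

The choice of reference period shifts the whole curve up or down without changing its shape, so two anomaly series are only directly comparable in level when they share a reference period.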

We can see that the timing of industrial warming is consistent between the two cores and LMR, though pre-industrial variability is severely damped in LMR, particularly in the first millennium. This is because LMR has few nearby annually resolved proxy records to draw on in that part of the world (most likely coral records from the Indian Ocean), and even those drop out further back in time.

Now we calculate the correlation between each member of the LMR ensemble and the proxy record, after which we visualize the result.

In [None]:
corr_ens = ms_mean['Ocn_136'].correlation(Ocn_136)
print(corr_ens)

fig, ax = corr_ens.plot()
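
Correlating an annual ensemble with an irregularly sampled proxy requires putting both on a common time axis first. The sketch below shows one simple way to do that with linear interpolation and a plain Pearson correlation, on synthetic data. It is not Pyleoclim's implementation: `corr_on_common_axis` is a hypothetical helper, and no significance testing is attempted.

```python
import numpy as np


def corr_on_common_axis(t_ens, ens, t_proxy, proxy):
    """Pearson r between each ensemble member and a proxy interpolated to t_ens."""
    # Keep only the ensemble years covered by the proxy to avoid extrapolation
    mask = (t_ens >= t_proxy.min()) & (t_ens <= t_proxy.max())
    proxy_interp = np.interp(t_ens[mask], t_proxy, proxy)
    return np.array([np.corrcoef(member[mask], proxy_interp)[0, 1] for member in ens.T])


rng = np.random.default_rng(1)
t_ens = np.arange(1000.0, 2001.0)   # annual axis
trend = 0.001 * (t_ens - 1000.0)    # shared synthetic warming trend
ens = trend[:, None] + rng.normal(0.0, 0.05, size=(t_ens.size, 10))

t_proxy = np.sort(rng.choice(t_ens, size=60, replace=False))  # irregular sampling
proxy = 0.001 * (t_proxy - 1000.0) + rng.normal(0.0, 0.05, size=t_proxy.size)

r = corr_on_common_axis(t_ens, ens, t_proxy, proxy)
print(r.round(2))
```

Because both synthetic series share a trend, every member correlates strongly with the proxy. Interpolating a coarse record onto an annual axis also creates serially dependent points, so judging the significance of such correlations needs a test that accounts for autocorrelation.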

Not surprisingly, one finds a positive correlation, consistent among ensemble members, likely driven by the anthropogenic warming trend. More instructive would be to look at the correlation over the Common Era as a whole.

**Exercise 8.1** 
How does this picture change when using the longer core (Ocn_137)?

**Exercise 8.2**
How does this picture change when using either core and the global mean surface temperature series?

In [None]:
## Your code here