<img src='https://github.com/LinkedEarth/Logos/raw/master/PYLEOCLIM_logo_HORZ-01.png' width="800">

# 8. Model-Data Confrontation in the time domain

In this notebook, we demonstrate how to use `Pyleoclim` to load LiPD files and compare proxy records with the [Last Millennium Reanalysis (LMR)](https://cpo.noaa.gov/News/News-Article/ArtMID/6226/ArticleID/1807/Last-Millennium-Reanalysis-now-at-NOAAs-National-Centers-for-Environmental-Information-marking-major-milestone) at proxy locales.

In [None]:
!pip install demjson --upgrade  # addresses this setuptools/demjson incompatibility: https://github.com/dmeranda/demjson/issues/40
# load essential packages    
import os
import pickle
import numpy as np
import pandas as pd
from tqdm import tqdm
import xarray as xr
import pyleoclim as pyleo  # make an alias name for "pyleoclim"

## Load proxy data

The proxy record we'd like to load is [this one](http://wiki.linked.earth/LPD81e53153.temperature), attached to [Tierney et al. (2015)](http://dx.doi.org/10.1126/sciadv.1500682). It is a sea-surface temperature (SST) reconstruction based on the TEX86 proxy, from two cores from the Horn of Africa.

In [None]:
d = pyleo.Lipd(usr_path='../data/Afr-P178-15P.Tierney.2015.lpd')
Ocn_136 = d.to_LipdSeries(0) 
Ocn_137 = d.to_LipdSeries(2)   
Ocn_136.label = 'Ocn_136'
Ocn_137.label = 'Ocn_137'

Let's plot the two cores on the same graph:

In [None]:
fig, ax = Ocn_137.plot(mute=True)
Ocn_136.plot(ax=ax)
pyleo.showfig(fig)

We'd like to see how this compares to the [last millennium reanalysis](https://cpo.noaa.gov/News/News-Article/ArtMID/6226/ArticleID/1807/Last-Millennium-Reanalysis-now-at-NOAAs-National-Centers-for-Environmental-Information-marking-major-milestone) (LMR, [Hakim et al. 2016](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/2016JD024751), [Tardif et al. 2019](https://cp.copernicus.org/articles/15/1251/2019/)) at the same location. Note that LMR knows nothing of this dataset, as it (currently) only uses annually-resolved records. Thus, this exercise can serve as independent validation of LMR.  Let us first extract the geographical coordinates of the core:

In [None]:
tslist = d.to_tso()
plat = tslist[0]['geo_meanLat']
plon = tslist[0]['geo_meanLon']
pid = 'Ocn_136'

Now, let's move on to extract the LMR-reconstructed temperature series.

## Extract LMR-reconstructed temperature series

We will use the sea-surface temperature full grid ensemble [mean](https://atmos.washington.edu/%7Ehakim/lmr/LMRv2/sst_MCruns_ensemble_mean_LMRv2.1.nc) and [spread](https://atmos.washington.edu/%7Ehakim/lmr/LMRv2/sst_MCruns_ensemble_spread_LMRv2.1.nc).

In [None]:
mean_url = 'https://atmos.washington.edu/%7Ehakim/lmr/LMRv2/sst_MCruns_ensemble_mean_LMRv2.1.nc'
spread_url = 'https://atmos.washington.edu/%7Ehakim/lmr/LMRv2/sst_MCruns_ensemble_spread_LMRv2.1.nc'

In [None]:
# download the files
! wget $mean_url
! wget $spread_url
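
If `wget` is not available on your system, the same download can be sketched with the Python standard library. The helper names `local_name` and `download_if_missing` below are our own (they are not part of `Pyleoclim` or `xarray`), and the sketch assumes the server allows a plain HTTP(S) download. It skips the transfer when a file of the same name is already on disk.

```python
import os
import urllib.request
from urllib.parse import urlparse, unquote


def local_name(url):
    """Return the file name at the end of a URL path (hypothetical helper)."""
    return os.path.basename(unquote(urlparse(url).path))


def download_if_missing(url, dest_dir="."):
    """Download url into dest_dir unless a file of that name already exists."""
    dest = os.path.join(dest_dir, local_name(url))
    if not os.path.exists(dest):
        urllib.request.urlretrieve(url, dest)
    return dest


# Usage (requires network access), with the URLs defined in the cell above:
# mean_path = download_if_missing(mean_url)
# spread_path = download_if_missing(spread_url)
```

Checking for an existing file first avoids downloading these fairly large files again each time the notebook is rerun.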

To manipulate netCDF files, we will be using a package called [xarray](http://xarray.pydata.org/en/stable/#). 

In [None]:
ds_mean = xr.open_dataset('sst_MCruns_ensemble_mean_LMRv2.1.nc')
ds_mean

The file contains sea surface temperature information with dimensions: time, Monte-Carlo run, latitude, and longitude. 

In [None]:
ds_spread = xr.open_dataset('sst_MCruns_ensemble_spread_LMRv2.1.nc')
ds_spread

Let's select the nearest gridpoint in the LMR dataset to our proxy record. `xarray` has a built-in function to do so:

In [None]:
sst_mean = ds_mean['sst'].sel(lat=plat,lon=plon,method='nearest')
sst_mean
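
To make the nearest-gridpoint selection less of a black box, here is a minimal pure-Python sketch of the idea. The 2-degree grid and the site coordinates are made up for illustration (they are not read from the LMR files or the LiPD file), and `nearest_index` and `to_0_360` are hypothetical helpers. One thing to keep in mind is that a nearest-value lookup does not know that longitudes wrap around, so a site longitude given in the -180 to 180 convention should be converted first if the grid uses 0 to 360.

```python
def nearest_index(coords, target):
    """Index of the coordinate closest to target (ties go to the first match)."""
    return min(range(len(coords)), key=lambda i: abs(coords[i] - target))


def to_0_360(lon):
    """Map a longitude in degrees east onto the 0-360 convention."""
    return lon % 360


# Hypothetical 2-degree grid, for illustration only
lats = [-90 + 2 * i for i in range(91)]
lons = [2 * j for j in range(180)]

# Hypothetical site coordinates (assumed values, not taken from the record)
site_lat, site_lon = 11.96, 44.3

i = nearest_index(lats, site_lat)
j = nearest_index(lons, to_0_360(site_lon))
print(lats[i], lons[j])
```

It is worth comparing the `lat` and `lon` of the selected `xarray` gridpoint with `plat` and `plon` to confirm that the match is as close as expected.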

Note that the array is now in dimensions of time and Monte Carlo runs.

In [None]:
sst_spread = ds_spread['sst'].sel(lat=plat,lon=plon,method='nearest')
sst_spread

Now that the grid point is located, we can define a `pyleoclim.EnsembleSeries` for the LMR data.
Note that a `pyleoclim.EnsembleSeries` is simply a list of `pyleoclim.Series`.

In [None]:
# get the dimension sizes
nt, nEns = np.shape(sst_mean)

# dictionaries to store the pyleoclim.EnsembleSeries objects, keyed by proxy ID
ms_mean = {}
ms_spread = {}

ts_mean_list = []
ts_spread_list = []
for i in range(nEns):
    ts_mean_tmp = pyleo.Series(
            time=np.arange(0, 2001),
            value=sst_mean[:, i],
            time_name='Time',
            value_name='LMR-temp.',
            time_unit='year CE',
            value_unit='K',
        )
    ts_spread_tmp = pyleo.Series(
            time=np.arange(0, 2001),
            value=sst_spread[:, i],
            time_name='Time',
            value_name='LMR-temp.',
            time_unit='year CE',
            value_unit='K',
        )
    ts_mean_list.append(ts_mean_tmp)
    ts_spread_list.append(ts_spread_tmp)
    
# define pyleoclim.EnsembleSeries
ms_mean[pid] = pyleo.EnsembleSeries(series_list=ts_mean_list)
ms_spread[pid] = pyleo.EnsembleSeries(series_list=ts_spread_list)

Now let's do a quick visualization of the data with two available plotting methods:
1. `.plot_traces()`: display several example members
2. `.plot_envelope()`: display all members as an envelope plot

In [None]:
fig, ax = ms_mean['Ocn_136'].plot_traces() # display several example members
fig, ax = ms_mean['Ocn_136'].plot_envelope() # display all members as an envelope plot

fig, ax = ms_spread['Ocn_136'].plot_traces() # display several example members
fig, ax = ms_spread['Ocn_136'].plot_envelope() # display all members as an envelope plot
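
To give a sense of what an envelope plot summarizes, the sketch below computes a lower quantile, the median, and an upper quantile across ensemble members at each time step. This is our own simplified illustration with toy numbers. The quantile levels and the linear interpolation rule are assumptions and may differ from what `plot_envelope()` uses internally.

```python
def quantile(sorted_vals, q):
    """Quantile q (between 0 and 1) of an ascending list, with linear interpolation."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def envelope(members, lower=0.025, upper=0.975):
    """For each time step, return (lower quantile, median, upper quantile) across members."""
    out = []
    for step in zip(*members):  # one tuple of member values per time step
        vals = sorted(step)
        out.append((quantile(vals, lower), quantile(vals, 0.5), quantile(vals, upper)))
    return out


# Toy ensemble: 3 members, 3 time steps (invented values)
members = [[0.0, 1.0, 2.0], [1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
for lo, med, hi in envelope(members, lower=0.25, upper=0.75):
    print(lo, med, hi)
```

A narrow envelope means the members agree closely at that time step, which is why the envelope of Monte Carlo means looks much tighter than an envelope built from individual ensemble members.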

Note, however, that the ensemble of Monte Carlo means differs from the full ensemble of reconstructed temperature series.
To get a flavor of the full ensemble, we plot the ensemble of global mean surface temperature (GMST) below.

In [None]:
# download LMR GMST ensembles
!wget https://atmos.washington.edu/%7Ehakim/lmr/LMRv2/gmt_MCruns_ensemble_full_LMRv2.1.nc
!mv gmt_MCruns_ensemble_full_LMRv2.1.nc ../data

In [None]:
ds_gmst = xr.open_dataset('../data/gmt_MCruns_ensemble_full_LMRv2.1.nc')
ds_gmst

In [None]:
# extract data and define EnsembleSeries object
ts_gmt_list = []
nt, nMC, nM = np.shape(ds_gmst['gmt'])
for i in range(nMC):
    for j in range(nM):
        ts_gmt_tmp = pyleo.Series(
                time=np.arange(2001),
                value=ds_gmst['gmt'][:,i,j],
                time_name='Time',
                value_name='LMR-GMST',
                time_unit='AD',
                value_unit='K',
            )
        ts_gmt_list.append(ts_gmt_tmp)  # append inside the inner loop so that every member is kept

ms_gmt = pyleo.EnsembleSeries(ts_gmt_list)

In [None]:
# visualization
fig, ax = ms_gmt.plot_traces()
fig, ax = ms_gmt.plot_envelope()

## Comparing the two reconstructions

Now, back to the ensemble means and spreads, we are ready to perform model-data comparison.
Since the LMR reconstruction is expressed as anomalies, we need to first calculate the anomaly series from the proxy record before the comparison. To do so, we simply call the `pyleoclim.Series.anomaly()` method:

In [None]:
fig, ax = ms_mean['Ocn_136'].plot_envelope(mute=True,curve_lw=0.5,curve_clr='black',shade_clr='gray')
Ocn_137.anomaly().plot(ax=ax, zorder=100)  # adjust zorder to reveal the curve
Ocn_136.anomaly().plot(ax=ax, zorder=100)
pyleo.showfig(fig)
pyleo.closefig(fig)
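
Conceptually, taking an anomaly just means subtracting a reference mean. The sketch below is our own illustration and not the `Pyleoclim` implementation. The function and the `ref` argument are hypothetical, and the numbers are invented. It also shows why the choice of reference period matters: if the proxy anomaly and the LMR anomaly are taken relative to different periods, the curves can be offset from each other even when their variations agree.

```python
def anomaly(times, values, ref=None):
    """Subtract the mean over ref=(start, end), or over the whole record if ref is None."""
    if ref is None:
        base = values
    else:
        base = [v for t, v in zip(times, values) if ref[0] <= t <= ref[1]]
    mean = sum(base) / len(base)
    return [v - mean for v in values]


# Toy SST record in degrees C (invented values)
times = [1, 2, 3]
sst = [26.0, 27.0, 28.0]

print(anomaly(times, sst))              # relative to the whole record
print(anomaly(times, sst, ref=(2, 3)))  # relative to the last two samples only
```

The two calls give the same shape shifted by a constant, which is the kind of offset to watch for when overlaying the cores on the LMR envelope.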

We can see that the timing of industrial warming is consistent between the two cores and LMR, though pre-industrial variability is severely damped in LMR, particularly in the first millennium. This is because of the lack of nearby, annually resolved proxy records: the few such proxies in that part of the world, most likely coral records from the Indian Ocean, become scarcer further back in time.

Now we calculate the correlation between each member of the LMR ensemble and the proxy record, and then visualize the result.

In [None]:
corr_ens = ms_mean['Ocn_136'].correlation(Ocn_136)
print(corr_ens)

fig, ax = corr_ens.plot()
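
To illustrate the idea behind an ensemble correlation, the sketch below samples each (annual) ensemble member at the proxy's time points by linear interpolation, then computes a Pearson correlation per member. It is a simplified stand-in with toy data and hypothetical helper names. `Pyleoclim` additionally takes care of time alignment and significance testing, neither of which is attempted here.

```python
import bisect
import math


def interp(times, values, t):
    """Linear interpolation at t; assumes times is sorted and times[0] <= t <= times[-1]."""
    k = bisect.bisect_left(times, t)
    if times[k] == t:
        return values[k]
    w = (t - times[k - 1]) / (times[k] - times[k - 1])
    return values[k - 1] * (1 - w) + values[k] * w


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ensemble_correlation(years, members, proxy_times, proxy_values):
    """Correlate each ensemble member, sampled at the proxy times, with the proxy."""
    return [
        pearson([interp(years, m, t) for t in proxy_times], proxy_values)
        for m in members
    ]


# Toy data, invented for illustration
years = list(range(0, 11))
members = [[0.1 * y for y in years], [-0.1 * y for y in years]]
proxy_times = [0.5, 3.0, 7.5, 10.0]
proxy_values = [2.0 * t + 1.0 for t in proxy_times]

r = ensemble_correlation(years, members, proxy_times, proxy_values)
print([round(v, 3) for v in r])
```

Because both series here are pure trends, the correlations are at the extremes. This is a reminder that a shared trend alone, such as the anthropogenic warming, can produce a high correlation.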

Not surprisingly, one finds a positive correlation, consistent among ensemble members, likely driven by the anthropogenic warming trend. More instructive would be to look at the correlation over the Common Era as a whole.

**Exercise 8.1** 
How does this picture change when using the longer core (Ocn_137)?

**Exercise 8.2**
How does this picture change when using either core and the global mean surface temperature series?

In [None]:
## Your code here