<img src='https://github.com/LinkedEarth/Logos/raw/master/PYLEOCLIM_logo_HORZ-01.png' width="800">

# 7. Model-Data Confrontation #1

In this tutorial, we demonstrate how to use `Pyleoclim` for spectral analysis of proxy records and model simulations, along with their visualization.

In the first part, to introduce the spectral analysis functionalities, we reproduce the results of [Zhu et al. (2019)](https://www.pnas.org/content/early/2019/04/09/1809959116).

The second part, a practicum, encourages you to experiment with these functionalities, with the help of the [documentation](https://pyleoclim-util.readthedocs.io/en/latest/core/ui.html#), and to compare the results from different spectral analysis methods.

In [None]:
# load essential packages
%load_ext autoreload
%autoreload 2
    
import os
import pickle

import numpy as np
import pandas as pd
from tqdm import tqdm

import pyleoclim as pyleo  # make an alias name for "pyleoclim"

## Part I: a presentation of the `Pyleoclim` spectral analysis functionalities

### Load GMST data

#### Load PMIP3 simulations

The PMIP3 ([Braconnot et al. 2012](https://www.nature.com/articles/nclimate1456)) past-millennium ([past1000](https://wiki.lsce.ipsl.fr/pmip3/doku.php/pmip3:design:lm:final)) simulations of global mean surface temperature (GMST) are stored in a text file and can be conveniently imported with `pandas`.
The file includes several ensemble members for the CESM and GISS simulations, which we replace with their ensemble mean series.

In [None]:
# load the raw data
df = pd.read_table('../data/PMIP3_GMST.txt')

# display the raw data
df

In [None]:
# create a new pandas.DataFrame to store the processed data
df_new = df.copy()

# remove the data columns for CESM and GISS ensemble members
for i in range(10):
    df_new = df_new.drop([f'CESM_member_{i+1}'], axis=1)
    
df_new = df_new.drop(['GISS-E2-R_r1i1p127.1'], axis=1)
df_new = df_new.drop(['GISS-E2-R_r1i1p127'], axis=1)
df_new = df_new.drop(['GISS-E2-R_r1i1p121'], axis=1)

# calculate the ensemble mean for CESM and GISS, and add the results into the table
df_new['CESM'] = df[[
    'CESM_member_1',
    'CESM_member_2',
    'CESM_member_3',
    'CESM_member_4',
    'CESM_member_5',
    'CESM_member_6',
    'CESM_member_7',
    'CESM_member_8',
    'CESM_member_9',
    'CESM_member_10',
]].mean(axis=1)

df_new['GISS'] = df[[
    'GISS-E2-R_r1i1p127.1',   
    'GISS-E2-R_r1i1p127',
    'GISS-E2-R_r1i1p121',
]].mean(axis=1)

# display the processed data
df_new
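The drop-and-average pattern above can also be written by selecting the member columns through their shared prefix. The sketch below uses a made-up toy table (`df_toy` and its values are purely illustrative, not taken from the PMIP3 file) to show the idea.

```python
import pandas as pd

# Toy table standing in for the PMIP3 file (values are made up for illustration)
df_toy = pd.DataFrame({
    'Year': [850, 851, 852],
    'CCSM4': [0.1, 0.2, 0.3],
    'CESM_member_1': [0.0, 0.2, 0.4],
    'CESM_member_2': [0.2, 0.4, 0.6],
})

# Select every ensemble member column by its shared prefix
member_cols = [c for c in df_toy.columns if c.startswith('CESM_member_')]

# Drop the members, then append their row-wise mean as a single column
df_toy_new = df_toy.drop(columns=member_cols)
df_toy_new['CESM'] = df_toy[member_cols].mean(axis=1)
```

Selecting by prefix avoids listing each member by hand, which helps when the number of members changes.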

Now we define a `pyleoclim.Series` object for each simulated GMST time series.
A `pyleoclim.Series` represents a time series object that comes with a collection of methods, such as performing spectral analysis, wavelet analysis, interpolation, plotting, and so on.
For details, see [the documentation](https://pyleoclim-util.readthedocs.io/en/stable/core/ui.html#series-pyleoclim-series).

In [None]:
# store each pyleoclim.Series() object into a dictionary
ts_dict = {}
for name in df_new.columns[1:]:
    ts_dict[name] = pyleo.Series(
        time=df_new['Year'].values,  # the time axis
        value=df_new[name].values,   # the value axis
        label=name,                  # optional metadata: the nickname of the series
        time_name='Time',            # optional metadata: the name of the time axis
        time_unit='yrs',             # optional metadata: the unit of the time axis
        value_name='GMST anom.',     # optional metadata: the name of the value axis
        value_unit='K',              # optional metadata: the unit of the value axis
    )

Once a `pyleoclim.Series` is defined, we can easily visualize it by calling the `pyleoclim.Series.plot()` method.
For instance, we plot the CCSM4 GMST below:

In [None]:
fig, ax = ts_dict['CCSM4'].plot()

Note that the `plot()` method returns a figure and axes pair (a `matplotlib.figure.Figure` and a `matplotlib.axes.Axes`).
That means all the usual `matplotlib` manipulations can follow.
For instance, let's change the limits of the y-axis and the label below.

In [None]:
fig, ax = ts_dict['CCSM4'].plot(mute=True, label='CCSM4 series')  # the argument "mute=True" means to hold the display
ax.set_ylim([-4, 2])
pyleo.showfig(fig)  # display the final presentation of the figure

Note that when we want to modify the original `fig` and `ax` returned from `pyleoclim.Series.plot()`, we need to use the `mute=True` argument to first hold the display of the figure, and then use `pyleoclim.showfig(fig)` to display the final presentation of the figure.

With the same mechanism, we may plot two time series in the same figure as follows, using the argument `ax=ax` to specify that we'd like to plot the GISS series into the same `matplotlib.axes.Axes`.

In [None]:
fig, ax = ts_dict['CCSM4'].plot(mute=True)
ts_dict['GISS'].plot(ax=ax)  # the argument "ax=ax" indicates we'd like to plot into the "ax" we got from the previous line of code 
ax.set_ylim([-4, 2])
pyleo.showfig(fig)

Is there a way to plot a collection of time series at once? Absolutely.
We can define a `pyleoclim.MultipleSeries` object, which takes a list of `pyleoclim.Series` objects as input.

Since we have defined a dictionary holding a collection of `pyleoclim.Series` objects, we may first convert this dictionary into a list, and then use that list to define a `pyleoclim.MultipleSeries` object.

In [None]:
ts_list = [v for k, v in ts_dict.items()]  # a pythonic way to convert the pyleo.Series items in the dictionary to a list
ms_pmip = pyleo.MultipleSeries(ts_list)

Now that the `pyleoclim.MultipleSeries` called "ms_pmip" is defined, we can visualize all the time series at once by calling the `pyleoclim.MultipleSeries.plot()` method.

In [None]:
fig, ax = ms_pmip.plot()

You may notice that the legend is not in its best place, and we may want to move it to the right side.
We can achieve that by passing a dictionary of arguments for `matplotlib.axes.Axes.legend()` (see the [matplotlib documentation](https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.legend.html) for details) as below:

In [None]:
fig, ax = ms_pmip.plot(
    lgd_kwargs={
        'bbox_to_anchor': (1.25, 1),  # move the legend to the right side
    }
)

Now that the loading of PMIP3 simulations is complete, let's move on to proxies, the [last millennium reanalysis](https://cpo.noaa.gov/News/News-Article/ArtMID/6226/ArticleID/1807/Last-Millennium-Reanalysis-now-at-NOAAs-National-Centers-for-Environmental-Information-marking-major-milestone) (LMR, [Hakim et al. 2016](https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/2016JD024751), [Tardif et al. 2019](https://cp.copernicus.org/articles/15/1251/2019/)), and deglacial simulations.

#### Load proxies, LMR, and deglacial simulations

We've preprocessed the data for proxies, LMR, and deglacial simulations, and stored them in a [Python pickle](https://docs.python.org/3/library/pickle.html) file, which includes two dictionaries called `ts` and `vs`.
`ts` includes the time axis for each dataset,
and `vs` includes the GMST for each dataset.

We load the pickle file below and print out the dictionary keys to see how many datasets we have.

In [None]:
with open('../data/PNAS19_data.pkl', 'rb') as f:
    ts, vs = pickle.load(f)
    
print(vs.keys())
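For reference, a file with this layout could be written and read back as sketched below. This assumes the file stores a single 2-tuple of dictionaries, as the unpacking above suggests; `ts_toy` and `vs_toy` are hypothetical miniature stand-ins, and an in-memory buffer replaces the file on disk.

```python
import io
import pickle

# Hypothetical miniature versions of the two dictionaries
ts_toy = {'A': [0, 1, 2], 'B': [0, 10]}          # time axis of each dataset
vs_toy = {'A': [0.1, 0.2, 0.3], 'B': [1.0, 2.0]}  # values of each dataset

buffer = io.BytesIO()                    # stands in for a file opened with 'wb'
pickle.dump((ts_toy, vs_toy), buffer)    # a single tuple holds both dictionaries
buffer.seek(0)

ts_back, vs_back = pickle.load(buffer)   # unpack in the same order as written

# Sanity check: every dataset has matching time and value lengths
lengths_ok = all(len(ts_back[k]) == len(vs_back[k]) for k in vs_back)
```

A length check like `lengths_ok` is a cheap way to catch mismatched axes before building `Series` objects.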

Now we extract the data and organize each dataset into a `Series` object:

In [None]:
for name in vs.keys():
    # we may specify specific metadata for each dataset with the if-clauses
    if name == 'LMR':
        value_name = 'GMST anom.'
        value_unit = 'K'
    elif name in ['trace21ka_full', 'DGns', 'SIM2bl']:
        value_name = 'GMST'
        value_unit = 'K'
    else:
        value_name = 'Proxy Value'
        value_unit = None
        
    if name == 'trace21ka_full':
        label = 'TraCE-21ka'
    elif name in ['trace21ka_mwf', 'trace21ka_orb', 'trace21ka_ghg', 'trace21ka_ice']:
        continue
    else:
        label = name
        
    ts_dict[name] = pyleo.Series(
        time=ts[name],
        value=vs[name],
        label=label,
        time_name='Time',
        time_unit='yrs',
        value_name=value_name,
        value_unit=value_unit,
    )

Now we define a `MultipleSeries` object for each group of datasets:

In [None]:
ms_obs = pyleo.MultipleSeries(
    [ts_dict[name] for name in ['EDC', 'HadCRUT4', 'GAST', 'ProbStack']]
)
ms_deglacial = pyleo.MultipleSeries(
    [ts_dict[name] for name in ['trace21ka_full', 'DGns', 'SIM2bl']]
)

Now we visualize what we have.
LMR first.

Note that, for simplicity and calculation speed, we use the median of the LMR ensemble for our analysis, whereas the original paper analyzes all the ensemble members, so the estimated scaling slope that we show later differs slightly from that in the original paper.

In [None]:
fig, ax = ts_dict['LMR'].plot()

Then the proxies.

In [None]:
fig, ax = ms_obs.plot()

We notice that the time axis is in units of years by default, which is awkward for paleo-records that extend far before the Common Era.
We can easily convert the time unit to "myrs BP" by calling the `pyleoclim.MultipleSeries.convert_time_unit` method as below:

In [None]:
ms_obs = ms_obs.convert_time_unit('myrs BP')
fig, ax = ms_obs.plot()

Okay, now the time unit is "myrs BP", and the numerical time values are ascending.
What if we'd like to present the data so that the right-hand side represents more recent times?
Well, we can manipulate the returned `matplotlib.axes.Axes` object as mentioned earlier, or just use the `invert_xaxis=True` argument of the `pyleoclim.MultipleSeries.plot()` method as below:

In [None]:
fig, ax = ms_obs.plot(invert_xaxis=True)
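The arithmetic behind such a unit change is simple. The sketch below assumes the input is in years CE and that "BP" is referenced to 1950, the usual convention; `year_ce_to_bp` is a hypothetical helper written for illustration and is not part of `Pyleoclim`, whose internal handling may differ.

```python
import numpy as np

def year_ce_to_bp(year_ce, scale=1.0):
    """Convert years CE to years before 1950, divided by `scale`.

    Use scale=1e3 for kyrs BP and scale=1e6 for myrs BP.
    """
    return (1950.0 - np.asarray(year_ce, dtype=float)) / scale

years = np.array([1950, 950, -8050])          # years CE (negative = before CE)
kyrs_bp = year_ce_to_bp(years, scale=1e3)     # ages in thousands of years BP
myrs_bp = year_ce_to_bp(years, scale=1e6)     # ages in millions of years BP
```

Because age increases going back in time, an ascending age axis runs from recent to old, which is why inverting the x-axis puts the most recent values on the right.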

Similarly, we convert the time unit of the deglacial simulations to "kyrs BP" and then visualize them with the x-axis inverted and the legend moved to the right side.

In [None]:
ms_deglacial = ms_deglacial.convert_time_unit('kyrs BP')
fig, ax = ms_deglacial.plot(
    lgd_kwargs={
        'loc': 'upper right',         # put the legend anchor to the upper right corner
        'bbox_to_anchor': (1.25, 1),  # move the legend to the right side
    },
    invert_xaxis=True,
)

Now that all the data needed have been loaded, let's perform spectral analysis using the Weighted Wavelet Z-transform (WWZ) method ([Foster 1996](https://ui.adsabs.harvard.edu/abs/1996AJ....112.1709F), [Kirchner & Neal 2013](https://www.pnas.org/content/110/30/12213)), which can handle unevenly-spaced data without interpolation (see notebook 6 for more details).

### Spectral analysis using WWZ

We may perform spectral analysis on a time series by calling the `pyleoclim.Series.spectral()` method.
Its argument `method` specifies which method to use; it is set to `wwz` by default, which selects the WWZ method.
Its argument `freq_method` specifies how the frequency vector for the analysis is generated.
It is set to `log` by default, which generates the frequency vector in log space.
Here, we set it to `nfft` so that we can reproduce the result in the original paper [Zhu et al. (2019)](https://www.pnas.org/content/early/2019/04/09/1809959116).
Other arguments specific to each spectral analysis method can be passed in through the argument `settings`.
Since WWZ is originally a wavelet analysis method, we may use `tau` to specify the evenly-spaced time points (the temporal resolution) for the wavelet analysis.
However, since our purpose here is spectral analysis, the temporal resolution need not be high, and we may use a small number of `tau` points to accelerate the calculation.
Please see the documentation on [pyleoclim.Series.spectral](https://pyleoclim-util.readthedocs.io/en/stable/core/Series/spectral.html#pyleoclim.core.ui.Series.spectral) and on the [wwz_psd](https://pyleoclim-util.readthedocs.io/en/stable/utils/spectral/wwz_psd.html?highlight=wwz_psd) function that `pyleoclim.Series.spectral` calls for details.
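To make the difference between a linearly spaced and a log-spaced frequency vector concrete, here is a minimal `numpy` sketch. The helpers `linear_freqs` and `log_freqs` are hypothetical and only mimic the general idea; they are not `Pyleoclim`'s implementation of `nfft` or `log`.

```python
import numpy as np

def linear_freqs(t):
    """FFT-like grid: multiples of 1/span up to the mean Nyquist frequency."""
    t = np.asarray(t, dtype=float)
    span = t.max() - t.min()
    n = (t.size - 1) // 2            # number of frequencies up to Nyquist
    return np.arange(1, n + 1) / span

def log_freqs(t, nf=50):
    """Log-spaced grid between 1/span and the mean Nyquist frequency."""
    t = np.asarray(t, dtype=float)
    span = t.max() - t.min()
    f_nyq = (t.size - 1) / (2 * span)
    return np.logspace(np.log10(1 / span), np.log10(f_nyq), nf)

t = np.arange(0, 1001)               # 1001 annual time points
f_lin = linear_freqs(t)
f_log = log_freqs(t)

# Fraction of grid points at periods longer than 100 yrs (f < 0.01)
frac_lin = np.mean(f_lin < 0.01)
frac_log = np.mean(f_log < 0.01)
```

The linear grid places only a small fraction of its points at low frequencies, while the log grid spreads them evenly across decades, which matters when spectra are inspected on log-log axes.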

The method returns a [pyleoclim.PSD](https://pyleoclim-util.readthedocs.io/en/stable/core/ui.html#psd-pyleoclim-psd) object, which includes the estimated power spectral density (PSD) along with the frequency axis; the object itself is intended for later operations such as visualization, scaling slope estimation, and significance testing.

Note that to reproduce exactly the result in the paper, we need to use settings in the cell below (that's commented out), which could be slow (> 5 mins).
For the sake of time, we may load a file with pre-calculated results.

In [None]:
# %%time

# # we will store the result in a dictionary with the dataset names as keys
# psd_wwz = {}
# for name, series in ts_dict.items():  # "series" avoids overwriting the "ts" dictionary loaded earlier
#     print(f'Processing {name} ...')
#     print(f'Data length: {np.size(series.time)}')
#     if name in ['DGns', 'SIM2bl']:
#         ntau = 51  # to accelerate the calculation; the smaller, the faster
#     else:
#         ntau = 501
#     tau = np.linspace(np.min(series.time), np.max(series.time), ntau)
#     psd_wwz[name] = series.spectral(method='wwz', freq_method='nfft', settings={'tau': tau})

In [None]:
# quick loading of the pyleoclim.PSD objects
with open('../data/PNAS19_psd.pkl', 'rb') as f:
    psd_wwz = pickle.load(f)

We may, however, run the WWZ method with the default settings, which make the calculation faster, and compare the output with the pre-calculated results.

In [None]:
%%time
psd_wwz_new = {}
for name, series in ts_dict.items():  # "series" avoids overwriting the "ts" dictionary loaded earlier
    print(f'Processing {name} ...')
    print(f'Data length: {np.size(series.time)}')
    psd_wwz_new[name] = series.spectral(method='wwz')

Now we compare the results.

In [None]:
for k in psd_wwz_new.keys():
    fig, ax = psd_wwz_new[k].plot(figsize=[5, 2], mute=True, label='new')
    psd_wwz[k].plot(ax=ax, label='paper', color='red', alpha=1)
    ax.set_title(k)
    pyleo.showfig(fig, close=True)

We see that the difference is overall rather small.
Indeed, the difference is mainly caused by the frequency vector: the results of the paper used a linearly spaced frequency vector, while the current default settings use a log-spaced one.
For the following presentation, we will stick with the pre-calculated results of the paper.

### Visualizing the PSD objects returned from the spectral analysis

Now let's visualize the results.
In the cells below, we first define a colormap, then specify the color for each `pyleoclim.PSD` object.
Similar to `pyleoclim.MultipleSeries`, we may also define a `pyleoclim.MultiplePSD` object for a collection of `pyleoclim.PSD` objects, so that we can operate on them all at once.

In [None]:
# define the tableau20 colors
tableau20 = [(31, 119, 180), (174, 199, 232), (255, 127, 14), (255, 187, 120),    
             (44, 160, 44), (152, 223, 138), (214, 39, 40), (255, 152, 150),    
             (148, 103, 189), (197, 176, 213), (140, 86, 75), (196, 156, 148),    
             (227, 119, 194), (247, 182, 210), (127, 127, 127), (199, 199, 199),    
             (188, 189, 34), (219, 219, 141), (23, 190, 207), (158, 218, 229)]    
  
# scale the RGB values to the [0, 1] range, which is the format matplotlib accepts.    
for i in range(len(tableau20)):    
    r, g, b = tableau20[i]    
    tableau20[i] = (r / 255., g / 255., b / 255.) 

In [None]:
# define a dictionary for the colors
clr_dict = {
    'EDC': tableau20[0],
    'HadCRUT4': tableau20[3],
    'GAST': tableau20[4],
    'ProbStack': tableau20[5],
    'LMR': tableau20[6],
}

# specify color for each pyleoclim.PSD objects
for k, v in clr_dict.items():
    psd_wwz[k].plot_kwargs = {'color': v}
    
# for the period axis customization later
period_ticks = [0.5, 1, 2, 5, 10, 20, 100, 1e3, 1e4, 1e5, 1e6]
period_ticklabels = ['0.5', '1', '2', '5', '10', '20', '100', '1 k', '10 k', '100 k', '1 m']

# define the pyleoclim.MultiplePSD object and visualize the several pyleoclim.PSD objects at once
mpsd_obs = pyleo.MultiplePSD([psd_wwz[name] for name in ['EDC', 'HadCRUT4', 'GAST', 'ProbStack', 'LMR']])
fig, ax = mpsd_obs.plot(figsize=[8, 4], mute=True)
ax.set_xlim([1e7, 0.1])
ax.set_ylim([1e-4, 1e8])
ax.set_xticks(period_ticks)
ax.set_xticklabels(period_ticklabels)
ax.set_ylabel('Spectral Density')
pyleo.showfig(fig)

We have reproduced Fig. 1 of the original paper above.
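The period tick labels above were typed by hand. As a small convenience, they could also be generated from the tick values; `period_label` below is a hypothetical helper written for this illustration, using the same "k" and "m" suffixes as the labels above.

```python
def period_label(period):
    """Format a period in years with 'k' (thousand) and 'm' (million) suffixes."""
    if period >= 1e6:
        return f'{period / 1e6:g} m'
    if period >= 1e3:
        return f'{period / 1e3:g} k'
    return f'{period:g}'

tick_values = [0.5, 1, 2, 5, 10, 20, 100, 1e3, 1e4, 1e5, 1e6]
auto_labels = [period_label(p) for p in tick_values]
```

The resulting list can be passed to `ax.set_xticklabels()` in place of a hand-written one.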

To reproduce the upper panel of Fig. 2, we reset the colors for observations to be grey, and set the opacity via `alpha`, as well as the line width via `linewidth` below.
Note that the colors for the PMIP3 simulations follow the `tab10` colormap passed via the `cmap` argument.

In [None]:
clr_dict = {
    'EDC': 'grey',
    'HadCRUT4': 'grey',
    'GAST': 'grey',
    'ProbStack': 'grey',
    'LMR': 'grey'
}
for k, v in clr_dict.items():
    psd_wwz[k].plot_kwargs = {'color': v, 'alpha': 0.3, 'linewidth': 1.5}
    
mpsd_obs = pyleo.MultiplePSD([psd_wwz[name] for name in ['EDC', 'HadCRUT4', 'GAST', 'ProbStack', 'LMR']])

period_ticks = [0.5, 1, 2, 5, 10, 20, 100, 1000, 10000, 100000]
period_ticklabels = ['0.5', '1', '2', '5', '10', '20', '100', '1 k', '10 k', '100 k']

pmip_names = ['bcc_csm1_1', 'CCSM4', 'FGOALS_gl', 'FGOALS_s2', 'IPSL_CM5A_LR', 'MPI_ESM_P', 'CSIRO', 'HadCM3', 'CESM', 'GISS']
mpsd_pmip = pyleo.MultiplePSD([psd_wwz[name] for name in pmip_names])
fig, ax = mpsd_pmip.plot(figsize=[8, 4], mute=True, cmap='tab10')
mpsd_obs.plot(ax=ax, legend=False)
ax.set_xlim([1e7, 0.1])
ax.set_ylim([1e-4, 1e8])
ax.set_xticks(period_ticks)
ax.set_xticklabels(period_ticklabels)
ax.set_ylabel('Spectral Density')
pyleo.showfig(fig)

Similarly, we reproduce the lower panel of Fig. 2 of the original paper as below:

In [None]:
clr_deglacial_dict = {
    'trace21ka_full': tableau20[6],
    'DGns': tableau20[4],
    'SIM2bl': tableau20[0],
}
for k, v in clr_deglacial_dict.items():
    psd_wwz[k].plot_kwargs = {'color': v}


period_ticks = [0.5, 1, 2, 5, 10, 20, 100, 1000, 10000, 100000]
period_ticklabels = ['0.5', '1', '2', '5', '10', '20', '100', '1 k', '10 k', '100 k']

mpsd_deglacial = pyleo.MultiplePSD([psd_wwz[name] for name in ['trace21ka_full', 'DGns', 'SIM2bl']])
fig, ax = mpsd_deglacial.plot(figsize=[8, 4], mute=True)
mpsd_obs.plot(ax=ax, legend=False)
ax.set_xlim([1e7, 0.1])
ax.set_ylim([1e-4, 1e8])
ax.set_xticks(period_ticks)
ax.set_xticklabels(period_ticklabels)
ax.set_ylabel('Spectral Density')
pyleo.showfig(fig)

### Estimation of the scaling exponents

You may notice that something is missing when comparing our reproduced figures to those in the original paper: the scaling exponents.

Below, we use the `pyleoclim.PSD.beta_est()` method to estimate the scaling exponents for each dataset.
To do that, we need to specify the frequency range over which we estimate.
The estimation is achieved utilizing linear regression in the log-log space.
Because the `nfft` frequency vector is linearly spaced, its points are dense over the high-frequency band and sparse over the low-frequency band when viewed in log space. Binning is therefore needed prior to the linear regression, and it is controlled by the argument `logf_binning_step`.
While the default is `max`, which uses the largest spacing for binning, here we set it to `first` to use the first spacing of the frequency vector, as per the original paper.

Note that for the deglacial simulations we estimate exponents over two scaling regimes, with a break at 400 yrs.
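The principle of the estimation (binning in log-frequency, then a straight-line fit in log-log space) can be sketched with `numpy` alone. This is a simplified illustration assuming a spectrum of the form S(f) ∝ f^(-β); the helper `estimate_beta` and its fixed `nbins` parameter are hypothetical and do not reproduce the binning rule of `pyleoclim.PSD.beta_est()`.

```python
import numpy as np

def estimate_beta(freq, psd, fmin, fmax, nbins=20):
    """Estimate beta in S(f) ~ f**(-beta) over [fmin, fmax] via log-log regression."""
    freq = np.asarray(freq, dtype=float)
    psd = np.asarray(psd, dtype=float)
    mask = (freq >= fmin) & (freq <= fmax)
    logf, logp = np.log10(freq[mask]), np.log10(psd[mask])

    # Equal-width bins in log-frequency, so dense high-frequency points
    # do not dominate the fit
    edges = np.linspace(logf.min(), logf.max(), nbins + 1)
    idx = np.clip(np.digitize(logf, edges) - 1, 0, nbins - 1)
    filled = [i for i in range(nbins) if np.any(idx == i)]  # skip empty bins
    logf_b = np.array([logf[idx == i].mean() for i in filled])
    logp_b = np.array([logp[idx == i].mean() for i in filled])

    slope, _ = np.polyfit(logf_b, logp_b, 1)  # straight-line fit
    return -slope                             # beta is minus the log-log slope

# Synthetic test: an exact power law with beta = 1.5 on a linear frequency grid
freq = np.arange(1, 501) / 1000
psd = freq ** -1.5
beta = estimate_beta(freq, psd, fmin=1 / 500, fmax=1 / 2)
```

On this noise-free spectrum the fit recovers the prescribed exponent; with real spectra, the choice of frequency range and binning affects the estimate, which is why both are set explicitly in the cell that follows.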

In [None]:
# define frequency range for the exponent estimation
franges = {
    'EDC': [1/50000, 1/1500],
    'HadCRUT4': [1/50, 6],
    'GAST': [1/100000, 1/2000],
    'ProbStack': [1/100000, 1/10000],
    'LMR': [1/1000, 1/2],
}

# for PMIP simulations, we estimate the scaling slope over 2-500 yrs
for name in pmip_names:
    franges[name] = [1/500, 1/2]

beta_est_res = {}
for name, frange in franges.items():
    beta_est_res[name] = psd_wwz[name].beta_est(fmin=frange[0], fmax=frange[-1], logf_binning_step='first')
    
# for deglacial model simulations, we have two scaling regimes, one over 20-400 yrs, and another over 400-2000 yrs
s_break = 400
franges_s = {
    'trace21ka_full': [1/s_break, 1/21],  # note that for TraCE-21ka, the slope is estimated over 21-400 yrs due to its temporal resolution 
    'DGns': [1/s_break, 1/20],
    'SIM2bl': [1/s_break, 1/20],
}
franges_l = {
    'trace21ka_full': [1/2000, 1/s_break],
    'DGns': [1/2000, 1/s_break],
    'SIM2bl': [1/2000, 1/s_break],
}

beta_est_s_res = {}
for name, frange in franges_s.items():
    beta_est_s_res[name] = psd_wwz[name].beta_est(fmin=frange[0], fmax=frange[-1], logf_binning_step='first')
    
beta_est_l_res = {}
for name, frange in franges_l.items():
    beta_est_l_res[name] = psd_wwz[name].beta_est(fmin=frange[0], fmax=frange[-1], logf_binning_step='first')

Now we re-plot the figures with the estimated scaling exponents displayed in the legend and visualized via straight lines in the figure.
Below is for Fig. 1.

In [None]:
clr_dict = {
    'EDC': tableau20[0],
    'HadCRUT4': tableau20[3],
    'GAST': tableau20[4],
    'ProbStack': tableau20[5],
    'LMR': tableau20[6],
}

for k, v in clr_dict.items():
    psd_wwz[k].plot_kwargs = {'color': v}
    
period_ticks = [0.5, 1, 2, 5, 10, 20, 100, 1e3, 1e4, 1e5, 1e6]
period_ticklabels = ['0.5', '1', '2', '5', '10', '20', '100', '1 k', '10 k', '100 k', '1 m']

mpsd_obs = pyleo.MultiplePSD([psd_wwz[name] for name in ['EDC', 'HadCRUT4', 'GAST', 'ProbStack', 'LMR']])
fig, ax = mpsd_obs.plot(figsize=[8, 4], mute=True)
ax.set_xlim([1e7, 0.1])
ax.set_ylim([1e-4, 1e8])
ax.set_xticks(period_ticks)
ax.set_xticklabels(period_ticklabels)

labels = ax.get_legend_handles_labels()[-1]
new_labels = []
i = 0
for name in ['EDC', 'HadCRUT4', 'GAST', 'ProbStack', 'LMR']:
    res = beta_est_res[name]
    ax.plot(1/res['f_binned'], res['Y_reg'], linestyle='--', color='k', linewidth=1, zorder=99)
    new_labels.append(fr'{labels[i]} ($\beta=${res["beta"]:.2f}$\pm${res["std_err"]:.2f})')
    i += 1

ax.legend(labels=new_labels)
ax.set_ylabel('Spectral Density')
pyleo.showfig(fig)

Then the upper panel of Fig. 2.

In [None]:
clr_dict = {
    'EDC': 'grey',
    'HadCRUT4': 'grey',
    'GAST': 'grey',
    'ProbStack': 'grey',
    'LMR': 'grey'
}
for k, v in clr_dict.items():
    psd_wwz[k].plot_kwargs = {'color': v, 'alpha': 0.2, 'linewidth': 1.5}
    
mpsd_obs = pyleo.MultiplePSD([psd_wwz[name] for name in ['EDC', 'HadCRUT4', 'GAST', 'ProbStack', 'LMR']])


period_ticks = [0.5, 1, 2, 5, 10, 20, 100, 1e3, 1e4, 1e5]
period_ticklabels = ['0.5', '1', '2', '5', '10', '20', '100', '1 k', '10 k', '100 k']

fig, ax = mpsd_pmip.plot(figsize=[8, 4], mute=True, cmap='tab10')

mpsd_obs.plot(ax=ax, legend=False)
ax.set_xlim([1e6, 0.1])
ax.set_ylim([1e-4, 1e8])
ax.set_xticks(period_ticks)
ax.set_xticklabels(period_ticklabels)

labels = ax.get_legend_handles_labels()[-1]
new_labels = []
i = 0
for name in pmip_names:
    res = beta_est_res[name]
    ax.plot(1/res['f_binned'], res['Y_reg'], linestyle='--', color='k', linewidth=1, zorder=99)
    new_labels.append(fr'{labels[i]} ($\beta=${res["beta"]:.2f}$\pm${res["std_err"]:.2f})')
    i += 1

ax.legend(labels=new_labels, bbox_to_anchor=(1.5, 1))
ax.set_ylabel('Spectral Density')
pyleo.showfig(fig)

... and the lower panel of Fig. 2.

In [None]:
clr_dict = {
    'EDC': 'grey',
    'HadCRUT4': 'grey',
    'GAST': 'grey',
    'ProbStack': 'grey',
    'LMR': 'grey'
}
for k, v in clr_dict.items():
    psd_wwz[k].plot_kwargs = {'color': v, 'alpha': 0.2, 'linewidth': 1.5}
    
mpsd_obs = pyleo.MultiplePSD([psd_wwz[name] for name in ['EDC', 'HadCRUT4', 'GAST', 'ProbStack', 'LMR']])

clr_deglacial_dict = {
    'trace21ka_full': tableau20[6],
    'DGns': tableau20[4],
    'SIM2bl': tableau20[0],
}
for k, v in clr_deglacial_dict.items():
    psd_wwz[k].plot_kwargs = {'color': v}

period_ticks = [0.5, 1, 2, 5, 10, 20, 100, 1e3, 1e4, 1e5]
period_ticklabels = ['0.5', '1', '2', '5', '10', '20', '100', '1 k', '10 k', '100 k']

fig, ax = mpsd_deglacial.plot(figsize=[8, 4], mute=True)
mpsd_obs.plot(ax=ax, legend=False)
ax.set_xlim([1e6, 0.1])
ax.set_ylim([1e-4, 1e8])
ax.set_xticks(period_ticks)
ax.set_xticklabels(period_ticklabels)

labels = ax.get_legend_handles_labels()[-1]
new_labels = []
i = 0
for name in ['trace21ka_full', 'DGns', 'SIM2bl']:
    res_s = beta_est_s_res[name]
    res_l = beta_est_l_res[name]
    ax.plot(1/res_s['f_binned'], res_s['Y_reg'], linestyle='--', color='k', linewidth=1, zorder=99)
    ax.plot(1/res_l['f_binned'], res_l['Y_reg'], linestyle='--', color='k', linewidth=1, zorder=99)
    beta_s_str = r'$\beta_{DC}$'
    beta_s = res_s['beta']
    err_s = res_s['std_err']
    beta_l_str = r'$\beta_{CM}$'
    beta_l = res_l['beta']
    err_l = res_l['std_err']
    new_labels.append(fr'{labels[i]} ({beta_l_str}$=${beta_l:.2f}$\pm${err_l:.2f}; {beta_s_str}$=${beta_s:.2f}$\pm${err_s:.2f})')
    i += 1

ax.legend(labels=new_labels, loc='upper right', bbox_to_anchor=(1.1, 1))
ax.set_ylabel('Spectral Density')
pyleo.showfig(fig)

## Part II (A) spectral analysis of `EnsembleSeries` with the Lomb-Scargle method

In this exercise, we will load the LMR reconstructed GMST ensemble and perform spectral analysis on it with the Lomb-Scargle method.

In [None]:
# download the LMR GMST ensemble
!wget https://atmos.washington.edu/%7Ehakim/lmr/LMRv2/gmt_MCruns_ensemble_full_LMRv2.1.nc

In [None]:
import xarray as xr

with xr.open_dataset('gmt_MCruns_ensemble_full_LMRv2.1.nc') as ds:
    print(ds)
    lmr_gmt = ds['gmt'].values
    lmr_time = ds['time'].values

In [None]:
# replace the time axis with a numpy array of years
print(lmr_time)
lmr_time = np.arange(2001)

In [None]:
nt, nMC, nEns = np.shape(lmr_gmt)
ts_lmr_members = []
for i in range(nMC):
    for j in range(nEns):
        ts_lmr_members.append(pyleo.Series(time=lmr_time, value=lmr_gmt[:, i, j]))

Once all the series are put into an `EnsembleSeries` object, they can easily be manipulated. First, let's plot the time-evolving distribution of the ensemble:

In [None]:
ms_lmr = pyleo.EnsembleSeries(ts_lmr_members)        
fig, ax = ms_lmr.plot_envelope()
pyleo.closefig(fig)

As you can see, the variance of the median decreases back in time, but that is associated with a large increase in the uncertainties. This is simply due to attrition: the further back you go, the fewer annually-resolved proxies constrain the reconstruction, so it reverts to a flat-ish line, with broad uncertainties to tell you not to take this literally.
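An envelope plot of this kind summarizes, at each time step, the distribution across ensemble members. The sketch below computes such a summary with `numpy` on a synthetic `(time, member)` array; the quantile levels are illustrative choices and not necessarily those used by `plot_envelope()`.

```python
import numpy as np

rng = np.random.default_rng(0)
nt, n_members = 200, 50
ens = rng.standard_normal((nt, n_members))  # toy ensemble, shape (time, member)

# Quantiles across members (axis=1) give one value per time step
median = np.quantile(ens, 0.5, axis=1)
lower, upper = np.quantile(ens, [0.025, 0.975], axis=1)

# Width of the 95% band: a simple per-time-step measure of uncertainty
band_width = upper - lower
```

For an array shaped like `lmr_gmt`, the two ensemble dimensions would first be merged, for example with `lmr_gmt.reshape(nt, -1)`.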

But there's more! In one flourish of the wand, you can also apply a method (say, Lomb-Scargle spectral analysis) to all ensemble members at once. It does require a little patience to see this to completion, but that is the only way to truly quantify uncertainties in the spectrum of this index.

In [None]:
# your code here for spectral analysis

The output of this function is a `MultiplePSD` object, and it too knows how to plot distributions in one line of code:

In [None]:
# your code here to visualize the result using the MultiplePSD.plot_envelope() method

## Part II (B) an exercise comparing results from different spectral analysis methods

Now it's time for the reader to seize control of this tool and see what they might do with it. Some suggestions:

We have performed spectral analysis using the WWZ method above, yet `Pyleoclim` also provides the classic multi-taper method (MTM, [Thomson 1982](https://ieeexplore.ieee.org/abstract/document/1456701/)) and the Lomb-Scargle periodogram ([Lomb 1976](https://link.springer.com/article/10.1007%2FBF00648343), [Scargle 1982](https://link.springer.com/article/10.1007%2FBF00648343)).
The MTM method can handle only evenly-spaced data, so for paleo-records, we need to first perform an interpolation prior to the spectral analysis.
The Lomb-Scargle periodogram, on the other hand, can handle unevenly-spaced data without interpolation.

For our exercise, we'd like to perform spectral analysis on an unevenly-spaced time series using all three methods and compare the results:
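To illustrate what "interpolating onto an even grid" means, here is a minimal `numpy` sketch on a tiny synthetic series. Using the mean spacing as the new step is an assumption made for this example, and `np.interp` (linear interpolation) stands in for whatever scheme `pyleoclim.Series.interp()` applies.

```python
import numpy as np

# Unevenly-spaced toy series following a straight line v = 2 t + 1
t_uneven = np.array([0.0, 1.0, 2.5, 4.0, 7.0, 10.0])
v_uneven = 2.0 * t_uneven + 1.0

# Build an evenly-spaced time axis using the mean spacing of the original one
step = np.mean(np.diff(t_uneven))
t_even = np.arange(t_uneven[0], t_uneven[-1] + step / 2, step)

# Linearly interpolate the values onto the even axis
v_even = np.interp(t_even, t_uneven, v_uneven)
```

Keep in mind that interpolation acts as a smoother: it tends to damp high-frequency variability, which is one reason spectra obtained after interpolation can differ from those of methods that work on the raw, unevenly-spaced data.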
1. [WWZ](https://pyleoclim-util.readthedocs.io/en/stable/utils/spectral/wwz_psd.html?highlight=wwz_psd), or `pyleoclim.Series.spectral(method='wwz')` with appropriate arguments
2. [interpolation](https://pyleoclim-util.readthedocs.io/en/stable/core/Series/interp.html#pyleoclim.core.ui.Series.interp) + [MTM](https://pyleoclim-util.readthedocs.io/en/stable/utils/spectral/mtm.html?highlight=mtm), or `pyleoclim.Series.interp()` + `pyleoclim.Series.spectral(method='mtm')` with appropriate arguments
3. [Lomb-Scargle](https://pyleoclim-util.readthedocs.io/en/stable/utils/spectral/lombscargle.html?highlight=lomb%20scargle#pyleoclim.utils.spectral.lomb_scargle), or `pyleoclim.Series.spectral(method='lomb_scargle')` with appropriate arguments


### Details of the problem

Please complete the following steps:

1. Perform spectral analysis on the EDC dataset using WWZ, MTM, and Lomb-Scargle, and get three `pyleoclim.PSD` objects.
2. Estimate the scaling exponent over the frequency band [1/50000, 1/1500].
3. Visualize the three `pyleoclim.PSD` objects along with the information of the scaling exponents (with text in the legend and straight lines in the figure).
4. Compare the results from three methods, and discuss what you find.

In [None]:
# your code here
# hint: to perform interpolation, you may need to enable extrapolation following the doc:
# https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html
# or pyleoclim.Series.interp(..., fill_value='extrapolate')