<img src='https://github.com/LinkedEarth/Logos/raw/master/PYLEOCLIM_logo_HORZ-01.png' width="800">

# 2. Using Pyleoclim with LiPD files

## Preamble

Although using LiPD files is not a requirement, Pyleoclim can handle them directly. For instance, Pyleoclim has a `Lipd` object that stores all the content of a LiPD file in memory (i.e., root metadata, location, paleodata information, chrondata information...). Some methods are specific to this object (e.g., mapping). However, the majority of the available analyses operate on a time series, `LipdSeries`, which builds on the `Series` class described in Notebook 1. The `LipdSeries` object carries additional metadata, which allows for more in-depth functionalities.

This notebook makes use of several previously published records. Please cite these studies if used in a presentation/publication.

- MD98-2170 record:  Stott, L., Cannariato, K., Thunell, R. et al. Decline of surface temperature and salinity in the western tropical Pacific Ocean in the Holocene epoch. Nature 431, 56–59 (2004). https://doi.org/10.1038/nature02903.

- Euro2k database: PAGES2k Consortium., Emile-Geay, J., McKay, N. et al. A global multiproxy database for temperature reconstructions of the Common Era. Sci Data 4, 170088 (2017). https://doi.org/10.1038/sdata.2017.88

- Crystal cave record: McCabe-Glynn, S., Johnson, K., Strong, C. et al. Variable North Pacific influence on drought in southwestern North America since AD 854. Nature Geosci 6, 617–621 (2013). https://doi.org/10.1038/ngeo1862

## Working with LiPD objects

The Linked Paleo Data format ([LiPD](http://www.clim-past-discuss.net/11/4309/2015/cpd-11-4309-2015-discussion.html)) was designed to simplify the sharing, reuse, and analysis of paleoclimate data by combining a flexible, hierarchical data container with linked data concepts. Data stored in the `.lpd` format can be directly loaded into Pyleoclim as a [Lipd object](https://pyleoclim-util.readthedocs.io/en/master/core/ui.html#lipd-pyleoclim-lipd). 

Let's load a single LiPD file by initializing an object. You can open a file/folder (local or URL) by specifying `usr_path` and/or pass an existing dictionary of loaded files using `lipd_dict`. Note that `lipd_dict` refers to a dictionary obtained through the LiPD utilities and not to another `Lipd` object.

In [None]:
import pyleoclim as pyleo
d = pyleo.Lipd('../data/MD98-2170.Stott.2004.lpd')

It is also possible to import multiple LiPD files from the same folder using this method. The `validate` parameter lets you check the files against an online LiPD validator to make sure that they contain the minimum amount of metadata. This process can take a long time for large folders. Setting `remove` to `True` ignores files judged invalid. Note that most functions *should* work even with invalid files, though not optimally.

In [None]:
d_euro = pyleo.Lipd('../data/Euro2k')

Some functions are meant to directly manipulate LiPD libraries. An example is [mapAllArchive](https://pyleoclim-util.readthedocs.io/en/master/core/ui.html#pyleoclim.core.ui.Lipd.mapAllArchive) which will create a map of all the dataset locations, arranged by the type of archive.

In [None]:
d_euro.mapAllArchive()

To change the projection to center around Europe and place the legend on the right side, one can write the following:

In [None]:
d_euro.mapAllArchive(projection='Orthographic', proj_default={'central_longitude':10, 'central_latitude':30},lgd_kwargs={'loc':'lower right'})

Pretty, eh? Well, that legend is a bit obtrusive.


**Exercise 2.1**

Place the legend outside the plot. (Hint: look up `bbox_to_anchor`  in [this Matplotlib guide](https://matplotlib.org/stable/tutorials/intermediate/legend_guide.html))

In [None]:
## your code here##

To save the figure:

In [None]:
d_euro.mapAllArchive(projection='Orthographic', proj_default={'central_longitude':10, 'central_latitude':30},savefig_settings={'path':'map.png','format':'png'})

Although working with LiPD objects can be useful for mapping, most of the granularity in routine paleoceanographic studies happens at the individual timeseries level. Next, we will discuss how to obtain a LipdSeries from a Lipd object in Pyleoclim. 

## Working with LipdSeries

The [`LipdSeries object`](https://pyleoclim-util.readthedocs.io/en/master/core/ui.html#lipdseries-pyleoclim-lipdseries) is a child of the `Series` object, therefore all the methods discussed for `Series` in Notebook 1 will apply to these series. In addition to these functions, a few are specific to `LipdSeries`.

### Creating a `LipdSeries` from a `Lipd` object
 
There are several ways to obtain a `LipdSeries` from a `Lipd` object. Each method has its advantages and disadvantages. Let's have a look at all of them. 

#### Using `Lipd.to_tso`

If nothing is known about the content of the file, it may be useful to use the `Lipd.to_tso` method to obtain a list of dictionaries that can be iterated over. Dictionaries are native to Python and can be easily explored as shown below:

In [None]:
ts_list = d.to_tso()
for idx, item in enumerate(ts_list):
    print(str(idx)+': '+item['dataSetName']+': '+item['paleoData_variableName'])

In [None]:
ts_list[5].keys()

Remember that Python indexing starts at 0, so the first timeseries in the list (here, depth) has index 0.

Let's create a `LipdSeries` object of the sea surface temperature (sst) variable:

In [None]:
ts_sst=pyleo.LipdSeries(ts_list[5])

**Exercise 2.2**

Now that the object has been created, use the `plot()` function to display the timeseries.

In [None]:
##your code here##

Alternatively, Pyleoclim also supports passing the entire list of dictionaries (`ts_list`). In this case, you will be prompted to choose a `LipdSeries` based on the dataset name and variable name.

**Exercise 2.3**

Run the cell below and select mg/ca:


In [None]:
ts_mgca = pyleo.LipdSeries(ts_list)

Let's check that we have the right series:

In [None]:
ts_mgca.plot()

#### Using `Lipd.to_LipdSeries`

Another option to create a `LipdSeries` object from a `Lipd` object is to use the [`Lipd.to_LipdSeries`](https://pyleoclim-util.readthedocs.io/en/master/core/ui.html#pyleoclim.core.ui.Lipd.to_LipdSeries) method. This function can take an optional argument (the index of the series of interest) if it is known. Otherwise, the behavior is equivalent to using a lipd timeseries list.

* ***Option 1***: Not passing the number of the timeseries. 

**Exercise 2.4**

Run the cell below and choose d18O.

In [None]:
ts_d18O = d.to_LipdSeries()

* ***Option 2***: Use the `number` parameter to directly select a variable

**Exercise 2.5** 

Use the `number` parameter to store the series with information about d18Ow into a new `LipdSeries` object called `ts_d18Ow` and plot the series.

In [None]:
##your code here##

**Warning**: By construction, a `LipdSeries` requires floats, or entries that can be [coerced](https://python-reference.readthedocs.io/en/latest/docs/functions/coerce.html) to a `float` type, since most of the functionalities require floats to work correctly. If a column contains strings (for instance, to record the name of a core), the `LipdSeries` object won't be created.
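
To make the coercion rule concrete, here is a minimal sketch in plain Python. It is not Pyleoclim's own check, which may differ in its details; the helper name `can_coerce_to_float` and the sample values are hypothetical and only illustrate which kinds of entries convert cleanly.

```python
def can_coerce_to_float(values):
    """Return True if every entry in `values` can be converted to float."""
    try:
        [float(v) for v in values]
    except (TypeError, ValueError):
        return False
    return True

# Hypothetical columns, for illustration only
numeric_column = [28.1, "27.9", 29]   # numbers and numeric strings convert
label_column = ["core-A", "core-B"]   # free-text labels do not

print(can_coerce_to_float(numeric_column))  # True
print(can_coerce_to_float(label_column))    # False
```

Numeric strings such as `"27.9"` pass this check, whereas labels such as a core name do not, which is the situation described in the warning above.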


Let's demonstrate with the Euro2k database:

In [None]:
ts_list_euro = d_euro.to_tso()
for idx, item in enumerate(ts_list_euro):
    if 'archiveType' in item.keys():
        at = item['archiveType']
    else:
        at ='other'
    print(str(idx)+': '+item['dataSetName']+': '+at + ': ' +item['paleoData_variableName'])

One of the timeseries refers to SampleID, which is unlikely to be coerced into a float. Choose the corresponding number and enter it in the cell below:

In [None]:
ts_sampleID = d_euro.to_LipdSeries(number=__)

Pyleoclim reports that the values could not be converted to float and raises an error.

#### Using `Lipd.to_LipdSeriesList` to create `MultipleSeries` object

This method is intended to create a list of potential `LipdSeries` for use with `MultipleSeries`. Remember that a `MultipleSeries` object can be created using a list of `Series`. Since `LipdSeries` is a child of `Series`, a `MultipleSeries` object can also be created from a list of `LipdSeries`. 

In many ways, it is intended to function like the `Lipd.to_tso` method. However, the list contains `LipdSeries` objects that can be used directly.

Let's look at an example:


In [None]:
ts_SeriesList = d.to_LipdSeriesList()

Since ts_SeriesList is a list, it can be sliced for variables of interest. Here, let's use only the sst and d18Ow variables to create a `MultipleSeries` object.

In [None]:
ms_md70 = pyleo.MultipleSeries(ts_SeriesList[4:])

And let's plot!

In [None]:
ms_md70.stackplot()

This method is fast for a limited number of timeseries. To create a `MultipleSeries` object from a larger database, use the following recipe.

First, enumerate the available timeseries. 

In [None]:
ts_list_euro = d_euro.to_tso()
for idx, item in enumerate(ts_list_euro):
    if 'archiveType' in item.keys():
        at = item['archiveType']
    else:
        at ='other'
    print(str(idx)+': '+item['dataSetName']+': '+at + ': ' +item['paleoData_variableName'])

Let's collect all the indices for coral d18O records and put them in a list: 

In [None]:
idx = [_____]
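
Rather than reading the indices off the printed listing by eye, they could also be collected programmatically. The sketch below uses a small made-up list standing in for `ts_list_euro`; only the keys `archiveType` and `paleoData_variableName` come from the listing above, while the dataset names and the helper `find_indices` are hypothetical. The exact strings used in the real files (e.g. `'coral'`, `'d18O'`) should be checked against the printed output before relying on this.

```python
def find_indices(ts_list, archive_type, variable_name):
    """Return indices of entries matching an archive type and variable name."""
    return [
        i for i, item in enumerate(ts_list)
        # .get() supplies a default when the key is missing,
        # mirroring the if/else used in the listing above
        if item.get('archiveType', 'other') == archive_type
        and item.get('paleoData_variableName') == variable_name
    ]

# Made-up stand-in for the list returned by to_tso()
mock_ts_list = [
    {'dataSetName': 'SiteA', 'archiveType': 'tree', 'paleoData_variableName': 'temperature'},
    {'dataSetName': 'SiteB', 'archiveType': 'coral', 'paleoData_variableName': 'd18O'},
    {'dataSetName': 'SiteC', 'paleoData_variableName': 'd18O'},  # no archiveType
    {'dataSetName': 'SiteD', 'archiveType': 'coral', 'paleoData_variableName': 'd18O'},
]

idx_coral = find_indices(mock_ts_list, 'coral', 'd18O')
print(idx_coral)  # [1, 3]
```

A list built this way plays the same role as the hand-written `idx` above.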

From this list, we can create a `MultipleSeries` object:

In [None]:
ts_list_euro_coral =[]

for i in idx:
    ts_list_euro_coral.append(pyleo.LipdSeries(ts_list_euro[i]))

ms_euro_coral = pyleo.MultipleSeries(ts_list_euro_coral)
ms_euro_coral.stackplot()

An alternative option is to extract all `LipdSeries` using `Lipd.to_LipdSeriesList` and slice the resulting list. Note that this option could be slow for a large collection of series.

**Important Note**

Remember that the Euro2k database contains timeseries that cannot be coerced to float. In order to avoid returning an error, Pyleoclim is instructed to just pass over these series and warns the user. Consequently, the number of timeseries returned by `Lipd.to_LipdSeriesList` could be less than the number returned by `Lipd.to_tso`. The indexing is also affected. 

One option to get the slicing indices is shown below:

In [None]:
import numpy as np
idx_pass = 0
for idx, item in enumerate(ts_list_euro):
    if 'archiveType' in item.keys():
        at = item['archiveType']
    else:
        at ='other'
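    try:
        # only series whose values convert to float get an index
        np.array(item['paleoData_values'],dtype=float) 
        print(str(idx-idx_pass)+': '+item['dataSetName']+': '+at + ': ' +item['paleoData_variableName'])
    except (ValueError, TypeError):
        # non-numeric series: shift all later indices by one
        idx_pass+=1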

**Exercise 2.6**

Use the alternative option to recreate the `MultipleSeries` object containing all coral d18O series and plot them using the `stackplot` method.

In [None]:
##your code here##

### Loading a LipdSeries with an age ensemble

Pyleoclim makes use of age ensembles for uncertainty quantification. Although the package doesn't contain age modeling software, it is capable of leveraging the output of such software.

*Note*: Since most age-modeling software has an R interface, age modeling for LiPD datasets is handled through the [`GeoChronR`](https://nickmckay.github.io/GeoChronR/) package. Jupyter can support R code through [Magics](https://ipython.readthedocs.io/en/stable/interactive/magics.html), so it is possible to use both packages within the Jupyter environment.

Let's load a file with such an age model.

In [None]:
d_cave = pyleo.Lipd('../data/Crystal.McCabe-Glynn.2013.lpd')

Let's load the d18O record.

In [None]:
ts = d_cave.to_LipdSeries()

To attach the age model ensemble, you can use the function [`chronEnsembleToPaleo`](https://pyleoclim-util.readthedocs.io/en/master/core/ui.html#pyleoclim.core.ui.LipdSeries.chronEnsembleToPaleo).

Note that this function needs to reference the original `Lipd` object (d_cave, in this case).

In [None]:
ens_cave = ts.chronEnsembleToPaleo(d_cave)

We can now plot the record on this ensemble of ages. Prior to doing this, we need to align these time axes, which we do via `common_time()`:

In [None]:
fig,ax=ens_cave.common_time(method='interp').plot_envelope()
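
To illustrate what aligning time axes by interpolation means, here is a simplified sketch in plain Python. It is not Pyleoclim's implementation of `common_time()`: the helper `interp_linear`, the two made-up ensemble members and the hand-picked common axis are all assumptions for illustration, and the way Pyleoclim chooses its common grid is not reproduced here.

```python
def interp_linear(t_new, t, y):
    """Linearly interpolate (t, y) at the points t_new; t must be increasing."""
    out = []
    for tn in t_new:
        # find the interval [t[k], t[k+1]] that brackets tn
        for k in range(len(t) - 1):
            if t[k] <= tn <= t[k + 1]:
                w = (tn - t[k]) / (t[k + 1] - t[k])
                out.append(y[k] + w * (y[k + 1] - y[k]))
                break
        else:
            raise ValueError(f"{tn} is outside the range of t")
    return out

# Two made-up ensemble members: same values, slightly different age axes
values = [1.0, 3.0, 2.0]
ages_a = [0.0, 10.0, 20.0]
ages_b = [2.0, 12.0, 18.0]

# Common axis restricted to the interval covered by both members
common = [2.0, 10.0, 18.0]

member_a = interp_linear(common, ages_a, values)
member_b = interp_linear(common, ages_b, values)
print(member_a)
print(member_b)
```

Once every member is expressed on the same axis, quantities such as quantiles across members can be computed at each time step, which is what an envelope plot requires.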

### Functions specific to LipdSeries objects

#### Mapping

Because a `LipdSeries` object contains richer metadata than its `Series` counterpart, a few more functionalities are available. One such functionality is to `map` the location of the record.

Let's [`map`](https://pyleoclim-util.readthedocs.io/en/master/core/ui.html#pyleoclim.core.ui.LipdSeries.map) the record from Stott et al. (2004) that we loaded originally. Remember that we extracted several `LipdSeries` from this record. Any of them will work for mapping purposes.

In [None]:
ts_sst.map(lgd_kwargs={'bbox_to_anchor':(1.5, 1)})

By default, Pyleoclim uses a color palette for each archive. You can modify that behavior by passing a different color to the function.

In [None]:
ts_sst.map(color='k', lgd_kwargs={'bbox_to_anchor':(1.5, 1)})

If you want to get fancier, you need to deal with the mapping package [cartopy](https://scitools.org.uk/cartopy/docs/latest/). God help you.

In [None]:
import cartopy.crs as ccrs

label = 'MD98-2170'
fig,ax=ts_sst.map(markersize = 100, mute=True, lgd_kwargs={'bbox_to_anchor':(1.5, 1)}) # important as to not return the figure before adding the label
ax.text(130,-2,label,transform=ccrs.PlateCarree()) #need to use the transform option for use with cartopy to set the projection for the data, in this case the label. 
pyleo.showfig(fig)

**Exercise 2.7**

Map a record from the Euro2k database. 

In [None]:
## your code here ##

#### Map records close to the one of interest. 

The [`LipdSeries.mapNearRecord`](https://pyleoclim-util.readthedocs.io/en/latest/core/ui.html#pyleoclim.core.ui.LipdSeries.mapNearRecord) method plots the records in a LiPD database that are nearest to the one of interest.

Let's have a look at a record from the Euro2k database again:

In [None]:
for idx, item in enumerate(ts_list_euro):
    if 'archiveType' in item.keys():
        at = item['archiveType']
    else:
        at ='other'
    print(str(idx)+': '+item['dataSetName']+': '+at + ': ' +item['paleoData_variableName'])

I am really interested in the 'Eur-FinnishLakelands.Helama.2014: tree: temperature' record and would like to know what's nearby for comparison. First, let's load that record into a `LipdSeries` object called `ts_finnish`.

In [None]:
##your code here##

To use the mapping function, I need to give a reference `Lipd` object as well:

In [None]:
ts_finnish.mapNearRecord(D=d_euro)

By default, Pyleoclim will select the 5 nearest records to the site and plot them with a white border. Let's increase to 8 records:

In [None]:
ts_finnish.mapNearRecord(D=d_euro, n=8)
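
The idea of selecting the `n` nearest records can be sketched in a few lines of plain Python. How Pyleoclim computes distances internally is not shown in this notebook, so the haversine (great-circle) formula used here is an assumption; the helper names and the coordinates are made up for illustration.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def nearest_sites(target, sites, n=5):
    """Return the names of the n sites closest to target, nearest first."""
    ranked = sorted(sites, key=lambda s: haversine_km(target[0], target[1], s[1], s[2]))
    return [name for name, _, _ in ranked[:n]]

# Made-up coordinates (name, lat, lon), for illustration only
target = (62.0, 28.0)
sites = [
    ('far', 40.0, -5.0),
    ('near', 63.0, 27.0),
    ('mid', 55.0, 20.0),
]
print(nearest_sites(target, sites, n=2))  # ['near', 'mid']
```

Ranking by distance and keeping the first `n` entries is the same kind of selection that the `n` argument controls in the cell above.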

**Exercise 2.8** 

Using the `LipdSeries.mapNearRecord` available parameters, plot the nearest records that are of the same archiveType as the original record.

In [None]:
##your code here##

**Exercise 2.9**

Plot all records that are within 500 km of the original timeseries.

In [None]:
##your code here##

#### Dashboards

[Dashboards](https://pyleoclim-util.readthedocs.io/en/master/core/ui.html#pyleoclim.core.ui.LipdSeries.dashboard) plot essential information about a `LipdSeries`, making use of various functions applicable to `Series` and `LipdSeries`. Everything is customizable by passing the appropriate arguments to each of the functionalities.

In [None]:
ts_sst.dashboard(spectralsignif_kwargs={'number':1000})

Notice the four elements: 
1. the timeseries itself
1. its distribution
1. a map of the site
1. a spectral estimate, complete with estimate of significance

This is about what you need to get a synopsis of a dataset. Note that spectral analysis can be slow with long timeseries (see below).

**Exercise 2.10**

Create a dashboard for the d18O record from Crystal cave, removing the metadata on the right to give more space to the plots themselves. 

In [None]:
## your code here ##

**Exercise 2.11**

When an ensemble is present, the dashboard can be modified to take it into account. Try the function with the ensemble option enabled.

**Hint**: Read the arguments carefully: when `ensemble` is set to `True`, a `Lipd` object needs to be passed.

In [None]:
##your code here ##