# Working with ensembles in `pylipd`

## Authors

Deborah Khider, Varun Ratnakar

Information Sciences Institute, University of Southern California

Author1 = {"name": "Deborah Khider", "affiliation": "Information Sciences Institute, University of Southern California", "email": "khider@usc.edu", "orcid": "0000-0001-7501-8430"}

Author2 = {"name": "Varun Ratnakar", "affiliation": "Information Sciences Institute, University of Southern California", "email": "varunr@isi.edu"}

## Preamble

`pylipd` is a Python package that allows you to read, manipulate, and write [LiPD](https://cp.copernicus.org/articles/12/1093/2016/cp-12-1093-2016-discussion.html#discussion) formatted datasets. One advantage of the LiPD format is that it can store tables of uncertainty ensembles (in particular, age ensembles). This notebook describes how `pylipd` handles ensembles.

### Goals

* Reading an ensemble from a LiPD object

Reading Time: 5 minutes

### Keywords

LiPD

### Pre-requisites

None beyond basic knowledge of Python and pandas. If you are not familiar with the language or the library, check out this tutorial: http://linked.earth/ec_workshops_py/.

### Relevant Packages

pylipd

## Data Description

This notebook uses the following datasets, in LiPD format:

- McCabe-Glynn, S., Johnson, K., Strong, C. et al. Variable North Pacific influence on drought in southwestern North America since AD 854. Nature Geosci 6, 617–621 (2013). https://doi.org/10.1038/ngeo1862

- Mix, A. C., J. Le, and N. J. Shackleton (1995), Benthic foraminiferal stable isotope stratigraphy from Site 846: 0–1.8 Ma, Proc. Ocean Drill. Program Sci. Results, 138, 839–847.

- Shackleton, N. J., Hall, M. A., & Pate, D. (1995). Pliocene stable isotope stratigraphy of ODP Site 846. Proc. Ocean Drill. Program Sci. Results, 138, 337-356.

- Lawrence, K. T., Liu, Z. H., & Herbert, T. D. (2006). Evolution of the eastern tropical Pacific through Plio-Pleistocene glaciation. Science, 312(5770), 79-83.

## Demonstration

In [1]:
from pylipd.lipd import LiPD

In [2]:
D = LiPD()
data_path = ['../data/Crystal.McCabe-Glynn.2013.lpd', '../data/ODP846.Lawrence.2006.lpd']
D.load(data_path)

Loading 2 LiPD files
Conversion to RDF done..
Loading RDF into graph
Loaded..


In [3]:
names = D.get_all_dataset_names()
print(names)

['Crystal.McCabe-Glynn.2013', 'ODP846.Lawrence.2006']


To load the ensemble tables for all the files:

In [4]:
df = D.get_ensemble_tables()

df

| | datasetName | ensembleTable | ensembleVariableName | ensembleVariableValues | ensembleVariableUnits | ensembleDepthName | ensembleDepthValues | ensembleDepthUnits | notes |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Crystal.McCabe-Glynn.2013 | http://linked.earth/lipd#Crystal.McCabe-Glynn.... | Year | [[2007.0, 2007.0, 2008.0, 2007.0, 2007.0, 2007... | AD | depth | [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0... | mm | |
| 1 | ODP846.Lawrence.2006 | http://linked.earth/lipd#chron0model0ensemble0 | age | [[4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0,... | kyr BP | depth | [0.12, 0.23, 0.33, 0.43, 0.53, 0.63, 0.73, 0.8... | m | |


The returned DataFrame contains the following columns:

* `datasetName`: The name of the dataset
* `ensembleTable`: The ensemble table associated with the dataset. If a record has more than one ensemble table, each table appears on its own row
* `ensembleVariableName`: The name of the ensemble variable, most likely a variant of 'age' or 'year'
* `ensembleVariableValues`: The values of the ensemble variable
* `ensembleVariableUnits`: The units of the ensemble variable
* `ensembleDepthName`: The name of the depth variable
* `ensembleDepthValues`: The values of the depth axis. These are particularly useful when matching an ensemble table to a particular variable
* `ensembleDepthUnits`: The units of the depth axis
* `notes`: Notes on how the model was obtained
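
The ensemble values are stored as nested lists inside a single DataFrame cell, so it usually helps to turn them into a NumPy array before doing any statistics. The sketch below uses a small invented table in place of the output of `D.get_ensemble_tables()`; only the column names come from `pylipd`. It also assumes that each inner list corresponds to one depth and holds one value per ensemble member, which is how the output above appears to be laid out but is worth checking on your own data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the DataFrame returned by D.get_ensemble_tables().
# All numbers are invented; only the column names mirror the real table.
toy = pd.DataFrame([
    {
        "datasetName": "toy.dataset",
        "ensembleVariableName": "age",
        "ensembleVariableValues": [[4.0, 5.0, 6.0],
                                   [9.0, 10.0, 14.0],
                                   [15.0, 18.0, 21.0]],
        "ensembleDepthValues": [0.1, 0.2, 0.3],
    }
])

row = toy.iloc[0]
ens = np.asarray(row["ensembleVariableValues"], dtype=float)
depth = np.asarray(row["ensembleDepthValues"], dtype=float)

# Assumed layout: one row per depth, one column per ensemble member.
if ens.shape[0] != depth.size:
    raise ValueError("ensemble rows do not line up with the depth axis")

# Summarize the ensemble at each depth: median and a 95% interval.
median = np.median(ens, axis=1)
lower, upper = np.quantile(ens, [0.025, 0.975], axis=1)

summary = pd.DataFrame(
    {"depth": depth, "median": median, "lower": lower, "upper": upper}
)
print(summary)
```

The shape check is there because a transposed table (members as rows) would otherwise produce a summary silently computed along the wrong axis.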

If you are interested in only one dataset, pass its name through `dsname`:

In [5]:
df = D.get_ensemble_tables(dsname=names[0])

df

| | datasetName | ensembleTable | ensembleVariableName | ensembleVariableValues | ensembleVariableUnits | ensembleDepthName | ensembleDepthValues | ensembleDepthUnits | notes |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Crystal.McCabe-Glynn.2013 | http://linked.earth/lipd#Crystal.McCabe-Glynn.... | Year | [[2007.0, 2007.0, 2008.0, 2007.0, 2007.0, 2007... | AD | depth | [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0... | mm | |
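
With a single ensemble table in hand, its depth axis can be used to place the ensemble on the depths at which another variable was measured. The sketch below does this with plain linear interpolation via `np.interp`; this is an illustrative choice, not a method provided by `pylipd`. The function name `map_ensemble_to_depth` and all numbers are hypothetical, and the code assumes that the ensemble depths are increasing and that the ensemble array has one row per depth and one column per member.

```python
import numpy as np

# Invented values for illustration. In practice these would come from
# row["ensembleDepthValues"], row["ensembleVariableValues"], and the depth
# column that accompanies the measured variable.
ens_depth = np.array([0.1, 0.2, 0.3, 0.4])
ens_age = np.array([[4.0, 5.0],
                    [8.0, 9.0],
                    [12.0, 15.0],
                    [16.0, 21.0]])
var_depth = np.array([0.15, 0.3, 0.35])


def map_ensemble_to_depth(ens_depth, ens_values, target_depth):
    """Linearly interpolate each ensemble member onto target_depth.

    Hypothetical helper: assumes ens_depth is increasing and ens_values has
    shape (n_depths, n_members). Targets outside the ensemble depth range
    are left as NaN rather than extrapolated.
    """
    ens_depth = np.asarray(ens_depth, dtype=float)
    ens_values = np.asarray(ens_values, dtype=float)
    target_depth = np.asarray(target_depth, dtype=float)

    out = np.full((target_depth.size, ens_values.shape[1]), np.nan)
    inside = (target_depth >= ens_depth.min()) & (target_depth <= ens_depth.max())
    for j in range(ens_values.shape[1]):
        out[inside, j] = np.interp(target_depth[inside], ens_depth, ens_values[:, j])
    return out


mapped = map_ensemble_to_depth(ens_depth, ens_age, var_depth)
print(mapped.shape)  # one row per target depth, one column per member
```

Leaving out-of-range depths as NaN is a deliberate choice here: `np.interp` would otherwise repeat the edge values, which can hide the fact that the age model does not cover those samples.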


If you know the name of the ensemble variable, filter on it with `ensembleVarName`:

In [7]:
df = D.get_ensemble_tables(ensembleVarName='age')

df

| | datasetName | ensembleTable | ensembleVariableName | ensembleVariableValues | ensembleVariableUnits | ensembleDepthName | ensembleDepthValues | ensembleDepthUnits | notes |
|---|---|---|---|---|---|---|---|---|---|
| 0 | ODP846.Lawrence.2006 | http://linked.earth/lipd#chron0model0ensemble0 | age | [[4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0,... | kyr BP | depth | [0.12, 0.23, 0.33, 0.43, 0.53, 0.63, 0.73, 0.8... | m | |
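
In the output above, asking for `'age'` returns only the ODP846 record; the Crystal Cave ensemble, whose variable is named `Year`, is not included. When datasets use different names for their time axis, one option is to retrieve all ensemble tables and filter them with pandas instead. The sketch below uses a toy table with the two dataset and variable names shown earlier; the set `wanted` is a hypothetical list of accepted spellings.

```python
import pandas as pd

# Toy stand-in for D.get_ensemble_tables(), reduced to the two columns
# needed for filtering.
tables = pd.DataFrame([
    {"datasetName": "Crystal.McCabe-Glynn.2013", "ensembleVariableName": "Year"},
    {"datasetName": "ODP846.Lawrence.2006", "ensembleVariableName": "age"},
])

# Hypothetical set of accepted names, compared case-insensitively.
wanted = {"age", "year"}
mask = tables["ensembleVariableName"].str.lower().isin(wanted)
time_tables = tables[mask]

print(time_tables["datasetName"].tolist())
```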
