## PyLiPD OOP Demo

### PyLiPD Ontologies

The PyLiPD OOP Classes are generated from the ontologies at: https://linked.earth/ontology/
* The [Linked Earth Core Ontology](https://linked.earth/ontology/core/2.0.0/index-en.html) provides the main concepts and relationships to describe a paleoclimate dataset and its values.
* The [Archive Type Ontology](https://linked.earth/ontology/archive/2.0.0/index-en.html) describes a taxonomy of the most common types of archives.
* The [Paleo Variables Ontology](https://linked.earth/ontology/paleo_variables/2.0.0/index-en.html) describes a taxonomy of the most common types of paleo variables.
* The [Paleo Proxy Ontology](https://linked.earth/ontology/paleo_proxy/2.0.0/index-en.html) describes a taxonomy of the most common types of paleo proxies.
* The [Paleo Units Ontology](https://linked.earth/ontology/paleo_units/2.0.0/index-en.html) describes a taxonomy of the most common types of paleo units.
* The [Interpretation Ontology](https://linked.earth/ontology/interpretation/2.0.0/index-en.html) describes a taxonomy of the most common interpretations.
* The [Instrument Ontology](https://linked.earth/ontology/instrument/2.0.0/index-en.html) describes a taxonomy of the most common instruments used to take measurements.
* The [Chron Variables Ontology](https://linked.earth/ontology/chron_variables/2.0.0/index-en.html) describes a taxonomy of the most common types of chron variables.
* The [Chron Proxy Ontology](https://linked.earth/ontology/chron_proxy/2.0.0/index-en.html) describes a taxonomy of the most common types of chron proxies.
* The [Chron Units Ontology](https://linked.earth/ontology/chron_units/2.0.0/index-en.html) describes a taxonomy of the most common types of chron units.



### PyLiPD OOP Classes UML Diagram
![PyLiPD OOP Classes UML Diagram](UMLDiagram.png "PyLiPD OOP Classes UML Diagram")

## Reading an existing LiPD file

In [1]:
# Dataset is the Main OOP Class
from pylipd.classes.dataset import Dataset

# LiPD is the LiPD parser/writer
from pylipd.lipd import LiPD

# Load LiPD files as usual.
# - This loads the LiPD data into the internal RDF graph
path = '../data/Pages2k'
D = LiPD()
D.load_from_dir(path)

# Convert the LiPD datasets to the PyLiPD OOP "Dataset" class.
# - This lets us modify the datasets in memory via OOP calls
# - To write the changes back to a LiPD file, the datasets have to be saved explicitly
datasets = D.get_datasets()

for ds in datasets:
    # Now we can call individual functions on the dataset to get its details
    print("\n")
    print(ds.getName())
    print("=========================")
    for funding in ds.getFundings():
        if funding.getGrants():
            print(f"Funding: {funding.getGrants()}")
    
    for pub in ds.getPublications():
        print(f"Publication: {pub.getTitle()} by {list(map(lambda x: x.getName(), pub.getAuthors()))}")

    # Named "pdata" rather than "pd" so the loop does not shadow the usual pandas alias
    for pdata in ds.getPaleoData():
        for table in pdata.getMeasurementTables():
            print(f"- Paleo Table: {table.getFileName()}")
            
            # Can get the dataframe for the whole table
            df = table.getDataFrame(use_standard_names=True)
            display(df.head())

            # The returned dataframe also contains the attributes for the variables
            for varname in df.attrs:
                vardata = df.attrs[varname]
                # print(varname, vardata)

            # Can also get the variables one by one and make calls to their class functions
            print(f"Missing Value: {table.getMissingValue()}")
            for var in table.getVariables():
                if var.getUnits():
                    print(f"- {var.getName()} [{var.getUnits().getLabel()}]")
                else:
                    print(f"- {var.getName()}")

Loading 16 LiPD files


100%|██████████| 16/16 [00:00<00:00, 67.02it/s]


Loaded..


Ocn-RedSea.Felis.2000
Publication: A coral oxygen isotope record from the northern Red Sea documenting NAO, ENSO, and North Pacific teleconnections on Middle East climate variability since the year 1750 by ['Thomas Felis', 'Jürgen Pätzold', 'Ahmed H. Nawar', 'Gerold Wefer', 'Yossi Loya', 'Maoz Fine']
Publication: World Data Center for Paleoclimatology by ['T. Felis']
Publication: Tropical sea surface temperatures for the past four centuries reconstructed from coral archives by ['Jessica E. Tierney', 'Kevin J. Anchukaitis', 'K. Halimeda Kilbourne', 'Michael N. Evans', 'Henry C. Wu', 'Cyril Giry', 'Nerilie J. Abram', 'Jens Zinke', 'Casey P. Saenger']
- Paleo Table: Ocn-RedSea.Felis.2000.paleo1measurement1.csv


Unnamed: 0,d18O,year
0,-4.12,1995.583
1,-3.82,1995.417
2,-3.05,1995.25
3,-3.02,1995.083
4,-3.62,1994.917


Missing Value: NaN
- d18O [permil]
- year [yr AD]


Ant-WAIS-Divide.Severinghaus.2012
Publication: Little Ice Age cold interval in West Antarctica: Evidence from borehole temperature at the West Antarctic Ice Sheet (WAIS) Divide by ['Jeffrey P. Severinghaus', 'Anais J. Orsi', 'Bruce D. Cornuelle']
Publication: This study by ['Severinghaus J.']
- Paleo Table: Ant-WAIS-Divide.Severinghaus.2012.paleo1measurement1.csv


Unnamed: 0,uncertainty,year,temperature
0,1.327,8,-29.607
1,1.328,9,-29.607
2,1.328,10,-29.606
3,1.329,11,-29.606
4,1.33,12,-29.605


Missing Value: NaN
- uncertainty_temperature [degC]
- year [yr AD]
- temperature [degC]


Asi-SourthAndMiddleUrals.Demezhko.2007
Publication: Climatic changes in the Urals over the past millennium; an analysis of geothermal and meteorological data by ['D. Yu. Demezhko', 'I. V. Golovanova']
Publication: This study by ['D Demezhko']
- Paleo Table: Asi-SourthAndMiddleUrals.Demezhko.2007.paleo1measurement1.csv


Unnamed: 0,temperature,year
0,0.166,800
1,0.264,850
2,0.354,900
3,0.447,950
4,0.538,1000


Missing Value: NaN
- temperature [degC]
- year [yr AD]


Ocn-AlboranSea436B.Nieto-Moreno.2013
Funding: ['200800050084447 (MARM)']
Funding: ['Project RNM 05212']
Funding: ['CTM2009-7715']
Funding: ['CGL2009-07603']
Funding: ['Research Group 0179']
Funding: ['FP7/2007-2013)/ERC Grant Agreement #226600']
Publication: Climate conditions in the westernmost Mediterranean over the last two millennia: An integrated biomarker approach by ['J. García-Orellana', 'P. Masqué', 'F. Martínez-Ruiz', 'V. Willmott', 'V. Nieto-Moreno', 'J.S. Sinninghe Damsté']
Publication: PANGAEA by ['V. Nieto-Moreno']
Publication: Robust global ocean cooling trend for the pre-industrial Common Era by ['Kandasamy Selvaraj', 'Helen V. McGregor', 'Belen Martrat', 'Hugues Goosse', 'P. Graham Mortyn', 'Guillaume Leduc', 'Helena L. Filipsson', 'Kaustubh Thirumalai', 'Vasile Ersek', 'Marit-Solveig Seidenkrantz', 'Jason A. Addison', 'Michael N. Evans', 'Marie-Alexandrine Sicre', 'Steven J. Phipps', 'Delia W. Oppo']
- Paleo Tab

Unnamed: 0,year,temperature
0,1999.07,18.79
1,1993.12,19.38
2,1987.17,19.61
3,1975.26,18.88
4,1963.36,18.74


Missing Value: NaN
- year [yr AD]
- temperature [degC]


Eur-SpannagelCave.Mangini.2005
Publication: Reconstruction of temperature in the Central Alps during the past 2000 yr from a δ18O stalagmite record by ['A. Mangini', 'C. Spötl', 'P. Verdes']
Publication: World Data Center for Paleoclimatology by ['A. Mangini']
- Paleo Table: Eur-SpannagelCave.Mangini.2005.paleo1measurement1.csv


Unnamed: 0,year,d18O
0,1935.0,-7.49
1,1932.0,-7.41
2,1930.0,-7.36
3,1929.0,-7.15
4,1929.0,-7.28


Missing Value: NaN
- year [yr AD]
- d18O [permil]


Ocn-FeniDrift.Richter.2009
Publication: World Data Center for Paleoclimatology by ['T.O. Richter']
Publication: Late Holocene (0–2.4kaBP) surface water temperature and salinity variability, Feni Drift, NE Atlantic Ocean by ['F.J.C. Peeters', 'T.C.E. van Weering', 'T.O. Richter']
Publication: Robust global ocean cooling trend for the pre-industrial Common Era by ['Marit-Solveig Seidenkrantz', 'Steven J. Phipps', 'Hugues Goosse', 'P. Graham Mortyn', 'Kaustubh Thirumalai', 'Jason A. Addison', 'Michael N. Evans', 'Marie-Alexandrine Sicre', 'Delia W. Oppo', 'Kandasamy Selvaraj', 'Helen V. McGregor', 'Belen Martrat', 'Guillaume Leduc', 'Helena L. Filipsson', 'Vasile Ersek']
- Paleo Table: Ocn-FeniDrift.Richter.2009.paleo2measurement1.csv


Unnamed: 0,temperature,depthBottom,notes,year,Mg/Ca,depthTop
0,12.94,0.5,M200309,1998,2.31,0.5
1,10.99,1.5,M200309,1987,1.973,1.5
2,10.53,2.5,M200309,1975,1.901,2.5
3,10.44,3.5,M200309,1962,1.887,3.5
4,11.39,4.5,M200309,1949,2.038,4.5


Missing Value: NaN
- temperature [degC]
- depth_bottom [cm]
- notes
- year [yr AD]
- Mg_Ca
- depth_top [cm]
- Paleo Table: Ocn-FeniDrift.Richter.2009.paleo1measurement1.csv


Unnamed: 0,temperature,year,Mg/Ca
0,12.94,1998,2.31
1,10.99,1987,1.973
2,10.53,1975,1.901
3,10.44,1962,1.887
4,11.39,1949,2.038


Missing Value: NaN
- temperature [degC]
- year [yr AD]
- Mg_Ca


Eur-LakeSilvaplana.Trachsel.2010
Publication: Scanning reflectance spectroscopy (380–730 nm): a novel method for quantitative high-resolution climate reconstructions from minerogenic lake sediments by ['C. Kamenik', 'B. Rein', 'D. Schnyder', 'M. Grosjean', 'M. Trachsel']
Publication: World Data Center for Paleoclimatology by ['M. Trachsel']
- Paleo Table: Eur-LakeSilvaplana.Trachsel.2010.paleo1measurement1.csv


Unnamed: 0,year,temperature
0,1175,0.181707
1,1176,0.111083
2,1177,0.001382
3,1178,-0.008682
4,1179,-0.048438


Missing Value: NaN
- year [yr AD]
- temperature [degC]


Ocn-PedradeLume-CapeVerdeIslands.Moses.2006
Publication: Evidence of multidecadal salinity variability in the eastern tropical North Atlantic by ['Peter K. Swart', 'Brad E. Rosenheim', 'Christopher S. Moses']
Publication: World Data Center for Paleoclimatology by ['C.S. Moses']
- Paleo Table: Ocn-PedradeLume-CapeVerdeIslands.Moses.2006.paleo1measurement1.csv


Unnamed: 0,year,d18O
0,1928.96,-3.11
1,1929.04,-2.9
2,1929.12,-2.88
3,1929.21,-2.73
4,1929.29,-2.73


Missing Value: NaN
- year [yr AD]
- d18O [permil]


Ocn-SinaiPeninsula,RedSea.Moustafa.2000
Publication: Tropical sea surface temperatures for the past four centuries reconstructed from coral archives by ['Michael N. Evans', 'Henry C. Wu', 'Nerilie J. Abram', 'Cyril Giry', 'Jens Zinke', 'Casey P. Saenger', 'Jessica E. Tierney', 'Kevin J. Anchukaitis', 'K. Halimeda Kilbourne']
Publication: PANGAEA by ['Y.A. Moustafa']
Publication: Mid-Holocene stable isotope record of corals from the northern Red Sea by ['Yossi Loya', 'Yaser Ahmed Moustafa', 'Jürgen Pätzold', 'Gerold Wefer']
- Paleo Table: Ocn-SinaiPeninsula,RedSea.Moustafa.2000.paleo1measurement1.csv


Unnamed: 0,year,d18O
0,1993.12,-3.05
1,1992.86,-3.63
2,1992.66,-3.53
3,1992.39,-3.47
4,1992.12,-3.1


Missing Value: NaN
- year [yr AD]
- d18O [permil]


Eur-NorthernSpain.Martin-Chivelet.2011
Publication: World Data Center for Paleoclimatology by ['J. Martín-Chivelet']
Publication: Land surface temperature changes in northern Iberia since 4000 yr BP, based on δ13C of speleothems by ['R. Lawrence Edwards', 'María J. Turrero', 'Javier Martín-Chivelet', 'Ana I. Ortega', 'M. Belén Muñoz-García']
- Paleo Table: Eur-NorthernSpain.Martin-Chivelet.2011.paleo1measurement1.csv


Unnamed: 0,year,d18O
0,2000,0.94
1,1987,0.8
2,1983,0.23
3,1978,0.17
4,1975,0.51


Missing Value: NaN
- year [yr AD]
- d18O [permil]


Arc-Kongressvatnet.D'Andrea.2012
Publication: World Data Center for Paleoclimatology by ["W.J. D'Andrea"]
Publication: Mild Little Ice Age and unprecedented recent warmth in an 1800 year lake sediment record from Svalbard by ['N. L. Balascio', 'S. R. Roof', 'M. Retelle', 'R. S. Bradley', 'D. A. Vaillencourt', 'A. Werner', "W. J. D'Andrea"]
- Paleo Table: Arc-Kongressvatnet.D'Andrea.2012.paleo1measurement1.csv


Unnamed: 0,Uk37,year,temperature
0,-0.65,2008,5.9
1,-0.67,2004,5.1
2,-0.65,2000,6.1
3,-0.67,1996,5.3
4,-0.69,1990,4.3


Missing Value: NaN
- Uk37
- year [yr AD]
- temperature [degC]


Eur-CoastofPortugal.Abrantes.2011
Publication: PANGAEA by ['F. Abrantes']
Publication: Climate of the last millennium at the southern pole of the North Atlantic Oscillation: an inner-shelf sediment record of flooding and upwelling by ['C Santos', 'AHL Voelker', 'C Lopes', 'B Montanari', 'L Witt', 'F Abrantes', 'T Rodrigues']
- Paleo Table: Eur-CoastofPortugal.Abrantes.2011.paleo1measurement1.csv


Unnamed: 0,year,temperature
0,971.19,15.235
1,982.672,15.329
2,991.858,15.264
3,1001.044,15.376
4,1010.23,15.4


Missing Value: NaN
- year [yr AD]
- temperature [degC]


Eur-SpanishPyrenees.Dorado-Linan.2012
Publication: World Data Center for Paleoclimatology by ['I. Dorado-Linan']
Publication: Estimating 750 years of temperature variations and uncertainties in the Pyrenees by tree-ring reconstructions and climate simulations by ['J. P. Montávez', 'I. Heinrich', 'G. Helle', 'I. Dorado Liñán', 'M. Brunet', 'U. Büntgen', 'E. Gutiérrez', 'J. J. Gómez-Navarro', 'E. Zorita', 'F. González-Rouco']
- Paleo Table: Eur-SpanishPyrenees.Dorado-Linan.2012.paleo1measurement1.csv


Unnamed: 0,year,ringWidth
0,1260,-1.612
1,1261,-0.703
2,1262,-0.36
3,1263,-0.767
4,1264,-0.601


Missing Value: NaN
- year [yr AD]
- trsgi


Eur-FinnishLakelands.Helama.2014
Publication: World Data Center for Paleoclimatology by ['S. Helama']
Publication: A palaeotemperature record for the Finnish Lakeland based on microdensitometric variations in tree rings by ['Samuli Helama', 'Jari Holopainen', 'Hanna Mäkelä', 'Jouko Meriläinen', 'Taneli Kolström', 'Matti Vartiainen']
- Paleo Table: Eur-FinnishLakelands.Helama.2014.paleo1measurement1.csv


Unnamed: 0,year,temperature
0,2000,14.603
1,1999,14.643
2,1998,12.074
3,1997,13.898
4,1996,13.671


Missing Value: NaN
- year [yr AD]
- temperature [degC]


Eur-NorthernScandinavia.Esper.2012
Publication: Orbital forcing of tree-ring data by ['Daniel Nievergelt', 'Nils Fischer', 'Anne Verstege', 'Mauri Timonen', 'Sebastian Wagner', 'Steffen Holzkämper', 'Jan Esper', 'David C. Frank', 'Ulf Büntgen', 'Rob J. S. Wilson', 'Jürg Luterbacher', 'Eduardo Zorita']
Publication: World Data Center for Paleoclimatology by ['J. Esper']
- Paleo Table: Eur-NorthernScandinavia.Esper.2012.paleo1measurement1.csv


Unnamed: 0,MXD,year
0,0.46,-138
1,1.305,-137
2,0.755,-136
3,-0.1,-135
4,-0.457,-134


Missing Value: NaN
- MXD
- year [yr AD]


Eur-Stockholm.Leijonhufvud.2009
Publication: World Data Center for Paleoclimatology by ['L. Leijonhufvud']
Publication: Five centuries of Stockholm winter/spring temperatures reconstructed from documentary evidence and instrumental observations by ['Dag Retsö', 'Johan Söderberg', 'Rob Wilson', 'Anders Moberg', 'Lotta Leijonhufvud', 'Ulrica Söderlind']
- Paleo Table: Eur-Stockholm.Leijonhufvud.2009.paleo1measurement1.csv


Unnamed: 0,year,temperature
0,1502,-1.7212
1,1503,-1.6382
2,1504,-0.6422
3,1505,0.1048
4,1506,-0.7252


Missing Value: NaN
- year [yr AD]
- temperature [degC]


## Editing an existing LiPD file

In [2]:
# Dataset is the Main OOP Class
from pylipd.classes.dataset import Dataset

# DataTable is the OOP class used for measurement tables
from pylipd.classes.datatable import DataTable

# LiPD is the LiPD parser/writer
from pylipd.lipd import LiPD

# Load LiPD files as usual.
# - This loads the LiPD data into the internal RDF graph
path = '../data/ODP846.Lawrence.2006.lpd'
D = LiPD()
D.load(path)

# Convert the LiPD datasets to the PyLiPD OOP "Dataset" class.
# - This lets us modify the datasets in memory via OOP calls
# - To write the changes back to a LiPD file, we call create_lipd at the end of this cell
datasets = D.get_datasets()

ds = datasets[0]
pdata = ds.getPaleoData()[0]

# ********************************************
# We will add a table to the paleoData here
# ********************************************
attrs = {
    "site": {
        'number': 1, 
        'variableName': 'site/hole', 
        'hasStandardVariable': 'site', 
        'units': 'unitless', 
        'TSid': 'PYTJ3PSH0LT', 
        'variableType': 'measured', 
        'takenAtDepth': 'depth'
    },
    "ukprime37": {
        'number': 2,
        'interpretation': [
            {
                'rank': 1.0, 
                'scope': 'Climate', 
                'variable': 'temperature', 
                'variableDetail': 'sea surface', 
                'direction': 'positive'
            }
        ], 
        'variableName': 'ukprime37', 
        'resolution': {
            'hasMaxValue': 10.856999999999971, 
            'hasMeanValue': 2.3355875057418465, 
            'hasMedianValue': 2.211999999999989, 
            'hasMinValue': 0.06999999999993634
        }, 
        'units': 'unitless', 
        'TSid': 'PYTM9N6HCQM', 
        'variableType': 'measured', 
        'takenAtDepth': 'depth', 
        'proxyObservationType': 'Uk37Prime'
    },
    "age": {
        'interpretation': [
            {
                'rank': 1.0, 
                'scope': 'Age', 
                'variableDetail': 'calendar', 
                'direction': 'positive'
            }
        ], 
        'number': 3, 
        'variableName': 'age', 
        'hasStandardVariable': 'age', 
        'TSid': 'PYTXJB98403', 
        'variableType': 'inferred', 
        'takenAtDepth': 'depth', 
        'inferredVariableType': 'Age'
    },
    "depth": {
        'number': 4, 
        'variableName': 'depth', 
        'notes': 'depth rmcd', 
        'resolution': {
            'hasMaxValue': 10.856999999999971, 
            'hasMeanValue': 2.3355875057418465, 
            'hasMedianValue': 2.211999999999989, 
            'hasMinValue': 0.06999999999993634
        }, 
        'hasStandardVariable': 'depth', 
        'units': 'm', 
        'TSid': 'PYTKRFVW61B', 
        'variableType': 'measured'
    }
}

# Create a random dataframe
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(100, 4), columns=attrs.keys())
df.attrs = attrs
display(df.head())

newtable = DataTable()
newtable.setFileName("paleo0measurement2.csv")
newtable.setMissingValue("NaN")
newtable.setDataFrame(df)

pdata.addMeasurementTable(newtable)

# Create a LiPD file from the modified Dataset ds
savelipd = LiPD()
savelipd.load_datasets([ds])
savelipd.create_lipd(ds.getName(), "./ODP846.Lawrence.2006.updated.lpd")
pass


Loading 1 LiPD files


100%|██████████| 1/1 [00:00<00:00,  1.74it/s]

Loaded..





Unnamed: 0,site,ukprime37,age,depth
0,-1.21487,0.034884,0.374212,0.814633
1,1.006401,-0.380296,0.199751,-0.296262
2,-0.291744,-0.425899,-0.583914,0.876235
3,-1.115534,0.474679,-0.687251,0.406229
4,1.801796,-0.710398,-0.123302,1.837779


In [3]:
# Try to load the New LiPD File, and check if everything is ok
D = LiPD()
D.load("./ODP846.Lawrence.2006.updated.lpd")

Loading 1 LiPD files


100%|██████████| 1/1 [00:00<00:00,  1.79it/s]

Loaded..





In [4]:
# Try to load the New LiPD File, and check if everything is ok
D = LiPD()
D.load("./ODP846.Lawrence.2006.updated.lpd")
datasets = D.get_datasets()

# Browse the dataset as usual
for ds in datasets:
    print("\n")
    print(ds.getName())
    print("=========================")
    for pdata in ds.getPaleoData():
        for table in pdata.getMeasurementTables():
            print(f"- Paleo Table: {table.getFileName()}")
            
            # Can get the dataframe for the whole table
            df = table.getDataFrame(use_standard_names=True)
            display(df.head())

            # The returned dataframe also contains the attributes for the variables
            for varname in df.attrs:
                vardata = df.attrs[varname]
                # print(varname, vardata)

            # Can also get the variables one by one and make calls to their class functions
            print(f"Missing Value: {table.getMissingValue()}")
            for var in table.getVariables():
                if var.getUnits():
                    print(f"- {var.getName()} [{var.getUnits().getLabel()}]")
                else:
                    print(f"- {var.getName()}")

Loading 1 LiPD files


100%|██████████| 1/1 [00:00<00:00,  1.80it/s]

Loaded..


ODP846.Lawrence.2006
- Paleo Table: paleo0measurement2.csv





Unnamed: 0,depth,ukprime37,age,site
0,0.814633,0.034884,0.374212,-1.21487
1,-0.296262,-0.380296,0.199751,1.006401
2,0.876235,-0.425899,-0.583914,-0.291744
3,0.406229,0.474679,-0.687251,-1.115534
4,1.837779,-0.710398,-0.123302,1.801796


Missing Value: NaN
- depth [m]
- ukprime37 [unitless]
- age
- site [unitless]
- Paleo Table: paleo0measurement1.csv


Unnamed: 0,depth cr,event,depth,sampleID,u. peregrina d18o,u. peregrina d13c,c. wuellerstorfi d18o,c. wuellerstorfi d13c,depth comp
0,0.12,138-846B,0.12,138-846B-1H-1,0.14,,0.12,3.38,12.0
1,0.23,138-846B,0.23,138-846B-1H-1,0.01,,0.23,3.46,23.0
2,0.33,138-846B,0.33,138-846B-1H-1,-0.1,,0.33,3.65,33.0
3,0.33,138-846B,0.33,138-846B-1H-1,-0.06,,0.33,3.88,33.0
4,0.43,138-846B,0.43,138-846B-1H-1,-0.17,,0.43,4.14,43.0


Missing Value: nan
- depth cr
- event [unitless]
- depth [m]
- sampleID [unitless]
- u. peregrina d18o
- u. peregrina d13c
- c. wuellerstorfi d18o
- c. wuellerstorfi d13c
- depth comp
- Paleo Table: paleo0measurement0.csv


Unnamed: 0,section,site,temp prahl,ukprime37,c37 total,depth,deleteMe,temp muller,age
0,1H-1,846B,23.0,0.821,2.37,0.16,15-16,23.545,5.228
1,1H-1,846B,23.1,0.824,2.1,0.26,25-26,23.648,8.947
2,1H-1,846B,23.2,0.828,1.87,0.36,35-36,23.752,11.966
3,1H-1,846B,22.0,0.787,2.74,0.46,45-46,22.515,14.427
4,1H-1,846B,21.7,0.777,3.75,0.56,55-56,22.206,16.502


Missing Value: nan
- section [unitless]
- site [unitless]
- temp prahl
- ukprime37 [unitless]
- c37 total
- depth [m]
- deleteMe [cm]
- temp muller
- age
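Before handing a dataframe to `setDataFrame`, it can help to confirm that every column has a metadata entry in `df.attrs` and that the `number` fields are distinct. The helper below is a hypothetical sanity check written for this demo; it is not part of PyLiPD, and PyLiPD may neither require nor perform these checks itself. It uses plain Python only, so it runs without any LiPD file.

```python
def check_table_attrs(columns, attrs):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    columns = list(columns)

    # Every column should have metadata, and every metadata entry a column.
    missing = [c for c in columns if c not in attrs]
    extra = [k for k in attrs if k not in columns]
    if missing:
        problems.append(f"columns without attributes: {missing}")
    if extra:
        problems.append(f"attributes without a column: {extra}")

    # Column numbers, where present, should not repeat.
    numbers = [meta["number"] for meta in attrs.values() if "number" in meta]
    if len(numbers) != len(set(numbers)):
        problems.append(f"duplicate column numbers: {sorted(numbers)}")
    return problems


attrs = {
    "site": {"number": 1, "variableName": "site/hole", "units": "unitless"},
    "depth": {"number": 2, "variableName": "depth", "units": "m"},
}
ok = check_table_attrs(["site", "depth"], attrs)
bad = check_table_attrs(["site", "age"], attrs)
print(ok)
print(bad)
```

For a pandas dataframe, the call would be `check_table_attrs(df.columns, df.attrs)`.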


## Creating a new LiPD file

In [5]:
from pylipd.classes.dataset import Dataset
from pylipd.classes.archivetype import ArchiveTypeConstants
from pylipd.classes.funding import Funding
from pylipd.classes.interpretation import Interpretation
from pylipd.classes.interpretationvariable import InterpretationVariableConstants
from pylipd.classes.location import Location
from pylipd.classes.paleodata import PaleoData
from pylipd.classes.datatable import DataTable
from pylipd.classes.paleounit import PaleoUnitConstants
from pylipd.classes.paleovariable import PaleoVariableConstants
from pylipd.classes.person import Person
from pylipd.classes.publication import Publication
from pylipd.classes.resolution import Resolution
from pylipd.classes.variable import Variable

import json

dataset1 = Dataset()

# Set the name of the dataset
dataset1.setName("TestDataset.2024")
dataset1.id = dataset1.ns + "/" + dataset1.getName()

# Set collection name
dataset1.setCollectionName("TestCollection")

# Set the Archive Type (from a list of constants)
dataset1.setArchiveType(ArchiveTypeConstants.Coral)

# Add a publication
pub1 = Publication()
pub1.setTitle("Sample Publication Title")
person1 = Person()
person1.setName("Deborah Khider")
person2 = Person()
person2.setName("Varun Ratnakar")
pub1.setAuthors([person1, person2])
# Add the publication to the dataset
dataset1.addPublication(pub1)

# Add funding information
funding1 = Funding()
funding1.addGrant("NSF Grant 23423A")
funding1.addInvestigator(person1)
# Add funding to the dataset
dataset1.addFunding(funding1)

# Add location information
loc1 = Location()
loc1.setLatitude("24.21232")
loc1.setLongitude("48.32323")
loc1.setElevation("342")
loc1.setCountry("USA")
# Set location for the dataset
dataset1.setLocation(loc1)

# Create Paleodata table
table1 = DataTable()
table1.setFileName("paleo0measurement1.csv")
table1.setMissingValue("NaN")

# Populate the table with variable data
#
# Option 1: Via a Dataframe with attributes:
# ------------------------------------------
# Create a random dataframe
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(100, 2), columns=["site", "ukprime37"])
# Set column attributes
df.attrs = {
    "site": {
        'number': 1, 
        'variableName': 'site/hole', 
        'units': 'unitless', 
        'TSid': 'PYTJ3PSH0LT', 
        'variableType': 'measured', 
        'takenAtDepth': 'depth'
    },
    "ukprime37": {
        'number': 2,
        'interpretation': [
            {
                'rank': 1.0, 
                'scope': 'Climate', 
                'variable': 'temperature', 
                'variableDetail': 'sea surface', 
                'direction': 'positive'
            }
        ], 
        'variableName': 'ukprime37', 
        'resolution': {
            'hasMaxValue': 10.856999999999971, 
            'hasMeanValue': 2.3355875057418465, 
            'hasMedianValue': 2.211999999999989, 
            'hasMinValue': 0.06999999999993634
        }, 
        'TSid': 'PYTM9N6HCQM', 
        'variableType': 'measured', 
        'takenAtDepth': 'depth' 
    }
}
table1.setDataFrame(df)

# Create Another Paleodata table by setting variables with OOP calls
table2 = DataTable()
table2.setFileName("paleo0measurement2.csv")
table2.setMissingValue("NaN")
#
# Option 2: Via OOP Calls to create each variable
# -----------------------------------------------
# Add a variable
var1 = Variable()
var1.setName("site")
var1.setColumnNumber(1)
var1.setVariableId("PYTJ3PSH0LT")
var1.setVariableType("measured")
var1.set_non_standard_property("takenAtDepth", "depth")
# Set random values for this variable
var1.setValues(json.dumps(np.random.randn(100).tolist()))

# Add another variable
var2 = Variable()
var2.setName("ukprime37")
var2.setColumnNumber(2)
var2.setVariableId('PYTM9N6HCQM')
var2.setVariableType('measured')
var2.setStandardVariable(PaleoVariableConstants.Uk37)
var2.setUnits(PaleoUnitConstants.cm3)

# Add the variable interpretation
interp1 = Interpretation()
interp1.setRank("1")
interp1.setScope("Climate")
interp1.setVariable(InterpretationVariableConstants.temperature)
interp1.setVariableDetail("sea surface")
interp1.setDirection("positive")
var2.addInterpretation(interp1)
# Add the variable resolution
resolution1 = Resolution()
resolution1.setMaxValue(10.856999999999971)
resolution1.setMeanValue(2.3355875057418465)
resolution1.setMedianValue(2.211999999999989)
resolution1.setMinValue(0.06999999999993634)
var2.setResolution(resolution1)

# Set random values for this variable
var2.setValues(json.dumps(np.random.randn(100).tolist()))

table2.setVariables([var1, var2])

# Create Paleodata, and add the created tables to it
paleodata1 = PaleoData()
paleodata1.setMeasurementTables([table1, table2])

dataset1.addPaleoData(paleodata1)
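The resolution numbers in the cell above are hard-coded. The sketch below shows one plausible way to compute such a summary, under the assumption that resolution means the spacing between consecutive values along a table's time (or depth) axis; the notebook does not say how its numbers were derived. `summarize_resolution` is a hypothetical helper, and the `ages` values are made up for illustration.

```python
import statistics


def summarize_resolution(axis_values):
    """Summarize the absolute spacing between consecutive axis values."""
    if len(axis_values) < 2:
        raise ValueError("need at least two values to compute a spacing")
    steps = [abs(b - a) for a, b in zip(axis_values, axis_values[1:])]
    # Keys mirror the 'resolution' entries used in the dataframe attributes above.
    return {
        "hasMaxValue": max(steps),
        "hasMeanValue": statistics.mean(steps),
        "hasMedianValue": statistics.median(steps),
        "hasMinValue": min(steps),
    }


ages = [0.0, 2.0, 5.0, 6.0, 10.0]  # made-up values, for illustration only
summary = summarize_resolution(ages)
print(summary)
```

The four values could then be passed to `setMaxValue`, `setMeanValue`, `setMedianValue` and `setMinValue` on a `Resolution` object, as in the cell above.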

In [6]:
# Browse the dataset as usual
print(dataset1.getName())
print("=========================")
for pdata in dataset1.getPaleoData():
    for table in pdata.getMeasurementTables():
        print(f"- Paleo Table: {table.getFileName()}")
        
        # Can get the dataframe for the whole table
        df = table.getDataFrame(use_standard_names=True)
        display(df.head())

        # The returned dataframe also contains the attributes for the variables
        for varname in df.attrs:
            vardata = df.attrs[varname]
            # print(varname, vardata)

        # Can also get the variables one by one and make calls to their class functions
        print(f"Missing Value: {table.getMissingValue()}")
        for var in table.getVariables():
            if var.getUnits():
                print(f"- {var.getName()} [{var.getUnits().getLabel()}]")
            else:
                print(f"- {var.getName()}")

TestDataset.2024
- Paleo Table: paleo0measurement1.csv


Unnamed: 0,site/hole,ukprime37
0,0.503543,-0.729
1,0.928305,1.839691
2,0.512589,-0.004273
3,-2.415368,-1.414838
4,2.158841,0.712292


Missing Value: NaN
- site/hole [unitless]
- ukprime37
- Paleo Table: paleo0measurement2.csv


Unnamed: 0,site,Uk37
0,-0.977363,-0.02798
1,1.041971,-0.45065
2,0.960247,-0.200802
3,-2.516197,1.215636
4,0.814137,-0.705652


Missing Value: NaN
- site
- ukprime37 [cm3]


In [7]:
from pylipd.lipd import LiPD

# Create a LiPD object from the newly created dataset (dataset1)
lipd1 = LiPD()
lipd1.load_datasets([dataset1])

In [8]:
query = """PREFIX le: <http://linked.earth/ontology#>
select ?dsname ?title (GROUP_CONCAT(?authorName) as ?authors) 
        ?doi ?year ?pubyear ?journal ?volume ?issue ?pages ?type ?publisher ?report ?citeKey ?edition ?institution 
        where { 
            ?ds a le:Dataset .
            ?ds le:hasName ?dsname .
            ?ds le:hasPublication ?pub .
            OPTIONAL{?pub le:hasDOI ?doi .}
            OPTIONAL{
                ?pub le:hasAuthor ?author .
                ?author le:hasName ?authorName .
            }
            OPTIONAL{?pub le:hasYear ?year .}
            OPTIONAL{?pub le:hasTitle ?title .}
            OPTIONAL{?pub le:hasJournal ?journal .}
        }
        GROUP BY ?pub
"""
result, df = lipd1.query(query)
df

Unnamed: 0,dsname,title,authors,doi,year,pubyear,journal,volume,issue,pages,type,publisher,report,citeKey,edition,institution
0,ODP846.Lawrence.2006,High-latitude influence on the eastern equator...,D. Pate M.A. Hall T.D. Herbert N.J. Shackleton...,,2004.0,,Nature,,,,,,,,,
1,ODP846.Lawrence.2006,Evolution of the Eastern Tropical Pacific Thro...,K. T. Lawrence,10.1126/science.1120395,2006.0,,Science,,,,,,,,,
2,ODP846.Lawrence.2006,A Pliocene-Pleistocene stack of 57 globally di...,Lorraine E. Lisiecki Maureen E. Raymo,10.1029/2004PA001071,2005.0,,Paleoceanography,,,,,,,,,
3,ODP846.Lawrence.2006,Pleiocene stable isotope stratigraphy of ODP S...,D. Pate T.D. Herbert Z. Liu N.J. Shackleton M....,,1995.0,,Proceedinds of the Ocean Drilling Program Scie...,,,,,,,,,
4,ODP846.Lawrence.2006,The Role of Uncertainty in Estimating Lead/Lag...,D. Khider M. Kienast L. E. Lisiecki S. Ahn C. ...,10.1002/2016PA003057,2017.0,,Paleoceanography,,,,,,,,,
5,ODP846.Lawrence.2006,Benthic Foraminiferal Stable Isotope Stratigra...,A.C. Mix J. Le N.J. Shackleton,10.2973/odp.proc.sr.138.160.1995,1995.0,,"Proceedings of the Ocean Drilling Program, 138...",,,,,,,,,
6,TestDataset.2024,Sample Publication Title,Deborah Khider Varun Ratnakar,,,,,,,,,,,,,
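In the result above, author names are joined by single spaces, which is the default separator of `GROUP_CONCAT`, so the list cannot be split back into individual names reliably. SPARQL 1.1 allows an explicit separator. The sketch below trims the query to title and authors and assumes that `lipd1.query` hands the query string to a SPARQL 1.1 engine unchanged; `AUTHOR_SEPARATOR` and `split_authors` are hypothetical names introduced for this example.

```python
AUTHOR_SEPARATOR = "; "

# Same pattern as the query above, trimmed to title and authors,
# with an explicit separator in GROUP_CONCAT (SPARQL 1.1 syntax).
query = f"""PREFIX le: <http://linked.earth/ontology#>
SELECT ?title (GROUP_CONCAT(?authorName; separator="{AUTHOR_SEPARATOR}") AS ?authors)
WHERE {{
    ?ds a le:Dataset .
    ?ds le:hasPublication ?pub .
    OPTIONAL {{ ?pub le:hasTitle ?title . }}
    OPTIONAL {{
        ?pub le:hasAuthor ?author .
        ?author le:hasName ?authorName .
    }}
}}
GROUP BY ?pub ?title
"""


def split_authors(joined):
    """Split a GROUP_CONCAT result back into a list of author names."""
    if not joined:
        return []
    return joined.split(AUTHOR_SEPARATOR)


authors = split_authors("Deborah Khider; Varun Ratnakar")
print(authors)
```

The split is only reliable if no author name itself contains the chosen separator.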


In [9]:
query = """PREFIX le: <http://linked.earth/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT distinct ?archiveType WHERE {
        ?ds a le:Dataset .
        ?ds le:hasArchiveType ?archiveTypeObj .
        ?archiveTypeObj rdfs:label ?archiveType .
    }
"""
result, df = lipd1.query(query)
df

Unnamed: 0,archiveType
0,Marine sediment
1,Coral


In [12]:
lipd1.create_lipd(dataset1.getName(), "./TestDataset.2024.lpd")
pass

In [13]:
# Load and browse the newly created LiPD file
from pylipd.classes.dataset import Dataset
from pylipd.lipd import LiPD
path = 'TestDataset.2024.lpd'
D = LiPD()
D.load(path)

datasets = D.get_datasets()

for ds in datasets:
    # Now we can call individual functions on the dataset to get its details
    print("\n")
    print(ds.getName())
    print("=========================")
    for funding in ds.getFundings():
        if funding.getGrants():
            print(f"Funding: {funding.getGrants()}")
    
    for pub in ds.getPublications():
        print(f"Publication: {pub.getTitle()} by {list(map(lambda x: x.getName(), pub.getAuthors()))}")

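    # Named "pdata" rather than "pd" so the loop does not shadow the pandas alias imported earlier
    for pdata in ds.getPaleoData():
        for table in pdata.getMeasurementTables():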
            print(f"- Paleo Table: {table.getFileName()}")
            
            # Can get the dataframe for the whole table
            df = table.getDataFrame(use_standard_names=True)
            display(df.head())

            # The returned dataframe also contains the attributes for the variables
            for varname in df.attrs:
                vardata = df.attrs[varname]
                # print(varname, vardata)

            # Can also get the variables one by one and make calls to their class functions
            print(f"Missing Value: {table.getMissingValue()}")
            for var in table.getVariables():
                if var.getUnits():
                    print(f"- {var.getName()} [{var.getUnits().getLabel()}]")
                else:
                    print(f"- {var.getName()}")

Loading 1 LiPD files


100%|██████████| 1/1 [00:00<00:00, 72.69it/s]

Loaded..


TestDataset.2024
Funding: ['NSF Grant 23423A']
Publication: Sample Publication Title by ['Varun Ratnakar', 'Deborah Khider']
- Paleo Table: paleo0measurement2.csv





Unnamed: 0,site,Uk37
0,-0.977363,-0.02798
1,1.041971,-0.45065
2,0.960247,-0.200802
3,-2.516197,1.215636
4,0.814137,-0.705652


Missing Value: NaN
- site
- Uk37 [cm3]
- Paleo Table: paleo0measurement1.csv


Unnamed: 0,site,ukprime37
0,0.503543,-0.729
1,0.928305,1.839691
2,0.512589,-0.004273
3,-2.415368,-1.414838
4,2.158841,0.712292


Missing Value: NaN
- site/hole [unitless]
- ukprime37
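The outputs above show that tables and columns can come back in a different order after a save and reload, so a round-trip check should compare columns by name, not by position. The sketch below is a hypothetical helper, not a PyLiPD function, and it works on plain dictionaries of the form `{column name: list of floats}` so that it runs on its own. It assumes the column names are unchanged by the round trip, which the outputs above suggest may not always hold when `use_standard_names=True`.

```python
import math


def tables_match(before, after, rel_tol=1e-9):
    """Compare two {column name: list of floats} tables, ignoring column order."""
    if set(before) != set(after):
        return False
    for name, values in before.items():
        other = after[name]
        if len(values) != len(other):
            return False
        if not all(math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(values, other)):
            return False
    return True


original = {"site": [0.5, 0.9], "ukprime37": [-0.7, 1.8]}
reloaded = {"ukprime37": [-0.7, 1.8], "site": [0.5, 0.9]}  # same data, new order
changed = {"ukprime37": [-0.7, 1.8], "site": [0.5, 1.0]}

print(tables_match(original, reloaded))
print(tables_match(original, changed))
```

A pandas dataframe can be converted to this form with `df.to_dict(orient="list")`. Note that `math.isclose` never treats two NaN values as equal, so tables with missing values would need extra handling.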
