## PyLiPD OOP Demo

### PyLiPD Ontologies

The PyLiPD OOP Classes are generated from the ontologies at: https://linked.earth/ontology/
* The [Linked Earth Core Ontology](https://linked.earth/ontology/core/2.0.0/index-en.html) provides the main concepts and relationships to describe a paleoclimate dataset and its values.
* The [Archive Type Ontology](https://linked.earth/ontology/archive/2.0.0/index-en.html) describes a taxonomy of the most common types of archives.
* The [Paleo Variables Ontology](https://linked.earth/ontology/paleo_variables/2.0.0/index-en.html) describes a taxonomy of the most common types of paleo variables.
* The [Paleo Proxy Ontology](https://linked.earth/ontology/paleo_proxy/2.0.0/index-en.html) describes a taxonomy of the most common types of paleo proxies.
* The [Paleo Units Ontology](https://linked.earth/ontology/paleo_units/2.0.0/index-en.html) describes a taxonomy of the most common types of paleo units.
* The [Interpretation Ontology](https://linked.earth/ontology/interpretation/2.0.0/index-en.html) describes a taxonomy of the most common interpretations.
* The [Instrument Ontology](https://linked.earth/ontology/instrument/2.0.0/index-en.html) describes a taxonomy of the most common instruments for taking measurements.
* The [Chron Variables Ontology](https://linked.earth/ontology/chron_variables/2.0.0/index-en.html) describes a taxonomy of the most common types of chron variables.
* The [Chron Proxy Ontology](https://linked.earth/ontology/chron_proxy/2.0.0/index-en.html) describes a taxonomy of the most common types of chron proxies.
* The [Chron Units Ontology](https://linked.earth/ontology/chron_units/2.0.0/index-en.html) describes a taxonomy of the most common types of chron units.



### PyLiPD OOP Classes UML Diagram
![PyLiPD OOP Classes UML Diagram](UMLDiagram.png "PyLiPD OOP Classes UML Diagram")

## Reading an existing LiPD file

In [1]:
# Dataset is the Main OOP Class
from pylipd.classes.dataset import Dataset

# LiPD is the LiPD parser/writer
from pylipd.lipd import LiPD

# Load LiPD files as usual.
# - This loads the LiPD data into the internal RDF graph
path = '../data/Pages2k'
D = LiPD()
D.load_from_dir(path)

# Convert the LiPD datasets to the PyLiPD OOP "Dataset" class.
# - This lets us modify the datasets in memory via OOP calls
# - To write the LiPD back, we have to call the save function
datasets = D.get_datasets()

for ds in datasets:
    # Now we can call individual functions on the dataset to get its details
    print("\n")
    print(ds.getName())
    print("=========================")
    for funding in ds.getFundings():
        if funding.getGrants():
            print(f"Funding: {funding.getGrants()}")
    
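    for pub in ds.getPublications():
        authors = [author.getName() for author in pub.getAuthors()]
        print(f"Publication: {pub.getTitle()} by {authors}")

    # Use "pdata" rather than "pd" so the loop variable never shadows the pandas alias
    for pdata in ds.getPaleoData():
        for table in pdata.getMeasurementTables():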
            print(f"- Paleo Table: {table.getFileName()}")
            
            # Can get the dataframe for the whole table
            df = table.getDataFrame(use_standard_names=True)
            display(df.head())

            # The returned dataframe also contains the attributes for the variables
            for varname in df.attrs:
                vardata = df.attrs[varname]
                # print(varname, vardata)

            # Can also get the variables one by one and make calls to their class functions
            print(f"Missing Value: {table.getMissingValue()}")
            for var in table.getVariables():
                if var.getUnits():
                    print(f"- {var.getName()} [{var.getUnits().getLabel()}]")
                else:
                    print(f"- {var.getName()}")

Loading 16 LiPD files


100%|██████████| 16/16 [00:00<00:00, 67.10it/s]


Loaded..


Ocn-RedSea.Felis.2000
Publication: A coral oxygen isotope record from the northern Red Sea documenting NAO, ENSO, and North Pacific teleconnections on Middle East climate variability since the year 1750 by ['Ahmed H. Nawar', 'Maoz Fine', 'Yossi Loya', 'Jürgen Pätzold', 'Thomas Felis', 'Gerold Wefer']
Publication: Tropical sea surface temperatures for the past four centuries reconstructed from coral archives by ['Jens Zinke', 'Kevin J. Anchukaitis', 'Nerilie J. Abram', 'Casey P. Saenger', 'Henry C. Wu', 'Jessica E. Tierney', 'Cyril Giry', 'K. Halimeda Kilbourne', 'Michael N. Evans']
Publication: World Data Center for Paleoclimatology by ['T. Felis']
- Paleo Table: Ocn-RedSea.Felis.2000.paleo1measurement1.csv


Unnamed: 0,d18O,year
0,-4.12,1995.583
1,-3.82,1995.417
2,-3.05,1995.25
3,-3.02,1995.083
4,-3.62,1994.917


Missing Value: NaN
- d18O [permil]
- year [yr AD]


Ant-WAIS-Divide.Severinghaus.2012
Publication: This study by ['Severinghaus J.']
Publication: Little Ice Age cold interval in West Antarctica: Evidence from borehole temperature at the West Antarctic Ice Sheet (WAIS) Divide by ['Bruce D. Cornuelle', 'Anais J. Orsi', 'Jeffrey P. Severinghaus']
- Paleo Table: Ant-WAIS-Divide.Severinghaus.2012.paleo1measurement1.csv


Unnamed: 0,uncertainty,year,temperature
0,1.327,8,-29.607
1,1.328,9,-29.607
2,1.328,10,-29.606
3,1.329,11,-29.606
4,1.33,12,-29.605


Missing Value: NaN
- uncertainty_temperature [degC]
- year [yr AD]
- temperature [degC]


Asi-SourthAndMiddleUrals.Demezhko.2007
Publication: Climatic changes in the Urals over the past millennium; an analysis of geothermal and meteorological data by ['D. Yu. Demezhko', 'I. V. Golovanova']
Publication: This study by ['D Demezhko']
- Paleo Table: Asi-SourthAndMiddleUrals.Demezhko.2007.paleo1measurement1.csv


Unnamed: 0,year,temperature
0,800,0.166
1,850,0.264
2,900,0.354
3,950,0.447
4,1000,0.538


Missing Value: NaN
- year [yr AD]
- temperature [degC]


Ocn-AlboranSea436B.Nieto-Moreno.2013
Funding: ['Research Group 0179']
Funding: ['CTM2009-7715']
Funding: ['FP7/2007-2013)/ERC Grant Agreement #226600']
Funding: ['CGL2009-07603']
Funding: ['200800050084447 (MARM)']
Funding: ['Project RNM 05212']
Publication: PANGAEA by ['V. Nieto-Moreno']
Publication: Climate conditions in the westernmost Mediterranean over the last two millennia: An integrated biomarker approach by ['P. Masqué', 'J. García-Orellana', 'V. Willmott', 'V. Nieto-Moreno', 'J.S. Sinninghe Damsté', 'F. Martínez-Ruiz']
Publication: Robust global ocean cooling trend for the pre-industrial Common Era by ['Helen V. McGregor', 'Helena L. Filipsson', 'Steven J. Phipps', 'Marit-Solveig Seidenkrantz', 'Jason A. Addison', 'Hugues Goosse', 'Vasile Ersek', 'Marie-Alexandrine Sicre', 'Belen Martrat', 'Kandasamy Selvaraj', 'P. Graham Mortyn', 'Guillaume Leduc', 'Delia W. Oppo', 'Michael N. Evans', 'Kaustubh Thirumalai']
- Paleo Tab

Unnamed: 0,temperature,year
0,18.79,1999.07
1,19.38,1993.12
2,19.61,1987.17
3,18.88,1975.26
4,18.74,1963.36


Missing Value: NaN
- temperature [degC]
- year [yr AD]


Eur-SpannagelCave.Mangini.2005
Publication: Reconstruction of temperature in the Central Alps during the past 2000 yr from a δ18O stalagmite record by ['P. Verdes', 'A. Mangini', 'C. Spötl']
Publication: World Data Center for Paleoclimatology by ['A. Mangini']
- Paleo Table: Eur-SpannagelCave.Mangini.2005.paleo1measurement1.csv


Unnamed: 0,year,d18O
0,1935.0,-7.49
1,1932.0,-7.41
2,1930.0,-7.36
3,1929.0,-7.15
4,1929.0,-7.28


Missing Value: NaN
- year [yr AD]
- d18O [permil]


Ocn-FeniDrift.Richter.2009
Publication: Late Holocene (0–2.4kaBP) surface water temperature and salinity variability, Feni Drift, NE Atlantic Ocean by ['T.C.E. van Weering', 'F.J.C. Peeters', 'T.O. Richter']
Publication: Robust global ocean cooling trend for the pre-industrial Common Era by ['Hugues Goosse', 'P. Graham Mortyn', 'Marie-Alexandrine Sicre', 'Guillaume Leduc', 'Marit-Solveig Seidenkrantz', 'Jason A. Addison', 'Belen Martrat', 'Vasile Ersek', 'Delia W. Oppo', 'Michael N. Evans', 'Kaustubh Thirumalai', 'Kandasamy Selvaraj', 'Helena L. Filipsson', 'Helen V. McGregor', 'Steven J. Phipps']
Publication: World Data Center for Paleoclimatology by ['T.O. Richter']
- Paleo Table: Ocn-FeniDrift.Richter.2009.paleo2measurement1.csv


Unnamed: 0,year,Mg/Ca,temperature,depthBottom,depthTop,notes
0,1998,2.31,12.94,0.5,0.5,M200309
1,1987,1.973,10.99,1.5,1.5,M200309
2,1975,1.901,10.53,2.5,2.5,M200309
3,1962,1.887,10.44,3.5,3.5,M200309
4,1949,2.038,11.39,4.5,4.5,M200309


Missing Value: NaN
- year [yr AD]
- Mg_Ca
- temperature [degC]
- depth_bottom [cm]
- depth_top [cm]
- notes
- Paleo Table: Ocn-FeniDrift.Richter.2009.paleo1measurement1.csv


Unnamed: 0,year,temperature,Mg/Ca
0,1998,12.94,2.31
1,1987,10.99,1.973
2,1975,10.53,1.901
3,1962,10.44,1.887
4,1949,11.39,2.038


Missing Value: NaN
- year [yr AD]
- temperature [degC]
- Mg_Ca


Eur-LakeSilvaplana.Trachsel.2010
Publication: Scanning reflectance spectroscopy (380–730 nm): a novel method for quantitative high-resolution climate reconstructions from minerogenic lake sediments by ['M. Grosjean', 'M. Trachsel', 'C. Kamenik', 'D. Schnyder', 'B. Rein']
Publication: World Data Center for Paleoclimatology by ['M. Trachsel']
- Paleo Table: Eur-LakeSilvaplana.Trachsel.2010.paleo1measurement1.csv


Unnamed: 0,year,temperature
0,1175,0.181707
1,1176,0.111083
2,1177,0.001382
3,1178,-0.008682
4,1179,-0.048438


Missing Value: NaN
- year [yr AD]
- temperature [degC]


Ocn-PedradeLume-CapeVerdeIslands.Moses.2006
Publication: Evidence of multidecadal salinity variability in the eastern tropical North Atlantic by ['Brad E. Rosenheim', 'Christopher S. Moses', 'Peter K. Swart']
Publication: World Data Center for Paleoclimatology by ['C.S. Moses']
- Paleo Table: Ocn-PedradeLume-CapeVerdeIslands.Moses.2006.paleo1measurement1.csv


Unnamed: 0,year,d18O
0,1928.96,-3.11
1,1929.04,-2.9
2,1929.12,-2.88
3,1929.21,-2.73
4,1929.29,-2.73


Missing Value: NaN
- year [yr AD]
- d18O [permil]


Ocn-SinaiPeninsula,RedSea.Moustafa.2000
Publication: Mid-Holocene stable isotope record of corals from the northern Red Sea by ['Jürgen Pätzold', 'Yossi Loya', 'Gerold Wefer', 'Yaser Ahmed Moustafa']
Publication: Tropical sea surface temperatures for the past four centuries reconstructed from coral archives by ['Jens Zinke', 'Henry C. Wu', 'Kevin J. Anchukaitis', 'Nerilie J. Abram', 'Casey P. Saenger', 'Cyril Giry', 'Michael N. Evans', 'Jessica E. Tierney', 'K. Halimeda Kilbourne']
Publication: PANGAEA by ['Y.A. Moustafa']
- Paleo Table: Ocn-SinaiPeninsula,RedSea.Moustafa.2000.paleo1measurement1.csv


Unnamed: 0,year,d18O
0,1993.12,-3.05
1,1992.86,-3.63
2,1992.66,-3.53
3,1992.39,-3.47
4,1992.12,-3.1


Missing Value: NaN
- year [yr AD]
- d18O [permil]


Eur-NorthernSpain.Martin-Chivelet.2011
Publication: World Data Center for Paleoclimatology by ['J. Martín-Chivelet']
Publication: Land surface temperature changes in northern Iberia since 4000 yr BP, based on δ13C of speleothems by ['Javier Martín-Chivelet', 'M. Belén Muñoz-García', 'Ana I. Ortega', 'R. Lawrence Edwards', 'María J. Turrero']
- Paleo Table: Eur-NorthernSpain.Martin-Chivelet.2011.paleo1measurement1.csv


Unnamed: 0,d18O,year
0,0.94,2000
1,0.8,1987
2,0.23,1983
3,0.17,1978
4,0.51,1975


Missing Value: NaN
- d18O [permil]
- year [yr AD]


Arc-Kongressvatnet.D'Andrea.2012
Publication: World Data Center for Paleoclimatology by ["W.J. D'Andrea"]
Publication: Mild Little Ice Age and unprecedented recent warmth in an 1800 year lake sediment record from Svalbard by ['S. R. Roof', 'N. L. Balascio', "W. J. D'Andrea", 'A. Werner', 'M. Retelle', 'R. S. Bradley', 'D. A. Vaillencourt']
- Paleo Table: Arc-Kongressvatnet.D'Andrea.2012.paleo1measurement1.csv


Unnamed: 0,temperature,year,Uk37
0,5.9,2008,-0.65
1,5.1,2004,-0.67
2,6.1,2000,-0.65
3,5.3,1996,-0.67
4,4.3,1990,-0.69


Missing Value: NaN
- temperature [degC]
- year [yr AD]
- Uk37


Eur-CoastofPortugal.Abrantes.2011
Publication: Climate of the last millennium at the southern pole of the North Atlantic Oscillation: an inner-shelf sediment record of flooding and upwelling by ['L Witt', 'F Abrantes', 'AHL Voelker', 'C Lopes', 'B Montanari', 'C Santos', 'T Rodrigues']
Publication: PANGAEA by ['F. Abrantes']
- Paleo Table: Eur-CoastofPortugal.Abrantes.2011.paleo1measurement1.csv


Unnamed: 0,temperature,year
0,15.235,971.19
1,15.329,982.672
2,15.264,991.858
3,15.376,1001.044
4,15.4,1010.23


Missing Value: NaN
- temperature [degC]
- year [yr AD]


Eur-SpanishPyrenees.Dorado-Linan.2012
Publication: Estimating 750 years of temperature variations and uncertainties in the Pyrenees by tree-ring reconstructions and climate simulations by ['I. Dorado Liñán', 'E. Gutiérrez', 'J. P. Montávez', 'M. Brunet', 'G. Helle', 'U. Büntgen', 'I. Heinrich', 'E. Zorita', 'J. J. Gómez-Navarro', 'F. González-Rouco']
Publication: World Data Center for Paleoclimatology by ['I. Dorado-Linan']
- Paleo Table: Eur-SpanishPyrenees.Dorado-Linan.2012.paleo1measurement1.csv


Unnamed: 0,ringWidth,year
0,-1.612,1260
1,-0.703,1261
2,-0.36,1262
3,-0.767,1263
4,-0.601,1264


Missing Value: NaN
- trsgi
- year [yr AD]


Eur-FinnishLakelands.Helama.2014
Publication: World Data Center for Paleoclimatology by ['S. Helama']
Publication: A palaeotemperature record for the Finnish Lakeland based on microdensitometric variations in tree rings by ['Taneli Kolström', 'Samuli Helama', 'Jari Holopainen', 'Matti Vartiainen', 'Hanna Mäkelä', 'Jouko Meriläinen']
- Paleo Table: Eur-FinnishLakelands.Helama.2014.paleo1measurement1.csv


Unnamed: 0,year,temperature
0,2000,14.603
1,1999,14.643
2,1998,12.074
3,1997,13.898
4,1996,13.671


Missing Value: NaN
- year [yr AD]
- temperature [degC]


Eur-NorthernScandinavia.Esper.2012
Publication: Orbital forcing of tree-ring data by ['Jürg Luterbacher', 'Jan Esper', 'Eduardo Zorita', 'Daniel Nievergelt', 'Nils Fischer', 'Steffen Holzkämper', 'Mauri Timonen', 'David C. Frank', 'Ulf Büntgen', 'Rob J. S. Wilson', 'Anne Verstege', 'Sebastian Wagner']
Publication: World Data Center for Paleoclimatology by ['J. Esper']
- Paleo Table: Eur-NorthernScandinavia.Esper.2012.paleo1measurement1.csv


Unnamed: 0,year,MXD
0,-138,0.46
1,-137,1.305
2,-136,0.755
3,-135,-0.1
4,-134,-0.457


Missing Value: NaN
- year [yr AD]
- MXD


Eur-Stockholm.Leijonhufvud.2009
Publication: Five centuries of Stockholm winter/spring temperatures reconstructed from documentary evidence and instrumental observations by ['Rob Wilson', 'Anders Moberg', 'Lotta Leijonhufvud', 'Dag Retsö', 'Johan Söderberg', 'Ulrica Söderlind']
Publication: World Data Center for Paleoclimatology by ['L. Leijonhufvud']
- Paleo Table: Eur-Stockholm.Leijonhufvud.2009.paleo1measurement1.csv


Unnamed: 0,temperature,year
0,-1.7212,1502
1,-1.6382,1503
2,-0.6422,1504
3,0.1048,1505
4,-0.7252,1506


Missing Value: NaN
- temperature [degC]
- year [yr AD]
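
The loop over `df.attrs` in the cell above is left as a stub. The sketch below shows one way such per-column metadata could be turned into the same `name [units]` labels that the variable loop prints. It assumes the mapping has the same shape as the attribute dictionary built by hand later in this notebook (keys such as `variableName` and `units`); `example_attrs` and `format_variables` are hypothetical names used only for illustration, not part of PyLiPD.

```python
# Hypothetical stand-in for df.attrs; in the notebook the real mapping
# comes from table.getDataFrame(). The key names are an assumption.
example_attrs = {
    "year": {"variableName": "year", "units": "yr AD"},
    "temperature": {"variableName": "temperature", "units": "degC"},
    "notes": {"variableName": "notes"},  # no units recorded
}


def format_variables(attrs):
    """Return one 'name [units]' label per column, dropping the brackets when units are missing."""
    labels = []
    for column, meta in attrs.items():
        # Fall back to the column key if no variableName is stored
        name = meta.get("variableName", column)
        units = meta.get("units")
        labels.append(f"{name} [{units}]" if units else name)
    return labels


labels = format_variables(example_attrs)
for label in labels:
    print(f"- {label}")
```

Working from the attribute mapping avoids a second pass over the `Variable` objects when only names and units are needed.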


## Editing an existing LiPD file

In [2]:
# Dataset is the main OOP class
from pylipd.classes.dataset import Dataset

# DataTable is the OOP class used for measurement tables
from pylipd.classes.datatable import DataTable

# LiPD is the LiPD parser/writer
from pylipd.lipd import LiPD

# Load LiPD files as usual.
# - This loads the LiPD data into the internal RDF graph
path = '../data/ODP846.Lawrence.2006.lpd'
D = LiPD()
D.load(path)

# Convert the LiPD datasets to the PyLiPD OOP "Dataset" class.
# - This lets us modify the datasets in memory via OOP calls
# - To write the LiPD back, we have to call the save function
datasets = D.get_datasets()

ds = datasets[0]
pdata = ds.getPaleoData()[0]

# ********************************************
# We will add a table to the paleoData here
# ********************************************
attrs = {
    "site": {
        'number': 1, 
        'variableName': 'site/hole', 
        'hasStandardVariable': 'site', 
        'units': 'unitless', 
        'TSid': 'PYTJ3PSH0LT', 
        'variableType': 'measured', 
        'takenAtDepth': 'depth'
    },
    "ukprime37": {
        'number': 2,
        'interpretation': [
            {
                'rank': 1.0, 
                'scope': 'Climate', 
                'variable': 'temperature', 
                'variableDetail': 'sea surface', 
                'direction': 'positive'
            }
        ], 
        'variableName': 'ukprime37', 
        'resolution': {
            'hasMaxValue': 10.856999999999971, 
            'hasMeanValue': 2.3355875057418465, 
            'hasMedianValue': 2.211999999999989, 
            'hasMinValue': 0.06999999999993634
        }, 
        'units': 'unitless', 
        'TSid': 'PYTM9N6HCQM', 
        'variableType': 'measured', 
        'takenAtDepth': 'depth', 
        'proxyObservationType': 'Uk37Prime'
    },
    "age": {
        'interpretation': [
            {
                'rank': 1.0, 
                'scope': 'Age', 
                'variableDetail': 'calendar', 
                'direction': 'positive'
            }
        ], 
        'number': 3, 
        'variableName': 'age', 
        'hasStandardVariable': 'age', 
        'TSid': 'PYTXJB98403', 
        'variableType': 'inferred', 
        'takenAtDepth': 'depth', 
        'inferredVariableType': 'Age'
    },
    "depth": {
        'number': 4, 
        'variableName': 'depth', 
        'notes': 'depth rmcd', 
        'resolution': {
            'hasMaxValue': 10.856999999999971, 
            'hasMeanValue': 2.3355875057418465, 
            'hasMedianValue': 2.211999999999989, 
            'hasMinValue': 0.06999999999993634
        }, 
        'hasStandardVariable': 'depth', 
        'units': 'm', 
        'TSid': 'PYTKRFVW61B', 
        'variableType': 'measured'
    }
}

# Create a random dataframe
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(100, 4), columns=attrs.keys())
df.attrs = attrs
display(df.head())

newtable = DataTable()
newtable.setFileName("paleo0measurement2.csv")
newtable.setMissingValue("NaN")
newtable.setDataFrame(df)

pdata.addMeasurementTable(newtable)

# Create a lipd from the Dataset ds
savelipd = LiPD()
savelipd.load_datasets([ds])
savelipd.create_lipd(ds.getName(), "./ODP846.Lawrence.2006.updated.lpd")
pass


Loading 1 LiPD files


100%|██████████| 1/1 [00:00<00:00,  1.75it/s]

Loaded..





Unnamed: 0,site,ukprime37,age,depth
0,-0.303637,0.972003,-0.758921,-1.92407
1,-0.798301,-0.431769,1.342539,-0.23497
2,-0.342608,-2.1683,0.170522,0.884602
3,-0.127563,-0.709825,-0.067036,0.747706
4,-1.551063,-1.312118,-0.047366,-0.169749
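
The attribute dictionary in the cell above is keyed by column name, and each entry carries a `number`. Before handing such a dictionary to `setDataFrame`, it can help to confirm that the keys line up with the DataFrame columns and that no `number` is repeated. The following is a minimal sketch of such a check using plain Python; `check_column_attrs` is a hypothetical helper, not a PyLiPD function, and whether PyLiPD itself requires these conditions is an assumption.

```python
def check_column_attrs(columns, attrs):
    """Return a list of problems; an empty list means columns and metadata agree."""
    problems = []

    # Every column should have a metadata entry, and vice versa
    missing = [c for c in columns if c not in attrs]
    extra = [k for k in attrs if k not in columns]
    if missing:
        problems.append(f"columns without metadata: {missing}")
    if extra:
        problems.append(f"metadata without a column: {extra}")

    # Column numbers, where present, should be unique
    numbers = [meta["number"] for meta in attrs.values() if "number" in meta]
    if len(numbers) != len(set(numbers)):
        problems.append("duplicate 'number' values")

    return problems


# Trimmed-down, hypothetical version of the attribute dictionary
example_attrs = {
    "site": {"number": 1, "variableName": "site/hole"},
    "ukprime37": {"number": 2, "variableName": "ukprime37"},
    "age": {"number": 3, "variableName": "age"},
    "depth": {"number": 4, "variableName": "depth"},
}

ok = check_column_attrs(["site", "ukprime37", "age", "depth"], example_attrs)
bad = check_column_attrs(["site", "ukprime37", "age"], example_attrs)
print(ok)
print(bad)
```

With a real DataFrame, the first argument would be `list(df.columns)` and the second `df.attrs`.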


In [3]:
# Try to load the New LiPD File, and check if everything is ok
D = LiPD()
D.load("./ODP846.Lawrence.2006.updated.lpd")

Loading 1 LiPD files


100%|██████████| 1/1 [00:00<00:00,  1.77it/s]

Loaded..





In [4]:
# Try to load the New LiPD File, and check if everything is ok
D = LiPD()
D.load("./ODP846.Lawrence.2006.updated.lpd")
datasets = D.get_datasets()

# Browse the dataset as usual
for ds in datasets:
    print("\n")
    print(ds.getName())
    print("=========================")
    for pdata in ds.getPaleoData():
        for table in pdata.getMeasurementTables():
            print(f"- Paleo Table: {table.getFileName()}")
            
            # Can get the dataframe for the whole table
            df = table.getDataFrame(use_standard_names=True)
            display(df.head())

            # The returned dataframe also contains the attributes for the variables
            for varname in df.attrs:
                vardata = df.attrs[varname]
                # print(varname, vardata)

            # Can also get the variables one by one and make calls to their class functions
            print(f"Missing Value: {table.getMissingValue()}")
            for var in table.getVariables():
                if var.getUnits():
                    print(f"- {var.getName()} [{var.getUnits().getLabel()}]")
                else:
                    print(f"- {var.getName()}")

Loading 1 LiPD files


100%|██████████| 1/1 [00:00<00:00,  1.86it/s]

Loaded..


ODP846.Lawrence.2006
- Paleo Table: paleo0measurement1.csv





Unnamed: 0,event,depth cr,depth,c. wuellerstorfi d13c,u. peregrina d18o,depth comp,sampleID,u. peregrina d13c,c. wuellerstorfi d18o
0,138-846B,0.12,0.12,3.38,0.14,12.0,138-846B-1H-1,,0.12
1,138-846B,0.23,0.23,3.46,0.01,23.0,138-846B-1H-1,,0.23
2,138-846B,0.33,0.33,3.65,-0.1,33.0,138-846B-1H-1,,0.33
3,138-846B,0.33,0.33,3.88,-0.06,33.0,138-846B-1H-1,,0.33
4,138-846B,0.43,0.43,4.14,-0.17,43.0,138-846B-1H-1,,0.43


Missing Value: nan
- event [unitless]
- depth cr
- depth [m]
- c. wuellerstorfi d13c
- u. peregrina d18o
- depth comp
- sampleID [unitless]
- u. peregrina d13c
- c. wuellerstorfi d18o
- Paleo Table: paleo0measurement2.csv


Unnamed: 0,age,ukprime37,site,depth
0,-0.758921,0.972003,-0.303637,-1.92407
1,1.342539,-0.431769,-0.798301,-0.23497
2,0.170522,-2.1683,-0.342608,0.884602
3,-0.067036,-0.709825,-0.127563,0.747706
4,-0.047366,-1.312118,-1.551063,-0.169749


Missing Value: NaN
- age
- ukprime37 [unitless]
- site [unitless]
- depth [m]
- Paleo Table: paleo0measurement0.csv


Unnamed: 0,c37 total,temp muller,deleteMe,site,section,temp prahl,ukprime37,age,depth
0,2.37,23.545,15-16,846B,1H-1,23.0,0.821,5.228,0.16
1,2.1,23.648,25-26,846B,1H-1,23.1,0.824,8.947,0.26
2,1.87,23.752,35-36,846B,1H-1,23.2,0.828,11.966,0.36
3,2.74,22.515,45-46,846B,1H-1,22.0,0.787,14.427,0.46
4,3.75,22.206,55-56,846B,1H-1,21.7,0.777,16.502,0.56


Missing Value: nan
- c37 total
- temp muller
- deleteMe [cm]
- site [unitless]
- section [unitless]
- temp prahl
- ukprime37 [unitless]
- age
- depth [m]


## Creating a new LiPD file

In [5]:
from pylipd.classes.dataset import Dataset
from pylipd.classes.archivetype import ArchiveTypeConstants
from pylipd.classes.funding import Funding
from pylipd.classes.interpretation import Interpretation
from pylipd.classes.interpretationvariable import InterpretationVariableConstants
from pylipd.classes.location import Location
from pylipd.classes.paleodata import PaleoData
from pylipd.classes.datatable import DataTable
from pylipd.classes.paleounit import PaleoUnitConstants
from pylipd.classes.paleovariable import PaleoVariableConstants
from pylipd.classes.person import Person
from pylipd.classes.publication import Publication
from pylipd.classes.resolution import Resolution
from pylipd.classes.variable import Variable

import json

dataset1 = Dataset()

# Set the name of the dataset
dataset1.setName("TestDataset.2024")
dataset1.id = dataset1.ns + "/" + dataset1.getName()

# Set collection name
dataset1.setCollectionName("TestCollection")

# Set the Archive Type (from a list of constants)
dataset1.setArchiveType(ArchiveTypeConstants.Coral)

# Add a publication
pub1 = Publication()
pub1.setTitle("Sample Publication Title")
person1 = Person()
person1.setName("Deborah Khider")
person2 = Person()
person2.setName("Varun Ratnakar")
pub1.setAuthors([person1, person2])
# Add the publication to the dataset
dataset1.addPublication(pub1)

# Add funding information
funding1 = Funding()
funding1.addGrant("NSF Grant 23423A")
funding1.addInvestigator(person1)
# Add funding to the dataset
dataset1.addFunding(funding1)

# Add location information
loc1 = Location()
loc1.setLatitude("24.21232")
loc1.setLongitude("48.32323")
loc1.setElevation("342")
loc1.setCountry("USA")
# Set location for the dataset
dataset1.setLocation(loc1)

# Create Paleodata table
table1 = DataTable()
table1.setFileName("paleo0measurement1.csv")
table1.setMissingValue("NaN")

# Populate the table with variable data
#
# Option 1: Via a Dataframe with attributes:
# ------------------------------------------
# Create a random dataframe
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(100, 2), columns=["site", "ukprime37"])
# Set column attributes
df.attrs = {
    "site": {
        'number': 1, 
        'variableName': 'site/hole', 
        'units': 'unitless', 
        'TSid': 'PYTJ3PSH0LT', 
        'variableType': 'measured', 
        'takenAtDepth': 'depth'
    },
    "ukprime37": {
        'number': 2,
        'interpretation': [
            {
                'rank': 1.0, 
                'scope': 'Climate', 
                'variable': 'temperature', 
                'variableDetail': 'sea surface', 
                'direction': 'positive'
            }
        ], 
        'variableName': 'ukprime37', 
        'resolution': {
            'hasMaxValue': 10.856999999999971, 
            'hasMeanValue': 2.3355875057418465, 
            'hasMedianValue': 2.211999999999989, 
            'hasMinValue': 0.06999999999993634
        }, 
        'TSid': 'PYTM9N6HCQM', 
        'variableType': 'measured', 
        'takenAtDepth': 'depth' 
    }
}
table1.setDataFrame(df)

# Create Another Paleodata table by setting variables with OOP calls
table2 = DataTable()
table2.setFileName("paleo0measurement2.csv")
table2.setMissingValue("NaN")
#
# Option 2: Via OOP Calls to create each variable
# -----------------------------------------------
# Add a variable
var1 = Variable()
var1.setName("site")
var1.setColumnNumber(1)
var1.setVariableId("PYTJ3PSH0LT")
var1.setVariableType("measured")
var1.set_non_standard_property("takenAtDepth", "depth")
# Set random values for this variable
var1.setValues(json.dumps(np.random.randn(100).tolist()))

# Add another variable
var2 = Variable()
var2.setName("ukprime37")
var2.setColumnNumber(2)
var2.setVariableId('PYTM9N6HCQM')
var2.setVariableType('measured')
var2.setStandardVariable(PaleoVariableConstants.Uk37)
var2.setUnits(PaleoUnitConstants.cm3)

# Add the variable interpretation
interp1 = Interpretation()
interp1.setRank("1")
interp1.setScope("Climate")
interp1.setVariable(InterpretationVariableConstants.temperature)
interp1.setVariableDetail("sea surface")
interp1.setDirection("positive")
var2.addInterpretation(interp1)
# Add the variable resolution
resolution1 = Resolution()
resolution1.setMaxValue(10.856999999999971)
resolution1.setMeanValue(2.3355875057418465)
resolution1.setMedianValue(2.211999999999989)
resolution1.setMinValue(0.06999999999993634)
var2.setResolution(resolution1)

# Set random values for this variable
var2.setValues(json.dumps(np.random.randn(100).tolist()))

table2.setVariables([var1, var2])

# Create Paleodata, and add the created tables to it
paleodata1 = PaleoData()
paleodata1.setMeasurementTables([table1, table2])

dataset1.addPaleoData(paleodata1)
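
In Option 2 above, the values for each variable are passed to `setValues` after going through `json.dumps`, which suggests they travel as a JSON-encoded list. The sketch below uses only the standard library to show what that encoding looks like and how it round-trips. How PyLiPD handles missing values internally is not shown in this notebook, so the `NaN` part only illustrates Python's `json` behaviour, not PyLiPD's.

```python
import json
import math

# A short list of floats standing in for np.random.randn(100).tolist()
values = [0.5, -1.25, 3.0]

# json.dumps turns the list into a plain string
encoded = json.dumps(values)
print(type(encoded).__name__, encoded)

# json.loads recovers the original list of floats
decoded = json.loads(encoded)
print(decoded == values)

# Python's json module writes NaN as the bare token NaN by default
# (allow_nan=True). This is not strict JSON, but json.loads reads it back.
nan_encoded = json.dumps([1.0, float("nan")])
nan_decoded = json.loads(nan_encoded)
print(nan_encoded, math.isnan(nan_decoded[1]))
```

If the values come from a NumPy array, calling `.tolist()` first (as the cell above does) converts them to plain Python floats that `json.dumps` can serialize.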

In [6]:
# Browse the dataset as usual
print(dataset1.getName())
print("=========================")
for pdata in dataset1.getPaleoData():
    for table in pdata.getMeasurementTables():
        print(f"- Paleo Table: {table.getFileName()}")
        
        # Can get the dataframe for the whole table
        df = table.getDataFrame(use_standard_names=True)
        display(df.head())

        # The returned dataframe also contains the attributes for the variables
        for varname in df.attrs:
            vardata = df.attrs[varname]
            # print(varname, vardata)

        # Can also get the variables one by one and make calls to their class functions
        print(f"Missing Value: {table.getMissingValue()}")
        for var in table.getVariables():
            if var.getUnits():
                print(f"- {var.getName()} [{var.getUnits().getLabel()}]")
            else:
                print(f"- {var.getName()}")

TestDataset.2024
- Paleo Table: paleo0measurement1.csv


Unnamed: 0,site/hole,ukprime37
0,0.561119,-1.499768
1,-0.929443,-0.169344
2,0.41473,1.894262
3,-0.651496,0.996514
4,-0.263709,0.307729


Missing Value: NaN
- site/hole [unitless]
- ukprime37
- Paleo Table: paleo0measurement2.csv


Unnamed: 0,site,Uk37
0,-1.349216,-1.102813
1,0.060254,1.451194
2,-1.43606,-0.960168
3,0.746369,-2.491131
4,1.431329,-0.530799


Missing Value: NaN
- site
- ukprime37 [cm3]


In [7]:
from pylipd.lipd import LiPD

# Create a LiPD file from the newly created Dataset dataset1
lipd1 = LiPD()
lipd1.load_datasets([dataset1])
lipd1.create_lipd(dataset1.getName(), "./TestDataset.2024.lpd")
pass

In [8]:
# Load and browse the newly created LiPD file
from pylipd.classes.dataset import Dataset
from pylipd.lipd import LiPD
path = 'TestDataset.2024.lpd'
D = LiPD()
D.load(path)

datasets = D.get_datasets()

for ds in datasets:
    # Now we can call individual functions on the dataset to get its details
    print("\n")
    print(ds.getName())
    print("=========================")
    for funding in ds.getFundings():
        if funding.getGrants():
            print(f"Funding: {funding.getGrants()}")
    
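    for pub in ds.getPublications():
        authors = [author.getName() for author in pub.getAuthors()]
        print(f"Publication: {pub.getTitle()} by {authors}")

    # Use "pdata" rather than "pd" so the loop variable does not shadow the pandas alias imported earlier
    for pdata in ds.getPaleoData():
        for table in pdata.getMeasurementTables():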
            print(f"- Paleo Table: {table.getFileName()}")
            
            # Can get the dataframe for the whole table
            df = table.getDataFrame(use_standard_names=True)
            display(df.head())

            # The returned dataframe also contains the attributes for the variables
            for varname in df.attrs:
                vardata = df.attrs[varname]
                # print(varname, vardata)

            # Can also get the variables one by one and make calls to their class functions
            print(f"Missing Value: {table.getMissingValue()}")
            for var in table.getVariables():
                if var.getUnits():
                    print(f"- {var.getName()} [{var.getUnits().getLabel()}]")
                else:
                    print(f"- {var.getName()}")

Loading 1 LiPD files


100%|██████████| 1/1 [00:00<00:00, 74.43it/s]

Loaded..


TestDataset.2024
Funding: ['NSF Grant 23423A']
Publication: Sample Publication Title by ['Deborah Khider', 'Varun Ratnakar']
- Paleo Table: paleo0measurement1.csv





Unnamed: 0,ukprime37,site
0,-1.499768,0.561119
1,-0.169344,-0.929443
2,1.894262,0.41473
3,0.996514,-0.651496
4,0.307729,-0.263709


Missing Value: NaN
- ukprime37
- site/hole [unitless]
- Paleo Table: paleo0measurement2.csv


Unnamed: 0,site,Uk37
0,-1.349216,-1.102813
1,0.060254,1.451194
2,-1.43606,-0.960168
3,0.746369,-2.491131
4,1.431329,-0.530799


Missing Value: NaN
- site
- Uk37 [cm3]
