diff --git a/.gitignore b/.gitignore
index b29066a..77badb8 100644
--- a/.gitignore
+++ b/.gitignore
@@ -32,3 +32,4 @@
 notebooks/.ipynb_checkpoints/L0_loading_lipd_datasets-checkpoint.ipynb
 notebooks/.ipynb_checkpoints/L0_lipd_object-checkpoint.ipynb
 .DS_Store
 data/~$Oman.Tian.2023.xlsx
+.DS_Store
diff --git a/data/Oman.Tian.2023.lpd b/data/Oman.Tian.2023.lpd
new file mode 100644
index 0000000..b5a1274
Binary files /dev/null and b/data/Oman.Tian.2023.lpd differ
diff --git a/notebooks/L3_create_template.ipynb b/notebooks/L3_create_template.ipynb
index de4eddf..315eee5 100644
--- a/notebooks/L3_create_template.ipynb
+++ b/notebooks/L3_create_template.ipynb
@@ -18,7 +18,7 @@
     "\n",
     "If you are planning to only create one LiPD file on your own, we recommend using the [LiPD Playground](https://lipd.net/playground). This tutorial is intended for users who wish to programatically create multiple files from a template. \n",
     "\n",
-    "In this example, we use [this templated file](https://github.com/LinkedEarth/pylipdTutorials/blob/main/data/Oman.Tian.2023.xlsx).You can repurpose the Excel template as needed; it is only meant as an example. \n",
+    "In this example, we use [this templated file](https://github.com/LinkedEarth/pylipdTutorials/blob/main/data/Oman.Tian.2023.xlsx). You can repurpose the Excel template as needed; it is only meant as an example. \n",
     "\n",
     "### Goals\n",
     "\n",
@@ -26,7 +26,7 @@
     "* Adding an ensemble table \n",
     "* Save the Dataset to a file\n",
     "\n",
-    "Reading Time: 10 minutes\n",
+    "Reading Time: 40 minutes\n",
     "\n",
     "### Keywords\n",
     "\n",
@@ -34,7 +34,47 @@
     "\n",
     "### Pre-requisites\n",
     "\n",
-    "An understanding of OOP and the LinkedEarth Ontology. Completion of [Dataset class example](L3_dataset_class.ipynb). An understanding how to [edit LiPD files](L3_editing.ipynb) can also be useful. \n",
+    "An understanding of OOP and the LinkedEarth Ontology. Completion of [Dataset class example](L3_dataset_class.ipynb). An understanding of how to [edit LiPD files](L3_editing.ipynb) can also be useful.\n",
+    "\n",
+    "<div class=\"alert alert-block alert-info\">
\n", + " Note: Please read the pre-requisites below as it contains valuable information about the LiPD structure, the LinkedEarth Ontology and their relationship to the classes and methods available in PyLiPD.\n", + "
\n", + "\n", + "For reference, below is a diagram of the classed in `PyliPD`, the methods associated with them, and the resulting objects:\n", + "\n", + "![image](https://github.com/LinkedEarth/pylipd/blob/main/examples/notebooks/UMLDiagram.png?raw=true)\n", + "\n", + "This diagram will help you create the objects and place them in the appropriate nested structure. Each object is associated with methods that allow you to look at the information present or create that information. Have a look at the [documentation on the LiPD classes module](https://pylipd.readthedocs.io/en/latest/api.html#lipd-classes). If you click on any of the classes, you should notice a pattern in the associated methods:\n", + "* `get` + PropertyName allows you to retrieve to values associated with a property\n", + "* `set` + PropertyName allows you to set or change the value for an exisiting property value with another one of type string, float, integer, boolean. If the property value is a list, set will replace any exisitng value already present in the metadata (refer to the diagram below for the expected type). \n", + "* `add` + PropertyName allows you to set or add a value for an exisiting property that takes a list.\n", + "\n", + "In addition, there are two functionalies that allow you to add your custom properties: `set_non_standard_property` and `add_non_standard_property`. For now, these properties can only be used for values that do not require a new class to be created. \n", + "\n", + "
\n", + " Warning: LiPD uses a standard vocabulary for some information.\n", + "
\n", + "\n", + "In order to support querying, LiPD files have several standardized fields, which corresponds in most cases to the terms that NOAA World Data Center for paleoclimatology has also standardized:\n", + "* [archiveType](https://lipdverse.org/vocabulary/archivetype/)\n", + "* [interpretation_seasonality](https://lipdverse.org/vocabulary/interpretation_seasonality/)\n", + "* [interpretation_variable](https://lipdverse.org/vocabulary/interpretation_variable/)\n", + "* [paleoData_proxy](https://lipdverse.org/vocabulary/paleodata_proxy/)\n", + "* [paleoData_units](https://lipdverse.org/vocabulary/paleodata_units/)\n", + "* [paleoData_variableName](https://lipdverse.org/vocabulary/paleodata_variablename/)\n", + "\n", + "Consequently, we have represented these names as objects in the Ontology. Therefore, some information that would naturally be entered as a string (e.g., `coral` as an `archiveType`) should result in an object creation. `PyLiPD` can help you with this, as long as you don't forget to create an object!\n", + "\n", + "There is one big exception to this general rule: `paleoData_variableName`. We understand that in many instances, the name contains more than just the variable name. For instance, paleoceanographers making measurements on the shells of foraminifera often report their variable name with the species name and the geochemical measurement (e.g., \"G. ruber Mg/Ca\"). To preserve this information at the variable level, the [`setName` method](https://pylipd.readthedocs.io/en/latest/api.html#pylipd.classes.variable.Variable.setName) takes a string. To store the `paleoData_variableName` information following the standard vocabulary, use the [`setStandardVariable` method](https://pylipd.readthedocs.io/en/latest/api.html#pylipd.classes.variable.Variable.setStandardVariable).\n", + "\n", "\n", "## Data Description\n", "\n", @@ -47,21 +87,21 @@ }, { "cell_type": "code", - "execution_count": 19, + "execution_count": 22, "id": "a2583558-9d41-4f02-b4c3-441666dfd6e1", "metadata": {}, "outputs": [], "source": [ "from pylipd.classes.dataset import Dataset\n", - "from pylipd.classes.archivetype import ArchiveTypeConstants\n", + "from pylipd.classes.archivetype import ArchiveTypeConstants, ArchiveType\n", "from pylipd.classes.funding import Funding\n", "from pylipd.classes.interpretation import Interpretation\n", - "from pylipd.classes.interpretationvariable import InterpretationVariableConstants\n", + "from pylipd.classes.interpretationvariable import InterpretationVariableConstants, InterpretationVariable\n", "from pylipd.classes.location import Location\n", "from pylipd.classes.paleodata import PaleoData\n", "from pylipd.classes.datatable import DataTable\n", - "from pylipd.classes.paleounit import PaleoUnitConstants\n", - "from pylipd.classes.paleovariable import PaleoVariableConstants\n", + "from pylipd.classes.paleounit import PaleoUnitConstants, PaleoUnit\n", + "from pylipd.classes.paleovariable import PaleoVariableConstants, PaleoVariable\n", "from pylipd.classes.person import Person\n", "from pylipd.classes.publication import Publication\n", "from pylipd.classes.resolution import Resolution\n", @@ -69,8 +109,11 @@ "from pylipd.classes.model import Model\n", "from pylipd.classes.chrondata import ChronData\n", "\n", + "from pylipd import LiPD\n", + "\n", "import pandas as pd\n", "import json\n", + "import numpy as np\n", "\n", "import re" ] @@ -94,17 +137,31 @@ "\n", "Let's start with the root metadata portion.\n", "\n", - "#### Metadata" + "### Metadata\n", + "\n", + "If 
you have a look at the Metadata sheet in the Excel file, you should notice that the information is organized in four sections:\n",
    "* root metadata, which contains general information about the dataset such as its name and the type of archive the measurements were made on (e.g., coral, speleothem).\n",
    "* Publication information - Note that if more than one publication is associated with the dataset, then this information can be added in separate columns. \n",
    "* Location information\n",
    "* Funding information\n",
    "\n",
    "The first step is to create a function to separate these different sections into four dataframes:"
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 15,
+   "execution_count": 24,
   "id": "41bb1f05-59ac-4d28-a87a-8343f47169ee",
   "metadata": {},
   "outputs": [],
   "source": [
    "def read_metadata(df):\n",
+    "    '''\n",
+    "    Reads the information contained in the metadata sheet of the Excel template.\n",
+    "    Note that the algorithm uses the blank lines in the template to denote each block (e.g., publication).\n",
+    "\n",
+    "    The code returns 4 pieces of information: root metadata, location metadata, funding metadata, and publication metadata.\n",
+    "    '''\n",
    "    # Check for empty rows across all columns\n",
    "    empty_rows = df.isnull().all(axis=1)\n",
    "    \n",
@@ -146,7 +203,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 16,
+   "execution_count": 25,
   "id": "7a7aef60-3dd9-4f3d-844c-3c01f15c2382",
   "metadata": {},
   "outputs": [],
@@ -159,7 +216,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 17,
+   "execution_count": 26,
   "id": "bece25be-7ad6-4d27-ac3d-663bd4aef875",
   "metadata": {},
   "outputs": [],
@@ -178,13 +235,2520 @@
  },
  {
   "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 27,
   "id": "caf41a37-4a2e-47da-b460-24b4fc1fa012",
   "metadata": {},
   "outputs": [],
   "source": [
    "ds = Dataset()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "334bcd6f-5044-4ed1-9693-81fd03826366",
   "metadata": {},
   "source": [
    "Let's go over the information that we have stored in Pandas DataFrames, namely some root information such as the name of the dataset, the geographical location, information about the source of funding, and the publication(s) associated with the data. \n",
    "\n",
    "#### Root metadata\n",
    "\n",
    "Let's start with the root information:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "a739f953-9896-4494-9d40-3073c5f9d34c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
" + ], + "text/plain": [ + " Unnamed: 0 \\\n", + "0 Dataset Name (siteName.firstAuthor.year) \n", + "1 Archive Type \n", + "2 Original Source_URL (if applicable) \n", + "3 Investigators (Lastname, first; lastname2, fir... \n", + "\n", + " Unnamed: 1 Unnamed: 2 \n", + "0 Oman.Tian.2023 NaN \n", + "1 speleothem NaN \n", + "2 https://www.nature.com/articles/s41467-023-404... NaN \n", + "3 Tian, Y., Fleitmann, D., Zhang, Q., Sha, L. J,... NaN " + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "root" + ] + }, + { + "cell_type": "markdown", + "id": "cf596936-2859-471e-b577-59fb5f2e1ebb", + "metadata": {}, + "source": [ + "Let's add this information to the `Dataset` object. You can see a list of possible properties under the [`Dataset` class in the documentation](https://pylipd.readthedocs.io/en/latest/api.html#pylipd.classes.dataset.Dataset). Remember from our previous tutorial that `set+PropertyName` is meant to create the information, which is what we will be doing here. If you cannot find you property in the list, don't panic! The LiPD format is flexible so you can add your own properties using the [`set_non_standard_property(key,value)` function](https://pylipd.readthedocs.io/en/latest/api.html#pylipd.classes.dataset.Dataset.set_non_standard_property). \n", + "\n", + "Also, remember that archiveType needs to be set as an object. To do so, we can use the `.from_synomym` method that is applicable to all objects making up the standard vocabulary (see the pre-requistes section from this notebook for more details):" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "ce979af8-7a06-4424-9f03-def613e2bf4d", + "metadata": {}, + "outputs": [], + "source": [ + "ds.setName(root.iloc[0,1]) # set the name for the dataset\n", + "archiveType = ArchiveType.from_synonym(root.iloc[1,1]) # Try to identify a standard archiveType name from the vocabulary. \n", + "if archiveType is not None:\n", + " ds.setArchiveType(archiveType) #set the archive type\n", + "else:\n", + " raise ValueError('This archiveType is not part of the standard vocabulary; check your spelling or create a new one')\n", + "ds.setOriginalDataUrl(root.iloc[2,1]) # set the original data URL" + ] + }, + { + "cell_type": "markdown", + "id": "7c8b80e0-8720-4437-b56e-b84f0956b23b", + "metadata": {}, + "source": [ + "The [`ArchiveTypeConstant` object](https://pylipd.readthedocs.io/en/latest/api.html#pylipd.classes.archivetype.ArchiveTypeConstants) also allows you to see what is available in the standard vocabulary and set it directly as:\n", + "\n", + "```ds.setArchiveType(ArchiveTypeConstants.Speleothem)```\n", + "\n", + "Just make sure you impart the `ArchiveTypeConstants` object first. " + ] + }, + { + "cell_type": "markdown", + "id": "fc89e13d-2f47-41a0-816c-6c298ec41f94", + "metadata": {}, + "source": [ + "Let's check if the information was stored properly into a `ArchiveType` object:" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "id": "8cedfdb9-9b22-4828-b39b-6e485bca684f", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'Speleothem'" + ] + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ds.getArchiveType().label" + ] + }, + { + "cell_type": "markdown", + "id": "45863daa-28b9-4f4b-936e-4632a13cd30a", + "metadata": {}, + "source": [ + "Everything looks good! 
But what happens if you trigger the `ValueError` in the cell above?\n",
    "\n",
    "##### Dealing with new standard terms\n",
    "\n",
    "The error raises two possibilities:\n",
    "1. You misspelled the name. The `.from_synonym` function only makes exact matches with terms found in the thesaurus. Using a language model to guess automatically could introduce errors: this may not be a problem for archive names, but variable names involving isotopes can differ only slightly (e.g., d18O and d17O) while carrying very different scientific interpretations, and a language model may treat them as the same item. In an abundance of caution, we decided not to allow fuzzy matches.\n",
    "2. You are actually creating a new archiveType. In this case, you need to create a new class in the ontology to represent your new archiveType. In short, you need to create a new archiveType object in `PyLiPD`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "10b0724e-9052-4fac-baa9-825b9c7c5318",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pylipd.globals.urls import ARCHIVEURL\n",
    "mynewarchive = ArchiveType(f\"{ARCHIVEURL}#MyArchiveType\", \"MyArchiveType\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "358fa813-79d1-4920-a076-a2f1be502b69",
   "metadata": {},
   "source": [
    "You have now created a new `ArchiveType` object that can be inserted in your dataset. Just make sure that the `label` and `id` fields look like the following: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "id": "3c10d695-1259-4a01-b810-82ca62f05b13",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "label: MyArchiveType\n",
      "ID: http://linked.earth/ontology/archive#MyArchiveType\n"
     ]
    }
   ],
   "source": [
    "print('label: '+mynewarchive.label)\n",
    "print('ID: '+mynewarchive.id)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5a23cfc4-62b8-4edd-ba56-61d71eaabdf6",
   "metadata": {},
   "source": [
    "Remember that you will need to do this for every field that has been standardized in the vocabulary:\n",
    "* [archiveType](https://lipdverse.org/vocabulary/archivetype/)\n",
    "* [interpretation_seasonality](https://lipdverse.org/vocabulary/interpretation_seasonality/)\n",
    "* [interpretation_variable](https://lipdverse.org/vocabulary/interpretation_variable/)\n",
    "* [paleoData_proxy](https://lipdverse.org/vocabulary/paleodata_proxy/)\n",
    "* [paleoData_units](https://lipdverse.org/vocabulary/paleodata_units/)\n",
    "* [paleoData_variableName](https://lipdverse.org/vocabulary/paleodata_variablename/)\n",
    "\n",
    "As with the `ArchiveType`, each of these has a corresponding `ObjectConstants` that can be used to see what terms are already available. See the [LiPD Controlled Vocabulary module](https://pylipd.readthedocs.io/en/latest/api.html#lipd-controlled-vocabulary) for details.",
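    "\n",
    "As a quick sketch (assuming, per the imports above, that the constants are exposed as plain class attributes), you can peek at the available terms programmatically:\n",
    "\n",
    "```python\n",
    "from pylipd.classes.paleounit import PaleoUnitConstants\n",
    "\n",
    "# List a few of the standardized unit names bundled with PyLiPD\n",
    "print([t for t in dir(PaleoUnitConstants) if not t.startswith('_')][:5])\n",
    "```"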
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b09994c7-1811-4f5b-82de-9ef17343811e",
   "metadata": {},
   "source": [
    "The next step is to enter the investigators, which takes a list of `Person` objects (see Figure in the Preamble): "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "0f166950-75e9-4a0c-9b6d-1290354e3915",
   "metadata": {},
   "outputs": [],
   "source": [
    "authors = root.iloc[3,1]\n",
    "\n",
    "# Step 1: Split the string by commas\n",
    "parts = authors.split(',')\n",
    "\n",
    "# Prepare a list to hold the formatted names\n",
    "investigators = []\n",
    "\n",
    "# Step 2: Iterate over the parts to process each\n",
    "for i in range(0, len(parts) - 1, 2): # Step by 2 since each name and initial are next to each other\n",
    "    last_name = parts[i].strip() # Remove any leading/trailing whitespace\n",
    "    initial = parts[i + 1].strip() # The initial follows the last name\n",
    "    person = Person() # create the Person object\n",
    "    person.setName(f\"{last_name}, {initial}\")\n",
    "    investigators.append(person)\n",
    "\n",
    "# Step 3: Store the list of Persons into the ds object\n",
    "ds.setInvestigators(investigators)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6b9a39eb-c483-437f-ad50-ecd1c0f5aa35",
   "metadata": {},
   "source": [
    "Let's have a quick look at what we've done so far:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "138f58c7-7c7b-4c67-895e-7be9352c63e9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Oman.Tian.2023'"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ds.getName()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "id": "79a5a30a-00ae-4898-8f62-2605e61f7f68",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Tian, Y.'"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ds.getInvestigators()[0].getName() #get the name of the first person in the list"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "598b0709-e35e-4b75-891e-b79f28f514c0",
   "metadata": {},
   "source": [
    "Everything looks good so far. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e345982a-2288-46fe-b172-c4bd2a6c4fd6",
   "metadata": {},
   "source": [
    "#### Publication metadata"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "id": "d1ccb398-a1ea-4c8f-bbc8-38d44b6cebf1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
" + ], + "text/plain": [ + "5 Publication Section \\\n", + "0 Authors (last, first; last2, first2; separate ... \n", + "1 Publication title \n", + "2 Journal \n", + "3 Year \n", + "4 Volume \n", + "5 Issue \n", + "6 Pages \n", + "7 Report Number \n", + "8 DOI \n", + "9 Abstract \n", + "10 Alternate citation in paragraph format (For bo... \n", + "\n", + "5 Ref #1 Ref #2 \n", + "0 Tian, Y., Fleitmann, D., Zhang, Q., Sha, L. J,... NaN \n", + "1 Holocene climate change in southern Oman decip... NaN \n", + "2 Nature Communications NaN \n", + "3 2023 NaN \n", + "4 NaN NaN \n", + "5 NaN NaN \n", + "6 NaN NaN \n", + "7 NaN NaN \n", + "8 10.1038/s41467-023-40454-z NaN \n", + "9 Qunf Cave oxygen isotope (δ18Oc) record from s... NaN \n", + "10 NaN NaN " + ] + }, + "execution_count": 40, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pub" + ] + }, + { + "cell_type": "markdown", + "id": "51e3f097-31a6-4d1c-862a-01e60f179988", + "metadata": {}, + "source": [ + "The first step is to create a [`Publication` object](https://pylipd.readthedocs.io/en/latest/api.html#pylipd.classes.publication.Publication). In this case, we only have one publication; otherwise you may need to loop over the various columns in the dataframe to create others. " + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "id": "f4b516e3-e7f4-4f11-a1c0-e757c44e5363", + "metadata": {}, + "outputs": [], + "source": [ + "pub1 = Publication()" + ] + }, + { + "cell_type": "markdown", + "id": "b6ac7a77-3ff9-47cc-828d-b2e06b303a7b", + "metadata": {}, + "source": [ + "And now let's add the information. Let's start with the authors:" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "id": "80523c9f-5ffe-45bb-9063-7e96356971b3", + "metadata": {}, + "outputs": [], + "source": [ + "authors = pub.iloc[0,1]\n", + "\n", + "# Step 1: Split the string by commas\n", + "parts = authors.split(',')\n", + "\n", + "# Prepare a list to hold the formatted names\n", + "investigators = []\n", + "\n", + "# Step 2: Iterate over the parts to process each\n", + "for i in range(0, len(parts) - 1, 2): # Step by 2 since each name and initial are next to each other\n", + " last_name = parts[i].strip() # Remove any leading/trailing whitespace\n", + " initial = parts[i + 1].strip() # The initial follows the last name\n", + " person = Person() # create the Person object\n", + " person.setName(f\"{last_name}, {initial}\")\n", + " investigators.append(person)\n", + "\n", + "# Step 3: Store the list of Persons into the ds object\n", + "pub1.setAuthors(investigators)" + ] + }, + { + "cell_type": "markdown", + "id": "3cfce393-b5a4-4c89-bb31-378ada81a265", + "metadata": {}, + "source": [ + "Let's add other information:" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "id": "088c420f-4549-42be-9849-02e2eda13070", + "metadata": {}, + "outputs": [], + "source": [ + "pub1.setTitle(pub.iloc[1,1])\n", + "pub1.setJournal(pub.iloc[2,1])\n", + "pub1.setYear(pub.iloc[3,1])\n", + "pub1.setDOI(pub.iloc[8,1])\n", + "pub1.setAbstract(pub.iloc[9,1])" + ] + }, + { + "cell_type": "markdown", + "id": "313f4ccb-3de8-4c32-974d-03cf867970bb", + "metadata": {}, + "source": [ + "Let's add our `Publication` object to the `Dataset` object:" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "id": "71bc4205-9f09-4a8d-ba41-4f2573bc925b", + "metadata": {}, + "outputs": [], + "source": [ + "ds.addPublication(pub1)" + ] + }, + { + "cell_type": "markdown", + "id": "73d74efb-5222-46eb-bf57-558ee7b3302b", + "metadata": {}, + 
"source": [ + "#### Location metadata\n", + "\n", + "Let's add geographical information next. First, we need to create a [`Location` object](https://pylipd.readthedocs.io/en/latest/api.html#pylipd.classes.location.Location):" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "id": "f63e4e51-bf68-46b3-b36b-58ffe8dd2f55", + "metadata": {}, + "outputs": [], + "source": [ + "loc = Location()" + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "id": "29763ff6-ec1a-4a63-9d68-fe4da739f6b2", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "18 Site Information \\\n", + "0 Northernmost latitude (decimal degree, South n... \n", + "1 Southernmost latitude (decimal degree, South n... \n", + "2 Easternmost longitude (decimal degree, West ne... \n", + "3 Westernmost longitude (decimal degree, West ne... \n", + "4 elevation (m), below sea level negative \n", + "\n", + "18 Use appropriate significant digits for all values NaN \n", + "0 17.17 NaN \n", + "1 17.17 NaN \n", + "2 54.3 NaN \n", + "3 54.3 NaN \n", + "4 650 NaN " + ] + }, + "execution_count": 47, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "geo" + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "id": "2e22bef4-4c3d-4248-b882-292587d8c4d4", + "metadata": {}, + "outputs": [], + "source": [ + "loc.setLatitude(geo.iloc[0,1])\n", + "loc.setLongitude(geo.iloc[2,1])\n", + "loc.setElevation(geo.iloc[4,1])" + ] + }, + { + "cell_type": "markdown", + "id": "faff6e86-0de6-45f6-a8b7-eaba6032d696", + "metadata": {}, + "source": [ + "Let's add the `Location` object into the `Dataset`:" + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "id": "affb61d5-1b64-4ef0-a9be-89ddb8efcf29", + "metadata": {}, + "outputs": [], + "source": [ + "ds.setLocation(loc)" + ] + }, + { + "cell_type": "markdown", + "id": "235f1b08-297d-492c-9d31-865a746fcc9f", + "metadata": {}, + "source": [ + "#### Funding metadata\n", + "\n", + "Finally, let's look at funding information:" + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "id": "0bf779ea-32e1-48c7-b56c-b24a2fd300b0", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "26 Funding_Agency \\\n", + "0 Funding_Agency_Name \n", + "1 Grant \n", + "2 Principal_Investigator \n", + "3 country \n", + "\n", + "26 Any additional Funding agencies and grants should be entered in Columns C,D, etc. \\\n", + "0 NaN \n", + "1 NaN \n", + "2 NaN \n", + "3 NaN \n", + "\n", + "26 NaN \n", + "0 NaN \n", + "1 NaN \n", + "2 NaN \n", + "3 NaN " + ] + }, + "execution_count": 50, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "fund" + ] + }, + { + "cell_type": "markdown", + "id": "3783c254-7e64-4cb5-af35-1e90a65157fa", + "metadata": {}, + "source": [ + "Since no information is available, let's move on to the PaleoData Section.\n", + "\n", + "### PaleoData" + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "id": "72871d71-a416-4679-a235-aafb869c215b", + "metadata": {}, + "outputs": [], + "source": [ + "sheet_name = 'paleo1measurementTable1'\n", + "\n", + "# Read the information into a Pandas DataFrame\n", + "pd_df = pd.read_excel(file_path, sheet_name=sheet_name, header=None)\n", + "\n", + "# Drop completely empty rows\n", + "pd_df = pd_df.dropna(how=\"all\").reset_index(drop=True)" + ] + }, + { + "cell_type": "markdown", + "id": "ba8323c3-c11d-4c3b-86b6-c9714fca273e", + "metadata": {}, + "source": [ + "Let's create a [`PaleoData` object](https://pylipd.readthedocs.io/en/latest/api.html#pylipd.classes.paleodata.PaleoData):" + ] + }, + { + "cell_type": "code", + "execution_count": 52, + "id": "216bb4f7-eaff-42ac-88bc-aa0604c6bc98", + "metadata": {}, + "outputs": [], + "source": [ + "paleodata = PaleoData()" + ] + }, + { + "cell_type": "markdown", + "id": "523606f1-687f-4302-9d34-f8b6de553c18", + "metadata": {}, + "source": [ + "Our next step is to create measurement tables in this object. To do so, we can use the [`DataTable` object](https://pylipd.readthedocs.io/en/latest/api.html#pylipd.classes.datatable.DataTable): " + ] + }, + { + "cell_type": "code", + "execution_count": 53, + "id": "cdb8c0d7-5432-4551-8aac-87afa170f13f", + "metadata": {}, + "outputs": [], + "source": [ + "table = DataTable()" + ] + }, + { + "cell_type": "markdown", + "id": "5d7e9780-b2fd-455e-9caa-cc3605034f48", + "metadata": {}, + "source": [ + "Now let's add some information about the table such as the name and the value use for missin values in the data:" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "id": "6174fe5e-93c6-48da-84ff-2c651188b307", + "metadata": {}, + "outputs": [], + "source": [ + "table.setFileName(\"paleo0measurement0.csv\")\n", + "table.setMissingValue(\"NaN\")" + ] + }, + { + "cell_type": "markdown", + "id": "e2551f01-6a1a-4e7b-b392-5c9f60fe75b0", + "metadata": {}, + "source": [ + "The next step is to add columns to our table. In other words, we need to create some variables.\n", + "\n", + "Let's have a look at the sheet in the excel file. It contains three sections:\n", + "* Notes, which would be attached to the table\n", + "* Variables information such as the name, units... Each row in the template represents metadata information for each of the column.\n", + "* Data, which contains the numerical values for the variables.\n", + "\n", + "Let's first create an algorithm that separates the various section and returns the information in three dataframes (Notes, Variables, and Data)." 
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "id": "9357a40a-bc3e-48e1-be9e-61088faa9028",
   "metadata": {},
   "outputs": [],
   "source": [
    "def extract_data(df): \n",
    "    '''\n",
    "    This function extracts the relevant sections for measurementTables. \n",
    "    '''\n",
    "    # Find the index positions of the section headers\n",
    "    notes_start = df[df[0] == \"Notes\"].index[0]\n",
    "    variables_start = df[df[0] == \"Variables\"].index[0]\n",
    "    data_start = df[df[0] == \"Data\"].index[0]\n",
    "\n",
    "    # Extract sections, ensuring blank rows are removed\n",
    "    df_notes = df.iloc[notes_start + 1:variables_start].dropna(how=\"all\").reset_index(drop=True)\n",
    "\n",
    "    # Extract the Variables section\n",
    "    df_variables_raw = df.iloc[variables_start + 1:data_start].dropna(how=\"all\").reset_index(drop=True)\n",
    "\n",
    "    # Set the first row as the header for the Variables section\n",
    "    df_variables = df_variables_raw[1:].reset_index(drop=True) # Data rows\n",
    "    df_variables.columns = df_variables_raw.iloc[0] # Set first row as column headers\n",
    "\n",
    "    # Extract the Data section\n",
    "    df_data_raw = df.iloc[data_start + 2:].dropna(how=\"all\").reset_index(drop=True)\n",
    "\n",
    "    # Correctly skip the first row and set the second row as the header\n",
    "    df_data = df_data_raw.iloc[1:].reset_index(drop=True) # Skip first row, keep rest\n",
    "    df_data.columns = df_data_raw.iloc[0] # Use second row as column headers\n",
    "    df_data = df_data.dropna(axis=1, how=\"all\") # Drop the columns with NaN.\n",
    "\n",
    "    return df_notes, df_variables, df_data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "id": "f51c521a-9d0b-405d-8bdf-cb7721377b90",
   "metadata": {},
   "outputs": [],
   "source": [
    "df_notes, df_variables, df_data = extract_data(pd_df)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4ec7d526-981c-46eb-95a2-2502bd0596f6",
   "metadata": {},
   "source": [
    "Let's loop over the variables and get the relevant information. We will also calculate some summary statistics, such as the average value and the average resolution.\n",
    "\n",
    "In LiPD, each variable is also given a unique ID. The function below generates one: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "id": "b4a1d46b-225a-4a16-9b92-b130dacdca6c",
   "metadata": {},
   "outputs": [],
   "source": [
    "import uuid\n",
    "\n",
    "def generate_unique_id(prefix='PYD'):\n",
    "    # Generate a random UUID\n",
    "    random_uuid = uuid.uuid4() # Generates a random UUID.\n",
    "    \n",
    "    # Convert UUID format to the specific format we need\n",
    "    # UUID is usually in the form '1e2a2846-2048-480b-9ec6-674daef472bd' so we slice and insert accordingly\n",
    "    id_str = str(random_uuid)\n",
    "    formatted_id = f\"{prefix}-{id_str[:5]}-{id_str[9:13]}-{id_str[14:18]}-{id_str[19:23]}-{id_str[24:28]}\"\n",
    "    \n",
    "    return formatted_id"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29d2ebec-d3cd-4c4d-b980-96a22fbc1798",
   "metadata": {},
   "source": [
    "Also, remember that some of these fields should result in objects (namely, variableName and units). Let's see if we can get the information from the synonyms before we proceed:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "id": "0cf6b3ea-37bd-4ff4-8eb7-9b857090c91e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
" + ], + "text/plain": [ + "0 variableName variableType Units ProxyObservationType \\\n", + "0 Depth NaN mm NaN \n", + "1 Age NaN year BP NaN \n", + "2 d18O NaN per mil VPDB NaN \n", + "3 d13C NaN per mil VPDB NaN \n", + "\n", + "0 InferredVariableType TakenAtDepth InferredFrom notes \\\n", + "0 NaN NaN NaN NaN \n", + "1 NaN NaN NaN NaN \n", + "2 NaN NaN NaN NaN \n", + "3 NaN NaN NaN NaN \n", + "\n", + "0 Interpretation1_variable Interpretation1_variableDetail ... \\\n", + "0 NaN NaN ... \n", + "1 NaN NaN ... \n", + "2 NaN NaN ... \n", + "3 NaN NaN ... \n", + "\n", + "0 calibration_reference calibration_uncertainty calibration_uncertaintyType \\\n", + "0 NaN NaN NaN \n", + "1 NaN NaN NaN \n", + "2 NaN NaN NaN \n", + "3 NaN NaN NaN \n", + "\n", + "0 sensorGenus sensorSpecies PhysicalSample_Name PhysicalSample_Identifier \\\n", + "0 NaN NaN NaN NaN \n", + "1 NaN NaN NaN NaN \n", + "2 NaN NaN NaN NaN \n", + "3 NaN NaN NaN NaN \n", + "\n", + "0 PhysicalSample_hasIGSN PhysicalSample_housedAt \\\n", + "0 NaN NaN \n", + "1 NaN NaN \n", + "2 NaN NaN \n", + "3 NaN NaN \n", + "\n", + "0 PhysicalSample_collectionMethod \n", + "0 NaN \n", + "1 NaN \n", + "2 NaN \n", + "3 NaN \n", + "\n", + "[4 rows x 28 columns]" + ] + }, + "execution_count": 58, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_variables" + ] + }, + { + "cell_type": "markdown", + "id": "b80ab5d4-e474-414d-a27d-06bae945b7dd", + "metadata": {}, + "source": [ + "Let's start with identifying proper objects for the variableNames:" + ] + }, + { + "cell_type": "code", + "execution_count": 62, + "id": "baa81116-cefa-4b07-950d-ff6859345f9d", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'Depth': 'depth', 'Age': 'age', 'd18O': 'd18O', 'd13C': 'd13C'}" + ] + }, + "execution_count": 62, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "check_names = {}\n", + "for index, row in df_variables.iterrows():\n", + " check_names[row['variableName']]= PaleoVariable.from_synonym(row['variableName']).label\n", + "\n", + "check_names" + ] + }, + { + "cell_type": "markdown", + "id": "7486aa7b-0103-40b0-9215-36c0709cf1e0", + "metadata": {}, + "source": [ + "This looks good for the variable names. Let's move on to the units:" + ] + }, + { + "cell_type": "code", + "execution_count": 65, + "id": "95877823-91ed-44d7-bfb7-6ae95f2c0586", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'mm': 'mm', 'year BP': 'yr BP', 'per mil VPDB': None}" + ] + }, + "execution_count": 65, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "check_units = {}\n", + "for index, row in df_variables.iterrows():\n", + " try:\n", + " check_units[row['Units']]= PaleoUnit.from_synonym(row['Units']).label\n", + " except:\n", + " check_units[row['Units']]= None\n", + "\n", + "check_units" + ] + }, + { + "cell_type": "markdown", + "id": "3e35515a-b33e-4b88-82bc-0d8bdee53814", + "metadata": {}, + "source": [ + "The \"per mil VPDB\" unit is not recognized automatically (not in the thesaurus). Let's have a look at the [standard unit names]() and see if we can manually select one that will match.\n", + "\n", + "The [permil](https://lipdverse.org/vocabulary/paleodata_units/#permil) entry will match, so let's use this. 
" + ] + }, + { + "cell_type": "code", + "execution_count": 67, + "id": "4bc988d2-22fb-41b6-a8d2-4cc9c809345f", + "metadata": {}, + "outputs": [], + "source": [ + "variables = []\n", + "\n", + "res = df_data.iloc[:, 1].diff()[1:].to_numpy()\n", + "Res = Resolution() # create a Resolution object - it will be the same for all variables since it is based on time\n", + "Res.setMinValue(np.min(res))\n", + "Res.setMaxValue(np.max(res))\n", + "Res.setMeanValue(np.mean(res))\n", + "Res.setMedianValue(np.median(res))\n", + "\n", + "for index, row in df_variables.iterrows():\n", + " var = Variable()\n", + " var.setName(row['variableName']) # name of the variable\n", + " # Now let's do the standard name\n", + " var.setStandardVariable(PaleoVariable.from_synonym(row['variableName']))\n", + " var.setColumnNumber(index+1) #The column in which the data is stored. Note that LiPD uses index 1\n", + " var.setVariableId(generate_unique_id(prefix='TIAN')) # create a unique ID for the variable\n", + " # Units\n", + " if row['Units']=='per mil VPDB':\n", + " var.setUnits(PaleoUnit.from_synonym('permil'))\n", + " else:\n", + " var.setUnits(PaleoUnit.from_synonym(row['Units']))\n", + " # Make sure the data is JSON writable (no numpy arrays or Pandas DataFrame)\n", + " var.setValues(json.dumps(df_data.iloc[:,index].tolist()))\n", + " # Calculate some metadata about the values - this makes it easier to do some queries later on, including looking for data in a particular time slice. \n", + " var.setMinValue(df_data.iloc[:,index].min())\n", + " var.setMaxValue(df_data.iloc[:,index].max())\n", + " var.setMeanValue(df_data.iloc[:,index].mean())\n", + " var.setMedianValue(df_data.iloc[:,index].median())\n", + " # Attach the resolution metadata information to the variable\n", + " var.setResolution(Res)\n", + " # append in the list\n", + " variables.append(var) " + ] + }, + { + "cell_type": "markdown", + "id": "9360b4cd-b0b8-49da-bb77-6492c62b8068", + "metadata": {}, + "source": [ + "Let's now put our variables in the `DataTable`:" + ] + }, + { + "cell_type": "code", + "execution_count": 68, + "id": "15293327-1ebc-4a84-8ac5-9e5fcb7ec330", + "metadata": {}, + "outputs": [], + "source": [ + "table.setVariables(variables)" + ] + }, + { + "cell_type": "markdown", + "id": "8378eeaa-5db8-46dc-9aa5-c1ef64d33e12", + "metadata": {}, + "source": [ + "The `Table` into the `PaleoData` object:" + ] + }, + { + "cell_type": "code", + "execution_count": 69, + "id": "320e7cc8-df59-43bf-95f8-a45f678aeec0", + "metadata": {}, + "outputs": [], + "source": [ + "paleodata.setMeasurementTables([table])" + ] + }, + { + "cell_type": "markdown", + "id": "24cb8e42-9879-46c2-b508-0c42dab6d529", + "metadata": {}, + "source": [ + "And finally, the `PaleoData` object into the `Dataset`:" + ] + }, + { + "cell_type": "code", + "execution_count": 70, + "id": "50787580-21dc-4edf-91d4-868a8d2742ac", + "metadata": {}, + "outputs": [], + "source": [ + "ds.addPaleoData(paleodata)" + ] + }, + { + "cell_type": "markdown", + "id": "07c969a7-b5f3-44bb-a0c3-e0a257e49249", + "metadata": {}, + "source": [ + "### ChronData\n", + "\n", + "The next step is to create a ChronData object to store the information about chronology. In the last section, we used an OOP approach to add the information about each variable. In this section, we will use an approach involving `Pandas DataFrame`.\n", + "\n", + "Let's open the data. The same function we wrote to read in the PaleoData can be used here since the template is the same for both objects." 
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "id": "550e6e2e-2c98-40b1-83f9-5336aef3ac18",
   "metadata": {},
   "outputs": [],
   "source": [
    "sheet_name = 'chron1measurementTable1'\n",
    "\n",
    "# Read the information into a Pandas DataFrame\n",
    "cd_df = pd.read_excel(file_path, sheet_name=sheet_name, header=None)\n",
    "\n",
    "# Drop completely empty rows\n",
    "cd_df = cd_df.dropna(how=\"all\").reset_index(drop=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "82844562-0174-4e50-973d-d7c9f52861bc",
   "metadata": {},
   "source": [
    "Let's create a [`ChronData` object](https://pylipd.readthedocs.io/en/latest/api.html#pylipd.classes.chrondata.ChronData):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "id": "101120cf-1992-42b2-8a68-13e7843dcf50",
   "metadata": {},
   "outputs": [],
   "source": [
    "chrondata = ChronData()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b524b922-144a-45cf-b5f3-2b4d2ef5732c",
   "metadata": {},
   "source": [
    "We need to create a `DataTable`, a process similar to what we have done for the PaleoData:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "id": "6a0cc2b4-9ce9-4fc6-9680-9b29f29574e5",
   "metadata": {},
   "outputs": [],
   "source": [
    "chrontable = DataTable()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d5f57c62-3e38-4da2-bb2a-438b25755240",
   "metadata": {},
   "source": [
    "Let's add some basic information about the table:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "id": "f12d519e-be8e-4130-958b-e477616637d5",
   "metadata": {},
   "outputs": [],
   "source": [
    "chrontable.setFileName(\"chron0measurement0.csv\")\n",
    "chrontable.setMissingValue(\"NaN\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "id": "2ac29a54-db86-40d9-a056-9b91cf61cae2",
   "metadata": {},
   "outputs": [],
   "source": [
    "dfc_notes, dfc_variables, dfc_data = extract_data(cd_df)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "203963b4-6f39-4526-a583-863505447112",
   "metadata": {},
   "source": [
    "We will use the [`setDataFrame` function](https://pylipd.readthedocs.io/en/latest/api.html#pylipd.classes.datatable.DataTable.setDataFrame) to incorporate the columns into the table. In this framework, the values are held in a dataframe (represented here by `dfc_data`) and the metadata for each variable are added as attributes to the DataFrame. 
Since the variables and units associated with Chronology are not yet standardized, we can leave them as strings:" + ] + }, + { + "cell_type": "code", + "execution_count": 76, + "id": "ae23656c-072b-4483-bb58-bbef568ae6db", + "metadata": {}, + "outputs": [], + "source": [ + "metadata_dict = {} # create a dictionary\n", + "\n", + "for index, row in dfc_variables.iterrows():\n", + " temp_dict = {}\n", + " temp_dict['number']=index+1\n", + " temp_dict['variableName']=row['variableName']\n", + " temp_dict['TSid']= generate_unique_id(prefix='TC')\n", + " if pd.notna(row['Units']):\n", + " temp_dict['units']=row['Units']\n", + " else:\n", + " temp_dict['units']='NA'\n", + " try:\n", + " temp_dict['hasMinValue']=dfc_data.iloc[:,index].min()\n", + " temp_dict['hasMaxValue']=dfc_data.iloc[:,index].max()\n", + " temp_dict['hasMeanValue']=dfc_data.iloc[:,index].mean()\n", + " temp_dict['hasMedianValue']=dfc_data.iloc[:,index].median()\n", + " except:\n", + " pass \n", + " metadata_dict[row['variableName']]=temp_dict\n", + "\n", + "dfc_data.attrs = metadata_dict" + ] + }, + { + "cell_type": "markdown", + "id": "c4e87aca-e4b8-49d7-9a88-f1badbd2fd8d", + "metadata": {}, + "source": [ + "Use the DataFrame to construct the `Table` object:" + ] + }, + { + "cell_type": "code", + "execution_count": 77, + "id": "26e7f9da-6377-46d2-831a-d97cfa62e004", + "metadata": {}, + "outputs": [], + "source": [ + "chrontable.setDataFrame(dfc_data)" + ] + }, + { + "cell_type": "markdown", + "id": "bdd1e274-f926-4c75-9135-cd5346077e01", + "metadata": {}, + "source": [ + "Put the `Table` into the `ChronData` object:" + ] + }, + { + "cell_type": "code", + "execution_count": 78, + "id": "9e5e5bf4-97bf-47c1-9eb3-16549c1b9d86", + "metadata": {}, + "outputs": [], + "source": [ + "chrondata.setMeasurementTables([chrontable])" + ] + }, + { + "cell_type": "markdown", + "id": "2216b9c0-f8a1-4bf5-b43e-8db620ddc076", + "metadata": {}, + "source": [ + "Put the `ChronData` into the `Dataset`:" + ] + }, + { + "cell_type": "code", + "execution_count": 79, + "id": "8ebc1298-7350-4a5f-8a5f-2e58028277c7", + "metadata": {}, + "outputs": [], + "source": [ + "ds.addChronData(chrondata)" + ] + }, + { + "cell_type": "markdown", + "id": "601cb196-e0ce-4820-8f9c-6cf5dae16d82", + "metadata": {}, + "source": [ + "#### Adding an ensemble table\n", + "\n", + "This particular dataset also has an ensemble table available in [Oman.Tian.2023.chrondf.csv](https://github.com/LinkedEarth/pylipdTutorials/raw/refs/heads/main/data/Oman.Tian.2023.chrondf.csv). Let's add the information to an ensembleTable in the LiPD file.\n", + "\n", + "Let's first open the data: " + ] + }, + { + "cell_type": "code", + "execution_count": 80, + "id": "a6d650dd-430d-4eb5-805f-c3e045118a3e", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Unnamed: 0 age d18O depth chron_0 chron_1 chron_2 chron_3 \\\n", + "0 0 401.88 -0.20 0.00 444.0 445.0 417.0 290.0 \n", + "1 1 408.55 -0.59 0.69 445.0 446.0 445.0 355.0 \n", + "2 2 424.07 -0.58 1.37 489.0 491.0 482.0 425.0 \n", + "3 3 438.75 -0.73 2.06 562.0 568.0 525.0 495.0 \n", + "4 4 450.24 -1.26 2.75 568.0 574.0 536.0 503.0 \n", + "\n", + " chron_4 chron_5 ... chron_990 chron_991 chron_992 chron_993 \\\n", + "0 403.0 349.0 ... 430.0 407.0 410.0 369.0 \n", + "1 404.0 367.0 ... 431.0 410.0 411.0 371.0 \n", + "2 448.0 422.0 ... 469.0 468.0 459.0 411.0 \n", + "3 521.0 504.0 ... 533.0 565.0 541.0 479.0 \n", + "4 531.0 514.0 ... 547.0 575.0 548.0 491.0 \n", + "\n", + " chron_994 chron_995 chron_996 chron_997 chron_998 chron_999 \n", + "0 392.0 347.0 359.0 416.0 414.0 408.0 \n", + "1 395.0 358.0 362.0 417.0 416.0 414.0 \n", + "2 437.0 397.0 422.0 441.0 463.0 435.0 \n", + "3 507.0 455.0 524.0 482.0 541.0 466.0 \n", + "4 510.0 461.0 530.0 493.0 549.0 480.0 \n", + "\n", + "[5 rows x 1004 columns]" + ] + }, + "execution_count": 80, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ens_path = \"../data/Oman.Tian.2023.chrondf.csv\"\n", + "\n", + "df_ens = pd.read_csv(ens_path)\n", + "df_ens.head()" + ] + }, + { + "cell_type": "markdown", + "id": "dcbc283b-9209-487b-a5eb-ca340f6470b5", + "metadata": {}, + "source": [ + "
\n", + " ⚠ Warning: LiPD files require ensemble tables to have the following format: the first column contains depth and the other column the ensemble members as a list. \n", + "
\n", + "\n", + "Our first step is to drop the first 3 columns in the DataFrame:" + ] + }, + { + "cell_type": "code", + "execution_count": 81, + "id": "67aeee84-51d3-403d-b754-cd6e4da4a0ec", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " depth chron_0 chron_1 chron_2 chron_3 chron_4 chron_5 chron_6 \\\n", + "0 0.00 444.0 445.0 417.0 290.0 403.0 349.0 387.0 \n", + "1 0.69 445.0 446.0 445.0 355.0 404.0 367.0 405.0 \n", + "2 1.37 489.0 491.0 482.0 425.0 448.0 422.0 447.0 \n", + "3 2.06 562.0 568.0 525.0 495.0 521.0 504.0 506.0 \n", + "4 2.75 568.0 574.0 536.0 503.0 531.0 514.0 511.0 \n", + "\n", + " chron_7 chron_8 ... chron_990 chron_991 chron_992 chron_993 \\\n", + "0 412.0 341.0 ... 430.0 407.0 410.0 369.0 \n", + "1 413.0 369.0 ... 431.0 410.0 411.0 371.0 \n", + "2 442.0 441.0 ... 469.0 468.0 459.0 411.0 \n", + "3 492.0 543.0 ... 533.0 565.0 541.0 479.0 \n", + "4 501.0 555.0 ... 547.0 575.0 548.0 491.0 \n", + "\n", + " chron_994 chron_995 chron_996 chron_997 chron_998 chron_999 \n", + "0 392.0 347.0 359.0 416.0 414.0 408.0 \n", + "1 395.0 358.0 362.0 417.0 416.0 414.0 \n", + "2 437.0 397.0 422.0 441.0 463.0 435.0 \n", + "3 507.0 455.0 524.0 482.0 541.0 466.0 \n", + "4 510.0 461.0 530.0 493.0 549.0 480.0 \n", + "\n", + "[5 rows x 1001 columns]" + ] + }, + "execution_count": 81, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_ens = df_ens.iloc[:, 3:]\n", + "df_ens.head()" + ] + }, + { + "cell_type": "markdown", + "id": "3c1e9ed8-229f-49ad-9b73-afd0127e16f9", + "metadata": {}, + "source": [ + "Next, let's create the proper dataframe format. The first column will stay the same. The second column will contain each values on the ensemble in a list: " + ] + }, + { + "cell_type": "code", + "execution_count": 82, + "id": "fe7fe239-e045-4194-8570-2424c6b30704", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " depth year\n", + "0 0.00 [444.0, 445.0, 417.0, 290.0, 403.0, 349.0, 387...\n", + "1 0.69 [445.0, 446.0, 445.0, 355.0, 404.0, 367.0, 405...\n", + "2 1.37 [489.0, 491.0, 482.0, 425.0, 448.0, 422.0, 447...\n", + "3 2.06 [562.0, 568.0, 525.0, 495.0, 521.0, 504.0, 506...\n", + "4 2.75 [568.0, 574.0, 536.0, 503.0, 531.0, 514.0, 511..." + ] + }, + "execution_count": 82, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "#Let's keep the first column (depth) in place\n", + "ens_table = pd.DataFrame({'depth': df_ens['depth'].tolist()})\n", + "\n", + "# Add the year data - each row will contain one vector from a data array. \n", + "array = df_ens.iloc[:, 1:].to_numpy()\n", + "ens_table['year'] = [array[i,:].tolist() for i in range(array.shape[0])]\n", + "ens_table.head()" + ] + }, + { + "cell_type": "markdown", + "id": "57c92b55-3097-4a01-a041-9db76b57a99e", + "metadata": {}, + "source": [ + "Add attributes to the Pandas Dataframe to store the metadata. \n", + "\n", + "
\n", + " ⚠ Warning: Metadata attributes are necessary to save a LiPD file. \n", + "
" + ] + }, + { + "cell_type": "code", + "execution_count": 83, + "id": "07f6372b-aa23-4eb0-9877-e63cd5f3fa48", + "metadata": {}, + "outputs": [], + "source": [ + "num_year_columns = len(array[0,:])\n", + "year_columns = [i+2 for i in range(num_year_columns)]\n", + "ens_table.attrs = {\n", + " 'year': {'number': str(year_columns), 'variableName': 'year', 'units': 'yr AD', 'TSid':generate_unique_id()},\n", + " 'depth': {'number': 1, 'variableName': 'depth', 'units': 'cm', 'TSid':generate_unique_id()}\n", + "}" + ] + }, + { + "cell_type": "markdown", + "id": "9f268220-813e-4509-929d-cc14cc6302b4", + "metadata": {}, + "source": [ + "Let's create a `DataTable` object for our ensemble table:" + ] + }, + { + "cell_type": "code", + "execution_count": 84, + "id": "1f0ca108-352d-4df7-92e4-0ae3bf657fee", + "metadata": {}, + "outputs": [], + "source": [ + "ensemble_table = DataTable()" + ] + }, + { + "cell_type": "code", + "execution_count": 85, + "id": "33a5beeb-f74f-407f-8c49-6dff5487387c", + "metadata": {}, + "outputs": [], + "source": [ + "ensemble_table.setDataFrame(ens_table)\n", + "ensemble_table.setFileName(\"chron0model0ensemble0.csv\")" + ] + }, + { + "cell_type": "markdown", + "id": "b4c37674-f6d3-4f23-be10-832c5f33c6b6", + "metadata": {}, + "source": [ + "Now add the table to a model:" + ] + }, + { + "cell_type": "code", + "execution_count": 86, + "id": "eee2c51f-720e-43f1-926c-f342e0fdb230", + "metadata": {}, + "outputs": [], + "source": [ + "model = Model()\n", + "model.addEnsembleTable(ensemble_table)" + ] + }, + { + "cell_type": "markdown", + "id": "c8dc02f6-06e4-41a3-8b53-9ad55e720c51", + "metadata": {}, + "source": [ + "And add the Model to a ChronData object:" + ] + }, + { + "cell_type": "code", + "execution_count": 87, + "id": "3936363d-02d6-4c0e-9557-78e68e84aaae", + "metadata": {}, + "outputs": [], + "source": [ + "chrondata.addModeledBy(model)" + ] + }, + { + "cell_type": "markdown", + "id": "703a9f1f-8a47-48c7-a934-2541d3a331d3", + "metadata": {}, + "source": [ + "### Writing a LiPD file" + ] + }, + { + "cell_type": "markdown", + "id": "16acf78b-2a68-4edb-b707-4acb24c3a19e", + "metadata": {}, + "source": [ + "The last step in this process is to write to a LiPD file. To do so, you need to pass the Dataset `ds` back into a LiPD object:" + ] + }, + { + "cell_type": "code", + "execution_count": 88, + "id": "bd560c4c-8a71-4e52-8f6a-5f3b50ae580a", + "metadata": {}, + "outputs": [], + "source": [ + "lipd = LiPD()\n", + "lipd.load_datasets([ds])\n", + "lipd.create_lipd(ds.getName(), \"../data/Oman.Tian.2023.lpd\");" + ] + }, + { + "cell_type": "markdown", + "id": "205736d2-d722-46fd-b9a4-3d7bcb3f6bbb", + "metadata": {}, + "source": [ + "### Opening the LiPD file\n", + "\n", + "Let's re-open the LiPD file that we have just created and check some of our work." 
+ {
+ "cell_type": "code",
+ "execution_count": 89,
+ "id": "9093e13c-f14e-4f41-9147-49137a7442da",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Loading 1 LiPD files\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "100%|█████████████████████████████████████████████| 1/1 [00:00<00:00, 2.93it/s]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Loaded..\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "L = LiPD()\n",
+ "file = \"../data/Oman.Tian.2023.lpd\"\n",
+ "\n",
+ "L.load(file)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e53b08f4-1f40-4e26-addc-3c33f81995a6",
+ "metadata": {},
+ "source": [
+ "Let's get the name of the dataset:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 90,
+ "id": "8620e097-2c8d-4112-bb01-99649de6593c",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "['Oman.Tian.2023']"
+ ]
+ },
+ "execution_count": 90,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "L.get_all_dataset_names()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 91,
+ "id": "e253180a-3e3c-4d1c-afe4-f18f947df3e6",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
dataSetNamearchiveTypegeo_meanLatgeo_meanLongeo_meanElevpaleoData_variableNamepaleoData_valuespaleoData_unitspaleoData_proxypaleoData_proxyGeneraltime_variableNametime_valuestime_unitsdepth_variableNamedepth_valuesdepth_units
0Oman.Tian.2023Speleothem17.1754.3650.0d13C[-2.71, -4.12, -3.45, -3.11, -3.61, -4.86, -4....permilNoneNoneage[401.88, 408.55, 424.07, 438.75, 450.24, 461.3...yr BPdepth[0.0, 0.69, 1.37, 2.06, 2.75, 3.43, 4.12, 4.81...mm
1Oman.Tian.2023Speleothem17.1754.3650.0d18O[-0.2, -0.59, -0.58, -0.73, -1.26, -1.19, -0.6...permilNoneNoneage[401.88, 408.55, 424.07, 438.75, 450.24, 461.3...yr BPdepth[0.0, 0.69, 1.37, 2.06, 2.75, 3.43, 4.12, 4.81...mm
2Oman.Tian.2023Speleothem17.1754.3650.0d13C[-2.71, -4.12, -3.45, -3.11, -3.61, -4.86, -4....permilNoneNoneage[401.88, 408.55, 424.07, 438.75, 450.24, 461.3...yr BPdepth[0.0, 0.69, 1.37, 2.06, 2.75, 3.43, 4.12, 4.81...mm
3Oman.Tian.2023Speleothem17.1754.3650.0d18O[-0.2, -0.59, -0.58, -0.73, -1.26, -1.19, -0.6...permilNoneNoneage[401.88, 408.55, 424.07, 438.75, 450.24, 461.3...yr BPdepth[0.0, 0.69, 1.37, 2.06, 2.75, 3.43, 4.12, 4.81...mm
4Oman.Tian.2023Speleothem17.1754.3650.0d13C[-2.71, -4.12, -3.45, -3.11, -3.61, -4.86, -4....permilNoneNoneage[401.88, 408.55, 424.07, 438.75, 450.24, 461.3...yr BPdepth[0.0, 0.69, 1.37, 2.06, 2.75, 3.43, 4.12, 4.81...mm
\n", + "
" + ], + "text/plain": [ + " dataSetName archiveType geo_meanLat geo_meanLon geo_meanElev \\\n", + "0 Oman.Tian.2023 Speleothem 17.17 54.3 650.0 \n", + "1 Oman.Tian.2023 Speleothem 17.17 54.3 650.0 \n", + "2 Oman.Tian.2023 Speleothem 17.17 54.3 650.0 \n", + "3 Oman.Tian.2023 Speleothem 17.17 54.3 650.0 \n", + "4 Oman.Tian.2023 Speleothem 17.17 54.3 650.0 \n", + "\n", + " paleoData_variableName paleoData_values \\\n", + "0 d13C [-2.71, -4.12, -3.45, -3.11, -3.61, -4.86, -4.... \n", + "1 d18O [-0.2, -0.59, -0.58, -0.73, -1.26, -1.19, -0.6... \n", + "2 d13C [-2.71, -4.12, -3.45, -3.11, -3.61, -4.86, -4.... \n", + "3 d18O [-0.2, -0.59, -0.58, -0.73, -1.26, -1.19, -0.6... \n", + "4 d13C [-2.71, -4.12, -3.45, -3.11, -3.61, -4.86, -4.... \n", + "\n", + " paleoData_units paleoData_proxy paleoData_proxyGeneral time_variableName \\\n", + "0 permil None None age \n", + "1 permil None None age \n", + "2 permil None None age \n", + "3 permil None None age \n", + "4 permil None None age \n", + "\n", + " time_values time_units \\\n", + "0 [401.88, 408.55, 424.07, 438.75, 450.24, 461.3... yr BP \n", + "1 [401.88, 408.55, 424.07, 438.75, 450.24, 461.3... yr BP \n", + "2 [401.88, 408.55, 424.07, 438.75, 450.24, 461.3... yr BP \n", + "3 [401.88, 408.55, 424.07, 438.75, 450.24, 461.3... yr BP \n", + "4 [401.88, 408.55, 424.07, 438.75, 450.24, 461.3... yr BP \n", + "\n", + " depth_variableName depth_values \\\n", + "0 depth [0.0, 0.69, 1.37, 2.06, 2.75, 3.43, 4.12, 4.81... \n", + "1 depth [0.0, 0.69, 1.37, 2.06, 2.75, 3.43, 4.12, 4.81... \n", + "2 depth [0.0, 0.69, 1.37, 2.06, 2.75, 3.43, 4.12, 4.81... \n", + "3 depth [0.0, 0.69, 1.37, 2.06, 2.75, 3.43, 4.12, 4.81... \n", + "4 depth [0.0, 0.69, 1.37, 2.06, 2.75, 3.43, 4.12, 4.81... \n", + "\n", + " depth_units \n", + "0 mm \n", + "1 mm \n", + "2 mm \n", + "3 mm \n", + "4 mm " + ] + }, + "execution_count": 91, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_essential = L.get_timeseries_essentials()\n", + "\n", + "df_essential.head()" + ] + }, + { + "cell_type": "markdown", + "id": "f63a64c2-a505-4d1f-afd4-ed1d7d0a3093", + "metadata": {}, + "source": [ + "And voila! The LiPD file is ready to be used." + ] } ], "metadata": {