In [1]:
from pytups import Dataset
import pandas as pd

## From a study ID

This approach assumes that you know the NOAA study ID of the dataset you want to open.

In [2]:
ds = Dataset()

Let's run a simple search using a known NOAA study ID. For this example, we use the dataset from [Clemens et al. (2021)](https://www.science.org/doi/10.1126/sciadv.abg3848), which can be accessed through the [NOAA Paleo portal](https://www.ncei.noaa.gov/access/paleo-search/study/33213).

In [3]:
ds.search_studies(noaa_id=33213)

Let's have a look:

In [4]:
ds.get_summary()

Unnamed: 0,StudyID,XMLID,StudyName,DataType,EarliestYearBP,MostRecentYearBP,EarliestYearCE,MostRecentYearCE,StudyNotes,ScienceKeywords,Investigators,Publications,Sites
0,33213,74834,"Bay of Bengal, Northeast Indian Margin Stable ...",PALEOCEANOGRAPHY,1462580,280,-1460630,1670,"Provided Keywords: Indian monsoon, South Asian...",,"Kaustubh Thirumalai, Liviu Giosan, Julie Riche...","[{'Author': 'Clemens, Steven; Yamamoto, Masano...","[[{'DataTableID': '45857', 'DataTableName': 'U..."
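
The `YearBP` and `YearCE` columns in the row above are consistent with the usual convention that "present" is 1950 CE. As a minimal sketch of that relationship (the helper name `bp_to_ce` is hypothetical and not part of pytups):

```python
def bp_to_ce(year_bp, reference_year=1950):
    """Convert years before present (BP) to a CE year.

    Assumes the common convention that "present" is 1950 CE.
    Negative results correspond to years before 1 CE.
    """
    return reference_year - year_bp


# Values taken from the summary row above.
earliest_ce = bp_to_ce(1462580)
most_recent_ce = bp_to_ce(280)
print(earliest_ce)     # -1460630
print(most_recent_ce)  # 1670
```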


Let's have a look at the publications:

In [5]:
ds.get_publications()

Unnamed: 0,Author,Title,Journal,Year,Volume,Number,Pages,Type,DOI,URL,CitationKey,StudyID,StudyName
0,"Clemens, Steven; Yamamoto, Masanobu; Thirumala...",Remote and Local Drivers of Pleistocene South ...,Science Advances,2021,7,23,,publication,10.1126/sciadv.abg3848,http://dx.doi.org/10.1126/sciadv.abg3848,M._Remote_2021_33213,33213,"Bay of Bengal, Northeast Indian Margin Stable ..."


A useful feature is the ability to convert this table to BibTeX for attribution:

In [6]:
### TBD
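
Until that feature exists, one possible approach is to format each row of the publications table by hand. The sketch below is not a pytups function: `publication_to_bibtex` is a hypothetical helper, it assumes every record should become an `@article` entry, and it assumes authors are separated by semicolons as in the table above. The `Author` and `Title` values in the example record are placeholders, because the table output truncates the real ones.

```python
def publication_to_bibtex(pub):
    """Format one publication record (a dict) as a BibTeX @article entry."""
    # The table separates authors with ";"; BibTeX expects " and ".
    authors = " and ".join(
        a.strip() for a in str(pub["Author"]).split(";") if a.strip()
    )
    fields = {
        "author": authors,
        "title": pub.get("Title"),
        "journal": pub.get("Journal"),
        "year": pub.get("Year"),
        "volume": pub.get("Volume"),
        "number": pub.get("Number"),
        "pages": pub.get("Pages"),
        "doi": pub.get("DOI"),
        "url": pub.get("URL"),
    }
    lines = [f"@article{{{pub['CitationKey']},"]
    for name, value in fields.items():
        # Skip empty cells: None, NaN (NaN != NaN), or blank strings.
        if value is None or value != value or str(value).strip() == "":
            continue
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


# Example record: Author and Title are placeholders (hypothetical values);
# the remaining fields are copied from the table above.
example = {
    "Author": "Last1, First1; Last2, First2",
    "Title": "Placeholder title",
    "Journal": "Science Advances",
    "Year": 2021,
    "Volume": 7,
    "Number": 23,
    "Pages": None,
    "DOI": "10.1126/sciadv.abg3848",
    "URL": "http://dx.doi.org/10.1126/sciadv.abg3848",
    "CitationKey": "M._Remote_2021_33213",
}
entry = publication_to_bibtex(example)
print(entry)
```

If the publications table is a pandas DataFrame, as the output above suggests, its rows could be fed to this helper with `DataFrame.to_dict("records")`.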

The next step is to list all the data tables associated with the study, along with their site information:

In [7]:
ds.get_sites()

Unnamed: 0,DataTableID,DataTableName,TimeUnit,FileURL,Variables,FileDescription,TotalFilesAvailable,SiteID,SiteName,LocationName,Latitude,Longitude,MinElevation,MaxElevation,StudyID,StudyName
0,45857,U1446 Benthic Isotopes Clemens2021,cal yr BP,https://www.ncei.noaa.gov/pub/data/paleo/contr...,"[Site, Hole, Type, Section, Core, Section_Dept...",NOAA Template File,1,58697,IODP U1446,Ocean>Indian Ocean,19.083,85.733,-1440,-1440,33213,"Bay of Bengal, Northeast Indian Margin Stable ..."
1,45858,U1446 Planktic Isotopes Clemens2021,cal yr BP,https://www.ncei.noaa.gov/pub/data/paleo/contr...,"[Site, Hole, Type, Section, Comment, Core, Sec...",NOAA Template File,1,58697,IODP U1446,Ocean>Indian Ocean,19.083,85.733,-1440,-1440,33213,"Bay of Bengal, Northeast Indian Margin Stable ..."
2,45859,U1446 TEX86H_SST Clemens2021,cal yr BP,https://www.ncei.noaa.gov/pub/data/paleo/contr...,"[Site, Hole, Type, Section, Core, Section_Dept...",NOAA Template File,1,58697,IODP U1446,Ocean>Indian Ocean,19.083,85.733,-1440,-1440,33213,"Bay of Bengal, Northeast Indian Margin Stable ..."
3,45860,U1446 d18Osw Clemens2021,cal yr BP,https://www.ncei.noaa.gov/pub/data/paleo/contr...,"[d18Osw_out_Mg/Ca, Age, SL_Scaled, SL_Scaled_a...",NOAA Template File,1,58697,IODP U1446,Ocean>Indian Ocean,19.083,85.733,-1440,-1440,33213,"Bay of Bengal, Northeast Indian Margin Stable ..."
4,45861,U1446 LeafWax CarbonIsotope Clemens2021,cal yr BP,https://www.ncei.noaa.gov/pub/data/paleo/contr...,"[Site, Hole, Type, Section, Core, Section_Dept...",NOAA Template File,1,58697,IODP U1446,Ocean>Indian Ocean,19.083,85.733,-1440,-1440,33213,"Bay of Bengal, Northeast Indian Margin Stable ..."
5,45862,U1446 Mg/Ca Clemens2021,cal yr BP,https://www.ncei.noaa.gov/pub/data/paleo/contr...,"[Site, Hole, Type, Analytical_Facility, Core, ...",NOAA Template File,1,58697,IODP U1446,Ocean>Indian Ocean,19.083,85.733,-1440,-1440,33213,"Bay of Bengal, Northeast Indian Margin Stable ..."
6,45863,U1446 Rb/Ca Clemens2021,cal yr BP,https://www.ncei.noaa.gov/pub/data/paleo/contr...,"[Site, Hole, Type, Section, Core, Section_Dept...",NOAA Template File,1,58697,IODP U1446,Ocean>Indian Ocean,19.083,85.733,-1440,-1440,33213,"Bay of Bengal, Northeast Indian Margin Stable ..."
7,45864,U1446 Age Model Clemens2021,cal yr BP,https://www.ncei.noaa.gov/pub/data/paleo/contr...,"[Site, Comments, Sample_Depth, Age]",NOAA Template File,1,58697,IODP U1446,Ocean>Indian Ocean,19.083,85.733,-1440,-1440,33213,"Bay of Bengal, Northeast Indian Margin Stable ..."


Now let's get the data. Pass either a `DataTableID` (via the `dataTableIDs` argument) or a `FileURL` (via the `file_urls` argument) to the `get_data()` method:

In [8]:
dfs = ds.get_data(dataTableIDs="45859")

Note: You can pass multiple table IDs as a list. The function returns a list of DataFrames. Let's have a look at our data:

In [9]:
df = dfs[0]
df.head()

Unnamed: 0,Site,Hole,Core,Type,Section,Section_Depth,Sample_Depth,Age,TEX86H,SST
0,U1446,C,1,H,1,4.5,0.045,0.31,-0.1177,30.55
1,U1446,C,1,H,1,31.5,0.315,0.66,-0.1216,30.28
2,U1446,C,1,H,1,61.5,0.615,1.04,-0.1183,30.51
3,U1446,C,1,H,1,90.5,0.905,1.41,-0.1089,31.15
4,U1446,C,1,H,1,121.5,1.215,1.8,-0.1155,30.7


The relevant metadata (variable names, NOAA study ID, and study name) is stored in the DataFrame's `attrs` attribute:

In [10]:
df.attrs

{'variables': ['Site',
  'Hole',
  'Core',
  'Type',
  'Section',
  'Section_Depth',
  'Sample_Depth',
  'Age',
  'TEX86H',
  'SST'],
 'NOAAStudyId': '33213',
 'StudyName': 'Bay of Bengal, Northeast Indian Margin Stable Isotope, Biomarker and SST Reconstructions since the Mid-Pleistocene'}

You can also pass the file URL:

In [11]:
df2 = ds.get_data(file_urls="https://www.ncei.noaa.gov/pub/data/paleo/contributions_by_author/clemens2021/clemens2021-u1446-mgca-noaa.txt")[0]

In [12]:
df2.head()

Unnamed: 0,Site,Hole,Core,Type,Section,Section_Depth,Sample_Depth,Age,Mg/Ca,Analytical_Facility,Mg/Ca_SST
0,U1446,C,1,H,1,5,0.05,0.32,4.63,Rosenthal (Rutgers),28.41
1,U1446,C,1,H,1,32,0.32,0.66,4.62,Rosenthal (Rutgers),28.46
2,U1446,C,1,H,1,62,0.62,1.05,4.72,Rosenthal (Rutgers),28.43
3,U1446,C,1,H,1,91,0.91,1.42,4.58,Rosenthal (Rutgers),28.63
4,U1446,C,1,H,1,122,1.22,1.81,4.7,Rosenthal (Rutgers),28.61


## Query from NOAA keywords

Let's start with a simple geographical query. Let's look for all the datasets within a bounding box defined by minimum and maximum latitude and longitude.

In [13]:
ds2 = Dataset()

Query terms and their definitions are documented by NOAA: https://www.ncei.noaa.gov/access/paleo-search/api

In [14]:
ds2.search_studies(max_lat=5, min_lat=-5, max_lon=109,
                       min_lon=125)

AttributeError: 'NoneType' object has no attribute 'get'
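
In the call above, `min_lon=125` is larger than `max_lon=109`, so the longitude bounds appear to be swapped. Whether this is what triggers the `AttributeError` is not confirmed here. As a minimal sketch, a small guard could catch such inputs before a query is sent. The helper `check_bounding_box` is hypothetical and not part of pytups, and it assumes a box that does not cross the antimeridian.

```python
def check_bounding_box(min_lat, max_lat, min_lon, max_lon):
    """Return a list of problems with a lat/lon box (empty if none found).

    Assumes latitudes in [-90, 90], longitudes in [-180, 180], and a box
    that does not cross the antimeridian (so min_lon <= max_lon).
    """
    problems = []
    for name, value in (("min_lat", min_lat), ("max_lat", max_lat)):
        if not -90 <= value <= 90:
            problems.append(f"{name} ({value}) is outside [-90, 90]")
    for name, value in (("min_lon", min_lon), ("max_lon", max_lon)):
        if not -180 <= value <= 180:
            problems.append(f"{name} ({value}) is outside [-180, 180]")
    if min_lat > max_lat:
        problems.append(f"min_lat ({min_lat}) is greater than max_lat ({max_lat})")
    if min_lon > max_lon:
        problems.append(f"min_lon ({min_lon}) is greater than max_lon ({max_lon})")
    return problems


# Bounds exactly as passed in the call above.
as_written = check_bounding_box(min_lat=-5, max_lat=5, min_lon=125, max_lon=109)
# The same box with the longitude bounds exchanged.
swapped = check_bounding_box(min_lat=-5, max_lat=5, min_lon=109, max_lon=125)
print(as_written)
print(swapped)
```

Returning a list of messages rather than raising leaves the caller free to decide whether to warn, fix the bounds, or abort.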