<div style="text-align: center">
<img src="https://github.com/LinkedEarth/Logos/blob/master/PyleoTUPS/pyleotups_logo.png?raw=true" alt="PyleoTUPS logo" width="400">
</div>

# Querying data from the NOAA database

## Authors

Deborah Khider
<a href="https://orcid.org/0000-0001-7501-8430" target="_blank" rel="noopener noreferrer">
  <img src="https://orcid.org/sites/default/files/images/orcid_16x16.png" alt="ORCID iD" style="vertical-align: text-bottom;"/>
</a>

## Preamble

### Goals

### Pre-requisites

### Reading time

Let's import our packages!

In [2]:
from pyleotups import Dataset
import pandas as pd

## From a study ID

This approach assumes that you already know the NOAA study ID of the dataset you want to open.

In [3]:
ds = Dataset()

Let's run a simple search using a known NOAA study ID. For this example, we use the dataset from [Clemens et al. (2021)](https://www.science.org/doi/10.1126/sciadv.abg3848), which can be accessed through the [NOAA Paleo portal](https://www.ncei.noaa.gov/access/paleo-search/study/33213).

In [4]:
ds.search_studies(noaa_id=33213)

Parsing NOAA studies: 100%|█████████████████████| 1/1 [00:00<00:00, 2770.35it/s]


Unnamed: 0,StudyID,XMLID,StudyName,DataType,EarliestYearBP,MostRecentYearBP,EarliestYearCE,MostRecentYearCE,StudyNotes,ScienceKeywords,Investigators,Publications,Sites,Funding
0,33213,74834,"Bay of Bengal, Northeast Indian Margin Stable ...",PALEOCEANOGRAPHY,1462580,280,-1460630,1670,"Provided Keywords: Indian monsoon, South Asian...",,"Kaustubh Thirumalai, Liviu Giosan, Julie Riche...","[{'Author': 'Clemens, Steven; Yamamoto, Masano...","[[{'DataTableID': '45857', 'DataTableName': 'U...",[{'fundingAgency': 'US National Science Founda...


Let's have a look at the summary of the studies now held in `ds`:

In [5]:
ds.get_summary()

Unnamed: 0,StudyID,XMLID,StudyName,DataType,EarliestYearBP,MostRecentYearBP,EarliestYearCE,MostRecentYearCE,StudyNotes,ScienceKeywords,Investigators,Publications,Sites,Funding
0,33213,74834,"Bay of Bengal, Northeast Indian Margin Stable ...",PALEOCEANOGRAPHY,1462580,280,-1460630,1670,"Provided Keywords: Indian monsoon, South Asian...",,"Kaustubh Thirumalai, Liviu Giosan, Julie Riche...","[{'Author': 'Clemens, Steven; Yamamoto, Masano...","[[{'DataTableID': '45857', 'DataTableName': 'U...",[{'fundingAgency': 'US National Science Founda...
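
The summary reports each study's time span both in years BP and in years CE. The values in the row above are consistent with the common convention that "present" is 1950 CE. A minimal sketch of that conversion follows; `bp_to_ce` is a hypothetical helper written for illustration, not part of pyleotups:

```python
def bp_to_ce(year_bp):
    """Convert years before present (BP) to years CE, assuming present = 1950 CE."""
    return 1950 - year_bp


# Values copied from the summary row for study 33213
earliest_ce = bp_to_ce(1462580)
most_recent_ce = bp_to_ce(280)
print(earliest_ce, most_recent_ce)
```

Negative results denote years before the start of the Common Era under this simple arithmetic.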


Let's have a look at the publications:

In [8]:
bib, df = ds.get_publications()
df

Unnamed: 0,Author,Title,Journal,Year,Volume,Number,Pages,Type,DOI,URL,CitationKey,StudyID,StudyName
0,"Clemens, Steven; Yamamoto, Masanobu; Thirumala...",Remote and Local Drivers of Pleistocene South ...,Science Advances,2021,7,23,,publication,10.1126/sciadv.abg3848,http://dx.doi.org/10.1126/sciadv.abg3848,M._Remote_2021_33213,33213,"Bay of Bengal, Northeast Indian Margin Stable ..."


A handy feature is the ability to convert this table to BibTeX for attribution:

In [6]:
### TBD
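
Independently of whatever pyleotups provides for this, a BibTeX entry can be assembled by hand from one row of the publication table. The sketch below is only an illustration: `row_to_bibtex` is a hypothetical helper, the record is a plain dictionary copied from the row above, and the author and title fields are left out because the display truncates them.

```python
def row_to_bibtex(row):
    """Format one publication record (a dict) as a BibTeX @article entry."""
    # Map BibTeX field names to the table's column names
    fields = {
        "journal": "Journal",
        "year": "Year",
        "volume": "Volume",
        "number": "Number",
        "pages": "Pages",
        "doi": "DOI",
    }
    lines = ["@article{" + row["CitationKey"] + ","]
    for bib_name, column in fields.items():
        value = row.get(column)
        # Skip columns that are missing or empty (e.g. Pages in this record)
        if value not in (None, ""):
            lines.append(f"  {bib_name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


# Values copied from the publication table for study 33213
record = {
    "CitationKey": "M._Remote_2021_33213",
    "Journal": "Science Advances",
    "Year": 2021,
    "Volume": 7,
    "Number": 23,
    "Pages": "",
    "DOI": "10.1126/sciadv.abg3848",
}
entry = row_to_bibtex(record)
print(entry)
```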

The next step is to retrieve all the data tables associated with the study.

In [9]:
ds.get_tables()

Unnamed: 0,DataTableID,DataTableName,TimeUnit,FileURL,Variables,FileDescription,TotalFilesAvailable,SiteID,SiteName,LocationName,Latitude,Longitude,MinElevation,MaxElevation,StudyID,StudyName
0,45857,U1446 Benthic Isotopes Clemens2021,cal yr BP,https://www.ncei.noaa.gov/pub/data/paleo/contr...,"[Site, Hole, Type, Section, Core, Section_Dept...",NOAA Template File,1,58697,IODP U1446,Ocean>Indian Ocean,19.083,85.733,-1440,-1440,33213,"Bay of Bengal, Northeast Indian Margin Stable ..."
1,45858,U1446 Planktic Isotopes Clemens2021,cal yr BP,https://www.ncei.noaa.gov/pub/data/paleo/contr...,"[Site, Hole, Type, Section, Comment, Core, Sec...",NOAA Template File,1,58697,IODP U1446,Ocean>Indian Ocean,19.083,85.733,-1440,-1440,33213,"Bay of Bengal, Northeast Indian Margin Stable ..."
2,45859,U1446 TEX86H_SST Clemens2021,cal yr BP,https://www.ncei.noaa.gov/pub/data/paleo/contr...,"[Site, Hole, Type, Section, Core, Section_Dept...",NOAA Template File,1,58697,IODP U1446,Ocean>Indian Ocean,19.083,85.733,-1440,-1440,33213,"Bay of Bengal, Northeast Indian Margin Stable ..."
3,45860,U1446 d18Osw Clemens2021,cal yr BP,https://www.ncei.noaa.gov/pub/data/paleo/contr...,"[Age, SL_Scaled, SL_Scaled_averaged, d18O_G_ru...",NOAA Template File,1,58697,IODP U1446,Ocean>Indian Ocean,19.083,85.733,-1440,-1440,33213,"Bay of Bengal, Northeast Indian Margin Stable ..."
4,45861,U1446 LeafWax CarbonIsotope Clemens2021,cal yr BP,https://www.ncei.noaa.gov/pub/data/paleo/contr...,"[d13C_C28, d13C_C30, d13C_C32, d13C_Ave, Site,...",NOAA Template File,1,58697,IODP U1446,Ocean>Indian Ocean,19.083,85.733,-1440,-1440,33213,"Bay of Bengal, Northeast Indian Margin Stable ..."
5,45862,U1446 Mg/Ca Clemens2021,cal yr BP,https://www.ncei.noaa.gov/pub/data/paleo/contr...,"[Site, Hole, Type, Analytical_Facility, Core, ...",NOAA Template File,1,58697,IODP U1446,Ocean>Indian Ocean,19.083,85.733,-1440,-1440,33213,"Bay of Bengal, Northeast Indian Margin Stable ..."
6,45863,U1446 Rb/Ca Clemens2021,cal yr BP,https://www.ncei.noaa.gov/pub/data/paleo/contr...,"[Site, Hole, Type, Section, Core, Section_Dept...",NOAA Template File,1,58697,IODP U1446,Ocean>Indian Ocean,19.083,85.733,-1440,-1440,33213,"Bay of Bengal, Northeast Indian Margin Stable ..."
7,45864,U1446 Age Model Clemens2021,cal yr BP,https://www.ncei.noaa.gov/pub/data/paleo/contr...,"[Site, Comments, Sample_Depth, Age]",NOAA Template File,1,58697,IODP U1446,Ocean>Indian Ocean,19.083,85.733,-1440,-1440,33213,"Bay of Bengal, Northeast Indian Margin Stable ..."
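
When a study holds several tables, the `DataTableID` of interest can be looked up by name with ordinary pandas filtering. The sketch below uses a hand-built stand-in with two of the columns shown above, and assumes the IDs are stored as strings:

```python
import pandas as pd

# Stand-in for the table listing above, keeping only two of its columns
tables = pd.DataFrame({
    "DataTableID": ["45857", "45858", "45859", "45860",
                    "45861", "45862", "45863", "45864"],
    "DataTableName": [
        "U1446 Benthic Isotopes Clemens2021",
        "U1446 Planktic Isotopes Clemens2021",
        "U1446 TEX86H_SST Clemens2021",
        "U1446 d18Osw Clemens2021",
        "U1446 LeafWax CarbonIsotope Clemens2021",
        "U1446 Mg/Ca Clemens2021",
        "U1446 Rb/Ca Clemens2021",
        "U1446 Age Model Clemens2021",
    ],
})

# Plain substring match (regex=False) on the table name
mask = tables["DataTableName"].str.contains("TEX86", regex=False)
tex86_ids = tables.loc[mask, "DataTableID"].tolist()
print(tex86_ids)
```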


Let's get the data. This can be done by passing either the `DataTableID` or the `FileURL` to the `get_data()` method:

In [10]:
dfs = ds.get_data(dataTableIDs="45859")

Note: You can pass multiple table IDs as a list. The function returns a list of DataFrames. Let's have a look at our data:

In [11]:
df = dfs[0]
df.head()

Unnamed: 0,Site,Hole,Core,Type,Section,Section_Depth,Sample_Depth,Age,TEX86H,SST
0,U1446,C,1,H,1,4.5,0.045,0.31,-0.1177,30.55
1,U1446,C,1,H,1,31.5,0.315,0.66,-0.1216,30.28
2,U1446,C,1,H,1,61.5,0.615,1.04,-0.1183,30.51
3,U1446,C,1,H,1,90.5,0.905,1.41,-0.1089,31.15
4,U1446,C,1,H,1,121.5,1.215,1.8,-0.1155,30.7


Metadata about the table (the variable names, the NOAA study ID, and the study name) is stored in the DataFrame's `attrs` attribute:

In [12]:
df.attrs

{'variables': ['Site',
  'Hole',
  'Core',
  'Type',
  'Section',
  'Section_Depth',
  'Sample_Depth',
  'Age',
  'TEX86H',
  'SST'],
 'NOAAStudyId': '33213',
 'StudyName': 'Bay of Bengal, Northeast Indian Margin Stable Isotope, Biomarker and SST Reconstructions since the Mid-Pleistocene'}
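
`attrs` is a standard pandas feature (documented by pandas as experimental) that behaves like a plain dictionary attached to the DataFrame. A minimal sketch on a hand-built stand-in for the first two rows of the table above:

```python
import pandas as pd

# Stand-in for two rows and two columns of the TEX86 table
demo = pd.DataFrame({"Age": [0.31, 0.66], "SST": [30.55, 30.28]})

# Attach metadata the same way it appears in the output above
demo.attrs["NOAAStudyId"] = "33213"
demo.attrs["variables"] = list(demo.columns)

# Read it back like any dictionary, e.g. to build a plot title or caption
study_id = demo.attrs["NOAAStudyId"]
label = f"NOAA study {study_id}: {', '.join(demo.attrs['variables'])}"
print(label)
```

Because `attrs` may not survive every pandas operation, it can be safer to copy the values you need into regular variables before transforming the data.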

You can also pass the file URL:

In [13]:
df2 = ds.get_data(file_urls="https://www.ncei.noaa.gov/pub/data/paleo/contributions_by_author/clemens2021/clemens2021-u1446-mgca-noaa.txt")[0]

In [14]:
df2.head()

Unnamed: 0,Site,Hole,Core,Type,Section,Section_Depth,Sample_Depth,Age,Mg/Ca,Analytical_Facility,Mg/Ca_SST
0,U1446,C,1,H,1,5,0.05,0.32,4.63,Rosenthal (Rutgers),28.41
1,U1446,C,1,H,1,32,0.32,0.66,4.62,Rosenthal (Rutgers),28.46
2,U1446,C,1,H,1,62,0.62,1.05,4.72,Rosenthal (Rutgers),28.43
3,U1446,C,1,H,1,91,0.91,1.42,4.58,Rosenthal (Rutgers),28.63
4,U1446,C,1,H,1,122,1.22,1.81,4.7,Rosenthal (Rutgers),28.61
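
The two tables are sampled at slightly different ages, so comparing their SST estimates requires pairing each sample with its nearest neighbor in the other table. One possible approach, sketched here with `pandas.merge_asof` on hand-built copies of the rows shown above (the 0.05 tolerance is an arbitrary choice for illustration):

```python
import pandas as pd

# Rows copied from the TEX86 and Mg/Ca previews above
tex86 = pd.DataFrame({
    "Age": [0.31, 0.66, 1.04, 1.41, 1.80],
    "SST": [30.55, 30.28, 30.51, 31.15, 30.70],
})
mgca = pd.DataFrame({
    "Age_MgCa": [0.32, 0.66, 1.05, 1.42, 1.81],
    "Mg/Ca_SST": [28.41, 28.46, 28.43, 28.63, 28.61],
})

# Both frames must be sorted by their age column for merge_asof
paired = pd.merge_asof(
    tex86,
    mgca,
    left_on="Age",
    right_on="Age_MgCa",
    direction="nearest",  # match the closest age, earlier or later
    tolerance=0.05,       # leave unmatched if the gap is larger than this
)
paired["SST_difference"] = paired["SST"] - paired["Mg/Ca_SST"]
print(paired)
```

Column names containing a slash, such as `Mg/Ca_SST`, have to be accessed with bracket notation rather than attribute access.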


## Query from NOAA keywords

Let's start with a simple geographical query. Let's look for all the datasets within a latitude/longitude bounding box:

In [15]:
ds2 = Dataset()

The query terms and their definitions are documented by NOAA: https://www.ncei.noaa.gov/access/paleo-search/api

In [16]:
ds2.search_studies(max_lat=5, min_lat=-5, max_lon=109,
                   min_lon=125)

Parsing NOAA studies: 100%|████████████████| 100/100 [00:00<00:00, 11974.49it/s]


Unnamed: 0,StudyID,XMLID,StudyName,DataType,EarliestYearBP,MostRecentYearBP,EarliestYearCE,MostRecentYearCE,StudyNotes,ScienceKeywords,Investigators,Publications,Sites,Funding
0,11194,9632,"1,100 Year El Niño/Southern Oscillation (ENSO)...",CLIMATE RECONSTRUCTIONS,1050.0,-52.0,900.0,2002.0,An index of canonical ENSO variability for the...,[Atmospheric and Oceanic Circulation Patterns ...,"Jinbao Li, Shang-Ping Xie, Edward Cook, Rosann...","[{'Author': 'Li, J., S.-P. Xie, E.R. Cook, G. ...","[[{'DataTableID': '19791', 'DataTableName': 'e...",[{'fundingAgency': 'US National Science Founda...
1,22031,20009,1200 Year Atlantic Multidecadal Variability an...,CLIMATE RECONSTRUCTIONS,1150.0,-60.0,800.0,2010.0,Summer (May-September) Atlantic Multidecadal V...,[Atmospheric and Oceanic Circulation Patterns ...,"Jianglin Wang, Bao Yang, Fredrik Ljungqvist, J...","[{'Author': 'Jianglin Wang, Bao Yang, Fredrik ...","[[{'DataTableID': '33108', 'DataTableName': 'W...",[{'fundingAgency': 'National Natural Science F...
2,39047,80187,1500 Year Sedimentological and Geochemical Dat...,PALEOLIMNOLOGY,1200.0,-50.0,750.0,2000.0,Elevations of lakes: Tota = 3015m; Siscunsi = ...,[Precipitation Reconstruction],"Broxton Bird, Byron Steinman, Jaime Escobar, A...","[{'Author': 'Bird, B.W., B.A. Steinman, J. Esc...","[[{'DataTableID': '51864', 'DataTableName': 'M...","[{'fundingAgency': 'Indiana University', 'fund..."
3,2614,1685,350 KYr Sea Level Reconstruction and Foraminif...,PALEOCEANOGRAPHY,361500.0,1000.0,-359550.0,950.0,,[Sea Level Reconstruction],"David Lea, Pamela Martin, Dorothy Pak, Howard ...","[{'Author': 'Lea, D.W., P.A. Martin, D.K. Pak,...","[[{'DataTableID': '4301', 'DataTableName': 'TR...",[]
4,14632,12613,700 Year El Niño/Southern Oscillation (ENSO) N...,CLIMATE RECONSTRUCTIONS,649.0,-55.0,1301.0,2005.0,An index of canonical El Niño/Southern Oscilla...,[Atmospheric and Oceanic Circulation Patterns ...,"Jinbao Li, Shang-Ping Xie, Edward Cook, Marian...","[{'Author': 'Li, J., S.-P. Xie, E.R. Cook, G. ...","[[{'DataTableID': '24679', 'DataTableName': 'L...",[{'fundingAgency': 'US National Science Founda...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,25611,23814,Eastern Equatorial Pacific d18O and d13C Data ...,PALEOCEANOGRAPHY,24734.0,0.0,-22784.0,1950.0,,[Last Glacial Maximum],"Heather Ford, Celia McChesney, Jennifer Hertzb...","[{'Author': 'Ford, H.L., C.L. McChesney, J.E. ...","[[{'DataTableID': '37453', 'DataTableName': 'O...",[{'fundingAgency': 'US National Science Founda...
96,30152,71852,Eastern Pacific Ocean TEX86 and Mg/Ca SST Reco...,CLIMATE RECONSTRUCTIONS,24734.0,1239.0,-22784.0,711.0,Metadata are from the Temperature-12k project ...,,"Jennifer Hertzberg, Matthew Schmidt, Thomas Bi...","[{'Author': 'Kaufman, D., McKay, N., Routson, ...","[[{'DataTableID': '42637', 'DataTableName': 'M...",[]
97,2624,1695,Eastern Pacific Pleistocene Alkenone Data and ...,PALEOCEANOGRAPHY,1833700.0,4472.0,-1831750.0,-2522.0,,[Sea Surface Temperature Reconstruction],"Zhonghui Liu, Timothy Herbert","[{'Author': 'Liu, Z. and T.D. Herbert', 'Title...","[[{'DataTableID': '4320', 'DataTableName': 'OD...",[]
98,19139,16805,"Eastern Tropical Indian Ocean 45,000 Year d18O...",PALEOCEANOGRAPHY,45330.0,60.0,-43380.0,1890.0,High-resolution (~30-80 years) foraminiferal o...,[Sea Surface Temperature Reconstruction],"Mahyar Mohtadi, Matthias Prange, Delia Oppo, R...","[{'Author': 'Mahyar Mohtadi, Matthias Prange, ...","[[{'DataTableID': '29599', 'DataTableName': 'M...",[{'fundingAgency': 'Bundesministerium für Bild...
