This notebook explores querying the remote LinkedEarth database and using native pandas to represent time.

In [1]:
import json
import requests
import pandas as pd
import numpy as np
import io

from paleopandas.paleoarray import *

## Querying for retrograde data

This query returns paleo variables together with their age values and age units. Let's first look for data whose age units are 'yr BP' or 'BP'.
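The unit filter lives in the final `VALUES ?timeunits` clause of the query below. To try other age units without editing the string by hand, a small helper can template that clause. This is a sketch only: `build_age_query` is a hypothetical name (not part of paleopandas or any LinkedEarth API), and the query body is copied from the cell below.

```python
def build_age_query(time_units, limit=100):
    """Hypothetical helper: build the LiPDVerse query for the given age units."""
    # SPARQL VALUES takes space-separated, quoted literals inside braces
    units = " ".join('"{}"'.format(u) for u in time_units)
    # Doubled braces {{ }} are literal braces in an f-string
    return f"""
PREFIX le: <http://linked.earth/ontology#>
select ?val ?timeval ?varunits ?timeunits ?dsname ?varname ?timevarname where {{
    ?ds le:name ?dsname .
    ?ds le:includesPaleoData ?data .
    ?data le:foundInMeasurementTable ?table .
    ?table le:includesVariable ?var .
    ?var le:name ?varname .
    FILTER (?varname != "age")
    FILTER (?varname != "year")
    ?var le:hasVariableID ?varID .
    ?var le:hasValues ?val .
    OPTIONAL {{ ?var le:hasUnits ?varunits . }}
    ?table le:includesVariable ?timevar .
    ?timevar le:name ?timevarname .
    VALUES ?timevarname {{"age"}} .
    ?timevar le:hasValues ?timeval .
    ?timevar le:hasUnits ?timeunits .
    VALUES ?timeunits {{{units}}}
}}
LIMIT {limit}
"""

# Reproduces the filter used in the cell below
query = build_age_query(["yr BP", "BP"])
```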

In [2]:
url = 'https://linkedearth.graphdb.mint.isi.edu/repositories/LiPDVerse'

query = """
PREFIX le: <http://linked.earth/ontology#>
select ?val ?timeval ?varunits ?timeunits ?dsname ?varname ?timevarname where { 
    ?ds le:name ?dsname .
    ?ds le:includesPaleoData ?data .   
    ?data le:foundInMeasurementTable ?table .
    ?table le:includesVariable ?var .
    ?var le:name ?varname .
    FILTER (?varname != "age")
    FILTER (?varname != "year")
    ?var le:hasVariableID ?varID .
    ?var le:hasValues ?val .
        OPTIONAL{?var le:hasUnits ?varunits .}
    ?table le:includesVariable ?timevar .
    ?timevar le:name ?timevarname .
        VALUES ?timevarname {"age"} .
    ?timevar le:hasValues ?timeval .
    ?timevar le:hasUnits ?timeunits .
        VALUES ?timeunits {"yr BP" "BP"}
}
LIMIT 100
"""
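# POST the query (SPARQL protocol) and ask explicitly for CSV results
response = requests.post(url, data={'query': query}, headers={'Accept': 'text/csv'})
response.raise_for_status()

data = io.StringIO(response.text)
df = pd.read_csv(data, sep=",")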

# Parse the bracketed value strings (e.g. "[20.4, 20.4, ...]") into NumPy float arrays
df['val']=df['val'].apply(lambda row : np.fromstring(row.strip("[]"), sep=','))
df['timeval']=df['timeval'].apply(lambda row : np.fromstring(row.strip("[]"), sep=','))

df.head()



,val,timeval,varunits,timeunits,dsname,varname,timevarname
0,"[20.4, 20.4, 20.7, 20.6, 20.5, 20.2, 19.3, 19....","[0.0, 13.0, 26.0, 39.0, 52.0, 64.0, 77.0, 90.0...",degC,BP,NAm-DarkLake.Gajewski.1988,temperature,age
1,"[18.4, 18.4, 19.1, 18.4, 18.9, 18.7, 18.6, 19....","[-10.0, 24.0, 58.0, 93.0, 127.0, 191.0, 248.0,...",degC,BP,NAm-ClearPond.Gajewski.1988,temperature,age
2,[],"[-44.0, -34.0, -13.0, 7.0, 18.0, 34.0, 45.0, 5...",,BP,Eur-CentralandEasternPyrenees.Pla.2004,sampleID,age
3,"[0.0, 0.09114, -0.19458, 0.07387, -0.42006, -0...","[-44.0, -34.0, -13.0, 7.0, 18.0, 34.0, 45.0, 5...",degC,BP,Eur-CentralandEasternPyrenees.Pla.2004,temperature,age
4,"[0.13984, 0.15345, 0.16085, 0.13493, 0.14066, ...","[-44.0, -34.0, -13.0, 7.0, 18.0, 34.0, 45.0, 5...",degC,BP,Eur-CentralandEasternPyrenees.Pla.2004,uncertainty_temperature,age
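Note that `np.fromstring` stops at the first token it cannot parse as a number and returns only what it read up to that point, so a list of non-numeric entries can silently come back shorter, or empty. A more defensive parser, sketched below under the assumption that entries are comma-separated inside square brackets, turns unparseable tokens into NaN so each array keeps its original length. `parse_values` and the two-row sample are hypothetical, made up for illustration rather than taken from the database.

```python
import io

import numpy as np
import pandas as pd


def parse_values(cell):
    """Hypothetical helper: parse "[20.4, 20.4]" into a float array.

    Non-numeric tokens become NaN instead of ending the parse early.
    """
    inner = str(cell).strip().strip("[]").strip()
    if not inner:
        # "[]" -> empty float array
        return np.array([], dtype=float)
    tokens = [t.strip() for t in inner.split(",")]
    return pd.to_numeric(pd.Series(tokens), errors="coerce").to_numpy(dtype=float)


# Made-up two-row sample in the same CSV shape as the query output
sample = io.StringIO(
    "val,timeval,varname\n"
    '"[20.4, 20.4, 20.7]","[0.0, 13.0, 26.0]",temperature\n'
    '"[A1, A2, A3]","[-44.0, -34.0, -13.0]",sampleID\n'
)
demo = pd.read_csv(sample)
for col in ["val", "timeval"]:
    demo[col] = demo[col].apply(parse_values)
```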


In [7]:
idx = pd.Index(df['timeval'].iloc[0],dtype=PaleoDtype('yrs BP'))
ser = pd.Series(df['val'].iloc[0], name=df['varname'].iloc[0], index=idx)

ser

1950-01-01    20.4
1937-01-01    20.4
1924-01-01    20.7
1911-01-01    20.6
1898-01-01    20.5
1886-01-01    20.2
1873-01-01    19.3
1860-01-01    19.7
1847-01-01    19.5
1834-01-01    19.6
1828-01-01    20.4
1821-01-01    19.8
1815-01-01    19.8
1809-01-01    19.9
1792-01-01    19.0
1775-01-01    19.3
1761-01-01    19.3
1749-01-01    19.5
1736-01-01    19.2
1727-01-01    19.6
1709-01-01    19.7
1655-01-01    19.5
1605-01-01    19.5
1561-01-01    19.3
1517-01-01    19.5
1480-01-01    19.9
1438-01-01    20.0
1405-01-01    20.5
1374-01-01    19.8
1345-01-01    19.7
1318-01-01    20.0
1290-01-01    19.8
1263-01-01    20.0
1232-01-01    20.0
1200-01-01    20.0
1164-01-01    20.0
1126-01-01    20.0
1086-01-01    19.7
1044-01-01    19.9
Name: temperature, dtype: float64

**Question**: Should the index display ages as floats (years BP) rather than as calendar dates?
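One option, sketched in plain pandas without `PaleoDtype`, is a float index, either as ages in years BP or as calendar years computed as 1950 - age (matching the 0 BP -> 1950-01-01 display above). The values are the first four (age, temperature) pairs of NAm-DarkLake.Gajewski.1988 from the query output. The last line shows why native timestamps are awkward here: pandas' default nanosecond `Timestamp` cannot represent anything before 1677.

```python
import pandas as pd

# First four (age, temperature) pairs of NAm-DarkLake.Gajewski.1988 from the query above
age_bp = [0.0, 13.0, 26.0, 39.0]
temp = [20.4, 20.4, 20.7, 20.6]

# Option 1: ages as plain floats, in years before 1950
by_age = pd.Series(temp, index=pd.Index(age_bp, name="age (yr BP)"), name="temperature")

# Option 2: floats on a calendar axis, since BP counts back from 1950 CE
by_year = pd.Series(
    temp,
    index=pd.Index([1950.0 - a for a in age_bp], name="year CE"),
    name="temperature",
)

# The nanosecond Timestamp range starts in 1677, far short of most paleo ages
earliest_year = pd.Timestamp.min.year
```

A float index like either of these is something `Series.plot()` can use directly as the x axis (with matplotlib installed), which may sidestep the error below, at the cost of losing any date-aware behaviour.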

In [8]:
ser.plot()

AssertionError: (<class 'paleopandas.paleoarray.PaleoDtype'>, <class 'property'>)

In [13]:
ser.dt.year

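AttributeError: Can only use .dt accessor with datetimelike values

The `.dt` accessor inspects the Series' values, which here are float temperatures, so this error occurs whatever the index dtype; calendar fields of the ages would have to come from `ser.index` instead.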
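A minimal plain-pandas contrast, using an ordinary `DatetimeIndex` rather than `PaleoDtype`: `.dt` fails on float values, while the year is available from the index itself. Whether a `PaleoDtype` index exposes a `.year` attribute in the same way is an assumption we have not checked.

```python
import pandas as pd

# Float values on an ordinary DatetimeIndex (recent dates only, within pandas' range)
ser_dt = pd.Series(
    [20.4, 20.4, 20.7],
    index=pd.to_datetime(["1950-01-01", "1937-01-01", "1924-01-01"]),
)

# .dt operates on the values; float64 values are not datetimelike
try:
    ser_dt.dt.year
    dt_error = None
except AttributeError as err:
    dt_error = str(err)

# Calendar fields of the index live on the index itself
years = ser_dt.index.year
```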

In [15]:
print(ser.mean())
print(ser.max())

19.833333333333336
20.7


## Let's try a kyr Series

As a trial, let's open the ODP846 data, whose ages are in kyr.


In [9]:
odp = pd.read_csv('../data/ODP846.csv')
odp.head()

,Age,d18O
0,3.645,3.38
1,7.99,3.46
2,11.18,3.765
3,13.803,4.14
4,15.886,4.47
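For a sanity check on the kyr index, the ages can be converted by hand. This sketch assumes, as with the DarkLake series above, that BP counts back from 1950 CE, so year = 1950 - 1000 * age_kyr. The inputs are the five `Age` values printed by `odp.head()`; for example, 3.645 kyr gives -1695, the first year printed in the index of `ts`.

```python
import numpy as np

# First five ODP846 ages from odp.head(), in kyr
age_kyr = np.array([3.645, 7.99, 11.18, 13.803, 15.886])

# kyr -> years BP, rounded to whole years to absorb floating-point error
age_yr_bp = np.round(age_kyr * 1000)

# BP is counted back from 1950 CE
calendar_year = 1950 - age_yr_bp
```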


Create a pandas Series object with age as an index:

In [11]:
# Age is in kyr
idx = pd.Index(odp['Age'], dtype=PaleoDtype('yrs KA'))
ts = pd.Series(odp['d18O'], name='d18O', index=idx)

ts

Age
-1695-01-01 00:00:00      NaN
-6040-01-01 00:00:00      NaN
-9230-01-01 00:00:00      NaN
-11852-01-01 05:59:59     NaN
-13936-01-01 00:00:00     NaN
                           ..
-5012332-01-01 00:00:00   NaN
-5015612-01-01 00:00:00   NaN
-5019417-01-01 05:59:59   NaN
-5023475-01-01 00:00:00   NaN
-5027777-01-01 00:00:00   NaN
Name: d18O, Length: 2000, dtype: float64

In [12]:
odp['d18O']

0       3.380
1       3.460
2       3.765
3       4.140
4       4.470
        ...  
1995    2.971
1996    2.981
1997    3.141
1998    2.901
1999    2.931
Name: d18O, Length: 2000, dtype: float64

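**Bug**: d18O is NaN in `ts` because `odp['d18O']` is itself a Series. When a Series is passed to `pd.Series(...)` together with an `index`, pandas reindexes it to that index by label. None of the new age labels match the original integer labels (0 to 1999), so every value comes back as NaN. Passing the raw values instead, e.g. `odp['d18O'].to_numpy()`, avoids the alignment.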
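The alignment behaviour can be reproduced with plain pandas on a three-row stand-in for `odp` (values from `odp.head()`; `odp_demo` is a hypothetical name). This does not involve `PaleoDtype`, so it shows the mechanism rather than the exact notebook state.

```python
import pandas as pd

# Stand-in for odp: default RangeIndex 0..2 with float ages
odp_demo = pd.DataFrame({"Age": [3.645, 7.99, 11.18], "d18O": [3.38, 3.46, 3.765]})
new_index = pd.Index(odp_demo["Age"])

# A Series as data is reindexed: labels 3.645, 7.99, 11.18 are not in 0..2 -> all NaN
aligned = pd.Series(odp_demo["d18O"], index=new_index, name="d18O")

# Raw values are taken positionally, so they keep their row order
positional = pd.Series(odp_demo["d18O"].to_numpy(), index=new_index, name="d18O")
```

Outside of `PaleoDtype`, `odp_demo.set_index("Age")["d18O"]` gives the same positional result in one step.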