<img src="http://www.organicdatacuration.org/linkedearth/images/5/51/EarthLinked_Banner_blue_NoShadow.jpg">

# Searching the LinkedEarth wiki: Guide to SPARQL queries

One of the main advantages of the LinkedEarth Ontology is its ability to search for datasets using very specific criteria. For instance, it is possible to search for all the sea surface temperature datasets covering the Holocene from a specific location.

Broken down into simpler pieces, the query needs to check the following:
1. The archive is "marine sediment".
2. The inferred variable types are "sea surface temperature" and "age" (the latter to identify the Holocene).
3. The InferredVariableType "age" needs to have a HasMinValue of 0 and a HasMaxValue of 10 (kyr) or 10000 (yr).
4. The dataset needs to be between specified "lat" and "lon" coordinates.
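
As a rough sketch, the four criteria could translate into SPARQL along the following lines. The property names are the ones used by the queries later in this Notebook. The function name `holocene_sst_query`, the `geo:` prefix shorthand, the default age bounds, and the assumption that ages are stored in years are illustrative choices, not something the wiki prescribes.

```python
def holocene_sst_query(lat_min, lat_max, lon_min, lon_max, age_min=0, age_max=10000):
    """Return a SPARQL query string for the four criteria listed above.

    Hypothetical helper: ages are assumed to be expressed in years.
    """
    return f"""PREFIX core: <http://linked.earth/ontology#>
PREFIX wiki: <http://wiki.linked.earth/Special:URIResolver/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>

SELECT DISTINCT ?dataset
WHERE {{
    # 1. The archive is marine sediment
    ?dataset wiki:Property-3AArchiveType "marine sediment" .

    # 2. One inferred variable is sea surface temperature, another is age
    ?dataset core:includesPaleoData ?data .
    ?data core:foundInMeasurementTable / core:includesVariable ?sst .
    ?sst core:inferredVariableType / rdfs:label "Sea Surface Temperature" .
    ?data core:foundInMeasurementTable / core:includesVariable ?age .
    ?age core:inferredVariableType / rdfs:label "Age" .

    # 3. The age variable covers the requested interval
    ?age core:hasMinValue ?ageMin .
    ?age core:hasMaxValue ?ageMax .
    FILTER(xsd:float(?ageMin) <= {age_min} && xsd:float(?ageMax) >= {age_max})

    # 4. The site lies inside the latitude/longitude box
    ?dataset core:collectedFrom ?site .
    ?site geo:lat ?lat .
    ?site geo:long ?lon .
    FILTER(xsd:float(?lat) >= {lat_min} && xsd:float(?lat) <= {lat_max})
    FILTER(xsd:float(?lon) >= {lon_min} && xsd:float(?lon) <= {lon_max})
}}"""

# Example: tropical sites between 100 and 160 degrees of longitude
print(holocene_sst_query(-30, 30, 100, 160))
```

The function only builds the query text; sending it to the wiki is a separate step.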

We've just thrown a lot of terms into this query that you may not be familiar with. If you want to know more about how the LinkedEarth database is organized and makes use of the LinkedEarth Ontology, visit [this page](http://wiki.linked.earth/LinkedEarth_Ontology).

The possibilities for querying the database are endless; however, doing so requires knowledge of the LinkedEarth Ontology and the SPARQL language. We understand that not every user needs to write highly complex queries. This Notebook is for you! We have identified several types of queries that are often performed and created a user-friendly way (no knowledge of SPARQL required) for you to enter specific search criteria.

If you're new to LinkedEarth, read all the sections below. If you've already performed queries and are familiar with the database, skip to the last section.
If you don't know how to use a Jupyter Notebook, please look at [this example](https://github.com/nickmckay/LiPD-utilities/blob/master/Examples/Welcome%20Jupyter%20-%20Quickstart.ipynb).

Table of Contents:
* [How to use this Notebook?](#howto)
* [Introduction to the LinkedEarth Ontology](#ontology)
    - [The LiPD Ontology](#LiPD)
    - [The Proxy Archive Ontology](#proxyarchive)
    - [The Proxy Observation Ontology](#proxyobe)
    - [The Proxy Sensor Ontology](#proxysensor)
    - [The Instrument Ontology](#instrument)
    - [The Inferred Variable Ontology](#inferredvar)
* [Sample query](#sample)
* [Create your own query](#create)

## <a name=howto></a> How to use this Notebook?

This Notebook is intended for users with little to no knowledge of SPARQL. It will help you create a text file for the query.

First, you need to know a little bit more about the LinkedEarth wiki and name standardization. Let's assume you want to look for "marine sediment" with the proxy "d18O". Two things need to happen: (1) the wiki needs to understand what a proxy is and (2) it needs to search for the exact character match (i.e., d18O rather than delta18O, delta-18O...).

We taught the wiki that a "climate proxy" is in fact composed of a Proxy Archive (e.g., marine sediment), a Proxy Sensor (e.g. Globigerinoides ruber), and a Proxy Observation (e.g. d18O) following the work of [Evans et al. (2013)](http://www.sciencedirect.com/science/article/pii/S0277379113002011).

As for standardizing the terms, we are in the process of doing so with input from the community. To see what observations, archives and sensors are already available on the wiki, we essentially created a query to ask it just that! The results are in the cells below (rerun them to have the most up-to-date snapshot of possibilities). All you have to do is then make sure that your query term follows the wiki nomenclature (i.e. query for D18O rather than d18O!).
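
To illustrate the exact-match requirement, here is a minimal sketch that maps a user-typed term onto the spelling the wiki uses. The helper names `normalize` and `match_wiki_terms` are hypothetical, and the label list is a hand-typed sample standing in for the query results.

```python
import re

def normalize(term):
    """Lowercase a term and drop everything except letters, digits and '/'."""
    return re.sub(r"[^a-z0-9/]", "", term.lower())

def match_wiki_terms(term, wiki_labels):
    """Return the wiki labels whose normalized form equals that of `term`."""
    key = normalize(term)
    return [label for label in wiki_labels if normalize(label) == key]

# Hand-typed sample of labels, standing in for the list returned by the wiki
labels = ["D18O", "Mg/Ca", "Mg Ca", "TEX86", "Uk37"]

print(match_wiki_terms("d18O", labels))      # ['D18O']
print(match_wiki_terms("delta18O", labels))  # []
```

This only catches differences in case, spacing, and punctuation. A synonym such as delta18O still has to be translated by hand.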

In [45]:
import json
import requests

url = "http://wiki.linked.earth/store/ds/query"

query = """PREFIX core: <http://linked.earth/ontology#>
PREFIX wiki: <http://wiki.linked.earth/Special:URIResolver/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT distinct ?a 
WHERE {
{
    ?dataset wiki:Property-3AArchiveType ?a.
}UNION
{
    ?w core:proxyArchiveType ?t.
    ?t rdfs:label ?a
}
}"""

# Send the query to the SPARQL endpoint and parse the JSON response
response = requests.post(url, data = {'query': query})
res = json.loads(response.text)
# Print one archive type per line
for item in res['results']['bindings']:
    print(item['a']['value'])

marine sediment
coral
lake sediment
glacier ice
tree
documents
speleothem
sclerosponge
borehole
hybrid
bivalve
Rock
Sclerosponge
Speleothem
Wood
Coral
MarineSediment
LakeSediment
GlacierIce
Documents
Hybrid
MolluskShell
Lake
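
The list above contains several spellings of the same archive (for example "marine sediment" and "MarineSediment"). One way to accept all of them in a query is a SPARQL `VALUES` clause. The sketch below builds such a clause from a Python list; `values_clause` is a hypothetical helper name, not part of any LinkedEarth library.

```python
def values_clause(variable, terms):
    """Build a SPARQL VALUES clause binding `variable` to each string in `terms`."""
    def quote(term):
        # Escape backslashes and double quotes so the term is a valid SPARQL string literal
        return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return "VALUES ?" + variable + " {" + " ".join(quote(t) for t in terms) + "}"

print(values_clause("a", ["marine sediment", "MarineSediment"]))
# VALUES ?a {"marine sediment" "MarineSediment"}
```

The returned line can be placed inside the `WHERE { ... }` block of a query, before the patterns that use `?a`.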


In [17]:
query = """PREFIX core: <http://linked.earth/ontology#>
PREFIX wiki: <http://wiki.linked.earth/Special:URIResolver/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT distinct ?a 
WHERE 
{
    ?w core:proxyObservationType ?t.
    ?t rdfs:label ?a
}"""

response = requests.post(url, data = {'query': query})
res = json.loads(response.text)
for item in res['results']['bindings']:
    print (item['a']['value'])

DiffuseSpectralReflectance
JulianDay
Al/Ca
B/Ca
Ba/Ca
Mn/Ca
Sr/Ca
Zn/Ca
Radiocarbon
D18O
Mg/Ca
TEX86
TRW
Dust
Chloride
Sulfate
Nitrate
D13C
Depth
Age
Mg
Floral
DD
C
N
P
Si
Uk37
Uk37Prime
Density
GhostMeasured
Trsgi
Mg Ca
SampleCount
Segment
RingWidth
Residual
ARS
Corrs
RBar
SD
SE
EPS
Core
Uk37prime
Upper95
Lower95
Year old
Thickness
Na
DeltaDensity
Reflectance
BlueIntensity
VarveThickness
Reconstructed
AgeMin
AgeMax
SampleID
Depth top
Depth bottom
R650 700
R570 630
R660 670
RABD660 670
WaterContent
C N
BSi
MXD
EffectiveMoisture
Pollen
Precipitation
Unnamed
Sr Ca
Calcification1
Calcification2
Calcification3
CalcificationRate
Composite
Calcification4
Notes
Notes1
Calcification5
Calcification
Calcification6
Calcification7
Trsgi1
Trsgi2
Trsgi3
Trsgi4
IceAccumulation
F
Cl
Ammonium
K
Ca
Duration
Hindex
VarveProperty
X radiograph dark layer
D18O1
SedAccumulation
Massacum
Melt
SampleDensity
37:2AlkenoneConcentration
AlkenoneConcentration
AlkenoneAbundance
BIT
238U
Distance
232Th
230Th/232Th
D2
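
The cells in this section all repeat the same loop over `res['results']['bindings']`. A small helper can factor that out. The sketch below assumes the endpoint returns the standard SPARQL JSON results layout; `rows_from_sparql_json` is a hypothetical name and the sample dictionary is hand-written.

```python
def rows_from_sparql_json(result, *variables):
    """Return one tuple per binding, holding the value of each requested variable.

    A variable that is unbound in a given row yields None.
    """
    rows = []
    for binding in result["results"]["bindings"]:
        rows.append(tuple(binding.get(var, {}).get("value") for var in variables))
    return rows

# Hand-written sample in the SPARQL JSON results layout, standing in for a real response
sample = {
    "head": {"vars": ["a", "b"]},
    "results": {"bindings": [
        {"a": {"type": "literal", "value": "Globigerinoides"},
         "b": {"type": "literal", "value": "ruber"}},
        {"a": {"type": "literal", "value": "Porites"}},
    ]},
}

print(rows_from_sparql_json(sample, "a", "b"))
# [('Globigerinoides', 'ruber'), ('Porites', None)]
```

With a live response, the call would look like `rows_from_sparql_json(json.loads(response.text), 'a')`.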

In [21]:
query = """PREFIX core: <http://linked.earth/ontology#>
PREFIX wiki: <http://wiki.linked.earth/Special:URIResolver/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT distinct ?a 
WHERE 
{
    ?w core:inferredVariableType ?t.
    ?t rdfs:label ?a
}"""

response = requests.post(url, data = {'query': query})
res = json.loads(response.text)
for item in res['results']['bindings']:
    print (item['a']['value'])

Year
Radiocarbon Age
D18O
Sea Surface Temperature
Age
Temperature
Salinity
Uncertainty temperature
Temperature1
Temperature2
Temperature3
Uncertainty temperature1
Thermocline Temperature
Sedimentation Rate
Relative Sea Level
Sea Surface Salinity
Subsurface Temperature
Accumulation rate
Carbonate Ion Concentration
Mean Accumulation Rate
Accumulation rate, total organic carbon
Accumulation rate, calcium carbonate


In [28]:
query = """PREFIX core: <http://linked.earth/ontology#>
PREFIX wiki: <http://wiki.linked.earth/Special:URIResolver/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT distinct ?a ?b
WHERE 
{
    ?w core:sensorGenus ?a.
    ?w core:sensorSpecies ?b .
    
}"""

response = requests.post(url, data = {'query': query})
res = json.loads(response.text)
for item in res['results']['bindings']:
    print ('Genus: '+item['a']['value']+' Species: ' +item['b']['value'])

Genus: Siderastrea Species: Siderastrea siderea
Genus: Siderastrea Species: siderea
Genus: Siderastrea Species: Siderastrea radians
Genus: Globigerinoides Species: ruber
Genus: Globigerinoides Species: sacculifer
Genus: Ceratoporella Species: nicholsoni
Genus: Ceratoporella Species: Ceratoporella nicholsoni
Genus: Diploria Species: labyrinthiformis
Genus: Diploria Species: Diploria labyrinthiformis
Genus: Diploria Species: Diploria strigosa
Genus: Porites Species: lutea
Genus: Porites Species: Porites austraiensis
Genus: Porites Species: NA
Genus: Porites Species: Porites
Genus: Porites Species: Porites sp.
Genus: Porites Species: Porites lobata
Genus: Porites Species: Porites lutea
Genus: Porites Species: NaN
Genus: Porites Species: P. australiensis, possibly P. lobata
Genus: Porites Species: Porites australiensis
Genus: Porites Species: lobata
Genus: Cibicidoides Species: mundulus
Genus: Cibicidoides Species: wuellerstorfi
Genus: Cibicidoides Species: mabahethi
Genus: Platygyra Speci

In [30]:
query = """PREFIX core: <http://linked.earth/ontology#>
PREFIX wiki: <http://wiki.linked.earth/Special:URIResolver/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT distinct ?a ?b
WHERE 
{
    ?w core:name ?a.
    ?w core:detail ?b .
    
}"""

response = requests.post(url, data = {'query': query})
res = json.loads(response.text)
for item in res['results']['bindings']:
    print ('Name: '+item['a']['value']+' Detail: ' +item['b']['value'])

Name: core Detail: middle of sample
Name: Salinity Detail: sea surface
Name: Salinity Detail: Sea Surface
Name: Salinity Detail: Sea surface
Name: Salinity Detail: surface water
Name: Temperature Detail: sea surface
Name: Temperature Detail: Sea Surface
Name: Temperature Detail: bottom water
Name: Temperature Detail: Sea surface
Name: Temperature Detail: thermocline
Name: Temperature Detail: surface water
Name: Temperature Detail: subsurface
Name: Calendar Detail: Age
Name: Age Detail: Calendar
Name: Age Detail: calendar
Name: age Detail: calendar
Name: D18O Detail: sea surface
Name: d180w Detail: Sea Surface
Name: d18O Detail: top of sample; signal believed to primarily reflect temperature, but influence of other factors cannot be excluded.
Name: d18O Detail: deviation of oxygen isotope ratio 18O:16O in H2O sample compared to standard mean ocean water (V-SMOW)
Name: d18O Detail: annual resolution
Name: d18O Detail: use at least 3yr averages
Name: temperature Detail: from documentary d

In [41]:
query = """PREFIX core: <http://linked.earth/ontology#>
PREFIX wiki: <http://wiki.linked.earth/Special:URIResolver/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT distinct ?b
WHERE 
{
    ?w core:inferredVariableType ?a.
    ?w core:hasUnits ?b .
    {
    ?a rdfs:label "Age" .
    }UNION
    {
    ?a rdfs:label "Year" .
    }
    
}"""

response = requests.post(url, data = {'query': query})
res = json.loads(response.text)
for item in res['results']['bindings']:
    print (item['b']['value']) # + item['w']['value']

AD
yr B.P.
yr BP
kyr BP
yr
BP
kyr
yrs BP
kaBP
yr AD
year
yrs
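
Because the same age can be stored under many unit strings, a numeric filter such as "maximum age of at least 10000" only makes sense once the unit is known. The sketch below converts a value to years BP for the unit strings listed above. It assumes the usual convention that "present" is 1950 CE, and it deliberately returns `None` for strings such as "yr" or "kyr" that do not say whether they count from BP or AD. `to_yr_bp` is a hypothetical helper name.

```python
def to_yr_bp(value, units):
    """Convert an age or year to years before present (BP, present = 1950 CE).

    Returns None when the unit string does not state its reference clearly.
    """
    # Ignore case, dots and spaces so "yr B.P." and "yr BP" are treated alike
    key = units.lower().replace(".", "").replace(" ", "")
    if key in ("yrbp", "yrsbp", "bp"):
        return float(value)
    if key in ("kyrbp", "kabp"):
        return float(value) * 1000.0
    if key in ("ad", "yrad"):
        return 1950.0 - float(value)
    return None  # e.g. "yr", "year", "kyr": reference (BP or AD) not stated

print(to_yr_bp(10, "kyr BP"))   # 10000.0
print(to_yr_bp(1850, "yr AD"))  # 100.0
print(to_yr_bp(5, "yr"))        # None
```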


In [108]:
archiveType = ["marine sediment","Marine Sediment"]
proxObsType = ["Mg/Ca"]

query = """PREFIX core: <http://linked.earth/ontology#>
PREFIX wiki: <http://wiki.linked.earth/Special:URIResolver/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT  distinct ?dataset
WHERE {
"""
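archiveTypeQ = ""
if archiveType:
    # Bind ?a to every accepted spelling of the archive type
    query += "VALUES ?a {"
    for item in archiveType:
        query += "\"" + item + "\" "
    query += "}\n"

    # Match the archive type either on the dataset page or through the proxy system
    archiveTypeQ = """
#Archive Type query
{
    ?dataset wiki:Property-3AArchiveType ?a.
}UNION
{
    ?p core:proxyArchiveType / rdfs:label ?a.
}"""

# TODO: build the remaining VALUES clauses (e.g. from proxObsType) the same way;
# for now they are hard-coded below.
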
query += """VALUES ?b {"Mg/Ca" "Mg Ca"}
VALUES ?c {"Sea Surface Temperature"}
VALUES ?genus {"Globigerinoides"}
VALUES ?species {"ruber"}
VALUES ?intName {"temperature" "Temperature"}
VALUES ?intDetail {"sea surface"}
VALUES ?units {"yr BP"}
VALUES ?ageOrYear{"Age" "Year"}

?dataset a core:Dataset.  
?dataset core:includesChronData|core:includesPaleoData ?data.
?data core:foundInMeasurementTable / core:includesVariable ?v.

?v core:proxyObservationType/rdfs:label ?b.

?data core:foundInMeasurementTable / core:includesVariable ?v1.
?v1 core:inferredVariableType ?t.
?t rdfs:label ?c.

?v ?proxySystem ?p .

?p core:proxySensorType ?sensor.
?sensor core:sensorGenus ?genus.
?sensor core:sensorSpecies ?species.

"""+archiveTypeQ+"""


{?v1 core:interpretedAs ?interpretation}
UNION
{?v core:interpretedAs ?interpretation}

?interpretation core:name ?intName.
?interpretation core:detail ?intDetail.

?data core:foundInMeasurementTable / core:includesVariable ?v2.
?v2 core:inferredVariableType ?aoy.
?aoy rdfs:label ?ageOrYear.
?v2 core:hasUnits ?units .
?v2 core:hasMinValue ?e1.
?v2 core:hasMaxValue ?e2 .

filter(?e1<=3000 && ?e2>=6000  && abs(?e1-?e2)>=1500 ).

?dataset core:collectedFrom ?z.
?z <http://www.w3.org/2003/01/geo/wgs84_pos#lat> ?lat. 
filter(xsd:float(?lat)<30 && xsd:float(?lat)>-30). 
?z <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?long. # Get the longitude property
?z <http://www.w3.org/2003/01/geo/wgs84_pos#alt> ?alt. # Get the altitude property

filter(xsd:float(?long)<160 && xsd:float(?long)>100). #filter
filter(xsd:float(?alt)<0).
#filter(xsd:float(?alt)>160).


#Resolution (only if the variables are there)
{
?v core:hasResolution/(core:hasMeanValue |core:hasMedianValue) ?resValue.
filter (xsd:float(?resValue)<100)
}
UNION
{
?v1 core:hasResolution/(core:hasMeanValue |core:hasMedianValue) ?resValue1.
filter (xsd:float(?resValue1)<100)
}


}"""

print(query)

# Send the assembled query and print the URI of each matching dataset
response = requests.post(url, data = {'query': query})
res = json.loads(response.text)
for item in res['results']['bindings']:
    print(item['dataset']['value'])

PREFIX core: <http://linked.earth/ontology#>
PREFIX wiki: <http://wiki.linked.earth/Special:URIResolver/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT  distinct ?dataset
WHERE {
VALUES ?a {"marine sediment" "Marine Sediment" }
VALUES ?b {"Mg/Ca" "Mg Ca"}
VALUES ?c {"Sea Surface Temperature"}
VALUES ?genus {"Globigerinoides"}
VALUES ?species {"ruber"}
VALUES ?intName {"temperature" "Temperature"}
VALUES ?intDetail {"sea surface"}
VALUES ?units {"yr BP"}
VALUES ?ageOrYear{"Age" "Year"}

?dataset a core:Dataset.  
?dataset core:includesChronData|core:includesPaleoData ?data.
?data core:foundInMeasurementTable / core:includesVariable ?v.

?v core:proxyObservationType/rdfs:label ?b.

?data core:foundInMeasurementTable / core:includesVariable ?v1.
?v1 core:inferredVariableType ?t.
?t rdfs:label ?c.

?v ?proxySystem ?p .

?p core:proxySensorType ?sensor.
?sensor core:sensorGenus ?genus.
?sensor core:sensorSpecies ?species.

{
   