<img src='https://www.icos-cp.eu/sites/default/files/2017-11/ICOS_CP_logo.png' width=400 align=right>

# ICOS Carbon Portal Python Libraries: icoscp_core

This example uses `icoscp_core`, a foundational library for accessing time-series ICOS data that are <i>previewable</i> in the ICOS Data Portal. "Previewable" means that the data variables can be visualized in the portal's preview plot. The library can also access (meta)data from the [ICOS Cities](https://citydata.icos-cp.eu/portal/) and [SITES](https://data.fieldsites.se/portal/) data repositories.

Documentation of the library, including information on running it locally, can be found on [PyPI.org](https://pypi.org/project/icoscp_core/).

# Example: Execute SPARQL queries for custom metadata searches

**Note:** to develop and test the queries, it is convenient to use the [SPARQL client web app](https://meta.icos-cp.eu/sparqlclient/).

## Import helpful functions

In [None]:
from icoscp_core.icos import meta
from icoscp_core.sparql import as_string, as_opt_str, as_uri
import pandas as pd

## Discover ecosystem types

In [None]:
query = """
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix cpmeta: <http://meta.icos-cp.eu/ontologies/cpmeta/>
select * where{
    ?et a cpmeta:EcosystemType ; rdfs:label ?etName .
    filter(strstarts(str(?et), "http://meta.icos-cp.eu")) # to filter out SITES instances
    optional{?et rdfs:comment ?etComment}
}
"""

eco_types = [
    {
        "uri": as_uri("et", row),
        "name": as_string("etName", row),
        "comment": as_opt_str("etComment", row)
    }
    for row in meta.sparql_select(query).bindings
]
pd.DataFrame(eco_types)
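
The helpers `as_uri`, `as_string` and `as_opt_str` pull typed values out of each result row, and `as_opt_str` returns `None` when an `optional{...}` pattern did not match. As a rough mental model only, the sketch below mimics them on rows shaped like the W3C SPARQL 1.1 JSON results format, where each bound variable maps to a dict with `type` and `value`. The helpers `uri_of`, `str_of` and `opt_str_of` are hypothetical illustrations, not the `icoscp_core` API (the library's own row objects may look different), and the sample row is made up.

```python
from typing import Dict, Optional

# One result row shaped like the W3C SPARQL 1.1 JSON results format:
# variable name -> {"type": ..., "value": ...}; unbound OPTIONAL variables are simply absent.
Row = Dict[str, Dict[str, str]]

def uri_of(var: str, row: Row) -> str:
    """Hypothetical analogue of as_uri: the variable must be bound to a URI."""
    term = row[var]
    if term["type"] != "uri":
        raise ValueError(f"?{var} is not a URI: {term}")
    return term["value"]

def str_of(var: str, row: Row) -> str:
    """Hypothetical analogue of as_string: the variable must be bound to a literal."""
    term = row[var]
    if term["type"] not in ("literal", "typed-literal"):
        raise ValueError(f"?{var} is not a literal: {term}")
    return term["value"]

def opt_str_of(var: str, row: Row) -> Optional[str]:
    """Hypothetical analogue of as_opt_str: None when the OPTIONAL pattern did not match."""
    return str_of(var, row) if var in row else None

# Made-up row in which ?etComment is unbound
sample_row: Row = {
    "et": {"type": "uri", "value": "http://meta.icos-cp.eu/ontologies/cpmeta/igbp_DBF"},
    "etName": {"type": "literal", "value": "Deciduous Broadleaf Forests"},
}
print(uri_of("et", sample_row), str_of("etName", sample_row), opt_str_of("etComment", sample_row))
```

Raising on a type mismatch, rather than returning whatever is there, surfaces query mistakes (for example, selecting a label where a URI was expected) at the row where they happen.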

## Filter stations by ecosystem and country code

Let us find stations with the Deciduous Broadleaf Forests ecosystem type (IGBP code `DBF`) in Belgium, France, Germany and Italy.

In [None]:
query = """
prefix cpmeta: <http://meta.icos-cp.eu/ontologies/cpmeta/>
select ?s ?name where{
    values ?countryCode {"BE" "FR" "DE" "IT"}
    ?s cpmeta:hasEcosystemType <http://meta.icos-cp.eu/ontologies/cpmeta/igbp_DBF> ;
        cpmeta:countryCode ?countryCode ;
        cpmeta:hasName ?name .
}
"""
selected_stations = {
    as_uri("s", row): as_string("name", row)
    for row in meta.sparql_select(query).bindings
}
selected_stations
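
Instead of editing the `values` line by hand, the country list and the ecosystem URI can be kept in Python and spliced into the query. A minimal sketch: `values_clause` and `station_query` are hypothetical helpers, and quoting with `json.dumps` assumes simple values such as two-letter country codes.

```python
import json
from typing import List

def values_clause(var: str, literals: List[str]) -> str:
    """Hypothetical helper: a SPARQL VALUES block of string literals.

    json.dumps supplies the double quotes and escaping, which is adequate
    for simple values such as two-letter country codes.
    """
    return f"values ?{var} {{{' '.join(json.dumps(x) for x in literals)}}}"

def station_query(ecosystem_uri: str, country_codes: List[str]) -> str:
    """Hypothetical helper: the station query above, parametrized by ecosystem and countries."""
    return f"""
prefix cpmeta: <http://meta.icos-cp.eu/ontologies/cpmeta/>
select ?s ?name where{{
    {values_clause('countryCode', country_codes)}
    ?s cpmeta:hasEcosystemType <{ecosystem_uri}> ;
        cpmeta:countryCode ?countryCode ;
        cpmeta:hasName ?name .
}}
"""

print(station_query("http://meta.icos-cp.eu/ontologies/cpmeta/igbp_DBF", ["BE", "FR", "DE", "IT"]))
```

The resulting string can be passed to `meta.sparql_select` exactly like the hand-written `query` above.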

## Find interesting data objects

For the selected stations, find `ETC L2 Fluxnet` data objects (http://meta.icos-cp.eu/resources/cpmeta/etcL2Fluxnet) whose time coverage overlaps the chosen time interval.

In [None]:
from icoscp_core.metaclient import TimeFilter

start_date = '2023-01-01'
end_date = '2023-12-31'

data_objs = meta.list_data_objects(
    datatype="http://meta.icos-cp.eu/resources/cpmeta/etcL2Fluxnet",
    station=list(selected_stations),
    filters = [
        TimeFilter("timeStart", "<", f"{end_date}T12:00:00Z"),
        TimeFilter("timeEnd", ">", f"{start_date}T12:00:00Z")
    ]
)
pd.DataFrame(data_objs)
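
The two `TimeFilter`s encode the standard interval-overlap test: an object covering [timeStart, timeEnd] overlaps the window [start, end] exactly when it starts before the window ends and ends after the window starts. The plain-Python sketch below (hypothetical `overlaps` helper, made-up object intervals, and the same 12:00 UTC window bounds as the cell above) shows why both conditions are needed.

```python
from datetime import datetime, timezone

def overlaps(obj_start: datetime, obj_end: datetime,
             win_start: datetime, win_end: datetime) -> bool:
    """True when [obj_start, obj_end] and [win_start, win_end] share some time.

    Mirrors TimeFilter("timeStart", "<", win_end) and TimeFilter("timeEnd", ">", win_start).
    """
    return obj_start < win_end and obj_end > win_start

def utc(s: str) -> datetime:
    """Parse an ISO date/time string and mark it as UTC."""
    return datetime.fromisoformat(s).replace(tzinfo=timezone.utc)

win = (utc("2023-01-01T12:00:00"), utc("2023-12-31T12:00:00"))

# Made-up object intervals
print(overlaps(utc("2022-06-01"), utc("2023-03-01"), *win))  # starts before, ends inside: True
print(overlaps(utc("2020-01-01"), utc("2025-01-01"), *win))  # spans the whole window: True
print(overlaps(utc("2021-01-01"), utc("2022-12-31"), *win))  # ends before the window: False
```

Checking only that `timeStart` lies inside the window would miss the second case, an object that started earlier and is still running.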

## Pick an interesting variable and look up its unit of measurement with SPARQL

The following query assumes that the column title is unique, i.e. not used by other data types. For a non-unique column title, the query would have to be extended to restrict the column to a specific data type.

In [None]:
ts_col = "TIMESTAMP"
nee_col = "NEE_VUT_REF"

query = """
prefix cpmeta: <http://meta.icos-cp.eu/ontologies/cpmeta/>
select ?unit where{
    ?col cpmeta:hasColumnTitle "NEE_VUT_REF" ;
        cpmeta:hasValueType/cpmeta:hasUnit ?unit .
}
"""

nee_unit = [
    as_string("unit", row)
    for row in meta.sparql_select(query).bindings
][0]
nee_unit
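
One way to give the column the context of a data type is to reach it from the data object specification. A minimal sketch, **assuming** that a spec links to its columns via `cpmeta:containsDataset/cpmeta:hasColumn`; verify this property path in the SPARQL client before relying on it. `unit_query` and `single_value` are hypothetical helpers; the second replaces the bare `[0]`, which would raise an unhelpful `IndexError` on an empty result and silently pick one unit from an ambiguous one.

```python
import json
from typing import List, Optional

def unit_query(column_title: str, spec_uri: Optional[str] = None) -> str:
    """Hypothetical helper: SPARQL for a column's unit, optionally restricted to one data type.

    ASSUMPTION: object specs reach their columns via cpmeta:containsDataset/cpmeta:hasColumn.
    """
    spec_pattern = (
        f"<{spec_uri}> cpmeta:containsDataset/cpmeta:hasColumn ?col ."
        if spec_uri else ""
    )
    return f"""
prefix cpmeta: <http://meta.icos-cp.eu/ontologies/cpmeta/>
select distinct ?unit where{{
    {spec_pattern}
    ?col cpmeta:hasColumnTitle {json.dumps(column_title)} ;
        cpmeta:hasValueType/cpmeta:hasUnit ?unit .
}}
"""

def single_value(values: List[str], what: str) -> str:
    """Hypothetical guard: fail loudly unless exactly one value came back."""
    if len(values) != 1:
        raise ValueError(f"expected exactly one {what}, got {values}")
    return values[0]

print(unit_query("NEE_VUT_REF", "http://meta.icos-cp.eu/resources/cpmeta/etcL2Fluxnet"))
```

In the notebook this could be combined as `single_value([as_string("unit", r) for r in meta.sparql_select(unit_query(nee_col, spec_uri)).bindings], "unit")`, which also keeps the query in sync with `nee_col`.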

## Import modules for subsequent data fetching and plotting

In [None]:
from icoscp_core.icos import data
import numpy as np

# bokeh for plotting the data
from bokeh.plotting import figure, show
from bokeh.layouts import gridplot, column
from bokeh.io import output_notebook
from bokeh.models import Div
output_notebook()

## Fetch data, plot the graphs

In [None]:
data_batch = data.batch_get_columns_as_arrays(data_objs, [ts_col, nee_col])

figures = []

for dobj, arrs in data_batch:

    # prepare a date mask covering start_date 00:00 through the end of end_date
    mask = (arrs[ts_col] >= np.datetime64(start_date)) & (arrs[ts_col] < np.datetime64(end_date) + np.timedelta64(1, 'D'))

    # prepare columns with the date mask filter
    cols = {col: arrs[col][mask] for col in [ts_col, nee_col]}
    
    # Bokeh 3 uses width/height (plot_width/plot_height were removed) and scatter() for small markers
    fig = figure(width=350, height=300, title=selected_stations[dobj.station_uri], x_axis_type='datetime')
    fig.scatter(cols[ts_col], cols[nee_col], size=1, alpha=0.3)
    fig.yaxis.axis_label = nee_unit
    
    figures.append(fig)

rows_of_3 = [figures[i:i+3] for i in range(0, len(figures), 3)]

title = Div(text='<h2>Net Ecosystem Exchange in selected deciduous broadleaf forest stations</h2>')

show(column(title, gridplot(rows_of_3)))
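
When filtering on day-precision bounds, note that `np.datetime64('2023-12-31')` is midnight at the *start* of that day, so a mask using `<= np.datetime64(end_date)` drops almost all of 31 December. A sketch of a half-open mask that keeps the whole last day, using a hypothetical `day_mask` helper and synthetic half-hourly timestamps rather than real ICOS data:

```python
import numpy as np

def day_mask(ts: np.ndarray, start_date: str, end_date: str) -> np.ndarray:
    """Hypothetical helper: True for timestamps from start_date 00:00 up to the end of end_date."""
    start = np.datetime64(start_date, "D")
    stop = np.datetime64(end_date, "D") + np.timedelta64(1, "D")  # midnight after end_date
    return (ts >= start) & (ts < stop)  # half-open interval [start, stop)

# Synthetic half-hourly timestamps around the turn of the year (3 days, 144 values)
ts = np.arange(np.datetime64("2023-12-30T00:00"), np.datetime64("2024-01-02T00:00"),
               np.timedelta64(30, "m"))

inclusive_midnight = (ts >= np.datetime64("2023-01-01")) & (ts <= np.datetime64("2023-12-31"))
whole_day = day_mask(ts, "2023-01-01", "2023-12-31")

print(inclusive_midnight.sum(), whole_day.sum())  # 49 96
```

The `<=` version keeps 30 December plus the single 00:00 sample of 31 December, while the half-open mask keeps both full days.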

## Get a custom data-search SPARQL query from the portal app

- Go to https://data.icos-cp.eu and find the datasets you want.
- Press the icon in the middle of the screen (see image below) to copy the SPARQL query behind your search.
- Come back here and assign it to the variable `query`.

<img src="img/sparql.png" width="80%">

For the following example, we searched for the 20 latest data objects directly associated with a chosen keyword:

https://data.icos-cp.eu/portal/#%7B%22filterKeywords%22%3A%5B%22AVENGERS%22%5D%7D
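
The part of this link after `#` is URL-encoded JSON describing the portal's search state. The sketch below decodes it with the standard library and builds a similar link for other keywords; `keyword_search_url` is a hypothetical helper, and the `filterKeywords` key is taken from this one URL, so other filters may use other keys.

```python
import json
from urllib.parse import quote, unquote, urlsplit

portal_url = "https://data.icos-cp.eu/portal/#%7B%22filterKeywords%22%3A%5B%22AVENGERS%22%5D%7D"

# Decode the fragment: percent-decoding yields a JSON object
state = json.loads(unquote(urlsplit(portal_url).fragment))
print(state)  # {'filterKeywords': ['AVENGERS']}

def keyword_search_url(keywords: list) -> str:
    """Hypothetical helper: build a portal link that filters on the given keywords."""
    fragment = quote(json.dumps({"filterKeywords": keywords}, separators=(",", ":")), safe="")
    return f"https://data.icos-cp.eu/portal/#{fragment}"

print(keyword_search_url(["AVENGERS"]) == portal_url)  # True
```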



In [None]:
query = '''
prefix cpmeta: <http://meta.icos-cp.eu/ontologies/cpmeta/>
prefix prov: <http://www.w3.org/ns/prov#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix geo: <http://www.opengis.net/ont/geosparql#>
select ?dobj ?hasNextVersion ?spec ?fileName ?size ?submTime ?timeStart ?timeEnd
where {
    ?spec cpmeta:hasDataLevel [] .
    FILTER(STRSTARTS(str(?spec), "http://meta.icos-cp.eu/"))
    FILTER NOT EXISTS {?spec cpmeta:hasAssociatedProject/cpmeta:hasHideFromSearchPolicy "true"^^xsd:boolean}
    ?dobj cpmeta:hasObjectSpec ?spec .
    BIND(EXISTS{[] cpmeta:isNextVersionOf ?dobj} AS ?hasNextVersion)
    ?dobj cpmeta:hasSizeInBytes ?size .
    ?dobj cpmeta:hasName ?fileName .
    ?dobj cpmeta:wasSubmittedBy/prov:endedAtTime ?submTime .
    ?dobj cpmeta:hasStartTime | (cpmeta:wasAcquiredBy / prov:startedAtTime) ?timeStart .
    ?dobj cpmeta:hasEndTime | (cpmeta:wasAcquiredBy / prov:endedAtTime) ?timeEnd .
    FILTER NOT EXISTS {[] cpmeta:isNextVersionOf ?dobj}
    VALUES ?keyword {"AVENGERS"^^xsd:string}
    ?dobj cpmeta:hasKeyword ?keyword
}
order by desc(?submTime)
offset 0 limit 20

'''

dobjs = [
    (as_uri("dobj", row), as_string("fileName", row))
    for row in meta.sparql_select(query).bindings
]
dobjs
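
The portal-generated query is paged: it ends with `offset 0 limit 20`. To fetch further pages, the trailing clause can be rewritten before calling `meta.sparql_select`. A minimal sketch with a hypothetical `with_page` helper, demonstrated on a shortened stand-in query rather than the full one above:

```python
import re

def with_page(query: str, offset: int, limit: int) -> str:
    """Hypothetical helper: replace the trailing 'offset N limit M' of a portal-generated query."""
    new_query, n = re.subn(
        r"offset\s+\d+\s+limit\s+\d+\s*$",
        f"offset {offset} limit {limit}\n",
        query.rstrip(),
        flags=re.IGNORECASE,
    )
    if n != 1:
        raise ValueError("query has no trailing 'offset N limit M' clause")
    return new_query

# Shortened stand-in for a portal query
demo = "select ?dobj where { ?dobj ?p ?o }\norder by desc(?submTime)\noffset 0 limit 20\n\n"
print(with_page(demo, offset=20, limit=20))
```

With the notebook's own query, `with_page(query, 20, 20)` would request the next 20 objects in the same `order by desc(?submTime)` ordering.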