<img src='https://www.icos-cp.eu/sites/default/files/2017-11/ICOS_CP_logo.png' width=400 align=right>

# ICOS Carbon Portal Python Libraries

This example uses `icoscp_core`, a foundational library for accessing time-series ICOS data that are <i>previewable</i> in the ICOS Data Portal. "Previewable" means that the data variables can be visualized in the portal's preview plot. The library can also be used to access (meta)data from the [ICOS Cities](https://citydata.icos-cp.eu/portal/) and [SITES](https://data.fieldsites.se/portal/) data repositories.

General information on all ICOS Carbon Portal Python libraries can be found on our [help pages](https://icos-carbon-portal.github.io/pylib/). 

Documentation of the `icoscp_core` library, including information on running it locally, can also be found on [PyPI.org](https://pypi.org/project/icoscp_core/).

Note that for running this example locally, authentication is required (see the `how_to_authenticate.ipynb` notebook).



# Example: Access collection data and metadata

## Import the library

In [None]:
from icoscp_core.icos import meta

# bokeh for plotting the data
from bokeh.plotting import figure, show
from bokeh.layouts import gridplot, column, row
from bokeh.io import output_notebook
from bokeh.models import Div
output_notebook()

## Read collection metadata

### Resolve a DOI
This step can be skipped if the URI of the collection landing page is already known.

In [None]:
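import requests
doi = "10.18160/VXCS-95EV"
# doi.org answers a HEAD request with a redirect; its Location header holds the collection landing page URI
coll_uri = requests.head(f"https://doi.org/{doi}", timeout=30).headers.get('Location')
coll_uri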
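DOIs are often copied in different forms, such as `doi:10.18160/VXCS-95EV` or a full `https://doi.org/...` link. The helper below is a minimal sketch that normalizes these forms into a doi.org URL before the lookup above; the name `doi_to_url` is our own and is not part of `icoscp_core`.

```python
def doi_to_url(doi: str) -> str:
    """Normalize a DOI given in a common textual form into a https://doi.org/ URL.

    Hypothetical helper (not part of icoscp_core). Accepts e.g.
    '10.18160/VXCS-95EV', 'doi:10.18160/VXCS-95EV' or a doi.org link.
    """
    doi = doi.strip()
    known_prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in known_prefixes:
        # compare case-insensitively, so 'DOI:' or 'HTTPS://DOI.ORG/' also match
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return "https://doi.org/" + doi


print(doi_to_url("doi:10.18160/VXCS-95EV"))
```

The returned URL can then be passed to `requests.head(...)` exactly as in the cell above, and the landing page URI read from the `Location` header.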

### Read the metadata

In [None]:
coll = meta.get_collection_meta(coll_uri)
coll.title

### Fast-forward to the latest version if a later version exists
This step is also optional; use it only if the code should always work with the latest version of the collection.

In [None]:
if coll.latestVersion != coll_uri:
    coll = meta.get_collection_meta(coll.latestVersion)
coll.title

## Collection metadata overview

Available (nested) metadata properties can be discovered with Tab completion: type a variable name followed by "." (e.g. `coll.`) and press Tab.

For a more systematic discovery of the metadata properties, examine the Python documentation of the respective types as follows.

In [None]:
from icoscp_core import metacore
help(metacore.StaticCollection)
#metacore.StaticDataItem # is a union type to represent collection members (objects and other collections)
#help(metacore.PlainStaticObject)
#metacore.Sha256Sum # is an alias for str
#metacore.URI # is an alias for str
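If the `metacore` types are Python dataclasses (an assumption worth confirming in the `help()` output above), their nested properties can also be listed programmatically. The function `list_properties` and the two stand-in classes below are hypothetical; the stand-ins only mimic two of the real collection fields so that the sketch runs without network access.

```python
import dataclasses
from dataclasses import dataclass
from typing import Optional


def list_properties(obj, prefix=""):
    """Return dotted paths of all fields of a (possibly nested) dataclass instance."""
    paths = []
    for field in dataclasses.fields(obj):
        path = prefix + field.name
        paths.append(path)
        value = getattr(obj, field.name)
        # recurse into nested dataclass instances (but not into dataclass types)
        if dataclasses.is_dataclass(value) and not isinstance(value, type):
            paths.extend(list_properties(value, path + "."))
    return paths


# Hypothetical stand-ins for illustration only; the real metacore classes have more fields
@dataclass(frozen=True)
class References:
    citationString: Optional[str]


@dataclass(frozen=True)
class Collection:
    title: str
    references: References


print(list_properties(Collection("Example", References("Example citation"))))
```

If `StaticCollection` is indeed a dataclass, `list_properties(coll)` would list the properties of the collection read earlier; for non-dataclass objects `dataclasses.fields` raises a `TypeError`.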

### Citation

In [None]:
coll.references.citationString

In [None]:
coll.references.citationBibTex

### Collection description

In [None]:
coll.description

### List collection members

In [None]:
coll.members[:4]

### Group the collection members by station and data type

This code is rather fragile, as it relies entirely on the file naming conventions used by the Atmosphere Thematic Center.
An alternative is to perform a batch lookup of the data type and station URIs of every object in the collection,
which, at the time of writing, requires a custom SPARQL query.
Note also that the grouping below is done only to fit the design of this example. In many practical situations,
one would rather work with uniform lists of data objects of a single data type than with a handful of different types of objects.
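Because the grouping depends on file names, it can help to make the parsing fail loudly when a name does not follow the expected pattern. The sketch below is a slightly more defensive variant of the `station_id` and `data_type` helpers defined in the next cell; `parse_atc_file_name` is a hypothetical name, and the example file name is made up to match the assumed pattern rather than taken from the collection. Unlike slicing off a fixed four characters, `os.path.splitext` removes an extension of any length (it does assume the name has one), and a name with too few segments raises an error instead of silently producing a wrong key.

```python
import os


def parse_atc_file_name(file_name):
    """Return (station_id, data_type) from an ATC-style file name.

    Assumes the convention relied on by the station_id/data_type helpers:
    underscore-separated segments, the station ID as the fifth segment and
    the data type as the last segment, followed by a file extension.
    """
    stem, _ext = os.path.splitext(file_name)  # strip the extension, whatever its length
    parts = stem.split('_')
    if len(parts) < 6:
        raise ValueError(f"unexpected file name pattern: {file_name!r}")
    return parts[4], parts[-1]


# hypothetical file name following the assumed pattern
print(parse_atc_file_name("ICOS_ATC_L2_L2-2022.1_CBW_207.0_CTS_CO2.zip"))
```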

In [None]:
from icoscp_core.metacore import PlainStaticObject

# extract short label for the data type from the file name
def data_type(pso):
    # filename segment after last underscore, minus the file extension
    return pso.name.split('_')[-1][:-4]

# extract station id from the file name
def station_id(pso):
    return pso.name.split('_')[4]

# generic function to group Python lists by a key function into dictionaries with lists as values
def groupby(elem_list, key_func):
    res = {}
    for elem in elem_list:
        res.setdefault(key_func(elem), []).append(elem)
    return res

# helper function to group PlainStaticObject list by data type, and take the last
# data object landing page URI from each group
def by_datatype(memb_list):
    return {
        key: group[-1].res
        for key, group in groupby(memb_list, data_type).items()
    }

# use only plain data object members, ignore subcollections, if any
plain_obj_members = [memb for memb in coll.members if isinstance(memb, PlainStaticObject)]

# commence the grouping
by_station = groupby(plain_obj_members, station_id)
by_station_and_type = {station: by_datatype(group) for station, group in by_station.items()}

# show part of the resulting dictionary of dictionaries
dict(list(by_station_and_type.items())[:3])
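The hand-written `groupby` helper above keeps groups in order of first appearance and preserves the original order within each group, which matters because `by_datatype` takes the *last* object of each group. For comparison, here is a sketch of two standard-library alternatives (the function names are our own). Note that `itertools.groupby` only merges *consecutive* elements with equal keys, so the input has to be sorted by the key first; the strings in `names` are illustrative only.

```python
from collections import defaultdict
from itertools import groupby as consecutive_groupby


def groupby_defaultdict(elems, key_func):
    """Same result as the groupby helper above, using collections.defaultdict."""
    res = defaultdict(list)
    for elem in elems:
        res[key_func(elem)].append(elem)
    return dict(res)


def groupby_itertools(elems, key_func):
    """itertools.groupby merges only consecutive equal keys, so sort first.

    sorted() is stable, so the order within each group is preserved.
    """
    ordered = sorted(elems, key=key_func)
    return {key: list(group) for key, group in consecutive_groupby(ordered, key=key_func)}


def station(name):
    # illustrative key function: the part before the first underscore
    return name.split('_')[0]


names = ["CBW_CO2", "HTM_CO2", "CBW_CH4", "HTM_CH4"]
print(groupby_defaultdict(names, station))
print(groupby_itertools(names, station))
# without sorting, itertools.groupby yields one group per run of equal keys
print([key for key, _ in consecutive_groupby(names, key=station)])
```

The dictionary-based versions need no sorting and keep the first-appearance order of the keys, which is why the notebook uses that approach.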

## Linked plot for CO2, CH4, N2O and CO
Let's create a plot to compare some of the data provided by the collection. The plot is interactive (the toolbar is at the top right) and the x-axes are linked, so if you zoom in on one plot, all four plots are zoomed. As a title, we use metadata provided by the collection.

In [None]:
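from icoscp_core.icos import data

chosen_station = 'CBW' # choose Cabauw
by_type = by_station_and_type.get(chosen_station)

def make_subplot(data_type_label, x_range=None):
    # fetch metadata and data of the data object for this data type
    dobj_uri = by_type.get(data_type_label)
    d_meta = meta.get_dobj_meta(dobj_uri)
    d_data = data.get_columns_as_arrays(d_meta)

    # the code assumes the variable column is labelled with the lowercase data type, e.g. 'co2'
    var_name = data_type_label.lower()
    var_arr = d_data.get(var_name)
    time_arr = d_data.get('TIMESTAMP')

    # look up the unit of the variable in the column metadata
    columns_meta = d_meta.specificInfo.columns
    unit = [col for col in columns_meta if col.label == var_name][0].valueType.unit

    # width/height replace plot_width/plot_height, which were removed in Bokeh 3
    fig_args = dict(width=300, height=300, title=data_type_label, x_axis_type="datetime", y_axis_label=unit)
    if x_range is not None:
        # share the x-range with another plot so that zooming and panning are linked
        fig_args['x_range'] = x_range
    subplot = figure(**fig_args)
    subplot.scatter(time_arr, var_arr, size=1, color="navy", alpha=0.3)
    return subplot

co2_plot = make_subplot('CO2')
p = gridplot([[
    co2_plot,
    make_subplot('CH4', co2_plot.x_range),
    make_subplot('N2O', co2_plot.x_range),
    make_subplot('CO', co2_plot.x_range)
]])

# show the results
show(column(Div(text="<h2>" + coll.title + "</h2><br>" + coll.description + "<br>" + coll.references.citationString), p))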
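Inside `make_subplot`, the unit is looked up with `[...][0]`, which fails with an uninformative `IndexError` if no column carries the expected label (for instance, if the labels are not the lowercase data type names the code assumes). Below is a sketch of a lookup that reports the available labels instead. The function `column_unit` is hypothetical, the namedtuples `Column` and `ValueType` are stand-ins that mimic only the `label` and `valueType.unit` attributes used above, and the unit string is a placeholder rather than a real ICOS unit.

```python
from collections import namedtuple

# Hypothetical stand-ins exposing only the attributes make_subplot uses
ValueType = namedtuple("ValueType", ["unit"])
Column = namedtuple("Column", ["label", "valueType"])


def column_unit(columns_meta, label):
    """Return the unit of the column with the given label, or raise a descriptive KeyError."""
    col = next((c for c in columns_meta if c.label == label), None)
    if col is None:
        available = [c.label for c in columns_meta]
        raise KeyError(f"no column labelled {label!r}; available labels: {available}")
    return col.valueType.unit


columns = [Column("TIMESTAMP", ValueType(None)), Column("co2", ValueType("placeholder-unit"))]
print(column_unit(columns, "co2"))
```

In `make_subplot`, `unit = column_unit(d_meta.specificInfo.columns, var_name)` could then replace the list-comprehension line, provided the real column metadata exposes the same attributes.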