<img src='https://www.icos-cp.eu/sites/default/files/2017-11/ICOS_CP_logo.png' width=400 align=right>

# ICOS Carbon Portal Python Libraries: icoscp_core

This example uses `icoscp_core`, a foundational library for accessing time-series ICOS data that are <i>previewable</i> in the ICOS Data Portal. "Previewable" means that the data variables can be visualized in the portal's preview plot. The library can also be used to access data and metadata from the [ICOS Cities](https://citydata.icos-cp.eu/portal/) and [SITES](https://data.fieldsites.se/portal/) data repositories.

Documentation of the library, including information on running it locally, can be found on [PyPI.org](https://pypi.org/project/icoscp_core/).

# Example: Access and work with ecosystem data

### Import libraries

In [None]:
from icoscp_core.icos import data, meta, ATMO_STATION, ECO_STATION, OCEAN_STATION
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

### List stations

In [None]:
# Stations in the ecosystem domain (see examples 1a and 1c for the atmosphere and ocean domains)
stations = meta.list_stations(ECO_STATION)

# Filter stations by country (e.g., Sweden, 'SE')
country_code = 'SE'
filtered_stations = [
    s for s in stations
    if s.country_code == country_code
]

# Display available stations in selected country for the ecosystem domain
print("Available stations:")
for station in filtered_stations:
    print(f"Station ID: {station.id}, Name: {station.name}, URI: {station.uri}")
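Before picking a `country_code`, it can help to see which codes occur in the station list and how many stations each one has. The sketch below is not part of the library: `count_by_country` is a hypothetical helper, and the station entries are made-up stand-ins that only mimic the `id`, `name` and `country_code` attributes used in the cell above. With a real list, `count_by_country(stations)` could be called instead.

```python
from collections import Counter
from types import SimpleNamespace


def count_by_country(stations):
    """Count stations per country code (hypothetical helper, not part of icoscp_core)."""
    return Counter(s.country_code for s in stations)


# Made-up stand-ins for the objects returned by meta.list_stations(ECO_STATION);
# only the attributes used in this notebook are mimicked.
demo_stations = [
    SimpleNamespace(id='SE-Deg', name='Demo station 1', country_code='SE'),
    SimpleNamespace(id='SE-Xx1', name='Demo station 2', country_code='SE'),
    SimpleNamespace(id='FI-Xx2', name='Demo station 3', country_code='FI'),
]

counts = count_by_country(demo_stations)
for code, n_stations in sorted(counts.items()):
    print(f"{code}: {n_stations} station(s)")
```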


### View metadata for a selected station 

The example shows how to access some of the metadata associated with the station. 

In [None]:
# Specify a station uri from list above
station_uri = 'http://meta.icos-cp.eu/resources/stations/ES_SE-Deg'
station_meta = meta.get_station_meta(station_uri)

# Print the station name
print('Name:', station_meta.org.name)

print('Staff:')
# Loop over all staff members and print their first name, last name, and email
for staff_member in station_meta.staff:
    first_name = staff_member.person.firstName
    last_name = staff_member.person.lastName
    email = staff_member.person.email
    print(f"{first_name} {last_name} ({email})")


### See a list of data types

Data types, in combination with the selected station, make it possible to access specific data objects. In this example, filters are applied so that only previewable data types associated with ICOS Level 2 data from the ecosystem domain are shown. See [more information about data levels](https://www.icos-cp.eu/data-services/data-collection/data-levels-quality). Additional filters can be applied; please refer to the documentation for details.


In [None]:
# Available datatypes
data_types = meta.list_datatypes()

# filters applied:
# data types with data access (possible to view with Python)
# data types with level 2 data
# data types associated with stations from the ecosystem domain
data_level = 2
previewable_datatypes = [
    dt for dt in data_types
    if dt.has_data_access and dt.data_level==data_level and dt.theme.label == 'Ecosystem data'
]

for data_type in previewable_datatypes:
    datatype_uri = data_type.uri
    datatype_label = data_type.label
    
    print(f"{datatype_label} ({datatype_uri})")
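The filter in the cell above can also be wrapped in a small function so that individual criteria are easy to switch on and off. This is only a sketch under assumptions: `filter_datatypes` is a hypothetical helper, and the entries in `demo_types` are made-up stand-ins that mimic the `has_data_access`, `data_level` and `theme.label` attributes used above. With the real list, `filter_datatypes(data_types, data_level=2, theme_label='Ecosystem data')` would be the equivalent call.

```python
from types import SimpleNamespace


def filter_datatypes(data_types, data_level=None, theme_label=None,
                     require_data_access=True):
    """Return the data types passing every active filter (hypothetical helper)."""
    selected = []
    for dt in data_types:
        if require_data_access and not dt.has_data_access:
            continue
        if data_level is not None and dt.data_level != data_level:
            continue
        if theme_label is not None and dt.theme.label != theme_label:
            continue
        selected.append(dt)
    return selected


def demo_type(label, access, level, theme):
    """Build a made-up stand-in with the attributes used by the filter."""
    return SimpleNamespace(label=label, has_data_access=access,
                           data_level=level, theme=SimpleNamespace(label=theme))


# Made-up entries; labels and themes are for illustration only
demo_types = [
    demo_type('Demo type A', True, 2, 'Ecosystem data'),
    demo_type('Demo type B', False, 2, 'Ecosystem data'),
    demo_type('Demo type C', True, 1, 'Ecosystem data'),
    demo_type('Demo type D', True, 2, 'Other theme'),
]

selected = filter_datatypes(demo_types, data_level=2, theme_label='Ecosystem data')
print([dt.label for dt in selected])
```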

### Find data objects based on the selected station and a specified data type

This example shows how to get a list of data objects associated with the selected station and the data type "Fluxnet Product". In this case, only one object is available.

In [None]:
# Specify a data type from the list above 
data_type = 'http://meta.icos-cp.eu/resources/cpmeta/miscFluxnetProduct'

station_data_objects = meta.list_data_objects(datatype=data_type,
                                              station=station_uri)
for station_data_object in station_data_objects:
    station_object_filename = station_data_object.filename

    print(station_object_filename)

if len(station_data_objects) == 0:
    print(f'No available objects with data type {data_type} at station {station_uri}')

### Access data

This example shows how to access the data and metadata from FLX_SE-Deg_FLUXNET2015_FULLSET_HH_2001-2020_beta-3.csv.zip.


In [None]:
# Select a filename from the list above
filename = 'FLX_SE-Deg_FLUXNET2015_FULLSET_HH_2001-2020_beta-3.csv.zip'
selected_data_object = next((station_data_object for station_data_object in station_data_objects if station_data_object.filename == filename), None)

if selected_data_object is not None:
    # Access metadata associated with the object
    dobj_meta = meta.get_dobj_meta(selected_data_object)

    # Access the object's data
    dobj_arrays = data.get_columns_as_arrays(dobj_meta)

    # Convert to a pandas dataframe
    df = pd.DataFrame(dobj_arrays)

    display(df)
else:
    print('Check filename')


### Make a plot: single data column

The selected data object that has been accessed contains data for GPP, which can be calculated in different ways. In this example, we use the GPP stored in the "GPP_DT_VUT_REF" column in the dataframe df above. If you access a different data object, the data may be stored in a column with a different name. Additionally, the names of the columns containing the observation timestamp and quality flag may also differ.

<mark>Note that only data from the latest year are plotted</mark>. This selection was made because the full record contains a very large number of data points.

Before the data is plotted, the "NEE_VUT_REF_QC" column is used to exclude data that has been marked as poor.
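The two filtering steps (latest year, then quality flag) can be tried on a small synthetic table first. The sketch below assumes the timestamp column has a datetime type and that the flag values are strings. `latest_year_good_rows` is a hypothetical helper, and all values in `demo` are made up. The accepted value must have the same type as the values in the flag column, otherwise the comparison matches no rows.

```python
import pandas as pd


def latest_year_good_rows(df, time_column, quality_flag, accepted_value):
    """Keep rows from the most recent year whose flag equals accepted_value."""
    years = df[time_column].dt.year
    in_latest_year = years == years.max()
    is_good = df[quality_flag] == accepted_value
    return df[in_latest_year & is_good]


# Synthetic stand-in for the real dataframe (all values are made up)
demo = pd.DataFrame({
    'TIMESTAMP': pd.to_datetime([
        '2019-12-31 23:30',
        '2020-01-01 00:00',
        '2020-01-01 00:30',
        '2020-01-01 01:00',
    ]),
    'GPP_DT_VUT_REF': [0.1, 0.2, 0.3, 0.4],
    'NEE_VUT_REF_QC': ['0', '0', '2', '0'],
})

good = latest_year_good_rows(demo, 'TIMESTAMP', 'NEE_VUT_REF_QC', '0')
print(good)

# Check the flag column's type before choosing the accepted value
print(demo['NEE_VUT_REF_QC'].dtype)
```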

In [None]:
# Run this to see all available columns
df.columns

In [None]:
time_column = 'TIMESTAMP'
data_column = 'GPP_DT_VUT_REF'
quality_flag = 'NEE_VUT_REF_QC'
value_accept_quality = '0'

# Find the latest year based on the time_column
latest_year = df[time_column].dt.year.max()

# Filter the DataFrame to include only rows from the latest year
df_latest_year = df[df[time_column].dt.year == latest_year]

# Apply the quality flag to exclude poor data (keep only rows where the flag equals value_accept_quality)
df_latest_year_quality = df_latest_year[df_latest_year[quality_flag] == value_accept_quality]

# dobj_meta accessed in "Access data" section
columns_meta = dobj_meta.specificInfo.columns

if data_column in df_latest_year_quality.columns and time_column in df_latest_year_quality.columns:

    # find metadata associated with the selected column (data_column)
    dobj_value_type = [col for col in columns_meta if col.label==data_column][0].valueType

    # create label for y-axis based on the metadata
    y_axis_label = f"{dobj_value_type.self.label} [{dobj_value_type.unit}]"
    station = dobj_meta.specificInfo.acquisition.station.org.name

    plot = df_latest_year_quality.plot(x=time_column, y=data_column, grid=True, title=station, style='o', markersize = 3)
    plot.set(ylabel=y_axis_label)

else:
    print(f'The selected data_column or time_column is not one of the columns in df. Choose among {list(df.columns)}.' )
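The metadata lookup above takes the first match with `[0]`, which raises an `IndexError` if the column has no metadata entry. A more defensive variant is sketched below. `axis_label_for` is a hypothetical helper, and `demo_columns_meta` is a made-up stand-in that only mimics the `label`, `valueType.self.label` and `valueType.unit` attributes used in the cell above. The label and unit strings are illustrative, not taken from the real metadata.

```python
from types import SimpleNamespace


def axis_label_for(columns_meta, column_name):
    """Return 'label [unit]' for column_name, or None if no metadata entry exists."""
    col = next((c for c in columns_meta if c.label == column_name), None)
    if col is None:
        return None
    value_type = col.valueType
    return f"{value_type.self.label} [{value_type.unit}]"


# Made-up stand-in for dobj_meta.specificInfo.columns
demo_columns_meta = [
    SimpleNamespace(
        label='GPP_DT_VUT_REF',
        valueType=SimpleNamespace(
            self=SimpleNamespace(label='Demo value label'),
            unit='demo unit',
        ),
    ),
]

print(axis_label_for(demo_columns_meta, 'GPP_DT_VUT_REF'))
print(axis_label_for(demo_columns_meta, 'NOT_A_COLUMN'))
```

With the real metadata, `axis_label_for(columns_meta, data_column)` would be the corresponding call, and a `None` result could be replaced by the plain column name as a fallback label.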

### Make a plot: two data columns on different axes

This example plots two data columns against a shared time axis, each with its own y-axis. The two columns are set with `data_column1` and `data_column2`, together with their quality flag columns.

<mark>Note that only data from the latest year are plotted</mark>. This selection was made because the full record contains a very large number of data points.

Before the data is plotted, quality flags are applied to exclude poor data.

In [None]:
time_column = 'TIMESTAMP'
# Select two of the data columns in dataframe "df"
data_column1 = 'GPP_DT_VUT_REF'
quality_flag1 = 'NEE_VUT_REF_QC'
value_accept_quality1 = '0'

data_column2 = 'SW_IN_F'
quality_flag2 = 'SW_IN_F_QC'
value_accept_quality2 = '0'

# Find the latest year based on the time_column
latest_year = df[time_column].dt.year.max()

# Filter the DataFrame to include only rows from the latest year
df_latest_year = df[df[time_column].dt.year == latest_year]

# Set up the plot with the first variable
fig, ax1 = plt.subplots()

# Filter based on quality flag associated with selected data_column1
df_latest_year_quality = df_latest_year[df_latest_year[quality_flag1] == value_accept_quality1]

# Find the unit for data column 1 
dobj_value_type = [col for col in columns_meta if col.label==data_column1][0].valueType

# create label for y-axis based on the metadata
y_axis_label1 = f"{dobj_value_type.self.label} [{dobj_value_type.unit}]"

# "b" stands for blue and "." for point markers
ax1.plot(df_latest_year_quality[time_column], df_latest_year_quality[data_column1], 'b.')
ax1.set_xlabel('Time')
ax1.set_ylabel(y_axis_label1, color='b')
ax1.tick_params(axis='y', labelcolor='b')

# Find station name
# dobj_meta accessed in "Access data" section
station = dobj_meta.specificInfo.acquisition.station.org.name

# Set the title with the station name
ax1.set_title(station)

# Create a secondary y-axis for the second variable
ax2 = ax1.twinx()  # instantiate a second axes that shares the same x-axis

# Filter based on quality flag associated with selected data_column2
df_latest_year_quality = df_latest_year[df_latest_year[quality_flag2] == value_accept_quality2]

# Find the unit for data column 2
dobj_value_type = [col for col in columns_meta if col.label==data_column2][0].valueType

# create label for y-axis based on the metadata
y_axis_label2 = f"{dobj_value_type.self.label} [{dobj_value_type.unit}]"

# "r" stands for red and "." for point markers
ax2.plot(df_latest_year_quality[time_column], df_latest_year_quality[data_column2], 'r.')
ax2.set_ylabel(y_axis_label2, color='r')
ax2.tick_params(axis='y', labelcolor='r')

# show the dates in this specific format (YYYY-MM-DD)
ax1.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))

# Rotate the dates to fit better
ax1.tick_params(axis='x', rotation=45)

# Add grid
ax1.grid(True)

# Show the plot
fig.tight_layout()  # to make sure labels/axes don't overlap
plt.show()