<img src="imgs/ecas_logo.png" alt="ECAS Logo" width="500" align="left"><img src="imgs/onedata_logo.png" alt="Onedata" width="400" align="right" />

# Integration of Onedata into the ECAS environment: how to compute a climate indicator from a shared data collection

This notebook explains how to compute a simple climate indicator exploiting the features provided by the **ECAS** environment (in particular, those provided by the *Ophidia Framework*) and accessing input datafiles shared on the **Onedata** platform.

The **ENES Climate Analytics Service** (ECAS) is one of the *EOSC-Hub Thematic Services*. It builds on top of the Ophidia big data analytics framework with additional components and services from the INDIGO-DataCloud software stack, EUDAT and EGI e-infrastructures.

**Onedata** (https://onedata.org/#/home) is a global data management system that provides easy access to distributed storage resources and supports a wide range of use cases, from personal data management to data-intensive scientific computations.

The goal of this training notebook is to implement a real indicator from the *extreme climate indices* set directly on a shared and geographically distributed data collection exposed in the *Federated Data Archive* in a transparent way, without unnecessary data movement.

This set comprises 27 indices based on daily temperature or daily precipitation amount, defined for the purpose of analyzing extreme events. In this training we are going to compute the *Summer Days index*: starting from the daily maximum temperature for the years 2096-2100, the Summer Days index is the annual count of days where TX (daily maximum temperature) > 25°C.
The full list of indices is provided here: [http://etccdi.pacificclimate.org/list_27_indices.shtml](http://etccdi.pacificclimate.org/list_27_indices.shtml).
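
Written as a formula, following the ETCCDI definition, with $TX_{ij}$ the daily maximum temperature on day $i$ of year $j$ and the bracket equal to 1 when the condition holds and 0 otherwise:

$$
SU_j = \sum_{i \,\in\, \text{year } j} \left[\, TX_{ij} > 25\,^{\circ}\mathrm{C} \,\right]
$$

Since the input file used in this notebook stores temperatures in kelvin, the same threshold is expressed as 298.15 K in the computation.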

The training session is carried out entirely in this Jupyter Notebook, using Python code and the set of modules and libraries available in the ECASLab; in particular, the calculation of the climate index will exploit a global data repository provided by the modelling groups and backed by computing centers and storage providers worldwide.

Before starting the actual implementation of the indicator, let's look at the main features of Onedata, how data is organized, and the key concepts behind its implementation.
For more details, please visit the official website: https://onedata.org/#/home

## 1. Getting started with Onedata

**Onedata** is a global data access solution for eScience. It allows users to:
- access data in a dropbox-like fashion regardless of its location
- perform heavy computations on huge datasets
- publish and share results with public or closed communities

The platform consists of 3 main components:
- *Spaces*: distributed virtual volumes, where users can organize their data
- *Providers*: entities who support user spaces with actual storage resources exposed via Oneprovider services
- *Zones*: federations of providers, which enable creation of closed or interconnected communities, managed by Onezone services.

## 2. Onedata integration in ECAS

In the context of ECAS, an existing (read-only) data collection, provided for instance by a modelling group, is exposed via dedicated Onedata services and made available in the ECASLab environment.
Each user can easily access and process the data, which are made available in their own workspace.

In more detail, when you log in to JupyterHub, the following data folders are available in your home directory:

- **/data:** input data required for the workflows/notebooks locally available in the ECAS environment
- **/onedata:** a Federated Data Archive exposed via Onedata    

In this example notebook, we are going to import a NetCDF file stored in the **ECAS space** which is
 - supported by the **ECAS provider** deployed at the CMCC SuperComputing Center
 - shared on the ECAS cluster and available at
 
 
      /home/{username}/onedata/repository/ECAS_space
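
As a quick sanity check, the shared folder can also be inspected with plain Python before involving Ophidia. The following is a minimal sketch, assuming the space is mounted under your home directory at the path shown above; `list_netcdf` is a hypothetical helper introduced here for illustration, and it returns an empty list when the folder is not present.

```python
from pathlib import Path


def list_netcdf(directory):
    """Return the sorted NetCDF file names in a directory, or [] if it is missing."""
    directory = Path(directory)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.glob("*.nc"))


# Assumed mount point, taken from the path shown above
space = Path.home() / "onedata" / "repository" / "ECAS_space"
print(list_netcdf(space))
```

An empty result usually means that the mount point differs on your system or that the space is not currently available.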

<br>
    
<img src="imgs/ECAS_space.png" alt="Onedata" width="400"> 

## 3. Calculate the Summer Days index

Now that you have an understanding of the data organization, you're ready to calculate the Summer Days index using the NetCDF files provided in the *ECAS_space* repository.

First of all, connect to the remote ECAS instance:

In [None]:
from PyOphidia import cube
cube.Cube.setclient(read_env=True)

List the files available in the shared data collection

In [None]:
cube.Cube.fs(command='ls', dpath='/onedata/repository/ECAS_space/*.nc', display=True)

Import the *tasmax* source file (daily maximum temperature, in K) from the repository into an Ophidia datacube

In [None]:
maxtemp = cube.Cube(src_path='/onedata/repository/ECAS_space/tasmax_day_CMCC-CESM_rcp85_r1i1p1_20960101-21001231.nc',
    measure='tasmax',
    import_metadata='yes',
    imp_dim='time',
    imp_concept_level='d', vocabulary='CF',hierarchy='oph_base|oph_base|oph_time',
    ncores=4,
    description='Max Temps'
    )

Identify the summer days

In [None]:
summerdays = maxtemp.apply(
    query="oph_predicate('oph_float','oph_int',measure,'x-298.15','>0','1','0')"
)
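
As used here, the `oph_predicate` call subtracts 298.15 K (that is, 25 °C) from each value and writes 1 where the result is positive and 0 elsewhere. The sketch below reproduces the same logic with NumPy on a few made-up values, purely to illustrate what the resulting mask looks like; it is not how Ophidia evaluates the expression internally.

```python
import numpy as np

# 25 °C expressed in kelvin, matching the constant used in the predicate
THRESHOLD_K = 298.15

# Hypothetical daily maximum temperatures in kelvin (illustrative values only)
tasmax_k = np.array([295.0, 298.0, 298.2, 305.5])

# 1 where (x - 298.15) > 0, otherwise 0
mask = np.where(tasmax_k - THRESHOLD_K > 0, 1, 0)
print(mask.tolist())  # [0, 0, 1, 1]
```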

Count the number of summer days

In [None]:
count = summerdays.reduce2(
    operation='sum',
    dim='time',
    concept_level='y',
)
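
Reducing with `operation='sum'` along `time` at concept level `'y'` adds up the 0/1 flags within each year, which yields one count per year. A minimal NumPy sketch of that grouping follows, using a hypothetical helper `yearly_sum` and made-up flags for a single grid point:

```python
import numpy as np


def yearly_sum(flags, years):
    """Sum the 0/1 flags separately for each distinct year."""
    flags = np.asarray(flags)
    years = np.asarray(years)
    return {int(y): int(flags[years == y].sum()) for y in np.unique(years)}


# Hypothetical summer-day flags and the year each day belongs to
flags = [1, 0, 1, 1, 1, 0, 0]
years = [2096, 2096, 2096, 2096, 2097, 2097, 2097]
print(yearly_sum(flags, years))  # {2096: 3, 2097: 1}
```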

Finally plot the indicator

In [None]:
firstyear = count.subset(subset_filter=1, subset_dims='time')

In [None]:
%matplotlib inline
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
from cartopy.mpl.geoaxes import GeoAxes
from cartopy.util import add_cyclic_point
import numpy as np
import warnings
warnings.filterwarnings("ignore")

fig = plt.figure(figsize=(15, 6), dpi=100)

#Add Geo axes to the figure with the specified projection (PlateCarree)
projection = ccrs.PlateCarree()
ax = plt.axes(projection=projection)

#Draw coastline and gridlines
ax.coastlines()

gl = ax.gridlines(crs=projection, draw_labels=True, linewidth=1, color='black', alpha=0.9, linestyle=':')
gl.xlabels_top = False
gl.ylabels_right = False

data = firstyear.export_array(show_time='yes')
lat = data['dimension'][0]['values'][ : ]
lon = data['dimension'][1]['values'][ : ]
var = data['measure'][0]['values'][ : ]
var = np.reshape(var, (len(lat), len(lon)))

#Wraparound points in longitude
var_cyclic, lon_cyclic = add_cyclic_point(var, coord=np.asarray(lon))
x, y = np.meshgrid(lon_cyclic,lat)

#Define color levels for color bar
levStep = (np.nanmax(var)-np.nanmin(var))/20
clevs = np.arange(np.nanmin(var),np.nanmax(var)+levStep,levStep)

#Set filled contour plot
cnplot = ax.contourf(x, y, var_cyclic, clevs, transform=projection,cmap=plt.cm.Oranges)
plt.colorbar(cnplot,ax=ax)

ax.set_aspect('auto', adjustable=None)

plt.title('Summer Days (year 2096)')
plt.show()

## 4. Final remarks

Congrats! You've completed this training on some basic operations that can be performed within the ECASLab, and you should now be able to:

* use the PyOphidia module to access an Onedata repository;
* compute a climate index over a shared data collection.

If you want to clear your user space before running other notebooks, run the following command:

In [None]:
cube.Cube.deletecontainer(container='tasmax_day_CMCC-CESM_rcp85_r1i1p1_20960101-21001231.nc',force='yes')

## References

1. [Onedata website](https://onedata.org/#/home)
2. [EOSC-Hub project website](https://www.eosc-hub.eu)
3. [PyOphidia library](https://github.com/OphidiaBigData/PyOphidia)