# Compute Demo: Use Rooki to access CMIP6 data

## Overview

[Rooki](https://github.com/roocs/rooki) is a Python client for the [Rook](https://github.com/roocs/rook) data subsetting service for climate model data. The [European Copernicus Climate Data Store](https://cds.climate.copernicus.eu) uses this service as a backend to access the CMIP6 data pool. Rook is deployed at IPSL (Paris) and DKRZ (Hamburg) for load balancing. The CMIP6 data pool is shared with ESGF, and the CMIP6 subset provided for Copernicus is synchronized at both sites.

*Rook* provides operators for *subsetting*, *averaging* and *regridding* to retrieve a subset of the CMIP6 data pool. These operators are implemented by the [clisops](https://github.com/roocs/clisops) Python library, which is based on [xarray](https://pypi.org/project/xarray/). The *clisops* library is developed by Ouranos (Canada), CEDA (UK) and DKRZ (Germany).

The operators can be called remotely using the [OGC Web Processing Service](https://ogcapi.ogc.org/processes/) (WPS) standard.

![rook 4 cds](https://github.com/atmodatcode/tgif_copernicus/raw/main/media/rook.png)

**ROOK**: **R**emote **O**perations **O**n **K**limadaten

* Rook: https://github.com/roocs/rook
* Rooki: https://github.com/roocs/rooki
* Clisops: https://github.com/roocs/clisops
* Rook Presentation: https://github.com/cehbrecht/talk-rook-status-kickoff-meeting-2022/blob/main/Rook_C3S2_380_2022-02-11.pdf

## Prerequisites

| Concepts | Importance | Notes |
| --- | --- | --- |
| [Intro to Xarray](https://foundations.projectpythia.org/core/xarray/xarray-intro.html) | Necessary | |
| [Understanding of NetCDF](https://foundations.projectpythia.org/core/data-formats/netcdf-cf.html) | Helpful | Familiarity with metadata structure |
| [Knowing OGC services](https://ogcapi.ogc.org/processes/) | Helpful | Understanding of the service interfaces |


- **Time to learn**: 15 minutes

## Init Rooki

In [None]:
import os

# Set the WPS endpoint before importing rooki; here the DKRZ node in Germany is used
os.environ['ROOK_URL'] = 'http://rook.dkrz.de/wps'

from rooki import rooki

## Retrieve subset of CMIP6 data

Each CMIP6 dataset is identified by a dataset ID. An intake catalog is available to look up the available datasets:

https://nbviewer.org/github/roocs/rooki/blob/master/notebooks/demo/demo-intake-catalog.ipynb
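
To make the structure of a dataset ID easier to read, the following sketch splits one into named parts. It is only an illustration and does not use rooki. The facet names follow the CMIP6 data reference syntax; treating the leading `c3s-cmip6` element as a `project` facet is an assumption made here, and `parse_dataset_id` is a hypothetical helper.

```python
# Facet names follow the CMIP6 data reference syntax; the leading
# "project" label for "c3s-cmip6" is an assumption made for this sketch.
FACETS = (
    "project", "activity_id", "institution_id", "source_id",
    "experiment_id", "member_id", "table_id", "variable_id",
    "grid_label", "version",
)


def parse_dataset_id(dataset_id: str) -> dict:
    """Split a dot-separated dataset ID into named facets (hypothetical helper)."""
    parts = dataset_id.split(".")
    if len(parts) != len(FACETS):
        raise ValueError(f"expected {len(FACETS)} facets, got {len(parts)}")
    return dict(zip(FACETS, parts))


facets = parse_dataset_id(
    "c3s-cmip6.CMIP.MPI-M.MPI-ESM1-2-HR.historical.r1i1p1f1.Amon.tas.gn.v20190710"
)
# Show the model, the table (monthly atmosphere) and the variable
print(facets["source_id"], facets["table_id"], facets["variable_id"])
```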

In [None]:
resp = rooki.subset(
    collection='c3s-cmip6.CMIP.MPI-M.MPI-ESM1-2-HR.historical.r1i1p1f1.Amon.tas.gn.v20190710',
    time='2000-01-01/2000-01-31',
    area='-30,-40,70,80',
)
resp.ok
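
The `time` and `area` arguments above are plain strings. As a minimal sketch, the helpers below build such strings from typed values and catch obvious mistakes before a request is sent. The helper names are hypothetical, and the `west,south,east,north` ordering of the bounding box is an assumption that should be checked against the Rook documentation.

```python
from datetime import date


def time_range(start: date, end: date) -> str:
    """Format a 'start/end' time interval string (hypothetical helper)."""
    if end < start:
        raise ValueError("end must not be before start")
    return f"{start.isoformat()}/{end.isoformat()}"


def bounding_box(west: float, south: float, east: float, north: float) -> str:
    """Format a box in degrees; the west,south,east,north order is assumed."""
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
    return ",".join(f"{v:g}" for v in (west, south, east, north))


time_arg = time_range(date(2000, 1, 1), date(2000, 1, 31))
area_arg = bounding_box(-30, -40, 70, 80)
print(time_arg)
print(area_arg)
```

The resulting strings could then be passed as `time=time_arg` and `area=area_arg` to `rooki.subset`.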

### Open Dataset with xarray

In [None]:
ds = resp.datasets()[0]
ds

### Plot CMIP6 Dataset

In [None]:
ds.tas.isel(time=0).plot()

### Show Provenance

A provenance document is generated remotely to document the operation steps.
The provenance uses the [W3C PROV](https://www.w3.org/TR/prov-overview/Overview.html) standard.

In [None]:
from IPython.display import Image
Image(resp.provenance_image())

## Run workflow with subset and average operator

Instead of running a single operator, you can chain several operators in a workflow.

### Use rooki operators to create a workflow 

In [None]:
from rooki import operators as ops

### Define the workflow 

Internally, the workflow tree is a JSON document.

In [None]:
tas = ops.Input(
    'tas', ['c3s-cmip6.CMIP.MPI-M.MPI-ESM1-2-HR.historical.r1i1p1f1.Amon.tas.gn.v20190710']
)

wf = ops.Subset(
    tas, 
    time="2000/2000",
    time_components="month:jan,feb,mar",
    area='-30,-40,70,80',  
)

wf = ops.WeightedAverage(wf)
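
To give an intuition for what a weighted spatial average does, here is a small standalone sketch in plain Python. It assumes weights proportional to the cosine of latitude, a common choice for regular latitude-longitude grids because grid cells shrink towards the poles. Whether the remote operator uses exactly this weighting is not stated in this notebook, so treat it as an assumption; the temperature values are made up.

```python
import math


def cos_lat_weighted_mean(values, lats):
    """Mean of per-latitude values weighted by cos(latitude in degrees)."""
    weights = [math.cos(math.radians(lat)) for lat in lats]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


lats = [0.0, 30.0, 60.0]
zonal_means = [300.0, 290.0, 270.0]  # made-up temperatures in K

weighted = cos_lat_weighted_mean(zonal_means, lats)
unweighted = sum(zonal_means) / len(zonal_means)

# The weighted mean leans towards the low-latitude (larger-area) values
print(weighted, unweighted)
```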

### Optional: look at the workflow json document

This step is *only* meant to give some insight into the workflow structure.

In [None]:
import json
print(json.dumps(wf._tree(), indent=4))
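
The idea of a nested workflow document can be illustrated without rooki. In the sketch below, each operator becomes a node whose input is either a list of dataset IDs or another node, so chaining produces a tree that serializes directly to JSON. The `step` helper and the key names `operator`, `source` and `args` are hypothetical and chosen for illustration only; the actual schema is whatever `wf._tree()` prints.

```python
import json


def step(operator: str, source, **args) -> dict:
    """Build one node; `source` is a dataset list or another node (hypothetical)."""
    return {"operator": operator, "source": source, "args": args}


# Innermost node: the subset applied to the input dataset
subset = step(
    "subset",
    ["c3s-cmip6.CMIP.MPI-M.MPI-ESM1-2-HR.historical.r1i1p1f1.Amon.tas.gn.v20190710"],
    time="2000/2000",
    area="-30,-40,70,80",
)

# Outer node: the average consumes the subset node as its input
workflow = step("weighted_average", subset)

document = json.dumps(workflow, indent=4)
print(document)
```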

### Submit workflow job 

In [None]:
resp = wf.orchestrate()
resp.ok

### Open as xarray dataset

In [None]:
ds = resp.datasets()[0]
ds

### Plot dataset

In [None]:
ds.tas.plot()

### Show provenance

In [None]:
Image(resp.provenance_image())

## Summary
In this notebook, we used the Rooki Python client to retrieve a subset of a CMIP6 dataset. The operations are executed remotely on a Rook subsetting service (using the OGC WPS standard and xarray/clisops). The dataset is plotted and a provenance document is shown. We also showed that remote operators can be chained and executed in a single workflow.

### What's next?

This service is used by the European Copernicus Climate Data Store. 

We need to figure out how this service can be used in the new ESGF: 
* Where will it be deployed?
* How can it be integrated into the ESGF search (STAC catalogs, ...)?
* ???

## Resources and references
- [Roocs on GitHub](https://github.com/roocs)
- [Copernicus Climate Data Store](https://cds.climate.copernicus.eu/)
- [STAC](https://stacspec.org/en)