# WCRP 2023 - LL02: Open Earth System Science in Cloud Part 1

This is the first part of the short course. It concentrates on accessing and loading data, and it also serves as an introduction to the DeepESDL platform.

## The DeepESDL Hub Platform

The DeepESDL platform is a JupyterHub that brings data storage and acquisition, processing code, and extensive plotting capabilities together in a single place. Such an environment encourages collaboration and sharing.

### Minimal Jupyter introduction

You are currently looking at a JupyterHub. In short, it combines interactive coding, text, equations, and other outputs. It provides Jupyter Notebooks, text editors, terminals, data file viewers, interactive plotting, and more.

The interface consists of a menu bar, a left sidebar, and the main work area.

**Menu bar**: File, Edit, View, Run, Kernel, Tabs, Settings, Help

**Left Sidebar**: Tabs for browsing your files, seeing which tabs are open and which kernels and terminals are running, the table of contents, Software (modules that are already installed and can be used on the cluster), and the extension manager

**Main work area**: Documents like notebooks, images, consoles and datasets are organised in panels or tabs that can be subdivided or resized depending on your workflow

In a Jupyter Notebook, one can write and execute code, visualise data, and also share this code with others. The special feature of Jupyter Notebook is that the code and the description of the code are written in independent cells, so that individual code blocks can be executed individually. You can run a cell with the key combination `[Ctrl]` + `[Enter]`.

## xcube and xarray

[`Xarray`](https://docs.xarray.dev/en/stable/index.html) is a Python package that eases working with labeled multi-dimensional (also known as N-dimensional) arrays and is built on [`NumPy`](https://numpy.org/). It allows fast and efficient array computing and provides functions for visualization. It is therefore well suited to working with the Earth System Data Cube, which contains a variety of climate and meteorological variables.

In Earth system sciences, it is usually necessary to map data points in their spatial and temporal space (e.g. latitude, longitude and time). Many data formats (e.g. `zarr` or `netCDF`) allow data to be stored together with information about the dimensions and coordinates. Xarray provides two fundamental data structures to work with these data formats: DataArrays and Datasets. In the following image you can see a visual example of these structures ([source](https://docs.xarray.dev/en/stable/user-guide/data-structures.html)): 

![](https://docs.xarray.dev/en/stable/_images/dataset-diagram.png)

- A `DataArray` contains a single multi-dimensional variable and its coordinates
- A `Dataset` holds multiple variables that (potentially) share the same coordinates.
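The relationship between the two structures can be sketched with a small synthetic cube. The variable names, coordinates, and values below are made up for illustration and are not taken from any DeepESDL dataset.

```python
import numpy as np
import xarray as xr

# Synthetic coordinates: 3 time steps on a 2 x 4 lat/lon grid (values are made up)
time = np.array(["2020-01-01", "2020-01-09", "2020-01-17"], dtype="datetime64[ns]")
lat = np.array([-0.125, 0.125])
lon = np.array([29.875, 30.125, 30.375, 30.625])

rng = np.random.default_rng(seed=0)
shape = (time.size, lat.size, lon.size)

# A Dataset bundles several variables that share the same coordinates
toy_cube = xr.Dataset(
    data_vars={
        "temperature": (("time", "lat", "lon"), 280.0 + 10.0 * rng.random(shape)),
        "precipitation": (("time", "lat", "lon"), rng.random(shape)),
    },
    coords={"time": time, "lat": lat, "lon": lon},
)

# Selecting one variable by name returns a DataArray with its coordinates attached
temperature = toy_cube["temperature"]

print(type(toy_cube).__name__, list(toy_cube.data_vars))
print(type(temperature).__name__, temperature.dims, temperature.shape)
```

The cubes opened later in this notebook follow the same pattern, only with many more variables and much larger dimensions.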

[`xcube`](https://xcube.readthedocs.io/en) is a Python package that provides fast and easy access to data cubes. It is part of the DeepESDL ecosystem and offers a wide range of useful functionality.

The DeepESDL project is ongoing.

For this notebook, only the following imports are needed:

In [None]:
import xcube
from xcube.core.store import new_data_store
from xcube.webapi.viewer import Viewer

## Loading Datasets

The data is part of the DeepESDL platform, so nothing needs to be downloaded. The following datasets are available in the platform's public data store. They are the backbone of the science cases of the DeepESDL project.

- Earth System Data Cube
- Black Sea Cube
- Land Cover Cube
- Ocean Cube
- SMOS freeze/thaw Cube
- Polar Cube
- Permafrost Cube

Other data sources, including ones that require registration, can be used as well through a different data store. The following plugins are available:

- [Copernicus Marine Data Store](https://data.marine.copernicus.eu/products)
- [ESA Climate Change Initiative](https://climate.esa.int/en/esa-climate/esa-cci/)
- [Sentinel Hub cloud](https://www.sentinel-hub.com/)
- [Copernicus Climate Change Service](https://cds.climate.copernicus.eu/)

In [None]:
# setting up the datastore
store = new_data_store("s3", 
                       root="deep-esdl-public", 
                       storage_options=dict(anon=True))

# list the available datasets in the store
store.list_data_ids()

### Accessing Data

After setting up the store, we access the data through `xcube`, which returns an `xarray` dataset. We can then concentrate on single or multiple variables and also perform *slicing* operations, that is, subsetting the cube to specific time ranges or spatial extents.

The `xarray` output lets you explore the dataset a bit. To do so, uncomment the line with the variable name in a cell and run the cell.

In [None]:
dataset = store.open_data('esdc-8d-0.25deg-256x128x128-3.0.1.zarr')
dataset

In [None]:
# subsetting for a specific variable
subset_variable = dataset['bare_soil_evaporation']
#subset_variable

In [None]:
# subsetting for a specific time
subset_time = dataset.sel(time=slice('2015-01-01', '2018-12-31'))
#subset_time

In [None]:
# subsetting for a specific area
subset_area = dataset.sel(lat=slice(-4.5, 2), lon=slice(28.5, 35.5))
#subset_area
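One detail worth knowing about label-based slicing: the bounds of `slice(...)` have to follow the storage order of the coordinate. The sketch below uses a synthetic latitude axis to show that a south-to-north slice returns nothing when the latitudes are stored north-to-south. The helper `lat_slice` is hypothetical and not part of xarray or xcube.

```python
import numpy as np
import xarray as xr

# Synthetic latitude axis from -10 to 10 degrees in 0.5 degree steps
lat_ascending = np.arange(-10.0, 10.5, 0.5)
values = np.arange(lat_ascending.size, dtype=float)

ascending = xr.DataArray(values, coords={"lat": lat_ascending}, dims="lat")
# Same data, but stored north-to-south
descending = ascending.sortby("lat", ascending=False)


def lat_slice(data, south, north):
    """Hypothetical helper: build a slice that matches the order of data.lat."""
    lat = data["lat"].values
    if lat[0] <= lat[-1]:
        return slice(south, north)
    return slice(north, south)


# Number of selected latitudes for each combination
print(ascending.sel(lat=slice(-4.5, 2)).size)
print(descending.sel(lat=slice(-4.5, 2)).size)
print(descending.sel(lat=lat_slice(descending, -4.5, 2)).size)
```

If a spatial subset comes back empty, checking the first and last value of `dataset.lat` is a quick way to see which order a given cube uses.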

In [None]:
# combinations are possible
subset = dataset['air_temperature_2m'].sel(
    time = slice('2015-01-01', '2018-12-31'), 
    lat = slice(-4.5, 2), 
    lon = slice(28.5, 35.5)
)
#subset

In [None]:
# sometimes a specific location or time is not covered by the data. Then you probably want the nearest neighbour
subset = dataset['air_temperature_2m'].sel(time='2000-01-01', method='nearest')
#subset
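With `method='nearest'`, the returned time step may differ from the one requested, so it can be useful to check which label was actually matched. The following sketch does this on a synthetic 8-day time axis. The dates and values are illustrative and do not come from the ESDC.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic 8-day time axis; the dates are illustrative only
time = pd.date_range("1999-12-19", periods=6, freq="8D")
series = xr.DataArray(np.arange(6.0), coords={"time": time}, dims="time")

requested = np.datetime64("2000-01-01")
nearest = series.sel(time=requested, method="nearest")

# The time coordinate of the result holds the label that was matched
matched = nearest["time"].values
offset_days = (matched - requested).astype("timedelta64[D]")

print("requested:", requested)
print("matched:  ", matched)
print("offset:   ", offset_days)
```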

### TODO
You may explore the dataset a bit before you continue. Feel free to access any dataset and any variable to test the workflow.

In [None]:
# This cell is for you to explore the dataset a bit. 
subset = dataset['air_temperature_2m'].sel(time='2000-01-01', method='nearest')
subset

### Another Example: Open Data portal of the ESA Climate Change Initiative (ESA CCI)

Besides the public DeepESDL data store, there are other stores, as mentioned above. The ESA CCI provides a large amount of Earth observation data. The workflow is the same in this case.

The number of available datasets is huge. xcube also offers ways to search for specific data, but this is not covered in this tutorial. You can find instructions in other notebooks of the default catalogue.

In [None]:
# set up the store and explore the possibilities
store = new_data_store('cciodp')
store.list_data_ids()

As an example, the following line opens the daily Sea Surface Temperature dataset (https://www.eea.europa.eu/data-and-maps/data/external/esa-sst-cci-level-4). You can explore the dataset a bit.


In [None]:
dataset = store.open_data('esacci.SST.day.L4.SSTdepth.multi-sensor.multi-platform.OSTIA.2-1.sst')
dataset

## Accessing the real data

So far we have only accessed stores and datasets, not the actual data. This is due to *laziness*: the data is accessed only when it is really needed. This puts less stress on the platform and also speeds up the development of our workflows.
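The idea of deferred loading can be illustrated with a few lines of plain Python. This is only a conceptual toy under simplified assumptions. It is not how xarray or xcube implement lazy access, and the class and function names are invented for this sketch.

```python
class LazyValues:
    """Toy stand-in for a lazily loaded variable (not xarray's implementation)."""

    def __init__(self, loader):
        self._loader = loader  # function that fetches the data when called
        self._cache = None
        self.load_count = 0

    @property
    def values(self):
        # The expensive fetch happens only on first access
        if self._cache is None:
            self._cache = self._loader()
            self.load_count += 1
        return self._cache


def fetch_from_storage():
    # Placeholder for a slow read from remote storage (made-up numbers)
    return [0.1, 0.4, 0.0, 2.3]


precip = LazyValues(fetch_from_storage)
loads_before = precip.load_count  # creating the object has not fetched anything
data = precip.values              # first access triggers the fetch
loads_after = precip.load_count

print(loads_before, data, loads_after)
```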

Sometimes, though, it is necessary to get to the real data. For this we can use the `values` attribute, which is available on every `DataArray`. Keep in mind that you should first subset to a specific location or time. The following cell selects the ERA5 precipitation data for Greenwich, London, and actually loads the values.

In [None]:
# setup the store again to public DeepESDL data and access the ESDC
store = new_data_store("s3", 
                       root="deep-esdl-public", 
                       storage_options=dict(anon=True))
dataset = store.open_data('esdc-8d-0.25deg-256x128x128-3.0.1.zarr')
#subset to Greenwich, London
subset = dataset["precipitation_era5"].sel(lat=51.48, lon=0, method='nearest')
subset.values
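Before calling `.values`, it can help to check how much memory a selection would need. The sketch below compares the `nbytes` attribute of a full variable and of a single-pixel time series, using a small synthetic array whose shape and coordinates are made up. On a lazily loaded cube, reading `nbytes` should not by itself trigger a download, since it is derived from shape and data type.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for one cube variable (shape chosen for illustration)
cube_var = xr.DataArray(
    np.zeros((46, 72, 144), dtype="float32"),
    coords={
        "time": np.arange(46),
        "lat": np.linspace(-88.75, 88.75, 72),
        "lon": np.linspace(-178.75, 178.75, 144),
    },
    dims=("time", "lat", "lon"),
    name="precipitation",
)

# Time series at the grid cell closest to Greenwich
point = cube_var.sel(lat=51.48, lon=0, method="nearest")

print(f"full variable: {cube_var.nbytes / 1e6:.2f} MB")
print(f"single pixel:  {point.nbytes} bytes")
```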

Other ways to access the data:

- convert to a dataframe
- convert to a NumPy array

In [None]:
# conversion to dataframe with 1D data
subset_1D = dataset["precipitation_era5"].sel(lat=51.48, lon=0, method='nearest')
subset_1D.to_dataframe()
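To see what such a dataframe looks like without touching the real cube, the following sketch repeats the conversion on a tiny synthetic array. The grid, dates, and values are made up. The remaining dimension becomes the index, while the selected `lat` and `lon` appear as extra columns next to the variable.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Tiny synthetic cube: 4 time steps on a 2 x 2 grid (values are made up)
time = pd.date_range("2015-01-05", periods=4, freq="8D")
cube_var = xr.DataArray(
    np.arange(4 * 2 * 2, dtype=float).reshape(4, 2, 2),
    coords={"time": time, "lat": [51.25, 51.5], "lon": [-0.25, 0.0]},
    dims=("time", "lat", "lon"),
    name="precipitation",
)

point = cube_var.sel(lat=51.48, lon=0, method="nearest")
frame = point.to_dataframe()

print(frame.index.name)
print(sorted(frame.columns))
print(frame["precipitation"].tolist())
```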

In [None]:
# conversion to NumPy array for 2D data
subset_2D = dataset["precipitation_era5"].sel(time = '2015-06-20', method='nearest')
subset_2D.to_numpy()

## Exercise: Land Surface Temperature in Kigali

This is a small guided exercise to extract information about Kigali.

The goal is to extract the land surface temperature of Kigali in 2012. Use the dataset `esacci.LST.day.L3S.LST.multi-sensor.multi-platform.IRCDR.2-00.DAY` from the ESA CCI store.

In [None]:
# TODO: Setup of the ESA CCI store


In [None]:
# TODO: Access the dataset and explore it


In [None]:
# TODO: subset the dataset to the single variable lst


In [None]:
# TODO: look up the lat and lon coordinates of Kigali and extract the time series at that location


In [None]:
# TODO: convert the result to a dataframe
