# Demo: Run an ESMValTool recipe using containers

In this example, we demonstrate how to execute an ESMValTool recipe using containers on SURF's Spider infrastructure.

## Prerequisite: Starting a Jupyter Server on Spider

This notebook needs a Jupyter server running on Spider as its execution environment. You can follow [these instructions](https://github.com/RS-DAT/JupyterDaskOnSLURM) to start a Jupyter server on Spider.

After successfully setting up the Jupyter server, copy this notebook to the Spider file system. Then open it in the browser on your local PC (as described in the instructions above). Effectively, this notebook runs as a Slurm job on Spider.

## Step 1: Build a Singularity container

ESMValTool is distributed as Docker containers. Like most HPC systems, Spider supports Singularity as its container technology (see the [documentation of Spider](https://spiderdocs.readthedocs.io/en/latest/Pages/software_on_spider.html?highlight=singularity#singularity-containers)). Although the documentation states that Spider does not provide an environment for building Singularity images, an existing Docker image can be converted to a Singularity image, e.g. by running this command on Spider:

```sh
# Step1: build sif image (this should be done once)
singularity build esmvaltool_stable.sif docker://esmvalgroup/esmvaltool:stable
```

This downloads the Docker image from Docker Hub and builds a Singularity Image File (.sif) named `esmvaltool_stable.sif` on the Spider file system. Note that this may take ~20 minutes.
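Since the build is slow and only needs to happen once, it can be guarded by a check for the image file. The following is a minimal sketch, assuming `singularity` is available on the `PATH`; the helper names `build_command` and `ensure_sif` are hypothetical and not part of ESMValTool or Singularity.

```python
import subprocess
from pathlib import Path


def build_command(sif_path, docker_uri):
    """Return the argument list that converts a Docker image to a .sif file."""
    return ["singularity", "build", str(sif_path), docker_uri]


def ensure_sif(sif_path, docker_uri, runner=subprocess.run):
    """Build the image only if it is missing; return True if a build was run."""
    sif_path = Path(sif_path)
    if sif_path.exists():
        return False  # image already built, nothing to do
    runner(build_command(sif_path, docker_uri), check=True)
    return True


# Hypothetical usage on Spider, with the same image as in the command above:
# ensure_sif("esmvaltool_stable.sif", "docker://esmvalgroup/esmvaltool:stable")
```

The `runner` argument exists only so the logic can be tried out without actually starting a build.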


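## Step 2: User configuration

Run the following command to create a default user configuration file for ESMValTool. As the log output shows, the file is copied to `.esmvaltool/config-user.yml` in your home directory.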

In [2]:
# Get user config file
!singularity run esmvaltool_stable.sif config get_config_user

2022-09-27 10:30:33,813 UTC [3531224] INFO    Creating folder /home/caroline-oku/.esmvaltool
2022-09-27 10:30:33,815 UTC [3531224] INFO    Copying file /opt/conda/envs/esmvaltool/lib/python3.10/site-packages/esmvalcore/config-user.yml to path /home/caroline-oku/.esmvaltool/config-user.yml.
2022-09-27 10:30:33,817 UTC [3531224] INFO    Copy finished.
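To see which settings are active in the copied file, the comment and blank lines can be filtered out. This is a minimal sketch that only filters text lines and does not parse the YAML; the path is taken from the log output above, and the helper name `active_lines` is hypothetical.

```python
from pathlib import Path

# Location taken from the log output above; adjust it if your setup differs.
DEFAULT_CONFIG = Path.home() / ".esmvaltool" / "config-user.yml"


def active_lines(config_path=DEFAULT_CONFIG):
    """Return the lines of the config file that are neither blank nor comments."""
    lines = Path(config_path).read_text().splitlines()
    return [ln for ln in lines if ln.strip() and not ln.lstrip().startswith("#")]


# Hypothetical usage in a notebook cell:
# for line in active_lines():
#     print(line)
```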



## Step 3: Execute recipe

We will execute two recipes from the `recipes` folder. Both plot a map of global temperature in January 2000 and a time series of mean annual temperature from 1850 to 2000. The two recipes differ only in the climate datasets they use. In practice, if the datasets are large, executing the two recipes sequentially is inefficient. An analogous situation is a computationally expensive recipe that covers a long time period, or a large spatial extent at high resolution, and could be split into smaller parts.
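As an illustration of splitting a long time period, the sketch below divides an inclusive range of years into contiguous sub-periods of nearly equal length. The function `split_years` is hypothetical; how each sub-period would be turned into its own recipe is not covered here.

```python
def split_years(start, end, n_chunks):
    """Split the inclusive range [start, end] into n_chunks contiguous sub-ranges."""
    total = end - start + 1
    if not 1 <= n_chunks <= total:
        raise ValueError("n_chunks must be between 1 and the number of years")
    base, extra = divmod(total, n_chunks)
    chunks, first = [], start
    for i in range(n_chunks):
        # Spread the remainder over the first `extra` chunks
        length = base + (1 if i < extra else 0)
        chunks.append((first, first + length - 1))
        first += length
    return chunks


print(split_years(1850, 2000, 3))
# [(1850, 1900), (1901, 1950), (1951, 2000)]
```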

In this example, we demonstrate how to execute the two recipes in parallel on a Dask cluster.

To add a Dask cluster to this notebook, you can use the Dask JupyterLab extension (look for the Dask logo in the left sidebar of the JupyterLab interface):
- click the Dask logo;
- click the `Scale` button and set the number of workers to 2;
- click `<>` to add a code cell.

A code cell will then be added to this notebook. Please move this cell to the placeholder below. Executing it gives the notebook access to a Dask SLURMCluster with 2 workers through the `client` object used in the cells that follow.

--ADD DASK SLURMCluster HERE--

In [20]:
# Set up the commands for execution
from pathlib import Path

# Get the absolute path for the sif image
sif_image = Path('esmvaltool_stable.sif').resolve()

# Two recipes for two datasets (absolute paths, so the Dask workers can find them)
recipes = [Path('recipes/recipe_dataset1.yml').resolve(),
           Path('recipes/recipe_dataset2.yml').resolve()
          ]

# Set up shell commands
commands = [f"singularity run {sif_image} run {recipe} --offline=False" for recipe in recipes]
commands

['singularity run /home/caroline-oku/demo_singularity/demo_singularity_esmvaltool/esmvaltool_stable.sif run /home/caroline-oku/demo_singularity/demo_singularity_esmvaltool/recipes/recipe_dataset1.yml --offline=False',
 'singularity run /home/caroline-oku/demo_singularity/demo_singularity_esmvaltool/esmvaltool_stable.sif run /home/caroline-oku/demo_singularity/demo_singularity_esmvaltool/recipes/recipe_dataset2.yml --offline=False']
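Command strings built with an f-string break when a path contains spaces or shell metacharacters. A minimal sketch of a safer construction, using `shlex.join` from the standard library (Python 3.8 or later); the helper name `esmvaltool_command` is hypothetical, and the arguments mirror the commands shown above.

```python
import shlex
from pathlib import Path


def esmvaltool_command(sif_image, recipe):
    """Build a shell-safe command string for running one recipe."""
    args = [
        "singularity", "run", str(Path(sif_image).resolve()),
        "run", str(Path(recipe).resolve()), "--offline=False",
    ]
    # shlex.join quotes every argument that the shell would otherwise split
    return shlex.join(args)


# Hypothetical usage with the file names from this notebook:
# commands = [esmvaltool_command("esmvaltool_stable.sif", r)
#             for r in ["recipes/recipe_dataset1.yml", "recipes/recipe_dataset2.yml"]]
```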

The commands can be submitted to the Dask cluster as follows:

In [4]:
import os
# Submit the commands
futures = client.map(os.system, commands)
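`os.system` returns only an exit status, and whatever the command prints ends up in the worker's output rather than in the notebook. As an alternative, the sketch below wraps `subprocess.run` so that each future carries both the exit code and the captured output. The function name `run_command` is hypothetical.

```python
import subprocess


def run_command(command):
    """Run one shell command and return (exit code, combined stdout/stderr text)."""
    result = subprocess.run(
        command,
        shell=True,                # the commands are shell strings, as above
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        text=True,
    )
    return result.returncode, result.stdout


# Hypothetical usage with the existing Dask client:
# futures = client.map(run_command, commands)
```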

In [15]:
futures

[<Future: finished, type: int, key: system-e879bb29e07bc8513966c1f2d5a62896>,
 <Future: finished, type: int, key: system-35008dcc096c451ca0fbfe5cb37433a3>]
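A `finished` status only means that the `os.system` call returned, not that ESMValTool succeeded. The integer result of each future is the exit status of the command, and a non-zero value indicates a failure. A minimal sketch for checking this, with a hypothetical helper `failed_commands`:

```python
def failed_commands(commands, exit_codes):
    """Return the commands whose exit status is non-zero."""
    return [cmd for cmd, code in zip(commands, exit_codes) if code != 0]


# With the futures from above (requires the running Dask client):
# exit_codes = client.gather(futures)
# print(failed_commands(commands, exit_codes))
```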

Once finished, one can check the downloaded climate data files and the generated results:

In [18]:
# Check the retrieved climate data
!tree -L 4 ~/climate_data/

/home/caroline-oku/climate_data/
├── cmip5
│   └── output1
│       ├── CCCma
│       │   └── CanESM2
│       ├── CNRM-CERFACS
│       │   └── CNRM-CM5
│       └── NSF-DOE-NCAR
│           └── CESM1-CAM5
└── CMIP6
    └── CMIP
        └── BCC
            └── BCC-ESM1

12 directories, 0 files


In [19]:
# Check generated results
!tree -L 2 ~/esmvaltool_output/

/home/caroline-oku/esmvaltool_output/
├── recipe_dataset1_20220927_103054
│   ├── index.html
│   ├── plots
│   ├── run
│   └── work
└── recipe_dataset2_20220927_103100
    ├── index.html
    ├── plots
    ├── run
    └── work

8 directories, 2 files
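Each run directory contains an `index.html` summary. The sketch below collects these files from Python instead of `tree`, assuming the output folder layout shown in the listing above; the helper name `find_reports` is hypothetical.

```python
from pathlib import Path


def find_reports(output_dir):
    """Return the index.html of every run directory, sorted by path."""
    return sorted(Path(output_dir).expanduser().glob("*/index.html"))


# Hypothetical usage with the output folder from the listing above:
# for report in find_reports("~/esmvaltool_output"):
#     print(report)
```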
