This exercise will introduce you to ESMValTool, a tool to evaluate Earth System Models (ESMs) against observations like those available through the Earth System Grid Federation (ESGF).
In this document, we will use the Virtual Desktop Infrastructure (VDI). We also provide an adjusted Jupyter version for advanced users of ESMValTool.
ACCESS-NRI supports running ESMValTool on Gadi. Further information is available on this ACCESS-Hive page, which supplements the official ESMValTool documentation.
To run this exercise, you need to be a member of the following NCI projects:
nf33, xp65, al33, rr3, r87
Go to the Australian Research Environment website and log in with your NCI username and password. If you don't have an NCI account, you can sign up for one at the NCI website.
Click on Virtual Desktop under Featured Apps to configure a new VDI instance. This option is also available under the All Apps section at the bottom of the page and in the Interactive Apps dropdown located in the top menu.
You will now be presented with the main VDI instance configuration form. Please complete only the fields below and leave all other fields blank or at their default values.
- 3.1 Walltime: The number of hours the VDI instance will run. 1 hour is sufficient for each of the tutorials.
- 3.2 Compute Size: Select small(2 cpus, 9 mem) from the dropdown menu.
- 3.3 Project: Please enter nf33. This will allocate SU usage to the workshop project.
- 3.4 Storage: This is the list of /g/data/ project data storage locations required to complete the workshop tutorials. In ARE, storage locations need to be explicitly defined to access these data from within a VDI instance. Please enter the following string:
gdata/nf33+gdata/xp65
- 3.5 Click Advanced options ...
- 3.6 PBS Flags: The xp65 conda environment is a containerised environment that requires the SINGULARITY_OVERLAYIMAGE environment variable to be defined. Copy and paste the following into the PBS Flags field of the advanced options section:
-v SINGULARITY_OVERLAYIMAGE=/g/data/xp65/public/apps/med_conda/envs/access-med-0.3.sqsh
Once the VDI instance has started (this usually takes around 30 seconds), the status window should update to report that the instance has started and how much time remains. More detailed information on the instance can be accessed by clicking the Session ID link.
All that remains to get started is to click Launch VDI Desktop.
- click on the control bar in the center left of the VDI window
- click on the clipboard: you can copy text from your local machine into this with the usual keyboard shortcuts
- right-click and click Paste to paste the content in VDI
Start a terminal in the VDI session.
In the terminal, change to your directory for this training session:
cd /scratch/nf33/$USER
In this directory, we need you to clone the whole repo from GitHub with the command below (if you already have this repo in your directory, you can jump to STEP 7):
git clone https://github.com/ACCESS-NRI/workshop-training-2023.git
Then you are all set to start the exercises.
In the terminal, run:
cd /scratch/nf33/$USER/workshop-training-2023/esmvaltool
module use /g/data/xp65/public/modules
module load conda/access-med
esmvaltool --help
Running this help command should print the ESMValTool usage information.
In the next step, we want to have a look at the ESMValTool configuration file that we will use in this tutorial. You can use a text editor of your choice. In this tutorial, we will simply print the content via more:
more config-user-on-gadi-v2.9.yml
This file contains the information for:
- Output settings
- Destination directory
- Download and auxiliary data directories
- Number of tasks that can be run in parallel
- Rootpath to input data
- Directory structure for the data from different projects
KEY POINTS
- The config-user-on-gadi-v2.9.yml file tells ESMValTool where to find input data.
- output_dir defines the destination directory.
- rootpath defines the root path of the data.
- drs defines the directory structure of the data.
The configuration file starts with output settings that tell ESMValTool which outputs you want. You can turn each setting on or off with a true or false value. Most of these settings are fairly self-explanatory.
The destination directory is the root path where ESMValTool will store its output folders containing e.g. figures, data, logs, etc. With every run, ESMValTool automatically generates a new output folder named after the recipe and the date and time of the run, using the format YYYYMMDD_HHMMSS.
# Destination directory where all output will be written
# Includes log files and performance stats.
output_dir: esmvaltool_output
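As a rough illustration of the naming scheme described above, the following Python sketch builds such a folder name. The helper `run_dir_name` is hypothetical and not part of ESMValTool, and joining the recipe name and the timestamp with an underscore is an assumption; only the YYYYMMDD_HHMMSS format is taken from the text above.

```python
from datetime import datetime
from pathlib import Path


def run_dir_name(recipe_file: str, started: datetime) -> str:
    """Combine the recipe name with a YYYYMMDD_HHMMSS timestamp (assumed layout)."""
    recipe_name = Path(recipe_file).stem  # drop the .yml extension
    return f"{recipe_name}_{started:%Y%m%d_%H%M%S}"


# Example with a fixed, made-up start time.
example_name = run_dir_name(
    "recipe_climwip_test_basic.yml", datetime(2023, 10, 5, 14, 30, 0)
)
print(Path("esmvaltool_output") / example_name)
```

Because the timestamp is part of the folder name, repeated runs of the same recipe should not overwrite each other.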
ESMValTool groups input data into several categories (referred to as projects in ESMValTool) based on their source. The current categories in the configuration file are mentioned below. For example, CMIP is used for a dataset from the Coupled Model Intercomparison Project, whereas OBS may be used for an observational dataset. More information about the projects used in ESMValTool is available in the official ESMValTool documentation.
When using ESMValTool on your own machine, you can create a directory to download climate model data or observation data sets and let the tool use data from there. It is also possible to ask ESMValTool to download climate model data as needed. This can be done by specifying a download directory and by setting the option to download data as shown below.
# Directory for storing downloaded climate data and find auxiliary data
download_dir: esmvaltool_climate_data
auxiliary_data_dir: /g/data/xp65/public/apps/cartopy-data
search_esgf: never
If you are working offline or do not want to download the data, set the search_esgf option above to never. If you want to download data only when the necessary files are missing at the usual location, you can set the option to when_missing. The auxiliary data directory holds the cartopy data that several plots need. We provide them through xp65 as shown above.
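The two values discussed above can be read as a simple decision rule. The sketch below is an illustration only, not ESMValTool's own code; the function name `should_search_esgf` and its `found_locally` argument are hypothetical.

```python
def should_search_esgf(search_esgf: str, found_locally: bool) -> bool:
    """Decide whether to look for data on ESGF, for the two values covered here."""
    if search_esgf == "never":
        # Offline mode: rely only on files already on disk.
        return False
    if search_esgf == "when_missing":
        # Only fall back to ESGF when the local search found nothing.
        return not found_locally
    raise ValueError(f"value not covered by this sketch: {search_esgf!r}")


for option in ("never", "when_missing"):
    for found in (True, False):
        print(option, found, should_search_esgf(option, found))
```

With the setting used in this tutorial (never), the rule always returns False, so every dataset in a recipe has to be available under one of the configured directories.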
The rootpath setting specifies the directories where ESMValTool will look for input data. For each category, you can define either one path or several paths as a list. For example:
# Rootpaths to the data from different projects
# This default setting will work if files have been downloaded by the
# ESMValTool via ``offline=False``. Lists are also possible. For site-specific
# entries and more examples, see below. Comment out these when using a
# site-specific path.
rootpath:
  default: esmvaltool_climate_data
  CMIP5: [/g/data/r87/DRSv3/CMIP5, /g/data/al33/replicas/CMIP5/combined, /g/data/rr3/publications/CMIP5/output1]
  native6: /g/data/nf33/public/data/ESMValTool/obsdata
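To make the "one path or several paths" idea concrete, here is a minimal Python sketch that normalises a rootpath entry into a list of directories to try. The function `candidate_roots` is hypothetical, and falling back to the default entry for unlisted projects is an assumption about the intended behaviour rather than a description of ESMValTool's internals.

```python
from pathlib import Path

# The rootpath mapping from the configuration shown above.
ROOTPATH = {
    "default": "esmvaltool_climate_data",
    "CMIP5": [
        "/g/data/r87/DRSv3/CMIP5",
        "/g/data/al33/replicas/CMIP5/combined",
        "/g/data/rr3/publications/CMIP5/output1",
    ],
    "native6": "/g/data/nf33/public/data/ESMValTool/obsdata",
}


def candidate_roots(rootpath: dict, project: str) -> list:
    """Return the directories to search for a project, in the order listed."""
    entry = rootpath.get(project, rootpath.get("default"))
    if entry is None:
        return []
    if isinstance(entry, str):
        entry = [entry]  # a single path is treated as a one-element list
    return [Path(p) for p in entry]


for project in ("CMIP5", "native6", "CMIP6"):
    print(project, [str(p) for p in candidate_roots(ROOTPATH, project)])
```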
Input data can be from various models, observations and reanalysis data that adhere to the CF/CMOR standard.
The drs setting describes the file structure for several projects (e.g. CMIP6, CMIP5, obs4mips, OBS6, OBS) on several key machines (e.g. BADC, CP4CDS, DKRZ, ETHZ, SMHI, BSC, NCI). For more information about drs, you can visit the ESMValTool documentation on Data types and the Data Reference Syntax (DRS).
# Directory structure for input data --- [default]/ESGF/BADC/DKRZ/ETHZ/etc.
# This default setting will work if files have been downloaded by the
# ESMValTool via ``offline=False``. See ``config-developer.yml`` for
# definitions. Comment out/replace as per needed.
drs:
  CMIP5: BADC
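Conceptually, a DRS entry selects a directory template that is filled in with dataset attributes and appended to a root path. The Python sketch below shows the idea with a made-up template; `EXAMPLE_DRS_TEMPLATE`, `expand_drs` and the facet values are illustrative assumptions and are not copied from config-developer.yml, which holds the real definitions.

```python
# Illustrative template only; the real per-site templates live in config-developer.yml.
EXAMPLE_DRS_TEMPLATE = "{institute}/{dataset}/{exp}/{frequency}/{short_name}"


def expand_drs(root: str, template: str, facets: dict) -> str:
    """Fill a directory template with dataset facets and append it to a root path."""
    return f"{root.rstrip('/')}/{template.format(**facets)}"


# Example facets for one CMIP5 dataset (chosen for illustration).
facets = {
    "institute": "CSIRO-BOM",
    "dataset": "ACCESS1-0",
    "exp": "historical",
    "frequency": "mon",
    "short_name": "tas",
}
example_path = expand_drs(
    "/g/data/al33/replicas/CMIP5/combined", EXAMPLE_DRS_TEMPLATE, facets
)
print(example_path)
```

Choosing a different drs value for a project amounts to choosing a different template, which is why the setting has to match how the data are laid out on the machine you are using.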
To see all the recipes that are shipped with ESMValTool, type
esmvaltool recipes list
For this tutorial, we will choose recipe_climwip_test_basic.yml as an example recipe. ACCESS-NRI is working on supporting all the above recipes on NCI. You can check the current support status here.
Use the following command to copy the recipe to your working directory
esmvaltool recipes get recipe_climwip_test_basic.yml
Now you should see the recipe file in your working directory (type ls to verify). Use your text editor to open this file or display the contents via more:
more recipe_climwip_test_basic.yml
Have a look at the recipe structure:
- Documentation with relevant (citation) information
- Datasets that should be analysed
- Preprocessors: groups of common preprocessing steps
- Diagnostics: scripts performing more specific evaluation steps
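The four parts listed above correspond to top-level sections of the recipe file. As a minimal sketch, the Python snippet below lists the top-level keys of a YAML text using only the standard library; the helper `top_level_sections` and the trimmed sample are hypothetical, and a real YAML parser would be the more robust choice for anything beyond a quick look.

```python
def top_level_sections(yaml_text: str) -> list:
    """Return keys that start at column 0, skipping comments and blank lines."""
    sections = []
    for line in yaml_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        # Indented lines and list items belong to an enclosing section.
        if line[0] not in " \t-" and ":" in line:
            sections.append(line.split(":", 1)[0])
    return sections


# Trimmed, made-up outline; the real recipe contains much more detail.
SAMPLE_RECIPE = """\
# example outline
documentation:
  title: example
datasets:
  - {dataset: ACCESS1-0, project: CMIP5}
preprocessors:
  example_preprocessor: {}
diagnostics:
  example_diagnostic: {}
"""

sections = top_level_sections(SAMPLE_RECIPE)
print(sections)
```

To try it on the real file, read recipe_climwip_test_basic.yml into a string and pass it to the same function.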
Projections of future climate change are often based on multi-model ensembles of global climate models such as CMIP6. To condense the information from these models, they are often combined into probabilistic estimates such as a mean and a related uncertainty range (such as the standard deviation). However, not all models in a given multi-model ensemble are equally fit for purpose, and it can make sense to weight models based on their ability to simulate observed quantities related to the target. In addition, multi-model ensembles such as CMIP can contain several models based on a very similar code base (sharing of components, only differences in resolution, etc.), leading to complex inter-dependencies between the models. Weighting models according to their independence helps to adjust for this.
This recipe implements the Climate model Weighting by Independence and Performance (ClimWIP) method. It is based on work by Knutti et al. (2017), Lorenz et al. (2018), Brunner et al. (2019), Merrifield et al. (2020), and Brunner et al. (2020). Weights are calculated based on historical model performance in several metrics (which can be defined by the performance_contributions parameter) as well as by their independence to all the other models in the ensemble based on their output fields in several metrics (which can be defined by the independence_contributions parameter). These weights can be used in subsequent evaluation scripts (some of which are implemented as part of this diagnostic).
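To give a feel for how performance and independence combine, the Python sketch below follows the general form of the weighting used in the papers cited above: a model's weight grows as its distance to observations shrinks and is divided down when other models are very similar to it. This is an illustration under stated assumptions, not the recipe's implementation; the function `climwip_weights`, the distance values and the shape parameters `sigma_d` and `sigma_s` are all made up for the example.

```python
import math


def climwip_weights(performance, independence, sigma_d, sigma_s):
    """Combine performance and independence distances into normalised weights.

    performance[i]     : distance of model i to observations (smaller is better)
    independence[i][j] : distance between models i and j (smaller is more similar)
    sigma_d, sigma_s   : shape parameters for the two terms (assumed values)
    """
    n = len(performance)
    raw = []
    for i in range(n):
        skill = math.exp(-((performance[i] / sigma_d) ** 2))
        similarity = sum(
            math.exp(-((independence[i][j] / sigma_s) ** 2))
            for j in range(n)
            if j != i
        )
        raw.append(skill / (1.0 + similarity))
    total = sum(raw)
    return [w / total for w in raw]


# Three equally skilful models; models 0 and 1 are duplicates of each other.
performance = [1.0, 1.0, 1.0]
independence = [
    [0.0, 0.0, 10.0],
    [0.0, 0.0, 10.0],
    [10.0, 10.0, 0.0],
]
weights = climwip_weights(performance, independence, sigma_d=1.0, sigma_s=1.0)
print([round(w, 3) for w in weights])
```

In this toy case the two duplicate models share the weight that a single independent model receives, which is the behaviour the independence term is meant to produce.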
Note: This recipe is still being developed! A more comprehensive (yet older) implementation can be found on GitHub: https://github.com/lukasbrunner/ClimWIP
Because of the computational costs, we will submit a job to Gadi through the Portable Batch System (PBS). To do so, you need a submission script, for example the one that we already provide. Open the launch_recipe_climwip_test_basic.pbs file:
#!/bin/bash -l
# For help with PBS directives on Gadi, go to https://opus.nci.org.au/display/Help/PBS+Directives+Explained
#PBS -S /bin/bash
#PBS -P nf33
#PBS -l storage=gdata/rr3+gdata/xp65+gdata/al33+gdata/nf33+scratch/nf33
#PBS -N recipe_climwip_test_basic
#PBS -l wd
#PBS -q normal
#PBS -l walltime=01:00:00
#PBS -l mem=64GB
#PBS -l ncpus=10
module use /g/data/xp65/public/modules
module load conda/access-med
esmvaltool run --config_file config-user-on-gadi-v2.9.yml recipe_climwip_test_basic.yml
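Since storage locations have to be requested explicitly, it can be handy to cross-check the storage directive against the /g/data paths used in the configuration file. The Python sketch below does this for the values shown in this tutorial; the helper names `storage_projects` and `gdata_project` are hypothetical, and the check only looks at /g/data paths.

```python
import re


def storage_projects(pbs_script: str) -> set:
    """Collect the gdata projects requested via '#PBS -l storage=...'."""
    projects = set()
    for line in pbs_script.splitlines():
        match = re.match(r"#PBS\s+-l\s+storage=(\S+)", line)
        if match:
            for item in match.group(1).split("+"):
                filesystem, _, project = item.partition("/")
                if filesystem == "gdata":
                    projects.add(project)
    return projects


def gdata_project(path: str):
    """Return the project name of a /g/data path, or None for other paths."""
    match = re.match(r"/g/data/([^/]+)", path)
    return match.group(1) if match else None


# Directive and paths copied from the script and configuration shown in this tutorial.
script = "#PBS -l storage=gdata/rr3+gdata/xp65+gdata/al33+gdata/nf33+scratch/nf33"
config_paths = [
    "/g/data/r87/DRSv3/CMIP5",
    "/g/data/al33/replicas/CMIP5/combined",
    "/g/data/rr3/publications/CMIP5/output1",
    "/g/data/nf33/public/data/ESMValTool/obsdata",
    "/g/data/xp65/public/apps/cartopy-data",
]
needed = {gdata_project(p) for p in config_paths} - {None}
missing = needed - storage_projects(script)
print(sorted(missing))
```

For these inputs the check reports r87, so data under /g/data/r87 may not be visible to the job unless gdata/r87 is added to the storage directive; the other CMIP5 root paths are covered.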
Submit the job to the queue system:
qsub launch_recipe_climwip_test_basic.pbs
To monitor progress, check the status of your job with:
qstat
Once the job is finished, you can open the log file (recipe_climwip_test_basic.o*) and check a few things:
After the banner and general information, the output starts with some important locations.
- Did ESMValTool use the right config file?
- What is the path to the example recipe?
- What is the main output folder generated by ESMValTool?
- Can you guess what the different output directories are for?
- ESMValTool creates two log files. What is the difference?
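When a recipe has been run several times, the output directory contains one folder per run. As a small aid for finding the most recent one, the Python sketch below picks the newest folder by name, relying on the fact that YYYYMMDD_HHMMSS timestamps sort chronologically. The helper `latest_run_dir` is hypothetical, and the demonstration uses made-up folders in a temporary directory rather than real ESMValTool output.

```python
import tempfile
from pathlib import Path


def latest_run_dir(output_dir, recipe_name):
    """Return the newest '<recipe_name>_<timestamp>' folder, or None if there is none."""
    base = Path(output_dir)
    if not base.is_dir():
        return None
    runs = sorted(p for p in base.glob(f"{recipe_name}_*") if p.is_dir())
    return runs[-1] if runs else None


# Demonstration with two made-up run folders.
with tempfile.TemporaryDirectory() as tmp:
    for stamp in ("20231005_090000", "20231005_143000"):
        (Path(tmp) / f"recipe_climwip_test_basic_{stamp}").mkdir()
    newest_name = latest_run_dir(tmp, "recipe_climwip_test_basic").name
    print(newest_name)
```

On Gadi you would call the function with the esmvaltool_output directory from the configuration file instead of the temporary directory.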
Open a new terminal (top left of the VDI screen) and navigate to the esmvaltool_output directory, then use the command below to start a local HTTP server.
cd /scratch/nf33/$USER/workshop-training-2023/esmvaltool/esmvaltool_output
python3 -m http.server
You can then start Firefox in the VDI screen and access the following localhost address to navigate into your specific recipe* directory and its index.html:
http://0.0.0.0:8000/
From there you can navigate through the different directories to view the different evaluation plots.
- Close the browser window
- Close the http server by pressing Ctrl+C in the terminal, then type exit to close the terminal
- In the menu bar (top left), click on System and then Log Out, and close the browser tab or delete the session in My Interactive Sessions of the ARE