
Getting started

Andrew Kiss edited this page Feb 25, 2020 · 39 revisions

This page will help you to get to the point of running some basic test cases using ACCESS-OM2. It is divided into several sections:

  1. Quick start for Raijin users
  2. Downloading source code
  3. Building the models
  4. Running a test case
  5. Checking model output

Quick start

If you are using the Raijin supercomputer at NCI then you can use pre-built executables and the standard inputs.

The following steps will run the 1 degree JRA55 RYF experiment.

First download the experiment configuration:

cd /short/${PROJECT}/${USER}/
mkdir -p access-om2/control
cd access-om2/control
git clone https://github.com/COSIMA/1deg_jra55_ryf.git
cd 1deg_jra55_ryf

Edit the shortpath line in the config.yaml to reflect your ${PROJECT}. Then load the conda environment containing payu and run the model:

module use /g/data/hh5/public/modules
module load conda/analysis3
payu run

This uses the payu workflow management tool to prepare the run and submit it as a job to the PBS job queue. See the Raijin User Guide to learn about job management.
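The shortpath edit described above can also be scripted. The following is a minimal sketch, assuming config.yaml contains a single top-level line beginning with shortpath: and that the value should be /short/ followed by your project code. set_shortpath is a hypothetical helper and not part of payu.

```shell
# Hypothetical helper (not part of payu): rewrite the top-level
# "shortpath:" line of a config.yaml to point at /short/<project>.
# Assumes the file has one such line, starting in column 1.
set_shortpath() {
    local config="$1" project="$2"
    # Write to a temporary file, then replace the original.
    sed "s|^shortpath:.*|shortpath: /short/${project}|" "$config" > "${config}.tmp" &&
        mv "${config}.tmp" "$config"
}

# Example (run inside the cloned 1deg_jra55_ryf directory):
#   set_shortpath config.yaml "${PROJECT}"
```

Check the result with grep shortpath config.yaml before running payu.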

You can now skip to the end of this page: Checking model output.

Standard test cases

In addition to the 1 degree JRA55 RYF experiment used above, there are a number of other standard experiments which can be run by following the Quick start instructions. To run these replace:

git clone https://github.com/COSIMA/1deg_jra55_ryf.git
cd 1deg_jra55_ryf

With any of the following repositories (included in access-om2/control):

Downloading ACCESS-OM2 and understanding the repository layout

As the previous section showed, on Raijin only the experiment configuration needs to be downloaded to use ACCESS-OM2. This is because, by default, the configurations reference pre-existing executables and inputs. However, if you wish to dig into the details of the model, modify input or code, or set it up on a different machine, then it is necessary to download and familiarise yourself with the entire ACCESS-OM2 repository.

Downloading the ACCESS-OM2 repository

First, to download the entire ACCESS-OM2 repository, including all of the configurations and source code:

cd /short/${PROJECT}/${USER}
git clone --recursive https://github.com/COSIMA/access-om2.git

Then, to simplify future instructions we give this directory a short name with:

cd access-om2
export ACCESS_OM2_DIR=$(pwd)

If you already have an existing download and would like to update to the latest version see the tutorial on updating an experiment.

The repository layout

The project is arranged into a collection of repositories that are linked together using git submodules. The access-om2 repository is the parent, with other repositories embedded within it.

cd $ACCESS_OM2_DIR
tree -L 2 -d
.
├── control
│   ├── 01deg_jra55_iaf
│   ├── 01deg_jra55_ryf
│   ├── 025deg_jra55_iaf
│   ├── 025deg_jra55_ryf
│   ├── 1deg_core_nyf
│   ├── 1deg_jra55_iaf
│   └── 1deg_jra55_ryf
├── src
│   ├── cice5
│   ├── libaccessom2
│   └── mom
├── test
│   └── checksums
└── tools
    ├── contrib
    └── esmgrids

The control directory contains a number of submodules, one for each standard configuration. These repositories can be downloaded independently and used to run experiments on Raijin. The src directory contains a submodule for each of the dynamical models. The libaccessom2 repository contains the file-based atmosphere YATM, the OASIS coupler and other code needed to bolt everything together.

Working with git submodules can be tricky. Fortunately there is a lot of good documentation out there, for example here and here.
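To see which submodules the parent repository records, one option is to read its .gitmodules file directly. This is only a sketch: list_submodule_paths is a hypothetical helper name, and it relies on git config --file, which reads any git-style configuration file.

```shell
# Hypothetical helper: list the submodule paths recorded in a
# .gitmodules file (defaults to ./.gitmodules).
list_submodule_paths() {
    local gitmodules="${1:-.gitmodules}"
    # Each matching entry is printed as "<key> <path>"; keep the path.
    git config --file "$gitmodules" --get-regexp '^submodule\..*\.path$' |
        awk '{ print $2 }'
}

# Example (from $ACCESS_OM2_DIR):
#   list_submodule_paths
#   git submodule status    # shows the commit checked out in each submodule
```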

If this repository layout is particularly (dis)agreeable to you for any reason, please feel free to add a comment (as others have done) on a related issue, such as this one.

Building the models

The easiest way is simply to run

cd $ACCESS_OM2_DIR
./install.sh

which should build executables for all models at all resolutions. The executables will be placed in the $ACCESS_OM2_DIR/bin/ directory.
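A quick way to confirm that the build produced something is to list the executable files in that directory. The helper below is a sketch with a hypothetical name (list_executables). It assumes a find that supports -maxdepth, as GNU and BSD find do.

```shell
# Hypothetical helper: print the names of executable files directly
# inside a directory, one per line, sorted.
list_executables() {
    local bindir="$1"
    find "$bindir" -maxdepth 1 -type f -perm -u+x -exec basename {} \; | sort
}

# Example:
#   list_executables "$ACCESS_OM2_DIR/bin"
```

An empty listing suggests the build did not complete, in which case the output of ./install.sh is the place to look.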

Running a test case

Each of the model configurations is run by payu from within its respective directory in $ACCESS_OM2_DIR/control/. The config.yaml within each of these subdirectories gives the PBS specification for the job, including executable names.

You will need to edit config.yaml to set the project: and shortpath: settings appropriately. If you have built your own executables or wish to use your own input then it will also be necessary to modify the exe: and input: settings.

For example, to run the 1 degree JRA-55 RYF experiment:

export EXP_DIR=$ACCESS_OM2_DIR/control/1deg_jra55_ryf/
cd $EXP_DIR
module use /g/data/hh5/public/modules
module load conda/analysis3
payu run

or to do N runs:

payu run -n N

On NCI, the status of submitted runs can be checked with qstat -u ${USER}.

Checking model output

If the run is successful, output is stored in $EXP_DIR/archive. The payu documentation provides an explanation of where run output is stored.

If the run is unsuccessful, you can find error output in $EXP_DIR/access-om2.err, and any output from the run will be in $EXP_DIR/work.
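The two cases above can be checked with a small script. This is a sketch only: check_run is a hypothetical helper, and it treats any non-empty archive directory as evidence of a completed run, so it cannot distinguish a failed rerun from earlier successful runs.

```shell
# Hypothetical helper: report whether an experiment directory holds
# archived output or an error log, using the locations named above.
check_run() {
    local exp_dir="${1%/}"    # strip a trailing slash, if any
    if [ -d "$exp_dir/archive" ] && [ -n "$(ls -A "$exp_dir/archive/")" ]; then
        echo "output archived in $exp_dir/archive"
    elif [ -s "$exp_dir/access-om2.err" ]; then
        echo "run failed; last lines of $exp_dir/access-om2.err:"
        tail -n 20 "$exp_dir/access-om2.err"
    else
        echo "no output found yet in $exp_dir"
    fi
}

# Example:
#   check_run "$EXP_DIR"
```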

Todo

Please see this issue: https://github.com/COSIMA/access-om2/issues/10 for further information that may need to be integrated into this page.

Running on gadi

This is a work in progress. Instructions here are (very) incomplete!

General gadi transition info is here: https://opus.nci.org.au/display/Help/Preparing+for+Gadi

The changes most relevant to ACCESS-OM2 are:

  1. /short will not exist on gadi. The replacement /scratch is time-limited, so is not suitable for storing model inputs or executables. We have therefore moved inputs from /short/public/access-om2/ to /g/data/ik11/inputs/access-om2/. To access this you will need to be a member of project ik11 (apply via mancini). You'll also need to be a member of ua8 for RYF or qv56 for IAF.
  2. The openMPI and NetCDF library versions we used on raijin will not be available on gadi. We've upgraded them in new executables but your old raijin executables will not run on gadi.
  3. The intel compilers we used on raijin will not be available on gadi.
  4. There are 48 CPUs per node on the normal queue on gadi (compared to 16 on raijin), so if using the normal queue you will need this at the end of config.yaml:
platform:
    nodesize: 48
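The nodesize setting from item 4 can be appended from the command line. The sketch below uses a hypothetical helper, ensure_nodesize, and assumes that a config.yaml without a top-level platform: key has no platform settings at all. If the key already exists, the file is left untouched so it can be edited by hand.

```shell
# Hypothetical helper: append "platform: / nodesize: 48" to a
# config.yaml unless a top-level "platform:" key is already present.
ensure_nodesize() {
    local config="$1"
    if grep -q '^platform:' "$config"; then
        echo "platform: already set in $config; edit it by hand" >&2
        return 1
    fi
    # Make sure the file ends with a newline before appending.
    if [ -n "$(tail -c 1 "$config")" ]; then
        echo >> "$config"
    fi
    printf 'platform:\n    nodesize: 48\n' >> "$config"
}

# Example:
#   ensure_nodesize config.yaml
```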

Your options

Starting a new experiment

If you want to start a brand-new experiment, we recommend you use the latest executables and configurations, which will fix the above issues (and more). We haven't (yet) released a version of ACCESS-OM2 suitable for gadi, but we're working on it. There are test configurations available on the gadi-transition and ak-dev branches of the JRA-do-forced IAF and RYF configurations at all three resolutions in the individual config repos. gadi-transition is close to the old raijin configurations so is more suitable for continuation of existing runs. ak-dev includes many improvements and bug fixes (see the incomplete summary in the merge ak-dev branch pull request, e.g. https://github.com/COSIMA/025deg_jra55_iaf/pull/4).

Not all configurations have been tested, and those that have been run have not had their output carefully checked.

The IAF configs use JRA55-do from qv56, which is a slightly newer version (1.3.1) than that on ua8 used in previous runs and RYF (with small differences in near-surface temperature and humidity only). You could revert to the ua8 version for continuation runs by undoing these changes to atmosphere/forcing.json https://github.com/COSIMA/025deg_jra55_iaf/commit/d19b5c86e0600f1b8d52cc35cb7dba206d53f15b#diff-cd3ba4f3a1c8dd37efc78631f27566b3 and undoing the atmosphere input changes in config.yaml: https://github.com/COSIMA/025deg_jra55_iaf/commit/d19b5c86e0600f1b8d52cc35cb7dba206d53f15b#diff-259fe82e12a866f01123927480c7851b and also replacing /g/data1/ with /g/data/ in these files.
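The final path substitution mentioned above (/g/data1/ to /g/data/) is mechanical and can be done with sed. A minimal sketch, using a hypothetical helper named fix_gdata_paths. It only rewrites the path prefix and does not undo the commits linked above.

```shell
# Hypothetical helper: rewrite every occurrence of /g/data1/ as
# /g/data/ in the given files, via a temporary file per input.
fix_gdata_paths() {
    local f
    for f in "$@"; do
        sed 's|/g/data1/|/g/data/|g' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}

# Example (from the experiment directory):
#   fix_gdata_paths atmosphere/forcing.json config.yaml
```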

To try out a config, do this on gadi:

git clone https://github.com/COSIMA/1deg_jra55_iaf.git my-test-run
cd my-test-run
git checkout gadi-transition
git checkout -b my-test-run

then edit accessom2.nml, sync_output_to_gdata.sh, config.yaml and run with

module use /g/data/hh5/public/modules
module load conda/analysis3-unstable
payu setup
git commit -am "my test run"
payu sweep
payu run

with "my-test-run" being whatever name you want. You can replace 1deg_jra55_iaf above with any of these:

1deg_jra55_ryf
025deg_jra55_iaf
025deg_jra55_ryf
01deg_jra55_iaf
01deg_jra55_ryf

If you really want to live on the bleeding edge, replace gadi-transition with ak-dev above.

Continuing an experiment from raijin

This is where it gets tricky. Your executables from raijin will not run on gadi due to library changes. If you want to continue a run as close as possible to your raijin experiment you'll need to recompile the versions you used so they work on gadi. Note that due to compiler changes this will not give bit-for-bit reproducibility of your previous raijin runs.

It's a bit involved:

  1. Determine the versions (i.e. git hashes) of the executables used for your raijin run. Look in the exe fields in config.yaml. These contain the git hash (at the end for yatm and before _libaccessom2 for mom and cice), e.g. in bold (yours will probably differ):

    exe: /short/public/access-om2/bin/yatm_b6caeab.exe

    exe: /short/public/access-om2/bin/fms_ACCESS-OM_50dc61e_libaccessom2_b6caeab.x

    exe: /short/public/access-om2/bin/cice_auscom_360x300_24p_47650cc_libaccessom2_b6caeab.exe

Before embarking on the steps below, first check whether the changes between your versions and those on gadi-transition are significant enough to warrant recompiling the code. If not, just use the gadi-transition branch as above.

  2. Download and compile access-om2 on gadi:
git clone --recursive https://github.com/COSIMA/access-om2.git
cd access-om2
git checkout gadi-transition
./install.sh
  3. Check out the versions of the code you used on raijin, using the hashes you found in step 1 above (yours will probably differ). Git branch names cannot contain spaces, so the new branch names use hyphens:
cd src/libaccessom2
git checkout b6caeab
git checkout -b recompiling-b6caeab-for-gadi
cd ../mom
git checkout 50dc61e
git checkout -b recompiling-50dc61e-for-gadi
cd ../cice5
git checkout 47650cc
git checkout -b recompiling-47650cc-for-gadi
  4. Make the necessary compiler and library changes in libaccessom2, mom and cice.
  5. Commit these changes (from the top-level access-om2 directory): cd src; for d in *; do cd "$d"; git commit -am "update compiler and libraries for gadi"; cd -; done

  6. Cross fingers and run ./install.sh again. If it works you'll have new executables in bin. Their names will include hashes that differ from your raijin run but match those of the commits at step 5.

  7. Put the executables somewhere permanent, which is visible to the gadi compute nodes (e.g. your home directory).

  8. Update config.yaml to point to these executables and use /g/data/ik11/inputs/ rather than /short/public/, e.g. https://github.com/COSIMA/1deg_jra55_ryf/compare/gadi-transition#diff-259fe82e12a866f01123927480c7851b You'll also need to set PBS flags in config.yaml and sync_output_to_gdata.sh, and also define min_thickness = 1.0 in the &ocean_topog_nml group in ocean/input.nml if it wasn't already defined there (more info here).

Don't update atmosphere/forcing.json (there are small differences between the JRA55-do v1.3 in /g/data/ua8 and v1.3.1 in /g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-3).

  9. Load a version of payu that works on gadi and do payu setup:
module use /g/data/hh5/public/modules
module load conda/analysis3-unstable
payu setup
  10. If this works, do payu sweep and then try a (short!) run: payu run
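Step 1 above reads the git hashes off the executable names by eye. The same rule (hash at the end for yatm, hash before _libaccessom2 for mom and cice) can be sketched in shell. exe_hashes is a hypothetical helper, and it assumes names follow the pattern of the three examples in step 1.

```shell
# Hypothetical helper: print the git hashes embedded in an ACCESS-OM2
# executable name, following the naming pattern shown in step 1.
exe_hashes() {
    local name model
    name=$(basename "$1")
    name="${name%.*}"                        # drop the .exe / .x extension
    case "$name" in
        *_libaccessom2_*)
            model="${name%_libaccessom2_*}"  # e.g. fms_ACCESS-OM_50dc61e
            echo "model=${model##*_} libaccessom2=${name##*_}"
            ;;
        *)
            echo "libaccessom2=${name##*_}"  # yatm: hash is the last field
            ;;
    esac
}

# Examples:
#   exe_hashes /short/public/access-om2/bin/yatm_b6caeab.exe
#     prints: libaccessom2=b6caeab
#   exe_hashes /short/public/access-om2/bin/fms_ACCESS-OM_50dc61e_libaccessom2_b6caeab.x
#     prints: model=50dc61e libaccessom2=b6caeab
```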

Further notes:

  • payu v1.0.6 and later works on raijin and gadi. For the former the default location for laboratories is /short/$PROJECT/$USER and for gadi it is /scratch/$PROJECT/$USER. payu v1.0.6 is available via the conda/analysis3-unstable environment. More info: http://climate-cms.wikis.unsw.edu.au/Payuongadi
module use /g/data/hh5/public/modules
module load conda/analysis3-unstable
git clone --recursive https://github.com/COSIMA/access-om2.git
cd access-om2
git checkout gadi-transition
for d in src/*; do cd $d; git checkout gadi-transition; cd ../..; done
./install.sh

Your executables will be in ./bin.
