rrfs‐workflow code norm
-
Adopt the NCO (NCEP Central Operations) standards. The following rules are mostly based on them.
-
Think big. The workflow will include complex data assimilation (DA) logic, different domains and resolutions (such as CONUS vs. NA, 3 km vs. 12 km), spinup and production cycles, ensemble components, air quality components, RTMA applications, etc.
-
The core of the workflow follows only the NCO naming conventions for existing operational products (such as GFS GRIB2 files). However, the workflow provides link scripts that use hard or soft links to map users' own naming conventions onto the NCO standard.
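As an illustration only, a link step might look like the sketch below. It assumes a hypothetical user naming pattern `gfs_${CDATE}_f${FHR}.grib2` and maps it to the NCO-style GFS name `gfs.t${cyc}z.pgrb2.0p25.f${FHR}`. The function name `link_gfs_to_nco` and the forecast hours are also hypothetical and are not taken from rrfs-workflow.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not the rrfs-workflow link script): soft-link GFS
# GRIB2 files that use an assumed user naming pattern to NCO-style names.
#   assumed user name: gfs_${CDATE}_f${FHR}.grib2
#   NCO-style name:    gfs.t${cyc}z.pgrb2.0p25.f${FHR}
link_gfs_to_nco() {
  local srcdir="$1" dstdir="$2" cdate="$3"
  local cyc="${cdate:8:2}"   # cycle hour taken from CDATE (YYYYMMDDHH)
  local fhr fhr3 src
  mkdir -p "${dstdir}"
  for fhr in 0 3 6; do       # illustrative forecast hours only
    fhr3=$(printf "%03d" "${fhr}")
    src="${srcdir}/gfs_${cdate}_f${fhr3}.grib2"
    if [[ -s "${src}" ]]; then
      ln -snf "${src}" "${dstdir}/gfs.t${cyc}z.pgrb2.0p25.f${fhr3}"
    else
      echo "WARNING: ${src} is missing or empty"
    fi
  done
}
```

Soft links keep a single copy of the data on disk, while downstream tasks see only NCO-style names.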
-
Minimize Python library dependencies for the workflow (excluding the scripts/ and ush/ parts).
-
It is recommended to use only BASH or Python for scripting.
Every script file (Bash or Python) should be executable so that it can be run directly from the command line as `./myscript.sh` or `./myscript.py`. To make this work, put a shebang on the first line of each script. For Bash scripts, use:
```
#!/usr/bin/env bash
```
This form is preferred, but `#!/bin/bash` is also acceptable. For Python scripts, use:
```
#!/usr/bin/env python
```
-
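A shebang alone is not enough. The file also needs its executable bit set (`chmod +x`). The sketch below creates a throwaway script and runs it directly. The name `myscript.sh` is only a placeholder.

```bash
#!/usr/bin/env bash
# Create a small Bash script with the preferred shebang, make it executable,
# and run it directly. "myscript.sh" is just a placeholder name.
workdir=$(mktemp -d)
cat > "${workdir}/myscript.sh" << 'EOF'
#!/usr/bin/env bash
echo "INFO: myscript.sh is running"
EOF
chmod +x "${workdir}/myscript.sh"   # set the executable bit
"${workdir}/myscript.sh"             # the shebang tells the system to run it with bash
```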
A J-job or ex-script may source other Bash files at its beginning to set relevant environment variables.
-
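Put the following lines at the beginning of a J-job script:
```bash
set -x
export PS4='+ $SECONDS + '
date
```
With `set -x`, Bash prints each command before running it, prefixed by the value of `PS4`. Because `PS4` is set in single quotes, `$SECONDS` is expanded again for every traced line, so each trace line shows the number of seconds since the shell was started. The `date` line records the wall-clock start time in the log.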
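The effect of this `PS4` setting can be checked with the small sketch below. It runs a traced script in a child shell and captures the trace lines, which Bash writes to stderr.

```bash
#!/usr/bin/env bash
# Run a tiny traced script in a child shell and capture its xtrace output.
# xtrace lines go to stderr, so 2>&1 sends them into the captured string.
trace=$(bash -s 2>&1 << 'EOF'
set -x
export PS4='+ $SECONDS + '
sleep 1
true
EOF
)
# Lines traced after PS4 is set look like "+ <seconds> + <command>".
echo "${trace}"
```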
-
Use `source` instead of a dot (`.`) for better readability. In Bash, `source` and `.` behave identically.
-
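A minimal sketch of loading settings with `source` follows. The file name `config.demo` and the values of `NET` and `RUN` are hypothetical.

```bash
#!/usr/bin/env bash
# Write a throwaway config file and load it with "source" (equivalent to "."
# in Bash). The file name and the values are hypothetical.
cfgdir=$(mktemp -d)
cat > "${cfgdir}/config.demo" << 'EOF'
export NET="rrfs"
export RUN="rrfs"
EOF
source "${cfgdir}/config.demo"
echo "INFO: NET=${NET} RUN=${RUN}"
```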
The ending of a J-job should look like this:
```bash
#
#----------------------------------------
# Execute the script.
#----------------------------------------
export pgmout="${DATA}/OUTPUT.${task_id}"
"${SCRIPTSrrfs}/exrrfs_${task_id}.sh"
export err=$?; err_chk
if [[ -e "${pgmout}" ]]; then
  cat "${pgmout}"
fi
#
#----------------------------------------
# Remove the temporary working directory
#----------------------------------------
cd "${DATAROOT}"
# task-specific setting (here: da) falls back to KEEPDATA
KEEPDATA_da=${KEEPDATA_da:-${KEEPDATA}}
[[ "${KEEPDATA_da}" == "NO" ]] && rm -rf "${DATA}"
#
date
echo "JOB ${jobid:-} HAS COMPLETED NORMALLY!"
exit 0
```
-
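It is preferred to use `[[` instead of `[`. `[[` is Bash's extension to the `[` command. It has several enhancements that make it a better choice for scripts that target Bash. For example, unquoted variables are not word-split or pathname-expanded, and it supports `&&`, `||`, and pattern matching.
-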
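The sketch below illustrates one of those enhancements: how the two forms behave when an unquoted variable is empty. The variable `begin` is only an example.

```bash
#!/usr/bin/env bash
# Compare [ ] and [[ ]] when an unquoted variable is empty.
begin=""
single_status=0
# [ ] sees "[ == YES ]" after the empty value disappears, prints
# "unary operator expected" (silenced here), and returns a non-zero status.
[ $begin == "YES" ] 2> /dev/null || single_status=$?
double_status=0
# [[ ]] does not word-split, so this is an ordinary comparison that is false.
[[ $begin == "YES" ]] || double_status=$?
echo "[ ]: ${single_status}, [[ ]]: ${double_status}"
```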
Use `==` instead of `=` for string comparison, and double-quote the strings being compared. For example: `if [[ "${begin}" == "YES" ]]; then`
-
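Quoting the right-hand side matters inside `[[ ]]`. An unquoted right-hand side of `==` is treated as a glob pattern, while a quoted one is compared literally. A small sketch with illustrative values:

```bash
#!/usr/bin/env bash
# Inside [[ ]], an unquoted right-hand side of == is a glob pattern,
# while a quoted one is compared as a literal string.
value="ICS"
pattern="IC*"
unquoted=no
quoted=no
[[ "${value}" == ${pattern} ]] && unquoted=yes     # matches: IC* is a pattern
[[ "${value}" == "${pattern}" ]] && quoted=yes     # no match: literal "IC*"
echo "unquoted: ${unquoted}, quoted: ${quoted}"
```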
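Enforce base-10 arithmetic to avoid unexpected errors with zero-padded numbers such as "03" or "003". Bash arithmetic treats a leading zero as octal, so values like "08" or "09" cause an error unless prefixed with `10#`:
```bash
if (( 10#${ENSMEM:-0} > 0 )); then
```
-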
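A short sketch of the `10#` prefix with zero-padded forecast hours and ensemble members (values are illustrative):

```bash
#!/usr/bin/env bash
# Zero-padded values such as "08" are read as octal by Bash arithmetic, and
# 8 and 9 are not valid octal digits, so $(( FHR + 1 )) fails for FHR=08.
FHR="08"
next_fhr=$(( 10#${FHR} + 1 ))              # forced base 10: 8 + 1
printf -v next_fhr3 "%03d" "${next_fhr}"   # re-pad for file names
# ${ENSMEM:-0} supplies 0 when ENSMEM is unset or empty.
ENSMEM="003"
if (( 10#${ENSMEM:-0} > 0 )); then
  echo "INFO: ensemble member $(( 10#${ENSMEM} )), next forecast hour ${next_fhr3}"
fi
```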
Use `${cpreq}` to copy files or directories that a job requires to function. In most situations, soft links work better for community users, so jobs/rocoto/launch.sh adds the following line to change the `cpreq` command for community users:
```bash
export cpreq="ln -snf"  # use soft links instead of copies for non-NCO experiments
```
-
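Note that `${cpreq}` has to be left unquoted when it is called, so that a value like `ln -snf` splits into a command plus its options. A minimal sketch with throwaway paths, which also checks the return code:

```bash
#!/usr/bin/env bash
# Sketch: call ${cpreq} unquoted so that "ln -snf" splits into a command plus
# its options. The source and target paths are throwaway examples.
export cpreq="ln -snf"   # setting used for non-NCO experiments (see jobs/rocoto/launch.sh)
demo=$(mktemp -d)
echo "namelist contents" > "${demo}/input.nml"
${cpreq} "${demo}/input.nml" "${demo}/input.nml.link"
err=$?
if (( err != 0 )); then
  echo "FATAL ERROR: cpreq failed with return code ${err}"
fi
```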
Files under jobs/rocoto will NOT go into NCO operations. They are used to make small tweaks, mimic ecFlow job cards, and provide flexibility for community users.
-
Per the NCO standard, the following environment variables should always be available to every task:
HOMErrfs, EXPDIR, CDATE, PDY, cyc, COMROOT, DATAROOT, VERSION, MACHINE, NET, RUN, TAG
Examples for CDATE, PDY, and cyc (note that `cyc` is an exception and is all lowercase):
```
CDATE=2024052703
PDY=20240527
cyc=03
```
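Since PDY and cyc are fixed slices of CDATE, they can be derived with Bash substring expansion. This is a sketch only. rrfs-workflow may set these variables in a different way.

```bash
#!/usr/bin/env bash
# Sketch: derive PDY (YYYYMMDD) and cyc (HH) from a 10-digit CDATE with Bash
# substring expansion. rrfs-workflow may set these variables differently.
CDATE=2024052703
PDY=${CDATE:0:8}
cyc=${CDATE:8:2}
echo "CDATE=${CDATE} PDY=${PDY} cyc=${cyc}"
```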
-
Every task has input and output. Input data should be under a directory defined by a `COMIN` variable; for example, `COMINgfs` provides the IC/LBC GRIB2 files. Output data should be under a directory defined by a `COMOUT` variable.
-
The working directory should be defined by the `${DATA}` variable. In NCO operations, a working directory is removed immediately after a job completes successfully. Community users can set `KEEPDATA=YES` to keep working directories. In rrfs-workflow, users can also choose to keep data for individual tasks by setting, for example, `KEEPDATA_da=YES`.
-
It is preferred that the names of exported environment variables in config.* files and scripts start with uppercase letters (such as `LBC_OFFSET_HRS` and `COMINgfs`). There are a few exceptions, mostly due to NCO practice (such as `cyc`). A variable whose name is all lowercase is normally assumed not to be passed to subshells.
-
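The distinction comes from `export`. An exported variable is inherited by child processes, while a plain shell variable is not. A small sketch (the path is a placeholder):

```bash
#!/usr/bin/env bash
# An exported variable is inherited by child processes; a plain shell variable
# is not. The path below is a placeholder.
export COMINgfs="/path/to/gfs"
tmpdir_name="scratch"
child_view=$(bash -c 'echo "COMINgfs=${COMINgfs:-unset} tmpdir_name=${tmpdir_name:-unset}"')
echo "${child_view}"
```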
Use `-s` instead of `-f` to check that a file exists and is not empty; `-f` is also true for a zero-size file.
-
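A quick sketch showing how the two tests treat an empty file:

```bash
#!/usr/bin/env bash
# -f is true for any regular file, even an empty one; -s also requires size > 0.
demo=$(mktemp -d)
: > "${demo}/empty.grib2"               # create a zero-size file
f_result=no
s_result=no
[[ -f "${demo}/empty.grib2" ]] && f_result=yes
[[ -s "${demo}/empty.grib2" ]] && s_result=yes
echo "-f: ${f_result}, -s: ${s_result}"
```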
Catch and handle the return code in all cases (running an executable, wgrib2, Python, ush scripts, other scripts, utilities, etc.).
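Outside NCO's `err_chk`, the same idea can be sketched as a small wrapper. The function `run_and_check` is hypothetical and is not part of rrfs-workflow or NCO prod_util. It runs a command, labels the outcome, and passes the return code back to the caller.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of rrfs-workflow or NCO prod_util):
# run a command, then report and propagate its return code.
run_and_check() {
  local err=0
  "$@" || err=$?
  if (( err != 0 )); then
    echo "FATAL ERROR: '$*' failed with return code ${err}" >&2
    return "${err}"
  fi
  echo "INFO: '$*' completed successfully"
}
```

The helper returns rather than exits, so the calling script decides what to do, e.g. `run_and_check some_command || exit $?`.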
-
Correctly label output messages as `INFO`, `WARNING`, or `FATAL ERROR`.
-
Use `${NDATE}` to compute previous or future cycles/dates; use `date` only to output a formatted string.
-
The workflow calls a J-job, the J-job calls an ex-script, and the ex-script calls the scripts and executables under ush/ and exec/, respectively.
-
Use nouns for task names; avoid verbs as much as possible.
-
rrfs-workflow adopts a config cascade and an environment variable cascade so that settings can optionally be fine-tuned for individual tasks. For example, to get the walltime setting for a spinup forecast job, the workflow checks the following variables in order until it finds one that is defined:
WALLTIME_FCST_SPINUP, WALLTIME_FCST, WALLTIME
-
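The lookup order can be illustrated in Bash with nested `:-` defaults. This sketch only shows the order. It treats an empty variable the same as an unset one, and the function name `get_spinup_fcst_walltime` is hypothetical. The workflow's own implementation of the cascade may differ.

```bash
#!/usr/bin/env bash
# Sketch of the walltime cascade with nested ":-" defaults: the first
# variable that is set and non-empty wins. Illustrates the lookup order only;
# the workflow's own cascade implementation may differ.
get_spinup_fcst_walltime() {
  echo "${WALLTIME_FCST_SPINUP:-${WALLTIME_FCST:-${WALLTIME}}}"
}
WALLTIME="01:00:00"
WALLTIME_FCST="00:30:00"
unset WALLTIME_FCST_SPINUP
get_spinup_fcst_walltime   # prints 00:30:00
```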
rrfs-workflow uses Python, a powerful yet intuitive language, to generate the Rocoto workflow (as well as the ecFlow and Cylc workflows).
-
Use 2 spaces for indentation in Python and BASH scripts. Avoid using a TAB or 4 spaces.
-
In Bash, double quotes and single quotes behave differently: variables and command substitutions are expanded inside double quotes but not inside single quotes. In Python, the two are equivalent.
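A minimal sketch of the difference in Bash:

```bash
#!/usr/bin/env bash
# Double quotes expand variables; single quotes keep the text literally.
name="rrfs"
double="NET=${name}"
single='NET=${name}'
echo "${double}"   # NET=rrfs
echo "${single}"   # NET=${name}
```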
-
In config files, don't forget to add "export" for any variables required by the workflow.
-
To be safe, put a space before and after every operator in Bash:
```bash
FHRin=$((10#${FHR}+10#${offset})    # wrong (missing ")"), and not easy to spot or debug
FHRin=$(( 10#${FHR} + 10#${offset} ))  # spaces around operators help reduce such bugs
if [[ "${TYPE}"=="IC" ]] || [[ "${TYPE}"=="ic" ]]; then      # similarly, this is wrong (always true)
if [[ "${TYPE}" == "IC" ]] || [[ "${TYPE}" == "ic" ]]; then  # this is correct
```
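The string comparison without spaces is the more dangerous bug because it fails silently. The sketch below shows why. Without spaces, `"${TYPE}"=="IC"` is a single word, and `[[ word ]]` only tests whether that word is non-empty.

```bash
#!/usr/bin/env bash
# Without spaces, "${TYPE}"=="IC" is one word, and [[ word ]] only tests
# whether that word is non-empty, so the condition is true for any TYPE.
TYPE="lbc"
nospace=no
spaced=no
[[ "${TYPE}"=="IC" ]] && nospace=yes     # wrong: always true
[[ "${TYPE}" == "IC" ]] && spaced=yes    # correct: false for TYPE=lbc
echo "no spaces: ${nospace}, with spaces: ${spaced}"
```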
-
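Be sure to use `:-`, not `:` alone, when setting a default value:
```bash
export WALLTIME_UPP=${WALLTIME_UPP:"00:50:00"}   # wrong: ":" alone means substring expansion
export WALLTIME_UPP=${WALLTIME_UPP:-"00:50:00"}  # correct: ":-" supplies a default value
```
-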
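A related detail is the difference between `:-` and `-` (without the colon) when a variable is set but empty. A short sketch:

```bash
#!/usr/bin/env bash
# ":-" supplies a default when the variable is unset OR empty;
# "-" (no colon) supplies it only when the variable is unset.
unset WALLTIME_UPP
a=${WALLTIME_UPP:-"00:50:00"}   # 00:50:00 (unset)
WALLTIME_UPP=""
b=${WALLTIME_UPP:-"00:50:00"}   # 00:50:00 (empty)
c=${WALLTIME_UPP-"00:50:00"}    # empty string (set, even though empty)
echo "a=${a} b=${b} c=${c}"
```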
In scripts and config files, with a few exceptions, a variable whose name starts with an uppercase letter is assumed to be exported to subshells, while an all-lowercase name indicates a temporary variable that is visible only to the script that defines it.
-
Use `${var}` to reference a variable. The braces mark exactly where the variable name ends, which is more robust than `$var` when the name is followed by other characters.
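A small sketch of the problem that braces avoid. The names `task` and `task_log` are only examples.

```bash
#!/usr/bin/env bash
# Without braces, Bash takes the longest valid name, so $task_log refers to a
# different (here unset) variable named task_log.
task="fcst"
unset task_log
unbraced="$task_log"
braced="${task}_log"
echo "unbraced='${unbraced}' braced='${braced}'"
```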