coordinating the operation part and the develop part
This page collects the workflow differences between the operation part and the develop part, to give a clearer picture of how to add the MPAS+JEDI components, keep them NCO-compliant, and still run them flexibly on the development machines (WCOSS2, Hera, Jet, Orion, Hercules).
To clarify, the operation part refers to all directories/files excluding the rocoto/ directory.
The develop part refers to the rocoto/ directory which helps generate develop experiments and associated rocoto workflow XML files.
operation has dedicated computing resources, so jobs will run without waiting.
operation will purge job run directories immediately after completion.
operation will only run one fixed configuration on one machine.
operation will use the ecflow workflow management software.
In operation, the archive and graphics tasks will run outside of the rrfs-workflow.
In develop,
jobs usually wait a certain amount of time before they run,
job run directories need to be kept for a certain amount of time to facilitate debugging,
there is no data purge on disks on Hera/Jet/Orion/Hercules, so
in-workflow clean and archive tasks will be needed to free up disk space and archive develop experiments.
develop will run different configurations (such as CONUS 3 km, CONUS 12 km, Atlantic 12 km, Atlantic 4 km, North American 3 km, etc.) on different computer platforms (such as Hera, Jet, Orion, Hercules, WCOSS2, etc.).
develop will use the rocoto workflow management software.
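As an illustration of the in-workflow clean task mentioned above, here is a minimal sketch that removes run directories once they are older than a retention window. The function name `clean_old_rundirs`, its arguments, and the directory layout are hypothetical and are not taken from the rrfs-workflow itself.

```shell
#!/usr/bin/env bash
# Hypothetical clean task for develop experiments (illustrative names only).

# clean_old_rundirs ROOT DAYS
# Removes first-level subdirectories of ROOT whose age, counted in whole
# days from their modification time, is greater than DAYS.
clean_old_rundirs() {
  local root="$1"
  local keep_days="$2"
  # Refuse to run on an empty or missing root to avoid deleting the wrong tree.
  [ -n "${root}" ] && [ -d "${root}" ] || return 1
  # -mindepth/-maxdepth restrict the search to direct children of ROOT.
  find "${root}" -mindepth 1 -maxdepth 1 -type d -mtime "+${keep_days}" -exec rm -rf {} +
}
```

A call such as `clean_old_rundirs /path/to/rundirs 3` would keep recent run directories for debugging and drop the older ones. Note that a directory's modification time changes only when entries are added or removed directly inside it, so a real task may want a different age criterion, such as the cycle date encoded in the directory name.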
To accommodate those differences, the following measures are considered:
- Side loading for non-NCO tasks, such as clean, archive, and graphics. They don't need J-jobs/ex-scripts and will be put under `rocoto/sideload`.
- As the rocoto workflow management software does not provide some of the job card variables that ecflow does, a `rocoto/sideload/launch.sh` script is added to mimic the ecflow behavior and provide a switch routing a task to either J-jobs or non-NCO tasks.
- Use `${cpreq}` to copy files/directories that are required for a job to function. In most situations, soft links work better for develop, so the following line is added in `rocoto/sideload/launch.sh` to tweak the `cpreq` command for develop: `export cpreq="ln -snf" # use a soft link instead of copy for develop experiments`
- Use links to manage fix files (more detailed thoughts here). In the NCO implementation, copying along the lines of `cp -rpL fix fix2; rm -rf fix; mv fix2 fix` will make a hard copy (with links dereferenced) of the fix files needed for operation.
- In order to separate concerns and only export the environment variables a task requires at runtime, a cascade config structure will be adopted. Resource configuration variables (such as `ACCOUNT`, `QUEUE`, `PARTITION`, `RESERVATION`, `NODES`, `WALLTIME`, `NATIVE`, `MEMORY`, etc.) are only needed in the experiment setup process and will be separated from the runtime configuration. More detailed thoughts here.
- A `rocoto/exp` directory will contain `exp.setup` or similar files which can be used to set up top-level options for an experiment, such as directories, `NET`, `VERSION`, `TAG`, and the days if it is a realtime run or the retro period if it is a retro run. Users can also preempt some environment variables here. These files are meant to facilitate quickly setting up a develop experiment (retro or realtime, different machines, and different grids/resolutions). These files are not needed in operation.
- The core of the workflow will only consider the NCO naming convention for all existing operational products (such as GFS grib2 files, etc.). However, the workflow will provide example link utilities that use hard or soft links to convert users' specific naming conventions to match the NCO standard.
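The launch-script measures in the list above can be sketched roughly as follows. This is not the actual `rocoto/sideload/launch.sh`. The function names `route_task` and `stage_input`, the side-loaded script paths, and the `jobs/JRRFS_<TASK>` naming pattern are assumptions made for illustration, and the sketch covers only the routing switch and the `cpreq` override, not the job card variables.

```shell
#!/usr/bin/env bash
# Hypothetical sketch in the spirit of rocoto/sideload/launch.sh.
# Paths, the task list, and the J-job naming pattern are assumptions.

# For develop, use a soft link instead of a copy (line quoted from the text).
export cpreq="ln -snf"

# route_task TASK
# Prints the script a task would be routed to: side-loaded scripts for
# non-NCO tasks, a J-job for everything else.
route_task() {
  local task="$1"
  case "${task}" in
    clean|archive|graphics)
      printf '%s\n' "rocoto/sideload/${task}.sh"
      ;;
    *)
      # Assumed pattern: jobs/JRRFS_<task name in upper case>
      printf '%s\n' "jobs/JRRFS_$(printf '%s' "${task}" | tr '[:lower:]' '[:upper:]')"
      ;;
  esac
}

# stage_input SRC DST
# Stages a required input through ${cpreq}, the way a job script would.
# ${cpreq} is left unquoted on purpose so "ln -snf" splits into command and flags.
stage_input() {
  local src="$1"
  local dst="$2"
  ${cpreq} "${src}" "${dst}"
}
```

Because job scripts only ever call `${cpreq}`, the same script can copy files in operation and link them in develop without any other change. One caveat of this design is that a relative source path passed to `ln -s` is resolved relative to the link's location, so absolute source paths are the safer choice.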