-
Notifications
You must be signed in to change notification settings - Fork 42
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Enable more complete configurability of jsrun launcher (#384)
* initial patch for more complete configuration of jsrun launcher * First pass at documenting usage of the lsf/jsrun launcher * Add corresponding maestro steps for each jsrun variant * Add one example of a memory hungry application * Build table mapping jsrun to maestro step keys * Add binding controls, rename keys from snake case, document defaults * Fix up straggling snake case keys * Improve debugging info in schema error messages * Fix rs_per_node, make gpu binding optional since it's new in lsf 10.1 * Update binding flag in examples, add note about gpu binding availability * Add initial lsfscriptadapter tests * Initial pass at general batch block documentation * Remove old commentary * Remove unneeded nodes/procs math in jsrun launcher substitution * Remove unneeded loggin output * Remove more debugging log outputs * Update lsf examples to match json schema for resource specification keys * Cleanup the cpus per rs machinery, schema * Add openmp and mpi lulesh study to exercise lsf resource specification keys * Document the sample lsf lulesh specification
- Loading branch information
Showing
9 changed files
with
594 additions
and
21 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,83 @@ | ||
LULESH Specification Breakdown | ||
=============================== | ||
|
||
Stub | ||
LULESH Specification Breakdown | ||
=============================== | ||
|
||
Stub | ||
|
||
Vanilla Unix/Linux Specification | ||
++++++++++++++++++++++++++++++++ | ||
|
||
Stub | ||
|
||
Parallel Specifications | ||
+++++++++++++++++++++++ | ||
|
||
The next three variants of the Lulesh example make changes to enable running the steps | ||
on HPC clusters using a variety of schedulers, as well as invoking the various parallel | ||
options in Lulesh itself (MPI, OpenMP, ...). | ||
|
||
Slurm Scheduled | ||
--------------- | ||
|
||
Stub | ||
|
||
Flux Scheduled | ||
-------------- | ||
|
||
Stub | ||
|
||
LSF Scheduled Parallel Specification | ||
------------------------------------ | ||
|
||
This example is configured for running on LLNL's IBM/nVidia HPC machines which use the | ||
LSF job scheduler to manage compute resources. Due to variations in system setups this | ||
may not work on all LSF installations and may require some tweaking to account for the | ||
differences. As with the other parallel specifications, adjust the machine name, bank, | ||
and queue's to suit the system you run this spec on. | ||
|
||
|
||
First change, the batch block to use the LSF scheduler to get appropriate batch submission | ||
and parallel command injection working: | ||
|
||
.. literalinclude:: ../../samples/lulesh/lulesh_sample1_unix_lsf.yaml | ||
:language: yaml | ||
:lines: 18-22 | ||
:emphasize-lines: 2 | ||
|
||
We'll briefly skip ahead to the parameters block as there's multiple different parameters | ||
used to specify all three modes of running. ``TASKS`` specifies the mpi tasks, while | ||
``CPUS_PER_TASK`` captures the openmp threads inside each task. These can be nested | ||
but here we are only showing the pure-mpi and pure-openmp modes of Lulesh: | ||
|
||
.. literalinclude:: ../../samples/lulesh/lulesh_sample1_unix_lsf.yaml | ||
:language: yaml | ||
:lines: 80-99 | ||
:emphasize-lines: 10-12, 14-16 | ||
|
||
|
||
There are a few differences in the first step to deal with the llnl system where we are | ||
using the lmod setup to select the mpi and compiler installs to use. In | ||
addition, we see the resource keys which have a few differences owing to the | ||
way LSF describes resources using resource sets (rs). In this step it's not very important | ||
yet as the compilation doesn't need to run in parallel. Note the depends is empty as | ||
this is the first step that needs to run and has no dependencies. | ||
|
||
.. literalinclude:: ../../samples/lulesh/lulesh_sample1_unix_lsf.yaml | ||
:language: yaml | ||
:lines: 25-48 | ||
:emphasize-lines: 5-11, 18-24 | ||
|
||
|
||
The run step gets a little more interesting now that we are mixing serial, mpi parallel, and | ||
openmp parallel modes in a single step. The biggest difference here is that some of the | ||
resource parameters get used in the step to set the environment variables needed to control | ||
the OpenMP thread counts using the ``$(CPUS_PER_TASK)`` Maestro parameter. | ||
|
||
.. literalinclude:: ../../samples/lulesh/lulesh_sample1_unix_lsf.yaml | ||
:language: yaml | ||
:lines: 51-77 | ||
:emphasize-lines: 16-18, 21-27 | ||
|
||
And finally, we have the complete specification here | ||
|
||
.. literalinclude:: ../../samples/lulesh/lulesh_sample1_unix_lsf.yaml | ||
:language: yaml |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,231 @@ | ||
Scheduling Studies (a.k.a. the Batch Block) | ||
=========================================== | ||
|
||
The batch block is an optional component of the workflow specification that enables | ||
job submission and management on remote clusters. The block contains a handful of | ||
keys for specifying system level information that's applicable to all scheduled | ||
steps: | ||
|
||
.. list-table:: Batch block keys | ||
|
||
* - Key | ||
- Description | ||
* - type | ||
- Select scheduler adapter to use. Currently supported are ``slurm``, ``lsf``, and | ||
``flux``, ``local``. Default is ``local``, non-scheduled job steps. | ||
* - host | ||
- Name of cluster being run on | ||
* - queue | ||
- Machine partition to schedule jobs to. '--partition' for slurm systems, '-q' for | ||
lsf systems, ... | ||
* - bank | ||
- Account which runs the job; this is used for computing job priority on the cluster. | ||
'--account' on slurm, '-G' on lsf, ... | ||
* - reservation | ||
- Name of any prereserved machine partition to submit jobs to | ||
* - qos | ||
- Optional quality of service options (slurm only) | ||
* - flux_uri | ||
- Flux specific reservation functionality | ||
|
||
The information in this block is used to populate the step specific batch scripts with the appropriate | ||
header comment blocks (e.g. '#SBATCH --partition' for slurm). Additional keys such as step specific | ||
resource requirements (number of nodes, cpus/tasks, gpus, ...) get added here when processing | ||
individual steps; see subsequent sections for scheduler specific details. Note that job steps will | ||
run locally unless at least the ``nodes`` or ``procs`` key in the step is populated. | ||
|
||
LOCAL | ||
***** | ||
|
||
Stub | ||
|
||
SLURM | ||
***** | ||
|
||
Stub | ||
|
||
FLUX | ||
**** | ||
|
||
Stub | ||
|
||
LSF: a Tale of Two Launchers | ||
**************************** | ||
|
||
The LSF scheduler has multiple options for the parallel launcher commands: | ||
|
||
* `lsrun <https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=jobs-run-interactive-tasks>`_ | ||
* `jsrun <https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=SSWRJV_10.1.0/jsm/jsrun.html>`_ | ||
|
||
Maestro currently supports only the jsrun version, which differs from slurm | ||
via a more flexible specification of resources available for each task. In | ||
addition to the `procs`, `cores per task`, and `gpu` keys, there are also | ||
`tasks_per_rs` and `rs_per_node`. `jsrun` describes things in terms of resource | ||
sets, with several keywords controlling these resource sets and mapping them to | ||
the actual machine/node allocations: | ||
|
||
.. list-table:: Mapping of LSF args to Maestro step keys | ||
|
||
* - LSF (jsrun) | ||
- Maestro | ||
- Description | ||
- Default setting | ||
* - -n, --nrs | ||
- procs | ||
- number of resource sets | ||
- 1 | ||
* - -a, --tasks_per_rs | ||
- tasks per rs | ||
- Number of MPI tasks (ranks) in a resource | ||
- 1 | ||
* - -c, --cpu_per_rs | ||
- cores per task | ||
- Number of physical CPU cores in a resource set | ||
- 1 | ||
* - -g, --gpu_per_rs | ||
- gpus | ||
- Number of GPU's per resource set | ||
- 0 | ||
* - -b, --bind | ||
- bind | ||
- Controls binding of tasks in a resource set | ||
- 'rs' | ||
* - -B, --bind_gpus | ||
- bind gpus | ||
- Controls binding of tasks to GPU's in a resource set | ||
- 'none' | ||
* - -r, --rs_per_host | ||
- rs per node | ||
- Number of resource sets per node | ||
- 1 | ||
|
||
|
||
.. note:: | ||
|
||
``bind_gpus`` is new in lsf 10.1 and may not be available on all systems | ||
|
||
Now for a few examples of how to map these to Maestro's resource specifications. | ||
Note the `node` key is not directly used for any of these, but is still used for | ||
the reservation itself. The rest of the keys serve to control the per task resources | ||
and then the per node packing of resource sets. Consider a few examples: | ||
|
||
* 1 resource set per gpu on a cluster with 4 gpus per node with an application requesting | ||
8 gpus. This will consume 2 full nodes of the cluster with 1 MPI rank associated with | ||
each gpu and having 1 cpu each. | ||
|
||
.. code-block:: bash | ||
jsrun -nrs 8 -a 1 -c 1 -g 1 -r 4 --bind rs my_awesome_gpu_application | ||
And the corresponding maestro step that generates it | ||
|
||
.. code-block:: yaml | ||
study: | ||
- name: run-my-app | ||
description: launch the best gpu application. | ||
run: | ||
cmd: | | ||
$(LAUNCHER) my_awesome_gpu_application | ||
procs: 8 | ||
nodes: 2 | ||
gpus: 1 | ||
rs per node: 4 | ||
tasks per rs: 1 | ||
cores per task: 1 | ||
Note that `procs` here maps more to the tasks/resource set concept in lsf/jsrun, and | ||
nodes is a multiplier on `rs_per_node` which yields the `nrs` jsrun key | ||
|
||
* 1 resource set per cpu, with no gpus, and using all 44 cpus on the node | ||
|
||
.. code-block:: bash | ||
jsrun -nrs 44 -a 1 -c 1 -g 0 -r 44 --bind rs my_awesome_mpi_cpu_application | ||
.. code-block:: yaml | ||
study: | ||
- name: run-my-app | ||
description: launch a pure mpi-cpu application. | ||
run: | ||
cmd: | | ||
$(LAUNCHER) my_awesome_mpi_cpu_application | ||
procs: 44 | ||
nodes: 1 | ||
gpus: 0 | ||
rs per node: 44 | ||
tasks per rs: 1 | ||
cores per task: 1 | ||
Again, note that `procs` is a multiple of `rs_per_node`. | ||
* Several multithreaded mpi ranks per node, with no gpus | ||
|
||
.. code-block:: bash | ||
jsrun -nrs 4 -a 1 -c 11 -g 0 -r 4 --bind rs my_awesome_omp_mpi_cpu_application | ||
.. code-block:: yaml | ||
study: | ||
- name: run-my-app | ||
description: launch an application using mpi and omp | ||
run: | ||
cmd: | | ||
$(LAUNCHER) my_awesome_omp_mpi_cpu_application | ||
procs: 4 | ||
nodes: 1 | ||
gpus: 0 | ||
rs per node: 4 | ||
tasks per rs: 1 | ||
cores per task: 11 | ||
* Several multithreaded mpi ranks per node with one gpu per rank, spanning multiple | ||
nodes having 4 gpu's each | ||
|
||
.. code-block:: bash | ||
jsrun -nrs 8 -a 1 -c 11 -g 1 -r 4 --bind rs my_awesome_all_the_threads_application | ||
.. code-block:: yaml | ||
study: | ||
- name: run-my-app | ||
description: Use all the threads! | ||
run: | ||
cmd: | | ||
$(LAUNCHER) my_awesome_all_the_threads_application | ||
procs: 8 | ||
nodes: 2 | ||
gpus: 1 | ||
rs per node: 4 | ||
tasks per rs: 1 | ||
cores per task: 11 | ||
* An mpi application that needs lots of memory per rank | ||
|
||
.. code-block:: bash | ||
jsrun -nrs 2 -a 1 -c 1 -g 0 -r 1 --bind rs my_memory_hungry_application | ||
.. code-block:: yaml | ||
study: | ||
- name: run-my-app | ||
description: Use all the memory for single task per node | ||
run: | ||
cmd: | | ||
$(LAUNCHER) my_memory_hungry_application | ||
procs: 2 | ||
nodes: 2 | ||
gpus: 0 | ||
rs per node: 1 | ||
tasks per rs: 1 | ||
cores per task: 1 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.