.. Kick-off doc section about common work-arounds. (#430)

Common work-arounds
===================

The universe of HPC clusters is extremely diverse: each cluster has its own
job scheduler, configuration, and policy decisions (security, usage, etc.).
An unfortunate consequence of this is that it is impossible for Dask-Jobqueue
to cover every tiny edge case of every HPC cluster.

This page is an attempt to document work-arounds that are likely to be useful
on more than one cluster (ideally, at least, although that is hard to be sure
of).

Skipping unrecognised lines in the submission script with ``header_skip``
--------------------------------------------------------------------------

On some clusters the submission script generated by Dask-Jobqueue (you can
look at it with ``print(cluster.job_script())``) may not work because of a
configuration quirk of your HPC cluster. There are probably good reasons
behind this quirk, of course.

You will get an error when calling ``cluster.scale`` (i.e. when you actually
submit some jobs) telling you that your job scheduler is not happy with your
job submission script (see the examples below). The main parameter you can
use to work around this is ``header_skip``:

.. code-block:: python

   # this will remove any line containing either '--mem' or
   # 'another-string' from the job submission script
   cluster = YourCluster(
       header_skip=['--mem', 'another-string'],
       **other_options_go_here)
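
The effect of ``header_skip`` amounts to filtering lines out of the generated
job script before it is submitted. A minimal sketch of that logic (this is an
illustration, not Dask-Jobqueue's actual implementation):

.. code-block:: python

   # Sketch only: drop any job script line that contains one of the
   # header_skip substrings, keeping everything else unchanged.
   def skip_header_lines(job_script, header_skip):
       return "\n".join(
           line for line in job_script.splitlines()
           if not any(pattern in line for pattern in header_skip)
       )

   script = "#!/usr/bin/env bash\n#SBATCH --mem=24GB\n#SBATCH -t 00:30:00"
   print(skip_header_lines(script, ["--mem"]))
   # the '#SBATCH --mem=24GB' line is removed, the other lines are kept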

An example of this problem is detailed very well in this `blog post
<https://blog.dask.org/2019/08/28/dask-on-summit#invalid-operations-in-the-job-script>`_
by Matthew Rocklin. In his case, the error was:

.. code-block:: text

   Command:
   bsub /tmp/tmp4874eufw.sh
   stdout:
   Typical usage:
   bsub [LSF arguments] jobscript
   bsub [LSF arguments] -Is $SHELL
   bsub -h[elp] [options]
   bsub -V
   NOTES:
   * All jobs must specify a walltime (-W) and project id (-P)
   * Standard jobs must specify a node count (-nnodes) or -ln_slots. These jobs cannot specify a resource string (-R).
   * Expert mode jobs (-csm y) must specify a resource string and cannot specify -nnodes or -ln_slots.
   stderr:
   ERROR: Resource strings (-R) are not supported in easy mode. Please resubmit without a resource string.
   ERROR: -n is no longer supported. Please request nodes with -nnodes.
   ERROR: No nodes requested. Please request nodes with -nnodes.

Another example of this issue is this GitHub `issue
<https://github.com/dask/dask-jobqueue/issues/238>`_, where ``--mem`` is not
an accepted option on some SLURM clusters. The error looked like this:

.. code-block:: text

   $ sbatch submit_slurm.sh
   sbatch: error: Memory specification can not be satisfied
   sbatch: error: Batch job submission failed: Requested node configuration is not available
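
Note that skipping a line can mean the scheduler no longer knows about that
resource, so you may need to pass an equivalent directive it does accept. A
sketch for the SLURM case above, assuming your Dask-Jobqueue version accepts
the ``job_extra`` parameter and that your cluster accepts ``--mem-per-cpu``
(both are assumptions, check your version and your cluster's documentation):

.. code-block:: python

   from dask_jobqueue import SLURMCluster

   # Sketch only: the replacement directive is an assumption -- adapt it
   # to what your cluster actually accepts.
   cluster = SLURMCluster(
       cores=8,
       memory="24GB",                    # still used by Dask to size workers
       header_skip=["--mem"],            # drop the rejected '#SBATCH --mem' line
       job_extra=["--mem-per-cpu=3G"],   # site-specific replacement directive
       )
   print(cluster.job_script())           # check the generated script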