Not the slurm job dispatcher you need, but the one you deserve.
slurm
is a workload manager, and it is typically used to parallelize work
at a job level. It is heavily configurable, but can sometimes be quite
overwhelming when the work to be performed is simple.
A typical usecase in our lab is to use slurm
to process metagenomes in the
EMBL cluster: we want to run a single command (like hmmsearch
or gecco
)
on a very large number of files, and also possibly with different threshold
values. Doing so efficiently requires writing a custom script that ends up
being copied and pasted around. As a programmer, I found this unacceptable.
smatrix
leverages the most common tasks of splitting the workload evenly,
generating a job script with the parameters, and launching the jobs to the
cluster. Think xargs
, except it spawns slurm
jobs instead of processes.
smatrix
uses the same names as sbatch
or srun
for parameters if needed,
and some additional flags to pass parameters. A quick example:
$ smatrix --cpus-per-task 2 -P:f1 0.02 0.01 -P:file /data/seq1.fa /data/seq2.fa \
--wrap 'hmmsearch --F1=$f1 Pfam.hmm $file'
This command will launch 4 jobs, using 2 CPUs per job (using the same option
as with sbatch
), for all possible combinations of $f1
and $file
as given
in the CLI arguments. --cpus-per-task
is a builtin sbatch
option, so it
will be transparently given to SLURM when we queue the job.
The other arguments however are being used by smatrix
to setup the job array.
The --wrap
CLI flag is used to pass the command to wrap in a script. It will
get executed once for every element of the job matrix created with the parameters
given to the CLI.
The -P
flag is the only new flag introduced by smatrix
. Use it to specify
parameter arrays
The format for the --param
flag is designed to accommodate globing and
sub-command calls in the shell:
$ smatrix --param:n $(seq 1 100) --param:file /etc/*.conf --wrap '...'
Note that, in this example, the glob pattern expansion is done by the shell and may have escaping issues if the filenames contain whitespace characters.
--wrap
was already there in sbatch
, but smatrix
wraps the command
differently, since it will also expose the parameters you request with -P
.