Skip to content

Not the slurm job dispatcher you need, but the one you deserve.

License

Notifications You must be signed in to change notification settings

althonos/smatrix

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

25 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

smatrix

Not the slurm job dispatcher you need, but the one you deserve.

License Source GitHub issues

Introduction

slurm is a workload manager, and it is typically used to parallelize work at a job level. It is heavily configurable, but can sometimes be quite overwhelming when the work to be performed is simple.

A typical usecase in our lab is to use slurm to process metagenomes in the EMBL cluster: we want to run a single command (like hmmsearch or gecco) on a very large number of files, and also possibly with different threshold values. Doing so efficiently requires writing a custom script that ends up being copied and pasted around. As a programmer, I found this unacceptable.

smatrix leverages the most common tasks of splitting the workload evenly, generating a job script with the parameters, and launching the jobs to the cluster. Think xargs, except it spawns slurm jobs instead of processes.

Usage

smatrix uses the same names as sbatch or srun for parameters if needed, and some additional flags to pass parameters. A quick example:

$ smatrix --cpus-per-task 2 -P:f1 0.02 0.01 -P:file /data/seq1.fa /data/seq2.fa   \
    --wrap 'hmmsearch --F1=$f1 Pfam.hmm $file'

This command will launch 4 jobs, using 2 CPUs per job (using the same option as with sbatch), for all possible combinations of $f1 and $file as given in the CLI arguments. --cpus-per-task is a builtin sbatch option, so it will be transparently given to SLURM when we queue the job. The other arguments however are being used by smatrix to setup the job array.

smatrix-hijacked options

--wrap flag

The --wrap CLI flag is used to pass the command to wrap in a script. It will get executed once for every element of the job matrix created with the parameters given to the CLI.

smatrix-specific options

-P / --param flag

The -P flag is the only new flag introduced by smatrix. Use it to specify parameter arrays

The format for the --param flag is designed to accommodate globing and sub-command calls in the shell:

$ smatrix --param:n $(seq 1 100) --param:file /etc/*.conf --wrap '...'

Note that, in this example, the glob pattern expansion is done by the shell and may have escaping issues if the filenames contain whitespace characters.

--wrap flag

--wrap was already there in sbatch, but smatrix wraps the command differently, since it will also expose the parameters you request with -P.

About

Not the slurm job dispatcher you need, but the one you deserve.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages