Simple Slurm

A simple Python wrapper for Slurm with flexibility in mind


import datetime

from simple_slurm import Slurm

slurm = Slurm(
    array=range(3, 12),
    cpus_per_task=15,
    dependency=dict(after=65541, afterok=34987),
    gres=['gpu:kepler:2', 'gpu:tesla:2', 'mps:400'],
    ignore_pbs=True,
    job_name='name',
    output=f'{Slurm.JOB_ARRAY_MASTER_ID}_{Slurm.JOB_ARRAY_ID}.out',
    time=datetime.timedelta(days=1, hours=2, minutes=3, seconds=4),
)
slurm.add_cmd('module load python')
slurm.sbatch('python demo.py', Slurm.SLURM_ARRAY_TASK_ID)

The above snippet is equivalent to running the following command:

sbatch << EOF
#!/bin/sh

#SBATCH --array               3-11
#SBATCH --cpus-per-task       15
#SBATCH --dependency          after:65541,afterok:34987
#SBATCH --gres                gpu:kepler:2,gpu:tesla:2,mps:400
#SBATCH --ignore-pbs
#SBATCH --job-name            name
#SBATCH --output              %A_%a.out
#SBATCH --time                1-02:03:04

module load python
python demo.py \$SLURM_ARRAY_TASK_ID

EOF
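The --array and --time lines above illustrate two of the conversions the wrapper performs: a Python range becomes Slurm's inclusive start-end notation, and a datetime.timedelta becomes the D-HH:MM:SS format. A minimal sketch of those conversions (format_array and format_time are hypothetical helper names, not part of simple_slurm):

```python
import datetime

def format_array(r: range) -> str:
    # Slurm array ranges are inclusive; Python ranges are half-open.
    return f'{r.start}-{r.stop - 1}'

def format_time(td: datetime.timedelta) -> str:
    # Render a timedelta in Slurm's D-HH:MM:SS notation.
    hours, rem = divmod(td.seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    return f'{td.days}-{hours:02d}:{minutes:02d}:{seconds:02d}'

print(format_array(range(3, 12)))  # 3-11
print(format_time(datetime.timedelta(days=1, hours=2, minutes=3, seconds=4)))  # 1-02:03:04
```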

Get it using either one of:

pip install simple_slurm
conda install -c conda-forge simple_slurm


Introduction

The sbatch and srun commands in Slurm allow submitting parallel jobs to a Linux cluster in the form of batch scripts that follow a certain structure.

The goal of this library is to provide a simple wrapper for these functions (sbatch and srun) so that Python code can be used for constructing and launching the aforementioned batch script.

Indeed, the generated batch script can be shown by printing the Slurm object:

from simple_slurm import Slurm

slurm = Slurm(array=range(3, 12), job_name='name')
print(slurm)
>> #!/bin/sh
>> 
>> #SBATCH --array               3-11
>> #SBATCH --job-name            name

Then, the job can be launched with either command:

slurm.srun('echo hello!')
slurm.sbatch('echo hello!')
>> Submitted batch job 34987

While both commands are quite similar, srun will wait for the job to complete, while sbatch will submit the job and disconnect from it.

More information can be found in Slurm's Quick Start User Guide.

Moreover, multi-line commands can be added using add_cmd and cleared with reset_cmd. Any commands registered with add_cmd are run before the command passed to sbatch.

slurm.add_cmd('echo hello for the first time!')
slurm.add_cmd('echo hello for the second time!')
slurm.sbatch('echo hello for the last time!')
slurm.reset_cmd()
slurm.sbatch('echo hello again!')

This results in two job outputs, the first containing

hello for the first time!
hello for the second time!
hello for the last time!

and the second containing

hello again!

Many syntaxes available

slurm = Slurm('-a', '3-11')
slurm = Slurm('--array', '3-11')
slurm = Slurm('array', '3-11')
slurm = Slurm(array='3-11')
slurm = Slurm(array=range(3, 12))
slurm.add_arguments(array=range(3, 12))
slurm.set_array(range(3, 12))

All these forms are equivalent! It's up to you to choose the one(s) that best suit your needs.

"With great flexibility comes great responsibility"

You can either keep a command-line-like syntax or a more Python-like one

slurm = Slurm()
slurm.set_dependency('after:65541,afterok:34987')
slurm.set_dependency(['after:65541', 'afterok:34987'])
slurm.set_dependency(dict(after=65541, afterok=34987))
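All three forms above collapse to the same --dependency string. A hedged sketch of how such a normalization could work (to_dependency is a hypothetical helper for illustration, not the library's actual code):

```python
def to_dependency(value) -> str:
    # Accept a ready-made string, a list of 'type:id' items, or a dict {type: id}.
    if isinstance(value, str):
        return value
    if isinstance(value, dict):
        return ','.join(f'{kind}:{job_id}' for kind, job_id in value.items())
    return ','.join(value)

print(to_dependency(dict(after=65541, afterok=34987)))  # after:65541,afterok:34987
```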

All the possible arguments have their own setter methods (ex. set_array, set_dependency, set_job_name).

Please note that hyphenated arguments, such as --job-name, need to be underscored (to comply with Python syntax and remain consistent).

slurm = Slurm('--job_name', 'name')
slurm = Slurm(job_name='name')

# slurm = Slurm('--job-name', 'name')  # NOT VALID
# slurm = Slurm(job-name='name')       # NOT VALID

Moreover, boolean arguments such as --contiguous, --ignore-pbs or --overcommit can be activated with True or an empty string.

slurm = Slurm('--contiguous', True)
slurm.add_arguments(ignore_pbs='')
slurm.set_wait(False)
print(slurm)
#!/bin/sh

#SBATCH --contiguous
#SBATCH --ignore-pbs

Using configuration files

Let's define the static components of a job definition in a YAML file default.slurm

cpus_per_task: 15
job_name: 'name'
output: '%A_%a.out'

Including these options using the yaml package is very simple:

import yaml

from simple_slurm import Slurm

slurm = Slurm(**yaml.safe_load(open('default.slurm')))

...

slurm.set_array(range(NUMBER_OF_SIMULATIONS))

The job can be updated according to the dynamic project needs (ex. NUMBER_OF_SIMULATIONS).
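This layering can be sketched with plain dictionaries: the static defaults as they would come out of yaml.safe_load, with the dynamic values merged on top. The keys mirror default.slurm above; NUMBER_OF_SIMULATIONS is a stand-in value for illustration:

```python
# Static defaults, as they would come out of yaml.safe_load(open('default.slurm')).
defaults = {'cpus_per_task': 15, 'job_name': 'name', 'output': '%A_%a.out'}

NUMBER_OF_SIMULATIONS = 100  # dynamic, project-specific value

# Merge: dynamic options take precedence over the static file contents.
options = {**defaults, 'array': range(NUMBER_OF_SIMULATIONS)}
print(options['array'])  # range(0, 100)
```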

Using the command line

For simpler dispatch jobs, a command line entry point is also made available.

simple_slurm [OPTIONS] "COMMAND_TO_RUN_WITH_SBATCH"

As such, the following Python and bash calls are equivalent.

slurm = Slurm(partition='compute.p', output='slurm.log', ignore_pbs=True)
slurm.sbatch('echo \$HOSTNAME')
simple_slurm --partition=compute.p --output slurm.log --ignore_pbs "echo \$HOSTNAME"

Job dependencies

The sbatch call prints a message if successful and returns the corresponding job_id

job_id = slurm.sbatch('python demo.py ' + Slurm.SLURM_ARRAY_TASK_ID)

If the job submission was successful, it prints:

Submitted batch job 34987

And returns the variable job_id = 34987, which can be used for setting dependencies on subsequent jobs

slurm_after = Slurm(dependency=dict(afterok=job_id))
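Under the hood, the returned job id comes from parsing sbatch's confirmation message. A hedged sketch of that parsing (parse_job_id is a hypothetical name; it simply applies a regex to the "Submitted batch job N" line):

```python
import re

def parse_job_id(sbatch_output: str) -> int:
    # sbatch confirms a successful submission with 'Submitted batch job <id>'.
    match = re.search(r'Submitted batch job (\d+)', sbatch_output)
    if match is None:
        raise ValueError(f'unexpected sbatch output: {sbatch_output!r}')
    return int(match.group(1))

print(parse_job_id('Submitted batch job 34987'))  # 34987
```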

Additional features

For convenience, Filename Patterns and Output Environment Variables are available as attributes of the Simple Slurm object.

See https://slurm.schedmd.com/sbatch.html for details on the commands.

from simple_slurm import Slurm

slurm = Slurm(output='{}_{}.out'.format(
    Slurm.JOB_ARRAY_MASTER_ID,
    Slurm.JOB_ARRAY_ID))
slurm.sbatch('python demo.py ' + slurm.SLURM_ARRAY_TASK_ID)

This example would result in output files of the form 65541_15.out. Here the master job id is 65541, and this output file corresponds to task 15 in the job array. Moreover, this task index is passed to the Python code demo.py as an argument.

Note that they can be accessed either as Slurm.<name> or slurm.<name>, where slurm is an instance of the Slurm class.

Filename Patterns

sbatch allows for a filename pattern to contain one or more replacement symbols.

They can be accessed with Slurm.<name>

| name | value | description |
| --- | --- | --- |
| JOB_ARRAY_MASTER_ID | %A | job array's master job allocation number |
| JOB_ARRAY_ID | %a | job array id (index) number |
| JOB_ID_STEP_ID | %J | jobid.stepid of the running job (e.g. "128.0") |
| JOB_ID | %j | jobid of the running job |
| HOSTNAME | %N | short hostname; this will create a separate IO file per node |
| NODE_IDENTIFIER | %n | node identifier relative to current job (e.g. "0" is the first node of the running job); this will create a separate IO file per node |
| STEP_ID | %s | stepid of the running job |
| TASK_IDENTIFIER | %t | task identifier (rank) relative to current job; this will create a separate IO file per task |
| USER_NAME | %u | user name |
| JOB_NAME | %x | job name |
| PERCENTAGE | %% | the character "%" |
| DO_NOT_PROCESS | \\ | do not process any of the replacement symbols |

Output Environment Variables

The Slurm controller will set the following variables in the environment of the batch script.

They can be accessed with Slurm.<name>.

| name | description |
| --- | --- |
| SLURM_ARRAY_TASK_COUNT | total number of tasks in a job array |
| SLURM_ARRAY_TASK_ID | job array id (index) number |
| SLURM_ARRAY_TASK_MAX | job array's maximum id (index) number |
| SLURM_ARRAY_TASK_MIN | job array's minimum id (index) number |
| SLURM_ARRAY_TASK_STEP | job array's index step size |
| SLURM_ARRAY_JOB_ID | job array's master job id number |
| ... | ... |

See https://slurm.schedmd.com/sbatch.html for a complete list.
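Inside the batch script these are plain environment variables, so the launched Python code can read them directly with os.environ. A sketch of using them to split work across a job array (the fallback defaults and the task_slice helper are assumptions for illustration, so the code also runs outside Slurm):

```python
import os

# Outside a Slurm job these variables are unset; fall back to a single-task view.
task_id = int(os.environ.get('SLURM_ARRAY_TASK_ID', 0))
task_count = int(os.environ.get('SLURM_ARRAY_TASK_COUNT', 1))

def task_slice(n_items: int, task: int, count: int) -> range:
    # Contiguous split of n_items across the array tasks (ceiling division).
    per_task = -(-n_items // count)
    return range(task * per_task, min((task + 1) * per_task, n_items))

print(task_slice(10, task_id, task_count))
```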

squeue

You can use the built-in squeue wrapper to retrieve information about running jobs, or even filter jobs according to their name:

import yaml

from simple_slurm import Slurm

slurm = Slurm(**yaml.safe_load(open('slurm_default.yml', 'r')))
slurm.squeue.update_squeue()
slurm.squeue.display_jobs()

scancel

Invokes the scancel command. Two methods are provided: scancel.cancel_job(), which sends a plain scancel, and scancel.signal_job(), which attempts to send a SIGTERM first.

The example below cancels the first running job found for the user:

import yaml

from simple_slurm import Slurm

slurm = Slurm(**yaml.safe_load(open('slurm_default.yml', 'r')))
slurm.squeue.update_squeue()
for job_id in slurm.squeue.jobs:
    slurm.scancel.cancel_job(job_id)
    break
