# SLURM - Simple Linux Utility for Resource Management 

## Submitting jobs with [`sbatch`](https://slurm.schedmd.com/sbatch.html)

The command `sbatch [options] job_script` submits a job to SLURM's job queue, where `options` stands for additional options passed to SLURM and `job_script` is the Bash script that describes how to execute the job.
By default, the output of a job is written to `slurm-<job_ID>.out` in the directory from which the job was submitted. Job scripts have to start with `#!/bin/bash`, followed by the `#SBATCH` header lines and the actual script. The SLURM options given in the job script can also be passed directly to `sbatch`, but this is not recommended because the script then no longer records the settings the job ran with.
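
As an illustration of passing options on the command line, the following minimal sketch assembles an `sbatch` call as an argument list. The helper name `build_sbatch_command` is hypothetical and not part of SLURM; `--nodes` and `--time` are the same options that appear in `#SBATCH` header lines, and options given on the command line take precedence over those in the script.

```python
import shlex


def build_sbatch_command(job_script, **options):
    """Return the argv list for `sbatch [options] job_script`.

    Keyword names become long options, e.g. tasks_per_node=48 turns into
    --tasks-per-node=48. (Hypothetical helper, not part of SLURM.)
    """
    args = ["sbatch"]
    for name, value in options.items():
        args.append(f"--{name.replace('_', '-')}={value}")
    args.append(job_script)
    return args


cmd = build_sbatch_command("slurm/job-mpi4py.sh", nodes=1, time="00:05:00")
print(shlex.join(cmd))
# On a cluster, the job could then be submitted with subprocess.run(cmd, check=True).
```

The sketch only builds the command, so it can be tried on a machine without SLURM installed.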

In [1]:
%%writefile slurm/example.py
#!/usr/bin/python3

import numpy as np
from mpi4py import MPI


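comm = MPI.COMM_WORLD             # communicator spanning all processes of the job
rank = comm.Get_rank()            # index of this process within the communicator
size = comm.Get_size()            # total number of processes

pname = MPI.Get_processor_name()  # name of the node this process runs on

print(f'Process {rank} of {size} is executed on processor {pname}')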

Overwriting slurm/example.py


In [1]:
%%writefile slurm/job-mpi4py.sh
#!/bin/bash

#SBATCH --job-name=mpi4py           # SLURM_JOB_NAME
#SBATCH --nodes=2                   # SLURM_JOB_NUM_NODES (@course [1,2])
#SBATCH --tasks-per-node=48         # SLURM_NTASKS_PER_NODE (48 cores on vsc4)
#SBATCH --reservation=training      # @course
#SBATCH --qos=mem_0096              # @course
#SBATCH --partition=mem_0096        # @course
#SBATCH --export=NONE               # do not inherit the submission environment
#SBATCH --time=00:01:00             # time limit

module purge # always start with a clean environment
spack load --dependencies /ul23634 ; spack load /7fl6jme # python+numpy+mpi4py
export I_MPI_PIN_PROCESSOR_LIST=0-47 # pinning with intel-mpi

mpirun --np 96 python3 slurm/example.py

Overwriting slurm/job-mpi4py.sh
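
The process count passed to `mpirun` in the job script has to match the number of nodes times the tasks per node. A minimal sketch of that check, assuming a hypothetical helper `parse_sbatch_options` that only understands header lines of the form `#SBATCH --name=value`:

```python
import re

# Matches header lines of the form "#SBATCH --name=value   # comment"
SBATCH_LINE = re.compile(r"^#SBATCH\s+--([\w-]+)=(\S+)")


def parse_sbatch_options(script_text):
    """Collect the long options from the #SBATCH header lines of a job script."""
    options = {}
    for line in script_text.splitlines():
        match = SBATCH_LINE.match(line)
        if match:
            options[match.group(1)] = match.group(2)
    return options


# Shortened copy of the header used in slurm/job-mpi4py.sh
script = """#!/bin/bash
#SBATCH --job-name=mpi4py
#SBATCH --nodes=2
#SBATCH --tasks-per-node=48
#SBATCH --time=00:01:00
"""

opts = parse_sbatch_options(script)
total_tasks = int(opts["nodes"]) * int(opts["tasks-per-node"])
print(total_tasks)  # 96, the value passed to mpirun
```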


In [4]:
!sbatch slurm/job-mpi4py.sh

Submitted batch job 2278386
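
The job ID in this message determines the name of the default output file. The following sketch, with the hypothetical helpers `parse_job_id` and `default_output_file`, extracts the ID from the message and derives the file name `slurm-<job_ID>.out`; it assumes the default output location has not been changed.

```python
import re
from pathlib import Path


def parse_job_id(sbatch_output):
    """Extract the numeric job ID from sbatch's confirmation message."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError(f"no job ID found in: {sbatch_output!r}")
    return int(match.group(1))


def default_output_file(job_id, submit_dir="."):
    """Path of the default output file slurm-<job_ID>.out in the submit directory."""
    return Path(submit_dir) / f"slurm-{job_id}.out"


job_id = parse_job_id("Submitted batch job 2278386")
print(default_output_file(job_id))  # slurm-2278386.out
```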


## Monitoring jobs with [`squeue`](https://slurm.schedmd.com/squeue.html)

`squeue` lists running and queued jobs. Without options it shows the jobs of all users, so it is highly recommended to restrict the output to your own jobs by passing your username with `--user`.

In [8]:
!squeue --user=$USER

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           2278386  mem_0096   mpi4py   katrin PD       0:00      2 (Resources)
           2278381   jupyter vsc4_jup   katrin  R       0:01      1 n412-072
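
For scripted monitoring, the table can be turned into records. The sketch below is a minimal parser for the default column layout shown above; the helper name `parse_squeue` is hypothetical, and it assumes that no column except the last contains spaces. In the `ST` column, `PD` stands for pending and `R` for running.

```python
def parse_squeue(text):
    """Turn squeue's default table into a list of dicts keyed by column header."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    # Limit the number of splits so the last column keeps any embedded spaces.
    return [dict(zip(header, line.split(None, len(header) - 1))) for line in lines[1:]]


sample = """
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           2278386  mem_0096   mpi4py   katrin PD       0:00      2 (Resources)
           2278381   jupyter vsc4_jup   katrin  R       0:01      1 n412-072
"""

jobs = parse_squeue(sample)
pending = [job["JOBID"] for job in jobs if job["ST"] == "PD"]
print(pending)  # ['2278386']
```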


In [24]:
!bash -c -i sq

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           2278381   jupyter vsc4_jup   katrin  R      12:28      1 n412-072


## Cancelling jobs with [`scancel`](https://slurm.schedmd.com/scancel.html)

SLURM jobs can be cancelled with `scancel job_ID`. The job ID is printed by `sbatch` when the job is submitted and can also be looked up with `squeue`.

In [7]:
!scancel 2278386
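
`scancel` expects a concrete job ID rather than a placeholder. When the call is generated from a script, the argument can be validated first, as in the following sketch. The helper `build_scancel_command` is hypothetical, and it deliberately accepts only plain numeric job IDs, which is stricter than `scancel` itself.

```python
def build_scancel_command(*job_ids):
    """Return the argv list for `scancel`, accepting only plain numeric job IDs."""
    if not job_ids:
        raise ValueError("at least one job ID is required")
    args = ["scancel"]
    for job_id in job_ids:
        text = str(job_id)
        if not text.isdigit():
            raise ValueError(f"not a numeric job ID: {text!r}")
        args.append(text)
    return args


print(build_scancel_command(2278386))  # ['scancel', '2278386']

try:
    build_scancel_command("[job_ID]")  # a literal placeholder is rejected
except ValueError as err:
    print(err)
```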
