# Batch processing: Slurm
- allocating access to compute nodes
- starting, executing and monitoring work on a set of allocated nodes
- arbitrating contention of computing resources by managing a queue of pending jobs
- No FIFO scheduling: a multi-factor fair-share algorithm prioritises user jobs based on the portion of the computing resources each user has already consumed (= allocated cores * seconds + main memory usage)


In [None]:
# List of own current usage and resulting shares
sshare -U

# submit a job script (contains the resource requests: no. of nodes, cores per node, main memory, computation time)
sbatch <jobscript>

Every job script begins with:
#!/bin/bash
#SBATCH
Batch parameters are: 


In [None]:
#!/bin/bash
#SBATCH
--partition=        # batch class, -p
--job-name=         # -J
--output=           # Stdout file name, -o
--error=error.txt   # stderr file name; if not specified, stderr is redirected to the stdout file, -e
--nodes=            # no. of nodes, -N
--tasks-per-node=   # no. of tasks per node
--cpus-per-task=    # no. of cores per task, -c
--gpus-per-node=
--mem=              # real memory required per node, default unit MB, suffix G for GB
--time=             # walltime, "days-hours:minutes:seconds", max. 48h  -t
--no-requeue        # never requeue the job
--constraint=       # request a special node feature (feature name as value; see sinfo for available features), -C
--qos=              # quality of service (QOS name as value; see "sacctmgr show qos" for available ones), -q
--mail-user=        # set email address for notification (currently not working)
--mail-type=        # BEGIN, END, FAIL, ALL

# never request full main memory, leave at least 1-2 GB for the operating system 

# relative or absolute path for output and error
    # absolute path: gxfs_home/geomar/smomw681

# if a walltime longer than 48 h is required, add the qos parameter:
    #SBATCH --qos=long
    #SBATCH --time=5-00:00:00  # example: 5 days

# interactive access to the GPU nodes
srun --pty --partition=gpu --gpus-per-node=1 --mem=10000 --time=01:00:00 /bin/bash
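Put together, the parameters above could form a job script like the following. This is only a minimal sketch: the job name, output file names and resource values are illustrative assumptions, and `base` is taken from the partition list in these notes. The fallback for `SLURM_CPUS_PER_TASK` is there so the script also runs outside of Slurm for a quick syntax check.

```shell
#!/bin/bash
#SBATCH --job-name=example          # assumed name, pick your own
#SBATCH --partition=base
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G                    # stay below the full node memory
#SBATCH --time=01:00:00
#SBATCH --output=example_%j.out     # %j is replaced by the job ID
#SBATCH --error=example_%j.err

# Slurm sets SLURM_CPUS_PER_TASK inside a job; fall back to 1 elsewhere
threads="${SLURM_CPUS_PER_TASK:-1}"
echo "Running with ${threads} thread(s) on $(hostname)"
```

Saved as, for example, `example.sh`, it would be submitted with `sbatch example.sh`.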

After job submission: 
check whether the job was submitted successfully. 
The job states are: 
PD: pending
R: running
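PD and R are the states seen most often, but `squeue` reports a few more short codes. The helper below is a small sketch that translates the common ones; the function name `describe_state` is made up for illustration, and the list is not exhaustive.

```shell
# Translate a squeue state code into a readable description.
describe_state() {
    case "$1" in
        PD) echo "pending" ;;
        R)  echo "running" ;;
        CG) echo "completing" ;;
        CD) echo "completed" ;;
        F)  echo "failed" ;;
        TO) echo "timeout" ;;
        CA) echo "cancelled" ;;
        *)  echo "unknown state: $1" ;;
    esac
}

describe_state PD
describe_state R
```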

In [None]:
squeue  # every job in the queue
# only my own jobs
squeue --me 
squeue -u smomw681

squeue -j <jobid>
scontrol show job <jobid>
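Submission and the first status check can be chained so that the job ID does not have to be copied by hand. The sketch below assumes a job script called `job.sh` in the current directory (a hypothetical name) and uses `sbatch --parsable`, which prints only the job ID, optionally followed by `;` and the cluster name. `parse_jobid` is a helper introduced here for illustration.

```shell
# Strip the optional ";cluster" suffix that `sbatch --parsable` may append.
parse_jobid() {
    echo "${1%%;*}"
}

if command -v sbatch >/dev/null 2>&1 && [ -f job.sh ]; then
    jobid=$(parse_jobid "$(sbatch --parsable job.sh)")
    squeue -j "$jobid"      # queue entry of the job just submitted
else
    echo "sbatch or job.sh not available; nothing submitted"
fi
```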

Further commands: 

In [None]:
# gather resource information of a running job
sstat -j <jobid>.batch

# general node information
sinfo
# show node list incl. available cpus, memory, features, local disk space
sinfo --Node -o "%N %c %m %f %d"  # %N node, %c CPUs, %m memory (MB), %f features, %d tmp disk (MB)

If internet access is required (by default there is none), set the proxy variables: 

In [None]:
export http_proxy=http://10.0.7.235:3128
export https_proxy=http://10.0.7.235:3128
export ftp_proxy=http://10.0.7.235:3128
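Since the proxy is only needed for some steps, it can be convenient to switch it on and off. The following is a sketch of two small helpers that could go into `~/.bashrc`; the names `proxy_on` and `proxy_off` are assumptions, the address is the one from the exports above.

```shell
# Proxy address copied from the exports above.
PROXY_URL="http://10.0.7.235:3128"

# Enable the proxy for HTTP, HTTPS and FTP in the current shell.
proxy_on() {
    export http_proxy="$PROXY_URL" https_proxy="$PROXY_URL" ftp_proxy="$PROXY_URL"
}

# Remove the proxy settings again.
proxy_off() {
    unset http_proxy https_proxy ftp_proxy
}

proxy_on
echo "http_proxy is now: ${http_proxy}"
proxy_off
echo "http_proxy after proxy_off: ${http_proxy:-<unset>}"
```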

## Partitions: 

Cluster subsystem:
- base
- highmem

GPU subsystem: 
- gpu
- interactive access to the GPU nodes: 
    srun --pty --partition=gpu --gpus-per-node=1 --mem=10000 --time=01:00:00 /bin/bash

Vector subsystem:
- vector-test
- vector

Data subsystem: 
- data

Interactive subsystem: 
- interactive


Job arrays: 
#SBATCH --array 0-100%5

In [None]:
# limit the no. of tasks running at once to 5 (#SBATCH lines must precede the first command of the job script)
#SBATCH --array=0-100%5
echo "Hi, I am task $SLURM_ARRAY_TASK_ID in the job array $SLURM_ARRAY_JOB_ID"
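A common use of the task ID is to select one input per array task. The sketch below assumes three hypothetical sample names; each task picks the entry whose index equals its `SLURM_ARRAY_TASK_ID`. The default of 0 only exists so that the script can be tried outside of Slurm.

```shell
#!/bin/bash
#SBATCH --array=0-2%2               # 3 tasks, at most 2 running at once

# Hypothetical inputs, one per array task
samples=(sampleA sampleB sampleC)

# Slurm sets SLURM_ARRAY_TASK_ID inside an array job; default to 0 elsewhere
task_id="${SLURM_ARRAY_TASK_ID:-0}"
sample="${samples[$task_id]}"

echo "Task ${task_id} processes ${sample}"
```

The array range has to match the number of inputs, otherwise some tasks end up with an empty `sample`.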

Job and resource monitoring

In [None]:
# summary of parameters and resources used by your batch
jobinfo

# process monitoring
ssh nesh-srp100 'top -b -u smomw681'

# write out the batch script of a running Slurm job
scontrol write batch_script <job_ID>    # output: <script>.sh 
less -S <script>.sh


## Setup of the required environment
Conda is already installed in the working directory of the user smomw681. 
Each pipeline requires its own conda environment containing the following Bioconda packages, other packages and modules:

