# First things first
If you are running this notebook yourself or copying the commands into a new notebook, make sure you have these variables set.

In [1]:
mypirg = 'lcni' # set this to your own pirg please

Point "bidsdir" to a directory of example data. This one is some shared data from OpenNeuro.

In [2]:
bidsdir = '/projects/lcni/jolinda/shared/TalapasClass/ds000114/'

Make sure the LCNI shared site-packages directory is on your Python path (we'll need it for slurmpy later).

In [3]:
import sys
if '/projects/lcni/jolinda/shared/site-packages' not in sys.path:
    sys.path.append('/projects/lcni/jolinda/shared/site-packages')

# Example data  
Let's take a look at the example data we will use from here on out. This is dataset ds000114 from OpenNeuro: https://openneuro.org/datasets/ds000114/versions/1.0.1

In [11]:
bidsdir = '/projects/lcni/jolinda/shared/TalapasClass/ds000114/'

In [12]:
!tree {bidsdir} -L 1

/projects/lcni/jolinda/shared/TalapasClass/ds000114/
|-- CHANGES
|-- dataset_description.json
|-- dwi.bval
|-- dwi.bvec
|-- participants.tsv
|-- sub-01
|-- sub-02
|-- sub-03
|-- sub-04
|-- sub-05
|-- sub-06
|-- sub-07
|-- sub-08
|-- sub-09
|-- sub-10
|-- task-covertverbgeneration_bold.json
|-- task-covertverbgeneration_events.tsv
|-- task-fingerfootlips_bold.json
|-- task-fingerfootlips_events.tsv
|-- task-linebisection_bold.json
|-- task-overtverbgeneration_bold.json
|-- task-overtverbgeneration_events.tsv
|-- task-overtwordrepetition_bold.json
`-- task-overtwordrepetition_events.tsv

10 directories, 14 files


In [13]:
!tree {bidsdir}/sub-01

/projects/lcni/jolinda/shared/TalapasClass/ds000114//sub-01
|-- ses-retest
|   |-- anat
|   |   `-- sub-01_ses-retest_T1w.nii.gz
|   |-- dwi
|   |   `-- sub-01_ses-retest_dwi.nii.gz
|   `-- func
|       |-- sub-01_ses-retest_task-covertverbgeneration_bold.nii.gz
|       |-- sub-01_ses-retest_task-fingerfootlips_bold.nii.gz
|       |-- sub-01_ses-retest_task-linebisection_bold.nii.gz
|       |-- sub-01_ses-retest_task-linebisection_events.tsv
|       |-- sub-01_ses-retest_task-overtverbgeneration_bold.nii.gz
|       `-- sub-01_ses-retest_task-overtwordrepetition_bold.nii.gz
`-- ses-test
    |-- anat
    |   `-- sub-01_ses-test_T1w.nii.gz
    |-- dwi
    |   `-- sub-01_ses-test_dwi.nii.gz
    `-- func
        |-- sub-01_ses-test_task-covertverbgeneration_bold.nii.gz
        |-- sub-01_ses-test_task-fingerfootlips_bold.nii.gz
        |-- sub-01_ses-test_task-linebisection_bold.nii.gz
        |-- sub-01_ses-test_task-linebisection_events.tsv
        |-- sub-01_ses-test_task-overtverbgenera

# About Slurm
Slurm Workload Manager is an open-source job scheduler, formerly known as the Simple Linux Utility for Resource Management. When you submit a job to the Talapas cluster, Slurm is the software that decides where and when to run it and keeps track of whether it completes successfully. 

In [4]:
!sacct

       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
11691589     sys/dashb+      short       lcni          1    TIMEOUT      0:0 
11691589.ba+      batch                  lcni          1  CANCELLED     0:15 
11691589.ex+     extern                  lcni          1  COMPLETED      0:0 
11745141     sys/dashb+      short       lcni          1    RUNNING      0:0 
11745141.ba+      batch                  lcni          1    RUNNING      0:0 
11745141.ex+     extern                  lcni          1    RUNNING      0:0 
11745178     sys/dashb+      short       lcni          1    RUNNING      0:0 
11745178.ba+      batch                  lcni          1    RUNNING      0:0 
11745178.ex+     extern                  lcni          1    RUNNING      0:0 
11745191        convert      short       lcni          1  COMPLETED      0:0 
11745191.ba+      batch                  lcni          1  COMPLE

'sacct' shows recent job information for the current user. Each job has a JobID and a name. A job may also have separate steps, shown as JobID.stepname (here .batch and .extern), each with its own name. A '+' at the end of a value means it was cut off to fit the default column width. We can use the -j option to show a specific job ID (it can be another user's job).
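If you want to use sacct results from Python instead of reading the table by eye, it helps to request machine-readable output. sacct's `--parsable2` option prints '|'-separated fields (with a header line) and never truncates values with '+'. A minimal sketch, assuming that output format; `parse_sacct` and `run_sacct` are hypothetical helper names, and the sample rows are copied from the job 11749761 table further down.

```python
import subprocess


def parse_sacct(text):
    """Turn `sacct --parsable2` output (header + '|'-separated rows) into dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split('|')
    return [dict(zip(header, line.split('|'))) for line in lines[1:]]


def run_sacct(jobid, fields=('JobID', 'JobName', 'State', 'ExitCode')):
    """Run sacct for one job and parse the result (needs sacct on PATH)."""
    out = subprocess.run(
        ['sacct', '-j', str(jobid), '--parsable2',
         '--format=' + ','.join(fields)],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sacct(out)


# Offline example: rows for job 11749761, in --parsable2 form.
sample = """JobID|JobName|State|ExitCode
11749761|wrap|COMPLETED|0:0
11749761.batch|batch|COMPLETED|0:0
11749761.extern|extern|COMPLETED|0:0
"""
for row in parse_sacct(sample):
    print(row['JobID'], row['State'])
```

On Talapas you would call `run_sacct(11749761)` directly; the offline sample just shows the shape of the result.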

In [5]:
!sacct -j 11617266 

       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
11617266     snakejob.+      short    kernlab          1  COMPLETED      0:0 
11617266.ba+      batch               kernlab          1  COMPLETED      0:0 
11617266.ex+     extern               kernlab          1  COMPLETED      0:0 


You can use the --format option to choose which columns are shown. There are a ton of columns you can include, and the --helpformat option lists them all.

In [6]:
!sacct --helpformat

Account             AdminComment        AllocCPUS           AllocGRES          
AllocNodes          AllocTRES           AssocID             AveCPU             
AveCPUFreq          AveDiskRead         AveDiskWrite        AvePages           
AveRSS              AveVMSize           BlockID             Cluster            
Comment             Constraints         ConsumedEnergy      ConsumedEnergyRaw  
CPUTime             CPUTimeRAW          DerivedExitCode     Elapsed            
ElapsedRaw          Eligible            End                 ExitCode           
Flags               GID                 Group               JobID              
JobIDRaw            JobName             Layout              MaxDiskRead        
MaxDiskReadNode     MaxDiskReadTask     MaxDiskWrite        MaxDiskWriteNode   
MaxDiskWriteTask    MaxPages            MaxPagesNode        MaxPagesTask       
MaxRSS              MaxRSSNode          MaxRSSTask          MaxVMSize          
MaxVMSizeNode       MaxVMSizeTask       

For completed jobs, "Elapsed" and "MaxRSS" are particularly helpful. They tell us how long the job ran and the maximum amount of memory (resident set size) it used, which helps when planning how much time and memory to request in the future.

In [7]:
!sacct -j 11617266 --format="Elapsed, MaxRSS, ReqMem, user, account, partition"

   Elapsed     MaxRSS     ReqMem      User    Account  Partition 
---------- ---------- ---------- --------- ---------- ---------- 
  00:11:44                 220Gc   jadrion    kernlab      short 
  00:11:44 107742656K      220Gc              kernlab            
  00:11:44          0      220Gc              kernlab            


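This job used about 103G of RAM (MaxRSS is 107742656K, and Slurm's K is 1024 bytes), requested 220G (the "c" in 220Gc means per CPU, and this job had one CPU), and took 11 minutes 44 seconds to run.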
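The unit conversion is easy to get wrong by hand, so here is a small sketch that turns a MaxRSS string into GiB. It assumes sacct's usual binary suffixes (K, M, G, T as powers of 1024) and treats a blank or '0' value (as on the top-level and extern rows above) as zero; `rss_to_gib` is a hypothetical name.

```python
# Slurm memory suffixes are binary multiples, expressed here in KiB.
UNITS_IN_KIB = {'K': 1, 'M': 1024, 'G': 1024 ** 2, 'T': 1024 ** 3}


def rss_to_gib(maxrss):
    """Convert a MaxRSS string such as '107742656K' to GiB."""
    maxrss = maxrss.strip()
    if maxrss in ('', '0'):
        return 0.0
    suffix = maxrss[-1].upper()
    if suffix not in UNITS_IN_KIB:
        raise ValueError(f'unexpected MaxRSS value: {maxrss!r}')
    kib = float(maxrss[:-1]) * UNITS_IN_KIB[suffix]
    return kib / 1024 ** 2


print(round(rss_to_gib('107742656K'), 2))  # 102.75
```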

# PIRGs and jobs
Every job run on Talapas must say which account (PIRG) should be billed for the time. If you have set the SBATCH_ACCOUNT environment variable (read by sbatch) or SLURM_ACCOUNT (read by srun), Slurm will use it. Otherwise you need to specify the account yourself.
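In a notebook, one way to avoid typing the account every time is to set SBATCH_ACCOUNT in the kernel's environment: commands run with `!` or `subprocess` inherit `os.environ`. The sketch below assumes that approach; `build_wrap_command` is a hypothetical helper that only adds `--account` when you pass one, and refuses to build a command when no account is known at all.

```python
import os
import shlex


def build_wrap_command(command, account=None, env=None):
    """Return an sbatch --wrap argument list, making sure an account is known.

    If `account` is None we rely on SBATCH_ACCOUNT being set, since sbatch
    reads that variable itself.
    """
    env = os.environ if env is None else env
    if account is None and not env.get('SBATCH_ACCOUNT'):
        raise ValueError('pass account= or set SBATCH_ACCOUNT')
    args = ['sbatch']
    if account is not None:
        args.append(f'--account={account}')
    args += ['--wrap', command]
    return args


# Set once in the kernel; later `!sbatch ...` cells inherit it.
os.environ['SBATCH_ACCOUNT'] = 'lcni'  # replace with your own PIRG
print(shlex.join(build_wrap_command('echo hello')))
```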

# Running a job: sbatch
You launch jobs from the command line with the sbatch command. Very simple jobs can be launched with the --wrap option, which wraps a command string in a minimal shell script for you. More complex jobs can be launched from a saved script file.
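If you submit from Python, it's handy to capture the job ID so you can pass it to sacct later. A sketch, assuming sbatch's `--parsable` option (which prints just the job ID, optionally followed by `;clustername`) or the default "Submitted batch job N" message; `parse_jobid` and `submit_wrap` are hypothetical helpers.

```python
import re
import subprocess


def parse_jobid(sbatch_stdout):
    """Extract the job ID from sbatch output.

    Handles the default 'Submitted batch job N' message and the
    --parsable form, which prints 'N' or 'N;cluster'.
    """
    text = sbatch_stdout.strip()
    match = re.search(r'Submitted batch job (\d+)', text)
    if match:
        return match.group(1)
    return text.split(';')[0]


def submit_wrap(command, account, jobname=None):
    """Submit a one-line command with --wrap and return its job ID."""
    args = ['sbatch', f'--account={account}', '--parsable']
    if jobname:
        args.append(f'--job-name={jobname}')
    args += ['--wrap', command]
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return parse_jobid(result.stdout)


print(parse_jobid('Submitted batch job 11749761\n'))  # 11749761
print(parse_jobid('11749761\n'))                      # 11749761
```

The returned ID can then go to `sacct -j` or, for a follow-up job, to `--dependency=afterok:<jobid>`.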

In [14]:
!sbatch --account={mypirg} --wrap "echo hello"

Submitted batch job 11749761


In [16]:
!sacct -j 11749761

       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
11749761           wrap      short       lcni          1  COMPLETED      0:0 
11749761.ba+      batch                  lcni          1  COMPLETED      0:0 
11749761.ex+     extern                  lcni          1  COMPLETED      0:0 


In [17]:
cat slurm-11749761.out

hello


If we want a more descriptive name than "wrap", we can set one with --job-name. There are lots of other options you can set; here's a convenient list of them: https://slurm.schedmd.com/pdfs/summary.pdf

In [18]:
!sbatch --account={mypirg} --job-name='echo' --wrap "echo hello" 

Submitted batch job 11749773


In [19]:
!sacct -j 11749773

       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
11749773           echo      short       lcni          1  COMPLETED      0:0 
11749773.ba+      batch                  lcni          1  COMPLETED      0:0 
11749773.ex+     extern                  lcni          1  COMPLETED      0:0 


In many cases it's easier to write a script file and submit that. Here's a simple one.

In [20]:
cat fslinfo.srun

#!/bin/bash
#SBATCH --job-name=fslinfo
#SBATCH --account=lcni

module load fsl/6.0.1
fslinfo /projects/lcni/jolinda/shared/TalapasClass/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz

# Parts of a slurm script  
  
Let's take that script apart line by line. First we define the interpreter. You'll usually want bash, but you can use other scripting languages such as Python; just make sure you have the right path.  
`#!/bin/bash`   
  
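Next up are our Slurm options, one per line. Account is the only required one (and only if you haven't set the environment variables). All #SBATCH lines must come before the first command in the script, because sbatch stops reading them at the first non-comment, non-blank line.  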
`#SBATCH --job-name=fslinfo`   
`#SBATCH --account=lcni` 
   
Finally, we have our bash commands:  
`module load fsl/6.0.1`  
`fslinfo /projects/lcni/jolinda/shared/TalapasClass/ds000114/sub-01/ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz`
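Those three parts (interpreter line, #SBATCH options, commands) are simple enough to assemble from Python. A minimal sketch, assuming options arrive as a dict of long option names; `make_slurm_script` is a hypothetical helper, not slurmpy's API.

```python
def make_slurm_script(options, commands, interpreter='/bin/bash'):
    """Assemble a Slurm batch script: shebang, #SBATCH lines, then commands.

    All #SBATCH lines are written before the first command, since sbatch
    ignores directives that appear after it.
    """
    lines = [f'#!{interpreter}']
    lines += [f'#SBATCH --{name}={value}' for name, value in options.items()]
    lines.append('')
    lines += commands
    return '\n'.join(lines) + '\n'


script = make_slurm_script(
    {'job-name': 'fslinfo', 'account': 'lcni'},
    ['module load fsl/6.0.1',
     'fslinfo /projects/lcni/jolinda/shared/TalapasClass/ds000114/sub-01/'
     'ses-test/func/sub-01_ses-test_task-fingerfootlips_bold.nii.gz'],
)
print(script)
```

Writing `script` to a file with `open(..., 'w')` and running `sbatch` on that file reproduces fslinfo.srun.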

In [21]:
# submit it with sbatch
!sbatch fslinfo.srun

Submitted batch job 11749811


In [22]:
!sacct -j 11749811

       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
11749811        fslinfo      short       lcni          1  COMPLETED      0:0 
11749811.ba+      batch                  lcni          1  COMPLETED      0:0 
11749811.ex+     extern                  lcni          1  COMPLETED      0:0 


In [23]:
cat slurm-11749811.out

data_type	INT16
dim1		64
dim2		64
dim3		30
dim4		184
datatype	4
pixdim1		4.000000
pixdim2		4.000000
pixdim3		3.999975
pixdim4		2.500000
cal_max		0.000000
cal_min		0.000000
file_type	NIFTI-1+


# Slurm parameters to know (with examples)
`--partition=long`  
Partition you are submitting to. The default is short.  
`--output=slurmjobname.out`  
`--error=slurmjobname-%j.err`  
Alternatives to the default slurm-{jobnumber}.out file name for standard output and standard error. %j will be replaced with the job number.  
`--time=5`  
`--time=5:30:0`  
`--time=2-0`  
Time limit for your job. The first example is 5 minutes, the second is 5 hours 30 minutes (h:m:s), and the third is 2 days (d-h). If unspecified, it will be the limit for the partition (24 hours for short, 14 days for long, etc.). Asking for less time can help your job start sooner. You'll want to set this if you are getting close to any planned Talapas downtime; otherwise your job won't launch until after the cluster is back up.  
`--mem=16G`  
Memory requested for your job. The default amount varies, but it's about 4 GB for a standard node, which will not be enough for some jobs. You'll know you need more memory because your job will fail. Don't just ask for all the memory on the node (128 GB): your job will take longer to launch and you will be taking resources away from other users. If you are doing something that requires more than 128 GB, use the fat or longfat partition.  
`--cpus-per-task=1`  
Number of CPUs (cores) per task. If you are running something that can take advantage of multithreading, increase this number.  
`--dependency=afterok:123456`  
Run this job only after job 123456 finishes successfully (exit code 0).  
`--mail-user=email_address --mail-type=END`  
Send an email when the job ends. I like to combine this with my carrier's email-to-text address to get a text message notification.  
`--comment=idx:index`  
Write a comment. Use idx:index to indicate which project should be billed for your time.
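The three `--time` examples above use different forms. If you compute time limits in Python (say, from a previous job's Elapsed time), a small formatter keeps them in one form Slurm accepts, days-hours:minutes:seconds. A sketch; `slurm_time` is a hypothetical name, and fractional seconds are truncated.

```python
def slurm_time(days=0, hours=0, minutes=0, seconds=0):
    """Format a duration as a Slurm --time value (days-hours:minutes:seconds)."""
    total = int(((days * 24 + hours) * 60 + minutes) * 60 + seconds)
    if total <= 0:
        raise ValueError('time limit must be positive')
    d, rem = divmod(total, 86400)
    h, rem = divmod(rem, 3600)
    m, s = divmod(rem, 60)
    return f'{d}-{h:02d}:{m:02d}:{s:02d}'


print(slurm_time(minutes=5))            # 0-00:05:00
print(slurm_time(hours=5, minutes=30))  # 0-05:30:00
print(slurm_time(days=2))               # 2-00:00:00
```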

# Array jobs  
We often need to run the same command on a long list of subjects, and this is when array jobs come in handy. Instead of writing 10 different scripts with different subject names, we write one script with an array of names. Separate out the part of the command that changes and replace it with ${x}, put the values in a bash array (named data below), set x to the element selected by $SLURM_ARRAY_TASK_ID, and include the --array parameter with the range of indices (here 0-9, one per subject).
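Since the interpreter line can point at Python, the same pattern works in a Python script: Slurm sets SLURM_ARRAY_TASK_ID for each task, and the script uses it to index a list, just like `x=${data[$SLURM_ARRAY_TASK_ID]}`. A sketch under a few assumptions: `pick_subject` is a hypothetical name, task 0 is used when the variable is missing (e.g. testing outside Slurm), and fslinfo is already on PATH because you ran `module load fsl/6.0.1` before submitting (sbatch passes your environment along by default).

```python
#!/usr/bin/env python3
#SBATCH --job-name=fslinfo_array_py
#SBATCH --account=lcni
#SBATCH --array=0-9
import os
import subprocess

SUBJECTS = ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10']
TEMPLATE = ('/projects/lcni/jolinda/shared/TalapasClass/ds000114/'
            'sub-{x}/ses-test/anat/sub-{x}_ses-test_T1w.nii.gz')


def pick_subject(env=None):
    """Return this array task's subject (task 0 if not running under Slurm)."""
    env = os.environ if env is None else env
    task_id = int(env.get('SLURM_ARRAY_TASK_ID', 0))
    return SUBJECTS[task_id]


def main():
    x = pick_subject()
    print(x)
    subprocess.run(['fslinfo', TEMPLATE.format(x=x)], check=True)


# Only do the real work inside an array task.
if __name__ == '__main__' and 'SLURM_ARRAY_TASK_ID' in os.environ:
    main()
```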

In [24]:
cat fslinfo_array.srun

#!/bin/bash
#SBATCH --job-name=fslinfo_array
#SBATCH --account=lcni
#SBATCH --array=0-9

data=(01 02 03 04 05 06 07 08 09 10)

x=${data[$SLURM_ARRAY_TASK_ID]}


module load fsl/6.0.1
echo ${x}
fslinfo /home/jolinda/lcni/shared/TalapasClass/ds000114/sub-${x}/ses-test/anat/sub-${x}_ses-test_T1w.nii.gz

In [25]:
!sbatch fslinfo_array.srun

Submitted batch job 11749822


In [26]:
!sacct -j 11749822

       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
11749822_0   fslinfo_a+      short       lcni          1  COMPLETED      0:0 
11749822_0.+      batch                  lcni          1  COMPLETED      0:0 
11749822_0.+     extern                  lcni          1  COMPLETED      0:0 
11749822_1   fslinfo_a+      short       lcni          1  COMPLETED      0:0 
11749822_1.+      batch                  lcni          1  COMPLETED      0:0 
11749822_1.+     extern                  lcni          1  COMPLETED      0:0 
11749822_2   fslinfo_a+      short       lcni          1  COMPLETED      0:0 
11749822_2.+      batch                  lcni          1  COMPLETED      0:0 
11749822_2.+     extern                  lcni          1  COMPLETED      0:0 
11749822_3   fslinfo_a+      short       lcni          1  COMPLETED      0:0 
11749822_3.+      batch                  lcni          1  COMPLE

There's a slurm output file for each value of x.

In [27]:
ls slurm-11749822*

slurm-11749822_0.out  slurm-11749822_4.out  slurm-11749822_8.out
slurm-11749822_1.out  slurm-11749822_5.out  slurm-11749822_9.out
slurm-11749822_2.out  slurm-11749822_6.out
slurm-11749822_3.out  slurm-11749822_7.out
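With tasks 0 to 9, alphabetical and numeric order agree, but once an array has 10 or more tasks a plain string sort puts `_10` before `_2`. A sketch that orders output files by task ID instead, assuming the default `slurm-<jobid>_<taskid>.out` naming above; the function names and the `_10` file are hypothetical.

```python
import glob
import re


def array_task_id(path):
    """Return the array task ID from a name like slurm-11749822_7.out (-1 if none)."""
    match = re.search(r'_(\d+)\.out$', path)
    return int(match.group(1)) if match else -1


def sorted_outputs(jobid):
    """List one array job's output files, ordered by task ID."""
    return sorted(glob.glob(f'slurm-{jobid}_*.out'), key=array_task_id)


names = ['slurm-11749822_10.out', 'slurm-11749822_2.out', 'slurm-11749822_0.out']
print(sorted(names))                     # string order: _0, _10, _2
print(sorted(names, key=array_task_id))  # task order: _0, _2, _10
```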


In [28]:
cat slurm-11749822_0.out

01
data_type	FLOAT32
dim1		256
dim2		156
dim3		256
dim4		1
datatype	16
pixdim1		1.000000
pixdim2		1.299376
pixdim3		1.000000
pixdim4		0.009668
cal_max		0.000000
cal_min		0.000000
file_type	NIFTI-1+
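Because each task writes plain text, the results are easy to pull back into Python. A minimal sketch, assuming every output file starts with the subject number (from `echo ${x}`) followed by fslinfo's two-column key/value lines as shown above; `parse_fslinfo` and `collect_fslinfo` are hypothetical helpers, and values are kept as strings.

```python
import glob


def parse_fslinfo(text):
    """Parse fslinfo's 'key<whitespace>value' lines into a dict of strings."""
    info = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            info[parts[0]] = parts[1]
    return info


def collect_fslinfo(jobid):
    """Map subject number -> fslinfo fields for every task of an array job."""
    results = {}
    for path in glob.glob(f'slurm-{jobid}_*.out'):
        with open(path) as f:
            lines = f.read().splitlines()
        if lines:
            results[lines[0].strip()] = parse_fslinfo('\n'.join(lines[1:]))
    return results


# Offline check using part of the slurm-11749822_0.out output above.
sample = """01
data_type FLOAT32
dim1 256
dim2 156
dim3 256
datatype 16
pixdim2 1.299376
"""
print(parse_fslinfo(sample)['dim2'])  # 156
```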


# Slurmpy
Writing similar scripts over and over got pretty old, so I wrote a slurm-script generator module named slurmpy. Using slurmpy you can create, submit, and see the results of slurm jobs. Here's how I could have generated today's examples.

In [29]:
import slurmpy

In [48]:
jobid = slurmpy.WrapSlurmCommand(command = 'echo hello', account = 'lcni')

sbatch --account=lcni --wrap "echo hello"
Submitted batch job 11750065



In [49]:
jobid

'11750065'

In [50]:
cat slurm-{jobid}.out

hello


Slurmpy has a SlurmJob class that can make this even easier.

In [51]:
job = slurmpy.SlurmJob(account = 'lcni', jobname = 'echo', command = 'echo hello')

Note that it's jobname, not job-name: in Python a dash is the minus operator, so it can't appear in the middle of a variable or keyword argument name. 
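A common workaround in Python wrappers (not necessarily what slurmpy does internally) is to spell options with underscores and convert them to dashes when building the sbatch command. A hypothetical sketch:

```python
def to_sbatch_flags(**options):
    """Turn Python keyword arguments into sbatch long options.

    Underscores become dashes, so job_name='echo' -> --job-name=echo.
    """
    return [f"--{name.replace('_', '-')}={value}" for name, value in options.items()]


print(to_sbatch_flags(job_name='echo', account='lcni', cpus_per_task=2))
# ['--job-name=echo', '--account=lcni', '--cpus-per-task=2']
```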

In [52]:
job.WrapSlurmCommand()

sbatch --account=lcni --wrap "echo hello"
Submitted batch job 11750213



'11750213'

In [53]:
print(job.JobInfo())

          JobID                        JobName  Partition      State    Elapsed     MaxRSS 
--------------- ------------------------------ ---------- ---------- ---------- ---------- 
       11750213                           wrap      short  COMPLETED   00:00:01            
 11750213.batch                          batch             COMPLETED   00:00:01          0 
11750213.extern                         extern             COMPLETED   00:00:01          0 



In [54]:
job.ShowOutput()

slurm-11750213.out
hello



Slurmpy can write slurm script files too.

In [55]:
job.WriteSlurmFile('example.srun')

'example.srun'

In [56]:
job.PrintSlurmFile()

#!/bin/bash
#SBATCH --job-name=echo
#SBATCH --account=lcni

echo hello


Multiline commands can either be strings with '\n' between lines, or a list of strings.
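Both forms carry the same information. A script generator can normalize them to a list of lines before writing the file; here is a small hypothetical sketch of that step (not slurmpy's actual code):

```python
def command_lines(command):
    """Return a list of lines from a newline-separated string or a list of strings."""
    if isinstance(command, str):
        return command.split('\n')
    return [str(line) for line in command]


as_string = 'module load fsl/6.0.1\nfslinfo image.nii.gz'
as_list = ['module load fsl/6.0.1', 'fslinfo image.nii.gz']
print(command_lines(as_string) == command_lines(as_list))  # True
```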

In [58]:
command = list()
command.append('module load fsl/6.0.1')
command.append('fslinfo /projects/lcni/jolinda/shared/TalapasClass/ds000114/sub-01/ses-test/anat/sub-01_ses-test_T1w.nii.gz')

In [59]:
job = slurmpy.SlurmJob(account = 'lcni', jobname = 'fslinfo', command = command)

In [60]:
job.WriteSlurmFile('example2.srun')

'example2.srun'

In [61]:
job.PrintSlurmFile()

#!/bin/bash
#SBATCH --job-name=fslinfo
#SBATCH --account=lcni

module load fsl/6.0.1
fslinfo /projects/lcni/jolinda/shared/TalapasClass/ds000114/sub-01/ses-test/anat/sub-01_ses-test_T1w.nii.gz


There are many ways to build up an array job! I usually use glob to get the list of files I want to work with.

In [62]:
import os
import glob
files = sorted(glob.glob(os.path.join(bidsdir, 'sub*', 'ses-test', 'anat', '*T1w.nii.gz')))

In [63]:
files

['/projects/lcni/jolinda/shared/TalapasClass/ds000114/sub-01/ses-test/anat/sub-01_ses-test_T1w.nii.gz',
 '/projects/lcni/jolinda/shared/TalapasClass/ds000114/sub-02/ses-test/anat/sub-02_ses-test_T1w.nii.gz',
 '/projects/lcni/jolinda/shared/TalapasClass/ds000114/sub-03/ses-test/anat/sub-03_ses-test_T1w.nii.gz',
 '/projects/lcni/jolinda/shared/TalapasClass/ds000114/sub-04/ses-test/anat/sub-04_ses-test_T1w.nii.gz',
 '/projects/lcni/jolinda/shared/TalapasClass/ds000114/sub-05/ses-test/anat/sub-05_ses-test_T1w.nii.gz',
 '/projects/lcni/jolinda/shared/TalapasClass/ds000114/sub-06/ses-test/anat/sub-06_ses-test_T1w.nii.gz',
 '/projects/lcni/jolinda/shared/TalapasClass/ds000114/sub-07/ses-test/anat/sub-07_ses-test_T1w.nii.gz',
 '/projects/lcni/jolinda/shared/TalapasClass/ds000114/sub-08/ses-test/anat/sub-08_ses-test_T1w.nii.gz',
 '/projects/lcni/jolinda/shared/TalapasClass/ds000114/sub-09/ses-test/anat/sub-09_ses-test_T1w.nii.gz',
 '/projects/lcni/jolinda/shared/TalapasClass/ds000114/sub-10/ses

In [68]:
command = list()
command.append('module load fsl/6.0.1')
command.append('echo ${x}')
command.append('fslinfo ${x}')

In [69]:
job = slurmpy.SlurmJob(account = 'lcni', jobname = 'fslinfo_array', command = command, array = files)

In [70]:
job.WriteSlurmFile('example3.srun')

'example3.srun'

In [71]:
job.PrintSlurmFile()

#!/bin/bash
#SBATCH --job-name=fslinfo_array
#SBATCH --account=lcni
#SBATCH --array=0-9

data=(/projects/lcni/jolinda/shared/TalapasClass/ds000114/sub-01/ses-test/anat/sub-01_ses-test_T1w.nii.gz /projects/lcni/jolinda/shared/TalapasClass/ds000114/sub-02/ses-test/anat/sub-02_ses-test_T1w.nii.gz /projects/lcni/jolinda/shared/TalapasClass/ds000114/sub-03/ses-test/anat/sub-03_ses-test_T1w.nii.gz /projects/lcni/jolinda/shared/TalapasClass/ds000114/sub-04/ses-test/anat/sub-04_ses-test_T1w.nii.gz /projects/lcni/jolinda/shared/TalapasClass/ds000114/sub-05/ses-test/anat/sub-05_ses-test_T1w.nii.gz /projects/lcni/jolinda/shared/TalapasClass/ds000114/sub-06/ses-test/anat/sub-06_ses-test_T1w.nii.gz /projects/lcni/jolinda/shared/TalapasClass/ds000114/sub-07/ses-test/anat/sub-07_ses-test_T1w.nii.gz /projects/lcni/jolinda/shared/TalapasClass/ds000114/sub-08/ses-test/anat/sub-08_ses-test_T1w.nii.gz /projects/lcni/jolinda/shared/TalapasClass/ds000114/sub-09/ses-test/anat/sub-09_ses-test_T1w.nii.gz /proj

The effect is essentially the same as the original fslinfo_array.srun file (except that echo ${x} now prints the full path rather than the subject number). Which way you do it is a matter of preference and ease. If you are working with the output of dicom2bids, you'll have run numbers in the file names to deal with, so simply substituting in the subject names may not work. Adding output files and keeping it all sorted out can be a little tricky. We'll dig into that more next week.
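When file names carry more than a subject label, one option is to split the BIDS key-value entities out of each name and build output names from all of them, so runs don't collide. A rough sketch, assuming standard BIDS names (key-value pairs joined by '_', ending in a suffix such as T1w); `bids_entities` and the `_info.txt` naming are hypothetical, and the run-3 file name is made up for illustration.

```python
import os


def bids_entities(path):
    """Split a BIDS file name like sub-01_ses-test_run-1_T1w.nii.gz into entities."""
    name = os.path.basename(path)
    stem = name.split('.')[0]  # drop .nii.gz / .json / .tsv
    parts = stem.split('_')
    entities = dict(part.split('-', 1) for part in parts[:-1] if '-' in part)
    entities['suffix'] = parts[-1]
    return entities


path = ('/projects/lcni/jolinda/shared/TalapasClass/ds000114/'
        'sub-01/ses-test/anat/sub-01_ses-test_T1w.nii.gz')
ents = bids_entities(path)
print(ents)  # {'sub': '01', 'ses': 'test', 'suffix': 'T1w'}

# Hypothetical output name that keeps every entity, including any run number.
out_name = '_'.join(f'{k}-{v}' for k, v in ents.items() if k != 'suffix') + '_info.txt'
print(out_name)  # sub-01_ses-test_info.txt
```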

In [72]:
job.SubmitSlurmFile()

Submitted batch job 11751898



'11751898'

In [73]:
job.ShowStatus()

COMPLETED 10
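A summary like this can be approximated straight from sacct. One way, sketched below under the assumption that you query with `--parsable2 --format=JobID,State`, is to keep only the top-level array-task rows (IDs without a `.step` suffix) and count their states; `summarize_states` is a hypothetical helper and not necessarily how ShowStatus works.

```python
from collections import Counter


def summarize_states(parsable_text):
    """Count states in `sacct --parsable2 --format=JobID,State` output,
    skipping step rows such as 11751898_0.batch and 11751898_0.extern."""
    counts = Counter()
    lines = parsable_text.strip().splitlines()
    for line in lines[1:]:  # skip the header line
        jobid, state = line.split('|')[:2]
        if '.' not in jobid:
            counts[state.split()[0]] += 1  # 'CANCELLED by 123' -> 'CANCELLED'
    return counts


sample = """JobID|State
11751898_0|COMPLETED
11751898_0.batch|COMPLETED
11751898_0.extern|COMPLETED
11751898_1|COMPLETED
11751898_1.batch|COMPLETED
11751898_1.extern|COMPLETED
"""
for state, n in summarize_states(sample).items():
    print(state, n)  # COMPLETED 2
```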


In [74]:
print(job.JobInfo())

          JobID                        JobName  Partition      State    Elapsed     MaxRSS 
--------------- ------------------------------ ---------- ---------- ---------- ---------- 
     11751898_0                  fslinfo_array      short  COMPLETED   00:00:02            
11751898_0.bat+                          batch             COMPLETED   00:00:02          0 
11751898_0.ext+                         extern             COMPLETED   00:00:03          0 
     11751898_1                  fslinfo_array      short  COMPLETED   00:00:03            
11751898_1.bat+                          batch             COMPLETED   00:00:03          0 
11751898_1.ext+                         extern             COMPLETED   00:00:03          0 
     11751898_2                  fslinfo_array      short  COMPLETED   00:00:02            
11751898_2.bat+                          batch             COMPLETED   00:00:02          0 
11751898_2.ext+                         extern             COMPLETED   00:00:03 

In [77]:
print(job.JobInfo(['jobid%20']))

               JobID 
-------------------- 
          11751898_0 
    11751898_0.batch 
   11751898_0.extern 
          11751898_1 
    11751898_1.batch 
   11751898_1.extern 
          11751898_2 
    11751898_2.batch 
   11751898_2.extern 
          11751898_3 
    11751898_3.batch 
   11751898_3.extern 
          11751898_4 
    11751898_4.batch 
   11751898_4.extern 
          11751898_5 
    11751898_5.batch 
   11751898_5.extern 
          11751898_6 
    11751898_6.batch 
   11751898_6.extern 
          11751898_7 
    11751898_7.batch 
   11751898_7.extern 
          11751898_8 
    11751898_8.batch 
   11751898_8.extern 
          11751898_9 
    11751898_9.batch 
   11751898_9.extern 
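The `%20` in `'jobid%20'` is sacct's field-width syntax: in a `--format` list, `fieldname%N` sets that column to N characters, which is why the full array-task IDs show up here without a '+'. A small sketch that builds such a format string; `sacct_format` is a hypothetical name.

```python
def sacct_format(fields, widths=None):
    """Build a value for sacct --format, appending %width where requested."""
    widths = widths or {}
    parts = []
    for field in fields:
        width = widths.get(field)
        parts.append(f'{field}%{width}' if width else field)
    return ','.join(parts)


fmt = sacct_format(['JobID', 'JobName', 'State', 'Elapsed', 'MaxRSS'],
                   widths={'JobID': 20, 'JobName': 30})
print(fmt)  # JobID%20,JobName%30,State,Elapsed,MaxRSS
print(f'sacct -j 11751898 --format={fmt}')
```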

