## Slurm commands

Interactive: 

  - salloc : allocate resources  e.g. 
  
  ```salloc -p main -t 10:00 --mem=20M -N 1 -n 1 -c 1```
  
  - srun :   run a job  e.g.  `srun --pty bash -i`
  
  - both at the same time: 
  
  ```srun -p main -t 10:00 --mem=20M -N 1 -n 1 -c 1 --pty bash -i```
  
Batch:  `sbatch my_slurm_batch_script.sh`
  
Checking your job status: `squeue -u kp807` (replace `kp807` with your own NetID)

## Slurm batch script example
```
#!/bin/bash

##### DO NOT DECLARE ANY VARIABLES BEFORE THE SBATCH INSTRUCTIONS !!!!! ######
### Modified by Vlad Kholodovych 01/24/2017 ###

#SBATCH --job-name=kris_hello ## Replace with your jobname 
#SBATCH --partition=main # Partition (job queue) 
#SBATCH -N 1 # Number of nodes 
#SBATCH -n 1 # Number of tasks - if not MPI make it 1
#SBATCH -c 2 # Number of cores per task
#SBATCH --mem=100 # Real memory per node required (MB) 
#SBATCH --time=00:10:00 # Total run time limit (HH:MM:SS) 
#SBATCH --output=slurm.%j.%N.out # STDOUT output file 
#SBATCH --error=slurm.%j.%N.err # STDERR error file 

### Declare job non-rerunnable
#SBATCH --no-requeue

module use /projects/community/modulefiles/
module load bowtie/1.2.2-gc563

bowtie --help    # runs directly, without srun: this is not a parallel execution
# alternatively: 
# srun bowtie --help        # with srun, Slurm records the step in its accounting database
# time srun bowtie --help   # time your execution
```



## Right-sizing your jobs

  - they may run sooner if they are small
     + think of the job scheduler as trying to fit cubes of different sizes - if a hole opens up, a smaller cube might fit in
  - you use up your fair share needlessly if you ask for too much
     + your priority is pushed down if you have recently asked for a lot of resources, even if you did not use them
  - time a portion of the job so that you can estimate the completion time
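
The last point can be sketched as follows: time a small sample of the work, then scale up to the full input. The helper name `estimate_total_seconds` and the example numbers are hypothetical, and the linear scaling is an assumption that will not hold for every program.

```shell
#!/bin/bash
# Hypothetical helper: extrapolate total run time from a timed sample.
# Assumes run time grows roughly linearly with the number of items.
estimate_total_seconds() {
    local sample_seconds=$1   # wall time measured for the sample
    local sample_items=$2     # number of items in the sample
    local total_items=$3      # number of items in the full job
    # Integer arithmetic, rounded up so the estimate errs on the long side
    echo $(( (sample_seconds * total_items + sample_items - 1) / sample_items ))
}

# Example: 1000 items took 12 s; the full input has 250000 items
estimate_total_seconds 12 1000 250000   # prints 3000
```

Here 3000 seconds is 50 minutes, so a `--time=01:00:00` request would leave a modest safety margin.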

## Using N, n, and c options in slurm
  - if in doubt, use `-N 1 -n 1`  (N=nodes, n=ntasks)
  - `-N` > 1 and `-n` > 1 can be used only with software that knows how to run across several tasks or nodes, such as MPI-enabled software. Not a lot of software falls into that category!
    + software that does: 
       * MPI, 
       * ipython parallel (uses mpi4py), 
       * GATK 4 (uses Spark as parallelizing engine)
  - a lot of software can parallelize across the cores on a single machine, so feel free to use `-c` > 1
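
Requesting cores with `-c` does not by itself make a program use them; the program usually has to be told how many threads to start. A minimal sketch, assuming your software has a thread-count option: Slurm sets `SLURM_CPUS_PER_TASK` when `-c` is given, and the script falls back to 1 otherwise. The helper `threads_for_job`, the tool `my_tool` and its `--threads` flag are hypothetical placeholders.

```shell
#!/bin/bash
# Hypothetical helper: number of threads this job may use.
# SLURM_CPUS_PER_TASK is set by Slurm when -c / --cpus-per-task is given;
# default to 1 so the script also works outside a Slurm job.
threads_for_job() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

threads=$(threads_for_job)
# Print the command instead of running it; my_tool is a placeholder
echo "my_tool --threads ${threads} input.txt"
```

Reading the value from the environment keeps the thread count in sync with the `-c` request, so you only change it in one place.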


## Debugging your jobs
  - include the `-e` (error file) and `-o` (output file) options, and give the files names that you will recognize
  - `scontrol show job 123456 -dd` tells you:
    + the command, batch script and working directory
    + the resources requested and allocated (for actual usage, see `sacct`)
    + this information is only available for a short time after the job ends
  - interactive development, i.e. running bash on a compute node:
    + IMPORTANT: use only one core at a time; when you are done, cancel your job with `scancel 12345` or log out to make sure you have relinquished the Slurm allocation
    + IMPORTANT: without srun, the help team cannot debug properly!


## Useful slurm commands

```
scontrol show job 1234556 -dd      # shows detailed view of the options submitted to the job 1234556
srun -p main -t 10:00 --mem=20M -N 1 -n 1 -c 1 --pty bash -i #interactive session - remember to log out when done
squeue  -u kp807                   # show jobs for netid kp807 (replace with your own)
sacct -o Elapsed,Start,End,job,jobName #show job statistics for past jobs
sacct -o MaxRSS,job,jobName,state      #another set of job statistics
```

## Slurm good practice
  - keep the number of jobs you have queued at one time under 5,000 - the queue holds 10,000 
  - keep track of how much memory you are using 
      
      `sacct -o MaxRSS,job,jobName,state`
  - estimate the wall time 
  
      `sacct -o Elapsed,Start,End,job,jobName`
  - be aware of the resources GPUs need - CUDA machines need a few free CPUs to drive the GPUs, so leave at least 2 cores per GPU
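
To turn a `MaxRSS` reading into a `--mem` request, something like the following could be used. This is a sketch under stated assumptions: the helper name `maxrss_to_mb` is hypothetical, the value is assumed to be a whole number followed by `K`, `M` or `G` (for example `204800K`), and the 20% headroom is an arbitrary choice.

```shell
#!/bin/bash
# Hypothetical helper: convert a MaxRSS value (assumed form: whole number
# plus a K, M or G suffix) into megabytes, rounded up.
maxrss_to_mb() {
    local value=$1
    local number=${value%[KMG]}     # strip the unit suffix
    local unit=${value#"$number"}   # what remains is the unit
    case "$unit" in
        K) echo $(( (number + 1023) / 1024 )) ;;
        M) echo "$number" ;;
        G) echo $(( number * 1024 )) ;;
        *) echo "unrecognized MaxRSS value: $value" >&2; return 1 ;;
    esac
}

# Example: a past job peaked at 204800K
peak_mb=$(maxrss_to_mb 204800K)
# Request about 20% more than the observed peak
echo "--mem=$(( peak_mb * 120 / 100 ))"   # prints --mem=240
```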


## Bash commands
  - manipulating strings: `basename` and `dirname`; `cut -d '_' -f2 `
  - gotchas like `$file_i`  vs `${file}_i` ; don't leave spaces around `=` in assignments
  - brace expansion `echo file{1..10}`
  - Bash command substitution using backticks \`pwd\` (equivalent to `$(pwd)`)

Try some of these commands on the command line and see what they do: 

```
dirname /home/kp807/projects/cluster_reports/cluster.csv       # everything but the last part
basename /home/kp807/projects/cluster_reports/cluster.csv      # last part of path
CURRENT_DIR=`pwd` ; echo $CURRENT_DIR                          # backtick for execution of bash command
echo 'projects_cluster_file.csv' | cut -d '_' -f1              # split name to retain a part of file
echo $((1 + 2))                                                # double parenthesis for arithmetic expressions
echo filename_fly{5..10}.csv                                   # brace expansion   {start..end}

# for-loop: 
for file in filename_fly{5..10}.csv; do  echo $file ; done     

# if-statement - 2 examples
if [ 1 -gt 2 ]; then  echo '1 > 2' ; else echo '1<2' ; fi      
if [ -d "newdir" ]; then  echo 'directory exists' ; else echo 'this directory doesnt exist' ; fi                        
#variable assignment
a=10; echo $a        #good - no spaces
b = 10; echo $b      #bad - spaces around =
```
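
The `$file_i` versus `${file}_i` gotcha from the list above is easy to see in a short experiment. The variable names here are only for illustration.

```shell
#!/bin/bash
file="sample"
unset file_i                 # make sure no variable named file_i exists

wrong="$file_i"              # bash looks up a variable called file_i: empty
right="${file}_i"            # braces end the variable name after 'file'

echo "without braces: '${wrong}'"   # prints: without braces: ''
echo "with braces:    '${right}'"   # prints: with braces:    'sample_i'
```

The underscore is a legal character in variable names, so bash reads `$file_i` as one name; the braces tell it where the name stops.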

### useful commands: 

|command | description | usage example |
|:-----------|:--------|:-------------|
|`which <command>`| see where command is installed | which python|
|pwd| which directory I'm in | pwd |
|`man <command>`| manual page for command | man cut|
|`grep <pattern>`| filter for lines which match pattern | cat myfile &#124; grep GATK |
|`cut -d<delimiter> -f<number>`| split line by delimiter and get the given field (field 3 in the example) | cat myfile &#124; cut -d'_' -f3 |
|sort <file>| sort lines, often used with `uniq` | sort myfile &#124; uniq |
|uniq| suppress repeated lines, works only if sorted | see above example |
|less | paginated output | less myfile |
| >| redirect output (e.g. list files and save filenames in aaa.txt) | ls > aaa.txt |
|>>| append output to existing file | echo "blah" >> aaa.txt |
| find| find files with some properties e.g. display all files recursively from current directory| `find .`|
| chmod| change permissions on a file or directory |chmod u+x myscript.sh|
|top| display most intensive processes | `top`|
|ps auxw| list processes | `ps auxw` |


## Data parallelism

This applies if your job is "embarrassingly" parallel, i.e. it splits into many independent pieces:

  - for loops and creating/submitting lots of jobs at once
  - job arrays  - see example at [this github repo](https://github.com/KristinaPlazonic/slurm_data_parallelization)
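
A job array could look roughly like the sketch below. It is not taken from the linked repo: the job name, the file `file_list.txt` (one input filename per line) and the helper `nth_line` are hypothetical. Slurm sets `SLURM_ARRAY_TASK_ID` to a different number in each array task, and the script uses it to pick one line from the list.

```shell
#!/bin/bash
#SBATCH --job-name=array_demo        # hypothetical job name
#SBATCH --partition=main
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --time=00:10:00
#SBATCH --array=1-3                  # three tasks, numbered 1, 2, 3
#SBATCH --output=slurm.%A_%a.out     # %A = array job ID, %a = task number

# Print line number $2 of the file $1
nth_line() {
    sed -n "${2}p" "$1"
}

# Default to 1 so the script can also be tried outside Slurm
task_id="${SLURM_ARRAY_TASK_ID:-1}"

# file_list.txt is a hypothetical list with one input filename per line
if [ -f file_list.txt ]; then
    input=$(nth_line file_list.txt "$task_id")
    echo "task ${task_id} would process ${input}"
fi
```

Submitting this once with `sbatch` queues all the tasks, which is easier on the scheduler than a for-loop that submits each job separately.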

## Disk utilization

How to see how much space is being used, and by whom:
  
- Will show file usage on /scratch/netid

``` mmlsquota scratch --block-size=auto   ```

- Will show usage in /home/netid

```mmlsquota home --block-size=auto     ```

- Will show individual usage in the shared folder foran (/projects/foran)

```mmlsquota home:foran --block-size=auto ```

- Will show quota and usage of the whole fileset foran.

```mmlsquota -j foran home  --block-size=auto ```

- Will show human-readable sizes of all 1st-level subdirectories of `/directory/to/query/`
   + `du -hs /directory/to/query/*` 
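
To find out which subdirectories take the most space, the `du` output can be sorted. A minimal sketch; the helper name `largest_entries` is hypothetical, and sizes are printed in kilobytes so that a plain numeric sort works.

```shell
#!/bin/bash
# Hypothetical helper: list the N largest entries directly under a directory
largest_entries() {
    local dir=$1
    local count=${2:-10}
    # -s: one total per entry, -k: sizes in kilobytes
    du -sk -- "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

# Show the 5 largest entries in the current directory
largest_entries . 5
```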
     

## Resources

- [linux tutorial by Galen](http://www.rci.rutgers.edu/~gc563/linux/index.html)
- [OARC cluster user guide Amarel/Perceval](https://rutgers-oarc.github.io/amarel/)
- [web-based access to the cluster (still testing) - only from campus or VPN](https://ondemand.hpc.rutgers.edu/)
- [intro videos by Kristina](https://github.com/KristinaPlazonic/videos)