# 5: Running Jobs on the Cluster

With all the preliminaries out of the way, we are finally ready to look at how to run jobs on the cluster. As we mentioned earlier, jobs on the cluster should be run on the compute-nodes, not the head-nodes, and we do this by submitting our jobs to the job scheduler.

The Princeton clusters use SLURM, an open-source cluster management and job scheduling system for Linux clusters. Because all of the clusters use the same job scheduling software, once you learn how to use it you can apply that knowledge on any Princeton cluster. We will now take a closer look at the essentials of working with SLURM.


## 5.1: SLURM

Submitting jobs to the SLURM job scheduler is done with a submission script. In this script we specify the necessary commands to run our program and then we request that SLURM execute our script on the cluster.

The submission script begins with a line identifying the Unix shell to be used by the script. Then follow a number of SLURM directives that begin with `#SBATCH`, and finally the necessary commands to run our program. A minimal SLURM submission script looks like this:

```bash
#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:10:00

./my_program
```

We see that the first line specifies that we are using a bash shell. Then follow three SLURM directives where we specify how many nodes and tasks we want, and for how long we want these resources. Finally we execute our program, `my_program`.
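The `--time` value in the script uses the `hours:minutes:seconds` form (SLURM also accepts other forms, such as a plain number of minutes). As a small illustration of that format, the sketch below converts a number of minutes into `HH:MM:SS`. The helper `minutes_to_walltime` is a hypothetical name introduced here for illustration and is not part of SLURM.

```bash
#!/bin/bash

# Hypothetical helper (not part of SLURM): format a number of minutes
# as the HH:MM:SS string used by the --time directive.
minutes_to_walltime() {
    local minutes="$1"
    printf '%02d:%02d:00\n' $(( minutes / 60 )) $(( minutes % 60 ))
}

# The 10-minute request from the minimal script.
walltime=$(minutes_to_walltime 10)
echo "#SBATCH --time=${walltime}"
```

It is usually worth requesting a little more time than we expect to need, since a job that reaches its time limit is stopped by the scheduler.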

If the above script is called `run.slurm`, we would submit this script to the SLURM scheduler with the command:

```bash
sbatch run.slurm
```
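When the submission succeeds, `sbatch` prints a confirmation line of the form `Submitted batch job <job_id>`. If we want to reuse the job id later, for example with `squeue -j` or `scancel`, we can capture it in a shell variable. The following is a minimal sketch; `parse_job_id` is a hypothetical helper, and a sample confirmation line stands in for a real `sbatch` call so that the snippet also runs away from the cluster.

```bash
#!/bin/bash

# Hypothetical helper: pull the job id out of sbatch's confirmation line,
# which looks like "Submitted batch job 2534640".
parse_job_id() {
    awk '/^Submitted batch job/ {print $4}'
}

# On the cluster we would write:
#   job_id=$(sbatch run.slurm | parse_job_id)
# Here a sample line is used instead of a real submission.
sample_output="Submitted batch job 2534640"
job_id=$(printf '%s\n' "$sample_output" | parse_job_id)

echo "Job id: ${job_id}"
```

As an alternative, `sbatch --parsable run.slurm` prints the job id without the surrounding text, which avoids the parsing step.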

We can then monitor our job with the following command:

```bash
squeue -u <your_puid>
```

This will show us some information about the job: the job id, its status, how long it has been running, which node it is running on, etc.

---

**NOTE:** A convenient way to monitor our job is to use the `watch` command. This command simply executes another command at a given interval. It means that we don't have to manually type `squeue -u <your_puid>` every time. We can use the following command to monitor our job every 10 seconds:

```bash
watch -n 10 squeue -u <your_puid>
```

To stop the `watch` command, hit `Ctrl+C`.

---

Once our job is finished, the STDOUT and STDERR for our job can be found in a file which by default is called `slurm-<job_id>.out`, in the directory where we submitted the job. This file simply contains everything that our program would have printed to the terminal if we had run it directly instead of through the scheduler.
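If we prefer to choose the file names ourselves, we can add directives such as the ones sketched below to the submission script. The job name `my_job` and the file names are placeholders chosen for this example; `%j` is a filename pattern that SLURM replaces with the job id.

```bash
# Give the job a recognizable name in the queue (placeholder name).
#SBATCH --job-name=my_job
# Write STDOUT to a file that includes the job id.
#SBATCH --output=my_job-%j.out
# Write STDERR to a separate file instead of mixing it with STDOUT.
#SBATCH --error=my_job-%j.err
```

Keeping the job id in the file name means that repeated runs do not overwrite each other's output.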

The following is an overview of useful SLURM commands:

| **Command**             | **Description**                                 |
|-------------------------|-------------------------------------------------|
| `sbatch <slurm_script>` | Submit a job (e.g. run.slurm)                   |
| `squeue`                | Show jobs in the queue                          |
| `squeue -u <your_puid>` | Show jobs in the queue for specific user        |
| `squeue --start`        | Report the expected start time for pending jobs |
| `squeue -j <job_id>`    | Show the nodes allocated to a running job       |
| `scancel <job_id>`      | Cancel a job (e.g. scancel 2534640)             |
| `sinfo`                 | Show how nodes are being used                   |
| `sshare/sprio`          | Show the priority assigned to jobs              |
| `smap/sview`            | Graphical display of the queues                 |
| `slurmtop`              | Text-based view of cluster nodes                |

Another convenient SLURM feature is that we can ask SLURM to send us an e-mail when a job begins and ends by adding some additional SLURM directives to our submission script. The following directives tell SLURM to send us one e-mail when the job begins and one when it ends, and specify the e-mail address to use.

```bash
#SBATCH --mail-type=begin
#SBATCH --mail-type=end
#SBATCH --mail-user=<your_puid>@princeton.edu
```
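The `--mail-type` option also accepts a comma-separated list, so the same request can be written more compactly. The following is a sketch of that alternative form; adding `fail` is our own suggestion here and asks for an extra e-mail if the job fails.

```bash
# Request e-mails at the start and end of the job, and on failure.
#SBATCH --mail-type=begin,end,fail
#SBATCH --mail-user=<your_puid>@princeton.edu
```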

This covers the basics of SLURM that we need to get started submitting jobs on the cluster. Next, we will look at two more advanced use cases: submitting a parallel job and submitting a job that requires GPUs.


### 5.1.1: Submitting a Parallel Job to SLURM

SPECFEM3D is a software package with parallel capabilities that simulates seismic wave propagation. The following SLURM script shows how one would run a parallel program (e.g. SPECFEM3D) on the cluster.

```bash
#!/bin/bash

#SBATCH --nodes=8
#SBATCH --ntasks-per-node=40
#SBATCH --time=00:30:00

#SBATCH --mail-type=begin
#SBATCH --mail-type=end
#SBATCH --mail-user=<your_puid>@princeton.edu

# load necessary modules
module load intel/18.0/64/18.0.3.222
module load openmpi/intel-18.0/3.0.0/64

# change directory to build directory
cd /total/path/to/current/directory

# run SPECFEM3D
srun ./bin/xspecfem3D
```

Here we are asking for 8 nodes and 40 cores on each node, for a total of 320 cores. We want these resources for 30 minutes, and we want e-mail notifications when the job starts and finishes. Then we load the modules that our program needs, in this case the Intel compiler and Open MPI modules that SPECFEM3D was compiled with. Finally, we explicitly change to the build directory and run the job with `srun` and a relative path to the executable.

**NOTE:** The number of cores per node depends on which cluster is being used. The above script was used with TigerCPU where the number of cores on each node is 40. This number will vary on other clusters.
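The total core count quoted above is simply the number of nodes multiplied by the number of tasks per node. The sketch below does that arithmetic with a hypothetical helper, `total_tasks`, which is handy when adapting the script to a cluster with a different number of cores per node. Inside a running job, comparable values are typically available in environment variables such as `SLURM_NNODES` and `SLURM_NTASKS`, although it is worth confirming this on the cluster being used.

```bash
#!/bin/bash

# Hypothetical helper: total number of tasks for a given request.
total_tasks() {
    local nodes="$1"
    local tasks_per_node="$2"
    echo $(( nodes * tasks_per_node ))
}

# Values from the TigerCPU script: 8 nodes with 40 tasks per node.
requested=$(total_tasks 8 40)
echo "Requesting ${requested} tasks in total"
```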


### 5.1.2: Submitting a GPU Job to SLURM

The following SLURM submission script also runs SPECFEM3D in parallel, but uses GPUs instead of CPUs.

```bash
#!/bin/bash

#SBATCH --nodes=6
#SBATCH --ntasks-per-node=4
#SBATCH --time=00:10:00

#SBATCH --gres=gpu:4
#SBATCH --ntasks-per-socket=2

#SBATCH --mail-type=begin
#SBATCH --mail-type=end
#SBATCH --mail-user=<your_puid>@princeton.edu

# load necessary modules
module load intel/18.0/64/18.0.3.222
module load openmpi/intel-18.0/3.0.0/64
module load cudatoolkit/8.0

# change directory to build directory
cd /total/path/to/current/directory

# run SPECFEM3D
srun ./bin/xspecfem3D
```

Here we are asking for 6 nodes and 4 GPUs per node, for a total of 24 GPUs. We ask for the resources for 10 minutes and want e-mail notifications when the job starts and finishes. We load the Intel compiler and Open MPI modules that were used for compilation, and the CUDA toolkit module that is needed for GPU usage. Then we explicitly change to the build directory and run the job with `srun` and a relative path to the executable.

**NOTE:** The number of GPUs per node depends on which cluster is being used. The Princeton clusters that have GPUs are TigerGPU and Adroit, and there are 4 GPUs per node and 2 GPUs per socket in both cases.
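A quick sanity check inside a GPU job is to confirm that the job sees as many GPUs as it requested. The sketch below assumes that the scheduler exports `CUDA_VISIBLE_DEVICES` as a comma-separated list of device indices, which is common with `--gres=gpu:N` but is an assumption here rather than something guaranteed on every cluster. The helper `count_gpus` is a hypothetical name, and a sample value is used so that the snippet also runs outside the scheduler.

```bash
#!/bin/bash

# Hypothetical helper: count the entries in a comma-separated device list,
# such as the value typically found in CUDA_VISIBLE_DEVICES.
count_gpus() {
    local devices="$1"
    local ids=()
    if [ -n "$devices" ]; then
        IFS=',' read -r -a ids <<< "$devices"
    fi
    echo "${#ids[@]}"
}

# Inside a job we would pass "$CUDA_VISIBLE_DEVICES" instead of a sample value.
gpus_on_node=$(count_gpus "0,1,2,3")
echo "GPUs visible on this node: ${gpus_on_node}"
```

If the count is lower than the number given to `--gres`, the request and the node configuration are worth double-checking before launching a long run.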

For more information on how to use SLURM, check out this [PICSciE article](https://researchcomputing.princeton.edu/education/online-tutorials/getting-started/introducing-slurm) and the official [SLURM webpage](https://slurm.schedmd.com/documentation.html).