
Update GNUParallel.md
mtrahan41 committed Dec 17, 2020
1 parent 53d9087 commit 37727fa
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions docs/software/GNUParallel.md
@@ -1,6 +1,6 @@
## GNU Parallel

-GNU Parallel is an effective tool for optimally using multiple cores and nodes on RMACC Summit to run lots of independent tasks without the need to learn OpenMP or MPI. This tutorial assumes user knowledge of Slurm job submission, shell scripting, and some Python.
+GNU Parallel is an effective tool for optimally using multiple cores and nodes on RMACC Summit to run lots of independent tasks without the need to learn OpenMP or MPI. This tutorial assumes user knowledge of Slurm jobs, shell scripting, and some Python.

### Why Use GNU Parallel?

Expand All @@ -18,7 +18,7 @@ import sys
print "Hello World from task: ", sys.argv[1]
```
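Note that the snippet above uses Python 2 print syntax. If the Python module loaded on the cluster provides Python 3 (an assumption; check your cluster's modules), an equivalent script might look like this sketch:

```python
# hello_World.py -- Python 3 variant of the snippet above (illustrative)
import sys

def hello(task_id):
    # Build the same greeting the tutorial expects in the job output
    return "Hello World from task: {}".format(task_id)

if __name__ == "__main__":
    # Default to task "1" so the script also runs without an argument
    task = sys.argv[1] if len(sys.argv) > 1 else "1"
    print(hello(task))
```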

-Now create a job script called `run_hello.sh` that will use GNU Parallel to submit as many instances of your python script as you want. Before running GNU Parallel in our script, we need to load the Python and GNU Parallel modules. Your job script should look something like this:
+Now create a job script called `run_hello.sh` that will use GNU Parallel to run as many instances of your python script as you want. Before running GNU Parallel in our script, we need to load the Python and GNU Parallel modules. Your job script should look something like this:

```bash
#!/bin/bash
Expand All @@ -38,7 +38,7 @@ my_srun="srun --export=all --exclusive -n1 --cpus-per-task=1 --cpu-bind=cores"
$my_parallel "$my_srun python hello_World.py" ::: {1..20}
```
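The diff collapses the middle of `run_hello.sh`, so only the first and last lines of the script are visible. A minimal sketch of how the visible pieces might fit together is shown below; the `#SBATCH` values other than `--ntasks=4` (which the surrounding prose mentions), and the exact module names, are assumptions rather than content from the repository:

```bash
#!/bin/bash
#SBATCH --ntasks=4            # number of cores; prose below says 4
#SBATCH --time=00:10:00       # assumed walltime
#SBATCH --output=parallel.out # assumed output file name

# Module names are assumptions; check `module avail` on your cluster
module purge
module load python
module load gnu_parallel

my_parallel="parallel --delay 0.2 -j $SLURM_NTASKS"
my_srun="srun --export=all --exclusive -n1 --cpus-per-task=1 --cpu-bind=cores"
$my_parallel "$my_srun python hello_World.py" ::: {1..20}
```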

-Note the last three lines of the script. We customize the GNU Parallel `parallel` command by creating a variable called `$my_parallel` that delays the submission of each task by 0.2 seconds (`--delay 0.2`) which mitigates bottlenecks for tasks that have heavy I/O when they start, and which specifies the number of tasks to run simultaneously (`-j $SLURM_NTASKS`). The environment variable `$SLURM_NTASKS` is set by Slurm at runtime and contains the number of `--ntasks` (cores) requested in the `#SBATCH` directives near the top of the job script (in this case the value is 4). We then customize the `srun` command so that it properly allocates the GNU parallel tasks to the allocated cores (`--export=all --exclusive -N1 -n1 --cpus-per-task=1 --cpu-bind=cores`). Note that the use of `srun` will also ensure that GNU parallel runs properly for cases where we request cores across multiple nodes (e.g., if we request `--ntasks=100`). Finally, we invoke GNU Parallel to run our python script 20 times using the customized `parallel` and `srun` commands we just created, `$my_parallel` and `$my_srun` respectively. Submitting this script via `sbatch` will run the commands. A successful job will result in output that looks something like this:
+Note the last three lines of the script. We customize the GNU Parallel `parallel` command by creating a variable called `$my_parallel` that delays the execution of each task by 0.2 seconds (`--delay 0.2`) which mitigates bottlenecks for tasks that have heavy I/O when they start, and which specifies the number of tasks to run simultaneously (`-j $SLURM_NTASKS`). The environment variable `$SLURM_NTASKS` is set by Slurm at runtime and contains the number of `--ntasks` (cores) requested in the `#SBATCH` directives near the top of the job script (in this case the value is 4). We then customize the `srun` command so that it properly allocates the GNU parallel tasks to the allocated cores (`--export=all --exclusive -N1 -n1 --cpus-per-task=1 --cpu-bind=cores`). Note that the use of `srun` will also ensure that GNU parallel runs properly for cases where we request cores across multiple nodes (e.g., if we request `--ntasks=100`). Finally, we invoke GNU Parallel to run our python script 20 times using the customized `parallel` and `srun` commands we just created, `$my_parallel` and `$my_srun` respectively. Running this script via `sbatch` will run the commands. A successful job will result in output that looks something like this:

```
Hello World from task: 1
...
```
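As an aside, the scheduling behaviour described above — 20 tasks in total, but only `$SLURM_NTASKS` (here 4) running at any one time — can be imitated in plain Python. This is purely an illustration of the `-j N ... ::: {1..20}` pattern, not part of the tutorial:

```python
# Illustrative only: mimic `parallel -j 4 ... ::: {1..20}` scheduling
from concurrent.futures import ThreadPoolExecutor

def run_task(task_id):
    # Stand-in for `srun python hello_World.py <task_id>`
    return "Hello World from task: {}".format(task_id)

# At most 4 workers active at once, like -j $SLURM_NTASKS with --ntasks=4
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_task, range(1, 21)))

print(results[0])
```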
