programming and parallelization updates
Trevhall52 committed Dec 16, 2022
1 parent 9c2d605 commit 39afaa1
Showing 4 changed files with 43 additions and 66 deletions.
21 changes: 11 additions & 10 deletions docs/programming/MPI-C.md

### Setup and “Hello, World”

Begin by logging in to the cluster and then starting a session on a
compile node. This can be done by loading the Alpine scheduler and using
the command:

```bash
acompile
```

Next we must load MPI into our environment. Begin by loading in your
> This function (`MPI_Finalize`) cleans up the MPI environment and ends MPI communications.
These four directives should be enough to get our parallel 'hello
world' running. We will begin by creating two variables,
`process_Rank` and `size_Of_Cluster`, to store an identifier for each
of the parallel processes and the number of processes running in the
cluster respectively. We will also implement the `MPI_Init` function

__Intel MPI__

```bash
mpiicc hello_world_mpi.cpp -o hello_world_mpi.exe
```

This will produce an executable we can pass to the cluster as a job. In
order to execute MPI compiled code, a special command must be used:

```bash
mpirun -np 4 ./hello_world_mpi.exe
```

The flag `-np` specifies the number of processors to be utilized in
the execution of the program.
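
For reference, the program being launched here is the `hello_world_mpi.cpp` source described earlier. Its core amounts to roughly the following sketch (the full listing is not shown in this excerpt, so treat the exact output wording as illustrative; the variable names follow the text above):

```cpp
#include <mpi.h>
#include <iostream>

int main(int argc, char** argv) {
    int process_Rank, size_Of_Cluster;

    // Initialize the MPI environment
    MPI_Init(&argc, &argv);
    // Total number of processes in the communicator
    MPI_Comm_size(MPI_COMM_WORLD, &size_Of_Cluster);
    // Identifier (rank) of the calling process
    MPI_Comm_rank(MPI_COMM_WORLD, &process_Rank);

    std::cout << "Hello World from process " << process_Rank
              << " of " << size_Of_Cluster << std::endl;

    // Clean up the MPI environment and end MPI communications
    MPI_Finalize();
    return 0;
}
```

Run with `mpirun -np 4`, this prints one line per process; the ordering of the lines is not deterministic.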

In your job script, load the same compiler and OpenMPI
choices you used above to compile the program, and run the job with
Slurm to execute the application. Your job script should look
something like this:

__OpenMPI__
```bash
#SBATCH -N 1
#SBATCH --ntasks 4
#SBATCH --job-name parallel_hello
#SBATCH --constraint ib
#SBATCH --partition atesting
#SBATCH --time 0:01:00
#SBATCH --output parallel_hello_world.out

```

__Intel MPI__

```bash
module load impi
mpirun -np 4 ./hello_world_mpi.exe
```

It is important to note that on Alpine, there is a total of 64 cores
per node. For applications that require more than 64 processes, you
will need to request multiple nodes in your job. Our
output file should look something like this:

18 changes: 10 additions & 8 deletions docs/programming/MPI-Fortran.md

### Setup and “Hello World”

Begin by logging in to the cluster and then starting a session on a
compile node. This can be done by loading the Alpine module and using
the command:

```shell
acompile
```

Next we must load MPI into our environment. Begin by loading in the
In order to execute MPI compiled code, a special command must be used:
mpirun -np 4 ./hello_world_mpi.exe
```

The flag `-np` specifies the number of processors to be utilized in
the execution of the program. In your job script, load the
same compiler and OpenMPI choices you used above to create and compile
the program, and run the job to execute the application. Your job
script should look something like this:

__GNU Fortran Compiler__

```shell
#SBATCH -N 1
#SBATCH --ntasks 4
#SBATCH --job-name parallel_hello
#SBATCH --partition atesting
#SBATCH --constraint ib
#SBATCH --time 0:01:00
#SBATCH --output parallel_hello_world.out
```

__Intel Fortran Compiler__

```shell
#SBATCH -N 1
#SBATCH --ntasks 4
#SBATCH --job-name parallel_hello
#SBATCH --partition atesting
#SBATCH --constraint ib
#SBATCH --time 0:01:00
#SBATCH --output parallel_hello_world.out
```

```shell
module load impi
mpirun -np 4 ./hello_world_mpi.exe
```

It is important to note that on Alpine, there are 64 cores per
node. For applications that require more than 64 processes, you will
need to request multiple nodes in your job (i.e., `-N <number of nodes>`).

46 changes: 11 additions & 35 deletions docs/programming/MPIBestpractices.md
## MPI Best practices
MPI, or Message Passing Interface, is a powerful library standard that allows for the parallel execution of applications across multiple processors on a system. It differs from other parallel execution libraries like OpenMP by also allowing a user to run their applications across multiple nodes. Unfortunately, it can sometimes be a bit tricky to run a compiled MPI application within an HPC resource. The following page outlines best practices for running your MPI applications across CURC resources.

Please note that this page *does not* go over compiling or optimization of MPI applications.

### Commands to Run MPI Applications
Regardless of compiler or MPI distribution, there are three “wrapper” commands that will run MPI applications: `mpirun`, `mpiexec`, and `srun`. These “wrapper” commands should be used after loading your desired compiler and MPI distribution, and they are simply prepended to whatever application you wish to run. Each command offers its own pros and cons, along with some nuance in how it functions.

`mpirun` is probably the most direct method to run MPI applications, with the command being tied to the MPI distribution. This means distribution-dependent flags can be passed directly through the command:

```
mpirun -np <core-count> ./<your-application>
```

`mpiexec` is a standardized command for executing MPI applications that allows more general MPI flags to be passed. This means the commands you use are universal across all distributions:

```
mpiexec -np <core-count> ./<your-application>
```

The final command `srun` is probably the most abstracted away from a specific implementation. This command lets Slurm figure out specific MPI features that are available in your environment and handles running the process as a job. This command is usually a little less efficient and may have some issues with reliability.

```
srun -n <core-count> ./<your-application>
```

RC usually recommends `mpirun` and `mpiexec` for simplicity and reliability when running MPI applications. `srun` should be used sparingly to avoid issues with execution.

### Running MPI on Alpine

Alpine is the successor to Summit, and is built in a similar way, so running MPI jobs is relatively straightforward. One caveat on Alpine is that MPI jobs cannot be run across chassis, which limits them to a maximum `--ntasks` count of 4096 cores (64 nodes per chassis * 64 cores each).

Simply select the compiler and MPI wrapper you wish to use and place them in a job script. In the following example, we run a 128-core, 4-hour job with a gcc compiler and OpenMPI:

```
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --time=04:00:00
#SBATCH --partition=amilan
#SBATCH --constraint=ib
#SBATCH --ntasks=128
#SBATCH --job-name=mpi-job
#SBATCH --output=mpi-job.%j.out
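
#Load your compiler and MPI modules here (for this example, gcc and openmpi) before the mpirun line below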
mpirun -np $SLURM_NTASKS /path/to/mycode.exe
#Note: $SLURM_NTASKS is set to the number of cores you requested
```
When running MPI jobs on Alpine, you can use the `--constraint=ib` flag to force the job onto an Alpine node that has Infiniband, the networking fabric used by MPI.

### Running MPI on Blanca

Blanca is often a bit more complicated due to the variety of nodes available. In general, there are three types of nodes on Blanca; all of them can run single-node, multi-core MPI processes, but they may require additional flags and parameters to achieve cross-node parallelism.

#### General Blanca Nodes
General Blanca nodes are not intended to run multi-node processes, but this can still be achieved by adjusting some network fabric settings. In order to achieve cross-node parallelism, we must force MPI to utilize Ethernet instead of the normal high speed network fabric. We can enforce this with various `mpirun` flags for each respective compiler.
Please note that this does not ensure high speed communications in message passing.


#### Blanca HPC
Blanca HPC nodes come equipped with Infiniband high speed interconnects that allow for high speed communication between nodes. These nodes support the Intel and Intel MPI compiler/MPI combo, as well as the gcc/openmpi_ucx modules _(note: be sure to use the *ucx* version of the OpenMPI module)_.

Blanca HPC nodes can easily be distinguished from other Blanca nodes by their names, which carry the `bhpc` prefix. They will also have the `edr` feature in their feature list if you query them with `scontrol show node`. If you are using OpenMPI, jobs on Blanca HPC nodes can be run using `mpirun` without any special arguments, although be sure to `export SLURM_EXPORT_ENV=ALL` prior to invoking `mpirun`. If you are using IMPI, select the `ofa` (Open Fabrics Alliance) option to enable Infiniband-based message passing, the fastest interconnect available on the `bhpc` nodes. You can do this with the following flag:

24 changes: 11 additions & 13 deletions docs/programming/parallel-programming-fundamentals.md

__Useful Links:__

[https://hpc.llnl.gov/documentation/tutorials/introduction-parallel-computing-tutorial##Whatis](https://hpc.llnl.gov/documentation/tutorials/introduction-parallel-computing-tutorial##Whatis)

### Why Parallel?

Say you are attempting to assemble a 10,000-piece jigsaw puzzle\* on
a rainy weekend. The number of pieces is staggering, and instead of a
weekend it takes you several weeks to finish the puzzle. Now assume
you have a team of friends helping with the puzzle. It progresses much faster,
smaller tasks that multiple processors can perform all at once. With
parallel processes a task that would normally take several weeks can
potentially be reduced to several hours.

\* Puzzle analogy for describing parallel computing adopted from Henry
Neeman's [Supercomputing in Plain
English](http://www.oscer.ou.edu/education.php) tutorial series.


![](https://hpc.llnl.gov/sites/default/files/distributed_mem.gif "distributed memory model")

(Image courtesy of LLNL <https://hpc.llnl.gov/documentation/tutorials/introduction-parallel-computing-tutorial##MemoryArch>)

__Distributed/Shared Model:__

processors sharing a set of common memory is called a node.

![](https://hpc.llnl.gov/sites/default/files/hybrid_mem2.gif "hybrid_model")

(Image courtesy of LLNL <https://hpc.llnl.gov/documentation/tutorials/introduction-parallel-computing-tutorial##MemoryArch> )

Alpine utilizes a hybrid distributed/shared model: there are 188 AMD Milan compute
nodes, 184 with 64 cores and 4 with 48 cores.

### Tools for Parallel Programming

Two common solutions for creating parallel code are OpenMP and
MPI. Both solutions are limited to the C++ or Fortran programming
languages (though other languages may be extended with C++ or Fortran
code to utilize OpenMP or MPI).

#### OpenMP

OpenMP is often considered more user-friendly, with thread-safe
methods and parallel sections of code that can be set with simple
scoping. OpenMP is, however, limited to the number of threads
available on a node -- in other words, it follows a shared memory
model. On a node with 64 CPUs, you can use no more than 64 processors.
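
As a brief illustration (not taken from these docs), the following sketch parallelizes a loop with an OpenMP directive; the iterations are split across however many threads the node provides:

```cpp
#include <omp.h>
#include <cstdio>

int main() {
    const int N = 16;
    double data[N];

    // Each iteration is assigned to one of the threads on this node.
    #pragma omp parallel for
    for (int i = 0; i < N; ++i) {
        data[i] = static_cast<double>(i) * i;
        std::printf("iteration %d handled by thread %d\n", i, omp_get_thread_num());
    }
    return 0;
}
```

The program is built with an OpenMP compiler flag (for example `g++ -fopenmp`); at run time the thread count is capped by the cores of a single node -- the shared memory limit described above -- and is typically set with the `OMP_NUM_THREADS` environment variable.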

#### MPI

applications (i.e., distributed memory models). MPI is, however, often
considered less accessible and more difficult to learn. Regardless, learning the library
provides a user with the ability to maximize processing ability. MPI
is a library standard, meaning there are several libraries based on
MPI that you can use to develop parallel code. OpenMPI and Intel MPI are solutions available on most CURC systems.
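
To make the message-passing idea concrete, here is a small illustrative example (not specific to CURC systems) in which rank 0 sends an integer to rank 1; it needs at least two processes, e.g. `mpirun -np 2`:

```cpp
#include <mpi.h>
#include <iostream>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int payload = 42;
        // Send one int to rank 1 with message tag 0
        MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int received = 0;
        // Receive one int from rank 0 with message tag 0
        MPI_Recv(&received, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::cout << "Rank 1 received " << received << std::endl;
    }

    MPI_Finalize();
    return 0;
}
```

Because the two ranks may be placed on different nodes, the same pattern scales past the single-node limit that constrains OpenMP.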

Couldn't find what you need? [Provide feedback on these docs!](https://forms.gle/bSQEeFrdvyeQWPtW9)
