programming and parallelization updates
Trevhall52 committed Dec 16, 2022
1 parent 9c2d605 commit 39afaa1
Showing 4 changed files with 43 additions and 66 deletions.
21 changes: 11 additions & 10 deletions docs/programming/MPI-C.md

### Setup and “Hello, World”

Begin by logging in to the cluster and then starting a session on a
compile node. This can be done by loading the Alpine scheduler and using
the command:

```bash
acompile
```

Next we must load MPI into our environment. Begin by loading in your
> This function (`MPI_Finalize`) cleans up the MPI environment and ends MPI communications.
These four directives should be enough to get our parallel 'hello
world' running. We will begin by creating two variables,
`process_Rank` and `size_Of_Cluster`, to store an identifier for each
of the parallel processes and the number of processes running in the
cluster respectively. We will also implement the `MPI_Init` function

__Intel MPI__

```bash
mpiicc hello_world_mpi.cpp -o hello_world_mpi.exe
```

This will produce an executable we can pass to the cluster as a job. In
order to execute MPI compiled code, a special command must be used:

```bash
mpirun -np 4 ./hello_world_mpi.exe
```

The flag `-np` specifies the number of processors to be utilized in
the execution of the program.
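
For reference, the program being launched here is the `hello_world_mpi.cpp` source described earlier. Its core amounts to roughly the following sketch (the full listing is not shown in this excerpt, so treat the exact output wording as illustrative; the variable names follow the text above):

```cpp
#include <mpi.h>
#include <iostream>

int main(int argc, char** argv) {
    int process_Rank, size_Of_Cluster;

    // Initialize the MPI environment
    MPI_Init(&argc, &argv);
    // Total number of processes in the communicator
    MPI_Comm_size(MPI_COMM_WORLD, &size_Of_Cluster);
    // Identifier (rank) of the calling process
    MPI_Comm_rank(MPI_COMM_WORLD, &process_Rank);

    std::cout << "Hello World from process " << process_Rank
              << " of " << size_Of_Cluster << std::endl;

    // Clean up the MPI environment and end MPI communications
    MPI_Finalize();
    return 0;
}
```

Run with `mpirun -np 4`, this prints one line per process; the ordering of the lines is not deterministic.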

In your job script, load the same compiler and OpenMPI
choices you used above to compile the program, and run the job with
Slurm to execute the application. Your job script should look
something like this:

__OpenMPI__
```bash
#SBATCH -N 1
#SBATCH --ntasks 4
#SBATCH --job-name parallel_hello
#SBATCH --constraint ib
#SBATCH --partition atesting
#SBATCH --time 0:01:00
#SBATCH --output parallel_hello_world.out

```

__Intel MPI__

```bash
module load impi
mpirun -np 4 ./hello_world_mpi.exe
```

It is important to note that on Alpine, there is a total of 64 cores
per node. For applications that require more than 64 processes, you
will need to request multiple nodes in your job. Our
output file should look something like this:

18 changes: 10 additions & 8 deletions docs/programming/MPI-Fortran.md

### Setup and “Hello World”

Begin by logging in to the cluster and then starting a session on a
compile node. This can be done by loading the Alpine module and using
the command:

```shell
acompile
```

Next we must load MPI into our environment. Begin by loading in the
In order to execute MPI compiled code, a special command must be used:
mpirun -np 4 ./hello_world_mpi.exe
```

The flag `-np` specifies the number of processors to be utilized in
the execution of the program. In your job script, load the
same compiler and OpenMPI choices you used above to create and compile
the program, and run the job to execute the application. Your job
script should look something like this:

__GNU Fortran Compiler__

```shell
#SBATCH -N 1
#SBATCH --ntasks 4
#SBATCH --job-name parallel_hello
#SBATCH --partition atesting
#SBATCH --constraint ib
#SBATCH --time 0:01:00
#SBATCH --output parallel_hello_world.out
```

__Intel Fortran Compiler__

```shell
#SBATCH -N 1
#SBATCH --ntasks 4
#SBATCH --job-name parallel_hello
#SBATCH --partition atesting
#SBATCH --constraint ib
#SBATCH --time 0:01:00
#SBATCH --output parallel_hello_world.out
```

```shell
module load impi
mpirun -np 4 ./hello_world_mpi.exe
```

It is important to note that on Alpine, there are 64 cores per
node. For applications that require more than 64 processes, you will
need to request multiple nodes in your job (i.e., `-N <number of nodes>`).

46 changes: 11 additions & 35 deletions docs/programming/MPIBestpractices.md
## MPI Best practices
MPI, or Message Passing Interface, is a powerful library standard that allows for the parallel execution of applications across multiple processors on a system. It differs from other parallel execution libraries like OpenMP by also allowing a user to run their applications across multiple nodes. Unfortunately, it can sometimes be a bit tricky to run a compiled MPI application within an HPC resource. The following page outlines best practices for running your MPI applications across CURC resources.

Please note that this page *does not* go over compiling or optimization of MPI applications.

### Commands to Run MPI Applications
Regardless of compiler or MPI distribution, there are three “wrapper” commands that will run MPI applications: `mpirun`, `mpiexec`, and `srun`. These “wrapper” commands should be used after loading your desired compiler and MPI distribution, and they are simply prepended to whatever application you wish to run. Each command offers its own pros and cons, along with some nuance in how it functions.

`mpirun` is probably the most direct method to run MPI applications, with the command being tied to the MPI distribution. This means distribution-dependent flags can be passed directly through the command:

```
mpirun -np <core-count> ./<your-application>
```

`mpiexec` is a standardized command for executing MPI applications that allows more general MPI flags to be passed. This means the commands you use are universal across all distributions:

```
mpiexec -np <core-count> ./<your-application>
```

The final command `srun` is probably the most abstracted away from a specific implementation. This command lets Slurm figure out specific MPI features that are available in your environment and handles running the process as a job. This command is usually a little less efficient and may have some issues with reliability.

```
srun -n <core-count> ./<your-application>
```

RC usually recommends `mpirun` and `mpiexec` for simplicity and reliability when running MPI applications. `srun` should be used sparingly to avoid issues with execution.

### Running MPI on Alpine

Alpine is the successor to Summit, and is built in a similar way, so running MPI jobs is relatively straightforward. One caveat on Alpine is that MPI jobs cannot be run across chassis, which limits them to a maximum `--ntasks` count of 4096 cores (64 nodes per chassis * 64 cores each).

Simply select the compiler and MPI wrapper you wish to use and place them in a job script. In the following example, we run a 128-core, 4-hour job with a gcc compiler and OpenMPI:

```
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --time=04:00:00
#SBATCH --partition=amilan
#SBATCH --constraint=ib
#SBATCH --ntasks=128
#SBATCH --job-name=mpi-job
#SBATCH --output=mpi-job.%j.out
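
#Load your compiler and MPI modules here (for this example, gcc and openmpi) before the mpirun line below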
mpirun -np $SLURM_NTASKS /path/to/mycode.exe
#Note: $SLURM_NTASKS is set to the number of cores you requested
```
When running MPI jobs on Alpine, you can use the `--constraint=ib` flag to force the job onto an Alpine node that has Infiniband, the networking fabric used by MPI.

### Running MPI on Blanca

Blanca is often a bit more complicated due to the variety of nodes available. In general, there are three types of nodes on Blanca; all of them can run single-node, multi-core MPI processes, but they may require additional flags and parameters to achieve cross-node parallelism.

#### General Blanca Nodes
General Blanca nodes are not intended to run multi-node processes, but this can still be achieved by adjusting some network fabric settings. In order to achieve cross-node parallelism, we must force MPI to utilize Ethernet instead of the normal high speed network fabric. We can enforce this with various `mpirun` flags for each respective compiler.
Please note that this does not ensure high speed communications in message passing.


#### Blanca HPC
Blanca HPC nodes come equipped with Infiniband high speed interconnects that allow for high speed communication between nodes. These nodes support the Intel and Intel MPI compiler/MPI combo, as well as the gcc/openmpi_ucx modules _(note: be sure to use the *ucx* version of the OpenMPI module)_.

Blanca HPC nodes can easily be distinguished from other Blanca nodes by their names, which carry the `bhpc` prefix. They will also have the `edr` feature in their feature list if you query them with `scontrol show node`. If you are using OpenMPI, jobs on Blanca HPC nodes can be run using `mpirun` without any special arguments, although be sure to `export SLURM_EXPORT_ENV=ALL` prior to invoking `mpirun`. If you are using IMPI, select the `ofa` (Open Fabrics Alliance) option to enable Infiniband-based message passing, the fastest interconnect available on the `bhpc` nodes. You can do this with the following flag:

24 changes: 11 additions & 13 deletions docs/programming/parallel-programming-fundamentals.md

__Useful Links:__

[https://hpc.llnl.gov/documentation/tutorials/introduction-parallel-computing-tutorial##Whatis](https://hpc.llnl.gov/documentation/tutorials/introduction-parallel-computing-tutorial##Whatis)

### Why Parallel?

Say you are attempting to assemble a 10,000-piece jigsaw puzzle\* on
a rainy weekend. The number of pieces is staggering, and instead of a
weekend it takes you several weeks to finish the puzzle. Now assume
you have a team of friends helping with the puzzle. It progresses much faster,
smaller tasks that multiple processors can perform all at once. With
parallel processes a task that would normally take several weeks can
potentially be reduced to several hours.

\* Puzzle analogy for describing parallel computing adopted from Henry
Neeman's [Supercomputing in Plain
English](http://www.oscer.ou.edu/education.php) tutorial series.


![](https://hpc.llnl.gov/sites/default/files/distributed_mem.gif "distributed memory model")

(Image courtesy of LLNL <https://hpc.llnl.gov/documentation/tutorials/introduction-parallel-computing-tutorial##MemoryArch>)

__Distributed/Shared Model:__

processors sharing a set of common memory is called a node.

![](https://hpc.llnl.gov/sites/default/files/hybrid_mem2.gif "hybrid_model")

(Image courtesy of LLNL <https://hpc.llnl.gov/documentation/tutorials/introduction-parallel-computing-tutorial##MemoryArch> )

Alpine utilizes a hybrid distributed/shared model: there are 188 AMD Milan compute
nodes, 184 with 64 cores and 4 with 48 cores.

### Tools for Parallel Programming

Two common solutions for creating parallel code are OpenMP and
MPI. Both solutions are limited to the C++ or Fortran programming
languages (though other languages may be extended with C++ or Fortran
code to utilize OpenMP or MPI).

#### OpenMP

OpenMP is often considered more user-friendly, with thread-safe
methods and parallel sections of code that can be set with simple
scoping. OpenMP is, however, limited to the number of threads
available on a node -- in other words, it follows a shared memory
model. On a node with 64 CPUs, you can use no more than 64 processors.
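
As a brief illustration (not taken from these docs), the following sketch parallelizes a loop with an OpenMP directive; the iterations are split across however many threads the node provides:

```cpp
#include <omp.h>
#include <cstdio>

int main() {
    const int N = 16;
    double data[N];

    // Each iteration is assigned to one of the threads on this node.
    #pragma omp parallel for
    for (int i = 0; i < N; ++i) {
        data[i] = static_cast<double>(i) * i;
        std::printf("iteration %d handled by thread %d\n", i, omp_get_thread_num());
    }
    return 0;
}
```

The program is built with an OpenMP compiler flag (for example `g++ -fopenmp`); at run time the thread count is capped by the cores of a single node -- the shared memory limit described above -- and is typically set with the `OMP_NUM_THREADS` environment variable.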

#### MPI

applications (i.e., distributed memory models). MPI is, however, often
considered less accessible and more difficult to learn. Regardless, learning the library
provides a user with the ability to maximize processing ability. MPI
is a library standard, meaning there are several libraries based on
MPI that you can use to develop parallel code. OpenMPI and Intel MPI are solutions available on most CURC systems.
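
To make the message-passing idea concrete, here is a small illustrative example (not specific to CURC systems) in which rank 0 sends an integer to rank 1; it needs at least two processes, e.g. `mpirun -np 2`:

```cpp
#include <mpi.h>
#include <iostream>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int payload = 42;
        // Send one int to rank 1 with message tag 0
        MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int received = 0;
        // Receive one int from rank 0 with message tag 0
        MPI_Recv(&received, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::cout << "Rank 1 received " << received << std::endl;
    }

    MPI_Finalize();
    return 0;
}
```

Because the two ranks may be placed on different nodes, the same pattern scales past the single-node limit that constrains OpenMP.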

Couldn't find what you need? [Provide feedback on these docs!](https://forms.gle/bSQEeFrdvyeQWPtW9)
