# MPI Guide
In this guide we will run multi-node jobs using the Message Passing Interface (MPI). Because Slurm is a popular scheduler on HPC clusters and supercomputers, mainstream MPI implementations have built-in support for it. If you launch an MPI program within a Slurm job, it recognises the Slurm environment and launches accordingly (i.e. it starts the right number of parallel processes on the allocated nodes). So you don't need to write a machinefile/hostfile or set the `-np` option manually.  
For more information check out [Slurm - MPI User Guide](/doc/mpi_guide.html).
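
As a rough illustration of what "recognising the Slurm environment" means: inside a job, Slurm exports variables that describe the allocation, and an MPI launcher can read them instead of a hostfile. The sketch below only prints a few of these variables. The helper name `show_slurm_env` is made up for this guide, and which variables a given MPI implementation actually consults is an assumption, not something this guide verifies.

```shell
# Print a few of the variables Slurm exports inside a job.
# Some are only set when the matching option was requested
# (for example SLURM_CPUS_PER_TASK needs --cpus-per-task).
# Outside a Slurm job they are unset, so a placeholder is shown instead.
show_slurm_env() {
    for var in SLURM_JOB_ID SLURM_JOB_NODELIST SLURM_NNODES SLURM_NTASKS SLURM_CPUS_PER_TASK; do
        value=$(printenv "$var") || value="<unset>"
        printf '%s=%s\n' "$var" "$value"
    done
}

show_slurm_env
```

Running the same function through `srun` (for example with `--ntasks=2`) should show the values filled in for each task.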

## Example: calculate $\pi$
In this example, we will estimate the value of $\pi$ using the [Monte Carlo method](https://en.wikipedia.org/wiki/Monte_Carlo_method) with [Open MPI](https://www.open-mpi.org/). 
In this lab environment, some of the software and libraries are managed with [Environment Modules](https://modules.readthedocs.io/en/latest/), a convenient way to manage multiple libraries and software packages, including different or even conflicting versions of them. 

In [None]:
# check available modules
module avail

In [None]:
# loading the mpi module
module load mpi

In [None]:
# list loaded modules
module list

Next we can take a look at the code and build it. 

In [None]:
cat mpi-pi/parallel-pi.c

In [None]:
make --directory mpi-pi

Now we are ready to run the code. Slurm provides several ways to run an MPI program; one of them is the `--mpi` option of `srun`. With this option you can launch the MPI program directly from the submission host and see its stdout right there, while the actual execution happens on the compute nodes. For more details on this option, see the man page entry for [`srun --mpi`](/doc/srun.html#OPT_mpi).  
To start, use the `--ntasks <N>` option to specify how many MPI processes you would like to run. If you have more specific requirements for the number of nodes, processes, or memory, you can combine the `--nodes`, `--ntasks-per-node`, `--cpus-per-task`, and `--mem` options. 

In [None]:
# 2 parallel processes on 1 node
srun --nodes=1 --ntasks-per-node=2 --mpi=pmi2 mpi-pi/parallel-pi

In [None]:
# 8 parallel processes, across nodes
srun --ntasks=8 --mpi=pmi2 mpi-pi/parallel-pi

In [None]:
# request 4 nodes, 2 processes on each node
srun --nodes=4 --ntasks-per-node=2 --mem=0 --mpi=pmi2 mpi-pi/parallel-pi

You might find it odd that a multi-node execution runs much slower than a single-node run. That is because `MPI_Reduce` is called unnecessarily often. Each call acts as a synchronization point: all processes stop and synchronize to exchange data, which is a very costly operation across nodes.  
In the next section we will run the HPL benchmark, which doesn't have this issue and even offers an OpenMP multithreading option to further reduce cross-node synchronization and communication. 

## HPL Benchmark
The [High-Performance Linpack (HPL)](https://netlib.org/benchmark/hpl/) is a common benchmark in HPC and supercomputing. It measures how many floating-point operations per second (FLOPS) a cluster can perform, as a rating of its computational power. HPL is commonly used for ranking the fastest supercomputers in the world, for user acceptance testing (UAT) of new clusters or hardware, and as a stress test after hardware replacement. In this section we will build and run the HPL benchmark via Slurm.

### Install Spack
[Spack](https://spack.io/) is an HPC software package manager. Many compilers and HPC software packages are available through it, and they are built from source locally when you install them. It is one of the 10 initial projects in the [High Performance Software Foundation](https://hpsfoundation.github.io/#projects) formed by the [Linux Foundation](https://www.linuxfoundation.org/press/linux-foundation-announces-intent-to-form-high-performance-software-foundation-hpsf). We are going to install Spack on our container lab cluster and then build the HPL benchmark with it.

In [None]:
git clone -c feature.manyFiles=true https://github.com/spack/spack.git ~/.local/spack
git -C ~/.local/spack checkout v0.22.3

# add this line to set up spack on login
ansible -m lineinfile -a "path=${HOME}/.bashrc line='source ~/.local/spack/share/spack/setup-env.sh'" localhost

# activate spack
source ~/.local/spack/share/spack/setup-env.sh
which spack
spack config add modules:default:enable:[lmod]

# detect available compilers
spack compiler find
spack compilers

### Install HPL with Spack

In [None]:
spack list hpl

List and confirm the configuration Spack is going to use for installing HPL.

In [None]:
# spack spec hpl+openmp^openmpi+internal-pmix+internal-hwloc schedulers=slurm ^slurm+pmix
spack spec hpl+openmp^openmpi+internal-pmix+internal-hwloc

Build HPL and all its dependencies in parallel, spanning 4 nodes. Make sure Spack is installed on a shared file system that supports flock, and that you have not turned off Spack's default locking mechanism; otherwise this could go terribly wrong. The installation takes a long time to complete.

In [None]:
# srun --nodes=4 --ntasks-per-node=1 --exclusive spack install hpl+openmp^openmpi+internal-pmix+internal-hwloc schedulers=slurm ^slurm+pmix
srun --nodes=4 --ntasks-per-node=1 --exclusive spack install hpl+openmp^openmpi+internal-pmix+internal-hwloc

# verify hpl has been installed & set up modules
spack find hpl

for mod_path in $( find ~/.local/spack/share/spack/lmod -iname "*.lua" | xargs dirname | xargs dirname | uniq ); do
    module use $mod_path
    ansible -m lineinfile -a "path=${HOME}/.bashrc line='module use $mod_path'" localhost
done

module avail

### Run HPL
To run the HPL benchmark we need to prepare an HPL.dat file that describes the problem size and the run configuration. We can also choose between a pure multi-process MPI run and a hybrid MPI + OpenMP run. 

In [None]:
# load hpl from module or spack
module load hpl || spack load hpl

An example HPL.dat file follows. For details on these parameters and how to tune them, please refer to the [HPL Tuning Guide](https://www.netlib.org/benchmark/hpl/tuning.html).

In [None]:
cat ./HPL.dat
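
The problem size N in HPL.dat is usually chosen so that the N x N matrix of 8-byte doubles fills most of the available memory; a commonly quoted rule of thumb is around 80%. The sketch below shows that arithmetic. The function name `hpl_problem_size`, the block size of 192, and the 2048 MiB per node are illustrative assumptions, not values taken from this lab.

```shell
# Largest problem size N (rounded down to a multiple of the block size NB)
# whose N x N matrix of 8-byte doubles fits in a given share of memory.
#   $1 = total memory across all nodes, in MiB
#   $2 = block size NB
#   $3 = percentage of memory to use (optional, default 80)
hpl_problem_size() {
    awk -v mem_mib="$1" -v nb="$2" -v pct="${3:-80}" 'BEGIN {
        bytes = mem_mib * 1024 * 1024 * pct / 100
        n = int(sqrt(bytes / 8))
        print int(n / nb) * nb
    }'
}

# Example: 4 nodes with 2048 MiB each and NB = 192.
hpl_problem_size $((4 * 2048)) 192   # prints 29184
```

Rounding N down to a multiple of NB keeps every block full. Leaving some memory free matters because the operating system and MPI buffers need room as well.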

In [None]:
# Just MPI
srun --ntasks=8 --pty mpirun --bind-to none xhpl

In [None]:
# MPI + OpenMP hybrid run
OMP_NUM_THREADS=2 srun --nodes=4 --ntasks-per-node=1 --cpus-per-task=2 --pty mpirun --bind-to none xhpl

Next is an example sbatch job script for HPL that generates an HPL.dat file from environment variables provided by Slurm. This allows a bigger, longer run when more resources are requested for the job.

In [None]:
# Example sbatch job script
cat ./hpl-job.sh
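
The idea of deriving HPL.dat from the job environment can be sketched as follows. This is not the content of `hpl-job.sh`; it is a minimal, hypothetical version that assumes the process grid is taken from `SLURM_NTASKS` and that N and NB arrive through made-up variables `HPL_N` and `HPL_NB`. The function names are also invented for this example, and the remaining HPL.dat values are untuned placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: derive HPL.dat values from the Slurm environment.
# Function names and the HPL_N / HPL_NB variables are made up for this example.

# Most square process grid "P Q" with P <= Q and P * Q = number of tasks.
hpl_grid() {
    ntasks=$1
    p=1
    i=1
    while [ $((i * i)) -le "$ntasks" ]; do
        if [ $((ntasks % i)) -eq 0 ]; then
            p=$i
        fi
        i=$((i + 1))
    done
    echo "$p $((ntasks / p))"
}

# Print an HPL.dat for the given N ($1), NB ($2), P ($3) and Q ($4).
# All other values are untuned placeholders.
write_hpl_dat() {
    cat <<EOF
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
$1           Ns
1            # of NBs
$2           NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
$3           Ps
$4           Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
EOF
}

# SLURM_NTASKS is set by Slurm inside a job; the defaults allow a dry run.
grid=$(hpl_grid "${SLURM_NTASKS:-4}")
write_hpl_dat "${HPL_N:-10000}" "${HPL_NB:-192}" "${grid% *}" "${grid#* }"
```

In a real job script the output would be redirected to `HPL.dat` in the working directory before `xhpl` is launched. Because P x Q follows the task count, requesting more tasks automatically changes the grid.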

In [None]:
# MPI + OpenMP Hybrid run with sbatch job script
sbatch --nodes=4 --ntasks-per-node=1 --cpus-per-task=2 ./hpl-job.sh