In [1]:
import Pkg;
Pkg.activate(@__DIR__)
Pkg.status()

[32m[1m  Activating[22m[39m project at `/global/u1/b/blaschke/juliacon24-hpcworkshop/parts/mpi/explanation`


[32m[1mStatus[22m[39m `/global/u1/b/blaschke/juliacon24-hpcworkshop/parts/mpi/explanation/Project.toml`
  [90m[1520ce14] [39mAbstractTrees v0.4.5
  [90m[0e44f5e4] [39mHwloc v3.0.1
  [90m[da04e1cc] [39mMPI v0.20.20
  [90m[e7922434] [39mMPIClusterManagers v0.2.4
  [90m[6f74fd91] [39mNetworkInterfaceControllers v0.1.0


# Julia + Jupyter + MPI


`MPI.jl` provides wrappers for the system MPI libraries, and the `MPIClusterManagers.jl` package lets you control MPI workflows from within Julia.

## MPI.jl

In [2]:
using MPI

`MPI.versioninfo()` tells you which MPI backend `MPI.jl` is using. On HPC systems that rely on vendor-provided MPI implementations (e.g. HPE Cray systems like Perlmutter), make sure that `MPI.jl` loads the "right" `libmpi.so`:

In [3]:
MPI.versioninfo()

MPIPreferences:
  binary:  system
  abi:     MPICH
  libmpi:  libmpi_gnu_123.so
  mpiexec: srun

Package versions
  MPI.jl:             0.20.20
  MPIPreferences.jl:  0.1.11

Library information:
  libmpi:  libmpi_gnu_123.so
  libmpi dlpath:  /opt/cray/pe/lib64/libmpi_gnu_123.so
  MPI version:  3.1.0
  Library version:  
    MPI VERSION    : CRAY MPICH version 8.1.28.29 (ANL base 3.4a2)
    MPI BUILD INFO : Wed Nov 15 20:57 2023 (git hash 1cde46f)
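
If `MPI.versioninfo()` reports a bundled (JLL) binary instead of the system library, the backend can be switched with `MPIPreferences.jl`. The following is a minimal sketch, not necessarily how this workshop environment was configured: it assumes `MPIPreferences` is installed in the active project, and that the Cray defaults selected by `vendor="cray"` match the modules loaded on your system.

```julia
# Run once in the project environment, then restart Julia so that
# MPI.jl picks up the new preferences.
# Assumption: MPIPreferences is available in the active project.
using MPIPreferences

# `vendor="cray"` asks MPIPreferences to apply its Cray MPICH defaults;
# `mpiexec="srun"` matches the launcher reported in the output above.
MPIPreferences.use_system_binary(; vendor="cray", mpiexec="srun")
```

The choice is written to `LocalPreferences.toml` next to the project's `Project.toml`, so it applies only to that environment.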
    


## MPIClusterManagers.jl

`MPIClusterManagers.jl` provides a way for Jupyter to connect to MPI processes.

On Perlmutter, we have a choice among network interfaces:

In [4]:
using NetworkInterfaceControllers, Sockets
interfaces = NetworkInterfaceControllers.get_interface_data(IPv4)

6-element Vector{NetworkInterfaceControllers.Interface}:
 NetworkInterfaceControllers.Interface("nmn0", :v4, ip"10.100.108.135")
 NetworkInterfaceControllers.Interface("hsn0", :v4, ip"10.249.43.24")
 NetworkInterfaceControllers.Interface("hsn0:chn", :v4, ip"128.55.84.223")
 NetworkInterfaceControllers.Interface("hsn1", :v4, ip"10.249.43.8")
 NetworkInterfaceControllers.Interface("hsn2", :v4, ip"10.249.43.7")
 NetworkInterfaceControllers.Interface("hsn3", :v4, ip"10.249.43.23")

But we have to be careful about which network we connect to:

In [6]:
import Base: filter, Fix1
filter(f::Function)::Function = Fix1(filter, f)

filter (generic function with 11 methods)
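
The one-argument `filter` method defined above returns a partially applied function, which is what lets `filter` appear on the right-hand side of `|>`. As a small illustration of the same idea, the sketch below uses a separately named helper; `keep` is a hypothetical name introduced only for this example, so that nothing is added to `Base.filter`.

```julia
# `keep` is a hypothetical helper name used only for this illustration.
# Base.Fix1(filter, f) fixes the first argument of `filter` to `f`.
keep(f) = Base.Fix1(filter, f)

# `keep(iseven)` is a one-argument callable: applying it to a collection
# runs `filter(iseven, collection)`.
evens = collect(1:10) |> keep(iseven)
```

Defining a local helper like this is an alternative to extending a `Base` function with types you do not own, which can affect other code loaded in the same session.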

In [52]:
using Hwloc, AbstractTrees

import AbstractTrees: PreOrderDFS
import Hwloc: hwloc_pci_class_string

sys_devs = children(gettopology())
pci_devs = PreOrderDFS(sys_devs) |> collect |> filter(x->x.type==:PCI_Device)
net_devs = pci_devs |> filter(x->hwloc_pci_class_string(nodevalue(x).attr.class_id) == "Ethernet")

;

In [54]:
# net_devs are populated using Hwloc, please take a look at the source notebook
# for further information

for dev in net_devs
    io = dev.io_children |> only
    name = io.object.name
    kind = io.object.subtype
    kind = kind == "" ? "Unknown" : kind
    println("Device $(name) is a $(kind) device")
end

Device hsn0 is a Slingshot device
Device nmn0 is a Unknown device
Device hsn1 is a Slingshot device
Device hsn2 is a Slingshot device
Device hsn3 is a Slingshot device


Therefore only the `hsn*` devices are Slingshot devices.

Let's now use this information to find an HSN device with which we manage our MPI cluster. Note: we'll take the one with `:chn` in the name (as it's the only one with a public IP):

In [65]:
hsn0_public = filter(x->(x.name=="hsn0:chn" && x.version==:v4), interfaces) |> only 

NetworkInterfaceControllers.Interface("hsn0:chn", :v4, ip"128.55.84.223")

In [66]:
public_slingshot_name = getnameinfo(hsn0_public.ip)

"nid200448-hsn0"
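
The claim that `hsn0:chn` is the only interface with a public IP can be checked against the private IPv4 ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16). The sketch below uses only the `Sockets` standard library; `is_private` and `addrs` are hypothetical names introduced for this example, and the addresses are copied from the interface listing above.

```julia
using Sockets

# Hypothetical helper: true if `ip` lies in one of the private IPv4 ranges
# 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16.
function is_private(ip::IPv4)
    a, b = parse.(Int, split(string(ip), "."))  # first two octets
    return a == 10 || (a == 172 && 16 <= b <= 31) || (a == 192 && b == 168)
end

# Interface names and addresses copied from the listing above.
addrs = [
    "nmn0"     => ip"10.100.108.135",
    "hsn0"     => ip"10.249.43.24",
    "hsn0:chn" => ip"128.55.84.223",
    "hsn1"     => ip"10.249.43.8",
    "hsn2"     => ip"10.249.43.7",
    "hsn3"     => ip"10.249.43.23",
]

# Keep only the interfaces whose address is not in a private range.
public_names = [name for (name, ip) in addrs if !is_private(ip)]
```

Filtering by address range rather than by the literal name `"hsn0:chn"` is one way to make the selection less dependent on how a particular node labels its interfaces.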

## MPI Worker Cluster

We use `MPIClusterManagers.jl` to start a cluster of workers. Each worker uses MPI to communicate (`MPIWorkerManager` starts an `srun` session), and is controlled via the device at `public_slingshot_name` (previous section):

In [67]:
# provides MPIWorkerManager and @mpi_do
using MPIClusterManagers

# need to also import Distributed to use addprocs()
using Distributed

# create a manager for 4 MPI workers
manager = MPIWorkerManager(4)

# start the MPI workers and register them as Julia workers too
addprocs(
    manager,
    exeflags=`--project=$(Base.active_project())`,
    master_tcp_interface=public_slingshot_name
)

4-element Vector{Int64}:
 2
 3
 4
 5

Now we can use `@mpi_do` to issue instructions to all of our MPI workers:

In [69]:
@mpi_do manager begin
    using MPI: MPI, Comm, Win, free
    comm = MPI.COMM_WORLD
    rank = MPI.Comm_rank(comm)
    size = MPI.Comm_size(comm)
    name = gethostname()
    println("Hello world, I am $(rank) of $(size) on $(name)")
end

      From worker 5:	Hello world, I am 3 of 4 on nid200453
      From worker 2:	Hello world, I am 0 of 4 on nid200448
      From worker 4:	Hello world, I am 2 of 4 on nid200452
      From worker 3:	Hello world, I am 1 of 4 on nid200449


We started this in a 4-node job, so each worker runs on a different node.