AMPI is included in the source distribution of Charm++. To get the latest sources from PPL, visit https://charm.cs.illinois.edu/software and follow the download links. Then build Charm++ and AMPI from source.
The build script for Charm++ is called build. The syntax for this script is:
$ ./build <target> <version> <opts>
Users who are interested only in AMPI and not any other component of Charm++ should specify <target> to be AMPI-only. This will build Charm++ and the other libraries needed by AMPI in a mode configured and tuned exclusively for AMPI. To fully build Charm++ underneath AMPI for use with either paradigm, or for interoperation between the two, specify <target> to be AMPI.
<opts> are command-line options passed to the charmc compile script. Common compile-time options such as -g, -O, -Ipath, -Lpath, -llib are accepted. To build a debugging version of AMPI, use the -g option. To build a production version of AMPI, use the --with-production option.
<version> depends on the machine, operating system, and the underlying communication library one wants to use for running AMPI programs. See the charm/README file for details on picking the proper version. Here is an example of how to build a debug version of AMPI in a Linux and Ethernet environment:
$ ./build AMPI netlrts-linux-x86_64 -g
And the following is an example of how to build a production version of AMPI on a Cray XC system, with MPI-level error checking in AMPI turned off:
$ ./build AMPI-only gni-crayxc --with-production --disable-ampi-error-checking
AMPI can also be built with support for multithreaded parallelism on any communication layer by adding "smp" as an option after the build target. For example, on an Infiniband Linux cluster:
$ ./build AMPI-only verbs-linux-x86_64 smp --with-production
AMPI ranks are implemented as user-level threads with a default stack size of 1 MB. If the default is not correct for your program, you can specify a different default stack size (in bytes) at build time. The following build command illustrates this for an Intel Omni-Path system:
$ ./build AMPI-only ofi-linux-x86_64 --with-production -DTCHARM_STACKSIZE_DEFAULT=16777216
The same can be done for AMPI's RDMA messaging threshold using AMPI_RDMA_THRESHOLD_DEFAULT and, for messages sent within the same address space (ranks on the same worker thread or ranks on different worker threads in the same process in SMP builds), using AMPI_SMP_RDMA_THRESHOLD_DEFAULT. Contiguous messages with sizes larger than the threshold are sent via RDMA on communication layers that support this capability. You can also set the environment variables AMPI_RDMA_THRESHOLD and AMPI_SMP_RDMA_THRESHOLD before running a job to override the default specified at build time.
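For example, following the same pattern as the stack-size example above, the RDMA threshold could be raised both at build time and again at job launch; the 65536-byte value here is purely illustrative, not a recommendation:
$ ./build AMPI-only ofi-linux-x86_64 --with-production -DAMPI_RDMA_THRESHOLD_DEFAULT=65536
$ export AMPI_RDMA_THRESHOLD=65536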
AMPI provides compiler wrappers such as ampicc, ampif90, and ampicxx in the bin subdirectory of Charm++ installations. You can use them to build your AMPI program using the same syntax as other compilers like gcc. They are intended as drop-in replacements for the mpicc wrappers provided by most conventional MPI implementations.
These scripts automatically handle the details of building and linking
against AMPI and the Charm++ runtime system. This includes launching the
compiler selected during the Charm++ build process, passing any toolchain
parameters important for proper function on the selected build target,
supplying the include and link paths for the runtime system, and linking
with Charm++ components important for AMPI, including Isomalloc heap
interception and commonly used load balancers.
| Command Name | Purpose |
|---|---|
| ampicc | C |
| ampiCC | C++ |
| ampicxx | C++ |
| ampic++ | C++ |
| ampif77 | Fortran 77 |
| ampif90 | Fortran 90 |
| ampifort | Fortran 90 |
| ampirun | Program launch |
| ampiexec | Program launch |
All command-line flags that you would use with other compilers can be used with the AMPI compiler wrappers in the same way. For example:
$ ampicc -c pgm.c -O3
$ ampif90 -c pgm.f90 -O0 -g
$ ampicc -o pgm pgm.o -lm -O3
For consistency with other MPI implementations, these wrappers are also provided under their standard names with the suffix .ampi:
$ mpicc.ampi -c pgm.c -O3
$ mpif90.ampi -c pgm.f90 -O0 -g
$ mpicc.ampi -o pgm pgm.o -lm -O3
Additionally, the bin/ampi subdirectory of Charm++ installations contains the wrappers with their exact standard names, allowing them to be given precedence as shell commands in a module-like fashion by adding this directory to the $PATH environment variable:
$ export PATH=/home/user/charm/netlrts-linux-x86_64/bin/ampi:$PATH
$ mpicc -c pgm.c -O3
$ mpif90 -c pgm.f90 -O0 -g
$ mpicc -o pgm pgm.o -lm -O3
These wrappers also allow the user to configure AMPI- and Charm++-specific functionality. For example, to automatically select a Charm++ load balancer at program launch without passing the +balancer runtime parameter, specify a strategy at link time with -balancer <LB>:
$ ampicc pgm.c -o pgm -O3 -balancer GreedyRefineLB
Internally, the toolchain wrappers call the Charm runtime's general toolchain script, charmc. By default, they will specify -memory isomalloc and -module CommonLBs. Advanced users can disable Isomalloc heap interception by passing -memory default. For diagnostic purposes, the -verbose option will print all parameters passed to each stage of the toolchain. Refer to the Charm++ manual for information about the full set of parameters supported by charmc.
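For instance, building the earlier pgm.c example without Isomalloc heap interception while printing the parameters passed to the underlying charmc invocations might look like this (a sketch, not a required configuration):
$ ampicc pgm.c -o pgm -O3 -memory default -verbose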
AMPI offers two options for executing an AMPI program: charmrun and ampirun.
The Charm++ distribution contains a script called charmrun that makes the job of running AMPI programs portable and easier across all parallel machines supported by Charm++. charmrun is copied to the directory where an AMPI program is built using ampicc. It takes a command-line parameter specifying the number of processors, followed by the name of the program, AMPI options (such as the number of ranks to create and the stack size of every user-level thread), and the program arguments. A typical invocation of an AMPI program pgm with charmrun is:
$ ./charmrun +p16 ./pgm +vp64
Here, the AMPI program pgm is run on 16 physical processors with 64 total virtual ranks (which will be mapped 4 per processor initially). To run with load balancing, specify a load balancing strategy with the +balancer option. You can also specify the size of each user-level thread's stack using the +tcharm_stacksize option, which can be used to decrease the size of the stack that must be migrated, as in the following example:
$ ./charmrun +p16 ./pgm +vp128 +tcharm_stacksize 32K +balancer RefineLB
For compliance with the MPI standard and simpler execution, AMPI ships with the ampirun script, which is similar to the mpirun provided by other MPI runtimes. As with charmrun, ampirun is copied automatically to the program directory when compiling an application with ampicc. Users with prior MPI experience may find ampirun the simplest way to run AMPI programs.
The basic usage of ampirun is as follows:
$ ./ampirun -np 16 --host h1,h2,h3,h4 ./pgm
This command will create 16 (non-virtualized) ranks and distribute them across the hosts h1-h4.
When using the -vr option, AMPI will create the number of ranks specified by the -np parameter as virtual ranks, and will create only one process per host:
$ ./ampirun -np 16 --host h1,h2,h3,h4 -vr ./pgm
Other options (such as the load balancing strategy) can be specified in the same way as for charmrun:
$ ./ampirun -np 16 ./pgm +balancer RefineLB
Note that for AMPI programs compiled with gfortran, users may need to set the following environment variable to see program output on stdout:
$ export GFORTRAN_UNBUFFERED_ALL=1