Perlmutter @ NERSC

This page only provides HiPACE++-specific instructions. For more information, please visit the Perlmutter documentation.

Log in with ssh <yourid>

Building for GPU

Create a file profile.hipace and source it whenever you log in and want to work with HiPACE++:

# please set your project account
export proj=<your project id>_g  # _g for GPU accounting

# required dependencies
module load cmake/3.22.0
module load cray-hdf5-parallel   # loads the default version; append /<version> to pin one

# necessary to use CUDA-aware MPI and to run a job
export CRAY_ACCEL_TARGET=nvidia80

# optimize CUDA compilation for A100
export AMREX_CUDA_ARCH=8.0

# compiler environment hints
export CC=cc
export CXX=CC
export FC=ftn
export CUDACXX=$(which nvcc)
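Because CUDACXX is set with $(which nvcc), it silently ends up empty when nvcc is not on your PATH, and CMake only fails later. A quick check after sourcing the profile can catch this early. The function below is a hypothetical helper, not part of HiPACE++; the variable names are copied from profile.hipace above.

```bash
# Hypothetical helper (not part of HiPACE++): report variables from
# profile.hipace that are unset or empty. Returns 0 if all are set.
check_hipace_env() {
  missing=0
  for var in proj CRAY_ACCEL_TARGET AMREX_CUDA_ARCH CC CXX FC CUDACXX; do
    # look up the value of the variable whose *name* is stored in $var
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: source profile.hipace && check_hipace_env && echo "environment OK"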

Download HiPACE++ from GitHub (the first time, and whenever you want the latest version):

git clone <HiPACE++ GitHub repository URL> $HOME/src/hipace # or any other path you prefer
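Since the repository is cloned only the first time and updated afterwards, both cases can be wrapped in one helper. This is a hypothetical sketch: get_or_update_hipace is not part of HiPACE++, and the repository URL is passed as an argument rather than hard-coded.

```bash
# Hypothetical helper: clone on first use, otherwise update the existing clone.
# $1 = repository URL, $2 = target directory (defaults to $HOME/src/hipace)
get_or_update_hipace() {
  repo_url=$1
  src_dir=${2:-$HOME/src/hipace}
  if [ -d "$src_dir/.git" ]; then
    # existing clone: fetch upstream and fast-forward only
    git -C "$src_dir" pull --ff-only
  else
    # first time: make sure the parent directory exists, then clone
    mkdir -p "$(dirname "$src_dir")"
    git clone "$repo_url" "$src_dir"
  fi
}
```

Usage: get_or_update_hipace <HiPACE++ GitHub repository URL> "$HOME/src/hipace". The --ff-only flag makes git pull refuse to create a merge commit if your local branch has diverged, so local changes are never merged silently.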

Compile the code using CMake:

source profile.hipace # load the correct modules
cd $HOME/src/hipace   # or where HiPACE++ is installed
rm -rf build
cmake -S . -B build -DHiPACE_COMPUTE=CUDA
cmake --build build -j 16

You can get familiar with the HiPACE++ input file format in our :doc:`../../run/get_started` section and prepare an input file that suits your needs. Then create a directory in your $PSCRATCH, put your input file there, and adapt the following submission script:

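#!/bin/bash -l

#SBATCH -t 01:00:00
#SBATCH -N 2
#    note: <proj> must end on _g
#SBATCH -A <proj>
#SBATCH -q regular
#SBATCH -C gpu
#SBATCH -c 32
#SBATCH --exclusive
#SBATCH --gpu-bind=none
#SBATCH --gpus-per-node=4
#SBATCH -o hipace.o%j
#SBATCH -e hipace.e%j

# path to executable and input script (placeholders: adapt to your setup)
EXE=<path to the hipace executable in your build directory>
INPUTS=<your input file>

# pin to closest NIC to GPU
export MPICH_OFI_NIC_POLICY=GPU

# for GPU-aware MPI use the first line
#GPU_AWARE_MPI="hipace.comms_buffer_on_gpu=1"
GPU_AWARE_MPI=""

# CUDA visible devices are ordered inverse to local task IDs
#   Reference: nvidia-smi topo -m
srun --cpu-bind=cores bash -c "
    export CUDA_VISIBLE_DEVICES=\$((3-SLURM_LOCALID));
    ${EXE} ${INPUTS} ${GPU_AWARE_MPI}" \
  > output.txt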

and use it to submit a simulation. Note that this example runs on 8 GPUs, since -N 2 requests 2 nodes with 4 GPUs each.
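The GPU count multiplies out as nodes times GPUs per node. The script comment stating that CUDA visible devices are ordered inverse to local task IDs can be read as: on a node with 4 GPUs, local task i uses device 3 - i. The two functions below have hypothetical names and are purely illustrative; they only spell out that arithmetic.

```bash
# Total GPUs of a job: nodes * GPUs per node (here -N 2 and --gpus-per-node=4)
total_gpus() {
  echo $(( $1 * $2 ))
}

# Device used by a local task when devices are ordered inverse to local task IDs:
# on a node with G GPUs, local task i uses device G - 1 - i.
gpu_for_local_task() {
  echo $(( $2 - 1 - $1 ))
}

total_gpus 2 4   # prints 8
for i in 0 1 2 3; do
  echo "local task $i -> GPU $(gpu_for_local_task "$i" 4)"
done
```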


Parallel simulations can be accelerated substantially by using GPU-aware MPI. To enable it, set the input parameter hipace.comms_buffer_on_gpu = 1 (see the job script above).

Note that using GPU-aware MPI may require more GPU memory.
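Besides writing hipace.comms_buffer_on_gpu = 1 into the input file, AMReX-based codes such as HiPACE++ typically accept input parameters as extra name=value arguments after the input file on the command line; check that this holds for your version. The sketch below assumes so and uses a hypothetical variable, USE_GPU_AWARE_MPI, to switch the option on or off without editing the input file.

```bash
# Hypothetical switch: USE_GPU_AWARE_MPI=1 adds the GPU-aware MPI parameter,
# anything else (or unset) adds nothing.
gpu_aware_mpi_args() {
  if [ "${USE_GPU_AWARE_MPI:-0}" = "1" ]; then
    echo "hipace.comms_buffer_on_gpu=1"
  fi
}

# e.g. inside the srun command:  ${EXE} ${INPUTS} $(gpu_aware_mpi_args)
```

Keeping the default off is a deliberate choice here, since GPU-aware MPI may need more GPU memory.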