Frank Winter edited this page Oct 30, 2021 · 8 revisions

Building the Chroma software stack with QDP-JIT for NVIDIA or AMD GPUs

1) Getting the sources

Currently recommended branches:

  • JeffersonLab/qmp.git (master)
  • JeffersonLab/qdp-jit.git (devel)
  • JeffersonLab/chroma.git (devel)

Example for software download:

git clone https://github.com/usqcd-software/qmp.git

git clone --recursive --branch devel https://github.com/JeffersonLab/qdp-jit.git

git clone --recursive --branch devel https://github.com/JeffersonLab/chroma.git

2) Building

General remarks:

It is recommended to build all packages (llvm, qmp, qdp-jit, chroma) with the same compiler and compiler version. A C++20-compliant compiler is required unless the propagator optimizations are turned off in qdp-jit. These optimizations let sink smearing, sequential sources and contractions execute much more efficiently. Compiler versions known to work are GCC 10 and 11 and Clang 12 and 13. With the propagator optimizations turned off, a C++14-compliant compiler such as GCC 8 or 9 will work. Vendor-specific compilers like nvcc or hipcc are not needed at this point.

It is recommended to create a separate build directory for each of the packages.
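One possible way to arrange separate build and install directories is sketched below. The directory layout and the STACK_ROOT variable are hypothetical; the *_INSTALL_DIR names simply mirror the placeholders used in the CMake examples on this page.

```shell
# Hypothetical layout: one build and one install directory per package.
STACK_ROOT=${STACK_ROOT:-$PWD/chroma-stack}

for pkg in llvm qmp qdp-jit chroma; do
    mkdir -p "$STACK_ROOT/build/$pkg" "$STACK_ROOT/install/$pkg"
done

# Install prefixes referenced by the CMake examples on this page.
export LLVM_INSTALL_DIR=$STACK_ROOT/install/llvm
export QMP_INSTALL_DIR=$STACK_ROOT/install/qmp
export QDPJIT_INSTALL_DIR=$STACK_ROOT/install/qdp-jit
export CHROMA_INSTALL_DIR=$STACK_ROOT/install/chroma
```

Each cmake command would then be run from inside the matching build directory, for example from $STACK_ROOT/build/qmp when configuring QMP.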

CMake version 3.17 or higher is required.
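The CMake requirement can be checked before configuring anything. The following is only a sketch: the helper name version_ge is hypothetical, and it assumes a sort that supports version ordering with -V (as in GNU coreutils).

```shell
# version_ge A B: succeeds when version A is greater than or equal to B.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required=3.17
if command -v cmake >/dev/null 2>&1; then
    # The first line of the output reads "cmake version X.Y.Z".
    found=$(cmake --version | head -n 1 | awk '{print $3}')
    if version_ge "$found" "$required"; then
        echo "CMake $found is recent enough"
    else
        echo "CMake $found is older than $required"
    fi
else
    echo "cmake not found in PATH"
fi
```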

CMake reads the environment variables CXX and CC, so it is convenient to set them to the appropriate compiler (or compiler wrapper).

Examples:

Multi GPU MPICH:

export CXX=mpicxx

export CC=mpicc

Single GPU/Notebook build:

export CXX=g++

export CC=gcc
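The two cases above can be folded into one snippet that prefers the MPI wrappers when both are found in PATH. This is a sketch, not part of the original instructions; the fallback logic is an assumption, and the choice made here has to agree with how QMP is configured later (QMP_MPI).

```shell
# Prefer the MPI compiler wrappers when both are available,
# otherwise fall back to plain GCC for a single-GPU build.
if command -v mpicxx >/dev/null 2>&1 && command -v mpicc >/dev/null 2>&1; then
    export CXX=mpicxx
    export CC=mpicc
else
    export CXX=g++
    export CC=gcc
fi

echo "Using CXX=$CXX and CC=$CC"
```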

Prerequisites:

  • libxml2: this library is required and is pre-installed at most computing facilities. For a personal notebook installation the required package is usually named libxml2-dev.

2a) Building LLVM (not required when targeting AMD GPUs)

For NVIDIA GPUs a build of LLVM 12 or 13 is required. For AMD GPUs the LLVM 13 build included with ROCm can be used. NB: Building QDP-JIT with LLVM 13 produces warnings about the use of deprecated functions; LLVM 12 does not produce these warnings. There is no known benefit of using LLVM 13 compared to LLVM 12.

The LLVM 12 or 13 sources can be found via the LLVM website (www.llvm.org) and downloaded from GitHub.

Example:

wget https://github.com/llvm/llvm-project/releases/download/llvmorg-12.0.1/llvm-12.0.1.src.tar.xz

wget https://github.com/llvm/llvm-project/releases/download/llvmorg-13.0.0/llvm-13.0.0.src.tar.xz

cmake \
  -DBUILD_SHARED_LIBS="ON" \
  -DLLVM_ENABLE_PROJECTS="llvm" \
  -DLLVM_ENABLE_TERMINFO="OFF" \
  -DLLVM_ENABLE_ZLIB="OFF" \
  -DCMAKE_BUILD_TYPE="Release" \
  -DCMAKE_INSTALL_PREFIX=$LLVM_INSTALL_DIR \
  -DLLVM_TARGETS_TO_BUILD="NVPTX" \
  $LLVMSRC

make -j $(nproc)

make install

2b) Building QMP

For multi-GPU support an MPI implementation and the corresponding compiler wrappers are required.

The CMake option QMP_MPI (ON/OFF) must match this choice: ON when building with MPI, OFF otherwise.

cmake \
  -DCMAKE_INSTALL_PREFIX=$QMP_INSTALL_DIR \
  -DQMP_MPI="ON" \
  $PATH_TO_QMP

make install

2c) Building QDP-JIT

QDP-JIT shares many build options with QDP++, such as:

  • QDP_ND (default 4) "Number of Spacetime Dimension (default 4)"
  • QDP_NC (default 3) "Number of colors (default 3)"
  • QDP_NS (default 4) "Number of spins (default 4)"
  • QDP_PRECISION (default "double") "Base precision"

Choose a base precision that the target GPU handles well: avoid "double" on a GPU with limited double-precision throughput. Most notebook GPUs have severely limited double-precision performance, so the "single" base precision is recommended for typical notebook use. Reductions always execute in double precision, even when the base precision is set to "single".
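As an illustration, the options for a notebook GPU could be collected in a variable before configuring. This is a sketch only: the variable name QDP_OPTS is hypothetical, and the command is printed rather than executed so that it can be reviewed first.

```shell
# Hypothetical option set for a notebook GPU: default ND, NC and NS,
# but "single" as the base precision.
QDP_OPTS="-DQDP_ND=4 -DQDP_NC=3 -DQDP_NS=4 -DQDP_PRECISION=single"

# Print the configure command instead of running it.
# $PATH_TO_QDPJIT stands for the qdp-jit source directory.
echo cmake $QDP_OPTS '$PATH_TO_QDPJIT'
```

The remaining options (backend, install prefix, package locations) would be appended to the same command line.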

The GPU backend is selected with

  • QDP_ENABLE_BACKEND=CUDA or ROCM (default: CUDA)

If the CUDA SDK is installed in a location that CMake is not able to find, the option CUDA_TOOLKIT_ROOT_DIR can be set accordingly.

LLVM 12 is supported by default; when building with LLVM 13 the option QDP_ENABLE_LLVM13=ON is required.
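A build script could derive this option from the installed LLVM version. The sketch below is an illustration under assumptions: the helper name llvm13_flag is hypothetical, llvm-config is assumed to live in $LLVM_INSTALL_DIR/bin, and version 12.0.1 is assumed when it is not found.

```shell
# llvm13_flag VERSION: prints the extra QDP-JIT option needed for LLVM 13
# and nothing for other versions (LLVM 12 is the default).
llvm13_flag() {
    case "$1" in
        13.*) echo "-DQDP_ENABLE_LLVM13=ON" ;;
        *)    echo "" ;;
    esac
}

# Query the LLVM installation if llvm-config is present;
# otherwise assume 12.0.1 for the sake of the example.
if [ -x "${LLVM_INSTALL_DIR:-}/bin/llvm-config" ]; then
    llvm_version=$("$LLVM_INSTALL_DIR/bin/llvm-config" --version)
else
    llvm_version=12.0.1
fi

echo "LLVM $llvm_version: extra option '$(llvm13_flag "$llvm_version")'"
```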

CMake for QDP-JIT searches for the packages QMP and LLVM. Their locations must be set as in the following examples:

-DLLVM_DIR=$LLVM_INSTALL_DIR/lib/cmake/llvm

-DQMP_DIR=$QMP_INSTALL_DIR/lib/cmake/QMP

For compilers without C++20 capabilities, the propagator optimizations can be turned off with:

-DQDP_PROP_OPT=OFF

When building for AMD GPUs with the ROCM_PATH environment variable set appropriately, CMake should be able to locate all ROCm-related libraries, including LLVM and LLD. On some systems, however, the libraries are not readily found and CMake needs some hints (see the example for AMD below).

Example for NVIDIA:

cmake \
  -DQDP_ENABLE_BACKEND=CUDA \
  -DCMAKE_INSTALL_PREFIX=$QDPJIT_INSTALL_DIR \
  -DQMP_DIR=$QMP_INSTALL_DIR/lib/cmake/QMP \
  -DLLVM_DIR=$LLVM_INSTALL_DIR/lib/cmake/llvm \
  -DQDP_ENABLE_LLVM13=ON \
  $PATH_TO_QDPJIT

make -j $(nproc)

make install

Example for AMD:

LLVMDIR=${ROCM_PATH}/llvm

cmake \
  -DQDP_ENABLE_BACKEND=ROCM \
  -DCMAKE_INSTALL_PREFIX=$QDPJIT_INSTALL_DIR \
  -DQMP_DIR=$QMP_INSTALL_DIR/lib/cmake/QMP \
  -DLLVM_DIR=${LLVMDIR}/lib/cmake/llvm \
  -DLLD_DIR=${LLVMDIR}/lib/cmake/lld \
  -DQDP_ENABLE_LLVM13=ON \
  $PATH_TO_QDPJIT
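Before passing the LLVM_DIR and LLD_DIR hints, it can help to check that the directories exist in the ROCm installation. This is a minimal sketch; the fallback location /opt/rocm is an assumption (a common default), not something this page prescribes.

```shell
# /opt/rocm is only a common default and is an assumption here.
ROCM_PATH=${ROCM_PATH:-/opt/rocm}
LLVMDIR=$ROCM_PATH/llvm

# Check the two CMake package directories used as hints in the AMD example.
for pkg in llvm lld; do
    dir=$LLVMDIR/lib/cmake/$pkg
    if [ -d "$dir" ]; then
        echo "found:   $dir"
    else
        echo "missing: $dir"
    fi
done
```

If a directory is reported as missing, the corresponding hint has to point to wherever the ROCm installation actually keeps its LLVM or LLD CMake files.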

2d) Building Chroma

Support for the Clover action (clover term and stouting routines) must be enabled with the option Chroma_ENABLE_JIT_CLOVER=ON.

Example:

cmake \
  -DCMAKE_INSTALL_PREFIX=$CHROMA_INSTALL_DIR \
  -DQDPXX_DIR=$QDPJIT_INSTALL_DIR/lib/cmake/QDPXX \
  -DQMP_DIR=$QMP_INSTALL_DIR/lib/cmake/QMP \
  -DLLVM_DIR=$LLVM_INSTALL_DIR/lib/cmake/llvm \
  -DChroma_ENABLE_JIT_CLOVER=ON \
  $PATH_TO_CHROMA

make -j $(nproc)

make install