
Building KHARMA

Ben Prather edited this page May 31, 2024 · 24 revisions

Prerequisites

First, be sure to check out all of KHARMA's submodules by running

$ git submodule update --init --recursive

This will grab KHARMA's two main dependencies (as well as some incidental things):

  1. The Parthenon AMR framework from LANL (accompanying documentation). Note KHARMA actually uses a fork of Parthenon, see here.

  2. The Kokkos performance-portability library, originally from SNL. Many common Kokkos questions and problems not covered here are answered by the Kokkos wiki and tutorials. Parthenon's developer guide includes a list of the Parthenon-specific wrappers for Kokkos functions.

The dependencies KHARMA needs from the system are the same as Parthenon and Kokkos:

  1. A C++17 compliant compiler with OpenMP (tested with gcc >= 11, Intel icpc/icpx >= 22, nvc++ >= 22.7, clang++ and derivatives >= 12)
  2. An MPI implementation
  3. Parallel HDF5 compiled against this MPI implementation. make.sh can compile this for you.

And optionally

  1. CUDA >= 11.5 and a CUDA-supported C++ compiler

OR

  1. ROCm >= 5.3

OR

  1. The most recent Intel oneAPI release (SYCL/oneAPI support is experimental)

If necessary, KHARMA can also be compiled without MPI; with dedication, you might be able to compile it without HDF5. The results of omitting either are quite useless, though.
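As a quick sanity check before building, a sketch like the following verifies that the pieces above are visible on your PATH. The wrapper names mpicxx and h5pcc are the common defaults for MPI C++ and parallel HDF5 — your system's modules may install differently-named wrappers, so adjust accordingly:

```shell
# Report "found"/"MISSING" for each of the common build tools.
# mpicxx / h5pcc are typical wrapper names, not KHARMA requirements per se.
for tool in c++ mpicxx h5pcc cmake; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found: $tool"
  else
    echo "MISSING: $tool"
  fi
done
```

A MISSING line usually just means the corresponding module has not been loaded yet (or, for h5pcc, that you should pass hdf5 to make.sh as described below).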

Basic Building

KHARMA uses cmake for building, and has a small set of bash scripts to handle loading the correct modules and giving the correct arguments to cmake on specific systems. Configurations for new machines are welcome; existing example scripts are in machines/.

Generally, on systems with a parallel HDF5 module, one can run the following to compile KHARMA (the optional cuda or hip argument targets Nvidia or AMD GPUs, respectively). Note that clean here means to do a clean build, not to clean up an existing one:

./make.sh clean [hip, cuda]

If your system does not have an HDF5 module, KHARMA can attempt to compile one for you. Just add hdf5 to the arguments of make.sh.

./make.sh clean [hip, cuda] hdf5

When switching compilers, architectures, or devices, you may additionally need to add cleanhdf5. So, at worst:

./make.sh clean [hip, cuda] hdf5 cleanhdf5

When using KHARMA to compile HDF5, cmake will print a scary red error message about the HDF5 folder being a subfolder of the source directory. This can be safely ignored, as all build files are still generated successfully. We'll revisit the HDF5 compile if this becomes a real problem.

After compiling successfully once, you generally will not need to specify clean again, avoiding a full recompilation for small changes. After large changes, however, a clean build is the more reliable option.

If you run into issues when compiling, remember to check the "Known Incompatibilities" section of this page, as well as the open issues. If the compile breaks on a supported machine, please open a new issue.

make.sh Options

As mentioned above, there are two additional arguments to make.sh specifying dependencies:

  1. hdf5 will compile a version of HDF5 inline with building KHARMA, using the same compiler and options. This is an easy way to get a compatible and fast HDF5 implementation, at the cost of extra compile time. The HDF5 build may not work on all systems.
  2. nompi will compile without MPI support, for running on a single GPU or CPU.

There are several more useful options:

  1. debug will enable the DEBUG flag in the code and, more importantly, enable bounds-checking in all Kokkos arrays. This is useful for tracking down inexplicable behavior and segfaults. Note, however, that most KHARMA checks, prints, and debugging output are actually enabled at runtime, under the <debug> section of the input deck.
  2. trace will print each part of a step to stderr as it is being run (technically, anywhere with a Flag() call in the code). This is useful for pinning down where segfaults are occurring, without manually bisecting the whole code with print statements.
  3. noimplicit will skip compiling the implicit solver. This is really only useful if an update to Parthenon/Kokkos breaks something in the kokkos-kernels.
  4. nocleanup will skip compiling the B field cleanup/simulation resizing support. This is useful if a Parthenon update breaks the BiCGStab solver.

The most up-to-date option listing can be found at the top of the make.sh source. Machine files may provide additional options (usually at least an option to choose the compiler stack, e.g. gcc, icc, etc.) -- read the relevant machine file for those.

Writing Machine Files

TODO

Optimization

The build script make.sh tries to guess an architecture when compiling, defaulting to code which will be reasonably fast on modern machines. However, you can manually specify a host and/or device architecture. For example, when compiling for CUDA:

PREFIX_PATH=/absolute/path/to/phdf5 HOST_ARCH=CPUVER DEVICE_ARCH=GPUVER ./make.sh clean cuda

Where CPUVER and GPUVER are the strings used by Kokkos to denote a particular architecture & set of compile flags, e.g. "SKX" for Skylake-X, "HSW" for Haswell, or "AMDAVX" for Ryzen/EPYC processors, and VOLTA70, TURING75, or AMPERE80 for Nvidia GPUs. A list of a few common architecture strings is provided in make.sh, and a full (usually) up-to-date list is kept in the Kokkos documentation. (Note make.sh needs only the portion of the flag after Kokkos_ARCH_).
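To illustrate the naming: the full Kokkos CMake flag for an A100 GPU is Kokkos_ARCH_AMPERE80, but make.sh takes just the suffix. A trivial sketch of the mapping:

```shell
# make.sh wants only the portion of the Kokkos flag after "Kokkos_ARCH_":
flag=Kokkos_ARCH_AMPERE80
echo "${flag#Kokkos_ARCH_}"   # the suffix is what goes in DEVICE_ARCH
```

So for an AMD node one might run, e.g., HOST_ARCH=ZEN3 DEVICE_ARCH=VEGA90A ./make.sh clean hip -- ZEN3 and VEGA90A being Kokkos's names for Zen 3 CPUs and MI200-series GPUs; check the Kokkos documentation for the strings matching your hardware.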

CUDA-aware MPI

If deploying KHARMA to a machine with GPUs, be careful that the MPI stack you use is CUDA-aware -- this allows direct communication from GPUs to the network without involving CPU and RAM, which is much faster. There are notes for particular systems on the machines page.
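How to verify CUDA-awareness depends on the MPI stack. With OpenMPI, for example, ompi_info reports it; this sketch guards against ompi_info being absent (other stacks, e.g. Cray MPICH, expose the equivalent through their own environment variables and docs):

```shell
# OpenMPI only: the cuda_support line reports "true" on a CUDA-aware build.
if command -v ompi_info >/dev/null 2>&1; then
  ompi_info --parsable --all | grep -i cuda_support || echo "no cuda_support line found"
else
  echo "ompi_info not found (not an OpenMPI stack?)"
fi
```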

Container

In the past, there has been a KHARMA container, which avoids compiling the code at all. It is only updated for stable KHARMA versions, so your mileage may vary if you need to work closely with the code.

To pull & use the container:

$ singularity pull docker://registry.gitlab.com/afd-illinois/kharma:dev
$ wget https://raw.githubusercontent.com/AFD-Illinois/kharma/stable/pars/orszag_tang.par
$ singularity run kharma_dev.sif /app/run.sh -i orszag_tang.par

The container/registry itself is OCI compliant, so it'll work with docker, podman, etc too (just remember to give the container access to the parameter file!). The version of KHARMA in /app inside the container is difficult to modify and recompile -- however, the container can also be used to compile and run a version cloned from the git repository:

$ git clone https://github.com/AFD-Illinois/kharma.git
$ cd kharma
$ git submodule update --init --recursive
$ singularity shell /path/to/kharma_dev.sif
Singularity> PREFIX_PATH=/usr/lib64/mpich EXTRA_FLAGS="-DPARTHENON_DISABLE_HDF5_COMPRESSION=ON" ./make.sh clean
Singularity> ./run.sh -i pars/orszag_tang.par

Troubleshooting

Known Incompatibilities

Generally, the compiler versions present on modern supercomputers or operating systems will work fine to compile KHARMA, but be careful using older compilers and software. Here's an incomplete list of known bad combinations:

  • When compiling with CUDA 11, there can be an internal nvcc error PHINode should have one entry for each predecessor of its parent basic block!. CUDA 12 does not show this issue.
  • If you attempt to compile KHARMA with a version of CUDA before 11.2, nvcc will crash during compilation with the error: Error: Internal Compiler Error (codegen): "there was an error in verifying the lgenfe output!" (see the relevant Kokkos bug). This is a bug in nvcc's support of constexpr in C++17, fixed in 11.2. This appears to be independent of which host compiler is used, but be aware that on some systems, the compiler choice affects which CUDA version is loaded.
  • CentOS 7 and derivatives ship an extremely old default version of gcc and libstdc++. If possible on such machines, load a newer gcc as a module, which might bring with it a more recent standard library as well (other compilers, such as icc or clang, rely on the system version of libstdc++, and thus even new versions of these compilers may have trouble compiling KHARMA on old operating systems).
  • GCC version 7.3.0 exactly has a bug making it incapable of compiling a particular Parthenon function, fixed in 7.3.1 and 8+. It is for unfathomable reasons very widely deployed as the default compiler on various machines, but if any other stack is available it should be preferred. Alternatively, the function contents can be commented out, as it isn't necessary in order to compile KHARMA.
  • NVHPC toolkit versions prior to 23.1 can have one of two issues: 21.3 to 21.7 have trouble compiling Parthenon's C++14 constructs, and 21.9 through 22.11 may try to import a header, pmmintrinsics.h, which they cannot compile. The latter is uncommon, but the newest available NVHPC is always preferred.
  • Generally only the very most recent release of Intel oneAPI is "supported," which is to say, has any chance of compiling KHARMA. SYCL is still a moving target and impossible to really support without access to working hardware.
  • IBM XLC is unsupported in modern versions of KHARMA. YMMV with XLC's C++17 support, check out older KHARMA releases if you need to revert to C++14.
  • The mpark::variant submodule in external/variant/ often requires patches to compile on devices (HIP/CUDA/SYCL). These should be automatically applied, but check external/patches/ for the relevant patch if you encounter errors compiling it.

"Can't fork" errors

KHARMA uses a lot of resources per process, and by default uses a lot of processes to compile (NPROC in make.sh or machine files, which defaults to the total number of threads present on the system). This is generally fine for workstations and single nodes, however on popular login nodes or community machines you might see the following (e.g. on Frontera):

...
icpc: error #10103: can't fork process: Resource temporarily unavailable
make[2]: *** [external/parthenon/src/CMakeFiles/parthenon.dir/driver/multistage.cpp.o] Error 1
...

This means that make can't fork new compile processes, which of course ruins the compile. You can find a less popular node (e.g. with a development job), or turn down the NPROC variable at the top of make.sh, or wait until the node is not so in-demand.
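To sketch what "turning down NPROC" means: make.sh's default mirrors the node's total thread count, and on a shared login node you would edit the variable down to something modest. The cap of 8 below is an arbitrary example, not a KHARMA recommendation:

```shell
# make.sh's default parallelism is roughly the node's thread count:
NPROC=$(nproc)
echo "threads on this node: $NPROC"
# On a busy shared node, cap it (8 here is an arbitrary example value):
if [ "$NPROC" -gt 8 ]; then NPROC=8; fi
echo "capped NPROC: $NPROC"
```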

Text file busy

When compiling a version of KHARMA that is currently running, the OS will (sensibly) refuse to replace the binary file kharma.x in use by the running program. The error is usually something like cp: cannot create regular file '../kharma.host': Text file busy. To fix this, invoke make.sh again once the run has finished or been stopped (or manually run cp build/kharma/kharma.host .).
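The error is easy to reproduce in isolation. This sketch uses a throwaway copy of the sleep binary as a stand-in for kharma.x, overwriting it while it runs:

```shell
# Reproduce "Text file busy" with a disposable binary standing in for kharma.x.
cp "$(command -v sleep)" ./busy_demo
./busy_demo 3 &                                      # "run" the binary
cp "$(command -v sleep)" ./busy_demo 2>&1 || true    # overwrite fails: Text file busy
wait                                                 # after it exits...
cp "$(command -v sleep)" ./busy_demo                 # ...the copy succeeds
rm -f ./busy_demo
```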