Hydrogen is a fork of Elemental used by LBANN. Hydrogen is a redux of the Elemental functionality that has been ported to make use of GPGPU accelerators. The supported functionality is essentially the core infrastructure plus BLAS-1 and BLAS-3.
Hydrogen builds with a CMake (version 3.9.0 or
newer) build system. The build system respects the "normal" CMake
variables (CMAKE_BUILD_TYPE, etc.) in addition to the Hydrogen-specific options
documented below.
The most basic build of Hydrogen requires only:
CMake: Version 3.9.0 or newer.
A C++11-compliant compiler.
An MPI 3.0-compliant MPI library.
BLAS: Provides basic linear algebra kernels for the CPU code path.
LAPACK: Provides a few utility functions (e.g., norms and 2D copies). This could be demoted to "optional" status with little effort.
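The CMake version floor listed above can be checked before configuring. The following is a minimal sketch and not part of Hydrogen: the helper name version_ge is hypothetical, and it assumes a sort that supports -V (version sort), as GNU coreutils does.

```shell
# Hypothetical helper (not part of Hydrogen): succeeds when version $1 >= $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Compare the installed CMake, if any, against the 3.9.0 minimum stated above.
if command -v cmake >/dev/null 2>&1; then
    found=$(cmake --version | head -n1 | awk '{print $3}')
    if version_ge "${found}" 3.9.0; then
        echo "CMake ${found} meets the 3.9.0 minimum"
    else
        echo "CMake ${found} is older than 3.9.0" >&2
    fi
else
    echo "cmake not found on PATH" >&2
fi
```

The same helper could be reused for other version floors in this document, such as the CUDA 9.2 minimum, given a way to obtain the installed version string.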
Optional dependencies of Hydrogen include:
Aluminum: Provides asynchronous blocking and non-blocking communication routines with an MPI-like syntax. The use of Aluminum is highly recommended.
CUDA: Version 9.2 or newer. Hydrogen primarily uses the runtime API and also grabs some features of NVML and NVPROF (if enabled).
CUB: Version 1.8.0 is recommended. This will become required for CUDA-enabled builds in the very near future.
Half: Provides IEEE-754 16-bit precision support. (Note: This is a work in progress.)
OpenMP: OpenMP 3.0 is probably sufficient for the limited use of the features in Hydrogen.
VTune: Proprietary profiler from Intel. May provide more detailed annotations to profiles of Hydrogen CPU code.
Hydrogen CMake options
Some of the options are inherited from Elemental and renamed with the
Hydrogen_ prefix. Others are unique to Hydrogen. Supported options are:
OFF): There is a very small amount of logic to try to detect CUDA-aware MPI (it should not give a false positive but is likely to give a false negative). This option causes the library to ignore this and assume the MPI library is not CUDA-aware.
OFF): Enable the Aluminum library for asynchronous device-aware communication. The use of this library is highly recommended for CUDA-enabled builds.
OFF): Enable CUDA support in the library. This enables the device type
El::Device::GPU and allows memory to reside on CUDA-aware GPGPUs.
Hydrogen_ENABLE_CUDA): Only available if CUDA is enabled. This enables device memory management through a memory pool using CUB.
OFF): Enable IEEE-754 "binary16" 16-bit precision floating point support through the Half library.
OFF): This option is a placeholder. This will enable support for "bfloat16" 16-bit precision floating point arithmetic if/when that becomes a thing.
long as the default signed integer type within Hydrogen.
long as the default signed integer type for interacting with BLAS libraries.
ON): Build the test suite.
OFF): Initialize buffers to zero by default. There will obviously be a compute-time overhead.
OFF): Enable library annotations using the
nvtx interface in CUDA.
OFF): Enable library annotations for use with Intel's VTune performance profiler.
OFF): Synchronize computation at the beginning of profiling regions.
OFF): Enable OpenMP on-node parallelization primitives. OpenMP is used for CPU parallelization only; the device offload features of modern OpenMP are not used.
omp taskloop instead of
omp parallel for. This is a highly experimental feature. Use with caution.
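The dependency between the CUDA and CUB options described above can be captured when assembling a configure line. The following is a minimal sketch under stated assumptions: the function name hydrogen_cmake_args is hypothetical, and only the three option names that appear verbatim in this document are used.

```shell
# Hypothetical helper (not part of Hydrogen): print one CMake argument per
# line for the requested configuration.
#   $1: ON or OFF, passed to Hydrogen_ENABLE_CUDA
#   $2: ON or OFF, passed to Hydrogen_ENABLE_ALUMINUM
hydrogen_cmake_args() {
    enable_cuda=$1
    enable_aluminum=$2
    printf '%s\n' "-DCMAKE_BUILD_TYPE=Release"
    printf '%s\n' "-DHydrogen_ENABLE_CUDA=${enable_cuda}"
    # The CUB option is only available when CUDA is enabled, so it is
    # emitted only in that case.
    if [ "${enable_cuda}" = "ON" ]; then
        printf '%s\n' "-DHydrogen_ENABLE_CUB=ON"
    fi
    printf '%s\n' "-DHydrogen_ENABLE_ALUMINUM=${enable_aluminum}"
}

# Show the arguments for a CPU-only build without running CMake.
hydrogen_cmake_args OFF OFF

# A real configure step could then look like this (left commented out):
# cmake $(hydrogen_cmake_args ON ON) /path/to/hydrogen
```

Printing the arguments rather than invoking cmake directly keeps the sketch safe to run anywhere and makes the chosen configuration easy to review first.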
The following options are legacy options inherited from Elemental. The related functionality is not tested regularly. In practice, this likely means that nothing specific to these options has been removed from what remains of Elemental, but also that nothing specific to them has been added to any of the new features of Hydrogen.
OFF): Search for
valgrind and enable related features if found.
OFF): Search for the
quadmath library and enable related features if found. This is for extended-precision computations.
OFF): Search for the
QD library and enable related features if found. This is for extended-precision computations.
OFF): Search for the GNU MPC library (requires MPFR and GMP as well) and enable related features if found. This is for extended precision.
OFF): Avoid MPI_Alltoallv for performance reasons.
OFF): Avoid potentially buggy complex MPI routines.
OFF): Avoid BG/P allgather performance bug.
OFF): Warns when using cache-unfriendly routines.
OFF): Warn when performing unaligned redistributions.
OFF): Warn when vector redistribution chances are missed.
Example CMake invocation
The following builds a CUDA-enabled, CUB-enabled, Aluminum-enabled version of Hydrogen:
    cmake -GNinja \
        -DCMAKE_BUILD_TYPE=Release \
        -DBUILD_SHARED_LIBS=ON \
        -DCMAKE_INSTALL_PREFIX=/path/to/my/install \
        -DHydrogen_ENABLE_CUDA=ON \
        -DHydrogen_ENABLE_CUB=ON \
        -DHydrogen_ENABLE_ALUMINUM=ON \
        -DCUB_DIR=/path/to/cub \
        -DAluminum_DIR=/path/to/aluminum \
        /path/to/hydrogen
    ninja install
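After configuring, it can be useful to confirm which Hydrogen options were actually recorded. The following is a minimal sketch that reads the CMakeCache.txt file CMake writes into the build directory; the function name list_hydrogen_options is hypothetical, and it assumes the options are stored as cache entries beginning with Hydrogen_.

```shell
# Hypothetical helper (not part of Hydrogen): print the Hydrogen_* entries
# recorded in the CMake cache of the given build directory.
#   $1: path to the build directory that contains CMakeCache.txt
list_hydrogen_options() {
    cache_file="$1/CMakeCache.txt"
    if [ ! -f "${cache_file}" ]; then
        echo "no CMakeCache.txt found in $1" >&2
        return 1
    fi
    # Cache entries have the form NAME:TYPE=VALUE, one per line.
    grep '^Hydrogen_' "${cache_file}"
}
```

For example, running list_hydrogen_options on the directory where the cmake command above was invoked would be expected to list the Hydrogen_ENABLE_CUDA, Hydrogen_ENABLE_CUB, and Hydrogen_ENABLE_ALUMINUM entries with their configured values.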
Issues should be reported on GitHub.