The HDF5 DAOS VOL connector is a Virtual Object Layer (VOL) connector for HDF5 that interfaces directly with the Distributed Asynchronous Object Storage (DAOS) system, bypassing both MPI I/O and POSIX for efficient and scalable I/O. It removes the limitations of the native HDF5 file format and enables new features such as independent creation of objects in parallel, key-value store objects, data recovery, and asynchronous I/O.
Applications already using HDF5 can, with this VOL connector and a DAOS-enabled system, benefit from some of these features with minimal code changes. The connector is built as a plugin library that is external to HDF5, meaning that it must be dynamically loaded in the same fashion as HDF5 filter plugins.
Below is a set of instructions for a minimal installation of the DAOS VOL connector on a DAOS-enabled system.
To build the DAOS VOL connector, the following libraries are required:
- libhdf5 - The HDF5 library. Minimum version required is 1.14.0, compiled with support for both parallel I/O and map objects (i.e., the -DHDF5_ENABLE_MAP_API=ON CMake option).
- libdaos - The DAOS library. Minimum version required is 1.3.106-tb.
- libuuid - UUID support.
Compiled libraries must either exist in the system's library paths or must be pointed to during the DAOS VOL connector build process.
The HDF5 DAOS VOL connector is built using CMake. CMake version 2.8.12.2 or greater is required for building the connector itself, but version 3.1 or greater is required to build the connector's tests.
If you install the full sources, put the tarball in a directory where you have permissions (e.g., your home directory) and unpack it:
gzip -cd hdf5_vol_daos-X.tar.gz | tar xvf -
or
bzip2 -dc hdf5_vol_daos-X.tar.bz2 | tar xvf -
Replace 'X' with the version number of the package.
After obtaining the connector's source code, you can create a build directory within the source tree and run the ccmake or cmake command from it:
cd hdf5_vol_daos-X
mkdir build
cd build
ccmake ..
If using ccmake, type 'c' multiple times and choose suitable options; if using cmake, pass these options with -D. Some of these options may be needed if, for example, the required components mentioned previously are not located in default paths.
Setting include directory and library paths may require you to toggle to the advanced mode by typing 't'. Once you are done and do not see any errors, type 'g' to generate makefiles. Once you exit the CMake configuration screen and are ready to build the targets, do:
make
Verbose build output can be generated by appending VERBOSE=1 to the make command.
Assuming that the CMAKE_INSTALL_PREFIX has been set and that you have write permissions to the destination directory, you can install the connector by simply doing:
make install
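Because the connector is loaded dynamically, in the same way as a filter plugin, HDF5 still has to be told where to find it at run time. The following is a minimal sketch of that step, not a definitive recipe: HDF5_PLUGIN_PATH and HDF5_VOL_CONNECTOR are the standard HDF5 environment variables for locating and selecting a VOL plugin, while the install prefix, the lib subdirectory, and the connector name daos are assumptions to be checked against the User's Guide.

```shell
# Hypothetical install prefix; use the value you gave to CMAKE_INSTALL_PREFIX.
PREFIX="${PREFIX:-/usr/local}"

# Tell HDF5 where to search for dynamically loaded plugins.
# Assumption: the connector library was installed under $PREFIX/lib.
export HDF5_PLUGIN_PATH="$PREFIX/lib"

# Select the VOL connector by name (assumed here to be "daos").
export HDF5_VOL_CONNECTOR=daos

echo "plugin path: $HDF5_PLUGIN_PATH, connector: $HDF5_VOL_CONNECTOR"
```

With both variables exported, an unmodified HDF5 application started from the same shell should pick up the connector without code changes.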
- CMAKE_INSTALL_PREFIX - This option controls the install directory that the resulting output files are written to. The default value is /usr/local.
- CMAKE_BUILD_TYPE - This option controls the type of build used for the VOL connector. Valid values are Release, Debug, RelWithDebInfo, MinSizeRel, Ubsan, Asan; the default build type is RelWithDebInfo.
- BUILD_TESTING - This option is used to enable/disable building of the DAOS VOL connector's tests. The default value is ON.
- BUILD_EXAMPLES - This option is used to enable/disable building of the DAOS VOL connector's HDF5 examples. The default value is OFF.
- HDF5_C_COMPILER_EXECUTABLE - This option controls the HDF5 compiler wrapper script used by the VOL connector build process. It should be set to the full path to the HDF5 compiler wrapper (usually bin/h5cc), including the name of the wrapper script. The following two options may also need to be set.
- HDF5_C_LIBRARY_hdf5 - This option controls the HDF5 library used by the VOL connector build process. It should be set to the full path to the HDF5 library, including the library's name (e.g., /path/libhdf5.so). Used in conjunction with the HDF5_C_INCLUDE_DIR option.
- HDF5_C_INCLUDE_DIR - This option controls the HDF5 include directory used by the VOL connector build process. Used in conjunction with the HDF5_C_LIBRARY_hdf5 option.
- DAOS_LIBRARY - This option controls the DAOS library used by the VOL connector build process. It should be set to the full path to the DAOS library, including the library's name (e.g., /path/libdaos.so). Used in conjunction with the DAOS_UNS_LIBRARY and DAOS_INCLUDE_DIR options.
- DAOS_UNS_LIBRARY - This option controls the DAOS unified namespace library used by the VOL connector build process. It should be set to the full path to the DAOS libduns library, including the library's name (e.g., /path/libduns.so). Used in conjunction with the DAOS_LIBRARY and DAOS_INCLUDE_DIR options.
- DAOS_INCLUDE_DIR - This option controls the DAOS include directory used by the VOL connector build process. Used in conjunction with the DAOS_LIBRARY and DAOS_UNS_LIBRARY options.
- MPI_C_COMPILER - This option controls the MPI C compiler used by the VOL connector build process. It should be set to the full path to the MPI C compiler, including the name of the executable.
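When configuring with cmake rather than ccmake, the options above are passed on the command line with -D. The sketch below assembles such a command line; every path in it (HDF5_ROOT, DAOS_ROOT, INSTALL_DIR, and the lib64 subdirectory) is a hypothetical placeholder, and only the option names come from the list above. It prints the command instead of running it, so the paths can be reviewed first.

```shell
# Hypothetical locations; replace with the paths on your system.
HDF5_ROOT="${HDF5_ROOT:-$HOME/opt/hdf5}"
DAOS_ROOT="${DAOS_ROOT:-/opt/daos}"
INSTALL_DIR="${INSTALL_DIR:-$HOME/opt/hdf5_vol_daos}"

# Assemble the -D options described in the list above.
CMAKE_OPTS="-DCMAKE_INSTALL_PREFIX=$INSTALL_DIR"
CMAKE_OPTS="$CMAKE_OPTS -DCMAKE_BUILD_TYPE=RelWithDebInfo"
CMAKE_OPTS="$CMAKE_OPTS -DBUILD_TESTING=ON"
CMAKE_OPTS="$CMAKE_OPTS -DHDF5_C_COMPILER_EXECUTABLE=$HDF5_ROOT/bin/h5cc"
CMAKE_OPTS="$CMAKE_OPTS -DDAOS_LIBRARY=$DAOS_ROOT/lib64/libdaos.so"
CMAKE_OPTS="$CMAKE_OPTS -DDAOS_UNS_LIBRARY=$DAOS_ROOT/lib64/libduns.so"
CMAKE_OPTS="$CMAKE_OPTS -DDAOS_INCLUDE_DIR=$DAOS_ROOT/include"

# Show the resulting command; run it from the build directory
# once the paths match your installation.
echo "cmake $CMAKE_OPTS .."
```

Options left out here, such as HDF5_C_LIBRARY_hdf5 or MPI_C_COMPILER, can be appended in the same way if the defaults found by CMake are not the ones you want.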
In the connector, each chunk is stored in a different DAOS dkey, and data in a single dkey is stored in a single DAOS storage target. Therefore, splitting the data into different chunks stripes the data across different dkeys and different storage targets. This improves I/O performance by allowing DAOS to read from or write to multiple storage targets at once.
The bandwidth improvement from using different storage targets is so significant that, if H5Pset_chunk() is not used (i.e., the dataset is contiguous), the connector automatically sets a chunk size. By default, the connector tries to size these chunks to approximately 1 MiB. The environment variable HDF5_DAOS_CHUNK_TARGET_SIZE (in bytes) sets the chunk target size. Setting this variable to 0 disables automatic chunking, so contiguous datasets stay contiguous (and are therefore stored on a single storage target). Better performance may be obtained by choosing a larger chunk target size, such as 4-8 MiB.
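As an illustration of the chunk target size setting, the sketch below requests chunks of about 4 MiB and estimates how many chunks (and therefore dkeys) a contiguous dataset would be split into. The 256 MiB dataset size and the helper variable names are hypothetical, and the count is only approximate: the connector chooses the actual chunk shape itself, so real chunks may not match the target exactly.

```shell
# Ask the connector to aim for chunks of about 4 MiB (the value is in bytes).
export HDF5_DAOS_CHUNK_TARGET_SIZE=$((4 * 1024 * 1024))
echo "chunk target: $HDF5_DAOS_CHUNK_TARGET_SIZE bytes"

# Rough estimate for a hypothetical 256 MiB contiguous dataset,
# rounding up so a partial final chunk is counted.
DATASET_BYTES=$((256 * 1024 * 1024))
APPROX_CHUNKS=$(( (DATASET_BYTES + HDF5_DAOS_CHUNK_TARGET_SIZE - 1) / HDF5_DAOS_CHUNK_TARGET_SIZE ))
echo "approximate chunks (dkeys): $APPROX_CHUNKS"

# To disable automatic chunking instead, set the variable to 0:
# export HDF5_DAOS_CHUNK_TARGET_SIZE=0
```

The variable must be set in the environment of the HDF5 application before it runs.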
For further information on how to use the DAOS VOL connector with an HDF5 application, as well as how to test that the VOL connector is functioning properly, please refer to the DAOS VOL User's Guide under docs/users_guide.pdf.
Design documentation for the DAOS VOL can be found under docs/design_doc.pdf.
Journal paper:
- J. Soumagne, J. Henderson, M. Chaarawi, N. Fortner, S. Breitenfeld, S. Lu, D. Robinson, E. Pourmal, J. Lombardi, "Accelerating HDF5 I/O for Exascale Using DAOS," in IEEE Transactions on Parallel and Distributed Systems, vol. 33, no. 4, pp. 903-914, April 2022.
DAOS installation and usage instructions can be found on the DAOS website: https://docs.daos.io/