An accelerated version of the
mppnccombine post-processing tool for MOM
Uses HDF5's raw IO functions to speed up collating large datasets. For a 0.1 degree model, collating a compressed variable takes 4 hours with mppnccombine, compared with 6 minutes with mppnccombine-fast running with 16 processes.
mppnccombine-fast requires HDF5 version 1.10.2 or above
On Raijin (this will automatically load the modules):
mpirun -n 2 ./mppnccombine-fast --output out.nc input.nc.0000 input.nc.0001 input.nc.0002
Files will be collated along all axes with a
At least 2 MPI ranks need to be used (rank 0 writes the output file, other ranks read). More can be used - input files will be balanced between the MPI ranks.
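As a rough illustration of how input files might be balanced between ranks, the sketch below deals files round-robin to the reader ranks, keeping rank 0 free for writing. The round-robin policy and the helper names `reader_rank_for_file` and `files_for_rank` are assumptions made for this example; the tool's actual balancing strategy may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: which rank reads input file `file_index`?
 * Rank 0 is reserved for writing, so files are dealt round-robin
 * to ranks 1 .. n_ranks-1. */
int reader_rank_for_file(size_t file_index, int n_ranks)
{
    assert(n_ranks >= 2);  /* one writer plus at least one reader */
    size_t n_readers = (size_t)(n_ranks - 1);
    return 1 + (int)(file_index % n_readers);
}

/* Hypothetical helper: how many of the `n_files` inputs does `rank`
 * end up reading under the round-robin assignment above? */
size_t files_for_rank(int rank, size_t n_files, int n_ranks)
{
    assert(n_ranks >= 2 && rank >= 0 && rank < n_ranks);
    if (rank == 0)
        return 0;  /* the writer reads no input files */
    size_t n_readers = (size_t)(n_ranks - 1);
    size_t reader_index = (size_t)(rank - 1);
    /* Every reader gets the base share; the first few get one extra. */
    return n_files / n_readers + (reader_index < n_files % n_readers ? 1 : 0);
}
```

With 4 ranks and 4 input files, for example, this assigns files 0 and 3 to rank 1, file 1 to rank 2 and file 2 to rank 3.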
The main slowdown in copying compressed variables is that the HDF5 library has
to de-compress them during the read, and re-compress them during the write.
mppnccombine-fast works around this by using the direct chunk IO functions in
HDF5 1.10.2 to copy the compressed data from one file to the other directly,
rather than going through the de-compress/re-compress cycle.
Since the NetCDF4 library is much nicer to use, but doesn't provide public access to the underlying HDF5 file, we need to do a bit of musical chairs with the files, opening and closing them with each library in turn:
- Open the output file and the first input file in NetCDF mode
- Copy NetCDF metadata and un-collated variables using the NetCDF library
- Close the NetCDF files
- Open the output file in HDF5 mode
- For each input file:
  - Open the input file in NetCDF mode
  - Get the collated variables, sizes and offsets
  - Re-open the input file in HDF5 mode
  - Do a raw copy of the variables from the input to output files
  - Close the input file
- Close the output file
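The "sizes and offsets" step can be sketched as follows: each input file holds one tile of the global array, so a chunk's position in the output is its position within the tile plus the tile's starting index. The sketch also checks a condition that a raw copy plausibly needs, namely that input and output use the same chunk shape and that the tile starts on a chunk boundary of the output. The function names and the exact alignment rule are assumptions made for illustration, not taken from the tool's source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: translate a chunk offset inside one input file
 * into an offset in the collated output. `tile_start` is the 0-based
 * index at which this file's tile begins in the global array, with
 * one entry per dimension. */
void output_chunk_offset(size_t ndims, const size_t *tile_start,
                         const size_t *local_offset, size_t *out_offset)
{
    for (size_t d = 0; d < ndims; ++d)
        out_offset[d] = tile_start[d] + local_offset[d];
}

/* Hypothetical check: a compressed chunk can only be copied without
 * re-compression if it maps onto exactly one chunk of the output, so
 * the chunk shapes must match and the tile must start on an output
 * chunk boundary in every dimension. */
bool can_copy_raw(size_t ndims, const size_t *tile_start,
                  const size_t *in_chunk, const size_t *out_chunk)
{
    for (size_t d = 0; d < ndims; ++d) {
        assert(out_chunk[d] > 0);
        if (in_chunk[d] != out_chunk[d])
            return false;
        if (tile_start[d] % out_chunk[d] != 0)
            return false;
    }
    return true;
}
```

When `can_copy_raw` returns false, a tool would presumably have to fall back to an ordinary read and write that goes through the compression filters.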
To get an even larger speedup, MPI is used to run separate read and write processes, since HDF5 IO calls are blocking.
The communication between the read and write processes is handled by the file
async.c - the writer process runs a busy loop waiting for messages from the
reader processes, then handles messages as they come in. Individual reader
processes can be sending different variables at the same time.
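The shape of such a writer loop can be sketched without MPI by replacing the message queue with an in-memory array, so that the example runs standalone. The message kinds, the struct layout and the `run_writer` function are all hypothetical; the real protocol in async.c may look quite different, and in an MPI program the polling step would typically be a call such as `MPI_Iprobe` followed by `MPI_Recv`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message kinds; the real async.c protocol may differ. */
enum msg_kind { MSG_WRITE_CHUNK, MSG_READER_DONE };

struct message {
    enum msg_kind kind;
    int source_rank;  /* reader that sent the message */
    int variable_id;  /* variable the chunk belongs to */
};

struct writer_stats {
    size_t chunks_written;
    int readers_done;
};

/* Stand-in for the writer loop. `inbox` plays the role of the MPI
 * message queue; messages from different readers and for different
 * variables may be interleaved in any order. The loop ends once every
 * reader has reported that it is finished. */
struct writer_stats run_writer(const struct message *inbox, size_t n_msgs,
                               int n_readers)
{
    assert(n_readers > 0);
    struct writer_stats stats = {0, 0};
    size_t next = 0;
    while (stats.readers_done < n_readers) {
        if (next >= n_msgs)
            break;  /* simulated queue is empty; real code would keep polling */
        const struct message *m = &inbox[next++];
        switch (m->kind) {
        case MSG_WRITE_CHUNK:
            /* Real code would write the chunk for m->variable_id here. */
            stats.chunks_written++;
            break;
        case MSG_READER_DONE:
            stats.readers_done++;
            break;
        }
    }
    return stats;
}
```

Handling each message as it arrives, rather than draining one reader at a time, is what lets several readers make progress on different variables at once.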