Parallel Compression improvements #1302

Merged
merged 43 commits into HDFGroup:develop from parallel_filters on Feb 24, 2022

Conversation

jhendersonHDF (Collaborator)

No description provided.

char global_no_coll_cause_string[512];

if (H5D__mpio_get_no_coll_cause_strings(local_no_coll_cause_string, 512,
                                        global_no_coll_cause_string, 512) < 0)
jhendersonHDF (Collaborator, Author):

Moved most of this code into a new function that builds strings describing the reasons collective I/O was broken, so the code can be reused elsewhere.
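
The new helper is internal; for illustration only, here is a minimal sketch of the same kind of cause-to-string translation done with the public H5Pget_mpio_no_collective_cause() API (the helper name below is hypothetical, not the PR's function):

```c
#include <hdf5.h>
#include <stdio.h>

/* Hypothetical illustration: translate the no-collective-cause flags reported
 * by H5Pget_mpio_no_collective_cause() into readable strings. The internal
 * H5D__mpio_get_no_coll_cause_strings() added in this PR serves a similar
 * purpose but has a different interface. */
static void
print_no_coll_cause(uint32_t cause)
{
    if (cause == H5D_MPIO_COLLECTIVE)
        printf("collective I/O was performed\n");
    if (cause & H5D_MPIO_SET_INDEPENDENT)
        printf("independent I/O was requested\n");
    if (cause & H5D_MPIO_DATATYPE_CONVERSION)
        printf("datatype conversion was required\n");
    if (cause & H5D_MPIO_DATA_TRANSFORMS)
        printf("data transforms were required\n");
    if (cause & H5D_MPIO_NOT_SIMPLE_OR_SCALAR_DATASPACES)
        printf("dataspaces were not simple or scalar\n");
    if (cause & H5D_MPIO_NOT_CONTIGUOUS_OR_CHUNKED_DATASET)
        printf("dataset was not contiguous or chunked\n");
}

/* Typical usage after a parallel H5Dread()/H5Dwrite() with transfer plist dxpl_id:
 *
 *     uint32_t local_cause = 0, global_cause = 0;
 *     H5Pget_mpio_no_collective_cause(dxpl_id, &local_cause, &global_cause);
 *     print_no_coll_cause(global_cause);
 */
```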

*-------------------------------------------------------------------------
*/
herr_t
H5D_select_io_mem(void *dst_buf, const H5S_t *dst_space, const void *src_buf, const H5S_t *src_space,
jhendersonHDF (Collaborator, Author):

A new routine that is very similar to H5D__select_io(), but rather than copying between application memory and the file, it copies between two memory buffers according to the selections in the destination and source dataspaces.
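
Conceptually, the operation looks like the plain-C sketch below (illustrative only; the real routine iterates H5S_t dataspace selections with selection iterators rather than flat index arrays, and its full signature is truncated above):

```c
#include <stddef.h>
#include <string.h>

/* Illustrative sketch only: copy `nelem` elements of size `elem_size` from
 * src_buf to dst_buf, where the i-th selected element lives at linear index
 * src_sel[i] in the source buffer and dst_sel[i] in the destination buffer.
 * H5D_select_io_mem() does the equivalent directly on dataspace selections. */
static void
select_copy_mem(void *dst_buf, const size_t *dst_sel,
                const void *src_buf, const size_t *src_sel,
                size_t nelem, size_t elem_size)
{
    unsigned char       *dst = (unsigned char *)dst_buf;
    const unsigned char *src = (const unsigned char *)src_buf;

    for (size_t i = 0; i < nelem; i++)
        memcpy(dst + dst_sel[i] * elem_size, src + src_sel[i] * elem_size, elem_size);
}
```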

*---------------------------------------------------------------------------
*/
static const char *
H5FD__mem_t_to_str(H5FD_mem_t mem_type)
jhendersonHDF (Collaborator, Author):

The changes in this file make it possible to see what type of I/O the MPI I/O file driver is performing. Previously only the offset and length of each I/O operation were shown; now the debug output also indicates whether the data belongs to the superblock, raw data, an object header, etc.
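
A minimal sketch of what such a mapping can look like, using the public H5FD_mem_t enumerators (the actual helper in the PR may differ in naming and detail):

```c
#include <hdf5.h>

/* Illustrative sketch: map an H5FD_mem_t value to a short label for debug
 * output. The PR's H5FD__mem_t_to_str() may use different strings. */
static const char *
mem_type_to_str(H5FD_mem_t mem_type)
{
    switch (mem_type) {
        case H5FD_MEM_DEFAULT:
            return "default";
        case H5FD_MEM_SUPER:
            return "superblock";
        case H5FD_MEM_BTREE:
            return "b-tree";
        case H5FD_MEM_DRAW:
            return "raw data";
        case H5FD_MEM_GHEAP:
            return "global heap";
        case H5FD_MEM_LHEAP:
            return "local heap";
        case H5FD_MEM_OHDR:
            return "object header";
        default:
            return "unknown";
    }
}
```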

*-------------------------------------------------------------------------
*/
herr_t
H5_mpio_gatherv_alloc(void *send_buf, int send_count, MPI_Datatype send_type, const int recv_counts[],
jhendersonHDF (Collaborator, Author):

The two new functions here are simply wrappers around MPI_(All)gatherv that hide a bit of boilerplate code. Both allocate the receive buffer for the caller. The only difference between the two is that the "simple" function calculates the recv_counts and displacements arrays for the caller before making the MPI_(All)gatherv call.
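
A rough sketch of what the "simple" variant does, based on the description above (the function name, parameters, and omitted error handling are assumptions, not the PR's code):

```c
#include <mpi.h>
#include <stdlib.h>

/* Illustrative "allgatherv with allocation" helper: gather each rank's
 * send_count, build recv_counts/displacements, allocate the receive buffer,
 * and perform MPI_Allgatherv. */
static int
allgatherv_alloc_simple(void *send_buf, int send_count, MPI_Datatype send_type,
                        MPI_Comm comm, void **out_buf, int *out_count)
{
    int   comm_size, type_size;
    int  *recv_counts, *displs;
    void *recv_buf;

    MPI_Comm_size(comm, &comm_size);
    MPI_Type_size(send_type, &type_size);

    recv_counts = malloc((size_t)comm_size * sizeof(int));
    displs      = malloc((size_t)comm_size * sizeof(int));

    /* Every rank learns how much each rank will contribute */
    MPI_Allgather(&send_count, 1, MPI_INT, recv_counts, 1, MPI_INT, comm);

    /* Displacements are a running sum of the counts */
    displs[0] = 0;
    for (int i = 1; i < comm_size; i++)
        displs[i] = displs[i - 1] + recv_counts[i - 1];

    *out_count = displs[comm_size - 1] + recv_counts[comm_size - 1];
    recv_buf   = malloc((size_t)(*out_count) * (size_t)type_size);

    MPI_Allgatherv(send_buf, send_count, send_type, recv_buf, recv_counts, displs,
                   send_type, comm);

    free(recv_counts);
    free(displs);
    *out_buf = recv_buf;
    return 0;
}
```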

@@ -273,6 +373,185 @@ static int H5D__cmp_filtered_collective_io_info_entry_owner(const void *filtered
/* Local Variables */
/*******************/

#ifdef H5Dmpio_DEBUG
jhendersonHDF (Collaborator, Author):

The code below adds debugging output to H5Dmpio, similar to the debugging already present in the MPI I/O file driver.
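
In spirit, the instrumentation boils down to rank-prefixed debug output guarded by the H5Dmpio_DEBUG symbol, roughly like the sketch below (the macro name and behavior here are illustrative; the PR's actual machinery is more elaborate, with per-rank enable flags and a configurable debug stream):

```c
#include <stdio.h>

/* Illustrative sketch of rank-aware debug output guarded by H5Dmpio_DEBUG */
#ifdef H5Dmpio_DEBUG
#define H5D_MPIO_DEBUG(rank, string)                                                 \
    do {                                                                             \
        fprintf(stderr, "Rank %d: %s\n", (rank), (string));                          \
        fflush(stderr);                                                              \
    } while (0)
#else
#define H5D_MPIO_DEBUG(rank, string) do { } while (0)
#endif
```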

*-------------------------------------------------------------------------
*/
static herr_t
H5D__mpio_array_gatherv(void *local_array, size_t local_array_num_entries, size_t array_entry_size,
jhendersonHDF (Collaborator, Author):

This whole routine was rewritten and moved to H5mpi.c.

if ((mpi_rank = H5F_mpi_get_rank(io_info->dset->oloc.file)) < 0)
HGOTO_ERROR(H5E_IO, H5E_MPI, FAIL, "unable to obtain MPI rank")
if ((mpi_size = H5F_mpi_get_size(io_info->dset->oloc.file)) < 0)
HGOTO_ERROR(H5E_IO, H5E_MPI, FAIL, "unable to obtain MPI size")
jhendersonHDF (Collaborator, Author):

Rather than retrieving the MPI rank and size multiple times in this file, retrieve them once in H5D__chunk_collective_io, which tends to be the main entry point into this file, and then hand those values down to other functions as needed.

src/H5Dmpio.c (review comment resolved; outdated)
*/
if (H5D__mpio_array_gatherv(chunk_list, chunk_list_num_entries,
jhendersonHDF (Collaborator, Author):

Rather than gathering everybody's list of chunks into one collective array, the feature has been revised in most places to construct MPI derived types that send only as much data as needed, greatly reducing the feature's memory usage.

*-------------------------------------------------------------------------
*/
static herr_t
H5D__filtered_collective_chunk_entry_io(H5D_filtered_collective_io_info_t *chunk_entry,
jhendersonHDF (Collaborator, Author):

This routine used to handle either reading an individual chunk (for dataset reads) or reading and writing an individual chunk (for dataset writes). However, any chunk reads done here were independent, which was a scalability problem for the feature. The new H5D__mpio_collective_filtered_chunk_read, H5D__mpio_collective_filtered_chunk_update and H5D__mpio_collective_filtered_chunk_common_io routines now perform the duties of this routine, but in a manner that allows chunk reads to be done collectively. This should generally scale much better, while still giving the user the option of requesting independent chunk reads when desired.
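
For reference, the user-facing knob for choosing between collective and low-level independent I/O is the existing public transfer-property-list API (standard HDF5 usage, not code from this PR):

```c
#include <hdf5.h>

/* Request collective parallel I/O, but ask the library to perform the
 * low-level I/O independently (the "independent chunk reads" option
 * mentioned above). dxpl_id is assumed to be a dataset transfer property
 * list created with H5Pcreate(H5P_DATASET_XFER) in a parallel HDF5 build. */
static herr_t
set_independent_low_level_io(hid_t dxpl_id)
{
    if (H5Pset_dxpl_mpio(dxpl_id, H5FD_MPIO_COLLECTIVE) < 0)
        return -1;
    if (H5Pset_dxpl_mpio_collective_opt(dxpl_id, H5FD_MPIO_INDIVIDUAL_IO) < 0)
        return -1;
    return 0;
}
```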

} /* end H5D__mpio_collective_filtered_chunk_reinsert() */

/*-------------------------------------------------------------------------
* Function: H5D__mpio_get_chunk_redistribute_info_types
jhendersonHDF (Collaborator, Author):

The three functions below create different MPI derived datatypes that extract particular pieces of information from the overall per-chunk H5D_filtered_collective_io_info_t structure. A given operation (shared-chunk redistribution, chunk reallocation, chunk reinsertion) usually needs only a few fields out of that structure, and this information is gathered to all ranks, so sending just the necessary fields can drastically reduce memory usage at the cost of a little MPI overhead.
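
The general technique looks roughly like the sketch below; the struct and field names are invented for illustration, since H5D_filtered_collective_io_info_t has different members:

```c
#include <mpi.h>
#include <stddef.h>

/* Invented example struct standing in for a per-chunk info record */
typedef struct chunk_info_t {
    long long offset; /* file offset of the chunk             */
    long long size;   /* size of the chunk in bytes           */
    int       owner;  /* rank that owns the chunk             */
    void     *buf;    /* local-only field, never sent anywhere */
} chunk_info_t;

/* Build a derived datatype that sends only offset and size, so gathering
 * chunk records to all ranks doesn't also ship the fields nobody needs. */
static int
create_chunk_realloc_type(MPI_Datatype *new_type)
{
    int          lengths[2]  = {1, 1};
    MPI_Aint     displs[2]   = {offsetof(chunk_info_t, offset), offsetof(chunk_info_t, size)};
    MPI_Datatype types[2]    = {MPI_LONG_LONG, MPI_LONG_LONG};
    MPI_Datatype struct_type = MPI_DATATYPE_NULL;

    MPI_Type_create_struct(2, lengths, displs, types, &struct_type);

    /* Resize so the extent matches the full struct, allowing arrays of
     * chunk_info_t to be sent directly with this type. */
    MPI_Type_create_resized(struct_type, 0, (MPI_Aint)sizeof(chunk_info_t), new_type);
    MPI_Type_commit(new_type);
    MPI_Type_free(&struct_type);
    return 0;
}
```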

*-------------------------------------------------------------------------
*/
static herr_t
H5D__mpio_collective_filtered_io_type(H5D_filtered_collective_io_info_t *chunk_list, size_t num_entries,
jhendersonHDF (Collaborator, Author):

This routine was just revised a little bit to create slightly more efficient MPI derived types for performing I/O on filtered chunks.
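
In spirit, the I/O type is an hindexed type over the chunks' byte ranges in the file, so a single collective MPI-IO call can cover all of a rank's chunks (sketch with invented names, not the PR's code):

```c
#include <mpi.h>

/* Sketch: build an MPI datatype describing `num_chunks` byte ranges in the
 * file, given each chunk's file offset and (filtered) size. A type like this
 * can serve as the file type for one collective read or write that touches
 * every chunk the rank is responsible for. */
static int
create_file_io_type(int num_chunks, const int chunk_sizes[], const MPI_Aint file_offsets[],
                    MPI_Datatype *file_type)
{
    /* Block lengths are in MPI_BYTE elements; displacements are byte offsets,
     * which map directly onto chunk sizes and file offsets. */
    MPI_Type_create_hindexed(num_chunks, chunk_sizes, file_offsets, MPI_BYTE, file_type);
    MPI_Type_commit(file_type);
    return 0;
}
```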

/* Participate in the collective re-insertion of all chunks modified
* in this iteration into the chunk index
*/
for (j = 0; j < collective_chunk_list_num_entries; j++) {
jhendersonHDF (Collaborator, Author):

Chunk index reinsertion logic here moved into H5D__mpio_collective_filtered_chunk_reinsert, which more efficiently handles memory usage as well as chunk reinsertion itself.

*/
for (i = 0; i < collective_chunk_list_num_entries; i++) {
jhendersonHDF (Collaborator, Author):

Chunk index reinsertion logic here moved into H5D__mpio_collective_filtered_chunk_reinsert, which more efficiently handles memory usage as well as chunk reinsertion itself.

*/
for (j = 0; j < collective_chunk_list_num_entries; j++) {
jhendersonHDF (Collaborator, Author):

Chunk file space reallocation logic moved into H5D__mpio_collective_filtered_chunk_reallocate, which more efficiently handles memory usage.

HGOTO_ERROR(H5E_DATASET, H5E_CANTGATHER, FAIL, "couldn't gather new chunk sizes")

/* Collectively re-allocate the modified chunks (from each process) in the file */
for (i = 0; i < collective_chunk_list_num_entries; i++) {
jhendersonHDF (Collaborator, Author):

Chunk file space reallocation logic moved into H5D__mpio_collective_filtered_chunk_reallocate, which more efficiently handles memory usage.


if (have_chunk_to_process)
if (H5D__filtered_collective_chunk_entry_io(&chunk_list[i], io_info, type_info, fm) < 0)
jhendersonHDF (Collaborator, Author):

Duties now performed by H5D__mpio_collective_filtered_chunk_update instead.

*/
for (i = 0; i < chunk_list_num_entries; i++)
if (mpi_rank == chunk_list[i].owners.new_owner)
if (H5D__filtered_collective_chunk_entry_io(&chunk_list[i], io_info, type_info, fm) < 0)
jhendersonHDF (Collaborator, Author):

Duties now performed by H5D__mpio_collective_filtered_chunk_update instead.

jhendersonHDF and others added 24 commits February 23, 2022 20:36
Add support for chunk fill values to parallel compression feature
Add partial support for incremental file space allocation to parallel compression feature
Refactor chunk reallocation and reinsertion code to use less MPI communication during linked-chunk I/O
H5D__get_num_chunks can be used to correctly determine space allocation status for filtered and unfiltered chunked datasets
Avoid doing I/O when a rank has no selection and the MPI communicator size is 1 or the I/O has been requested as independent at the low level
Avoid 0-byte collective read of incrementally allocated filtered dataset when dataset hasn't been written to yet
lrknox merged commit 758e97c into HDFGroup:develop on Feb 24, 2022
jhendersonHDF added a commit to jhendersonHDF/hdf5 that referenced this pull request Mar 25, 2022
lrknox pushed a commit that referenced this pull request Mar 28, 2022
* Fix the function cast error in H5Dchunk.c and activate (#1170)

`-Werror=cast-function-type`.  Again.

* Parallel Compression improvements (#1302)

* Fix for parallel compression examples on Windows (#1459)

* Parallel compression adjustments for HDF5 1.12

* Committing clang-format changes

Co-authored-by: David Young <dyoung@hdfgroup.org>
Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
jhendersonHDF added a commit to jhendersonHDF/hdf5 that referenced this pull request Apr 14, 2022
* Fix the function cast error in H5Dchunk.c and activate (HDFGroup#1170)

`-Werror=cast-function-type`.  Again.

* Parallel Compression improvements (HDFGroup#1302)

* Fix for parallel compression examples on Windows (HDFGroup#1459)

Co-authored-by: David Young <dyoung@hdfgroup.org>
jhendersonHDF deleted the parallel_filters branch on April 30, 2022