H5Pset_fapl_mpio fails with darshan #960

Closed
wangvsa opened this issue Oct 18, 2023 · 1 comment · Fixed by #961
Labels
bug Something isn't working


wangvsa commented Oct 18, 2023

Hi,

I was trying to trace some HDF5 applications and stumbled upon this issue.

HDF5: 1.8.20 (parallel HDF5 enabled)
Darshan version: darshan-3.4.4

Darshan was configured using:

```sh
./configure --prefix=/xxx/darshan-3.4.4/install --with-log-path=/xxx/darshan-logs --with-jobid-env=SLURM_JOB_ID --with-hdf5=$HDF5_1_8_HOME --enable-hdf5-mod CC=mpicc
```

The issue can be reproduced using the following code:

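```c
#include <stdio.h>
#include <mpi.h>
#include <hdf5.h>

int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);

    herr_t res;
    /* Create a file access property list (negative id on failure). */
    hid_t plist_id = H5Pcreate(H5P_FILE_ACCESS);
    if (plist_id < 0)
        printf("H5Pcreate failed\n");

    /* Select the MPI-IO driver; this is the call that fails under Darshan. */
    res = H5Pset_fapl_mpio(plist_id, MPI_COMM_WORLD, MPI_INFO_NULL);
    if (res < 0)
        printf("H5Pset_fapl_mpio failed\n");

    H5Pclose(plist_id);
    MPI_Finalize();
    return 0;
}
```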

Compile & run:

```sh
mpicc test_phdf5.c -o test_phdf5 -I$HDF5_1_8_HOME/include -L$HDF5_1_8_HOME/lib -lhdf5
srun -n1 --overlap --export=ALL,LD_PRELOAD=$libdarshan ./test_phdf5
```

Without Darshan, the program runs to completion without any error. With Darshan preloaded, the H5Pset_fapl_mpio call fails.

wangvsa changed the title from "H5Pset_fapl_mpio failed with darshan" to "H5Pset_fapl_mpio fails with darshan" on Oct 18, 2023
tylerjereddy added the "bug" (Something isn't working) label on Oct 18, 2023
shanedsnyder (Contributor) commented:

Thanks for the report!

I was able to reproduce the same issue. From some quick testing, I think this may be related to this "workaround" commit we merged in our last release: #833

Basically, a change in the HDF5 headers in version 1.13+ was causing Darshan to print a symbol error when it was LD_PRELOADed. We found what we thought was a workaround, but it seems it might be causing this particular error now. I need to dig further to confirm that and to see whether there is a way to resolve both issues.

In the meantime, I think you could roll back to Darshan 3.4.3 to avoid the issue. You aren't missing anything in 3.4.4: that was the only HDF5-related change, and it was only intended to avoid an issue with HDF5 versions much newer than 1.8.

shanedsnyder added a commit that referenced this issue Oct 23, 2023
* use H5Pget_fapl_mpio instead of H5Pget_driver
* this code temporarily disables HDF5 error reporting to avoid
  noisy error messages in the case the associated VFD is not MPIO

Fixes #960