qmcpack fails to open dataset #13

Closed
brtnfld opened this issue Dec 28, 2021 · 11 comments

@brtnfld
Collaborator

brtnfld commented Dec 28, 2021

It looks like qmcpack is calling the native VOL instead of the Log VOL when trying to open the dataset.

To reproduce, all the binaries and scripts are on Summit login4 in qmcpack/build_summit_cpu/bin.

HDF5-DIAG: Error detected in HDF5 (1.13.0) MPI-process 0:
#000: ../../src/H5VLcallback.c line 4406 in H5VLgroup_open(): unable to open group
major: Virtual Object Layer
minor: Unable to initialize object
#1: ../../src/H5VLcallback.c line 4335 in H5VL__group_open(): group open failed
major: Virtual Object Layer
minor: Can't open object
#2: ../../src/H5VLnative_group.c line 154 in H5VL__native_group_open(): unable to open group
major: Symbol table
minor: Can't open object
#3: ../../src/H5Gint.c line 397 in H5G__open_name(): group not found
major: Symbol table
minor: Object not found
#4: ../../src/H5Gloc.c line 439 in H5G_loc_find(): can't find object
major: Symbol table
minor: Object not found
#5: ../../src/H5Gtraverse.c line 838 in H5G_traverse(): internal path traversal failed
major: Symbol table
minor: Object not found
#6: ../../src/H5Gtraverse.c line 614 in H5G__traverse_real(): traversal operator failed
major: Symbol table
minor: Callback failed
#7: ../../src/H5Gloc.c line 396 in H5G__loc_find_cb(): object 'state_0' doesn't exist
major: Symbol table
minor: Object not found
HDF5-DIAG: Error detected in HDF5 (1.13.0) MPI-process 0:
#000: ../../src/H5VLcallback.c line 2016 in H5VLdataset_open(): unable to open dataset
major: Virtual Object Layer
minor: Can't open object
#1: ../../src/H5VLcallback.c line 1945 in H5VL__dataset_open(): dataset open failed
major: Virtual Object Layer
minor: Can't open object
#2: ../../src/H5VLnative_dataset.c line 252 in H5VL__native_dataset_open(): unable to open dataset
major: Dataset
minor: Can't open object
#3: ../../src/H5Dint.c line 1437 in H5D__open_name(): not found
major: Dataset
minor: Object not found
#4: ../../src/H5Gloc.c line 439 in H5G_loc_find(): can't find object
major: Symbol table
minor: Object not found
#5: ../../src/H5Gtraverse.c line 838 in H5G_traverse(): internal path traversal failed
major: Symbol table
minor: Object not found
#6: ../../src/H5Gtraverse.c line 614 in H5G__traverse_real(): traversal operator failed
major: Symbol table
minor: Callback failed
#7: ../../src/H5Gloc.c line 396 in H5G__loc_find_cb(): object 'nprocs_nthreads_statesize' doesn't exist
major: Symbol table
minor: Object not found

@brtnfld
Collaborator Author

brtnfld commented Dec 31, 2021

@khou2020, were you able to copy the files and reproduce the issue?

@khou2020
Collaborator

> @khou2020, were you able to copy the files and reproduce the issue?

I got the same error messages; however, it still reports the correct result.
Is the benchmark opening non-existing groups/datasets to determine whether they exist?

@brtnfld
Collaborator Author

brtnfld commented Dec 31, 2021

> Is the benchmark opening non-existing groups/datasets to determine whether they exist?

Yes.

@khou2020
Collaborator

> Is the benchmark opening non-existing groups/datasets to determine whether they exist?
>
> Yes.

The Log VOL dumps the HDF5 error stack whenever an operation in the underlying VOL fails. It does not affect the result.
Is that the only issue you see?

@brtnfld
Collaborator Author

brtnfld commented Jan 4, 2022

I see. Are there plans to support the H5E error stack APIs?

@khou2020
Collaborator

khou2020 commented Jan 4, 2022

> I see, are there plans to support error stack H5E APIs?

There is no short-term plan to support H5E.
Was that the only issue you saw? If so, we can close this issue.

@brtnfld
Collaborator Author

brtnfld commented Jan 4, 2022

Yes, I think that was the only issue.

khou2020 closed this as completed Jan 4, 2022

@wkliao
Contributor

wkliao commented Jan 4, 2022

Hi, @brtnfld

If qmcpack calls H5Gopen() only to check whether a group exists,
it should call H5Lexists() instead, which avoids the error messages.
Maybe you can create a PR in qmcpack to fix it?

@brtnfld
Collaborator Author

brtnfld commented Jan 4, 2022

Yes, I was going to do that. Before suggesting it, though, I wanted to make sure it would not hurt performance, so I ran a quick check on Summit.

336 Procs, 100000 groups

USING H5Gopen for a non-existent group: time is avg, min, max: 0.827013 0.791975 0.878860 s.

USING H5Lexists for a non-existent group: time is avg, min, max: 0.795233 0.750368 0.924088 s.

There is not much of a difference.

@brtnfld
Collaborator Author

brtnfld commented Jan 4, 2022

Even if the group does exist, meaning you need to perform an additional H5Gopen along with the H5Lexists, the timing is similar:

336 Procs, 100000 groups

USING H5Gopen for an existent group: time is avg, min, max: 44.389573 14.626276 54.572166 s.

USING H5Lexists (and H5Gopen) for an existent group: time is avg, min, max: 44.161419 14.559785 53.558933 s.

@wkliao
Contributor

wkliao commented Jan 5, 2022

Those results look good. Thanks.
