-
Notifications
You must be signed in to change notification settings - Fork 139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use H5Oget_info_by_name instead of H5Gopen to determine if a group exists #3705
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI failed.
"Some HDF5 VOLs may not support H5E APIs" what are such VOLs?
src/io/hdf/hdf_archive.cpp
Outdated
@@ -161,22 +161,25 @@ bool hdf_archive::is_group(const std::string& aname) | |||
return false; | |||
hid_t p = group_id.empty() ? file_id : group_id.top(); | |||
p = (aname[0] == '/') ? file_id : p; | |||
hid_t g = H5Gopen2(p, aname.c_str(), H5P_DEFAULT); | |||
if (g < 0) | |||
if (!(H5Lexists(p, aname.c_str(), H5P_DEFAULT) > 0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
H5Lexists only query the existence of an entry. It doesn't verify if the entry is a group or not.
The function is_group needs to know if an entry is a group or not. This explains the failure in CI.
If an entry is a dataset, will H5Gopen2 still print the error stack?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Log VOL https://github.com/DataLib-ECP/vol-log-based does not support H5E.
If H5Gopen does not succeed, which would be the case for a dataset, it will dump the error stack.
I did not realize that the object can also be a dataset. In that case, it would probably be better to use H5Oget_info_by_name instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I switch to using H5Oget_info_by_name in my test program:
(1) 336 Procs, 100000 groups
USING H5Gopen for a non-existent group: time is avg, min, max: 0.837418 0.796461 0.972710 s.
USING H5Oget_ for a non-existent group: time is avg, min, max: 0.800274 0.759203 0.854569 s.
(2) 336 Procs, 100000 groups
USING H5Gopen for an existent group: time is avg, min, max: 45.800670 15.395971 55.213545s
USING H5Oget_ (and H5Gopen) for an existent group: time is avg, min, max: 45.440913 14.367588 54.715002
Using a realistic number of 20 groups:
(1) 336 Procs, 20 groups
USING H5Gopen for a non-existent group: time is avg, min, max: 0.010792 0.002854 0.017069 s.
USING H5Oget_ for a non-existent group: time is avg, min, max: 0.009001 0.002897 0.015627 s.
(2) 336 Procs, 20 groups
USING H5Gopen for an existent group: time is avg, min, max: 0.025425 0.008726 0.042872 s.
USING H5Oget_ (and H5Gopen) for an existent group: time is avg, min, max: 0.026317 0.006041 0.041442 s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Please remove Draft if it is ready for merge.
Test this please |
Please review the developer documentation
on the wiki of this project that contains help and requirements.
Proposed changes
Change: Using H5Lexists instead of H5Gopen to determine if a group exists.
Why: Some HDF5 VOLs may not support H5E APIs, and thus the HDF5 error stack will be printed every time a non-existent group is tried to be opened.
I did a little investigation into comparing the performance of each method, and this is an extreme case. For Summit:
(1) 336 Procs, 100000 groups
USING H5Gopen for a non-existent group: time is avg, min, max: 0.827013 0.791975 0.878860 s.
USING H5Lexists for a non-existent group: time is avg, min, max: 0.795233 0.750368 0.924088 s.
(2) Even if the group does exist, meaning you need to perform an additional H5Gopen along with the H5Lexists, the timing is similar,
336 Procs, 100000 groups
USING H5Gopen for an existent group: time is avg, min, max: 44.389573 14.626276 54.572166 s.
USING H5Lexists (and H5Gopen) for an existent group: time is avg, min, max: 44.161419 14.559785 53.558933 s.
What type(s) of changes does this code introduce?
Delete the items that do not apply
x Refactoring (no functional changes, no API changes)
Does this introduce a breaking change?
What systems has this change been tested on?
Summit, have not yet tested due to issue #3704
Checklist
Update the following with a yes where the items apply. If you're unsure about any of them, don't hesitate to ask. This is
simply a reminder of what we are going to look for before merging your code.