Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use H5Oget_info_by_name instead of H5Gopen to determine if a group exists #3705

Merged
merged 5 commits into from
Jan 8, 2022

Conversation

brtnfld
Copy link
Contributor

@brtnfld brtnfld commented Jan 4, 2022

Please review the developer documentation
on the wiki of this project that contains help and requirements.

Proposed changes

Change: Using H5Lexists instead of H5Gopen to determine if a group exists.

Why: Some HDF5 VOLs may not support H5E APIs, and thus the HDF5 error stack will be printed every time a non-existent group is tried to be opened.

I did a little investigation into comparing the performance of each method, and this is an extreme case. For Summit:

(1) 336 Procs, 100000 groups

USING H5Gopen for a non-existent group: time is avg, min, max: 0.827013 0.791975 0.878860 s.
USING H5Lexists for a non-existent group: time is avg, min, max: 0.795233 0.750368 0.924088 s.

(2) Even if the group does exist, meaning you need to perform an additional H5Gopen along with the H5Lexists, the timing is similar,

336 Procs, 100000 groups

USING H5Gopen for an existent group: time is avg, min, max: 44.389573 14.626276 54.572166 s.

USING H5Lexists (and H5Gopen) for an existent group: time is avg, min, max: 44.161419 14.559785 53.558933 s.

What type(s) of changes does this code introduce?

Delete the items that do not apply

  • Bugfix
  • New feature
  • Code style update (formatting, renaming)
    x Refactoring (no functional changes, no API changes)
  • Build related changes
  • Testing changes (e.g. new unit/integration/performance tests)
  • Documentation changes
  • Other (please describe):

Does this introduce a breaking change?

  • No

What systems has this change been tested on?

Summit, have not yet tested due to issue #3704

Checklist

Update the following with a yes where the items apply. If you're unsure about any of them, don't hesitate to ask. This is
simply a reminder of what we are going to look for before merging your code.

  • Yes. This PR is up to date with current the current state of 'develop'
  • No. Code added or changed in the PR has been clang-formatted
  • No. This PR adds tests to cover any new code, or to catch a bug that is being fixed
  • No. Documentation has been added (if appropriate)

Copy link
Contributor

@ye-luo ye-luo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI failed.
"Some HDF5 VOLs may not support H5E APIs" what are such VOLs?

@@ -161,22 +161,25 @@ bool hdf_archive::is_group(const std::string& aname)
return false;
hid_t p = group_id.empty() ? file_id : group_id.top();
p = (aname[0] == '/') ? file_id : p;
hid_t g = H5Gopen2(p, aname.c_str(), H5P_DEFAULT);
if (g < 0)
if (!(H5Lexists(p, aname.c_str(), H5P_DEFAULT) > 0))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

H5Lexists only query the existence of an entry. It doesn't verify if the entry is a group or not.
The function is_group needs to know if an entry is a group or not. This explains the failure in CI.
If an entry is a dataset, will H5Gopen2 still print the error stack?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Log VOL https://github.com/DataLib-ECP/vol-log-based does not support H5E.

If H5Gopen does not succeed, which would be the case for a dataset, it will dump the error stack.

I did not realize that the object can also be a dataset. In that case, it would probably be better to use H5Oget_info_by_name instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switch to using H5Oget_info_by_name in my test program:

(1) 336 Procs, 100000 groups

USING H5Gopen for a non-existent group: time is avg, min, max: 0.837418 0.796461 0.972710 s.
USING H5Oget_ for a non-existent group: time is avg, min, max: 0.800274 0.759203 0.854569 s.

(2) 336 Procs, 100000 groups

USING H5Gopen for an existent group: time is avg, min, max: 45.800670 15.395971 55.213545s
USING H5Oget_ (and H5Gopen) for an existent group: time is avg, min, max: 45.440913 14.367588 54.715002

Using a realistic number of 20 groups:

(1) 336 Procs, 20 groups

USING H5Gopen for a non-existent group: time is avg, min, max: 0.010792 0.002854 0.017069 s.
USING H5Oget_ for a non-existent group: time is avg, min, max: 0.009001 0.002897 0.015627 s.

(2) 336 Procs, 20 groups

USING H5Gopen for an existent group: time is avg, min, max: 0.025425 0.008726 0.042872 s.
USING H5Oget_ (and H5Gopen) for an existent group: time is avg, min, max: 0.026317 0.006041 0.041442 s.

@brtnfld brtnfld changed the title Using H5Lexists instead of H5Gopen to determine if a group exists Use H5Oget_info_by_name instead of H5Gopen to determine if a group exists Jan 7, 2022
Copy link
Contributor

@ye-luo ye-luo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please remove Draft if it is ready for merge.

@ye-luo
Copy link
Contributor

ye-luo commented Jan 7, 2022

Test this please

@brtnfld brtnfld marked this pull request as ready for review January 7, 2022 21:59
@ye-luo ye-luo merged commit 08412d1 into QMCPACK:develop Jan 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants