Final refactoring of libsrc4. #903
The file docs/indexing.dox tries to provide design information for the refactoring. The primary change is to replace all walking of linked lists with the use of the NCindex data structure. NCindex combines a hash table (for name-based lookup) with a vector (for walking the elements in the index). Additionally, global vectors are added to NC_HDF5_FILE_INFO_T to map an id directly to its object (e.g., a dimid to its NC_DIM_INFO_T). These global vectors exist for dimensions, types, and groups because those have globally unique id numbers.

WARNINGS:
1. Since libsrc4 and libsrchdf4 share code, there are also changes in libsrchdf4.
2. Any outstanding pull requests that change libsrc4 or libhdf4 are likely to conflict with this code.
3. The original motivation was performance, but as noted elsewhere, the gain may not be significant: metadata read performance appears to be dominated by the HDF5 library, because we read metadata in bulk rather than lazily.
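The NCindex description above can be illustrated with a minimal C sketch. This is not the netcdf-c implementation: the names `Obj`, `Index`, `index_add`, `index_lookup`, `index_ith`, and `lookup_by_id` are hypothetical, and the djb2 hash, chaining, and fixed bucket count are assumptions chosen for brevity. It shows the two access paths the description names (hashed lookup by name, and positional walking over a vector in insertion order), plus a plain id-indexed array standing in for a global dimid vector.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an NC_XXX_INFO_T object such as a dimension. */
typedef struct Obj {
    const char *name;
    size_t id; /* unique across the whole file, like a dimid */
} Obj;

#define NBUCKETS 64 /* fixed bucket count keeps the sketch short */

typedef struct Entry {
    Obj *obj;
    struct Entry *next; /* chaining resolves hash collisions */
} Entry;

/* NCindex-like container (hypothetical): hash table for lookup by name,
   plus a vector that keeps insertion order for walking the elements. */
typedef struct Index {
    Entry *buckets[NBUCKETS];
    Obj **vec;
    size_t len, cap;
} Index;

static unsigned long hash_name(const char *s)
{
    unsigned long h = 5381; /* djb2 string hash */
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Insert into both the hash table and the vector; returns 0 on success. */
static int index_add(Index *ix, Obj *o)
{
    if (ix->len == ix->cap) {
        size_t ncap = ix->cap ? 2 * ix->cap : 8;
        Obj **nv = realloc(ix->vec, ncap * sizeof *nv);
        if (nv == NULL)
            return -1;
        ix->vec = nv;
        ix->cap = ncap;
    }
    Entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    unsigned long b = hash_name(o->name) % NBUCKETS;
    e->obj = o;
    e->next = ix->buckets[b];
    ix->buckets[b] = e;
    ix->vec[ix->len++] = o;
    return 0;
}

/* Lookup by name: hash, then scan one short chain instead of a whole list. */
static Obj *index_lookup(const Index *ix, const char *name)
{
    for (Entry *e = ix->buckets[hash_name(name) % NBUCKETS]; e; e = e->next)
        if (strcmp(e->obj->name, name) == 0)
            return e->obj;
    return NULL;
}

/* Positional access for walking every element in insertion order. */
static Obj *index_ith(const Index *ix, size_t i)
{
    return i < ix->len ? ix->vec[i] : NULL;
}

/* Global id map (hypothetical): with file-wide unique ids, a plain array
   indexed by id maps e.g. a dimid straight to its object. */
static Obj *lookup_by_id(Obj *const *byid, size_t n, size_t id)
{
    return id < n ? byid[id] : NULL;
}

/* Release the index's own storage; the objects belong to the caller. */
static void index_free(Index *ix)
{
    for (size_t b = 0; b < NBUCKETS; b++) {
        Entry *e = ix->buckets[b];
        while (e != NULL) {
            Entry *next = e->next;
            free(e);
            e = next;
        }
        ix->buckets[b] = NULL;
    }
    free(ix->vec);
    ix->vec = NULL;
    ix->len = ix->cap = 0;
}
```

Keeping both structures means name lookups no longer walk a list, while loops over all elements still see a stable order; the cost is that every insertion and deletion must update both the hash table and the vector.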
Great! If this could be merged soon, I could merge it into the prototype for the libsrc4/HDF5 split.
This is failing because of the issue with wget fetching HDF4 over FTP. We need to patch it.
and then open a file with a lot of metadata. The test is configurable to determine the parameters for the created metadata.
@WardF, will this PR be merged soon? Thanks.
OK, it turns out this should not be merged without more work. I created a branch with the current master plus this PR and ran it through my CI system. There are a number of issues. All parallel builds fail to compile with this error:
This can be fixed by changing NC4_var_par_access to look up variables in the new way. For sequential builds, the address sanitizer fails on nctest:
It also fails on most of the tests in nc_test4:
I ran a few of these to get some idea of the memory issues:
The att_list_add issue shows up a lot: attribute names are not being freed by the new list code. I can't figure out where this is intended to happen, so I'm not sure how to fix it. It looks like this PR should not be merged without more work.
Thanks for the reminder. I had fixed --enable-hdf4, but forgot
OK, I added a fix for the --enable-parallel case.
I just ran this through my CI system and everything comes up green. Address sanitizer runs pass as long as --disable-utilities and --disable-dap are used.
@DennisHeimbigner, what is the status of nc4xinternal.h? Is it a temporary file that is no longer needed? It seems to duplicate nc4internal.h, and it does not seem to be used anywhere in the code.
Guys, is this PR going to be merged soon? I have a fair amount of work stacked up that I am waiting to submit until these changes are merged...
Thanks!
This completes (for now) the refactoring of libsrc4.
A second major change, beyond replacing linked-list walks
with NCindex, is to move the common information in each
NC_XXX_INFO_T structure into a shared header type, NC_OBJ,
to provide a pseudo-inheritance mechanism. This NC_OBJ
header contains the name, id, hashkey, and sort.
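The pseudo-inheritance described here matches the common C idiom of embedding a shared header as the first member of each struct. Below is a minimal sketch under that assumption; `ObjHeader`, `ObjSort`, `DimInfo`, `VarInfo`, `obj_name`, and `obj_describe` are hypothetical names rather than the actual netcdf-c definitions, and the field types are guesses based only on the field list above.

```c
#include <stdio.h>

/* Hypothetical tag recording which concrete type an object is. */
typedef enum { SORT_DIM, SORT_VAR } ObjSort;

/* Hypothetical common header modeled on the NC_OBJ description
   (name, id, hashkey, sort); field names and types are assumptions. */
typedef struct ObjHeader {
    ObjSort sort;
    const char *name;
    size_t id;
    unsigned int hashkey; /* assumed: cached hash of name for the index */
} ObjHeader;

/* Each concrete type embeds the header as its FIRST member. C guarantees
   that a pointer to a struct, suitably converted, points to its first
   member, so a DimInfo * can be used as an ObjHeader * and back. */
typedef struct DimInfo {
    ObjHeader hdr;
    size_t len;
} DimInfo;

typedef struct VarInfo {
    ObjHeader hdr;
    int ndims;
} VarInfo;

/* Generic code needs only the header (the "base class"). */
static const char *obj_name(const ObjHeader *o)
{
    return o->name;
}

/* Check the sort tag before "downcasting" to the concrete type. */
static int obj_describe(const ObjHeader *o, char *buf, size_t n)
{
    switch (o->sort) {
    case SORT_DIM:
        return snprintf(buf, n, "dim %s (id %zu, len %zu)", o->name, o->id,
                        ((const DimInfo *)o)->len);
    case SORT_VAR:
        return snprintf(buf, n, "var %s (id %zu, ndims %d)", o->name, o->id,
                        ((const VarInfo *)o)->ndims);
    }
    return snprintf(buf, n, "unknown object");
}
```

Because every object starts with the same header, code that only needs a name or id (such as a generic index) can store header pointers for all object kinds and consult the sort tag only when it needs type-specific fields.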