Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Final refactoring of libsrc4. #903

Merged
merged 17 commits into from
Apr 4, 2018
Merged

Final refactoring of libsrc4. #903

merged 17 commits into from
Apr 4, 2018

Conversation

DennisHeimbigner
Copy link
Collaborator

This completes (for now) the refactoring of libsrc4.
The file docs/indexing.dox tries to provide design
information for the refactoring.

The primary change is to replace all walking of linked
lists with the use of the NCindex data structure.
Ncindex is a combination of a hash table (for name-based
lookup) and a vector (for walking the elements in the index).
Additionally, global vectors are added to NC_HDF5_FILE_INFO_T
to support direct mapping of an e.g. dimid to the NC_DIM_INFO_T
object. These global vectors exist for dimensions, types, and groups
because they have globally unique id numbers.
Another major change is to move common info for each
NC_XXX_INFO_T structure into a common type NC_OBJ
to provide a peudo inheritance mechanism. This NC_OBJ
header contains the name, id, hashkey, and SORT.

WARNINGS:

  1. since libsrc4 and libsrchdf4 share code, there are also
    changes in libsrchdf4.
  2. Any outstanding pull requests that change libsrc4 or libhdf4
    are likely to cause conflicts with this code.
  3. The original reason for doing this was for performance improvements,
    but as noted elsewhere, this may not be significant because
    the meta-data read performance apparently is being dominated
    by the hdf5 library because we do bulk meta-data reading rather
    than lazy reading.

The file docs/indexing.dox tries to provide design
information for the refactoring.

The primary change is to replace all walking of linked
lists with the use of the NCindex data structure.
Ncindex is a combination of a hash table (for name-based
lookup) and a vector (for walking the elements in the index).
Additionally, global vectors are added to NC_HDF5_FILE_INFO_T
to support direct mapping of an e.g. dimid to the NC_DIM_INFO_T
object. These global vectors exist for dimensions, types, and groups
because they have globally unique id numbers.

WARNING:
1. since libsrc4 and libsrchdf4 share code, there are also
   changes in libsrchdf4.
2. Any outstanding pull requests that change libsrc4 or libhdf4
   are likely to cause conflicts with this code.
3. The original reason for doing this was for performance improvements,
   but as noted elsewhere, this may not be significant because
   the meta-data read performance apparently is being dominated
   by the hdf5 library because we do bulk meta-data reading rather
   than lazy reading.
@edhartnett
Copy link
Contributor

Great! If this could be merged soon, then I can merge it into the prototype for the libsrc4/HDF5 split.

@DennisHeimbigner
Copy link
Collaborator Author

This is failing on the wget ftp for hdf4 issue. We need to patch it.

and then open a file with a lot of metadata.
The test is configurable to determine the parameters
for the created metadata.
@edhartnett
Copy link
Contributor

@WardF will this PR be merged soon? Thanks.

@edhartnett
Copy link
Contributor

OK, turns out this should not be merged without some more work.

I created a branch with the current master + this PR, and ran it through my CI system. There are a bunch of issues.

All parallel builds fail to compile with this error:

nc4var.c: In function ‘NC4_var_par_access’:
nc4var.c:1638:39: error: ‘grp->vars’ is a pointer; did you mean to use ‘->’?
    if (varid < 0 || varid >= grp->vars.nelems)
                                       ^
                                       ->
nc4var.c:1640:19: error: ‘grp->vars’ is a pointer; did you mean to use ‘->’?
    var = grp->vars.value[varid];
                   ^
                   ->
Makefile:542: recipe for target 'libnetcdf4_la-nc4var.lo' failed
make[1]: *** [libnetcdf4_la-nc4var.lo] Error 1

This can be fixed by changing NC4_var_par_access to look up variables in the new way.

For sequential builds, the address sanitizer fails on nctest:

make[3]: Entering directory '/var/lib/jenkins/workspace/WDC_ejh_netcdf-c_hdf5_1.10.1_no_dap_no_utilities_asan/nctest'
PASS: tst_rename
FAIL: nctest
PASS: compare_test_files.sh

It also fails on most of the tests in nc_test4:

FAIL: tst_dims
FAIL: tst_dims2
FAIL: tst_dims3
FAIL: tst_files
FAIL: tst_files4
FAIL: tst_vars
FAIL: tst_varms
FAIL: tst_unlim_vars
FAIL: tst_converts
FAIL: tst_converts2
FAIL: tst_grps
FAIL: tst_grps2
FAIL: tst_compounds
FAIL: tst_compounds2
FAIL: tst_compounds3
FAIL: tst_opaques
FAIL: tst_strings
FAIL: tst_strings2
FAIL: tst_interops
FAIL: tst_interops4
PASS: tst_interops5
PASS: tst_interops6
FAIL: tst_enums
FAIL: tst_coords
FAIL: tst_coords2
FAIL: tst_coords3
FAIL: tst_vars3
FAIL: tst_vars4
FAIL: tst_chunks
FAIL: tst_chunks2
FAIL: tst_utf8
FAIL: tst_fills
FAIL: tst_fills2
FAIL: tst_fillbug
PASS: tst_xplatform
FAIL: tst_xplatform2
FAIL: tst_endian_fill
FAIL: tst_atts
PASS: t_type
PASS: cdm_sea_soundings
FAIL: tst_camrun
FAIL: tst_vl
FAIL: tst_atts1
FAIL: tst_atts2
FAIL: tst_vars2
PASS: tst_files5
FAIL: tst_files6
FAIL: tst_sync
FAIL: tst_h_scalar
FAIL: tst_rename
FAIL: tst_rename2
FAIL: tst_h5_endians
FAIL: tst_atts_string_rewrite
PASS: tst_hdf5_file_compat
FAIL: tst_fill_attr_vanish
PASS: tst_rehash
PASS: tst_filterparser
FAIL: tst_bug324
FAIL: tst_types
PASS: tst_h_strbug
PASS: tst_h_refs
FAIL: run_empty_vlen_test.sh
PASS: tst_v2


I ran a few of these to get some idea of the memory issues:

ed@mikado:~/tmp/netcdf-c/nc_test4$ ./tst_dims

*** Testing netcdf-4 dimensions.
*** Testing that netcdf-4 dimids inq works on netcdf-3 file...ok.
*** Testing that netcdf-4 dimids inq works on more complex netcdf-3 file...ok.
*** Testing file with just one dimension...ok.
*** Testing with NULL id pointer...ok.
*** Testing classic model file with just one unlimited dimension...ok.
*** Testing renaming of one dimension...ok.
=================================================================
==4659==ERROR: AddressSanitizer: global-buffer-overflow on address 0x7f999ddbd264 at pc 0x7f999e082733 bp 0x7ffdeffe6bf0 sp 0x7ffdeffe6398
READ of size 256 at 0x7f999ddbd264 thread T0
    #0 0x7f999e082732  (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x79732)
    #1 0x7f999dd502b1 in NC4_def_var /home/ed/tmp/netcdf-c/libsrc4/nc4var.c:511
    #2 0x7f999dc4c411 in nc_def_var /home/ed/tmp/netcdf-c/libdispatch/dvar.c:218
    #3 0x5558749a498e in main /home/ed/tmp/netcdf-c/nc_test4/tst_dims.c:324
    #4 0x7f999d8431c0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x211c0)
    #5 0x555874999859 in _start (/home/ed/tmp/netcdf-c/nc_test4/.libs/tst_dims+0x2859)

ed@mikado:~/tmp/netcdf-c/nc_test4$ ./tst_opaques 

*** Testing netcdf-4 opaque type.
*** testing scalar opaque variable...ok.
*** testing opaque variable...ok.
*** testing *really* simple opaque attribute...ok.
*** testing opaque attribute...ok.
*** testing 3 opaque types...ok.
*** Tests successful!

=================================================================
==4676==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 9 byte(s) in 1 object(s) allocated from:
    #0 0x7fe8a01cd538 in strdup (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x77538)
    #1 0x7fe89fe91df7 in nc4_att_list_add /home/ed/tmp/netcdf-c/libsrc4/nc4internal.c:968
    #2 0x7fe89fe5ec5e in NC4_put_att /home/ed/tmp/netcdf-c/libsrc4/nc4attr.c:739
    #3 0x7fe89fd95d38 in nc_put_att /home/ed/tmp/netcdf-c/libdispatch/dattput.c:235
    #4 0x55b8519053c3 in main /home/ed/tmp/netcdf-c/nc_test4/tst_opaques.c:147
    #5 0x7fe89f9901c0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x211c0)

Direct leak of 9 byte(s) in 1 object(s) allocated from:
    #0 0x7fe8a01cd538 in strdup (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x77538)
    #1 0x7fe89fe91df7 in nc4_att_list_add /home/ed/tmp/netcdf-c/libsrc4/nc4internal.c:968
    #2 0x7fe89fe5ec5e in NC4_put_att /home/ed/tmp/netcdf-c/libsrc4/nc4attr.c:739
    #3 0x7fe89fd95d38 in nc_put_att /home/ed/tmp/netcdf-c/libdispatch/dattput.c:235
    #4 0x55b851904694 in main /home/ed/tmp/netcdf-c/nc_test4/tst_opaques.c:116
    #5 0x7fe89f9901c0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x211c0)

SUMMARY: AddressSanitizer: 18 byte(s) leaked in 2 allocation(s).

The att_list_add one pops up a lot. The name of the atts are not being freed with the new list code. I can't figure out where this is intended to happen, so not sure how to fix.

So looks like this PR should not be merged without more work.

@DennisHeimbigner
Copy link
Collaborator Author

Thanks for the reminder. I had fixed --enable-hdf4, but forgot
to fix enabling parallel.

@DennisHeimbigner
Copy link
Collaborator Author

Ok, added a fix for the --enable-parallel case.
My big worry is memory leaks. So I will address
these next.

#917
#915

Fix following memory errors:
1. global_buffer_overflow
2. nc4_att_list_add
@edhartnett
Copy link
Contributor

I just ran this through my CI system and everything comes up green. Address sanitizer runs pass as long as --disable-utilities and --disable-dap are used.

@edhartnett
Copy link
Contributor

@DennisHeimbigner what is the status of nc4xinternal.h? Is it a temporary file that is no longer needed? It seems to be a duplicate of nc4internal.h. Also it does not seem to be used anywhere in the code.

@WardF WardF self-assigned this Apr 2, 2018
@WardF WardF added this to the 4.7.0 milestone Apr 2, 2018
@edhartnett
Copy link
Contributor

Guys, is this PR going to be merged soon? I have a fair amount of work stacked up, waiting to submit until after these changes are merged...

@WardF WardF merged commit 9ba3912 into master Apr 4, 2018
@WardF WardF deleted the index.dmh branch April 4, 2018 16:29
@edhartnett
Copy link
Contributor

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants