tst_nccopy4 failing #1857

Closed
opoplawski opened this issue Oct 9, 2020 · 16 comments

@opoplawski
Contributor

I'm looking to update hdf5 to 1.10.7 and netcdf to 4.7.4 in Fedora Rawhide. With that combination I get the following test failure:

$ cat tst_nccopy4.log 

*** Testing compound types some more.
*** creating another compound test file tst_comp2.nc...ok.
*** Tests successful!

*** Testing netCDF-4 features of nccopy on ncdump/*.nc files
*** Test nccopy tst_comp.nc copy_of_tst_comp.nc ...
*** Test nccopy tst_comp2.nc copy_of_tst_comp2.nc ...
*** Test nccopy tst_enum_data.nc copy_of_tst_enum_data.nc ...
*** Test nccopy tst_fillbug.nc copy_of_tst_fillbug.nc ...
*** Test nccopy tst_group_data.nc copy_of_tst_group_data.nc ...
*** Test nccopy tst_nans.nc copy_of_tst_nans.nc ...
*** Test nccopy tst_opaque_data.nc copy_of_tst_opaque_data.nc ...
*** Test nccopy tst_solar_1.nc copy_of_tst_solar_1.nc ...
*** Test nccopy tst_solar_2.nc copy_of_tst_solar_2.nc ...
*** Test nccopy tst_solar_cmp.nc copy_of_tst_solar_cmp.nc ...
*** Test nccopy tst_special_atts.nc copy_of_tst_special_atts.nc ...
*** Test nccopy tst_string_data.nc copy_of_tst_string_data.nc ...
*** Test nccopy tst_unicode.nc copy_of_tst_unicode.nc ...
*** Creating compressible test files tst_inflated.nc, tst_inflated4.nc...ok.
*** Tests successful!
*** Test nccopy -d1 can compress a classic format file ...
*** Test nccopy -d1 can compress a netCDF-4 format file ...
*** Test nccopy -d1 -s can compress a classic model netCDF-4 file even more ...
*** Test nccopy -d1 -s can compress a netCDF-4 file even more ...
*** Test nccopy -d0 turns off compression, shuffling of compressed, shuffled file ...
*** Testing nccopy -d1 -s on ncdump/*.nc files
*** Test nccopy -d1 -s tst_comp.nc copy_of_tst_comp.nc ...
*** Test nccopy -d1 -s tst_comp2.nc copy_of_tst_comp2.nc ...
*** Test nccopy -d1 -s tst_enum_data.nc copy_of_tst_enum_data.nc ...
*** Test nccopy -d1 -s tst_fillbug.nc copy_of_tst_fillbug.nc ...
*** Test nccopy -d1 -s tst_group_data.nc copy_of_tst_group_data.nc ...
*** Test nccopy -d1 -s tst_nans.nc copy_of_tst_nans.nc ...
*** Test nccopy -d1 -s tst_opaque_data.nc copy_of_tst_opaque_data.nc ...
*** Test nccopy -d1 -s tst_solar_1.nc copy_of_tst_solar_1.nc ...
*** Test nccopy -d1 -s tst_solar_2.nc copy_of_tst_solar_2.nc ...
*** Test nccopy -d1 -s tst_solar_cmp.nc copy_of_tst_solar_cmp.nc ...
*** Test nccopy -d1 -s tst_special_atts.nc copy_of_tst_special_atts.nc ...
*** Test nccopy -d1 -s tst_string_data.nc copy_of_tst_string_data.nc ...
NetCDF: HDF error
Location: file ../../ncdump/ncdump.c; line 1732
FAIL tst_nccopy4.sh (exit status: 1)

Not sure where to go from here.

@opoplawski
Contributor Author

I get the same error trying to rebuild the current netcdf 4.7.3 package with hdf5 1.10.7, so it appears to be a change in hdf5 that is triggering it.

@edwardhartnett
Contributor

Do all of the tests in nc_test4 pass?

@opoplawski
Contributor Author

Full build log is here: https://download.copr.fedorainfracloud.org/results/orion/hdf5-1.10.7/fedora-rawhide-x86_64/01701010-netcdf/build.log.gz

It looks like the test run fails before it gets to those tests?

@edwardhartnett
Contributor

Right! If you build with --disable-utilities, or just run make check in the nc_test4 directory, those tests will run.

The nc_test4 tests exercise the netCDF-4 functionality of the netcdf-c library. If one of those tests fails, the problem is far easier to find and debug than a utilities test failure.

Furthermore, a failure in the utilities would clearly be the problem of @DennisHeimbigner but a problem in nc_test4 is much more likely in my code. ;-)

@DennisHeimbigner
Contributor

OK, let's try to get some more information.
I assume that you are running under *nix of some sort.
After you run the build and it fails:

  1. enter the ncdump directory
  2. execute the command 'sh -x ./tst_nccopy4.sh'
    and post the output.
    It should show exactly which command is failing and with what arguments.

@WardF
Member

WardF commented Oct 9, 2020

Interesting; in a separate issue yesterday I was working with the hdf5 1.10.7 installed via conda, and observed no issues. I'll try a custom build in my linux VM and see what I observe.

@WardF WardF self-assigned this Oct 9, 2020
@WardF WardF added this to the 4.8.0 milestone Oct 9, 2020
@opoplawski
Contributor Author

I think this is what you're looking for:

+ echo '*** Test nccopy -d1 -s tst_string_data.nc copy_of_tst_string_data.nc ...'
*** Test nccopy -d1 -s tst_string_data.nc copy_of_tst_string_data.nc ...
+ /home/orion/fedora/netcdf/netcdf-c-4.7.4/build/ncdump/nccopy -d1 -s tst_string_data.nc copy_of_tst_string_data.nc
+ /home/orion/fedora/netcdf/netcdf-c-4.7.4/build/ncdump/ncdump -n copy_of_tst_string_data tst_string_data.nc
+ /home/orion/fedora/netcdf/netcdf-c-4.7.4/build/ncdump/ncdump copy_of_tst_string_data.nc
NetCDF: HDF error
Location: file ../../ncdump/ncdump.c; line 1732

@DennisHeimbigner
Contributor

OK, now try this to get more info about the HDF error:

export NETCDF_LOG_LEVEL=0
Then run that test again.
This may not work if you did not enable logging when building netcdf-c.

@opoplawski
Contributor Author

So, this is fun. With --enable-logging I got a build failure, which I fixed by applying commit b0e0d81. I compiled and ran the tests with export NETCDF_LOG_LEVEL=0 and voilà: no more test failure.

So I disabled --enable-logging again but kept the patch, and the error returned.

I also tried building the latest master but got a build failure that I reported elsewhere.

@DennisHeimbigner
Contributor

What the XXX is going on? Setting NETCDF_LOG_LEVEL=0 enables detailed HDF5
error reporting. That code has not, AFAIK, changed in a long time.
The only thing I can think of at this point is to run with NETCDF_LOG_LEVEL=5
and see whether the error still disappears.
Level 5 provides voluminous tracing of netcdf, which would help narrow down
where the error occurs. But I am not hopeful, because level 5 implies level 0 as well.
Otherwise, I got nothing.

@opoplawski
Contributor Author

Perhaps some of this helps:

ncdump output:

netcdf copy_of_tst_string_data {
dimensions:
        line = 5 ;
variables:
NetCDF: HDF error

gdb:

(gdb) bt
#0  nc_strerror (ncerr1=<optimized out>) at ../libdispatch/../../libdispatch/derror.c:202
#1  nc_strerror (ncerr1=<optimized out>) at ../libdispatch/../../libdispatch/derror.c:86
#2  0x0000555555559591 in check (err=<optimized out>, file=0x5555555632b8 "../../ncdump/ncdump.c", line=1732) at ../../ncdump/utils.c:84
#3  0x0000555555560cb2 in do_ncdump_rec (ncid=65536, path=<optimized out>) at ../../ncdump/ncdump.c:1732
#4  0x00005555555584c4 in do_ncdump (path=0x555555569b40 "copy_of_tst_string_data.nc", ncid=65536) at ../../ncdump/indent.c:15
#5  main (argc=<optimized out>, argv=<optimized out>) at ../../ncdump/ncdump.c:2416
(gdb) up
#3  0x0000555555560cb2 in do_ncdump_rec (ncid=65536, path=<optimized out>) at ../../ncdump/ncdump.c:1732
1732          NC_CHECK( nc_inq_varndims(ncid, varid, &var.ndims) );
(gdb) list
1727       memset((void*)&var,0,sizeof(var));
1728
1729       /* For each var, get and print out info. */
1730
1731       for (varid = 0; varid < nvars; varid++) {
1732          NC_CHECK( nc_inq_varndims(ncid, varid, &var.ndims) );
1733          if(var.dims != NULL) free(var.dims);
1734          var.dims = (int *) emalloc((var.ndims + 1) * sizeof(int));
1735          NC_CHECK( nc_inq_var(ncid, varid, var.name, &var.type, 0,
1736                               var.dims, &var.natts) );
(gdb) print var
$1 = {name = '\000' <repeats 255 times>, type = 0, tinfo = 0x0, ndims = 0, dims = 0x0, natts = 0, has_fillval = 0, fillvalp = 0x0, has_timeval = 0, timeinfo = 0x0, is_bnds_var = 0, fmt = 0x0, locid = 0, val_tostring = 0x0}
(gdb) print varid
$2 = 0
(gdb) print nvars
$3 = 1
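For context on where the two-line "NetCDF: HDF error / Location: ..." message comes from: the backtrace shows NC_CHECK at ncdump.c:1732 handing the status code plus file and line to check() in ncdump/utils.c, which calls nc_strerror(). The C sketch below reproduces that pattern with hypothetical names (SKETCH_CHECK, sketch_check, sketch_strerror, and placeholder status codes); it is not the netcdf-c source. The real check() presumably prints to stderr and exits (the test reports exit status 1), whereas this sketch writes the report into a buffer so it can be tested.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical status codes for the sketch; not netCDF's definitions. */
#define SKETCH_NOERR   0
#define SKETCH_EHDFERR 1

/* Maps a status code to a message, in the spirit of nc_strerror(). */
static const char *sketch_strerror(int err)
{
    switch (err) {
    case SKETCH_NOERR:   return "No error";
    case SKETCH_EHDFERR: return "NetCDF: HDF error";
    default:             return "Unknown error";
    }
}

/* Formats the two-line report and returns nonzero on failure.
   Unlike the real check(), it does not exit, so it can be tested. */
static int sketch_check(int err, const char *file, int line,
                        char *out, size_t outlen)
{
    assert(file != NULL && out != NULL && outlen > 0);
    if (err == SKETCH_NOERR) {
        out[0] = '\0';
        return 0;
    }
    snprintf(out, outlen, "%s\nLocation: file %s; line %d\n",
             sketch_strerror(err), file, line);
    return 1;
}

/* Records the call site via __FILE__/__LINE__, in the spirit of NC_CHECK. */
#define SKETCH_CHECK(status, out, outlen) \
    sketch_check((status), __FILE__, __LINE__, (out), (outlen))
```

The practical consequence for debugging is that the reported location is the caller (the nc_inq_varndims call at ncdump.c:1732), not the place inside the HDF5 layer where the error originated, which is why a logging build or a debugger is needed to get further.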

h5dump output:

$ h5dump copy_of_tst_string_data.nc                                                               
HDF5 "copy_of_tst_string_data.nc" {
GROUP "/" {
   ATTRIBUTE "_NCProperties" {
      DATATYPE  H5T_STRING {
         STRSIZE 35;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SCALAR
      DATA {
      (0): "version=2,netcdf=4.7.4,hdf5=1.10.7,"
      }
   }
   DATASET "description" {
      DATATYPE  H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_UTF8;
         CTYPE H5T_C_S1;
      }
      DATASPACE  SIMPLE { ( 5 ) / ( 5 ) }
      DATA {
      (0): "first string", "second string", "third string", "",
      (4): "last 
           "string""
      }
      ATTRIBUTE "DIMENSION_LIST" {
         DATATYPE  H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): (DATASET 331 "/line")
         }
      }
      ATTRIBUTE "_FillValue" {
         DATATYPE  H5T_STRING {
            STRSIZE H5T_VARIABLE;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_UTF8;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): ""
         }
      }
   }
   DATASET "line" {
      DATATYPE  H5T_IEEE_F32BE
      DATASPACE  SIMPLE { ( 5 ) / ( 5 ) }
      DATA {
      (0): 0, 0, 0, 0, 0
      }
      ATTRIBUTE "CLASS" {
         DATATYPE  H5T_STRING {
            STRSIZE 16;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SCALAR
         DATA {
         (0): "DIMENSION_SCALE"
         }
      }
      ATTRIBUTE "NAME" {
         DATATYPE  H5T_STRING {
            STRSIZE 64;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         }
         DATASPACE  SCALAR
         DATA {
         (0): "This is a netCDF dimension but not a netCDF variable.         5"
         }
      }
      ATTRIBUTE "REFERENCE_LIST" {
         DATATYPE  H5T_COMPOUND {
            H5T_REFERENCE { H5T_STD_REF_OBJECT } "dataset";
            H5T_STD_I32LE "dimension";
         }
         DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
         DATA {
         (0): {
               DATASET 685 "/description",
               0
            }
         }
      }
      ATTRIBUTE "_Netcdf4Dimid" {
         DATATYPE  H5T_STD_I32LE
         DATASPACE  SCALAR
         DATA {
         (0): 0
         }
      }
   }
}
}

h5dump output of tst_string_data.nc is the same, though the files are not identical in size.

valgrind output:

==17532== Command: /home/orion/fedora/netcdf/netcdf-c-4.7.4/build/ncdump/.libs/ncdump copy_of_tst_string_data.nc
==17532== 
netcdf copy_of_tst_string_data {
dimensions:
        line = 5 ;
variables:
==17532== Conditional jump or move depends on uninitialised value(s)
==17532==    at 0x48BC5E7: UnknownInlinedFun (hdf5open.c:962)
==17532==    by 0x48BC5E7: nc4_get_var_meta (hdf5open.c:1289)
==17532==    by 0x48BD70F: nc4_hdf5_find_grp_var_att (hdf5internal.c:911)
==17532==    by 0x48BDC61: NC4_HDF5_inq_var_all (hdf5var.c:2210)
==17532==    by 0x4876D42: nc_inq_var (dvarinq.c:131)
==17532==    by 0x4876E64: nc_inq_varndims (dvarinq.c:204)
==17532==    by 0x1144D8: do_ncdump_rec.constprop.0 (ncdump.c:1732)
==17532==    by 0x10C4C3: UnknownInlinedFun (ncdump.c:1978)
==17532==    by 0x10C4C3: main (ncdump.c:2416)
==17532== 
        string description(line) ;
                string description:_FillValue = "" ;
data:

 description = "first string", "second string", "third string", _, 
    "last \n\"string\"" ;
}

The UnknownInlinedFun seems to be:
get_filter_info (var=0x5555555e2b00, propid=720575940379279384) at ../libhdf5/../../libhdf5/hdf5open.c:1289

1289        if ((retval = get_filter_info(propid, var)))
(gdb) 
943         if ((num_filters = H5Pget_nfilters(propid)) < 0)
(gdb) n
946         for (f = 0; f < num_filters; f++)
(gdb) 
948             if ((filter = H5Pget_filter2(propid, f, NULL, &cd_nelems, cd_values_zip,
(gdb) 
951             switch (filter)
(gdb) 
954                 var->shuffle = NC_TRUE;
(gdb) 
955                 break;
(gdb) 
946         for (f = 0; f < num_filters; f++)
(gdb) 
948             if ((filter = H5Pget_filter2(propid, f, NULL, &cd_nelems, cd_values_zip,
(gdb) print num_filters
$4 = 2
(gdb) n
951             switch (filter)
(gdb) 
962                 if (cd_nelems != CD_NELEMS_ZLIB ||
(gdb) list
957             case H5Z_FILTER_FLETCHER32:
958                 var->fletcher32 = NC_TRUE;
959                 break;
960
961             case H5Z_FILTER_DEFLATE:
962                 if (cd_nelems != CD_NELEMS_ZLIB ||
963                     cd_values_zip[0] > NC_MAX_DEFLATE_LEVEL)
964                     return NC_EHDFERR;
965                 if((stat = NC4_hdf5_addfilter(var,FILTERACTIVE,filter,cd_nelems,cd_values_zip)))
966                     return stat;
(gdb) print cd_nelems
$5 = 1
(gdb) print cd_values_zip
$6 = {21845}

I'm guessing it's this cd_values_zip that isn't being initialized.
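One hypothesis consistent with the gdb session, though not confirmed anywhere in this thread: per the HDF5 reference manual, the cd_nelmts argument of H5Pget_filter2 is IN/OUT. On input it is the capacity of the cd_values buffer; on output it is the number of parameters the filter actually has, and only as many values as fit in the input capacity are copied. If an earlier call in the loop leaves cd_nelems smaller than the buffer (say 0), the deflate call copies nothing yet still reports cd_nelems == 1, so cd_values_zip[0] keeps whatever was on the stack (21845 is 0x5555). Whether the loop in hdf5open.c resets cd_nelems between calls is not visible in the excerpt above. The sketch below simulates this with a hypothetical mock_get_filter standing in for H5Pget_filter2; the mock filter IDs, the two-entry pipeline, and its parameter counts are assumptions, and no HDF5 code is called.

```c
#include <assert.h>
#include <stddef.h>

/* Mock filter IDs for illustration only. */
#define MOCK_FILTER_DEFLATE 1
#define MOCK_FILTER_SHUFFLE 2
#define CD_NELEMS_ZLIB 1       /* capacity of the caller's buffer */
#define MAX_DEFLATE_LEVEL 9

typedef struct {
    int id;
    size_t nparams;
    unsigned params[1];
} mock_filter;

/* Assumed pipeline: a filter with no stored parameters, then deflate level 1. */
static const mock_filter pipeline[] = {
    {MOCK_FILTER_SHUFFLE, 0, {0}},
    {MOCK_FILTER_DEFLATE, 1, {1}},
};
#define NUM_FILTERS (sizeof(pipeline) / sizeof(pipeline[0]))

/* Mimics the IN/OUT contract of cd_nelmts: on input, capacity of cd_values;
   on output, the filter's parameter count. Copies at most the input capacity. */
static int mock_get_filter(size_t idx, size_t *cd_nelems, unsigned cd_values[])
{
    assert(idx < NUM_FILTERS);
    const mock_filter *flt = &pipeline[idx];
    for (size_t i = 0; i < flt->nparams && i < *cd_nelems; i++)
        cd_values[i] = flt->params[i];
    *cd_nelems = flt->nparams;
    return flt->id;
}

/* Returns the deflate level, or -1 where netcdf-c would return NC_EHDFERR.
   reset_each_call selects between the two loop styles. */
static int read_deflate_level(int reset_each_call, unsigned stack_garbage)
{
    unsigned cd_values_zip[CD_NELEMS_ZLIB];
    size_t cd_nelems = CD_NELEMS_ZLIB;
    cd_values_zip[0] = stack_garbage;    /* simulate leftover stack contents */

    for (size_t f = 0; f < NUM_FILTERS; f++) {
        if (reset_each_call)
            cd_nelems = CD_NELEMS_ZLIB;  /* restore the buffer capacity */
        int filter = mock_get_filter(f, &cd_nelems, cd_values_zip);
        if (filter == MOCK_FILTER_DEFLATE) {
            if (cd_nelems != CD_NELEMS_ZLIB || cd_values_zip[0] > MAX_DEFLATE_LEVEL)
                return -1;
            return (int)cd_values_zip[0];
        }
    }
    return -1;  /* no deflate filter found */
}
```

In this mock, read_deflate_level(0, 21845) takes the error path and returns -1, read_deflate_level(1, 21845) returns the stored level 1, and read_deflate_level(0, 5) silently returns 5, a wrong level that passes the range check. Outcomes that depend on leftover stack contents would also fit the report that the failure disappeared in the --enable-logging build. Resetting the count before every call and zero-initializing the buffer are cheap defensive measures either way.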

@DennisHeimbigner
Contributor

You might try rebuilding netcdf-c with optimization turned off: -g -O0.

@opoplawski
Contributor Author

Looks like this is fixed in current netcdf master, or at least the test passes and I don't see any valgrind output.

@rjdave

rjdave commented Jan 27, 2021

Another note: at least in my case, downgrading HDF5 to 1.10.6 also resolves this issue with the netCDF 4.7.4 release.

@nh2

nh2 commented Aug 17, 2021

This issue reappeared for me on NixOS, with netCDF 4.7.4 and hdf5 1.12.1, on Linux aarch64 only:

NixOS/nixpkgs#115788 (comment)

NetCDF: HDF error
Location: file ncdump.c; line 1732
FAIL tst_nccopy4.sh (exit status: 1)

(full build log).

@nh2

nh2 commented Aug 17, 2021

For others facing the same problem: when I tried upgrading to netCDF 4.8.0, I got this test failure instead: #1971 (comment) (edit: the PR linked there fixes it).


6 participants