NetCDF: HDF error when creating a lot of variables and attributes #2251

Closed
wkliao opened this issue Mar 17, 2022 · 3 comments

Comments

wkliao (Contributor) commented Mar 17, 2022

NetCDF 4.8.1
HDF5 1.13.0
MPICH 3.4.3
gcc 8.5.0

I encountered "NetCDF: HDF error" when running a parallel program that creates
an HDF5-based NetCDF-4 file. The test program that reproduces the error is
available in icase_def.c.

The test program follows the E3SM I/O pattern by creating a large number of
variables and attributes:
* 27 global attributes
* 21 dimensions
* 560 variables, each with a few attributes
The test program does not call any nc_put_var* APIs.

With one MPI process, the test program runs fine. With 4 MPI processes, however,
it prints the following errors and hangs in nc_enddef.

mpiexec -n 4 ./icase_def
Error at icase_def.c:35 : NetCDF: HDF error
Error at icase_def.c:35 : NetCDF: HDF error
Error at icase_def.c:35 : NetCDF: HDF error
wkliao (Contributor Author) commented Mar 25, 2022

Further investigation shows that the error is returned at

netcdf-c/libhdf5/nc4hdf.c, lines 1412 to 1413 (commit cd0f169):

    if (H5DSattach_scale(hdf5_var->hdf_datasetid, dsid, d) < 0)
        return NC_EHDFERR;

In issue #1822, @brtnfld mentioned that "HDF5 does not test any of the HL APIs in a parallel setting". Given that, I wonder whether NetCDF plans to resolve this bug by following @brtnfld's suggestion.

DennisHeimbigner (Collaborator)

I wonder if PR #2161 would fix the problem?

wkliao (Contributor Author) commented Mar 25, 2022

Yes. Thanks.
I assume this PR will go into 4.9.0.

wkliao added a commit to Parallel-NetCDF/E3SM-IO that referenced this issue Mar 25, 2022
* NetCDF-4 uses the dimension scale feature which is part of HDF5
  high-level APIs, but HDF5 high-level APIs are not well tested for
  parallel I/O. See Unidata/netcdf-c#2251 and
  Unidata/netcdf-c#1822
* NetCDF PR #2161 adds a new flag NC_NODIMSCALE_ATTACH to allow users to
  disable dimension scale, which resolves the problem for e3sm-io. See
  Unidata/netcdf-c#2161
* NetCDF team indicates PR #2161 will appear in version 4.9.0.
wkliao added a commit to Parallel-NetCDF/E3SM-IO that referenced this issue Mar 25, 2022
wkliao closed this as completed Apr 22, 2022
wkliao added a commit to Parallel-NetCDF/E3SM-IO that referenced this issue Apr 22, 2022
wkliao added a commit to Parallel-NetCDF/E3SM-IO that referenced this issue Apr 28, 2022
wkliao added a commit to Parallel-NetCDF/E3SM-IO that referenced this issue Apr 28, 2022