NetCDF's use of HDF5 HL DS APIs and parallel I/O performance. #1822

Closed
brtnfld opened this issue Aug 28, 2020 · 4 comments

Comments

@brtnfld
Contributor

brtnfld commented Aug 28, 2020

I noticed that NetCDF calls the HDF5 high-level dimension scale (DS) APIs from every process doing I/O. For example, the call chain is sync_netcdf4_file --> nc4_rec_write_metadata --> write_var --> (H5DSdetach_scale, H5DSattach_scale, etc.).

Has anyone investigated closing the file, reopening it with only one process, and then doing all the metadata updates (i.e., the DS calls) from that process?

All the HL DS APIs are "serial," so all the processes are re-writing the same data, and this data also seems to be very small. Usually, doing many little writes results in poor parallel performance on a parallel file system, especially when you have tens of thousands of processes writing what amounts to the same data.

The HL APIs are not meant to be called by all the processes since this, for the most part, just duplicates the I/O. Also, HDF5 does not test any of the HL APIs in a parallel setting.

I could be missing something subtle that doesn't allow NetCDF to do this, so any insight/feedback would be much appreciated.
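
To make the scaling concern above concrete, here is a rough back-of-the-envelope sketch. It is not NetCDF or HDF5 code and makes no library calls. It assumes, following the description above, that each dimension scale attachment costs one small write request per process that issues the call. The function name `small_write_requests` and the example counts (10,000 processes, 100 variables, 3 dimensions each) are hypothetical and chosen only for illustration.

```python
def small_write_requests(num_procs, num_vars, dims_per_var, all_ranks=True):
    """Count small metadata write requests under a simple, assumed model.

    Assumption (not measured): attaching one dimension scale to one
    variable costs one small write request per process that makes the call.
    """
    attach_calls = num_vars * dims_per_var
    # Either every process repeats the calls, or a single process does them.
    writers = num_procs if all_ranks else 1
    return attach_calls * writers


if __name__ == "__main__":
    # Hypothetical job size, for illustration only.
    procs, nvars, ndims = 10_000, 100, 3
    every_rank = small_write_requests(procs, nvars, ndims, all_ranks=True)
    one_rank = small_write_requests(procs, nvars, ndims, all_ranks=False)
    print(f"all ranks: {every_rank} requests")
    print(f"one rank:  {one_rank} requests")
    print(f"ratio:     {every_rank // one_rank}x")
```

Under this model the number of small requests grows linearly with the process count when every rank makes the DS calls, and stays constant when a single process does the metadata update. Real behavior depends on the HDF5 metadata cache and the file system, so treat the numbers as an illustration of the trend rather than a prediction.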

@WardF
Member

WardF commented Aug 28, 2020

Scott, I'll tag @edhartnett for his feedback since he implemented the HL calls in netCDF-4; I can offer supposition, but that's a poor substitute for first-hand info :).

@edwardhartnett
Contributor

I don't think you are missing anything subtle. This just never occurred to me. Good idea!

wkliao added a commit to Parallel-NetCDF/E3SM-IO that referenced this issue Mar 25, 2022
* NetCDF-4 uses the dimension scale feature which is part of HDF5
  high-level APIs, but HDF5 high-level APIs are not well tested for
  parallel I/O. See Unidata/netcdf-c#2251 and
  Unidata/netcdf-c#1822
* NetCDF PR #2161 adds a new flag NC_NODIMSCALE_ATTACH to allow users to
  disable dimension scale, which resolves the problem for e3sm-io. See
  Unidata/netcdf-c#2161
* NetCDF team indicates PR #2161 will appear in version 4.9.0.
wkliao added a commit to Parallel-NetCDF/E3SM-IO that referenced this issue Apr 22, 2022
@edwardhartnett
Contributor

I believe this issue should be closed.

I note that it is now possible to create files that skip attaching dimension scales (see the NC_NODIMSCALE_ATTACH flag from PR #2161).

@brtnfld
Contributor Author

brtnfld commented Apr 27, 2022

The issue can be revisited in the future if needed. Closing.

@brtnfld brtnfld closed this as completed Apr 27, 2022
wkliao added a commit to Parallel-NetCDF/E3SM-IO that referenced this issue Apr 28, 2022