NetCDF's use of HDF5 HL DS APIs and parallel I/O performance. #1822
Comments
Scott, I'll tag @edhartnett for his feedback since he implemented the HL calls in netCDF-4; I can offer supposition, but that's a poor substitute for first-hand info :).

I don't think you are missing anything subtle. This just never occurred to me. Good idea!
* NetCDF-4 uses the dimension scale feature, which is part of the HDF5 high-level APIs, but the HDF5 high-level APIs are not well tested for parallel I/O. See Unidata/netcdf-c#2251 and Unidata/netcdf-c#1822.
* NetCDF PR #2161 adds a new flag, NC_NODIMSCALE_ATTACH, that allows users to disable dimension-scale attachment, which resolves the problem for e3sm-io. See Unidata/netcdf-c#2161.
* The NetCDF team indicates PR #2161 will appear in version 4.9.0.
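A minimal sketch of how the flag might be used, assuming the netCDF-C 4.9.0 API from PR #2161; the file and variable names here are hypothetical, and error handling is abbreviated:

```c
/* Sketch: create a netCDF-4 file without attaching HDF5 dimension
 * scales, assuming the NC_NODIMSCALE_ATTACH flag from PR #2161
 * (netCDF-C 4.9.0 or later). File/variable names are illustrative. */
#include <stdio.h>
#include <netcdf.h>

int main(void) {
    int ncid, dimid, varid, ret;

    /* OR the new flag into the create mode so the library skips the
     * serial H5DSattach_scale calls for every variable. */
    ret = nc_create("out.nc", NC_NETCDF4 | NC_NODIMSCALE_ATTACH, &ncid);
    if (ret != NC_NOERR) {
        fprintf(stderr, "%s\n", nc_strerror(ret));
        return 1;
    }

    nc_def_dim(ncid, "x", 10, &dimid);
    nc_def_var(ncid, "v", NC_DOUBLE, 1, &dimid, &varid);
    nc_enddef(ncid);
    nc_close(ncid);
    return 0;
}
```

The resulting file omits the dimension-scale attachment metadata, so tools that rely on dimension scales to map variables to dimensions may not interpret it the same way as a default netCDF-4 file.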
I believe this issue should be closed. I note that it is now possible to create files that don't do the dimscales.

The issue can be revisited in the future if needed. Closing.
I noticed that NetCDF calls the high-level DS APIs from all the processes doing I/O. For example, it has sync_netcdf5_file --> nc4_rec_write_metadata --> write_var --> (H5DSdetach_scale, H5DSattach_scale, etc.).
Has there been any investigation into the possibility of closing the file, reopening it with only one process, and then doing all the metadata updating (i.e., the DS calls)?
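The pattern being suggested could look roughly like the sketch below. This is a hypothetical illustration, not netCDF's actual implementation: the function name, the dataset paths, and the choice of which objects to attach are all invented, and only the structure (barrier, rank-0 serial reopen, serial DS calls, barrier) reflects the suggestion.

```c
/* Hypothetical sketch: after all ranks close the parallel file handle,
 * rank 0 alone reopens the file serially and performs the HDF5
 * dimension-scale metadata updates, so the small DS writes happen once
 * instead of once per process. Names and paths are illustrative. */
#include <mpi.h>
#include <hdf5.h>
#include <hdf5_hl.h>

void close_and_fixup_metadata(const char *path, MPI_Comm comm) {
    int rank;
    MPI_Comm_rank(comm, &rank);

    /* ...all ranks have already closed the parallel (MPI-IO) handle... */
    MPI_Barrier(comm);  /* ensure the parallel close has completed */

    if (rank == 0) {
        /* Serial reopen: default file access, a single writer. */
        hid_t fid = H5Fopen(path, H5F_ACC_RDWR, H5P_DEFAULT);
        hid_t var = H5Dopen2(fid, "/v", H5P_DEFAULT);
        hid_t dim = H5Dopen2(fid, "/x", H5P_DEFAULT);

        /* The serial HL DS calls run on one process only. */
        H5DSattach_scale(var, dim, 0);

        H5Dclose(dim);
        H5Dclose(var);
        H5Fclose(fid);
    }

    MPI_Barrier(comm);  /* everyone waits for the metadata fixup */
}
```

One design question this raises is where the extra open/close cost lands: a serial reopen on rank 0 trades many tiny concurrent writes for one process doing a small amount of extra work, which is usually a good trade on a parallel file system.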
All the HL DS APIs are "serial," so all the processes are re-writing the same data, and this data also seems to be very small. Usually, doing many little writes results in poor parallel performance on a parallel file system, especially when you have tens of thousands of processes writing what amounts to the same data.
The HL APIs are not meant to be called by all the processes since this, for the most part, just duplicates the I/O. Also, HDF5 does not test any of the HL APIs in a parallel setting.
I could be missing something subtle that doesn't allow NetCDF to do this, so any insight/feedback would be much appreciated.