nc_put_vars_double fails in parallel when using netCDF-4 collective I/O #447

Closed
gsjaardema opened this issue Aug 1, 2017 · 4 comments

@gsjaardema
Contributor

Version 4.5.1-devel.

If I call nc_put_vars_double with stride=1 in parallel, and some processors have no data to write (a count of zero), then the H5Dwrite call fails.

The problem is due to the

   if(nels == 0)
      return NC_NOERR; /* cannot write anything */

at line 244 of libdispatch/dvarput.c. If I remove that early return and stride == 1, the code completes correctly. If the line is left as is, the processors with no data return early while the others hang inside H5Dwrite: with collective I/O, HDF5 calls PMPI_Allreduce, which cannot complete until every rank in the communicator participates.
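The deadlock mechanism can be illustrated with a small self-contained C model. This is not netCDF or HDF5 code: `put_vars_model`, `simulate`, and `ranks_in_collective` are hypothetical names, a plain counter stands in for participation in the collective `PMPI_Allreduce` inside `H5Dwrite`, and a "hang" is modeled simply as not every rank reaching that call.

```c
#include <stddef.h>

#define NRANKS 4

/* Hypothetical stand-in: how many ranks reached the collective H5Dwrite. */
static int ranks_in_collective = 0;

/* Product of count[] over all dimensions: elements this rank writes. */
static size_t num_elements(const size_t *count, int ndims)
{
    size_t nels = 1;
    for (int d = 0; d < ndims; d++)
        nels *= count[d];
    return nels;
}

/* Models the dispatch path; early_return mimics `if(nels == 0) return`. */
static void put_vars_model(const size_t *count, int ndims, int early_return)
{
    if (early_return && num_elements(count, ndims) == 0)
        return; /* this rank never enters the collective call */
    ranks_in_collective++; /* this rank enters the collective H5Dwrite */
}

/* Returns 1 if every rank reaches the collective (completes), 0 if not (hangs). */
int simulate(int early_return)
{
    /* Ranks 1 and 3 have no data to write (count of zero). */
    size_t counts[NRANKS][1] = {{10}, {0}, {10}, {0}};
    ranks_in_collective = 0;
    for (int r = 0; r < NRANKS; r++)
        put_vars_model(counts[r], 1, early_return);
    return ranks_in_collective == NRANKS;
}
```

In this model `simulate(1)` returns 0, because the two zero-count ranks skip the collective, while `simulate(0)` returns 1. That matches the observation above: removing the early return lets the zero-element ranks still take part in the collective call.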

There is another issue if stride != 1, but I will report that in a separate issue.

@gsjaardema
Contributor Author

The related issue when stride != 1 is #448.

gsjaardema added a commit to gsjaardema/netcdf-c that referenced this issue Aug 1, 2017
@edhartnett
Contributor

@gsjaardema Is this still an issue, or should it be closed?

@gsjaardema
Contributor Author

This can be closed.

@edhartnett
Contributor

You have to close it, I can't. ;-)
