
HPX locks up when using at least 256 localities #2516

Closed
sithhell opened this issue Feb 22, 2017 · 2 comments · Fixed by #2518
Comments

@sithhell
Member

The barrier implementation using a hierarchical tree currently leads to a deadlock. This was observed when trying to run an HPX application on 256 nodes. When `hpx.lcos.collectives.cut_off` is set to a very high value, the hang disappears.
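For reference, the workaround mentioned above can be applied without recompiling by overriding the configuration entry on the command line. The application name below is a placeholder, and the specific value is only an example of "very high"; this merely disables the hierarchical tree path rather than fixing the underlying deadlock:

```
# Workaround sketch: raise the collectives cut-off so the hierarchical
# tree barrier is not used (value is illustrative).
./my_hpx_app --hpx:ini=hpx.lcos.collectives.cut_off=1000000
```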

@sithhell sithhell added this to the 1.0.0 milestone Feb 22, 2017
sithhell added a commit that referenced this issue Feb 23, 2017
Due to the recent changes in executing background_work in an HPX thread,
parcel decoding could create a deadlock when a direct action suspends
and there are more parcels to decode and schedule. This patch remedies
the situation.

This fixes #2516.
sithhell added a commit that referenced this issue Feb 26, 2017
Due to the recent changes in executing background_work in an HPX thread,
parcel decoding could create a deadlock when a direct action suspends
and there are more parcels to decode and schedule. This patch remedies
the situation.

This fixes #2516.

Flyby: Only notify the big boot barrier CV once
@biddisco
Contributor

Now I know why my InfiniBand tests stopped at 128 nodes.

@diehlpk
Member

diehlpk commented Jun 2, 2021

@sithhell What is a really high value?
