Fixing parcel scheduling #2518

Merged
merged 1 commit into from Feb 26, 2017

Conversation

sithhell
Member

Due to the recent changes in executing background_work in an HPX thread,
parcel decoding could create a deadlock when a direct action suspends
and there are more parcels to decode and schedule. This patch remedies
this situation.

This fixes #2516.

@hkaiser
Member

hkaiser commented Feb 23, 2017

@sithhell Could you please explain a bit what this patch does?

@sithhell
Member Author

In short: whenever we have more than one parcel to decode, and at least two of those parcels contain a direct action, every direct action except the last is scheduled on a new HPX thread. This avoids a deadlock if one of those direct actions suspends while parcels are still waiting to be decoded.
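The rule described above can be sketched roughly as follows. This is an illustrative model only, not the code from the patch: `parcel_info`, `dispatch`, and `plan_dispatch` are hypothetical names, and the sketch uses only the standard library rather than any HPX API. It assumes the only decision to make is where each decoded action runs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a decoded parcel; not an HPX type.
struct parcel_info
{
    bool is_direct;    // true if the parcel carries a direct action
};

// Where the action of a decoded parcel ends up running (illustrative only).
enum class dispatch
{
    queue_normally,    // non-direct action: goes through the regular scheduler
    new_thread,        // direct action moved onto a thread of its own
    run_inline         // direct action executed on the decoding thread
};

// Every direct action except the last one is moved to a new thread, so that
// a suspending direct action cannot block the decoding of later parcels.
std::vector<dispatch> plan_dispatch(std::vector<parcel_info> const& parcels)
{
    std::vector<dispatch> plan(parcels.size(), dispatch::queue_normally);

    // Index of the last parcel carrying a direct action (size() if none).
    std::size_t last_direct = parcels.size();
    for (std::size_t i = 0; i != parcels.size(); ++i)
    {
        if (parcels[i].is_direct)
            last_direct = i;
    }

    std::size_t inline_count = 0;
    for (std::size_t i = 0; i != parcels.size(); ++i)
    {
        if (!parcels[i].is_direct)
            continue;

        if (i == last_direct)
        {
            plan[i] = dispatch::run_inline;
            ++inline_count;
        }
        else
        {
            plan[i] = dispatch::new_thread;
        }
    }

    // Sanity check: at most one action runs on the decoding thread itself.
    assert(inline_count <= 1);
    return plan;
}
```

With fewer than two direct actions in a batch nothing is moved to a new thread, which matches the condition stated above: a lone direct action is by definition the last one and still runs inline.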

@hkaiser
Member

hkaiser commented Feb 23, 2017

Do I have to like that what seemed to be a good idea (i.e. execute direct actions directly) starts influencing more places in the code than we would have even remotely expected? I still think that all of this has a non-healthy ring to it. We have to introduce more and more tricky workarounds into various, unrelated places. I don't have a better suggestion however at this point.

@sithhell
Member Author

While I agree that the change in general had a big impact on what needed to be changed, I still think it is good overall. I wouldn't call the places that needed changes unrelated, though: they all deal with scheduling threads/actions. If anything, I think the concept of direct actions is incomplete, while of course offering a great optimization possibility. I wouldn't want to lose that. I put some effort into commenting the pieces of code so that they hopefully don't appear that tricky after all.

@hkaiser
Member

hkaiser commented Feb 24, 2017

Thomas, I wouldn't call the changes to the barrier code being related to scheduling threads/actions etc. But as said, I don't have a better suggestion at this point. So please go ahead and merge this.

@hkaiser
Member

hkaiser commented Feb 24, 2017

Due to the recent changes in executing background_work in an HPX thread,
parcel decoding could create a deadlock when a direct action suspends
and there are more parcels to decode and schedule. This patch remedies
this situation.

This fixes #2516.

Flyby: Only notify the big boot barrier CV once
@hkaiser hkaiser merged commit b97ae1c into master Feb 26, 2017
@hkaiser hkaiser deleted the fixing_2516 branch February 26, 2017 18:14
Development

Successfully merging this pull request may close these issues.

HPX locks up when using at least 256 localities