Suspend runtime #3129

Merged
merged 33 commits into from Feb 4, 2018

@msimberg (Contributor) commented Jan 25, 2018

Part of #3003.

Proposed Changes

  • Add hpx::suspend and hpx::resume functions

  • hpx::suspend blocks waiting for all pools to finish their work

  • hpx::resume blocks waiting for pools to resume

  • Add a wait function to io_service_pool which waits for the io_services to finish their work. As io_services stay idle when they have no work, hpx::suspend only waits for io_service_pools to be empty.

  • This builds on "Suspend thread pool" (#3110) and "Add hpx::start nullptr overloads" (#3127), which should be merged first.
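
To illustrate the blocking semantics listed above, here is a minimal standard-library sketch. It is a toy model, not HPX code: `toy_runtime`, `post`, `worker_loop` and `run_suspend_resume_demo` are hypothetical names, and the real implementation goes through the threadmanager and thread pools. The model only assumes what the description states: suspend returns once all work has finished and the workers are idle, and resume returns once the workers are running again.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Toy stand-in for a runtime with blocking suspend/resume (not HPX code).
class toy_runtime
{
public:
    explicit toy_runtime(std::size_t num_threads) : num_threads_(num_threads)
    {
        for (std::size_t i = 0; i != num_threads_; ++i)
            workers_.emplace_back([this] { worker_loop(); });
    }

    ~toy_runtime()
    {
        {
            std::lock_guard<std::mutex> l(mtx_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }

    void post(std::function<void()> f)
    {
        {
            std::lock_guard<std::mutex> l(mtx_);
            queue_.push_back(std::move(f));
        }
        cv_.notify_all();
    }

    // Blocks until the queue is drained and every worker is parked.
    void suspend()
    {
        std::unique_lock<std::mutex> l(mtx_);
        suspend_requested_ = true;
        cv_.notify_all();
        cv_.wait(l, [this] { return parked_ == num_threads_; });
    }

    // Blocks until every worker has left the parked state.
    void resume()
    {
        std::unique_lock<std::mutex> l(mtx_);
        suspend_requested_ = false;
        cv_.notify_all();
        cv_.wait(l, [this] { return parked_ == 0; });
    }

private:
    void worker_loop()
    {
        std::unique_lock<std::mutex> l(mtx_);
        for (;;)
        {
            if (!queue_.empty())
            {
                auto f = std::move(queue_.front());
                queue_.pop_front();
                ++active_;
                l.unlock();
                f();    // run the task without holding the lock
                l.lock();
                --active_;
                cv_.notify_all();
            }
            else if (stopping_)
            {
                return;
            }
            else if (suspend_requested_ && active_ == 0)
            {
                // No queued or running work is left: park until resumed.
                ++parked_;
                cv_.notify_all();
                cv_.wait(l, [this] { return !suspend_requested_ || stopping_; });
                --parked_;
                cv_.notify_all();
            }
            else
            {
                cv_.wait(l);    // idle; woken by post, suspend or shutdown
            }
        }
    }

    std::size_t num_threads_;
    std::mutex mtx_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> queue_;
    std::size_t active_ = 0;
    std::size_t parked_ = 0;
    bool suspend_requested_ = false;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};

// Hypothetical usage: returns the number of tasks that ran.
int run_suspend_resume_demo()
{
    std::atomic<int> counter{0};
    toy_runtime rt(4);
    for (int i = 0; i != 100; ++i)
        rt.post([&counter] { ++counter; });

    rt.suspend();    // returns only after all 100 tasks have run
    assert(counter.load() == 100);
    rt.resume();     // returns once the workers are active again

    rt.post([&counter] { ++counter; });
    rt.suspend();
    return counter.load();    // the destructor stops the parked workers
}
```

Because both calls block, code between `suspend()` and `resume()` can rely on the workers being parked, which is the property the simplified API is meant to provide.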

Any background context you want to provide?

The API is now a lot simpler than initially suggested in #3003, so as not to complicate the use of hpx::start/init/stop further. This also makes the implementation simpler. If a "smarter" API is needed in the future, it can be built on top of this, but this is enough as a start.

The resume/suspend in this PR is faster than start/stop, but not significantly. I will still need to optimize that part to make this useful for small blocks of work, but will do that in a separate PR.

…dlock during exceptional shutdown

Previously stop_locked would block on calling resume_internal, even if blocking was set to false. This led to deadlocks when multiple threads call stop_locked and assume that only one of them has blocking set to true.
Wrong function name passed to HPX_THROWS_IF
- Make sure there is an idle thread which can do the stealing
- Add comments to warn about corner cases
Change state_suspended to state_sleeping.
I.e. don't use pending_boost to let other work run first.
Add two new functions to the hpx namespace:

- suspend
- resume

and corresponding functions to threadmanager.

These two currently more or less wrap the threadmanager suspend/resume
calls and check the runtime status.
io_service_pool::wait removes the work guard from the io_services so
that they finish when done, waits for all io_service.run()s to
return, and then restarts them so that they accept work again.
Don't allow suspension when number of localities > 1.
- Wait for runtime to start before suspending
- Remove hpx_main
- Clean up includes
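
The io_service_pool::wait commit above describes three steps: remove the work guard, wait for every run() call to return, then restart. A minimal standard-library sketch of that sequence follows. It is an assumption-laden toy, not the HPX or Boost.Asio implementation: `toy_io_service`, `toy_io_service_pool`, `set_work_guard` and `run_io_pool_wait_demo` are hypothetical names, and a plain flag stands in for the work guard.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Toy stand-in for an io_service: run() returns once no handlers are left
// and no work guard is held. Illustrative only, not Boost.Asio.
class toy_io_service
{
public:
    void post(std::function<void()> h)
    {
        {
            std::lock_guard<std::mutex> l(mtx_);
            handlers_.push_back(std::move(h));
        }
        cv_.notify_one();
    }

    void set_work_guard(bool held)
    {
        {
            std::lock_guard<std::mutex> l(mtx_);
            guard_held_ = held;
        }
        cv_.notify_all();
    }

    void run()
    {
        std::unique_lock<std::mutex> l(mtx_);
        for (;;)
        {
            // While the guard is held, an empty queue means "stay idle".
            cv_.wait(l, [this] { return !handlers_.empty() || !guard_held_; });
            if (handlers_.empty())
                return;    // guard dropped and nothing left to do
            auto h = std::move(handlers_.front());
            handlers_.pop_front();
            l.unlock();
            h();
            l.lock();
        }
    }

private:
    std::mutex mtx_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> handlers_;
    bool guard_held_ = true;
};

class toy_io_service_pool
{
public:
    explicit toy_io_service_pool(std::size_t n) : num_threads_(n) { start(); }
    ~toy_io_service_pool() { stop(); }

    void post(std::function<void()> h) { service_.post(std::move(h)); }

    // Drop the guard, wait for all run() calls to return, then restart so
    // that the pool accepts work again.
    void wait()
    {
        stop();
        start();
    }

private:
    void start()
    {
        service_.set_work_guard(true);
        for (std::size_t i = 0; i != num_threads_; ++i)
            threads_.emplace_back([this] { service_.run(); });
    }

    void stop()
    {
        service_.set_work_guard(false);
        for (auto& t : threads_) t.join();
        threads_.clear();
    }

    std::size_t num_threads_;
    toy_io_service service_;
    std::vector<std::thread> threads_;
};

// Hypothetical usage: returns the number of handlers that ran.
int run_io_pool_wait_demo()
{
    std::atomic<int> counter{0};
    toy_io_service_pool pool(2);
    for (int i = 0; i != 50; ++i)
        pool.post([&counter] { ++counter; });

    pool.wait();    // all 50 handlers have completed when this returns
    pool.post([&counter] { ++counter; });    // the pool still accepts work
    pool.wait();
    return counter.load();
}
```

Restarting inside `wait()` is what lets callers keep posting handlers afterwards, matching the commit's note that the io_services "accept work again".
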
@hkaiser hkaiser added this to the 1.1.0 milestone Jan 25, 2018
@msimberg msimberg changed the title from "WIP: Suspend runtime" to "Suspend runtime" Jan 29, 2018
@msimberg msimberg changed the title from "Suspend runtime" to "WIP: Suspend runtime" Jan 31, 2018
@msimberg msimberg changed the title from "WIP: Suspend runtime" to "Suspend runtime" Jan 31, 2018
This was referenced Jan 31, 2018
@hkaiser (Member) left a comment

LGTM, thanks!

@hkaiser (Member) commented Feb 2, 2018

This PR is a high risk change as it touches on the scheduling loop. We have to be very careful with merging it.

@hkaiser hkaiser merged commit 792504f into STEllAR-GROUP:master Feb 4, 2018