Shared priority scheduler does not always create a new thread #5189

Open
msimberg opened this issue Feb 19, 2021 · 7 comments

@msimberg
Contributor

While working on #5172 the shutdown_suspended_pus test started failing more regularly. The failure was the HPX_ASSERT(background_thread); assertion in the scheduling loop firing.

The scheduling loop expects run_now = true to mean that a thread object will definitely be created. The shared priority queue scheduler has a fallback: if select_active_pu does not return the same worker thread as the given hint, it does not create a thread object but only a thread description. This can basically only happen when suspension is involved.

I think either the shared priority queue scheduler should respect the run_now argument, or there needs to be a separate flag to force a thread to be created. This may also explain some other failures with the shared_priority_queue_scheduler on the suspension tests (many of which are disabled because of failures).
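
A minimal, self-contained sketch of the fallback described above; the type and function names (queue_task_description, create_thread_object, the simplified thread_init_data) are illustrative stand-ins, not the real HPX API:

```cpp
#include <cstddef>

struct thread_init_data { std::size_t hint; bool run_now; };
using thread_id_type = void*;

std::size_t select_active_pu(std::size_t hint);            // may return a different PU when suspended
void queue_task_description(thread_init_data const&, std::size_t pu);
thread_id_type create_thread_object(thread_init_data const&, std::size_t pu);

void create_thread(thread_init_data const& data, thread_id_type* id)
{
    std::size_t const pu = select_active_pu(data.hint);

    if (pu != data.hint)
    {
        // Fallback path: only a thread *description* is queued and *id is
        // never filled in, so data.run_now == true is effectively ignored
        // and the caller gets no thread object back.
        queue_task_description(data, pu);
        return;
    }

    // Normal path: a real thread object is created and its id handed back.
    *id = create_thread_object(data, pu);
}
```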

@biddisco

@biddisco
Contributor

It depends on what you think the scheduling loop should be doing/requesting. I would like to remove most of the functionality from the scheduling loop and instead allow schedulers to decide how they manage things. Currently the scheduling loop calls functions that steal, delete, and create tasks, and this forces a certain API onto the schedulers that I think is wrong. Evidence at hand: for the fork-join scheduler, all of this is bypassed and another scheduling loop is run instead.
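
To make the shape of that complaint concrete, here is a rough, hypothetical sketch (illustrative names only, not the real scheduler_base interface): the generic scheduling loop owns the control flow and drives a fixed set of hooks, which is the constraint a scheduler like the fork-join one sidesteps by running its own loop.

```cpp
// Illustrative only: a condensed stand-in for the hooks the scheduling loop
// forces every scheduler to implement. Names and signatures are hypothetical.
struct task;  // stand-in for an HPX thread/task object

struct scheduler_hooks
{
    virtual ~scheduler_hooks() = default;
    virtual void create_task(bool run_now) = 0;   // loop asks for task creation
    virtual task* get_next_task() = 0;            // loop asks for the next task
    virtual void destroy_task(task* t) = 0;       // loop recycles finished tasks
    virtual bool wait_or_add_new() = 0;           // stealing / idle handling
};

// The loop, not the scheduler, decides when each hook runs. A scheduler that
// wants a different control flow has to bypass this loop entirely.
inline void scheduling_loop(scheduler_hooks& sched, bool const& running)
{
    while (running)
    {
        if (task* t = sched.get_next_task())
        {
            // ... execute t ...
            sched.destroy_task(t);
        }
        else
        {
            sched.wait_or_add_new();
        }
    }
}
```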
HPX_ASSERT(background_thread); is really a separate matter. The background task used to be run on the OS thread, but Thomas added support to instead run it on an HPX worker thread. This was partly because of problems inside the parcelport when adding verbs/libfabric support, but it isn't actually necessary any more since we don't suspend or block inside the network processing. There is not really any need to do this, though it is an interesting feature (which might prove useful when considering continuations on MPI or GPU futures that are currently polled on the OS thread). The obvious thing to do is simply put the background processing thread into the same category as those other polling helpers.

Why does the shared priority scheduler cause problems with the suspension tests? I was not aware of this issue, perhaps I can help solve that...

@msimberg
Contributor Author

> Why does the shared priority scheduler cause problems with the suspension tests? I was not aware of this issue, perhaps I can help solve that...

You tell me ;) This is one of the reasons, apparently. Let me know if my description of the problem wasn't clear enough. I thought there were more of them, but it looks like there's only one other place where the shared priority scheduler is disabled, in suspend_thread_timed:

// until timed thread problems are fix, disable this
//hpx::resource::scheduling_policy::shared_priority,
I don't know if it's in any way related, though.

> It depends on what you think the scheduling loop should be doing/requesting. I would like to remove most of the functionality from the scheduling loop and instead allow schedulers to decide how they manage things. Currently the scheduling loop calls functions that steal, delete, and create tasks, and this forces a certain API onto the schedulers that I think is wrong. Evidence at hand: for the fork-join scheduler, all of this is bypassed and another scheduling loop is run instead.
> HPX_ASSERT(background_thread); is really a separate matter. The background task used to be run on the OS thread, but Thomas added support to instead run it on an HPX worker thread. This was partly because of problems inside the parcelport when adding verbs/libfabric support, but it isn't actually necessary any more since we don't suspend or block inside the network processing. There is not really any need to do this, though it is an interesting feature (which might prove useful when considering continuations on MPI or GPU futures that are currently polled on the OS thread). The obvious thing to do is simply put the background processing thread into the same category as those other polling helpers.

As for all of this, I have no particular objections, but those are obviously bigger changes. I don't personally consider the suspension tests particularly important either, so please feel free to ignore this as well. I just wanted to make you aware of the problem.

@msimberg
Contributor Author

I meant to add: the fact that the shared priority scheduler sometimes fails to return a thread object is related to suspension, but it doesn't only affect the background thread. Things like hpx::thread also rely on actually getting a thread id back when spawning work.
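
For illustration, a minimal example (assuming the hpx/thread.hpp public header) of the kind of code that depends on a real thread object being created: hpx::thread records the id of the spawned thread at construction, and join() needs that id to be valid.

```cpp
#include <hpx/thread.hpp>

void spawn_and_join()
{
    hpx::thread t([] { /* some work */ });

    // join() operates on the id recorded when the thread was registered; if
    // the scheduler only queued a thread description, there would be nothing
    // to join on.
    t.join();
}
```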

@biddisco
Contributor

I'll take a look at this once I'm done with my current work.

@stale

stale bot commented Aug 24, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@msimberg
Contributor Author

@biddisco I'm leaving this open in case you get a chance to look at it. Feel free to close if you're not planning on fixing this.

@stale

stale bot commented Apr 16, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the tag: wontfix label on Apr 16, 2022
hkaiser added the tag: pinned (Never close as stale) label on Apr 16, 2022
stale bot removed the tag: wontfix label on Apr 16, 2022