Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pinned task problem #48

Closed
Vuhdo opened this issue Dec 20, 2019 · 12 comments
Closed

Pinned task problem #48

Vuhdo opened this issue Dec 20, 2019 · 12 comments
Assignees

Comments

@Vuhdo
Copy link

Vuhdo commented Dec 20, 2019

I think there is a problem with TaskScheduler::WakeThreadsForNewTasks() and pinned tasks.

Consider a possible case: what if the number of suspended threads - those waiting the m_pNewTaskSemaphore to be signalled - increases just before the SemaphoreSignal() called, i.e. the value of waiting was not accurate as some threads fell asleep between the check and the signal. In this case some task threads would idle.
Most of the time it's not a big deal, those threads would awake when next task arrives.
(Not sure if it possible, but even if we are so unlucky and all the tasks threads fall asleep just after the check - the calling thread would handle the task itself.)

Now, when using AddPinnedTask() there's a subtle chance that the thread we pinned the task to was suspended as described above:

  • [User thread] calls AddPinnedTask().
  • [Task thread] falls asleep just after the m_NumThreadsWaitingForNewTasks check.
  • The semaphore is either not being released at all or it awakens some threads but the desired one.
  • [User thread] calls WaitforTask() and hangs as the thread the task is pinned to can't handle the request.

This is the problem I ran into while trying to port my code to enkiTS.
Though to be honest I'm not quite sure if it indeed the case and if my assumption is accurate. Parallel programming is hard.

@dougbinks
Copy link
Owner

I'll have a check through the logic again for this over the weekend, but if you're able to reproduce the issue in a small sample then that would be great, but it's not a requirement for me to check this.

@dougbinks dougbinks self-assigned this Dec 20, 2019
@dougbinks dougbinks added the bug label Dec 20, 2019
@Vuhdo
Copy link
Author

Vuhdo commented Dec 20, 2019

Thanks, I'll try to reproduce it in a sample later.

In the meanwhile, here's a picture of missed signals:
AddPinnedTask_Fail

@dougbinks
Copy link
Owner

I've created a branch dev_waketasks with a potential fix for this - it looks like I broke the logic when moving to semaphores. I'm still doing some tests and running through the logic to check, but would appreciate knowing if this resolves your issue.

@dougbinks
Copy link
Owner

I've been able to find a reproduction for this, which will be added to my tests. The above fix doesn't resolve it but now I have a reproduction I should be able to fix this fairly rapidly - however I'm not sure how much time I'll have over the next few days.

@Vuhdo
Copy link
Author

Vuhdo commented Dec 23, 2019

Yeah, unfortunately the change hasn't helped, hopefully it will be fixed soon enough.
And I would like to say thank you, Doug, and wish you a Merry Christmas and a Happy New Year! 🎄

@dougbinks
Copy link
Owner

I have a good idea what's going on now, and a potential fix in progress so this shouldn't take too long.

Merry Christmas and a Happy New Year to you too!

dougbinks added a commit that referenced this issue Dec 23, 2019
@dougbinks
Copy link
Owner

I've pushed a fix for the issue I was able to reproduce in the revised TestWaitforTask.cpp test, though there's always a chance you have another related issue.

The current solution is slightly heavy handed, but the alternative solutions will take longer to program.

@dougbinks
Copy link
Owner

Update I'm still tracking a bug with this which happens extremely rarely, so the current dev_waketasks branch changes improve but do not resolve the issue.

@dougbinks
Copy link
Owner

The latest commit on dev_waketasks resolves the above issue, so may be worth testing on your code when you get time.

Meanwhile I'll continue checking logic and testing.

@dougbinks
Copy link
Owner

I've now merged this into master branch.

I'm going to pre-emptively close this assuming the fix works for your situation as well as my test, but if not please re-open.

@dougbinks dougbinks reopened this Jan 5, 2020
@dougbinks
Copy link
Owner

I'm reopening this issue as I can still replicate the problem on OSX now I've fixed #49

@dougbinks
Copy link
Owner

I found the logic error responsible for the recent issue - somehow I'd missed that HaveTasks() was not being called by WaitForTaskCompletion() after wait variables were set. I found a few other minor perf issues at the same time.

I'm going to close this again, hopefully it will stay closed for a while :) Please do update this thread if your able to test with whether your issue is resolved or not and if not reopen.

Many thanks for finding this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants