Fix empty(QueueCpuAsync) returning true even though the last task is still executing #627

Merged

Conversation

@BenjaminW3 (Member) commented Sep 3, 2018

Fixes #621

By the way, the test found that this did not work for QueueCpuSync and QueueCudaRtSync either (when the last task was a callback).

@@ -186,8 +189,7 @@ namespace alpaka
     queue::QueueCpuSync const & queue)
 -> bool
 {
-    alpaka::ignore_unused(queue);
-    return true;
+    return !queue.m_spQueueImpl->m_bCurrentlyExecutingTask;
Contributor:

I am wondering for which case this is useful. If I call enqueue, the task will not return until it is finished, so nobody except the task itself has a chance to check the queue for emptiness. Of course a concurrent thread could test the queue, but in that case shouldn't m_bCurrentlyExecutingTask be an atomic?

@BenjaminW3 (Member, Author):

The task being executed can be a host callback which can check the state (this is exactly what the unit test is doing).
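
For illustration, a minimal standalone sketch (hypothetical code, not alpaka itself) of a sync queue whose last task is a host callback checking the queue state, mirroring the empty() fix in the diff above:

```cpp
// Hypothetical sketch: a sync queue runs each task inline; a host callback
// can query empty() while it is itself the task being executed. With the
// old "return true;" the callback would wrongly see an empty queue.
#include <cassert>

struct SyncQueue
{
    bool m_bCurrentlyExecutingTask = false;

    template<typename TTask>
    void enqueue(TTask task)
    {
        m_bCurrentlyExecutingTask = true;
        task();                            // executed inline, like QueueCpuSync
        m_bCurrentlyExecutingTask = false;
    }

    bool empty() const
    {
        return !m_bCurrentlyExecutingTask; // the fix from this PR
    }
};

int main()
{
    SyncQueue queue;
    queue.enqueue([&queue](){ assert(!queue.empty()); }); // callback sees a busy queue
    assert(queue.empty());                                // idle again afterwards
}
```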

Contributor:

Ah, makes sense. I will just check the changes with my code. 😉


task();
t.join();
@BenjaminW3 (Member, Author):

The code here is not new; it is only copied over from QueueCudaRtAsync. It had been forgotten here when a bug was fixed over there.

theZiz previously approved these changes Sep 4, 2018
@theZiz (Contributor) left a comment:

lgtm, thanks!

@BenjaminW3 (Member, Author):
Hmmm... CI had one single build that was not happy with the QueueCpuAsync unit test. I think I understand what the problem is, but it is really, really hard to solve.

The test is doing wait(queue) and afterwards expects empty(queue) to be true. To understand the failure you have to know how the CPU queue implements waiting: wait(queue) simply enqueues an event and waits for this event. The event itself is only a condition_variable which is notified once the event is "executed". Now there is a very short time window between the event notification and the line where the task is counted as finished. In the end it boils down to the event being a task that does essentially m_conditionVariable.notify_all(); while the task queue does --m_numActiveTasks; afterwards. If the thread is interrupted in between, the event has been notified, the wait for the event/queue finishes, and empty(queue) still returns false.
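
A minimal standalone sketch of this race (hypothetical code modeled on the names above; not the actual alpaka implementation):

```cpp
// Sketch of the race window: the waiter can wake up at (1), between the
// event notification and the task-count decrement at (2), and then observe
// empty(queue) == false even though wait(queue) has already returned.
#include <atomic>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>

std::mutex m_mutex;
std::condition_variable m_conditionVariable;
std::atomic<int> m_numActiveTasks{0};
bool eventExecuted = false;

bool empty() { return m_numActiveTasks == 0; }

// Worker thread: "executes" the enqueued event task.
void executeEventTask()
{
    {
        std::lock_guard<std::mutex> lk(m_mutex);
        eventExecuted = true;
    }
    m_conditionVariable.notify_all(); // (1) wait(queue) may return here ...
    --m_numActiveTasks;               // (2) ... before the count drops here.
}

int main()
{
    ++m_numActiveTasks;               // the event task is enqueued
    std::thread worker(executeEventTask);

    // wait(queue): block until the event's condition variable is notified.
    {
        std::unique_lock<std::mutex> lk(m_mutex);
        m_conditionVariable.wait(lk, [](){ return eventExecuted; });
    }
    // Sometimes prints "false": exactly the failing CI expectation.
    std::printf("empty(queue) after wait: %s\n", empty() ? "true" : "false");
    worker.join();
}
```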

@BenjaminW3 (Member, Author):

So in the end I will have to fix the waiting for QueueCpuAsync, either by not using events and doing something different, or by changing the behaviour of the events.
This may take some time. Until then, the current state of this branch already fixes your problem but will not be merged.

@theZiz (Contributor) commented Sep 6, 2018

I understand the problem, but except for the test I think the solution is "less wrong" than the old behaviour. Of course, if you check for emptiness you will get an unexpected result, but at least you can assume the queue is empty in the sense that no user-given task is executed anymore.
Maybe this would even be a solution: flag tasks according to whether they are "real" user tasks or internal tasks like the one used for event waiting. Not incrementing and decrementing the counter for the internal event task of wait(queue) might be a simple fix.

@BenjaminW3 (Member, Author):

I have now changed the events so that they are not signaled ready before they are removed from the queue. I had to try many approaches to find the current solution, but now it is cleaner than the original version.
I might have to restore a gcc-4.9 workaround that I removed for the moment, but CI will tell me.

@BenjaminW3 force-pushed the topic-fix-QueueCpuAsync branch 6 times, most recently from e507323 to 5f0ea8e on September 15, 2018
Signaling the event ready from within the enqueued event function is
wrong because waits for the event may be resolved even though the work
queue still has the event task itself in progress.
Using the future of the work queue is better.
while(enqueueCount > m_LastReadyEnqueueCount)
{
auto future = m_future;
lk.unlock();
Contributor:

Shouldn't that be the other way around? Locking, working, unlocking?

@BenjaminW3 (Member, Author) commented Sep 17, 2018:

No, I think this is correct. At a specific point A in time we want to wait. In this case we lock the event mutex, check whether we need to wait at all, and copy the current future. Then we unlock the event and wait for the future to finish. This waiting can not be done while locked because it might deadlock for multiple reasons, most prominently because the execution of the event itself requires the lock to increase m_LastReadyEnqueueCount.
We wait on a copy of the future because in the meantime the event could have been re-enqueued for a later point in time. This is no problem because after waiting we simply lock the event again; the while condition is then false and we have correctly waited for the event.
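
A sketch of this locking pattern (member names are hypothetical, modeled on the snippet above; not the verbatim alpaka code):

```cpp
// Wait for all enqueues of the event up to "point A" (the snapshot taken
// under the lock), without holding the lock while waiting on the future.
#include <cstdint>
#include <future>
#include <mutex>

struct EventImpl
{
    std::mutex m_mutex;
    std::shared_future<void> m_future;    // replaced on every (re-)enqueue;
                                          // ready once the work queue has
                                          // removed the event task
    std::uint64_t m_enqueueCount = 0;     // incremented on every enqueue
    std::uint64_t m_LastReadyEnqueueCount = 0;

    void wait()
    {
        std::unique_lock<std::mutex> lk(m_mutex);
        auto const enqueueCount = m_enqueueCount;  // point A
        while(enqueueCount > m_LastReadyEnqueueCount)
        {
            auto future = m_future; // copy under the lock: a re-enqueue may
                                    // swap m_future while we are waiting
            lk.unlock();            // waiting while locked would deadlock:
                                    // the event task needs the lock to
                                    // increase m_LastReadyEnqueueCount
            future.wait();
            lk.lock();              // re-check the loop condition
        }
    }
};
```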

Contributor:

You are right, sorry, I mixed up lock and mutex 🤦‍♂️

@theZiz (Contributor) commented Sep 17, 2018

Now for something completely different!
I found the "bug". I still had the topic branch with your first fix attempt and had just merged the new changes into it. Well, as it turns out, that failed. After removing the branch and re-pulling your latest changes, everything works fine, thx! 😆 At least we got a new unit test... 🙄

@BenjaminW3 (Member, Author):

But the new unit test does not compile because of many warnings, and it is not really new because both things are already tested elsewhere. I have pushed some minor changes and reverted the test.

@BenjaminW3 (Member, Author):

@theZiz Please approve if this works for you.

@theZiz (Contributor) commented Sep 18, 2018

> @theZiz Please approve if this works for you.

Works! 👍

@BenjaminW3 (Member, Author):

@theZiz You should be allowed to give a real review approval so that I can merge it.

@theZiz (Contributor) commented Sep 18, 2018

> @theZiz You should be allowed to give a real review approval so that I can merge it.

I doubt it. @ax3l or @psychocoderHPC need to approve the PR.

Edit: And I was wrong...

theZiz previously approved these changes Sep 18, 2018
@psychocoderHPC (Member):

I will check the PR in ~2h

@BenjaminW3 (Member, Author):

@psychocoderHPC Are you done?

@psychocoderHPC (Member):

No, sorry, I am starting now. I was busy with other tasks.

psychocoderHPC previously approved these changes Sep 18, 2018
@psychocoderHPC (Member) left a comment:

The style issue is not important, but the other question about using callbacks is very important.

#endif
{
auto boundTask([=](){return task(args...);});
auto decrementNumActiveTasks = [this](){--m_numActiveTasks;};
Member:

not important but should be: auto decrementNumActiveTasks([this](){--m_numActiveTasks;});

@BenjaminW3 (Member, Author):

Fixed

lock,
[pCallbackSynchronizationData](){
return pCallbackSynchronizationData->notified;
// We start a new std::thread which stores the task to be executed.
Member:

Do we always enqueue CUDA callbacks with alpaka?

Member:

OK, after reviewing the corresponding parts again, I found that this is the code to create a callback, introduced in #373.
The code itself looks like the kernel enqueue code, but kernels are started in the namespace alpaka::exec.

@BenjaminW3 (Member, Author):

I have not changed anything about the cudaStreamAddCallback usage here. The bug I found was that the callback was executed directly within the CUDA callback thread. Within the CUDA callback thread you are not allowed to make CUDA calls, which I did in my test. This had already been fixed in the Async version; I simply copied the fix from over there.
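
For illustration, a sketch of the pattern described above, modeled on the names in the snippets (pCallbackSynchronizationData, notified, t.join()); this is a hedged reconstruction, not the verbatim alpaka code:

```cpp
// The CUDA stream callback must not make CUDA calls itself, so it hands the
// user task to a new std::thread (where CUDA calls are allowed) and blocks
// until that thread has finished, preserving the ordering within the stream.
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <cuda_runtime.h>

struct CallbackSynchronizationData
{
    std::mutex m_mutex;
    std::condition_variable m_event;
    bool notified = false;
    std::function<void()> task;  // the user-given host callback
};

// Registered via cudaStreamAddCallback(stream, cudaRtCallback, pData, 0u).
void CUDART_CB cudaRtCallback(cudaStream_t /*stream*/, cudaError_t /*status*/, void * userData)
{
    auto * pCallbackSynchronizationData =
        static_cast<CallbackSynchronizationData *>(userData);

    // Execute the task in a new thread where CUDA calls are allowed.
    std::thread t(
        [pCallbackSynchronizationData]()
        {
            pCallbackSynchronizationData->task();
            {
                std::lock_guard<std::mutex> lock(pCallbackSynchronizationData->m_mutex);
                pCallbackSynchronizationData->notified = true;
            }
            pCallbackSynchronizationData->m_event.notify_one();
        });

    // Block the CUDA callback thread until the task has finished so that
    // work enqueued after the callback still runs after it.
    std::unique_lock<std::mutex> lock(pCallbackSynchronizationData->m_mutex);
    pCallbackSynchronizationData->m_event.wait(
        lock,
        [pCallbackSynchronizationData](){ return pCallbackSynchronizationData->notified; });
    t.join();
}
```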

@psychocoderHPC (Member) commented Sep 18, 2018

From my side it can be merged; I expect @BenjaminW3 will fix the style issue.

@BenjaminW3 (Member, Author):

Please re-approve

@psychocoderHPC merged commit 3dbf482 into alpaka-group:develop on Sep 19, 2018
@BenjaminW3 deleted the topic-fix-QueueCpuAsync branch on September 19, 2018
psychocoderHPC added a commit to psychocoderHPC/alpaka that referenced this pull request Sep 25, 2018
Fix empty(QueueCpuAsync) returning true even though the last task is still executing

backport of alpaka-group#627

Fixes alpaka-group#621

By the way, the test found that this did not work for QueueCpuSync and QueueCudaRtSync either (when the last task was a callback).