Fix empty(QueueCpuAsync) returning true even though the last task is still executing #627
Conversation
@@ -186,8 +189,7 @@ namespace alpaka
         queue::QueueCpuSync const & queue)
     -> bool
     {
-        alpaka::ignore_unused(queue);
-        return true;
+        return !queue.m_spQueueImpl->m_bCurrentlyExecutingTask;
I am wondering for which case this is useful. If I call enqueue, the call will not return until the task has finished, so nobody except the task itself has a chance to check the queue for emptiness. Of course a concurrent task could test the queue, but in that case shouldn't m_bCurrentlyExecutingTask be atomic?
The task being executed can be a host callback which can check the state (this is exactly what the unit test is doing).
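To illustrate the situation, here is a minimal standalone sketch (not the alpaka API; the queue class, its enqueue signature and the main function are made up for illustration, and only the member name m_bCurrentlyExecutingTask is taken from the diff above). The host callback is the last enqueued task and checks empty() from inside itself; the flag is a std::atomic<bool> in this sketch to cover the concurrent-observer concern as well.

    #include <atomic>
    #include <cassert>
    #include <functional>

    struct QueueCpuSyncSketch
    {
        // std::atomic so that the flag can also be read from other threads;
        // the member name mirrors the one in the diff above.
        std::atomic<bool> m_bCurrentlyExecutingTask{false};

        bool empty() const { return !m_bCurrentlyExecutingTask; }

        void enqueue(std::function<void(QueueCpuSyncSketch &)> task)
        {
            m_bCurrentlyExecutingTask = true;
            task(*this); // the task runs synchronously in the calling thread
            m_bCurrentlyExecutingTask = false;
        }
    };

    int main()
    {
        QueueCpuSyncSketch queue;
        assert(queue.empty());
        // The host callback is the last task and checks the queue state itself:
        // with the fix it correctly sees the queue as non-empty.
        queue.enqueue([](QueueCpuSyncSketch & q) { assert(!q.empty()); });
        assert(queue.empty());
    }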
Ah, makes sense. I will just check the changes with my code. 😉
task();
t.join();
The code here is not new but only copied from QueueCudaRtAsync. This had been forgotten when a bug was fixed there.
lgtm, thanks!
Hmmm... CI had one single build that was not happy with the
So in the end I will have to fix the waiting for
I understand the problem, but except for the test, I think the solution is "less wrong" than the old behaviour. Of course, if you check for emptiness you will get an unexpected result, but at least you can assume emptiness of the queue, as no user-given task is executed anymore.
Force-pushed from ed288ea to 47d48ff.
I have now made changes to the events so that they are not signaled ready before they are removed from the queue. I had to try many ways to find the current solution but now it is cleaner than the original version.
Force-pushed from e507323 to 5f0ea8e.
Signaling the event ready from within the enqueued event function is wrong because waits for the event may be resolved even though the work queue still has the event task itself in progress. Using the future of the work queue is better.
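A hedged sketch of what "using the future of the work queue" could look like (the WorkQueue and EventSketch classes, their members and enqueueEvent below are simplified stand-ins, not the real alpaka internals): when the event is enqueued, it stores the future returned by the work queue instead of having the enqueued function flip a ready flag, so the future only becomes ready once the task has actually left the queue.

    #include <cstddef>
    #include <functional>
    #include <future>
    #include <mutex>

    // Stand-in for the work queue: enqueue() returns a future that becomes
    // ready only after the task has been executed and removed from the queue.
    struct WorkQueue
    {
        std::shared_future<void> enqueue(std::function<void()> task)
        {
            // Simplified synchronous implementation for this sketch; the real
            // async queue would run the task on its worker thread.
            std::promise<void> promise;
            task();
            promise.set_value();
            return promise.get_future().share();
        }
    };

    struct EventSketch
    {
        std::mutex m_mutex;
        std::shared_future<void> m_future; // future of the event task in the work queue
        std::size_t m_enqueueCount = 0;
    };

    // Instead of signaling "ready" from inside the enqueued event function,
    // store the work-queue future; waiters then block on this future.
    inline void enqueueEvent(WorkQueue & workQueue, EventSketch & event)
    {
        std::lock_guard<std::mutex> lk(event.m_mutex);
        ++event.m_enqueueCount;
        event.m_future = workQueue.enqueue([]{ /* nothing to execute for an event */ });
    }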
while(enqueueCount > m_LastReadyEnqueueCount)
{
    auto future = m_future;
    lk.unlock();
Shouldn't that be the other way around? Locking, working, unlocking?
No, I think this is correct. At a specific point A in time we want to wait. In this case we lock the event mutex, check if we need to wait at all and copy the current future. Then we unlock the event and wait for the future to finish. This waiting cannot be done while locked because it might deadlock for multiple reasons, most prominently because the execution of the event itself requires the lock to increase m_LastReadyEnqueueCount.
We wait on a copy of the future because in the meantime the event could have been re-enqueued to a later point in time. This is no problem because after waiting we simply lock the event again, the while check is false, and we have correctly waited for the event.
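The described wait pattern, sketched with simplified stand-in names (m_mutex, m_future, m_enqueueCount and m_LastReadyEnqueueCount are illustrative here, not the exact alpaka members):

    #include <cstddef>
    #include <future>
    #include <mutex>

    struct EventWaitSketch
    {
        std::mutex m_mutex;
        std::shared_future<void> m_future;       // future of the current work-queue task
        std::size_t m_enqueueCount = 0;          // bumped on every enqueue of the event
        std::size_t m_LastReadyEnqueueCount = 0; // bumped when an enqueue has completed

        void wait()
        {
            std::unique_lock<std::mutex> lk(m_mutex);
            // Point A in time: remember which enqueue we are waiting for.
            std::size_t const enqueueCount = m_enqueueCount;
            while(enqueueCount > m_LastReadyEnqueueCount)
            {
                // Copy the future and release the lock before waiting; waiting
                // while locked could deadlock, because finishing the task needs
                // the lock to increase m_LastReadyEnqueueCount.
                auto future = m_future;
                lk.unlock();
                future.wait();
                // Re-lock and re-check: the event may have been re-enqueued in
                // the meantime, but then the while condition is already false
                // for the enqueue we were waiting for.
                lk.lock();
            }
        }
    };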
You are right, sorry, I mixed up lock and mutex 🤦♂️
Now for something completely different!
Force-pushed from 0825fe4 to 167769e.
But the new unit test does not compile due to many warnings, and it is not really new because both things are already tested. I have pushed some minor changes and reverted the test.
Force-pushed from 167769e to a6f960e.
@theZiz Please approve if this works for you.
Works! 👍
@theZiz You should be allowed to give a real review approval so that I can merge it.
I doubt it. @ax3l or @psychocoderHPC need to approve the PR. Edit: And I was wrong...
I will check the PR in ~2h
@psychocoderHPC Are you done?
No, sorry, I am starting now. I was busy with other tasks.
The style issue is not important but the other question about using callbacks is very important.
#endif
{
    auto boundTask([=](){return task(args...);});
    auto decrementNumActiveTasks = [this](){--m_numActiveTasks;};
not important but should be: auto decrementNumActiveTasks([this](){--m_numActiveTasks;});
Fixed
    lock,
    [pCallbackSynchronizationData](){
        return pCallbackSynchronizationData->notified;
// We start a new std::thread which stores the task to be executed.
Do we always enqueue CUDA callbacks with alpaka?
OK, after reviewing the corresponding parts again I found that this is the code to create a callback, introduced in #373. The code itself looks like the kernel enqueue code, but the kernel start is in the namespace alpaka::exec.
I have not changed anything about the cudaStreamAddCallback here. The bug I found was that the callback was directly executed within the CUDA callback thread. In the CUDA callback thread you are not allowed to make CUDA calls, which I did in my test. This had already been fixed in the Async version; I simply copied the fix from over there.
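A hedged, CUDA-free sketch of that fix (the names CallbackSync, worker and cudaCallbackThread are illustrative; the second thread only simulates the thread from which CUDA would invoke the stream callback): the callback thread never executes the user task itself, it only signals a separate std::thread that runs the task, and then waits until the task has finished so that stream order is preserved.

    #include <condition_variable>
    #include <iostream>
    #include <mutex>
    #include <thread>

    struct CallbackSync
    {
        std::mutex mutex;
        std::condition_variable cv;
        bool notified = false; // set by the (simulated) CUDA callback thread
        bool finished = false; // set by the worker after running the task
    };

    int main()
    {
        CallbackSync sync;

        // Worker thread: waits for the callback, then runs the user task.
        // Because this is an ordinary std::thread, the task may call into CUDA.
        std::thread worker([&sync]{
            std::unique_lock<std::mutex> lock(sync.mutex);
            sync.cv.wait(lock, [&]{ return sync.notified; });
            std::cout << "user task runs here (CUDA calls allowed)\n";
            sync.finished = true;
            sync.cv.notify_all();
        });

        // Simulated CUDA stream callback: only synchronizes, never calls CUDA.
        std::thread cudaCallbackThread([&sync]{
            std::unique_lock<std::mutex> lock(sync.mutex);
            sync.notified = true;
            sync.cv.notify_all();
            sync.cv.wait(lock, [&]{ return sync.finished; });
        });

        worker.join();
        cudaCallbackThread.join();
    }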
From my side it can be merged; I expect @BenjaminW3 will fix the style issue.
0353f45
Please re-approve
… still executing. Backport of alpaka-group#627. Fixes alpaka-group#621. By the way, the test found that this did not work for QueueCpuSync and QueueCudaRtSync as well (when the last task was a callback).
Fixes #621
By the way, the test found that this did not work for QueueCpuSync and QueueCudaRtSync as well (when the last task was a callback).