Remove windup behavior from break_dispatch #6238
Looks fine to me. One outstanding thought from #6204: should the "break" flag also be cleared if dispatch exits due to timeout, which is tested before the break flag?
events/equeue/equeue.c
It's now "break", rather than "breaks", but that's a keyword. How about break_requested? Always good to name booleans as something that makes sense to be true or false.
Ah! Good catch, that would explain why the CI was failing
events/equeue/equeue.c
Could avoid semaphore signalling if break already set. (Avoiding pointless ISR queue use, as per my suggestion ARM-software/CMSIS_5#283 )
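A minimal sketch of that suggestion, assuming a simplified stand-in for the queue: `struct demo_queue`, `demo_queue_init`, `demo_break`, and the `sema_signals` counter are hypothetical names for illustration, not the equeue API. The counter takes the place of the real semaphore signal so the effect is easy to observe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <pthread.h>

// Hypothetical stand-in for the event queue; only the fields needed here.
struct demo_queue {
    pthread_mutex_t lock;
    bool break_requested;
    unsigned sema_signals;   // counts how often the "semaphore" was signalled
};

static void demo_queue_init(struct demo_queue *q) {
    assert(q != NULL);
    pthread_mutex_init(&q->lock, NULL);
    q->break_requested = false;
    q->sema_signals = 0;
}

// Request a break. The dispatcher is woken only on the false -> true
// transition, so repeated requests do not pile up extra signals.
static void demo_break(struct demo_queue *q) {
    assert(q != NULL);
    pthread_mutex_lock(&q->lock);
    if (!q->break_requested) {
        q->break_requested = true;
        q->sema_signals++;   // stand-in for signalling the real semaphore
    }
    pthread_mutex_unlock(&q->lock);
}
```

Calling `demo_break` three times in a row on a freshly initialised queue leaves the flag set and the signal count at one.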
I think this looks great!
Ha! Of course, all the other issues now emerge from the woodwork. So, so far we have:
Would it make sense for me to do those as well? I did take a look at the failing CI test, but it's not clear to me why the failure occurred. Is there anything else I need to do to make the tests happy?
Didn't think it would be that easy, did you? I'd like @geky's thought on the timeout clear. It seems correct to me, but my brain sometimes malfunctions. If we're going to change break behaviour, probably best to make 1 combined change. Variable should be renamed because I think avoiding the semaphore signal is worthwhile, just because we have recurring bug reports of that ISR overflow problem. Although the real place that occurs here would be in
If you have a linux machine handy, you can run the tests locally with I think I found the issue, will comment on the line.
I think you're right we need timeout to also clear. That's the only way to guarantee that a break from inside the queue is cleared before dispatch exits. Here's the other return statement, should just need a Sidenote: There is no similar guarantee for a break from a different thread, since it's a tossup if the break could land after dispatch exits.
IMO this sounds like an RTOS issue if binary semaphores have overflow problems. If we need a workaround I'd prefer that goes in the porting layer (with a flag?) so this issue is fixed for all calls to Also break_requested sounds good to me too.
events/equeue/equeue.c
So, unfortunately this logic is subtly complex...
if (q->breaks) { // <- This condition is outside of mutex so it can't be
                 // trusted. Even if this if is true, we're only _probably_
                 // sure that a break was signalled (concurrent dispatch calls).
                 // The reason for the extra if statement is so that we can avoid
                 // getting a mutex in the common case (no break events).
    equeue_mutex_lock(&q->queuelock);
    if (q->breaks > 0) { // <- Once we're in the mutex, we confirm that break
                         // was signalled
        q->breaks--;
        equeue_mutex_unlock(&q->queuelock);
        return; // <- This return actually exits equeue_dispatch. Otherwise
                // the queue happily continues to run.
    }
    equeue_mutex_unlock(&q->queuelock);
}
So two problems:
- You still need the second if statement inside the mutex
- You still need the return statement to actually break out of the dispatch loop
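The two points above can be sketched with a boolean flag instead of a counter. This is an illustration under assumptions, not the merged code: `struct flag_queue`, `flag_queue_init`, and `consume_break` are hypothetical names, and a plain pthread mutex stands in for `equeue_mutex_lock`/`equeue_mutex_unlock`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <pthread.h>

// Hypothetical stand-in for the queue state relevant to breaking.
struct flag_queue {
    pthread_mutex_t lock;
    bool break_requested;
};

static void flag_queue_init(struct flag_queue *q) {
    assert(q != NULL);
    pthread_mutex_init(&q->lock, NULL);
    q->break_requested = false;
}

// Returns true if the caller should return from its dispatch loop.
static bool consume_break(struct flag_queue *q) {
    assert(q != NULL);
    bool should_exit = false;
    if (q->break_requested) {             // unlocked check: only a hint
        pthread_mutex_lock(&q->lock);
        if (q->break_requested) {         // confirmed while holding the mutex
            q->break_requested = false;   // one consume clears any number of requests
            should_exit = true;
        }
        pthread_mutex_unlock(&q->lock);
    }
    return should_exit;                   // caller must actually return on true
}
```

The unlocked read keeps the common case (no break pending) free of mutex traffic, and the caller is expected to do `if (consume_break(q)) return;` inside the dispatch loop so the break really exits.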
c00061a to 5d98d22
ffbf969 to bb0d540
All right, I chose to amend my commit because I didn't want any false noise around the return cleanup. I also added the clearing on timeout, and chose to put it within the mutex lock. I went ahead and added two tests: one for the break_dispatch windup behavior and the other for clearing break_requested on timeout. Good thing that I did, because I originally put the timeout clearing code within the background update mutex, which didn't fire, of course.
It looks great 👍 Thanks for adding tests. I noticed some style differences; if you are able to fix those, it should be good to go in.
void simple_breaker(void *p) {
    CountAndQueue* caq = (CountAndQueue*)p;
    equeue_break(caq->q);
    usleep(10000);
nit: It looks like there's a mixture of tabs + spaces here. Can you change this to just 4 spaces for indentation?
events/equeue/tests/tests.c
    equeue_t* q;
};

typedef struct sCaQ CountAndQueue;
nit: This is inconsistent with other structs in this file. Could you change this to struct count_and_queue without a typedef?
grr, that's what I get for editor-hopping. I fall in the space-over-tabs camp, so thanks for letting me know :)
No problem, looks good to me 👍
Next step is @kjbracey-arm's review, CI, and then it should be merged shortly after.
        q->background.active = true;
        equeue_mutex_unlock(&q->queuelock);
    }
    q->break_requested = false;
I feel a bit nervous about actually writing to this variable outside the mutex - one step worse than reading it. In both cases it has no synchronisation protection, which could in principle cause problems with multiple CPUs. Some C11/C++11 atomic would sort it out, but we don't have that available.
However, I can't actually construct a plausible failure mode looking at it - all the mutexes and semaphores that do exist around it add quite a lot of ordering constraints. And I guess we assume only 1 thread ever dispatches the event queue - we're synchronising between multiple queuers and 1 dispatcher, not multiple dispatchers?
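For reference, this is roughly what the C11 atomic alternative mentioned above could look like on a toolchain that does provide `<stdatomic.h>`. It is a sketch only: `struct atomic_break_flag` and the `break_flag_*` helpers are hypothetical names, and nothing here is taken from the equeue sources.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical wrapper around a single atomic break flag.
struct atomic_break_flag {
    atomic_bool requested;
};

static void break_flag_init(struct atomic_break_flag *f) {
    assert(f != NULL);
    atomic_init(&f->requested, false);
}

// Called by any thread (or from inside an event) to request a break.
static void break_flag_request(struct atomic_break_flag *f) {
    atomic_store(&f->requested, true);
}

// Atomically reads and clears the flag; returns true if a break was pending.
static bool break_flag_consume(struct atomic_break_flag *f) {
    return atomic_exchange(&f->requested, false);
}

// Timeout path: drop any pending request without acting on it.
static void break_flag_clear(struct atomic_break_flag *f) {
    atomic_store(&f->requested, false);
}
```

With `atomic_exchange` the read and the clear happen as one step, so neither the consume path nor the timeout path needs the queue mutex just to touch the flag. It does not remove the race discussed in this thread over whether a break from another thread lands before or after dispatch exits.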
I did originally move it to be within the mutex block in 413-419, but of course that led to failing tests because that mutex is only entered if there's an update object attached.
How about if I expand the mutex lock to outside the update conditional? It hurts the normal case, but doesn't impact the worst case.
As in...
// check if we should stop dispatching soon
if (ms >= 0) {
    deadline = equeue_tickdiff(timeout, tick);
    if (deadline <= 0) {
        equeue_mutex_lock(&q->queuelock);
        // update background timer if necessary
        if (q->background.update) {
            if (q->background.update && q->queue) {
                q->background.update(q->background.timer,
                        equeue_clampdiff(q->queue->target, tick));
            }
            q->background.active = true;
        }
        q->break_requested = false;
        equeue_mutex_unlock(&q->queuelock);
        return;
    }
}
The original implementation seemed fine to me. The case we're concerned about is a break from inside the event queue, which by definition is synchronous.
With multiple dispatchers all we're guaranteed is that at least one thread will break, which is already protected by a mutex. This race condition can only happen after a thread has already decided to exit, but even with a critical section it's still a race since you don't know if the break will be consumed before or after the timeout.
I'd be fine with extending the mutex to be safe, except it does look like it has a negative performance impact (using make prof):
beginning profiling...
baseline_prof: 23 cycles (+0%)
equeue_tick_prof: 43 cycles (+0%)
equeue_alloc_prof: 61 cycles (+0%)
equeue_post_prof: 227 cycles (+0%)
equeue_post_future_prof: 227 cycles (+0%)
equeue_dispatch_prof: 312 cycles (-34%) <-- ouch, was 232 cycles
equeue_cancel_prof: 121 cycles (+0%)
equeue_alloc_many_prof: 64 cycles (+0%)
equeue_post_many_prof: 221 cycles (+2%)
equeue_post_future_many_prof: 220 cycles (+3%)
equeue_dispatch_many_prof: 6952 cycles (+0%)
equeue_cancel_many_prof: 122 cycles (+0%)
equeue_alloc_size_prof: 56 bytes (+0%)
equeue_alloc_many_size_prof: 64000 bytes (+0%)
equeue_alloc_fragmented_size_prof: 64000 bytes (+0%)
done!
Although note this is on Linux, where mutex acquisition isn't cheap.
Makes sense to me - leave it as is.
/morph build
Build : SUCCESS
Build number : 1357
Triggering tests
/morph test
Exporter Build : SUCCESS
Build number : 1010
Test : SUCCESS
Build number : 1138
/morph mbed2-build
@pauluap Please provide a proper description of this fix and the impact. Also only one checkbox should be ticked. Thanks
I updated the description, but I wasn't the one who added the flags, so I don't know which one to pick. Behavior did change, but it's arguable that the previous behavior wasn't the intended behavior in the first place. I'm okay either way; if it's designated as a breaking change, then clearly the fix flag has to go.
I edited it, it's now fixed. Please use PR type in future.
Changes like this should be fine.
Change break_dispatch from a counting semaphore, which needs as many calls to dispatch_forever() as there were calls to break_dispatch(), to a squashable flag, which breaks only once for any number of calls to break_dispatch() between calls to dispatch_forever().
Resolves #6204
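The difference between the two behaviours can be modelled in a few lines. This is a single-threaded toy model for illustration, not the equeue implementation: the struct and function names below are hypothetical, and "dispatch" is reduced to asking whether a pending break would make the call exit immediately.

```c
#include <assert.h>
#include <stdbool.h>

// Old behaviour (modelled): every break request is counted.
struct counting_break { int breaks; };
// New behaviour (modelled): requests collapse into one flag.
struct flag_break { bool break_requested; };

static void counting_request(struct counting_break *b) { b->breaks++; }
static bool counting_consume(struct counting_break *b) {
    if (b->breaks > 0) {
        b->breaks--;          // one dispatch call eats exactly one request
        return true;
    }
    return false;
}

static void flag_request(struct flag_break *b) { b->break_requested = true; }
static bool flag_consume(struct flag_break *b) {
    bool was_set = b->break_requested;
    b->break_requested = false;   // any number of requests is cleared at once
    return was_set;
}

// How many consecutive dispatch calls exit immediately after `requests` breaks?
static int dispatches_broken_counting(int requests) {
    assert(requests >= 0);
    struct counting_break b = { 0 };
    for (int i = 0; i < requests; i++) counting_request(&b);
    int broken = 0;
    while (counting_consume(&b)) broken++;
    return broken;
}

static int dispatches_broken_flag(int requests) {
    assert(requests >= 0);
    struct flag_break b = { false };
    for (int i = 0; i < requests; i++) flag_request(&b);
    int broken = 0;
    while (flag_consume(&b)) broken++;
    return broken;
}
```

In this model, three break requests cause three consecutive dispatch calls to exit under the counting scheme (the windup), but only one under the flag scheme.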
Pull request type