Remove windup behavior from break_dispatch #6238

pauluap · 2018-02-28T18:03:36Z

Change break_dispatch from a counting semaphore, which needs as many calls to dispatch_forever() as there were calls to break_dispatch(), to a squashable flag, which breaks only once for any number of calls to break_dispatch() made between calls to dispatch_forever().
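
As a rough illustration of the difference described above, here is a single-threaded model of the two designs. The struct and function names are hypothetical stand-ins, not the real equeue types, and locking is left out to keep the contrast visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two designs; not the real equeue types. */
struct counting_queue { int breaks; };            /* old: one count per break call */
struct flag_queue     { bool break_requested; };  /* new: calls squash into a flag */

void counting_break(struct counting_queue *q) { q->breaks++; }
void flag_break(struct flag_queue *q)         { q->break_requested = true; }

/* Each function models one dispatch call and returns true if that call
 * exits immediately because a break is pending. */
bool counting_dispatch(struct counting_queue *q) {
    if (q->breaks > 0) {
        q->breaks--;                 /* consumes only one pending break */
        return true;
    }
    return false;
}

bool flag_dispatch(struct flag_queue *q) {
    if (q->break_requested) {
        q->break_requested = false;  /* consumes every pending break at once */
        return true;
    }
    return false;
}

/* How many consecutive dispatch calls exit early after n break calls. */
int counting_early_exits(int n) {
    assert(n >= 0);
    struct counting_queue q = { 0 };
    for (int i = 0; i < n; i++) counting_break(&q);
    int exits = 0;
    while (counting_dispatch(&q)) exits++;
    return exits;
}

int flag_early_exits(int n) {
    assert(n >= 0);
    struct flag_queue q = { false };
    for (int i = 0; i < n; i++) flag_break(&q);
    int exits = 0;
    while (flag_dispatch(&q)) exits++;
    return exits;
}
```

In this model, three break calls wind up three early exits with the counter but only one with the flag.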

Resolves #6204

Pull request type

kjbracey · 2018-03-01T11:32:16Z

Looks fine to me - outstanding thought from #6204 - should the "break" flag also be cleared if it exits due to timeout, which is tested before the break flag?

kjbracey · 2018-03-01T11:33:27Z

events/equeue/equeue.c

It's now "break", rather than "breaks", but that's a keyword. How about break_requested? Always good to name booleans as something that makes sense to be true or false.

Ah! Good catch, ~~that would explain why the CI was failing~~

kjbracey · 2018-03-01T11:34:51Z

events/equeue/equeue.c

Could avoid semaphore signalling if break already set. (Avoiding pointless ISR queue use, as per my suggestion ARM-software/CMSIS_5#283 )
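
A minimal sketch of that suggestion, assuming a pthread mutex in place of the equeue mutex and a plain counter in place of the semaphore. All names here are hypothetical; the thread later defers this idea, so this is not what was merged.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model; field and function names are not from equeue.c. */
struct sig_queue {
    pthread_mutex_t lock;
    bool break_requested;
    int signals;   /* stand-in for the semaphore; not thread-safe, it only
                      lets the example be checked from a single thread */
};

static void sig_sema_signal(struct sig_queue *q) { q->signals++; }

void sig_break(struct sig_queue *q) {
    assert(q);
    pthread_mutex_lock(&q->lock);
    bool already_set = q->break_requested;   /* was a break already pending? */
    q->break_requested = true;
    pthread_mutex_unlock(&q->lock);

    /* Only the call that flips the flag needs to wake the dispatcher. */
    if (!already_set) {
        sig_sema_signal(q);
    }
}
```

With this shape, repeated break calls before the dispatcher wakes up produce a single signal.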

geky · 2018-03-01T14:40:29Z

I think this looks great!

pauluap · 2018-03-01T17:32:12Z

Ha! Of course, all the other issues now emerge from the woodwork.

So, so far we have:

variable rename - break_requested sounds good to me
squash with timeout
avoid semaphore signaling

Would it make sense for me to do those as well?

I did take a look at the failing CI test, but it's not clear to me why the failure occurred. Is there anything else I need to do to make the tests happy?

kjbracey · 2018-03-02T09:20:22Z

Didn't think it would be that easy, did you?

I'd like @geky's thought on the timeout clear. It seems correct to me, but my brain sometimes malfunctions. If we're going to change break behaviour, probably best to make 1 combined change.

Variable should be renamed because breaks no longer fits.

I think avoiding the semaphore signal is worthwhile, just because we have recurring bug reports of that ISR overflow problem. Although the real place that occurs here would be in equeue_post (assuming your equeue is bigger than RTX's ISR queue, so you hit RTX's limit first). So maybe leave it until a future combined patch that does it for both post and break. (Do we have any open issues for this?)

geky · 2018-03-02T18:24:35Z

If you have a Linux machine handy, you can run the tests locally with make test inside the equeue directory. In this case the break test (here) is segfaulting, so unfortunately it needs to be run inside a debugger to see what's going wrong. make DEBUG=1 test ; gdb tests/tests will get you to where the failure is happening.

I think I found the issue, will comment on the line.

> I'd like @geky's thought on the timeout clear.

I think you're right we need timeout to also clear. That's the only way to guarantee that a break from inside the queue is cleared before dispatch exits.

Here's the other return statement, should just need a breaks = false before it:
https://github.com/pauluap/mbed-os/blob/c00061a2f8f5a2e5d0f7ec452c43ceafe3de761e/events/equeue/equeue.c#L421
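
To make the reasoning concrete, here is a small single-threaded model of why the timeout exit also has to clear the flag. The names are hypothetical and the dispatch loop is reduced to its two exit paths; it is a sketch of the argument, not the equeue code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the queue state relevant to this discussion. */
struct model_queue { bool break_requested; };

/* Models a dispatch call whose deadline has already expired. The timeout
 * check runs before the break check, so the flag is never examined on
 * this path; clear_on_timeout selects the behaviour proposed above. */
void model_dispatch_timeout(struct model_queue *q, bool clear_on_timeout) {
    assert(q);
    if (clear_on_timeout) {
        q->break_requested = false;
    }
}

/* Models the start of the next dispatch call: true means it exits at
 * once because a break is still pending. */
bool model_dispatch_breaks_immediately(struct model_queue *q) {
    assert(q);
    if (q->break_requested) {
        q->break_requested = false;
        return true;
    }
    return false;
}

/* An event calls break from inside the queue, dispatch then times out,
 * and a later dispatch call starts. Does the old break still fire? */
bool stale_break_survives(bool clear_on_timeout) {
    struct model_queue q = { false };
    q.break_requested = true;
    model_dispatch_timeout(&q, clear_on_timeout);
    return model_dispatch_breaks_immediately(&q);
}
```

Without the clear, the break requested during the first dispatch leaks into the second one.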

Sidenote: There is no similar guarantee for a break from a different thread, since it's a tossup if the break could land after dispatch exits.

> I think avoiding the semaphore signal is worthwhile [..] Although the real place that occurs here would be in equeue_post

IMO this sounds like an RTOS issue if binary semaphores have overflow problems. If we need a workaround I'd prefer that goes in the porting layer (with a flag?) so this issue is fixed for all calls to equeue_sema_signal:
https://github.com/ARMmbed/mbed-os/blob/master/events/equeue/equeue_mbed.cpp#L111

Also break_requested sounds good to me too.

geky · 2018-03-02T18:36:01Z

events/equeue/equeue.c

So, unfortunately this logic is subtly complex...

if (q->breaks) { // <- This condition is outside of mutex so it can't be
                 // trusted. Even if this if is true, we're only _probably_
                 // sure that a break was signalled (concurrent dispatch calls).
                 // The reason for the extra if statement is so that we can avoid
                 // getting a mutex in the common case (no break events).
    equeue_mutex_lock(&q->queuelock);
    if (q->breaks > 0) { // <- Once we're in the mutex, we confirm that break
                         // was signalled
        q->breaks--;
        equeue_mutex_unlock(&q->queuelock);
        return; // <- This return actually exits equeue_dispatch. Otherwise
                // the queue happily continues to run.
    }
    equeue_mutex_unlock(&q->queuelock);
}

So two problems:

You still need the second if statement inside the mutex

You still need the return statement to actually break out of the dispatch loop
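
The two points above could be sketched with the boolean flag as follows. This is a self-contained model that assumes a pthread mutex in place of equeue_mutex_lock; the brk_ names are hypothetical and the loop body is elided.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model; only queuelock and break_requested echo the thread. */
struct brk_queue {
    pthread_mutex_t queuelock;
    bool break_requested;
};

/* Returns true when the caller should return from its dispatch loop. */
bool brk_consume_break(struct brk_queue *q) {
    assert(q);
    bool should_exit = false;
    if (q->break_requested) {              /* unlocked read: a hint only */
        pthread_mutex_lock(&q->queuelock);
        if (q->break_requested) {          /* second check, under the mutex */
            q->break_requested = false;
            should_exit = true;
        }
        pthread_mutex_unlock(&q->queuelock);
    }
    return should_exit;
}

/* Runs up to max_iterations passes and reports how many completed before
 * a break ended the loop. */
int brk_dispatch(struct brk_queue *q, int max_iterations) {
    assert(max_iterations >= 0);
    for (int i = 0; i < max_iterations; i++) {
        if (brk_consume_break(q)) {
            return i;                      /* the return that leaves dispatch */
        }
        /* ... run due events here ... */
    }
    return max_iterations;
}
```

The unlocked first read keeps the common no-break path free of mutex traffic, which is the trade-off the comment in the snippet describes.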


pauluap · 2018-03-02T20:46:38Z

All right, I chose to amend my commit because I didn't want any false noise around the return cleanup.

I also added the clearing on timeout, and I chose to put it within the mutex lock.

I went ahead and added two tests - one for the break_dispatch windup behavior and the other for clearing break_request on timeout. Good thing that I did because I originally put the timeout clearing code within the background update mutex which didn't fire, of course.

geky

It looks great 👍 Thanks for adding tests. I noticed some style differences; if you are able to fix those, it should be good to go in.

geky · 2018-03-02T21:00:12Z

events/equeue/tests/tests.c

+void simple_breaker(void *p) {
+	CountAndQueue* caq = (CountAndQueue*)p;
+	equeue_break(caq->q);
+    usleep(10000);


nit: It looks like there's a mixture of tabs + spaces here. Can you change this to just 4 spaces for indentation?

geky · 2018-03-02T21:01:20Z

events/equeue/tests/tests.c

+    equeue_t* q;
+};
+
+typedef struct sCaQ CountAndQueue;


nit: This is inconsistent with other structs in this file. Could you change this to struct count_and_queue without a typedef?
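
For illustration, the requested style might look like the sketch below. Only the q member appears in the diff above; the count field name, the helper function, and the stand-in equeue_t declaration are assumptions made so the fragment compiles on its own.

```c
#include <assert.h>

/* Stand-in so the sketch compiles alone; the real equeue_t comes from
 * equeue.h. */
typedef struct equeue equeue_t;

/* Plain struct tag with no typedef. The field name `count` is an
 * assumption; only `q` is visible in the quoted diff. */
struct count_and_queue {
    int count;
    equeue_t *q;
};

/* Hypothetical callback showing how the struct is recovered from the
 * void pointer that event callbacks receive. */
void count_and_queue_bump(void *p) {
    struct count_and_queue *caq = (struct count_and_queue *)p;
    assert(caq);
    caq->count++;
}
```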

pauluap · 2018-03-02T21:12:39Z

grr, that's what I get for editor-hopping, I fall in the space-over-tabs camp, so thanks for letting me know :)

geky

No problem, looks good to me 👍

Next step is @kjbracey-arm's review, CI, and then it should be merged shortly after.

kjbracey · 2018-03-05T08:22:40Z

events/equeue/equeue.c

                    q->background.active = true;
                    equeue_mutex_unlock(&q->queuelock);
                }
+                q->break_requested = false;


I feel a bit nervous about actually writing to this variable outside the mutex - one step worse than reading it. In both cases it has no synchronisation protection, which could in principle cause problems with multiple CPUs. Some C11/C++11 atomic would sort it out, but we don't have that available.

However, I can't actually construct a plausible failure mode looking at it - all the mutexes and semaphores that do exist around it add quite a lot of ordering constraints. And I guess we assume only 1 thread ever dispatches the event queue - we're synchronising between multiple queuers and 1 dispatcher, not multiple dispatchers?
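
For reference, the C11 atomic approach mentioned above could look roughly like this. It is a sketch only: the comment notes that C11 atomics were not available to the project, and the atomic_queue names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model using a C11 atomic flag instead of a mutex-guarded bool. */
struct atomic_queue { atomic_bool break_requested; };

void atomic_queue_init(struct atomic_queue *q) {
    assert(q);
    atomic_init(&q->break_requested, false);
}

/* Any number of calls collapse into one pending break. */
void atomic_break(struct atomic_queue *q) {
    assert(q);
    atomic_store(&q->break_requested, true);
}

/* Reads and clears the flag in one atomic step, so at most one caller
 * observes true for a given pending break. */
bool atomic_consume_break(struct atomic_queue *q) {
    assert(q);
    return atomic_exchange(&q->break_requested, false);
}
```

The same exchange could serve on the timeout path, where only the clearing side effect matters and the return value can be ignored.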

I did originally move it to be within the mutex block in 413-419, but of course that led to failing tests because that mutex is only entered if there's an update object attached.

How about if I expand the mutex lock to outside the update conditional? It hurts the normal case, but doesn't impact the worst case.

As in...

// check if we should stop dispatching soon
if (ms >= 0) {
    deadline = equeue_tickdiff(timeout, tick);
    if (deadline <= 0) {
        equeue_mutex_lock(&q->queuelock);
        // update background timer if necessary
        if (q->background.update) {
            if (q->background.update && q->queue) {
                q->background.update(q->background.timer,
                        equeue_clampdiff(q->queue->target, tick));
            }
            q->background.active = true;
        }
        q->break_requested = false;
        equeue_mutex_unlock(&q->queuelock);
        return;
    }
}

The original implementation seemed fine to me. The case we're concerned about is a break from inside the event queue, which by definition is synchronous.

With multiple dispatchers, all we're guaranteed is that at least one thread will break, which is already protected by a mutex. This race condition can only happen after a thread has already decided to exit, but even with a critical section it's still a race, since you don't know if the break will be consumed before or after the timeout.

I'd be fine with extending the mutex to be safe, except it does look like it has a negative performance impact (using make prof):

beginning profiling...
baseline_prof: 23 cycles (+0%)
equeue_tick_prof: 43 cycles (+0%)
equeue_alloc_prof: 61 cycles (+0%)
equeue_post_prof: 227 cycles (+0%)
equeue_post_future_prof: 227 cycles (+0%)
equeue_dispatch_prof: 312 cycles (-34%) <-- ouch, was 232 cycles
equeue_cancel_prof: 121 cycles (+0%)
equeue_alloc_many_prof: 64 cycles (+0%)
equeue_post_many_prof: 221 cycles (+2%)
equeue_post_future_many_prof: 220 cycles (+3%)
equeue_dispatch_many_prof: 6952 cycles (+0%)
equeue_cancel_many_prof: 122 cycles (+0%)
equeue_alloc_size_prof: 56 bytes (+0%)
equeue_alloc_many_size_prof: 64000 bytes (+0%)
equeue_alloc_fragmented_size_prof: 64000 bytes (+0%)
done!

Although note this is on Linux, where mutex acquisition isn't cheap.
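
As a quick sanity check on the quoted figure, going from 232 to 312 cycles is consistent with the reported -34% if the change is taken relative to the old value. The helper below is a hypothetical illustration of that arithmetic; how make prof actually computes its percentages is an assumption here.

```c
#include <assert.h>

/* Relative change in percent, negative when the new run is slower.
 * Integer division truncates toward zero. */
int prof_percent_change(int old_cycles, int new_cycles) {
    assert(old_cycles != 0);
    return 100 * (old_cycles - new_cycles) / old_cycles;
}
```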

Makes sense to me - leave it as is.

cmonr · 2018-03-06T04:42:47Z

/morph build

mbed-ci · 2018-03-06T05:41:46Z

Build : SUCCESS

Build number : 1357
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/6238/

Triggering tests

/morph test
/morph uvisor-test
/morph export-build
/morph mbed2-build

mbed-ci · 2018-03-06T10:07:37Z

Exporter Build : SUCCESS

Build number : 1010
Build artifacts/logs : http://mbed-os.s3-website-eu-west-1.amazonaws.com/?prefix=builds/exporter/6238/

mbed-ci · 2018-03-06T15:42:38Z

Test : SUCCESS

Build number : 1138
Test logs : http://mbed-os-logs.s3-website-us-west-1.amazonaws.com/?prefix=logs/6238/1138

studavekar · 2018-03-06T23:28:04Z

/morph mbed2-build

cmonr · 2018-03-07T02:38:07Z

@adbridge @0xc0170 Marking this as targeting 5.9.0-rc1 since according to the issue this will fix, behavior will change, but I'm not sure about merging a PR that won't be put in use for a long time.

adbridge · 2018-03-07T12:18:51Z

@pauluap Please provide a proper description of this fix and the impact. Also only one checkbox should be ticked. Thanks

adbridge · 2018-03-07T12:23:54Z

@cmonr Whether we should merge for 5.9 so far in advance really depends on the likelihood of the files in this PR being touched again for other bug fixes... @0xc0170 what is your opinion?

pauluap · 2018-03-07T12:55:43Z

I updated the description, but I wasn't the one who added the flags, dunno which one to pick.

Behavior did change, but it's arguable that the previous behavior wasn't the intended behavior in the first place. I'm okay either way, if it's designated as a breaking change, then clearly the fix flag has to go.

0xc0170 · 2018-03-07T13:09:40Z

> I updated the description, but I wasn't the one who added the flags, dunno which one to pick.

I edited it; it's now fixed. Please use the PR type in future.

> @cmonr Whether we should merge for 5.9 so far in advance really depends on the likelihood of the files in this PR being touched again for other bug fixes... @0xc0170 what is your opinion?

Changes like this should be fine.

geky added needs: review BREAKING-CHANGE labels Feb 28, 2018

geky requested review from geky, kjbracey and pan- February 28, 2018 19:52

kjbracey reviewed Mar 1, 2018


geky reviewed Mar 2, 2018


Remove windup behavior from break_dispatch

5d98d22

pauluap force-pushed the break_dispatch_flag branch from c00061a to 5d98d22 Compare March 2, 2018 19:01

Paul Thompson added 2 commits March 2, 2018 12:37

Clear the break requested flag if the dispatch loop is being broken d…

31f581c

…ue to a timeout condition

Add test to cover break_dispatch windup

bb0d540

pauluap force-pushed the break_dispatch_flag branch from ffbf969 to bb0d540 Compare March 2, 2018 20:44

geky reviewed Mar 2, 2018


style fixups

dc430ef

geky approved these changes Mar 2, 2018


kjbracey approved these changes Mar 5, 2018


cmonr added needs: CI and removed needs: review labels Mar 6, 2018

cmonr added ready for merge and removed needs: CI labels Mar 7, 2018

cmonr added the release-version: 5.9.0-rc1 label Mar 7, 2018

cmonr merged commit 1580774 into ARMmbed:master Mar 15, 2018

cmonr removed the ready for merge label Mar 15, 2018

pauluap deleted the break_dispatch_flag branch April 16, 2018 17:45
