mbed_events.h: Add ability to request a shared event queue #4406

kjbracey · 2017-05-30T15:25:24Z

First pass at a couple of shared event queues.

To allow components with a simple need to schedule a few events to not
have to create their own threads, with all the associated memory
overhead, add 2 central calls to get shared normal and an
interrupt-deferral event queues, each dispatched on their own shared
threads.

One initial in-tree user in this PR - the Nanomesh/mbed-client timer.

First out-of-tree users would be 6LoWPAN radio drivers like atmel-rf-driver.

kjbracey · 2017-05-30T15:29:45Z

First pass for comment.

I'm not particularly attached to any of the fine details here, but this is the basic functionality I'm looking for.

The "run normal event queue from main" option is one of those "why not" things - we may have a main() thread with nothing else to do after boot, so it's an obvious memory saving option.

@geky, @sg-, @MarceloSalazar, @SeppoTakalo

kjbracey · 2017-05-30T15:31:20Z

FYI @karsev

adbridge · 2017-05-30T16:32:19Z

@kjbracey-arm Could you please update the PR title so that it is suitably informed for inclusion in a release note. Thanks

0xc0170 · 2017-06-05T10:29:03Z

Please rebase

tommikas · 2017-06-14T08:42:25Z

jenkins/pr-head is failing in all Nanostack builds due to:

06:21:38.620 | D1 <-- DutThread: mbed assertation failed: status == osOK, file: ./mbed-os/events/mbed_events.cpp, line 42

With 6lowpan and Thread it happens immediately after reset and with nanostack ethernet on network connect.

SeppoTakalo · 2017-06-14T09:17:06Z

@tommikas Please ignore the build failures for now, this PR yes now open for a discussion and the API might change so it cannot be considered as a ready-to-be-merged.

kjbracey · 2017-06-14T09:42:52Z

Actually, I am interested in investigating the build failures - I would expect this to work.

But I confess I have not run it myself - just checked compilation.

tommikas · 2017-06-14T09:53:43Z

@SeppoTakalo Sure, the comment was just meant as FYI. Happened to notice that as I was looking through some failed builds.

sg-

I really like where this is going. Few questions about the possibilities:

Can the normal queue be bound to the main thread and on task change trampoline into queue.dispatch before returning execution into main? This would save a thread and do what I think the most common use case would be, signal the super loop
Naming for the getters might need work.
shared-normal config, not sure what this does but IMO simple use cases first and then we can expand functionality / options

kjbracey · 2017-06-15T09:07:20Z

Main context hook sounds a bit clever. Not sure the OS offers such a thing, but I can check. The "dispatch from main" option was intended to be the dumb equivalent of that, for the not-uncommon case where the application isn't doing anything else from main(), so can promise to call dispatch (either forever at the end, or periodically in some way).
I wasn't happy with the getter naming either - suggestions welcome.
At a minimum we absolutely need the ability to set the basic memory sizes from the application so they can tune to squeeze into a small platform. (We still lack this for the main stack, I believe?) Ideally, libraries would be able to specify minimum requirements via JSON, to make this more automated, but this will do for now. Does mean that if, say, a library was using mbed TLS from the normal event loop, the application would have to know to set the normal event loop stack size to 6-7K so it fitted.

kjbracey · 2017-06-15T09:17:25Z

One background note - I currently have no users waiting for the normal event loop, but do have a few lined up for the high-priority one.

Thread deferral from interrupt is a very common driver requirement, and the high-priority one can also be a replacement for the deprecated RtosTimer.

theotherjimmy · 2017-06-19T16:50:36Z

@kjbracey-arm What is that temporary debugging commit all about?

adbridge · 2017-06-23T09:48:57Z

@kjbracey-arm bump

kjbracey · 2017-06-26T08:24:06Z

I was just adding trace to get more explanation of the CI failure that @tommikas highlighted.

tommikas · 2017-06-27T09:55:03Z

@kjbracey-arm It's still failing the same way. The error print isn't visible for some reason.

c1728p9

I like it!

c1728p9 · 2017-06-30T20:19:21Z

events/mbed_events.cpp

+#ifdef MBED_CONF_RTOS_PRESENT
+EventQueue *get_shared_interrupt_deferral_event_queue()
+{
+    return do_shared_event_queue_with_thread<osPriorityRealtime, MBED_CONF_EVENTS_SHARED_IRQDEFERRAL_EVENTSIZE, MBED_CONF_EVENTS_SHARED_IRQDEFERRAL_STACKSIZE>();


For the case where the RTOS is not present, can this just call into get_shared_normal_event_queue()? You wouldn't get the latency benefits, but code for 5 which uses interrupt deferral but not the rtos would be able to run on 2.

Interesting thought. I think that could make some 5 code run on 2, but code that actually needs the low latency (eg the 802.15.4 radio drivers) would find it a non-starter and still have to use raw interrupt.

So I'm currently leaning to no, for this reason: if any code using this would still work on the normal event queue, it should be always be using the normal event queue. The "Realtime" RTOS priority should be reserved for people who actually need it.

There's a difference between "wanting" a latency benefit and needing one. If people who don't need the Realtime priority use it, then the people who do need it are in danger of failing.

My last comment actually suggests maybe this should be renamed to emphasise the realtime nature.

There are cases where deferring an interrupt to a normal priority thread+event would be fine, and I wouldn't want users to automatically assume they should use this "interrupt deferral" event queue.

After responding below, I started thinking, rather than re-naming it (or in addition to renaming it), why not assert the latency instead? That would enforce the realtime nature of this queue. In addition, for mbed 2 builds you could share the event queue, but with caveat that adding long running events to the normal event queue while using a driver which requires the interrupt queue will trigger timing asserts. That way you could use components like nanostack on 2 as long as the application and any libraries it pulls in don't need to post any long-running tasks to the queue.

That's then effectively saying that it would have no normal event queue though, isn't it?

And it would further mean that there could be no real "normal" processing or even slightly time-taking blocking calls anywhere in the application, as the processor would be required to be giving it's full attention to achieving realtime-class latency to its event loop in the main thread. Feels a bit unrealistic.

Not necessarily. Assuming you have 3 types of drivers:

Drivers which require real-time latency and have short running events

Drivers which don't require real-time and have short running events

Drivers which don't require real-time and have long running events

The current implementation allows an mbed 2 application to use type 2 and 3 drivers. Trying to use a type 1 driver will cause a compiler error.

If you have both normal and interrupt queue present then applications will be able to use type 2 and 3 drivers or type 1 and 2 drivers, but not type 1 and 3 in the same application. The biggest downside to this approach is you get a runtime assert error for mixing type 1 and 3 drivers, rather than getting an upfront compiler error.

I'm happy with keeping the interrupt queue out of mbed 2, as this is more conservative and doesn't risk setting any false expectations. I just wanted to make sure all the options were explored.

But it's not just about "long-running" events on the event queue and drivers - it's the entire system, including any application. For the event loop to have that 100us latency, nothing in the entire system could take more than 100us. You're requiring everything the system ever does to be written as if it was in an interrupt handler.

Trying to use the normal event loop on mbed OS 2 is already a stretch - it's effectively turning it into back into mbed OS 3, and forcing the complete system to be event-driven with "reasonable" event run times. (10s of ms, basically what Nanostack/mbed-client copes with). So no blocking network code, no excessive block crypto operations, etc.

But trying to get your type-1 driver to run in mbed OS 2 via event loops totally rules out almost all application code. You couldn't run Nanostack, you couldn't run mbed-Client, you couldn't use the SD card. All of them occasionally do things that process for milliseconds. My basic argument is that the "type 1 + type 2" system does not currently exist, and I don't see how it ever could, unless it was something as simple as "blinky".

I'll happily believe we can create useful non-RTOS systems using a normal-class 10-100ms event loop, mbed OS 3/Nanomesh style (with realtime stuff running under interrupt directly). But I don't believe we can realistically break the entire work of the system into 100us chunks to allow a non-RTOS device to defer its interrupt work into the foreground.

On the other hand, with some more serious work, we could create the equivalent of the Linux "softirq" or RISC OS "transient callback" - queue up things to be run on exit from the supervisor or interrupt context before returning to the application. That would be a very powerful tool - it did all the heavy background lifting in RISC OS, a non-threaded system, giving you this extra third "context" between interrupts and the application.

If we were serious about supporting non-RTOS devices, that would be the most useful thing we could do - it's been on my mind for ages, but it needs integration into the OS, so is decidedly non-trivial.

On the other hand is there some syntactical sugar we could use here? Some way to make it easy to write code that defers to the realtime queue if available, else puts it directly in the interrupt handler? Actually, maybe it's not that hard - my PR over in atmel-rf-driver is quite "ifdeff"y on the RTOS flag, but if I used bound callback parameters I could cut down a bit and get rid of the little thunk functions?

https://github.com/ARMmbed/atmel-rf-driver/pull/55/files

In mbed 2, could we just run the realtime queue in interrupt context? Event queue could be dispatched from a ticker using the background function.

ARM supports interrupts with different priorities, so there is the possibility of even having this queue in low-priority interrupt context, but that might require more work.

Maybe, but you want the event to be run immediately after queuing, so don't see how a regular ticker provides the desired latency.

Could be done using a low priority software/generated interrupt (or 0-delay timer). Do we have such a thing on all platforms? Guess there's something built in to the Cortex-M, but might not be there on Cortex-A?

Either of those might be sufficient in this case, and I think give you the Linux softirq equivalent. The remaining catch is that you are still in interrupt context, so can't call any libraries or facilities that aren't interrupt re-entrant.

Portable protection against this "background" context, whether it be thread or interrupt, is one of the things in the Nanostack HAL - platform_enter_critical(). That is a "disable IRQs" if the background context is interrupts, or a mutex claim if the background context is a realtime thread. Maybe that could be exposed via this API - giving application code in the normal event queue the ability to be safe against changes running in the realtime event queue.

Most tickers have handling for already expired events (example). Although we would have to check that this corner case is handled correctly across targets.

I'd be fine adopting something similar to platform_enter_critical in mbed OS, although it may need a name change since we already have the bombastic name core_util_critical_section_enter. Maybe something core_util_realtime_section_enter...

Although it does sound like this work should be grouped into a second pr. Simply ifdefing away the realtime queue seems fine to me for now. We can always expose it to the mbed 2 world later.

c1728p9 · 2017-06-30T20:33:38Z

events/mbed_events.h

+ *
+ * @return pointer to normal event queue
+ */
+events::EventQueue *get_shared_normal_event_queue();


What are your thoughts on having get_shared_normal_event_queue and get_shared_interrupt_deferral_event_queue take event size and stack size as optional parameters to provide hints/warnings? Drivers could pass in the size they need and there could be a warning or error if the defines are not set correctly.

Another more radical idea is to have this grow the sizes at runtime, so config isn't needed. Maybe (probably) a crazy idea, but I figured I would mention it, as this API would allow it to be lazily added in the future.

Yes, makes sense, particularly for stack size. In the absence of any other current negotiation mechanism such as JSON, it at least avoids hard-to-debug overflows due to misconfiguration, and potentially opens up the door to some sort of compile-time (or even run-time) magic.

What form - MBED_ASSERT? mbed_error?

For event size, it's a bit less useful than stack size, or at least trickier, as the requirement is (sort of) a sum of the users, rather than the maximum. Maybe it could add up the total from calls, if individual components didn't repeatedly call it with event size? On the other hand, event overflow is easier to debug than stack - you can have an immediate nice assert with backtrace.

MBED_ASSERT might be a good way to do it. That way in release builds the assert can be compiled out.

I'm not sure how best the event size option should be handled. To safely size the queue you need to know the size of each event, the rate of posting events, and the latency of the queue. Maybe this optional parameter could be specified later.

Another optional parameter that might be good to have, at least for the interrupt event queue is required latency. That would allow timing violations to be caught deterministically, rather than allowing them to manifest as driver errors that are hard to reproduce.

The ability to diagnose this sort of problem is certainly good, but an ordinary assert that would trigger in any debug build is a bit strong - we're not generally a hard real-time system, so occasional spikes aren't the end of the world. I'd be inclined to have an option you can turn on when required to hunt for culprits if you see problems.

If you are using a debugger you would definitely want the asserts off. On the flip side, it would be great if CI could fail changes cause timing violations. Since this could vary from target to target, it might not be feasible.

Yes, CI alerts are useful, so a couple of tests with the alert flagging on would be good.

What could be useful if we could get some sort of reliable raw latency number out would be having the CI flag if it ever sees a significant shift since the last test on any given platform. That's something that's proven invaluable for raw speed benchmarks (where either a speed increase or decrease could indicate a bug), but it might be harder to get a reliable number on latency.

To not slow down this pr, could we move the latency asserts (and stack asserts unless someone has a patch ready) to another pr? We can always add additional parameters to this function in a backwards compatible manner.

This changeset is already very valuable as is.

theotherjimmy · 2017-07-10T15:33:08Z

@kjbracey-arm Have you been able to track down what's going wrong here?

geky · 2017-07-14T15:53:52Z

events/mbed_events.cpp

+ * limitations under the License.
+ */
+
+#include "events/mbed_events.h"


Could this feature be moved into its own file? I realize it's small, but mbed_events.h was just the entry point to including the other mbed-events headers (EventQueue.h and Event.h).

Perhaps mbed_shared_queues.h and mbed_shared_queues.cpp?

They should still be included from mbed_events.h, so this wouldn't change the user experience, but it would help keep the logical chunks of this module organized.

Yes, no problem with that. I was aware it was somewhat over-shadowing the file's original core contents.

geky

Sorry about being late,

This looks great! Using static function variables is a solid approach that removes a lot of concerns with the gc, and this looks like a very clean implementation.

geky · 2017-07-14T18:35:55Z

events/mbed_events.h

+ * @return pointer to interrupt deferral event queue
+ */
+
+events::EventQueue *get_shared_interrupt_deferral_event_queue();


Since the question of names came up earlier

Thoughts on get_shared_queue and get_shared_irq_queue?

Quite happy to drop "normal", but isn't it specifically an "event queue"?

IRQ queue doesn't feel quite right, but not sure I've got anything better.

This is very much an analogue to Nanostack's way of working, and it has no formal name for this context, but I've tended to call it the "background" context (as opposed to the main foreground application running in Nanostack event loop). Don't really like that either.

Maybe go for "soft_irq_event_queue", akin to Linux?

BTW, did you have a view on my previous thought that it could be named from the latency point of view - "high-priority / low-latency / realtime" or something?

The name "realtime" seems to capture what we're going for the best to me (although it just seems like there's not a perfect solution 😆).

How about get_shared_event_queue and get_shared_realtime_event_queue?

Realtime has meaning which we should stay away from. mbed_highprio_event_queue and mbed_lowprio_event_queue?

I would not like to specify the normal priority queue to be a lowprio_event_queue. It would steer some users towards highprio queues. How about just event_queue and highprio_event_queue instead because the "other" queue is not in lower priority than rest of the threads.

Agree about not calling it "low" - it's "normal", and I'm fine with that being implied without being in the name.

Out of curiosity why not realtime? It sounds to me like it captures what we're going for with this queue (latency guarantees). It's even running on a osPriorityRealtime thread. If it's a scary name that's a good thing.

Actually if we go with mbed_highprio_event_queue, we should go with osPriorityHigh to match. That that would still accomplish the same requirement right?

Sounds like the current lead are mbed_highprio_event_queue and mbed_event_queue.

Out of curiosity why not realtime?

From wikipedia, Real-time programs must guarantee response within specified time constraints, often referred to as "deadlines" and I dont think we can or want to commit to this. There is a significant amount to make this commitment something we're not ready for.

All embedded threading kernels are referred as a RTOS (Realtime Operating System) even when they don't make any guarantees about delays.

I don't think using word "realtime" here to refer the thread priority osPriorityRealtime would be any more misleading that industry adopted way of calling all small kernels as a realtime kernels.
As long as we don't refer our selfs as a hard realtime system, this sound OK to me.

Also, deadlines can be added to this queue later.

theotherjimmy · 2017-08-07T15:07:27Z

@kjbracey-arm Any news?

theotherjimmy · 2017-08-21T15:09:40Z

@kjbracey-arm Any news?

kjbracey · 2017-08-22T09:36:40Z

Hi, just back from 4 weeks' holiday - will start looking at updating this shortly.

kjbracey · 2017-08-31T08:25:26Z

Following comments, a couple more commits:

Implementation moved to separate files.
Rename to mbed_event_queue and mbed_highprio_event_queue, including changing priority to high rather than realtime.

Will squash together before finalising.

tommikas · 2017-08-31T11:40:59Z

6lowpan and Thread seem to work now. Nanostack ethernet driver still fails.

geky

This looks great 👍

Lets see if we can't get this in for 5.6

I believe the CI is down this week, so we'll run it when we can @studavekar

kjbracey · 2017-09-01T08:30:45Z

It wasn't working reliably because I wasn't aligning the stack block.

Corrected, and took the opportunity to adjust Thread and EventQueue prototypes so I didn't have to put in pointless casts to unsigned char * from my uint64_t arrays.

If you don't fancy that change, I can drop them and put in the casts.

To allow components with a simple need to schedule a few events to not have to create their own threads, with all the associated memory overhead, add 2 central calls to get shared normal and an interrupt-deferral event queues, each dispatched on their own shared threads. For non-RTOS systems, just the normal event queue is provided, and the application would have to dispatch this itself. This application-dispatch is also available via a config option, to potentially save memory by reusing the main thread. Possible future improvement: the ability for separate components to request a minimum stack size, and have the JSON combine these requests. (Analogous tooling has already been mooted for mbed TLS config options like key size).

Don't create our own timer thread - use the shared event queue.

kjbracey · 2017-09-01T10:28:07Z

Okay, binary compatibility grief made me change my mind on those void * changes to EventQueue and Thread. Taken out, and put in the casts.

studavekar · 2017-09-02T07:01:19Z

/morph test-nightly

mbed-bot · 2017-09-03T07:23:52Z

Result: SUCCESS

Your command has finished executing! Here's what you wrote!

/morph test-nightly

Output

mbed Build Number: 1182

All builds and test passed!

adbridge added the needs: work label May 30, 2017

kjbracey force-pushed the shared_equeues branch from 3c694ea to 47247a7 Compare May 31, 2017 08:34

kjbracey mentioned this pull request May 31, 2017

Use shared event queue PelionIoT/atmel-rf-driver#55

Open

kjbracey changed the title ~~Shared event queues~~ mbed_events.h: Add ability to request a shared event queue May 31, 2017

kjbracey force-pushed the shared_equeues branch 2 times, most recently from 4ba6a53 to 961af4b Compare June 2, 2017 08:29

kjbracey force-pushed the shared_equeues branch from 961af4b to e5fde1c Compare June 13, 2017 08:13

SeppoTakalo requested review from sg-, geky, c1728p9 and 0xc0170 June 14, 2017 09:15

sg- reviewed Jun 14, 2017

View reviewed changes

c1728p9 approved these changes Jun 30, 2017

View reviewed changes

geky reviewed Jul 14, 2017

View reviewed changes

geky approved these changes Jul 14, 2017

View reviewed changes

SeppoTakalo mentioned this pull request Jul 17, 2017

Network problems with 7 or more clients connected to the mbed Device Connector ARMmbed/mbed-os-example-client#266

Closed

kjbracey force-pushed the shared_equeues branch from b04bfac to 45f3e4a Compare August 31, 2017 12:42

geky approved these changes Aug 31, 2017

View reviewed changes

geky added needs: CI and removed needs: work labels Aug 31, 2017

kjbracey force-pushed the shared_equeues branch 2 times, most recently from dfd691c to d490162 Compare September 1, 2017 08:30

kjbracey force-pushed the shared_equeues branch from d490162 to 8279ed4 Compare September 1, 2017 08:37

kjbracey added 2 commits September 1, 2017 13:26

Use shared event queue for timer

b55af5c

Don't create our own timer thread - use the shared event queue.

kjbracey force-pushed the shared_equeues branch from 8279ed4 to b55af5c Compare September 1, 2017 10:26

0xc0170 approved these changes Sep 4, 2017

View reviewed changes

0xc0170 added ready for merge release-version: 5.6.0 and removed needs: CI labels Sep 4, 2017

0xc0170 merged commit 5bddd88 into ARMmbed:master Sep 4, 2017

0xc0170 removed the ready for merge label Sep 4, 2017

anttiylitokola mentioned this pull request Sep 7, 2017

Runtime error with WiFi & mbed-os-5.5 ARMmbed/mbed-os-example-client#303

Closed

MarceloSalazar mentioned this pull request Sep 7, 2017

Review of wifi driver for mbed OS 5 support ARMmbed/wifi-x-nucleo-idw01m1#1

Closed

mbed_events.h: Add ability to request a shared event queue #4406

mbed_events.h: Add ability to request a shared event queue #4406

Conversation

kjbracey commented May 30, 2017

kjbracey commented May 30, 2017

kjbracey commented May 30, 2017

adbridge commented May 30, 2017

0xc0170 commented Jun 5, 2017

tommikas commented Jun 14, 2017

SeppoTakalo commented Jun 14, 2017

kjbracey commented Jun 14, 2017

tommikas commented Jun 14, 2017

sg- left a comment

Choose a reason for hiding this comment

kjbracey commented Jun 15, 2017

kjbracey commented Jun 15, 2017

theotherjimmy commented Jun 19, 2017

adbridge commented Jun 23, 2017

kjbracey commented Jun 26, 2017

tommikas commented Jun 27, 2017

c1728p9 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kjbracey Jul 3, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

c1728p9 Jul 13, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kjbracey Jul 17, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

theotherjimmy commented Jul 10, 2017

geky Jul 14, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

geky left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

geky Jul 20, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

theotherjimmy commented Aug 7, 2017

theotherjimmy commented Aug 21, 2017

kjbracey commented Aug 22, 2017

kjbracey commented Aug 31, 2017

tommikas commented Aug 31, 2017

geky left a comment • edited Loading

Choose a reason for hiding this comment

kjbracey commented Sep 1, 2017

kjbracey commented Sep 1, 2017

studavekar commented Sep 2, 2017

mbed-bot commented Sep 3, 2017

Output

kjbracey Jul 3, 2017 •

edited

Loading

c1728p9 Jul 13, 2017 •

edited

Loading

kjbracey Jul 17, 2017 •

edited

Loading

geky Jul 14, 2017 •

edited

Loading

geky left a comment •

edited

Loading

geky Jul 20, 2017 •

edited

Loading

geky left a comment •

edited

Loading