Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Breaking changes in mbed-rtos rev118+ #2631

Closed
neilt6 opened this issue Sep 6, 2016 · 15 comments
Closed

Breaking changes in mbed-rtos rev118+ #2631

neilt6 opened this issue Sep 6, 2016 · 15 comments

Comments

@neilt6
Copy link
Contributor

neilt6 commented Sep 6, 2016

I take issue with some of the breaking changes that have been made in mbed-rtos rev118 and newer, and I'm sure I'm not the only one.

For starters, any targets using the ARM micro toolchain by default (AKA most of them) are now forced into a "single-thread" mode, in which you can't create any additional threads besides the main thread or even RtosTimers. What sense does that make? Why would anyone include the RTOS library just to bloat their firmware with no additional functionality? And on that note, why should we be forced to switch to the more bloated ARM standard toolchain in order to use the RTOS, when the ARM micro toolchain has worked for years?

Secondly, rev120 changed the memory model so that the main thread is now given an explicit ~512B stack instead of a more traditional stack at the top of the RAM. This is going to break a lot of existing code that expects the main thread to have a larger stack than it now does, and since there's no way to customize the size of that stack (short of modifying the library), people are going to be forced to waste even more RAM creating worker threads to do tasks that the main thread would normally be doing. I don't know about everyone else, but in my own projects I run all of the "application" code on the main thread, and only use additional threads for concurrent/background stuff (e.g. USB, battery, etc). IMO that is a logical and efficient progression from a non-RTOS design, and it will no longer be possible because of this change.

Thirdly, combining "single-thread" mode with the changes to the memory model creates an asinine combination in which you only have ~512B of stack to work with period.

In conclusion, I believe a lot of the smaller targets are significantly less useful as a result of these changes, and I would urge the mbed developers to seriously reconsider them.

@c1728p9
Copy link
Contributor

c1728p9 commented Sep 6, 2016

Hi @neilt6, sorry for any problems this caused. I'll try to explain the reasoning behind some of these changes. Also, the main stack size is 4096, not 512B which should be enough for most applications. Are you running into stack overflows? A stack size of 4K should be enough for most applications unless large buffers are being declared on the stack. Responses to your points below:

  • Single thread on small builds - With the update to mbed-os 5, the RTOS is included always since some components rely on this to operate. The reason smaller builds such as uARM and newlib nano only allow a single thread to be used is because these libraries are not thread safe. Using multiple threads give the impression of working, could break at any time. For example malloc is not synchronized, so if two threads call new or malloc at the same time it will corrupt the heap and lead to a crash. Rather than giving the impression that multithreading is safe to use in the smaller configurations, the decision was to limit the code to 1 thread to prevent problems. If you don't want the added overhead of the RTOS you can still used the older mbed library which is still being maintained, but is no longer getting new features.
  • Explicit main thread stack size - When there is only a single thread it is fairly straight forward to have the heap and stack share memory. The heap grows until the next allocation address overlaps with the stack. On a multithread system this becomes much more complicated since there are many stacks, some of which can be inside the heap (allocated with dynamic memory). If the current task isn't the task sharing memory with the heap, then it's stack pointer can't be used. In this case the stack pointer of the main thread would have to be grabbed from the RTX suspension record and used at the limit. Since GCC is the only library with an overrideable stack limit checking function (_sbrk), heap limits could only be enforced for GCC. For ARM stack check against the current SP is turned off because of the multiple stacks here which means malloc could allocate memory that is currently in use by the main thread stack rather than returning null. IAR was already setup to have a fixed size heap and stack before 5.0 so not much changed there. To unify the memory model and to allow both stack overflow and heap limit checking the decision was made to have a fixed size for the main stack.
  • This is less than ideal, but the alternative is to not include these targets in 5.0 at all. These targets are still available as they were in the mbed library.

Most of these changes were made out of necessity, not as a matter of preference. If newlib-nano or uARM is made thread safe, then the single thread restriction can be removed (also if you want to live dangerously you can chop out this limitation locally). As for the main stack size, maybe a configuration option could be added. If you have any ideas on how to address these issues feel free to let me know, or even better put together a PR 😄. Thanks for the feedback.

@neilt6
Copy link
Contributor Author

neilt6 commented Sep 6, 2016

Hi @c1728p9, thanks for the reply.

I understand the reasoning for single-thread mode when using mbed-os (where the RTOS is always included), it just doesn't make sense for mbed classic (where the RTOS is explicitly included). The lack of intrinsic thread safety on microlib is well known, and myself and other developers have simply used a global stdlib mutex to solve this. In the interest of backwards-compatibility, I believe that single-thread mode should be disabled on mbed classic builds somehow. Ideally, microlib should also be updated for thread safety, since in my experience the increased memory consumption from using the other toolchains is a deal breaker on the smaller targets.

Thanks for the insight into the technical motivations behind the memory model change. Admittedly, I misunderstood the preprocessor macros around the main stack definition, and didn't realize that it was set to DEFAULT_STACK_SIZE * 2. However, that makes my main thread stack 1024B on the ARM micro toolchain not 4096B, and I am in fact experiencing stack overflows. Perhaps what we need is a new JSON configuration option that allows the main stack size to be overridden. I will look into this and work on a PR.

@neilt6
Copy link
Contributor Author

neilt6 commented Sep 6, 2016

@c1728p9, how about something like this:

mbed-rtos/mbed_lib.json

{
    "name": "rtos",
    "config": {
        "present": 1,
        "main_stack_size": "DEFAULT_STACK_SIZE * 2"
    }
}

...which can then be overridden by the developer using:

mbed_app.json

{
    "target_overrides": {
        "*": {
            "rtos.main_stack_size": 8192
        }
    }
}

@sg-
Copy link
Contributor

sg- commented Sep 6, 2016

Hi @neilt6 These are considered regressions for mbed 2 RTOS delivery and will be fixed to match the previous memory map and assumptions about stack locations and sizes. No need to make a PR for configurable stack sizes.

@ciarmcom
Copy link
Member

ciarmcom commented Sep 6, 2016

ARM Internal Ref: IOTMORF-451

@neilt6
Copy link
Contributor Author

neilt6 commented Sep 6, 2016

@sg- Thanks!

@c1728p9
Copy link
Contributor

c1728p9 commented Oct 10, 2016

Hi @neilt6, I created #2976 to address these problems. These changes will make their way into the next rtos release. Please give it a try and let me know if it fixes this issue.

@neilt6
Copy link
Contributor Author

neilt6 commented Oct 11, 2016

@c1728p9 Thanks! I'll let you know if I encounter any problems once the changes sync back to the online compiler.

@c1728p9
Copy link
Contributor

c1728p9 commented Nov 14, 2016

Hi @neilt6, I made changes to support the older 2.0 behavior of mbed-rtos in this fork: https://developer.mbed.org/users/c1728p9/code/mbed-rtos-legacy/

Please give this a try and let me know if addresses all your concerns. Also, let me know if you would like any additional modifications to be made. Once the changes have been tested and are ready I'll bring them main line into mbed-rtos.

@neilt6
Copy link
Contributor Author

neilt6 commented Nov 15, 2016

Thanks @c1728p9! I just finished testing your fork with several of my projects, and so far everything appears to be working. Memory impact is a little higher, but I suppose that is to be expected.

The only thing I noticed is that I had to re-introduce mbed_lib.json to make everything work. Why was it removed in rev 122? Without it, all of the built-in thread safety in the mbed library gets disabled.

@c1728p9
Copy link
Contributor

c1728p9 commented Nov 15, 2016

Good to hear that it is working for you.

Are you building mbed from source or are you using pre-built (mbed.bld) version? The rtos with mbed_lib.json does not work with the pre-built version, since having mbed_lib.json present changes structure sizes. This leads to intermittent crashes. Issue #3115 was opened for this and was fixed by removing mbed_lib.json.

Even without mbed_lib.json, core functions such as malloc are still thread safe. High level objects like Serial or SPI are not thread safe and need to be protected by the application. This is a limitation of having a pre-built mbed library.

@neilt6
Copy link
Contributor Author

neilt6 commented Nov 15, 2016

@c1728p9 I'm using the mbed-dev library and building from source. In my opinion the new thread safety improvements were a huge leap forward for the library, and I've designed my entire project to leverage them. I could see this change causing a lot of confusion since thread safety is apparently present and documented, but no longer works.

@c1728p9
Copy link
Contributor

c1728p9 commented Nov 15, 2016

The primary use case of mbed 2.0 was without an RTOS, so none of the core drivers were thread safe. Thread safety and the related documentation was only added for mbed-os 5.0. An erroneous check in of mbed_lib.json to the RTOS turned this on from 2.0 but this was not the intention.

The technical reason this doesn't work for 2.0 is the application needs to be built with the same configuration as mbed.bld. Adding mbed_lib.json changes the size of PlatformMutex which leads to memory corruption. If you are building with mbed-dev you should be able to add mbed_lib.json without negative side effects. This is not an officially supported configuration of mbed 2.0 + rtos though.

@neilt6
Copy link
Contributor Author

neilt6 commented Nov 16, 2016

@c1728p9 Thanks for the clarification, I have no further objections.

@c1728p9
Copy link
Contributor

c1728p9 commented Nov 17, 2016

Changes pushed to mbed-rtos. Please either reopen this or create a new issue if you run into problems with it.

@c1728p9 c1728p9 closed this as completed Nov 17, 2016
artokin pushed a commit to artokin/mbed-os that referenced this issue Jun 22, 2021
…..4a3c5c5

4a3c5c5 Merge remote-tracking branch 'origin/release_internal' into release_external
2b8d2e1 Do not reset radio when MAC data request timeouts (ARMmbed#2647)
95c506a Frame counters for nw keys are now stored to NVM only after send key is set (ARMmbed#2641)
3b3010a Adjusted stagger random to [min,min+max] and for small nw set the stagger value to 10 seconds
02bc33a Adjusted security protocol (EAP-TLS,4WH,2WH) retry timers
eb26726 High Priority timestamp compare overflow support fix.
928723a FHSS WS: Initialize broadcast channel count when enabling FHSS (ARMmbed#2642)
6040d70 Updated change log
667b191 Changed initial EAPOL-key retries from trickle to exponential backup
d925145 Add RTT calculation for DHCP Time calculation
0b82953 Traceroute bug fix.
04de6e2 Merge pull request ARMmbed#2638 from PelionIoT/mbed_os_fix_ufsi_calculation
2012347 Fixed FHSS UFSI calculation unit tests
436f16e Handle timer rollover in calculate_ufsi
411cf5c coding style
d6f4421 Correct ufsi timing calculation
560619d Add network time vendor data element to DHCPv6 reply message
6d290dc System time read/write callbacks (ARMmbed#2637)
7905df6 Restart or remove transmission when MAC data request timeouts (ARMmbed#2636)
c97695c Bug fix: EAPOL parent compare fix
e283e62 Fixed channel mask usage with OFDM configurations (ARMmbed#2633)
24168f8 Do not send too old packets (ARMmbed#2632)
dbd83be Fix copyrights (ARMmbed#2631)
7f0cffd Merge pull request ARMmbed#2630 from PelionIoT/use_pelion_copyright
511bd5a Corrected coding style
57ec028 Corrected comparison
7d853de When EAPOL waiting queue is full oldest entry is removed
acf580f Update copyright in changed MDNS files
933c0bb Update copyright
3aeb2af Statistics for data request latencies (ARMmbed#2629)
3f7eae6 EAPOL FHSS temp entry discover
5200b66 DHCP time elapsed time write fix.
0536874 Removed empty EAPOL-key message send after 4WH completion to nodes on relay
8a2a683 Fixed DHCP wrong time elapsed value write.
283f2ee DHCPv6 update:
99be778 EAPOL temp neighbour update
4f9e3d1 Adaptation layer to remove oldest packet first
20f1f64 Added ignoring of retry messages from RADIUS server when waiting EAP-TLS
8a8b407 Add RSL check for ETX Calculation for RPL parent selection
c05e1da Fix DHCP server Uninitialized memory read
77229ee Fix CPP error from domain prefix check
7e47889 support filtering of EAPOL parents based device-min-sens configuration
618a191 Wi-SUN Expedite forward state update
4371462 Fix NULL read from RPL header addition
7802c7e Update CHANGELOG.md
b2c8104 CHANGELOG for Nanostack v13.0.0 (ARMmbed#2615)

git-subtree-dir: features/nanostack/sal-stack-nanostack
git-subtree-split: 4a3c5c5
artokin pushed a commit to artokin/mbed-os that referenced this issue Jun 23, 2021
…903b81..4a3c5c5

4a3c5c5 Merge remote-tracking branch 'origin/release_internal' into release_external
2b8d2e1 Do not reset radio when MAC data request timeouts (ARMmbed#2647)
95c506a Frame counters for nw keys are now stored to NVM only after send key is set (ARMmbed#2641)
3b3010a Adjusted stagger random to [min,min+max] and for small nw set the stagger value to 10 seconds
02bc33a Adjusted security protocol (EAP-TLS,4WH,2WH) retry timers
eb26726 High Priority timestamp compare overflow support fix.
928723a FHSS WS: Initialize broadcast channel count when enabling FHSS (ARMmbed#2642)
6040d70 Updated change log
667b191 Changed initial EAPOL-key retries from trickle to exponential backup
d925145 Add RTT calculation for DHCP Time calculation
0b82953 Traceroute bug fix.
04de6e2 Merge pull request ARMmbed#2638 from PelionIoT/mbed_os_fix_ufsi_calculation
2012347 Fixed FHSS UFSI calculation unit tests
436f16e Handle timer rollover in calculate_ufsi
411cf5c coding style
d6f4421 Correct ufsi timing calculation
560619d Add network time vendor data element to DHCPv6 reply message
6d290dc System time read/write callbacks (ARMmbed#2637)
7905df6 Restart or remove transmission when MAC data request timeouts (ARMmbed#2636)
c97695c Bug fix: EAPOL parent compare fix
e283e62 Fixed channel mask usage with OFDM configurations (ARMmbed#2633)
24168f8 Do not send too old packets (ARMmbed#2632)
dbd83be Fix copyrights (ARMmbed#2631)
7f0cffd Merge pull request ARMmbed#2630 from PelionIoT/use_pelion_copyright
511bd5a Corrected coding style
57ec028 Corrected comparison
7d853de When EAPOL waiting queue is full oldest entry is removed
acf580f Update copyright in changed MDNS files
933c0bb Update copyright
3aeb2af Statistics for data request latencies (ARMmbed#2629)
3f7eae6 EAPOL FHSS temp entry discover
5200b66 DHCP time elapsed time write fix.
0536874 Removed empty EAPOL-key message send after 4WH completion to nodes on relay
8a2a683 Fixed DHCP wrong time elapsed value write.
283f2ee DHCPv6 update:
99be778 EAPOL temp neighbour update
4f9e3d1 Adaptation layer to remove oldest packet first
20f1f64 Added ignoring of retry messages from RADIUS server when waiting EAP-TLS
8a8b407 Add RSL check for ETX Calculation for RPL parent selection
c05e1da Fix DHCP server Uninitialized memory read
77229ee Fix CPP error from domain prefix check
7e47889 support filtering of EAPOL parents based device-min-sens configuration
618a191 Wi-SUN Expedite forward state update
4371462 Fix NULL read from RPL header addition
7802c7e Update CHANGELOG.md
b2c8104 CHANGELOG for Nanostack v13.0.0 (ARMmbed#2615)

git-subtree-dir: connectivity/nanostack/sal-stack-nanostack
git-subtree-split: 4a3c5c5
artokin pushed a commit to artokin/mbed-os that referenced this issue Jun 28, 2021
…903b81..4a3c5c5

4a3c5c5 Merge remote-tracking branch 'origin/release_internal' into release_external
2b8d2e1 Do not reset radio when MAC data request timeouts (ARMmbed#2647)
95c506a Frame counters for nw keys are now stored to NVM only after send key is set (ARMmbed#2641)
3b3010a Adjusted stagger random to [min,min+max] and for small nw set the stagger value to 10 seconds
02bc33a Adjusted security protocol (EAP-TLS,4WH,2WH) retry timers
eb26726 High Priority timestamp compare overflow support fix.
928723a FHSS WS: Initialize broadcast channel count when enabling FHSS (ARMmbed#2642)
6040d70 Updated change log
667b191 Changed initial EAPOL-key retries from trickle to exponential backup
d925145 Add RTT calculation for DHCP Time calculation
0b82953 Traceroute bug fix.
04de6e2 Merge pull request ARMmbed#2638 from PelionIoT/mbed_os_fix_ufsi_calculation
2012347 Fixed FHSS UFSI calculation unit tests
436f16e Handle timer rollover in calculate_ufsi
411cf5c coding style
d6f4421 Correct ufsi timing calculation
560619d Add network time vendor data element to DHCPv6 reply message
6d290dc System time read/write callbacks (ARMmbed#2637)
7905df6 Restart or remove transmission when MAC data request timeouts (ARMmbed#2636)
c97695c Bug fix: EAPOL parent compare fix
e283e62 Fixed channel mask usage with OFDM configurations (ARMmbed#2633)
24168f8 Do not send too old packets (ARMmbed#2632)
dbd83be Fix copyrights (ARMmbed#2631)
7f0cffd Merge pull request ARMmbed#2630 from PelionIoT/use_pelion_copyright
511bd5a Corrected coding style
57ec028 Corrected comparison
7d853de When EAPOL waiting queue is full oldest entry is removed
acf580f Update copyright in changed MDNS files
933c0bb Update copyright
3aeb2af Statistics for data request latencies (ARMmbed#2629)
3f7eae6 EAPOL FHSS temp entry discover
5200b66 DHCP time elapsed time write fix.
0536874 Removed empty EAPOL-key message send after 4WH completion to nodes on relay
8a2a683 Fixed DHCP wrong time elapsed value write.
283f2ee DHCPv6 update:
99be778 EAPOL temp neighbour update
4f9e3d1 Adaptation layer to remove oldest packet first
20f1f64 Added ignoring of retry messages from RADIUS server when waiting EAP-TLS
8a8b407 Add RSL check for ETX Calculation for RPL parent selection
c05e1da Fix DHCP server Uninitialized memory read
77229ee Fix CPP error from domain prefix check
7e47889 support filtering of EAPOL parents based device-min-sens configuration
618a191 Wi-SUN Expedite forward state update
4371462 Fix NULL read from RPL header addition
7802c7e Update CHANGELOG.md
b2c8104 CHANGELOG for Nanostack v13.0.0 (ARMmbed#2615)

git-subtree-dir: connectivity/nanostack/sal-stack-nanostack
git-subtree-split: 4a3c5c5
artokin pushed a commit to artokin/mbed-os that referenced this issue Jun 28, 2021
…..4a3c5c5

4a3c5c5 Merge remote-tracking branch 'origin/release_internal' into release_external
2b8d2e1 Do not reset radio when MAC data request timeouts (ARMmbed#2647)
95c506a Frame counters for nw keys are now stored to NVM only after send key is set (ARMmbed#2641)
3b3010a Adjusted stagger random to [min,min+max] and for small nw set the stagger value to 10 seconds
02bc33a Adjusted security protocol (EAP-TLS,4WH,2WH) retry timers
eb26726 High Priority timestamp compare overflow support fix.
928723a FHSS WS: Initialize broadcast channel count when enabling FHSS (ARMmbed#2642)
6040d70 Updated change log
667b191 Changed initial EAPOL-key retries from trickle to exponential backup
d925145 Add RTT calculation for DHCP Time calculation
0b82953 Traceroute bug fix.
04de6e2 Merge pull request ARMmbed#2638 from PelionIoT/mbed_os_fix_ufsi_calculation
2012347 Fixed FHSS UFSI calculation unit tests
436f16e Handle timer rollover in calculate_ufsi
411cf5c coding style
d6f4421 Correct ufsi timing calculation
560619d Add network time vendor data element to DHCPv6 reply message
6d290dc System time read/write callbacks (ARMmbed#2637)
7905df6 Restart or remove transmission when MAC data request timeouts (ARMmbed#2636)
c97695c Bug fix: EAPOL parent compare fix
e283e62 Fixed channel mask usage with OFDM configurations (ARMmbed#2633)
24168f8 Do not send too old packets (ARMmbed#2632)
dbd83be Fix copyrights (ARMmbed#2631)
7f0cffd Merge pull request ARMmbed#2630 from PelionIoT/use_pelion_copyright
511bd5a Corrected coding style
57ec028 Corrected comparison
7d853de When EAPOL waiting queue is full oldest entry is removed
acf580f Update copyright in changed MDNS files
933c0bb Update copyright
3aeb2af Statistics for data request latencies (ARMmbed#2629)
3f7eae6 EAPOL FHSS temp entry discover
5200b66 DHCP time elapsed time write fix.
0536874 Removed empty EAPOL-key message send after 4WH completion to nodes on relay
8a2a683 Fixed DHCP wrong time elapsed value write.
283f2ee DHCPv6 update:
99be778 EAPOL temp neighbour update
4f9e3d1 Adaptation layer to remove oldest packet first
20f1f64 Added ignoring of retry messages from RADIUS server when waiting EAP-TLS
8a8b407 Add RSL check for ETX Calculation for RPL parent selection
c05e1da Fix DHCP server Uninitialized memory read
77229ee Fix CPP error from domain prefix check
7e47889 support filtering of EAPOL parents based device-min-sens configuration
618a191 Wi-SUN Expedite forward state update
4371462 Fix NULL read from RPL header addition
7802c7e Update CHANGELOG.md
b2c8104 CHANGELOG for Nanostack v13.0.0 (ARMmbed#2615)

git-subtree-dir: features/nanostack/sal-stack-nanostack
git-subtree-split: 4a3c5c5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants