core/thread: Allow for inline thread_yield_higher #15788
Similar to irq.h, this allows inlining the often trivial thread_yield_higher function.
On the avr8 CPU, the thread_yield_higher function is complex enough that it is not inlined there.
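The opt-in pattern described by these two commits can be sketched roughly as follows. This is a minimal, host-compilable illustration, not the PR's actual diff: the macro names `THREAD_API_INLINED` and `THREAD_MAYBE_INLINE` are assumptions modeled on the irq.h approach, and `sched_requests` and `yield_n_times` are hypothetical stand-ins so the example runs without hardware.

```c
#include <stdint.h>

/* Assumption: a CPU opts in by defining this macro in its thread_arch.h.
 * A CPU with a complex yield (such as avr8) would leave it undefined and
 * keep an ordinary out-of-line definition in a .c file. */
#define THREAD_API_INLINED

#ifdef THREAD_API_INLINED
#define THREAD_MAYBE_INLINE static inline __attribute__((always_inline))
#else
#define THREAD_MAYBE_INLINE
#endif

/* Shared declaration, as it might appear in core/include/thread.h. */
THREAD_MAYBE_INLINE void thread_yield_higher(void);

/* Hypothetical host-side stand-in for "ask the scheduler to run". */
static volatile uint32_t sched_requests;

#ifdef THREAD_API_INLINED
/* Definition, as it might appear in the CPU's thread_arch.h. */
THREAD_MAYBE_INLINE void thread_yield_higher(void)
{
    sched_requests = sched_requests + 1;
}
#endif

/* Callers look the same whether or not the call is inlined. */
static uint32_t yield_n_times(uint32_t n)
{
    for (uint32_t i = 0; i < n; i++) {
        thread_yield_higher();
    }
    return sched_requests;
}
```

The point of the indirection is that the common header keeps a single declaration, while each CPU decides whether the body is cheap enough to live in a header.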
Funny, I had just this also on my wish list. Thanks for tackling this :-) (I wonder if we will undo this and the other |
Probably, but as you say, that should be relatively low effort.
Would be nice to see benchmarks for at least one board per arch, so that this is motivated. |
First for the cortex-m0+ and the cortex-m4 case using bench_thread_yield_idle, using the ticks result:
I wonder why this increases ROM noticeably. I smell another missed optimization opportunity in GCC. |
On the nrf52dk, the sequence is LDR, MOV.W, STR and ISB. The LDR and STR are 16 bit, the other two 32 bit. A 32 bit word is used at the end of the function to store the |
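One plausible reading of that four-instruction sequence is: load the address of a system control register, build a bit mask, store it to pend the PendSV exception, then issue an instruction barrier. The sketch below mirrors that shape on a host machine. It is an assumption-laden illustration rather than the code in this PR: `fake_icsr` is a hypothetical variable standing in for the memory-mapped register, and the barrier is only a compiler barrier where a target build would use an ISB.

```c
#include <stdint.h>

/* PENDSVSET is bit 28 of the Cortex-M Interrupt Control and State Register. */
#define PENDSVSET_MASK (UINT32_C(1) << 28)

/* Hypothetical host-side stand-in for the memory-mapped register. */
static volatile uint32_t fake_icsr;

static inline __attribute__((always_inline)) void yield_higher_sketch(void)
{
    /* Pend the PendSV exception so the scheduler runs. On target this is
     * where the address load, the mask constant and the store would go. */
    fake_icsr = PENDSVSET_MASK;

    /* On target this would be an ISB, so that following instructions are
     * not executed before the exception is taken. Here it is only a
     * compiler barrier. */
    __asm__ volatile ("" ::: "memory");
}
```

Because the body is this small, a function call and return can cost about as much as the work itself, which is consistent with inlining paying off in cycles at the price of a few bytes per call site.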
We're only missing test / benchmark for the fe310 and MIPS. And the ESP platform has no pseudo header yet. @aabadie: Would you mind giving @francois-berder: Would you mind to do the same on one of your MIPS boards? |
Here it is |
And the hifive1b (fe310) case using bench_thread_yield_idle, using the ticks result:
Trading 40 bytes for 8 clock ticks :) |
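For readers unfamiliar with the "ticks" figure compared in these benchmarks: it is a per-iteration cost in CPU clock cycles. A minimal sketch of how such a number could be derived is shown below, assuming the benchmark counts iterations during a fixed time window. The function name and parameters are hypothetical and not taken from the benchmark source.

```c
#include <stdint.h>

/* Hypothetical helper: convert "iterations completed in a time window"
 * into "CPU clock ticks per iteration". */
static uint32_t ticks_per_iteration(uint32_t duration_us,
                                    uint32_t core_clock_hz,
                                    uint32_t iterations)
{
    if (iterations == 0) {
        return 0; /* nothing measured, avoid dividing by zero */
    }
    /* 64-bit intermediate, since duration_us * core_clock_hz
     * overflows 32 bits for realistic values */
    uint64_t total_ticks = ((uint64_t)duration_us * core_clock_hz) / 1000000u;
    return (uint32_t)(total_ticks / iterations);
}
```

With purely illustrative inputs (not measurements from this PR), a 1 s window on a 64 MHz core with 500000 iterations gives 128 ticks per iteration, so a saving of 8 ticks would show up as roughly 6% more iterations in the same window.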
For now, add only an empty pseudo-header for MIPS to get this in swiftly?
This only moves an existing function into a header, do we really expect any change in behavior because of that? |
The function is inlined, which influences performance, so yes, I would prefer to have hard numbers before changing this. In terms of flash, I see a 20 B increase on the pic32-wifire and a 24 B increase on the 6lowpan-clicker.
Of course 😑, Added |
looks good to me.
I'd say we don't need to wait for MIPS results. Every architecture without exception got a speed bump in a hot code path from this, as was the expectation. Odds are good that theory and practice also match for MIPS.
Thanks! |
Contribution description
This PR modifies core/include/thread.h to allow for inlining the thread_yield_higher function, similar to the irq API. Patches for the relevant CPUs have been included to either add the thread_arch.h header with the inlined function or add it as a dummy header.
Initial benchmarks show that this shaves off 13 cycles from the bench_thread_yield_pingpong test on the nrf52dk board.
TODO: add headers for arm7, esp8266 and msp430.
Testing procedure
Benchmarks: TODO
I can run the tests for the cortex-m and the RISC-V platform myself, but I could use some help with the mips32r2 (pic32-wifire or similar) platform.
Benchmarks
Comparing flash size and "ticks" parameter of tests/bench_thread_yield_pingpong:
Issues/PRs references
None