Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core/thread: Allow for inline thread_yield_higher #15788

Merged
merged 9 commits into from Jan 22, 2021

Conversation

bergzand
Copy link
Member

@bergzand bergzand commented Jan 18, 2021

Contribution description

This PR modifies core/include/thread.h to allow for inlining the thread_yield_higher function similar to the irq api. Patches for the relevant CPUs have been included to either add the thread_arch.h header with the inlined function or as dummy header.

Initial benchmarks show that this shaves of 13 cycles from the bench_thread_yield_pingpong test on the nrf52dk board.

TODO: add headers for arm7, esp8266 and msp430.

Testing procedure

  • Benchmarks: TODO
  • The threading tests must still work on all affected platforms.

I can run the tests for the cortex-m and the RISC-V platform myself, but I could use some help with the mips32r2 (pic32-wifire or similar) platform.

Benchmarks

Comparing flash size and "ticks" parameter of tests/bench_thread_yield_pingpong:

board flash pre flash post bench pre bench post
samr21-xpro 11760 11836 575 546
nrf52dk 10812 10892 414 399
msba2 19924 19948 715 697
hifive1b 11108 11148 468 460

Issues/PRs references

None

@bergzand bergzand added Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation State: WIP State: The PR is still work-in-progress and its code is not in its final presentable form yet Area: core Area: RIOT kernel. Handle PRs marked with this with care! Area: cpu Area: CPU/MCU ports labels Jan 18, 2021
@bergzand bergzand force-pushed the pr/core/inline_thread_yield_higher branch from d162a25 to c88871f Compare January 19, 2021 09:41
@bergzand bergzand removed the State: WIP State: The PR is still work-in-progress and its code is not in its final presentable form yet label Jan 19, 2021
@bergzand bergzand force-pushed the pr/core/inline_thread_yield_higher branch from c88871f to 9c59580 Compare January 19, 2021 10:03
@bergzand bergzand added CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR and removed CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Jan 19, 2021
@maribu
Copy link
Member

maribu commented Jan 19, 2021

Funny, I had just this also on my wish list. Thanks for tackling this :-)

(I wonder if we will undo this and the other inline headers again once LTO=1 becomes default 🤣 - but that should be easier.)

@bergzand
Copy link
Member Author

bergzand commented Jan 19, 2021

(I wonder if we will undo this and the other inline headers again once LTO=1 becomes default 🤣 - but that should be easier.)

Probably, but as you say, that should be relative low effort

@maribu maribu added Reviewed: 1-fundamentals The fundamentals of the PR were reviewed according to the maintainer guidelines Reviewed: 2-code-design The code design of the PR was reviewed according to the maintainer guidelines Reviewed: 4-code-style The adherence to coding conventions by the PR were reviewed according to the maintainer guidelines Reviewed: 5-documentation The documentation details of the PR were reviewed according to the maintainer guidelines labels Jan 19, 2021
@maribu
Copy link
Member

maribu commented Jan 19, 2021

Would be nice to see benchmarks for at least one board per arch, so that this is motivated.

@bergzand
Copy link
Member Author

First for the cortex-m0+ and the cortex-m4 case using bench_thread_yield_idle, using the ticks result:

board flash pre flash post bench pre bench post
samr21-xpro 11760 11836 575 546
nrf52dk 10812 10892 414 399

@maribu
Copy link
Member

maribu commented Jan 19, 2021

I wonder why this increases ROM noticeably. I smell another missed optimization opportunity in GCC.

@bergzand
Copy link
Member Author

I wonder why this increases ROM noticeably. I smell another missed optimization opportunity in GCC.

On the nrf52dk, the sequence is LDR, MOV.W, STR and ISB. The LDR and STR are 16 bit, the other two 32 bit. A 32 bit word is used at the end of the function to store the SCB->ICSR register memory address in. 16 bytes in total for the whole sequence, every time the thread_yield_higher is used.

@bergzand bergzand force-pushed the pr/core/inline_thread_yield_higher branch from 9c59580 to ede0f17 Compare January 19, 2021 20:39
@maribu
Copy link
Member

maribu commented Jan 20, 2021

We're only missing test / benchmark for the fe310 and MIPS. And the ESP platform has no pseudo header yet.

@aabadie: Would you mind giving tests/bench_thread_yield_pingpong a spin on your Hifive 1?

@francois-berder: Would you mind to do the same on one of your MIPS boards?

@bergzand bergzand force-pushed the pr/core/inline_thread_yield_higher branch from ede0f17 to 84dfc88 Compare January 20, 2021 09:38
@bergzand
Copy link
Member Author

And the ESP platform has no pseudo header yet.

Here it is

@maribu maribu added the CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR label Jan 20, 2021
@bergzand
Copy link
Member Author

And the hifive1b (fe310) case using bench_thread_yield_idle, using the ticks result:

board flash pre flash post bench pre bench post
hifive1b 11108 11148 468 460

Trading 40 bytes for 8 clock ticks :)

@maribu
Copy link
Member

maribu commented Jan 20, 2021

Add for now only and empty pseudo-header for MIPS to get this in swiftly?

@benpicco
Copy link
Contributor

This only moves an existing function into a header, do we really expect any change in behavior because of that?

@bergzand
Copy link
Member Author

This only moves an existing function into a header, do we really expect any change in behavior because of that?

The function is inlined, influencing the performance, so yeah, I would prefer to have hard numbers before changing this.

In terms of flash I see a 20B increase on the pick32-wifire and a 24B increase on the 6lowpan-clicker.

@benpicco
Copy link
Contributor

native needs an update too.

@bergzand
Copy link
Member Author

native needs an update too.

Of course 😑, Added

Copy link
Contributor

@benpicco benpicco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me.

@maribu
Copy link
Member

maribu commented Jan 22, 2021

I'd say we don't need to wait for MIPS results. Every architecture without exception got a speed bump im a hot code path by this, as was the expectation. Odds are good that for MIPS theory and practice also match.

@benpicco benpicco merged commit 4c403d6 into RIOT-OS:master Jan 22, 2021
@bergzand
Copy link
Member Author

Thanks!

@bergzand bergzand deleted the pr/core/inline_thread_yield_higher branch January 22, 2021 19:26
@kaspar030 kaspar030 added this to the Release 2021.04 milestone Apr 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: core Area: RIOT kernel. Handle PRs marked with this with care! Area: cpu Area: CPU/MCU ports CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Reviewed: 1-fundamentals The fundamentals of the PR were reviewed according to the maintainer guidelines Reviewed: 2-code-design The code design of the PR was reviewed according to the maintainer guidelines Reviewed: 4-code-style The adherence to coding conventions by the PR were reviewed according to the maintainer guidelines Reviewed: 5-documentation The documentation details of the PR were reviewed according to the maintainer guidelines Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants