-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kinetis: Optimize GPIO interrupt handler #8558
Conversation
Maybe try bitharithm_lsb()? It uses __builtin_ffs() if available and black magic otherwise. |
I don't have time to examine further at the moment. Might take a look later. |
@gebart is it worth looking at the current state of this PR plus testing or do you want to improve it? |
c78c5c3
to
efdc3ab
Compare
Rewrote this using the GCC builtin ctz function. Added some real measurements for the frdm-kw41z board. |
cpu/kinetis/periph/gpio.c
Outdated
unsigned pin = 0; | ||
while (status) { | ||
unsigned tz = __builtin_ctz(status); | ||
pin += tz; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's in status? A bitfield with the pending ISRs I suppose?
If yes, I think we could simplify this, as "ctz" should directly return the number of the LSB (and thus pin) starting from zero:
while(status) {
/* get position of first bit set in status */
unsigned pin = __builtin_ctz(status);
/* clear it */
status &= ~(1<<pin);
if (...
I wouldn't bother, as the compiler usually figures this itself, but it took me a bit to understand the current version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, did you try bitarithm_lsb()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The status variable is a bit field where 1 means the pin interrupt has occurred, 0 means no interrupt has occurred.
I didn't test bitarithm_lsb yet. I will make a quick test now while I have the measurement set up still connected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All measurements were performed with the logic analyzer set to 100 MHz sample rate.
Tested with bitarithm_lsb on CM0+ (frdm-kw41z). The timing is on average 5.7µs with bitarithm_lsb on CM0+ (frdm-kw41z), BITARITHM_LSB_LOOKUP
is defined, BITARITHM_LSB_BUILTIN
is not defined.
There is quite a bit of other things adding to the measured delay though, so if you want to perform a direct comparison of the MultiplyDeBruijnBitPosition implementation against the GCC builtin armv6 asm implementation I think we will need a different test program.
One more measurement was taken as well, I moved the LED0_ON macro to the first line of the GPIO interrupt callback routine (the one passed as an argument to gpio_init_int). These measurements seem to confirm the improvement from builtin_ctz to bitarithm_lsb
Average time from GPIO pin flank to first line of cb
:
With bitarithm_lsb
: 4.4 µs
With __builtin_ctz
: 4.8 µs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion on the bit clearing, it saved another 0.3-0.4 µs on average.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not all pins are usable since they are already used for other functions, but we might find some pin numbered 14-19 which may be usable for a GPIO IRQ. I'll try to test something later tonight or in the morning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a test on frdm-k22f, using the PTB19 pin. That CPU is already quite fast, so the practical gains are small, even though the relative improvement is huge.
master: 7.1 µs
this PR: 3.1 µs
I realized that the KW41Z tests were already using a "late" number pin. Because the port IRQ for PORTB and PORTC are combined on that CPU, the interrupt handler first checks the PORTB interrupts, then the PORTC interrupts, so the numbers above for PTC4 means that it was actually running the naive pin loop for 32+5=37 iterations until the correct pin was checked. I verified the above theory by switching to PTA19 on master, and the timing was better than when using PTC4. With this PR the difference between using PTA19 and PTC4 is too small for my measurements, even though there is one more interrupt flag check for checking PORTB first before checking PORTC in the PTC4 case. I guess it would show if we count the actual cycles, but it is not noticeable in practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the relative improvement is huge.
master: 7.1 µs
this PR: 3.1 µs
Nice!
your suggestion makes the compiler generate a single
bics
instruction (bit clear), while the other version needs some adds and lsrs instructions
This is not commited, yet, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be, maybe I forgot to push. Will update
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pushed
Added one more bugfix: cortexm_isr_end() was being called twice on KW41Z and other CPUs with combined PORTB and PORTC IRQs. Solved by moving cortexm_isr_end() out of irq_handler and into isr_portx |
/* get position of first bit set in status */ | ||
unsigned pin = bitarithm_lsb(status); | ||
/* clear it */ | ||
status &= ~(1 << pin); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for consistency you could add a u here, or remove it below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I forgot to address this while squashing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not tell anyone! ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
untested ACK. pls squash!
Measurements show that the time from pin edge until return from interrupt is reduced from 22 us to 6.1 us for KW41Z running at 41.94 MHz. The measurements used a no-op GPIO callback for testing and were measured using an external logic analyzer.
Devices with combined IRQs would call the cortexm_isr_end handler twice when it is part of the irq_handler routine.
fc3d463
to
8bb73f2
Compare
squashed |
&go. |
@kaspar030 thanks for the feedback and review |
Contribution description
The IRQ handler in the GPIO driver on Kinetis currently iterates one pin at a time to find out which pin has triggered the interrupt. With this change we use the builtin function ctz to count trailing zeros to immediately jump to the interrupt flag that has been set.
Most of the time there will only be a single bit set in the 32 bit GPIO PORT interrupt flags register, which means that this method should be quicker.
Edit: Updated with new implementation using GCC builtin function
__builtin_ctz
for efficiencyMeasured improvement
Using LED0 and SW3 on the frdm-kw41z board (Cortex-M0+), the time was measured between the falling edge of SW3 until LED0 was toggled from main(). The GPIO interrupt callback function was set to a simple no-op
return
. The software was written so that the first thing that happens in main after returning from the interrupt isLED0_ON
. The time was measured using a logic analyzer.Current master: 22 µs
This PR: 6.2 µs
The improvement should be even better on the Cortex-M4 Kinetis CPUs since they will use the CLZ hardware instruction which is quicker than the assembly version of ctz.
Testing
Run tests/periph_gpio on a Kinetis board (e.g. frdm-kw41z)
use init_int to test pin interrupt on some accessible pin.
Example for FRDM-KW41Z, enabling pin interrupt on falling flank of PTC4 (SW3 button on the dev board):
init_int 2 4 0 0
Issues/PRs references
none