Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kinetis: Optimize GPIO interrupt handler #8558

Merged
merged 2 commits into from
Nov 8, 2018

Conversation

jnohlgard
Copy link
Member

@jnohlgard jnohlgard commented Feb 14, 2018

Contribution description

The IRQ handler in the GPIO driver on Kinetis currently iterates one pin at a time to find out which pin has triggered the interrupt. With this change we use the builtin function ctz to count trailing zeros to immediately jump to the interrupt flag that has been set.
Most of the time there will only be a single bit set in the 32 bit GPIO PORT interrupt flags register, which means that this method should be quicker.

Edit: Updated with new implementation using GCC builtin function __builtin_ctz for efficiency

Measured improvement

Using LED0 and SW3 on the frdm-kw41z board (Cortex-M0+), the time was measured between the falling edge of SW3 until LED0 was toggled from main(). The GPIO interrupt callback function was set to a simple no-op return. The software was written so that the first thing that happens in main after returning from the interrupt is LED0_ON. The time was measured using a logic analyzer.
Current master: 22 µs
This PR: 6.2 µs

The improvement should be even better on the Cortex-M4 Kinetis CPUs since they will use the CLZ hardware instruction which is quicker than the assembly version of ctz.

Testing

Run tests/periph_gpio on a Kinetis board (e.g. frdm-kw41z)
use init_int to test pin interrupt on some accessible pin.
Example for FRDM-KW41Z, enabling pin interrupt on falling flank of PTC4 (SW3 button on the dev board):
init_int 2 4 0 0

Issues/PRs references

none

@jnohlgard jnohlgard added Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation Platform: ARM Platform: This PR/issue effects ARM-based platforms Impact: minor The PR is small in size and might only require a quick look of a knowledgeable reviewer CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR labels Feb 14, 2018
@jnohlgard jnohlgard added this to the Release 2018.04 milestone Feb 14, 2018
@kaspar030
Copy link
Contributor

An alternative would be to use the __builtin_ffs function

Maybe try bitharithm_lsb()? It uses __builtin_ffs() if available and black magic otherwise.

@jnohlgard
Copy link
Member Author

I don't have time to examine further at the moment. Might take a look later.

@PeterKietzmann
Copy link
Member

@gebart is it worth looking at the current state of this PR plus testing or do you want to improve it?

@jnohlgard
Copy link
Member Author

Rewrote this using the GCC builtin ctz function. Added some real measurements for the frdm-kw41z board.
The current improvement is from 22 µs to 6.1 µs.

unsigned pin = 0;
while (status) {
unsigned tz = __builtin_ctz(status);
pin += tz;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's in status? A bitfield with the pending ISRs I suppose?

If yes, I think we could simplify this, as "ctz" should directly return the number of the LSB (and thus pin) starting from zero:

while(status) {
  /* get position of first bit set in status */
  unsigned pin = __builtin_ctz(status);
 /* clear it */
  status &= ~(1<<pin);
  if (...

I wouldn't bother, as the compiler usually figures this itself, but it took me a bit to understand the current version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, did you try bitarithm_lsb()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The status variable is a bit field where 1 means the pin interrupt has occurred, 0 means no interrupt has occurred.
I didn't test bitarithm_lsb yet. I will make a quick test now while I have the measurement set up still connected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All measurements were performed with the logic analyzer set to 100 MHz sample rate.

Tested with bitarithm_lsb on CM0+ (frdm-kw41z). The timing is on average 5.7µs with bitarithm_lsb on CM0+ (frdm-kw41z), BITARITHM_LSB_LOOKUP is defined, BITARITHM_LSB_BUILTIN is not defined.
There is quite a bit of other things adding to the measured delay though, so if you want to perform a direct comparison of the MultiplyDeBruijnBitPosition implementation against the GCC builtin armv6 asm implementation I think we will need a different test program.

One more measurement was taken as well, I moved the LED0_ON macro to the first line of the GPIO interrupt callback routine (the one passed as an argument to gpio_init_int). These measurements seem to confirm the improvement from builtin_ctz to bitarithm_lsb

Average time from GPIO pin flank to first line of cb:
With bitarithm_lsb: 4.4 µs
With __builtin_ctz: 4.8 µs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion on the bit clearing, it saved another 0.3-0.4 µs on average.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not all pins are usable since they are already used for other functions, but we might find some pin numbered 14-19 which may be usable for a GPIO IRQ. I'll try to test something later tonight or in the morning.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a test on frdm-k22f, using the PTB19 pin. That CPU is already quite fast, so the practical gains are small, even though the relative improvement is huge.

master: 7.1 µs
this PR: 3.1 µs

I realized that the KW41Z tests were already using a "late" number pin. Because the port IRQ for PORTB and PORTC are combined on that CPU, the interrupt handler first checks the PORTB interrupts, then the PORTC interrupts, so the numbers above for PTC4 means that it was actually running the naive pin loop for 32+5=37 iterations until the correct pin was checked. I verified the above theory by switching to PTA19 on master, and the timing was better than when using PTC4. With this PR the difference between using PTA19 and PTC4 is too small for my measurements, even though there is one more interrupt flag check for checking PORTB first before checking PORTC in the PTC4 case. I guess it would show if we count the actual cycles, but it is not noticeable in practice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the relative improvement is huge.

master: 7.1 µs
this PR: 3.1 µs

Nice!

your suggestion makes the compiler generate a single bics instruction (bit clear), while the other version needs some adds and lsrs instructions

This is not commited, yet, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be, maybe I forgot to push. Will update

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pushed

@jnohlgard
Copy link
Member Author

Added one more bugfix: cortexm_isr_end() was being called twice on KW41Z and other CPUs with combined PORTB and PORTC IRQs. Solved by moving cortexm_isr_end() out of irq_handler and into isr_portx

@kaspar030 kaspar030 added Reviewed: 1-fundamentals The fundamentals of the PR were reviewed according to the maintainer guidelines Reviewed: 2-code-design The code design of the PR was reviewed according to the maintainer guidelines Reviewed: 4-code-style The adherence to coding conventions by the PR were reviewed according to the maintainer guidelines Reviewed: 5-documentation The documentation details of the PR were reviewed according to the maintainer guidelines labels Nov 7, 2018
/* get position of first bit set in status */
unsigned pin = bitarithm_lsb(status);
/* clear it */
status &= ~(1 << pin);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for consistency you could add a u here, or remove it below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I forgot to address this while squashing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not tell anyone! ;-)

Copy link
Contributor

@kaspar030 kaspar030 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

untested ACK. pls squash!

Joakim Nohlgård added 2 commits November 8, 2018 14:52
Measurements show that the time from pin edge until return from
interrupt is reduced from 22 us to 6.1 us for KW41Z running at
41.94 MHz. The measurements used a no-op GPIO callback for testing and
were measured using an external logic analyzer.
Devices with combined IRQs would call the cortexm_isr_end handler twice
when it is part of the irq_handler routine.
@jnohlgard
Copy link
Member Author

squashed

@kaspar030
Copy link
Contributor

&go.

@kaspar030 kaspar030 merged commit f3fa8b5 into RIOT-OS:master Nov 8, 2018
@jnohlgard jnohlgard deleted the pr/kinetis-gpio-interrupt branch November 8, 2018 21:46
@jnohlgard
Copy link
Member Author

@kaspar030 thanks for the feedback and review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: ready for build If set, CI server will compile all applications for all available boards for the labeled PR Impact: minor The PR is small in size and might only require a quick look of a knowledgeable reviewer Platform: ARM Platform: This PR/issue effects ARM-based platforms Reviewed: 1-fundamentals The fundamentals of the PR were reviewed according to the maintainer guidelines Reviewed: 2-code-design The code design of the PR was reviewed according to the maintainer guidelines Reviewed: 4-code-style The adherence to coding conventions by the PR were reviewed according to the maintainer guidelines Reviewed: 5-documentation The documentation details of the PR were reviewed according to the maintainer guidelines Type: enhancement The issue suggests enhanceable parts / The PR enhances parts of the codebase / documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants