Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

naked interrupt will break when s registers get used #90

Closed
jnk0le opened this issue May 10, 2023 · 17 comments
Closed

naked interrupt will break when s registers get used #90

jnk0le opened this issue May 10, 2023 · 17 comments

Comments

@jnk0le
Copy link
Contributor

jnk0le commented May 10, 2023

As introduced in exti example. Be it function call, register pressure or pure size optimization.

see:
https://www.eevblog.com/forum/microcontrollers/bizarre-problem-on-ch32v003-with-systick-isr-corrupting-uart-tx-data
https://www.reddit.com/r/RISCV/comments/126262j/notes_on_wch_fast_interrupts/

@cnlohr
Copy link
Owner

cnlohr commented May 10, 2023

Those are really long discussions, can you explain it with more detail here? Is there a work around?

Does this fix it? (If so, I don't understand quite why.)

__attribute__((noinline))
void my_handler_inner() {
    ... all your stuff here
}

__attribute__((naked))
void my_handler() {
    my_handler_inner();
    asm volatile ("mret");
    __builtin_unreachable(); // suppress the usual ret
}

Or is the argument that we just would be better off without that feature? Because I would be A OK with that conclusion!!

@cnlohr
Copy link
Owner

cnlohr commented May 10, 2023

Ok, let me see if I have this right:

  1. __attribute__((interrupt) works.
  2. The issue is only with s0, s1. So, doing the weird call thing forced GCC to backup s0, s1... But, it incurs a hefty perf hit because of the call
  3. We should profile interrupt vs the proposed naked solution.
  4. Can we just add our own prologue/epilogue to save/restore s0, s1?

@cnlohr
Copy link
Owner

cnlohr commented May 10, 2023

wait
5. Without doing this song and dance with call, neither GCC nor CLANG want to emit code with stack allocation??

@cnlohr
Copy link
Owner

cnlohr commented May 10, 2023

Well this is flaming nightmare.

@cnlohr
Copy link
Owner

cnlohr commented May 10, 2023

Wait, you can get around it with GCC. https://godbolt.org/z/aEffcsvfv

@jnk0le
Copy link
Contributor Author

jnk0le commented May 10, 2023

__attribute__((interrupt) will always work but will do redundant stacking if HPE enabled.
In some scenarions (like 2-4 registers pushed) it may be a bit fatser than HPE.

  1. Naked attribute completely removes stack adjustment, if there are stack allocated variables, then those will underflow the stack. Clang also won't accept anything but inline asm in naked functions.

Manually pushing s0, s1 will add 6 instructions, that in typical use case will be useless.

Apart from the __attribute__((interrupt) and BH ritual, We are left with __attribute__((interrupt("WCH-Interrupt-fast"))).
There was a gcc patch circulating somewhere and newest xpack gcc 12.2.x builds should contain this patch.

The best scenario is to get standardized something like "prestacked" annotations for compilers, otherwise we end up with more and more proprietary attributes and related mess.
riscv/riscv-fast-interrupt#329 (comment)

@jnk0le
Copy link
Contributor Author

jnk0le commented May 10, 2023

In BH solution it's the job of the inner function to handle the stack and saved registers. hence it must benoinline

ED:

Wait, you can get around it with GCC. https://godbolt.org/z/aEffcsvfv

This one does actually underflow the stack

@jnk0le
Copy link
Contributor Author

jnk0le commented May 10, 2023

and newest xpack gcc 12.2.x builds should contain this patch.

That's wrong I must have gotten the custom xpack-gcc build from somewhere else

@brucehoult
Copy link

Wait, you can get around it with GCC. https://godbolt.org/z/aEffcsvfv

Nope.

If bar() writes to its argument -- elements of xxx, which no space is allocated for -- then you are scribbling all over the saved s0 and s1 and also the next 32 bytes of stack (HPE-saved registers in the case of the '003).

@brucehoult
Copy link

Does this fix it? (If so, I don't understand quite why.)

Allow me to quote from https://www.eevblog.com/forum/microcontrollers/bizarre-problem-on-ch32v003-with-systick-isr-corrupting-uart-tx-data/msg4785773/#msg4785773

The caller MUST be naked, otherwise it will allocate a stack frame and save ra but never deallocate the stack space.

The called function must NOT be inlined, otherwise any stack it uses (e.g. to save s0 or s1 or to allocate an array) will also never be deallocated.

@cnlohr
Copy link
Owner

cnlohr commented May 11, 2023

I think I understand a bit more now. This really does feel sticky. I think the best answer is to, for now, not encourage people to use the fast interrupts. Maybe someday.

I was just hoping to find something where you could have some #define boilerplate surrounding the body of your naked interrupt. This still feels unintuitively sticky to me.

@prplz
Copy link

prplz commented May 11, 2023

Here's how it's handled in tinyusb:

void USBHS_IRQHandler (void) __attribute__((naked));
void USBHS_IRQHandler (void)
{
  __asm volatile ("call USBHS_IRQHandler_impl; mret");
}

__attribute__ ((used)) void USBHS_IRQHandler_impl (void)
{
  tud_int_handler(0);
}

https://github.com/hathach/tinyusb/blob/master/hw/bsp/ch32v307/family.c#L38-L47

Not sure if it has the same issues but it is worth looking at

@cnlohr
Copy link
Owner

cnlohr commented May 11, 2023

@cnlohr
Copy link
Owner

cnlohr commented May 12, 2023

I will close this ticket now. At some point I may show the example of the asm called interrupt.

@jnk0le
Copy link
Contributor Author

jnk0le commented Oct 18, 2023

It turns out that even the workaround is broken when FPU is present (e.g. ch32v307) hydrausb3/riscv-none-elf-gcc-xpack#5

I've already sent prestacked annotation RFC to llvm list. That could finally solve this mess...
https://discourse.llvm.org/t/rfc-prestacked-annotation-to-solve-risc-v-interrupt-stacking-mess/74120

@brucehoult
Copy link

brucehoult commented Nov 7, 2023

Only when FP is actually used in the handler, Shirley?

Which is pretty questionable in an interrupt handler small enough that you care about register stacking overhead.

The Linux kernel is a pretty big interrupt handler and it doesn't use FP! Remember the tech media fuss when René Rebe patched Linux to enable RDNA2-based Radeon RX 6700XT cards to be used on the Unmatched? Do you know what the patch was? Saving and restoring FP state so the kernel (and kernel modules) could use FP.

The "prestacked" attribute is a good idea, but people can easily make errors while using it. There should probably be some predefined constants e.g. WCH_HPE. Which will need to be defined differently for RV32E and RV32I (or maybe it's ok to just expand to the RV32I values always, as the compiler won't use the other registers anyway)

@jnk0le
Copy link
Contributor Author

jnk0le commented Nov 8, 2023

Only when FP is actually used in the handler, Shirley?

that's right

Which is pretty questionable in an interrupt handler small enough that you care about register stacking overhead.

Things like 3p3z compensators etc. are using a few FP registers, and those do care about latency.

The "prestacked" attribute is a good idea, but people can easily make errors while using it. There should probably be some predefined constants e.g. WCH_HPE. Which will need to be defined differently for RV32E and RV32I (or maybe it's ok to just expand to the RV32I values always, as the compiler won't use the other registers anyway)

Vendors could provide those in the device headers.

e.g.

#define WCH_HPE_PRESTACKED "x1,x5-x7,x10-x15"

// then use in code like
__attribute__((interrupt, prestacked(WCH_HPE_PRESTACKED)))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants