riscv/fpu: Implement correct lazy-FPU functionality (attempt #2) #9577
@masayuki2009 The last commit fixes the ksmp64 crash, but I cannot verify the FPU test because I do not own an FPU-capable board. I noticed that rv-virt has FPU disabled; is this something I can fix? Or is FPU unsupported by qemu?
The problem with smp is
And later
Only the integer context changes, but no MMU or FPU restore is done. Maybe someone who knows how SMP works can comment on what up_cpu_paused is and what it is supposed to do? Is it supposed to be capable of handling context switches?
@masayuki2009 I enabled FPU for qemu-rv in arch/risc-v/Kconfig and ostest passes on smp64 now. So it seems the FPU issue with qemu-rv is also fixed by this PR.
I think the SMP issue is now fixed, but I don't own an SMP-capable board to test it on actual hardware. On rv-virt, FPU works both in SMP and non-SMP.
@pussuw
@pussuw
@masayuki2009 Thanks for testing. I think I need to make icicle SMP-capable so I can test it myself. Is it possible to provide me with the .elf file so I can see where EPC points to (the illegal instruction)?
Thank you!
At least the software is not running wild; it looks like something might be wrong with the CPU status (mstatus or frcsr / similar).
I was able to reproduce the ostest crash on qemu-rv and it should now be fixed. Also, rv-virt:ksmp64 no longer crashes.

Changes since last PR round

Moving the MMU change and FPU restore to riscv_internal.h / riscv_restorecontext() works for single core but not for multicore, so they are both moved back into riscv_doirq.c (and riscv_perform_syscall.c). SMP seems to expect this order of execution; I don't know why, though.

Why does the ostest crash happen?

The crash is due to mishandling of the floating-point control/status register (fcsr). The illegal instruction happens due to an incorrect / stale rounding mode. From the RISC-V spec:

Floating-point operations use either a static rounding mode encoded in the instruction, or a dynamic

So the CPU (FPU status register) status was not updated correctly in the SMP case.
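To illustrate why a stale rounding mode can end in an illegal instruction: per the RISC-V F extension, the frm field occupies bits 7:5 of fcsr, and only the encodings 0 to 4 are valid there. A floating-point instruction that uses dynamic rounding traps when frm holds a reserved value. The following is a minimal host-side sketch of that check, not code from this PR; the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* fcsr layout per the RISC-V F extension:
 * fflags in bits 4:0, frm in bits 7:5, upper bits reserved.
 * All names below are illustrative, not NuttX identifiers.
 */

#define FCSR_FRM_SHIFT 5
#define FCSR_FRM_MASK  (UINT32_C(0x7) << FCSR_FRM_SHIFT)

/* Highest rounding-mode encoding that is valid in frm (RMM = 0b100) */

#define FRM_RMM        UINT32_C(4)

/* Extract the dynamic rounding mode from a saved fcsr value */

static inline uint32_t fcsr_get_frm(uint32_t fcsr)
{
  return (fcsr & FCSR_FRM_MASK) >> FCSR_FRM_SHIFT;
}

/* Returns true if frm holds one of the five defined rounding modes,
 * i.e. an FP instruction with dynamic rounding would not trap.
 */

static inline bool fcsr_frm_is_valid(uint32_t fcsr)
{
  return fcsr_get_frm(fcsr) <= FRM_RMM;
}
```

A check like this could be applied to a saved fcsr value in a crash dump to see whether the rounding mode was plausible at the time of the exception.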
Getting the style error from #9824 here now. Should nxstyle ignore old errors or not?
Never mind, needs more testing.
d9cae11 to f2f4499
Now qemu-rv smp64 is rock solid. The trick was up_cpu_paused: I did not realize the full CPU context (integer + FPU regs) needs to be saved there. Could you @masayuki2009 verify with actual hardware that it works? I think we should not merge before this is verified with HW.
Why? The tcb can contain info that is needed by the context switch routine. One example is lazy-FPU handling; the integer registers can be stored into the stack, because they are always stored & restored. Lazy-FPU however needs a non-volatile location to store the FPU registers as the save feature will skip saving a clean FPU, but the restore must always restore the FPU registers if the thread uses FPU.
146b6c2 to 271beb3
@pussuw
@pussuw please fix the style warning.
@xiaoxiang781216 I don't think I'm responsible for the style issue: it is the same as in #9824. I thought the style check would not complain about issues that are already upstream?
No, the style check runs on the full source code. This design prefers that the contributor fix style issues in the old code base.
But I'm still confused, the only change I made in assert.c is
The violation comes from a very recent commit (#9824), which was merged with the violation. The code added there is fine, so I think the violation is a false alarm, but I have NO IDEA how to fix that. Hmm, now looking at the code, the fix for the violation should be pretty simple. I'll do it on Monday.
yes, it's a false alarm, that's why #9824 is merged with this warning.
ok, you can try it.
- Save the FPU registers into the tcb so they don't get lost if the stack frame for xcp.regs moves (as it does)
- Handle integer and FPU register save/load separately
- Integer registers are saved/loaded always, like before
- FPU registers are only saved during a context switch:
  - Save ONLY if FPU is dirty
  - Restore always if FPU has been used (not in FSTATE_OFF, FSTATE_INIT)
- Remove all lazy-FPU related logic from the macros, it is not needed
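The save/restore rules in the list above can be sketched as a small host-side model. This is only an illustration under assumptions: the structs, the register count and the function names are hypothetical and do not mirror the actual NuttX code. The state values follow the mstatus.FS encoding (Off, Initial, Clean, Dirty).

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NFPREGS 33  /* f0-f31 plus fcsr; the count is illustrative */

/* Mirrors the mstatus.FS encoding */

enum fs_state { FS_OFF = 0, FS_INIT = 1, FS_CLEAN = 2, FS_DIRTY = 3 };

/* Hypothetical model of the hardware FPU */

struct fpu_hw
{
  enum fs_state fs;
  uint64_t regs[NFPREGS];
};

/* Hypothetical per-thread context: the FPU save area lives in the
 * tcb, not in a stack frame that may move.
 */

struct tcb_model
{
  enum fs_state fs;
  uint64_t fpregs[NFPREGS];
};

/* Switch-out: copy the registers ONLY if the FPU is dirty.
 * Returns true if a save actually happened.
 */

static bool fpu_save(struct fpu_hw *hw, struct tcb_model *tcb)
{
  if (hw->fs != FS_DIRTY)
    {
      tcb->fs = hw->fs;  /* Keep the previously saved registers */
      return false;
    }

  memcpy(tcb->fpregs, hw->regs, sizeof(tcb->fpregs));
  hw->fs  = FS_CLEAN;
  tcb->fs = FS_CLEAN;
  return true;
}

/* Switch-in: always restore if the thread has ever used the FPU.
 * Returns true if the registers were reloaded.
 */

static bool fpu_restore(struct fpu_hw *hw, const struct tcb_model *tcb)
{
  hw->fs = tcb->fs;
  if (tcb->fs == FS_OFF || tcb->fs == FS_INIT)
    {
      return false;
    }

  memcpy(hw->regs, tcb->fpregs, sizeof(hw->regs));
  return true;
}
```

The asymmetry is the point of the model: a thread whose FPU state is clean skips the save, so its last saved copy has to survive somewhere stable until the next restore.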
Instead of clearing the fields individually, just wipe the whole register. This can be done because flags and rm are just parts of the fcsr.

31             8        5           0
+--------------+--------+-----------+
|              |        |           |
|   RESERVED   |  FRM   |  FSTATUS  |
|              |        |           |
+--------------+--------+-----------+
                FCSR
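As a rough illustration of why wiping the whole register is equivalent: the two defined fields cover bits 7:0, and the remaining bits are reserved. The sketch below models fcsr as a plain 32-bit value on the host; it does not execute any CSR instruction, and the macro and function names are hypothetical.

```c
#include <stdint.h>

/* Field masks per the RISC-V F extension (names are illustrative) */

#define FCSR_FFLAGS_MASK UINT32_C(0x1f)        /* bits 4:0, accrued flags */
#define FCSR_FRM_MASK    (UINT32_C(0x7) << 5)  /* bits 7:5, rounding mode */

/* Field-by-field variant: clear the flags and the rounding mode
 * one after the other.
 */

static inline uint32_t fcsr_clear_fields(uint32_t fcsr)
{
  fcsr &= ~FCSR_FFLAGS_MASK;
  fcsr &= ~FCSR_FRM_MASK;
  return fcsr;
}

/* Whole-register variant: write zero regardless of the old value */

static inline uint32_t fcsr_wipe(uint32_t fcsr)
{
  (void)fcsr;
  return 0;
}
```

For any value whose reserved bits are zero, both functions return 0, so a single write of zero can replace the two field updates.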
This way the registers can be read easily
Adds option to use the old implementation where FPU is stored into the process stack.
The FPU restore issue does not show itself any longer, so FPU support can be re-enabled.
@pussuw So in this PR, the FPU save and restore are migrated to riscv_swint()? I am curious whether the FPU will be restored correctly if a context switch is triggered in other exceptions?
@anchao A context switch always ends up in riscv_swint(), so yes. Are you experiencing some odd / unexpected behavior with FPU? I have a project that heavily relies on the FPU and I have not seen any issues with it. Note that I do not use SMP; there was a problem with SMP in the first iteration of my lazy FPU patches, but that was related to address environment handling.
@pussuw No, I just noticed this commit when reviewing the code. The current implementation may carry certain risks if the FPU is enabled in kernel mode and the compiler generates code that uses floating-point registers in the interrupt handler.
Second attempt at this:
#9486
There are two issues with smp mode:
Both very likely for the same reason