riscv/fpu: Implement correct lazy-FPU functionality (attempt #2) #9577

pussuw · 2023-06-20T09:37:23Z

Second attempt at this:
#9486

There are two issues with smp mode:

Context restore routine in up_cpu_paused did not restore the correct address environment
FPU test in ostest crashes

Both very likely for the same reason

pussuw · 2023-06-20T10:10:13Z

@masayuki2009 The last commit fixes the ksmp64 crash but I cannot verify the FPU test because I do not own an FPU capable board.

I noticed that rv-virt has FPU disabled, is this something I can fix ? Or is FPU unsupported by qemu ?

pussuw · 2023-06-20T10:14:01Z

The problem with smp is up_cpu_paused:

  /* Save the current context at CURRENT_REGS into the TCB at the head
   * of the assigned task list for this CPU.
   */

  riscv_savestate(tcb->xcp.regs);

And later

  /* Then switch contexts.  Any necessary address environment changes
   * will be made when the interrupt returns.
   */

  riscv_restorestate(tcb->xcp.regs);

Only the integer context changes but no MMU or FPU restore is done. Maybe someone who knows how SMP works can comment on what up_cpu_paused is and what it is supposed to do? Is it supposed to be capable of handling context switches or ?

pussuw · 2023-06-20T10:32:40Z

@masayuki2009 I enabled FPU for qemu-rv in arch/risc-v/Kconfig and ostest passes on smp64 now. So it seems the FPU issue with qemu-rv is also fixed by this PR.

ville@ville-ThinkPad-T14s-Gen-1:~/wpc_toolchain/workspace/nuttx-bare/nuttx$ ./tools/configure.sh rv-virt:smp64 && make -j16
Create version.h
LN: platform/board to /home/ville/wpc_toolchain/workspace/nuttx-bare/apps/platform/dummy
Register: ostest
Register: sh
Register: nsh
Register: smp
CPP:  /home/ville/wpc_toolchain/workspace/nuttx-bare/nuttx/boards/risc-v/qemu-rv/rv-virt/scripts/ld.script-> /home/ville/wpc_toolchain/workspace/nuttx-bare/nuttx/boards/risc-v/qemu-rv/rLD: nuttx
CP: nuttx.hex

ville@ville-ThinkPad-T14s-Gen-1:~/wpc_toolchain/workspace/nuttx-bare/nuttx$ qemu-system-riscv64 -semihosting -M virt,aclint=on -cpu rv64 -smp 8 -bios none -kernel nuttx -nographic

NuttShell (NSH) NuttX-10.4.0
nsh> ostest
stdio_test: write fd=1
stdio_test: Standard I/O Check: printf
stdio_test: write fd=2
stdio_test: Standard I/O Check: fprintf to stderr

...

user_main: FPU test
Starting task FPU#1
fpu_test: Started task FPU#1 at PID=11
Starting task FPU#2
fpu_test: Started task FPU#2 at PID=12
FPU#1: pass 1
FPU#2: pass 1
FPU#1: pass 2
FPU#2: pass 2
FPU#1: pass 3
FPU#2: pass 3
FPU#1: pass 4
FPU#2: pass 4
FPU#2: pass 5
FPU#1: pass 5
FPU#2: pass 6
FPU#1: pass 6
FPU#2: pass 7
FPU#1: pass 7
FPU#1: pass 8
FPU#2: pass 8
FPU#1: pass 9
FPU#2: pass 9
FPU#1: pass 10
FPU#2: pass 10
FPU#2: pass 11
FPU#1: pass 11
FPU#2: pass 12
FPU#1: pass 12
FPU#2: pass 13
FPU#1: pass 13
FPU#2: pass 14
FPU#1: pass 14
FPU#2: pass 15
FPU#1: pass 15
FPU#2: pass 16
FPU#1: pass 16
FPU#1: Succeeded
FPU#2: Succeeded
fpu_test: Returning

...

user_main: Exiting
ostest_main: Exiting with status 0
nsh>

pussuw · 2023-06-20T10:52:25Z

I think the smp issue is now fixed, but I don't own an smp capable board to test it on actual hardware. On rv-virt FPU works both in SMP and non-SMP.

masayuki2009 · 2023-06-20T11:36:30Z

I think the smp issue is now fixed, but I don't own an smp capable board to test it on actual hardware. On rv-virt FPU works both in SMP and non-SMP.

@pussuw
Thanks for the fix. Let me check this PR tomorrow.

masayuki2009 · 2023-06-20T23:32:12Z

@pussuw
Hmm, ostest with maix-bit:smp (board) still crashes.

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena      1fb990   1fb990
ordblks         4        4
mxordblk   1f4020   1f4020
uordblks     7100     7100
fordblks   1f4890   1f4890

user_main: /dev/null test
dev_null: Read 0 bytes from /dev/null
dev_null: Wrote 1024 bytes to /dev/null

End of test [CPU1] riscv_exception: EXCEPTION: Illegal instruction. MCAUSE: 0000000000000002, EPC: 0000000080016bf8, MTVAL: 00000037ec3f44d4
[CPU1] riscv_exception: PANIC!!! Exception = 0000000000000002
[CPU1] _assert: Current Version: NuttX  10.4.0 67e17d2acf Jun 21 2023 08:04:13 risc-v
[CPU1] _assert: Assertion failed panic: at file: :0 task(CPU1): FPU#1 0x80016b9c
[CPU1] up_dump_register: EPC: 0000000080016bf8
[CPU1] up_dump_register: A0: 0000000000000000 A1: 0000000000000001 A2: 0000000000000000 A3: 0000000080403d58
[CPU1] up_dump_register: A4: 0000000000000002 A5: 0000000000000440 A6: 0000000000000000 A7: 0000000000000000
[CPU1] up_dump_register: T0: 0000000000000000 T1: 0000000000000000 T2: 0000000000000000 T3: 0000000000000000
[CPU1] up_dump_register: T4: 0000000000000000 T5: 0000000000000000 T6: 0000000000000000
[CPU1] up_dump_register: S0: 0000000000000000 S1: 0000000000000001 S2: 0000000000000000 S3: 0000000000000000
[CPU1] up_dump_register: S4: 0000000000000000 S5: 0000000080403248 S6: 0000000000000000 S7: 0000000000000000
[CPU1] up_dump_register: S8: 0000000000000000 S9: 0000000000000000 S10: 0000000000000000 S11: 0000000000000000
[CPU1] up_dump_register: SP: 000000008040c720 FP: 0000000000000000 TP: 0000000000000000 RA: 0000000080016bea
[CPU1] dump_stack: User Stack:
[CPU1] dump_stack:   base: 0x8040c030
[CPU1] dump_stack:   size: 00001968
[CPU1] dump_stack:     sp: 0x8040c720
[CPU1] stack_dump: 0x8040c720: deadbeef deadbeef deadbeef deadbeef deadbeef deadbeef deadbeef deadbeef
[CPU1] stack_dump: 0x8040c740: deadbeef deadbeef deadbeef deadbeef deadbeef deadbeef deadbeef deadbeef
[CPU1] stack_dump: 0x8040c760: deadbeef deadbeef deadbeef deadbeef 00000000 00000000 00000000 00000000
[CPU1] stack_dump: 0x8040c780: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[CPU1] stack_dump: 0x8040c7a0: 80016b9c 00000000 80006c9c 00000000 8040c010 00000000 00000001 00000000
[CPU1] stack_dump: 0x8040c7c0: 00000000 00000000 8000365a 00000000 00000000 00000000 00000000 00000000
[CPU1] show_tasks:    PID GROUP   CPU PRI POLICY   TYPE    NPX STATE   EVENT      SIGMASK          STACKBASE  STACKSIZE      USED   FILLED    COMMAND
[CPU1] show_tasks:   ----   ---  ---- --- -------- ------- --- ------- ---------- -------- 0x80400240      2048      1384    67.5%    irq
[CPU1] dump_task:       0     0     0   0 FIFO     Kthread N-- Running            0000000000000000 0x80403ea0      2000      1136    56.8%    CPU0 IDLE
[CPU1] dump_task:       1     1     1   0 FIFO     Kthread N-- Assigned           0000000000000000 0x80401270      2000      1104    55.2%    CPU1 IDLE
[CPU1] dump_task:       2     2     0 100 RR       Task    --- Waiting Semaphore  0000000000000000 0x80406670      2992      2176    72.7%    nsh_main
[CPU1] dump_task:      39    39     0 100 RR       Task    --- Waiting Semaphore  0000000000000000 0x8040aa20      1968      1456    73.9%    ostest
[CPU1] dump_task:      40    40     0 100 RR       Task    --- Waiting Semaphore  0000000000000000 0x80407990      8064      1520    18.8%    ostest Arg1 Arg2 Arg3 Arg4
[CPU1] dump_task:      41    41     1 100 RR       Task    --- Running            0000000000000000 0x8040c030      1968       720    36.5%    FPU#1

pussuw · 2023-06-21T06:34:05Z

@masayuki2009 Thanks for testing. I think I need to make icicle smp capable so I can test it myself. Is it possible to provide me with the .elf file so I can see where EPC points to (the illegal instruction) ?

masayuki2009 · 2023-06-22T02:54:31Z

@pussuw
nuttx-maix-bit-crash-pr9577.gz

pussuw · 2023-06-22T07:52:45Z

@pussuw nuttx-maix-bit-crash-pr9577.gz

Thank you!


0000000080016b9c <fpu_task>:
    80016b9c:   7175                    addi    sp,sp,-144
    80016b9e:   e506                    sd      ra,136(sp)
    80016ba0:   e122                    sd      s0,128(sp)
    80016ba2:   fca6                    sd      s1,120(sp)
    80016ba4:   f8ca                    sd      s2,112(sp)
    80016ba6:   f4ce                    sd      s3,104(sp)
    80016ba8:   f0d2                    sd      s4,96(sp)
    80016baa:   ecd6                    sd      s5,88(sp)
    80016bac:   a4a2                    fsd     fs0,72(sp)
    80016bae:   a0a6                    fsd     fs1,64(sp)
    80016bb0:   bc4a                    fsd     fs2,56(sp)
    80016bb2:   b84e                    fsd     fs3,48(sp)
    80016bb4:   b452                    fsd     fs4,40(sp)
    80016bb6:   b056                    fsd     fs5,32(sp)
    80016bb8:   ac5a                    fsd     fs6,24(sp)
    80016bba:   a85e                    fsd     fs7,16(sp)
    80016bbc:   a462                    fsd     fs8,8(sp)
    80016bbe:   a066                    fsd     fs9,0(sp)
    80016bc0:   fe2eb0ef                jal     ra,800023a2 <sched_lock>
    80016bc4:   003ed797                auipc   a5,0x3ed
    80016bc8:   26178793                addi    a5,a5,609 # 80403e25 <g_fpuno>
    80016bcc:   0007c403                lbu     s0,0(a5)
    80016bd0:   003eca97                auipc   s5,0x3ec
    80016bd4:   678a8a93                addi    s5,s5,1656 # 80403248 <g_fputhread>
    80016bd8:   4981                    li      s3,0
    80016bda:   0014049b                addiw   s1,s0,1
    80016bde:   0ff4f493                zext.b  s1,s1
    80016be2:   00978023                sb      s1,0(a5)
    80016be6:   84feb0ef                jal     ra,80002434 <sched_unlock>
    80016bea:   44000793                li      a5,1088
    80016bee:   00040a1b                sext.w  s4,s0
    80016bf2:   02f40433                mul     s0,s0,a5
    80016bf6:   2481                    sext.w  s1,s1
    **80016bf8:   d004f453                fcvt.s.w        fs0,s1**
    80016bfc:   d20484d3                fcvt.d.w        fs1,s1

At least the software is not running wild, looks like something might be wrong with the CPU status (mstatus or frcsr / similar).

pussuw · 2023-07-26T13:25:26Z

I was able to reproduce the ostest crash on qemu-rv and it should now be fixed. Also, rv-virt:ksmp64 is not crashing any longer either.

Changes since last PR round

Moving the MMU change and FPU restore to riscv_internal.h / riscv_restorecontext() works for single core but not for multicore, so they are both moved back into riscv_doirq.c. (and riscv_perform_syscall.c).

SMP seems to expect this order of execution, I don't know why though.

Why does the ostest crash happen ?

The crash is due to mishandling of the floating point control/status register (fcsr). The illegal istruction happes due to incorrect / stale rouding mode, from the risc-v spec:

Floating-point operations use either a static rounding mode encoded in the instruction, or a dynamic
rounding mode held in frm. Rounding modes are encoded as shown in Table 8.1. A value of 111 in
the instruction’s rm field selects the dynamic rounding mode held in frm. If frm is set to an invalid
value (101–111), any subsequent attempt to execute a floating-point operation with a dynamic
rounding mode will cause an illegal instruction trap. Some instructions that have the rm field are
nevertheless unaffected by the rounding mode; they should have their rm field set to RNE (000).

So the CPU (FPU status register) status was not updated correctly in the SMP case.

pussuw · 2023-07-26T13:45:22Z

Getting the style error from #9824 here now. Should nxstyle ignore old errors or no ?

pussuw · 2023-07-26T13:52:36Z

@masayuki2009 sorry to keep bothering you, but I still do not have an SMP capable board (I'm writing support for icicle), can you try this on maix-bit:smp once again ? I was able to reproduce the crash on qemu and also fix it, so I have some confidence it should be fixed for actual HW as well.

Nevermind, needs more testing

pussuw · 2023-07-27T09:55:36Z

Now qemu-rv smp64 is rock solid. The trick was up_cpu_paused, I did not realize the full CPU context (integer + FPU regs) need to be saved there.

Could you @masayuki2009 verify with actual hardware that it works ? I think we should not merge before this is verified with HW.

Why? The tcb can contain info that is needed by the context switch routine. One example is lazy-FPU handling; the integer registers can be stored into the stack, because they are always stored & restored. Lazy-FPU however needs a non-volatile location to store the FPU registers as the save feature will skip saving a clean FPU, but the restore must always restore the FPU registers if the thread uses FPU.

masayuki2009 · 2023-07-28T05:36:10Z

Could you @masayuki2009 verify with actual hardware that it works ? I think we should not merge before this is verified with HW.

@pussuw
I confirmed that the latest PR works with the Sipeed Maix Bit (K210).

xiaoxiang781216 · 2023-07-28T05:39:04Z

@pussuw please fix the style warning.

pussuw · 2023-07-28T06:14:18Z

@pussuw please fix the style warning.

@xiaoxiang781216 I don't think I'm responsible for the style issue:
Error: /home/runner/work/nuttx/nuttx/nuttx/sched/misc/assert.c:243:18: error: Unmatched right parentheses
Error: /home/runner/work/nuttx/nuttx/nuttx/sched/misc/assert.c:276:18: error: Unmatched right parentheses
Error: Process completed with exit code 1.

It is the same as in here #9824

I thought the style check would not complain about issues that are already in upstream ?

arch/risc-v/Kconfig

xiaoxiang781216 · 2023-07-28T15:28:21Z

@pussuw please fix the style warning.

@xiaoxiang781216 I don't think I'm responsible for the style issue: Error: /home/runner/work/nuttx/nuttx/nuttx/sched/misc/assert.c:243:18: error: Unmatched right parentheses Error: /home/runner/work/nuttx/nuttx/nuttx/sched/misc/assert.c:276:18: error: Unmatched right parentheses Error: Process completed with exit code 1.

It is the same as in here #9824

I thought the style check would not complain about issues that are already in upstream ?

No, the style checks the full source code. This design prefers the contributor fix the style issue in the old code base.

pussuw · 2023-07-28T16:51:20Z

@pussuw please fix the style warning.

@xiaoxiang781216 I don't think I'm responsible for the style issue: Error: /home/runner/work/nuttx/nuttx/nuttx/sched/misc/assert.c:243:18: error: Unmatched right parentheses Error: /home/runner/work/nuttx/nuttx/nuttx/sched/misc/assert.c:276:18: error: Unmatched right parentheses Error: Process completed with exit code 1.
It is the same as in here #9824
I thought the style check would not complain about issues that are already in upstream ?

No, the style checks the full source code. This design prefers the contributor fix the style issue in the old code base.

But I'm still confused, the only change I made in assert.c is

static uintptr_t g_last_regs[XCPTCONTEXT_REGS] aligned_data(16);

The violation comes from a previous very recent commit (#9824), which was merged with the violation. The code added there is fine so I think the violation is a false alarm, but I have NO IDEA how to fix that.

Hmm, now looking at the code the fix for the violation should be pretty simple, I'll do it on Monday.

arch/risc-v/include/irq.h

xiaoxiang781216 · 2023-07-28T17:50:00Z

@pussuw please fix the style warning.

@xiaoxiang781216 I don't think I'm responsible for the style issue: Error: /home/runner/work/nuttx/nuttx/nuttx/sched/misc/assert.c:243:18: error: Unmatched right parentheses Error: /home/runner/work/nuttx/nuttx/nuttx/sched/misc/assert.c:276:18: error: Unmatched right parentheses Error: Process completed with exit code 1.
It is the same as in here #9824
I thought the style check would not complain about issues that are already in upstream ?

No, the style checks the full source code. This design prefers the contributor fix the style issue in the old code base.

But I'm still confused, the only change I made in assert.c is

static uintptr_t g_last_regs[XCPTCONTEXT_REGS] aligned_data(16);

The violation comes from a previous very recent commit (#9824), which was merged with the violation. The code added there is fine so I think the violation is a false alarm, but I have NO IDEA how to fix that.

yes, it's a false alarm, that's why #9824 is merged with this warning.

Hmm, now looking at the code the fix for the violation should be pretty simple, I'll do it on Monday.

ok, you can try it.

- Save the FPU registers into the tcb so they don't get lost if the stack frame for xcp.regs moves (as it does) - Handle interger and FPU register save/load separately - Integer registers are saved/loaded always, like before - FPU registers are only saved during a context switch: - Save ONLY if FPU is dirty - Restore always if FPU has been used (not in FSTATE_OFF, FSTATE_INIT) - Remove all lazy-FPU related logic from the macros, it is not needed

Instead of clearing the fields individually, just wipe the whole register. This can be done because flags and rm are just parts of the fcsr. 31 8 5 0 +--------------+--------+-----------+ | | | | | RESERVED | FRM | FSTATUS | | | | | +--------------+--------+-----------+ FCSR

This way the registers can be read easily

Adds option to use the old implementation where FPU is stored into the process stack.

The FPU restore issue does not show itself any longer, so FPU support can be re-enabled.

anchao · 2024-04-19T02:01:16Z

@pussuw So in this PR, the FPU save and restore are migrated to riscv_swint()? I am curious whether the FPU will be restored correctly if a context switch is triggered in other exceptions?

pussuw · 2024-04-19T10:01:29Z

@anchao A context switch always ends up in riscv_swint(), so yes. Are you experiencing some odd / unexpected behavior with FPU ? I have a project that heavily relies on the FPU and I have not seen any issues with it.

Note, that I do not use SMP, there was a problem with SMP in the first iteration of my lazy FPU patches, but that was related to address environment handling.

anchao · 2024-04-22T01:07:22Z

@pussuw No, I just noticed this commit when reviewing the code, current implementation may have certain risks if enable FPU in kernel mode and compiler generates code that uses floating-point registers in the interrupt handler.

pussuw marked this pull request as draft June 20, 2023 09:37

pussuw marked this pull request as ready for review June 20, 2023 10:51

pussuw force-pushed the riscv_lazyfpu branch from e9baa90 to d0a2964 Compare June 20, 2023 10:57

pussuw force-pushed the riscv_lazyfpu branch from d0a2964 to 67e17d2 Compare June 20, 2023 13:01

pussuw marked this pull request as draft June 22, 2023 08:57

pussuw force-pushed the riscv_lazyfpu branch from 67e17d2 to eca0311 Compare July 26, 2023 13:14

pussuw marked this pull request as ready for review July 26, 2023 13:25

pussuw force-pushed the riscv_lazyfpu branch from eca0311 to 7fde63b Compare July 26, 2023 13:46

pussuw force-pushed the riscv_lazyfpu branch 3 times, most recently from d9cae11 to f2f4499 Compare July 27, 2023 09:51

pussuw force-pushed the riscv_lazyfpu branch 3 times, most recently from 146b6c2 to 271beb3 Compare July 27, 2023 10:26

xiaoxiang781216 reviewed Jul 28, 2023

View reviewed changes

arch/risc-v/Kconfig Outdated Show resolved Hide resolved

arch/risc-v/Kconfig Outdated Show resolved Hide resolved

pussuw commented Jul 28, 2023

View reviewed changes

arch/risc-v/include/irq.h Outdated Show resolved Hide resolved

pussuw added 7 commits July 31, 2023 12:23

riscv/saveusercontext: Fix FPU state save

e422ace

sched/assert: Store table for all registers instead of buffer

6ee64c7

This way the registers can be read easily

riscv/lazyfpu: Add option to disable lazy FPU

96d159b

Adds option to use the old implementation where FPU is stored into the process stack.

riscv/qemu-rv: Add FPU support back to qemu-rv

6329f7a

The FPU restore issue does not show itself any longer, so FPU support can be re-enabled.

sched/assert.c: Fix style check error

6e1ad11

pussuw force-pushed the riscv_lazyfpu branch from 271beb3 to 6e1ad11 Compare July 31, 2023 09:24

xiaoxiang781216 approved these changes Jul 31, 2023

View reviewed changes

xiaoxiang781216 merged commit c126ee8 into apache:master Jul 31, 2023
26 checks passed

pussuw deleted the riscv_lazyfpu branch July 31, 2023 16:32

jerpelea added this to To-Add in Release Notes - 12.3.0 Sep 26, 2023

jerpelea moved this from To-Add to done in Release Notes - 12.3.0 Oct 5, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

riscv/fpu: Implement correct lazy-FPU functionality (attempt #2) #9577

riscv/fpu: Implement correct lazy-FPU functionality (attempt #2) #9577

pussuw commented Jun 20, 2023

pussuw commented Jun 20, 2023

pussuw commented Jun 20, 2023 •

edited

pussuw commented Jun 20, 2023 •

edited

pussuw commented Jun 20, 2023

masayuki2009 commented Jun 20, 2023

masayuki2009 commented Jun 20, 2023

pussuw commented Jun 21, 2023

masayuki2009 commented Jun 22, 2023

pussuw commented Jun 22, 2023 •

edited

pussuw commented Jul 26, 2023

pussuw commented Jul 26, 2023

pussuw commented Jul 26, 2023 •

edited

pussuw commented Jul 27, 2023

masayuki2009 commented Jul 28, 2023

xiaoxiang781216 commented Jul 28, 2023

pussuw commented Jul 28, 2023

xiaoxiang781216 commented Jul 28, 2023

pussuw commented Jul 28, 2023 •

edited

xiaoxiang781216 commented Jul 28, 2023

anchao commented Apr 19, 2024

pussuw commented Apr 19, 2024

anchao commented Apr 22, 2024

riscv/fpu: Implement correct lazy-FPU functionality (attempt #2) #9577

riscv/fpu: Implement correct lazy-FPU functionality (attempt #2) #9577

Conversation

pussuw commented Jun 20, 2023

pussuw commented Jun 20, 2023

pussuw commented Jun 20, 2023 • edited

pussuw commented Jun 20, 2023 • edited

pussuw commented Jun 20, 2023

masayuki2009 commented Jun 20, 2023

masayuki2009 commented Jun 20, 2023

pussuw commented Jun 21, 2023

masayuki2009 commented Jun 22, 2023

pussuw commented Jun 22, 2023 • edited

pussuw commented Jul 26, 2023

Changes since last PR round

Why does the ostest crash happen ?

pussuw commented Jul 26, 2023

pussuw commented Jul 26, 2023 • edited

pussuw commented Jul 27, 2023

masayuki2009 commented Jul 28, 2023

xiaoxiang781216 commented Jul 28, 2023

pussuw commented Jul 28, 2023

xiaoxiang781216 commented Jul 28, 2023

pussuw commented Jul 28, 2023 • edited

xiaoxiang781216 commented Jul 28, 2023

anchao commented Apr 19, 2024

pussuw commented Apr 19, 2024

anchao commented Apr 22, 2024

pussuw commented Jun 20, 2023 •

edited

pussuw commented Jun 20, 2023 •

edited

pussuw commented Jun 22, 2023 •

edited

pussuw commented Jul 26, 2023 •

edited

pussuw commented Jul 28, 2023 •

edited