Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

riscv/fpu: Implement correct lazy-FPU functionality (attempt #2) #9577

Merged
merged 8 commits into from Jul 31, 2023

Conversation

pussuw
Copy link
Contributor

@pussuw pussuw commented Jun 20, 2023

Second attempt at this:
#9486

There are two issues with smp mode:

  • Context restore routine in up_cpu_paused did not restore the correct address environment
  • FPU test in ostest crashes

Both very likely for the same reason

@pussuw pussuw marked this pull request as draft June 20, 2023 09:37
@pussuw
Copy link
Contributor Author

pussuw commented Jun 20, 2023

@masayuki2009 The last commit fixes the ksmp64 crash but I cannot verify the FPU test because I do not own an FPU capable board.

I noticed that rv-virt has FPU disabled, is this something I can fix ? Or is FPU unsupported by qemu ?

@pussuw
Copy link
Contributor Author

pussuw commented Jun 20, 2023

The problem with smp is up_cpu_paused:

  /* Save the current context at CURRENT_REGS into the TCB at the head
   * of the assigned task list for this CPU.
   */

  riscv_savestate(tcb->xcp.regs);

And later

  /* Then switch contexts.  Any necessary address environment changes
   * will be made when the interrupt returns.
   */

  riscv_restorestate(tcb->xcp.regs);

Only the integer context changes but no MMU or FPU restore is done. Maybe someone who knows how SMP works can comment on what up_cpu_paused is and what it is supposed to do? Is it supposed to be capable of handling context switches or ?

@pussuw
Copy link
Contributor Author

pussuw commented Jun 20, 2023

@masayuki2009 I enabled FPU for qemu-rv in arch/risc-v/Kconfig and ostest passes on smp64 now. So it seems the FPU issue with qemu-rv is also fixed by this PR.

ville@ville-ThinkPad-T14s-Gen-1:~/wpc_toolchain/workspace/nuttx-bare/nuttx$ ./tools/configure.sh rv-virt:smp64 && make -j16
Create version.h
LN: platform/board to /home/ville/wpc_toolchain/workspace/nuttx-bare/apps/platform/dummy
Register: ostest
Register: sh
Register: nsh
Register: smp
CPP:  /home/ville/wpc_toolchain/workspace/nuttx-bare/nuttx/boards/risc-v/qemu-rv/rv-virt/scripts/ld.script-> /home/ville/wpc_toolchain/workspace/nuttx-bare/nuttx/boards/risc-v/qemu-rv/rLD: nuttx
CP: nuttx.hex

ville@ville-ThinkPad-T14s-Gen-1:~/wpc_toolchain/workspace/nuttx-bare/nuttx$ qemu-system-riscv64 -semihosting -M virt,aclint=on -cpu rv64 -smp 8 -bios none -kernel nuttx -nographic

NuttShell (NSH) NuttX-10.4.0
nsh> ostest
stdio_test: write fd=1
stdio_test: Standard I/O Check: printf
stdio_test: write fd=2
stdio_test: Standard I/O Check: fprintf to stderr

...

user_main: FPU test
Starting task FPU#1
fpu_test: Started task FPU#1 at PID=11
Starting task FPU#2
fpu_test: Started task FPU#2 at PID=12
FPU#1: pass 1
FPU#2: pass 1
FPU#1: pass 2
FPU#2: pass 2
FPU#1: pass 3
FPU#2: pass 3
FPU#1: pass 4
FPU#2: pass 4
FPU#2: pass 5
FPU#1: pass 5
FPU#2: pass 6
FPU#1: pass 6
FPU#2: pass 7
FPU#1: pass 7
FPU#1: pass 8
FPU#2: pass 8
FPU#1: pass 9
FPU#2: pass 9
FPU#1: pass 10
FPU#2: pass 10
FPU#2: pass 11
FPU#1: pass 11
FPU#2: pass 12
FPU#1: pass 12
FPU#2: pass 13
FPU#1: pass 13
FPU#2: pass 14
FPU#1: pass 14
FPU#2: pass 15
FPU#1: pass 15
FPU#2: pass 16
FPU#1: pass 16
FPU#1: Succeeded
FPU#2: Succeeded
fpu_test: Returning

...

user_main: Exiting
ostest_main: Exiting with status 0
nsh> 

@pussuw pussuw marked this pull request as ready for review June 20, 2023 10:51
@pussuw
Copy link
Contributor Author

pussuw commented Jun 20, 2023

I think the smp issue is now fixed, but I don't own an smp capable board to test it on actual hardware. On rv-virt FPU works both in SMP and non-SMP.

@masayuki2009
Copy link
Contributor

I think the smp issue is now fixed, but I don't own an smp capable board to test it on actual hardware. On rv-virt FPU works both in SMP and non-SMP.

@pussuw
Thanks for the fix. Let me check this PR tomorrow.

@masayuki2009
Copy link
Contributor

@pussuw
Hmm, ostest with maix-bit:smp (board) still crashes.

End of test memory usage:
VARIABLE  BEFORE   AFTER
======== ======== ========
arena      1fb990   1fb990
ordblks         4        4
mxordblk   1f4020   1f4020
uordblks     7100     7100
fordblks   1f4890   1f4890

user_main: /dev/null test
dev_null: Read 0 bytes from /dev/null
dev_null: Wrote 1024 bytes to /dev/null

End of test [CPU1] riscv_exception: EXCEPTION: Illegal instruction. MCAUSE: 0000000000000002, EPC: 0000000080016bf8, MTVAL: 00000037ec3f44d4
[CPU1] riscv_exception: PANIC!!! Exception = 0000000000000002
[CPU1] _assert: Current Version: NuttX  10.4.0 67e17d2acf Jun 21 2023 08:04:13 risc-v
[CPU1] _assert: Assertion failed panic: at file: :0 task(CPU1): FPU#1 0x80016b9c
[CPU1] up_dump_register: EPC: 0000000080016bf8
[CPU1] up_dump_register: A0: 0000000000000000 A1: 0000000000000001 A2: 0000000000000000 A3: 0000000080403d58
[CPU1] up_dump_register: A4: 0000000000000002 A5: 0000000000000440 A6: 0000000000000000 A7: 0000000000000000
[CPU1] up_dump_register: T0: 0000000000000000 T1: 0000000000000000 T2: 0000000000000000 T3: 0000000000000000
[CPU1] up_dump_register: T4: 0000000000000000 T5: 0000000000000000 T6: 0000000000000000
[CPU1] up_dump_register: S0: 0000000000000000 S1: 0000000000000001 S2: 0000000000000000 S3: 0000000000000000
[CPU1] up_dump_register: S4: 0000000000000000 S5: 0000000080403248 S6: 0000000000000000 S7: 0000000000000000
[CPU1] up_dump_register: S8: 0000000000000000 S9: 0000000000000000 S10: 0000000000000000 S11: 0000000000000000
[CPU1] up_dump_register: SP: 000000008040c720 FP: 0000000000000000 TP: 0000000000000000 RA: 0000000080016bea
[CPU1] dump_stack: User Stack:
[CPU1] dump_stack:   base: 0x8040c030
[CPU1] dump_stack:   size: 00001968
[CPU1] dump_stack:     sp: 0x8040c720
[CPU1] stack_dump: 0x8040c720: deadbeef deadbeef deadbeef deadbeef deadbeef deadbeef deadbeef deadbeef
[CPU1] stack_dump: 0x8040c740: deadbeef deadbeef deadbeef deadbeef deadbeef deadbeef deadbeef deadbeef
[CPU1] stack_dump: 0x8040c760: deadbeef deadbeef deadbeef deadbeef 00000000 00000000 00000000 00000000
[CPU1] stack_dump: 0x8040c780: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[CPU1] stack_dump: 0x8040c7a0: 80016b9c 00000000 80006c9c 00000000 8040c010 00000000 00000001 00000000
[CPU1] stack_dump: 0x8040c7c0: 00000000 00000000 8000365a 00000000 00000000 00000000 00000000 00000000
[CPU1] show_tasks:    PID GROUP   CPU PRI POLICY   TYPE    NPX STATE   EVENT      SIGMASK          STACKBASE  STACKSIZE      USED   FILLED    COMMAND
[CPU1] show_tasks:   ----   ---  ---- --- -------- ------- --- ------- ---------- -------- 0x80400240      2048      1384    67.5%    irq
[CPU1] dump_task:       0     0     0   0 FIFO     Kthread N-- Running            0000000000000000 0x80403ea0      2000      1136    56.8%    CPU0 IDLE
[CPU1] dump_task:       1     1     1   0 FIFO     Kthread N-- Assigned           0000000000000000 0x80401270      2000      1104    55.2%    CPU1 IDLE
[CPU1] dump_task:       2     2     0 100 RR       Task    --- Waiting Semaphore  0000000000000000 0x80406670      2992      2176    72.7%    nsh_main
[CPU1] dump_task:      39    39     0 100 RR       Task    --- Waiting Semaphore  0000000000000000 0x8040aa20      1968      1456    73.9%    ostest
[CPU1] dump_task:      40    40     0 100 RR       Task    --- Waiting Semaphore  0000000000000000 0x80407990      8064      1520    18.8%    ostest Arg1 Arg2 Arg3 Arg4
[CPU1] dump_task:      41    41     1 100 RR       Task    --- Running            0000000000000000 0x8040c030      1968       720    36.5%    FPU#1

@pussuw
Copy link
Contributor Author

pussuw commented Jun 21, 2023

@masayuki2009 Thanks for testing. I think I need to make icicle smp capable so I can test it myself. Is it possible to provide me with the .elf file so I can see where EPC points to (the illegal instruction) ?

@masayuki2009
Copy link
Contributor

@pussuw
Copy link
Contributor Author

pussuw commented Jun 22, 2023

@pussuw nuttx-maix-bit-crash-pr9577.gz

Thank you!


0000000080016b9c <fpu_task>:
    80016b9c:   7175                    addi    sp,sp,-144
    80016b9e:   e506                    sd      ra,136(sp)
    80016ba0:   e122                    sd      s0,128(sp)
    80016ba2:   fca6                    sd      s1,120(sp)
    80016ba4:   f8ca                    sd      s2,112(sp)
    80016ba6:   f4ce                    sd      s3,104(sp)
    80016ba8:   f0d2                    sd      s4,96(sp)
    80016baa:   ecd6                    sd      s5,88(sp)
    80016bac:   a4a2                    fsd     fs0,72(sp)
    80016bae:   a0a6                    fsd     fs1,64(sp)
    80016bb0:   bc4a                    fsd     fs2,56(sp)
    80016bb2:   b84e                    fsd     fs3,48(sp)
    80016bb4:   b452                    fsd     fs4,40(sp)
    80016bb6:   b056                    fsd     fs5,32(sp)
    80016bb8:   ac5a                    fsd     fs6,24(sp)
    80016bba:   a85e                    fsd     fs7,16(sp)
    80016bbc:   a462                    fsd     fs8,8(sp)
    80016bbe:   a066                    fsd     fs9,0(sp)
    80016bc0:   fe2eb0ef                jal     ra,800023a2 <sched_lock>
    80016bc4:   003ed797                auipc   a5,0x3ed
    80016bc8:   26178793                addi    a5,a5,609 # 80403e25 <g_fpuno>
    80016bcc:   0007c403                lbu     s0,0(a5)
    80016bd0:   003eca97                auipc   s5,0x3ec
    80016bd4:   678a8a93                addi    s5,s5,1656 # 80403248 <g_fputhread>
    80016bd8:   4981                    li      s3,0
    80016bda:   0014049b                addiw   s1,s0,1
    80016bde:   0ff4f493                zext.b  s1,s1
    80016be2:   00978023                sb      s1,0(a5)
    80016be6:   84feb0ef                jal     ra,80002434 <sched_unlock>
    80016bea:   44000793                li      a5,1088
    80016bee:   00040a1b                sext.w  s4,s0
    80016bf2:   02f40433                mul     s0,s0,a5
    80016bf6:   2481                    sext.w  s1,s1
    **80016bf8:   d004f453                fcvt.s.w        fs0,s1**
    80016bfc:   d20484d3                fcvt.d.w        fs1,s1

At least the software is not running wild, looks like something might be wrong with the CPU status (mstatus or frcsr / similar).

@pussuw pussuw marked this pull request as draft June 22, 2023 08:57
@pussuw
Copy link
Contributor Author

pussuw commented Jul 26, 2023

I was able to reproduce the ostest crash on qemu-rv and it should now be fixed. Also, rv-virt:ksmp64 is not crashing any longer either.

Changes since last PR round

Moving the MMU change and FPU restore to riscv_internal.h / riscv_restorecontext() works for single core but not for multicore, so they are both moved back into riscv_doirq.c. (and riscv_perform_syscall.c).

SMP seems to expect this order of execution, I don't know why though.

Why does the ostest crash happen ?

The crash is due to mishandling of the floating point control/status register (fcsr). The illegal istruction happes due to incorrect / stale rouding mode, from the risc-v spec:

Floating-point operations use either a static rounding mode encoded in the instruction, or a dynamic
rounding mode held in frm. Rounding modes are encoded as shown in Table 8.1. A value of 111 in
the instruction’s rm field selects the dynamic rounding mode held in frm. If frm is set to an invalid
value (101–111), any subsequent attempt to execute a floating-point operation with a dynamic
rounding mode will cause an illegal instruction trap. Some instructions that have the rm field are
nevertheless unaffected by the rounding mode; they should have their rm field set to RNE (000).

So the CPU (FPU status register) status was not updated correctly in the SMP case.

@pussuw pussuw marked this pull request as ready for review July 26, 2023 13:25
@pussuw
Copy link
Contributor Author

pussuw commented Jul 26, 2023

Getting the style error from #9824 here now. Should nxstyle ignore old errors or no ?

@pussuw
Copy link
Contributor Author

pussuw commented Jul 26, 2023

@masayuki2009 sorry to keep bothering you, but I still do not have an SMP capable board (I'm writing support for icicle), can you try this on maix-bit:smp once again ? I was able to reproduce the crash on qemu and also fix it, so I have some confidence it should be fixed for actual HW as well.

Nevermind, needs more testing

@pussuw pussuw force-pushed the riscv_lazyfpu branch 3 times, most recently from d9cae11 to f2f4499 Compare July 27, 2023 09:51
@pussuw
Copy link
Contributor Author

pussuw commented Jul 27, 2023

Now qemu-rv smp64 is rock solid. The trick was up_cpu_paused, I did not realize the full CPU context (integer + FPU regs) need to be saved there.

Could you @masayuki2009 verify with actual hardware that it works ? I think we should not merge before this is verified with HW.

Why? The tcb can contain info that is needed by the context switch
routine. One example is lazy-FPU handling; the integer registers can
be stored into the stack, because they are always stored & restored.

Lazy-FPU however needs a non-volatile location to store the FPU registers
as the save feature will skip saving a clean FPU, but the restore must
always restore the FPU registers if the thread uses FPU.
@pussuw pussuw force-pushed the riscv_lazyfpu branch 3 times, most recently from 146b6c2 to 271beb3 Compare July 27, 2023 10:26
@masayuki2009
Copy link
Contributor

Could you @masayuki2009 verify with actual hardware that it works ? I think we should not merge before this is verified with HW.

@pussuw
I confirmed that the latest PR works with the Sipeed Maix Bit (K210).

@xiaoxiang781216
Copy link
Contributor

@pussuw please fix the style warning.

@pussuw
Copy link
Contributor Author

pussuw commented Jul 28, 2023

@pussuw please fix the style warning.

@xiaoxiang781216 I don't think I'm responsible for the style issue:
Error: /home/runner/work/nuttx/nuttx/nuttx/sched/misc/assert.c:243:18: error: Unmatched right parentheses
Error: /home/runner/work/nuttx/nuttx/nuttx/sched/misc/assert.c:276:18: error: Unmatched right parentheses
Error: Process completed with exit code 1.

It is the same as in here #9824

I thought the style check would not complain about issues that are already in upstream ?

arch/risc-v/Kconfig Outdated Show resolved Hide resolved
arch/risc-v/Kconfig Outdated Show resolved Hide resolved
@xiaoxiang781216
Copy link
Contributor

@pussuw please fix the style warning.

@xiaoxiang781216 I don't think I'm responsible for the style issue: Error: /home/runner/work/nuttx/nuttx/nuttx/sched/misc/assert.c:243:18: error: Unmatched right parentheses Error: /home/runner/work/nuttx/nuttx/nuttx/sched/misc/assert.c:276:18: error: Unmatched right parentheses Error: Process completed with exit code 1.

It is the same as in here #9824

I thought the style check would not complain about issues that are already in upstream ?

No, the style checks the full source code. This design prefers the contributor fix the style issue in the old code base.

@pussuw
Copy link
Contributor Author

pussuw commented Jul 28, 2023

@pussuw please fix the style warning.

@xiaoxiang781216 I don't think I'm responsible for the style issue: Error: /home/runner/work/nuttx/nuttx/nuttx/sched/misc/assert.c:243:18: error: Unmatched right parentheses Error: /home/runner/work/nuttx/nuttx/nuttx/sched/misc/assert.c:276:18: error: Unmatched right parentheses Error: Process completed with exit code 1.
It is the same as in here #9824
I thought the style check would not complain about issues that are already in upstream ?

No, the style checks the full source code. This design prefers the contributor fix the style issue in the old code base.

But I'm still confused, the only change I made in assert.c is

static uintptr_t g_last_regs[XCPTCONTEXT_REGS] aligned_data(16);

The violation comes from a previous very recent commit (#9824), which was merged with the violation. The code added there is fine so I think the violation is a false alarm, but I have NO IDEA how to fix that.

Hmm, now looking at the code the fix for the violation should be pretty simple, I'll do it on Monday.

arch/risc-v/include/irq.h Outdated Show resolved Hide resolved
@xiaoxiang781216
Copy link
Contributor

@pussuw please fix the style warning.

@xiaoxiang781216 I don't think I'm responsible for the style issue: Error: /home/runner/work/nuttx/nuttx/nuttx/sched/misc/assert.c:243:18: error: Unmatched right parentheses Error: /home/runner/work/nuttx/nuttx/nuttx/sched/misc/assert.c:276:18: error: Unmatched right parentheses Error: Process completed with exit code 1.
It is the same as in here #9824
I thought the style check would not complain about issues that are already in upstream ?

No, the style checks the full source code. This design prefers the contributor fix the style issue in the old code base.

But I'm still confused, the only change I made in assert.c is

static uintptr_t g_last_regs[XCPTCONTEXT_REGS] aligned_data(16);

The violation comes from a previous very recent commit (#9824), which was merged with the violation. The code added there is fine so I think the violation is a false alarm, but I have NO IDEA how to fix that.

yes, it's a false alarm, that's why #9824 is merged with this warning.

Hmm, now looking at the code the fix for the violation should be pretty simple, I'll do it on Monday.

ok, you can try it.

- Save the FPU registers into the tcb so they don't get lost if the stack
  frame for xcp.regs moves (as it does)
- Handle interger and FPU register save/load separately
- Integer registers are saved/loaded always, like before
- FPU registers are only saved during a context switch:
  - Save ONLY if FPU is dirty
  - Restore always if FPU has been used (not in FSTATE_OFF, FSTATE_INIT)
- Remove all lazy-FPU related logic from the macros, it is not needed
Instead of clearing the fields individually, just wipe the whole register.
This can be done because flags and rm are just parts of the fcsr.

31             8        5           0
+--------------+--------+-----------+
|              |        |           |
|   RESERVED   |  FRM   |  FSTATUS  |
|              |        |           |
+--------------+--------+-----------+
                FCSR
This way the registers can be read easily
Adds option to use the old implementation where FPU is stored into
the process stack.
The FPU restore issue does not show itself any longer, so FPU support
can be re-enabled.
@xiaoxiang781216 xiaoxiang781216 merged commit c126ee8 into apache:master Jul 31, 2023
26 checks passed
@pussuw pussuw deleted the riscv_lazyfpu branch July 31, 2023 16:32
@jerpelea jerpelea added this to To-Add in Release Notes - 12.3.0 Sep 26, 2023
@jerpelea jerpelea moved this from To-Add to done in Release Notes - 12.3.0 Oct 5, 2023
@anchao
Copy link
Contributor

anchao commented Apr 19, 2024

@pussuw So in this PR, the FPU save and restore are migrated to riscv_swint()? I am curious whether the FPU will be restored correctly if a context switch is triggered in other exceptions?

@pussuw
Copy link
Contributor Author

pussuw commented Apr 19, 2024

@anchao A context switch always ends up in riscv_swint(), so yes. Are you experiencing some odd / unexpected behavior with FPU ? I have a project that heavily relies on the FPU and I have not seen any issues with it.

Note, that I do not use SMP, there was a problem with SMP in the first iteration of my lazy FPU patches, but that was related to address environment handling.

@anchao
Copy link
Contributor

anchao commented Apr 22, 2024

@pussuw No, I just noticed this commit when reviewing the code, current implementation may have certain risks if enable FPU in kernel mode and compiler generates code that uses floating-point registers in the interrupt handler.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

4 participants