Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RISC-V: Implement option to run NuttX in supervisor mode (S-mode) #5782

Merged
merged 2 commits into from
Apr 1, 2022

Conversation

pussuw
Copy link
Contributor

@pussuw pussuw commented Mar 18, 2022

  • Add config "ARCH_USE_S_MODE" which controls whether the kernel
    runs in M-mode or S-mode
  • Add more MSTATUS and most of the SSTATUS register definitions
  • Add more MIP flags for interrupt delegation
  • Add handling of interrupts from S-mode
  • Add handling of FPU from S-mode
  • Add new context handling functions that are not dependent on the trap
    handlers / ecall

NOTE: S-mode requires a companion SW (SBI) which is not yet implemented,
thus S-mode is not usable as is, yet.

Summary

Add option to run NuttX with RISC-V in supervisor mode.

Impact

Enabler for CONFIG_BUILD_KERNEL=y, MMU mappings require S-mode.

Testing

Tested with icicle:knsh (not yet published)

@pussuw pussuw force-pushed the riscv_smode_for_nuttx branch 2 times, most recently from 4d41bbf to 1cf8761 Compare March 18, 2022 13:28
@xiaoxiang781216
Copy link
Contributor

Only the machine timer interrupt is handled in M-mode. Machine mode timer is used as follows:

  • The timer reload happens in M-mode ISR
  • The event is dispatched to S-mode ISR, which will notify the kernel
    to advance time

Why not directly handle timer interrupt in S-mode?

  • For some reason clearing the interrupt does not work from S-mode, so
    created a call gate to clear the interrupt in M-mode

Could we find the root cause?

@pussuw
Copy link
Contributor Author

pussuw commented Mar 18, 2022

Why not directly handle timer interrupt in S-mode?

The memory mapped registers are available in M-mode only.

  • For some reason clearing the interrupt does not work from S-mode, so
    created a call gate to clear the interrupt in M-mode

Could we find the root cause?

I tried to simply clear the interrupt from S-mode but nothing happens. Writing e.g. CLEAR_CSR(sip, SIP_STIP); leaves the interrupt asserted unfortunately. Clearing from M-mode works. I can only assume this is by design, even though it is a bit silly IMO.

arch/risc-v/include/csr.h Show resolved Hide resolved
arch/risc-v/include/csr.h Show resolved Hide resolved
arch/risc-v/include/csr.h Show resolved Hide resolved
arch/risc-v/include/mode.h Outdated Show resolved Hide resolved
arch/risc-v/include/mode.h Outdated Show resolved Hide resolved
arch/risc-v/src/common/riscv_signal_dispatch.c Outdated Show resolved Hide resolved
arch/risc-v/src/common/riscv_releasepending.c Outdated Show resolved Hide resolved
arch/risc-v/src/common/riscv_releasepending.c Outdated Show resolved Hide resolved
arch/risc-v/src/common/riscv_task_start.c Outdated Show resolved Hide resolved
arch/risc-v/src/common/riscv_task_start.c Outdated Show resolved Hide resolved
arch/risc-v/Kconfig Outdated Show resolved Hide resolved
arch/risc-v/Kconfig Outdated Show resolved Hide resolved
@jlaitine
Copy link
Contributor

I think @pussuw has addressed all the comments so far. This looks good to me, can you @xiaoxiang781216 re-check?

@xiaoxiang781216
Copy link
Contributor

I think @pussuw has addressed all the comments so far. This looks good to me, can you @xiaoxiang781216 re-check?

Yes, I will take a look again. But since the change is huge, I need find the enough free time to finish the review. So, I expect the review will finish in the end of week.

The major concern is how to implement the syscall.
From "3.3.1 Environment Call and Breakpoint“ at https://github.com/riscv/riscv-isa-manual/releases/tag/draft-20220318-c9a172f:

The ECALL instruction is used to make a request to the supporting execution environment.
When executed in U-mode, S-mode, or M-mode, it generates an environment-call-from-U-mode
exception, environment-call-from-S-mode exception, or environment-call-from-M-mode exception,
respectively, and performs no other operation.
ECALL generates a different exception for each originating privilege mode so that environment
call exceptions can be selectively delegated. A typical use case for Unix-like operating systems is
to delegate to S-mode the environment-call-from-U-mode exception but not the others.

S-mode can hook ECALL to self, @pussuw could you explain more why we need handle differently in S-mode and M-mode?

@pussuw
Copy link
Contributor Author

pussuw commented Mar 23, 2022

I think @pussuw has addressed all the comments so far. This looks good to me, can you @xiaoxiang781216 re-check?

Yes, I will take a look again. But since the change is huge, I need find the enough free time to finish the review. So, I expect the review will finish in the end of week.

Yes, apologies for the size of the patch, I perfectly understand that reviewing large patches is not preferred (I prefer smaller patches also). In this case I could not figure out a good way to split it into smaller components, the changes are pretty strongly dependent on each other.

The major concern is how to implement the syscall. From "3.3.1 Environment Call and Breakpoint“ at https://github.com/riscv/riscv-isa-manual/releases/tag/draft-20220318-c9a172f:

The ECALL instruction is used to make a request to the supporting execution environment.
When executed in U-mode, S-mode, or M-mode, it generates an environment-call-from-U-mode
exception, environment-call-from-S-mode exception, or environment-call-from-M-mode exception,
respectively, and performs no other operation.
ECALL generates a different exception for each originating privilege mode so that environment
call exceptions can be selectively delegated. A typical use case for Unix-like operating systems is
to delegate to S-mode the environment-call-from-U-mode exception but not the others.

S-mode can hook ECALL to self, @pussuw could you explain more why we need handle differently in S-mode and M-mode?

S-mode can trap ecall from user mode (U-mode) but not from S-mode. So ecall from S-mode to S-mode is not possible. Also, we want to use ecall in S-mode to raise privileges to M-mode, to access some machine mode services (mhartid, mtimer to name a few, see riscv_mcall.c).

The reason the current syscall implementation works with M-mode, is that M-mode is the highest privilege level, so ecall from M-mode will enter M-mode again. This mechanism unfortunately cannot be used when the kernel is in S-mode -> we need what is implemented in this patch.

@xiaoxiang781216
Copy link
Contributor

xiaoxiang781216 commented Mar 23, 2022

S-mode can hook ECALL to self, @pussuw could you explain more why we need handle differently in S-mode and M-mode?

S-mode can trap ecall from user mode (U-mode) but not from S-mode. So ecall from S-mode to S-mode is not possible.

Ok, I get the key point. So is syscall from S-mode to S-mode only used for kernel thread context switch?

Also, we want to use ecall in S-mode to raise privileges to M-mode, to access some machine mode services (mhartid,

Yes, but it's still the same ecall instruction. Here is my understanding:

  1. Any syscall(include context switch) issued by U-mode could be implemented by ECALL and raise to S-mode
  2. Kernel thread context switch could be implemented by ECALL too, but it will be handled in M-mode instead S-mode
  3. Some special function(e.g. mhartid) which need the highest privilege has to issue ECALL from S-mode to M-mode

So, I think we can still use the same sys_call0-syscall6 as before, and:

  1. Handle U-mode syscall in S-mode as before
  2. Handle S-mode context switch in M-mode( the code should be same as S-mode)
  3. Extend the reserved syscall id to cover the special need(e.g. mhardid)

mtimer to name a few, see riscv_mcall.c).

Why we can't use S-mode timer instead?

@pussuw
Copy link
Contributor Author

pussuw commented Mar 23, 2022

S-mode can hook ECALL to self, @pussuw could you explain more why we need handle differently in S-mode and M-mode?

S-mode can trap ecall from user mode (U-mode) but not from S-mode. So ecall from S-mode to S-mode is not possible.

Ok, I get the key point. So is syscall from S-mode to S-mode only used for kernel thread context switch?

A context switch can occur in two ways:

  1. A process running thread in privileged mode (either kernel process or user process who's privileges are temporary raised e.g. when executing a system call from usersace), in this case a context switch occurs via riscv_switchcontext. In flat mode (M-mode) this is handled via:
#define riscv_switchcontext(saveregs, restoreregs) \
  sys_call2(SYS_switch_context, (uintptr_t)saveregs, (uintptr_t)restoreregs)

Which uses ecall to enter the trap handler which does all the work. This works, because ecall from M-mode to M-mode enters M-mode.

This cannot be used to switch context in S-mode, because it would raise privileges to M-mode (ecall from S-mode) which is not what we want. Instead, the assembly routine below is used to do the same work as was done by the trap handler previously!

riscv_switchcontext:

  /* Save old context to arg[0] */

  save_ctx   a0                        /* save context */

  REGSTORE   x1, REG_EPC(a0)           /* save ra to epc */
  REGSTORE   sp, REG_SP(a0)            /* original SP */

  /* Set previous privilege, we are in privileged mode now */

  csrr       s0, CSR_STATUS            /* read status register */
  li         s1, STATUS_PPP            /* set previous privilege */
  or         s0, s0, s1
  li         s1, ~STATUS_PIE           /* clear previous interrupt enable */
  and        s0, s0, s1
  REGSTORE   s0, REG_INT_CTX(a0)       /* store status to context */

#ifdef CONFIG_ARCH_FPU
  jal        x1, riscv_savefpu         /* FP registers */
#endif

  /* Load new context from arg[1] */

  mv         a0, a1                    /* load from a1 */
  j          riscv_fullcontextrestore  /* restore context */
  1. The second way a context switch can occur is in the trap handler itself, for example if a task is waiting for uart semaphore and the semaphore is posted by the uart interrupt handler, this can dispatch a wake up signal to the task, and the task that was running prior to taking the trap gets preempted and the task that was waiting for the semaphore gets activated instead.

Also, we want to use ecall in S-mode to raise privileges to M-mode, to access some machine mode services (mhartid,

Yes, but it's still the same ecall instruction. Here is my understanding:

1. Any syscall(include context switch) issued by U-mode could be implemented by ECALL and raise to S-mode

A user task will never call syscall(SYS_context_switch), or any of the reserved SYS_ call numbers, from user mode. If a user task calls for a context switch, it means its privileges are raised to kernel temporarily. The reserved syscalls are reserved for the kernel only and thus will only be executed by privileged mode code, i.e. only the kernel does this.

A user task can however call e.g. sys_call(SYS__exit, ...), which is executed via a proxy function like this:

void _exit(int parm1)
{
  (void)sys_call1((unsigned int)SYS__exit, (uintptr_t)parm1);
  while(1);
}

This sys_call will come from U-mode and it is intended to raise the privilege level to privileged mode.

Vica versa, a task running in kernel mode (be it a user task or a kernel task, what I mean the current privilege level = kernel) will never execute any of the old sys_call0-6 functions. They are not needed! The only reason the prototypes / definitions exist, is because the proxy functions mentioned above need their definitions, due to how the build system creates PROXY_ and STUB_ function into the /syscall/proxies and /syscall/stubs folders in non-flat builds. We could just as well move the sys_call0-6 prototypes and definitions to the libc folder, under machine/risc-v IMO.

In flat builds this will obviously fail horribly, and this is one of the reasons why kernel in S-mode is not really feasible for the flat build (among other things, like PMP needs to be configured to give access to memory and so forth).

2. Kernel thread context switch could be implemented by ECALL too, but it will be handled in M-mode instead S-mode

I don't know how to do this, M-mode does not support address translations, so when using address environments (the kernel memory is MMU mapped) it will become quite difficult.

3. Some special function(e.g. mhartid) which need the highest privilege has to issue ECALL from S-mode to M-mode

Yes, exactly

So, I think we can still use the same sys_call0-syscall6 as before, and:

1. Handle U-mode syscall in S-mode as before

Yes, this is why we need sys_call0-6, for the userspace

2. Handle S-mode context switch in M-mode( the code should be same as S-mode)

I don't know how to do this

3. Extend the reserved syscall id to cover the special need(e.g. mhardid)

Can be done, but I'd like to keep this separate. What riscv_mcall.c basically implements is a tiny SBI (comparable to openSBI for example but much smaller).

mtimer to name a few, see riscv_mcall.c).

Why we can't use S-mode timer instead?

There is no S-mode timer as far as I know. There is a scounter but this is for performance monitoring / profiling as far as I know. What we need is a ticker with a compare match event, i.e. mtime and mtimecmp. Such memory mapped registers are not available for S-mode.

@pussuw
Copy link
Contributor Author

pussuw commented Mar 23, 2022

Maybe to add some implicit information that is missing. The S-mode implementation is a requirement for CONFIG_BUILD_KERNEL=y, which requires MMU mappings for the user- and kernelspaces. MMU is not active in M-mode, this is why the kernel must run in S-mode instead.

Using S-mode in CONFIG_BUILD_FLAT is not really feasible, this is why the dependency depends on BUILD_KERNEL && ARCH_USE_MMU was added to the kconfig parameter. In CONFIG_BUILD_PROTECTED it could be done but is not required as the user- and kernelspaces still must exist in a flat address environment, so M-mode can be used. Only CONFIG_BUILD_KERNEL requires S-mode.

I tend to forget that most use nuttx in flat mode only, while all of the changes I have made have been to the protected or kernel build targets.

@xiaoxiang781216
Copy link
Contributor

S-mode can hook ECALL to self, @pussuw could you explain more why we need handle differently in S-mode and M-mode?

S-mode can trap ecall from user mode (U-mode) but not from S-mode. So ecall from S-mode to S-mode is not possible.

Ok, I get the key point. So is syscall from S-mode to S-mode only used for kernel thread context switch?

A context switch can occur in two ways:

  1. A process running thread in privileged mode (either kernel process or user process who's privileges are temporary raised e.g. when executing a system call from usersace), in this case a context switch occurs via riscv_switchcontext. In flat mode (M-mode) this is handled via:
#define riscv_switchcontext(saveregs, restoreregs) \
  sys_call2(SYS_switch_context, (uintptr_t)saveregs, (uintptr_t)restoreregs)

Which uses ecall to enter the trap handler which does all the work. This works, because ecall from M-mode to M-mode enters M-mode.

This cannot be used to switch context in S-mode, because it would raise privileges to M-mode (ecall from S-mode) which is not what we want.

How about we trigger software interrupt(SSIP) instead ECALL?

Instead, the assembly routine below is used to do the same work as was done by the trap handler previously!

riscv_switchcontext:

  /* Save old context to arg[0] */

  save_ctx   a0                        /* save context */

  REGSTORE   x1, REG_EPC(a0)           /* save ra to epc */
  REGSTORE   sp, REG_SP(a0)            /* original SP */

  /* Set previous privilege, we are in privileged mode now */

  csrr       s0, CSR_STATUS            /* read status register */
  li         s1, STATUS_PPP            /* set previous privilege */
  or         s0, s0, s1
  li         s1, ~STATUS_PIE           /* clear previous interrupt enable */
  and        s0, s0, s1
  REGSTORE   s0, REG_INT_CTX(a0)       /* store status to context */

#ifdef CONFIG_ARCH_FPU
  jal        x1, riscv_savefpu         /* FP registers */
#endif

  /* Load new context from arg[1] */

  mv         a0, a1                    /* load from a1 */
  j          riscv_fullcontextrestore  /* restore context */

Can we fake CPU by modifying CSR_STATUS to match the ECALL condition and ghen jump to exception_common directly?

  1. The second way a context switch can occur is in the trap handler itself, for example if a task is waiting for uart semaphore and the semaphore is posted by the uart interrupt handler, this can dispatch a wake up signal to the task, and the task that was running prior to taking the trap gets preempted and the task that was waiting for the semaphore gets activated instead.

It's different from the active context switch, the passive context switch(e.g. the block thread wake up by an interrupt handler) since the interrupt handle already run in S-mode:

  1. The exception entry pointer already save the current thread context for us
  2. So we can restore the unblock thread context directly without trap to M-mode

Also, we want to use ecall in S-mode to raise privileges to M-mode, to access some machine mode services (mhartid,

Yes, but it's still the same ecall instruction. Here is my understanding:

1. Any syscall(include context switch) issued by U-mode could be implemented by ECALL and raise to S-mode

A user task will never call syscall(SYS_context_switch), or any of the reserved SYS_ call numbers, from user mode. If a user task calls for a context switch, it means its privileges are raised to kernel temporarily.

My understanding is different from yours: user task need issue syscall here otherwise how the thread can raise the privilege to S-mode?

The reserved syscalls are reserved for the kernel only and thus will only be executed by privileged mode code, i.e. only the kernel does this.

No, all syscalls(either reserved or normal) are for user space(U-mode) to initiate the designed functionality to kernel space.

A user task can however call e.g. sys_call(SYS__exit, ...), which is executed via a proxy function like this:

void _exit(int parm1)
{
  (void)sys_call1((unsigned int)SYS__exit, (uintptr_t)parm1);
  while(1);
}

This sys_call will come from U-mode and it is intended to raise the privilege level to privileged mode.

Vica versa, a task running in kernel mode (be it a user task or a kernel task, what I mean the current privilege level = kernel) will never execute any of the old sys_call0-6 functions. They are not needed! The only reason the prototypes / definitions exist, is because the proxy functions mentioned above need their definitions, due to how the build system creates PROXY_ and STUB_ function into the /syscall/proxies and /syscall/stubs folders in non-flat builds.

Yes, I agree the behavior your describe above.

We could just as well move the sys_call0-6 prototypes and definitions to the libc folder, under machine/risc-v IMO.

The suggestion look reasonable, sys_call0-6 is directly called by user space code, libc/machine is a better place. But, we can do the change for all arch in a new patch.

In flat builds this will obviously fail horribly, and this is one of the reasons why kernel in S-mode is not really feasible for the flat build (among other things, like PMP needs to be configured to give access to memory and so forth).

Yes, I agree too. It's better to run flat mode in M-mode.

2. Kernel thread context switch could be implemented by ECALL too, but it will be handled in M-mode instead S-mode

I don't know how to do this, M-mode does not support address translations, so when using address environments (the kernel memory is MMU mapped) it will become quite difficult.

Sorry, I forget this. Yes, it's very hard to access the memory pointed by S-mode. It's clear that it isn't a good idea to handle S-mode context switch in M-mode.

3. Some special function(e.g. mhartid) which need the highest privilege has to issue ECALL from S-mode to M-mode

Yes, exactly

So, I think we can still use the same sys_call0-syscall6 as before, and:

1. Handle U-mode syscall in S-mode as before

Yes, this is why we need sys_call0-6, for the userspace

2. Handle S-mode context switch in M-mode( the code should be same as S-mode)

I don't know how to do this

3. Extend the reserved syscall id to cover the special need(e.g. mhardid)

Can be done, but I'd like to keep this separate. What riscv_mcall.c basically implements is a tiny SBI (comparable to openSBI for example but much smaller).

mtimer to name a few, see riscv_mcall.c).

Why we can't use S-mode timer instead?

There is no S-mode timer as far as I know. There is a scounter but this is for performance monitoring / profiling as far as I know. What we need is a ticker with a compare match event, i.e. mtime and mtimecmp. Such memory mapped registers are not available for S-mode.

Ok, it just has the S-mode timer interrupt pending bit(STIP), so we have:

  1. Let M-mode simulate the software timer and access the functionality through ECALL and trigger the interrupt by STIP
  2. Use the external hardware time instead

To support cpuid and timer, S-mode has to talk with M-mode. How do you reduce the coupling since we can't assume M-mode is also running NuttX? Do you plan reuse OpenSBI interface?

@pussuw
Copy link
Contributor Author

pussuw commented Mar 23, 2022

Thank you for your comments. Here are a few considerations to your questions.

How about we trigger software interrupt(SSIP) instead ECALL?

This mechanism could maybe be used, but is not necessary, as the asm functions provided here do the same. Also, the SW interrupt is typically used for inter-processor communication (IPI) and I would like to keep that option open for a true SMP.

Can we fake CPU by modifying CSR_STATUS to match the ECALL condition and ghen jump to exception_common directly?

Not really, in my opinion this gets extremely complicated.

  1. Sharing the IRQ stack will break this
  2. I don't know how the mcause/scause registers will behave in this case. Are these even writable (need to check tomorrow, I don't think so).
  3. Finally, the CURRENT_REGS behavior. CURRENT_REGS != NULL is tested in many places to guess whether the SW is in IRQ or not

I would keep the interrupt handling separate, it is quite mature code and I would rather not modify it lightly.

It's different from the active context switch, the passive context switch(e.g. the block thread wake up by an interrupt handler) since the interrupt handle already run in S-mode:
The exception entry pointer already save the current thread context for us so we can restore the unblock thread context directly without trap to M-mode

Absolutely true, and this is what is done in this implementation also

My understanding is different from yours: user task need issue syscall here otherwise how the thread can raise the privilege to S-mode?

This is also true. The user thread must use syscall to raise privilege to S-mode / kernel mode. But a context switch syscall is never sent by the application directly from user mode. This happens only when the user task is already executing kernel code in privileged mode. How the task got there can happen e.g. via sem_wait() or some other synchronization mechanism / timer interrupt / other interrupt etc. In any case, at this point the user task is already in privileged mode, and thus any sys_call() (including SYS_switch_context) will be executed from privileged mode. If that uses ecall, then the privilege will raise from S->M which is not what we want.

This is the only place where SYS_switch_context exists. The macro takes in tcb->xcp.regs from task that is to be preempted as arg1, and the task to be activated as arg2. Only the kernel can do this, as only the kernel knows the tcb:s. The same goes for the other context handling SYS_xxx numbers. These have been replaced by the asm functions, because e.g. in the below case sys_call2 would execute ecall.

#define riscv_switchcontext(saveregs, restoreregs) \
  sys_call2(SYS_switch_context, (uintptr_t)saveregs, (uintptr_t)restoreregs)

No, all syscalls(either reserved or normal) are for user space(U-mode) to initiate the designed functionality to kernel space.

Also true. But with the old implementation the user task is already in kernel space when e.g. sys_call(SYS_context_switch) is executed. If sys_call(SYS_context_switch) executes ecall instruction, again we raise privilege from S->M (or M->M in the case of the flat build). This works when the kernel is in M-mode, as the privilege level stays the same! This is the case for many (not all, like I said, sorry about that) reserved syscalls.

The suggestion look reasonable, sys_call0-6 is directly called by user space code, libc/machine is a better place. But, we can do the change for all arch in a new patch.

I agree, a new patch is better for this. This patch is already very big. And I still have the CONFIG_BUILD_KERNEL code to give for review (it's also done) :)

On a last note, I think I chose the naming of "ksys_call()" quite poorly. Actually, the function is just a trampoline to execute the asm function riscv_syscall_dispatch. A trampoline is required to store the original return address, so the "ksys_call()" return procedure will know where it came from originally (like with ecall you know from EPC, but emulated with SW!)

Ok, it just has the S-mode timer interrupt pending bit(STIP), so we have:

Let M-mode simulate the software timer and access the functionality through ECALL and trigger the interrupt by STIP
Use the external hardware time instead

To support cpuid and timer, S-mode has to talk with M-mode. How do you reduce the coupling since we can't assume M-mode is also running NuttX? Do you plan reuse OpenSBI interface?

I don't plan to use OpenSBI. The coupling should/could be handled somehow differently. However, if NuttX is running on a hart, no other OS can run on the same hart, so using my "miniSBI" is perfectly safe, for that hart. Obviously tampering with other harts this will fail horribly...

NOTE: The arch specific startup routine e.g. mpfs_head.S sets mtvec = __trap_handler, so ecall to M-mode will enter the nuttx trap handler on that particular hart.

@xiaoxiang781216
Copy link
Contributor

Thank you for your comments. Here are a few considerations to your questions.

How about we trigger software interrupt(SSIP) instead ECALL?

This mechanism could maybe be used, but is not necessary, as the asm functions provided here do the same. Also, the SW interrupt is typically used for inter-processor communication (IPI) and I would like to keep that option open for a true SMP.

Ok.

Can we fake CPU by modifying CSR_STATUS to match the ECALL condition and ghen jump to exception_common directly?

Not really, in my opinion this gets extremely complicated.

  1. Sharing the IRQ stack will break this
  2. I don't know how the mcause/scause registers will behave in this case. Are these even writable (need to check tomorrow, I don't think so).
  3. Finally, the CURRENT_REGS behavior. CURRENT_REGS != NULL is tested in many places to guess whether the SW is in IRQ or not

I would keep the interrupt handling separate, it is quite mature code and I would rather not modify it lightly.

OK.

It's different from the active context switch, the passive context switch(e.g. the block thread wake up by an interrupt handler) since the interrupt handle already run in S-mode:
The exception entry pointer already save the current thread context for us so we can restore the unblock thread context directly without trap to M-mode

Absolutely true, and this is what is done in this implementation also

My understanding is different from yours: user task need issue syscall here otherwise how the thread can raise the privilege to S-mode?

This is also true. The user thread must use syscall to raise privilege to S-mode / kernel mode. But a context switch syscall is never sent by the application directly from user mode. This happens only when the user task is already executing kernel code in privileged mode. How the task got there can happen e.g. via sem_wait() or some other synchronization mechanism / timer interrupt / other interrupt etc. In any case, at this point the user task is already in privileged mode, and thus any sys_call() (including SYS_switch_context) will be executed from privileged mode.

Ok, I get your point. U-mode thread never directly initiative the context switch through riscv_switch_context, instead through other normal syscall or the hardware interrupt.

If that uses ecall, then the privilege will raise from S->M which is not what we want.

This is the only place where SYS_switch_context exists. The macro takes in tcb->xcp.regs from task that is to be preempted as arg1, and the task to be activated as arg2. Only the kernel can do this, as only the kernel knows the tcb:s. The same goes for the other context handling SYS_xxx numbers. These have been replaced by the asm functions, because e.g. in the below case sys_call2 would execute ecall.

#define riscv_switchcontext(saveregs, restoreregs) \
  sys_call2(SYS_switch_context, (uintptr_t)saveregs, (uintptr_t)restoreregs)

No, all syscalls(either reserved or normal) are for user space(U-mode) to initiate the designed functionality to kernel space.

Also true. But with the old implementation the user task is already in kernel space when e.g. sys_call(SYS_context_switch) is executed. If sys_call(SYS_context_switch) executes ecall instruction, again we raise privilege from S->M (or M->M in the case of the flat build). This works when the kernel is in M-mode, as the privilege level stays the same! This is the case for many (not all, like I said, sorry about that) reserved syscalls.

Once kernel mode is enabled, only threads in S-mode will call riscv_switch_context directly to switch the context:

  1. In the flat mode, all threads call riscv_switch_context to trigger the context switch. In this case, riscv_switch_context could reuse the interrupt process logic by issuing ECALL since RISCV support ECALL from M-mode to M-mode.
    2.In the kernel mode, only S-mode thread call riscv_switch_context to trigger the context switch. In this case, riscv_switch_context has do the save/restore manually, since RISCV doesn't support ECALL from S-mode to S-mode.

Basically, riscv_switch_context need two different implementation.

The suggestion look reasonable, sys_call0-6 is directly called by user space code, libc/machine is a better place. But, we can do the change for all arch in a new patch.

I agree, a new patch is better for this. This patch is already very big. And I still have the CONFIG_BUILD_KERNEL code to give for review (it's also done) :)

On a last note, I think I chose the naming of "ksys_call()" quite poorly. Actually, the function is just a trampoline to execute the asm function riscv_syscall_dispatch. A trampoline is required to store the original return address, so the "ksys_call()" return procedure will know where it came from originally (like with ecall you know from EPC, but emulated with SW!)

Yes, it confuse me too.

Ok, it just has the S-mode timer interrupt pending bit(STIP), so we have:

Let M-mode simulate the software timer and access the functionality through ECALL and trigger the interrupt by STIP
Use the external hardware time instead

To support cpuid and timer, S-mode has to talk with M-mode. How do you reduce the coupling since we can't assume M-mode is also running NuttX? Do you plan reuse OpenSBI interface?

I don't plan to use OpenSBI. The coupling should/could be handled somehow differently. However, if NuttX is running on a hart, no other OS can run on the same hart, so using my "miniSBI" is perfectly safe, for that hart. Obviously tampering with other harts this will fail horribly...

But M-mode mayn't run NuttX at all, how that piece software support your private "miniSBI"? My suggestion doesn't mean to bring the whole OpenSBI package into M mode NuttX, but reuse the OpenSBI interface: NuttX in M-mode implement the minimal OpenSBI API required by NuttX in S-mode. The approach make S-mode NuttX could work with any M-mode software which compliant with OpenSBI.

NOTE: The arch specific startup routine e.g. mpfs_head.S sets mtvec = __trap_handler, so ecall to M-mode will enter the nuttx trap handler on that particular hart.

@pussuw
Copy link
Contributor Author

pussuw commented Mar 24, 2022

Ok, I get your point. U-mode thread never directly initiative the context switch through riscv_switch_context, instead through other normal syscall or the hardware interrupt.

Exactly!

Once kernel mode is enabled, only threads in S-mode will call riscv_switch_context directly to switch the context:

1.In the flat mode, all threads call riscv_switch_context to trigger the context switch. In this case, riscv_switch_context could reuse the interrupt process logic by issuing ECALL since RISCV support ECALL from M-mode to M-mode.
2.In the kernel mode, only S-mode thread call riscv_switch_context to trigger the context switch. In this case, riscv_switch_context has do the save/restore manually, since RISCV doesn't support ECALL from S-mode to S-mode.

Basically, riscv_switch_context need two different implementation.

Exactly! 1. is what we have and it works, this is why I did not modify that. 2. is what this patch is about. I made using them an option, because 1. works and I wanted to make this change as non-disruptive as possible.

More over all of the context handling routines (most of the reserved sys calls in fact) need to be handled differently. The asm functions provided here are replacements for the interrupt process logic.

But M-mode mayn't run NuttX at all, how that piece software support your private "miniSBI"? My suggestion doesn't mean to bring the whole OpenSBI package into M mode NuttX, but reuse the OpenSBI interface: NuttX in M-mode implement the minimal OpenSBI API required by NuttX in S-mode. The approach make S-mode NuttX could work with any M-mode software which compliant with OpenSBI.

This is absolutely true. This patch now forces usage of my "miniSBI" if the user wants to run NuttX in S-mode. Using OpenSBI is not possible. Maybe we need a new Kconfig flag for this ? "CONFIG_USE_MINISBI" or something (what to name it, IDK?)

@xiaoxiang781216
Copy link
Contributor

xiaoxiang781216 commented Mar 24, 2022

Why not implement OpenSBI API in M-mode required by S-mode NuttX directly? I think is more simple and general than a private MiniSBI. We can write from scratch in M-mode NuttX without link OpenSBI package and implement function and argument which actually used by S-mode NuttX. The benefit of this approach is:

  1. One copy code in S-mode can work with M-mode NuttX or other OpenSBI compliant implementation
  2. M-mode NuttX can enable your minimal OpenSBI implementation, or switch to the full feature OpenSBI package

Both side get the flexibility and avoid to lock to some special implementation.

Actually, you can split the patch into at least two part:

  1. M-mode add the minimal OpenSBI implementation
  2. S-mode implementation

@pussuw
Copy link
Contributor Author

pussuw commented Mar 24, 2022

Why not implement OpenSBI API in M-mode required by S-mode NuttX directly? I think is more simple and general than a private MiniSBI. We can write from scratch in M-mode NuttX without link OpenSBI package and implement function and argument which actually used by S-mode NuttX. The benefit of this approach is:

1. One copy code in S-mode can work with M-mode NuttX or other OpenSBI compliant implementation

2. M-mode NuttX can enable your minimal OpenSBI implementation, or switch to the full feature OpenSBI package

Both side get the flexibility and avoid to lock to some special implementation.

It can be done. The skeleton for this pretty much exists in this patch already.

The abstraction point in this case would be the riscv_mcall.c module:

void riscv_mcall_ack_timer(void)
{
   mcall0(MCALL_ACK_TIMER);
}

Instead of directly calling mcall0 it would call some "openSBI_wrap_timerxx" function, which would ecall into OpenSBI which would do the work.

This requires obviously that OpenSBI is built & linked with NuttX (or separately) and placed into memory somehow. NuttX would then be provided as a payload binary to OpenSBI. Adding this infrastructure is a lot of work.

Actually, you can split the patch into at least two part:

1. M-mode add the minimal OpenSBI implementation

2. S-mode implementation

Yes, of course. I can do the split.

At this point I would like to clarify, that the task given to me is "enable CONFIG_BUILD_KERNEL=y" for RISC-V on MPFS platform, and S-mode is just one of the requirements to get that done. What this patch includes does not break any existing code or what we already have. It was made just as an enabler to get CONFIG_BUILD_KERNEL working with the minimal amount of extra effort.

I can do the abstraction / enabler to use OpenSBI instead of my "MiniSBI" but at this point I will not implement full OpenSBI support to NuttX as it is a lot of extra work and does not bring me any closer to the target / task I have been given.

@xiaoxiang781216
Copy link
Contributor

Yes, I understand, that is why I emphasize that M-mode only need implement the minimal API(subset of function and argument) which is actually used by S-mode NuttX. But the implemented subset is better to compliant with OpenSBI spec as much as possible.

@xiaoxiang781216
Copy link
Contributor

Maybe we can change CONFIG_OPENSBI at https://github.com/apache/incubator-nuttx/tree/master/arch/risc-v/src/opensbi from bool to choice and add a new minimal implementation aside the full feature implementation like what we have done for C++ here:
https://github.com/apache/incubator-nuttx/blob/master/libs/libxx/Kconfig#L25-L50

@jlaitine
Copy link
Contributor

I think that we are responsible for the SBI confusion, so let me explain a bit where we are.

The OpenSBI library, which we added (by @eenurkka) is not really used by NuttX itself. It would be possible to use that to provide the SBI for nuttx in theory, but would require quite a lot of extra framework to bind it to nuttx.

So why did we add the support for building the full OpenSBI library to the NuttX build scripts then?

We have a NuttX based bootloader application for MPFS, which can either boot another NuttX or also uBoot&Linux (actually on in parallel on different harts on the same soc even). Now, Linux requires a lot more from the SBI than the NuttX, such as a bunch of extra trap handlers (unaligned access etc.), detection of hart capabilities etc. etc. So we are only passing the OpenSBI libarary from our bootloader app to uBoot and Linux after NuttX has initialized the HW. The underlying idea behind integrating the build scripts into NuttX was that maybe the OpenSBI could be usable for others as well...

What @pussuw has implemented here is really clever; NuttX running in S-mode only needs the timer interrupt handling and hart id reading in M-mode. So it makes a world of sense to just make this very minimal "sbi" which is tailored for NuttX. But let's not mix these two SBIs at this point. It is most likely not feasible to take in the whole OpenSBI into NuttX use (since most of it is not really needed - it is way too big and generic).

So I would suggest to let these live side-by-side for now, even though they partially implement the same thing. Enabling S-mode in NuttX is really only needed when you want to enable address environments (in CONFIG_BUILD_KERNEL), and what @pussuw has done here enables just that and nothing extra.

@xiaoxiang781216
Copy link
Contributor

xiaoxiang781216 commented Mar 24, 2022

I think that we are responsible for the SBI confusion, so let me explain a bit where we are.

The OpenSBI library, which we added (by @eenurkka) is not really used by NuttX itself. It would be possible to use that to provide the SBI for nuttx in theory, but would require quite a lot of extra framework to bind it to nuttx.

So why did we add the support for building the full OpenSBI library to the NuttX build scripts then?

We have a NuttX based bootloader application for MPFS, which can either boot another NuttX or also uBoot&Linux (actually on in parallel on different harts on the same soc even). Now, Linux requires a lot more from the SBI than the NuttX, such as a bunch of extra trap handlers (unaligned access etc.), detection of hart capabilities etc. etc. So we are only passing the OpenSBI libarary from our bootloader app to uBoot and Linux after NuttX has initialized the HW. The underlying idea behind integrating the build scripts into NuttX was that maybe the OpenSBI could be usable for others as well...

What @pussuw has implemented here is really clever; NuttX running in S-mode only needs the timer interrupt handling and hart id reading in M-mode. So it makes a world of sense to just make this very minimal "sbi" which is tailored for NuttX. But let's not mix these two SBIs at this point. It is most likely not feasible to take in the whole OpenSBI into NuttX use (since most of it is not really needed - it is way too big and generic).

Since OpenSBI is the standard way to expose M-mode functionality to S-mode, what I suggest it's that we use OpenSBI API definition, but write our own minimal implementation. This is just like how NuttX implementation POSIX API, which is much smaller than other(FreeBSD/Linux) implementation. It's always good to avoid the private(ad hoc) interface definition, especially a well defined interface(OpenSBI here) already exist.
If you don' want to implement the minimal OpenSBI this time. It's better to use other hardware timer for OS scheduler. Since most SoC has a dozen of hardware timer, it isn't too difficult to find a free one. Actually, many chips supported NuttX use the chip specific hardware timer, instead of ARM recommend SYSTICK or general timer.

So I would suggest to let these live side-by-side for now, even though they partially implement the same thing. Enabling S-mode in NuttX is really only needed when you want to enable address environments (in CONFIG_BUILD_KERNEL), and what @pussuw has done here enables just that and nothing extra.

It's fine to put the implementation to different location, the key point is that the interface follow OpenSBI definition. It's also good to make the timer selectable since it may more suitable to use other hardware timer on different chip.

@pussuw pussuw force-pushed the riscv_smode_for_nuttx branch 5 times, most recently from e5962b3 to 6f84db7 Compare March 31, 2022 13:31
- Add config "ARCH_USE_S_MODE" which controls whether the kernel
  runs in M-mode or S-mode
- Add more MSTATUS and most of the SSTATUS register definitions
- Add more MIP flags for interrupt delegation
- Add handling of interrupts from S-mode
- Add handling of FPU from S-mode
- Add new context handling functions that are not dependent on the trap
  handlers / ecall

NOTE: S-mode requires a companion SW (SBI) which is not yet implemented,
      thus S-mode is not usable as is, yet.
It might be useful to store things in memory per CPU. The tricky part
is that all CPUs run the same code and see the same memory, so some
kind of centralized access is required.

For now, the structure contains the hart id.

Access to the structure elements is provided via sscratch, which is
unique for every hart!
@pussuw
Copy link
Contributor Author

pussuw commented Apr 1, 2022

I think I don't have comments unaddressed and all of the merge issues should now be solved (CI will tell the final truth), so could you @xiaoxiang781216 and @pkarashchenko please re-review so we can get this opus finally merged.

Copy link
Contributor

@pkarashchenko pkarashchenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not have any comments

arch/risc-v/include/irq.h Show resolved Hide resolved
# define ASM_SYS_CALL \
" addi sp, sp, -16\n" /* Make room */ \
REGSTORE " ra, 0(sp)\n" /* Save ra */ \
" jal ra, riscv_syscall_dispatch\n" /* Dispatch (modifies ra) */ \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about we directly jump to riscv_syscall_dispatch by j? so we can save 4 instructions and remove line 34.

@@ -24,6 +24,12 @@ ifeq ($(CONFIG_OPENSBI),y)
include opensbi/Make.defs
endif

# Kernel runs in supervisor mode or machine mode ?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change the comment "Include supervisor specific source files"

@@ -119,24 +87,29 @@ exception_common:

/* Setup arg0(exception cause), arg1(context) */

csrr a0, mcause /* exception cause */
csrr a0, CSR_CAUSE /* exception cause */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move line 88-91 after line 110 and remove line 111

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was like this so I'll keep it like this

@@ -69,48 +69,16 @@
exception_common:

addi sp, sp, -XCPTCONTEXT_SIZE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move the stack reserve to save_ctx?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was like this so I'll keep it like this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is my thought: It isn't a problem before since we just have one exception entry pointer, but now we have two(exception_common and riscv_dispatch_syscall), it's always good to make both share the code base as much as possible.

*
****************************************************************************/

.macro save_ctx in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

save_ctx->riscv_savectx

*
****************************************************************************/

.macro load_ctx out
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

riscv_loadctx

arch/risc-v/src/common/supervisor/Make.defs Show resolved Hide resolved
arch/risc-v/Kconfig Show resolved Hide resolved
arch/risc-v/Kconfig Show resolved Hide resolved
arch/risc-v/src/common/riscv_percpu.h Show resolved Hide resolved
arch/risc-v/src/common/riscv_percpu.h Show resolved Hide resolved
arch/risc-v/src/common/riscv_percpu.c Show resolved Hide resolved
arch/risc-v/src/common/riscv_percpu.c Show resolved Hide resolved
arch/risc-v/src/common/riscv_percpu.c Show resolved Hide resolved
arch/risc-v/src/common/riscv_percpu.c Show resolved Hide resolved
@acassis acassis merged commit 71ced1f into apache:master Apr 1, 2022
@xiaoxiang781216
Copy link
Contributor

xiaoxiang781216 commented Apr 1, 2022

@acassis why directly merge this patch? there still has some place need to improve.

@pussuw
Copy link
Contributor Author

pussuw commented Apr 2, 2022

Thanks to all the reviewers, especially @pkarashchenko and @xiaoxiang781216 thank you for your hard work

@jlaitine
Copy link
Contributor

jlaitine commented Oct 11, 2022 via email

@luojia65
Copy link

luojia65 commented Nov 8, 2022

There is one big issue with the spec. IMO, as it does not provide a way to read the timer (mtime) value or the value of mtimecmp. To create an accurate and safe to use ticker both are needed, as there is no way to calculate time in absolute format otherwise. If it was an offset then it would make sense, but it's not. This ABI does not really make any sense to me.

@pussuw at #5782 (comment) : The SBI environment is typically responsible for RISC-V software register time. RISC-V registers include two parts, one implemented by RISC-V hardware and the other part defined by software. RISC-V machine environments typically use trap-and-return procedure to emulate software registers. The register time is in absolute, meaning that RISC-V supervisor software may read time directly, add difference of time and fill it using SBI set_time. Here's some assemble code:

csrr t0, time ; read time register from SBI environment
li t1, 1000000
add t0, t0, t1 ; calculate the next time it should interrupt supervisor
mv a0, t0
li a6, FUNCTION_SET_TIME
li a7, EXTENSION_TIMER
ecall ; call SBI environment to set timer

Similiar software emulated RISC-V register would include CSR cycle.

@luojia65
Copy link

luojia65 commented Nov 8, 2022

Since OpenSBI is the standard way to expose M-mode functionality to S-mode, what I suggest it's that we use OpenSBI API definition, but write our own minimal implementation. This is just like how NuttX implementation POSIX API, which is much smaller than other(FreeBSD/Linux) implementation. It's always good to avoid the private(ad hoc) interface definition, especially a well defined interface(OpenSBI here) already exist.

@xiaoxiang781216 at #5782 (comment) : OpenSBI is clearly not the standard way of implementing SBI environment. An SBI environment may run on M-mode or HS-mode, and OpenSBI only supports M-mode. Though named after 'Open', OpenSBI mixes SBI standard with lots of software custom features like domains which the SBI standard does not aware of. This project is clearly not at any means a 'de-facto' standard, and if using OpenSBI is considered the only way of supporting NuttX, virtualized NuttX on RISC-V using HS- and VS-mode will be impossible.

Lots of SoC and board vendors would provide a compact bootloader environment for their RISC-V IoT mainboards. Those environments either open source or closed, implement chip specific code to speed up software execution, and may include vendor specific extensions (E. Feng et al. 2021 at USENIX, H. Lu et al. 2022 at ACM DAC) to improve security and add more functions. Some of them uses RustSBI who have github stars more than all other SBI implementations combined. If any SBI implementation bundles with NuttX, it will be impossible to benefit from those features and improvements in production.

@xiaoxiang781216
Copy link
Contributor

Anyway, you need select a base, otherwise you can't write a general code from NuttX to invoke the functionality exposed by SBI. It's fine that vendor define the private extension, but all vendor need share the common part.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants