-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
arch/risc-v: Improve speed of context switch #5731
Conversation
I tried this PR with maix-bit:kostest on QEMU but the ASSERTION happened.
|
@masayuki2009 Emm, I meet another problem with qemu-6.2 with mainline or this PR:
I'll try it again with qemu 5.2 |
@masayuki2009 I can't reproduce this issue but the problem mentioned in #5672 occured:
|
9a9ed86
to
be94446
Compare
11ed30a
to
a828df8
Compare
@no1wudi is it ready for reviewing? |
Emm, smp is still not work with latest update. |
3e0d3c7
to
4807870
Compare
@no1wudi |
cb1d69b
to
c5481cc
Compare
Signed-off-by: Huang Qi <huangqi3@xiaomi.com>
Can someone please explain to me how this change is supposed to work with MMU and address environments ? Am I misunderstanding something ? The xcp.regs are now allocated from the user task's stack memory, prior to this change they were in the kernel's memory space. Please note the following example
If rtcb->xpc.regs points to the old task's stack, which resides in its address environment, how can we perform a register save there ? The old implementation saved the task context in the kernel space memory, which is why it is safe to change the address environment as the memory for xcp.regs[] exists in the kernel space and is always mapped ? |
since rtcb->xcp.regs itself is still in the kernel space, riscv_switchcontext can update rtcb->xcp.regs to the new context on the stack:
The problem we can see is that up_schedule_sigaction may touch the stack of non current thread directly: We haven't found other places will touch the memory pointed by xcp.regs of the non current thread, if you find some please point out. Since RISCV kernel mode isn't mainlined yet, @anchao will test the same change(#5645) on ARM. |
Thank you for your response, yes it is clear that xcp.regs is still in kernel memory and is addressable. However, the memory for the context itself exists in MMU mapped memory only if I'm right. As What I'm concerned about is that calling Unless I'm totally missing something here. |
I can confirm that for RISC-V the flat and protected targets work (with #5881) but the kernel mode target crashes with a load page fault. I will investigate further but the code was working before i.e. I could enter nsh and run things on the console. Now the SW crashes before nsh is launched. |
Yes, I think so.
call group_addrenv here is too early, even without @no1wudi 's change, it still can't work as expect, because the current stack is swapped out. The right place is in riscv_swint::SYS_switch_context.
Since the similar code also exist on ARM, @anchao will try to provide a patch fixing this issue. |
Do you have try the context switch between two user space thread without this change? It work before may just because the next thread is kernel thread. |
Yes it is too early, but only with the new context switch change. It worked previously since the tcb.xcp contained memory for the task's context. So calling riscv_switchcontext(a,b) memory for both a and b exist exists in kernel space, so it is always mapped, and safe to use even after calling group_addrenv(). After this riscv_switchcontext(a,b) is using b's stack, which is imo correct. |
I had several user tasks running so yes it got tested. As I said previously the memory for tcb.xcp.regs[] was in the tcb itself, which means it is in kernel memory for kernel an user tasks both. |
Yes, xcp.regs[] is in the kernel space, but the thread stack pointed by SP isn't. |
Exactly, and now the memory for saving the task context is taken from the stack.
Previously the memory was in tcb, as xcp.regs was a buffer, not a pointer: |
But the thread stack also switch out. For example, let's assume that the user space thread call semwait and the count of sem is zero:
So, I think the problem exist in the origin code. |
Oops, you are also right, the original code is suspicious too, as the first thing exception_common does is this:
I assume the same issue is in the ARM implementation too. This is not an issue with the new S-mode function however:
Saving the context does not use stack, and this is why the old code works on my table. But indeed, the location for calling group_addrenv() is suspicious, as the context switch is done by using the new tasks environment, even though the old task's registers are still in use. |
Yes, ARM has the similar issue, @anchao hit the problem before like this:
he made a temp fix like this:
The above problem is fixed. So, to fix this issue:
@no1wudi please prepare two patches for RISCV tomorrow, so @pussuw can verify with his S-mode change.
If we postpone the address space switch until save_ctx return, this PR should work with your S-mode change, I think. Yes, but it isn't reliable since we can't control how C compiler generate the machine code which may insert push/pop operation between group_addrenv<->riscv_switchcontext and riscv_switchcontext<->riscv_swint in the new version or different compiler options.
Yes, we move need do the address space change with the interrupt or kernel stack, otherwise we have to write the whole process in assemble code, so we can control the stack usage manually. |
Yes only option is to make sure interrupt or kernel stack is in use. I'm not even sure how the old implementation seemingly works for me. If the return address is put into stack, the SW should get lost when it attempts to return from group_addrenv(), as the memory pointed to by SP is no longer the same. Unless the next task is always a kernel task when a user task is being suspended, then the address is still mapped, like you already said. I must have gotten lucky in my testing or something. I guess this is what saves it
So when a kernel task is resumed and a user task is suspended, the user task's environment is left in place, I guess.. |
Summary
Follow #5645
Impact
RISC-V
Testing
maix-bit/rv-virt:nsh/smp on qemu