Make #VC handling IST #271

p4zuu · 2024-02-14T16:32:01Z

Making the #VC handler run on an IST stack is cleaner. However, it requires properly handling side effects, such as nested #VC.

This PR:

- makes #VC and #DB IST;
- adds a few methods to easily check whether a given address lies within the #VC IST stack memory region;
- handles nested IST #VC;
- makes a few comments in the VC code Rust-idiomatic.

A few questions I would like to resolve before merging:

- Please double-check that the addresses created for the IST stacks are correct.
- Is making #DB IST strictly necessary to make #VC IST?

Since many actions by untrusted parties (the hypervisor or user space) can raise a #VC exception during the syscall gap, the #VC handler's stack switch should go through the IST.
Signed-off-by: Thomas Leroy <thomas.leroy@suse.com>
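As background for this commit: an exception is delivered on an IST stack when its IDT gate descriptor carries a non-zero IST index (1 to 7). The CPU then loads RSP from the matching entry of the TSS Interrupt Stack Table instead of continuing on the interrupted stack. The Rust sketch below encodes a 64-bit interrupt gate to show where that index lives. `IdtGate`, `make_gate`, and `gate_ist` are hypothetical helpers, not COCONUT-SVSM's IDT code; the vector numbers (1 for #DB, 29 for #VC) are the architectural ones.

```rust
/// A 64-bit IDT gate descriptor split into its two 8-byte halves.
/// Hypothetical helper type, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct IdtGate {
    pub low: u64,
    pub high: u64,
}

/// Type nibble of a 64-bit interrupt gate (IF is cleared on entry).
const INTERRUPT_GATE: u64 = 0xE;
/// Present bit inside the type/attribute byte.
const PRESENT: u64 = 1 << 7;

/// Architectural vectors of the exceptions discussed in this PR.
pub const DB_VECTOR: usize = 1;
pub const VC_VECTOR: usize = 29;

/// Encode an interrupt gate. `ist == 0` means "stay on the current stack";
/// 1..=7 selects an entry of the TSS Interrupt Stack Table.
pub fn make_gate(handler: u64, selector: u16, ist: u8, dpl: u8) -> IdtGate {
    assert!(ist <= 7, "IST index must be in 0..=7");
    assert!(dpl <= 3, "DPL must be in 0..=3");
    let attr = PRESENT | ((dpl as u64) << 5) | INTERRUPT_GATE;
    let low = (handler & 0xFFFF)                // offset bits 15:0
        | ((selector as u64) << 16)             // code segment selector
        | ((ist as u64) << 32)                  // IST index, bits 34:32
        | (attr << 40)                          // type, DPL, present
        | (((handler >> 16) & 0xFFFF) << 48);   // offset bits 31:16
    let high = handler >> 32;                   // offset bits 63:32, rest reserved
    IdtGate { low, high }
}

/// Read back the IST index the CPU would use for this gate.
pub fn gate_ist(gate: &IdtGate) -> u8 {
    ((gate.low >> 32) & 0x7) as u8
}
```

Under these assumptions, making #VC IST amounts to installing something like `make_gate(vc_entry, KERNEL_CS, 1, 0)` at `VC_VECTOR` and pointing TSS IST slot 1 at the top of a dedicated stack; `vc_entry`, `KERNEL_CS`, and the choice of slot 1 are placeholders.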

For more convenient tracking of the stack bounds, we can turn the per-CPU init and IST stack tracking from a VirtAddr into a MemoryRegion object.
Signed-off-by: Thomas Leroy <thomas.leroy@suse.com>

We need a public method to access the IST stack bounds. Let's start with #VC.
Signed-off-by: Thomas Leroy <thomas.leroy@suse.com>
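A minimal sketch of what tracking stacks as regions buys: with a start and a size, checking whether an address belongs to the #VC IST stack becomes a single bounds test. `VirtAddr`, `MemoryRegion`, `PerCpu`, and `vc_ist_stack()` below are simplified stand-ins written for illustration, not the kernel's actual types or signatures, and the half-open `[start, end)` convention is an assumption.

```rust
/// Simplified stand-in for a virtual address (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct VirtAddr(pub u64);

/// Half-open region [start, start + size), standing in for MemoryRegion.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MemoryRegion {
    start: VirtAddr,
    size: u64,
}

impl MemoryRegion {
    pub fn new(start: VirtAddr, size: u64) -> Self {
        Self { start, size }
    }

    pub fn start(&self) -> VirtAddr {
        self.start
    }

    /// One past the last byte: the initial RSP of a downward-growing stack.
    pub fn end(&self) -> VirtAddr {
        VirtAddr(self.start.0 + self.size)
    }

    /// True if `addr` lies inside [start, end).
    pub fn contains(&self, addr: VirtAddr) -> bool {
        addr >= self.start && addr < self.end()
    }
}

/// Hypothetical per-CPU data keeping whole regions instead of stack tops.
pub struct PerCpu {
    init_stack: MemoryRegion,
    ist_vc_stack: MemoryRegion,
}

impl PerCpu {
    pub fn new(init_stack: MemoryRegion, ist_vc_stack: MemoryRegion) -> Self {
        Self { init_stack, ist_vc_stack }
    }

    /// Public accessor for the #VC IST stack bounds.
    pub fn vc_ist_stack(&self) -> MemoryRegion {
        self.ist_vc_stack
    }

    pub fn init_stack(&self) -> MemoryRegion {
        self.init_stack
    }
}
```

With this shape, a helper such as a `vc_on_ist_stack()` check reduces to `this_cpu.vc_ist_stack().contains(rsp)`.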

Now that the #VC handler runs on an IST stack, a nested #VC would be handled on the same stack as the parent handler, possibly overwriting the parent's stack contents. We therefore need to handle nested #VC differently. We can detect whether a #VC is nested by checking the interrupted RSP value that the CPU pushed on the stack, as seen by the early exception handler. We can also detect a #VC raised from user mode. In both cases, we relocate the stack to the current task's stack, which is safe to use. Finally, we copy the pushed registers to the new stack address and call the regular #VC handler as if #VC handling were not IST.
Signed-off-by: Thomas Leroy <thomas.leroy@suse.com>
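The stack-selection rule described in this commit message can be sketched as follows, assuming the saved context exposes CS and RSP and that a #VC is nested exactly when the interrupted RSP lies inside the #VC IST region. `Region`, `Frame`, `relocation_target`, and `place_context` are hypothetical names; copying the saved registers and calling the regular handler are left out, and the 16-byte alignment in `place_context` is a conservative choice rather than something the commit specifies.

```rust
/// Half-open address range [start, end); hypothetical stand-in for MemoryRegion.
#[derive(Debug, Clone, Copy)]
pub struct Region {
    pub start: u64,
    pub end: u64,
}

impl Region {
    pub fn contains(&self, a: u64) -> bool {
        a >= self.start && a < self.end
    }
}

/// The part of the saved exception frame that matters for stack selection.
#[derive(Debug, Clone, Copy)]
pub struct Frame {
    pub cs: u64,
    pub rsp: u64,
}

/// The low two bits of the saved CS selector give the interrupted CPL.
pub fn from_user(f: &Frame) -> bool {
    (f.cs & 3) == 3
}

/// Pick the stack top below which the saved context is copied before
/// running the regular, non-IST #VC handler.
pub fn relocation_target(f: &Frame, vc_ist: Region, task_stack: Region) -> u64 {
    if from_user(f) || vc_ist.contains(f.rsp) {
        // From user mode, or nested #VC: start from the top of the task stack.
        task_stack.end
    } else {
        // Kernel mode on a regular stack: continue below the interrupted RSP.
        f.rsp
    }
}

/// Address at which a saved context of `ctx_size` bytes is placed,
/// rounded down to a 16-byte boundary.
pub fn place_context(target_top: u64, ctx_size: u64) -> u64 {
    (target_top - ctx_size) & !0xF
}
```

Note that the kernel-mode branch trusts the interrupted RSP as-is; that is the assumption the review comment on this code questions for the SYSCALL gap.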

Replace C-style /* */ multiline comments with //.
Signed-off-by: Thomas Leroy <thomas.leroy@suse.com>

msft-jlange · 2024-02-14T17:31:14Z

What is the rationale for using IST dispatch? I believe we are on a path not to use SYSCALL due to the complexities it will introduce with TDX Partitioning, so there will be no RIP/RSP modification gap on user/kernel transitions. There is no use of the GS segment in the kernel, so there is no SWAPGS gap on user/kernel transitions. As a result, there should never be a window in which dispatch on the current stack is not possible. Given these statements, is there any benefit for using IST dispatch for any of these exceptions?

Even if #VC dispatch is considered vulnerable to stack-based delivery challenges, why move #DB to an IST stack? The considerations that compel other operating systems to use IST dispatch for #DB do not apply to the COCONUT-SVSM kernel.

I believe we will want to avoid IST usage as much as we can because of the challenges that IST dispatch poses to reentrant delivery, so we should be able to clearly articulate a valuable reason that is relevant within the COCONUT-SVSM architecture.

00xc · 2024-02-14T18:04:34Z

> What is the rationale for using IST dispatch? I believe we are on a path not to use SYSCALL due to the complexities it will introduce with TDX Partitioning (...)

Perhaps this is a dumb question, but what's the alternative here for transitioning to CPL-0? I guess int 0x80-style interrupts?

msft-jlange · 2024-02-14T20:46:15Z

> Perhaps this is a dumb question, but what's the alternative here for transitioning to CPL-0? I guess int 0x80-style interrupts?

Yes, the INT N instruction. I expect that the performance gap between software interrupt dispatch and SYSCALL transitions has narrowed significantly since the SYSCALL instruction was first designed 20 years ago. I can try to dig up numbers.

msft-jlange · 2024-02-15T05:29:29Z

kernel/src/cpu/vc.rs

```diff
+    let mut new_rsp = if from_user(ctx) || vc_on_ist_stack(ctx) {
+        this_cpu().current_stack.end()
+    } else {
+        VirtAddr::from(ctx.frame.rsp)
```


This will not be correct in a SYSCALL gap. If the #VC is delivered immediately after control transfers to kernel mode to the SYSCALL entry point, the CS on the exception frame will indicate a kernel mode CS, but the RSP in the exception frame will be the user-mode RSP since the SYSCALL entry point will not yet have been able to switch to the kernel-mode stack. It would be a serious problem if this routine were to attempt to switch off of the IST stack back onto the user-mode stack.

Because the untrusted host can choose to remove the page backing the SYSCALL entry point and replace it with a different page, it is certainly possible for the first instruction at the SYSCALL entry point to raise #VC due to a page-not-validated error, so this must be anticipated.

A similar problem exists immediately prior to SYSRET. A carefully timed interrupt by the untrusted host could cause the code page to disappear after the user RSP is reloaded but before SYSRET executes, and the resulting #VC will again appear to come from kernel mode on the user stack. This code must also anticipate that possibility. Of course, both of these cases are security attacks which could reasonably result in a panic, but that panic must occur before an inadvertent switch to the user stack.
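One way to guarantee that the panic happens before any switch to the user stack, sketched under the assumption that the only legitimate kernel-mode stack at this point is the current task stack: refuse to reuse a kernel-mode RSP that does not lie within it. `checked_relocation_target` and `StackError` are hypothetical; a real kernel may need to accept additional stacks (for example a per-CPU init stack), and the caller is expected to panic on `Err` without ever touching the rejected address.

```rust
/// Half-open address range [start, end) (hypothetical helper).
#[derive(Debug, Clone, Copy)]
pub struct Region {
    pub start: u64,
    pub end: u64,
}

impl Region {
    pub fn contains(&self, a: u64) -> bool {
        a >= self.start && a < self.end
    }
}

/// Saved CS and RSP from the exception frame.
#[derive(Debug, Clone, Copy)]
pub struct Frame {
    pub cs: u64,
    pub rsp: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum StackError {
    /// Kernel-mode CS but an RSP outside the task stack: the signature of a
    /// #VC taken inside a SYSCALL or SYSRET gap.
    UntrustedKernelRsp(u64),
}

pub fn checked_relocation_target(
    f: &Frame,
    vc_ist: Region,
    task_stack: Region,
) -> Result<u64, StackError> {
    if (f.cs & 3) == 3 || vc_ist.contains(f.rsp) {
        // User mode or nested #VC: the task stack top is always safe.
        return Ok(task_stack.end);
    }
    // Kernel-mode CS: only reuse the saved RSP if it is a valid position on
    // the task stack (from full at `start` to empty at `end`).
    if (task_stack.start..=task_stack.end).contains(&f.rsp) {
        Ok(f.rsp)
    } else {
        Err(StackError::UntrustedKernelRsp(f.rsp))
    }
}
```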

p4zuu · 2024-02-15

> This will not be correct in a SYSCALL gap. If the #VC is delivered immediately after control transfers to kernel mode to the SYSCALL entry point, the CS on the exception frame will indicate a kernel mode CS, but the RSP in the exception frame will be the user-mode RSP since the SYSCALL entry point will not yet have been able to switch to the kernel-mode stack. It would be a serious problem if this routine were to attempt to switch off of the IST stack back onto the user-mode stack.

Indeed. I chose not to detect the SYSCALL gap at the beginning of the vc_switch_off_ist() function since we don't support SYSCALL handling yet, but I had it in mind and planned to work on it once we have proper SYSCALL handling. I can try to detect and handle the SYSCALL gap (assuming, of course, that we go with SYSCALL rather than the software interrupt approach you mentioned).

> A similar problem exists immediately prior to SYSRET. A carefully timed interrupt by the untrusted host could cause the code page to disappear after the user RSP is reloaded but before SYSRET executes, and the resulting #VC will again appear to come from kernel mode on the user stack. This code must also anticipate that possibility. Of course, both of these cases are security attacks which could reasonably result in a panic, but that panic must occur before an inadvertent switch to the user stack.

Indeed again; I was less aware of the same issue around SYSRET. Many thanks for the details.
I guess detecting that the exception context's RIP is at the very beginning of the SYSCALL entry path (or at the very end of the SYSRET path) would be one solution. Do you see a better way? And does relocating the #VC handler's stack to the current task's stack when the SYSCALL gap is detected look right to you, or would a panic be safer?
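A sketch of the RIP-based detection suggested above, assuming the entry and exit stubs export the addresses that bound the instructions running with a kernel CS and a user-controlled RSP. `GapRanges`, `GapAction`, and `classify` are hypothetical, the ranges are half-open, and the sketch follows the reviewer's note that a panic is an acceptable response instead of relocating to the task stack.

```rust
/// Address ranges of the instructions that run with a kernel CS but a
/// user-controlled RSP. A real kernel would take these from assembly or
/// linker labels; here they are plain parameters (hypothetical).
#[derive(Debug, Clone, Copy)]
pub struct GapRanges {
    /// [SYSCALL entry point, first instruction running on the kernel stack)
    pub entry: (u64, u64),
    /// [instruction reloading the user RSP, address just past SYSRET)
    pub exit: (u64, u64),
}

impl GapRanges {
    pub fn in_gap(&self, rip: u64) -> bool {
        let in_range = |(lo, hi): (u64, u64)| rip >= lo && rip < hi;
        in_range(self.entry) || in_range(self.exit)
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum GapAction {
    /// Not in a gap: continue with the normal stack-selection logic.
    Normal,
    /// In a gap: the saved RSP is user-controlled; the caller must panic.
    Panic,
}

/// Classify a #VC from its saved RIP and CS.
pub fn classify(rip: u64, cs: u64, gaps: &GapRanges) -> GapAction {
    let kernel_mode = (cs & 3) == 0;
    if kernel_mode && gaps.in_gap(rip) {
        GapAction::Panic
    } else {
        GapAction::Normal
    }
}
```

Since #VC is reported as a fault, a #VC raised by the very first entry instruction carries the entry point itself as its saved RIP, so the entry range has to start exactly there.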

p4zuu · 2024-08-05T10:28:01Z

The current plan for syscall handling is to use software interrupts instead of SYSCALL/SYSRET. I think we can safely close this.

p4zuu added 5 commits February 13, 2024 17:42

kernel/cpu/idt: make #DB and #VC IST

f900832

kernel/cpu/percpu: IST and init stack MemoryRegion

861d121

kernel/cpu/percpu: get #VC IST stack bounds

b2b8f98

kernel/cpu/idt/vc: use Rust idiomatic comments

bae49de

joergroedel self-requested a review February 14, 2024 16:43

joergroedel added the wait-for-review PR needs for approval by reviewers label Feb 14, 2024

joergroedel self-assigned this Feb 14, 2024

msft-jlange reviewed Feb 15, 2024


00xc mentioned this pull request Apr 17, 2024: Support execution in user-mode #314 (Merged)

p4zuu closed this Aug 5, 2024


