Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make #VC handling IST #271

Closed
wants to merge 5 commits into from
Closed

Make #VC handling IST #271

wants to merge 5 commits into from

Conversation

p4zuu
Copy link
Collaborator

@p4zuu p4zuu commented Feb 14, 2024

Making #VC handler working on IST stack is cleaner. However, it requires to properly handle side-effects, like the handling of nested #VC.

This PR:

  • makes #VC and #DB IST
  • add a few methods to easily find if a given address is on #VC IST stack memory region
  • handle nested IST #VC
  • makes a few comments in the VC code Rust idiomatic.

A few interrogations I would like to solve before merging:

  • please double-check that the addresses created for IST stacks are correct
  • is making #DB IST stricly necessary to make #VC IST?

Since many actions from untrusted part (hypervisor or user-space) can
raise a #VC excetpion during the syscall gap, #VC handling stack
switching should be IST.

Signed-off-by: Thomas Leroy <thomas.leroy@suse.com>
For more convenient tracking of the stack bounds, we can turn
the percpu init and IST stacks tracking from a VirtAddr to a
MemoryRegion object.

Signed-off-by: Thomas Leroy <thomas.leroy@suse.com>
We need a public method to access IST stack bounds.
Let's start with #VC.

Signed-off-by: Thomas Leroy <thomas.leroy@suse.com>
Now that #VC handler runs on an IST stack, nested #VC can be handled
on the same stack than the parent handler, possibly overwriting the
parent's stack content.

We need to handle nested #VC differently. We can detect if #VC is
nested by checking the value of RSP pushed on the stack in the early
exception handler. We can also detect #VC raised from user-mode.
For both cases, we relocate the stack to the current task's stack that
is safe to use.

Finally, we need to copy the pushed registers to the new stack address,
and call the regular #VC handler like if #VC handling was not IST.

Signed-off-by: Thomas Leroy <thomas.leroy@suse.com>
Replace C /**/ multiline comments by //.

Signed-off-by: Thomas Leroy <thomas.leroy@suse.com>
@joergroedel joergroedel self-requested a review February 14, 2024 16:43
@joergroedel joergroedel added the wait-for-review PR needs for approval by reviewers label Feb 14, 2024
@joergroedel joergroedel self-assigned this Feb 14, 2024
@msft-jlange
Copy link
Contributor

What is the rationale for using IST dispatch? I believe we are on a path not to use SYSCALL due to the complexities it will introduce with TDX Partitioning, so there will be no RIP/RSP modification gap on user/kernel transitions. There is no use of the GS segment in the kernel, so there is no SWAPGS gap on user/kernel transitions. As a result, there should never be a window in which dispatch on the current stack is not possible. Given these statements, is there any benefit for using IST dispatch for any of these exceptions?

Even if #VC dispatch is considered vulnerable to stack-based delivery challenges, why move #DB to an IST stack? The considerations that compel other operating systems to use IST dispatch for #DB do not apply to the COCONUT-SVSM kernel.

I believe we will want to avoid IST usage as much as we can because of the challenges that IST dispatch poses to reentrant delivery, so we should be able to clearly articulate a valuable reason that is relevant within the COCONUT-SVSM architecture.

@00xc
Copy link
Member

00xc commented Feb 14, 2024

What is the rationale for using IST dispatch? I believe we are on a path not to use SYSCALL due to the complexities it will introduce with TDX Partitioning (...)

Perhaps this is a dumb question, but what's the alternative here for transitioning to CPL-0? I guess int 0x80-style interrupts?

@msft-jlange
Copy link
Contributor

Perhaps this is a dumb question, but what's the alternative here for transitioning to CPL-0? I guess int 0x80-style interrupts?

Yes, the INT N instruction. I expect that the performance gap between software interrupt dispatch and SYSCALL transitions has narrowed significantly since the SYSCALL instruction was first designed 20 years ago. I can try to dig up numbers.

let mut new_rsp = if from_user(ctx) || vc_on_ist_stack(ctx) {
this_cpu().current_stack.end()
} else {
VirtAddr::from(ctx.frame.rsp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not be correct in a SYSCALL gap. If the #VC is delivered immediately after control transfers to kernel mode to the SYSCALL entry point, the CS on the exception frame will indicate a kernel mode CS, but the RSP in the exception frame will be the user-mode RSP since the SYSCALL entry point will not yet have been able to switch to the kernel-mode stack. It would be a serious problem if this routine were to attempt to switch off of the IST stack back onto the user-mode stack.

Because the untrusted host can choose to remove the page backing the SYSCALL entry point and replace it with a different page, it is certainly possible for the first instruction at the SYSCALL entry point to raise #VC due to a page-not-validated error, so this must be anticipated.

A similar problem exists immediately prior to SYSRET. A carefully timed interrupt by the untrusted host could cause the code page to disappear after the user RSP is reloaded but before SYSRET executes, and the resulting #VC will again appear to come from kernel mode on the user stack. This code must also anticipate that possibility. Of course, both of these cases are security attacks which could reasonably result in a panic, but that panic must occur before an inadvertent switch to the user stack.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not be correct in a SYSCALL gap. If the #VC is delivered immediately after control transfers to kernel mode to the SYSCALL entry point, the CS on the exception frame will indicate a kernel mode CS, but the RSP in the exception frame will be the user-mode RSP since the SYSCALL entry point will not yet have been able to switch to the kernel-mode stack. It would be a serious problem if this routine were to attempt to switch off of the IST stack back onto the user-mode stack.

Indeed. I made the choice not to try to detect the SYSCALL gap here at the beginning of the vc_switch_off_ist() function since we still don't support SYSCALL handling, but that's something I had in mind and would have worked on when we have proper SYSCALL handling. I can try to detect and handle the syscall gap (of course if we go for the SYSCALL option instead of the software interrupt way you mentioned).

A similar problem exists immediately prior to SYSRET. A carefully timed interrupt by the untrusted host could cause the code page to disappear after the user RSP is reloaded but before SYSRET executes, and the resulting #VC will again appear to come from kernel mode on the user stack. This code must also anticipate that possibility. Of course, both of these cases are security attacks which could reasonably result in a panic, but that panic must occur before an inadvertent switch to the user stack.

Indeed again, I was less aware of the same issue in SYSRET. Many thanks for the precision.
I guess spotting that the exception context's RIP is at the very early SYSCALL handler stage (or very end of SYSRET handler) would be a solution. Do you see any better way? And does relocating #VC handler's stack to current task's stack if we detect the SYSCALL gap look the proper way to you? Or would you find that panic would be safer?

@p4zuu
Copy link
Collaborator Author

p4zuu commented Aug 5, 2024

The current syscall handling developing plan goes to software interrupt instead of syscall/sysret. I think we can safely close this.

@p4zuu p4zuu closed this Aug 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
wait-for-review PR needs for approval by reviewers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants