-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make #VC handling IST #271
Conversation
Since many actions from untrusted part (hypervisor or user-space) can raise a #VC excetpion during the syscall gap, #VC handling stack switching should be IST. Signed-off-by: Thomas Leroy <thomas.leroy@suse.com>
For more convenient tracking of the stack bounds, we can turn the percpu init and IST stacks tracking from a VirtAddr to a MemoryRegion object. Signed-off-by: Thomas Leroy <thomas.leroy@suse.com>
We need a public method to access IST stack bounds. Let's start with #VC. Signed-off-by: Thomas Leroy <thomas.leroy@suse.com>
Now that #VC handler runs on an IST stack, nested #VC can be handled on the same stack than the parent handler, possibly overwriting the parent's stack content. We need to handle nested #VC differently. We can detect if #VC is nested by checking the value of RSP pushed on the stack in the early exception handler. We can also detect #VC raised from user-mode. For both cases, we relocate the stack to the current task's stack that is safe to use. Finally, we need to copy the pushed registers to the new stack address, and call the regular #VC handler like if #VC handling was not IST. Signed-off-by: Thomas Leroy <thomas.leroy@suse.com>
Replace C /**/ multiline comments by //. Signed-off-by: Thomas Leroy <thomas.leroy@suse.com>
What is the rationale for using IST dispatch? I believe we are on a path not to use SYSCALL due to the complexities it will introduce with TDX Partitioning, so there will be no RIP/RSP modification gap on user/kernel transitions. There is no use of the GS segment in the kernel, so there is no SWAPGS gap on user/kernel transitions. As a result, there should never be a window in which dispatch on the current stack is not possible. Given these statements, is there any benefit for using IST dispatch for any of these exceptions? Even if #VC dispatch is considered vulnerable to stack-based delivery challenges, why move #DB to an IST stack? The considerations that compel other operating systems to use IST dispatch for #DB do not apply to the COCONUT-SVSM kernel. I believe we will want to avoid IST usage as much as we can because of the challenges that IST dispatch poses to reentrant delivery, so we should be able to clearly articulate a valuable reason that is relevant within the COCONUT-SVSM architecture. |
Perhaps this is a dumb question, but what's the alternative here for transitioning to CPL-0? I guess |
Yes, the INT N instruction. I expect that the performance gap between software interrupt dispatch and SYSCALL transitions has narrowed significantly since the SYSCALL instruction was first designed 20 years ago. I can try to dig up numbers. |
let mut new_rsp = if from_user(ctx) || vc_on_ist_stack(ctx) { | ||
this_cpu().current_stack.end() | ||
} else { | ||
VirtAddr::from(ctx.frame.rsp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will not be correct in a SYSCALL gap. If the #VC is delivered immediately after control transfers to kernel mode to the SYSCALL entry point, the CS on the exception frame will indicate a kernel mode CS, but the RSP in the exception frame will be the user-mode RSP since the SYSCALL entry point will not yet have been able to switch to the kernel-mode stack. It would be a serious problem if this routine were to attempt to switch off of the IST stack back onto the user-mode stack.
Because the untrusted host can choose to remove the page backing the SYSCALL entry point and replace it with a different page, it is certainly possible for the first instruction at the SYSCALL entry point to raise #VC due to a page-not-validated error, so this must be anticipated.
A similar problem exists immediately prior to SYSRET. A carefully timed interrupt by the untrusted host could cause the code page to disappear after the user RSP is reloaded but before SYSRET executes, and the resulting #VC will again appear to come from kernel mode on the user stack. This code must also anticipate that possibility. Of course, both of these cases are security attacks which could reasonably result in a panic, but that panic must occur before an inadvertent switch to the user stack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will not be correct in a SYSCALL gap. If the #VC is delivered immediately after control transfers to kernel mode to the SYSCALL entry point, the CS on the exception frame will indicate a kernel mode CS, but the RSP in the exception frame will be the user-mode RSP since the SYSCALL entry point will not yet have been able to switch to the kernel-mode stack. It would be a serious problem if this routine were to attempt to switch off of the IST stack back onto the user-mode stack.
Indeed. I made the choice not to try to detect the SYSCALL gap here at the beginning of the vc_switch_off_ist()
function since we still don't support SYSCALL handling, but that's something I had in mind and would have worked on when we have proper SYSCALL handling. I can try to detect and handle the syscall gap (of course if we go for the SYSCALL option instead of the software interrupt way you mentioned).
A similar problem exists immediately prior to SYSRET. A carefully timed interrupt by the untrusted host could cause the code page to disappear after the user RSP is reloaded but before SYSRET executes, and the resulting #VC will again appear to come from kernel mode on the user stack. This code must also anticipate that possibility. Of course, both of these cases are security attacks which could reasonably result in a panic, but that panic must occur before an inadvertent switch to the user stack.
Indeed again, I was less aware of the same issue in SYSRET. Many thanks for the precision.
I guess spotting that the exception context's RIP is at the very early SYSCALL handler stage (or very end of SYSRET handler) would be a solution. Do you see any better way? And does relocating #VC handler's stack to current task's stack if we detect the SYSCALL gap look the proper way to you? Or would you find that panic would be safer?
The current syscall handling developing plan goes to software interrupt instead of |
Making #VC handler working on IST stack is cleaner. However, it requires to properly handle side-effects, like the handling of nested #VC.
This PR:
A few interrogations I would like to solve before merging: