-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Perform CR4 and CR0 pinning from hypervisors #19
Comments
Hi. I've been working on this here: torvalds/linux@master...pdxjohnny:hc_harden almost ready to send out for review. |
Ah! Very cool. I wonder, though, if it would be better to make this entire a host based defense? i.e. do not involve the guest at all and just detect when special bits are set and keep them set? Or maybe, have a default set of bits and let the guest add more? (I'd like this defense to work "out of the box" for all guests without having the guest need to know to make a hypercall to gain the protection.) |
With regards to the Container as VM usecase.
I like this idea. I'm not sure how we could convince upstream to take that though, so far when we'd thought about that we'd come to the conclusion that we don't know what other kernels people might be running under KVM that would do whacky things like flip those bits intentionally. In which case the KVM people might not be so keen on us breaking functionality for those. But minimally I'd guess that patch would pretty much be: int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
{
unsigned long old_cr4 = kvm_read_cr4(vcpu);
unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE |
X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE;
if (kvm_valid_cr4(vcpu, cr4))
return 1;
if (~cr4 & old_cr4 & (X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP))
return 1;
/* ... */ We could make some sort of IOCTL to have QEMU, etc. request that we run in a once-turned-on-never-turn-off mode. The downside to this is that now we have to go touch QMEU, firecracker, etc. Whereas the hypercall approach is pretty self contained within the kernel (aside from the one line QEMU cpuid bit patch). I'm not sure what the most upstreamably friendly approach his here though. As it stands right now the patch is a hypercall that allows for pinning SMEP, SMAP, UMIP, and WP. The same bits you had already pinned. The initial approach had been to allow pinning of arbitrary bits. That was met with skepticism over the TS and MP bits in CR0. So I think it might be easiest to upstream if we first allow for only the pinning of the bits we already pin in software and then we could add the ability to pin others if we care to do that later in a different patch. I'm also meeting with skepticism around KEXEC. I think I'll try to debug what's up with the early boot code clearing those bits to see if we can make this an always on option. If it works with KEXEC, then we probably don't need a CONFIG_ for it, since we can argue that it's functionally equivalent to existing pinning and therefore should be on by default. |
Well shit, I spoke too soon. The patch I have to the early boot code works in KVM, but not on my physical host... If I can't get it working by mid next week I'll just keep the Kconfig option and make it
Setting Also, I finally ran across the code that's really supposed to be enabling |
Just an update. Almost done addressing all RFCv1 comments to send out RFCv2. One of the comments was that we need to ensure SMIs can't modify pinned bits. Which lead to the discovery that AMD and Intel SDMs specify that the CR* fields within SMRAM are readonly. So we shouldn't be able to modify them in the first place. Should be done with the patch to make sure they are readonly soon. It'll be the new first patch in the pv cr pinning patchset. |
Update, PATCH sent: https://lkml.org/lkml/2020/6/17/921 |
Just an update on this. Intel has a review process that I'm gated by right now. I have the next version ready but haven't been able to get a review. https://github.com/pdxjohnny/linux/tree/patch_v2_internal_v6 |
If someone wants to take this and put a co-authored by on it that works for me. Otherwise I probably need another probably 6 months to get this rebased and out the door, provided there are no major issues with the rebase. I've been working on getting Intel proper CI/CD so that the review process can scale. |
I can give a shot at this. |
@gwelymernans Want to have a meeting sometime I can go over the patchset as it was while ago and the test cases? |
@pdxjohnny That's a good idea. :-) Are you available Wednesday? My schedule's fairly open then or later in the week. |
I want to help too! |
Meeting invite sent for 10-11 AM PDT. Please propose a new time if that doesn't work for either of you. I am free most afternoons. I'll try to have instructions on how to run the tests. If either of you want a tar of the filesystem on the SUT I had I can tar it up, but I'd need you to give me a network location to dump it to. It's an at this point old fedora image with the git repos and userspace dependencies installed. Meeting Link: https://meet.google.com/gys-epwh-avc If you both could please read over coverletter and the patchset beforehand that would be great too, as it's dense at some points and we'll get the most out of our meeting it you've at least skimmed it.
|
Great! I would prefer to start 2 or 1 hour earlier if it works for you (I'm UTC+1). |
I can start as early as 8 AM PDT |
Or we can do before. 6-7 AM PDT, but I have a hard conflict 7-8 AM PDT every day. I originally made it two hours but if you both read that stuff beforehand we should be able to do it in an hour. |
Also, please make sure you have a machine ready for this. I'm pretty sure anything within the past 10 years will work so long as it has virtualization. Ideally one of you would have an Intel and one of you would have an AMD then we could validate the AMD support and get you both spun up all at once. I think if you have a machine remotely accessible via AMT or BMC you could do it too, you just need enough to help you re-image the OS and interact with the BIOS menu on the host. Proposed agenda:
|
Correction: 10am PDT will be perfect for me Wednesday. |
Something related came up recently with the kernel IBT (Indirect Branch Tracking) patchset, which is not upstream but looks to be getting close. Kernel IBT enforcement wants to wants to have its CR4 bit pinned, but it caused problems with kexec. See this thread for some more details: The working solution is to unset CR0.WP before CR4.CET in the old kexec-ing kernel, inlined with no pinning check. I was thinking it could cause further problems for KVM pinning, so just wanted to point it out here. Please feel free to ping me if you have questions. |
|
Sorry guys I got dropped from the call and have to join another call
graph TD
run_tests[Run all the tests]
subgraph L0[L0 - host]
kvm_unit_tests
qemu[QEMU]
kexec_tools[kexec-tools]
end
subgraph L1[L2 - guest]
subgraph L1_fs[L1 - guest - filesystem]
guest_qemu[QEMU]
qemu_kexec_tools[kexec-tools]
end
end
subgraph L2[L2 - guest of the guest]
L2_fs
end
run_tests -->|QEMU = L0_qemu| kvm_unit_tests
qemu -->|copy| guest_qemu
kexec_tools-->|copy| qemu_kexec_tools
guest_qemu -->|virtio9p| L2_fs
qemu_kexec_tools-->|virtio9p| L2_fs
|
The conflict that may be the cause of the issue in my initial rebase is in arch/x86/kvm/emulate.c where post_leave_smm() was patched: l0kod@6b49d7b#diff-50a0b0edc05bf5218443999c4f0c82b96115995a68dca707080331ff3190b937 |
FYI, @l0kod, I had to apply this patch to your tree to get it to compile without this feature. diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 86f66d8a7817..44c183bad720 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -168,7 +168,9 @@ static inline void __init kvm_paravirt_cr_pinning_init(void)
}
static inline void kvm_setup_paravirt_cr_pinning(unsigned long cr0_pinned_bits,
- unsigned long cr4_pinned_bits)
+ unsigned long cr0_pinned_mask,
+ unsigned long cr4_pinned_bits,
+ unsigned long cr4_pinned_mask)
{
}
#endif |
That is indeed an issue but it doesn't come from me. :) |
We are working on a new set of KVM features to harden the kernel, and we took some of the CR pinning ideas but followed a much simpler path (i.e. without VMM/IOCTL support): https://lore.kernel.org/all/20230505152046.6575-6-mic@digikod.net/ @pdxjohnny, your patch series could still be useful in the future, let's see how it goes. 😉 Feel free to join the discussion on the mailing list, there is a lot to do and we're looking for contributors! |
Woohoo! We had avoided the hypercall route in order to support live migration |
Similar to the recent CR4 and CR0 pinning done in the native_write_cr*() functions, it would be even better to have KVM (and other hypervisors) pin those bits for their guest kernels too.
The text was updated successfully, but these errors were encountered: