Perform CR4 and CR0 pinning from hypervisors #19

kees · 2019-11-15T05:46:56Z

Similar to the recent CR4 and CR0 pinning done in the native_write_cr*() functions, it would be even better to have KVM (and other hypervisors) pin those bits for their guest kernels too.

johnandersen777 · 2019-11-19T17:23:07Z

Hi. I've been working on this here: torvalds/linux@master...pdxjohnny:hc_harden almost ready to send out for review.

kees · 2019-11-19T17:57:58Z

Ah! Very cool. I wonder, though, if it would be better to make this entire a host based defense? i.e. do not involve the guest at all and just detect when special bits are set and keep them set? Or maybe, have a default set of bits and let the guest add more? (I'd like this defense to work "out of the box" for all guests without having the guest need to know to make a hypercall to gain the protection.)

johnandersen777 · 2019-11-26T23:02:29Z

Moving discussion here (CC: @redgecombe @kaccardi)

With regards to the Container as VM usecase.
I noticed that ChromeOS does Linux support in a VM with a hardened
config. Is this something that would be enabled there if implemented?
We're taking this through review internally and was told an "ack from
those [chromeos] folks would go a long way"

Ah! Very cool. I wonder, though, if it would be better to make this entire a host based defense? i.e. do not involve the guest at all and just detect when special bits are set and keep them set? Or maybe, have a default set of bits and let the guest add more? (I'd like this defence to work "out of the box" for all guests without having the guest need to know to make a hypercall to gain the protection.)

I like this idea. I'm not sure how we could convince upstream to take that though, so far when we'd thought about that we'd come to the conclusion that we don't know what other kernels people might be running under KVM that would do whacky things like flip those bits intentionally. In which case the KVM people might not be so keen on us breaking functionality for those.

But minimally I'd guess that patch would pretty much be:

int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
{
     unsigned long old_cr4 = kvm_read_cr4(vcpu);
     unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE |
                                X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE;

     if (kvm_valid_cr4(vcpu, cr4))
             return 1;

     if (~cr4 & old_cr4 & (X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_UMIP))
             return 1;

    /* ... */

We could make some sort of IOCTL to have QEMU, etc. request that we run in a once-turned-on-never-turn-off mode. The downside to this is that now we have to go touch QMEU, firecracker, etc. Whereas the hypercall approach is pretty self contained within the kernel (aside from the one line QEMU cpuid bit patch). I'm not sure what the most upstreamably friendly approach his here though.

As it stands right now the patch is a hypercall that allows for pinning SMEP, SMAP, UMIP, and WP. The same bits you had already pinned. The initial approach had been to allow pinning of arbitrary bits. That was met with skepticism over the TS and MP bits in CR0. So I think it might be easiest to upstream if we first allow for only the pinning of the bits we already pin in software and then we could add the ability to pin others if we care to do that later in a different patch.

I'm also meeting with skepticism around KEXEC. I think I'll try to debug what's up with the early boot code clearing those bits to see if we can make this an always on option. If it works with KEXEC, then we probably don't need a CONFIG_ for it, since we can argue that it's functionally equivalent to existing pinning and therefore should be on by default.

johnandersen777 · 2019-11-27T18:33:23Z

Kexec is working!

johnandersen777 · 2019-11-27T20:30:48Z

Well shit, I spoke too soon. The patch I have to the early boot code works in KVM, but not on my physical host... If I can't get it working by mid next week I'll just keep the Kconfig option and make it !KEXEC for the public RFC.

CR0.WP and CR4.SMEP work, still trying to narrow down what's causing it not to boot when on the physical host (never makes it to first console prints to screen, haven't hooked up serial).

Setting CR4.UMIP or CR4.SMAP causes it not to boot. I think this is because of EFLAGS.AC. Either way it's wednesday and so I'm giving up on Kexec for now :/

Also, I finally ran across the code that's really supposed to be enabling SMAP / UMIP and I see that surprise surprise there is a well thought out existing process for enabling them. It was also pointed out that if we kexec to a new kernel the new kernel might be an older kernel that doesn't know it shouldn't turn off those bits. In which case it would blow up. So the boot time disablable kexec is probably the safest and most usable option long term.

johnandersen777 · 2020-02-03T21:36:00Z

Just an update. Almost done addressing all RFCv1 comments to send out RFCv2. One of the comments was that we need to ensure SMIs can't modify pinned bits. Which lead to the discovery that AMD and Intel SDMs specify that the CR* fields within SMRAM are readonly. So we shouldn't be able to modify them in the first place. Should be done with the patch to make sure they are readonly soon. It'll be the new first patch in the pv cr pinning patchset.

johnandersen777 · 2020-06-22T18:26:06Z

Update, PATCH sent: https://lkml.org/lkml/2020/6/17/921

johnandersen777 · 2021-01-13T21:05:13Z

Just an update on this. Intel has a review process that I'm gated by right now. I have the next version ready but haven't been able to get a review. https://github.com/pdxjohnny/linux/tree/patch_v2_internal_v6

johnandersen777 · 2021-12-10T14:01:50Z

If someone wants to take this and put a co-authored by on it that works for me. Otherwise I probably need another probably 6 months to get this rebased and out the door, provided there are no major issues with the rebase. I've been working on getting Intel proper CI/CD so that the review process can scale.

bwendling · 2022-03-14T01:52:08Z

I can give a shot at this.

johnandersen777 · 2022-03-14T02:10:27Z

@gwelymernans Want to have a meeting sometime I can go over the patchset as it was while ago and the test cases?

bwendling · 2022-03-14T06:10:39Z

@pdxjohnny That's a good idea. :-) Are you available Wednesday? My schedule's fairly open then or later in the week.

l0kod · 2022-03-14T14:39:24Z

I want to help too!

johnandersen777 · 2022-03-14T14:51:28Z

Meeting invite sent for 10-11 AM PDT. Please propose a new time if that doesn't work for either of you. I am free most afternoons. I'll try to have instructions on how to run the tests. If either of you want a tar of the filesystem on the SUT I had I can tar it up, but I'd need you to give me a network location to dump it to. It's an at this point old fedora image with the git repos and userspace dependencies installed.

Meeting Link: https://meet.google.com/gys-epwh-avc

If you both could please read over coverletter and the patchset beforehand that would be great too, as it's dense at some points and we'll get the most out of our meeting it you've at least skimmed it.

l0kod · 2022-03-14T15:06:51Z

Great! I would prefer to start 2 or 1 hour earlier if it works for you (I'm UTC+1).

johnandersen777 · 2022-03-14T15:07:40Z

I can start as early as 8 AM PDT

johnandersen777 · 2022-03-14T15:09:28Z

Or we can do before. 6-7 AM PDT, but I have a hard conflict 7-8 AM PDT every day. I originally made it two hours but if you both read that stuff beforehand we should be able to do it in an hour.

johnandersen777 · 2022-03-14T15:12:16Z

Also, please make sure you have a machine ready for this. I'm pretty sure anything within the past 10 years will work so long as it has virtualization. Ideally one of you would have an Intel and one of you would have an AMD then we could validate the AMD support and get you both spun up all at once. I think if you have a machine remotely accessible via AMT or BMC you could do it too, you just need enough to help you re-image the OS and interact with the BIOS menu on the host.

Proposed agenda:

Introductions
Q&A
Get existing code at old point in git history running

l0kod · 2022-03-14T15:15:42Z

Correction: 10am PDT will be perfect for me Wednesday.

rpedgeco · 2022-03-14T15:56:42Z

Something related came up recently with the kernel IBT (Indirect Branch Tracking) patchset, which is not upstream but looks to be getting close. Kernel IBT enforcement wants to wants to have its CR4 bit pinned, but it caused problems with kexec. See this thread for some more details:
https://lore.kernel.org/lkml/eed8902f21ba9e5f93562432f6b5920137860a98.camel@intel.com/

The working solution is to unset CR0.WP before CR4.CET in the old kexec-ing kernel, inlined with no pinning check. I was thinking it could cause further problems for KVM pinning, so just wanted to point it out here. Please feel free to ping me if you have questions.

johnandersen777 · 2022-03-16T17:41:33Z

kvm/next for target for next patchset
run.sh runs the integration tests
test setup
- Machine in front of you with virtuatialization support
https://web.archive.org/web/20191105205408/http://events19.linuxfoundation.org/wp-content/uploads/2017/12/Improving-KVM-x86-Nested-Virtualization-Liran-Alon-Oracle.pdf
https://github.com/pdxjohnny/kvm-unit-tests

johnandersen777 · 2022-03-22T17:49:15Z

Sorry guys I got dropped from the call and have to join another call

For the triple fault, it might be related to the setup which I copied from other tests
- I am not sure what the ACPI stuff happens
- l0kod@cb01481#diff-b9503de1ae72a27304b70f3f192f86c875cfd66e05bc555eb974df55c2efa270R77-R83

graph TD
  run_tests[Run all the tests]

  subgraph L0[L0 - host]
     kvm_unit_tests
     qemu[QEMU]
     kexec_tools[kexec-tools]
  end

  subgraph L1[L2 - guest]
    subgraph L1_fs[L1 - guest - filesystem]
      guest_qemu[QEMU]
      qemu_kexec_tools[kexec-tools]
    end
  end

  subgraph L2[L2 - guest of the guest]
    L2_fs
  end

  run_tests -->|QEMU = L0_qemu| kvm_unit_tests

  qemu -->|copy| guest_qemu
  kexec_tools-->|copy| qemu_kexec_tools

  guest_qemu -->|virtio9p| L2_fs
  qemu_kexec_tools-->|virtio9p| L2_fs

Always ensure we are using QEMU with patches from series
- https://github.com/pdxjohnny/qemu/commits/41356f28cad1f54d84cb81ce8bfdbbfef29ad68e
  - johnandersen777/qemu@b74bacd
  - johnandersen777/qemu@41356f2
- kvm-unit-tests need to be run with the version of QEMU you built from these patches
  - https://github.com/pdxjohnny/protecting_cr4_host_setup/blob/92c4238e02bb5fe9f60572137a4ec6a2ddcd9450/test.sh#L62
- Need to copy QEMU into L1 and then L2 in integration tests passes through L1 fs via 9p
Ensure kexec-tools
L2 tests
- https://github.com/pdxjohnny/protecting_cr4_host_setup/blob/master/chroot/home/johnsa1/run.sh
- https://github.com/pdxjohnny/protecting_cr4_host_setup/blob/92c4238e02bb5fe9f60572137a4ec6a2ddcd9450/test.sh#L124-L149

l0kod · 2022-03-22T18:22:39Z

The conflict that may be the cause of the issue in my initial rebase is in arch/x86/kvm/emulate.c where post_leave_smm() was patched: l0kod@6b49d7b#diff-50a0b0edc05bf5218443999c4f0c82b96115995a68dca707080331ff3190b937

isanbard · 2022-03-22T18:51:59Z

FYI, @l0kod, I had to apply this patch to your tree to get it to compile without this feature.

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 86f66d8a7817..44c183bad720 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -168,7 +168,9 @@ static inline void __init kvm_paravirt_cr_pinning_init(void)
 }

 static inline void kvm_setup_paravirt_cr_pinning(unsigned long cr0_pinned_bits,
-                                                unsigned long cr4_pinned_bits)
+                                                unsigned long cr0_pinned_mask,
+                                                unsigned long cr4_pinned_bits,
+                                                unsigned long cr4_pinned_mask)
 {
 }
 #endif

l0kod · 2022-03-23T13:55:49Z

That is indeed an issue but it doesn't come from me. :)
Actually we didn't notice this because we compile with CONFIG_KVM_GUEST, and you should too. ;)

l0kod · 2023-05-05T18:52:43Z

We are working on a new set of KVM features to harden the kernel, and we took some of the CR pinning ideas but followed a much simpler path (i.e. without VMM/IOCTL support): https://lore.kernel.org/all/20230505152046.6575-6-mic@digikod.net/
See the cover letter for a detailed explanation of Heki: https://lore.kernel.org/all/20230505152046.6575-1-mic@digikod.net/

@pdxjohnny, your patch series could still be useful in the future, let's see how it goes. 😉

Feel free to join the discussion on the mailing list, there is a lot to do and we're looking for contributors!

johnandersen777 · 2023-05-05T22:58:47Z

Woohoo!

We had avoided the hypercall route in order to support live migration

kees added good first issue Good for newcomers [ARCH] x86_64 Needed on the 64-bit x86 architecture (ARCH=x86) labels Nov 15, 2019

kees removed the good first issue Good for newcomers label Feb 25, 2020

kees mentioned this issue Apr 6, 2022

thin self-hosting hypervisor layer #75

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Perform CR4 and CR0 pinning from hypervisors #19

Perform CR4 and CR0 pinning from hypervisors #19

kees commented Nov 15, 2019

johnandersen777 commented Nov 19, 2019 •

edited

Loading

kees commented Nov 19, 2019

johnandersen777 commented Nov 26, 2019

johnandersen777 commented Nov 27, 2019

johnandersen777 commented Nov 27, 2019 •

edited

Loading

johnandersen777 commented Feb 3, 2020

johnandersen777 commented Jun 22, 2020

johnandersen777 commented Jan 13, 2021 •

edited

Loading

johnandersen777 commented Dec 10, 2021 •

edited

Loading

bwendling commented Mar 14, 2022

johnandersen777 commented Mar 14, 2022

bwendling commented Mar 14, 2022

l0kod commented Mar 14, 2022

johnandersen777 commented Mar 14, 2022 •

edited

Loading

l0kod commented Mar 14, 2022

johnandersen777 commented Mar 14, 2022 •

edited

Loading

johnandersen777 commented Mar 14, 2022

johnandersen777 commented Mar 14, 2022 •

edited

Loading

l0kod commented Mar 14, 2022

rpedgeco commented Mar 14, 2022

johnandersen777 commented Mar 16, 2022 •

edited

Loading

johnandersen777 commented Mar 22, 2022 •

edited

Loading

l0kod commented Mar 22, 2022

isanbard commented Mar 22, 2022

l0kod commented Mar 23, 2022

l0kod commented May 5, 2023

johnandersen777 commented May 5, 2023

Perform CR4 and CR0 pinning from hypervisors #19

Perform CR4 and CR0 pinning from hypervisors #19

Comments

kees commented Nov 15, 2019

johnandersen777 commented Nov 19, 2019 • edited Loading

kees commented Nov 19, 2019

johnandersen777 commented Nov 26, 2019

johnandersen777 commented Nov 27, 2019

johnandersen777 commented Nov 27, 2019 • edited Loading

johnandersen777 commented Feb 3, 2020

johnandersen777 commented Jun 22, 2020

johnandersen777 commented Jan 13, 2021 • edited Loading

johnandersen777 commented Dec 10, 2021 • edited Loading

bwendling commented Mar 14, 2022

johnandersen777 commented Mar 14, 2022

bwendling commented Mar 14, 2022

l0kod commented Mar 14, 2022

johnandersen777 commented Mar 14, 2022 • edited Loading

l0kod commented Mar 14, 2022

johnandersen777 commented Mar 14, 2022 • edited Loading

johnandersen777 commented Mar 14, 2022

johnandersen777 commented Mar 14, 2022 • edited Loading

l0kod commented Mar 14, 2022

rpedgeco commented Mar 14, 2022

johnandersen777 commented Mar 16, 2022 • edited Loading

johnandersen777 commented Mar 22, 2022 • edited Loading

l0kod commented Mar 22, 2022

isanbard commented Mar 22, 2022

l0kod commented Mar 23, 2022

l0kod commented May 5, 2023

johnandersen777 commented May 5, 2023

johnandersen777 commented Nov 19, 2019 •

edited

Loading

johnandersen777 commented Nov 27, 2019 •

edited

Loading

johnandersen777 commented Jan 13, 2021 •

edited

Loading

johnandersen777 commented Dec 10, 2021 •

edited

Loading

johnandersen777 commented Mar 14, 2022 •

edited

Loading

johnandersen777 commented Mar 14, 2022 •

edited

Loading

johnandersen777 commented Mar 14, 2022 •

edited

Loading

johnandersen777 commented Mar 16, 2022 •

edited

Loading

johnandersen777 commented Mar 22, 2022 •

edited

Loading