vmm, hypervisor: kvm: Add KVM SEV-SNP to Cloud Hypervisor#7942

Merged
rbradford merged 17 commits into cloud-hypervisor:main from rhakobyan:kvm_sev_snp_revamped
Apr 17, 2026

Conversation

@rhakobyan
Contributor

@rhakobyan rhakobyan commented Apr 1, 2026

This PR adds support to Cloud Hypervisor for booting a guest on KVM with AMD SEV-SNP, using an Oak stage0 firmware built into an IGVM file.

The work here is derived from #7787 and #7316, with some changes, improvements and additions to boot a SEV-SNP guest on real hardware.

Key Changes

  • Add support for guest_memfd, enabling private memory for confidential VMs
  • Implement the KVM SEV-SNP ioctls KVM_SEV_INIT2, KVM_SEV_SNP_LAUNCH_START, KVM_SEV_SNP_LAUNCH_UPDATE and KVM_SEV_SNP_LAUNCH_FINISH
  • Refactor the IGVM loader to support KVM SEV-SNP handling of the VMSA, page types, etc.
  • Boot a kernel with the Oak stage0 firmware through IGVM

Fixes #6653

Testing

  1. Build oak stage0 firmware:
 bazel build //stage0_bin --config=release
  2. Build an IGVM with the stage0. Example with buildigvm:
buildigvm sev-snp --firmware <stage0_bin> \
                  --output <igvm_image> \
                  --cpucount <n>
  3. Build Cloud Hypervisor:
cargo build --no-default-features --features "kvm,igvm,sev_snp,fw_cfg"
  4. Run Cloud Hypervisor:
cloud-hypervisor --platform sev_snp=on \
                    --igvm <igvm_with_stage0> \
                    --kernel <kernel_image> \
                    --fw-cfg-config <initramfs=off...> \
                    ...

@rhakobyan rhakobyan marked this pull request as ready for review April 1, 2026 01:52
@rhakobyan rhakobyan requested a review from a team as a code owner April 1, 2026 01:52
@phip1611
Member

phip1611 commented Apr 1, 2026

This PR description says it builds on #7787 and #7316, but #7787 is still open and #7316 was closed.

Also, the current title makes this look very close in scope to #7787, so at the moment the relationship between the two PRs is not clear to me.

I'm marking this as draft for now until that relationship is clarified. Please expand the PR description and explain:

  1. how this differs from hypervisor: kvm: Add KVM SEV-SNP support #7787,
  2. whether it depends on any commits or concepts from hypervisor: kvm: Add KVM SEV-SNP support #7787, and
  3. whether hypervisor: kvm: Add KVM SEV-SNP support #7787 should be closed in favor of this PR.

If this PR requires #7787 first, then I think this should be closed for now and reopened once all prerequisites are merged. This keeps review focused.

If I'm misunderstanding the intent here, feel free to reach out.

@phip1611 phip1611 marked this pull request as draft April 1, 2026 05:08
@rbradford
Member

This PR description says it builds on #7787 and #7316, but #7787 is still open and #7316 was closed.

Also, the current title makes this look very close in scope to #7787, so at the moment the relationship between the two PRs is not clear to me.

@phip1611 My colleague @rhakobyan opened this PR, derived from the code in the original PR, but he has the ability to build and test this PR on the hardware. Something that the author of #7787 said they couldn't do - I think this PR should supersede that PR.

We identified that there was some dead code in the original PR related to register programming that was superfluous when using Project Oak stage0, which was a requirement of that version and this version. It would be possible to add that back when we can come up with a solution that does not require IGVM.

@phip1611
Member

phip1611 commented Apr 1, 2026

Thanks for the clarification @rbradford - good to know! It would be great if we could make this clear in the PR description. Feel free to undraft!

Member

@rbradford rbradford left a comment


Thank you @rhakobyan - some more stuff for you to look at. Please also add some documentation like you did in the PR body showing how to run it.

It also looks like, right now, we should make the sev_snp feature depend on the igvm and fw_cfg features.

Comment thread hypervisor/src/kvm/mod.rs Outdated
Comment thread hypervisor/src/kvm/mod.rs Outdated
}
memfd = if kvm_guest_memfd_supported {
// TODO: Refactor to create memfd when the memory region is created so
// that the size is appropriate.
Member


I don't get this comment - is it that a new guest_memfd should be created for each region (e.g. for the below-3 GiB and above-4 GiB regions)?

Contributor Author


Yes, I think this goes back to the earlier comment. Right now, we create one memfd spanning the whole virtual address space, but should instead do this per-region.

Comment thread hypervisor/src/kvm/mod.rs Outdated
Comment thread hypervisor/src/kvm/mod.rs
Comment thread arch/src/x86_64/mod.rs
Comment thread vmm/src/vm.rs Outdated
// (e.g. MSHV) handle this through their own import path.
#[cfg(all(feature = "kvm", feature = "sev_snp"))]
if cpu_manager.lock().unwrap().sev_snp_enabled() {
Self::reserve_region_for_stage0(&memory_manager)?;
Member


Why do we have to do this differently on MSHV?

Comment thread vmm/src/seccomp_filters.rs Outdated
Comment thread vmm/src/cpu.rs
Comment thread vmm/src/cpu.rs
Comment thread vmm/src/igvm/igvm_loader.rs
Comment thread vmm/src/igvm/igvm_loader.rs Outdated
Comment thread hypervisor/src/kvm/mod.rs Outdated
#[cfg(feature = "igvm")]
pub const STAGE0_START_ADDRESS: GuestAddress = GuestAddress(0xffe0_0000);
#[cfg(feature = "igvm")]
pub const STAGE0_SIZE: usize = 0x20_0000;
Contributor


IDK if we want to use stage0's specific reset vector here, or if we would prefer that it's a bit more generic. The industry uses EDK2, which we want to be able to swap either for a non-traditional bootloader like stage0 or one that implements UEFI

Contributor Author


Thanks, I have made this a bit more generic, though ideally I think we'd wanna parse IGVM directives to check the VMSA, etc. I've added a TODO.

Comment thread hypervisor/src/kvm/x86_64/sev.rs Outdated
Comment thread vmm/src/igvm/igvm_loader.rs Outdated
Comment thread hypervisor/src/kvm/x86_64/sev.rs Outdated
}
}

pub(crate) fn init2(&self, vm: &VmFd) -> Result<()> {
Contributor


I think we need to make sure that we pass SEV_FEATURES into the VMSA here before calling init2; at least it seems like QEMU does this:
https://gitlab.com/qemu-project/qemu/-/blob/master/target/i386/sev.c#L1895-1914

This makes sense to me, and it looks like we're just providing defaults. This isn't actually what we want to do -- SEV_FEATURES is something that should be passed in from the user.

Contributor Author


Yeah, we currently go with defaults, but will need to parse SEV_FEATURES from IGVM I think. I'll check how QEMU handles this.

Contributor Author


Ok, I updated this — it's now parsing IGVM for SEV features.

@rhakobyan rhakobyan force-pushed the kvm_sev_snp_revamped branch from 4745bd2 to 1ba0cfe Compare April 8, 2026 05:06
@rhakobyan rhakobyan marked this pull request as ready for review April 8, 2026 05:23
@rhakobyan
Contributor Author

Thanks for the clarification @rbradford - good to know! It would be great if we could make this clear in the PR description. Feel free to undraft!

Thanks @phip1611, I have updated the PR to be more explicit in what it's doing.

copybara-service Bot pushed a commit to project-oak/oak that referenced this pull request Apr 8, 2026
We are adding support for stage0 in Cloud Hypervisor and found that we were busy polling for a PCI device on the bridge... Eventually we discovered that Cloud Hypervisor actually has its own `PciConfiguration` here: https://github.com/cloud-hypervisor/cloud-hypervisor/blob/8248650e799457fee3068b0bf3b54ffcade4bce1/pci/src/bus.rs#L66-L82

Teach Stage0 about this so that we can init the machine with this new type

# Changes
- Moved the `mmio64_hole()` method to a helper method which introduces an `addr_size` -- this is solely so that we can customize what arg is passed
- Added the `IntelVirtualPcieHost` machine type

# Tests

```
$ nix develop
$ bazel test //stage0
...
INFO: Found 3 targets and 2 test targets...
INFO: Elapsed time: 183.706s, Critical Path: 172.55s
INFO: 1966 processes: 850 internal, 1116 linux-sandbox.
INFO: Build completed successfully, 1966 total actions
//stage0:stage0_test                                                     PASSED in 0.0s
//stage0/testing:stage0_kernel_test                                      PASSED in 0.9s

Executed 2 out of 2 tests: 2 tests pass.
```

I also launched a SEV-SNP VM using Cloud Hypervisor. We're in the midst of adding this support, so please look at the steps in cloud-hypervisor/cloud-hypervisor#7942 (comment) for reproducing the binaries used for testing.

I also used the `buildigvm` binary from https://gitlab.com/qemu-project/buildigvm, which was checked in recently

[stage0_pcie.txt](https://github.com/user-attachments/files/26471385/stage0_pcie.txt)

Change-Id: Id3ee62393df631fcaa7cc9144aa7fc416a6a6964
use crate::igvm::{BootPageAcceptance, HV_PAGE_SIZE, IgvmLoadedInfo, StartupMemoryType};
use crate::memory_manager::{Error as MemoryManagerError, MemoryManager};

#[cfg(feature = "sev_snp")]
Contributor


Oh, we decided to put the kernel hash logic in here huh. I was thinking we could leave it out to keep the focus solely on enabling the launch, but we can leave this in here...

Contributor Author


Good point, I'll move this out to a separate PR. I'll address the comments there directly.

Comment thread vmm/src/igvm/igvm_loader.rs Outdated
#[cfg(feature = "sev_snp")]
const SNP_CPUID_LIMIT: u32 = 64;
#[cfg(all(feature = "sev_snp", target_arch = "x86_64"))]
const SEV_HASH_BLOCK_ADDRESS: u64 = 0x10c00;
Contributor


This all comes from my pending PR in stage0 (https://github.com/project-oak/oak/pull/5073/changes), which I need to actually get checked in...

We could leave links as to where these are all found (in EDK2). For instance, the SEV_HASH_BLOCK_ADDRESS comes from https://github.com/tianocore/edk2/blob/f98662c5e35b6ab60f46ee4350fa0e6eab0497cf/OvmfPkg/Include/Fdf/MemFd.fdf.inc#L89-L93

Comment thread vmm/src/vm.rs
} else {
None
};
vm.sev_snp_init(Self::get_default_sev_snp_guest_policy())
Contributor


I love the default policy, although we'll need to make sure this is eventually injectable from the guest :) There are certain policies we need to enable from the guest side to make sure that they're enabled in the future (enable/disable SMT, make sure debug is off, etc.). Doesn't have to be this patch, but we'll need to keep in mind that this will be an actively changed parameter.

Contributor Author


Thanks, I have noted this, will get the policy from IGVM in a subsequent PR to keep this one simpler.

Comment thread vmm/src/vm.rs Outdated
if let Some(acpi_hash) = _fw_cfg_acpi_hash.as_deref() {
let mut config = config.lock().unwrap();
if let Some(payload) = config.payload.as_mut() {
Self::ensure_cmdline_param(payload, "acpi_hash", acpi_hash);
Contributor

@CookieComputing CookieComputing Apr 8, 2026


Curious as to why we're adding acpi_hash to the cmdline (or even checking for an ACPI hash at all)? I know this is enabled in stage0 (we add it there as an additional cmdline after we've launched), but in the measured boot use-case involving kernel hashes, this is actually undesirable -- CH would launch with a kernel cmdline digest that is different from what was measured previously via igvmmeasure and what not.

IMO, this isn't a CH/VMM responsibility to inject the acpi hash into the VM. CH/our VMM should just generate the required ACPI tables and then launch, and let the CVM care about the hashes and what not.

Comment thread hypervisor/src/kvm/mod.rs
}

#[cfg(all(feature = "sev_snp", target_arch = "x86_64"))]
fn import_isolated_pages(
Contributor


tangent, but I was a bit confused on the import_isolated_pages naming until I realized this was more or less just a wrapper over SNP_LAUNCH_UPDATE (import data into the context for launch digest) and SNP_LAUNCH_FINISH (complete the digest and storing it as the measurement for launch), perhaps this is more relevant for the MSHV use-case

Contributor Author


Yeah, you're definitely right, import_isolated_pages is an MSHV-specific ioctl (https://docs.rs/mshv-ioctls/latest/src/mshv_ioctls/ioctls/vm.rs.html#157-168), so the naming here is not great. We should probably think about naming this something more generic.

Comment thread vmm/src/acpi.rs Outdated
use std::fmt::Write;
write!(&mut hex_digest, "{byte:02x}").unwrap();
}
hex_digest
Contributor


I guess it's okay to have the ACPI table hashes and to print them out, although I don't think we should be injecting them into the kernel cmdline, which will cause attestation issues

Comment thread vmm/src/cpu.rs
// loader populates vCPU registers from the VMSA via set_sev_control_register
// (currently KVM-specific; MSHV handles this through its own import path).
// igvm_enabled is kept as an explicit flag rather than derived from sev_snp
// state because IGVM could theoretically be used independently of SEV-SNP.
Contributor

@CookieComputing CookieComputing Apr 9, 2026


I think we have a few non-SEV-SNP VMs that try to emulate the flow of an SEV-SNP CVM (mainly for dev/testing purposes), so this is fairly valid. We'd like to keep the parameters mostly the same if trying to launch basically the same VM except without SEV-SNP technology

Comment thread vmm/src/vm.rs Outdated
Comment on lines +1178 to +1179
// Inject acpi_hash into cmdline if needed (for boot() path where
// init_sev_snp's ensure_cmdline_param may not have run).
Contributor


I think it's okay for us to do a check from the VMM to see that the ACPI hash matches but it should not be injected into the actual cmdline

Comment thread vmm/src/vm.rs Outdated

// Stage0 region is only needed for KVM SEV-SNP with IGVM.
#[cfg(all(feature = "kvm", feature = "sev_snp"))]
fn reserve_region_for_stage0(memory_manager: &Arc<Mutex<MemoryManager>>) -> Result<()> {
Contributor


Same note as above -- let's not tailor specifically for Stage0 and its quirks, but instead prime for generic bootloader entry paths.

Contributor Author


Fixed the name, thanks.

Comment thread vmm/src/igvm/igvm_loader.rs Outdated
let mut loader = Loader::new(memory);

let mut parameter_areas: HashMap<u32, ParameterAreaState> = HashMap::new();
#[cfg(all(feature = "sev_snp", target_arch = "x86_64"))]
Contributor


Should we make the measured boot an option that can be passed in? This is what QEMU does.

In the SEV SNP object in QEMU, we toggle kernel-hashes=on to enforce the measured boot hash, otherwise I suppose it would be whatever pages were enforced via SNP_LAUNCH_UPDATE

Comment thread vmm/src/igvm/igvm_loader.rs Outdated
}

#[cfg(all(feature = "sev_snp", target_arch = "x86_64"))]
impl MeasuredBootInfo {
Contributor

@CookieComputing CookieComputing Apr 9, 2026


IMO MeasuredBootInfo and all of the above SEV SNP logic should be all delegated to sev.rs, since it's not really an IGVM concept

Comment thread vmm/src/igvm/igvm_loader.rs Outdated

let table = PaddedSevHashTable {
table: SevHashTable {
guid: uuid::uuid!("9438d606-4f22-4cc9-b479-a793d411fd21").to_bytes_le(),
Contributor


These UUID strings should be attributed to their provenance here: https://github.com/tianocore/edk2/blob/76406ef119625ee3185d4ae3bf7701147dba50ac/OvmfPkg/AmdSev/BlobVerifierLibSevHashes/BlobVerifierSevHashes.c#L23-L30

Otherwise, it'll look really confusing when reading at a glance

@rhakobyan rhakobyan force-pushed the kvm_sev_snp_revamped branch 5 times, most recently from ac74021 to 59d73c9 Compare April 17, 2026 01:42
Talador12 and others added 17 commits April 17, 2026 04:02
Add support for guest_memfd (available in Linux kernel v6.8+), which
enables private memory for confidential VMs.

Key changes:
- Introduce UserMemoryRegion abstraction with guest_memfd fields
- Add From impls between kvm_userspace_memory_region2 and UserMemoryRegion
- Convert all KVM memory region operations from kvm_userspace_memory_region
  to kvm_userspace_memory_region2, with automatic fallback to v1 when
  guest_memfd is not supported
- Add set_user_memory_region() wrapper that dispatches to v1/v2 based on
  kvm_guest_memfd_supported capability
- Create guest_memfd via KVM_CREATE_GUEST_MEMFD ioctl when supported
- Extend KvmDirtyLogSlot to preserve region2 fields across dirty log
  start/stop cycles

This is prerequisite infrastructure for KVM-based confidential computing
that requires private guest memory backed by guest_memfd.

Co-authored-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Keith Adler <kadler@cloudflare.com>
Signed-off-by: Ruben Hakobyan <hruben@meta.com>
Previously, the payload validation rejected an IGVM file combined with
a kernel or firmware.

Relax this constraint to allow an IGVM carrying a firmware (e.g. Oak
stage0) to be paired with a separate kernel image.

This enables fw_cfg-style boot where stage0 loads a kernel provided
through fw_cfg rather than embedded in the IGVM file itself.

Signed-off-by: Ruben Hakobyan <hruben@meta.com>
When we use igvm + kvm, we setup the regs and sregs using the cpuid
page. We still need to setup the fpu in configure_vcpu.

Co-authored-by: Keith Adler <kadler@cloudflare.com>
Signed-off-by: Keith Adler <kadler@cloudflare.com>
Co-authored-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Ruben Hakobyan <hruben@meta.com>
The SNP guest policy (AMD SEV-SNP ABI bits controlling SMT, migration,
debug, etc.) was previously hardcoded inside the MSHV implementation.
Widen Vm::sev_snp_init() to accept an SnpPolicy parameter so each
hypervisor backend receives the policy at init time.

Add get_default_sev_snp_guest_policy() in the VMM to construct the
default policy.

Co-authored-by: Keith Adler <kadler@cloudflare.com>
Signed-off-by: Keith Adler <kadler@cloudflare.com>
Co-authored-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Ruben Hakobyan <hruben@meta.com>
Change configure_system to take an Option<GuestAddress>
since rsdp is wrapped into an option anyways (we use configure
system to setup the mptables).

Co-authored-by: Keith Adler <kadler@cloudflare.com>
Signed-off-by: Keith Adler <kadler@cloudflare.com>
Co-authored-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Ruben Hakobyan <hruben@meta.com>
The load_payload and load_payload_async functions previously received a
sev_snp_enabled flag to decide whether to call load_igvm with or
without the host_data parameter. Replace this with a single code path
that always passes host_data behind a cfg(feature = "sev_snp") gate,
removing the runtime branch and the extra parameter threaded through
three call sites.

Co-authored-by: Keith Adler <kadler@cloudflare.com>
Signed-off-by: Keith Adler <kadler@cloudflare.com>
Co-authored-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Ruben Hakobyan <hruben@meta.com>
Move IGVM file parsing from load_igvm() into a dedicated parse_igvm()
helper in igvm/mod.rs, and parse the file upfront in Vm::new() so the
resulting IgvmFile struct is available throughout VM initialization.

This is a prerequisite for extracting VMSA SEV features from the parsed
IGVM before issuing KVM_SEV_INIT2, which needs sev_features.

Signed-off-by: Ruben Hakobyan <hruben@meta.com>
Introduce the SevFd abstraction that wraps /dev/sev and implements the
KVM_SEV_INIT2 and KVM_SEV_SNP_LAUNCH_START ioctls for SEV-SNP VM
initialization on KVM.

Key changes:
- Add sev.rs with KvmSevInit and KvmSevSnpLaunchStart ioctl structs
  matching the kernel layout (linux/arch/x86/include/uapi/asm/kvm.h)
- Implement KVM_SEV_INIT2 and KVM_SEV_SNP_LAUNCH_START ioctls
- Set KVM_MEMORY_ATTRIBUTE_PRIVATE on newly created memory regions
  when guest_memfd is supported
- Widen SevSnpPageAccessProxy cfg gates from mshv-only to all
  sev_snp-enabled builds
- Make sev_snp_init a required trait method (remove default impl)
- Include KVM_SEV_SNP_LAUNCH_START in the seccomp allowlist
- Parse VMSA SEV features from IGVM and include them in the
  KVM_SEV_INIT2 ioctl

Co-authored-by: Keith Adler <kadler@cloudflare.com>
Signed-off-by: Keith Adler <kadler@cloudflare.com>
Co-authored-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Alex Orozco <aorozco@google.com>
Co-authored-by: Rob Bradford <rbradford@meta.com>
Signed-off-by: Rob Bradford <rbradford@meta.com>
Signed-off-by: Ruben Hakobyan <hruben@meta.com>
Implement the KVM_SEV_SNP_LAUNCH_UPDATE ioctl.

Extend Vm::import_isolated_pages() with a uaddrs parameter carrying
host virtual addresses, which KVM needs, unlike MSHV. Compute uaddrs
from guest memory mappings in the IGVM loader.

Add KVM_SEV_SNP_LAUNCH_UPDATE to the seccomp allowlist.

Co-authored-by: Keith Adler <kadler@cloudflare.com>
Signed-off-by: Keith Adler <kadler@cloudflare.com>
Co-authored-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Ruben Hakobyan <hruben@meta.com>
Add the KVM_SEV_SNP_LAUNCH_FINISH ioctl, which finalizes the SNP
launch sequence and transitions the VM into a runnable encrypted
state.

Additionally, add KVM_SEV_SNP_LAUNCH_FINISH to the seccomp allowlist.

Co-authored-by: Keith Adler <kadler@cloudflare.com>
Signed-off-by: Keith Adler <kadler@cloudflare.com>
Co-authored-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Ruben Hakobyan <hruben@meta.com>
SEV-SNP guests will issue this hypercall to signal a change in the page
encryption status to the hypervisor.

Handle VcpuExit::Hypercall in the KVM vCPU run loop: decode the GPA,
page count, and private/shared attribute from the hypercall arguments,
then call KVM_SET_MEMORY_ATTRIBUTES to update the page state.

Co-authored-by: Keith Adler <kadler@cloudflare.com>
Signed-off-by: Keith Adler <kadler@cloudflare.com>
Co-authored-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Ruben Hakobyan <hruben@meta.com>
During SNP boot all guest RAM is initially marked
KVM_MEMORY_ATTRIBUTE_PRIVATE. Pages imported via SNP_LAUNCH_UPDATE are
properly accepted by the guest, but generic RAM pages (e.g. the AP
trampoline at GPA 0xD000) are not. When stage0 on the BSP starts
secondary vCPUs via x2APIC, the APs try to execute from the trampoline
page through the shared mapping while KVM still has it marked private,
causing a KVM_EXIT_MEMORY_FAULT (flags=KVM_MEMORY_EXIT_FLAG_PRIVATE)
that previously fell through to the catch-all error, killing the VM.

Handle VcpuExit::MemoryFault by toggling the page's memory attribute
between private and shared based on the exit flags, allowing the vCPU
to retry the access.

Signed-off-by: Ruben Hakobyan <hruben@meta.com>
Adapt the IGVM loader to work with both MSHV and KVM backends, which
differ in page type constants, CPUID page layout, and VMSA handling.

Abstract page types into a PageTypeConfig struct populated at runtime
from the detected hypervisor, replacing hardcoded mshv_bindings constants.

Apply the VMSA register state to each vCPU via setup_sev_snp_regs(),
translating SevSelector attributes to KVM segment format using a bitfield
decoder.

KVM's SNP launch path sanitizes certain CPUID bits that could lead to
an insecure guest. If the VMM sets these bits, KVM rejects the CPUID
page import on the first attempt, requiring a retry with the
firmware-corrected values.

Pre-clear the known problematic bits before import to avoid the
reject-and-retry cycle:

- Leaf 0x1, ECX bit 24: TSC_DEADLINE (filtered by KVM)
- Leaf 0x7, EBX bit 1: SGX (filtered by KVM)
- Leaf 0x7, EDX: clear entirely (contains speculative features)
- Leaf 0x80000008, EBX bit 25: filtered by KVM
- Leaf 0x80000021, ECX: clear entirely

This keeps the CPUID page stable across launch updates and avoids
noisy error logs from the retry path.

Co-authored-by: Keith Adler <kadler@cloudflare.com>
Signed-off-by: Keith Adler <kadler@cloudflare.com>
Co-authored-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Alex Orozco <aorozco@google.com>
Co-authored-by: Dylan Reid <dgreid@fb.com>
Signed-off-by: Dylan Reid <dgreid@fb.com>
Signed-off-by: Ruben Hakobyan <hruben@meta.com>
A bootloader/firmware (e.g. stage0) and the VMSA page require dedicated
memory regions at fixed GPAs.

Add reserve_region_for_stage0() to allocate these regions before IGVM
loading begins:
- Stage0 at GPA 0xffc0_0000 (4 MB)
- VMSA page at GPA 0xffff_ffff_f000 (4 KB)

These reservations are KVM-only; MSHV handles stage0/VMSA placement
through its own isolated import path.

Also add fw_cfg device creation and SYS_statx to the vCPU seccomp
allowlist (needed by stage0's file access pattern).

Co-authored-by: Keith Adler <kadler@cloudflare.com>
Signed-off-by: Keith Adler <kadler@cloudflare.com>
Co-authored-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Alex Orozco <aorozco@google.com>
Signed-off-by: Ruben Hakobyan <hruben@meta.com>
The Linux x86 boot protocol defines the setup area as
(setup_sects + 1) * 512 bytes. Previously we exported only the
boot_params buffer (4096 bytes), which is wrong for kernels with
setup_sects >= 8 where the actual setup area exceeds boot_params.

Truncate or extend the existing buffer to the correct setup_sects-
derived length, reading any extra bytes directly from the kernel file.
This avoids an extra allocation in the common case (setup_sects <= 7)
and matches QEMU's fw_cfg_add_kernel() behavior in
hw/i386/x86-common.c.

Signed-off-by: Dylan Reid <dgreid@fb.com>
Boot-time block devices on PCI segment 0 use 32-bit BARs so early
firmware can access them without additional identity mapping in the
firmware page tables. However, hot-plugged block devices are only ever
seen by the OS kernel which handles 64-bit BARs natively.

Switch hot-plugged block devices to 64-bit BARs to avoid exhausting the
scarce 32-bit MMIO window (typically 2-3 GB between RAM and 4 GB) when
many devices are hot-plugged.

Extract the BAR sizing decision into use_64bit_bar_for_virtio_device()
and thread an is_hotplug flag through add_virtio_pci_device(). Add unit
tests covering all relevant combinations.

Signed-off-by: Dylan Reid <dgreid@fb.com>
Add build and clippy jobs for kvm+sev_snp+igvm+fw_cfg feature combination.

Signed-off-by: Keith Adler <kadler@cloudflare.com>
Signed-off-by: Ruben Hakobyan <hruben@meta.com>
@rhakobyan rhakobyan force-pushed the kvm_sev_snp_revamped branch from 59d73c9 to 8ab8332 Compare April 17, 2026 11:21
Member

@rbradford rbradford left a comment


Let's land this - we can always iterate further later.

@rbradford rbradford enabled auto-merge April 17, 2026 12:19
@rbradford rbradford added this pull request to the merge queue Apr 17, 2026
Merged via the queue into cloud-hypervisor:main with commit ea2df94 Apr 17, 2026
39 of 40 checks passed
@rbradford rbradford moved this to ✅ Done in Cloud Hypervisor Roadmap Apr 25, 2026


Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

Supporting SEV-SNP guests on KVM

6 participants