Kai-Huang/KVM-…

Commits on Apr 12, 2021

  1. KVM: x86: Add capability to grant VM access to privileged SGX attribute

    Add a capability, KVM_CAP_SGX_ATTRIBUTE, that can be used by userspace
    to grant a VM access to a privileged attribute, with args[0] holding a
    file handle to a valid SGX attribute file.
    
    The SGX subsystem restricts access to a subset of enclave attributes to
    provide additional security for an uncompromised kernel, e.g. to prevent
    malware from using the PROVISIONKEY to ensure its nodes are running
    inside a genuine SGX enclave and/or to obtain a stable fingerprint.
    
    To prevent userspace from circumventing such restrictions by running an
    enclave in a VM, KVM restricts guest access to privileged attributes by
    default.
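    A minimal userspace sketch of enabling the new capability.  The struct
    mirrors the uapi struct kvm_enable_cap layout, and the idea of passing
    a securityfs file handle in args[0] comes from the description above,
    but the constant's value and the names here are assumptions, not
    verified against the uapi header.

```c
#include <stdint.h>
#include <string.h>

#define KVM_CAP_SGX_ATTRIBUTE 196   /* assumed uapi value */

/* Local mirror of the uapi struct kvm_enable_cap layout. */
struct kvm_enable_cap_sketch {
	uint32_t cap;
	uint32_t flags;
	uint64_t args[4];
	uint8_t  pad[64];
};

/* Build the KVM_ENABLE_CAP payload: args[0] carries a file handle to a
 * valid SGX attribute file, e.g. one opened from securityfs. */
static struct kvm_enable_cap_sketch build_sgx_attr_cap(int attr_fd)
{
	struct kvm_enable_cap_sketch cap;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_SGX_ATTRIBUTE;
	cap.args[0] = (uint64_t)attr_fd;
	return cap;
}
```

    Userspace would hand this struct to the KVM_ENABLE_CAP ioctl on the VM fd.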
    
    Cc: Andy Lutomirski <luto@amacapital.net>
    Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
    Signed-off-by: Kai Huang <kai.huang@intel.com>
    Sean Christopherson authored and intel-lab-lkp committed Apr 12, 2021
  2. KVM: VMX: Enable SGX virtualization for SGX1, SGX2 and LC

    Enable SGX virtualization now that KVM has the VM-Exit handlers needed
    to trap-and-execute ENCLS to ensure correctness and/or enforce the CPU
    model exposed to the guest.  Add a KVM module param, "sgx", to allow an
    admin to disable SGX virtualization independent of the kernel.
    
    When supported in hardware and the kernel, advertise SGX1, SGX2 and SGX
    LC to userspace via CPUID and wire up the ENCLS_EXITING bitmap based on
    the guest's SGX capabilities, i.e. to allow ENCLS to be executed in an
    SGX-enabled guest.  With the exception of the provision key, all SGX
    attribute bits may be exposed to the guest.  Guest access to the
    provision key, which is controlled via securityfs, will be added in a
    future patch.
    
    Note, KVM does not yet support exposing ENCLS_C leaves or ENCLV leaves.
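    A sketch of the ENCLS_EXITING wiring described above: each set bit in
    the 64-bit bitmap makes the ENCLS leaf with that index (taken from
    EAX) VM-Exit.  The leaf indices below are the architectural
    ECREATE/EINIT numbers; trapping every leaf when SGX is absent from
    the CPU model, and only ECREATE/EINIT otherwise, is a simplification
    of the actual kernel logic.

```c
#include <stdint.h>
#include <stdbool.h>

#define ENCLS_ECREATE 0   /* ENCLS leaf indices per the SDM */
#define ENCLS_EINIT   2

static uint64_t encls_exiting_bitmap(bool guest_has_sgx)
{
	uint64_t bitmap = 0;

	if (!guest_has_sgx)
		return ~0ULL;   /* no SGX in the CPU model: trap every leaf */

	/* Trap only the leaves KVM must emulate: ECREATE to enforce the
	 * CPUID model, EINIT to enforce the LE hash. */
	bitmap |= 1ULL << ENCLS_ECREATE;
	bitmap |= 1ULL << ENCLS_EINIT;
	return bitmap;
}
```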
    
    Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
    Signed-off-by: Kai Huang <kai.huang@intel.com>
    Sean Christopherson authored and intel-lab-lkp committed Apr 12, 2021
  3. KVM: VMX: Add ENCLS[EINIT] handler to support SGX Launch Control (LC)

    Add a VM-Exit handler to trap-and-execute EINIT when SGX LC is enabled
    in the host.  When SGX LC is enabled, the host kernel may rewrite the
    hardware values at will, e.g. to launch enclaves with different signers,
    thus KVM needs to intercept EINIT to ensure it is executed with the
    correct LE hash (even if the guest sees a hardwired hash).
    
    Switching the LE hash MSRs on VM-Enter/VM-Exit is not a viable option as
    writing the MSRs is prohibitively expensive, e.g. on SKL hardware each
    WRMSR is ~400 cycles.  And because EINIT takes tens of thousands of
    cycles to execute, the ~1500 cycle overhead to trap-and-execute EINIT is
    unlikely to be noticed by the guest, let alone impact its overall SGX
    performance.
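    A back-of-the-envelope check of the cost argument above.  The ~400
    cycles per WRMSR and ~1500 cycles per trap come from the text; the
    20000-cycle EINIT runtime in the test is an assumed stand-in for
    "tens of thousands of cycles".

```c
#include <stdint.h>

/* Cost of swapping the four LE hash MSRs on every VM-Enter/VM-Exit. */
static uint32_t msr_switch_cost(uint32_t wrmsr_cycles)
{
	return 4 * wrmsr_cycles;
}

/* Trap-and-execute overhead as a percentage of EINIT's own runtime. */
static uint32_t trap_overhead_pct(uint32_t trap_cycles, uint32_t einit_cycles)
{
	return (100 * trap_cycles) / einit_cycles;
}
```

    Switching the MSRs would pay ~1600 cycles on every transition, EINIT
    or not, while trapping costs a single-digit percentage only when the
    guest actually runs EINIT.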
    
    Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
    Signed-off-by: Kai Huang <kai.huang@intel.com>
    Sean Christopherson authored and intel-lab-lkp committed Apr 12, 2021
  4. KVM: VMX: Add emulation of SGX Launch Control LE hash MSRs

    Emulate the four Launch Enclave public key hash MSRs (LE hash MSRs) that
    exist on CPUs that support SGX Launch Control (LC).  SGX LC modifies the
    behavior of ENCLS[EINIT] to use the LE hash MSRs when verifying the key
    used to sign an enclave.  On CPUs without LC support, the LE hash is
    hardwired into the CPU to an Intel controlled key (the Intel key is also
    the reset value of the LE hash MSRs). Track the guest's desired hash so
    that a future patch can stuff the hash into the hardware MSRs when
    executing EINIT on behalf of the guest, when those MSRs are writable in
    the host.
    
    Note, KVM allows writes to the LE hash MSRs if IA32_FEATURE_CONTROL is
    unlocked.  This is technically not architectural behavior, but it's
    roughly equivalent to the arch behavior of the MSRs being writable prior
    to activating SGX[1].  Emulating SGX activation is feasible, but adds no
    tangible benefits and would just create extra work for KVM and guest
    firmware.
    
    [1] SGX related bits in IA32_FEATURE_CONTROL cannot be set until SGX
        is activated, e.g. by firmware.  SGX activation is triggered by
        setting bit 0 in MSR 0x7a.  Until SGX is activated, the LE hash
        MSRs are writable, e.g. to allow firmware to lock down the LE
        root key with a non-Intel value.
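    A sketch of the emulation described above: four 64-bit MSRs at the
    architectural indices 0x8C-0x8F holding the guest's LE hash, writable
    only while the guest's IA32_FEATURE_CONTROL is unlocked.  The reset
    value (the hash of the Intel key) is elided here; a zeroed hash
    stands in for it, and the struct is illustrative, not KVM's.

```c
#include <stdint.h>
#include <stdbool.h>

#define MSR_SGXLEPUBKEYHASH0 0x8c   /* architectural index of hash MSR 0 */

struct lehash_state {
	uint64_t hash[4];              /* guest's desired LE hash */
	bool feature_control_locked;   /* guest's IA32_FEATURE_CONTROL lock */
};

/* Returns true on success, false to signal a #GP to the guest. */
static bool lehash_wrmsr(struct lehash_state *s, uint32_t msr, uint64_t data)
{
	uint32_t i = msr - MSR_SGXLEPUBKEYHASH0;

	if (i > 3 || s->feature_control_locked)
		return false;
	s->hash[i] = data;
	return true;
}

static uint64_t lehash_rdmsr(const struct lehash_state *s, uint32_t msr)
{
	return s->hash[msr - MSR_SGXLEPUBKEYHASH0];
}
```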
    
    Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
    Co-developed-by: Kai Huang <kai.huang@intel.com>
    Signed-off-by: Kai Huang <kai.huang@intel.com>
    Sean Christopherson authored and intel-lab-lkp committed Apr 12, 2021
  5. KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions

    Add an ECREATE handler that will be used to intercept ECREATE for the
    purpose of enforcing an enclave's MISCSELECT, ATTRIBUTES and XFRM, i.e.
    to allow userspace to restrict SGX features via CPUID.  ECREATE will be
    intercepted when any of the aforementioned masks diverges from hardware
    in order to enforce the desired CPUID model, i.e. inject #GP if the
    guest attempts to set a bit that hasn't been enumerated as allowed-1 in
    CPUID.
    
    Note, access to the PROVISIONKEY is not yet supported.
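    The enforcement above reduces to an allowed-1 mask check, sketched
    here with illustrative field names: ECREATE is executed on the
    guest's behalf only if every requested MISCSELECT/ATTRIBUTES/XFRM
    bit is enumerated in the guest's CPUID; otherwise KVM injects #GP.

```c
#include <stdint.h>
#include <stdbool.h>

/* Allowed-1 masks derived from the guest's CPUID.0x12 leaves. */
struct sgx_cpuid_model {
	uint32_t miscselect_allowed;
	uint64_t attributes_allowed;
	uint64_t xfrm_allowed;
};

/* true = execute ECREATE on behalf of the guest, false = inject #GP. */
static bool ecreate_allowed(const struct sgx_cpuid_model *m,
			    uint32_t misc, uint64_t attr, uint64_t xfrm)
{
	return !(misc & ~m->miscselect_allowed) &&
	       !(attr & ~m->attributes_allowed) &&
	       !(xfrm & ~m->xfrm_allowed);
}
```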
    
    Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
    Co-developed-by: Kai Huang <kai.huang@intel.com>
    Signed-off-by: Kai Huang <kai.huang@intel.com>
    Sean Christopherson authored and intel-lab-lkp committed Apr 12, 2021
  6. KVM: VMX: Frame in ENCLS handler for SGX virtualization

    Introduce sgx.c and sgx.h, along with the framework for handling ENCLS
    VM-Exits.  Add a bool, enable_sgx, that will eventually be wired up to a
    module param to control whether or not SGX virtualization is enabled at
    runtime.
    
    Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
    Signed-off-by: Kai Huang <kai.huang@intel.com>
    Sean Christopherson authored and intel-lab-lkp committed Apr 12, 2021
  7. KVM: VMX: Add basic handling of VM-Exit from SGX enclave

    Add support for handling VM-Exits that originate from a guest SGX
    enclave.  In SGX, an "enclave" is a new CPL3-only execution environment,
    wherein the CPU and memory state is protected by hardware to make the
    state inaccessible to code running outside of the enclave.  When exiting
    an enclave due to an asynchronous event (from the perspective of the
    enclave), e.g. exceptions, interrupts, and VM-Exits, the enclave's state
    is automatically saved and scrubbed (the CPU loads synthetic state), and
    then reloaded when re-entering the enclave.  E.g. after an instruction
    based VM-Exit from an enclave, vmcs.GUEST_RIP will not contain the RIP
    of the enclave instruction that triggered the VM-Exit, but will instead point
    to a RIP in the enclave's untrusted runtime (the guest userspace code
    that coordinates entry/exit to/from the enclave).
    
    To help a VMM recognize and handle exits from enclaves, SGX adds bits to
    existing VMCS fields, VM_EXIT_REASON.VMX_EXIT_REASON_FROM_ENCLAVE and
    GUEST_INTERRUPTIBILITY_INFO.GUEST_INTR_STATE_ENCLAVE_INTR.  Define the
    new architectural bits, and add a boolean to struct vcpu_vmx to cache
    VMX_EXIT_REASON_FROM_ENCLAVE.  Clear the bit in exit_reason so that
    checks against exit_reason do not need to account for SGX, e.g.
    "if (exit_reason == EXIT_REASON_EXCEPTION_NMI)" continues to work.
    
    KVM is largely a passive observer of the new bits, e.g. KVM needs to
    account for the bits when propagating information to a nested VMM, but
    otherwise doesn't need to act differently for the majority of VM-Exits
    from enclaves.
    
    The one scenario that is directly impacted is emulation, which is for
    all intents and purposes impossible[1] since KVM does not have access to
    the RIP or instruction stream that triggered the VM-Exit.  The inability
    to emulate is a non-issue for KVM, as most instructions that might
    trigger VM-Exit unconditionally #UD in an enclave (before the VM-Exit
    check).  For the few instructions that conditionally #UD, KVM either never
    sets the exiting control, e.g. PAUSE_EXITING[2], or sets it if and only
    if the feature is not exposed to the guest in order to inject a #UD,
    e.g. RDRAND_EXITING.
    
    But, because it is still possible for a guest to trigger emulation,
    e.g. MMIO, inject a #UD if KVM ever attempts emulation after a VM-Exit
    from an enclave.  This is architecturally accurate for instruction
    VM-Exits, and for MMIO it's the least bad choice, e.g. it's preferable
    to killing the VM.  In practice, only broken or particularly stupid
    guests should ever encounter this behavior.
    
    Add a WARN in skip_emulated_instruction to detect any attempt to
    modify the guest's RIP during an SGX enclave VM-Exit as all such flows
    should either be unreachable or must handle exits from enclaves before
    getting to skip_emulated_instruction.
    
    [1] Impossible for all practical purposes.  Not truly impossible
        since KVM could implement some form of para-virtualization scheme.
    
    [2] PAUSE_LOOP_EXITING only affects CPL0 and enclaves exist only at
        CPL3, so we also don't need to worry about that interaction.
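    A sketch of the caching and clearing described above.  Bit 27 of the
    exit reason is the SDM's "enclave mode" bit; masking it off lets
    existing "exit_reason == EXIT_REASON_*" comparisons keep working,
    while the cached boolean preserves the information.  The struct is
    illustrative, not the vcpu_vmx layout.

```c
#include <stdint.h>
#include <stdbool.h>

#define VMX_EXIT_REASON_FROM_ENCLAVE (1u << 27)   /* SDM exit reason bit 27 */

struct exit_info {
	uint32_t exit_reason;     /* basic exit reason, SGX bit cleared */
	bool exit_from_enclave;   /* cached copy of the bit */
};

static struct exit_info decode_exit_reason(uint32_t raw)
{
	struct exit_info info;

	info.exit_from_enclave = raw & VMX_EXIT_REASON_FROM_ENCLAVE;
	info.exit_reason = raw & ~VMX_EXIT_REASON_FROM_ENCLAVE;
	return info;
}
```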
    
    Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
    Signed-off-by: Kai Huang <kai.huang@intel.com>
    Sean Christopherson authored and intel-lab-lkp committed Apr 12, 2021
  8. KVM: x86: Add reverse-CPUID lookup support for scattered SGX features

    Define a new KVM-only feature word for advertising and querying SGX
    sub-features in CPUID.0x12.0x0.EAX.  Because SGX1 and SGX2 are scattered
    in the kernel's feature words, they need to be translated so that the
    bit numbers match those of hardware.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Kai Huang <kai.huang@intel.com>
    sean-jc authored and intel-lab-lkp committed Apr 12, 2021
  9. KVM: x86: Add support for reverse CPUID lookup of scattered features

    Introduce a scheme that allows KVM's CPUID magic to support features
    that are scattered in the kernel's feature words.  To advertise and/or
    query guest support for CPUID-based features, KVM requires the bit
    number of an X86_FEATURE_* to match the bit number in its associated
    CPUID entry.  For scattered features, this does not hold true.
    
    Add a framework to allow defining KVM-only words, stored in
    kvm_cpu_caps after the shared kernel caps, that can be used to gather
    the scattered feature bits by translating X86_FEATURE_* flags into their
    KVM-defined feature.
    
    Note, because reverse_cpuid_check() effectively forces kvm_cpu_caps
    lookups to be resolved at compile time, there is no runtime cost for
    translating from kernel-defined to kvm-defined features.
    
    More details here:  https://lkml.kernel.org/r/X/jxCOLG+HUO4QlZ@google.com
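    The translation reduces to mapping a scattered kernel bit number
    onto the hardware bit position, sketched here.  SGX1 = bit 0 and
    SGX2 = bit 1 of CPUID.0x12.0x0.EAX are architectural; the scattered
    kernel-side bit numbers and names are illustrative stand-ins, not
    the kernel's.

```c
/* Hardware layout of CPUID.0x12.0x0.EAX. */
#define SGX1_HW_BIT 0
#define SGX2_HW_BIT 1

/* Illustrative scattered kernel-side bit numbers. */
enum kernel_feature {
	X86_FEATURE_SGX1_SCATTERED = 11,
	X86_FEATURE_SGX2_SCATTERED = 12,
};

/* Translate a scattered kernel flag into its hardware CPUID bit; in
 * KVM proper this resolves at compile time via reverse_cpuid_check(). */
static int kvm_translate_sgx_feature(enum kernel_feature f)
{
	switch (f) {
	case X86_FEATURE_SGX1_SCATTERED: return SGX1_HW_BIT;
	case X86_FEATURE_SGX2_SCATTERED: return SGX2_HW_BIT;
	}
	return -1;
}
```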
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Kai Huang <kai.huang@intel.com>
    sean-jc authored and intel-lab-lkp committed Apr 12, 2021
  10. KVM: x86: Define new #PF SGX error code bit

    Page faults that are signaled by the SGX Enclave Page Cache Map (EPCM),
    as opposed to the traditional IA32/EPT page tables, set an SGX bit in
    the error code to indicate that the #PF was induced by SGX.  KVM will
    need to emulate this behavior as part of its trap-and-execute scheme for
    virtualizing SGX Launch Control, e.g. to inject SGX-induced #PFs if
    EINIT faults in the host, and to support live migration.
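    A sketch of composing such an error code.  The SGX flag is bit 15 of
    the page-fault error code per the SDM; the P/W/U bits are the
    long-standing low flags.  The example value (a user-mode write to a
    present page, EPCM-induced) is illustrative.

```c
#include <stdint.h>

#define PFERR_PRESENT (1u << 0)
#define PFERR_WRITE   (1u << 1)
#define PFERR_USER    (1u << 2)
#define PFERR_SGX     (1u << 15)   /* EPCM-induced fault, per the SDM */

/* Error code for an SGX-induced #PF on a user-mode write to a
 * present page. */
static uint32_t sgx_pf_error_code(void)
{
	return PFERR_PRESENT | PFERR_WRITE | PFERR_USER | PFERR_SGX;
}
```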
    
    Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
    Signed-off-by: Kai Huang <kai.huang@intel.com>
    Sean Christopherson authored and intel-lab-lkp committed Apr 12, 2021
  11. KVM: x86: Export kvm_mmu_gva_to_gpa_{read,write}() for SGX (VMX)

    Export the gva_to_gpa() helpers for use by SGX virtualization when
    executing ENCLS[ECREATE] and ENCLS[EINIT] on behalf of the guest.
    To execute ECREATE and EINIT, KVM must obtain the GPA of the target
    Secure Enclave Control Structure (SECS) in order to get its
    corresponding HVA.
    
    Because the SECS must reside in the Enclave Page Cache (EPC), copying
    the SECS's data to a host-controlled buffer via existing exported
    helpers is not a viable option as the EPC is not readable or writable
    by the kernel.
    
    SGX virtualization will also use gva_to_gpa() to obtain HVAs for
    non-EPC pages in order to pass user pointers directly to ECREATE and
    EINIT, which avoids having to copy pages worth of data into the kernel.
    
    Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
    Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
    Signed-off-by: Kai Huang <kai.huang@intel.com>
    Sean Christopherson authored and intel-lab-lkp committed Apr 12, 2021

Commits on Apr 2, 2021

  1. KVM: x86: Support KVM VMs sharing SEV context

    Add a capability for userspace to mirror SEV encryption context from
    one VM to another. On our side, this is intended to support a
    Migration Helper vCPU, but it can also be used generically to support
    other in-guest workloads scheduled by the host. The intention is for
    the primary guest and the mirror to have nearly identical memslots.
    
    The primary benefits of this are that:
    1) The VMs do not share KVM contexts (think APIC/MSRs/etc), so they
    can't accidentally clobber each other.
    2) The VMs can have different memory-views, which is necessary for post-copy
    migration (the migration vCPUs on the target need to read and write to
    pages, when the primary guest would VMEXIT).
    
    This does not change the threat model for AMD SEV. Any memory involved
    is still owned by the primary guest and its initial state is still
    attested to through the normal SEV_LAUNCH_* flows. If userspace wanted
    to circumvent SEV, they could achieve the same effect by simply attaching
    a vCPU to the primary VM.
    This patch deliberately leaves userspace in charge of the memslots for the
    mirror, as it already has the power to mess with them in the primary guest.
    
    This patch does not support SEV-ES (much less SNP), as it does not
    handle handing off attested VMSAs to the mirror.
    
    For additional context, we need a Migration Helper because SEV PSP migration
    is far too slow for our live migration on its own. Using an in-guest
    migrator lets us speed this up significantly.
    
    Signed-off-by: Nathan Tempelman <natet@google.com>
    Message-Id: <20210316014027.3116119-1-natet@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Nathan Tempelman authored and bonzini committed Apr 2, 2021
  2. KVM: x86: pending exceptions must not be blocked by an injected event

    Injected interrupts/NMIs should not block a pending exception; rather,
    the pending exception should either be lost if the nested hypervisor
    doesn't intercept it (as on stock x86), or be delivered in the
    exitintinfo/IDT_VECTORING_INFO field as part of the VM-Exit that
    corresponds to the pending exception.
    
    The only reason for an exception to be blocked is when a nested run is
    pending (which can't really happen currently, but is still worth
    checking for).
    
    Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
    Message-Id: <20210401143817.1030695-2-mlevitsk@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Maxim Levitsky authored and bonzini committed Apr 2, 2021
  3. KVM: selftests: remove redundant semi-colon

    Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    Message-Id: <20210401142514.1688199-1-yangyingliang@huawei.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Yang Yingliang authored and bonzini committed Apr 2, 2021
  4. KVM: x86: introduce kvm_register_clear_available

    Small refactoring that will be used in the next patch.
    
    Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
    Message-Id: <20210401141814.1029036-4-mlevitsk@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Maxim Levitsky authored and bonzini committed Apr 2, 2021
  5. KVM: nSVM: call nested_svm_load_cr3 on nested state load

    While KVM's MMU should be fully reset by the loading of nested
    CR0/CR3/CR4 via KVM_SET_SREGS, we are not in nested mode yet when we do
    it, and therefore only root_mmu is reset.
    
    On regular nested entries we call nested_svm_load_cr3, which both
    updates the guest's CR3 in the MMU when needed and re-initializes the
    MMU, which in turn initializes walk_mmu when nested paging is enabled
    in both host and guest.
    
    Since we don't call nested_svm_load_cr3 on nested state load, walk_mmu
    can be left uninitialized.  This can lead to a NULL pointer dereference
    if we happen to get a nested page fault right after entering the nested
    guest for the first time after migration and decide to emulate it, in
    which case the emulator tries to access walk_mmu->gva_to_gpa, which is
    NULL.
    
    Therefore, call this function on nested state load as well.
    
    Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
    Message-Id: <20210401141814.1029036-3-mlevitsk@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Maxim Levitsky authored and bonzini committed Apr 2, 2021
  6. KVM: nVMX: delay loading of PDPTRs to KVM_REQ_GET_NESTED_STATE_PAGES

    Similar to the rest of the guest page accesses after migration, this
    should be delayed to the KVM_REQ_GET_NESTED_STATE_PAGES request.
    
    Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
    Message-Id: <20210401141814.1029036-2-mlevitsk@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Maxim Levitsky authored and bonzini committed Apr 2, 2021
  7. KVM: x86: dump_vmcs should include the autoload/autostore MSR lists

    When dumping the current VMCS state, include the MSRs that are being
    automatically loaded/stored during VM entry/exit.
    
    Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: David Edmondson <david.edmondson@oracle.com>
    Message-Id: <20210318120841.133123-6-david.edmondson@oracle.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    dme authored and bonzini committed Apr 2, 2021
  8. KVM: x86: dump_vmcs should show the effective EFER

    If EFER is not being loaded from the VMCS, show the effective value by
    reference to the MSR autoload list or calculation.
    
    Suggested-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: David Edmondson <david.edmondson@oracle.com>
    Message-Id: <20210318120841.133123-5-david.edmondson@oracle.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    dme authored and bonzini committed Apr 2, 2021
  9. KVM: x86: dump_vmcs should consider only the load controls of EFER/PAT

    When deciding whether to dump the GUEST_IA32_EFER and GUEST_IA32_PAT
    fields of the VMCS, examine only the VM entry load controls, as saving
    on VM exit has no effect on whether VM entry succeeds or fails.
    
    Suggested-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: David Edmondson <david.edmondson@oracle.com>
    Message-Id: <20210318120841.133123-4-david.edmondson@oracle.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    dme authored and bonzini committed Apr 2, 2021
  10. KVM: x86: dump_vmcs should not conflate EFER and PAT presence in VMCS

    Show EFER and PAT based on their individual entry/exit controls.
    
    Signed-off-by: David Edmondson <david.edmondson@oracle.com>
    Message-Id: <20210318120841.133123-3-david.edmondson@oracle.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    dme authored and bonzini committed Apr 2, 2021
  11. KVM: x86: dump_vmcs should not assume GUEST_IA32_EFER is valid

    If the VM entry/exit controls for loading/saving MSR_EFER are either
    not available (an older processor or explicitly disabled) or not
    used (host and guest values are the same), reading GUEST_IA32_EFER
    from the VMCS returns an inaccurate value.
    
    Because of this, in dump_vmcs() don't use GUEST_IA32_EFER to decide
    whether to print the PDPTRs - always do so if the fields exist.
    
    Fixes: 4eb64dc ("KVM: x86: dump VMCS on invalid entry")
    Signed-off-by: David Edmondson <david.edmondson@oracle.com>
    Message-Id: <20210318120841.133123-2-david.edmondson@oracle.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    dme authored and bonzini committed Apr 2, 2021
  12. KVM: nSVM: improve SYSENTER emulation on AMD

    Currently, to support Intel->AMD migration, if the CPU vendor is
    GenuineIntel we emulate the full 64-bit value of the
    MSR_IA32_SYSENTER_{EIP|ESP} MSRs, and we also emulate the
    sysenter/sysexit instructions in long mode.
    
    (The emulator still refuses to emulate sysenter in 64-bit mode, on the
    grounds that the code for that wasn't tested and likely has no users.)
    
    However, when virtual vmload/vmsave is enabled, the vmload instruction
    will update these 32-bit MSRs without triggering their MSR intercept,
    leaving stale values in KVM's shadow copy of these MSRs, which relies
    on the intercept to stay up to date.
    
    Fix/optimize this by doing the following:
    
    1. Enable the MSR intercepts for the SYSENTER MSRs iff vendor=GenuineIntel.
       (This is both a tiny optimization and ensures that if the guest CPU
       vendor is AMD, the MSRs will be 32 bits wide, as AMD defined.)
    
    2. Store only the high 32-bit part of these MSRs on interception, and
       combine it with the hardware MSR value on intercepted reads/writes
       iff vendor=GenuineIntel.
    
    3. Disable vmload/vmsave virtualization if vendor=GenuineIntel.
       (It is somewhat insane to set vendor=GenuineIntel and still enable
       SVM for the guest, but so be it.)  Then zero the high 32-bit parts
       when KVM intercepts and emulates vmload.
    
    Thanks a lot to Paolo Bonzini for helping me with fixing this in the most
    correct way.
    
    This patch fixes nested migration of 32-bit nested guests, which was
    broken because incorrect cached values of the SYSENTER MSRs were stored
    in the migration stream if L1 changed these MSRs with vmload prior to
    L2 entry.
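    A sketch of the split/combine in step 2 above: KVM keeps only the
    high 32 bits in software and merges them with the 32-bit value the
    hardware holds, so vmload writes that bypass the intercept are never
    lost.  Function names are illustrative, not KVM's.

```c
#include <stdint.h>

/* On an intercepted read, merge KVM's shadowed high half with the
 * hardware's low 32 bits. */
static uint64_t sysenter_read(uint32_t hi_shadow, uint64_t hw_value)
{
	return ((uint64_t)hi_shadow << 32) | (uint32_t)hw_value;
}

/* On an intercepted write, split: the low half goes to hardware, the
 * high half to the shadow. */
static void sysenter_write(uint64_t data, uint32_t *hi_shadow,
			   uint64_t *hw_value)
{
	*hi_shadow = (uint32_t)(data >> 32);
	*hw_value  = (uint32_t)data;
}
```

    Note that only the low 32 bits of the hardware value are ever used,
    matching the 32-bit-wide MSRs that vmload updates.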
    
    Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
    Message-Id: <20210401111928.996871-3-mlevitsk@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Maxim Levitsky authored and bonzini committed Apr 2, 2021
  13. KVM: x86: add guest_cpuid_is_intel

    This is similar to the existing 'guest_cpuid_is_amd_or_hygon'.
    
    Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
    Message-Id: <20210401111928.996871-2-mlevitsk@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Maxim Levitsky authored and bonzini committed Apr 2, 2021
  14. KVM: x86: Account a variety of miscellaneous allocations

    Switch to GFP_KERNEL_ACCOUNT for a handful of allocations that are
    clearly associated with a single task/VM.
    
    Note, there are several SEV allocations that aren't accounted, but
    those can (hopefully) be fixed by using the local stack for memory.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20210331023025.2485960-3-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    sean-jc authored and bonzini committed Apr 2, 2021
  15. KVM: SVM: Do not allow SEV/SEV-ES initialization after vCPUs are created

    Reject KVM_SEV_INIT and KVM_SEV_ES_INIT if they are attempted after one
    or more vCPUs have been created.  KVM assumes a VM is tagged SEV/SEV-ES
    prior to vCPU creation, e.g. init_vmcb() needs to mark the VMCB as SEV
    enabled, and svm_create_vcpu() needs to allocate the VMSA.  At best,
    creating vCPUs before SEV/SEV-ES init will lead to unexpected errors
    and/or behavior, and at worst it will crash the host, e.g.
    sev_launch_update_vmsa() will dereference a null svm->vmsa pointer.
    
    Fixes: 1654efc ("KVM: SVM: Add KVM_SEV_INIT command")
    Fixes: ad73109 ("KVM: SVM: Provide support to launch and run an SEV-ES guest")
    Cc: stable@vger.kernel.org
    Cc: Brijesh Singh <brijesh.singh@amd.com>
    Cc: Tom Lendacky <thomas.lendacky@amd.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20210331031936.2495277-4-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    sean-jc authored and bonzini committed Apr 2, 2021
  16. KVM: SVM: Do not set sev->es_active until KVM_SEV_ES_INIT completes

    Set sev->es_active only after the guts of KVM_SEV_ES_INIT succeeds.  If
    the command fails, e.g. because SEV is already active or there are no
    available ASIDs, then es_active will be left set even though the VM is
    not fully SEV-ES capable.
    
    Refactor the code so that "es_active" is passed on the stack instead of
    being prematurely shoved into sev_info, both to avoid having to unwind
    sev_info and so that it's more obvious what actually consumes es_active
    in sev_guest_init() and its helpers.
    
    Fixes: ad73109 ("KVM: SVM: Provide support to launch and run an SEV-ES guest")
    Cc: stable@vger.kernel.org
    Cc: Brijesh Singh <brijesh.singh@amd.com>
    Cc: Tom Lendacky <thomas.lendacky@amd.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20210331031936.2495277-3-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    sean-jc authored and bonzini committed Apr 2, 2021
  17. KVM: SVM: Use online_vcpus, not created_vcpus, to iterate over vCPUs

    Use the kvm_for_each_vcpu() helper to iterate over vCPUs when encrypting
    VMSAs for SEV, which effectively switches to use online_vcpus instead of
    created_vcpus.  This fixes a possible null-pointer dereference as
    created_vcpus does not guarantee a vCPU exists, since it is updated at
    the very beginning of KVM_CREATE_VCPU.  created_vcpus exists to allow the
    bulk of vCPU creation to run in parallel, while still correctly
    restricting the maximum number of vCPUs.
    
    Fixes: ad73109 ("KVM: SVM: Provide support to launch and run an SEV-ES guest")
    Cc: stable@vger.kernel.org
    Cc: Brijesh Singh <brijesh.singh@amd.com>
    Cc: Tom Lendacky <thomas.lendacky@amd.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20210331031936.2495277-2-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    sean-jc authored and bonzini committed Apr 2, 2021
  18. KVM: x86/mmu: Simplify code for aging SPTEs in TDP MMU

    Use a basic NOT+AND sequence to clear the Accessed bit in TDP MMU SPTEs,
    as opposed to the fancy ffs()+clear_bit() logic that was copied from the
    legacy MMU.  The legacy MMU uses clear_bit() because it is operating on
    the SPTE itself, i.e. clearing needs to be atomic.  The TDP MMU operates
    on a local variable that it later writes to the SPTE, and so doesn't need
    to be atomic or even resident in memory.
    
    Opportunistically drop unnecessary initialization of new_spte, it's
    guaranteed to be written before being accessed.
    
    Using NOT+AND instead of ffs()+clear_bit() reduces the sequence from:
    
       0x0000000000058be6 <+134>:	test   %rax,%rax
       0x0000000000058be9 <+137>:	je     0x58bf4 <age_gfn_range+148>
       0x0000000000058beb <+139>:	test   %rax,%rdi
       0x0000000000058bee <+142>:	je     0x58cdc <age_gfn_range+380>
       0x0000000000058bf4 <+148>:	mov    %rdi,0x8(%rsp)
       0x0000000000058bf9 <+153>:	mov    $0xffffffff,%edx
       0x0000000000058bfe <+158>:	bsf    %eax,%edx
       0x0000000000058c01 <+161>:	movslq %edx,%rdx
       0x0000000000058c04 <+164>:	lock btr %rdx,0x8(%rsp)
       0x0000000000058c0b <+171>:	mov    0x8(%rsp),%r15
    
    to:
    
       0x0000000000058bdd <+125>:	test   %rax,%rax
       0x0000000000058be0 <+128>:	je     0x58beb <age_gfn_range+139>
       0x0000000000058be2 <+130>:	test   %rax,%r8
       0x0000000000058be5 <+133>:	je     0x58cc0 <age_gfn_range+352>
       0x0000000000058beb <+139>:	not    %rax
       0x0000000000058bee <+142>:	and    %r8,%rax
       0x0000000000058bf1 <+145>:	mov    %rax,%r15
    
    thus eliminating several memory accesses, including a locked access.
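    The NOT+AND sequence above in C: the TDP MMU clears the Accessed bit
    in a local copy of the SPTE, whereas the legacy MMU's
    ffs()+clear_bit() does a locked read-modify-write on the SPTE in
    memory.  The mask value below is the EPT Accessed bit (bit 8), used
    here for illustration.

```c
#include <stdint.h>

#define SHADOW_ACCESSED_MASK (1ULL << 8)   /* EPT A bit; illustrative */

/* Age an SPTE: clear the Accessed bit in a local value that the
 * caller later writes to the SPTE.  No atomics needed. */
static uint64_t age_spte(uint64_t old_spte)
{
	return old_spte & ~SHADOW_ACCESSED_MASK;
}
```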
    
    Cc: Ben Gardon <bgardon@google.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20210331004942.2444916-3-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    sean-jc authored and bonzini committed Apr 2, 2021
  19. KVM: x86/mmu: Remove spurious clearing of dirty bit from TDP MMU SPTE

    Don't clear the dirty bit when aging a TDP MMU SPTE (in response to an MMU
    notifier event).  Prematurely clearing the dirty bit could cause spurious
    PML updates if aging a page happened to coincide with dirty logging.
    
    Note, tdp_mmu_set_spte_no_acc_track() flows into __handle_changed_spte(),
    so the host PFN will be marked dirty, i.e. there is no potential for data
    corruption.
    
    Fixes: a6a0b05 ("kvm: x86/mmu: Support dirty logging for the TDP MMU")
    Cc: Ben Gardon <bgardon@google.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20210331004942.2444916-2-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    sean-jc authored and bonzini committed Apr 2, 2021
  20. KVM: x86/mmu: Drop trace_kvm_age_page() tracepoint

    Remove x86's trace_kvm_age_page() tracepoint.  It's mostly redundant with
    the common trace_kvm_age_hva() tracepoint, and if there is a need for the
    extra details, e.g. gfn, referenced, etc... those details should be added
    to the common tracepoint so that all architectures and MMUs benefit from
    the info.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20210326021957.1424875-19-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    sean-jc authored and bonzini committed Apr 2, 2021
  21. KVM: Move arm64's MMU notifier trace events to generic code

    Move arm64's MMU notifier trace events into common code in preparation
    for doing the hva->gfn lookup in common code.  The alternative would be
    to trace the gfn instead of hva, but that's not obviously better and
    could also be done in common code.  Tracing the notifiers is also quite
    handy for debug regardless of architecture.
    
    Remove a completely redundant tracepoint from PPC e500.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20210326021957.1424875-10-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    sean-jc authored and bonzini committed Apr 2, 2021
  22. KVM: Move prototypes for MMU notifier callbacks to generic code

    Move the prototypes for the MMU notifier callbacks out of arch code and
    into common code.  There is no benefit to having each arch replicate the
    prototypes since any deviation from the invocation in common code will
    explode.
    
    No functional change intended.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20210326021957.1424875-9-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    sean-jc authored and bonzini committed Apr 2, 2021
  23. KVM: x86/mmu: Use leaf-only loop for walking TDP SPTEs when changing SPTE

    
    Use the leaf-only TDP iterator when changing the SPTE in reaction to an
    MMU notifier.  Practically speaking, this is a nop since the guts of the
    loop explicitly looks for 4k SPTEs, which are always leaf SPTEs.  Switch
    the iterator to match age_gfn_range() and test_age_gfn() so that a future
    patch can consolidate the core iterating logic.
    
    No real functional change intended.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20210326021957.1424875-8-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    sean-jc authored and bonzini committed Apr 2, 2021
  24. KVM: x86/mmu: Pass address space ID to TDP MMU root walkers

    Move the address space ID check that is performed when iterating over
    roots into the macro helpers to consolidate code.
    
    No functional change intended.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20210326021957.1424875-7-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    sean-jc authored and bonzini committed Apr 2, 2021