Commits on Jun 22, 2021

  1. KVM: x86/mmu: Let guest use GBPAGES if supported in hardware and TDP is on

    Let the guest use 1g hugepages if TDP is enabled and the host supports
    GBPAGES; KVM can't actively prevent the guest from using 1g pages in this
    case since they can't be disabled in the hardware page walker.  While
    injecting a page fault if a bogus 1g page is encountered during a
    software page walk is perfectly reasonable since KVM is simply honoring
    userspace's vCPU model, doing so arguably doesn't provide any meaningful
    value, and at worst will be horribly confusing as the guest will see
    inconsistent behavior and seemingly spurious page faults.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
  2. KVM: x86/mmu: Get CR4.SMEP from MMU, not vCPU, in shadow page fault

    Use the current MMU instead of vCPU state to query CR4.SMEP when handling
    a page fault.  In the nested NPT case, the current CR4.SMEP reflects L2,
    whereas the page fault is shadowing L1's NPT, which uses L1's hCR4.
    Practically speaking, this is a nop, as NPT walks are always user faults,
    i.e. this code will never be reached, but fix it up for consistency.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
  3. KVM: x86/mmu: Get CR0.WP from MMU, not vCPU, in shadow page fault

    Use the current MMU instead of vCPU state to query CR0.WP when handling
    a page fault.  In the nested NPT case, the current CR0.WP reflects L2,
    whereas the page fault is shadowing L1's NPT.  Practically speaking, this
    is a nop, as NPT walks are always user faults, but fix it up for
    consistency.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
  4. KVM: x86/mmu: Drop redundant rsvd bits reset for nested NPT

    Drop the extra reset of shadow_zero_bits in the nested NPT flow now
    that shadow_mmu_init_context computes the correct level for nested NPT.
    
    No functional change intended.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
  5. KVM: x86/mmu: Optimize and clean up so called "last nonleaf level" logic

    Drop the pre-computed last_nonleaf_level, which is arguably wrong and at
    best confusing.  Per the comment:
    
      Can have large pages at levels 2..last_nonleaf_level-1.
    
    the intent of the variable would appear to be to track what levels can
    _legally_ have large pages, but that intent doesn't align with reality.
    The computed value will be wrong for 5-level paging, or if 1gb pages are
    not supported.
    
    The flawed code is not a problem in practice, because except for 32-bit
    PSE paging, bit 7 is reserved if large pages aren't supported at the
    level.  Take advantage of this invariant and simply omit the level magic
    math for 64-bit page tables (including PAE).
    
    For 32-bit paging (non-PAE), the adjustments are needed purely because
    bit 7 is ignored if PSE=0.  Retain that logic as is, but make
    is_last_gpte() unique per PTTYPE so that the PSE check is avoided for
    PAE and EPT paging.  In the spirit of avoiding branches, bump the "last
    nonleaf level" for 32-bit PSE paging by adding the PSE bit itself.
    
    Note, bit 7 is ignored or has other meaning in CR3/EPTP, but despite
    FNAME(walk_addr_generic) briefly grabbing CR3/EPTP in "pte", they are
    not PTEs and will blow up all the other gpte helpers.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
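    The bit-7 invariant the patch relies on can be sketched as follows.  This
    is a simplified stand-in for KVM's per-PTTYPE is_last_gpte(), with
    illustrative constants and function names, not the kernel's actual code:

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define PT_PAGE_SIZE_SHIFT 7
    #define PT_PAGE_SIZE_MASK  (1ULL << PT_PAGE_SIZE_SHIFT)

    /*
     * 64-bit/PAE/EPT flavor: bit 7 is reserved at any level that cannot
     * hold a huge page, so if the walker reached this entry with bit 7
     * set, the entry is a valid leaf -- no level math is needed.
     */
    static bool is_last_gpte_64(uint64_t gpte, int level)
    {
        return level == 1 || (gpte & PT_PAGE_SIZE_MASK);
    }

    /*
     * 32-bit non-PAE flavor: bit 7 is *ignored* when CR4.PSE=0, so it
     * must be masked off unless PSE is enabled.  (The kernel folds the
     * PSE check in differently to avoid a branch; this shows the idea.)
     */
    static bool is_last_gpte_32(uint32_t gpte, int level, bool cr4_pse)
    {
        uint32_t ps = cr4_pse ? (gpte & PT_PAGE_SIZE_MASK) : 0;

        return level == 1 || ps;
    }
    ```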
  6. KVM: x86: Enhance comments for MMU roles and nested transition trickiness

    Expand the comments for the MMU roles.  The interactions with gfn_track
    and PGD reuse in particular are hairy.
    
    Regarding PGD reuse, add comments in the nested virtualization flows to
    call out why kvm_init_mmu() is unconditionally called even when nested
    TDP is used.
    
    Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
  7. KVM: x86/mmu: WARN on any reserved SPTE value when making a valid SPTE

    Replace make_spte()'s WARN on a collision with the magic MMIO value with
    a generic WARN on reserved bits being set (including EPT's reserved WX
    combination).  Warning on any reserved bits covers MMIO, A/D tracking
    bits with PAE paging, and in theory any future goofs that are introduced.
    
    Opportunistically convert to ONCE behavior to avoid spamming the kernel
    log, odds are very good that if KVM screws up one SPTE, it will botch all
    SPTEs for the same MMU.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
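    The shape of the check can be sketched as below.  The masks and the
    write-without-read pattern are illustrative stand-ins; KVM's real
    reserved-bit tables are computed per MMU and are far more involved:

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative masks only. */
    #define SPTE_RSVD_MASK 0x7ff0000000000000ULL
    #define EPT_R          (1ULL << 0)
    #define EPT_W          (1ULL << 1)

    static bool warned_once;

    /*
     * "Reserved" covers any high reserved bit, plus a write-without-read
     * combination standing in for EPT's reserved encoding.
     */
    static bool is_rsvd_spte(uint64_t spte)
    {
        return (spte & SPTE_RSVD_MASK) ||
               ((spte & EPT_W) && !(spte & EPT_R));
    }

    /*
     * ONCE semantics: if KVM botches one SPTE it has likely botched them
     * all for this MMU, so a single log line is enough.
     */
    static bool warn_on_rsvd_spte(uint64_t spte)
    {
        bool bad = is_rsvd_spte(spte);

        if (bad && !warned_once) {
            warned_once = true;
            fprintf(stderr, "reserved bits set on spte %#llx\n",
                    (unsigned long long)spte);
        }
        return bad;
    }
    ```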
  8. KVM: x86/mmu: Add helpers to do full reserved SPTE checks w/ generic MMU

    Extract the reserved SPTE check and print helpers in get_mmio_spte() to
    new helpers so that KVM can also WARN on reserved badness when making a
    SPTE.
    
    Tag the checking helper with __always_inline to improve the probability
    of the compiler generating optimal code for the checking loop, e.g. gcc
    appears to avoid using %rbp when the helper is tagged with a vanilla
    "inline".
    
    No functional change intended.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
  9. KVM: x86/mmu: Use MMU's role to determine PTTYPE

    Use the MMU's role instead of vCPU state or role_regs to determine the
    PTTYPE, i.e. which helpers to wire up.
    
    No functional change intended.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
  10. KVM: x86/mmu: Collapse 32-bit PAE and 64-bit statements for helpers

    Skip paging32E_init_context() and paging64_init_context_common() and go
    directly to paging64_init_context() (was the common version) now that
    the relevant flows don't need to distinguish between 64-bit and 32-bit
    PAE for other reasons.
    
    No functional change intended.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
  11. KVM: x86/mmu: Add a helper to calculate root from role_regs

    Add a helper to calculate the level for non-EPT page tables from the
    MMU's role_regs.
    
    No functional change intended.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
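    The level calculation can be sketched as follows, using a hypothetical
    flattened snapshot of the relevant bits; in KVM these come out of the
    MMU's role_regs rather than live vCPU state:

    ```c
    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical snapshot of the control bits the helper consumes. */
    struct regs_snapshot {
        bool cr0_pg;    /* paging enabled */
        bool cr4_pae;
        bool cr4_la57;
        bool efer_lma;  /* long mode active */
    };

    static int role_regs_to_root_level(const struct regs_snapshot *r)
    {
        if (!r->cr0_pg)
            return 0;                    /* no paging, no guest walk */
        if (r->efer_lma)
            return r->cr4_la57 ? 5 : 4;  /* 5- vs 4-level long mode */
        if (r->cr4_pae)
            return 3;                    /* 32-bit PAE */
        return 2;                        /* 32-bit legacy paging */
    }
    ```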
  12. KVM: x86/mmu: Add helper to update paging metadata

    Consolidate MMU guest metadata updates into a common helper for TDP,
    shadow, and nested MMUs.
    
    No functional change intended.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
  13. KVM: x86/mmu: Don't update nested guest's paging bitmasks if CR0.PG=0

    Don't bother updating the bitmasks and last-leaf information if paging is
    disabled as the metadata will never be used.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
  14. KVM: x86/mmu: Consolidate reset_rsvds_bits_mask() calls

    Move calls to reset_rsvds_bits_mask() out of the various mode statements
    and under a more generic !CR0.PG check.  This will allow for additional
    code consolidation in the future.
    
    No functional change intended.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
  15. KVM: x86/mmu: Use MMU role_regs to get LA57, and drop vCPU LA57 helper

    Get LA57 from the role_regs, which are initialized from the vCPU even
    though TDP is enabled, instead of pulling the value directly from the
    vCPU when computing the guest's root_level for TDP MMUs.  Note, the check
    is inside an is_long_mode() statement, so that requirement is not lost.
    
    Use role_regs even though the MMU's role is available and arguably
    "better".  A future commit will consolidate the guest root level logic,
    and it needs access to EFER.LMA, which is not tracked in the role (it
    can't be toggled on VM-Exit, unlike LA57).
    
    Drop is_la57_mode() as there are no remaining users, and to discourage
    pulling MMU state from the vCPU (in the future).
    
    No functional change intended.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
  16. KVM: x86/mmu: Get nested MMU's root level from the MMU's role

    Initialize the MMU's (guest) root_level using its mmu_role instead of
    redoing the calculations.  The role_regs used to calculate the mmu_role
    are initialized from the vCPU, i.e. this should be a complete nop.
    
    No functional change intended.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
  17. KVM: x86/mmu: Drop "nx" from MMU context now that there are no readers

    Drop kvm_mmu.nx as there are no consumers left.
    
    No functional change intended.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
  18. KVM: x86/mmu: Use MMU's role to get EFER.NX during MMU configuration

    Get the MMU's effective EFER.NX from its role instead of using the
    one-off, dedicated flag.  This will allow dropping said flag in a
    future commit.
    
    No functional change intended.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
  19. KVM: x86/mmu: Use MMU's role/role_regs to compute context's metadata

    Use the MMU's role and role_regs to calculate the MMU's guest root level
    and NX bit.  For some flows, the vCPU state may not be correct (or
    relevant), e.g. EPT doesn't interact with EFER.NX and nested NPT will
    configure the guest_mmu with possibly-stale vCPU state.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
  20. KVM: x86/mmu: Use MMU's role to detect EFER.NX in guest page walk

    Use the NX bit from the MMU's role instead of the MMU itself so that the
    redundant, dedicated "nx" flag can be dropped.
    
    No functional change intended.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
  21. KVM: x86/mmu: Use MMU's roles to compute last non-leaf level

    Use the MMU's role to get CR4.PSE when determining the last level at
    which the guest _cannot_ create a non-leaf PTE, i.e. cannot create a
    huge page.
    
    Note, the existing logic is arguably wrong when considering 5-level
    paging and the case where 1gb pages aren't supported.  In practice, the
    logic is confusing but not broken, because except for 32-bit non-PAE
    paging, the PAGE_SIZE bit is reserved when a huge page isn't supported at
    that level.  I.e. PAGE_SIZE=1 will terminate the guest walk one way or
    another.  Furthermore, last_nonleaf_level is only consulted after KVM has
    verified there are no reserved bits set.
    
    All that confusion will be addressed in a future patch by dropping
    last_nonleaf_level entirely.  For now, massage the code to continue the
    march toward using mmu_role for (almost) all MMU computations.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
  22. KVM: x86/mmu: Use MMU's role to compute PKRU bitmask

    Use the MMU's role to calculate the Protection Keys (Restrict Userspace)
    bitmask instead of pulling bits from current vCPU state.  For some flows,
    the vCPU state may not be correct (or relevant), e.g. EPT doesn't
    interact with PKRU.  Case in point, the "ept" param simply disappears.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
  23. KVM: x86/mmu: Use MMU's role to compute permission bitmask

    Use the MMU's role to generate the permission bitmasks for the MMU.
    For some flows, the vCPU state may not be correct (or relevant), e.g.
    the nested NPT MMU can be initialized with incoherent vCPU state.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
  24. KVM: x86/mmu: Drop vCPU param from reserved bits calculator

    Drop the vCPU param from __reset_rsvds_bits_mask() as it's now unused,
    and ideally will remain unused in the future.  Any information that's
    needed by the low level helper should be explicitly provided as it's used
    for both shadow/host MMUs and guest MMUs, i.e. vCPU state may be
    meaningless or simply wrong.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
  25. KVM: x86/mmu: Use MMU's role to get CR4.PSE for computing rsvd bits

    Use the MMU's role to get CR4.PSE when calculating reserved bits for the
    guest's PTEs.  Practically speaking, this is a glorified nop as the role
    always comes from vCPU state for the relevant flows, but converting to
    the roles will provide consistency once everything else is converted, and
    will Just Work if the "always comes from vCPU" behavior were ever to
    change (unlikely).
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
  26. KVM: x86/mmu: Don't grab CR4.PSE for calculating shadow reserved bits

    Unconditionally pass pse=false when calculating reserved bits for shadow
    PTEs.  CR4.PSE is only relevant for 32-bit non-PAE paging, which KVM does
    not use for shadow paging (including nested NPT).
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
  27. KVM: x86/mmu: Always set new mmu_role immediately after checking old role

    Refactor shadow MMU initialization to immediately set its new mmu_role
    after verifying it differs from the old role, and so that all flavors
    of MMU initialization share the same check-and-set pattern.  Immediately
    setting the role will allow future commits to use mmu_role to configure
    the MMU without consuming stale state.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
  28. KVM: x86/mmu: Set CR4.PKE/LA57 in MMU role iff long mode is active

    Don't set cr4_pke or cr4_la57 in the MMU role if long mode isn't active,
    which is required for protection keys and 5-level paging to be fully
    enabled.  Ignoring the bit avoids unnecessary reconfiguration on reuse,
    and also means consumers of mmu_role don't need to manually check for
    long mode.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
  29. KVM: x86/mmu: Do not set paging-related bits in MMU role if CR0.PG=0

    Don't set CR0/CR4/EFER bits in the MMU role if paging is disabled; paging
    modifiers are irrelevant if there is no paging in the first place.
    Somewhat arbitrarily clear gpte_is_8_bytes for shadow paging if paging is
    disabled in the guest.  Again, there are no guest PTEs to process, so the
    size is meaningless.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
  30. KVM: x86/mmu: Add helpers to query mmu_role bits

    Add helpers via a builder macro for all mmu_role bits that track a CR0,
    CR4, or EFER bit.  Digging out the bits manually is not exactly the most
    readable code.
    
    Future commits will switch to using mmu_role instead of vCPU state to
    configure the MMU, i.e. there are about to be a large number of users.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
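    The builder-macro idea can be sketched like so, on a miniature role
    layout; the field set and the macro shape are patterned after the
    patch, not copied from it:

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical mini mmu_role: one bit per tracked CR0/CR4/EFER flag. */
    union mmu_role {
        uint32_t word;
        struct {
            uint32_t cr0_wp:1;
            uint32_t cr4_pse:1;
            uint32_t cr4_smep:1;
            uint32_t efer_nx:1;
        } base;
    };

    /*
     * Builder macro: stamp out one accessor per <reg>_<bit> role field
     * instead of open-coding the bit digs at every call site.
     */
    #define BUILD_MMU_ROLE_ACCESSOR(reg, name)                    \
    static bool is_##reg##_##name(const union mmu_role *role)     \
    {                                                             \
        return !!role->base.reg##_##name;                         \
    }

    BUILD_MMU_ROLE_ACCESSOR(cr0, wp)
    BUILD_MMU_ROLE_ACCESSOR(cr4, pse)
    BUILD_MMU_ROLE_ACCESSOR(cr4, smep)
    BUILD_MMU_ROLE_ACCESSOR(efer, nx)
    ```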
  31. KVM: x86/mmu: Rename "nxe" role bit to "efer_nx" for macro shenanigans

    Rename "nxe" to "efer_nx" so that future macro magic can use the pattern
    <reg>_<bit> for all CR0, CR4, and EFER bits that are included in the role.
    Using "efer_nx" also makes it clear that the role bit reflects EFER.NX,
    not the NX bit in the corresponding PTE.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
  32. KVM: x86/mmu: Use MMU's role_regs, not vCPU state, to compute mmu_role

    Use the provided role_regs to calculate the mmu_role instead of pulling
    bits from current vCPU state.  For some flows, e.g. nested TDP, the vCPU
    state may not be correct (or relevant).
    
    Cc: Maxim Levitsky <mlevitsk@redhat.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
  33. KVM: x86/mmu: Ignore CR0 and CR4 bits in nested EPT MMU role

    Do not incorporate CR0/CR4 bits into the role for the nested EPT MMU, as
    EPT behavior is not influenced by CR0/CR4.  Note, this is the guest_mmu
    (L1's EPT), not the nested_mmu (L2's IA-32 paging); the nested_mmu does need
    CR0/CR4, and is initialized in a separate flow.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
  34. KVM: x86/mmu: Consolidate misc updates into shadow_mmu_init_context()

    Consolidate the MMU metadata update calls to deduplicate code, and to
    prep for future cleanup.
    
    No functional change intended.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
  35. KVM: x86/mmu: Add struct and helpers to retrieve MMU role bits from regs

    Introduce "struct kvm_mmu_role_regs" to hold the register state that is
    incorporated into the mmu_role.  For nested TDP, the register state that
    is factored into the MMU isn't vCPU state; the dedicated struct will be
    used to propagate the correct state throughout the flows without having
    to pass multiple params, and also provides helpers for the various flag
    accessors.
    
    Intentionally make the new helpers cumbersome/ugly by prepending four
    underscores.  In the not-too-distant future, it will be preferable to use
    the mmu_role to query bits as the mmu_role can drop irrelevant bits
    without creating contradictions, e.g. clearing CR4 bits when CR0.PG=0.
    Reserve the clean helper names (no underscores) for the mmu_role.
    
    Add a helper for vCPU conversion, which is the common case.
    
    No functional change intended.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and intel-lab-lkp committed Jun 22, 2021
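    The struct and its deliberately ugly raw accessors can be sketched as
    follows; the bit positions are the architectural ones, but the
    function bodies are illustrative rather than the kernel's code:

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define X86_CR0_PG  (1UL << 31)
    #define X86_CR4_PAE (1UL << 5)
    #define EFER_NX     (1ULL << 11)

    /*
     * Register snapshot fed into the mmu_role computation; for nested TDP
     * this is deliberately NOT the vCPU's live state.
     */
    struct kvm_mmu_role_regs {
        unsigned long cr0;
        unsigned long cr4;
        uint64_t efer;
    };

    /*
     * Four leading underscores on purpose: the raw accessors are meant to
     * look cumbersome so future code reaches for the mmu_role instead.
     */
    static bool ____is_cr0_pg(const struct kvm_mmu_role_regs *regs)
    {
        return !!(regs->cr0 & X86_CR0_PG);
    }

    static bool ____is_cr4_pae(const struct kvm_mmu_role_regs *regs)
    {
        return !!(regs->cr4 & X86_CR4_PAE);
    }

    static bool ____is_efer_nx(const struct kvm_mmu_role_regs *regs)
    {
        return !!(regs->efer & EFER_NX);
    }
    ```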