
Commits on Aug 13, 2021

  1. KVM: Optimize overlapping memslots check

    Do a quick lookup for possibly overlapping gfns when creating or moving
    a memslot instead of performing a linear scan of the whole memslot set.
    
    Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
    maciejsszmigiero authored and intel-lab-lkp committed Aug 13, 2021
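The lookup this commit describes can be sketched in plain C. This is a hedged illustration, not the kernel code: a sorted array stands in for KVM's gfn-ordered memslot set, and `range_overlaps` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

struct slot { unsigned long base_gfn; unsigned long npages; };

/*
 * Slots are sorted by base_gfn and never overlap.  Binary-search for the
 * first slot whose end lies above 'start'; a single comparison then tells
 * whether [start, end) intersects it: O(log n) instead of a linear scan.
 */
static int range_overlaps(const struct slot *slots, size_t n,
                          unsigned long start, unsigned long end)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (slots[mid].base_gfn + slots[mid].npages <= start)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo < n && slots[lo].base_gfn < end;
}
```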
  2. KVM: Optimize gfn lookup in kvm_zap_gfn_range()

    Introduce a memslots gfn upper bound operation and use it to optimize
    kvm_zap_gfn_range().
    This way the handler can do a quick lookup for intersecting gfns and won't
    have to do a linear scan of the whole memslot set.
    
    Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
    maciejsszmigiero authored and intel-lab-lkp committed Aug 13, 2021
  3. KVM: Keep memslots in tree-based structures instead of array-based ones

    The current memslot code uses a (reverse gfn-ordered) memslot array for
    keeping track of them.
    
    Because the memslot array that is currently in use cannot be modified,
    every memslot management operation (create, delete, move, change flags)
    has to make a copy of the whole array so it has a scratch copy to work on.
    
    Strictly speaking, however, it is only necessary to make a copy of the
    memslot that is being modified; copying all the memslots currently present
    is just a limitation of the array-based memslot implementation.
    
    Two memslot sets, however, are still needed so the VM continues to run
    on the currently active set while the requested operation is being
    performed on the second, currently inactive one.
    
    In order to have two memslot sets, but only one copy of the actual
    memslots, it is necessary to split the memslot data out of the memslot
    sets.
    
    The memslots themselves should also be kept independent of each other
    so they can be individually added or deleted.
    
    These two memslot sets should normally point to the same set of
    memslots. They can, however, be desynchronized when performing a
    memslot management operation by replacing the memslot to be modified
    by its copy.  After the operation is complete, both memslot sets once
    again point to the same, common set of memslot data.
    
    This commit implements the aforementioned idea.
    
    For tracking gfns an ordinary rbtree is used, since memslots cannot
    overlap in the guest address space and so this data structure is
    sufficient for ensuring that lookups are done quickly.
    
    The "last used slot" mini-caches (both per-slot set one and per-vCPU one),
    that keep track of the last found-by-gfn memslot, are still present in the
    new code.
    
    Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
    Co-developed-by: Sean Christopherson <seanjc@google.com>
    maciejsszmigiero authored and intel-lab-lkp committed Aug 13, 2021
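The two-set scheme can be illustrated with a minimal sketch (hypothetical names, plain C, a fixed-size pointer array instead of the kernel's trees): both sets hold pointers into shared slot data, and an update duplicates only the slot being modified before the sets are swapped.

```c
#include <assert.h>
#include <stddef.h>

struct slot_data { unsigned long base_gfn; unsigned long flags; };

/* A "set" is just an array of pointers into shared slot data. */
struct slot_set {
    struct slot_data *slots[8];
    size_t n;
};

struct vm {
    struct slot_set sets[2];
    int active;                 /* set the guest currently runs on */
};

/*
 * Replace one slot in the inactive set with a scratch copy and swap the
 * sets: only the modified slot is duplicated, never the whole array, and
 * afterwards both sets again share everything except the replaced slot.
 */
static void replace_and_swap(struct vm *vm, size_t idx,
                             struct slot_data *scratch)
{
    struct slot_set *inactive = &vm->sets[!vm->active];

    *inactive = vm->sets[vm->active];   /* copies pointers only */
    inactive->slots[idx] = scratch;
    vm->active = !vm->active;
}
```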
  4. KVM: s390: Introduce kvm_s390_get_gfn_end()

    And use it where s390 code would just access the memslot with the highest
    gfn directly.
    
    No functional change intended.
    
    Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
    maciejsszmigiero authored and intel-lab-lkp committed Aug 13, 2021
  5. KVM: Use interval tree to do fast hva lookup in memslots

    The current memslots implementation only allows a quick binary search by
    gfn; a quick lookup by hva is not possible, as the implementation has to
    do a linear scan of the whole memslots array, even though the operation
    being performed might apply to just a single memslot.
    
    This significantly hurts performance of per-hva operations with higher
    memslot counts.
    
    Since hva ranges can overlap between memslots, an interval tree is needed
    for tracking them.
    
    Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
    maciejsszmigiero authored and intel-lab-lkp committed Aug 13, 2021
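A minimal sketch of the idea, assuming a static implicit-tree layout rather than the kernel's augmented rbtree (`interval_tree_iter_first()` and friends): each node carries the maximum range end of its subtree, which lets an overlap query skip whole subtrees.

```c
#include <assert.h>
#include <stddef.h>

/* Node of an interval tree flattened into an array (children of node i
 * sit at 2*i+1 and 2*i+2). */
struct itnode {
    unsigned long start, last;      /* inclusive hva range */
    unsigned long max_last;         /* largest 'last' in this subtree */
};

/* Fill in the max_last augmentation bottom-up. */
static void it_augment(struct itnode *t, size_t n, size_t i)
{
    if (i >= n)
        return;
    it_augment(t, n, 2 * i + 1);
    it_augment(t, n, 2 * i + 2);
    t[i].max_last = t[i].last;
    if (2 * i + 1 < n && t[2 * i + 1].max_last > t[i].max_last)
        t[i].max_last = t[2 * i + 1].max_last;
    if (2 * i + 2 < n && t[2 * i + 2].max_last > t[i].max_last)
        t[i].max_last = t[2 * i + 2].max_last;
}

/* Does any stored interval intersect [start, last]?  Subtrees whose
 * max_last falls below 'start' are skipped entirely. */
static int it_overlaps(const struct itnode *t, size_t n, size_t i,
                       unsigned long start, unsigned long last)
{
    if (i >= n || t[i].max_last < start)
        return 0;
    if (t[i].start <= last && t[i].last >= start)
        return 1;
    return it_overlaps(t, n, 2 * i + 1, start, last) ||
           it_overlaps(t, n, 2 * i + 2, start, last);
}
```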
  6. KVM: Resolve memslot ID via a hash table instead of via a static array

    Mappings from memslot IDs to the corresponding memslots are currently kept
    as indices in the static id_to_index array.
    The size of this array depends on the maximum allowed memslot count
    (regardless of the number of memslots actually in use).
    
    This has become especially problematic recently, when the memslot count cap
    was removed: the maximum count is now a full 32k memslots, the maximum
    allowed by the current KVM API.
    
    Keeping these IDs in a hash table (instead of an array) avoids this
    problem.
    
    Resolving a memslot ID to the actual memslot (instead of its index) will
    also enable transitioning away from an array-based implementation of the
    whole memslots structure in a later commit.
    
    Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
    maciejsszmigiero authored and intel-lab-lkp committed Aug 13, 2021
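The hash-table approach can be sketched as follows. This is a hypothetical standalone version; the kernel uses its own hashtable helpers and sizes things differently, but the point survives: the table is sized for the slots actually in use, not for the 32k ID space.

```c
#include <assert.h>
#include <stddef.h>

/* 16 buckets purely for illustration. */
#define NBUCKETS 16

struct idslot {
    int id;
    struct idslot *next;    /* bucket chain */
};

static struct idslot *buckets[NBUCKETS];

static void id_insert(struct idslot *s)
{
    unsigned b = (unsigned)s->id & (NBUCKETS - 1);

    s->next = buckets[b];
    buckets[b] = s;
}

/* Resolve an ID directly to the memslot, walking one short chain. */
static struct idslot *id_lookup(int id)
{
    struct idslot *s;

    for (s = buckets[(unsigned)id & (NBUCKETS - 1)]; s; s = s->next)
        if (s->id == id)
            return s;
    return NULL;
}
```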
  7. KVM: Just resync arch fields when slots_arch_lock gets reacquired

    There is no need to copy the whole memslot data after releasing
    slots_arch_lock for a moment to install a temporary memslots copy in
    kvm_set_memslot(), since this lock only protects the arch field of each
    memslot.
    
    Just resync this particular field after reacquiring slots_arch_lock.
    
    Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
    maciejsszmigiero authored and intel-lab-lkp committed Aug 13, 2021
  8. KVM: Move WARN on invalid memslot index to update_memslots()

    Since kvm_memslot_move_forward() can theoretically return a negative
    memslot index even when kvm_memslot_move_backward() returned a positive one
    (and so did not WARN), let's just move the warning to the common code.
    
    Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
    maciejsszmigiero authored and intel-lab-lkp committed Aug 13, 2021
  9. KVM: Integrate gfn_to_memslot_approx() into search_memslots()

    The s390 arch has gfn_to_memslot_approx(), which is almost identical to
    search_memslots(), differing only in that when the gfn falls in a hole,
    one of the memslots bordering the hole is returned.
    
    Add this lookup mode as an option to search_memslots() so we don't have two
    almost identical functions for looking up a memslot by its gfn.
    
    Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
    maciejsszmigiero authored and intel-lab-lkp committed Aug 13, 2021
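The "approx" mode can be sketched like this; a hedged illustration using an ascending sorted array (the actual memslot array was reverse gfn-ordered) and a hypothetical `find_slot()` helper:

```c
#include <assert.h>
#include <stddef.h>

struct slot { unsigned long base_gfn; unsigned long npages; };

/*
 * Binary search over slots sorted by base_gfn.  Returns the index of the
 * slot containing 'gfn', or -1 on a miss; with 'approx' set, a miss
 * instead returns a slot bordering the hole.
 */
static long find_slot(const struct slot *s, size_t n,
                      unsigned long gfn, int approx)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (s[mid].base_gfn + s[mid].npages <= gfn)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == n)                      /* gfn above every slot */
        return approx ? (long)n - 1 : -1;
    if (gfn >= s[lo].base_gfn || approx)
        return (long)lo;
    return -1;                        /* gfn in a hole, exact mode */
}
```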
  10. KVM: x86: Move n_memslots_pages recalc to kvm_arch_prepare_memory_region()

    This allows us to return a proper error code in case we spot an underflow.
    
    Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
    maciejsszmigiero authored and intel-lab-lkp committed Aug 13, 2021
  11. KVM: Add "old" memslot parameter to kvm_arch_prepare_memory_region()

    This is needed for the next commit, which moves n_memslots_pages
    recalculation from kvm_arch_commit_memory_region() to the aforementioned
    function to allow for returning an error code.
    
    While we are at it let's also rename the "memslot" parameter to "new" for
    consistency with kvm_arch_commit_memory_region(), which uses the same
    argument set now.
    
    No functional change intended.
    
    Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
    maciejsszmigiero authored and intel-lab-lkp committed Aug 13, 2021
  12. KVM: x86: Don't call kvm_mmu_change_mmu_pages() if the count hasn't changed

    There is no point in calling kvm_mmu_change_mmu_pages() for memslot
    operations that don't change the total page count, so do it just for
    KVM_MR_CREATE and KVM_MR_DELETE.
    
    Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
    maciejsszmigiero authored and intel-lab-lkp committed Aug 13, 2021
  13. KVM: x86: Cache total page count to avoid traversing the memslot array

    There is no point in recalculating from scratch the total number of pages
    in all memslots each time a memslot is created or deleted.
    
    Just cache the value and update it accordingly on each such operation so
    the code doesn't need to traverse the whole memslot array each time.
    
    Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
    maciejsszmigiero authored and intel-lab-lkp committed Aug 13, 2021
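The caching idea is simple enough to show directly (hypothetical names), with the invariant that only create and delete change the total; move and flag-change operations leave the page count untouched:

```c
#include <assert.h>

/*
 * Keep a running total instead of re-summing npages over the whole
 * memslot array on every change.
 */
struct vm_pages { unsigned long n_memslots_pages; };

static void memslot_created(struct vm_pages *vm, unsigned long npages)
{
    vm->n_memslots_pages += npages;
}

static void memslot_deleted(struct vm_pages *vm, unsigned long npages)
{
    vm->n_memslots_pages -= npages;
}
```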

Commits on Aug 11, 2021

  1. KVM: MMU: change tracepoints arguments to kvm_page_fault

    Pass struct kvm_page_fault to tracepoints instead of
    extracting the arguments from the struct.
    
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    bonzini committed Aug 11, 2021
  2. KVM: MMU: change disallowed_hugepage_adjust() arguments to kvm_page_fault

    Pass struct kvm_page_fault to disallowed_hugepage_adjust() instead of
    extracting the arguments from the struct.  Tweak a bit the conditions
    to avoid long lines.
    
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    bonzini committed Aug 11, 2021
  3. KVM: MMU: change kvm_mmu_hugepage_adjust() arguments to kvm_page_fault

    Pass struct kvm_page_fault to kvm_mmu_hugepage_adjust() instead of
    extracting the arguments from the struct; the results are also stored
    in the struct, so the callers are adjusted accordingly.
    
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    bonzini committed Aug 11, 2021
  4. KVM: MMU: change fast_page_fault() arguments to kvm_page_fault

    Pass struct kvm_page_fault to fast_page_fault() instead of
    extracting the arguments from the struct.
    
    Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    bonzini committed Aug 11, 2021
  5. KVM: MMU: change tdp_mmu_map_handle_target_level() arguments to kvm_page_fault

    Pass struct kvm_page_fault to tdp_mmu_map_handle_target_level() instead of
    extracting the arguments from the struct.
    
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    bonzini committed Aug 11, 2021
  6. KVM: MMU: change kvm_tdp_mmu_map() arguments to kvm_page_fault

    Pass struct kvm_page_fault to kvm_tdp_mmu_map() instead of
    extracting the arguments from the struct.
    
    Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    bonzini committed Aug 11, 2021
  7. KVM: MMU: change FNAME(fetch)() arguments to kvm_page_fault

    Pass struct kvm_page_fault to FNAME(fetch)() instead of
    extracting the arguments from the struct.
    
    Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    bonzini committed Aug 11, 2021
  8. KVM: MMU: change __direct_map() arguments to kvm_page_fault

    Pass struct kvm_page_fault to __direct_map() instead of
    extracting the arguments from the struct.
    
    Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    bonzini committed Aug 11, 2021
  9. KVM: MMU: change handle_abnormal_pfn() arguments to kvm_page_fault

    Pass struct kvm_page_fault to handle_abnormal_pfn() instead of
    extracting the arguments from the struct.
    
    Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    bonzini committed Aug 11, 2021
  10. KVM: MMU: change try_async_pf() arguments to kvm_page_fault

    Add fields to struct kvm_page_fault corresponding to outputs of
    try_async_pf().  For now they have to be extracted again from struct
    kvm_page_fault in the subsequent steps, but this is temporary until
    other functions in the chain are switched over as well.
    
    Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    bonzini committed Aug 11, 2021
  11. KVM: MMU: change page_fault_handle_page_track() arguments to kvm_page_fault

    Add fields to struct kvm_page_fault corresponding to the arguments
    of page_fault_handle_page_track().  The fields are initialized in the
    callers, and page_fault_handle_page_track() receives a struct
    kvm_page_fault instead of having to extract the arguments out of it.
    
    Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    bonzini committed Aug 11, 2021
  12. KVM: MMU: change direct_page_fault() arguments to kvm_page_fault

    Add fields to struct kvm_page_fault corresponding to
    the arguments of direct_page_fault().  The fields are
    initialized in the callers, and direct_page_fault()
    receives a struct kvm_page_fault instead of having to
    extract the arguments out of it.
    
    Also adjust FNAME(page_fault) to store the max_level in
    struct kvm_page_fault, to keep it similar to the direct
    map path.
    
    Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    bonzini committed Aug 11, 2021
  13. KVM: MMU: change mmu->page_fault() arguments to kvm_page_fault

    Pass struct kvm_page_fault to mmu->page_fault() instead of
    extracting the arguments from the struct.  FNAME(page_fault) can use
    the precomputed bools from the error code.
    
    Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    bonzini committed Aug 11, 2021
  14. KVM: MMU: Introduce struct kvm_page_fault

    Create a single structure for arguments that are passed from
    kvm_mmu_do_page_fault to the page fault handlers.  Later
    the structure will grow to include various output parameters
    that are passed back to the next steps in the page fault
    handling.
    
    Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    bonzini committed Aug 11, 2021
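The pattern the series applies can be sketched with a cut-down, hypothetical version of the struct: inputs are set once by the top-level dispatcher, outputs are filled in by later stages, and inner handlers take a single pointer instead of a long argument list.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the argument bundle; field names are illustrative. */
struct page_fault {
    /* inputs, set by the dispatcher */
    unsigned long addr;
    bool write, exec, prefetch;
    /* outputs, filled in along the way */
    int goal_level;
    unsigned long pfn;
};

static int handle_fault(struct page_fault *f)
{
    f->goal_level = f->write ? 1 : 2;   /* stand-in for the real logic */
    f->pfn = f->addr >> 12;             /* 4K page frame number */
    return 0;
}
```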
  15. KVM: x86: clamp host mapping level to max_level in kvm_mmu_max_mapping_level

    This patch started as a way to make kvm_mmu_hugepage_adjust a bit simpler,
    in preparation for switching it to struct kvm_page_fault, but it does
    fix a microscopic bug in zapping collapsible PTEs.
    
    If one large page size is disallowed but not all of them,
    kvm_mmu_max_mapping_level will return the host mapping level and the small
    PTEs will be zapped up to that level.  However, if e.g. 1GB pages are
    prohibited, we can still zap the 4KB mappings and preserve the 2MB ones.
    This can happen for example when NX huge pages are in use.
    
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    bonzini committed Aug 11, 2021
  16. KVM: MMU: pass unadulterated gpa to direct_page_fault

    Do not bother removing the low bits of the gpa.  This masking dates back
    to the very first commit of KVM, but it is unnecessary, or even
    problematic, because the gpa is later used to fill in the MMIO page cache.
    
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    bonzini committed Aug 11, 2021
  17. KVM: x86/mmu: Drop 'shared' param from tdp_mmu_link_page()

    Drop @shared from tdp_mmu_link_page() and hardcode it to work for
    mmu_lock being held for read.  The helper has exactly one caller and
    in all likelihood will only ever have exactly one caller.  Even if KVM
    adds a path to install translations without an initiating page fault,
    odds are very, very good that the path will just be a wrapper to the
    "page fault" handler (both SNP and TDX RFCs propose patches to do
    exactly that).
    
    No functional change intended.
    
    Cc: Ben Gardon <bgardon@google.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20210810224554.2978735-3-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    sean-jc authored and bonzini committed Aug 11, 2021
  18. KVM: x86/mmu: Add detailed page size stats

    Existing KVM code tracks the number of large pages regardless of their
    sizes. Therefore, when a large page of 1GB (or larger) is adopted, the
    information becomes less useful because lpages counts a mix of 1G and 2M
    pages.

    So remove lpages, since it is easy for user space to aggregate the info.
    Instead, provide comprehensive page stats for all sizes from 4K to 512G.
    
    Suggested-by: Ben Gardon <bgardon@google.com>
    
    Reviewed-by: David Matlack <dmatlack@google.com>
    Reviewed-by: Ben Gardon <bgardon@google.com>
    Signed-off-by: Mingwei Zhang <mizhang@google.com>
    Cc: Jing Zhang <jingzhangos@google.com>
    Cc: David Matlack <dmatlack@google.com>
    Cc: Sean Christopherson <seanjc@google.com>
    Message-Id: <20210803044607.599629-4-mizhang@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    mzhang3579 authored and bonzini committed Aug 11, 2021
  19. KVM: x86/mmu: Avoid collision with !PRESENT SPTEs in TDP MMU lpage stats

    Factor in whether or not the old/new SPTEs are shadow-present when
    adjusting the large page stats in the TDP MMU.  A modified MMIO SPTE can
    toggle the page size bit, as bit 7 is used to store the MMIO generation,
    i.e. is_large_pte() can get a false positive when called on a MMIO SPTE.
    Ditto for nuking SPTEs with REMOVED_SPTE, which sets bit 7 in its magic
    value.
    
    Opportunistically move the logic below the check to verify at least one
    of the old/new SPTEs is shadow present.
    
    Use is/was_leaf even though is/was_present would suffice.  The code
    generation is roughly equivalent since all flags need to be computed
    prior to the code in question, and using the *_leaf flags will minimize
    the diff in a future enhancement to account all pages, i.e. will change
    the check to "is_leaf != was_leaf".
    
    Reviewed-by: David Matlack <dmatlack@google.com>
    Reviewed-by: Ben Gardon <bgardon@google.com>
    
    Fixes: 1699f65 ("kvm/x86: Fix 'lpages' kvm stat for TDM MMU")
    Cc: stable@vger.kernel.org
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Mingwei Zhang <mizhang@google.com>
    Message-Id: <20210803044607.599629-3-mizhang@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    sean-jc authored and bonzini committed Aug 11, 2021
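The hazard and the fix can be illustrated with a toy SPTE layout (bit positions here are assumptions for illustration, not the real x86 encodings): the size bit is only meaningful when an entry is shadow-present, so the stats delta must check presence first.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy layout: bit 7 is the page-size bit in present entries but is
 * reused by the MMIO generation in MMIO entries, so a size test alone
 * can report a false positive on an MMIO SPTE.
 */
#define SPTE_PRESENT (1ul << 11)
#define SPTE_LARGE   (1ul << 7)

static bool spte_present(unsigned long spte) { return spte & SPTE_PRESENT; }
static bool spte_large(unsigned long spte)   { return spte & SPTE_LARGE; }

/* Change in the large-page count for an old -> new SPTE transition. */
static long lpage_delta(unsigned long old_spte, unsigned long new_spte)
{
    bool was = spte_present(old_spte) && spte_large(old_spte);
    bool is  = spte_present(new_spte) && spte_large(new_spte);

    return (long)is - (long)was;
}
```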
  20. KVM: x86/mmu: Remove redundant spte present check in mmu_set_spte

    Drop an unnecessary is_shadow_present_pte() check when updating the rmaps
    after installing a non-MMIO SPTE.  set_spte() is used only to create
    shadow-present SPTEs (e.g. MMIO SPTEs are handled early on), and
    mmu_set_spte() runs with mmu_lock held for write, i.e. the SPTE can't be
    zapped between writing the SPTE and updating the rmaps.
    
    Opportunistically combine the "new SPTE" logic for large pages and rmaps.
    
    No functional change intended.
    
    Suggested-by: Ben Gardon <bgardon@google.com>
    
    Reviewed-by: David Matlack <dmatlack@google.com>
    Reviewed-by: Ben Gardon <bgardon@google.com>
    Reviewed-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Mingwei Zhang <mizhang@google.com>
    Message-Id: <20210803044607.599629-2-mizhang@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    mzhang3579 authored and bonzini committed Aug 11, 2021
  21. KVM: stats: Add halt polling related histogram stats

    Add three log histogram stats to record the distribution of time spent
    on successful polling, failed polling and VCPU wait.
    halt_poll_success_hist: Distribution of spent time for a successful poll.
    halt_poll_fail_hist: Distribution of spent time for a failed poll.
    halt_wait_hist: Distribution of time a VCPU has spent on waiting.
    
    Signed-off-by: Jing Zhang <jingzhangos@google.com>
    Message-Id: <20210802165633.1866976-6-jingzhangos@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    jingzhangos authored and bonzini committed Aug 11, 2021
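A log histogram of this kind can be sketched as follows (a hypothetical standalone version; the kernel's stats code defines its own bucketing): bucket i counts samples whose value falls roughly in [2^i, 2^(i+1)), so a handful of counters spans poll times from nanoseconds to seconds.

```c
#include <assert.h>

#define NBUCK 32

static unsigned long hist[NBUCK];

/* Index of the power-of-two bucket a sample belongs to. */
static unsigned bucket(unsigned long ns)
{
    unsigned i = 0;

    while (ns > 1 && i < NBUCK - 1) {
        ns >>= 1;
        i++;
    }
    return i;
}

static void record(unsigned long ns)
{
    hist[bucket(ns)]++;
}
```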
  22. KVM: stats: Add halt_wait_ns stats for all architectures

    Add simple stats halt_wait_ns to record the time a VCPU has spent on
    waiting for all architectures (not just powerpc).
    
    Signed-off-by: Jing Zhang <jingzhangos@google.com>
    Message-Id: <20210802165633.1866976-5-jingzhangos@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    jingzhangos authored and bonzini committed Aug 11, 2021