Maciej-S-Szmig…
Commits on Aug 13, 2021
-
KVM: Optimize overlapping memslots check
Do a quick lookup for possibly overlapping gfns when creating or moving a memslot instead of performing a linear scan of the whole memslot set. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
-
KVM: Optimize gfn lookup in kvm_zap_gfn_range()
Introduce a memslots gfn upper bound operation and use it to optimize kvm_zap_gfn_range(). This way this handler can do a quick lookup for intersecting gfns and won't have to do a linear scan of the whole memslot set. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
-
KVM: Keep memslots in tree-based structures instead of array-based ones
The current memslot code uses a (reverse gfn-ordered) memslot array for keeping track of them. Because the memslot array that is currently in use cannot be modified every memslot management operation (create, delete, move, change flags) has to make a copy of the whole array so it has a scratch copy to work on. Strictly speaking, however, it is only necessary to make copy of the memslot that is being modified, copying all the memslots currently present is just a limitation of the array-based memslot implementation. Two memslot sets, however, are still needed so the VM continues to run on the currently active set while the requested operation is being performed on the second, currently inactive one. In order to have two memslot sets, but only one copy of actual memslots it is necessary to split out the memslot data from the memslot sets. The memslots themselves should be also kept independent of each other so they can be individually added or deleted. These two memslot sets should normally point to the same set of memslots. They can, however, be desynchronized when performing a memslot management operation by replacing the memslot to be modified by its copy. After the operation is complete, both memslot sets once again point to the same, common set of memslot data. This commit implements the aforementioned idea. For tracking of gfns an ordinary rbtree is used since memslots cannot overlap in the guest address space and so this data structure is sufficient for ensuring that lookups are done quickly. The "last used slot" mini-caches (both per-slot set one and per-vCPU one), that keep track of the last found-by-gfn memslot, are still present in the new code. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Co-developed-by: Sean Christopherson <seanjc@google.com>
-
KVM: s390: Introduce kvm_s390_get_gfn_end()
And use it where s390 code would just access the memslot with the highest gfn directly. No functional change intended. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
-
KVM: Use interval tree to do fast hva lookup in memslots
The current memslots implementation only allows quick binary search by gfn, quick lookup by hva is not possible - the implementation has to do a linear scan of the whole memslots array, even though the operation being performed might apply just to a single memslot. This significantly hurts performance of per-hva operations with higher memslot counts. Since hva ranges can overlap between memslots an interval tree is needed for tracking them. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
-
KVM: Resolve memslot ID via a hash table instead of via a static array
Memslot ID to the corresponding memslot mappings are currently kept as indices in static id_to_index array. The size of this array depends on the maximum allowed memslot count (regardless of the number of memslots actually in use). This has become especially problematic recently, when memslot count cap was removed, so the maximum count is now full 32k memslots - the maximum allowed by the current KVM API. Keeping these IDs in a hash table (instead of an array) avoids this problem. Resolving a memslot ID to the actual memslot (instead of its index) will also enable transitioning away from an array-based implementation of the whole memslots structure in a later commit. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
-
KVM: Just resync arch fields when slots_arch_lock gets reacquired
There is no need to copy the whole memslot data after releasing slots_arch_lock for a moment to install temporary memslots copy in kvm_set_memslot() since this lock only protects the arch field of each memslot. Just resync this particular field after reacquiring slots_arch_lock. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
-
KVM: Move WARN on invalid memslot index to update_memslots()
Since kvm_memslot_move_forward() can theoretically return a negative memslot index even when kvm_memslot_move_backward() returned a positive one (and so did not WARN) let's just move the warning to the common code. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
-
KVM: Integrate gfn_to_memslot_approx() into search_memslots()
s390 arch has gfn_to_memslot_approx() which is almost identical to search_memslots(), differing only in that in case the gfn falls in a hole one of the memslots bordering the hole is returned. Add this lookup mode as an option to search_memslots() so we don't have two almost identical functions for looking up a memslot by its gfn. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
-
KVM: x86: Move n_memslots_pages recalc to kvm_arch_prepare_memory_reg…
…ion() This allows us to return a proper error code in case we spot an underflow. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
-
KVM: Add "old" memslot parameter to kvm_arch_prepare_memory_region()
This is needed for the next commit, which moves n_memslots_pages recalculation from kvm_arch_commit_memory_region() to the aforementioned function to allow for returning an error code. While we are at it let's also rename the "memslot" parameter to "new" for consistency with kvm_arch_commit_memory_region(), which uses the same argument set now. No functional change intended. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
-
KVM: x86: Don't call kvm_mmu_change_mmu_pages() if the count hasn't c…
…hanged There is no point in calling kvm_mmu_change_mmu_pages() for memslot operations that don't change the total page count, so do it just for KVM_MR_CREATE and KVM_MR_DELETE. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
-
KVM: x86: Cache total page count to avoid traversing the memslot array
There is no point in recalculating from scratch the total number of pages in all memslots each time a memslot is created or deleted. Just cache the value and update it accordingly on each such operation so the code doesn't need to traverse the whole memslot array each time. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Commits on Aug 11, 2021
-
KVM: MMU: change tracepoints arguments to kvm_page_fault
Pass struct kvm_page_fault to tracepoints instead of extracting the arguments from the struct. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: MMU: change disallowed_hugepage_adjust() arguments to kvm_page_f…
…ault Pass struct kvm_page_fault to disallowed_hugepage_adjust() instead of extracting the arguments from the struct. Tweak a bit the conditions to avoid long lines. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: MMU: change kvm_mmu_hugepage_adjust() arguments to kvm_page_fault
Pass struct kvm_page_fault to kvm_mmu_hugepage_adjust() instead of extracting the arguments from the struct; the results are also stored in the struct, so the callers are adjusted consequently. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: MMU: change fast_page_fault() arguments to kvm_page_fault
Pass struct kvm_page_fault to fast_page_fault() instead of extracting the arguments from the struct. Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: MMU: change tdp_mmu_map_handle_target_level() arguments to kvm_p…
…age_fault Pass struct kvm_page_fault to tdp_mmu_map_handle_target_level() instead of extracting the arguments from the struct. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: MMU: change kvm_tdp_mmu_map() arguments to kvm_page_fault
Pass struct kvm_page_fault to kvm_tdp_mmu_map() instead of extracting the arguments from the struct. Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: MMU: change FNAME(fetch)() arguments to kvm_page_fault
Pass struct kvm_page_fault to FNAME(fetch)() instead of extracting the arguments from the struct. Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: MMU: change __direct_map() arguments to kvm_page_fault
Pass struct kvm_page_fault to __direct_map() instead of extracting the arguments from the struct. Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: MMU: change handle_abnormal_pfn() arguments to kvm_page_fault
Pass struct kvm_page_fault to handle_abnormal_pfn() instead of extracting the arguments from the struct. Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: MMU: change try_async_pf() arguments to kvm_page_fault
Add fields to struct kvm_page_fault corresponding to outputs of try_async_pf(). For now they have to be extracted again from struct kvm_page_fault in the subsequent steps, but this is temporary until other functions in the chain are switched over as well. Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: MMU: change page_fault_handle_page_track() arguments to kvm_page…
…_fault Add fields to struct kvm_page_fault corresponding to the arguments of page_fault_handle_page_track(). The fields are initialized in the callers, and page_fault_handle_page_track() receives a struct kvm_page_fault instead of having to extract the arguments out of it. Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: MMU: change direct_page_fault() arguments to kvm_page_fault
Add fields to struct kvm_page_fault corresponding to the arguments of direct_page_fault(). The fields are initialized in the callers, and direct_page_fault() receives a struct kvm_page_fault instead of having to extract the arguments out of it. Also adjust FNAME(page_fault) to store the max_level in struct kvm_page_fault, to keep it similar to the direct map path. Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: MMU: change mmu->page_fault() arguments to kvm_page_fault
Pass struct kvm_page_fault to mmu->page_fault() instead of extracting the arguments from the struct. FNAME(page_fault) can use the precomputed bools from the error code. Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: MMU: Introduce struct kvm_page_fault
Create a single structure for arguments that are passed from kvm_mmu_do_page_fault to the page fault handlers. Later the structure will grow to include various output parameters that are passed back to the next steps in the page fault handling. Suggested-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: x86: clamp host mapping level to max_level in kvm_mmu_max_mappin…
…g_level This patch started as a way to make kvm_mmu_hugepage_adjust a bit simpler, in preparation for switching it to struct kvm_page_fault, but it does fix a microscopic bug in zapping collapsible PTEs. If a large page size is disallowed but not all of them, kvm_mmu_max_mapping_level will return the host mapping level and the small PTEs will be zapped up to that level. However, if e.g. 1GB are prohibited, we can still zap 4KB mapping and preserve the 2MB ones. This can happen for example when NX huge pages are in use. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: MMU: pass unadulterated gpa to direct_page_fault
Do not bother removing the low bits of the gpa. This masking dates back to the very first commit of KVM but it is unnecessary---or even problematic, because the gpa is later used to fill in the MMIO page cache. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: x86/mmu: Drop 'shared' param from tdp_mmu_link_page()
Drop @shared from tdp_mmu_link_page() and hardcode it to work for mmu_lock being held for read. The helper has exactly one caller and in all likelihood will only ever have exactly one caller. Even if KVM adds a path to install translations without an initiating page fault, odds are very, very good that the path will just be a wrapper to the "page fault" handler (both SNP and TDX RFCs propose patches to do exactly that). No functional change intended. Cc: Ben Gardon <bgardon@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210810224554.2978735-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: x86/mmu: Add detailed page size stats
Existing KVM code tracks the number of large pages regardless of their sizes. Therefore, when large page of 1GB (or larger) is adopted, the information becomes less useful because lpages counts a mix of 1G and 2M pages. So remove the lpages since it is easy for user space to aggregate the info. Instead, provide a comprehensive page stats of all sizes from 4K to 512G. Suggested-by: Ben Gardon <bgardon@google.com> Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Ben Gardon <bgardon@google.com> Signed-off-by: Mingwei Zhang <mizhang@google.com> Cc: Jing Zhang <jingzhangos@google.com> Cc: David Matlack <dmatlack@google.com> Cc: Sean Christopherson <seanjc@google.com> Message-Id: <20210803044607.599629-4-mizhang@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: x86/mmu: Avoid collision with !PRESENT SPTEs in TDP MMU lpage stats
Factor in whether or not the old/new SPTEs are shadow-present when adjusting the large page stats in the TDP MMU. A modified MMIO SPTE can toggle the page size bit, as bit 7 is used to store the MMIO generation, i.e. is_large_pte() can get a false positive when called on a MMIO SPTE. Ditto for nuking SPTEs with REMOVED_SPTE, which sets bit 7 in its magic value. Opportunistically move the logic below the check to verify at least one of the old/new SPTEs is shadow present. Use is/was_leaf even though is/was_present would suffice. The code generation is roughly equivalent since all flags need to be computed prior to the code in question, and using the *_leaf flags will minimize the diff in a future enhancement to account all pages, i.e. will change the check to "is_leaf != was_leaf". Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Ben Gardon <bgardon@google.com> Fixes: 1699f65 ("kvm/x86: Fix 'lpages' kvm stat for TDM MMU") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Mingwei Zhang <mizhang@google.com> Message-Id: <20210803044607.599629-3-mizhang@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: x86/mmu: Remove redundant spte present check in mmu_set_spte
Drop an unnecessary is_shadow_present_pte() check when updating the rmaps after installing a non-MMIO SPTE. set_spte() is used only to create shadow-present SPTEs, e.g. MMIO SPTEs are handled early on, mmu_set_spte() runs with mmu_lock held for write, i.e. the SPTE can't be zapped between writing the SPTE and updating the rmaps. Opportunistically combine the "new SPTE" logic for large pages and rmaps. No functional change intended. Suggested-by: Ben Gardon <bgardon@google.com> Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Ben Gardon <bgardon@google.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Mingwei Zhang <mizhang@google.com> Message-Id: <20210803044607.599629-2-mizhang@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: stats: Add halt polling related histogram stats
Add three log histogram stats to record the distribution of time spent on successful polling, failed polling and VCPU wait. halt_poll_success_hist: Distribution of spent time for a successful poll. halt_poll_fail_hist: Distribution of spent time for a failed poll. halt_wait_hist: Distribution of time a VCPU has spent on waiting. Signed-off-by: Jing Zhang <jingzhangos@google.com> Message-Id: <20210802165633.1866976-6-jingzhangos@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: stats: Add halt_wait_ns stats for all architectures
Add simple stats halt_wait_ns to record the time a VCPU has spent on waiting for all architectures (not just powerpc). Signed-off-by: Jing Zhang <jingzhangos@google.com> Message-Id: <20210802165633.1866976-5-jingzhangos@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>