Jing-Zhang/KVM…
Commits on Aug 10, 2021
-
KVM: stats: Add VM dirty_pages stats for the number of dirty pages
Add a generic VM stats dirty_pages to record the number of dirty pages reflected in dirty_bitmap at the moment. Original-by: Peter Feiner <pfeiner@google.com> Signed-off-by: Jing Zhang <jingzhangos@google.com>
Commits on Aug 7, 2021
-
KVM: selftests: Add a test of an unbacked nested PI descriptor
Add a regression test for the unsupported configuration of a VMCS12 posted interrupt descriptor that has no backing memory in L1. KVM should exit to userspace with KVM_INTERNAL_ERROR rather than just silently doing something wrong. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Message-Id: <20210604172611.281819-13-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: selftests: Introduce prepare_tpr_shadow
Add support for yet another page to hang from the VMCS12 for nested VMX testing: the virtual APIC page. This page is necessary for a VMCS12 to be launched with the "use TPR shadow" VM-execution control set (except in some oddball circumstances permitted by KVM). Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Message-Id: <20210604172611.281819-12-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: x86: Exit to userspace when kvm_check_nested_events fails
If kvm_check_nested_events fails due to raising an EXIT_REASON_INTERNAL_ERROR, propagate it to userspace immediately, even if the vCPU would otherwise be sleeping. This happens for example when the posted interrupt descriptor points outside guest memory. Fixes: 966eefb ("KVM: nVMX: Disable vmcs02 posted interrupts if vmcs12 PID isn't mappable") Cc: stable@vger.kernel.org Reported-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: X86: Introduce mmu_rmaps_stat per-vm debugfs file
Use this file to dump rmap statistic information. The statistic is done by calculating the rmap count and the result is log-2-based. An example output of this looks like (idle 6GB guest, right after boot linux): Rmap_Count: 0 1 2-3 4-7 8-15 16-31 32-63 64-127 128-255 256-511 512-1023 Level=4K: 3086676 53045 12330 1272 502 121 76 2 0 0 0 Level=2M: 5947 231 0 0 0 0 0 0 0 0 0 Level=1G: 32 0 0 0 0 0 0 0 0 0 0 Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210730220455.26054-5-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: X86: Introduce kvm_mmu_slot_lpages() helpers
Introduce kvm_mmu_slot_lpages() to calculcate lpage_info and rmap array size. The other __kvm_mmu_slot_lpages() can take an extra parameter of npages rather than fetching from the memslot pointer. Start to use the latter one in kvm_alloc_memslot_metadata(). Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210730220455.26054-4-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: Allow to have arch-specific per-vm debugfs files
Allow archs to create arch-specific nodes under kvm->debugfs_dentry directory besides the stats fields. The new interface kvm_arch_create_vm_debugfs() is defined but not yet used. It's called after kvm->debugfs_dentry is created, so it can be referenced directly in kvm_arch_create_vm_debugfs(). Arch should define their own versions when they want to create extra debugfs nodes. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210730220455.26054-2-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commits on Aug 6, 2021
-
KVM: selftests: Move vcpu_args_set into perf_test_util
perf_test_util is used to set up KVM selftests where vCPUs touch a region of memory. The guest code is implemented in perf_test_util.c (not the calling selftests). The guest code requires a 1 parameter, the vcpuid, which has to be set by calling vcpu_args_set(vm, vcpu_id, 1, vcpu_id). Today all of the selftests that use perf_test_util are making this call. Instead, perf_test_util should just do it. This will save some code but more importantly prevents mistakes since totally non-obvious that this needs to be called and failing to do so results in vCPUs not accessing the right regions of memory. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20210805172821.2622793-1-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: selftests: Support multiple slots in dirty_log_perf_test
Introduce a new option to dirty_log_perf_test: -x number_of_slots. This causes the test to attempt to split the region of memory into the given number of slots. If the region cannot be evenly divided, the test will fail. This allows testing with more than one slot and therefore measure how performance scales with the number of memslots. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20210804222844.1419481-8-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: x86/mmu: Rename __gfn_to_rmap to gfn_to_rmap
gfn_to_rmap was removed in the previous patch so there is no need to retain the double underscore on __gfn_to_rmap. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20210804222844.1419481-7-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: x86/mmu: Leverage vcpu->last_used_slot for rmap_add and rmap_rec…
…ycle rmap_add() and rmap_recycle() both run in the context of the vCPU and thus we can use kvm_vcpu_gfn_to_memslot() to look up the memslot. This enables rmap_add() and rmap_recycle() to take advantage of vcpu->last_used_slot and avoid expensive memslot searching. This change improves the performance of "Populate memory time" in dirty_log_perf_test with tdp_mmu=N. In addition to improving the performance, "Populate memory time" no longer scales with the number of memslots in the VM. Command | Before | After ------------------------------- | ---------------- | ------------- ./dirty_log_perf_test -v64 -x1 | 15.18001570s | 14.99469366s ./dirty_log_perf_test -v64 -x64 | 18.71336392s | 14.98675076s Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20210804222844.1419481-6-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: x86/mmu: Leverage vcpu->last_used_slot in tdp_mmu_map_handle_tar…
…get_level The existing TDP MMU methods to handle dirty logging are vcpu-agnostic since they can be driven by MMU notifiers and other non-vcpu-specific events in addition to page faults. However this means that the TDP MMU is not benefiting from the new vcpu->last_used_slot. Fix that by introducing a tdp_mmu_map_set_spte_atomic() which is only called during a TDP page fault and has access to the kvm_vcpu for fast slot lookups. This improves "Populate memory time" in dirty_log_perf_test by 5%: Command | Before | After ------------------------------- | ---------------- | ------------- ./dirty_log_perf_test -v64 -x64 | 5.472321072s | 5.169832886s Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20210804222844.1419481-5-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: Cache the last used slot index per vCPU
The memslot for a given gfn is looked up multiple times during page fault handling. Avoid binary searching for it multiple times by caching the most recently used slot. There is an existing VM-wide last_used_slot but that does not work well for cases where vCPUs are accessing memory in different slots (see performance data below). Another benefit of caching the most recently use slot (versus looking up the slot once and passing around a pointer) is speeding up memslot lookups *across* faults and during spte prefetching. To measure the performance of this change I ran dirty_log_perf_test with 64 vCPUs and 64 memslots and measured "Populate memory time" and "Iteration 2 dirty memory time". Tests were ran with eptad=N to force dirty logging to use fast_page_fault so its performance could be measured. Config | Metric | Before | After ---------- | ----------------------------- | ------ | ------ tdp_mmu=Y | Populate memory time | 6.76s | 5.47s tdp_mmu=Y | Iteration 2 dirty memory time | 2.83s | 0.31s tdp_mmu=N | Populate memory time | 20.4s | 18.7s tdp_mmu=N | Iteration 2 dirty memory time | 2.65s | 0.30s The "Iteration 2 dirty memory time" results are especially compelling because they are equivalent to running the same test with a single memslot. In other words, fast_page_fault performance no longer scales with the number of memslots. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20210804222844.1419481-4-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: Move last_used_slot logic out of search_memslots
Make search_memslots unconditionally search all memslots and move the last_used_slot logic up one level to __gfn_to_memslot. This is in preparation for introducing a per-vCPU last_used_slot. As part of this change convert existing callers of search_memslots to __gfn_to_memslot to avoid making any functional changes. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20210804222844.1419481-3-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: Rename lru_slot to last_used_slot
lru_slot is used to keep track of the index of the most-recently used memslot. The correct acronym would be "mru" but that is not a common acronym. So call it last_used_slot which is a bit more obvious. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20210804222844.1419481-2-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commits on Aug 5, 2021
-
KVM: xen: do not use struct gfn_to_hva_cache
gfn_to_hva_cache is not thread-safe, so it is usually used only within a vCPU (whose code is protected by vcpu->mutex). The Xen interface implementation has such a cache in kvm->arch, but it is not really used except to store the location of the shared info page. Replace shinfo_set and shinfo_cache with just the value that is passed via KVM_XEN_ATTR_TYPE_SHARED_INFO; the only complication is that the initialization value is not zero anymore and therefore kvm_xen_init_vm needs to be introduced. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commits on Aug 4, 2021
-
KVM: x86/pmu: Introduce pmc->is_paused to reduce the call time of per…
…f interfaces Based on our observations, after any vm-exit associated with vPMU, there are at least two or more perf interfaces to be called for guest counter emulation, such as perf_event_{pause, read_value, period}(), and each one will {lock, unlock} the same perf_event_ctx. The frequency of calls becomes more severe when guest use counters in a multiplexed manner. Holding a lock once and completing the KVM request operations in the perf context would introduce a set of impractical new interfaces. So we can further optimize the vPMU implementation by avoiding repeated calls to these interfaces in the KVM context for at least one pattern: After we call perf_event_pause() once, the event will be disabled and its internal count will be reset to 0. So there is no need to pause it again or read its value. Once the event is paused, event period will not be updated until the next time it's resumed or reprogrammed. And there is also no need to call perf_event_period twice for a non-running counter, considering the perf_event for a running counter is never paused. Based on this implementation, for the following common usage of sampling 4 events using perf on a 4u8g guest: echo 0 > /proc/sys/kernel/watchdog echo 25 > /proc/sys/kernel/perf_cpu_time_max_percent echo 10000 > /proc/sys/kernel/perf_event_max_sample_rate echo 0 > /proc/sys/kernel/perf_cpu_time_max_percent for i in `seq 1 1 10` do taskset -c 0 perf record \ -e cpu-cycles -e instructions -e branch-instructions -e cache-misses \ /root/br_instr a done the average latency of the guest NMI handler is reduced from 37646.7 ns to 32929.3 ns (~1.14x speed up) on the Intel ICX server. Also, in addition to collecting more samples, no loss of sampling accuracy was observed compared to before the optimization. Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20210728120705.6855-1-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> -
KVM: X86: Optimize zapping rmap
Using rmap_get_first() and rmap_remove() for zapping a huge rmap list could be slow. The easy way is to travers the rmap list, collecting the a/d bits and free the slots along the way. Provide a pte_list_destroy() and do exactly that. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210730220605.26377-1-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: X86: Optimize pte_list_desc with per-array counter
Add a counter field into pte_list_desc, so as to simplify the add/remove/loop logic. E.g., we don't need to loop over the array any more for most reasons. This will make more sense after we've switched the array size to be larger otherwise the counter will be a waste. Initially I wanted to store a tail pointer at the head of the array list so we don't need to traverse the list at least for pushing new ones (if without the counter we traverse both the list and the array). However that'll need slightly more change without a huge lot benefit, e.g., after we grow entry numbers per array the list traversing is not so expensive. So let's be simple but still try to get as much benefit as we can with just these extra few lines of changes (not to mention the code looks easier too without looping over arrays). I used the same a test case to fork 500 child and recycle them ("./rmap_fork 500" [1]), this patch further speeds up the total fork time of about 4%, which is a total of 33% of vanilla kernel: Vanilla: 473.90 (+-5.93%) 3->15 slots: 366.10 (+-4.94%) Add counter: 351.00 (+-3.70%) [1] xzpeter/clibs@825436f Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210730220602.26327-1-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> -
KVM: X86: MMU: Tune PTE_LIST_EXT to be bigger
Currently rmap array element only contains 3 entries. However for EPT=N there could have a lot of guest pages that got tens of even hundreds of rmap entry. A normal distribution of a 6G guest (even if idle) shows this with rmap count statistics: Rmap_Count: 0 1 2-3 4-7 8-15 16-31 32-63 64-127 128-255 256-511 512-1023 Level=4K: 3089171 49005 14016 1363 235 212 15 7 0 0 0 Level=2M: 5951 227 0 0 0 0 0 0 0 0 0 Level=1G: 32 0 0 0 0 0 0 0 0 0 0 If we do some more fork some pages will grow even larger rmap counts. This patch makes PTE_LIST_EXT bigger so it'll be more efficient for the general use case of EPT=N as we do list reference less and the loops over PTE_LIST_EXT will be slightly more efficient; but still not too large so less waste when array not full. It should not affecting EPT=Y since EPT normally only has zero or one rmap entry for each page, so no array is even allocated. With a test case to fork 500 child and recycle them ("./rmap_fork 500" [1]), this patch speeds up fork time of about 29%. Before: 473.90 (+-5.93%) After: 366.10 (+-4.94%) [1] xzpeter/clibs@825436f Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210730220455.26054-6-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commits on Aug 3, 2021
-
KVM: const-ify all relevant uses of struct kvm_memory_slot
As alluded to in commit f36f3f2 ("KVM: add "new" argument to kvm_arch_commit_memory_region"), a bunch of other places where struct kvm_memory_slot is used, needs to be refactored to preserve the "const"ness of struct kvm_memory_slot across-the-board. Signed-off-by: Hamza Mahfooz <someguy@effective-light.com> Message-Id: <20210713023338.57108-1-someguy@effective-light.com> [Do not touch body of slot_rmap_walk_init. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: Don't take mmu_lock for range invalidation unless necessary
Avoid taking mmu_lock for .invalidate_range_{start,end}() notifications that are unrelated to KVM. This is possible now that memslot updates are blocked from range_start() to range_end(); that ensures that lock elision happens in both or none, and therefore that mmu_notifier_count updates (which must occur while holding mmu_lock for write) are always paired across start->end. Based on patches originally written by Ben Gardon. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> -
KVM: Block memslot updates across range_start() and range_end()
We would like to avoid taking mmu_lock for .invalidate_range_{start,end}() notifications that are unrelated to KVM. Because mmu_notifier_count must be modified while holding mmu_lock for write, and must always be paired across start->end to stay balanced, lock elision must happen in both or none. Therefore, in preparation for this change, this patch prevents memslot updates across range_start() and range_end(). Note, technically flag-only memslot updates could be allowed in parallel, but stalling a memslot update for a relatively short amount of time is not a scalability issue, and this is all more than complex enough. A long note on the locking: a previous version of the patch used an rwsem to block the memslot update while the MMU notifier run, but this resulted in the following deadlock involving the pseudo-lock tagged as "mmu_notifier_invalidate_range_start". ====================================================== WARNING: possible circular locking dependency detected 5.12.0-rc3+ torvalds#6 Tainted: G OE ------------------------------------------------------ qemu-system-x86/3069 is trying to acquire lock: ffffffff9c775ca0 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}, at: __mmu_notifier_invalidate_range_end+0x5/0x190 but task is already holding lock: ffffaff7410a9160 (&kvm->mmu_notifier_slots_lock){.+.+}-{3:3}, at: kvm_mmu_notifier_invalidate_range_start+0x36d/0x4f0 [kvm] which lock already depends on the new lock. This corresponds to the following MMU notifier logic: invalidate_range_start take pseudo lock down_read() (*) release pseudo lock invalidate_range_end take pseudo lock (**) up_read() release pseudo lock At point (*) we take the mmu_notifiers_slots_lock inside the pseudo lock; at point (**) we take the pseudo lock inside the mmu_notifiers_slots_lock. This could cause a deadlock (ignoring for a second that the pseudo lock is not a lock): - invalidate_range_start waits on down_read(), because the rwsem is held by install_new_memslots - install_new_memslots waits on down_write(), because the rwsem is held till (another) invalidate_range_end finishes - invalidate_range_end sits waits on the pseudo lock, held by invalidate_range_start. Removing the fairness of the rwsem breaks the cycle (in lockdep terms, it would change the *shared* rwsem readers into *shared recursive* readers), so open-code the wait using a readers count and a spinlock. This also allows handling blockable and non-blockable critical section in the same way. Losing the rwsem fairness does theoretically allow MMU notifiers to block install_new_memslots forever. Note that mm/mmu_notifier.c's own retry scheme in mmu_interval_read_begin also uses wait/wake_up and is likewise not fair. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commits on Aug 2, 2021
-
KVM: nSVM: remove useless kvm_clear_*_queue
For an event to be in injected state when nested_svm_vmrun executes, it must have come from exitintinfo when svm_complete_interrupts ran: vcpu_enter_guest static_call(kvm_x86_run) -> svm_vcpu_run svm_complete_interrupts // now the event went from "exitintinfo" to "injected" static_call(kvm_x86_handle_exit) -> handle_exit svm_invoke_exit_handler vmrun_interception nested_svm_vmrun However, no event could have been in exitintinfo before a VMRUN vmexit. The code in svm.c is a bit more permissive than the one in vmx.c: if (is_external_interrupt(svm->vmcb->control.exit_int_info) && exit_code != SVM_EXIT_EXCP_BASE + PF_VECTOR && exit_code != SVM_EXIT_NPF && exit_code != SVM_EXIT_TASK_SWITCH && exit_code != SVM_EXIT_INTR && exit_code != SVM_EXIT_NMI) but in any case, a VMRUN instruction would not even start to execute during an attempted event delivery. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> -
KVM: x86: Preserve guest's CR0.CD/NW on INIT
Preserve CR0.CD and CR0.NW on INIT instead of forcing them to '1', as defined by both Intel's SDM and AMD's APM. Note, current versions of Intel's SDM are very poorly written with respect to INIT behavior. Table 9-1. "IA-32 and Intel 64 Processor States Following Power-up, Reset, or INIT" quite clearly lists power-up, RESET, _and_ INIT as setting CR0=60000010H, i.e. CD/NW=1. But the SDM then attempts to qualify CD/NW behavior in a footnote: 2. The CD and NW flags are unchanged, bit 4 is set to 1, all other bits are cleared. Presumably that footnote is only meant for INIT, as the RESET case and especially the power-up case are rather non-sensical. Another footnote all but confirms that: 6. Internal caches are invalid after power-up and RESET, but left unchanged with an INIT. Bare metal testing shows that CD/NW are indeed preserved on INIT (someone else can hack their BIOS to check RESET and power-up :-D). Reported-by: Reiji Watanabe <reijiw@google.com> Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-47-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> -
KVM: SVM: Drop redundant clearing of vcpu->arch.hflags at INIT/RESET
Drop redundant clears of vcpu->arch.hflags in init_vmcb() since kvm_vcpu_reset() always clears hflags, and it is also always zero at vCPU creation time. And of course, the second clearing in init_vmcb() was always redundant. Suggested-by: Reiji Watanabe <reijiw@google.com> Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-46-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: SVM: Emulate #INIT in response to triple fault shutdown
Emulate a full #INIT instead of simply initializing the VMCB if the guest hits a shutdown. Initializing the VMCB but not other vCPU state, much of which is mirrored by the VMCB, results in incoherent and broken vCPU state. Ideally, KVM would not automatically init anything on shutdown, and instead put the vCPU into e.g. KVM_MP_STATE_UNINITIALIZED and force userspace to explicitly INIT or RESET the vCPU. Even better would be to add KVM_MP_STATE_SHUTDOWN, since technically NMI can break shutdown (and SMI on Intel CPUs). But, that ship has sailed, and emulating #INIT is the next best thing as that has at least some connection with reality since there exist bare metal platforms that automatically INIT the CPU if it hits shutdown. Fixes: 46fe4dd ("[PATCH] KVM: SVM: Propagate cpu shutdown events to userspace") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-45-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: VMX: Move RESET-only VMWRITE sequences to init_vmcs()
Move VMWRITE sequences in vmx_vcpu_reset() guarded by !init_event into init_vmcs() to make it more obvious that they're, uh, initializing the VMCS. No meaningful functional change intended (though the order of VMWRITEs and whatnot is different). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-44-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: VMX: Remove redundant write to set vCPU as active at RESET/INIT
Drop a call to vmx_clear_hlt() during vCPU INIT, the guest's activity state is unconditionally set to "active" a few lines earlier in vmx_vcpu_reset(). No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-43-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: VMX: Smush x2APIC MSR bitmap adjustments into single function
Consolidate all of the dynamic MSR bitmap adjustments into vmx_update_msr_bitmap_x2apic(), and rename the mode tracker to reflect that it is x2APIC specific. If KVM gains more cases of dynamic MSR pass-through, odds are very good that those new cases will be better off with their own logic, e.g. see Intel PT MSRs and MSR_IA32_SPEC_CTRL. Attempting to handle all updates in a common helper did more harm than good, as KVM ended up collecting a large number of useless "updates". Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-42-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: VMX: Remove unnecessary initialization of msr_bitmap_mode
Don't bother initializing msr_bitmap_mode to 0, all of struct vcpu_vmx is zero initialized. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-41-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: VMX: Don't redo x2APIC MSR bitmaps when userspace filter is changed
Drop an explicit call to update the x2APIC MSRs when the userspace MSR filter is modified. The x2APIC MSRs are deliberately exempt from userspace filtering. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-40-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: nVMX: Remove obsolete MSR bitmap refresh at nested transitions
Drop unnecessary MSR bitmap updates during nested transitions, as L1's APIC_BASE MSR is not modified by the standard VM-Enter/VM-Exit flows, and L2's MSR bitmap is managed separately. In the unlikely event that L1 is pathological and loads APIC_BASE via the VM-Exit load list, KVM will handle updating the bitmap in its normal WRMSR flows. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-39-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: VMX: Remove obsolete MSR bitmap refresh at vCPU RESET/INIT
Remove an unnecessary MSR bitmap refresh during vCPU RESET/INIT. In both cases, the MSR bitmap already has the desired values and state. At RESET, the vCPU is guaranteed to be running with x2APIC disabled, the x2APIC MSRs are guaranteed to be intercepted due to the MSR bitmap being initialized to all ones by alloc_loaded_vmcs(), and vmx->msr_bitmap_mode is guaranteed to be zero, i.e. reflecting x2APIC disabled. At INIT, the APIC_BASE MSR is not modified, thus there can't be any change in x2APIC state. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-38-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
KVM: x86: Move setting of sregs during vCPU RESET/INIT to common x86
Move the setting of CR0, CR4, EFER, RFLAGS, and RIP from vendor code to common x86. VMX and SVM now have near-identical sequences, the only difference being that VMX updates the exception bitmap. Updating the bitmap on SVM is unnecessary, but benign. Unfortunately it can't be left behind in VMX due to the need to update exception intercepts after the control registers are set. Reviewed-by: Reiji Watanabe <reijiw@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210713163324.627647-37-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>