Commits on Nov 10, 2021

  1. KVM: s390: Extend the USER_SIGP capability

    With commit 2444b35 ("KVM: s390: forward most SIGP orders to user
    space") we have a capability that allows the "fast" SIGP orders (as
    defined by the Programming Notes for the SIGNAL PROCESSOR instruction in
    the Principles of Operation) to be handled in-kernel, while all others are
    sent to userspace for processing.
    
    This works fine, but it creates a situation where, for example, a SIGP SENSE
    might return CC1 (STATUS STORED, and status bits indicating the vcpu is
    stopped), when in actuality userspace is still processing a SIGP STOP AND
    STORE STATUS order, and the vcpu is not yet actually stopped. Thus, the
    SIGP SENSE should actually be returning CC2 (busy) instead of CC1.
    
    To fix this, add another CPU capability, dependent on the USER_SIGP one,
    and two associated IOCTLs. One IOCTL will be used by userspace to mark a
    vcpu "busy" processing a SIGP order, and cause concurrent orders handled
    in-kernel to be returned with CC2 (busy). Another IOCTL will be used by
    userspace to mark the SIGP "finished", and the vcpu free to process
    additional orders.
    
    Signed-off-by: Eric Farman <farman@linux.ibm.com>
    efarman authored and intel-lab-lkp committed Nov 10, 2021
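    A minimal userspace model of the scheme this commit proposes (the names here are illustrative, not the actual kernel API): while userspace has marked a vcpu busy with a SIGP order, an in-kernel "fast" order like SIGP SENSE returns CC2 instead of reporting stale state.

```c
#include <stdbool.h>

/* Condition codes per the SIGNAL PROCESSOR architecture. */
enum { CC_ORDER_ACCEPTED = 0, CC_STATUS_STORED = 1, CC_BUSY = 2 };

struct vcpu_model {
    bool sigp_busy;   /* set/cleared via the proposed IOCTL pair */
    bool stopped;
};

static int sigp_sense(const struct vcpu_model *v)
{
    if (v->sigp_busy)
        return CC_BUSY;           /* order still in flight in userspace */
    if (v->stopped)
        return CC_STATUS_STORED;  /* status bits would report "stopped" */
    return CC_ORDER_ACCEPTED;
}
```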
  2. Capability/IOCTL/Documentation

    (This should be squashed with the next patch; it's just broken
    out for ease-of-future rebase.)
    
    Signed-off-by: Eric Farman <farman@linux.ibm.com>
    efarman authored and intel-lab-lkp committed Nov 10, 2021

Commits on Oct 27, 2021

  1. KVM: s390: add debug statement for diag 318 CPNC data

    The diag 318 data contains values that denote information regarding the
    guest's environment. Currently, it is unnecessarily difficult to observe
    this value (either manually-inserted debug statements, gdb stepping, mem
    dumping etc). It's useful to observe this information to obtain an
    at-a-glance view of the guest's environment, so let's add a simple VCPU
    event that prints the CPNC to the s390dbf logs.
    
    Signed-off-by: Collin Walling <walling@linux.ibm.com>
    Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
    Link: https://lore.kernel.org/r/20211027025451.290124-1-walling@linux.ibm.com
    [borntraeger@de.ibm.com]: change debug level to 3
    Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    collinwalling authored and borntraeger committed Oct 27, 2021
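    For context, a sketch of pulling the CPNC out of the diag 318 data the way the new debug statement would log it, assuming the usual layout where the CPNC occupies the first (most significant) byte of the 64-bit value and the CPVC the rest:

```c
#include <stdint.h>

/* Extract the control-program name code (CPNC) from the 64-bit
 * diag 318 data; the remaining 56 bits hold the CPVC. */
static inline uint8_t diag318_cpnc(uint64_t diag318_info)
{
    return diag318_info >> 56;  /* top byte on big-endian s390 */
}
```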
  2. KVM: s390: pv: properly handle page flags for protected guests

    Introduce variants of the convert and destroy page functions that also
    clear the PG_arch_1 bit used to mark them as secure pages.
    
    The PG_arch_1 flag is always allowed to overindicate; using the new
    functions introduced here reduces the extent of overindication and
    thus improves performance.
    
    These new functions can only be called on pages for which a reference
    is already being held.
    
    Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
    Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
    Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
    Link: https://lore.kernel.org/r/20210920132502.36111-7-imbrenda@linux.ibm.com
    Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    Claudio Imbrenda authored and borntraeger committed Oct 27, 2021
  3. KVM: s390: Fix handle_sske page fault handling

    If handle_sske cannot set the storage key, because there is no
    page table entry or no present large page entry, it calls
    fixup_user_fault.
    However, currently, if the call succeeds, handle_sske returns
    -EAGAIN, without having set the storage key.
    Instead, retry by continue'ing the loop without incrementing the
    address.
    The same issue in handle_pfmf was fixed by
    a11bdb1 ("KVM: s390: Fix pfmf and conditional skey emulation").
    
    Fixes: bd096f6 ("KVM: s390: Add skey emulation fault handling")
    Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
    Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
    Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
    Link: https://lore.kernel.org/r/20211022152648.26536-1-scgl@linux.ibm.com
    Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    Janis Schoetterl-Glausch authored and borntraeger committed Oct 27, 2021
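    The corrected loop shape can be modeled in userspace with stubs standing in for the skey helper and fixup_user_fault(): after a successful fixup, retry the same address rather than returning -EAGAIN with the key still unset. Each page in this toy model "faults" until the fixup "allocates" its page table.

```c
#include <stdbool.h>

#define PAGE_SIZE 4096UL
#define EFAULT 14

static bool page_present[4];   /* stub page-table state */

static int set_key_stub(unsigned long addr)
{
    return page_present[addr / PAGE_SIZE] ? 0 : -EFAULT;
}

static int fixup_user_fault_stub(unsigned long addr)
{
    page_present[addr / PAGE_SIZE] = true;  /* "allocate" the tables */
    return 0;
}

/* The fix: on -EFAULT, fix up and retry the SAME address. */
static int set_keys(unsigned long start, unsigned long end, int *keys_set)
{
    unsigned long addr = start;

    while (addr < end) {
        if (set_key_stub(addr) == -EFAULT) {
            if (fixup_user_fault_stub(addr))
                return -EFAULT;
            continue;  /* retry; do NOT advance the address */
        }
        (*keys_set)++;
        addr += PAGE_SIZE;
    }
    return 0;
}
```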

Commits on Oct 25, 2021

  1. KVM: s390: Add a routine for setting userspace CPU state

    This capability exists, but we don't record anything when userspace
    enables it. Let's refactor that code so that a note can be made in
    the debug logs that it was enabled.
    
    Signed-off-by: Eric Farman <farman@linux.ibm.com>
    Reviewed-by: Thomas Huth <thuth@redhat.com>
    Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Link: https://lore.kernel.org/r/20211008203112.1979843-7-farman@linux.ibm.com
    Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    efarman authored and borntraeger committed Oct 25, 2021
  2. KVM: s390: Simplify SIGP Set Arch handling

    The Principles of Operation describe the various reasons that
    each individual SIGP order might be rejected, and the status
    bits that are set for each condition.
    
    For example, for the Set Architecture order, it states:
    
      "If it is not true that all other CPUs in the configu-
       ration are in the stopped or check-stop state, ...
       bit 54 (incorrect state) ... is set to one."
    
    However, it also states:
    
      "... if the CZAM facility is installed, ...
       bit 55 (invalid parameter) ... is set to one."
    
    Since the Configuration-z/Architecture-Architectural Mode (CZAM)
    facility is unconditionally presented, there is no need to examine
    each VCPU to determine if it is started/stopped. It can simply be
    rejected outright with the Invalid Parameter bit.
    
    Fixes: b697e43 ("KVM: s390: Support Configuration z/Architecture Mode")
    Signed-off-by: Eric Farman <farman@linux.ibm.com>
    Reviewed-by: Thomas Huth <thuth@redhat.com>
    Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
    Link: https://lore.kernel.org/r/20211008203112.1979843-2-farman@linux.ibm.com
    Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    efarman authored and borntraeger committed Oct 25, 2021
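    The simplified handler shape this implies can be sketched as follows; the status-bit value is taken from arch/s390/include/asm/sigp.h as I understand it, and the sketch deliberately omits the bookkeeping the real handler does around the status register:

```c
#include <stdint.h>

#define SIGP_STATUS_INVALID_PARAMETER 0x00000100UL
#define SIGP_CC_STATUS_STORED 1

/* With CZAM unconditionally installed there is no other architecture
 * mode to switch to, so Set Architecture is rejected outright with the
 * invalid-parameter bit, without inspecting every other vcpu's
 * stopped/check-stop state. */
static int handle_sigp_set_arch(uint64_t *status_reg)
{
    *status_reg = SIGP_STATUS_INVALID_PARAMETER;
    return SIGP_CC_STATUS_STORED;
}
```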
  3. KVM: s390: pv: avoid stalls when making pages secure

    Improve make_secure_pte to avoid stalls when the system is heavily
    overcommitted. This was especially problematic in kvm_s390_pv_unpack,
    because of the loop over all pages that needed unpacking.
    
    Due to the locks being held, it was not possible to simply replace
    uv_call with uv_call_sched. A more complex approach was
    needed, in which uv_call is replaced with __uv_call, which does not
    loop. When the UVC needs to be executed again, -EAGAIN is returned, and
    the caller (or its caller) will try again.
    
    When -EAGAIN is returned, the path is the same as when the page is in
    writeback (and the writeback check is also performed, which is
    harmless).
    
    Fixes: 214d9bb ("s390/mm: provide memory management functions for protected KVM guests")
    Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
    Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
    Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
    Link: https://lore.kernel.org/r/20210920132502.36111-5-imbrenda@linux.ibm.com
    Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    Claudio Imbrenda authored and borntraeger committed Oct 25, 2021
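    The non-looping control flow described above can be modeled with a stub UVC that returns a busy condition code a couple of times before completing; the single-shot helper surfaces -EAGAIN and the caller retries at a point where locks are dropped. Stubbed model only, not the kernel code.

```c
#define EAGAIN 11
#define UVC_CC_OK   0
#define UVC_CC_BUSY 2

/* Stub UVC: "completes" after two busy condition codes. */
static int busy_left = 2;
static int uv_call_stub(void)
{
    return busy_left-- > 0 ? UVC_CC_BUSY : UVC_CC_OK;
}

/* Issue the UVC once (__uv_call in the patch) and surface -EAGAIN
 * instead of spinning with locks held. */
static int make_secure_once(void)
{
    return uv_call_stub() == UVC_CC_OK ? 0 : -EAGAIN;
}

static int caller_retry(int *attempts)
{
    int rc;
    do {
        (*attempts)++;   /* locks dropped / cond_resched() point */
        rc = make_secure_once();
    } while (rc == -EAGAIN);
    return rc;
}
```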
  4. KVM: s390: pv: avoid stalls for kvm_s390_pv_init_vm

    When the system is heavily overcommitted, kvm_s390_pv_init_vm might
    generate stall notifications.
    
    Fix this by using uv_call_sched instead of just uv_call. This is ok because
    we are not holding spinlocks.
    
    Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
    Fixes: 214d9bb ("s390/mm: provide memory management functions for protected KVM guests")
    Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
    Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
    Message-Id: <20210920132502.36111-4-imbrenda@linux.ibm.com>
    Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
    Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    Claudio Imbrenda authored and borntraeger committed Oct 25, 2021
  5. KVM: s390: pv: avoid double free of sida page

    If kvm_s390_pv_destroy_cpu is called more than once, we risk calling
    free_page on a random page, since the sidad field is aliased with the
    gbea, which is not guaranteed to be zero.
    
    This can happen, for example, if userspace calls the KVM_PV_DISABLE
    IOCTL, and it fails, and then userspace calls the same IOCTL again.
    This scenario is only possible if KVM has some serious bug or if the
    hardware is broken.
    
    The solution is to simply return successfully immediately if the vCPU
    was already non-secure.
    
    Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
    Fixes: 19e1227 ("KVM: S390: protvirt: Introduce instruction data area bounce buffer")
    Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
    Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
    Message-Id: <20210920132502.36111-3-imbrenda@linux.ibm.com>
    Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
    Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    Claudio Imbrenda authored and borntraeger committed Oct 25, 2021
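    A minimal model of that fix: the destroy path becomes idempotent by bailing out successfully when the vcpu is already non-secure, so a repeated KVM_PV_DISABLE cannot free whatever value happens to alias the sida field. The counter is purely for the model.

```c
#include <stdbool.h>

struct vcpu_pv {
    bool secure;
    int  frees;   /* counts free_pages() calls in this model */
};

static int pv_destroy_cpu(struct vcpu_pv *v)
{
    if (!v->secure)
        return 0;      /* already torn down: avoid the double free */
    v->frees++;        /* free_pages(sida) in the real code */
    v->secure = false;
    return 0;
}
```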
  6. KVM: s390: pv: add macros for UVC CC values

    Add macros to describe the 4 possible CC values returned by the UVC
    instruction.
    
    Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
    Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
    Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
    Message-Id: <20210920132502.36111-2-imbrenda@linux.ibm.com>
    Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
    Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    Claudio Imbrenda authored and borntraeger committed Oct 25, 2021
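    The four condition codes look like this; names and values as I understand them from the patch to arch/s390/include/asm/uv.h:

```c
/* Condition codes the UVC instruction can return. */
#define UVC_CC_OK      0  /* operation completed */
#define UVC_CC_ERROR   1  /* operation failed */
#define UVC_CC_BUSY    2  /* busy: retry the call later */
#define UVC_CC_PARTIAL 3  /* partial completion: call again to continue */
```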
  7. s390/mm: optimize reset_guest_reference_bit()

    We already optimize get_guest_storage_key() to assume that if we don't have
    a PTE table and don't have a huge page mapped that the storage key is 0.
    
    Similarly, optimize reset_guest_reference_bit() to simply do nothing if
    there is no PTE table and no huge page mapped.
    
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
    Acked-by: Heiko Carstens <hca@linux.ibm.com>
    Link: https://lore.kernel.org/r/20210909162248.14969-10-david@redhat.com
    Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    davidhildenbrand authored and borntraeger committed Oct 25, 2021
  8. s390/mm: optimize set_guest_storage_key()

    We already optimize get_guest_storage_key() to assume that if we don't have
    a PTE table and don't have a huge page mapped that the storage key is 0.
    
    Similarly, optimize set_guest_storage_key() to simply do nothing in case
    the key to set is 0.
    
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
    Acked-by: Heiko Carstens <hca@linux.ibm.com>
    Link: https://lore.kernel.org/r/20210909162248.14969-9-david@redhat.com
    Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    davidhildenbrand authored and borntraeger committed Oct 25, 2021
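    The optimization in miniature, with stub state standing in for the real page-table walk: if no PTE table and no huge page is mapped, the key is implicitly 0, so setting key 0 can return without faulting anything in.

```c
#include <stdbool.h>
#include <stdint.h>

static bool mapping_exists;     /* stands in for "PTE table / huge page" */
static int  tables_allocated;

static int set_guest_storage_key(uint8_t key)
{
    if (!mapping_exists && key == 0)
        return 0;            /* already implicitly 0: nothing to do */
    if (!mapping_exists) {
        tables_allocated++;  /* would need to fault in tables first */
        mapping_exists = true;
    }
    /* ... set the key on the now-present mapping ... */
    return 0;
}
```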
  9. s390/mm: no need for pte_alloc_map_lock() if we know the pmd is present

    pte_map_lock() is sufficient.
    
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Acked-by: Heiko Carstens <hca@linux.ibm.com>
    Link: https://lore.kernel.org/r/20210909162248.14969-8-david@redhat.com
    Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    davidhildenbrand authored and borntraeger committed Oct 25, 2021
  10. s390/uv: fully validate the VMA before calling follow_page()

    We should not walk/touch page tables outside of VMA boundaries when
    holding only the mmap sem in read mode. Evil user space can modify the
    VMA layout just before this function runs and e.g., trigger races with
    page table removal code since commit dd2283f ("mm: mmap: zap pages
    with read mmap_sem in munmap").
    
    find_vma() does not check if the address is >= the VMA start address;
    use vma_lookup() instead.
    
    Fixes: 214d9bb ("s390/mm: provide memory management functions for protected KVM guests")
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
    Acked-by: Heiko Carstens <hca@linux.ibm.com>
    Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
    Link: https://lore.kernel.org/r/20210909162248.14969-6-david@redhat.com
    Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    davidhildenbrand authored and borntraeger committed Oct 25, 2021
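    The difference between the two lookups can be shown with a single-VMA model: find_vma() returns the first VMA ending above the address even if that VMA also starts above it, while vma_lookup() additionally requires addr >= vm_start and so returns NULL for an address in a hole.

```c
#include <stddef.h>

struct vma_model { unsigned long vm_start, vm_end; };

static struct vma_model *find_vma_model(struct vma_model *v,
                                        unsigned long addr)
{
    return addr < v->vm_end ? v : NULL;   /* no vm_start check! */
}

static struct vma_model *vma_lookup_model(struct vma_model *v,
                                          unsigned long addr)
{
    struct vma_model *r = find_vma_model(v, addr);
    return (r && addr >= r->vm_start) ? r : NULL;
}
```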
  11. s390/mm: fix VMA and page table handling code in storage key handling…

    … functions
    
    There are multiple things broken about our storage key handling
    functions:
    
    1. We should not walk/touch page tables outside of VMA boundaries when
       holding only the mmap sem in read mode. Evil user space can modify the
       VMA layout just before this function runs and e.g., trigger races with
       page table removal code since commit dd2283f ("mm: mmap: zap pages
       with read mmap_sem in munmap"). gfn_to_hva() will only translate using
       KVM memory regions, but won't validate the VMA.
    
    2. We should not allocate page tables outside of VMA boundaries: if
       evil user space decides to map hugetlbfs to these ranges, bad things
       will happen because we suddenly have PTE or PMD page tables where we
       shouldn't have them.
    
    3. We don't handle large PUDs that might suddenly appear inside our page
       table hierarchy.
    
    Don't manually allocate page tables, properly validate that we have a VMA
    and bail out on pud_large().
    
    All callers of page table handling functions, except
    get_guest_storage_key(), call fixup_user_fault() in case they
    receive an -EFAULT and retry; this will allocate the necessary page tables
    if required.
    
    To keep get_guest_storage_key() working as expected and not requiring
    kvm_s390_get_skeys() to call fixup_user_fault() distinguish between
    "there is simply no page table or huge page yet and the key is assumed
    to be 0" and "this is a fault to be reported".
    
    Although commit 637ff9e ("s390/mm: Add huge pmd storage key handling")
    introduced most of the affected code, it was actually already broken
    before when using get_locked_pte() without any VMA checks.
    
    Note: Ever since commit 637ff9e ("s390/mm: Add huge pmd storage key
    handling") we can no longer set a guest storage key (for example from
    QEMU during VM live migration) without actually resolving a fault.
    Although we would have created most page tables, we would choke on the
    !pmd_present(), requiring a call to fixup_user_fault(). I would
    have thought that this is problematic in combination with postcopy live
    migration ... but nobody noticed and this patch doesn't change the
    situation. So maybe it's just fine.
    
    Fixes: 9fcf93b ("KVM: S390: Create helper function get_guest_storage_key")
    Fixes: 24d5dd0 ("s390/kvm: Provide function for setting the guest storage key")
    Fixes: a7e19ab ("KVM: s390: handle missing storage-key facility")
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
    Acked-by: Heiko Carstens <hca@linux.ibm.com>
    Link: https://lore.kernel.org/r/20210909162248.14969-5-david@redhat.com
    Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    davidhildenbrand authored and borntraeger committed Oct 25, 2021
  12. s390/mm: validate VMA in PGSTE manipulation functions

    We should not walk/touch page tables outside of VMA boundaries when
    holding only the mmap sem in read mode. Evil user space can modify the
    VMA layout just before this function runs and e.g., trigger races with
    page table removal code since commit dd2283f ("mm: mmap: zap pages
    with read mmap_sem in munmap"). gfn_to_hva() will only translate using
    KVM memory regions, but won't validate the VMA.
    
    Further, we should not allocate page tables outside of VMA boundaries: if
    evil user space decides to map hugetlbfs to these ranges, bad things will
    happen because we suddenly have PTE or PMD page tables where we
    shouldn't have them.
    
    Similarly, we have to check if we suddenly find a hugetlbfs VMA, before
    calling get_locked_pte().
    
    Fixes: 2d42f94 ("s390/kvm: Add PGSTE manipulation functions")
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
    Acked-by: Heiko Carstens <hca@linux.ibm.com>
    Link: https://lore.kernel.org/r/20210909162248.14969-4-david@redhat.com
    Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    davidhildenbrand authored and borntraeger committed Oct 25, 2021
  13. s390/gmap: don't unconditionally call pte_unmap_unlock() in __gmap_zap()

    ... otherwise we will try unlocking a spinlock that was never locked via a
    garbage pointer.
    
    At the time we reach this code path, we usually successfully looked up
    a PGSTE already; however, evil user space could have manipulated the VMA
    layout in the meantime and triggered removal of the page table.
    
    Fixes: 1e133ab ("s390/mm: split arch/s390/mm/pgtable.c")
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
    Acked-by: Heiko Carstens <hca@linux.ibm.com>
    Link: https://lore.kernel.org/r/20210909162248.14969-3-david@redhat.com
    Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    davidhildenbrand authored and borntraeger committed Oct 25, 2021
  14. s390/gmap: validate VMA in __gmap_zap()

    We should not walk/touch page tables outside of VMA boundaries when
    holding only the mmap sem in read mode. Evil user space can modify the
    VMA layout just before this function runs and e.g., trigger races with
    page table removal code since commit dd2283f ("mm: mmap: zap pages
    with read mmap_sem in munmap"). The mere presence of an entry in our
    guest_to_host radix tree does not imply that there is a VMA.
    
    Further, we should not allocate page tables (via get_locked_pte()) outside
    of VMA boundaries: if evil user space decides to map hugetlbfs to these
    ranges, bad things will happen because we suddenly have PTE or PMD page
    tables where we shouldn't have them.
    
    Similarly, we have to check if we suddenly find a hugetlbfs VMA, before
    calling get_locked_pte().
    
    Note that gmap_discard() is different:
    zap_page_range()->unmap_single_vma() makes sure to stay within VMA
    boundaries.
    
    Fixes: b31288f ("s390/kvm: support collaborative memory management")
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
    Acked-by: Heiko Carstens <hca@linux.ibm.com>
    Link: https://lore.kernel.org/r/20210909162248.14969-2-david@redhat.com
    Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    davidhildenbrand authored and borntraeger committed Oct 25, 2021
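    The validation the fix adds before touching page tables can be sketched as a predicate (model types only): an entry in the radix tree is not enough, there must be a VMA actually covering the address, and it must not be hugetlbfs, where calling get_locked_pte() would walk the wrong table format.

```c
#include <stdbool.h>
#include <stddef.h>

struct vma_model { unsigned long vm_start, vm_end; bool is_hugetlb; };

static bool gmap_vma_ok(const struct vma_model *vma, unsigned long vmaddr)
{
    if (!vma || vmaddr < vma->vm_start || vmaddr >= vma->vm_end)
        return false;    /* hole in the address space: bail out */
    if (vma->is_hugetlb)
        return false;    /* never get_locked_pte() on hugetlbfs */
    return true;
}
```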

Commits on Oct 20, 2021

  1. KVM: s390: preserve deliverable_mask in __airqs_kick_single_vcpu

    Changing the deliverable mask in __airqs_kick_single_vcpu() is a bug. If
    one idle vcpu can't take the interrupts we want to deliver, we should
    look for another vcpu that can, instead of saying that we don't want
    to deliver these interrupts by clearing the bits from the
    deliverable_mask.
    
    Fixes: 9f30f62 ("KVM: s390: add gib_alert_irq_handler()")
    Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
    Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
    Reviewed-by: Michael Mueller <mimu@linux.ibm.com>
    Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
    Link: https://lore.kernel.org/r/20211019175401.3757927-3-pasic@linux.ibm.com
    Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    halil-pasic authored and borntraeger committed Oct 20, 2021
  2. KVM: s390: clear kicked_mask before sleeping again

    The idea behind the kicked mask is that we should not re-kick a vcpu
    that is already in the "kick" process, i.e. that was kicked and is
    about to be dispatched if certain conditions are met.
    
    The problem with the current implementation is, that it assumes the
    kicked vcpu is going to enter SIE shortly. But under certain
    circumstances, the vcpu we just kicked will be deemed non-runnable and
    will remain in wait state. This can happen, if the interrupt(s) this
    vcpu got kicked to deal with got already cleared (because the interrupts
    got delivered to another vcpu). In this case kvm_arch_vcpu_runnable()
    would return false, and the vcpu would remain in kvm_vcpu_block(),
    but this time with its kicked_mask bit set. So next time around we
    wouldn't kick the vcpu from __airqs_kick_single_vcpu(), but would assume
    that we just kicked it.
    
    Let us make sure the kicked_mask is cleared before we give up on
    re-dispatching the vcpu.
    
    Fixes: 9f30f62 ("KVM: s390: add gib_alert_irq_handler()")
    Reported-by: Matthew Rosato <mjrosato@linux.ibm.com>
    Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
    Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
    Reviewed-by: Michael Mueller <mimu@linux.ibm.com>
    Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
    Link: https://lore.kernel.org/r/20211019175401.3757927-2-pasic@linux.ibm.com
    Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    halil-pasic authored and borntraeger committed Oct 20, 2021
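    The bug and the fix in miniature, as a userspace model: kicks are suppressed while the vcpu's kicked bit is set, so if the vcpu goes back to waiting without clearing it, the next genuinely needed kick would be swallowed. Clearing the bit before sleeping again restores delivery.

```c
#include <stdbool.h>

struct vcpu_model { bool kicked; bool runnable; int kicks_delivered; };

/* Kick only vcpus not already marked kicked (the point of kicked_mask). */
static void kick_vcpu(struct vcpu_model *v)
{
    if (!v->kicked) {
        v->kicked = true;
        v->kicks_delivered++;
    }
}

/* The fix: clear the kicked bit before going back to sleep when the
 * vcpu turned out to be non-runnable. */
static void vcpu_block_again(struct vcpu_model *v)
{
    if (!v->runnable)
        v->kicked = false;
}
```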

Commits on Sep 28, 2021

  1. KVM: s390: Function documentation fixes

    The latest compile changes pointed us to a few instances where we use
    the kernel documentation style but don't explain all variables or
    don't adhere to it 100%.
    
    It's easy to fix so let's do that.
    
    Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
    Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    frankjaa authored and borntraeger committed Sep 28, 2021

Commits on Sep 27, 2021

  1. KVM: VMX: Fix a TSX_CTRL_CPUID_CLEAR field mask issue

    When updating the host's mask for its MSR_IA32_TSX_CTRL user return entry,
    clear the mask in the found uret MSR instead of vmx->guest_uret_msrs[i].
    Modifying guest_uret_msrs directly is completely broken as 'i' does not
    point at the MSR_IA32_TSX_CTRL entry.  In fact, it's guaranteed to be an
    out-of-bounds access, as 'i' is always set to kvm_nr_uret_msrs by a prior
    loop. By sheer dumb luck, the fallout is limited to "only" failing to
    preserve the host's TSX_CTRL_CPUID_CLEAR.  The out-of-bounds access is
    benign as it's guaranteed to clear a bit in a guest MSR value, and those
    values are always zero at vCPU creation on both x86-64 and i386.
    
    Cc: stable@vger.kernel.org
    Fixes: 8ea8b8d ("KVM: VMX: Use common x86's uret MSR list as the one true list")
    Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
    Reviewed-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20210926015545.281083-1-zhenzhong.duan@intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    duanzhenzhong authored and bonzini committed Sep 27, 2021
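    The fix in miniature, with model types: modify the entry returned by the lookup, never msrs[i] where i is a stale loop counter left equal to the array length (an out-of-bounds slot). The MSR index 0x122 and bit 0x2 below mirror MSR_IA32_TSX_CTRL and TSX_CTRL_CPUID_CLEAR but are used here only as sample data.

```c
#include <stddef.h>
#include <stdint.h>

struct uret_msr { uint32_t index; uint64_t mask; };

static struct uret_msr *find_uret_msr(struct uret_msr *msrs, size_t n,
                                      uint32_t index)
{
    for (size_t i = 0; i < n; i++)
        if (msrs[i].index == index)
            return &msrs[i];
    return NULL;
}

static void clear_mask_bit(struct uret_msr *msrs, size_t n,
                           uint32_t index, uint64_t bit)
{
    struct uret_msr *msr = find_uret_msr(msrs, n, index);
    if (msr)
        msr->mask &= ~bit;   /* operate on the FOUND entry */
}
```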

Commits on Sep 24, 2021

  1. Merge tag 'kvmarm-fixes-5.15-1' of git://git.kernel.org/pub/scm/linux…

    …/kernel/git/kvmarm/kvmarm into kvm-master
    
    KVM/arm64 fixes for 5.15, take #1
    
    - Add missing FORCE target when building the EL2 object
    - Fix a PMU probe regression on some platforms
    bonzini committed Sep 24, 2021
  2. selftests: KVM: Explicitly use movq to read xmm registers

    Compiling the KVM selftests with clang emits the following warning:
    
    >> include/x86_64/processor.h:297:25: error: variable 'xmm0' is uninitialized when used here [-Werror,-Wuninitialized]
    >>                return (unsigned long)xmm0;
    
    where xmm0 is accessed via an uninitialized register variable.
    
    Indeed, this is a misuse of register variables, which really should only
    be used for specifying register constraints on variables passed to
    inline assembly. Rather than attempting to read xmm registers via
    register variables, just explicitly perform the movq from the desired
    xmm register.
    
    Fixes: 783e9e5 ("kvm: selftests: add API testing infrastructure")
    Signed-off-by: Oliver Upton <oupton@google.com>
    Message-Id: <20210924005147.1122357-1-oupton@google.com>
    Reviewed-by: Ricardo Koller <ricarkol@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    oupton authored and bonzini committed Sep 24, 2021
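    The fixed pattern looks roughly like this (x86-64 only; a sketch of the approach, not the selftest code itself): perform an explicit movq between the xmm register and a general-purpose register instead of reading an uninitialized register variable, which clang rightly rejects.

```c
#include <stdint.h>

static inline uint64_t read_xmm0(void)
{
    uint64_t v;
    /* movq xmm -> r64: explicit transfer, no register variable */
    __asm__ volatile("movq %%xmm0, %0" : "=r"(v));
    return v;
}

static inline void write_xmm0(uint64_t v)
{
    __asm__ volatile("movq %0, %%xmm0" : : "r"(v));
}
```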
  3. selftests: KVM: Call ucall_init when setting up in rseq_test

    While x86 does not require any additional setup to use the ucall
    infrastructure, arm64 needs to set up the MMIO address used to signal a
    ucall to userspace. rseq_test does not initialize the MMIO address,
    resulting in the test spinning indefinitely.
    
    Fix the issue by calling ucall_init() during setup.
    
    Fixes: 61e52f1 ("KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs")
    Signed-off-by: Oliver Upton <oupton@google.com>
    Message-Id: <20210923220033.4172362-1-oupton@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    oupton authored and bonzini committed Sep 24, 2021

Commits on Sep 23, 2021

  1. KVM: Remove tlbs_dirty

    There is no user of tlbs_dirty.
    
    Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Message-Id: <20210918005636.3675-4-jiangshanlai@gmail.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Lai Jiangshan authored and bonzini committed Sep 23, 2021
  2. KVM: X86: Synchronize the shadow pagetable before link it

    If a gpte is changed from non-present to present, the guest doesn't need
    to flush the tlb per the SDM.  So the host must synchronize the sp before
    linking it.  Otherwise the guest might use a wrong mapping.
    
    For example: the guest first changes a level-1 pagetable, and then
    links its parent to a new place where the original gpte is non-present.
    Finally the guest can access the remapped area without flushing
    the tlb.  The guest's behavior should be allowed per SDM, but the host
    kvm mmu makes it wrong.
    
    Fixes: 4731d4c ("KVM: MMU: out of sync shadow core")
    Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Message-Id: <20210918005636.3675-3-jiangshanlai@gmail.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Lai Jiangshan authored and bonzini committed Sep 23, 2021
  3. KVM: X86: Fix missed remote tlb flush in rmap_write_protect()

    When kvm->tlbs_dirty > 0, some rmaps might have been deleted
    without flushing the tlb remotely after kvm_sync_page().  If @gfn
    was writable before and its rmaps were deleted in kvm_sync_page(),
    and if the tlb entry is still in a remote running VCPU, the @gfn
    is not safely protected.
    
    To fix the problem, kvm_sync_page() does the remote flush when
    needed to avoid the problem.
    
    Fixes: a4ee1ca ("KVM: MMU: delay flush all tlbs on sync_page path")
    Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Message-Id: <20210918005636.3675-2-jiangshanlai@gmail.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Lai Jiangshan authored and bonzini committed Sep 23, 2021
  4. KVM: x86: nSVM: don't copy virt_ext from vmcb12

    These fields correspond to features that we don't yet expose to L2.
    
    While currently there are no CVE worthy features in this field,
    if AMD adds more features to this field, that could allow guest
    escapes similar to CVE-2021-3653 and CVE-2021-3656.
    
    Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
    Message-Id: <20210914154825.104886-6-mlevitsk@redhat.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Maxim Levitsky authored and bonzini committed Sep 23, 2021
  5. KVM: x86: nSVM: test eax for 4K alignment for GP errata workaround

    GP SVM errata workaround made the #GP handler always emulate
    the SVM instructions.
    
    However these instructions #GP in case the operand is not 4K aligned,
    but the workaround code didn't check this and we ended up
    emulating these instructions anyway.
    
    This is only an emulation accuracy check bug as there is no harm for
    KVM to read/write unaligned vmcb images.
    
    Fixes: 82a11e9 ("KVM: SVM: Add emulation support for #GP triggered by SVM instructions")
    
    Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
    Message-Id: <20210914154825.104886-4-mlevitsk@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Maxim Levitsky authored and bonzini committed Sep 23, 2021
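    The missing check reduces to a simple alignment test: VMRUN/VMSAVE/VMLOAD raise #GP themselves when the physical-address operand in rAX is not 4K aligned, so the workaround should emulate only when the operand is aligned and otherwise let the #GP be delivered. A sketch of that predicate:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if rAX holds a 4 KiB-aligned operand for the SVM instruction. */
static bool svm_operand_4k_aligned(uint64_t rax)
{
    return (rax & 0xFFFULL) == 0;
}
```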
  6. KVM: x86: selftests: test simultaneous uses of V_IRQ from L1 and L0

    Test that if:
    
    * L1 disables virtual interrupt masking, and INTR intercept.
    
    * L1 sets up a virtual interrupt to be injected into L2 and enters L2 with
      interrupts disabled, thus the virtual interrupt is pending.
    
    * Now an external interrupt arrives in L1 and since
      L1 doesn't intercept it, it should be delivered to L2 when
      it enables interrupts.
    
      To do this, L0 abuses V_IRQ to set up an
      interrupt window, and returns to L2.
    
    * L2 enables interrupts.
      This should trigger the interrupt window,
      injection of the external interrupt and delivery
      of the virtual interrupt that can now be done.
    
    * Test that now L2 gets those interrupts.
    
    This is the test that demonstrates the issue that was
    fixed in the previous patch.
    
    Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
    Message-Id: <20210914154825.104886-3-mlevitsk@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Maxim Levitsky authored and bonzini committed Sep 23, 2021
  7. KVM: x86: nSVM: restore int_vector in svm_clear_vintr

    In svm_clear_vintr we try to restore the virtual interrupt
    injection that might be pending, but we fail to restore
    the interrupt vector.
    
    Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
    Message-Id: <20210914154825.104886-2-mlevitsk@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Maxim Levitsky authored and bonzini committed Sep 23, 2021

Commits on Sep 22, 2021

  1. kvm: x86: Add AMD PMU MSRs to msrs_to_save_all[]

    Intel PMU MSRs are in msrs_to_save_all[], so add AMD PMU MSRs to have a
    consistent behavior between Intel and AMD when using KVM_GET_MSRS,
    KVM_SET_MSRS or KVM_GET_MSR_INDEX_LIST.
    
    We have to add legacy and new MSRs to handle guests running without
    X86_FEATURE_PERFCTR_CORE.
    
    Signed-off-by: Fares Mehanna <faresx@amazon.de>
    Message-Id: <20210915133951.22389-1-faresx@amazon.de>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Fares Mehanna authored and bonzini committed Sep 22, 2021
  2. KVM: x86: nVMX: re-evaluate emulation_required on nested VM exit

    If L1 had invalid state on VM entry (can happen on SMM transitions
    when we enter from real mode, straight to nested guest),
    
    then after we load 'host' state from VMCS12, the state has to become
    valid again, but since we load the segment registers with
    __vmx_set_segment we weren't always updating emulation_required.
    
    Update emulation_required explicitly at end of load_vmcs12_host_state.
    
    Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
    Message-Id: <20210913140954.165665-8-mlevitsk@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Maxim Levitsky authored and bonzini committed Sep 22, 2021