Commits

privmem-v11.6

Commits on Apr 12, 2023

  1. KVM: selftests: Test KVM exit behavior for private memory/access

    "Testing private access when memslot gets deleted" tests the behavior
    of KVM when a private memslot gets deleted while the VM is using the
    private memslot. When KVM looks up the deleted (slot = NULL) memslot,
    KVM should exit to userspace with KVM_EXIT_MEMORY_FAULT.
    
    In the second test, upon a private access to non-private memslot, KVM
    should also exit to userspace with KVM_EXIT_MEMORY_FAULT.
    
    Signed-off-by: Ackerley Tng <ackerleytng@google.com>
    Ackerley Tng authored and chao-p committed Apr 12, 2023
  2. KVM: selftests: Add tests around sharing a restrictedmem fd

    Tests that
    
    + Different memslots in the same VM should be able to share a
      restrictedmem_fd
    + A second VM cannot share the same offsets in a restrictedmem_fd
    + Different VMs should be able to share the same restrictedmem_fd, as
      long as the offsets in the restrictedmem_fd are different
    
    Signed-off-by: Ackerley Tng <ackerleytng@google.com>
    Ackerley Tng authored and chao-p committed Apr 12, 2023
  3. KVM: selftests: Default private_mem_conversions_test to use 1 restrictedmem file for test data
    
    Default the private/shared memory conversion tests to use a single
    file (when multiple memslots are requested), while executing on
    multiple vCPUs in parallel, to stress-test the restrictedmem subsystem.
    
    Also add a flag to allow multiple files to be used.
    
    Signed-off-by: Ackerley Tng <ackerleytng@google.com>
    Ackerley Tng authored and chao-p committed Apr 12, 2023
  4. KVM: selftests: Add vm_userspace_mem_region_add_with_restrictedmem

    Provide new function to allow restrictedmem's fd and offset to be
    specified in selftests.
    
    No functional change intended to vm_userspace_mem_region_add.
    
    Signed-off-by: Ackerley Tng <ackerleytng@google.com>
    Ackerley Tng authored and chao-p committed Apr 12, 2023
  5. KVM: selftests: Default private_mem_conversions_test to use 1 memslot for test data
    
    Default the private/shared memory conversion tests to use a single
    memslot, while executing on multiple vCPUs in parallel, to stress-test
    the restrictedmem subsystem.
    
    Also add a flag to allow multiple memslots to be used.
    
    Signed-off-by: Ackerley Tng <ackerleytng@google.com>
    Ackerley Tng authored and chao-p committed Apr 12, 2023
  6. KVM: selftests: Generalize private_mem_conversions_test for parallel execution
    
    By running the private/shared memory conversion tests on multiple
    vCPUs in parallel, we stress-test the restrictedmem subsystem to
    test conversion of non-overlapping GPA ranges in multiple memslots.
    
    Signed-off-by: Ackerley Tng <ackerleytng@google.com>
    Ackerley Tng authored and chao-p committed Apr 12, 2023
  7. KVM: selftests: Exercise restrictedmem allocation and truncation code after KVM invalidation code has been unbound
    
    The kernel interfaces restrictedmem_bind and restrictedmem_unbind are
    used by KVM to bind/unbind kvm functions to restrictedmem's
    invalidate_start and invalidate_end callbacks.
    
    After the KVM VM is freed, the KVM functions should have been unbound
    from the restrictedmem_fd's callbacks.
    
    In this test, we exercise fallocate to back and unback memory using
    the restrictedmem fd, and we expect no problems (crashes) after the
    KVM functions have been unbound.
    
    Signed-off-by: Ackerley Tng <ackerleytng@google.com>
    Ackerley Tng authored and chao-p committed Apr 12, 2023
  8. KVM: selftests: Test that VM private memory should not be readable from host
    
    After VM memory is remapped as private memory and guest has written to
    private memory, request the host to read the corresponding hva for
    that private memory.
    
    The host should not be able to read the value in private memory.
    
    This selftest shows that private memory contents of the guest are not
    accessible to host userspace via the HVA.
    
    Signed-off-by: Ackerley Tng <ackerleytng@google.com>
    Ackerley Tng authored and chao-p committed Apr 12, 2023
  9. KVM: selftests: Add testcase for creating private memslots

    Verify that creating a KVM_MEM_PRIVATE memslot fails with a bad fd, bad
    alignment, or an overlapping offset. Modifying a KVM_MEM_PRIVATE memslot
    is also not allowed at this time.
    
    Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
    chao-p committed Apr 12, 2023
  10. KVM: selftests: Add KVM_SET_USER_MEMORY_REGION2 helper

    Provide a raw version as well as an assert-success version to reduce the
    amount of boilerplate code needed for basic usage.
    
    Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
    chao-p committed Apr 12, 2023
  11. KVM: selftests: x86: Add selftest for private memory conversions

    Add a selftest to exercise implicit/explicit conversion functionality
    within KVM and verify:
    
      - Shared memory is visible to host userspace
      - Private memory is not visible to host userspace
      - Host userspace and guest can communicate over shared memory
      - Data in shared backing is preserved across conversions (test's
        host userspace doesn't free the data)
    
    Signed-off-by: Vishal Annapurve <vannapurve@google.com>
    Co-developed-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    vishals4gh authored and chao-p committed Apr 12, 2023
  12. KVM: selftests: Introduce VM "shape" to allow tests to specify the VM type
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and chao-p committed Apr 12, 2023
  13. KVM: selftests: Add helpers to do KVM_HC_MAP_GPA_RANGE hypercalls (x86)

    Signed-off-by: Vishal Annapurve <vannapurve@google.com>
    Co-developed-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    vishals4gh authored and chao-p committed Apr 12, 2023
  14. KVM: selftests: Add helpers to convert guest memory b/w private and shared
    
    Signed-off-by: Vishal Annapurve <vannapurve@google.com>
    Co-developed-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    vishals4gh authored and chao-p committed Apr 12, 2023
  15. KVM: selftests: Add support for creating private memslots

    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and chao-p committed Apr 12, 2023
  16. KVM: selftests: Convert lib's mem regions to KVM_SET_USER_MEMORY_REGION2

    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and chao-p committed Apr 12, 2023
  17. KVM: selftests: Drop unused kvm_userspace_memory_region_find() helper

    Drop kvm_userspace_memory_region_find(), it's unused and a terrible API
    (probably why it's unused).  If anything outside of kvm_util.c needs to
    get at the memslot, userspace_mem_region_find() can be exposed to give
    others full access to all memory region/slot information.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and chao-p committed Apr 12, 2023
  18. KVM: x86: Add support for "protected VMs" that can utilize private memory
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and chao-p committed Apr 12, 2023
  19. KVM: Allow arch code to track number of memslot address spaces per VM

    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and chao-p committed Apr 12, 2023
  20. KVM: Drop superfluous __KVM_VCPU_MULTIPLE_ADDRESS_SPACE macro

    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and chao-p committed Apr 12, 2023
  21. KVM: x86/mmu: Handle page fault for private memory

    Handle page fault for KVM_MEM_PRIVATE memslot which contains memory
    pages for both fd-based private memory and hva-based shared memory.
    
    Architectures that support such memslots can set the 'is_private' field
    of the kvm_page_fault structure to indicate whether the page fault is
    caused by a private memory access or not. KVM itself maintains its own
    view of whether the faulting page is private or not via memory attributes.
    
    To handle a page fault for such a memslot, KVM first checks whether
    'is_private' of the fault matches the memory attribute it maintains. Then:
      - For a successful match, the private pfn is obtained via restrictedmem
        and the shared pfn is obtained via GUP().
      - For a failed match, KVM causes a KVM_EXIT_MEMORY_FAULT exit to
        userspace. Userspace then can convert memory between private/shared
        in host's view and retry the fault.
    
    Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com>
    Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
    Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
    Reviewed-by: Fuad Tabba <tabba@google.com>
    Tested-by: Fuad Tabba <tabba@google.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    chao-p committed Apr 12, 2023
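
The match/mismatch decision described in the "Handle page fault for private memory" message can be sketched as a small pure function. This is only an illustration under stated assumptions: the enum, its values, and `resolve_fault()` are hypothetical names invented here, not identifiers from the series.

```c
#include <stdbool.h>

/* Hypothetical outcomes of the check described in the commit message. */
enum fault_action {
	FAULT_MAP_PRIVATE,  /* match, private: pfn comes from restrictedmem */
	FAULT_MAP_SHARED,   /* match, shared: pfn comes from GUP() on the hva */
	FAULT_EXIT_TO_USER, /* mismatch: exit with KVM_EXIT_MEMORY_FAULT */
};

/*
 * fault_is_private: what the faulting access claims (the 'is_private' field).
 * attr_is_private:  what KVM's per-page memory attributes currently say.
 */
enum fault_action resolve_fault(bool fault_is_private, bool attr_is_private)
{
	if (fault_is_private != attr_is_private)
		return FAULT_EXIT_TO_USER;

	return fault_is_private ? FAULT_MAP_PRIVATE : FAULT_MAP_SHARED;
}
```

In the mismatch case, userspace is expected to convert the range and let the guest retry the access, at which point the two inputs agree.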
  22. KVM: x86: Disallow hugepages when memory attributes are mixed

    Disallow creating hugepages with mixed memory attributes, e.g. shared
    versus private, as mapping a hugepage in this case would allow the guest
    to access memory with the wrong attributes, e.g. overlaying private memory
    with a shared hugepage.
    
    Track whether or not attributes are mixed via the existing
    disallow_lpage field, but use the most significant bit in 'disallow_lpage'
    to indicate a hugepage has mixed attributes instead of using the normal
    refcounting.  Whether or not attributes are mixed is binary; either they
    are or they aren't.  Attempting to squeeze that info into the refcount is
    unnecessarily complex as it would require knowing the previous state of
    the mixed count when updating attributes.  Using a flag means KVM just
    needs to ensure the current status is reflected in the memslots.
    
    Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    chao-p committed Apr 12, 2023
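
The flag-in-the-top-bit idea from the "Disallow hugepages when memory attributes are mixed" message can be sketched as follows. The 32-bit width, the macro name, and all helper names are assumptions made for illustration; only the idea of reserving the most significant bit of 'disallow_lpage' comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: bit 31 is the "mixed attributes" flag, bits 0-30 the refcount. */
#define EXAMPLE_LPAGE_MIXED_FLAG (UINT32_C(1) << 31)

/* Set or clear the flag to reflect the current state; the refcount bits are untouched. */
void lpage_set_mixed(uint32_t *disallow_lpage, bool mixed)
{
	if (mixed)
		*disallow_lpage |= EXAMPLE_LPAGE_MIXED_FLAG;
	else
		*disallow_lpage &= ~EXAMPLE_LPAGE_MIXED_FLAG;
}

bool lpage_is_mixed(uint32_t disallow_lpage)
{
	return (disallow_lpage & EXAMPLE_LPAGE_MIXED_FLAG) != 0;
}

/* The ordinary refcount lives in the remaining low bits. */
uint32_t lpage_refcount(uint32_t disallow_lpage)
{
	return disallow_lpage & ~EXAMPLE_LPAGE_MIXED_FLAG;
}

/* A hugepage is allowed only when no reference is held and the flag is clear. */
bool lpage_allowed(uint32_t disallow_lpage)
{
	return disallow_lpage == 0;
}
```

Because the flag is written from the current state rather than incremented, the updater never needs to know whether the range was previously mixed, which is the simplification the message argues for.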
  23. KVM: Enable and expose KVM_MEM_PRIVATE

    Enable KVM_MEM_PRIVATE memslot to allow guest memory provided through a
    restrictedmem_fd/restrictedmem_offset pair that points to memory pages
    backed by memfd_restricted().
    
    Such memslots are bound to restrictedmem and receive notifications from
    restrictedmem when the backing memory is invalidated or hits an error. KVM
    cannot call GUP() to obtain the pfn for such memory; instead it calls
    restrictedmem_get_page().
    
    The extended memslot can still have a userspace_addr (hva). When used, a
    single memslot can maintain both private memory through restricted_fd
    and shared memory through userspace_addr. Whether the private or shared
    part is visible to guest is maintained by the per-page memory attribute
    KVM_MEMORY_ATTRIBUTE_PRIVATE.
    
    Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com>
    Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
    Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
    Cc: Fuad Tabba <tabba@google.com>
    Co-developed-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    chao-p committed Apr 12, 2023
  24. KVM: Unmap existing mappings when memory attribute changed

    Unmap the existing guest mappings when memory attribute is changed.
    It's a reasonable action for current KVM_MEMORY_ATTRIBUTE_PRIVATE
    attribute because shared pages and private pages are from different
    backends so when a page is changed between shared and private, the
    existing mapping should be invalidated and later the new mapping can
    be populated.
    
    While the attribute is being changed and the range unmapped, the page
    fault handler may run on the same memory range and produce incorrect
    page state. Invoke the kvm_mmu_invalidate_* helpers so that the page
    fault handler retries during this time frame.
    
    Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
    Co-developed-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    chao-p committed Apr 12, 2023
  25. KVM: Use gfn instead of hva for mmu_notifier_retry

    Currently, in the mmu_notifier invalidate path, the hva range is recorded
    and then checked against by mmu_notifier_retry_hva() in the page fault
    handling path. However, for the soon-to-be-introduced private memory, a
    page fault may not have an associated hva, so checking the gfn (gpa)
    makes more sense.
    
    For existing hva based shared memory, gfn is expected to also work. The
    only downside is when aliasing multiple gfns to a single hva, the
    current algorithm of checking multiple ranges could result in a much
    larger range being rejected. Such aliasing should be uncommon, so the
    impact is expected to be small.
    
    Suggested-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
    Reviewed-by: Fuad Tabba <tabba@google.com>
    Tested-by: Fuad Tabba <tabba@google.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    chao-p committed Apr 12, 2023
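
A gfn-based retry check of the kind described in the "Use gfn instead of hva" message could look roughly like the sketch below. It is a simplified user-space model with no locking or memory barriers, and the struct and function names are hypothetical. It also shows the stated downside: overlapping invalidations widen one tracked range, so unrelated gfns in between are rejected too.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Hypothetical bookkeeping for in-flight invalidations. */
struct invalidate_state {
	unsigned long seq; /* bumped every time an invalidation completes */
	long in_progress;  /* number of invalidations currently running */
	gfn_t range_start; /* tracked range is [range_start, range_end) */
	gfn_t range_end;
};

void invalidate_begin(struct invalidate_state *s, gfn_t start, gfn_t end)
{
	if (s->in_progress++ == 0) {
		s->range_start = start;
		s->range_end = end;
		return;
	}
	/* A second concurrent invalidation only widens the single tracked range. */
	if (start < s->range_start)
		s->range_start = start;
	if (end > s->range_end)
		s->range_end = end;
}

void invalidate_end(struct invalidate_state *s)
{
	s->seq++;
	s->in_progress--;
}

/* seq_snapshot is the value of s->seq sampled before the fault looked up its pfn. */
bool fault_must_retry(const struct invalidate_state *s,
		      unsigned long seq_snapshot, gfn_t gfn)
{
	if (s->in_progress && gfn >= s->range_start && gfn < s->range_end)
		return true;

	return s->seq != seq_snapshot;
}
```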
  26. KVM: Add KVM_EXIT_MEMORY_FAULT exit

    This new KVM exit allows userspace to handle memory-related errors. It
    indicates that an error happened in KVM at guest memory range
    [gpa, gpa+size). The flags field carries additional information for
    userspace to handle the error. Currently bit 0 is defined as 'private
    memory': '1' indicates the error was caused by a private memory access
    and '0' indicates it was caused by a shared memory access.
    
    When private memory is enabled, this new exit will be used for KVM to
    exit to userspace for shared <-> private memory conversion in memory
    encryption usage. In such usage, there are typically two kinds of memory
    conversions:
      - explicit conversion: happens when guest explicitly calls into KVM
        to map a range (as private or shared), KVM then exits to userspace
        to perform the map/unmap operations.
      - implicit conversion: happens in KVM page fault handler where KVM
        exits to userspace for an implicit conversion when the page is in a
        different state than requested (private or shared).
    
    Suggested-by: Sean Christopherson <seanjc@google.com>
    Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com>
    Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
    Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
    Reviewed-by: Fuad Tabba <tabba@google.com>
    Tested-by: Fuad Tabba <tabba@google.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    chao-p committed Apr 12, 2023
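
How userspace might decode the exit described in the "Add KVM_EXIT_MEMORY_FAULT exit" message is sketched below. The struct only mirrors the three values the message names (flags, gpa, size); it is not the uapi header, and every identifier prefixed with `example_` or `EXAMPLE_` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 of flags: set when the faulting access was a private memory access. */
#define EXAMPLE_EXIT_FLAG_PRIVATE (UINT64_C(1) << 0)

/* Illustrative stand-in for the exit payload. */
struct example_memory_fault {
	uint64_t flags;
	uint64_t gpa;
	uint64_t size;
};

/* Hypothetical description of the conversion userspace would then request. */
struct example_conversion {
	uint64_t start;  /* first byte of the range */
	uint64_t end;    /* one past the last byte, i.e. [start, end) */
	bool to_private; /* true: make the range private; false: make it shared */
};

struct example_conversion plan_conversion(const struct example_memory_fault *f)
{
	struct example_conversion c;

	c.start = f->gpa;
	c.end = f->gpa + f->size; /* half-open range [gpa, gpa+size) */
	c.to_private = (f->flags & EXAMPLE_EXIT_FLAG_PRIVATE) != 0;
	return c;
}
```

The sketch assumes that the range should be converted to whatever state the faulting access wanted; a real VMM may apply its own policy before converting.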
  27. KVM: Introduce KVM_SET_USER_MEMORY_REGION2

    Introduce KVM_SET_USER_MEMORY_REGION2 to allow extension for future
    features. It works with kvm_userspace_memory_region2 which leaves room
    for new features. kvm_userspace_memory_region2 has compatible layout to
    kvm_userspace_memory_region so code working on existing fields can be
    reused.
    
    This is preparatory work for adding a new fd-based memslot, for which
    this ioctl needs new fields to specify the fd number and the offset.
    
    Cc: Jarkko Sakkinen <jarkko@kernel.org>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
    sean-jc authored and chao-p committed Apr 12, 2023
  28. KVM: Introduce per-page memory attributes

    Introduce two ioctls to allow userspace to operate on the per-page
    attributes of the guest memory.
    
      - KVM_SET_MEMORY_ATTRIBUTES to set the per-page memory attributes
        to a guest memory range.
      - KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES to return the KVM supported
        memory attributes.
    
    In confidential computing usage, whether a page is private or shared is
    necessary information for KVM to perform operations like page fault
    handling, page zapping etc. There are other potential use cases for
    per-page memory attributes, e.g. to make memory read-only (or no-exec,
    or exec-only, etc.) without having to modify memslots.
    
    Attributes are defined as u64 bitmask and currently only one attribute
    KVM_MEMORY_ATTRIBUTE_PRIVATE is defined for confidential computing
    usage.
    
    Both ioctls are advertised through KVM_CAP_MEMORY_ATTRIBUTES.
    
    Suggested-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Link: https://lore.kernel.org/all/Y2WB48kD0J4VGynX@google.com/
    Reviewed-by: Fuad Tabba <tabba@google.com>
    Tested-by: Fuad Tabba <tabba@google.com>
    chao-p committed Apr 12, 2023
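
Since the "Introduce per-page memory attributes" message defines attributes as a u64 bitmask, validating a request against the supported set reduces to a mask test. The sketch below assumes a bit position for the private attribute purely for illustration; the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit position is an assumption made for this example only. */
#define EXAMPLE_ATTR_PRIVATE (UINT64_C(1) << 3)

/* A request is acceptable only if it sets no bit outside the supported mask. */
bool attrs_supported(uint64_t requested, uint64_t supported)
{
	return (requested & ~supported) == 0;
}

/* A page is private when the private attribute bit is set for it. */
bool attrs_is_private(uint64_t attrs)
{
	return (attrs & EXAMPLE_ATTR_PRIVATE) != 0;
}
```

A caller would first obtain the supported mask (the message names KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES for that) and reject any request for which `attrs_supported()` returns false.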
  29. KVM: Convert KVM_ARCH_WANT_MMU_NOTIFIER to CONFIG_KVM_GENERIC_MMU_NOTIFIER
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and chao-p committed Apr 12, 2023
  30. KVM: PPC: Drop dead code related to KVM_ARCH_WANT_MMU_NOTIFIER

    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and chao-p committed Apr 12, 2023
  31. selftests: add basic selftest for memfd_restricted

    The test verifies that a file descriptor created with memfd_restricted()
    does not allow read/write/mmap operations, and that the offset/length
    passed to fallocate(FALLOC_FL_PUNCH_HOLE) must be page aligned.
    
    Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
    chao-p committed Apr 12, 2023
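
The page-alignment requirement that the memfd_restricted selftest checks for hole punching can be expressed as a single mask test. This is a minimal sketch, assuming the page size is a power of two; the function name is hypothetical and the code does not call fallocate() itself.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when both offset and len are multiples of page_size.
 * Assumes page_size is a power of two (e.g. 4096).
 */
bool punch_hole_args_aligned(uint64_t offset, uint64_t len, uint64_t page_size)
{
	uint64_t mask = page_size - 1;

	/* OR-ing the two values lets one test catch a stray low bit in either. */
	return ((offset | len) & mask) == 0;
}
```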
  32. mm: Introduce memfd_restricted system call to create restricted user memory
    
    Introduce 'memfd_restricted' system call with the ability to create
    memory areas that are restricted from userspace access through ordinary
    MMU operations (e.g. read/write/mmap). The memory content is expected to
    be used through the new in-kernel interface by a third kernel module.
    
    memfd_restricted() is useful for scenarios where a file descriptor (fd)
    can be used as an interface into mm but userspace's access through the
    fd needs to be restricted. Initially it is designed to provide
    protections for KVM encrypted guest memory.
    
    Normally KVM uses memfd memory via mmapping the memfd into KVM userspace
    (e.g. QEMU) and then using the mmaped virtual address to setup the
    mapping in the KVM secondary page table (e.g. EPT). With confidential
    computing technologies like Intel TDX, the memfd memory may be encrypted
    with special key for special software domain (e.g. KVM guest) and is not
    expected to be directly accessed by userspace. More precisely, userspace
    access to such encrypted memory may lead to a host crash, so it should
    be prevented.
    
    memfd_restricted() provides the semantics required for KVM guest
    encrypted memory support: an fd created with memfd_restricted() is used
    as the source of guest memory in a confidential computing environment,
    and KVM can interact directly with core-mm without the need to expose
    the memory content to KVM userspace.
    
    KVM userspace is still in charge of the lifecycle of the fd. It should
    pass the created fd to KVM. KVM uses the new restrictedmem_get_page() to
    obtain the physical memory page and then uses it to populate the KVM
    secondary page table entries.
    
    The userspace restricted memfd can be fallocate-ed or hole-punched
    from userspace. When hole-punched, KVM is notified through the
    invalidate_start/invalidate_end() callbacks and gets a chance to
    remove any mapped entries of the range in the secondary page tables.
    
    Machine checks can happen for memory pages in the restricted memfd.
    Instead of routing them directly to userspace, we call the error()
    callback that KVM registered, which gives KVM a chance to handle them
    correctly.
    
    memfd_restricted() itself is implemented as a shim layer on top of real
    memory file systems (currently tmpfs). Pages in restrictedmem are marked
    as unmovable and unevictable; this is required for current confidential
    usage, but might change in the future.
    
    Initially memfd_restricted() prevents userspace read, write and mmap. It
    may be extended to support other restricted semantics in the future.
    
    The system call is currently wired up for x86 arch.
    
    Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
    kiryl authored and chao-p committed Apr 12, 2023

Commits on Apr 10, 2023

  1. KVM: x86/mmu: Move filling of Hyper-V's TLB range struct into Hyper-V code
    
    Refactor Hyper-V's range-based TLB flushing API to take a gfn+nr_pages
    pair instead of a struct, and bury said struct in Hyper-V specific code.
    
    Passing along two params generates much better code for the common case
    where KVM is _not_ running on Hyper-V, as forwarding the flush on to
    Hyper-V's hv_flush_remote_tlbs_range() from kvm_flush_remote_tlbs_range()
    becomes a tail call.
    
    Cc: David Matlack <dmatlack@google.com>
    Reviewed-by: David Matlack <dmatlack@google.com>
    Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
    Link: https://lore.kernel.org/r/20230405003133.419177-3-seanjc@google.com
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc committed Apr 10, 2023
  2. KVM: x86: Rename Hyper-V remote TLB hooks to match established scheme

    Rename the Hyper-V hooks for TLB flushing to match the naming scheme used
    by all the other TLB flushing hooks, e.g. in kvm_x86_ops, vendor code,
    arch hooks from common code, etc.
    
    Reviewed-by: David Matlack <dmatlack@google.com>
    Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
    Link: https://lore.kernel.org/r/20230405003133.419177-2-seanjc@google.com
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc committed Apr 10, 2023

Commits on Apr 4, 2023

  1. KVM: x86/mmu: Merge all handle_changed_pte*() functions

    Merge __handle_changed_pte() and handle_changed_spte_acc_track() into a
    single function, handle_changed_pte(), as the two are always used
    together.  Remove the existing handle_changed_pte(), as it's just a
    wrapper that calls __handle_changed_pte() and
    handle_changed_spte_acc_track().
    
    Signed-off-by: Vipin Sharma <vipinsh@google.com>
    Reviewed-by: Ben Gardon <bgardon@google.com>
    Reviewed-by: David Matlack <dmatlack@google.com>
    [sean: massage changelog]
    Link: https://lore.kernel.org/r/20230321220021.2119033-14-seanjc@google.com
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    shvipin authored and sean-jc committed Apr 4, 2023