Commits

privmem-v11.5

Commits on Mar 17, 2023

  1. KVM: selftests: Test KVM exit behavior for private memory/access

    "Testing private access when memslot gets deleted" tests the behavior
    of KVM when a private memslot gets deleted while the VM is using the
    private memslot. When KVM looks up the deleted (slot = NULL) memslot,
    KVM should exit to userspace with KVM_EXIT_MEMORY_FAULT.
    
    In the second test, upon a private access to a non-private memslot, KVM
    should also exit to userspace with KVM_EXIT_MEMORY_FAULT.
    
    Signed-off-by: Ackerley Tng <ackerleytng@google.com>
    Ackerley Tng authored and chao-p committed Mar 17, 2023
  2. KVM: selftests: Add tests around sharing a restrictedmem fd

    Tests that
    
    + Different memslots in the same VM should be able to share a
      restrictedmem_fd
    + A second VM cannot share the same offsets in a restrictedmem_fd
    + Different VMs should be able to share the same restrictedmem_fd, as
      long as the offsets in the restrictedmem_fd are different
    
    Signed-off-by: Ackerley Tng <ackerleytng@google.com>
    Ackerley Tng authored and chao-p committed Mar 17, 2023
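
The sharing rules tested by the commit above depend on whether two bindings cover overlapping offsets in the same file. A minimal sketch of such an overlap check follows. The struct and function names are hypothetical and chosen for illustration; this is not the kernel's or the selftest's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one bound range of a file; not a kernel type. */
struct fd_range {
	uint64_t offset; /* byte offset into the file */
	uint64_t size;   /* length in bytes, must be non-zero */
};

/*
 * Two half-open ranges [offset, offset + size) overlap if and only if
 * each one starts before the other one ends.
 */
static bool fd_ranges_overlap(struct fd_range a, struct fd_range b)
{
	assert(a.size > 0 && b.size > 0);
	return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}
```

Under this model, two users binding `[0, 4096)` and `[4096, 8192)` of the same fd do not conflict, while two users both binding `[0, 4096)` do.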
  3. KVM: selftests: Default private_mem_conversions_test to use 1 restrictedmem file for test data
    
    Default the private/shared memory conversion tests to use a single
    file (when multiple memslots are requested), while executing on
    multiple vCPUs in parallel, to stress-test the restrictedmem subsystem.
    
    Also add a flag to allow multiple files to be used.
    
    Signed-off-by: Ackerley Tng <ackerleytng@google.com>
    Ackerley Tng authored and chao-p committed Mar 17, 2023
  4. KVM: selftests: Add vm_userspace_mem_region_add_with_restrictedmem

    Provide a new function to allow restrictedmem's fd and offset to be
    specified in selftests.
    
    No functional change intended to vm_userspace_mem_region_add.
    
    Signed-off-by: Ackerley Tng <ackerleytng@google.com>
    Ackerley Tng authored and chao-p committed Mar 17, 2023
  5. KVM: selftests: Default private_mem_conversions_test to use 1 memslot for test data
    
    Default the private/shared memory conversion tests to use a single
    memslot, while executing on multiple vCPUs in parallel, to stress-test
    the restrictedmem subsystem.
    
    Also add a flag to allow multiple memslots to be used.
    
    Signed-off-by: Ackerley Tng <ackerleytng@google.com>
    Ackerley Tng authored and chao-p committed Mar 17, 2023
  6. KVM: selftests: Generalize private_mem_conversions_test for parallel execution
    
    By running the private/shared memory conversion tests on multiple
    vCPUs in parallel, we stress-test the restrictedmem subsystem to
    test conversion of non-overlapping GPA ranges in multiple memslots.
    
    Signed-off-by: Ackerley Tng <ackerleytng@google.com>
    Ackerley Tng authored and chao-p committed Mar 17, 2023
  7. KVM: selftests: Exercise restrictedmem allocation and truncation code after KVM invalidation code has been unbound
    
    The kernel interfaces restrictedmem_bind and restrictedmem_unbind are
    used by KVM to bind/unbind kvm functions to restrictedmem's
    invalidate_start and invalidate_end callbacks.
    
    After the KVM VM is freed, the KVM functions should have been unbound
    from the restrictedmem_fd's callbacks.
    
    In this test, we exercise fallocate to back and unback memory using
    the restrictedmem fd, and we expect no problems (crashes) after the
    KVM functions have been unbound.
    
    Signed-off-by: Ackerley Tng <ackerleytng@google.com>
    Ackerley Tng authored and chao-p committed Mar 17, 2023
  8. KVM: selftests: Test that VM private memory should not be readable from host
    
    After VM memory is remapped as private memory and guest has written to
    private memory, request the host to read the corresponding hva for
    that private memory.
    
    The host should not be able to read the value in private memory.
    
    This selftest shows that private memory contents of the guest are not
    accessible to host userspace via the HVA.
    
    Signed-off-by: Ackerley Tng <ackerleytng@google.com>
    Ackerley Tng authored and chao-p committed Mar 17, 2023
  9. FROMLIST: KVM: selftests: Adjust VM's initial stack address to align with SysV ABI spec
    
    Align the guest stack to match calling sequence requirements in
    section "The Stack Frame" of the System V ABI AMD64 Architecture
    Processor Supplement, which requires the value (%rsp + 8), NOT %rsp,
    to be a multiple of 16 when control is transferred to the function
    entry point. I.e. in a normal function call, %rsp needs to be 16-byte
    aligned _before_ CALL, not after.
    
    This fixes unexpected #GPs in guest code when the compiler uses SSE
    instructions, e.g. to initialize memory, as many SSE instructions
    require memory operands (including those on the stack) to be
    16-byte-aligned.
    
    (am from https://lore.kernel.org/lkml/20230227180601.104318-1-ackerleytng@google.com/)
    Signed-off-by: Ackerley Tng <ackerleytng@google.com>
    Ackerley Tng authored and chao-p committed Mar 17, 2023
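
The alignment rule described in the commit above can be illustrated with a small calculation. This is a sketch only: `initial_rsp` is a hypothetical helper, not the function the patch changes, and it assumes the caller passes the highest usable stack address.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Given the highest usable stack address, return an initial %rsp such that
 * (%rsp + 8) is a multiple of 16, which is what a function entry point
 * expects under the System V AMD64 calling sequence.
 */
static uint64_t initial_rsp(uint64_t stack_top)
{
	assert(stack_top >= 16);
	/* Round down to 16 bytes, then skip the slot a CALL would have pushed. */
	return (stack_top & ~(uint64_t)0xf) - 8;
}
```

For example, a stack top of `0x7000` yields `0x6ff8`, and `0x6ff8 + 8` is 16-byte aligned, matching the state right after a CALL from an aligned caller.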
  10. KVM: selftests: Add testcase for creating private memslots

    Verify that creating a KVM_MEM_PRIVATE memslot fails with a bad fd, bad
    alignment, or an overlapping offset. Modifying a KVM_MEM_PRIVATE memslot
    is also not allowed at this time.
    
    Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
    chao-p committed Mar 17, 2023
  11. KVM: selftests: Add KVM_SET_USER_MEMORY_REGION2 helper

    Provide a raw version as well as an assert-success version to reduce the
    amount of boilerplate code needed for basic usage.
    
    Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
    chao-p committed Mar 17, 2023
  12. KVM: selftests: x86: Add selftest for private memory conversions

    Add a selftest to exercise implicit/explicit conversion functionality
    within KVM and verify:
    
      - Shared memory is visible to host userspace
      - Private memory is not visible to host userspace
      - Host userspace and guest can communicate over shared memory
      - Data in shared backing is preserved across conversions (test's
        host userspace doesn't free the data)
    
    Signed-off-by: Vishal Annapurve <vannapurve@google.com>
    Co-developed-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    vishals4gh authored and chao-p committed Mar 17, 2023
  13. KVM: selftests: Introduce VM "shape" to allow tests to specify the VM type
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and chao-p committed Mar 17, 2023
  14. KVM: selftests: Add helpers to do KVM_HC_MAP_GPA_RANGE hypercalls (x86)

    Signed-off-by: Vishal Annapurve <vannapurve@google.com>
    Co-developed-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    vishals4gh authored and chao-p committed Mar 17, 2023
  15. KVM: selftests: Add helpers to convert guest memory b/w private and shared
    
    Signed-off-by: Vishal Annapurve <vannapurve@google.com>
    Co-developed-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    vishals4gh authored and chao-p committed Mar 17, 2023
  16. KVM: selftests: Add support for creating private memslots

    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and chao-p committed Mar 17, 2023
  17. KVM: selftests: Convert lib's mem regions to KVM_SET_USER_MEMORY_REGION2

    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and chao-p committed Mar 17, 2023
  18. KVM: selftests: Drop unused kvm_userspace_memory_region_find() helper

    Drop kvm_userspace_memory_region_find(), it's unused and a terrible API
    (probably why it's unused).  If anything outside of kvm_util.c needs to
    get at the memslot, userspace_mem_region_find() can be exposed to give
    others full access to all memory region/slot information.
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and chao-p committed Mar 17, 2023
  19. KVM: x86: Add support for "protected VMs" that can utilize private memory
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and chao-p committed Mar 17, 2023
  20. KVM: Allow arch code to track number of memslot address spaces per VM

    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and chao-p committed Mar 17, 2023
  21. KVM: Drop superfluous __KVM_VCPU_MULTIPLE_ADDRESS_SPACE macro

    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and chao-p committed Mar 17, 2023
  22. KVM: x86/mmu: Handle page fault for private memory

    Handle page fault for KVM_MEM_PRIVATE memslot which contains memory
    pages for both fd-based private memory and hva-based shared memory.
    
    Architectures that support such memslots can set the 'is_private' field
    of the kvm_page_fault structure to indicate whether the page fault is
    caused by a private memory access or not. KVM itself maintains its own
    view of whether the faulting page is private or not via memory attributes.
    
    To handle a page fault for such a memslot, KVM first checks whether
    'is_private' of the fault matches the memory attribute it maintains. Then:
      - For a successful match, the private pfn is obtained via restrictedmem
        and the shared pfn is obtained via GUP().
      - For a failed match, KVM causes a KVM_EXIT_MEMORY_FAULT exit to
        userspace. Userspace can then convert memory between private/shared
        in the host's view and retry the fault.
    
    Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com>
    Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
    Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
    Reviewed-by: Fuad Tabba <tabba@google.com>
    Tested-by: Fuad Tabba <tabba@google.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    chao-p committed Mar 17, 2023
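
The match/mismatch decision described in the commit above can be sketched as a small pure function. The enum and function names are hypothetical; the real handler works on the kvm_page_fault structure and does considerably more.

```c
#include <stdbool.h>

/* Hypothetical outcomes of the check; not KVM identifiers. */
enum fault_action {
	MAP_FROM_RESTRICTEDMEM, /* private fault on a private page */
	MAP_FROM_GUP,           /* shared fault on a shared page */
	EXIT_MEMORY_FAULT,      /* mismatch: let userspace convert and retry */
};

/*
 * fault_is_private: what the faulting access asked for.
 * attr_is_private:  what the per-page memory attribute currently says.
 */
static enum fault_action resolve_fault(bool fault_is_private, bool attr_is_private)
{
	if (fault_is_private != attr_is_private)
		return EXIT_MEMORY_FAULT;

	return fault_is_private ? MAP_FROM_RESTRICTEDMEM : MAP_FROM_GUP;
}
```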
  23. KVM: x86: Disallow hugepages when memory attributes are mixed

    Disallow creating hugepages with mixed memory attributes, e.g. shared
    versus private, as mapping a hugepage in this case would allow the guest
    to access memory with the wrong attributes, e.g. overlaying private memory
    with a shared hugepage.
    
    Track whether or not attributes are mixed via the existing
    disallow_lpage field, but use the most significant bit in 'disallow_lpage'
    to indicate a hugepage has mixed attributes instead of using the normal
    refcounting.  Whether or not attributes are mixed is binary; either they
    are or they aren't.  Attempting to squeeze that info into the refcount is
    unnecessarily complex as it would require knowing the previous state of
    the mixed count when updating attributes.  Using a flag means KVM just
    needs to ensure the current status is reflected in the memslots.
    
    Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    chao-p committed Mar 17, 2023
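
The idea of keeping a binary flag in the top bit of a word that otherwise holds a refcount can be sketched as follows. The macro and helper names are hypothetical, and a 32-bit word is an assumption made for the example; the patch's actual field type and helpers may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name: the most significant bit is reserved as a flag. */
#define MIXED_ATTR_FLAG (UINT32_C(1) << 31)

/* Set or clear the flag without needing to know its previous state. */
static uint32_t set_mixed(uint32_t word, bool mixed)
{
	return mixed ? (word | MIXED_ATTR_FLAG) : (word & ~MIXED_ATTR_FLAG);
}

static bool is_mixed(uint32_t word)
{
	return (word & MIXED_ATTR_FLAG) != 0;
}

/* The remaining bits still behave as an ordinary refcount. */
static uint32_t lpage_refcount(uint32_t word)
{
	return word & ~MIXED_ATTR_FLAG;
}

/* A hugepage is allowed only if neither the flag nor the refcount is set. */
static bool hugepage_allowed(uint32_t word)
{
	return word == 0;
}
```

Because setting the flag is idempotent, an attribute update only has to write the current status; it never has to work out whether a counter was already bumped for "mixed".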
  24. KVM: Enable and expose KVM_MEM_PRIVATE

    Enable KVM_MEM_PRIVATE memslot to allow guest memory provided through a
    restrictedmem_fd/restrictedmem_offset pair that points to memory pages
    backed by memfd_restricted().
    
    Such memslots are bound to restrictedmem and receive notifications from
    restrictedmem when the backing memory is invalidated or hits an error. KVM
    cannot call GUP() to obtain the pfn for such memory; instead it calls
    restrictedmem_get_page().
    
    The extended memslot can still have a userspace_addr (hva). When it is
    used, a single memslot can maintain both private memory through
    restrictedmem_fd and shared memory through userspace_addr. Whether the
    private or shared part is visible to the guest is determined by the
    per-page memory attribute KVM_MEMORY_ATTRIBUTE_PRIVATE.
    
    Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com>
    Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
    Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
    Cc: Fuad Tabba <tabba@google.com>
    Co-developed-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    chao-p committed Mar 17, 2023
  25. KVM: Unmap existing mappings when memory attribute changed

    Unmap the existing guest mappings when a memory attribute is changed.
    This is a reasonable action for the current KVM_MEMORY_ATTRIBUTE_PRIVATE
    attribute because shared pages and private pages come from different
    backends, so when a page is changed between shared and private, the
    existing mapping should be invalidated and the new mapping can be
    populated later.
    
    While the memory attribute is being changed and the range unmapped, a
    page fault may be handled in the same memory range and leave the page in
    an incorrect state. Invoke the kvm_mmu_invalidate_* helpers so that the
    page fault handler retries during this time frame.
    
    Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
    Co-developed-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    chao-p committed Mar 17, 2023
  26. KVM: Use gfn instead of hva for mmu_notifier_retry

    Currently, in the mmu_notifier invalidate path, the hva range is recorded
    and then checked against by mmu_notifier_retry_hva() in the page fault
    handling path. However, for the to-be-introduced private memory, a page
    fault may not have an hva associated, so checking the gfn (gpa) makes
    more sense.
    
    For existing hva-based shared memory, gfn is expected to also work. The
    only downside is that when multiple gfns alias a single hva, the
    current algorithm of checking multiple ranges could result in a much
    larger range being rejected. Such aliasing should be uncommon, so the
    impact is expected to be small.
    
    Suggested-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
    Reviewed-by: Fuad Tabba <tabba@google.com>
    Tested-by: Fuad Tabba <tabba@google.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    chao-p committed Mar 17, 2023
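
A rough model of a gfn-based retry check is sketched below. It assumes that concurrent invalidations are tracked as a single merged [start, end) range plus a sequence counter; the struct and function names are hypothetical and this is not KVM's actual implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for in-flight invalidations, keyed by gfn. */
struct invalidate_state {
	unsigned int in_progress; /* number of invalidations in flight */
	uint64_t range_start;     /* first affected gfn, inclusive */
	uint64_t range_end;       /* last affected gfn, exclusive */
	unsigned long seq;        /* bumped when an invalidation completes */
};

static void invalidate_begin(struct invalidate_state *s, uint64_t start, uint64_t end)
{
	if (s->in_progress++ == 0) {
		s->range_start = start;
		s->range_end = end;
	} else {
		/* Merge ranges: simple, but may cover gfns nobody invalidated. */
		if (start < s->range_start)
			s->range_start = start;
		if (end > s->range_end)
			s->range_end = end;
	}
}

static void invalidate_end(struct invalidate_state *s)
{
	s->seq++;
	s->in_progress--;
}

/* seq_snapshot is the value of s->seq sampled before the fault was resolved. */
static bool fault_must_retry(const struct invalidate_state *s,
			     unsigned long seq_snapshot, uint64_t gfn)
{
	if (s->in_progress && gfn >= s->range_start && gfn < s->range_end)
		return true;

	return s->seq != seq_snapshot;
}
```

In this model, two disjoint in-flight invalidations of gfns [10, 20) and [100, 110) also reject a fault on gfn 50, which illustrates the "much larger range being rejected" trade-off the message mentions.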
  27. KVM: Add KVM_EXIT_MEMORY_FAULT exit

    This new KVM exit allows userspace to handle memory-related errors. It
    indicates that an error happened in KVM at guest memory range
    [gpa, gpa+size). The flags field includes additional information for
    userspace to handle the error. Currently bit 0 is defined as 'private
    memory', where '1' indicates the error happened due to a private memory
    access and '0' indicates it happened due to a shared memory access.
    
    When private memory is enabled, KVM uses this new exit to exit to
    userspace for shared <-> private memory conversion in memory
    encryption usage. In such usage, there are typically two kinds of memory
    conversions:
      - explicit conversion: happens when the guest explicitly calls into KVM
        to map a range (as private or shared); KVM then exits to userspace
        to perform the map/unmap operations.
      - implicit conversion: happens in the KVM page fault handler, where KVM
        exits to userspace for an implicit conversion when the page is in a
        different state than requested (private or shared).
    
    Suggested-by: Sean Christopherson <seanjc@google.com>
    Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com>
    Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
    Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
    Reviewed-by: Fuad Tabba <tabba@google.com>
    Tested-by: Fuad Tabba <tabba@google.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    chao-p committed Mar 17, 2023
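
How userspace might interpret the exit information described above can be sketched as follows. The struct is a hypothetical stand-in with the three fields the message names (flags, gpa, size), and the flag macro is a local name for "bit 0"; neither is copied from the KVM UAPI headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local, hypothetical name for bit 0 of the flags field. */
#define MEMORY_FAULT_FLAG_PRIVATE (UINT64_C(1) << 0)

/* Hypothetical mirror of the information carried by the exit. */
struct memory_fault {
	uint64_t flags;
	uint64_t gpa;  /* start of the faulting guest range */
	uint64_t size; /* length of the range in bytes */
};

/* Bit 0 set: the fault was caused by a private access; clear: shared. */
static bool fault_is_private(const struct memory_fault *f)
{
	return (f->flags & MEMORY_FAULT_FLAG_PRIVATE) != 0;
}

/* True if gpa lies in the half-open range [f->gpa, f->gpa + f->size). */
static bool fault_covers(const struct memory_fault *f, uint64_t gpa)
{
	return gpa >= f->gpa && gpa - f->gpa < f->size;
}
```

A VMM handling this exit would typically use the flag to decide which direction to convert the range before re-entering the guest.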
  28. KVM: Introduce KVM_SET_USER_MEMORY_REGION2

    Introduce KVM_SET_USER_MEMORY_REGION2 to allow extension for future
    features. It works with kvm_userspace_memory_region2, which leaves room
    for new features. kvm_userspace_memory_region2 has a layout compatible
    with kvm_userspace_memory_region, so code working on existing fields can
    be reused.
    
    This is preparatory work for adding the new fd-based memslot, for which
    this ioctl needs new fields to specify the fd number and the offset.
    
    Cc: Jarkko Sakkinen <jarkko@kernel.org>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
    sean-jc authored and chao-p committed Mar 17, 2023
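
The "compatible layout" property mentioned above means the extended structure starts with exactly the fields of the original one, so code that only touches the shared prefix works on both. A sketch with hypothetical struct and field names (not the UAPI definitions) follows; the padding sizes are illustrative.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical original layout. */
struct region_v1 {
	uint32_t slot;
	uint32_t flags;
	uint64_t guest_phys_addr;
	uint64_t memory_size;
	uint64_t userspace_addr;
};

/* Hypothetical extended layout: same prefix, then new fields and padding. */
struct region_v2 {
	uint32_t slot;
	uint32_t flags;
	uint64_t guest_phys_addr;
	uint64_t memory_size;
	uint64_t userspace_addr;
	uint64_t fd_offset;
	uint32_t fd;
	uint32_t pad1;
	uint64_t pad2[14]; /* room for future extensions */
};

/* Compile-time checks that the shared prefix really lines up. */
_Static_assert(offsetof(struct region_v2, userspace_addr) ==
	       offsetof(struct region_v1, userspace_addr),
	       "shared prefix must have identical offsets");
_Static_assert(sizeof(struct region_v1) <= sizeof(struct region_v2),
	       "extended struct must contain the original one");

/* View the shared prefix of an extended region as the original type. */
static struct region_v1 legacy_view(const struct region_v2 *r2)
{
	struct region_v1 r1;

	memcpy(&r1, r2, sizeof(r1));
	return r1;
}
```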
  29. KVM: Introduce per-page memory attributes

    Introduce two ioctls to allow userspace to operate on the per-page
    attributes of the guest memory.
    
      - KVM_SET_MEMORY_ATTRIBUTES to set the per-page memory attributes
        to a guest memory range.
      - KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES to return the KVM supported
        memory attributes.
    
    In confidential computing usage, whether a page is private or shared is
    necessary information for KVM to perform operations like page fault
    handling, page zapping etc. There are other potential use cases for
    per-page memory attributes, e.g. to make memory read-only (or no-exec,
    or exec-only, etc.) without having to modify memslots.
    
    Attributes are defined as a u64 bitmask, and currently only one
    attribute, KVM_MEMORY_ATTRIBUTE_PRIVATE, is defined for confidential
    computing usage.
    
    Both ioctls are advertised through KVM_CAP_MEMORY_ATTRIBUTES.
    
    Suggested-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Link: https://lore.kernel.org/all/Y2WB48kD0J4VGynX@google.com/
    Reviewed-by: Fuad Tabba <tabba@google.com>
    Tested-by: Fuad Tabba <tabba@google.com>
    chao-p committed Mar 17, 2023
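
The notion of per-page attributes stored as a u64 bitmask, with a "supported" mask limiting what may be set, can be modeled in a few lines. This is a toy model only: the bit position chosen for `ATTR_PRIVATE`, the flat array, and the function name are all assumptions made for the example, not KVM's data structures or ioctl semantics.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit position for the "private" attribute. */
#define ATTR_PRIVATE (UINT64_C(1) << 0)

/*
 * Set 'attrs' on pages [first, first + count) of a per-page attribute array.
 * Returns 0 on success, -1 if an unsupported bit is requested or the range
 * does not fit in the array.
 */
static int set_page_attrs(uint64_t *page_attrs, size_t npages,
			  size_t first, size_t count,
			  uint64_t attrs, uint64_t supported)
{
	size_t i;

	if (attrs & ~supported)
		return -1;
	if (first > npages || count > npages - first)
		return -1;

	for (i = 0; i < count; i++)
		page_attrs[first + i] = attrs;

	return 0;
}
```

Passing `attrs = 0` in this model converts the range back to shared, since clearing the private bit is just another attribute value.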
  30. KVM: Convert KVM_ARCH_WANT_MMU_NOTIFIER to CONFIG_KVM_GENERIC_MMU_NOTIFIER
    
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and chao-p committed Mar 17, 2023
  31. KVM: PPC: Drop dead code related to KVM_ARCH_WANT_MMU_NOTIFIER

    Signed-off-by: Sean Christopherson <seanjc@google.com>
    sean-jc authored and chao-p committed Mar 17, 2023
  32. selftests: add basic selftest for memfd_restricted

    The test verifies that a file descriptor created with memfd_restricted()
    does not allow read/write/mmap operations, and checks that the
    offset/length passed to fallocate(FALLOC_FL_PUNCH_HOLE) must be page
    aligned.
    
    Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
    chao-p committed Mar 17, 2023
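
The page-alignment requirement checked by the test above amounts to a simple predicate on the offset and length. The helper below is a hypothetical illustration of that predicate, assuming a power-of-two page size; it is not the kernel's validation code and says nothing about other argument checks fallocate() performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if both offset and len are multiples of page_size (a power of two). */
static bool punch_hole_args_aligned(uint64_t offset, uint64_t len, uint64_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return (offset & (page_size - 1)) == 0 && (len & (page_size - 1)) == 0;
}
```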
  33. mm: Introduce memfd_restricted system call to create restricted user memory
    
    Introduce the 'memfd_restricted' system call, which creates memory areas
    that are restricted from userspace access through ordinary MMU
    operations (e.g. read/write/mmap). The memory content is expected to be
    used through the new in-kernel interface by a third kernel module.
    
    memfd_restricted() is useful for scenarios where a file descriptor (fd)
    can be used as an interface into mm but userspace's abilities on the fd
    need to be restricted. Initially it is designed to provide protections
    for KVM encrypted guest memory.
    
    Normally KVM uses memfd memory by mmapping the memfd into KVM userspace
    (e.g. QEMU) and then using the mmapped virtual address to set up the
    mapping in the KVM secondary page table (e.g. EPT). With confidential
    computing technologies like Intel TDX, the memfd memory may be encrypted
    with a special key for a special software domain (e.g. a KVM guest) and
    is not expected to be directly accessed by userspace. More precisely,
    userspace access to such encrypted memory may lead to a host crash, so
    it should be prevented.
    
    memfd_restricted() provides the semantics required for KVM guest
    encrypted memory support: an fd created with memfd_restricted() is used
    as the source of guest memory in a confidential computing environment,
    and KVM can directly interact with core-mm without the need to expose
    the memory content to KVM userspace.
    
    KVM userspace is still in charge of the lifecycle of the fd. It should
    pass the created fd to KVM. KVM uses the new restrictedmem_get_page() to
    obtain the physical memory page and then uses it to populate the KVM
    secondary page table entries.
    
    The restricted memfd can be fallocate-ed or hole-punched from
    userspace. When it is hole-punched, KVM is notified through the
    invalidate_start()/invalidate_end() callbacks and gets the chance to
    remove any mapped entries of the range in the secondary page tables.
    
    A machine check can happen for memory pages in the restricted memfd.
    Instead of routing this directly to userspace, we call the error()
    callback that KVM registered, so that KVM gets the chance to handle it
    correctly.
    
    memfd_restricted() itself is implemented as a shim layer on top of real
    memory file systems (currently tmpfs). Pages in restrictedmem are marked
    as unmovable and unevictable, which is required for current confidential
    computing usage, but this might change in the future.
    
    Initially memfd_restricted() prevents userspace read, write and mmap. It
    may be extended to support other restricted semantics in the future.
    
    The system call is currently wired up for x86 arch.
    
    Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
    kiryl authored and chao-p committed Mar 17, 2023

Commits on Mar 16, 2023

  1. KVM: Change return type of kvm_arch_vm_ioctl() to "int"

    All kvm_arch_vm_ioctl() implementations now only deal with "int"
    types as return values, so we can change the return type of these
    functions to use "int" instead of "long".
    
    Signed-off-by: Thomas Huth <thuth@redhat.com>
    Acked-by: Anup Patel <anup@brainfault.org>
    Message-Id: <20230208140105.655814-7-thuth@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    huth authored and bonzini committed Mar 16, 2023
  2. KVM: Standardize on "int" return types instead of "long" in kvm_main.c

    KVM functions use "long" return values for functions that are wired up
    to "struct file_operations", but otherwise use "int" return values for
    functions that can return 0/-errno in order to avoid unintentional
    divergences between 32-bit and 64-bit kernels.
    Some code still uses "long" in unnecessary spots, though, which can
    cause a little bit of confusion and unnecessary size casts. Let's
    change these spots to use "int" types, too.
    
    Signed-off-by: Thomas Huth <thuth@redhat.com>
    Message-Id: <20230208140105.655814-6-thuth@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    huth authored and bonzini committed Mar 16, 2023