Commits
privmem-v11.6
Commits on Apr 12, 2023
-
KVM: selftests: Test KVM exit behavior for private memory/access
"Testing private access when memslot gets deleted" tests the behavior of KVM when a private memslot is deleted while the VM is using it. When KVM looks up the deleted memslot (slot = NULL), KVM should exit to userspace with KVM_EXIT_MEMORY_FAULT. In the second test, upon a private access to a non-private memslot, KVM should also exit to userspace with KVM_EXIT_MEMORY_FAULT.
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
-
KVM: selftests: Add tests around sharing a restrictedmem fd
Tests that:
+ Different memslots in the same VM should be able to share a restrictedmem_fd
+ A second VM cannot share the same offsets in a restrictedmem_fd
+ Different VMs should be able to share the same restrictedmem_fd, as long as the offsets in the restrictedmem_fd are different
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
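Reading the three cases above together with the overlapping-offset check in the private memslot testcase further down this list, one consistent rule is that bindings into a single restrictedmem fd must not overlap, no matter which VM owns them. A minimal sketch of that rule under this assumption; the struct and helpers are hypothetical, not kernel or selftest code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one memslot bound into a restrictedmem fd. */
struct binding {
	int vm_id;		/* owning VM; the overlap rule below ignores it */
	uint64_t offset;	/* start offset in the restrictedmem file */
	uint64_t size;		/* length of the bound range in bytes */
};

/* Do the half-open ranges [a, a + alen) and [b, b + blen) overlap? */
static bool ranges_overlap(uint64_t a, uint64_t alen, uint64_t b, uint64_t blen)
{
	return a < b + blen && b < a + alen;
}

/*
 * Assumed rule: a new binding is accepted only if its offset range does
 * not overlap any existing binding, whether from the same VM or another.
 */
static bool can_bind(const struct binding *existing, size_t n,
		     const struct binding *req)
{
	for (size_t i = 0; i < n; i++)
		if (ranges_overlap(existing[i].offset, existing[i].size,
				   req->offset, req->size))
			return false;
	return true;
}
```

Under this reading, a second memslot in the same VM and a memslot in a different VM are treated identically; only the offsets decide.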
-
KVM: selftests: Default private_mem_conversions_test to use 1 restrictedmem file for test data
Default the private/shared memory conversion tests to use a single file (when multiple memslots are requested), while executing on multiple vCPUs in parallel, to stress-test the restrictedmem subsystem. Also add a flag to allow multiple files to be used.
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
-
KVM: selftests: Add vm_userspace_mem_region_add_with_restrictedmem
Provide a new function to allow restrictedmem's fd and offset to be specified in selftests. No functional change intended to vm_userspace_mem_region_add.
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
-
KVM: selftests: Default private_mem_conversions_test to use 1 memslot for test data
Default the private/shared memory conversion tests to use a single memslot, while executing on multiple vCPUs in parallel, to stress-test the restrictedmem subsystem. Also add a flag to allow multiple memslots to be used.
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
-
KVM: selftests: Generalize private_mem_conversions_test for parallel execution
By running the private/shared memory conversion tests on multiple vCPUs in parallel, we stress-test the restrictedmem subsystem to test conversion of non-overlapping GPA ranges in multiple memslots.
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
-
KVM: selftests: Exercise restrictedmem allocation and truncation code after KVM invalidation code has been unbound
The kernel interfaces restrictedmem_bind and restrictedmem_unbind are used by KVM to bind/unbind KVM functions to restrictedmem's invalidate_start and invalidate_end callbacks. After the KVM VM is freed, the KVM functions should have been unbound from the restrictedmem_fd's callbacks. In this test, we exercise fallocate to back and unback memory using the restrictedmem fd, and we expect no problems (crashes) after the KVM functions have been unbound.
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
-
KVM: selftests: Test that VM private memory should not be readable from host
After VM memory is remapped as private memory and the guest has written to private memory, request the host to read the corresponding HVA for that private memory. The host should not be able to read the value in private memory. This selftest shows that private memory contents of the guest are not accessible to host userspace via the HVA.
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
-
KVM: selftests: Add testcase for creating private memslots
Verify that creating a KVM_MEM_PRIVATE memslot fails with a bad fd, bad alignment, or an overlapping offset. Modifying a KVM_MEM_PRIVATE memslot is also not allowed at this time.
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
-
KVM: selftests: Add KVM_SET_USER_MEMORY_REGION2 helper
Provide a raw version as well as an assert-success version to reduce the amount of boilerplate code needed for basic usage.
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
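A minimal sketch of the raw/assert-success pairing, assuming a fake backend in place of the real ioctl so it runs anywhere. All names are hypothetical; the double-underscore prefix on the raw variant follows the usual KVM selftests naming convention:

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, heavily reduced region descriptor. */
struct region2 {
	uint32_t slot;
	uint64_t memory_size;
};

/* Stand-in for the ioctl: rejects zero-sized regions so both paths run. */
static int fake_set_region2(const struct region2 *r)
{
	if (r->memory_size == 0) {
		errno = EINVAL;
		return -1;
	}
	return 0;
}

/* Raw variant: returns the result so negative tests can inspect failures. */
static int __set_region2(uint32_t slot, uint64_t size)
{
	struct region2 r = { .slot = slot, .memory_size = size };

	return fake_set_region2(&r);
}

/* Assert-success variant for the common case: aborts with a message on failure. */
static void set_region2(uint32_t slot, uint64_t size)
{
	int ret = __set_region2(slot, size);

	if (ret) {
		fprintf(stderr, "set_region2(slot %" PRIu32 ") failed: errno %d\n",
			slot, errno);
		abort();
	}
}
```

Negative tests (bad fd, bad alignment) call the raw variant and check the error; everything else calls the asserting variant and needs no error handling of its own.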
-
KVM: selftests: x86: Add selftest for private memory conversions
Add a selftest to exercise implicit/explicit conversion functionality within KVM and verify:
- Shared memory is visible to host userspace
- Private memory is not visible to host userspace
- Host userspace and guest can communicate over shared memory
- Data in shared backing is preserved across conversions (test's host userspace doesn't free the data)
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
-
KVM: selftests: Introduce VM "shape" to allow tests to specify the VM type
Signed-off-by: Sean Christopherson <seanjc@google.com>
-
KVM: selftests: Add helpers to do KVM_HC_MAP_GPA_RANGE hypercalls (x86)
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
-
KVM: selftests: Add helpers to convert guest memory b/w private and shared
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
-
KVM: selftests: Add support for creating private memslots
Signed-off-by: Sean Christopherson <seanjc@google.com>
-
KVM: selftests: Convert lib's mem regions to KVM_SET_USER_MEMORY_REGION2
Signed-off-by: Sean Christopherson <seanjc@google.com>
-
KVM: selftests: Drop unused kvm_userspace_memory_region_find() helper
Drop kvm_userspace_memory_region_find(), it's unused and a terrible API (probably why it's unused). If anything outside of kvm_util.c needs to get at the memslot, userspace_mem_region_find() can be exposed to give others full access to all memory region/slot information.
Signed-off-by: Sean Christopherson <seanjc@google.com>
-
KVM: x86: Add support for "protected VMs" that can utilize private memory
Signed-off-by: Sean Christopherson <seanjc@google.com>
-
KVM: Allow arch code to track number of memslot address spaces per VM
Signed-off-by: Sean Christopherson <seanjc@google.com>
-
KVM: Drop superfluous __KVM_VCPU_MULTIPLE_ADDRESS_SPACE macro
Signed-off-by: Sean Christopherson <seanjc@google.com>
-
KVM: x86/mmu: Handle page fault for private memory
Handle page faults for KVM_MEM_PRIVATE memslots, which contain memory pages for both fd-based private memory and hva-based shared memory. Architectures that support such memslots can set the 'is_private' field of the kvm_page_fault structure to indicate whether the page fault is caused by a private memory access. KVM itself maintains its own view of whether the faulting page is private via memory attributes. To handle a page fault for such a memslot, KVM first checks whether 'is_private' of the fault matches the memory attribute it maintains, then:
- On a successful match, the private pfn is obtained via restrictedmem and the shared pfn is obtained via GUP().
- On a failed match, KVM causes a KVM_EXIT_MEMORY_FAULT exit to userspace. Userspace can then convert the memory between private/shared in the host's view and retry the fault.
Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
-
KVM: x86: Disallow hugepages when memory attributes are mixed
Disallow creating hugepages with mixed memory attributes, e.g. shared versus private, as mapping a hugepage in this case would allow the guest to access memory with the wrong attributes, e.g. overlaying private memory with a shared hugepage. Track whether or not attributes are mixed via the existing disallow_lpage field, but use the most significant bit in 'disallow_lpage' to indicate that a hugepage has mixed attributes instead of using the normal refcounting. Whether or not attributes are mixed is binary; either they are or they aren't. Attempting to squeeze that info into the refcount is unnecessarily complex, as it would require knowing the previous state of the mixed count when updating attributes. Using a flag means KVM just needs to ensure the current status is reflected in the memslots.
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
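A minimal sketch of the flag-plus-refcount encoding, assuming a 32-bit counter whose most significant bit serves as the "mixed" flag while the lower bits keep counting ordinary reasons to disallow a hugepage. The macro and helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: top bit of disallow_lpage means "attributes are mixed". */
#define LPAGE_MIXED_FLAG (1U << 31)

/* Record the current mixed/not-mixed state; no previous state is needed. */
static void set_mixed(uint32_t *disallow_lpage, bool mixed)
{
	if (mixed)
		*disallow_lpage |= LPAGE_MIXED_FLAG;
	else
		*disallow_lpage &= ~LPAGE_MIXED_FLAG;
}

/* The ordinary refcount lives in the remaining low bits. */
static uint32_t disallow_count(uint32_t disallow_lpage)
{
	return disallow_lpage & ~LPAGE_MIXED_FLAG;
}

/* A hugepage is allowed only if no reason is counted and the flag is clear. */
static bool hugepage_allowed(uint32_t disallow_lpage)
{
	return disallow_lpage == 0;
}
```

Because setting or clearing the flag is idempotent, an attribute update only writes the current state and never has to reconcile it with an earlier count.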
-
KVM: Enable and expose KVM_MEM_PRIVATE
Enable KVM_MEM_PRIVATE memslots to allow guest memory to be provided through a restrictedmem_fd/restrictedmem_offset pair that points to memory pages backed by memfd_restricted(). Such memslots are bound to restrictedmem and receive notifications from restrictedmem when the backing memory is invalidated or encounters an error. KVM cannot call GUP() to obtain the pfn for such memory; instead it calls restrictedmem_get_page(). The extended memslot can still have a userspace_addr (hva). When used, a single memslot can maintain both private memory through restricted_fd and shared memory through userspace_addr. Whether the private or the shared part is visible to the guest is determined by the per-page memory attribute KVM_MEMORY_ATTRIBUTE_PRIVATE.
Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Cc: Fuad Tabba <tabba@google.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
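The attribute-driven choice described here, combined with the page fault handling commit listed above it, can be sketched as a small decision function. This assumes a flat per-gfn attribute array and a made-up bit value for the private attribute; the real KVM_MEMORY_ATTRIBUTE_PRIVATE value and KVM's internal attribute tracking are not modeled, and all names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ATTR_PRIVATE (1ULL << 0)	/* hypothetical bit value */

enum fault_result {
	MAP_FROM_RESTRICTEDMEM,	/* private: pfn via restrictedmem_get_page() */
	MAP_FROM_GUP,		/* shared: pfn via GUP() on userspace_addr */
	EXIT_MEMORY_FAULT,	/* mismatch: userspace converts, then retries */
};

/* Per-gfn attributes standing in for KVM's own tracking. */
struct attr_map {
	const uint64_t *attrs;
	size_t nr_gfns;
};

static enum fault_result resolve_fault(const struct attr_map *m,
				       uint64_t gfn, bool is_private)
{
	/* In this sketch, gfns outside the map are treated as shared. */
	bool attr_private = gfn < m->nr_gfns && (m->attrs[gfn] & ATTR_PRIVATE);

	if (attr_private != is_private)
		return EXIT_MEMORY_FAULT;
	return is_private ? MAP_FROM_RESTRICTEDMEM : MAP_FROM_GUP;
}
```

The fault's own notion of privacy never overrides the attribute: a mismatch in either direction goes to userspace rather than being resolved in KVM.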
-
KVM: Unmap existing mappings when memory attribute changed
Unmap the existing guest mappings when a memory attribute is changed. This is a reasonable action for the current KVM_MEMORY_ATTRIBUTE_PRIVATE attribute because shared pages and private pages come from different backends, so when a page is changed between shared and private, the existing mapping should be invalidated and the new mapping can be populated later. Between the attribute change and the unmapping, the page fault handler may run on the same memory range and cause an incorrect page state, so invoke the kvm_mmu_invalidate_* helpers to make the page fault handler retry during this window.
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
-
KVM: Use gfn instead of hva for mmu_notifier_retry
Currently, in the mmu_notifier invalidate path, the hva range is recorded and then checked by mmu_notifier_retry_hva() in the page fault handling path. However, for the to-be-introduced private memory, a page fault may not have an associated hva, so checking the gfn (gpa) makes more sense. For existing hva-based shared memory, gfn is expected to work as well. The only downside is that when multiple gfns alias a single hva, the current algorithm of checking multiple ranges could result in a much larger range being rejected. Such aliasing should be uncommon, so the impact is expected to be small.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
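A single-threaded sketch of gfn-based invalidation bookkeeping, assuming the usual begin/end pairing: a nesting count, one merged range, and a sequence number bumped when an invalidation ends. It also shows why several aliased ranges widen the rejected window. Names are hypothetical, and the locking and memory barriers real code needs are omitted:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct invalidate_state {
	unsigned long seq;		/* bumped each time an invalidation ends */
	unsigned int in_progress;	/* number of active invalidations */
	uint64_t start, end;		/* merged active gfn range, [start, end) */
};

static void invalidate_begin(struct invalidate_state *s, uint64_t start, uint64_t end)
{
	if (s->in_progress++ == 0) {
		s->start = start;
		s->end = end;
		return;
	}
	/* Overlapping invalidations are merged into one larger range. */
	if (start < s->start)
		s->start = start;
	if (end > s->end)
		s->end = end;
}

static void invalidate_end(struct invalidate_state *s)
{
	s->seq++;
	s->in_progress--;
}

/*
 * Called by the fault handler right before installing a mapping, with the
 * seq it sampled before looking up the pfn.
 */
static bool fault_must_retry(const struct invalidate_state *s,
			     unsigned long sampled_seq, uint64_t gfn)
{
	if (s->in_progress && gfn >= s->start && gfn < s->end)
		return true;
	return s->seq != sampled_seq;
}
```

A fault at a gfn between two aliased ranges is rejected only because of the merge, which is the "much larger range" cost the commit accepts.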
-
KVM: Add KVM_EXIT_MEMORY_FAULT exit
This new KVM exit allows userspace to handle memory-related errors. It indicates that an error happened in KVM at guest memory range [gpa, gpa+size). The flags field includes additional information for userspace to handle the error. Currently bit 0 is defined as 'private memory', where '1' indicates the error happened due to a private memory access and '0' indicates the error happened due to a shared memory access. When private memory is enabled, this new exit will be used for KVM to exit to userspace for shared <-> private memory conversion in memory encryption usage. In such usage, there are typically two kinds of memory conversions:
- explicit conversion: happens when the guest explicitly calls into KVM to map a range (as private or shared); KVM then exits to userspace to perform the map/unmap operations.
- implicit conversion: happens in the KVM page fault handler, where KVM exits to userspace for an implicit conversion when the page is in a different state than requested (private or shared).
Suggested-by: Sean Christopherson <seanjc@google.com>
Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
-
KVM: Introduce KVM_SET_USER_MEMORY_REGION2
Introduce KVM_SET_USER_MEMORY_REGION2 to allow extension for future features. It works with kvm_userspace_memory_region2, which leaves room for new features. kvm_userspace_memory_region2 has a layout compatible with kvm_userspace_memory_region, so code working on the existing fields can be reused. This is preparatory work for adding a new fd-based memslot, which needs new fields in this ioctl to specify the fd number and the offset.
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
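A minimal sketch of what a compatible layout means in practice, assuming the field order of the existing struct kvm_userspace_memory_region. The extension fields and padding size in region_v2 are hypothetical placeholders, not the real uapi definition:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same field order as the existing struct kvm_userspace_memory_region. */
struct region_v1 {
	uint32_t slot;
	uint32_t flags;
	uint64_t guest_phys_addr;
	uint64_t memory_size;
	uint64_t userspace_addr;
};

/* Hypothetical extended layout: identical prefix, new fields appended. */
struct region_v2 {
	uint32_t slot;
	uint32_t flags;
	uint64_t guest_phys_addr;
	uint64_t memory_size;
	uint64_t userspace_addr;
	uint64_t fd_offset;	/* hypothetical: offset into the backing fd */
	uint32_t fd;		/* hypothetical: backing fd number */
	uint32_t pad1;
	uint64_t pad2[14];	/* reserved room; size is illustrative */
};

/* The shared prefix must line up field for field. */
_Static_assert(offsetof(struct region_v2, flags) == offsetof(struct region_v1, flags), "flags");
_Static_assert(offsetof(struct region_v2, guest_phys_addr) ==
	       offsetof(struct region_v1, guest_phys_addr), "gpa");
_Static_assert(offsetof(struct region_v2, userspace_addr) ==
	       offsetof(struct region_v1, userspace_addr), "hva");
_Static_assert(sizeof(struct region_v1) <= offsetof(struct region_v2, fd_offset), "prefix");

/*
 * With a common prefix, a v1 request can be widened by copying it over a
 * zeroed v2 struct, so code that only reads the old fields handles both.
 */
static struct region_v2 widen_region(const struct region_v1 *v1)
{
	struct region_v2 v2;

	memset(&v2, 0, sizeof(v2));
	memcpy(&v2, v1, sizeof(*v1));
	return v2;
}
```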
-
KVM: Introduce per-page memory attributes
Introduce two ioctls to allow userspace to operate on the per-page attributes of the guest memory:
- KVM_SET_MEMORY_ATTRIBUTES to set the per-page memory attributes of a guest memory range.
- KVM_GET_SUPPORTED_MEMORY_ATTRIBUTES to return the memory attributes supported by KVM.
In confidential computing usage, whether a page is private or shared is necessary information for KVM to perform operations like page fault handling, page zapping, etc. There are other potential use cases for per-page memory attributes, e.g. to make memory read-only (or no-exec, or exec-only, etc.) without having to modify memslots. Attributes are defined as a u64 bitmask, and currently only one attribute, KVM_MEMORY_ATTRIBUTE_PRIVATE, is defined for confidential computing usage. Both ioctls are advertised through KVM_CAP_MEMORY_ATTRIBUTES.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/all/Y2WB48kD0J4VGynX@google.com/
Reviewed-by: Fuad Tabba <tabba@google.com>
Tested-by: Fuad Tabba <tabba@google.com>
-
KVM: Convert KVM_ARCH_WANT_MMU_NOTIFIER to CONFIG_KVM_GENERIC_MMU_NOTIFIER
Signed-off-by: Sean Christopherson <seanjc@google.com>
-
KVM: PPC: Drop dead code related to KVM_ARCH_WANT_MMU_NOTIFIER
Signed-off-by: Sean Christopherson <seanjc@google.com>
-
selftests: add basic selftest for memfd_restricted
The test verifies that a file descriptor created with memfd_restricted() does not allow read/write/mmap operations, and checks that the offset/length passed to fallocate(FALLOC_FL_PUNCH_HOLE) must be page aligned.
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
-
mm: Introduce memfd_restricted system call to create restricted user memory
Introduce the 'memfd_restricted' system call with the ability to create memory areas that are restricted from userspace access through ordinary MMU operations (e.g. read/write/mmap). The memory content is expected to be used through the new in-kernel interface by a third kernel module. memfd_restricted() is useful for scenarios where a file descriptor (fd) is used as an interface into mm but userspace's ability to operate on the fd needs to be restricted. Initially it is designed to provide protection for KVM encrypted guest memory. Normally KVM uses memfd memory by mmapping the memfd into KVM userspace (e.g. QEMU) and then using the mmapped virtual address to set up the mapping in the KVM secondary page table (e.g. EPT). With confidential computing technologies like Intel TDX, the memfd memory may be encrypted with a special key for a special software domain (e.g. a KVM guest) and is not expected to be directly accessed by userspace. Specifically, userspace access to such encrypted memory may crash the host, so it should be prevented. memfd_restricted() provides the semantics required for KVM guest encrypted memory support: an fd created with memfd_restricted() is used as the source of guest memory in a confidential computing environment, and KVM can interact directly with core-mm without exposing the memory content to KVM userspace. KVM userspace is still in charge of the lifecycle of the fd and should pass the created fd to KVM. KVM uses the new restrictedmem_get_page() to obtain the physical memory page and then uses it to populate the KVM secondary page table entries. The restricted memfd can be fallocate-ed or hole-punched from userspace. When it is hole-punched, KVM is notified through the invalidate_start()/invalidate_end() callbacks and then gets a chance to remove any mapped entries for the range from the secondary page tables.
A machine check can occur for memory pages in the restricted memfd; instead of routing it directly to userspace, we call the error() callback that KVM registered, and KVM then gets a chance to handle it correctly. memfd_restricted() itself is implemented as a shim layer on top of real memory file systems (currently tmpfs). Pages in restrictedmem are marked as unmovable and unevictable; this is required for current confidential usage, but it may change in the future. Initially memfd_restricted() prevents userspace read, write and mmap. It may be extended to support other restricted semantics later. The system call is currently wired up for the x86 arch.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
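The bind/unbind and hole-punch notification contract described above, which the selftest near the top of this list exercises after KVM has unbound, can be modeled in a few lines. This is a single-threaded sketch with hypothetical names, not the restrictedmem API; the counting owner stands in for KVM zapping its mappings:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rmem_notifier_ops {
	void (*invalidate_start)(void *owner, uint64_t start, uint64_t end);
	void (*invalidate_end)(void *owner, uint64_t start, uint64_t end);
};

struct rmem_file {
	const struct rmem_notifier_ops *ops;	/* NULL when nothing is bound */
	void *owner;				/* e.g. the KVM instance */
};

static void rmem_bind(struct rmem_file *f, const struct rmem_notifier_ops *ops, void *owner)
{
	f->ops = ops;
	f->owner = owner;
}

static void rmem_unbind(struct rmem_file *f)
{
	f->ops = NULL;
	f->owner = NULL;
}

/* Notify the bound user around freeing [start, end); safe when unbound. */
static void rmem_punch_hole(struct rmem_file *f, uint64_t start, uint64_t end)
{
	const struct rmem_notifier_ops *ops = f->ops;
	void *owner = f->owner;

	if (ops)
		ops->invalidate_start(owner, start, end);
	/* ... real code would free the backing pages in [start, end) here ... */
	if (ops)
		ops->invalidate_end(owner, start, end);
}

/* Example owner that just counts callbacks. */
struct counting_owner {
	int starts;
	int ends;
};

static void count_start(void *owner, uint64_t start, uint64_t end)
{
	(void)start;
	(void)end;
	((struct counting_owner *)owner)->starts++;
}

static void count_end(void *owner, uint64_t start, uint64_t end)
{
	(void)start;
	(void)end;
	((struct counting_owner *)owner)->ends++;
}

static const struct rmem_notifier_ops counting_ops = {
	.invalidate_start = count_start,
	.invalidate_end = count_end,
};
```

After unbinding, a hole punch must still succeed without calling into the departed owner, which is the "no crashes after unbind" property the selftest checks.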
Commits on Apr 10, 2023
-
KVM: x86/mmu: Move filling of Hyper-V's TLB range struct into Hyper-V code
Refactor Hyper-V's range-based TLB flushing API to take a gfn+nr_pages pair instead of a struct, and bury said struct in Hyper-V specific code. Passing along two params generates much better code for the common case where KVM is _not_ running on Hyper-V, as forwarding the flush on to Hyper-V's hv_flush_remote_tlbs_range() from kvm_flush_remote_tlbs_range() becomes a tail call.
Cc: David Matlack <dmatlack@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20230405003133.419177-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
-
KVM: x86: Rename Hyper-V remote TLB hooks to match established scheme
Rename the Hyper-V hooks for TLB flushing to match the naming scheme used by all the other TLB flushing hooks, e.g. in kvm_x86_ops, vendor code, arch hooks from common code, etc.
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20230405003133.419177-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Commits on Apr 4, 2023
-
KVM: x86/mmu: Merge all handle_changed_pte*() functions
Merge __handle_changed_pte() and handle_changed_spte_acc_track() into a single function, handle_changed_pte(), as the two are always used together. Remove the existing handle_changed_pte(), as it's just a wrapper that calls __handle_changed_pte() and handle_changed_spte_acc_track().
Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
[sean: massage changelog]
Link: https://lore.kernel.org/r/20230321220021.2119033-14-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>