Peter-Xu/userf…

Commits on Jul 15, 2021

  1. userfaultfd/selftests: Enable uffd-wp for shmem/hugetlbfs

    Now that we have added support for shmem and hugetlbfs, we can always enable
    the uffd-wp test.
    
    Define HUGETLB_EXPECTED_IOCTLS to avoid using UFFD_API_RANGE_IOCTLS_BASIC,
    because UFFD_API_RANGE_IOCTLS_BASIC is normally a superset of capabilities,
    while the test may not satisfy them all.  E.g., when hugetlb is registered
    without minor mode, we need to explicitly remove _UFFDIO_CONTINUE.  The same
    applies to uffd-wp: we need to explicitly remove _UFFDIO_WRITEPROTECT if not
    registered with uffd-wp.
    
    For the long term, we may consider dropping the UFFD_API_* macros completely
    from the uapi/linux/userfaultfd.h header file, because a kernel header update
    could otherwise easily break userspace.
    
    Signed-off-by: Peter Xu <peterx@redhat.com>
    xzpeter authored and intel-lab-lkp committed Jul 15, 2021
  2. mm/userfaultfd: Enable write protection for shmem & hugetlbfs

    We've had all the necessary changes ready for both shmem and hugetlbfs.  Turn
    on all the shmem/hugetlbfs switches for userfaultfd-wp.
    
    We can expand UFFD_API_RANGE_IOCTLS_BASIC with _UFFDIO_WRITEPROTECT too because
    all existing types now support write protection mode.
    
    Since vma_can_userfault() will be used elsewhere, move it into
    userfaultfd_k.h.
    
    Signed-off-by: Peter Xu <peterx@redhat.com>
    xzpeter authored and intel-lab-lkp committed Jul 15, 2021
  3. mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs

    This requires the pagemap code to be able to recognize the newly introduced
    swap special pte for uffd-wp, as well as the general hugetlb case that we
    recently started to support.  It should make pagemap's uffd-wp support
    complete.
    
    Signed-off-by: Peter Xu <peterx@redhat.com>
    xzpeter authored and intel-lab-lkp committed Jul 15, 2021
  4. hugetlb/userfaultfd: Only drop uffd-wp special pte if required

    As with the shmem uffd-wp special ptes, only drop the uffd-wp special swap
    pte if we are unmapping an entire vma, or if the unmap is synchronized such
    that faults cannot race with it.  This requires passing zap_flags all the way
    down to the lowest-level hugetlb unmap routine: __unmap_hugepage_range.
    
    In general, unmap calls originating in hugetlbfs code will pass the
    ZAP_FLAG_DROP_FILE_UFFD_WP flag as synchronization is in place to prevent
    faults.  The exception is hole punch which will first unmap without any
    synchronization.  Later when hole punch actually removes the page from the
    file, it will check to see if there was a subsequent fault and if so take the
    hugetlb fault mutex while unmapping again.  This second unmap will pass in
    ZAP_FLAG_DROP_FILE_UFFD_WP.
    
    The core justification for "whether to apply the ZAP_FLAG_DROP_FILE_UFFD_WP
    flag when unmapping a hugetlb range" is (IMHO): we should never reach a state
    where a page fault could erroneously fault in a wr-protected page-cache page
    as writable, even for an extremely short period.  That could happen if
    e.g. we passed ZAP_FLAG_DROP_FILE_UFFD_WP in hugetlbfs_punch_hole() when
    calling hugetlb_vmdelete_list(): if a page fault triggered after that call
    and before the remove_inode_hugepages() right after it, the page cache could
    be mapped writable again in that small window, which can cause data corruption.
    
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Signed-off-by: Peter Xu <peterx@redhat.com>
    xzpeter authored and intel-lab-lkp committed Jul 15, 2021
  5. hugetlb/userfaultfd: Allow wr-protect none ptes

    Teach the hugetlbfs code to wr-protect none ptes, in case the page cache
    exists for that pte.  Meanwhile we also need to be able to recognize a
    uffd-wp marker pte and remove it for uffd_wp_resolve.
    
    While at it, introduce a variable "psize" to replace all references to the
    huge page size fetcher.
    
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Signed-off-by: Peter Xu <peterx@redhat.com>
    xzpeter authored and intel-lab-lkp committed Jul 15, 2021
  6. hugetlb/userfaultfd: Handle uffd-wp special pte in hugetlb pf handler

    Teach the hugetlb page fault code to understand the uffd-wp special pte.  For
    example, when seeing such a pte we need to convert any write fault into a
    read one (which is fake - we'll retry the write later if needed).  Meanwhile,
    for handle_userfault() we need to make sure we wait for the special swap pte
    too, just like for a none pte.
    
    Note that we also need to teach UFFDIO_COPY about this special pte across the
    code path, so that we can safely install a new page at this special pte as
    long as we know it is such a marker entry.
    
    Signed-off-by: Peter Xu <peterx@redhat.com>
    xzpeter authored and intel-lab-lkp committed Jul 15, 2021
  7. mm/hugetlb: Introduce huge version of special swap pte helpers

    This prepares hugetlbfs to also recognize swap special ptes, just like the
    uffd-wp special swap ptes.
    
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Signed-off-by: Peter Xu <peterx@redhat.com>
    xzpeter authored and intel-lab-lkp committed Jul 15, 2021
  8. hugetlb/userfaultfd: Handle UFFDIO_WRITEPROTECT

    This starts by passing cp_flags into hugetlb_change_protection() so that
    hugetlb will be able to handle MM_CP_UFFD_WP[_RESOLVE] requests.
    
    huge_pte_clear_uffd_wp() is introduced to handle the case where the
    UFFDIO_WRITEPROTECT is requested upon migrating huge page entries.
    
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Signed-off-by: Peter Xu <peterx@redhat.com>
    xzpeter authored and intel-lab-lkp committed Jul 15, 2021
  9. hugetlb/userfaultfd: Take care of UFFDIO_COPY_MODE_WP

    Firstly, pass the wp_copy variable into hugetlb_mcopy_atomic_pte() throughout
    the stack.  Then, apply the UFFD_WP bit if UFFDIO_COPY_MODE_WP is set with
    UFFDIO_COPY.  Introduce huge_pte_mkuffd_wp() for it.
    
    Hugetlb pages are only managed by hugetlbfs, so we're safe even without
    setting the dirty bit in the huge pte if the page is installed read-only.
    However we'd better still keep the dirty bit set for a read-only UFFDIO_COPY
    pte (when the UFFDIO_COPY_MODE_WP bit is set), not only to match what we do
    with shmem, but also because the page does contain dirty data that the kernel
    just copied from userspace.
    
    Signed-off-by: Peter Xu <peterx@redhat.com>
    xzpeter authored and intel-lab-lkp committed Jul 15, 2021
  10. hugetlb/userfaultfd: Hook page faults for uffd write protection

    Hook up hugetlbfs_fault() with the capability to handle userfaultfd-wp faults.
    
    We do this slightly earlier than hugetlb_cow() so that we can avoid taking some
    extra locks that we definitely don't need.
    
    Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
    Signed-off-by: Peter Xu <peterx@redhat.com>
    xzpeter authored and intel-lab-lkp committed Jul 15, 2021
  11. mm/hugetlb: Introduce huge pte version of uffd-wp helpers

    They will be used in follow-up patches to check/set/clear the uffd-wp bit of
    a huge pte.
    
    So far they reuse all the small pte helpers.  Archs can override these
    versions when necessary (with __HAVE_ARCH_HUGE_PTE_UFFD_WP* macros) in the
    future.
    
    Signed-off-by: Peter Xu <peterx@redhat.com>
    xzpeter authored and intel-lab-lkp committed Jul 15, 2021
  12. mm/hugetlb: Drop __unmap_hugepage_range definition from hugetlb.h

    Drop it from the header since it's only used in hugetlb.c.
    
    Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>
    Signed-off-by: Peter Xu <peterx@redhat.com>
    xzpeter authored and intel-lab-lkp committed Jul 15, 2021
  13. shmem/userfaultfd: Pass over uffd-wp special swap pte when fork()

    It should be handled similarly to other uffd-wp wr-protected ptes: pass it
    over when the dst_vma has VM_UFFD_WP armed, otherwise drop it.
    
    Signed-off-by: Peter Xu <peterx@redhat.com>
    xzpeter authored and intel-lab-lkp committed Jul 15, 2021
  14. shmem/userfaultfd: Handle the left-over special swap ptes

    Note that the special uffd-wp swap pte can be left over even after the page
    under the pte got evicted.  Normally when evicting a page, we unmap its ptes
    by walking through the reverse mapping.  However we never tracked such
    information for the special swap ptes, because they're not real mappings but
    just markers.  So we need to take care of the case where we see a marker that
    is actually meaningless (the page behind it got evicted).
    
    We have already taken care of that in e.g. alloc_set_pte(), where we treat
    the special swap pte as pte_none() when necessary.  However we also need to
    teach userfaultfd itself about it, in both UFFDIO_COPY and page fault
    handling, so that everything still works as expected.
    
    Signed-off-by: Peter Xu <peterx@redhat.com>
    xzpeter authored and intel-lab-lkp committed Jul 15, 2021
  15. shmem/userfaultfd: Allow file-backed mem to be uffd wr-protected on thps

    We don't have a "huge" version of PTE_SWP_UFFD_WP_SPECIAL; instead, when
    necessary, we split the thp if the huge page was previously uffd
    wr-protected.
    
    However splitting the thp is not enough, because file-backed thps are handled
    completely differently from anonymous thps - rather than doing a real split,
    the thp pmd is simply dropped in __split_huge_pmd_locked().
    
    That is definitely not enough if, e.g., a thp covers the range [0, 2M) but we
    want to wr-protect a small page residing in the [4K, 8K) range, because after
    __split_huge_pmd() returns there will be a none pmd.
    
    Here we leverage the previously introduced change_protection_prepare() macro
    to populate the pmd with a pgtable page.  Then change_pte_range() does all
    the rest for us, e.g., installing the uffd-wp swap special pte marker at any
    pte that we'd like to wr-protect, under the protection of the pgtable lock.
    
    Signed-off-by: Peter Xu <peterx@redhat.com>
    xzpeter authored and intel-lab-lkp committed Jul 15, 2021
  16. shmem/userfaultfd: Allow wr-protect none pte for file-backed mem

    File-backed memory differs from anonymous memory in that even if the pte is
    missing, the data could still reside either in the file or in the page/swap
    cache.  So when wr-protecting a pte, we need to consider none ptes too.
    
    We do that by installing the uffd-wp special swap pte as a marker.  When
    there's a future write to the pte, the fault handler will take the special
    path to first fault in the page read-only, then report to the userfaultfd
    server with the wr-protect message.
    
    On the other hand, when unprotecting a page, it's also possible that the pte
    got unmapped and replaced by the special uffd-wp marker.  Then we need to be
    able to recover a uffd-wp special swap pte into a none pte, so that the next
    access to the page faults in correctly as usual, rather than sending a
    uffd-wp message.
    
    Special care needs to be taken throughout the change_protection_range()
    process.  Since we now allow the user to wr-protect a none pte, we need to be
    able to pre-populate the page table entries when we see !anonymous &&
    MM_CP_UFFD_WP requests; otherwise change_protection_range() will always skip
    when the pgtable entry does not exist.
    
    Note that this patch only covers the small pages (pte level), not the
    transparent huge pages yet.  But it will be the base for thps too.
    
    Signed-off-by: Peter Xu <peterx@redhat.com>
    xzpeter authored and intel-lab-lkp committed Jul 15, 2021
  17. shmem/userfaultfd: Persist uffd-wp bit across zapping for file-backed

    File-backed memory is prone to being unmapped at any time.  It means all
    information in the pte will be dropped, including the uffd-wp flag.
    
    Since the uffd-wp info cannot be stored in page cache or swap cache, persist
    this wr-protect information by installing the special uffd-wp marker pte when
    we're going to unmap a uffd wr-protected pte.  When the pte is accessed again,
    we will know it's previously wr-protected by recognizing the special pte.
    
    Meanwhile, add a new flag ZAP_FLAG_DROP_FILE_UFFD_WP for when we don't want
    to persist such information.  For example, when destroying the whole vma, or
    punching a hole in a shmem file.  For the latter, we can only drop the
    uffd-wp bit when holding the page lock.  It means the unmap_mapping_range()
    in shmem_fallocate() still requires zapping without
    ZAP_FLAG_DROP_FILE_UFFD_WP, because that path is still racy against page
    faults.
    
    Signed-off-by: Peter Xu <peterx@redhat.com>
    xzpeter authored and intel-lab-lkp committed Jul 15, 2021
  18. mm: Introduce ZAP_FLAG_SKIP_SWAP

    Firstly, the comment in zap_pte_range() is misleading because it checks
    against "details" rather than "check_mappings", so it contradicts what the
    code actually does.
    
    Meanwhile, it's also confusing that nothing explains why passing in the
    details pointer means skipping all swap entries.  A new user of zap_details
    could easily miss this fact unless they read all the way down to
    zap_pte_range(), because there's no comment at zap_details mentioning it at
    all, so swap entries could be erroneously skipped without being noticed.
    
    This partly reverts 3e8715f ("mm: drop zap_details::check_swap_entries"),
    but introduces the ZAP_FLAG_SKIP_SWAP flag, which means the opposite of the
    previous "details" parameter: the caller must explicitly set it to skip swap
    entries; otherwise swap entries are always considered (which is still the
    major case here).
    
    Cc: Kirill A. Shutemov <kirill@shutemov.name>
    Reviewed-by: Alistair Popple <apopple@nvidia.com>
    Signed-off-by: Peter Xu <peterx@redhat.com>
    xzpeter authored and intel-lab-lkp committed Jul 15, 2021
  19. mm: Introduce zap_details.zap_flags

    Instead of introducing one variable for every new zap_details field, let's
    introduce a flags field so that it can encode true/false information.
    
    Let's use this flag first to clean up the lone check_mapping variable.
    Firstly, the name "check_mapping" implies it is a boolean, but it actually
    stores the mapping itself, just in a way that it won't be set if we don't
    want to check the mapping.
    
    To make things clearer, introduce the first zap flag ZAP_FLAG_CHECK_MAPPING,
    so that we only check against the mapping if this bit is set.  At the same
    time, rename check_mapping to zap_mapping and set it always.
    
    While at it, introduce another helper zap_check_mapping_skip() and use it in
    zap_pte_range() properly.
    
    Some old comments have been removed from zap_pte_range() because they were
    duplicated, and now that we have the ZAP_FLAG_CHECK_MAPPING flag, it's easy
    to find this information by simply grepping for the flag.
    
    It'll also make life easier when we want to, e.g., pass zap_flags into
    callers like unmap_mapping_pages() (instead of adding new booleans besides
    the even_cows parameter).
    
    Signed-off-by: Peter Xu <peterx@redhat.com>
    xzpeter authored and intel-lab-lkp committed Jul 15, 2021
  20. mm: Drop first_index/last_index in zap_details

    The first_index/last_index parameters in zap_details are actually only used
    in unmap_mapping_range_tree().  Meanwhile, that function is only called once,
    by unmap_mapping_pages().  Instead of passing these two variables through the
    whole stack of page zapping code, remove them from zap_details and simply
    make them parameters of unmap_mapping_range_tree(), which is inlined.
    
    Reviewed-by: Alistair Popple <apopple@nvidia.com>
    Signed-off-by: Peter Xu <peterx@redhat.com>
    xzpeter authored and intel-lab-lkp committed Jul 15, 2021
  21. shmem/userfaultfd: Handle uffd-wp special pte in page fault handler

    File-backed memories are prone to unmap/swap, so their ptes are always
    unstable.  This could lead to userfaultfd-wp information getting lost when
    such memory, for example shmem, is unmapped or swapped out.  To keep this
    information persistent, we will start to use the newly introduced swap-like
    special ptes to replace a none pte when those ptes are removed.
    
    Prepare this by handling such a special pte first before it is applied in the
    general page fault handler.
    
    The handling of this special pte page fault is similar to a missing fault,
    but it should happen after the pte-missing logic, since the special pte is
    designed to be a swap-like pte.  Meanwhile it should be handled before
    do_swap_page(), so that the swap core logic won't be confused by seeing such
    an illegal swap pte.
    
    This is a slow path of uffd-wp handling, because unmapping of wr-protected
    shmem ptes should be rare.  So far it should only trigger in two conditions:
    
      (1) When trying to punch holes in shmem_fallocate(), there will be a
          pre-unmap optimization before evicting the page.  That will create
          unmapped shmem ptes covering wr-protected pages.
    
      (2) Swapping out of shmem pages
    
    Because of this, the page fault handling is simplified too: instead of
    sending the wr-protect message on the first page fault, the page is installed
    read-only, so the message will only be generated upon the next write, which
    will trigger the do_wp_page() path of general uffd-wp handling.
    
    Disable fault-around for all uffd-wp registered ranges for extra safety, and
    clean the code up a bit after we introduced MINOR fault.
    
    Signed-off-by: Peter Xu <peterx@redhat.com>
    xzpeter authored and intel-lab-lkp committed Jul 15, 2021
  22. mm/swap: Introduce the idea of special swap ptes

    We used to have special swap entries, like migration entries, hw-poison
    entries, device private entries, etc.
    
    Those "special swap entries" must first of all be swap entries, and their
    types are decided by swp_type(entry).
    
    This patch introduces another idea called "special swap ptes".
    
    It's very easy to confuse them with "special swap entries", but a special
    swap pte should never contain a swap entry at all.  That means it's illegal
    to call pte_to_swp_entry() on a special swap pte.
    
    Make the uffd-wp special pte the first special swap pte.
    
    Before this patch, is_swap_pte()==true means one of the below:
    
       (a.1) The pte has a normal swap entry (non_swap_entry()==false).  For
             example, when an anonymous page got swapped out.
    
       (a.2) The pte has a special swap entry (non_swap_entry()==true).  For
             example, a migration entry, a hw-poison entry, etc.
    
    After this patch, is_swap_pte()==true means one of the below, where case (b) is
    added:
    
     (a) The pte contains a swap entry.
    
       (a.1) The pte has a normal swap entry (non_swap_entry()==false).  For
             example, when an anonymous page got swapped out.
    
       (a.2) The pte has a special swap entry (non_swap_entry()==true).  For
             example, a migration entry, a hw-poison entry, etc.
    
     (b) The pte does not contain a swap entry at all (so it cannot be passed
         into pte_to_swp_entry()).  For example, uffd-wp special swap pte.
    
    Teach the whole mm core about this new idea.  It's done by introducing
    another helper called pte_has_swap_entry(), which stands for cases (a.1) and
    (a.2).  Before this patch it is the same as is_swap_pte(), because there's no
    special swap pte yet.  Now most of the previous uses of is_swap_pte() in the
    mm core need to use the new helper pte_has_swap_entry() instead, to make sure
    we won't try to parse a swap entry from a special swap pte (which does not
    contain a swap entry at all!).  We either handle the special swap pte
    explicitly, or it naturally takes the default "else" paths.
    
    Warn properly (e.g., in do_swap_page()) when we see a special swap pte - we
    should never call do_swap_page() on those ptes, so just bail out early if it
    happens.
    
    Signed-off-by: Peter Xu <peterx@redhat.com>
    xzpeter authored and intel-lab-lkp committed Jul 15, 2021
  23. mm/userfaultfd: Introduce special pte for unmapped file-backed mem

    This patch introduces a very special swap-like pte for file-backed memories.
    
    Currently it's only defined for x86_64, but any arch that can properly define
    the UFFD_WP_SWP_PTE_SPECIAL value as requested should conceptually work too.
    
    We will use this special pte to arm the ptes that got either unmapped or
    swapped out for a file-backed region that was previously wr-protected.  This
    special pte can trigger a page fault just like a swap entry, because it
    satisfies pte_none()==false && pte_present()==false.
    
    Then we can revive the special pte into a normal pte backed by the page cache.
    
    This idea is greatly inspired by Hugh and Andrea in the discussion, which is
    referenced in the links below.
    
    The other idea (from Hugh) was to use swp_type==1 and swp_offset==0 as the
    special pte.  The current solution (as pointed out by Andrea) is slightly
    preferred in that we don't need any swp_entry_t knowledge at all to trap
    these accesses.  Meanwhile, we also reuse _PAGE_SWP_UFFD_WP from the
    anonymous swp entries.
    
    This patch only introduces the special pte and its operators.  It is not yet
    used anywhere, so there is no functional change.
    
    Link: https://lore.kernel.org/lkml/20201126222359.8120-1-peterx@redhat.com/
    Link: https://lore.kernel.org/lkml/20201130230603.46187-1-peterx@redhat.com/
    Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
    Suggested-by: Hugh Dickins <hughd@google.com>
    Signed-off-by: Peter Xu <peterx@redhat.com>
    xzpeter authored and intel-lab-lkp committed Jul 15, 2021
  24. mm: Clear vmf->pte after pte_unmap_same() returns

    pte_unmap_same() always unmaps the pte pointer.  After the unmap, vmf->pte is
    no longer valid, so we should clear it.
    
    It was safe only because no one accesses vmf->pte after pte_unmap_same()
    returns, since the only caller of pte_unmap_same() (so far) is
    do_swap_page(), where vmf->pte will in most cases be overwritten very soon.
    
    pte_unmap_same() will be used in other places in follow-up patches, where
    vmf->pte will not always be re-written.  This patch enables us to call
    functions like finish_fault(), because it conditionally unmaps the pte by
    checking vmf->pte first.  Likewise, alloc_set_pte() will make sure to
    allocate a new pte even after pte_unmap_same() has been called.
    
    Since we'll need to modify vmf->pte, pass vmf directly into pte_unmap_same(),
    which also avoids the long parameter list.
    
    Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
    Signed-off-by: Peter Xu <peterx@redhat.com>
    xzpeter authored and intel-lab-lkp committed Jul 15, 2021
  25. shmem/userfaultfd: Take care of UFFDIO_COPY_MODE_WP

    Firstly, pass wp_copy into shmem_mfill_atomic_pte() through the stack.  Then
    apply the UFFD_WP bit properly when the UFFDIO_COPY on shmem is with
    UFFDIO_COPY_MODE_WP; wp_copy then lands in mfill_atomic_install_pte(), which
    was newly introduced very recently.
    
    We need to make sure shmem_mfill_atomic_pte() always sets the dirty bit in
    the pte even if UFFDIO_COPY_MODE_WP is set.  After the rework of the minor
    fault series on shmem, we need to slightly touch up the logic there, since
    uffd-wp needs to be applied even if writable==false (e.g., for a shmem
    private mapping).
    
    Note: we must do pte_wrprotect() if !writable in mfill_atomic_install_pte(), as
    mk_pte() could return a writable pte (e.g., when VM_SHARED on a shmem file).
    
    Signed-off-by: Peter Xu <peterx@redhat.com>
    xzpeter authored and intel-lab-lkp committed Jul 15, 2021
  26. mm/shmem: Unconditionally set pte dirty in mfill_atomic_install_pte

    It was done conditionally before, because there's one shmem special case
    where we use SetPageDirty() instead.  However that's not necessary, and it
    should be easier and cleaner to do it unconditionally in
    mfill_atomic_install_pte().
    
    The most recent discussion about this is here, where Hugh explained the history
    of SetPageDirty() and why it's possible that it's not required at all:
    
    https://lore.kernel.org/lkml/alpine.LSU.2.11.2104121657050.1097@eggly.anvils/
    
    Currently mfill_atomic_install_pte() has three callers:
    
            1. shmem_mfill_atomic_pte
            2. mcopy_atomic_pte
            3. mcontinue_atomic_pte
    
    After the change: case (1) has its SetPageDirty() replaced by the dirty bit
    on the pte (so we finally unify them), case (2) has no functional change at
    all as it has page_in_cache==false, and case (3) may add a dirty bit to the
    pte.  However, since case (3) is UFFDIO_CONTINUE for shmem, it's nearly 100%
    certain that the page is dirty anyway, so it should not make a real
    difference either.
    
    This should make it much easier to follow which cases set dirty for uffd, as
    we now simply set it for all uffd-related ioctls.  Meanwhile, there is no
    special handling of SetPageDirty() where it's not needed.
    
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Axel Rasmussen <axelrasmussen@google.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Signed-off-by: Peter Xu <peterx@redhat.com>
    xzpeter authored and intel-lab-lkp committed Jul 15, 2021

Commits on Jul 11, 2021

  1. Linux 5.14-rc1

    torvalds committed Jul 11, 2021
  2. mm/rmap: try_to_migrate() skip zone_device !device_private

    I know nothing about zone_device pages and !device_private pages; but if
    try_to_migrate_one() will do nothing for them, then it's better that
    try_to_migrate() filter them first, than trawl through all their vmas.
    
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Reviewed-by: Shakeel Butt <shakeelb@google.com>
    Reviewed-by: Alistair Popple <apopple@nvidia.com>
    Link: https://lore.kernel.org/lkml/1241d356-8ec9-f47b-a5ec-9b2bf66d242@google.com/
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Jason Gunthorpe <jgg@nvidia.com>
    Cc: Ralph Campbell <rcampbell@nvidia.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Yang Shi <shy828301@gmail.com>
    Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Hugh Dickins authored and torvalds committed Jul 11, 2021
  3. mm/rmap: fix new bug: premature return from page_mlock_one()

    In the unlikely race case that page_mlock_one() finds VM_LOCKED has been
    cleared by the time it got page table lock, page_vma_mapped_walk_done()
    must be called before returning, either explicitly, or by a final call
    to page_vma_mapped_walk() - otherwise the page table remains locked.
    
    Fixes: cd62734 ("mm/rmap: split try_to_munlock from try_to_unmap")
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Reviewed-by: Alistair Popple <apopple@nvidia.com>
    Reviewed-by: Shakeel Butt <shakeelb@google.com>
    Reported-by: kernel test robot <oliver.sang@intel.com>
    Link: https://lore.kernel.org/lkml/20210711151446.GB4070@xsang-OptiPlex-9020/
    Link: https://lore.kernel.org/lkml/f71f8523-cba7-3342-40a7-114abc5d1f51@google.com/
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Jason Gunthorpe <jgg@nvidia.com>
    Cc: Ralph Campbell <rcampbell@nvidia.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Yang Shi <shy828301@gmail.com>
    Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Hugh Dickins authored and torvalds committed Jul 11, 2021
  4. mm/rmap: fix old bug: munlocking THP missed other mlocks

    The kernel recovers in due course from missing Mlocked pages: but there
    was no point in calling page_mlock() (formerly known as
    try_to_munlock()) on a THP, because nothing got done even when it was
    found to be mapped in another VM_LOCKED vma.
    
    It's true that we need to be careful: Mlocked accounting of pte-mapped
    THPs is too difficult (so consistently avoided); but Mlocked accounting
    of only-pmd-mapped THPs is supposed to work, even when multiple mappings
    are mlocked and munlocked or munmapped.  Refine the tests.
    
    There is already a VM_BUG_ON_PAGE(PageDoubleMap) in page_mlock(), so
    page_mlock_one() does not even have to worry about that complication.
    
    (I said the kernel recovers: but would page reclaim be likely to split the
    THP before rediscovering that it's VM_LOCKED?  I've not followed that up.)
    
    Fixes: 9a73f61 ("thp, mlock: do not mlock PTE-mapped file huge pages")
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Reviewed-by: Shakeel Butt <shakeelb@google.com>
    Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Link: https://lore.kernel.org/lkml/cfa154c-d595-406-eb7d-eb9df730f944@google.com/
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Alistair Popple <apopple@nvidia.com>
    Cc: Jason Gunthorpe <jgg@nvidia.com>
    Cc: Ralph Campbell <rcampbell@nvidia.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Yang Shi <shy828301@gmail.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Hugh Dickins authored and torvalds committed Jul 11, 2021
  5. mm/rmap: fix comments left over from recent changes

    Parallel developments in mm/rmap.c have left behind some out-of-date
    comments: try_to_migrate_one() also accepts TTU_SYNC (already commented
    in try_to_migrate() itself), and try_to_migrate() returns nothing at
    all.
    
    TTU_SPLIT_FREEZE has just been deleted, so reword the comment about it
    in mm/huge_memory.c; and TTU_IGNORE_ACCESS was removed in 5.11, so
    delete the "recently referenced" comment from try_to_unmap_one() (once
    upon a time the comment was near the removed codeblock, but they drifted
    apart).
    
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Reviewed-by: Shakeel Butt <shakeelb@google.com>
    Reviewed-by: Alistair Popple <apopple@nvidia.com>
    Link: https://lore.kernel.org/lkml/563ce5b2-7a44-5b4d-1dfd-59a0e65932a9@google.com/
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Jason Gunthorpe <jgg@nvidia.com>
    Cc: Ralph Campbell <rcampbell@nvidia.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Yang Shi <shy828301@gmail.com>
    Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Hugh Dickins authored and torvalds committed Jul 11, 2021
  6. Merge tag 'irq-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
    
    Pull irq fixes from Ingo Molnar:
     "Two fixes:
    
       - Fix a MIPS IRQ handling RCU bug
    
       - Remove a DocBook annotation for a parameter that doesn't exist
         anymore"
    
    * tag 'irq-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      irqchip/mips: Fix RCU violation when using irqdomain lookup on interrupt entry
      genirq/irqdesc: Drop excess kernel-doc entry @lookup
    torvalds committed Jul 11, 2021
  7. Merge tag 'sched-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
    
    Pull scheduler fixes from Ingo Molnar:
     "Three fixes:
    
       - Fix load tracking bug/inconsistency
    
       - Fix a sporadic CFS bandwidth constraints enforcement bug
    
       - Fix a uclamp utilization tracking bug for newly woken tasks"
    
    * tag 'sched-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      sched/uclamp: Ignore max aggregation if rq is idle
      sched/fair: Fix CFS bandwidth hrtimer expiry type
      sched/fair: Sync load_sum with load_avg after dequeue
    torvalds committed Jul 11, 2021
  8. Merge tag 'perf-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
    
    Pull perf fixes from Ingo Molnar:
     "A fix and a hardware-enablement addition:
    
       - Robustify uncore_snbep's skx_iio_set_mapping()'s error cleanup
    
       - Add cstate event support for Intel ICELAKE_X and ICELAKE_D"
    
    * tag 'perf-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      perf/x86/intel/uncore: Clean up error handling path of iio mapping
      perf/x86/cstate: Add ICELAKE_X and ICELAKE_D support
    torvalds committed Jul 11, 2021
  9. Merge tag 'locking-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
    
    Pull locking fixes from Ingo Molnar:
    
     - Fix a Sparc crash
    
     - Fix a number of objtool warnings
    
     - Fix /proc/lockdep output on certain configs
    
     - Restore a kprobes fail-safe
    
    * tag 'locking-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      locking/atomic: sparc: Fix arch_cmpxchg64_local()
      kprobe/static_call: Restore missing static_call_text_reserved()
      static_call: Fix static_call_text_reserved() vs __init
      jump_label: Fix jump_label_text_reserved() vs __init
      locking/lockdep: Fix meaningless /proc/lockdep output of lock classes on !CONFIG_PROVE_LOCKING
    torvalds committed Jul 11, 2021