Anand-Moon/Mes…

Commits on Jun 17, 2021

  1. phy: amlogic: meson8b-usb2: don't log an error on -EPROBE_DEFER

    devm_phy_create can return -EPROBE_DEFER if the phy-supply is not ready
    yet. Silence this warning as the driver framework will re-attempt
    registering the PHY. Use dev_err_probe() for phy resources to indicate
    the deferral reason when waiting for the resource to come up.
    
    Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
    Signed-off-by: Anand Moon <linux.amoon@gmail.com>
    moonlinux authored and intel-lab-lkp committed Jun 17, 2021
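The logging pattern this commit describes can be modelled in user space. The sketch below is not the driver's code: `sketch_err_probe()` is a hypothetical stand-in for the kernel's `dev_err_probe()`, and `SKETCH_EPROBE_DEFER` is assumed to mirror the kernel-internal `EPROBE_DEFER` value. The real helper additionally records the deferral reason, which this model leaves out.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: mirrors the kernel-internal EPROBE_DEFER errno (517). */
#define SKETCH_EPROBE_DEFER 517

/* Counts messages that would have reached the error log. */
int sketch_errors_logged;

/*
 * Hypothetical stand-in for dev_err_probe(): report real failures,
 * stay quiet on probe deferral, and always hand the error code back
 * so a caller can write "return sketch_err_probe(...)".
 */
int sketch_err_probe(int err, const char *msg)
{
	assert(err < 0);	/* only meant for error paths */

	if (err != -SKETCH_EPROBE_DEFER) {
		fprintf(stderr, "error %d: %s\n", err, msg);
		sketch_errors_logged++;
	}
	return err;
}
```

Returning the error unchanged is what lets a probe function collapse the check, the message, and the return into a single statement.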
  2. phy: amlogic: meson8b-usb2: Power off the PHY by putting it into reset mode

    Power off the PHY by putting it into reset mode.
    Drop the PHY power reset, since the PHY is already reset after it
    is configured. No functional change.
    
    Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
    Signed-off-by: Anand Moon <linux.amoon@gmail.com>
    moonlinux authored and intel-lab-lkp committed Jun 17, 2021
  3. phy: amlogic: meson8b-usb2: Use phy reset callback function

    Reorder the code for PHY reset mode into the .reset callback function.
    The reset control is shared between two PHYs, so use the PHY name
    as the shared id.
    
    Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
    Signed-off-by: Anand Moon <linux.amoon@gmail.com>
    moonlinux authored and intel-lab-lkp committed Jun 17, 2021
  4. phy: amlogic: meson8b-usb2: Reorder phy poweroff callback function

    Move the phy_meson8b_usb2_power_off function to avoid a compilation
    error.
    
    drivers/phy/amlogic/phy-meson8b-usb2.c:247:3: error:
    	implicit declaration of function 'phy_meson8b_usb2_power_off';
    
    Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
    Signed-off-by: Anand Moon <linux.amoon@gmail.com>
    moonlinux authored and intel-lab-lkp committed Jun 17, 2021
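The compile error quoted in this commit arises when a function is called before any declaration of it is visible. A minimal sketch of the reordering fix, with hypothetical names (`sketch_phy`, `sketch_phy_power_off`, `sketch_phy_power_on`) that only mimic the driver's structure:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical PHY state; not the driver's private structure. */
struct sketch_phy {
	bool powered;
};

/* Defined before its first caller, so no implicit declaration occurs. */
int sketch_phy_power_off(struct sketch_phy *phy)
{
	assert(phy != NULL);
	phy->powered = false;
	return 0;
}

/* The error path below calls the function defined above. */
int sketch_phy_power_on(struct sketch_phy *phy, bool config_fails)
{
	assert(phy != NULL);
	phy->powered = true;
	if (config_fails) {
		sketch_phy_power_off(phy);	/* undo on failure */
		return -1;
	}
	return 0;
}
```

A forward declaration placed above the caller would also satisfy the compiler; moving the definition avoids carrying a separate prototype.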
  5. phy: amlogic: meson8b-usb2: Use phy set_mode callback function

    Reorder the code for phy set_mode in .set_mode callback function.
    For now configure the phy in host mode.
    
    Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
    Signed-off-by: Anand Moon <linux.amoon@gmail.com>
    moonlinux authored and intel-lab-lkp committed Jun 17, 2021
  6. phy: amlogic: meson8b-usb2: Use phy exit callback function

    Reorder the code for phy bulkclk disable in .exit callback function.
    
    Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
    Signed-off-by: Anand Moon <linux.amoon@gmail.com>
    moonlinux authored and intel-lab-lkp committed Jun 17, 2021
  7. phy: amlogic: meson8b-usb2: Use phy init callback function

    Reorder the code for bulk clk_enable into .init callback function.
    
    Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
    Signed-off-by: Anand Moon <linux.amoon@gmail.com>
    moonlinux authored and intel-lab-lkp committed Jun 17, 2021
  8. phy: amlogic: meson8b-usb2: Use clock bulk to get clocks for phy

    Use the clock bulk helpers to get, enable, and disable clocks; this
    makes the clocks easier to handle. No functional change intended.
    
    Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
    Signed-off-by: Anand Moon <linux.amoon@gmail.com>
    moonlinux authored and intel-lab-lkp committed Jun 17, 2021
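What makes bulk helpers "easier to handle" is that a group of clocks is enabled as a unit, with the already-enabled ones unwound if a later one fails. The user-space model below illustrates that idea only; `sketch_clk`, `sketch_clk_bulk_enable()`, and `sketch_clk_bulk_disable()` are hypothetical and do not call the kernel's `clk_bulk_*` API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical clock handle for illustration. */
struct sketch_clk {
	const char *id;
	int enable_count;
	int fail_enable;	/* test hook: make enabling this clock fail */
};

/* Disable the first n clocks, last one first. */
void sketch_clk_bulk_disable(size_t n, struct sketch_clk *clks)
{
	while (n--)
		clks[n].enable_count--;
}

/* Enable all n clocks, or none of them if any single enable fails. */
int sketch_clk_bulk_enable(size_t n, struct sketch_clk *clks)
{
	size_t i;

	assert(clks != NULL);
	for (i = 0; i < n; i++) {
		if (clks[i].fail_enable) {
			sketch_clk_bulk_disable(i, clks);	/* unwind */
			return -1;
		}
		clks[i].enable_count++;
	}
	return 0;
}
```

With this shape, a caller needs one error check per group of clocks instead of one per clock.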
  9. Merge tag 'fixes_for_v5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

    Pull quota and fanotify fixes from Jan Kara:
     "A fixup finishing disabling of quotactl_path() syscall (I've missed
      archs using different way to declare syscalls) and a fix of an fd leak
      in error handling path of fanotify"
    
    * tag 'fixes_for_v5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
      quota: finish disable quotactl_path syscall
      fanotify: fix copy_event_to_user() fid error clean up
    torvalds committed Jun 17, 2021

Commits on Jun 16, 2021

  1. Merge branch 'akpm' (patches from Andrew)

    Merge misc fixes from Andrew Morton:
     "18 patches.
    
      Subsystems affected by this patch series: mm (memory-failure, swap,
      slub, hugetlb, memory-failure, slub, thp, sparsemem), and coredump"
    
    * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
      mm/sparse: fix check_usemap_section_nr warnings
      mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split
      mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page()
      mm/thp: fix page_address_in_vma() on file THP tails
      mm/thp: fix vma_address() if virtual address below file offset
      mm/thp: try_to_unmap() use TTU_SYNC for safe splitting
      mm/thp: make is_huge_zero_pmd() safe and quicker
      mm/thp: fix __split_huge_pmd_locked() on shmem migration entry
      mm, thp: use head page in __migration_entry_wait()
      mm/slub.c: include swab.h
      crash_core, vmcoreinfo: append 'SECTION_SIZE_BITS' to vmcoreinfo
      mm/memory-failure: make sure wait for page writeback in memory_failure
      mm/hugetlb: expand restore_reserve_on_error functionality
      mm/slub: actually fix freelist pointer vs redzoning
      mm/slub: fix redzoning for small allocations
      mm/slub: clarify verification reporting
      mm/swap: fix pte_same_as_swp() not removing uffd-wp bit when compare
      mm,hwpoison: fix race with hugetlb page allocation
    torvalds committed Jun 16, 2021
  2. mm/sparse: fix check_usemap_section_nr warnings

    I see a "virt_to_phys used for non-linear address" warning from
    check_usemap_section_nr() on arm64 platforms.
    
    In current implementation of NODE_DATA, if CONFIG_NEED_MULTIPLE_NODES=y,
    pglist_data is dynamically allocated and assigned to node_data[].
    
    For example, in arch/arm64/include/asm/mmzone.h:
    
      extern struct pglist_data *node_data[];
      #define NODE_DATA(nid)          (node_data[(nid)])
    
    If CONFIG_NEED_MULTIPLE_NODES=n, pglist_data is defined as a global
    variable named "contig_page_data".
    
    For example, in include/linux/mmzone.h:
    
      extern struct pglist_data contig_page_data;
      #define NODE_DATA(nid)          (&contig_page_data)
    
    If CONFIG_DEBUG_VIRTUAL is not enabled, __pa() can handle both
    dynamically allocated linear addresses and symbol addresses.  However,
    if (CONFIG_DEBUG_VIRTUAL=y && CONFIG_NEED_MULTIPLE_NODES=n) we can see
    the "virt_to_phys used for non-linear address" warning because
    &contig_page_data is not a linear address on arm64.
    
    Warning message:
    
      virt_to_phys used for non-linear address: (contig_page_data+0x0/0x1c00)
      WARNING: CPU: 0 PID: 0 at arch/arm64/mm/physaddr.c:15 __virt_to_phys+0x58/0x68
      Modules linked in:
      CPU: 0 PID: 0 Comm: swapper Tainted: G        W         5.13.0-rc1-00074-g1140ab592e2e #3
      Hardware name: linux,dummy-virt (DT)
      pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO BTYPE=--)
      Call trace:
         __virt_to_phys+0x58/0x68
         check_usemap_section_nr+0x50/0xfc
         sparse_init_nid+0x1ac/0x28c
         sparse_init+0x1c4/0x1e0
         bootmem_init+0x60/0x90
         setup_arch+0x184/0x1f0
         start_kernel+0x78/0x488
    
    To fix it, create a small function to handle both translations.
    
    Link: https://lkml.kernel.org/r/1623058729-27264-1-git-send-email-miles.chen@mediatek.com
    Signed-off-by: Miles Chen <miles.chen@mediatek.com>
    Cc: Mike Rapoport <rppt@kernel.org>
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Kazu <k-hagio-ab@nec.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    milesdotchen authored and torvalds committed Jun 16, 2021
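The "small function to handle both translations" can be pictured with a user-space model. Everything here is an assumption made for illustration: the address-space layout constants, the `sketch_` names, and the runtime `multiple_nodes` flag (the kernel would decide this at compile time from CONFIG_NEED_MULTIPLE_NODES, and would use its own `__pa()` and symbol-address translation rather than these stand-ins).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: two virtual windows onto physical memory. */
#define SKETCH_LINEAR_BASE 0xffff000000000000ULL	/* dynamic allocations */
#define SKETCH_IMAGE_BASE  0xffff800010000000ULL	/* kernel image symbols */
#define SKETCH_IMAGE_PHYS  0x40000000ULL
#define SKETCH_WINDOW_SIZE 0x10000000ULL

int sketch_is_linear(uint64_t va)
{
	return va >= SKETCH_LINEAR_BASE &&
	       va - SKETCH_LINEAR_BASE < SKETCH_WINDOW_SIZE;
}

int sketch_is_image(uint64_t va)
{
	return va >= SKETCH_IMAGE_BASE &&
	       va - SKETCH_IMAGE_BASE < SKETCH_WINDOW_SIZE;
}

/* Stand-in for a checked linear-map translation. */
uint64_t sketch_pa(uint64_t va)
{
	assert(sketch_is_linear(va));	/* models the DEBUG_VIRTUAL check */
	return va - SKETCH_LINEAR_BASE;
}

/* Stand-in for a symbol-address translation. */
uint64_t sketch_pa_symbol(uint64_t va)
{
	assert(sketch_is_image(va));
	return va - SKETCH_IMAGE_BASE + SKETCH_IMAGE_PHYS;
}

/* One helper that picks the translation matching how pgdat is stored. */
uint64_t sketch_pgdat_to_phys(uint64_t pgdat_va, int multiple_nodes)
{
	return multiple_nodes ? sketch_pa(pgdat_va)
			      : sketch_pa_symbol(pgdat_va);
}
```

In this model, passing a symbol address to `sketch_pa()` would trip the assertion, which plays the role of the warning quoted above.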
  3. mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split

    While debugging the bug reported by Wang Yugui [1], we found that
    try_to_unmap() may fail, but the first VM_BUG_ON_PAGE() only checks
    page_mapcount(), so it can miss the failure when the head page is
    unmapped but another subpage is still mapped.  The second DEBUG_VM
    BUG(), which checks the total mapcount, would then catch it.  This can
    be confusing.
    
    As this is not a fatal issue, consolidate the two DEBUG_VM checks
    into one VM_WARN_ON_ONCE_PAGE().
    
    [1] https://lore.kernel.org/linux-mm/20210412180659.B9E3.409509F4@e16-tech.com/
    
    Link: https://lkml.kernel.org/r/d0f0db68-98b8-ebfb-16dc-f29df24cf012@google.com
    Signed-off-by: Yang Shi <shy828301@gmail.com>
    Reviewed-by: Zi Yan <ziy@nvidia.com>
    Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Cc: Alistair Popple <apopple@nvidia.com>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Jue Wang <juew@google.com>
    Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Ralph Campbell <rcampbell@nvidia.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Wang Yugui <wangyugui@e16-tech.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    yang-shi authored and torvalds committed Jun 16, 2021
  4. mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page()

    There is a race between THP unmapping and truncation, when truncate sees
    pmd_none() and skips the entry, after munmap's zap_huge_pmd() cleared
    it, but before its page_remove_rmap() gets to decrement
    compound_mapcount: generating false "BUG: Bad page cache" reports that
    the page is still mapped when deleted.  This commit fixes that, but not
    in the way I hoped.
    
    The first attempt used try_to_unmap(page, TTU_SYNC|TTU_IGNORE_MLOCK)
    instead of unmap_mapping_range() in truncate_cleanup_page(): it has
    often been an annoyance that we usually call unmap_mapping_range() with
    no pages locked, but there apply it to a single locked page.
    try_to_unmap() looks more suitable for a single locked page.
    
    However, try_to_unmap_one() contains a VM_BUG_ON_PAGE(!pvmw.pte,page):
    it is used to insert THP migration entries, but not used to unmap THPs.
    Copy zap_huge_pmd() and add THP handling now? Perhaps, but their TLB
    needs are different, I'm too ignorant of the DAX cases, and couldn't
    decide how far to go for anon+swap.  Set that aside.
    
    The second attempt took a different tack: make no change in truncate.c,
    but modify zap_huge_pmd() to insert an invalidated huge pmd instead of
    clearing it initially, then pmd_clear() between page_remove_rmap() and
    unlocking at the end.  Nice.  But powerpc blows that approach out of the
    water, with its serialize_against_pte_lookup(), and interesting pgtable
    usage.  It would need serious help to get working on powerpc (with a
    minor optimization issue on s390 too).  Set that aside.
    
    Just add an "if (page_mapped(page)) synchronize_rcu();" or other such
    delay, after unmapping in truncate_cleanup_page()? Perhaps, but though
    that's likely to reduce or eliminate the number of incidents, it would
    give less assurance of whether we had identified the problem correctly.
    
    This successful iteration introduces "unmap_mapping_page(page)" instead
    of try_to_unmap(), and goes the usual unmap_mapping_range_tree() route,
    with an addition to details.  Then zap_pmd_range() watches for this
    case, and does spin_unlock(pmd_lock) if so - just like
    page_vma_mapped_walk() now does in the PVMW_SYNC case.  Not pretty, but
    safe.
    
    Note that unmap_mapping_page() is doing a VM_BUG_ON(!PageLocked) to
    assert its interface; but currently that's only used to make sure that
    page->mapping is stable, and zap_pmd_range() doesn't care if the page is
    locked or not.  Along these lines, in invalidate_inode_pages2_range()
    move the initial unmap_mapping_range() out from under page lock, before
    then calling unmap_mapping_page() under page lock if still mapped.
    
    Link: https://lkml.kernel.org/r/a2a4a148-cdd8-942c-4ef8-51b77f643dbe@google.com
    Fixes: fc127da ("truncate: handle file thp")
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Reviewed-by: Yang Shi <shy828301@gmail.com>
    Cc: Alistair Popple <apopple@nvidia.com>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Jue Wang <juew@google.com>
    Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Ralph Campbell <rcampbell@nvidia.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Wang Yugui <wangyugui@e16-tech.com>
    Cc: Zi Yan <ziy@nvidia.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Hugh Dickins authored and torvalds committed Jun 16, 2021
  5. mm/thp: fix page_address_in_vma() on file THP tails

    Anon THP tails were already supported, but memory-failure may need to
    use page_address_in_vma() on file THP tails, which its page->mapping
    check did not permit: fix it.
    
    hughd adds: no current usage is known to hit the issue, but this does
    fix a subtle trap in a general helper: best fixed in stable sooner than
    later.
    
    Link: https://lkml.kernel.org/r/a0d9b53-bf5d-8bab-ac5-759dc61819c1@google.com
    Fixes: 800d8c6 ("shmem: add huge pages support")
    Signed-off-by: Jue Wang <juew@google.com>
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Yang Shi <shy828301@gmail.com>
    Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Alistair Popple <apopple@nvidia.com>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Ralph Campbell <rcampbell@nvidia.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Wang Yugui <wangyugui@e16-tech.com>
    Cc: Zi Yan <ziy@nvidia.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    jueatgithub authored and torvalds committed Jun 16, 2021
  6. mm/thp: fix vma_address() if virtual address below file offset

    Running certain tests with a DEBUG_VM kernel would crash within hours,
    on the total_mapcount BUG() in split_huge_page_to_list(), while trying
    to free up some memory by punching a hole in a shmem huge page: split's
    try_to_unmap() was unable to find all the mappings of the page (which,
    on a !DEBUG_VM kernel, would then keep the huge page pinned in memory).
    
    When that BUG() was changed to a WARN(), it would later crash on the
    VM_BUG_ON_VMA(end < vma->vm_start || start >= vma->vm_end, vma) in
    mm/internal.h:vma_address(), used by rmap_walk_file() for
    try_to_unmap().
    
    vma_address() is usually correct, but there's a wraparound case when the
    vm_start address is unusually low, but vm_pgoff not so low:
    vma_address() chooses max(start, vma->vm_start), but that decides on the
    wrong address, because start has become almost ULONG_MAX.
    
    Rewrite vma_address() to be more careful about vm_pgoff; move the
    VM_BUG_ON_VMA() out of it, returning -EFAULT for errors, so that it can
    be safely used from page_mapped_in_vma() and page_address_in_vma() too.
    
    Add vma_address_end() to apply similar care to end address calculation,
    in page_vma_mapped_walk() and page_mkclean_one() and try_to_unmap_one();
    though it raises a question of whether callers would do better to supply
    pvmw->end to page_vma_mapped_walk() - I chose not, for a smaller patch.
    
    An irritation is that their apparent generality breaks down on KSM
    pages, which cannot be located by the page->index that page_to_pgoff()
    uses: as commit 4b0ece6 ("mm: migrate: fix remove_migration_pte()
    for ksm pages") once discovered.  I dithered over the best thing to do
    about that, and have ended up with a VM_BUG_ON_PAGE(PageKsm) in both
    vma_address() and vma_address_end(); though the only place in danger of
    using it on them was try_to_unmap_one().
    
    Sidenote: vma_address() and vma_address_end() now use compound_nr() on a
    head page, instead of thp_size(): to make the right calculation on a
    hugetlbfs page, whether or not THPs are configured.  try_to_unmap() is
    used on hugetlbfs pages, but perhaps the wrong calculation never
    mattered.
    
    Link: https://lkml.kernel.org/r/caf1c1a3-7cfb-7f8f-1beb-ba816e932825@google.com
    Fixes: a8fa41a ("mm, rmap: check all VMAs that PTE-mapped THP can be part of")
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Alistair Popple <apopple@nvidia.com>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Jue Wang <juew@google.com>
    Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Ralph Campbell <rcampbell@nvidia.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Wang Yugui <wangyugui@e16-tech.com>
    Cc: Yang Shi <shy828301@gmail.com>
    Cc: Zi Yan <ziy@nvidia.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Hugh Dickins authored and torvalds committed Jun 16, 2021
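The wraparound described in this commit is plain unsigned arithmetic and can be reproduced in user space. The sketch below is a simplified model, not the kernel's `vma_address()`: the struct, the `sketch_` names, the 4 KiB page size, and the explicit `nr_pages` parameter are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12			/* assumed 4 KiB pages */
#define SKETCH_EFAULT ((uint64_t)-14)		/* error encoded as an address */

/* Hypothetical subset of a VMA: address range plus file page offset. */
struct sketch_vma {
	uint64_t vm_start, vm_end, vm_pgoff;
};

/* Naive form: offset subtraction may wrap, then max() picks wrongly. */
uint64_t sketch_naive_start(const struct sketch_vma *vma, uint64_t pgoff)
{
	uint64_t start = vma->vm_start +
		((pgoff - vma->vm_pgoff) << SKETCH_PAGE_SHIFT);

	return start > vma->vm_start ? start : vma->vm_start;
}

/* Careful form: compare page offsets first, never subtract blindly. */
uint64_t sketch_careful_address(const struct sketch_vma *vma,
				uint64_t pgoff, uint64_t nr_pages)
{
	assert(vma->vm_start < vma->vm_end && nr_pages >= 1);

	if (pgoff >= vma->vm_pgoff) {
		uint64_t addr = vma->vm_start +
			((pgoff - vma->vm_pgoff) << SKETCH_PAGE_SHIFT);

		/* Beyond the vma, or wrapped through zero. */
		if (addr < vma->vm_start || addr >= vma->vm_end)
			return SKETCH_EFAULT;
		return addr;
	}
	/* Page starts below the vma: mapped only if its tail reaches in. */
	if (pgoff + nr_pages - 1 >= vma->vm_pgoff)
		return vma->vm_start;
	return SKETCH_EFAULT;
}
```

With `vm_start = 0x1000` and `vm_pgoff = 0x10`, a 16-page compound page at `pgoff = 0x8` makes the naive form return an address just below the top of the 64-bit range, while the careful form returns `vm_start`. With a typical high `vm_start`, both agree, which matches the "usually correct" observation.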
  7. mm/thp: try_to_unmap() use TTU_SYNC for safe splitting

    Stressing huge tmpfs often crashed on unmap_page()'s VM_BUG_ON_PAGE
    (!unmap_success): with dump_page() showing mapcount:1, but then its raw
    struct page output showing _mapcount ffffffff i.e.  mapcount 0.
    
    And even if that particular VM_BUG_ON_PAGE(!unmap_success) is removed,
    it is immediately followed by a VM_BUG_ON_PAGE(compound_mapcount(head)),
    and further down an IS_ENABLED(CONFIG_DEBUG_VM) total_mapcount BUG():
    all indicative of some mapcount difficulty in development here perhaps.
    But the !CONFIG_DEBUG_VM path handles the failures correctly and
    silently.
    
    I believe the problem is that once a racing unmap has cleared pte or
    pmd, try_to_unmap_one() may skip taking the page table lock, and emerge
    from try_to_unmap() before the racing task has reached decrementing
    mapcount.
    
    Instead of abandoning the unsafe VM_BUG_ON_PAGE(), and the ones that
    follow, use PVMW_SYNC in try_to_unmap_one() in this case: adding
    TTU_SYNC to the options, and passing that from unmap_page().
    
    When CONFIG_DEBUG_VM, or for non-debug too? Consensus is to do the same
    for both: the slight overhead added should rarely matter, except perhaps
    if splitting sparsely-populated multiply-mapped shmem.  Once confident
    that bugs are fixed, TTU_SYNC here can be removed, and the race
    tolerated.
    
    Link: https://lkml.kernel.org/r/c1e95853-8bcd-d8fd-55fa-e7f2488e78f@google.com
    Fixes: fec89c1 ("thp: rewrite freeze_page()/unfreeze_page() with generic rmap walkers")
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Cc: Alistair Popple <apopple@nvidia.com>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Jue Wang <juew@google.com>
    Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Ralph Campbell <rcampbell@nvidia.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Wang Yugui <wangyugui@e16-tech.com>
    Cc: Yang Shi <shy828301@gmail.com>
    Cc: Zi Yan <ziy@nvidia.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Hugh Dickins authored and torvalds committed Jun 16, 2021
  8. mm/thp: make is_huge_zero_pmd() safe and quicker

    Most callers of is_huge_zero_pmd() supply a pmd already verified
    present; but a few (notably zap_huge_pmd()) do not - it might be a pmd
    migration entry, in which the pfn is encoded differently from a present
    pmd: which might pass the is_huge_zero_pmd() test (though not on x86,
    since L1TF forced us to protect against that); or perhaps even crash in
    pmd_page() applied to a swap-like entry.
    
    Make it safe by adding pmd_present() check into is_huge_zero_pmd()
    itself; and make it quicker by saving huge_zero_pfn, so that
    is_huge_zero_pmd() will not need to do that pmd_page() lookup each time.
    
    __split_huge_pmd_locked() checked pmd_trans_huge() before: that worked,
    but is unnecessary now that is_huge_zero_pmd() checks present.
    
    Link: https://lkml.kernel.org/r/21ea9ca-a1f5-8b90-5e88-95fb1c49bbfa@google.com
    Fixes: e71769a ("mm: enable thp migration for shmem thp")
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Reviewed-by: Yang Shi <shy828301@gmail.com>
    Cc: Alistair Popple <apopple@nvidia.com>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Jue Wang <juew@google.com>
    Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Ralph Campbell <rcampbell@nvidia.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Wang Yugui <wangyugui@e16-tech.com>
    Cc: Zi Yan <ziy@nvidia.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Hugh Dickins authored and torvalds committed Jun 16, 2021
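The two changes in this commit (check "present" first, compare against a cached pfn) can be shown with a small user-space model. The `sketch_pmd` layout and all `sketch_` names are hypothetical; a real pmd encodes these fields in architecture-specific bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical entry: when present, payload is a pfn; otherwise it is
 * a migration entry whose bits must not be read as a pfn.
 */
struct sketch_pmd {
	bool present;
	uint64_t payload;
};

/* Cached once at setup; UINT64_MAX means "no huge zero page yet". */
uint64_t sketch_huge_zero_pfn = UINT64_MAX;

void sketch_set_huge_zero_pfn(uint64_t pfn)
{
	assert(pfn != UINT64_MAX);	/* reserved as the "unset" marker */
	sketch_huge_zero_pfn = pfn;
}

/* Safe: rejects non-present entries. Quick: one integer comparison. */
bool sketch_is_huge_zero_pmd(struct sketch_pmd pmd)
{
	return pmd.present && pmd.payload == sketch_huge_zero_pfn;
}
```

A migration entry whose raw bits happen to equal the cached pfn is rejected by the `present` test, which is the false positive the commit guards against.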
  9. mm/thp: fix __split_huge_pmd_locked() on shmem migration entry

    Patch series "mm/thp: fix THP splitting unmap BUGs and related", v10.
    
    Here is v2 batch of long-standing THP bug fixes that I had not got
    around to sending before, but prompted now by Wang Yugui's report
    https://lore.kernel.org/linux-mm/20210412180659.B9E3.409509F4@e16-tech.com/
    
    Wang Yugui has tested a rollup of these fixes applied to 5.10.39, and
    they have done no harm, but have *not* fixed that issue: something more
    is needed and I have no idea of what.
    
    This patch (of 7):
    
    Stressing huge tmpfs page migration racing hole punch often crashed on
    the VM_BUG_ON(!pmd_present) in pmdp_huge_clear_flush(), with DEBUG_VM=y
    kernel; or shortly afterwards, on a bad dereference in
    __split_huge_pmd_locked() when DEBUG_VM=n.  They forgot to allow for pmd
    migration entries in the non-anonymous case.
    
    Full disclosure: those particular experiments were on a kernel with more
    relaxed mmap_lock and i_mmap_rwsem locking, and were not repeated on the
    vanilla kernel: it is conceivable that stricter locking happens to avoid
    those cases, or makes them less likely; but __split_huge_pmd_locked()
    already allowed for pmd migration entries when handling anonymous THPs,
    so this commit brings the shmem and file THP handling into line.
    
    And while there: use old_pmd rather than _pmd, as in the following
    blocks; and make it clearer to the eye that the !vma_is_anonymous()
    block is self-contained, making an early return after accounting for
    unmapping.
    
    Link: https://lkml.kernel.org/r/af88612-1473-2eaa-903-8d1a448b26@google.com
    Link: https://lkml.kernel.org/r/dd221a99-efb3-cd1d-6256-7e646af29314@google.com
    Fixes: e71769a ("mm: enable thp migration for shmem thp")
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Yang Shi <shy828301@gmail.com>
    Cc: Wang Yugui <wangyugui@e16-tech.com>
    Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
    Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: Alistair Popple <apopple@nvidia.com>
    Cc: Ralph Campbell <rcampbell@nvidia.com>
    Cc: Zi Yan <ziy@nvidia.com>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Jue Wang <juew@google.com>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Hugh Dickins authored and torvalds committed Jun 16, 2021
  10. mm, thp: use head page in __migration_entry_wait()

    We noticed that a hung task can occur in a corner case, but a
    practical one, when CONFIG_PREEMPT_NONE is enabled, as follows.
    
    Process 0                       Process 1                     Process 2..Inf
    split_huge_page_to_list
        unmap_page
            split_huge_pmd_address
                                    __migration_entry_wait(head)
                                                                  __migration_entry_wait(tail)
        remap_page (roll back)
            remove_migration_ptes
                rmap_walk_anon
                    cond_resched
    
    Here __migration_entry_wait(tail) occurs in kernel space, e.g. in
    copy_to_user during fstat, which immediately faults again without
    rescheduling and thus fully occupies the CPU.
    
    When too many processes are performing __migration_entry_wait on a
    tail page, remap_page never completes after cond_resched.
    
    This patch makes __migration_entry_wait operate on the compound head
    page, so that it waits for remap_page to complete, whether the THP is
    split successfully or rolled back.
    
    Note that put_and_wait_on_page_locked helps to drop the page reference
    acquired with get_page_unless_zero, as soon as the page is on the wait
    queue, before actually waiting.  So splitting the THP is only prevented
    for a brief interval.
    
    Link: https://lkml.kernel.org/r/b9836c1dd522e903891760af9f0c86a2cce987eb.1623144009.git.xuyu@linux.alibaba.com
    Fixes: ba98828 ("thp: add option to setup migration entries during PMD split")
    Suggested-by: Hugh Dickins <hughd@google.com>
    Signed-off-by: Gang Deng <gavin.dg@linux.alibaba.com>
    Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
    Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Acked-by: Hugh Dickins <hughd@google.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    vxuyu authored and torvalds committed Jun 16, 2021
  11. mm/slub.c: include swab.h

    Fixes build with CONFIG_SLAB_FREELIST_HARDENED=y.
    
    Hopefully.  But it's the right thing to do anyway.
    
    Fixes: 1ad53d9 ("slub: improve bit diffusion for freelist ptr obfuscation")
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=213417
    Reported-by: <vannguye@cisco.com>
    Acked-by: Kees Cook <keescook@chromium.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    akpm00 authored and torvalds committed Jun 16, 2021
  12. crash_core, vmcoreinfo: append 'SECTION_SIZE_BITS' to vmcoreinfo

    As mentioned in kernel commit 1d50e5d ("crash_core, vmcoreinfo:
    Append 'MAX_PHYSMEM_BITS' to vmcoreinfo"), SECTION_SIZE_BITS is used
    in the formula:
    
        #define SECTIONS_SHIFT    (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
    
    Besides SECTIONS_SHIFT, SECTION_SIZE_BITS is also used to calculate
    PAGES_PER_SECTION in makedumpfile, just as in the kernel.
    
    Unfortunately, this arch-dependent macro SECTION_SIZE_BITS changes, e.g.
    recently in kernel commit f0b13ee ("arm64/sparsemem: reduce
    SECTION_SIZE_BITS").  But user space wants a stable interface to get
    this info, which cannot be deduced from a crashdump vmcore.  Hence
    append SECTION_SIZE_BITS to vmcoreinfo.
    
    Link: https://lkml.kernel.org/r/20210608103359.84907-1-kernelfans@gmail.com
    Link: http://lists.infradead.org/pipermail/kexec/2021-June/022676.html
    Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
    Acked-by: Baoquan He <bhe@redhat.com>
    Cc: Bhupesh Sharma <bhupesh.sharma@linaro.org>
    Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
    Cc: Dave Young <dyoung@redhat.com>
    Cc: Boris Petkov <bp@alien8.de>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: James Morse <james.morse@arm.com>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Will Deacon <will@kernel.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: Dave Anderson <anderson@redhat.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    liupingfan authored and torvalds committed Jun 16, 2021
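To see why a dump tool needs SECTION_SIZE_BITS, it helps to work the two derived quantities through with numbers. The values below (48, 27, 12) are illustrative assumptions, not taken from this commit; the real ones depend on architecture and configuration, and the `sketch_` function names are hypothetical.

```c
#include <assert.h>

/* Illustrative values only; real ones are arch- and config-dependent. */
#define SKETCH_PAGE_SHIFT        12
#define SKETCH_MAX_PHYSMEM_BITS  48
#define SKETCH_SECTION_SIZE_BITS 27

/* SECTIONS_SHIFT = MAX_PHYSMEM_BITS - SECTION_SIZE_BITS */
unsigned long sketch_sections_shift(unsigned int max_physmem_bits,
				    unsigned int section_size_bits)
{
	assert(max_physmem_bits >= section_size_bits);
	return max_physmem_bits - section_size_bits;
}

/* PAGES_PER_SECTION = 1 << (SECTION_SIZE_BITS - PAGE_SHIFT) */
unsigned long sketch_pages_per_section(unsigned int section_size_bits,
				       unsigned int page_shift)
{
	assert(section_size_bits >= page_shift);
	return 1UL << (section_size_bits - page_shift);
}
```

With these inputs, SECTIONS_SHIFT is 21 and PAGES_PER_SECTION is 32768. Changing SECTION_SIZE_BITS from 27 to 30 changes PAGES_PER_SECTION to 262144, so a tool that hard-codes the value misreads the section layout once the kernel changes it.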
  13. mm/memory-failure: make sure wait for page writeback in memory_failure

    Our syzkaller triggered the "BUG_ON(!list_empty(&inode->i_wb_list))" in
    clear_inode:
    
      kernel BUG at fs/inode.c:519!
      Internal error: Oops - BUG: 0 [#1] SMP
      Modules linked in:
      Process syz-executor.0 (pid: 249, stack limit = 0x00000000a12409d7)
      CPU: 1 PID: 249 Comm: syz-executor.0 Not tainted 4.19.95
      Hardware name: linux,dummy-virt (DT)
      pstate: 80000005 (Nzcv daif -PAN -UAO)
      pc : clear_inode+0x280/0x2a8
      lr : clear_inode+0x280/0x2a8
      Call trace:
        clear_inode+0x280/0x2a8
        ext4_clear_inode+0x38/0xe8
        ext4_free_inode+0x130/0xc68
        ext4_evict_inode+0xb20/0xcb8
        evict+0x1a8/0x3c0
        iput+0x344/0x460
        do_unlinkat+0x260/0x410
        __arm64_sys_unlinkat+0x6c/0xc0
        el0_svc_common+0xdc/0x3b0
        el0_svc_handler+0xf8/0x160
        el0_svc+0x10/0x218
      Kernel panic - not syncing: Fatal exception
    
    A crash dump of this problem shows that someone called __munlock_pagevec
    to clear page LRU without lock_page: do_mmap -> mmap_region -> do_munmap
    -> munlock_vma_pages_range -> __munlock_pagevec.
    
    As a result, memory_failure calls identify_page_state without
    wait_on_page_writeback.  After truncate_error_page clears the mapping
    of this page, end_page_writeback won't call sb_clear_inode_writeback
    to clear inode->i_wb_list.  That triggers the BUG_ON in clear_inode!
    
    Fix it by also checking PageWriteback when deciding whether to skip
    wait_on_page_writeback.
    
    Link: https://lkml.kernel.org/r/20210604084705.3729204-1-yangerkun@huawei.com
    Fixes: 0bc1f8b ("hwpoison: fix the handling path of the victimized page frame that belong to non-LRU")
    Signed-off-by: yangerkun <yangerkun@huawei.com>
    Acked-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Theodore Ts'o <tytso@mit.edu>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: Yu Kuai <yukuai3@huawei.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    yangerkun authored and torvalds committed Jun 16, 2021
  14. mm/hugetlb: expand restore_reserve_on_error functionality

    The routine restore_reserve_on_error is called to restore reservation
    information when an error occurs after page allocation.  The routine
    alloc_huge_page modifies the mapping reserve map and potentially the
    reserve count during allocation.  If code calling alloc_huge_page
    encounters an error after allocation and needs to free the page, the
    reservation information needs to be adjusted.
    
    Currently, restore_reserve_on_error only takes action on pages for which
    the reserve count was adjusted (HPageRestoreReserve flag).  There is
    nothing wrong with these adjustments.  However, alloc_huge_page ALWAYS
    modifies the reserve map during allocation even if the reserve count is
    not adjusted.  This can cause issues as observed during development of
    this patch [1].
    
    One specific series of operations causing an issue is:
    
     - Create a shared hugetlb mapping
       Reservations for all pages created by default
    
     - Fault in a page in the mapping
       Reservation exists so reservation count is decremented
    
     - Punch a hole in the file/mapping at index previously faulted
       Reservation and any associated pages will be removed
    
     - Allocate a page to fill the hole
       No reservation entry, so reserve count unmodified
       Reservation entry added to map by alloc_huge_page
    
     - Error after allocation and before instantiating the page
       Reservation entry remains in map
    
     - Allocate a page to fill the hole
       Reservation entry exists, so decrement reservation count
    
    This will cause a reservation count underflow as the reservation count
    was decremented twice for the same index.
    
    A user would observe a very large number for HugePages_Rsvd in
    /proc/meminfo.  This would also likely cause subsequent allocations of
    hugetlb pages to fail as it would 'appear' that all pages are reserved.
    
    This sequence of operations is unlikely to happen; however, it was
    easily reproduced and observed using hacked up code as described in [1].
    
    Address the issue by having the routine restore_reserve_on_error take
    action on pages where HPageRestoreReserve is not set.  In this case, we
    need to remove any reserve map entry created by alloc_huge_page.  A new
    helper routine vma_del_reservation assists with this operation.
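
    The sequence of operations listed earlier, and the effect of the fix,
    can be replayed with a small toy model.  This is only a sketch under
    stated assumptions: ToyMapping and replay are hypothetical names, the
    reserve map is reduced to a set of page indexes, and the boolean
    returned by alloc_huge_page stands in for the HPageRestoreReserve flag.

```python
class ToyMapping:
    """Toy shared hugetlb mapping: a reserve map plus a reserve count."""

    def __init__(self, npages):
        # Creating the mapping reserves every page by default.
        self.resv_map = set(range(npages))
        self.resv_count = npages

    def alloc_huge_page(self, idx):
        # A reservation entry means the reserve count is consumed.
        used_reserve = idx in self.resv_map
        if used_reserve:
            self.resv_count -= 1
        # The map is ALWAYS updated, even if the count was not touched.
        self.resv_map.add(idx)
        return used_reserve

    def punch_hole(self, idx):
        # Removes the reservation entry for this index.
        self.resv_map.discard(idx)

    def restore_reserve_on_error(self, idx, used_reserve, fixed):
        if used_reserve:
            self.resv_count += 1          # existing behaviour
        elif fixed:
            self.resv_map.discard(idx)    # new: drop the stale map entry


def replay(fixed):
    m = ToyMapping(1)
    m.alloc_huge_page(0)                  # fault in the page: count 1 -> 0
    m.punch_hole(0)                       # entry removed
    used = m.alloc_huge_page(0)           # no entry: count untouched
    m.restore_reserve_on_error(0, used, fixed)  # error before instantiation
    m.alloc_huge_page(0)                  # retry filling the hole
    return m.resv_count
```

    In this model replay(fixed=False) ends below zero, which corresponds
    to the underflow described above, while replay(fixed=True) stays at
    zero.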
    
    There are three callers of alloc_huge_page which do not currently call
    restore_reserve_on_error before freeing a page on error paths.  Add
    those missing calls.
    
    [1] https://lore.kernel.org/linux-mm/20210528005029.88088-1-almasrymina@google.com/
    
    Link: https://lkml.kernel.org/r/20210607204510.22617-1-mike.kravetz@oracle.com
    Fixes: 96b96a9 ("mm/hugetlb: fix huge page reservation leak in private mapping error paths")
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Reviewed-by: Mina Almasry <almasrymina@google.com>
    Cc: Axel Rasmussen <axelrasmussen@google.com>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    mjkravetz authored and torvalds committed Jun 16, 2021
  15. mm/slub: actually fix freelist pointer vs redzoning

    It turns out that SLUB redzoning ("slub_debug=Z") checks from
    s->object_size rather than from s->inuse (which is normally bumped to
    make room for the freelist pointer), so a cache created with an object
    size less than 24 would have the freelist pointer written beyond
    s->object_size, causing the redzone to be corrupted by the freelist
    pointer.  This was very visible with "slub_debug=ZF":
    
      BUG test (Tainted: G    B            ): Right Redzone overwritten
      -----------------------------------------------------------------------------
    
      INFO: 0xffff957ead1c05de-0xffff957ead1c05df @offset=1502. First byte 0x1a instead of 0xbb
      INFO: Slab 0xffffef3950b47000 objects=170 used=170 fp=0x0000000000000000 flags=0x8000000000000200
      INFO: Object 0xffff957ead1c05d8 @offset=1496 fp=0xffff957ead1c0620
    
      Redzone  (____ptrval____): bb bb bb bb bb bb bb bb               ........
      Object   (____ptrval____): 00 00 00 00 00 f6 f4 a5               ........
      Redzone  (____ptrval____): 40 1d e8 1a aa                        @....
      Padding  (____ptrval____): 00 00 00 00 00 00 00 00               ........
    
    Adjust the offset to stay within s->object_size.
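
    To see why the choice of offset matters, the following sketch compares
    two placement rules for a mid-object freelist pointer.  Both rules are
    assumptions made for illustration (the commit text does not spell out
    the formulas), the pointer size is assumed to be 8 bytes, and all
    function names are hypothetical.

```python
POINTER_SIZE = 8  # assumption: 64-bit pointers


def align_up(value, alignment):
    return (value + alignment - 1) // alignment * alignment


def align_down(value, alignment):
    return value // alignment * alignment


def offset_from_aligned_size(object_size):
    # Assumed "before" rule: midpoint of the word-aligned size, rounded up.
    size = align_up(object_size, POINTER_SIZE)
    if size <= POINTER_SIZE:
        return 0
    return align_up(size // 2, POINTER_SIZE)


def offset_within_object(object_size):
    # Assumed "after" rule: midpoint of object_size itself, rounded down.
    return align_down(object_size // 2, POINTER_SIZE)


def overruns_object(object_size, offset):
    # True if the pointer's bytes extend past object_size into the redzone.
    return offset + POINTER_SIZE > object_size
```

    Under these assumptions a 20-byte object gets offset 16 from the first
    rule, so the pointer ends at byte 24 and overruns the object, while the
    second rule gives offset 8 and the pointer ends at byte 16.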
    
    (Note that no caches in this size range are known to exist in the
    kernel currently.)
    
    Link: https://lkml.kernel.org/r/20210608183955.280836-4-keescook@chromium.org
    Link: https://lore.kernel.org/linux-mm/20200807160627.GA1420741@elver.google.com/
    Link: https://lore.kernel.org/lkml/0f7dd7b2-7496-5e2d-9488-2ec9f8e90441@suse.cz/
    Fixes: 89b83f2 ("slub: avoid redzone when choosing freepointer location")
    Link: https://lore.kernel.org/lkml/CANpmjNOwZ5VpKQn+SYWovTkFB4VsT-RPwyENBmaK0dLcpqStkA@mail.gmail.com
    Signed-off-by: Kees Cook <keescook@chromium.org>
    Reported-by: Marco Elver <elver@google.com>
    Reported-by: "Lin, Zhenpeng" <zplin@psu.edu>
    Tested-by: Marco Elver <elver@google.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Pekka Enberg <penberg@kernel.org>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    kees authored and torvalds committed Jun 16, 2021
  16. mm/slub: fix redzoning for small allocations

    The redzone area for SLUB exists between s->object_size and s->inuse
    (which is at least the word-aligned object_size).  If a cache were
    created with an object_size smaller than sizeof(void *), the in-object
    stored freelist pointer would overwrite the redzone (e.g.  with boot
    param "slub_debug=ZF"):
    
      BUG test (Tainted: G    B            ): Right Redzone overwritten
      -----------------------------------------------------------------------------
    
      INFO: 0xffff957ead1c05de-0xffff957ead1c05df @offset=1502. First byte 0x1a instead of 0xbb
      INFO: Slab 0xffffef3950b47000 objects=170 used=170 fp=0x0000000000000000 flags=0x8000000000000200
      INFO: Object 0xffff957ead1c05d8 @offset=1496 fp=0xffff957ead1c0620
    
      Redzone  (____ptrval____): bb bb bb bb bb bb bb bb    ........
      Object   (____ptrval____): f6 f4 a5 40 1d e8          ...@..
      Redzone  (____ptrval____): 1a aa                      ..
      Padding  (____ptrval____): 00 00 00 00 00 00 00 00    ........
    
    Store the freelist pointer out of line when object_size is smaller than
    sizeof(void *) and redzoning is enabled.
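
    The report above shows a 6-byte object followed by a redzone whose
    first two bytes were overwritten.  A minimal sketch of that arithmetic,
    and of the decision described in the previous paragraph, assuming
    8-byte pointers (function names are hypothetical):

```python
POINTER_SIZE = 8  # assumption: sizeof(void *) on a 64-bit build


def bytes_spilled_into_redzone(object_size, offset=0):
    # An in-object pointer occupies [offset, offset + POINTER_SIZE).
    return max(0, offset + POINTER_SIZE - object_size)


def store_pointer_out_of_line(object_size, redzoning_enabled):
    # Decision as described in the commit text: only tiny objects with
    # redzoning enabled need the pointer moved out of the object.
    return redzoning_enabled and object_size < POINTER_SIZE
```

    For the 6-byte object in the report, bytes_spilled_into_redzone(6)
    gives 2, matching the two corrupted redzone bytes.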
    
    Additionally remove the "smaller than sizeof(void *)" check under
    CONFIG_DEBUG_VM in kmem_cache_sanity_check() as it is now redundant:
    SLAB and SLOB both handle small sizes.
    
    (Note that no caches within this size range are known to exist in the
    kernel currently.)
    
    Link: https://lkml.kernel.org/r/20210608183955.280836-3-keescook@chromium.org
    Fixes: 81819f0 ("SLUB core")
    Signed-off-by: Kees Cook <keescook@chromium.org>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: "Lin, Zhenpeng" <zplin@psu.edu>
    Cc: Marco Elver <elver@google.com>
    Cc: Pekka Enberg <penberg@kernel.org>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    kees authored and torvalds committed Jun 16, 2021
  17. mm/slub: clarify verification reporting

    Patch series "Actually fix freelist pointer vs redzoning", v4.
    
    This fixes redzoning vs the freelist pointer (both for middle-position
    and very small caches).  Both are "theoretical" fixes, in that I see no
    evidence of such small-sized caches actually being used in the kernel, but
    that's no reason to let the bugs continue to exist, especially since
    people doing local development keep tripping over it.  :)
    
    This patch (of 3):
    
    Instead of repeating "Redzone" and "Poison", clarify which sides of
    those zones got tripped.  Additionally fix column alignment in the
    trailer.
    
    Before:
    
      BUG test (Tainted: G    B            ): Redzone overwritten
      ...
      Redzone (____ptrval____): bb bb bb bb bb bb bb bb      ........
      Object (____ptrval____): f6 f4 a5 40 1d e8            ...@..
      Redzone (____ptrval____): 1a aa                        ..
      Padding (____ptrval____): 00 00 00 00 00 00 00 00      ........
    
    After:
    
      BUG test (Tainted: G    B            ): Right Redzone overwritten
      ...
      Redzone  (____ptrval____): bb bb bb bb bb bb bb bb      ........
      Object   (____ptrval____): f6 f4 a5 40 1d e8            ...@..
      Redzone  (____ptrval____): 1a aa                        ..
      Padding  (____ptrval____): 00 00 00 00 00 00 00 00      ........
    
    The earlier commits that slowly resulted in the "Before" reporting were:
    
      d86bd1b ("mm/slub: support left redzone")
      ffc79d2 ("slub: use print_hex_dump")
      2492268 ("SLUB: change error reporting format to follow lockdep loosely")
    
    Link: https://lkml.kernel.org/r/20210608183955.280836-1-keescook@chromium.org
    Link: https://lkml.kernel.org/r/20210608183955.280836-2-keescook@chromium.org
    Link: https://lore.kernel.org/lkml/cfdb11d7-fb8e-e578-c939-f7f5fb69a6bd@suse.cz/
    Signed-off-by: Kees Cook <keescook@chromium.org>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Marco Elver <elver@google.com>
    Cc: "Lin, Zhenpeng" <zplin@psu.edu>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: Pekka Enberg <penberg@kernel.org>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    kees authored and torvalds committed Jun 16, 2021
  18. mm/swap: fix pte_same_as_swp() not removing uffd-wp bit when compare

    I found by pure code review that pte_same_as_swp(), as used by
    unuse_vma(), didn't take the uffd-wp bit into account when comparing
    ptes.  A false negative from pte_same_as_swp() could cause swapoff to
    fail for swap ptes that were write-protected by userfaultfd.
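
    The comparison problem can be sketched with plain integers.  The bit
    positions below are hypothetical (real swap pte layouts are
    architecture specific), and the "before" helper assumes that only the
    soft-dirty bit was being masked prior to this fix.

```python
# Hypothetical flag bits carried in a swap pte.
SWP_SOFT_DIRTY = 1 << 0
SWP_UFFD_WP = 1 << 1
SWP_FLAGS = SWP_SOFT_DIRTY | SWP_UFFD_WP


def same_as_swp_before(pte, swp_pte):
    # Assumed old behaviour: only soft-dirty is ignored.
    return (pte & ~SWP_SOFT_DIRTY) == swp_pte


def same_as_swp_after(pte, swp_pte):
    # Ignore every per-pte flag, including uffd-wp, before comparing.
    return (pte & ~SWP_FLAGS) == swp_pte


swap_entry = 0x5A00                  # toy swap entry, flag bits clear
wp_pte = swap_entry | SWP_UFFD_WP    # same entry, write-protected by uffd
```

    Here same_as_swp_before(wp_pte, swap_entry) is a false negative, while
    same_as_swp_after(wp_pte, swap_entry) matches as intended.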
    
    Link: https://lkml.kernel.org/r/20210603180546.9083-1-peterx@redhat.com
    Fixes: f45ec5f ("userfaultfd: wp: support swap and page migration")
    Signed-off-by: Peter Xu <peterx@redhat.com>
    Acked-by: Hugh Dickins <hughd@google.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: <stable@vger.kernel.org>	[5.7+]
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    xzpeter authored and torvalds committed Jun 16, 2021
  19. mm,hwpoison: fix race with hugetlb page allocation

    When a hugetlb page fault (in an overcommit situation) races with
    memory_failure(), VM_BUG_ON_PAGE() is triggered by the following
    sequence:
    
        CPU0:                           CPU1:
    
                                        gather_surplus_pages()
                                          page = alloc_surplus_huge_page()
        memory_failure_hugetlb()
          get_hwpoison_page(page)
            __get_hwpoison_page(page)
              get_page_unless_zero(page)
                                          zero = put_page_testzero(page)
                                          VM_BUG_ON_PAGE(!zero, page)
                                          enqueue_huge_page(h, page)
          put_page(page)
    
    __get_hwpoison_page() only checks the page refcount before taking an
    additional one for memory error handling, which is not enough because
    there's a time window where compound pages have non-zero refcount during
    hugetlb page initialization.
    
    So make __get_hwpoison_page() check page status a bit more for hugetlb
    pages with get_hwpoison_huge_page().  Checking hugetlb-specific flags
    under hugetlb_lock makes sure that the hugetlb page is not in a
    transient state.  Notably, another new function, HWPoisonHandlable(),
    helps prevent a race against other transient page states (like a
    generic compound page just before PageHuge becomes true).
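
    The idea of checking page state under a lock before taking a
    reference can be sketched as follows.  This is a toy model only: the
    "stable" attribute is a hypothetical stand-in for the hugetlb-specific
    flags, the -EBUSY return value is an assumption, and a threading.Lock
    plays the role of hugetlb_lock.

```python
import errno
import threading


class ToyHugePage:
    def __init__(self):
        self.refcount = 1     # held by the allocator during initialisation
        self.stable = False   # hypothetical: set once initialisation is done


hugetlb_lock = threading.Lock()  # stand-in for the kernel's hugetlb_lock


def get_unless_zero(page):
    if page.refcount == 0:
        return False
    page.refcount += 1
    return True


def get_refcount_only(page):
    # Old approach: a non-zero refcount is enough to take a reference.
    return 1 if get_unless_zero(page) else 0


def get_with_state_check(page):
    # New approach: refuse pages that are still being set up.
    with hugetlb_lock:
        if not page.stable:
            return -errno.EBUSY
        return 1 if get_unless_zero(page) else 0
```

    With get_refcount_only, a page still being initialised ends up with
    two references, so the allocator's later drop of its own reference
    would not reach zero, which is the situation VM_BUG_ON_PAGE() catches
    in the diagram above.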
    
    Link: https://lkml.kernel.org/r/20210603233632.2964832-2-nao.horiguchi@gmail.com
    Fixes: ead07f6 ("mm/memory-failure: introduce get_hwpoison_page() for consistent refcount handling")
    Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Reported-by: Muchun Song <songmuchun@bytedance.com>
    Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: <stable@vger.kernel.org>	[5.12+]
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    nhoriguchi authored and torvalds committed Jun 16, 2021
  20. Merge tag 'dmaengine-fix-5.13' of git://git.kernel.org/pub/scm/linux/…

    …kernel/git/vkoul/dmaengine
    
    Pull dmaengine fixes from Vinod Koul:
     "A bunch of driver fixes, notably:
    
       - More idxd fixes for driver unregister, error handling and bus
         assignment
    
       - HAS_IOMEM depends fix for few drivers
    
       - lock fix in pl330 driver
    
       - xilinx drivers fixes for initialize registers, missing dependencies
         and limiting descriptor IDs
    
       - mediatek descriptor management fixes"
    
    * tag 'dmaengine-fix-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
      dmaengine: mediatek: use GFP_NOWAIT instead of GFP_ATOMIC in prep_dma
      dmaengine: mediatek: do not issue a new desc if one is still current
      dmaengine: mediatek: free the proper desc in desc_free handler
      dmaengine: ipu: fix doc warning in ipu_irq.c
      dmaengine: rcar-dmac: Fix PM reference leak in rcar_dmac_probe()
      dmaengine: idxd: Fix missing error code in idxd_cdev_open()
      dmaengine: stedma40: add missing iounmap() on error in d40_probe()
      dmaengine: SF_PDMA depends on HAS_IOMEM
      dmaengine: QCOM_HIDMA_MGMT depends on HAS_IOMEM
      dmaengine: ALTERA_MSGDMA depends on HAS_IOMEM
      dmaengine: idxd: Add missing cleanup for early error out in probe call
      dmaengine: xilinx: dpdma: Limit descriptor IDs to 16 bits
      dmaengine: xilinx: dpdma: Add missing dependencies to Kconfig
      dmaengine: stm32-mdma: fix PM reference leak in stm32_mdma_alloc_chan_resourc()
      dmaengine: zynqmp_dma: Fix PM reference leak in zynqmp_dma_alloc_chan_resourc()
      dmaengine: xilinx: dpdma: initialize registers before request_irq
      dmaengine: pl330: fix wrong usage of spinlock flags in dma_cyclc
      dmaengine: fsl-dpaa2-qdma: Fix error return code in two functions
      dmaengine: idxd: add missing dsa driver unregister
      dmaengine: idxd: add engine 'struct device' missing bus type assignment
    torvalds committed Jun 16, 2021
  21. Merge tag 'clang-features-v5.13-rc7' of git://git.kernel.org/pub/scm/…

    …linux/kernel/git/kees/linux
    
    Pull clang LTO fix from Kees Cook:
     "It seems Clang has been scrubbing through the missing LTO IR flags for
      Clang 13, and the last of these 'only with LTO' flags is fixed now.
    
      I've asked that they please consider making these changes in a less
      'break all the Clang kernel builds' kind of way in the future. :P
    
      Summary:
    
       - The '-warn-stack-size' option under LTO has moved in Clang 13 (Tor
         Vic)"
    
    * tag 'clang-features-v5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
      Makefile: lto: Pass -warn-stack-size only on LLD < 13.0.0
    torvalds committed Jun 16, 2021

Commits on Jun 15, 2021

  1. proc: only require mm_struct for writing

    In commit 591a22c ("proc: Track /proc/$pid/attr/ opener mm_struct") we
    started using __mem_open() to track the mm_struct at open-time, so that
    we could then check it for writes.
    
    But that also ended up making the permission checks at open time much
    stricter - and not just for writes, but for reads too.  And that in turn
    caused a regression for at least Fedora 29, where NIC interfaces fail to
    start when using NetworkManager.
    
    Since only the write side wanted the mm_struct test, ignore any failures
    by __mem_open() at open time, leaving reads unaffected.  The write()
    time verification of the mm_struct pointer will then catch the failure
    case because a NULL pointer will not match a valid 'current->mm'.
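
    A toy model of the open/read/write behaviour described above.
    ToyAttrFile is a hypothetical name, None stands in for a NULL
    mm_struct pointer, and the -EPERM return value is an assumption
    rather than something stated in this message.

```python
import errno


class ToyAttrFile:
    """Toy /proc/$pid/attr file that remembers the opener's mm."""

    def __init__(self, opener_mm, mem_open_ok):
        # A failed open-time lookup is ignored: store "no mm" and carry on.
        self.mm = opener_mm if mem_open_ok else None

    def read(self):
        # Reads never consult the stored mm.
        return "ok"

    def write(self, current_mm):
        # current_mm is always a valid object, so None can never match it.
        if self.mm is not current_mm:
            return -errno.EPERM
        return 0
```

    A file whose open-time lookup failed can still be read, but every
    write is rejected because None never matches the writer's mm.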
    
    Link: https://lore.kernel.org/netdev/YMjTlp2FSJYvoyFa@unreal/
    Fixes: 591a22c ("proc: Track /proc/$pid/attr/ opener mm_struct")
    Reported-and-tested-by: Leon Romanovsky <leon@kernel.org>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Christian Brauner <christian.brauner@ubuntu.com>
    Cc: Andrea Righi <andrea.righi@canonical.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    torvalds committed Jun 15, 2021
  2. afs: Fix an IS_ERR() vs NULL check

    The proc_symlink() function returns NULL on error, it doesn't return
    error pointers.
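
    Why an IS_ERR() check misses a NULL return can be shown with integers
    standing in for pointers.  This sketch assumes 64-bit pointers; the
    helper names are hypothetical, while the convention that error
    pointers occupy the top 4095 values of the address space is the
    kernel's.

```python
MAX_ERRNO = 4095
POINTER_BITS = 64  # assumption: 64-bit pointers
NULL = 0


def err_ptr(errno_value):
    # Encode a positive errno as a pointer near the top of the address space.
    return (1 << POINTER_BITS) - errno_value


def is_err(ptr):
    # Only the last MAX_ERRNO pointer values count as errors.
    return ptr >= (1 << POINTER_BITS) - MAX_ERRNO
```

    Since is_err(NULL) is false, code that tests only is_err() treats a
    NULL return as success; a function that reports failure with NULL has
    to be checked against NULL instead.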
    
    Fixes: 5b86d4f ("afs: Implement network namespacing")
    Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
    Signed-off-by: David Howells <dhowells@redhat.com>
    cc: linux-afs@lists.infradead.org
    Link: https://lore.kernel.org/r/YLjMRKX40pTrJvgf@mwanda/
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    error27 authored and torvalds committed Jun 15, 2021
  3. quota: finish disable quotactl_path syscall

    In commit 5b9fedb ("quota: Disable quotactl_path syscall") Jan Kara
    disabled quotactl_path syscall on several architectures.
    
    This commit disables it on all architectures using the unified list of
    system calls:
    
    - arm64
    - arc
    - csky
    - h8300
    - hexagon
    - nds32
    - nios2
    - openrisc
    - riscv (32/64)
    
    CC: Jan Kara <jack@suse.cz>
    CC: Christian Brauner <christian.brauner@ubuntu.com>
    CC: Sascha Hauer <s.hauer@pengutronix.de>
    Link: https://lore.kernel.org/lkml/20210512153621.n5u43jsytbik4yze@wittgenstein
    Link: https://lore.kernel.org/r/20210614153712.313707-1-marcin@juszkiewicz.com.pl
    Fixes: 5b9fedb ("quota: Disable quotactl_path syscall")
    Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
    Signed-off-by: Marcin Juszkiewicz <marcin@juszkiewicz.com.pl>
    Signed-off-by: Jan Kara <jack@suse.cz>
    hrw authored and jankara committed Jun 15, 2021

Commits on Jun 14, 2021

  1. Makefile: lto: Pass -warn-stack-size only on LLD < 13.0.0

    Since LLVM commit fc018eb, the '-warn-stack-size' flag has been dropped
    [1], leading to the following error message when building with Clang-13
    and LLD-13:
    
        ld.lld: error: -plugin-opt=-: ld.lld: Unknown command line argument
        '-warn-stack-size=2048'.  Try: 'ld.lld --help'
        ld.lld: Did you mean '--asan-stack=2048'?
    
    In the same way as with commit 2398ce8 ("x86, lto: Pass
    -stack-alignment only on LLD < 13.0.0") , make '-warn-stack-size'
    conditional on LLD < 13.0.0.
    
    [1] https://reviews.llvm.org/D103928
    
    Fixes: 24845dc ("Makefile: LTO: have linker check -Wframe-larger-than")
    Cc: stable@vger.kernel.org
    Link: ClangBuiltLinux#1377
    Signed-off-by: Tor Vic <torvic9@mailbox.org>
    Reviewed-by: Nathan Chancellor <nathan@kernel.org>
    Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
    Signed-off-by: Kees Cook <keescook@chromium.org>
    Link: https://lore.kernel.org/r/7631bab7-a8ab-f884-ab54-f4198976125c@mailbox.org
    torvic9 authored and kees committed Jun 14, 2021
  2. fanotify: fix copy_event_to_user() fid error clean up

    Ensure that clean up is performed on the allocated file descriptor and
    struct file object in the event that an error is encountered while copying
    fid info objects. Currently, we return directly to the caller when an error
    is experienced in the fid info copying helper, which isn't ideal given that
    the listener process could be left with a dangling file descriptor in their
    fdtable.
    
    Fixes: 5e469c8 ("fanotify: copy event fid info to user")
    Fixes: 44d705b ("fanotify: report name info for FAN_DIR_MODIFY event")
    Link: https://lore.kernel.org/linux-fsdevel/YMKv1U7tNPK955ho@google.com/T/#m15361cd6399dad4396aad650de25dbf6b312288e
    Link: https://lore.kernel.org/r/1ef8ae9100101eb1a91763c516c2e9a3a3b112bd.1623376346.git.repnop@google.com
    Signed-off-by: Matthew Bobrowski <repnop@google.com>
    Signed-off-by: Jan Kara <jack@suse.cz>
    matthewbobrowski authored and jankara committed Jun 14, 2021