Commits

Shiyang-Ruan/f…

Commits on Jun 28, 2021

  1. fs/dax: Remove useless functions

    Since owner tracking is triggered by the pmem device, these functions
    are useless.  Remove them.
    
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    irides authored and intel-lab-lkp committed Jun 28, 2021
    e1b9a2f
  2. md: Implement dax_holder_operations

    This is the case where the holder represents a mapped device or, more
    exactly, a list of mapped devices (because it is possible to create
    more than one mapped device on one pmem device).
    
    Find out which mapped device the offset belongs to, and translate the
    offset from the target device to the mapped device.  When that is done,
    call dax_corrupted_range() for the holder of this mapped device.
    
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    irides authored and intel-lab-lkp committed Jun 28, 2021
    4873895
  3. dm: Introduce ->rmap() to find bdev offset

    A pmem device can be a target of a mapped device.  To find the global
    location on a mapped device, introduce ->rmap() to translate an offset
    from the target device to the mapped device.
    
    Currently it is implemented only for the linear target, where the
    translation is straightforward.  Other targets will be supported in the
    future.  However, some targets may not be able to support it because
    of their non-linear mapping.
    
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    irides authored and intel-lab-lkp committed Jun 28, 2021
    f704b01
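
The offset translation described in the "dm: Introduce ->rmap()" message can be sketched for a linear target as below. This is a userspace illustration under stated assumptions, not the code from the patch: `struct toy_linear_target`, its fields, and `toy_linear_rmap()` are hypothetical names, and offsets are treated as plain sector numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a linear target: the mapped-device range
 * [md_begin, md_begin + len) is backed by the target-device range
 * [dev_start, dev_start + len). All values are in sectors. */
struct toy_linear_target {
    uint64_t md_begin;   /* first sector on the mapped device */
    uint64_t dev_start;  /* first sector on the underlying target device */
    uint64_t len;        /* length of the mapping */
};

/* Reverse-map a target-device sector to a mapped-device sector.
 * Returns false when the sector lies outside this target's range. */
static bool toy_linear_rmap(const struct toy_linear_target *t,
                            uint64_t dev_sector, uint64_t *md_sector)
{
    assert(t != NULL && md_sector != NULL);
    if (dev_sector < t->dev_start || dev_sector - t->dev_start >= t->len)
        return false;
    *md_sector = t->md_begin + (dev_sector - t->dev_start);
    return true;
}
```

Because a linear target shifts every sector by the same amount, the reverse mapping is a single subtraction and addition. Targets with non-linear mappings have no such closed form, which matches the caveat in the commit message.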
  4. xfs: Implement ->corrupted_range() for XFS

    This function handles errors that may cause data loss in the
    filesystem, such as a memory failure in fsdax mode.
    
    If the rmap feature of XFS is enabled, we can query it to find the files
    and metadata associated with the corrupt data.  For now all we do
    is kill processes with that file mapped into their address spaces, but
    future patches could actually do something about corrupt metadata.
    
    After that, the memory-failure code needs to notify the processes that
    are using those files.
    
    Only the data device is supported.  The realtime device is not
    supported for now.
    
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    irides authored and intel-lab-lkp committed Jun 28, 2021
    6833b94
  5. mm: Introduce mf_dax_kill_procs() for fsdax case

    This function is called at the end of the RMAP routine, i.e. the
    filesystem recovery function.  The difference from
    mf_generic_kill_procs() is that mf_dax_kill_procs() accepts a file
    mapping and offset instead of a struct page, because different file
    mappings and offsets may share the same page in fsdax mode.  It is
    therefore called once the filesystem RMAP results are found.
    
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    irides authored and intel-lab-lkp committed Jun 28, 2021
    dc92f6a
  6. pmem,mm: Implement ->memory_failure in pmem driver

    With dax_holder notify support, we are able to notify the upper layers
    of a memory failure from the pmem driver.  If something is not supported
    in the notify routine, memory_failure will fall back to the generic
    handler.
    
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    irides authored and intel-lab-lkp committed Jun 28, 2021
    a9d137c
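
The fallback behaviour mentioned in the "pmem,mm: Implement ->memory_failure in pmem driver" message can be sketched as below. Everything here is hypothetical and lives in userspace: `struct toy_pagemap_ops`, `toy_dispatch_failure()`, and the choice of `-EOPNOTSUPP` as the "not supported" signal are assumptions made for illustration, not details confirmed by the commit text.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-device callback table, loosely modelled on the
 * ->memory_failure() hook named in the commit message. */
struct toy_pagemap_ops {
    int (*memory_failure)(unsigned long pfn);
};

/* Stand-in for the generic handler; it always reports success. */
static int toy_generic_handler(unsigned long pfn)
{
    (void)pfn;
    return 0;
}

/* Try the driver hook first. Fall back to the generic handler when the
 * hook is missing or reports that it cannot handle this case. */
static int toy_dispatch_failure(const struct toy_pagemap_ops *ops,
                                unsigned long pfn, int *used_fallback)
{
    assert(used_fallback != NULL);
    *used_fallback = 0;
    if (ops && ops->memory_failure) {
        int rc = ops->memory_failure(pfn);
        if (rc != -EOPNOTSUPP)
            return rc;          /* handled, or failed for a real reason */
    }
    *used_fallback = 1;
    return toy_generic_handler(pfn);
}

/* Two example hooks used to exercise both paths. */
static int toy_hook_unsupported(unsigned long pfn) { (void)pfn; return -EOPNOTSUPP; }
static int toy_hook_io_error(unsigned long pfn)    { (void)pfn; return -EIO; }
```

Only the "not supported" result triggers the fallback in this sketch; any other error from the hook is returned to the caller unchanged.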
  7. mm: factor helpers for memory_failure_dev_pagemap

    The memory_failure_dev_pagemap code is already somewhat complex before
    the RMAP feature for fsdax is introduced.  Factor out some helper
    functions to simplify the code.
    
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    irides authored and intel-lab-lkp committed Jun 28, 2021
    9a11448
  8. dax: Introduce holder for dax_device

    To easily track the filesystem from a pmem device, we introduce a
    holder for the dax_device structure, along with its operations.  The
    holder records who is using this dax_device:
     - When it is the backend of a filesystem, the holder is the
       superblock of this filesystem.
     - When this pmem device is one of the targets in a mapped device, the
       holder is this mapped device.  In this case, the mapped device
       has its own dax_device, which follows the first rule, so that we
       can finally track down the filesystem we need.
    
    The holder and holder_ops are set when the filesystem is being mounted
    or a target device is being activated.
    
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    irides authored and intel-lab-lkp committed Jun 28, 2021
    8efff95
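
The two holder rules in the "dax: Introduce holder for dax_device" message can be modelled as a small chain of callbacks. The sketch below is a userspace toy, not the kernel structures: every `toy_` name is hypothetical, and the fixed `shift` stands in for whatever offset translation the mapped device performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_dax_device;

/* Hypothetical holder callbacks (the commit calls the real table holder_ops). */
struct toy_holder_ops {
    int (*corrupted_range)(struct toy_dax_device *dax, uint64_t off, uint64_t len);
};

struct toy_dax_device {
    void *holder;                      /* who is using this device */
    const struct toy_holder_ops *ops;  /* how to notify that holder */
};

/* Toy "superblock": remembers the last range it was told about. */
struct toy_super { uint64_t off, len; int notified; };

/* Toy "mapped device": owns its own dax_device plus a fixed offset shift. */
struct toy_mapped_dev { struct toy_dax_device dax; uint64_t shift; };

/* Notify whoever holds this device; -1 means nobody holds it. */
static int toy_notify(struct toy_dax_device *dax, uint64_t off, uint64_t len)
{
    assert(dax != NULL);
    if (!dax->holder || !dax->ops || !dax->ops->corrupted_range)
        return -1;
    return dax->ops->corrupted_range(dax, off, len);
}

/* Rule 1: the holder is a filesystem superblock, which records the range. */
static int toy_fs_corrupted_range(struct toy_dax_device *dax, uint64_t off, uint64_t len)
{
    struct toy_super *sb = dax->holder;
    sb->off = off;
    sb->len = len;
    sb->notified = 1;
    return 0;
}

/* Rule 2: the holder is a mapped device, which translates the offset and
 * forwards the notification to the holder of its own dax_device. */
static int toy_md_corrupted_range(struct toy_dax_device *dax, uint64_t off, uint64_t len)
{
    struct toy_mapped_dev *md = dax->holder;
    return toy_notify(&md->dax, off + md->shift, len);
}

static const struct toy_holder_ops toy_fs_ops = { toy_fs_corrupted_range };
static const struct toy_holder_ops toy_md_ops = { toy_md_corrupted_range };
```

Chaining a pmem device to a mapped device to a superblock shows how a notification that starts at the pmem level can end up at the filesystem.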
  9. pagemap: Introduce ->memory_failure()

    When a memory failure occurs, we call this function, which is
    implemented by each kind of device.  For the fsdax case, the pmem device
    driver implements it.  The pmem device driver finds the filesystem in
    which the corrupted page is located, and finally calls the filesystem
    handler to deal with the error.
    
    The filesystem will try to recover the corrupted data if necessary.
    
    Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
    irides authored and intel-lab-lkp committed Jun 28, 2021
    aa2662c

Commits on Jun 27, 2021

  1. Linux 5.13

    torvalds committed Jun 27, 2021
    62fb987
  2. Revert "signal: Allow tasks to cache one sigqueue struct"

    This reverts commits 4bad58e (and
    399f8dd, which tried to fix it).
    
    I do not believe these are correct, and I'm about to release 5.13, so am
    reverting them out of an abundance of caution.
    
    The locking is odd, and appears broken.
    
    On the allocation side (in __sigqueue_alloc()), the locking is somewhat
    straightforward: it depends on sighand->siglock.  Since one caller
    doesn't hold that lock, it further then tests 'sigqueue_flags' to avoid
    the case with no locks held.
    
    On the freeing side (in sigqueue_cache_or_free()), there is no locking
    at all, and the logic instead depends on 'current' being a single
    thread, and not able to race with itself.
    
    To make things more exciting, there's also the data race between freeing
    a signal and allocating one, which is handled by using WRITE_ONCE() and
    READ_ONCE(), and being mutually exclusive wrt the initial state (ie
    freeing will only free if the old state was NULL, while allocating will
    obviously only use the value if it was non-NULL, so only one or the
    other will actually act on the value).
    
    However, while the free->alloc paths do seem mutually exclusive thanks
    to just the data value dependency, it's not clear what the memory
    ordering constraints are on it.  Could writes from the previous
    allocation possibly be delayed and seen by the new allocation later,
    causing logical inconsistencies?
    
    So it's all very exciting and unusual.
    
    And in particular, it seems that the freeing side is incorrect in
    depending on "current" being single-threaded.  Yes, 'current' is a
    single thread, but in the presence of asynchronous events even a single
    thread can have data races.
    
    And such asynchronous events can and do happen, with interrupts causing
    signals to be flushed and thus free'd (for example - sending a
    SIGCONT/SIGSTOP can happen from interrupt context, and can flush
    previously queued process control signals).
    
    So regardless of all the other questions about the memory ordering and
    locking for this new cached allocation, the sigqueue_cache_or_free()
    assumptions seem to be fundamentally incorrect.
    
    It may be that people will show me the errors of my ways, and tell me
    why this is all safe after all.  We can reinstate it if so.  But my
    current belief is that the WRITE_ONCE() that sets the cached entry needs
    to be a smp_store_release(), and the READ_ONCE() that finds a cached
    entry needs to be a smp_load_acquire() to handle memory ordering
    correctly.
    
    And the sequence in sigqueue_cache_or_free() would need to either use a
    lock or at least be interrupt-safe some way (perhaps by using something
    like the percpu 'cmpxchg': it doesn't need to be SMP-safe, but like the
    percpu operations it needs to be interrupt-safe).
    
    Fixes: 399f8dd ("signal: Prevent sigqueue caching after task got released")
    Fixes: 4bad58e ("signal: Allow tasks to cache one sigqueue struct")
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: Christian Brauner <christian.brauner@ubuntu.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    torvalds committed Jun 27, 2021
    b4b27b9
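
The ordering concern raised in the revert message (release on the store that publishes a cached entry, acquire on the load that consumes it, and a free path that cannot race with itself) can be illustrated with C11 atomics. This is a userspace analogue only: it uses `<stdatomic.h>` rather than the kernel's smp_store_release(), smp_load_acquire(), or percpu cmpxchg, it is not a proposed fix, and all `toy_` names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_entry { int payload; };

/* One-slot cache. NULL means the slot is empty. */
static _Atomic(struct toy_entry *) toy_cache;

/* Free path: park the entry in the slot only if the slot was empty.
 * The compare-and-swap is a single atomic step, and release ordering
 * makes earlier writes to *e visible to whoever later acquires it. */
static void toy_cache_or_free(struct toy_entry *e)
{
    struct toy_entry *expected = NULL;

    assert(e != NULL);
    if (!atomic_compare_exchange_strong_explicit(&toy_cache, &expected, e,
                                                 memory_order_release,
                                                 memory_order_relaxed))
        free(e);    /* slot already occupied: really free the entry */
}

/* Alloc path: take the cached entry if there is one, else allocate.
 * The exchange empties the slot and acquires the entry's contents. */
static struct toy_entry *toy_alloc(void)
{
    struct toy_entry *e = atomic_exchange_explicit(&toy_cache,
                                                   (struct toy_entry *)NULL,
                                                   memory_order_acquire);
    if (!e)
        e = malloc(sizeof(*e));
    return e;
}
```

A full atomic read-modify-write is stronger than what the message asks for on the free side (it only needs interrupt safety, not SMP safety), so this sketch trades some cost for simplicity.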

Commits on Jun 26, 2021

  1. Merge tag 's390-5.13-5' of git://git.kernel.org/pub/scm/linux/kernel/…

    …git/s390/linux
    
    Pull s390 fixes from Vasily Gorbik:
    
     - Fix a couple of late pt_regs flags handling findings of conversion to
       generic entry.
    
     - Fix potential register clobbering in stack switch helper.
    
     - Fix thread/group masks for offline cpus.
    
     - Fix cleanup of mdev resources when remove callback is invoked in
       vfio-ap code.
    
    * tag 's390-5.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
      s390/stack: fix possible register corruption with stack switch helper
      s390/topology: clear thread/group maps for offline cpus
      s390/vfio-ap: clean up mdev resources when remove callback invoked
      s390: clear pt_regs::flags on irq entry
      s390: fix system call restart with multiple signals
    torvalds committed Jun 26, 2021
    625acff
  2. Merge tag 'pinctrl-v5.13-3' of git://git.kernel.org/pub/scm/linux/ker…

    …nel/git/linusw/linux-pinctrl
    
    Pull pin control fixes from Linus Walleij:
     "Two last-minute fixes:
    
       - Put an fwnode in the errorpath in the SGPIO driver
    
       - Fix the number of GPIO lines per bank in the STM32 driver"
    
    * tag 'pinctrl-v5.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
      pinctrl: stm32: fix the reported number of GPIO lines per bank
      pinctrl: microchip-sgpio: Put fwnode in error case during ->probe()
    torvalds committed Jun 26, 2021
    b7050b2

Commits on Jun 25, 2021

  1. Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/g…

    …it/jejb/scsi
    
    Pull SCSI fixes from James Bottomley:
     "Two small fixes, both in upper layer drivers (scsi disk and cdrom).
    
      The sd one is fixing a commit changing revalidation that came from the
      block tree a while ago (5.10) and the sr one adds handling of a
      condition we didn't previously handle for manually removed media"
    
    * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
      scsi: sd: Call sd_revalidate_disk() for ioctl(BLKRRPART)
      scsi: sr: Return appropriate error code when disk is ejected
    torvalds committed Jun 25, 2021
    e2f527b
  2. Merge branch 'akpm' (patches from Andrew)

    Merge misc fixes from Andrew Morton:
     "24 patches, based on 4a09d38.
    
      Subsystems affected by this patch series: mm (thp, vmalloc, hugetlb,
      memory-failure, and pagealloc), nilfs2, kthread, MAINTAINERS, and
      mailmap"
    
    * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (24 commits)
      mailmap: add Marek's other e-mail address and identity without diacritics
      MAINTAINERS: fix Marek's identity again
      mm/page_alloc: do bulk array bounds check after checking populated elements
      mm/page_alloc: __alloc_pages_bulk(): do bounds check before accessing array
      mm/hwpoison: do not lock page again when me_huge_page() successfully recovers
      mm,hwpoison: return -EHWPOISON to denote that the page has already been poisoned
      mm/memory-failure: use a mutex to avoid memory_failure() races
      mm, futex: fix shared futex pgoff on shmem huge page
      kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync()
      kthread_worker: split code for canceling the delayed work timer
      mm/vmalloc: unbreak kasan vmalloc support
      KVM: s390: prepare for hugepage vmalloc
      mm/vmalloc: add vmalloc_no_huge
      nilfs2: fix memory leak in nilfs_sysfs_delete_device_group
      mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk()
      mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes
      mm: page_vma_mapped_walk(): get vma_address_end() earlier
      mm: page_vma_mapped_walk(): use goto instead of while (1)
      mm: page_vma_mapped_walk(): add a level of indentation
      mm: page_vma_mapped_walk(): crossing page table boundary
      ...
    torvalds committed Jun 25, 2021
    7ce32ac
  3. userfaultfd: uapi: fix UFFDIO_CONTINUE ioctl request definition

    This ioctl request reads from uffdio_continue structure written by
    userspace which justifies _IOC_WRITE flag.  It also writes back to that
    structure which justifies _IOC_READ flag.
    
    See NOTEs in include/uapi/asm-generic/ioctl.h for more information.
    
    Fixes: f619147 ("userfaultfd: add UFFDIO_CONTINUE ioctl")
    Signed-off-by: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
    Acked-by: Peter Xu <peterx@redhat.com>
    Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
    Reviewed-by: Dmitry V. Levin <ldv@altlinux.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    glebfm authored and torvalds committed Jun 25, 2021
    808e9df
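
The direction flags discussed in the UFFDIO_CONTINUE message can be inspected with the standard ioctl encoding macros. The sketch below assumes a Linux host and uses a made-up request: `struct toy_args`, `TOY_IOC_MAGIC`, and both `TOY_` request numbers are hypothetical and are not the userfaultfd definitions.

```c
#include <assert.h>
#include <linux/ioctl.h>

/* Hypothetical argument struct and ioctl numbers, for illustration only. */
struct toy_args {
    unsigned long long start;
    unsigned long long len;
    unsigned long long mode;
    long long result;       /* filled in by the kernel side */
};

#define TOY_IOC_MAGIC  'T'
#define TOY_READ_ONLY  _IOR(TOY_IOC_MAGIC, 0x07, struct toy_args)
#define TOY_READ_WRITE _IOWR(TOY_IOC_MAGIC, 0x07, struct toy_args)

/* The size field of the request encodes sizeof(struct toy_args). */
static_assert(_IOC_SIZE(TOY_READ_WRITE) == sizeof(struct toy_args),
              "size is encoded in the request number");

/* _IOC_WRITE: userspace writes the struct, so the kernel reads it. */
static int toy_kernel_reads_arg(unsigned int cmd)
{
    return (_IOC_DIR(cmd) & _IOC_WRITE) != 0;
}

/* _IOC_READ: userspace reads the struct, so the kernel writes it back. */
static int toy_kernel_writes_arg(unsigned int cmd)
{
    return (_IOC_DIR(cmd) & _IOC_READ) != 0;
}
```

A request that both consumes input from the structure and writes results back needs both bits, which is what `_IOWR` encodes; `_IOR` alone sets only `_IOC_READ`.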
  4. Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/…

    …kernel/git/wsa/linux
    
    Pull i2c fixes from Wolfram Sang:
     "Three more driver bugfixes and an annotation fix for the core"
    
    * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
      i2c: robotfuzz-osif: fix control-request directions
      i2c: dev: Add __user annotation
      i2c: cp2615: check for allocation failure in cp2615_i2c_recv()
      i2c: i801: Ensure that SMBHSTSTS_INUSE_STS is cleared when leaving i801_access
    torvalds committed Jun 25, 2021
    55fcd44
  5. Merge tag 'devprop-5.13-rc8' of git://git.kernel.org/pub/scm/linux/ke…

    …rnel/git/rafael/linux-pm
    
    Pull device properties framework fix from Rafael Wysocki:
     "Fix a NULL pointer dereference introduced by a recent commit and
      occurring when device_remove_software_node() is used with a device
      that has never been registered (Heikki Krogerus)"
    
    * tag 'devprop-5.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
      software node: Handle software node injection to an existing device properly
    torvalds committed Jun 25, 2021
    7764c62
  6. Merge tag 'for-linus-5.13b-rc8-tag' of git://git.kernel.org/pub/scm/l…

    …inux/kernel/git/xen/tip
    
    Pull xen fix from Juergen Gross:
     "A fix for a regression introduced in 5.12: when migrating an irq
      related to a Xen user event to another cpu, a race might result
      in a WARN() triggering"
    
    * tag 'for-linus-5.13b-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
      xen/events: reset active flag for lateeoi events later
    torvalds committed Jun 25, 2021
    b960e01
  7. Merge tag 'for-linus-urgent' of git://git.kernel.org/pub/scm/virt/kvm…

    …/kvm
    
    Pull kvm fixes from Paolo Bonzini:
     "A selftests fix for ARM, and the fix for page reference count
      underflow. This is a very small fix that was provided by Nick Piggin
      and tested by myself"
    
    * tag 'for-linus-urgent' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
      KVM: do not allow mapping valid but non-reference-counted pages
      KVM: selftests: Fix mapping length truncation in m{,un}map()
    torvalds committed Jun 25, 2021
    616a99d
  8. Merge tag 'x86_urgent_for_v5.13' of git://git.kernel.org/pub/scm/linu…

    …x/kernel/git/tip/tip
    
    Pull x86 fixes from Borislav Petkov:
     "Two more urgent FPU fixes:
    
       - prevent unprivileged userspace from reinitializing supervisor
         states
    
       - prepare init_fpstate, which is the buffer used when initializing
         FPU state, properly in case the skip-writing-state-components
         XSAVE* variants are used"
    
    * tag 'x86_urgent_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      x86/fpu: Make init_fpstate correct with optimized XSAVE
      x86/fpu: Preserve supervisor states in sanitize_restored_user_xstate()
    torvalds committed Jun 25, 2021
    94ca94b
  9. Merge tag 'ceph-for-5.13-rc8' of https://github.com/ceph/ceph-client

    Pull ceph fixes from Ilya Dryomov:
     "Two regression fixes from the merge window: one in the auth code
      affecting old clusters and one in the filesystem for proper
      propagation of MDS request errors.
    
      Also included a locking fix for async creates, marked for stable"
    
    * tag 'ceph-for-5.13-rc8' of https://github.com/ceph/ceph-client:
      libceph: set global_id as soon as we get an auth ticket
      libceph: don't pass result into ac->ops->handle_reply()
      ceph: fix error handling in ceph_atomic_open and ceph_lookup
      ceph: must hold snap_rwsem when filling inode for async create
    torvalds committed Jun 25, 2021
    edf54d9
  10. Merge tag 'netfs-fixes-20210621' of git://git.kernel.org/pub/scm/linu…

    …x/kernel/git/dhowells/linux-fs
    
    Pull netfs fixes from David Howells:
     "This contains patches to fix netfs_write_begin() and afs_write_end()
      in the following ways:
    
      (1) In netfs_write_begin(), extract the decision about whether to skip
          a page out to its own helper and have that clear around the region
          to be written, but not clear that region. This requires the
          filesystem to patch it up afterwards if the hole doesn't get
          completely filled.
    
      (2) Use offset_in_thp() in (1) rather than manually calculating the
          offset into the page.
    
      (3) Due to (1), afs_write_end() now needs to handle short data write
          into the page by generic_perform_write(). I've adopted an
          analogous approach to ceph of just returning 0 in this case and
          letting the caller go round again.
    
      It also adds a note that (in the future) the len parameter may extend
      beyond the page allocated. This is because the page allocation is
      deferred to write_begin() and that gets to decide what size of THP to
      allocate."
    
    Jeff Layton points out:
     "The netfs fix in particular fixes a data corruption bug in cephfs"
    
    * tag 'netfs-fixes-20210621' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
      netfs: fix test for whether we can skip read when writing beyond EOF
      afs: Fix afs_write_end() to handle short writes
    torvalds committed Jun 25, 2021
    9e736cf
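
Point (1) of the netfs pull message, clearing around the region to be written but not the region itself, can be sketched on a plain byte buffer. This is an illustration only: `toy_zero_around()` is a hypothetical helper operating on ordinary memory, not the netfs code or a kernel page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Zero everything in a page-sized buffer except [offset, offset + len),
 * which the caller is about to overwrite with new data. */
static void toy_zero_around(unsigned char *page, size_t page_size,
                            size_t offset, size_t len)
{
    assert(page != NULL);
    assert(offset <= page_size && len <= page_size - offset);
    memset(page, 0, offset);                                   /* head */
    memset(page + offset + len, 0, page_size - offset - len);  /* tail */
}
```

The bytes inside the hole keep whatever they held before, so if the subsequent write turns out to be short, the caller has to patch up the unfilled part afterwards, as the pull message notes for the filesystem.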
  11. Merge tag 'gpio-fixes-for-v5.13' of git://git.kernel.org/pub/scm/linu…

    …x/kernel/git/brgl/linux
    
    Pull gpio fixes from Bartosz Golaszewski:
    
     - fix wake-up interrupt support on gpio-mxc
    
     - zero the padding bytes in a structure passed to user-space in the
       GPIO character device
    
     - require HAS_IOPORT_MAP in two drivers that need it to fix a Kbuild
       issue
    
    * tag 'gpio-fixes-for-v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
      gpio: AMD8111 and TQMX86 require HAS_IOPORT_MAP
      gpiolib: cdev: zero padding during conversion to gpioline_info_changed
      gpio: mxc: Fix disabled interrupt wake-up support
    torvalds committed Jun 25, 2021
    c13e302
  12. Merge tag 'sound-5.13-rc8' of git://git.kernel.org/pub/scm/linux/kern…

    …el/git/tiwai/sound
    
    Pull sound fixes from Takashi Iwai:
     "Two small changes have been cherry-picked as a last material for 5.13:
      a coverage after UMN revert action and a stale MAINTAINERS entry fix"
    
    * tag 'sound-5.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
      MAINTAINERS: remove Timur Tabi from Freescale SOC sound drivers
      ASoC: rt5645: Avoid upgrading static warnings to errors
    torvalds committed Jun 25, 2021
    e41fc7c
  13. gpio: AMD8111 and TQMX86 require HAS_IOPORT_MAP

    Both of these drivers use ioport_map(), so they need to
    depend on HAS_IOPORT_MAP. Otherwise, they cannot be built
    even with COMPILE_TEST on architectures without an ioport
    implementation, such as ARCH=um.
    
    Reported-by: kernel test robot <lkp@intel.com>
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
    jmberg-intel authored and brgl committed Jun 25, 2021
    c6414e1
  14. mailmap: add Marek's other e-mail address and identity without diacri…

    …tics
    
    Some of my commits were sent with identities
      Marek Behun <marek.behun@nic.cz>
      Marek Behún <marek.behun@nic.cz>
    while the correct one is
      Marek Behún <kabel@kernel.org>
    
    Put this into mailmap so that git shortlog prints all my commits under
    one identity.
    
    Link: https://lkml.kernel.org/r/20210616113624.19351-2-kabel@kernel.org
    Signed-off-by: Marek Behún <kabel@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    elkablo authored and torvalds committed Jun 25, 2021
    72a461a
  15. MAINTAINERS: fix Marek's identity again

    Fix my name to use diacritics, since MAINTAINERS supports it.
    
    Fix my e-mail address in MAINTAINERS' marvell10g PHY driver description,
    I accidentally put my other e-mail address here.
    
    Link: https://lkml.kernel.org/r/20210616113624.19351-1-kabel@kernel.org
    Signed-off-by: Marek Behún <kabel@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    elkablo authored and torvalds committed Jun 25, 2021
    ee924d3
  16. mm/page_alloc: do bulk array bounds check after checking populated el…

    …ements
    
    Dan Carpenter reported the following
    
      The patch 0f87d9d: "mm/page_alloc: add an array-based interface
      to the bulk page allocator" from Apr 29, 2021, leads to the following
      static checker warning:
    
            mm/page_alloc.c:5338 __alloc_pages_bulk()
            warn: potentially one past the end of array 'page_array[nr_populated]'
    
    The problem can occur if an array is passed in that is fully populated.
    That potentially ends up allocating a single page and storing it past
    the end of the array.  This patch returns 0 if the array is fully
    populated.
    
    Link: https://lkml.kernel.org/r/20210618125102.GU30378@techsingularity.net
    Fixes: 0f87d9d ("mm/page_alloc: add an array-based interface to the bulk page allocator")
    Signed-off-by: Mel Gorman <mgorman@techsinguliarity.net>
    Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
    Cc: Jesper Dangaard Brouer <brouer@redhat.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    gormanm authored and torvalds committed Jun 25, 2021
    b3b64eb
  17. mm/page_alloc: __alloc_pages_bulk(): do bounds check before accessing…

    … array
    
    In the event that somebody would call this with an already fully
    populated page_array, the last loop iteration would do an access beyond
    the end of page_array.
    
    It's of course extremely unlikely that would ever be done, but this
    triggers my internal static analyzer.  Also, if it really is not
    supposed to be invoked this way (i.e., with no NULL entries in
    page_array), the nr_populated<nr_pages check could simply be removed
    instead.
    
    Link: https://lkml.kernel.org/r/20210507064504.1712559-1-linux@rasmusvillemoes.dk
    Fixes: 0f87d9d ("mm/page_alloc: add an array-based interface to the bulk page allocator")
    Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
    Acked-by: Mel Gorman <mgorman@techsingularity.net>
    Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Villemoes authored and torvalds committed Jun 25, 2021
    b08e50d
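
The out-of-bounds access described in the two __alloc_pages_bulk() messages comes down to the order of two tests in a loop condition. The sketch below shows the safe order on an ordinary pointer array. `toy_count_populated()` is a hypothetical helper, not the kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Count the leading non-NULL slots of page_array. The index is compared
 * against nr_pages before the element is read, so a fully populated
 * array is never accessed one past its end. */
static size_t toy_count_populated(void *const *page_array, size_t nr_pages)
{
    size_t nr_populated = 0;

    assert(page_array != NULL || nr_pages == 0);
    while (nr_populated < nr_pages && page_array[nr_populated])
        nr_populated++;
    return nr_populated;
}
```

With the tests in the opposite order, `page_array[nr_populated]` would be evaluated with `nr_populated == nr_pages` whenever every slot is already filled. A caller can also compare the returned count with `nr_pages` and stop early when there is nothing left to allocate.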
  18. mm/hwpoison: do not lock page again when me_huge_page() successfully …

    …recovers
    
    Currently me_huge_page() temporarily unlocks the page to perform some
    actions, then locks it again later.  My testcase (which calls
    hard-offline on some tail page in a hugetlb, then accesses the address
    of the hugetlb range) showed that the page allocation code detects this
    page lock on a buddy page and prints a "BUG: Bad page state" message.
    
    check_new_page_bad() does not consider a page with __PG_HWPOISON a bad
    page, so this flag works as a kind of filter, but this filtering doesn't
    work in this case because the "bad page" is not the actual hwpoisoned
    page.  So stop locking the page again.  The actions to be taken depend
    on the page type of the error, so page unlocking should be done in the
    ->action() callbacks.  Make that the assumption and change all existing
    callbacks accordingly.
    
    Link: https://lkml.kernel.org/r/20210609072029.74645-1-nao.horiguchi@gmail.com
    Fixes: commit 78bb920 ("mm: hwpoison: dissolve in-use hugepage in unrecoverable memory error")
    Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    nhoriguchi authored and torvalds committed Jun 25, 2021
    ea6d063
  19. mm,hwpoison: return -EHWPOISON to denote that the page has already be…

    …en poisoned
    
    When memory_failure() is called with MF_ACTION_REQUIRED on a page that
    has already been hwpoisoned, memory_failure() could fail to send SIGBUS
    to the affected process, which results in an infinite loop of MCEs.
    
    Currently memory_failure() returns 0 if it is called for an already
    hwpoisoned page, and the caller, kill_me_maybe(), could then return
    without sending SIGBUS to the current process.  An action-required MCE
    is raised when the current process accesses the broken memory, so no
    SIGBUS means that the current process continues to run and soon
    accesses the error page again, running into an MCE loop.
    
    This issue can arise, for example, in the following scenarios:
    
     - Two or more threads access the poisoned page concurrently. If
       local MCE is enabled, the MCE handler independently handles the MCE
       events. So there's a race among MCE events, and the second or later
       threads fall into the situation in question.
    
     - If there was a preceding memory error event and memory_failure() for
       the event failed to unmap the error page for some reason, the
       subsequent memory access to the error page triggers the MCE loop
       situation.
    
    To fix the issue, make memory_failure() return an error code when the
    error page has already been hwpoisoned.  This allows the memory error
    handler to control how it sends signals to userspace.  Also make sure
    that any process touching a hwpoisoned page gets a SIGBUS even in the
    "already hwpoisoned" path of memory_failure(), as is done in the page
    fault path.
    
    Link: https://lkml.kernel.org/r/20210521030156.2612074-3-nao.horiguchi@gmail.com
    Signed-off-by: Aili Yao <yaoaili@kingsoft.com>
    Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Reviewed-by: Oscar Salvador <osalvador@suse.de>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Borislav Petkov <bp@suse.de>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Jue Wang <juew@google.com>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Aili Yao authored and torvalds committed Jun 25, 2021
    47af12b
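
The return-code change in the -EHWPOISON message lets a caller tell "handled just now" apart from "was already poisoned". The sketch below models that distinction in userspace. `struct toy_page`, `toy_mf_poison()`, and `toy_classify()` are hypothetical, and the fallback definition of `EHWPOISON` is an assumption for hosts whose `<errno.h>` does not provide it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef EHWPOISON
#define EHWPOISON 133   /* assumed value for hosts that do not define it */
#endif

struct toy_page { bool hwpoison; };

/* Toy handler: returns 0 the first time a page is poisoned and
 * -EHWPOISON when the page had already been marked. */
static int toy_mf_poison(struct toy_page *p)
{
    assert(p != NULL);
    if (p->hwpoison)
        return -EHWPOISON;
    p->hwpoison = true;
    return 0;
}

enum toy_outcome { TOY_NEWLY_HANDLED, TOY_ALREADY_POISONED, TOY_OTHER_ERROR };

/* Toy caller-side classification: with a distinct error code the caller
 * can choose its own signalling policy for the already-poisoned case. */
static enum toy_outcome toy_classify(int rc)
{
    if (rc == 0)
        return TOY_NEWLY_HANDLED;
    if (rc == -EHWPOISON)
        return TOY_ALREADY_POISONED;
    return TOY_OTHER_ERROR;
}
```

If both cases returned 0, as the message says was previously the case, the caller could not distinguish them and might return without signalling the task.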
  20. mm/memory-failure: use a mutex to avoid memory_failure() races

    Patch series "mm,hwpoison: fix sending SIGBUS for Action Required MCE", v5.
    
    I wrote this patchset to materialize what I think is the currently
    acceptable solution mentioned in the previous discussion [1].  I simply
    borrowed Tony's mutex patch and Aili's return code patch, then I queued
    another one to find the error virtual address in a best-effort manner.
    I know that this is not a perfect solution, but it should work for some
    typical cases.
    
    [1]: https://lore.kernel.org/linux-mm/20210331192540.2141052f@alex-virtual-machine/
    
    This patch (of 2):
    
    There can be races when multiple CPUs consume poison from the same page.
    The first into memory_failure() atomically sets the HWPoison page flag
    and begins hunting for tasks that map this page.  Eventually it
    invalidates those mappings and may send a SIGBUS to the affected tasks.
    
    But while all that work is going on, other CPUs see a "success" return
    code from memory_failure() and so they believe the error has been
    handled and continue executing.
    
    Fix by wrapping most of the internal parts of memory_failure() in a
    mutex.
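
    A minimal user-space sketch of the serialization, using POSIX threads
    in place of CPUs.  All names here (model_mf_mutex,
    model_memory_failure(), model_run_cpus()) are hypothetical, and the
    long-running unmap-and-signal work is reduced to setting a flag.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_CPUS 16

/* Models the poisoned page shared by all "CPUs". */
struct model_page {
    bool hwpoison;   /* HWPoison flag */
    bool handled;    /* mappings invalidated, signals sent */
};

static pthread_mutex_t model_mf_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct model_page model_shared_page;

/*
 * The whole body runs under the mutex, so a second caller cannot see
 * the page as poisoned until the first caller has finished handling it.
 */
int model_memory_failure(struct model_page *page)
{
    int ret = 0;

    pthread_mutex_lock(&model_mf_mutex);
    if (page->hwpoison) {
        ret = -1;               /* already poisoned and fully handled */
    } else {
        page->hwpoison = true;
        page->handled = true;   /* stands in for unmap + SIGBUS work */
    }
    pthread_mutex_unlock(&model_mf_mutex);
    return ret;
}

static void *model_cpu(void *arg)
{
    int *result = arg;

    *result = model_memory_failure(&model_shared_page);
    return NULL;
}

/*
 * Runs ncpus concurrent callers against one fresh page.  Returns how
 * many of them performed the handling, or -1 on a setup error.
 */
int model_run_cpus(int ncpus)
{
    pthread_t threads[MODEL_MAX_CPUS];
    int results[MODEL_MAX_CPUS];
    int created, i, handlers = 0;

    if (ncpus < 1 || ncpus > MODEL_MAX_CPUS)
        return -1;

    model_shared_page.hwpoison = false;
    model_shared_page.handled = false;

    for (created = 0; created < ncpus; created++)
        if (pthread_create(&threads[created], NULL, model_cpu,
                           &results[created]) != 0)
            break;

    for (i = 0; i < created; i++) {
        pthread_join(threads[i], NULL);
        if (results[i] == 0)
            handlers++;
    }
    return created == ncpus ? handlers : -1;
}
```

    However many callers race, exactly one of them does the handling, and
    the others return only after that handling is complete.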
    
    [akpm@linux-foundation.org: make mf_mutex local to memory_failure()]
    
    Link: https://lkml.kernel.org/r/20210521030156.2612074-1-nao.horiguchi@gmail.com
    Link: https://lkml.kernel.org/r/20210521030156.2612074-2-nao.horiguchi@gmail.com
    Signed-off-by: Tony Luck <tony.luck@intel.com>
    Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Reviewed-by: Borislav Petkov <bp@suse.de>
    Reviewed-by: Oscar Salvador <osalvador@suse.de>
    Cc: Aili Yao <yaoaili@kingsoft.com>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Jue Wang <juew@google.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    aegl authored and torvalds committed Jun 25, 2021
    171936d
  21. mm, futex: fix shared futex pgoff on shmem huge page

    If more than one futex is placed on a shmem huge page, it can happen
    that waking the second wakes the first instead, and leaves the second
    waiting: the key's shared.pgoff is wrong.
    
    When 3.11 commit 13d60f4 ("futex: Take hugepages into account when
    generating futex_key") was added, the only shared huge pages came from
    hugetlbfs, and the code added to deal with its exceptional page->index
    was put into hugetlb source.  Then that was missed when 4.8 added shmem
    huge pages.
    
    page_to_pgoff() is what others use for this nowadays: except that, as
    currently written, it gives the right answer on hugetlbfs head, but
    nonsense on hugetlbfs tails.  Fix that by calling hugetlbfs-specific
    hugetlb_basepage_index() on PageHuge tails as well as on head.
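
    The offset calculation can be illustrated with a simplified model.
    This is only a sketch: struct model_page and model_page_to_pgoff() are
    hypothetical, and the model assumes that a hugetlbfs head stores its
    index in units of huge pages while a shmem huge page head stores it in
    units of base pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Models one base page inside a compound (huge) page. */
struct model_page {
    bool is_hugetlb;           /* models PageHuge() */
    unsigned int order;        /* compound order of the huge page */
    unsigned long head_index;  /* index stored on the head page */
    unsigned long subpage;     /* position in the huge page, 0 = head */
};

/*
 * Returns the file offset of this base page in base-page units, for
 * head and tail pages alike.
 */
unsigned long model_page_to_pgoff(const struct model_page *page)
{
    assert(page != NULL);
    if (page->is_hugetlb)
        /* Assumed: head index counts huge pages, so scale it first. */
        return (page->head_index << page->order) + page->subpage;
    /* Assumed: head index already counts base pages. */
    return page->head_index + page->subpage;
}
```

    Two futexes on different subpages of one huge page then get different
    offsets, so waking one cannot be confused with waking the other.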
    
    Yes, it's unconventional to declare hugetlb_basepage_index() there in
    pagemap.h, rather than in hugetlb.h; but I do not expect anything but
    page_to_pgoff() ever to need it.
    
    [akpm@linux-foundation.org: give hugetlb_basepage_index() prototype the correct scope]
    
    Link: https://lkml.kernel.org/r/b17d946b-d09-326e-b42a-52884c36df32@google.com
    Fixes: 800d8c6 ("shmem: add huge pages support")
    Reported-by: Neel Natu <neelnatu@google.com>
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Acked-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
    Cc: Zhang Yi <wetpzy@gmail.com>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Darren Hart <dvhart@infradead.org>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Hugh Dickins authored and torvalds committed Jun 25, 2021
    fe19bd3
  22. kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync()
    
    The system might hang with the following backtrace:
    
    	schedule+0x80/0x100
    	schedule_timeout+0x48/0x138
    	wait_for_common+0xa4/0x134
    	wait_for_completion+0x1c/0x2c
    	kthread_flush_work+0x114/0x1cc
    	kthread_cancel_work_sync.llvm.16514401384283632983+0xe8/0x144
    	kthread_cancel_delayed_work_sync+0x18/0x2c
    	xxxx_pm_notify+0xb0/0xd8
    	blocking_notifier_call_chain_robust+0x80/0x194
    	pm_notifier_call_chain_robust+0x28/0x4c
    	suspend_prepare+0x40/0x260
    	enter_state+0x80/0x3f4
    	pm_suspend+0x60/0xdc
    	state_store+0x108/0x144
    	kobj_attr_store+0x38/0x88
    	sysfs_kf_write+0x64/0xc0
    	kernfs_fop_write_iter+0x108/0x1d0
    	vfs_write+0x2f4/0x368
    	ksys_write+0x7c/0xec
    
    It is caused by the following race between kthread_mod_delayed_work()
    and kthread_cancel_delayed_work_sync():
    
    CPU0				CPU1
    
    Context: Thread A		Context: Thread B
    
    kthread_mod_delayed_work()
      spin_lock()
      __kthread_cancel_work()
         spin_unlock()
         del_timer_sync()
    				kthread_cancel_delayed_work_sync()
    				  spin_lock()
    				  __kthread_cancel_work()
    				    spin_unlock()
    				    del_timer_sync()
    				    spin_lock()
    
    				  work->canceling++
    				  spin_unlock
         spin_lock()
       queue_delayed_work()
         // dwork is put into the worker->delayed_work_list
    
       spin_unlock()
    
    				  kthread_flush_work()
         // flush_work is put at the tail of the dwork
    
    				    wait_for_completion()
    
    Context: IRQ
    
      kthread_delayed_work_timer_fn()
        spin_lock()
        list_del_init(&work->node);
        spin_unlock()
    
    BANG: flush_work is no longer linked and will never get processed.
    
    The problem is that kthread_mod_delayed_work() checks the
    work->canceling flag before canceling the timer.
    
    A simple solution is to (re)check work->canceling after
    __kthread_cancel_work().  But then it is not clear what should be
    returned when __kthread_cancel_work() removed the work from the queue
    (list) and it can't queue it again with the new @delay.
    
    The return value might be used for reference counting.  The caller has
    to know whether a new work has been queued or an existing one was
    replaced.
    
    The proper solution is that kthread_mod_delayed_work() will remove the
    work from the queue (list) _only_ when work->canceling is not set.  The
    flag must be checked after the timer is stopped and the remaining
    operations can be done under worker->lock.
    
    Note that kthread_mod_delayed_work() could remove the timer and then
    bail out.  It is fine.  The other canceling caller needs to cancel the
    timer as well.  The important thing is that the queue (list)
    manipulation is done atomically under worker->lock.
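
    The ordering described above can be sketched as a single-threaded
    state model.  It is not the kernel implementation: struct model_dwork,
    enum model_result and model_mod_delayed_work() are hypothetical, and
    the whole function stands for the section that runs under
    worker->lock after the timer has been stopped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What the model did with the work (illustrative, not a kernel type). */
enum model_result {
    MODEL_BAILED_OUT,  /* another caller is canceling; list untouched */
    MODEL_REQUEUED     /* work removed from the list and re-armed */
};

/* Models the state of one delayed work item. */
struct model_dwork {
    int canceling;     /* > 0 while a cancel-sync caller is active */
    bool timer_armed;  /* delay timer is pending */
    bool on_list;      /* work is linked in the worker's list */
};

enum model_result model_mod_delayed_work(struct model_dwork *work)
{
    assert(work != NULL);

    /* Step 1: stop the timer.  This is harmless even if we bail out. */
    work->timer_armed = false;

    /* Step 2: check the flag only after the timer has been stopped. */
    if (work->canceling > 0)
        return MODEL_BAILED_OUT;

    /* Step 3: only now touch the list and queue with the new delay. */
    work->on_list = false;
    work->timer_armed = true;
    return MODEL_REQUEUED;
}
```

    When a cancel is in flight, the model leaves the list alone and does
    not re-arm the timer, so nothing can later unlink a flush work queued
    behind the item.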
    
    Link: https://lkml.kernel.org/r/20210610133051.15337-3-pmladek@suse.com
    Fixes: 9a6b06c ("kthread: allow to modify delayed kthread work")
    Signed-off-by: Petr Mladek <pmladek@suse.com>
    Reported-by: Martin Liu <liumartin@google.com>
    Cc: <jenhaochen@google.com>
    Cc: Minchan Kim <minchan@google.com>
    Cc: Nathan Chancellor <nathan@kernel.org>
    Cc: Nick Desaulniers <ndesaulniers@google.com>
    Cc: Oleg Nesterov <oleg@redhat.com>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    pmladek authored and torvalds committed Jun 25, 2021
    5fa5434