Tony-Huang/Add…

Commits on Nov 17, 2021

  1. misc: Add iop driver for Sunplus SP7021

    Add iop driver for Sunplus SP7021
    
    Signed-off-by: Tony Huang <tony.huang@sunplus.com>
    Tony Huang authored and intel-lab-lkp committed Nov 17, 2021
  2. dt-binding: misc: Add iop yaml file for Sunplus SP7021

    Add iop yaml file for Sunplus SP7021
    
    Signed-off-by: Tony Huang <tony.huang@sunplus.com>
    Tony Huang authored and intel-lab-lkp committed Nov 17, 2021

Commits on Nov 6, 2021

  1. Merge tag '5.16-rc-part1-smb3-client-fixes' of git://git.samba.org/sf…

    …rench/cifs-2.6
    
    Pull cifs updates from Steve French:
    
     - reconnect fix for stable
    
     - minor mount option fix
    
     - debugging improvement for (TCP) connection issues
    
     - refactoring of common code to help ksmbd
    
    * tag '5.16-rc-part1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
      smb3: add dynamic trace points for socket connection
      cifs: Move SMB2_Create definitions to the shared area
      cifs: Move more definitions into the shared area
      cifs: move NEGOTIATE_PROTOCOL definitions out into the common area
      cifs: Create a new shared file holding smb2 pdu definitions
      cifs: add mount parameter tcpnodelay
      cifs: To match file servers, make sure the server hostname matches
    torvalds committed Nov 6, 2021
  2. Merge tag 'fsnotify_for_v5.16-rc1' of git://git.kernel.org/pub/scm/li…

    …nux/kernel/git/jack/linux-fs
    
    Pull fsnotify updates from Jan Kara:
     "Support for reporting filesystem errors through fanotify so that
      system health monitoring daemons can watch for these and act instead
      of scraping system logs"
    
    * tag 'fsnotify_for_v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (34 commits)
      samples: remove duplicate include in fs-monitor.c
      samples: Fix warning in fsnotify sample
      docs: Fix formatting of literal sections in fanotify docs
      samples: Make fs-monitor depend on libc and headers
      docs: Document the FAN_FS_ERROR event
      samples: Add fs error monitoring example
      ext4: Send notifications on error
      fanotify: Allow users to request FAN_FS_ERROR events
      fanotify: Emit generic error info for error event
      fanotify: Report fid info for file related file system errors
      fanotify: WARN_ON against too large file handles
      fanotify: Add helpers to decide whether to report FID/DFID
      fanotify: Wrap object_fh inline space in a creator macro
      fanotify: Support merging of error events
      fanotify: Support enqueueing of error events
      fanotify: Pre-allocate pool of error events
      fanotify: Reserve UAPI bits for FAN_FS_ERROR
      fsnotify: Support FS_ERROR event type
      fanotify: Require fid_mode for any non-fd event
      fanotify: Encode empty file handle when no inode is provided
      ...
    torvalds committed Nov 6, 2021
  3. Merge tag 'fs_for_v5.16-rc1' of git://git.kernel.org/pub/scm/linux/ke…

    …rnel/git/jack/linux-fs
    
    Pull quota, isofs, and reiserfs updates from Jan Kara:
     "Fixes for handling of corrupted quota files, fix for handling of
      corrupted isofs filesystem, and a small cleanup for reiserfs"
    
    * tag 'fs_for_v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
      fs: reiserfs: remove useless new_opts in reiserfs_remount
      isofs: Fix out of bound access for corrupted isofs image
      quota: correct error number in free_dqentry()
      quota: check block number when reading the block in quota file
    torvalds committed Nov 6, 2021
  4. Merge tag 'xtensa-20211105' of git://github.com/jcmvbkbc/linux-xtensa

    Pull xtensa updates from Max Filippov:
    
     - add support for xtensa cores without windowed registers option
    
    * tag 'xtensa-20211105' of git://github.com/jcmvbkbc/linux-xtensa:
      xtensa: move section symbols to asm/sections.h
      xtensa: remove unused variable wmask
      xtensa: only build windowed register support code when needed
      xtensa: use register window specific opcodes only when present
      xtensa: implement call0 ABI support in assembly
      xtensa: definitions for call0 ABI
      xtensa: don't use a12 in __xtensa_copy_user in call0 ABI
      xtensa: don't use a12 in strncpy_user
      xtensa: use a14 instead of a15 in inline assembly
      xtensa: move _SimulateUserKernelVectorException out of WindowVectors
    torvalds committed Nov 6, 2021
  5. Merge tag 's390-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/…

    …git/s390/linux
    
    Pull s390 updates from Vasily Gorbik:
    
     - Add support for ftrace with direct call and ftrace direct call
       samples.
    
     - Add support for kernel command lines longer than current 896 bytes
       and make its length configurable.
    
     - Add support for BEAR enhancement facility to improve last breaking
       event instruction tracking.
    
     - Add kprobes sanity checks and testcases to prevent kprobes in the
       middle of an instruction.
    
     - Allow concurrent access to /dev/hwc for the CPUMF users.
    
     - Various ftrace / jump label improvements.
    
     - Convert unwinder tests to KUnit.
    
     - Add s390_iommu_aperture kernel parameter to tweak the limits on
       concurrently usable DMA mappings.
    
     - Add ap.useirq AP module option which can be used to disable interrupt
       use.
    
     - Add add_disk() error handling support to block device drivers.
    
     - Drop arch specific and use generic implementation of strlcpy and
       strrchr.
    
     - Several __pa/__va usages fixes.
    
     - Various cio, crypto, pci, kernel doc and other small fixes and
       improvements all over the code.
    
    [ Merge fixup as per https://lore.kernel.org/all/YXAqZ%2FEszRisunQw@osiris/ ]
    
    * tag 's390-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (63 commits)
      s390: make command line configurable
      s390: support command lines longer than 896 bytes
      s390/kexec_file: move kernel image size check
      s390/pci: add s390_iommu_aperture kernel parameter
      s390/spinlock: remove incorrect kernel doc indicator
      s390/string: use generic strlcpy
      s390/string: use generic strrchr
      s390/ap: function rework based on compiler warning
      s390/cio: make ccw_device_dma_* more robust
      s390/vfio-ap: s390/crypto: fix all kernel-doc warnings
      s390/hmcdrv: fix kernel doc comments
      s390/ap: new module option ap.useirq
      s390/cpumf: Allow multiple processes to access /dev/hwc
      s390/bitops: return true/false (not 1/0) from bool functions
      s390: add support for BEAR enhancement facility
      s390: introduce nospec_uses_trampoline()
      s390: rename last_break to pgm_last_break
      s390/ptrace: add last_break member to pt_regs
      s390/sclp: sort out physical vs virtual pointers usage
      s390/setup: convert start and end initrd pointers to virtual
      ...
    torvalds committed Nov 6, 2021
  6. Merge tag 'pci-v5.16-changes' of git://git.kernel.org/pub/scm/linux/k…

    …ernel/git/helgaas/pci
    
    Pull pci updates from Bjorn Helgaas:
     "Enumeration:
       - Conserve IRQs by setting up portdrv IRQs only when there are users
         (Jan Kiszka)
       - Rework and simplify _OSC negotiation for control of PCIe features
         (Joerg Roedel)
       - Remove struct pci_dev.driver pointer since it's redundant with the
         struct device.driver pointer (Uwe Kleine-König)
    
      Resource management:
       - Coalesce contiguous host bridge apertures from _CRS to accommodate
         BARs that cover more than one aperture (Kai-Heng Feng)
    
      Sysfs:
       - Check CAP_SYS_ADMIN before parsing user input (Krzysztof
         Wilczyński)
       - Return -EINVAL consistently from "store" functions (Krzysztof
         Wilczyński)
       - Use sysfs_emit() in endpoint "show" functions to avoid buffer
         overruns (Kunihiko Hayashi)
    
      PCIe native device hotplug:
       - Ignore Link Down/Up caused by resets during error recovery so
         endpoint drivers can remain bound to the device (Lukas Wunner)
    
      Virtualization:
       - Avoid bus resets on Atheros QCA6174, where they hang the device
         (Ingmar Klein)
       - Work around Pericom PI7C9X2G switch packet drop erratum by using
         store and forward mode instead of cut-through (Nathan Rossi)
       - Avoid trying to enable AtomicOps on VFs; the PF setting applies to
         all VFs (Selvin Xavier)
    
      MSI:
       - Document that /sys/bus/pci/devices/.../irq contains the legacy INTx
         interrupt or the IRQ of the first MSI (not MSI-X) vector (Barry
         Song)
    
      VPD:
       - Add pci_read_vpd_any() and pci_write_vpd_any() to access anywhere
         in the possible VPD space; use these to simplify the cxgb3 driver
         (Heiner Kallweit)
    
      Peer-to-peer DMA:
       - Add (not subtract) the bus offset when calculating DMA address
         (Wang Lu)
    
      ASPM:
       - Re-enable LTR at Downstream Ports so they don't report Unsupported
         Requests when reset or hot-added devices send LTR messages
         (Mingchuang Qiao)
    
      Apple PCIe controller driver:
       - Add driver for Apple M1 PCIe controller (Alyssa Rosenzweig, Marc
         Zyngier)
    
      Cadence PCIe controller driver:
       - Return success when probe succeeds instead of falling into error
         path (Li Chen)
    
      HiSilicon Kirin PCIe controller driver:
       - Reorganize PHY logic and add support for external PHY drivers
         (Mauro Carvalho Chehab)
       - Support PERST# GPIOs for HiKey970 external PEX 8606 bridge (Mauro
         Carvalho Chehab)
       - Add Kirin 970 support (Mauro Carvalho Chehab)
       - Make driver removable (Mauro Carvalho Chehab)
    
      Intel VMD host bridge driver:
       - If IOMMU supports interrupt remapping, leave VMD MSI-X remapping
         enabled (Adrian Huang)
       - Number each controller so we can tell them apart in
         /proc/interrupts (Chunguang Xu)
       - Avoid building on UML because VMD depends on x86 bare metal APIs
         (Johannes Berg)
    
      Marvell Aardvark PCIe controller driver:
       - Define macros for PCI_EXP_DEVCTL_PAYLOAD_* (Pali Rohár)
       - Set Max Payload Size to 512 bytes per Marvell spec (Pali Rohár)
       - Downgrade PIO Response Status messages to debug level (Marek Behún)
       - Preserve CRS SV (Config Request Retry Software Visibility) bit in
         emulated Root Control register (Pali Rohár)
       - Fix issue in configuring reference clock (Pali Rohár)
       - Don't clear status bits for masked interrupts (Pali Rohár)
       - Don't mask unused interrupts (Pali Rohár)
       - Avoid code repetition in advk_pcie_rd_conf() (Marek Behún)
       - Retry config accesses on CRS response (Pali Rohár)
       - Simplify emulated Root Capabilities initialization (Pali Rohár)
       - Fix several link training issues (Pali Rohár)
       - Fix link-up checking via LTSSM (Pali Rohár)
       - Fix reporting of Data Link Layer Link Active (Pali Rohár)
       - Fix emulation of W1C bits (Marek Behún)
       - Fix MSI domain .alloc() method to return zero on success (Marek
         Behún)
       - Read entire 16-bit MSI vector in MSI handler, not just low 8 bits
         (Marek Behún)
       - Clear Root Port I/O Space, Memory Space, and Bus Master Enable bits
         at startup; PCI core will set those as necessary (Pali Rohár)
       - When operating as a Root Port, set class code to "PCI Bridge"
         instead of the default "Mass Storage Controller" (Pali Rohár)
       - Add emulation for PCI_BRIDGE_CTL_BUS_RESET since aardvark doesn't
         implement this per spec (Pali Rohár)
       - Add emulation of option ROM BAR since aardvark doesn't implement
         this per spec (Pali Rohár)
    
      MediaTek MT7621 PCIe controller driver:
       - Add MediaTek MT7621 PCIe host controller driver and DT binding
         (Sergio Paracuellos)
    
      Qualcomm PCIe controller driver:
       - Add SC8180x compatible string (Bjorn Andersson)
       - Add endpoint controller driver and DT binding (Manivannan
         Sadhasivam)
       - Restructure to use of_device_get_match_data() (Prasad Malisetty)
       - Add SC7280-specific pcie_1_pipe_clk_src handling (Prasad Malisetty)
    
      Renesas R-Car PCIe controller driver:
       - Remove unnecessary includes (Geert Uytterhoeven)
    
      Rockchip DesignWare PCIe controller driver:
       - Add DT binding (Simon Xue)
    
      Socionext UniPhier Pro5 controller driver:
       - Serialize INTx masking/unmasking (Kunihiko Hayashi)
    
      Synopsys DesignWare PCIe controller driver:
       - Run dwc .host_init() method before registering MSI interrupt
         handler so we can deal with pending interrupts left by bootloader
         (Bjorn Andersson)
       - Clean up Kconfig dependencies (Andy Shevchenko)
       - Export symbols to allow more modular drivers (Luca Ceresoli)
    
      TI DRA7xx PCIe controller driver:
       - Allow host and endpoint drivers to be modules (Luca Ceresoli)
       - Enable external clock if present (Luca Ceresoli)
    
      TI J721E PCIe driver:
       - Disable PHY when probe fails after initializing it (Christophe
         JAILLET)
    
      MicroSemi Switchtec management driver:
       - Return error to application when command execution fails because an
         out-of-band reset has cleared the device BARs, Memory Space Enable,
         etc (Kelvin Cao)
       - Fix MRPC error status handling issue (Kelvin Cao)
       - Mask out other bits when reading of management VEP instance ID
         (Kelvin Cao)
       - Return EOPNOTSUPP instead of ENOTSUPP from sysfs show functions
         (Kelvin Cao)
       - Add check of event support (Logan Gunthorpe)
    
      Miscellaneous:
       - Remove unused pci_pool wrappers, which have been replaced by
         dma_pool (Cai Huoqing)
       - Use 'unsigned int' instead of bare 'unsigned' (Krzysztof
         Wilczyński)
       - Use kstrtobool() directly, sans strtobool() wrapper (Krzysztof
         Wilczyński)
       - Fix some sscanf(), sprintf() format mismatches (Krzysztof
         Wilczyński)
       - Update PCI subsystem information in MAINTAINERS (Krzysztof
         Wilczyński)
       - Correct some misspellings (Krzysztof Wilczyński)"
    
    * tag 'pci-v5.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (137 commits)
      PCI: Add ACS quirk for Pericom PI7C9X2G switches
      PCI: apple: Configure RID to SID mapper on device addition
      iommu/dart: Exclude MSI doorbell from PCIe device IOVA range
      PCI: apple: Implement MSI support
      PCI: apple: Add INTx and per-port interrupt support
      PCI: kirin: Allow removing the driver
      PCI: kirin: De-init the dwc driver
      PCI: kirin: Disable clkreq during poweroff sequence
      PCI: kirin: Move the power-off code to a common routine
      PCI: kirin: Add power_off support for Kirin 960 PHY
      PCI: kirin: Allow building it as a module
      PCI: kirin: Add MODULE_* macros
      PCI: kirin: Add Kirin 970 compatible
      PCI: kirin: Support PERST# GPIOs for HiKey970 external PEX 8606 bridge
      PCI: apple: Set up reference clocks when probing
      PCI: apple: Add initial hardware bring-up
      PCI: of: Allow matching of an interrupt-map local to a PCI device
      of/irq: Allow matching of an interrupt-map local to an interrupt controller
      irqdomain: Make of_phandle_args_to_fwspec() generally available
      PCI: Do not enable AtomicOps on VFs
      ...
    torvalds committed Nov 6, 2021
  7. Merge branch 'akpm' (patches from Andrew)

    Merge misc updates from Andrew Morton:
     "257 patches.
    
      Subsystems affected by this patch series: scripts, ocfs2, vfs, and
      mm (slab-generic, slab, slub, kconfig, dax, kasan, debug, pagecache,
      gup, swap, memcg, pagemap, mprotect, mremap, iomap, tracing, vmalloc,
      pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, tools,
      memblock, oom-kill, hugetlbfs, migration, thp, readahead, nommu, ksm,
      vmstat, madvise, memory-hotplug, rmap, zsmalloc, highmem, zram,
      cleanups, kfence, and damon)"
    
    * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (257 commits)
      mm/damon: remove return value from before_terminate callback
      mm/damon: fix a few spelling mistakes in comments and a pr_debug message
      mm/damon: simplify stop mechanism
      Docs/admin-guide/mm/pagemap: wordsmith page flags descriptions
      Docs/admin-guide/mm/damon/start: simplify the content
      Docs/admin-guide/mm/damon/start: fix a wrong link
      Docs/admin-guide/mm/damon/start: fix wrong example commands
      mm/damon/dbgfs: add adaptive_targets list check before enable monitor_on
      mm/damon: remove unnecessary variable initialization
      Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIM
      mm/damon: introduce DAMON-based Reclamation (DAMON_RECLAIM)
      selftests/damon: support watermarks
      mm/damon/dbgfs: support watermarks
      mm/damon/schemes: activate schemes based on a watermarks mechanism
      tools/selftests/damon: update for regions prioritization of schemes
      mm/damon/dbgfs: support prioritization weights
      mm/damon/vaddr,paddr: support pageout prioritization
      mm/damon/schemes: prioritize regions within the quotas
      mm/damon/selftests: support schemes quotas
      mm/damon/dbgfs: support quotas of schemes
      ...
    torvalds committed Nov 6, 2021
  8. mm/damon: remove return value from before_terminate callback

    Since the return value of 'before_terminate' callback is never used, we
    make it have no return value.
    
    Link: https://lkml.kernel.org/r/20211029005023.8895-1-changbin.du@gmail.com
    Signed-off-by: Changbin Du <changbin.du@gmail.com>
    Reviewed-by: SeongJae Park <sj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    changbindu authored and torvalds committed Nov 6, 2021
  9. mm/damon: fix a few spelling mistakes in comments and a pr_debug message

    There are a few spelling mistakes in the code.  Fix these.
    
    Link: https://lkml.kernel.org/r/20211028184157.614544-1-colin.i.king@gmail.com
    Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
    Reviewed-by: SeongJae Park <sj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Colin Ian King authored and torvalds committed Nov 6, 2021
  10. mm/damon: simplify stop mechanism

    A kernel thread can exit gracefully with kthread_stop().  So we don't
    need a new flag 'kdamond_stop'.  And to make sure the task struct is not
    freed when accessing it, get reference to it before termination.
    
    Link: https://lkml.kernel.org/r/20211027130517.4404-1-changbin.du@gmail.com
    Signed-off-by: Changbin Du <changbin.du@gmail.com>
    Reviewed-by: SeongJae Park <sj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    changbindu authored and torvalds committed Nov 6, 2021
  11. Docs/admin-guide/mm/pagemap: wordsmith page flags descriptions

    Some descriptions of page flags in 'pagemap.rst' are written on the
    assumption of a non-rst format, which respects every new line, as below:
    
        7 - SLAB
           page is managed by the SLAB/SLOB/SLUB/SLQB kernel memory allocator
           When compound page is used, SLUB/SLQB will only set this flag on the head
    
    Because rst ignores the new line between the first sentence and the
    second sentence, the resulting html looks a little bit weird, as below.
    
        7 - SLAB
        page is managed by the SLAB/SLOB/SLUB/SLQB kernel memory allocator When
                                                                           ^
        compound page is used, SLUB/SLQB will only set this flag on the head
        page; SLOB will not flag it at all.
    
    This change makes it more natural and consistent with other parts in the
    rendered version.
    
    Link: https://lkml.kernel.org/r/20211022090311.3856-5-sj@kernel.org
    Signed-off-by: SeongJae Park <sj@kernel.org>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Peter Xu <peterx@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    SeongJae Park authored and torvalds committed Nov 6, 2021
  12. Docs/admin-guide/mm/damon/start: simplify the content

    Information in 'TL; DR' section of 'Getting Started' is duplicated in
    other parts of the doc.  It is also asking readers to visit the access
    pattern visualizations gallery web site to show the results of example
    visualization commands, while the users of the commands can use terminal
    output.
    
    To make the doc simple, this removes the duplicated 'TL; DR' section and
    replaces the visualization example commands with versions using terminal
    outputs.
    
    Link: https://lkml.kernel.org/r/20211022090311.3856-4-sj@kernel.org
    Signed-off-by: SeongJae Park <sj@kernel.org>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Peter Xu <peterx@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    SeongJae Park authored and torvalds committed Nov 6, 2021
  13. Docs/admin-guide/mm/damon/start: fix a wrong link

    The 'Getting Started' document of DAMON links to DAMON's user interface
    document where it refers to the detailed usage of its user space tool.
    This fixes the link.
    
    Link: https://lkml.kernel.org/r/20211022090311.3856-3-sj@kernel.org
    Signed-off-by: SeongJae Park <sj@kernel.org>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Peter Xu <peterx@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    SeongJae Park authored and torvalds committed Nov 6, 2021
  14. Docs/admin-guide/mm/damon/start: fix wrong example commands

    Patch series "Fix trivial nits in Documentation/admin-guide/mm".
    
    This patchset fixes trivial nits in admin guide documents for DAMON and
    pagemap.
    
    This patch (of 4):
    
    Some of the example commands in DAMON getting started guide are
    outdated, missing sudo, or just wrong.  This fixes those.
    
    Link: https://lkml.kernel.org/r/20211022090311.3856-2-sj@kernel.org
    Signed-off-by: SeongJae Park <sj@kernel.org>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Peter Xu <peterx@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    SeongJae Park authored and torvalds committed Nov 6, 2021
  15. mm/damon/dbgfs: add adaptive_targets list check before enable monitor_on

    With the ctx->adaptive_targets list empty, I tested the monitor_on
    interface like this:
    
        # cat /sys/kernel/debug/damon/target_ids
        #
        # echo on > /sys/kernel/debug/damon/monitor_on
        # damon: kdamond (5390) starts
    
    Even though the ctx->adaptive_targets list is empty, kthread_run is
    still called and the kdamond.x thread is still created, which is
    meaningless.
    
    So this adds a check in 'dbgfs_monitor_on_write': if the
    ctx->adaptive_targets list is empty, return -EINVAL.
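
    The check described above can be modelled in user space as follows.
    This is a minimal sketch, not the kernel code: the function name
    'monitor_on_write', the 'target_ids' list standing in for
    ctx->adaptive_targets, and the accepted 'on'/'off' keywords are
    assumptions made for illustration.

```python
import errno


def monitor_on_write(command, target_ids):
    """User-space model of the empty-target check (hypothetical helper).

    Returns the number of characters consumed on success, or a negative
    errno value on failure, following the kernel write-handler convention.
    """
    keyword = command.strip()
    if keyword not in ("on", "off"):
        return -errno.EINVAL
    # The added check: refuse to start monitoring when no target is set,
    # so that no monitoring thread is created for nothing.
    if keyword == "on" and not target_ids:
        return -errno.EINVAL
    return len(command)
```

    With an empty target list, writing "on" is rejected with -EINVAL
    instead of starting a thread that has nothing to monitor.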
    
    Link: https://lkml.kernel.org/r/0a60a6e8ec9d71989e0848a4dc3311996ca3b5d4.1634720326.git.xhao@linux.alibaba.com
    Signed-off-by: Xin Hao <xhao@linux.alibaba.com>
    Reviewed-by: SeongJae Park <sj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Xin Hao authored and torvalds committed Nov 6, 2021
  16. mm/damon: remove unnecessary variable initialization

    Patch series "mm/damon: Fix some small bugs", v4.
    
    This patch (of 2):
    
    In 'damon_va_apply_three_regions' there is no need to set variable 'i'
    to zero.
    
    Link: https://lkml.kernel.org/r/b7df8d3dad0943a37e01f60c441b1968b2b20354.1634720326.git.xhao@linux.alibaba.com
    Link: https://lkml.kernel.org/r/cover.1634720326.git.xhao@linux.alibaba.com
    Signed-off-by: Xin Hao <xhao@linux.alibaba.com>
    Reviewed-by: SeongJae Park <sj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Xin Hao authored and torvalds committed Nov 6, 2021
  17. Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIM

    This adds an admin-guide document for DAMON-based Reclamation.
    
    Link: https://lkml.kernel.org/r/20211019150731.16699-16-sj@kernel.org
    Signed-off-by: SeongJae Park <sj@kernel.org>
    Cc: Amit Shah <amit@kernel.org>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: David Woodhouse <dwmw@amazon.com>
    Cc: Greg Thelen <gthelen@google.com>
    Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Leonard Foerster <foersleo@amazon.de>
    Cc: Marco Elver <elver@google.com>
    Cc: Markus Boehme <markubo@amazon.de>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    SeongJae Park authored and torvalds committed Nov 6, 2021
  18. mm/damon: introduce DAMON-based Reclamation (DAMON_RECLAIM)

    This implements a new kernel subsystem that finds cold memory regions
    using DAMON and reclaims those immediately.  It is intended to be used
    as proactive lightweight reclamation logic for light memory pressure.
    For heavy memory pressure, it could be inactivated and fall back to the
    traditional page-scanning based reclamation.
    
    It's implemented on top of DAMON framework to use the DAMON-based
    Operation Schemes (DAMOS) feature.  It utilizes all the DAMOS features
    including speed limit, prioritization, and watermarks.
    
    It can be enabled and tuned at boot time via kernel boot parameters,
    and at run time via its module parameters interface
    ('/sys/module/damon_reclaim/parameters/').
    
    [yangyingliang@huawei.com: fix error return code in damon_reclaim_turn()]
      Link: https://lkml.kernel.org/r/20211025124500.2758060-1-yangyingliang@huawei.com
    
    Link: https://lkml.kernel.org/r/20211019150731.16699-15-sj@kernel.org
    Signed-off-by: SeongJae Park <sj@kernel.org>
    Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    Cc: Amit Shah <amit@kernel.org>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: David Woodhouse <dwmw@amazon.com>
    Cc: Greg Thelen <gthelen@google.com>
    Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Leonard Foerster <foersleo@amazon.de>
    Cc: Marco Elver <elver@google.com>
    Cc: Markus Boehme <markubo@amazon.de>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    SeongJae Park authored and torvalds committed Nov 6, 2021
  19. selftests/damon: support watermarks

    This updates DAMON selftests for 'schemes' debugfs file to reflect the
    changes in the format.
    
    Link: https://lkml.kernel.org/r/20211019150731.16699-14-sj@kernel.org
    Signed-off-by: SeongJae Park <sj@kernel.org>
    Cc: Amit Shah <amit@kernel.org>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: David Woodhouse <dwmw@amazon.com>
    Cc: Greg Thelen <gthelen@google.com>
    Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Leonard Foerster <foersleo@amazon.de>
    Cc: Marco Elver <elver@google.com>
    Cc: Markus Boehme <markubo@amazon.de>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    SeongJae Park authored and torvalds committed Nov 6, 2021
  20. mm/damon/dbgfs: support watermarks

    This updates the DAMON debugfs interface to support watermarks-based
    schemes activation.  For this, the 'schemes' file now receives five
    more values.
    
    Link: https://lkml.kernel.org/r/20211019150731.16699-13-sj@kernel.org
    Signed-off-by: SeongJae Park <sj@kernel.org>
    Cc: Amit Shah <amit@kernel.org>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: David Woodhouse <dwmw@amazon.com>
    Cc: Greg Thelen <gthelen@google.com>
    Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Leonard Foerster <foersleo@amazon.de>
    Cc: Marco Elver <elver@google.com>
    Cc: Markus Boehme <markubo@amazon.de>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    SeongJae Park authored and torvalds committed Nov 6, 2021
  21. mm/damon/schemes: activate schemes based on a watermarks mechanism

    DAMON-based operation schemes need to be manually turned on and off.  In
    some use cases, however, the condition for turning a scheme on and off
    would depend on the system's situation.  For example, schemes for
    proactive pages reclamation would need to be turned on when some memory
    pressure is detected, and turned off when the system has enough free
    memory.
    
    For easier control of schemes activation based on the system situation,
    this introduces a watermarks-based mechanism.  The client can describe
    the watermark metric (e.g., amount of free memory in the system),
    watermark check interval, and three watermarks, namely high, mid, and
    low.  If the scheme is deactivated, it only gets the metric and
    compares it to the three watermarks at every check interval.  If the
    metric is higher than the high watermark, the scheme is deactivated.  If
    the metric is between the mid watermark and the low watermark, the
    scheme is activated.  If the metric is lower than the low watermark, the
    scheme is deactivated again.  This is to allow users to fall back to
    traditional page-granularity mechanisms.
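
    The three rules above can be sketched as a small state update.  This is
    an illustrative model, not the kernel implementation: the 'Watermarks'
    class and 'update_activation' function are hypothetical names, and
    keeping the previous state while the metric lies between the mid and
    high watermarks is an assumption, since the text above does not state
    what happens in that range.

```python
from dataclasses import dataclass


@dataclass
class Watermarks:
    """Hypothetical container for the three watermarks and scheme state."""
    high: int
    mid: int
    low: int
    activated: bool = False


def update_activation(wmarks, metric):
    """Apply the watermark rules once, for one check interval."""
    if metric > wmarks.high or metric < wmarks.low:
        # Plenty of free memory, or so little that the traditional
        # page-granularity mechanisms should take over.
        wmarks.activated = False
    elif metric <= wmarks.mid:
        # Between the low and mid watermarks: activate the scheme.
        wmarks.activated = True
    # Between mid and high: keep the current state (assumption).
    return wmarks.activated
```

    For example, with high=500, mid=400, and low=200, a metric of 300
    activates the scheme, and a later metric of 100 deactivates it again.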
    
    Link: https://lkml.kernel.org/r/20211019150731.16699-12-sj@kernel.org
    Signed-off-by: SeongJae Park <sj@kernel.org>
    Cc: Amit Shah <amit@kernel.org>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: David Woodhouse <dwmw@amazon.com>
    Cc: Greg Thelen <gthelen@google.com>
    Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Leonard Foerster <foersleo@amazon.de>
    Cc: Marco Elver <elver@google.com>
    Cc: Markus Boehme <markubo@amazon.de>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    SeongJae Park authored and torvalds committed Nov 6, 2021
  22. tools/selftests/damon: update for regions prioritization of schemes

    This updates the DAMON selftests for 'schemes' debugfs file, as the file
    format is updated.
    
    Link: https://lkml.kernel.org/r/20211019150731.16699-11-sj@kernel.org
    Signed-off-by: SeongJae Park <sj@kernel.org>
    Cc: Amit Shah <amit@kernel.org>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: David Woodhouse <dwmw@amazon.com>
    Cc: Greg Thelen <gthelen@google.com>
    Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Leonard Foerster <foersleo@amazon.de>
    Cc: Marco Elver <elver@google.com>
    Cc: Markus Boehme <markubo@amazon.de>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    SeongJae Park authored and torvalds committed Nov 6, 2021
  23. mm/damon/dbgfs: support prioritization weights

    This allows DAMON debugfs interface users to set the prioritization
    weights by writing three more numbers to the 'schemes' file.
    
    Link: https://lkml.kernel.org/r/20211019150731.16699-10-sj@kernel.org
    Signed-off-by: SeongJae Park <sj@kernel.org>
    Cc: Amit Shah <amit@kernel.org>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: David Woodhouse <dwmw@amazon.com>
    Cc: Greg Thelen <gthelen@google.com>
    Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Leonard Foerster <foersleo@amazon.de>
    Cc: Marco Elver <elver@google.com>
    Cc: Markus Boehme <markubo@amazon.de>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    SeongJae Park authored and torvalds committed Nov 6, 2021
  24. mm/damon/vaddr,paddr: support pageout prioritization

    This makes the default monitoring primitives for virtual address spaces
    and the physical address space support memory regions prioritization
    for the 'PAGEOUT' DAMOS action.  They calculate the hotness of each
    region as a weighted sum of 'nr_accesses' and 'age' of the region and
    take the reverse of the hotness as the priority score, so that cold
    regions can be paged out first.
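    
    A minimal sketch of such a score calculation follows.  It assumes the
    access frequency and age inputs were already normalized to the range
    0..SKETCH_MAX_SCORE, and that "reverse" means subtracting the hotness
    from the maximum score.  The names and the value of SKETCH_MAX_SCORE are
    hypothetical and not taken from the patch.

```c
/* Assumed upper bound of every score in this sketch. */
#define SKETCH_MAX_SCORE 99U

/*
 * Weighted average of two normalized sub-scores.  Both sub-scores are
 * expected to be within 0..SKETCH_MAX_SCORE, so the result is as well.
 */
unsigned int hotness_sketch(unsigned int freq_score, unsigned int age_score,
                            unsigned int freq_weight, unsigned int age_weight)
{
    unsigned int total_weight = freq_weight + age_weight;

    if (total_weight == 0)
        return 0;
    return (freq_weight * freq_score + age_weight * age_score) /
            total_weight;
}

/* Colder regions get a higher pageout priority. */
unsigned int pageout_priority_sketch(unsigned int freq_score,
                                     unsigned int age_score,
                                     unsigned int freq_weight,
                                     unsigned int age_weight)
{
    return SKETCH_MAX_SCORE - hotness_sketch(freq_score, age_score,
                                             freq_weight, age_weight);
}
```

    For example, with equal weights, sub-scores of 80 and 20 give a hotness
    of 50 and therefore a pageout priority of 49.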
    
    Link: https://lkml.kernel.org/r/20211019150731.16699-9-sj@kernel.org
    Signed-off-by: SeongJae Park <sj@kernel.org>
    Cc: Amit Shah <amit@kernel.org>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: David Woodhouse <dwmw@amazon.com>
    Cc: Greg Thelen <gthelen@google.com>
    Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Leonard Foerster <foersleo@amazon.de>
    Cc: Marco Elver <elver@google.com>
    Cc: Markus Boehme <markubo@amazon.de>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    SeongJae Park authored and torvalds committed Nov 6, 2021
  25. mm/damon/schemes: prioritize regions within the quotas

    This makes DAMON apply schemes to regions having higher priority first,
    if it cannot apply schemes to all regions due to the quotas.
    
    The prioritization function should be implemented in the monitoring
    primitives.  Those would commonly calculate the priority of the region
    using attributes of regions, namely 'size', 'nr_accesses', and 'age'.
    For example, some primitive would calculate the priority of each region
    using a weighted sum of 'nr_accesses' and 'age' of the region.
    
    The optimal weights would depend on the given environment, so this makes
    them customizable.  Nevertheless, the score calculation functions are
    only encouraged to respect the weights, not mandated.
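    
    The idea of serving higher-priority regions first under a quota can be
    illustrated with the simplified sketch below.  It simply sorts the
    regions by priority and charges them until the quota is used up.  This
    is an assumption made for illustration and not necessarily how the
    patch selects regions; 'struct region_sketch' and apply_within_quota()
    are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical region descriptor for this sketch. */
struct region_sketch {
    unsigned long size;      /* bytes in the region */
    unsigned int priority;   /* higher means "apply the scheme first" */
    unsigned long applied;   /* bytes the action was applied to */
};

/* qsort() comparator: highest priority first. */
static int by_priority_desc(const void *a, const void *b)
{
    const struct region_sketch *ra = a;
    const struct region_sketch *rb = b;

    if (ra->priority == rb->priority)
        return 0;
    return ra->priority < rb->priority ? 1 : -1;
}

/*
 * Sort the regions in place and "apply" the action to at most quota_sz
 * bytes, highest priority first.  Returns the number of bytes charged.
 */
unsigned long apply_within_quota(struct region_sketch *r, size_t nr,
                                 unsigned long quota_sz)
{
    unsigned long charged = 0;
    size_t i;

    qsort(r, nr, sizeof(*r), by_priority_desc);
    for (i = 0; i < nr; i++) {
        unsigned long left = quota_sz - charged;

        /* A region may be covered only partially by what is left. */
        r[i].applied = r[i].size < left ? r[i].size : left;
        charged += r[i].applied;
    }
    return charged;
}
```

    With regions of 100, 200, and 300 bytes having priorities 10, 90, and
    50, and a quota of 400 bytes, the 200-byte region is handled fully, the
    300-byte region partially, and the 100-byte region not at all.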
    
    Link: https://lkml.kernel.org/r/20211019150731.16699-8-sj@kernel.org
    Signed-off-by: SeongJae Park <sj@kernel.org>
    Cc: Amit Shah <amit@kernel.org>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: David Woodhouse <dwmw@amazon.com>
    Cc: Greg Thelen <gthelen@google.com>
    Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Leonard Foerster <foersleo@amazon.de>
    Cc: Marco Elver <elver@google.com>
    Cc: Markus Boehme <markubo@amazon.de>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    SeongJae Park authored and torvalds committed Nov 6, 2021
  26. mm/damon/selftests: support schemes quotas

    This updates DAMON selftests to support updated schemes debugfs file
    format for the quotas.
    
    Link: https://lkml.kernel.org/r/20211019150731.16699-7-sj@kernel.org
    Signed-off-by: SeongJae Park <sj@kernel.org>
    Cc: Amit Shah <amit@kernel.org>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: David Woodhouse <dwmw@amazon.com>
    Cc: Greg Thelen <gthelen@google.com>
    Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Leonard Foerster <foersleo@amazon.de>
    Cc: Marco Elver <elver@google.com>
    Cc: Markus Boehme <markubo@amazon.de>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    SeongJae Park authored and torvalds committed Nov 6, 2021
  27. mm/damon/dbgfs: support quotas of schemes

    This makes the debugfs interface of DAMON support the scheme quotas by
    changing the format of the input for the schemes file.
    
    Link: https://lkml.kernel.org/r/20211019150731.16699-6-sj@kernel.org
    Signed-off-by: SeongJae Park <sj@kernel.org>
    Cc: Amit Shah <amit@kernel.org>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: David Woodhouse <dwmw@amazon.com>
    Cc: Greg Thelen <gthelen@google.com>
    Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Leonard Foerster <foersleo@amazon.de>
    Cc: Marco Elver <elver@google.com>
    Cc: Markus Boehme <markubo@amazon.de>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    SeongJae Park authored and torvalds committed Nov 6, 2021
  28. mm/damon/schemes: implement time quota

    The size quota feature of DAMOS is useful for IO resource-critical
    systems, but not so intuitive for CPU time-critical systems.  Systems
    using zram or zswap-like swap devices would be examples.
    
    To provide another intuitive way for such systems, this implements a
    time-based quota for DAMON-based Operation Schemes.  If the quota is
    set, DAMOS tries to use only up to the user-defined quota of CPU time
    within a given time window.
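    
    One possible way to enforce a time quota is to translate it into a byte
    budget using the throughput observed so far.  The sketch below assumes
    that approach; this message does not describe the enforcement itself,
    and the function name, its parameters, and the fallback throughput are
    all hypothetical.

```c
/*
 * Convert a time quota (milliseconds of work allowed per window) into a
 * byte budget.
 *
 * charged_bytes / charged_ns describe the work done so far: how many
 * bytes the action was applied to, and how long that took.  When nothing
 * was measured yet, fallback_bytes_per_ms is used instead.
 */
unsigned long long time_quota_to_bytes(unsigned long quota_ms,
                                       unsigned long long charged_bytes,
                                       unsigned long long charged_ns,
                                       unsigned long long fallback_bytes_per_ms)
{
    unsigned long long bytes_per_ms = fallback_bytes_per_ms;

    /* 1 ms is 1,000,000 ns, so bytes/ns * 1,000,000 gives bytes/ms. */
    if (charged_ns > 0)
        bytes_per_ms = charged_bytes * 1000000ULL / charged_ns;

    return bytes_per_ms * quota_ms;
}
```

    For instance, if 4,096,000 bytes were processed in 2 ms so far, a 10 ms
    quota corresponds to a budget of 20,480,000 bytes for the window.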
    
    Link: https://lkml.kernel.org/r/20211019150731.16699-5-sj@kernel.org
    Signed-off-by: SeongJae Park <sj@kernel.org>
    Cc: Amit Shah <amit@kernel.org>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: David Woodhouse <dwmw@amazon.com>
    Cc: Greg Thelen <gthelen@google.com>
    Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Leonard Foerster <foersleo@amazon.de>
    Cc: Marco Elver <elver@google.com>
    Cc: Markus Boehme <markubo@amazon.de>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    SeongJae Park authored and torvalds committed Nov 6, 2021
  29. mm/damon/schemes: skip already charged targets and regions

    If DAMOS has stopped applying the action in the middle of a group of
    memory regions due to its size quota, it starts the work again from the
    beginning of the address space in the next charge window.  If there is a
    huge memory region at the beginning of the address space and it always
    fulfills the scheme's target data access pattern, the action will be
    applied only to that region.
    
    This mitigates the case by skipping memory regions that were charged in
    the current charge window at the beginning of the next charge window.
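    
    The skipping can be pictured with the following simplified sketch, which
    remembers the address where the previous window stopped and resumes from
    there.  It assumes the regions are sorted by address and handles only a
    single address space; 'struct addr_region' and apply_window() are
    hypothetical names, not the ones used by the patch.

```c
#include <stddef.h>

/* Hypothetical address range [start, end). */
struct addr_region {
    unsigned long start;
    unsigned long end;
};

/*
 * Apply the action to at most 'quota' bytes in one charge window.
 * *resume_addr holds the address where the previous window stopped and
 * is updated for the next one.  Returns the number of bytes charged.
 */
unsigned long apply_window(const struct addr_region *r, size_t nr,
                           unsigned long quota, unsigned long *resume_addr)
{
    unsigned long charged = 0;
    size_t i;

    for (i = 0; i < nr; i++) {
        unsigned long start = r[i].start;
        unsigned long sz;

        /* Fully charged in an earlier window: skip it. */
        if (r[i].end <= *resume_addr)
            continue;
        /* Partially charged: continue from where we stopped. */
        if (start < *resume_addr)
            start = *resume_addr;

        sz = r[i].end - start;
        if (sz > quota - charged)
            sz = quota - charged;
        charged += sz;

        if (charged == quota) {
            *resume_addr = start + sz;
            return charged;
        }
    }
    /* Reached the end of the regions: start over next time. */
    *resume_addr = 0;
    return charged;
}
```

    With regions [0, 1000), [1000, 1200), and [1200, 1500) and a quota of
    400 bytes, four consecutive windows charge 400, 400, 400, and 300
    bytes, so the later regions are reached instead of the first region
    being processed over and over.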
    
    Link: https://lkml.kernel.org/r/20211019150731.16699-4-sj@kernel.org
    Signed-off-by: SeongJae Park <sj@kernel.org>
    Cc: Amit Shah <amit@kernel.org>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: David Woodhouse <dwmw@amazon.com>
    Cc: Greg Thelen <gthelen@google.com>
    Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Leonard Foerster <foersleo@amazon.de>
    Cc: Marco Elver <elver@google.com>
    Cc: Markus Boehme <markubo@amazon.de>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    SeongJae Park authored and torvalds committed Nov 6, 2021
  30. mm/damon/schemes: implement size quota for schemes application speed control
    
    There could be arbitrarily large memory regions fulfilling the target
    data access pattern of a DAMON-based operation scheme.  In that case,
    applying the action of the scheme could incur too high overhead.  To
    provide an intuitive way of avoiding it, this implements a feature
    called size quota.  If the quota is set, DAMON tries to apply the action
    only up to the given amount of memory regions within a given time
    window.
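    
    A window-based size quota of this kind can be sketched as below.  The
    structure, the function, and the convention that a quota of zero means
    "no limit" are assumptions made for illustration, not details taken
    from the patch.

```c
/* Hypothetical size quota state for one scheme. */
struct size_quota_sketch {
    unsigned long sz;                 /* bytes allowed per window, 0: no limit */
    unsigned long reset_interval_ms;  /* length of one charge window */
    unsigned long charged_sz;         /* bytes charged in the current window */
    unsigned long window_start_ms;    /* when the current window began */
};

/*
 * Ask for 'want' bytes at time 'now_ms'.  Returns how many of them may be
 * processed now and charges that amount to the current window.
 */
unsigned long size_quota_take(struct size_quota_sketch *q,
                              unsigned long now_ms, unsigned long want)
{
    unsigned long left;

    if (q->sz == 0)
        return want;

    /* A new charge window has started: forget the old charges. */
    if (now_ms - q->window_start_ms >= q->reset_interval_ms) {
        q->window_start_ms = now_ms;
        q->charged_sz = 0;
    }

    left = q->sz - q->charged_sz;
    if (want > left)
        want = left;
    q->charged_sz += want;
    return want;
}
```

    With a quota of 100 bytes per 1000 ms, requests for 60 and then 60
    bytes in the same window are granted 60 and 40 bytes, further requests
    get nothing, and the full quota is available again once the window
    elapses.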
    
    Link: https://lkml.kernel.org/r/20211019150731.16699-3-sj@kernel.org
    Signed-off-by: SeongJae Park <sj@kernel.org>
    Cc: Amit Shah <amit@kernel.org>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: David Woodhouse <dwmw@amazon.com>
    Cc: Greg Thelen <gthelen@google.com>
    Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Leonard Foerster <foersleo@amazon.de>
    Cc: Marco Elver <elver@google.com>
    Cc: Markus Boehme <markubo@amazon.de>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    SeongJae Park authored and torvalds committed Nov 6, 2021
  31. mm/damon/paddr: support the pageout scheme

    Introduction
    ============
    
    This patchset 1) makes the engine for general data access
    pattern-oriented memory management (DAMOS) more useful for production
    environments, and 2) implements a static kernel module for lightweight
    proactive reclamation using the engine.
    
    Proactive Reclamation
    ---------------------
    
    On general memory over-committed systems, proactively reclaiming cold
    pages helps save memory and reduce latency spikes that are incurred by
    direct reclaim or the CPU consumption of kswapd, while incurring
    only minimal performance degradation[2].
    
    A Free Pages Reporting[8] based memory over-commit virtualization system
    would be one more specific use case.  In such a system, the guest VMs
    report their free memory to the host, and the host reallocates the
    reported memory to other guests.  As a result, the system's memory
    utilization can be maximized.  However, the guests might not be so
    memory-frugal, because some kernel subsystems and user-space
    applications are designed to use as much memory as is available.  Then,
    guests would report only a small amount of free memory to the host,
    resulting in poor memory utilization.  Running the proactive reclamation
    in such guests could help mitigate this problem.
    
    Google has also implemented this idea and is using it in their data
    center.  They further proposed upstreaming it at LSFMM'19, and "the
    general consensus was that, while this sort of proactive reclaim would be
    useful for a number of users, the cost of this particular solution was
    too high to consider merging it upstream"[3].  The cost mainly comes from
    the coldness tracking.  Roughly speaking, the implementation periodically
    scans the 'Accessed' bit of each page.  For this reason, the overhead
    increases linearly as the size of the memory and the scanning frequency
    grow.  As a result, Google is known to dedicate one CPU to the work.
    That's a reasonable option for someone like Google, but it wouldn't be
    so for some others.
    
    DAMON and DAMOS: An engine for data access pattern-oriented memory management
    -----------------------------------------------------------------------------
    
    DAMON[4] is a framework for general data access monitoring.  Its
    adaptive monitoring overhead control feature minimizes its monitoring
    overhead.  It also lets clients configure the upper bound of the
    overhead, regardless of the size of the monitoring target memory.
    While monitoring 70 GiB memory of a production system every 5
    milliseconds, it consumes less than 1% single CPU time.  For this, it
    could sacrifice some of the quality of the monitoring results.
    Nevertheless, the lower bound of the quality is configurable, and it
    uses a best-effort algorithm for better quality.  Our test results[5]
    show the quality is practical enough.  From the production system
    monitoring, we were able to find a 4 KiB region in the 70 GiB memory
    that shows the highest access frequency.
    
    We normally don't monitor the data access pattern just for fun but to
    improve something like memory management.  Proactive reclamation is one
    such usage.  For such general cases, DAMON provides a feature called
    DAMON-based Operation Schemes (DAMOS)[6].  It makes DAMON an engine for
    general data access pattern-oriented memory management.  Using this,
    clients can ask DAMON to find memory regions with a specific data access
    pattern and apply some memory management action (e.g., page out, move to
    head of the LRU list, use huge page, ...).  We call such a request a
    'scheme'.
    
    Proactive Reclamation on top of DAMON/DAMOS
    -------------------------------------------
    
    Therefore, by using DAMON for the cold pages detection, the proactive
    reclamation's monitoring overhead issue can be solved.  Actually, we
    previously implemented a version of proactive reclamation using DAMOS
    and achieved noticeable improvements with our evaluation setup[5].
    Nevertheless, it is more of a proof-of-concept than something for
    production use.  It supports only virtual address spaces of processes,
    and requires additional tuning efforts for given workloads and the
    hardware.  For the tuning, we introduced a simple auto-tuning user space
    tool[8].  Google is also known to use a similar ML-based approach for
    their fleets[2].  But making it just work with intuitive knobs in the
    kernel would be helpful for general users.
    
    To this end, this patchset improves DAMOS to be ready for such
    production usages, and implements another version of the proactive
    reclamation, namely DAMON_RECLAIM, on top of it.
    
    DAMOS Improvements: Aggressiveness Control, Prioritization, and Watermarks
    --------------------------------------------------------------------------
    
    First of all, the current version of DAMOS supports only virtual address
    spaces.  This patchset makes it support the physical address space for
    the page out action.
    
    The next major problem of the current version of DAMOS is the lack of
    aggressiveness control, which can result in arbitrary overhead.  For
    example, if huge memory regions having the data access pattern of
    interest are found, applying the requested action to all of the regions
    could incur significant overhead.  It can be controlled by tuning the
    target data access pattern with manual or automated approaches[2,7].
    But some people would prefer the kernel to just work with only
    intuitive tuning or default values.
    
    For such cases, this patchset implements a safeguard, namely time/size
    quota.  Using this, the clients can specify up to how much time can be
    used for applying the action, and/or up to how much memory the action
    can be applied to within a user-specified time duration.  A followup
    question is, to which memory regions should the action be applied within
    the limits? We implement a simple regions prioritization mechanism for
    each action and make DAMOS apply the action to high priority regions
    first.  It also allows clients to tune the prioritization mechanism to
    use different weights for size, access frequency, and age of memory
    regions.  This means we could use not only LRU but also LFU or some
    fancy algorithms like CAR[9] with lightweight overhead.
    
    Though DAMON is lightweight, someone would want to remove even the cold
    pages monitoring overhead when it is unnecessary.  Currently, it must be
    manually turned on and off by clients, but some clients would simply
    want to turn it on and off based on some metrics like free memory ratio
    or memory fragmentation.  For such cases, this patchset implements a
    watermarks-based automatic activation feature.  It allows the clients
    to configure the metric of their interest, and three watermarks of the
    metric.  If the metric is higher than the high watermark or lower than
    the low watermark, the scheme is deactivated.  If the metric is lower
    than the mid watermark but higher than the low watermark, the scheme is
    activated.
    
    DAMON-based Reclaim
    -------------------
    
    Using the improved version of DAMOS, this patchset implements a static
    kernel module called 'damon_reclaim'.  It finds memory regions that
    have not been accessed for a specific time duration and pages them out.
    Consuming too much CPU for the paging out operations, or doing pageout
    too frequently, can be critical for systems configuring their swap
    devices with software-defined in-memory block devices like zram/zswap,
    or with devices that have a limited total number of writes like SSDs,
    respectively.  To avoid the problems, the time/size quotas can be
    configured.  Under the quotas, it first pages out the memory regions
    that have not been accessed for longer.  Also, to remove the monitoring
    overhead under peaceful situations, and to fall back to the LRU-list
    based page granularity reclamation when it doesn't make progress, the
    three watermarks based activation mechanism is used, with the free
    memory ratio as the watermark metric.
    
    For convenient configurations, it provides several module parameters.
    Using these, sysadmins can enable/disable it, and tune its parameters
    including the coldness identification time threshold, the time/size
    quotas and the three watermarks.
    
    Evaluation
    ==========
    
    In short, DAMON_RECLAIM with 50ms/s time quota and regions
    prioritization on v5.15-rc5 Linux kernel with ZRAM swap device achieves
    38.58% memory saving with only 1.94% runtime overhead.  For this,
    DAMON_RECLAIM consumes only 4.97% of single CPU time.
    
    Setup
    -----
    
    We evaluate DAMON_RECLAIM to show how each of the DAMOS improvements
    takes effect.  For this, we measure DAMON_RECLAIM's CPU consumption,
    entire system memory footprint, total number of major page faults, and
    runtime of 24 realistic workloads in PARSEC3 and SPLASH-2X benchmark
    suites on my QEMU/KVM based virtual machine.  The virtual machine runs
    on an i3.metal AWS instance, has 130GiB memory, and runs a Linux kernel
    built on the latest -mm tree[1] plus this patchset.  It also utilizes a
    4 GiB ZRAM swap device.  We repeat the measurement 5 times and use
    averages.
    
    [1] https://github.com/hnaz/linux-mm/tree/v5.15-rc5-mmots-2021-10-13-19-55
    
    Detailed Results
    ----------------
    
    The results are summarized in the below table.
    
    With a coldness identification threshold of 5 seconds, DAMON_RECLAIM
    without the time quota-based speed limit achieves 47.21% memory saving,
    but incurs a 4.59% runtime slowdown to the workloads on average.  For
    this, DAMON_RECLAIM consumes about 11.28% single CPU time.
    
    Applying time quotas of 200ms/s, 50ms/s, and 10ms/s without the regions
    prioritization changes the slowdown to 4.89%, 2.65%, and 1.5%,
    respectively.  Time quota of 200ms/s (20%) makes no real change compared
    to the quota unapplied version, because the quota unapplied version
    consumes only 11.28% CPU time.  DAMON_RECLAIM's CPU utilization is also
    similarly reduced: 11.24%, 5.51%, and 2.01% of single CPU time.  That
    is, the overhead is proportional to the speed limit.  Nevertheless, it
    also reduces the memory saving because it becomes less aggressive.  In
    detail, the three variants show 48.76%, 37.83%, and 7.85% memory saving,
    respectively.
    
    Applying the regions prioritization (page out regions that have not been
    accessed for longer first within the time quota) further reduces the
    performance degradation.  The runtime slowdown and the increase in the
    total number of major page faults changed as 4.89%/218,690% ->
    4.39%/166,136% (200ms/s), 2.65%/111,886% -> 1.94%/59,053% (50ms/s), and
    1.5%/34,973.40% -> 2.08%/8,781.75% (10ms/s).  The runtime under 10ms/s
    time quota has increased with prioritization, but apparently that's
    within the margin of error.
    
        time quota   prioritization  memory_saving  cpu_util  slowdown  pgmajfaults overhead
        N            N               47.21%         11.28%    4.59%     194,802%
        200ms/s      N               48.76%         11.24%    4.89%     218,690%
        50ms/s       N               37.83%         5.51%     2.65%     111,886%
        10ms/s       N               7.85%          2.01%     1.5%      34,793.40%
        200ms/s      Y               50.08%         10.38%    4.39%     166,136%
        50ms/s       Y               38.58%         4.97%     1.94%     59,053%
        10ms/s       Y               3.63%          1.73%     2.08%     8,781.75%
    
    Baseline and Complete Git Trees
    ===============================
    
    The patches are based on the latest -mm tree
    (v5.15-rc5-mmots-2021-10-13-19-55).  You can also clone the complete git tree
    from:
    
        $ git clone git://github.com/sjp38/linux -b damon_reclaim/patches/v1
    
    A web view is also available:
    https://git.kernel.org/pub/scm/linux/kernel/git/sj/linux.git/tag/?h=damon_reclaim/patches/v1
    
    Sequence Of Patches
    ===================
    
    The first patch makes DAMOS support the physical address space for the
    page out action.  The following five patches (patches 2-6) implement the
    time/size quotas.  The next four patches (patches 7-10) implement the
    memory regions prioritization within the limit.  Then, the three
    following patches (patches 11-13) implement the watermarks-based schemes
    activation.
    
    Finally, the last two patches (patches 14-15) implement and document the
    DAMON-based reclamation using the advanced DAMOS.
    
    [1] https://www.kernel.org/doc/html/v5.15-rc1/vm/damon/index.html
    [2] https://research.google/pubs/pub48551/
    [3] https://lwn.net/Articles/787611/
    [4] https://damonitor.github.io
    [5] https://damonitor.github.io/doc/html/latest/vm/damon/eval.html
    [6] https://lore.kernel.org/linux-mm/20211001125604.29660-1-sj@kernel.org/
    [7] https://github.com/awslabs/damoos
    [8] https://www.kernel.org/doc/html/latest/vm/free_page_reporting.html
    [9] https://www.usenix.org/conference/fast-04/car-clock-adaptive-replacement
    
    This patch (of 15):
    
    This makes the DAMON primitives for the physical address space support
    the pageout action for DAMON-based Operation Schemes.  With this commit,
    users can easily implement system-level data access-aware reclamation
    using DAMOS.
    
    [sj@kernel.org: fix missing-prototype build warning]
      Link: https://lkml.kernel.org/r/20211025064220.13904-1-sj@kernel.org
    
    Link: https://lkml.kernel.org/r/20211019150731.16699-1-sj@kernel.org
    Link: https://lkml.kernel.org/r/20211019150731.16699-2-sj@kernel.org
    Signed-off-by: SeongJae Park <sj@kernel.org>
    Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Cc: Amit Shah <amit@kernel.org>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Woodhouse <dwmw@amazon.com>
    Cc: Marco Elver <elver@google.com>
    Cc: Leonard Foerster <foersleo@amazon.de>
    Cc: Greg Thelen <gthelen@google.com>
    Cc: Markus Boehme <markubo@amazon.de>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Shuah Khan <shuah@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    SeongJae Park authored and torvalds committed Nov 6, 2021
  32. mm/damon/dbgfs: remove unnecessary variables

    In some functions, it's unnecessary to declare 'err' and 'ret' variables
    at the same time.  This patch simplifies such declarations by reusing
    one variable.
    
    Link: https://lkml.kernel.org/r/20211014073014.35754-1-sj@kernel.org
    Signed-off-by: Rongwei Wang <rongwei.wang@linux.alibaba.com>
    Signed-off-by: SeongJae Park <sj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    wangrongwei authored and torvalds committed Nov 6, 2021
  33. mm/damon/vaddr: constify static mm_walk_ops

    The only usage of these structs is to pass their addresses to
    walk_page_range(), which takes a pointer to const mm_walk_ops as
    argument.  Make them const to allow the compiler to put them in
    read-only memory.
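    
    The pattern can be illustrated outside the kernel with the sketch below.
    'struct walk_ops', walk_range(), and the fixed page size are hypothetical
    stand-ins for the kernel's mm_walk_ops and walk_page_range(), used only
    to show that a callback table which is never written after
    initialization can be declared 'static const'.

```c
/* Assumed page size for this sketch. */
#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical stand-in for the kernel's struct mm_walk_ops. */
struct walk_ops {
    int (*entry)(unsigned long addr, void *priv);
};

/* The walker only reads the ops, so it takes a pointer to const. */
int walk_range(unsigned long start, unsigned long end,
               const struct walk_ops *ops, void *priv)
{
    unsigned long addr;
    int ret;

    for (addr = start; addr < end; addr += SKETCH_PAGE_SIZE) {
        ret = ops->entry(addr, priv);
        if (ret)
            return ret;
    }
    return 0;
}

/* Callback that counts how many pages were visited. */
static int count_entry(unsigned long addr, void *priv)
{
    (void)addr;
    (*(unsigned long *)priv)++;
    return 0;
}

/* Never modified after initialization, hence const. */
static const struct walk_ops count_ops = {
    .entry = count_entry,
};

unsigned long count_pages(unsigned long start, unsigned long end)
{
    unsigned long nr = 0;

    walk_range(start, end, &count_ops, &nr);
    return nr;
}
```

    Calling count_pages(0, 4 * 4096) visits four pages, and any attempt to
    assign to count_ops.entry is rejected at compile time.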
    
    Link: https://lkml.kernel.org/r/20211014075042.17174-2-rikard.falkeborn@gmail.com
    Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
    Reviewed-by: SeongJae Park <sj@kernel.org>
    Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    rikardfalkeborn authored and torvalds committed Nov 6, 2021