rescue_options

Commits on Oct 29, 2019

  1. btrfs: Introduce new mount option to skip block group items scan

    [PROBLEM]
    There are some reports of corrupted filesystems which can't be mounted
    due to a corrupted extent tree.
    
    However, in such situations it's likely that the fs/subvolume trees
    are still fine.
    
    In such cases we normally fall back to btrfs-restore and salvage as
    much as we can. However, btrfs-restore can't list subvolumes the way
    "btrfs subv list" does, making it harder to restore a fs.
    
    [ENHANCEMENT]
    This patch introduces a new mount option "rescue=skipbg" that skips
    the mount-time block group scan and uses chunk info alone to populate
    a fake block group cache.
    
    The mount option has the following dependencies:
    - RO mount
      Obviously.
    
    - No dirty log.
      Either there is no log, or use rescue=nologreplay mount option.
    
    - No way to remount RW
      Similar to rescue=nologreplay option.
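
    The three dependencies listed above can be read as a small validity
    check. The following is only a sketch under stated assumptions: the
    struct, its fields, and both function names are hypothetical and are
    not taken from the patch itself.

```c
#include <stdbool.h>

/* Hypothetical summary of the mount request; not a kernel structure. */
struct mount_state {
    bool read_only;      /* filesystem is being mounted RO */
    bool has_dirty_log;  /* a log tree is present and would be replayed */
    bool nologreplay;    /* rescue=nologreplay was also given */
};

/* Returns true when rescue=skipbg may be honoured for this mount. */
static bool skipbg_allowed(const struct mount_state *s)
{
    /* Dependency 1: RO mount only. */
    if (!s->read_only)
        return false;
    /* Dependency 2: no dirty log, unless log replay is disabled. */
    if (s->has_dirty_log && !s->nologreplay)
        return false;
    return true;
}

/* Dependency 3: once mounted with skipbg, a remount to RW is refused. */
static bool remount_rw_allowed(bool mounted_with_skipbg)
{
    return !mounted_with_skipbg;
}
```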
    
    This allows the kernel to accept any extent tree corruption, even when
    the whole extent tree is corrupted, and allows the user to salvage data
    and subvolume info.
    
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    adam900710 committed Oct 29, 2019
  2. btrfs: Introduce "rescue=" mount option

    This patch introduces a new "rescue=" mount option group for all those
    mount options for data recovery.
    
    Different rescue sub options are separated by ':'. E.g.
    "ro,rescue=nologreplay:usebackuproot".
    (The original plan was to use ';', but ';' needs to be escaped/quoted,
    or it will be interpreted by bash.)
    
    Users can also specify rescue options one by one, like:
    "ro,rescue=nologreplay,rescue=usebackuproot"
    
    The following mount options are converted to "rescue="; the old mount
    options are deprecated but still available for compatibility:
    
    - usebackuproot
      Now it's "rescue=usebackuproot"
    
    - nologreplay
      Now it's "rescue=nologreplay"
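
    To illustrate the ':'-separated syntax described above, here is a
    minimal userspace sketch of parsing the value of one "rescue=" option.
    It is not the kernel's parser: the flag values, the table, and the
    function name parse_rescue_value() are assumptions made for this
    example. Only the two sub-option names come from the commit message.

```c
#include <string.h>

/* Hypothetical flag bits for this sketch; not the kernel's definitions. */
enum {
    RESCUE_NOLOGREPLAY   = 1u << 0,
    RESCUE_USEBACKUPROOT = 1u << 1,
};

static const struct {
    const char *name;
    unsigned int flag;
} rescue_opts[] = {
    { "nologreplay",   RESCUE_NOLOGREPLAY },
    { "usebackuproot", RESCUE_USEBACKUPROOT },
};

/*
 * Parse the text after "rescue=", e.g. "nologreplay:usebackuproot".
 * ORs the matching bits into *flags and returns 0. Returns -1 on an
 * unknown or empty sub-option; *flags may already hold earlier matches.
 */
static int parse_rescue_value(const char *value, unsigned int *flags)
{
    const char *p = value;

    for (;;) {
        size_t len = strcspn(p, ":");   /* length of the current token */
        size_t i;
        int found = 0;

        for (i = 0; i < sizeof(rescue_opts) / sizeof(rescue_opts[0]); i++) {
            if (strlen(rescue_opts[i].name) == len &&
                memcmp(p, rescue_opts[i].name, len) == 0) {
                *flags |= rescue_opts[i].flag;
                found = 1;
                break;
            }
        }
        if (!found)
            return -1;
        if (p[len] == '\0')
            return 0;
        p += len + 1;                   /* skip the ':' separator */
    }
}
```

    Because the flags are OR-ed in, calling the function once per
    "rescue=" occurrence with the same flags variable gives the same
    result as a single ':'-joined value.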
    
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    adam900710 committed Oct 29, 2019

Commits on Sep 30, 2019

  1. Linux 5.4-rc1

    torvalds committed Sep 30, 2019
  2. Merge tag 'for-5.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/ker…

    …nel/git/kdave/linux
    
    Pull btrfs fixes from David Sterba:
     "A bunch of fixes that accumulated in recent weeks, mostly material for
      stable.
    
      Summary:
    
       - fix for regression from 5.3 that prevents using balance convert
         with the single profile
    
       - qgroup fixes: rescan race, accounting leak with multiple writers,
         potential leak after io failure recovery
    
       - fix for use after free in relocation (reported by KASAN)
    
       - other error handling fixups"
    
    * tag 'for-5.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
      btrfs: qgroup: Fix reserved data space leak if we have multiple reserve calls
      btrfs: qgroup: Fix the wrong target io_tree when freeing reserved data space
      btrfs: Fix a regression which we can't convert to SINGLE profile
      btrfs: relocation: fix use-after-free on dead relocation roots
      Btrfs: fix race setting up and completing qgroup rescan workers
      Btrfs: fix missing error return if writeback for extent buffer never started
      btrfs: adjust dirty_metadata_bytes after writeback failure of extent buffer
      Btrfs: fix selftests failure due to uninitialized i_mode in test inodes
    torvalds committed Sep 30, 2019
  3. Merge tag 'csky-for-linus-5.4-rc1' of git://github.com/c-sky/csky-linux

    Pull csky updates from Guo Ren:
     "This round of csky subsystem updates is just some fixups:
    
       - Fix mb() synchronization problem
    
       - Fix dma_alloc_coherent with PAGE_SO attribute
    
       - Fix cache_op failed when cross memory ZONEs
    
       - Optimize arch_sync_dma_for_cpu/device with dma_inv_range
    
       - Fix ioremap function losing
    
       - Fix arch_get_unmapped_area() implementation
    
       - Fix defer cache flush for 610
    
       - Support kernel non-aligned access
    
       - Fix 610 vipt cache flush mechanism
    
       - Fix add zero_fp fixup perf backtrace panic
    
       - Move static keyword to the front of declaration
    
       - Fix csky_pmu.max_period assignment
    
       - Use generic free_initrd_mem()
    
       - entry: Remove unneeded need_resched() loop"
    
    * tag 'csky-for-linus-5.4-rc1' of git://github.com/c-sky/csky-linux:
      csky: Move static keyword to the front of declaration
      csky: entry: Remove unneeded need_resched() loop
      csky: Fixup csky_pmu.max_period assignment
      csky: Fixup add zero_fp fixup perf backtrace panic
      csky: Use generic free_initrd_mem()
      csky: Fixup 610 vipt cache flush mechanism
      csky: Support kernel non-aligned access
      csky: Fixup defer cache flush for 610
      csky: Fixup arch_get_unmapped_area() implementation
      csky: Fixup ioremap function losing
      csky: Optimize arch_sync_dma_for_cpu/device with dma_inv_range
      csky/dma: Fixup cache_op failed when cross memory ZONEs
      csky: Fixup dma_alloc_coherent with PAGE_SO attribute
      csky: Fixup mb() synchronization problem
    torvalds committed Sep 30, 2019
  4. Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel…

    …/git/soc/soc
    
    Pull ARM SoC fixes from Olof Johansson:
     "A few fixes that have trickled in through the merge window:
    
       - Video fixes for OMAP due to panel-dpi driver removal
    
       - Clock fixes for OMAP that broke no-idle quirks + nfsroot on DRA7
    
       - Fixing arch version on ASpeed ast2500
    
       - Two fixes for reset handling on ARM SCMI"
    
    * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
      ARM: aspeed: ast2500 is ARMv6K
      reset: reset-scmi: add missing handle initialisation
      firmware: arm_scmi: reset: fix reset_state assignment in scmi_domain_reset
      bus: ti-sysc: Remove unpaired sysc_clkdm_deny_idle()
      ARM: dts: logicpd-som-lv: Fix i2c2 and i2c3 Pin mux
      ARM: dts: am3517-evm: Fix missing video
      ARM: dts: logicpd-torpedo-baseboard: Fix missing video
      ARM: omap2plus_defconfig: Fix missing video
      bus: ti-sysc: Fix handling of invalid clocks
      bus: ti-sysc: Fix clock handling for no-idle quirks
    torvalds committed Sep 30, 2019
  5. Merge tag 'trace-v5.4-3' of git://git.kernel.org/pub/scm/linux/kernel…

    …/git/rostedt/linux-trace
    
    Pull tracing fixes from Steven Rostedt:
     "A few more tracing fixes:
    
       - Fix a buffer overflow by checking nr_args correctly in probes
    
       - Fix a warning that is reported by clang
    
       - Fix a possible memory leak in error path of filter processing
    
       - Fix the selftest that checks for failures, but wasn't failing
    
       - Minor clean up on call site output of a memory trace event"
    
    * tag 'trace-v5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
      selftests/ftrace: Fix same probe error test
      mm, tracing: Print symbol name for call_site in trace events
      tracing: Have error path in predicate_parse() free its allocated memory
      tracing: Fix clang -Wint-in-bool-context warnings in IF_ASSIGN macro
      tracing/probe: Fix to check the difference of nr_args before adding probe
    torvalds committed Sep 30, 2019
  6. Merge tag 'mmc-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/g…

    …it/ulfh/mmc
    
    Pull more MMC updates from Ulf Hansson:
     "A couple more updates/fixes for MMC:
    
       - sdhci-pci: Add Genesys Logic GL975x support
    
       - sdhci-tegra: Recover loss in throughput for DMA
    
       - sdhci-of-esdhc: Fix DMA bug"
    
    * tag 'mmc-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
      mmc: host: sdhci-pci: Add Genesys Logic GL975x support
      mmc: tegra: Implement ->set_dma_mask()
      mmc: sdhci: Let drivers define their DMA mask
      mmc: sdhci-of-esdhc: set DMA snooping based on DMA coherence
      mmc: sdhci: improve ADMA error reporting
    torvalds committed Sep 30, 2019
  7. csky: Move static keyword to the front of declaration

    Move the static keyword to the front of declaration of
    csky_pmu_of_device_ids, and resolve the following compiler
    warning that can be seen when building with warnings
    enabled (W=1):
    
    arch/csky/kernel/perf_event.c:1340:1: warning:
      ‘static’ is not at beginning of declaration [-Wold-style-declaration]
    
    Signed-off-by: Krzysztof Wilczynski <kw@linux.com>
    Signed-off-by: Guo Ren <guoren@kernel.org>
    kwilczynski authored and guoren83 committed Sep 30, 2019
  8. csky: entry: Remove unneeded need_resched() loop

    Since the enabling and disabling of IRQs within preempt_schedule_irq()
    is contained in a need_resched() loop, we don't need the outer arch
    code loop.
    
    Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
    Signed-off-by: Guo Ren <guoren@kernel.org>
    valschneider authored and guoren83 committed Sep 30, 2019
  9. Merge tag 'char-misc-5.4-rc1' of git://git.kernel.org/pub/scm/linux/k…

    …ernel/git/gregkh/char-misc
    
    Pull Documentation/process update from Greg KH:
     "Here are two small Documentation/process/embargoed-hardware-issues.rst
      file updates that missed my previous char/misc pull request.
    
      The first one adds an Intel representative for the process, and the
      second one cleans up the text a bit more when it comes to how the
      disclosure rules work, as it was a bit confusing to some companies"
    
    * tag 'char-misc-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
      Documentation/process: Clarify disclosure rules
      Documentation/process: Volunteer as the ambassador for Intel
    torvalds committed Sep 30, 2019
  10. Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel…

    …/git/viro/vfs
    
    Pull more vfs updates from Al Viro:
     "A couple of misc patches"
    
    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
      afs dynroot: switch to simple_dir_operations
      fs/handle.c - fix up kerneldoc
    torvalds committed Sep 30, 2019
  11. Merge tag '5.4-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

    Pull more cifs updates from Steve French:
     "Fixes from the recent SMB3 Test events and Storage Developer
      Conference (held the last two weeks).
    
      Here are nine smb3 patches including an important patch for debugging
      traces with wireshark, with three patches marked for stable.
    
      Additional fixes from last week to better handle some newly discovered
      reparse points, and a fix the create/mkdir path for setting the mode
      more atomically (in SMB3 Create security descriptor context), and one
      for path name processing are still being tested so are not included
      here"
    
    * tag '5.4-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
      CIFS: Fix oplock handling for SMB 2.1+ protocols
      smb3: missing ACL related flags
      smb3: pass mode bits into create calls
      smb3: Add missing reparse tags
      CIFS: fix max ea value size
      fs/cifs/sess.c: Remove set but not used variable 'capabilities'
      fs/cifs/smb2pdu.c: Make SMB2_notify_init static
      smb3: fix leak in "open on server" perf counter
      smb3: allow decryption keys to be dumped by admin for debugging
    torvalds committed Sep 30, 2019
  12. csky: Fixup csky_pmu.max_period assignment

    The csky_pmu.max_period has type u64, and BIT() can only return a
    32-bit unsigned long on C-SKY. The initialization of max_period
    will be incorrect when count_width is bigger than 32.
    
    Use BIT_ULL()
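
    The width problem can be modelled in portable C. This is a sketch,
    not the kernel macros: bit_32bit_long() and bit_ull() are hypothetical
    helpers, and the truncation to uint32_t is an assumption that stands
    in for "the high bits do not fit in a 32-bit unsigned long". On a real
    32-bit target, shifting a 32-bit value by 32 or more is undefined
    behaviour rather than a guaranteed zero.

```c
#include <stdint.h>

/*
 * Models a single-bit mask built in a 32-bit unsigned long: anything
 * above bit 31 is lost. Valid for nr < 64.
 */
static uint64_t bit_32bit_long(unsigned int nr)
{
    return (uint32_t)(UINT64_C(1) << nr);
}

/* Models a single-bit mask built in a 64-bit type. Valid for nr < 64. */
static uint64_t bit_ull(unsigned int nr)
{
    return UINT64_C(1) << nr;
}
```

    For bit positions below 32 the two helpers agree; for a position such
    as 48 only the 64-bit version yields a usable value for a u64 field.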
    
    Signed-off-by: Mao Han <han_mao@c-sky.com>
    Signed-off-by: Guo Ren <ren_guo@c-sky.com>
    MaoHan001 authored and guoren83 committed Sep 30, 2019
  13. csky: Fixup add zero_fp fixup perf backtrace panic

    We need to set fp to zero so that backtrace knows where the chain
    ends. This patch fixes a perf callchain panic that occurred because
    backtrace didn't know where the fp chain ended.
    
    Signed-off-by: Guo Ren <ren_guo@c-sky.com>
    Reported-by: Mao Han <han_mao@c-sky.com>
    guoren83 committed Sep 30, 2019
  14. csky: Use generic free_initrd_mem()

    The csky implementation of free_initrd_mem() is an open-coded version of
    free_reserved_area() without poisoning.
    
    Remove it and make csky use the generic version of free_initrd_mem().
    
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Signed-off-by: Guo Ren <guoren@kernel.org>
    rppt authored and guoren83 committed Sep 30, 2019
  15. Merge branch 'entropy'

    Merge active entropy generation updates.
    
    This is admittedly partly "for discussion".  We need to have a way
    forward for the boot time deadlocks where user space ends up waiting for
    more entropy, but no entropy is forthcoming because the system is
    entirely idle just waiting for something to happen.
    
    While this was triggered by what is arguably a user space bug with
    GDM/gnome-session asking for secure randomness during early boot, when
    they didn't even need any such truly secure thing, the issue ends up
    being that our "getrandom()" interface is prone to that kind of
    confusion, because people don't think very hard about whether they want
    to block for sufficient amounts of entropy.
    
    The approach here-in is to decide to not just passively wait for entropy
    to happen, but to start actively collecting it if it is missing.  This
    is not necessarily always possible, but if the architecture has a CPU
    cycle counter, there is a fair amount of noise in the exact timings of
    reasonably complex loads.
    
    We may end up tweaking the load and the entropy estimates, but this
    should be at least a reasonable starting point.
    
    As part of this, we also revert the revert of the ext4 IO pattern
    improvement that ended up triggering the reported lack of external
    entropy.
    
    * getrandom() active entropy waiting:
      Revert "Revert "ext4: make __ext4_get_inode_loc plug""
      random: try to actively add entropy rather than passively wait for it
    torvalds committed Sep 30, 2019
  16. Revert "Revert "ext4: make __ext4_get_inode_loc plug""

    This reverts commit 72dbcf7.
    
    Instead of waiting forever for entropy that may just not happen, we now
    try to actively generate entropy when required, and are thus hopefully
    avoiding the problem that caused the nice ext4 IO pattern fix to be
    reverted.
    
    So revert the revert.
    
    Cc: Ahmed S. Darwish <darwish.07@gmail.com>
    Cc: Ted Ts'o <tytso@mit.edu>
    Cc: Willy Tarreau <w@1wt.eu>
    Cc: Alexander E. Patrakov <patrakov@gmail.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    torvalds committed Sep 30, 2019
  17. random: try to actively add entropy rather than passively wait for it

    For 5.3 we had to revert a nice ext4 IO pattern improvement, because it
    caused a bootup regression due to lack of entropy at bootup together
    with arguably broken user space that was asking for secure random
    numbers when it really didn't need to.
    
    See commit 72dbcf7 (Revert "ext4: make __ext4_get_inode_loc plug").
    
    This aims to solve the issue by actively generating entropy noise using
    the CPU cycle counter when waiting for the random number generator to
    initialize.  This only works when you have a high-frequency time stamp
    counter available, but that's the case on all modern x86 CPUs, and on
    most other modern CPUs too.
    
    What we do is to generate jitter entropy from the CPU cycle counter
    under a somewhat complex load: calling the scheduler while also
    guaranteeing a certain amount of timing noise by also triggering a
    timer.
    
    I'm sure we can tweak this, and that people will want to look at other
    alternatives, but there's been a number of papers written on jitter
    entropy, and this should really be fairly conservative by crediting one
    bit of entropy for every timer-induced jump in the cycle counter.  Not
    because the timer itself would be all that unpredictable, but because
    the interaction between the timer and the loop is going to be.
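
    As a toy model of the accounting described above (not the kernel
    code), the sketch below mixes every cycle-counter sample into a pool
    and credits one bit only when the timer has fired. The struct, the
    function name, and the rotate-and-xor mixing step are all assumptions
    for illustration; the mixing is deliberately simplistic and is not a
    cryptographic construction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the entropy pool and its credit counter. */
struct jitter_model {
    uint64_t pool;          /* mixed cycle-counter samples */
    unsigned int credited;  /* bits of entropy credited so far */
};

/*
 * One loop iteration: mix the current cycle-counter sample into the
 * pool, and credit a single bit if the timer fired since the last step.
 */
static void jitter_step(struct jitter_model *m, uint64_t cycles,
                        bool timer_fired)
{
    /* Rotate left by 7, then xor in the new sample. */
    m->pool = ((m->pool << 7) | (m->pool >> 57)) ^ cycles;
    if (timer_fired)
        m->credited++;
}
```

    Samples are mixed on every iteration, but credit is given only per
    timer event, which is what keeps the estimate conservative.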
    
    Even if (and perhaps particularly if) the timer actually happens on
    another CPU, the cacheline interaction between the loop that reads the
    cycle counter and the timer itself firing is going to add perturbations
    to the cycle counter values that get mixed into the entropy pool.
    
    As Thomas pointed out, with a modern out-of-order CPU, even quite simple
    loops show a fair amount of hard-to-predict timing variability even in
    the absence of external interrupts.  But this tries to take that further
    by actually having a fairly complex interaction.
    
    This is not going to solve the entropy issue for architectures that have
    no CPU cycle counter, but it's not clear how (and if) that is solvable,
    and the hardware in question is largely starting to be irrelevant.  And
    by doing this we can at least avoid some of the even more contentious
    approaches (like making the entropy waiting time out in order to avoid
    the possibly unbounded waiting).
    
    Cc: Ahmed Darwish <darwish.07@gmail.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Theodore Ts'o <tytso@mit.edu>
    Cc: Nicholas Mc Guire <hofrat@opentech.at>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Willy Tarreau <w@1wt.eu>
    Cc: Alexander E. Patrakov <patrakov@gmail.com>
    Cc: Lennart Poettering <mzxreary@0pointer.de>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    torvalds committed Sep 30, 2019

Commits on Sep 29, 2019

  1. Merge tag 'fixes-5.4-merge-window' of git://git.kernel.org/pub/scm/li…

    …nux/kernel/git/tmlind/linux-omap into arm/fixes
    
    Fixes for omap variants
    
    Few fixes for ti-sysc interconnect target module driver for no-idle
    quirks that caused nfsroot to fail on some dra7 boards.
    
    And fixes to get the LCD working again on logicpd boards, which got
    broken a while back with the removal of the panel-dpi driver. We now
    need to use the generic CONFIG_DRM_PANEL_SIMPLE instead.
    
    * tag 'fixes-5.4-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
      bus: ti-sysc: Remove unpaired sysc_clkdm_deny_idle()
      ARM: dts: logicpd-som-lv: Fix i2c2 and i2c3 Pin mux
      ARM: dts: am3517-evm: Fix missing video
      ARM: dts: logicpd-torpedo-baseboard: Fix missing video
      ARM: omap2plus_defconfig: Fix missing video
      bus: ti-sysc: Fix handling of invalid clocks
      bus: ti-sysc: Fix clock handling for no-idle quirks
    
    Link: https://lore.kernel.org/r/pull-1568819401-72461@atomide.com
    Signed-off-by: Olof Johansson <olof@lixom.net>
    olofj committed Sep 29, 2019
  2. Merge tag 'scmi-fixes-5.4' of git://git.kernel.org/pub/scm/linux/kern…

    …el/git/sudeep.holla/linux into arm/fixes
    
    ARM SCMI fixes for v5.4
    
    A couple of fixes: one in the scmi reset driver to initialise a missed
    scmi handle, and another in the scmi reset API implementation fixing
    the assignment of reset state.
    
    * tag 'scmi-fixes-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
      reset: reset-scmi: add missing handle initialisation
      firmware: arm_scmi: reset: fix reset_state assignment in scmi_domain_reset
    
    Link: https://lore.kernel.org/r/20190918142139.GA4370@bogus
    Signed-off-by: Olof Johansson <olof@lixom.net>
    olofj committed Sep 29, 2019
  3. Merge tag 'libnvdimm-fixes-5.4-rc1' of git://git.kernel.org/pub/scm/l…

    …inux/kernel/git/nvdimm/nvdimm
    
    More libnvdimm updates from Dan Williams:
    
     - Complete the reworks to interoperate with powerpc dynamic huge page
       sizes
    
     - Fix a crash due to missed accounting for the powerpc 'struct
       page'-memmap mapping granularity
    
     - Fix badblock initialization for volatile (DRAM emulated) pmem ranges
    
     - Stop triggering request_key() notifications to userspace when
       NVDIMM-security is disabled / not present
    
     - Miscellaneous small fixups
    
    * tag 'libnvdimm-fixes-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
      libnvdimm/region: Enable MAP_SYNC for volatile regions
      libnvdimm: prevent nvdimm from requesting key when security is disabled
      libnvdimm/region: Initialize bad block for volatile namespaces
      libnvdimm/nfit_test: Fix acpi_handle redefinition
      libnvdimm/altmap: Track namespace boundaries in altmap
      libnvdimm: Fix endian conversion issues 
      libnvdimm/dax: Pick the right alignment default when creating dax devices
      powerpc/book3s64: Export has_transparent_hugepage() related functions.
    torvalds committed Sep 29, 2019
  4. Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git…

    …/evalenti/linux-soc-thermal
    
    Pull thermal SoC updates from Eduardo Valentin:
     "This is a really small pull in the midst of a lot of pending patches.
    
      We are in the middle of restructuring how we are maintaining the
      thermal subsystem, as per discussion in our last LPC. For now, I am
      sending just some changes that were pending in my tree. Looking
      forward to get a more streamlined process in the next merge window"
    
    * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
      thermal: db8500: Rewrite to be a pure OF sensor
      thermal: db8500: Use dev helper variable
      thermal: db8500: Finalize device tree conversion
      thermal: thermal_mmio: remove some dead code
    torvalds committed Sep 29, 2019
  5. Merge branch 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/ker…

    …nel/git/wsa/linux
    
    Pull more i2c updates from Wolfram Sang:
    
     - make Lenovo Yoga C630 boot now that the dependencies are merged
    
     - restore BlockProcessCall for i801, accidentally removed in this merge
       window
    
     - a bugfix for the riic driver
    
     - an improvement to the slave-eeprom driver which should have been in
       the first pull request but sadly got lost in the process
    
    * 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
      i2c: slave-eeprom: Add read only mode
      i2c: i801: Bring back Block Process Call support for certain platforms
      i2c: riic: Clear NACK in tend isr
      i2c: qcom-geni: Disable DMA processing on the Lenovo Yoga C630
    torvalds committed Sep 29, 2019
  6. Merge tag 'iommu-fixes-5.4-rc1' of git://git.kernel.org/pub/scm/linux…

    …/kernel/git/joro/iommu
    
    Pull iommu fixes from Joerg Roedel:
     "A couple of fixes for the AMD IOMMU driver have piled up:
    
       - Some fixes for the reworked IO page-table which caused memory leaks
         or did not allow downgrading mappings under some conditions.
    
       - Locking fixes to fix a couple of possible races around accessing
         'struct protection_domain'. The races got introduced when the
         dma-ops path became lock-less in the fast-path"
    
    * tag 'iommu-fixes-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
      iommu/amd: Lock code paths traversing protection_domain->dev_list
      iommu/amd: Lock dev_data in attach/detach code paths
      iommu/amd: Check for busy devices earlier in attach_device()
      iommu/amd: Take domain->lock for complete attach/detach path
      iommu/amd: Remove amd_iommu_devtable_lock
      iommu/amd: Remove domain->updated
      iommu/amd: Wait for completion of IOTLB flush in attach_device
      iommu/amd: Unmap all L7 PTEs when downgrading page-sizes
      iommu/amd: Introduce first_pte_l7() helper
      iommu/amd: Fix downgrading default page-sizes in alloc_pte()
      iommu/amd: Fix pages leak in free_pagetable()
    torvalds committed Sep 29, 2019
  7. Documentation/process: Clarify disclosure rules

    The role of the contact list provided by the disclosing party and how it
    affects the disclosure process and the ability to include experts into
    the development process is not really well explained.
    
    Neither is it entirely clear when the disclosing party will be informed
    about the fact that a developer who is not covered by an employer NDA needs
    to be brought in and disclosed.
    
    Explain the role of the contact list and the information policy along with
    an eventual conflict resolution better.
    
    Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
    Link: https://lore.kernel.org/r/alpine.DEB.2.21.1909251028390.10825@nanos.tec.linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Thomas Gleixner authored and gregkh committed Sep 29, 2019
  8. Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

    Pull networking fixes from David Miller:
    
     1) Sanity check URB networking device parameters to avoid divide by
        zero, from Oliver Neukum.
    
     2) Disable global multicast filter in NCSI, otherwise LLDP and IPV6
        don't work properly. Longer term this needs a better fix tho. From
        Vijay Khemka.
    
     3) Small fixes to selftests (use ping when ping6 is not present, etc.)
        from David Ahern.
    
     4) Bring back rt_uses_gateway member of struct rtable, its semantics
        were not well understood and trying to remove it broke things. From
        David Ahern.
    
     5) Move usbnet sanity checking, ignore endpoints with invalid
        wMaxPacketSize. From Bjørn Mork.
    
     6) Missing Kconfig deps for sja1105 driver, from Mao Wenan.
    
     7) Various small fixes to the mlx5 DR steering code, from Alaa Hleihel,
        Alex Vesker, and Yevgeny Kliteynik
    
     8) Missing CAP_NET_RAW checks in various places, from Ori Nimron.
    
     9) Fix crash when removing sch_cbs entry while offloading is enabled,
        from Vinicius Costa Gomes.
    
    10) Signedness bug fixes, generally in looking at the result given by
        of_get_phy_mode() and friends. From Dan Carpenter.
    
    11) Disable preemption around BPF_PROG_RUN() calls, from Eric Dumazet.
    
    12) Don't create VRF ipv6 rules if ipv6 is disabled, from David Ahern.
    
    13) Fix quantization code in tcp_bbr, from Kevin Yang.
    
    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (127 commits)
      net: tap: clean up an indentation issue
      nfp: abm: fix memory leak in nfp_abm_u32_knode_replace
      tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state
      sk_buff: drop all skb extensions on free and skb scrubbing
      tcp_bbr: fix quantization code to not raise cwnd if not probing bandwidth
      mlxsw: spectrum_flower: Fail in case user specifies multiple mirror actions
      Documentation: Clarify trap's description
      mlxsw: spectrum: Clear VLAN filters during port initialization
      net: ena: clean up indentation issue
      NFC: st95hf: clean up indentation issue
      net: phy: micrel: add Asym Pause workaround for KSZ9021
      net: socionext: ave: Avoid using netdev_err() before calling register_netdev()
      ptp: correctly disable flags on old ioctls
      lib: dimlib: fix help text typos
      net: dsa: microchip: Always set regmap stride to 1
      nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs
      nfp: flower: prevent memory leak in nfp_flower_spawn_phy_reprs
      net/sched: Set default of CONFIG_NET_TC_SKB_EXT to N
      vrf: Do not attempt to create IPv6 mcast rule if IPv6 is disabled
      net: sched: sch_sfb: don't call qdisc_put() while holding tree lock
      ...
    torvalds committed Sep 29, 2019

Commits on Sep 28, 2019

  1. Merge branch 'hugepage-fallbacks' (hugepage patches from David Rient…

    …jes)
    
    Merge hugepage allocation updates from David Rientjes:
     "We (mostly Linus, Andrea, and myself) have been discussing offlist how
      to implement a sane default allocation strategy for hugepages on NUMA
      platforms.
    
      With these reverts in place, the page allocator will happily allocate
      a remote hugepage immediately rather than try to make a local hugepage
      available. This incurs a substantial performance degradation when
      memory compaction would have otherwise made a local hugepage
      available.
    
      This series reverts those reverts and attempts to propose a more sane
      default allocation strategy specifically for hugepages. Andrea
      acknowledges this is likely to fix the swap storms that he originally
      reported that resulted in the patches that removed __GFP_THISNODE from
      hugepage allocations.
    
      The immediate goal is to return 5.3 to the behavior the kernel has
      implemented over the past several years so that remote hugepages are
      not immediately allocated when local hugepages could have been made
      available because the increased access latency is untenable.
    
      The next goal is to introduce a sane default allocation strategy for
      hugepages allocations in general regardless of the configuration of
      the system so that we prevent thrashing of local memory when
      compaction is unlikely to succeed and can prefer remote hugepages over
      remote native pages when the local node is low on memory."
    
    Note on timing: this reverts the hugepage VM behavior changes that got
    introduced fairly late in the 5.3 cycle, and that fixed a huge
    performance regression for certain loads that had been around since
    4.18.
    
    Andrea had this note:
    
     "The regression of 4.18 was that it was taking hours to start a VM
      where 3.10 was only taking a few seconds, I reported all the details
      on lkml when it was finally tracked down in August 2018.
    
         https://lore.kernel.org/linux-mm/20180820032640.9896-2-aarcange@redhat.com/
    
      __GFP_THISNODE in MADV_HUGEPAGE made the above enterprise vfio
      workload degrade like in the "current upstream" above. And it still
      would have been that bad as above until 5.3-rc5"
    
    where the bad behavior ends up happening as you fill up a local node,
    and without that change, you'd get into the nasty swap storm behavior
    due to compaction working overtime to make room for more memory on the
    nodes.
    
    As a result 5.3 got the two performance fix reverts in rc5.
    
    However, David Rientjes then noted that those performance fixes in turn
    regressed performance for other loads - although not quite to the same
    degree.  He suggested reverting the reverts and instead replacing them
    with two small changes to how hugepage allocations are done (patch
    descriptions rephrased by me):
    
     - "avoid expensive reclaim when compaction may not succeed": just admit
       that the allocation failed when you're trying to allocate a huge-page
       and compaction wasn't successful.
    
     - "allow hugepage fallback to remote nodes when madvised": when that
       node-local huge-page allocation failed, retry without forcing the
       local node.
    
    but by then I judged it too late to replace the fixes for a 5.3 release.
    So 5.3 was released with behavior that harked back to the pre-4.18 logic.
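
    The two rephrased changes above amount to a short decision procedure.
    The sketch below is a simplified model, not the page allocator: the
    struct, enum, and function names are hypothetical, and whether a
    hugepage "is available" on a node is reduced to a boolean that
    assumes compaction either succeeded cheaply or did not.

```c
#include <stdbool.h>

/* Hypothetical view of the system at allocation time. */
struct node_state {
    bool local_hugepage_available;   /* local compaction succeeded */
    bool remote_hugepage_available;  /* some other node has a hugepage */
};

enum hugepage_outcome {
    HUGEPAGE_LOCAL,   /* got a hugepage on the local node */
    HUGEPAGE_REMOTE,  /* fell back to a remote node */
    HUGEPAGE_FAILED,  /* give up; caller falls back to native pages */
};

static enum hugepage_outcome
alloc_hugepage_model(const struct node_state *s, bool madvised)
{
    /* First attempt: local node only, no expensive reclaim. */
    if (s->local_hugepage_available)
        return HUGEPAGE_LOCAL;
    /* When madvised, retry without forcing the local node. */
    if (madvised && s->remote_hugepage_available)
        return HUGEPAGE_REMOTE;
    /* Otherwise just admit that the hugepage allocation failed. */
    return HUGEPAGE_FAILED;
}
```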
    
    But now we're in the merge window for 5.4, and we can see if this
    alternate model fixes not just the horrendous swap storm behavior, but
    also restores the performance regression that the late reverts caused.
    
    Fingers crossed.
    
    * emailed patches from David Rientjes <rientjes@google.com>:
      mm, page_alloc: allow hugepage fallback to remote nodes when madvised
      mm, page_alloc: avoid expensive reclaim when compaction may not succeed
      Revert "Revert "Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask""
      Revert "Revert "mm, thp: restore node-local hugepage allocations""
    torvalds committed Sep 28, 2019
  2. selftests/ftrace: Fix same probe error test

    The "same probe" selftest is supposed to check that adding the same probe
    fails, but it doesn't actually add the same probe, so the add succeeds
    and the test fails.
    
    Fixes: b78b94b ("selftests/ftrace: Update kprobe event error testcase")
    Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    rostedt committed Sep 28, 2019
  3. mm, tracing: Print symbol name for call_site in trace events

    To improve the readability of raw slab trace points, print the call_site ip
    using '%pS'. Then we can grep events with function names.
    
    [002] ....   808.188897: kmem_cache_free: call_site=putname+0x47/0x50 ptr=00000000cef40c80
    [002] ....   808.188898: kfree: call_site=security_cred_free+0x42/0x50 ptr=0000000062400820
    [002] ....   808.188904: kmem_cache_free: call_site=put_cred_rcu+0x88/0xa0 ptr=0000000058d74ef8
    [002] ....   808.188913: kmem_cache_alloc: call_site=prepare_creds+0x26/0x100 ptr=0000000058d74ef8 bytes_req=168 bytes_alloc=576 gfp_flags=GFP_KERNEL
    [002] ....   808.188917: kmalloc: call_site=security_prepare_creds+0x77/0xa0 ptr=0000000062400820 bytes_req=8 bytes_alloc=336 gfp_flags=GFP_KERNEL|__GFP_ZERO
    [002] ....   808.188920: kmem_cache_alloc: call_site=getname_flags+0x4f/0x1e0 ptr=00000000cef40c80 bytes_req=4096 bytes_alloc=4480 gfp_flags=GFP_KERNEL
    [002] ....   808.188925: kmem_cache_free: call_site=putname+0x47/0x50 ptr=00000000cef40c80
    [002] ....   808.188926: kfree: call_site=security_cred_free+0x42/0x50 ptr=0000000062400820
    [002] ....   808.188931: kmem_cache_free: call_site=put_cred_rcu+0x88/0xa0 ptr=0000000058d74ef8
    
    Link: http://lkml.kernel.org/r/20190914103215.23301-1-changbin.du@gmail.com
    
    Signed-off-by: Changbin Du <changbin.du@gmail.com>
    Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    changbindu authored and rostedt committed Sep 28, 2019
  4. tracing: Have error path in predicate_parse() free its allocated memory

    In predicate_parse, there is an error path that does not go to
    out_free; instead it returns directly, which leads to a memory leak.
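
    As an illustration of the pattern only (this is not the kernel's
    predicate_parse(); parse_sketch(), tracked_alloc(), tracked_free() and
    live_allocs are hypothetical names made up for this sketch), every
    error path taken after an allocation jumps to a single out_free label:

```c
#include <stdlib.h>
#include <string.h>

static int live_allocs;	/* hypothetical counter of outstanding buffers */

static void *tracked_alloc(size_t n)
{
	void *p = calloc(1, n);

	if (p)
		live_allocs++;
	return p;
}

static void tracked_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/*
 * Counts '&'-separated tokens and rejects empty ones.
 * Returns the token count, or -1 on a parse error.
 */
static int parse_sketch(const char *str)
{
	size_t *tok_len = NULL;
	char *copy = NULL;
	char *p;
	int count = 0;
	int ret = -1;

	tok_len = tracked_alloc((strlen(str) + 1) * sizeof(*tok_len));
	if (!tok_len)
		return -1;	/* nothing allocated yet, direct return is fine */

	copy = tracked_alloc(strlen(str) + 1);
	if (!copy)
		goto out_free;
	strcpy(copy, str);

	p = copy;
	for (;;) {
		size_t len = strcspn(p, "&");

		if (len == 0)
			goto out_free;	/* a bare "return -1" here would leak */
		tok_len[count++] = len;
		if (p[len] == '\0')
			break;
		p += len + 1;
	}
	ret = count;

out_free:
	tracked_free(copy);
	tracked_free(tok_len);
	return ret;
}
```

    With this shape, live_allocs is back to zero after both a successful
    parse and a rejected input such as "a&&c".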
    
    Link: http://lkml.kernel.org/r/20190920225800.3870-1-navid.emamdoost@gmail.com
    
    Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
    Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    Navidem authored and rostedt committed Sep 28, 2019
  5. tracing: Fix clang -Wint-in-bool-context warnings in IF_ASSIGN macro

    After r372664 in clang, the IF_ASSIGN macro causes a couple hundred
    warnings along the lines of:
    
    kernel/trace/trace_output.c:1331:2: warning: converting the enum
    constant to a boolean [-Wint-in-bool-context]
    kernel/trace/trace.h:409:3: note: expanded from macro
    'trace_assign_type'
                    IF_ASSIGN(var, ent, struct ftrace_graph_ret_entry,
                    ^
    kernel/trace/trace.h:371:14: note: expanded from macro 'IF_ASSIGN'
                    WARN_ON(id && (entry)->type != id);     \
                               ^
    264 warnings generated.
    
    This warning can catch issues with constructs like:
    
        if (state == A || B)
    
    where the developer really meant:
    
        if (state == A || state == B)
    
    This is currently the only occurrence of the warning in the kernel
    tree across defconfig, allyesconfig, allmodconfig for arm32, arm64,
    and x86_64. Add the implicit '!= 0' to the WARN_ON statement to fix
    the warnings and find potential issues in the future.
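
    A small userspace sketch of the two forms discussed above. The enum,
    the TYPE_MISMATCH() macro and is_fn_or_graph_ret() are hypothetical
    names for illustration and are not the kernel's IF_ASSIGN definition;
    the macro only mirrors the idea of spelling out '!= 0' on the id:

```c
#include <stdbool.h>

/* Hypothetical entry type ids; 0 means "no specific type expected". */
enum entry_type_sketch { TYPE_ANY = 0, TYPE_FN = 1, TYPE_GRAPH_RET = 2 };

/*
 * The id is compared against 0 explicitly instead of being used as a
 * boolean, which is the construct the warning complains about.
 */
#define TYPE_MISMATCH(actual, id) ((id) != 0 && (actual) != (id))

/* What "if (state == A || B)" was most likely meant to express. */
static bool is_fn_or_graph_ret(enum entry_type_sketch t)
{
	return t == TYPE_FN || t == TYPE_GRAPH_RET;
}
```

    An id of 0 never reports a mismatch, and a non-zero id reports one
    only when the actual type differs.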
    
    Link: llvm/llvm-project@28b38c2
    Link: ClangBuiltLinux#686
    Link: http://lkml.kernel.org/r/20190926162258.466321-1-natechancellor@gmail.com
    
    Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
    Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
    Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    nathanchance authored and rostedt committed Sep 28, 2019
  6. tracing/probe: Fix to check the difference of nr_args before adding probe

    Steven reported that a test triggered:
    
    ==================================================================
     BUG: KASAN: slab-out-of-bounds in trace_kprobe_create+0xa9e/0xe40
     Read of size 8 at addr ffff8880c4f25a48 by task ftracetest/4798
    
     CPU: 2 PID: 4798 Comm: ftracetest Not tainted 5.3.0-rc6-test+ #30
     Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
     Call Trace:
      dump_stack+0x7c/0xc0
      ? trace_kprobe_create+0xa9e/0xe40
      print_address_description+0x6c/0x332
      ? trace_kprobe_create+0xa9e/0xe40
      ? trace_kprobe_create+0xa9e/0xe40
      __kasan_report.cold.6+0x1a/0x3b
      ? trace_kprobe_create+0xa9e/0xe40
      kasan_report+0xe/0x12
      trace_kprobe_create+0xa9e/0xe40
      ? print_kprobe_event+0x280/0x280
      ? match_held_lock+0x1b/0x240
      ? find_held_lock+0xac/0xd0
      ? fs_reclaim_release.part.112+0x5/0x20
      ? lock_downgrade+0x350/0x350
      ? kasan_unpoison_shadow+0x30/0x40
      ? __kasan_kmalloc.constprop.6+0xc1/0xd0
      ? trace_kprobe_create+0xe40/0xe40
      ? trace_kprobe_create+0xe40/0xe40
      create_or_delete_trace_kprobe+0x2e/0x60
      trace_run_command+0xc3/0xe0
      ? trace_panic_handler+0x20/0x20
      ? kasan_unpoison_shadow+0x30/0x40
      trace_parse_run_command+0xdc/0x163
      vfs_write+0xe1/0x240
      ksys_write+0xba/0x150
      ? __ia32_sys_read+0x50/0x50
      ? tracer_hardirqs_on+0x61/0x180
      ? trace_hardirqs_off_caller+0x43/0x110
      ? mark_held_locks+0x29/0xa0
      ? do_syscall_64+0x14/0x260
      do_syscall_64+0x68/0x260
    
    Fix this by checking for a difference in nr_args before adding a probe
    to existing probes. This may also set the error log index beyond the
    number of command parameters; in that case, the error position is set
    just past the last parameter.
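
    The general idea, as a userspace sketch rather than the actual kernel
    change (struct probe_sketch and same_probe_args() are hypothetical
    names): compare the argument counts first, so the per-argument loop
    never indexes past the end of the shorter array.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a probe and its argument list. */
struct probe_sketch {
	int nr_args;
	const char **args;
};

static bool same_probe_args(const struct probe_sketch *a,
			    const struct probe_sketch *b)
{
	int i;

	/* Different counts: the probes differ, and the loop would overrun. */
	if (a->nr_args != b->nr_args)
		return false;

	for (i = 0; i < a->nr_args; i++) {
		if (strcmp(a->args[i], b->args[i]) != 0)
			return false;
	}
	return true;
}
```

    Without the early count check, comparing a three-argument probe
    against a two-argument one would read a third element that does not
    exist, which is the kind of out-of-bounds read KASAN reports above.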
    
    Link: http://lkml.kernel.org/r/156966474783.3478.13217501608215769150.stgit@devnote2
    
    Fixes: ca89bc0 ("tracing/kprobe: Add multi-probe per event support")
    Reported-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
    Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    mhiramat authored and rostedt committed Sep 28, 2019
  7. mm, page_alloc: allow hugepage fallback to remote nodes when madvised

    For systems configured to always try hard to allocate transparent
    hugepages (thp defrag setting of "always") or for memory that has been
    explicitly madvised to MADV_HUGEPAGE, it is often better to fall back to
    remote memory to allocate the hugepage if the local allocation fails
    first.
    
    The point is to allow the initial call to __alloc_pages_node() to attempt
    to defragment local memory to make a hugepage available, if possible,
    rather than immediately falling back to remote memory.  Local hugepages will
    always have a better access latency than remote (huge)pages, so an attempt
    to make a hugepage available locally is always preferred.
    
    If memory compaction cannot be successful locally, however, it is likely
    better to fall back to remote memory.  This could take on two forms: either
    allow immediate fallback to remote memory or do per-zone watermark checks.
    It would be possible to fall back only when per-zone watermarks fail for
    order-0 memory, since that would require local reclaim for all subsequent
    faults so remote huge allocation is likely better than thrashing the local
    zone for large workloads.
    
    In this case, it is assumed that because the system is configured to try
    hard to allocate hugepages, or the vma has been explicitly advised to try
    hard for hugepages, remote allocation is better when local allocation
    and memory compaction have both failed.
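
    The resulting order of attempts can be modelled roughly as follows.
    This is a simplified userspace model, not the page allocator code: the
    struct, the enum and alloc_hugepage_sketch() are hypothetical, and a
    real allocation would go through __alloc_pages_node() with the
    appropriate gfp flags.

```c
#include <stdbool.h>

enum alloc_result_sketch { ALLOC_FAILED, ALLOC_LOCAL, ALLOC_REMOTE };

/* Hypothetical snapshot of what each attempt would find. */
struct node_state_sketch {
	bool local_hugepage_possible;	/* local attempt, incl. compaction */
	bool remote_hugepage_possible;	/* attempt without node restriction */
};

/*
 * try_hard models "thp defrag set to always" or a MADV_HUGEPAGE vma.
 */
static enum alloc_result_sketch
alloc_hugepage_sketch(const struct node_state_sketch *st, bool try_hard)
{
	/* First attempt stays on the local node. */
	if (st->local_hugepage_possible)
		return ALLOC_LOCAL;

	/* Local allocation and compaction failed: retry anywhere. */
	if (try_hard && st->remote_hugepage_possible)
		return ALLOC_REMOTE;

	return ALLOC_FAILED;
}
```

    A local hugepage is always preferred when available; the remote retry
    only happens for callers that asked to try hard.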
    
    Signed-off-by: David Rientjes <rientjes@google.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    David Rientjes authored and torvalds committed Sep 28, 2019
  8. mm, page_alloc: avoid expensive reclaim when compaction may not succeed

    Memory compaction has several significant drawbacks as the allocation
    order increases, specifically:
    
     - isolate_freepages() is responsible for finding free pages to use as
       migration targets and is implemented as a linear scan of memory
       starting at the end of a zone,
    
     - failing order-0 watermark checks in memory compaction does not account
       for how far below the watermarks the zone actually is: to enable
       migration, there must be *some* free memory available.  Per the above,
       watermarks are not always sufficient if isolate_freepages() cannot
       find the free memory but it could require hundreds of MBs of reclaim to
       even reach this threshold (read: potentially very expensive reclaim with
       no indication compaction can be successful), and
    
     - if compaction at this order has failed recently so that it does not even
       run as a result of deferred compaction, looping through reclaim can often
       be pointless.
    
    For hugepage allocations, these are quite substantial drawbacks because
    these are very high order allocations (order-9 on x86) and falling back to
    doing reclaim can potentially be *very* expensive without any indication
    that compaction would even be successful.
    
    Reclaim itself is unlikely to free entire pageblocks and certainly no
    reliance should be put on it to do so in isolation (recall lumpy reclaim).
    This means we should avoid reclaim and simply fail hugepage allocation if
    compaction is deferred.
    
    It is also not helpful to thrash a zone by doing excessive reclaim if
    compaction may not be able to access that memory.  If order-0 watermarks
    fail and the allocation order is sufficiently large, it is likely better
    to fail the allocation rather than thrashing the zone.
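
    The decision described in the last two paragraphs could be sketched
    like this. It is a simplified model with hypothetical names, not the
    allocator slowpath itself, and the order threshold of 9 is an
    assumption taken from the "order-9 on x86" remark above:

```c
#include <stdbool.h>

/* Hypothetical outcomes of a compaction attempt. */
enum compact_outcome_sketch {
	COMPACT_SKETCH_SUCCESS,
	COMPACT_SKETCH_DEFERRED,	/* failed recently, did not even run */
	COMPACT_SKETCH_WATERMARK_FAIL,	/* order-0 watermarks not met */
	COMPACT_SKETCH_OTHER_FAILURE,
};

#define HUGEPAGE_ORDER_SKETCH 9	/* assumed "sufficiently large" order */

/*
 * Returns true when the allocation should simply fail instead of
 * looping through reclaim that compaction may not be able to use.
 */
static bool fail_without_reclaim_sketch(unsigned int order,
					enum compact_outcome_sketch outcome)
{
	if (order < HUGEPAGE_ORDER_SKETCH)
		return false;

	return outcome == COMPACT_SKETCH_DEFERRED ||
	       outcome == COMPACT_SKETCH_WATERMARK_FAIL;
}
```

    Smaller orders and other failure modes still fall through to the
    usual reclaim-and-retry behavior in this model.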
    
    Signed-off-by: David Rientjes <rientjes@google.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    David Rientjes authored and torvalds committed Sep 28, 2019