Yafang-Shao/bp…

Commits on Feb 11, 2022

  1. bpf: Show the used maps of each prog in progs.debug

    We can get bpf maps via maps.debug, and bpf progs via progs.debug, but
    we don't know the relationship between these progs and maps. Let's show
    the used maps of each prog in progs.debug.
    
    The result is as follows:
    
    $ cat progs.debug
      id name             attached         pinned
      19 dump_bpf_map     bpf_iter_bpf_map
        maps: 14 13
      21 dump_bpf_prog    bpf_iter_bpf_prog
        maps: 14 13
      36                                   /var/run/pnc/bpf/tc/prog/cls-ingress
        maps: 26 35 27 45 44 43 34 36 41 37 38 42
      37                                   /var/run/pnc/bpf/tc/prog/cls-egress
        maps: 26 35 27 32 45 44 28 43 34 36 41 37 38 42
      38                                   /var/run/pnc/bpf/tc/prog/connect4
        maps: 26 35 34 30 29 32 28
      39                                   /var/run/pnc/bpf/tc/prog/sockops
        maps: 35 32 28 33 31
      40                                   /var/run/pnc/bpf/tc/prog/skmsg
        maps: 35 33 31
    
    Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
    laoar authored and intel-lab-lkp committed Feb 11, 2022
  2. bpf: Show pinned file name in {progs, maps}.debug

    If the progs or maps are pinned into bpffs, the pinned file name can be
    displayed in progs.debug or maps.debug. It will help us to maintain the
    lifecycle of pinned maps and progs.
    
    The result is as follows:
    
    $ cat progs.debug
      id name             attached         pinned
      19 dump_bpf_map     bpf_iter_bpf_map
      21 dump_bpf_prog    bpf_iter_bpf_prog
      22                                   /var/run/pnc/bpf/tc/prog/cls-ingress
      23                                   /var/run/pnc/bpf/tc/prog/cls-egress
      24                                   /var/run/pnc/bpf/tc/prog/connect4
      25                                   /var/run/pnc/bpf/tc/prog/sockops
      26                                   /var/run/pnc/bpf/tc/prog/skmsg
    
    $ cat maps.debug
      id name             max_entries  pinned
      13 iterator.rodata  1
      14 .rodata.str1.1   1
      16                  32           /var/run/pnc/bpf//tc//globals/ctrl
      17                  256          /var/run/pnc/bpf//tc//globals/events
      18                  102400       /var/run/pnc/bpf//tc//globals/egw_stat
      19                  1024         /var/run/pnc/bpf//tc//globals/egw_config
      20                  1024         /var/run/pnc/bpf//tc//globals/egw_mapping
      21                  102400       /var/run/pnc/bpf//tc//globals/egw_sock_info
      22                  0            /var/run/pnc/bpf//tc//globals/egw_sk_storage
      23                  102400       /var/run/pnc/bpf//tc//globals/egw_sock
      24                  1048576      /var/run/pnc/bpf//tc//globals/ip_identity
      25                  1024         /var/run/pnc/bpf//tc//globals/metrics
      26                  1024         /var/run/pnc/bpf//tc//globals/ns_ctrl
      27                  262144       /var/run/pnc/bpf//tc//globals/identity_ctrl
      28                  4194304      /var/run/pnc/bpf//tc//globals/rule
      29                  16384        /var/run/pnc/bpf//tc//globals/rgroup_rule
      30                  1048576      /var/run/pnc/bpf//tc//globals/rgroup_ref
      31                  102400       /var/run/pnc/bpf//tc//globals/policy_stats
      32                  102400       /var/run/pnc/bpf//tc//globals/lazy_record
      33                  102400       /var/run/pnc/bpf//tc//globals/flow_stats
      34                  1024         /var/run/pnc/bpf//tc//globals/fs_filter_raddr
      35                  1024         /var/run/pnc/bpf//tc//globals/fs_filter_l4addr
    
    The pinned name in these two files will disappear if the pinned file is removed.
    
    Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
    laoar authored and intel-lab-lkp committed Feb 11, 2022
  3. bpf: Add pin_name into struct bpf_map

    A new member pin_name is added to struct bpf_map, which will be set when
    the map is pinned and cleared when the pinned file is removed.
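The lifecycle of the pin_name member described here (set on pin, cleared when the pinned file is removed, shown in the debug listing) can be modelled in plain userspace C. This is a minimal sketch, not kernel code: `struct obj`, `obj_pin()`, `obj_unpin()`, `obj_show()` and `PIN_NAME_LEN` are hypothetical stand-ins for `struct bpf_map` / `struct bpf_prog_aux` and the bpffs pin/unlink paths, and the model ignores the locking a real implementation would need.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct bpf_map / struct bpf_prog_aux. */
#define PIN_NAME_LEN 256 /* assumed size, not taken from the patch */

struct obj {
    unsigned int id;
    char pin_name[PIN_NAME_LEN]; /* empty string means "not pinned" */
};

/* Record the bpffs path when the object is pinned (truncates if too long). */
static void obj_pin(struct obj *o, const char *path)
{
    snprintf(o->pin_name, sizeof(o->pin_name), "%s", path);
}

/* Forget the path when the pinned file is removed. */
static void obj_unpin(struct obj *o)
{
    o->pin_name[0] = '\0';
}

/* Format one row of a progs.debug/maps.debug-style listing into buf. */
static int obj_show(const struct obj *o, char *buf, size_t len)
{
    return snprintf(buf, len, "%4u %s", o->id, o->pin_name);
}
```

After `obj_unpin()` the row still prints the id but the pinned column is empty, mirroring how the name disappears from the debug files once the pinned file is removed.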
    
    Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
    laoar authored and intel-lab-lkp committed Feb 11, 2022
  4. bpf: Add pin_name into struct bpf_prog_aux

    A new member pin_name is added to struct bpf_prog_aux, which will be
    set when the prog is pinned and cleared when the pinned file is removed.
    
    Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
    laoar authored and intel-lab-lkp committed Feb 11, 2022

Commits on Feb 4, 2022

  1. ixgbevf: Require large buffers for build_skb on 82599VF

    From 4.17 onwards the ixgbevf driver uses build_skb() to build an skb
    around new data in the page buffer shared with the ixgbe PF.
    This uses either a 2K or 3K buffer, and offsets the DMA mapping by
    NET_SKB_PAD + NET_IP_ALIGN. When using a smaller buffer RXDCTL is set to
    ensure the PF does not write a full 2K bytes into the buffer, which is
    actually 2K minus the offset.
    
    However on the 82599 virtual function, the RXDCTL mechanism is not
    available. The driver attempts to work around this by using the SET_LPE
    mailbox method to lower the maximum frame size, but the ixgbe PF driver
    ignores this in order to keep the PF and all VFs in sync[0].
    
    This means the PF will write up to the full 2K set in SRRCTL, causing it
    to write NET_SKB_PAD + NET_IP_ALIGN bytes past the end of the buffer.
    With 4K pages split into two buffers, this means it either writes
    NET_SKB_PAD + NET_IP_ALIGN bytes past the first buffer (and into the
    second), or NET_SKB_PAD + NET_IP_ALIGN bytes past the end of the DMA
    mapping.
    
    Avoid this by only enabling build_skb when using "large" buffers (3K).
    These are placed in each half of an order-1 page, preventing the PF from
    writing past the end of the mapping.
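The overrun can be checked with simple arithmetic. The sketch below assumes 4K pages, NET_SKB_PAD = 64 and NET_IP_ALIGN = 2 (both are architecture dependent; on x86 NET_IP_ALIGN is 0), and assumes the PF writes at most the buffer size programmed in SRRCTL (2K or 3K) starting at the offset. The helper names are hypothetical.

```c
#include <stddef.h>

/* Assumed values: both are architecture/config dependent. */
#define PAGE_SZ      4096u
#define NET_SKB_PAD  64u
#define NET_IP_ALIGN 2u
#define RX_OFFSET    (NET_SKB_PAD + NET_IP_ALIGN)

/*
 * Bytes written past the end of a buffer slot when the hardware writes
 * 'hw_len' bytes starting 'offset' bytes into a slot of 'slot_len' bytes.
 */
static size_t overrun(size_t slot_len, size_t offset, size_t hw_len)
{
    size_t end = offset + hw_len;
    return end > slot_len ? end - slot_len : 0;
}

/* Small buffers: a 4K page split into two 2K slots, PF writes a full 2K. */
static size_t small_buffer_overrun(void)
{
    return overrun(PAGE_SZ / 2, RX_OFFSET, 2048);
}

/* Large buffers: an order-1 (8K) page split into two 4K slots, PF writes 3K. */
static size_t large_buffer_overrun(void)
{
    return overrun((2 * PAGE_SZ) / 2, RX_OFFSET, 3072);
}
```

With these assumed values the small-buffer layout overruns by exactly NET_SKB_PAD + NET_IP_ALIGN (66 bytes), while the large-buffer layout stays inside its half of the order-1 page.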
    
    [0]: Technically it only ever raises the max frame size, see
    ixgbe_set_vf_lpe() in ixgbe_sriov.c
    
    Fixes: f15c5ba ("ixgbevf: add support for using order 1 pages to receive large frames")
    Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
    Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    sam-aws authored and davem330 committed Feb 4, 2022
  2. net: sparx5: Fix get_stat64 crash in tcpdump

    This problem was found with Sparx5 when the tcpdump tool requests the
    do_get_stats64 (sparx5_get_stats64) statistic.
    
    The portstats pointer was incorrectly incremented when fetching priority
    based statistics.
    
    Fixes: af4b110 (net: sparx5: add ethtool configuration and statistics support)
    Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
    Link: https://lore.kernel.org/r/20220203102900.528987-1-steen.hegelund@microchip.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    steen-hegelund-mchp authored and Jakub Kicinski committed Feb 4, 2022
  3. gcc-plugins/stackleak: Use noinstr in favor of notrace

    While the stackleak plugin was already using notrace, objtool is now a
    bit more picky.  Update the notrace uses to noinstr.  Silences the
    following objtool warnings when building with:
    
    CONFIG_DEBUG_ENTRY=y
    CONFIG_STACK_VALIDATION=y
    CONFIG_VMLINUX_VALIDATION=y
    CONFIG_GCC_PLUGIN_STACKLEAK=y
    
      vmlinux.o: warning: objtool: do_syscall_64()+0x9: call to stackleak_track_stack() leaves .noinstr.text section
      vmlinux.o: warning: objtool: do_int80_syscall_32()+0x9: call to stackleak_track_stack() leaves .noinstr.text section
      vmlinux.o: warning: objtool: exc_general_protection()+0x22: call to stackleak_track_stack() leaves .noinstr.text section
      vmlinux.o: warning: objtool: fixup_bad_iret()+0x20: call to stackleak_track_stack() leaves .noinstr.text section
      vmlinux.o: warning: objtool: do_machine_check()+0x27: call to stackleak_track_stack() leaves .noinstr.text section
      vmlinux.o: warning: objtool: .text+0x5346e: call to stackleak_erase() leaves .noinstr.text section
      vmlinux.o: warning: objtool: .entry.text+0x143: call to stackleak_erase() leaves .noinstr.text section
      vmlinux.o: warning: objtool: .entry.text+0x10eb: call to stackleak_erase() leaves .noinstr.text section
      vmlinux.o: warning: objtool: .entry.text+0x17f9: call to stackleak_erase() leaves .noinstr.text section
    
    Note that the plugin's addition of calls to stackleak_track_stack() from
    noinstr functions is expected to be safe, as it isn't runtime
    instrumentation and is self-contained.
    
    Cc: Alexander Popov <alex.popov@linux.com>
    Suggested-by: Peter Zijlstra <peterz@infradead.org>
    Signed-off-by: Kees Cook <keescook@chromium.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    kees authored and torvalds committed Feb 4, 2022
  4. Merge tag 'net-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

    Pull networking fixes from Jakub Kicinski:
     "Including fixes from bpf, netfilter, and ieee802154.
    
      Current release - regressions:
    
       - Partially revert "net/smc: Add netlink net namespace support", fix
         uABI breakage
    
       - netfilter:
          - nft_ct: fix use after free when attaching zone template
          - nft_byteorder: track register operations
    
      Previous releases - regressions:
    
       - ipheth: fix EOVERFLOW in ipheth_rcvbulk_callback
    
       - phy: qca8081: fix speeds lower than 2.5Gb/s
    
       - sched: fix use-after-free in tc_new_tfilter()
    
      Previous releases - always broken:
    
       - tcp: fix mem under-charging with zerocopy sendmsg()
    
       - tcp: add missing tcp_skb_can_collapse() test in
         tcp_shift_skb_data()
    
       - neigh: do not trigger immediate probes on NUD_FAILED from
         neigh_managed_work, avoid a deadlock
    
       - bpf: use VM_MAP instead of VM_ALLOC for ringbuf, avoid KASAN
         false-positives
    
       - netfilter: nft_reject_bridge: fix for missing reply from prerouting
    
       - smc: forward wakeup to smc socket waitqueue after fallback
    
       - ieee802154:
          - return meaningful error codes from the netlink helpers
          - mcr20a: fix lifs/sifs periods
          - at86rf230, ca8210: stop leaking skbs on error paths
    
       - macsec: add missing un-offload call for NETDEV_UNREGISTER of parent
    
       - ax25: add refcount in ax25_dev to avoid UAF bugs
    
       - eth: mlx5e:
          - fix SFP module EEPROM query
          - fix broken SKB allocation in HW-GRO
          - IPsec offload: fix tunnel mode crypto for non-TCP/UDP flows
    
       - eth: amd-xgbe:
          - fix skb data length underflow
          - ensure reset of the tx_timer_active flag, avoid Tx timeouts
    
       - eth: stmmac: fix runtime pm use in stmmac_dvr_remove()
    
       - eth: e1000e: handshake with CSME starts from Alder Lake platforms"
    
    * tag 'net-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits)
      ax25: fix reference count leaks of ax25_dev
      net: stmmac: ensure PTP time register reads are consistent
      net: ipa: request IPA register values be retained
      dt-bindings: net: qcom,ipa: add optional qcom,qmp property
      tools/resolve_btfids: Do not print any commands when building silently
      bpf: Use VM_MAP instead of VM_ALLOC for ringbuf
      net, neigh: Do not trigger immediate probes on NUD_FAILED from neigh_managed_work
      tcp: add missing tcp_skb_can_collapse() test in tcp_shift_skb_data()
      net: sparx5: do not refer to skb after passing it on
      Partially revert "net/smc: Add netlink net namespace support"
      net/mlx5e: Avoid field-overflowing memcpy()
      net/mlx5e: Use struct_group() for memcpy() region
      net/mlx5e: Avoid implicit modify hdr for decap drop rule
      net/mlx5e: IPsec: Fix tunnel mode crypto offload for non TCP/UDP traffic
      net/mlx5e: IPsec: Fix crypto offload for non TCP/UDP encapsulated traffic
      net/mlx5e: Don't treat small ceil values as unlimited in HTB offload
      net/mlx5: E-Switch, Fix uninitialized variable modact
      net/mlx5e: Fix handling of wrong devices during bond netevent
      net/mlx5e: Fix broken SKB allocation in HW-GRO
      net/mlx5e: Fix wrong calculation of header index in HW_GRO
      ...
    torvalds committed Feb 4, 2022
  5. Merge tag 'selinux-pr-20220203' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

    Pull selinux fix from Paul Moore:
     "One small SELinux patch to ensure that a policy structure field is
      properly reset after freeing so that we don't inadvertently do a
      double-free on certain error conditions"
    
    * tag 'selinux-pr-20220203' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
      selinux: fix double free of cond_list on error paths
    torvalds committed Feb 4, 2022
  6. Merge tag 'linux-kselftest-fixes-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

    Pull Kselftest fixes from Shuah Khan:
     "Important fixes to several tests and documentation clarification on
      running mainline kselftest on stable releases. A few notable fixes:
    
       - fix kselftest run hang due to child processes that haven't been
         terminated. The fix signals all child processes
    
       - fix false pass/fail results from vdso_test_abi, openat2, mincore
    
       - build failures when using -j (multiple jobs) option
    
       - exec test build failure due to incorrect build rule for a run-time
         created "pipe"
    
       - zram test fixes related to interaction with zram-generator, to make
         sure the zram test coordinates device deletion with zram-generator
    
       - zram test compression ratio calculation fix and skipping
         max_comp_streams.
    
       - increasing rtc test timeout
    
       - cpufreq test to write test results to stdout, which is necessary
         on automated test systems"
    
    * tag 'linux-kselftest-fixes-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
      kselftest: Fix vdso_test_abi return status
      selftests: skip mincore.check_file_mmap when fs lacks needed support
      selftests: openat2: Skip testcases that fail with EOPNOTSUPP
      selftests: openat2: Add missing dependency in Makefile
      selftests: openat2: Print also errno in failure messages
      selftests: futex: Use variable MAKE instead of make
      selftests/exec: Remove pipe from TEST_GEN_FILES
      selftests/zram: Adapt the situation that /dev/zram0 is being used
      selftests/zram01.sh: Fix compression ratio calculation
      selftests/zram: Skip max_comp_streams interface on newer kernel
      docs/kselftest: clarify running mainline tests on stables
      kselftest: signal all child processes
      selftests: cpufreq: Write test output to stdout as well
      selftests: rtc: Increase test timeout so that all tests run
    torvalds committed Feb 4, 2022

Commits on Feb 3, 2022

  1. ax25: fix reference count leaks of ax25_dev

    The previous commit d01ffb9 ("ax25: add refcount in ax25_dev
    to avoid UAF bugs") introduces refcount into ax25_dev, but there
    are reference leak paths in ax25_ctl_ioctl(), ax25_fwd_ioctl(),
    ax25_rt_add(), ax25_rt_del() and ax25_rt_opt().
    
    This patch uses ax25_dev_put() and adjusts the position of
    ax25_addr_ax25dev() to fix reference count leaks of ax25_dev.
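The usual shape of such a fix is to take the reference once and release it on every exit path, typically through a single `goto` label. The sketch below is a userspace model under stated assumptions: `struct dev`, `dev_lookup()`, `dev_put()` and `ctl_ioctl()` are hypothetical and only mimic the get/put discipline; they are not the ax25 functions.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical refcounted device; not struct ax25_dev. */
struct dev {
    int refcount;
    int up;
};

/* Lookup takes a reference that the caller must drop. */
static struct dev *dev_lookup(struct dev *d)
{
    if (d)
        d->refcount++;
    return d;
}

static void dev_put(struct dev *d)
{
    d->refcount--;
}

/* Every return after a successful lookup goes through 'out'. */
static int ctl_ioctl(struct dev *candidate, int arg)
{
    struct dev *d = dev_lookup(candidate);
    int err = 0;

    if (!d)
        return -ENODEV;   /* no reference was taken */

    if (!d->up) {
        err = -ENETDOWN;  /* error path still drops the reference */
        goto out;
    }
    if (arg < 0) {
        err = -EINVAL;
        goto out;
    }
    /* ... do the actual work with d ... */
out:
    dev_put(d);
    return err;
}
```

Doing the lookup only after any checks that return early, and funnelling every later return through `out`, is what keeps the count balanced on both success and error paths.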
    
    Fixes: d01ffb9 ("ax25: add refcount in ax25_dev to avoid UAF bugs")
    Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
    Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
    Link: https://lore.kernel.org/r/20220203150811.42256-1-duoming@zju.edu.cn
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Duoming Zhou authored and Jakub Kicinski committed Feb 3, 2022
  2. net: stmmac: ensure PTP time register reads are consistent

    Even if protected from preemption and interrupts, a small time window
    remains when the 2 register reads could return inconsistent values,
    each time the "seconds" register changes. This could lead to an error
    of about 1 second in the reported time.
    
    Add logic to ensure the "seconds" and "nanoseconds" values are consistent.
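A common way to get a consistent pair from two separately read registers is to read "seconds", then "nanoseconds", then "seconds" again, and retry if the seconds value changed in between. The sketch below shows that pattern under assumptions: `struct fake_clock`, `read_sec()` and `read_nsec()` are hypothetical stand-ins for the MMIO register reads, with each read advancing simulated time by `step_ns`; the actual stmmac patch may be structured differently.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Simulated PTP counter: each register read takes 'step_ns' of time. */
struct fake_clock {
    uint64_t now_ns;
    uint64_t step_ns;
};

static uint32_t read_sec(struct fake_clock *c)
{
    uint32_t v = (uint32_t)(c->now_ns / NSEC_PER_SEC);
    c->now_ns += c->step_ns;
    return v;
}

static uint32_t read_nsec(struct fake_clock *c)
{
    uint32_t v = (uint32_t)(c->now_ns % NSEC_PER_SEC);
    c->now_ns += c->step_ns;
    return v;
}

/* Naive: two independent reads can straddle a seconds rollover. */
static uint64_t get_time_naive(struct fake_clock *c)
{
    uint64_t ns = read_nsec(c);
    return ns + (uint64_t)read_sec(c) * NSEC_PER_SEC;
}

/* Consistent: retry until "seconds" is stable around the ns read. */
static uint64_t get_time_consistent(struct fake_clock *c)
{
    uint32_t sec0, sec1 = read_sec(c);
    uint32_t ns;

    do {
        sec0 = sec1;
        ns = read_nsec(c);
        sec1 = read_sec(c);
    } while (sec0 != sec1);

    return (uint64_t)sec0 * NSEC_PER_SEC + ns;
}
```

Starting just before a seconds boundary, the naive reader pairs the old nanoseconds with the new seconds and is almost a full second ahead, while the retry loop returns a value matching the simulated clock.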
    
    Fixes: 92ba688 ("stmmac: add the support for PTP hw clock driver")
    Signed-off-by: Yannick Vignon <yannick.vignon@nxp.com>
    Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
    Link: https://lore.kernel.org/r/20220203160025.750632-1-yannick.vignon@oss.nxp.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Yackou authored and Jakub Kicinski committed Feb 3, 2022
  3. Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

    Daniel Borkmann says:
    
    ====================
    pull-request: bpf 2022-02-03
    
    We've added 6 non-merge commits during the last 10 day(s) which contain
    a total of 7 files changed, 11 insertions(+), 236 deletions(-).
    
    The main changes are:
    
    1) Fix BPF ringbuf to allocate its area with VM_MAP instead of VM_ALLOC
       flag which otherwise trips over KASAN, from Hou Tao.
    
    2) Fix unresolved symbol warning in resolve_btfids due to LSM callback
       rename, from Alexei Starovoitov.
    
    3) Fix a possible race in inc_misses_counter() when IRQ would trigger
       during counter update, from He Fengqing.
    
    4) Fix tooling infra for cross-building with clang upon probing whether
       gcc provides the standard libraries, from Jean-Philippe Brucker.
    
    5) Fix silent mode build for resolve_btfids, from Nathan Chancellor.
    
    6) Drop unneeded and outdated lirc.h header copy from tooling infra as
       BPF does not require it anymore, from Sean Young.
    
    * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
      tools/resolve_btfids: Do not print any commands when building silently
      bpf: Use VM_MAP instead of VM_ALLOC for ringbuf
      tools: Ignore errors from `which' when searching a GCC toolchain
      tools headers UAPI: remove stale lirc.h
      bpf: Fix possible race in inc_misses_counter
      bpf: Fix renaming task_getsecid_subj->current_getsecid_subj.
    ====================
    
    Link: https://lore.kernel.org/r/20220203155815.25689-1-daniel@iogearbox.net
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Jakub Kicinski committed Feb 3, 2022
  4. printk: Fix incorrect __user type in proc_dointvec_minmax_sysadmin()

    The move of proc_dointvec_minmax_sysadmin() from kernel/sysctl.c to
    kernel/printk/sysctl.c introduced an incorrect __user attribute to the
    buffer argument.  I spotted this change in [1], as did the kernel
    test robot.  Revert this change to please sparse:
    
      kernel/printk/sysctl.c:20:51: warning: incorrect type in argument 3 (different address spaces)
      kernel/printk/sysctl.c:20:51:    expected void *
      kernel/printk/sysctl.c:20:51:    got void [noderef] __user *buffer
    
    Fixes: faaa357 ("printk: move printk sysctl to printk/sysctl.c")
    Link: https://lore.kernel.org/r/20220104155024.48023-2-mic@digikod.net [1]
    Reported-by: kernel test robot <lkp@intel.com>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: John Ogness <john.ogness@linutronix.de>
    Cc: Luis Chamberlain <mcgrof@kernel.org>
    Cc: Petr Mladek <pmladek@suse.com>
    Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Xiaoming Ni <nixiaoming@huawei.com>
    Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
    Link: https://lore.kernel.org/r/20220203145029.272640-1-mic@digikod.net
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    l0kod authored and torvalds committed Feb 3, 2022
  5. Revert "module, async: async_synchronize_full() on module init iff async is used"

    This reverts commit 774a122.
    
    We need to finish all async code before the module init sequence is
    done.  In the reverted commit the PF_USED_ASYNC flag was added to mark a
    thread that called async_schedule().  Then the PF_USED_ASYNC flag was
    used to determine whether or not async_synchronize_full() needs to be
    invoked.  This works when modprobe thread is calling async_schedule(),
    but it does not work if module dispatches init code to a worker thread
    which then calls async_schedule().
    
    For example, PCI driver probing is invoked from a worker thread based on
    a node where device is attached:
    
    	if (cpu < nr_cpu_ids)
    		error = work_on_cpu(cpu, local_pci_probe, &ddi);
    	else
    		error = local_pci_probe(&ddi);
    
    We end up in a situation where a worker thread gets the PF_USED_ASYNC
    flag set instead of the modprobe thread.  As a result,
    async_synchronize_full() is not invoked and modprobe completes without
    waiting for the async code to finish.
    
    The issue was discovered while loading the pm80xx driver:
    (scsi_mod.scan=async)
    
    modprobe pm80xx                      worker
    ...
      do_init_module()
      ...
        pci_call_probe()
          work_on_cpu(local_pci_probe)
                                         local_pci_probe()
                                           pm8001_pci_probe()
                                             scsi_scan_host()
                                               async_schedule()
                                               worker->flags |= PF_USED_ASYNC;
                                         ...
          < return from worker >
      ...
      if (current->flags & PF_USED_ASYNC) <--- false
      	async_synchronize_full();
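The failure mode can be reproduced with a tiny model in which the flag is per-task and lands on whichever task calls async_schedule(). This is a hypothetical sketch: `struct task`, `model_async_schedule()`, `model_async_synchronize_full()`, `probe()` and `do_init_module()` are made-up names, the flag value is illustrative, and the worker thread is simulated by running the probe against a different task struct rather than a real workqueue.

```c
#include <stdbool.h>

#define PF_USED_ASYNC 0x1u /* illustrative value only */

struct task {
    unsigned int flags;
};

static int pending_async; /* async work scheduled but not yet waited for */

/* Like the reverted code: the flag lands on whichever task schedules. */
static void model_async_schedule(struct task *current)
{
    current->flags |= PF_USED_ASYNC;
    pending_async++;
}

static void model_async_synchronize_full(void)
{
    pending_async = 0; /* stands in for waiting on all async work */
}

/* Probe routine that schedules async work (e.g. a SCSI host scan). */
static void probe(struct task *current)
{
    model_async_schedule(current);
}

/*
 * Module init. 'use_worker' mimics work_on_cpu(): the probe runs in a
 * different task. 'check_flag' selects the reverted behaviour.
 * Returns true if async work was still pending when init finished.
 */
static bool do_init_module(bool use_worker, bool check_flag)
{
    struct task modprobe = { 0 }, worker = { 0 };

    pending_async = 0;
    probe(use_worker ? &worker : &modprobe);

    if (!check_flag || (modprobe.flags & PF_USED_ASYNC))
        model_async_synchronize_full();

    return pending_async != 0;
}
```

With `check_flag` set, work scheduled from the worker is left pending; passing `check_flag = false` corresponds to synchronizing unconditionally, which is what the revert restores (the async-probe exception is not modelled).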
    
    Commit 21c3c5d ("block: don't request module during elevator init")
    fixed the deadlock issue which the reverted commit 774a122
    ("module, async: async_synchronize_full() on module init iff async is
    used") tried to fix.
    
    Since commit 0fdff3e ("async, kmod: warn on synchronous
    request_module() from async workers") synchronous module loading from
    async is not allowed.
    
    Given that the original deadlock issue is fixed and it is no longer
    allowed to call synchronous request_module() from async we can remove
    PF_USED_ASYNC flag to make module init consistently invoke
    async_synchronize_full() unless async module probe is requested.
    
    Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
    Reviewed-by: Changyuan Lyu <changyuanl@google.com>
    Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
    Acked-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    ipylypiv authored and torvalds committed Feb 3, 2022
  6. Merge branch 'for-5.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

    Pull cgroup fixes from Tejun Heo:
    
     - Eric's fix for a long standing cgroup1 permission issue where it only
       checks for uid 0 instead of CAP which inadvertently allows
       unprivileged userns roots to modify release_agent userhelper
    
     - Fixes for the fallout from Waiman's recent cpuset work
    
    * 'for-5.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
      cgroup/cpuset: Fix "suspicious RCU usage" lockdep warning
      cgroup-v1: Require capabilities to set release_agent
      cpuset: Fix the bug that subpart_cpus updated wrongly in update_cpumask()
      cgroup/cpuset: Make child cpusets restrict parents on v1 hierarchy
    torvalds committed Feb 3, 2022
  7. Merge branch 'net-ipa-enable-register-retention'

    Alex Elder says:
    
    ====================
    net: ipa: enable register retention
    
    With runtime power management in place, we sometimes need to issue
    a command to enable retention of IPA register values before power
    collapse.  This requires a new Device Tree property, whose presence
    will also be used to signal that the command is required.
    ====================
    
    Link: https://lore.kernel.org/r/20220201150205.468403-1-elder@linaro.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Jakub Kicinski committed Feb 3, 2022
  8. net: ipa: request IPA register values be retained

    In some cases, the IPA hardware needs to request the always-on
    subsystem (AOSS) to coordinate with the IPA microcontroller to
    retain IPA register values at power collapse.  This is done by
    issuing a QMP request to the AOSS microcontroller.  A similar
    request undoes that request.
    
    We must get and hold the "QMP" handle early, because we might get
    back EPROBE_DEFER for that.  But the actual request should be sent
    while we know the IPA clock is active, and when we know the
    microcontroller is operational.
    
    Fixes: 1aac309 ("net: ipa: use autosuspend")
    Signed-off-by: Alex Elder <elder@linaro.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    alexelder authored and Jakub Kicinski committed Feb 3, 2022
  9. dt-bindings: net: qcom,ipa: add optional qcom,qmp property

    For some systems, the IPA driver must make a request to ensure that
    its registers are retained across power collapse of the IPA hardware.
    On such systems, we'll use the existence of the "qcom,qmp" property
    as a signal that this request is required.
    
    Signed-off-by: Alex Elder <elder@linaro.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    alexelder authored and Jakub Kicinski committed Feb 3, 2022
  10. cgroup/cpuset: Fix "suspicious RCU usage" lockdep warning

    It was found that a "suspicious RCU usage" lockdep warning was issued
    with the rcu_read_lock() call in update_sibling_cpumasks().  This is
    because update_cpumasks_hier() may sleep, so we have to release the
    RCU lock, call update_cpumasks_hier() and reacquire it afterwards.
    
    Also add a percpu_rwsem_assert_held() in update_sibling_cpumasks()
    instead of stating that in the comment.
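The rule being enforced is that nothing may sleep inside an RCU read-side critical section. A userspace model makes the shape of the fix visible: `rcu_depth`, `model_rcu_read_lock()`, `model_might_sleep()`, `update_hier()` and `walk_siblings()` are hypothetical. Real code must also keep each sibling alive while the lock is dropped (for example by taking a reference first), which this model does not attempt.

```c
#include <assert.h>

static int rcu_depth; /* nesting depth of the modelled RCU read lock */
static int sleeps;    /* how many times a "sleeping" function ran */

static void model_rcu_read_lock(void)   { rcu_depth++; }
static void model_rcu_read_unlock(void) { rcu_depth--; }

/* Mimics might_sleep(): complain if called inside a read-side section. */
static void model_might_sleep(void)
{
    assert(rcu_depth == 0 && "sleeping inside RCU read-side section");
}

/* Stand-in for update_cpumasks_hier(), which may sleep. */
static void update_hier(int sibling)
{
    (void)sibling;
    model_might_sleep();
    sleeps++;
}

/* Walk the siblings under the lock, dropping it around the sleeping call. */
static void walk_siblings(int nr_siblings)
{
    model_rcu_read_lock();
    for (int i = 0; i < nr_siblings; i++) {
        /* real code would pin sibling i here before unlocking */
        model_rcu_read_unlock();
        update_hier(i);
        model_rcu_read_lock();
    }
    model_rcu_read_unlock();
}
```

Removing the unlock/lock pair around `update_hier()` makes the assertion fire, which is the userspace analogue of the lockdep splat.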
    
    Fixes: 4716909 ("cpuset: Track cpusets that use parent's effective_cpus")
    Signed-off-by: Waiman Long <longman@redhat.com>
    Tested-by: Phil Auld <pauld@redhat.com>
    Reviewed-by: Phil Auld <pauld@redhat.com>
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Waiman Long authored and htejun committed Feb 3, 2022
  11. tools/resolve_btfids: Do not print any commands when building silently

    When building with 'make -s', there is some output from resolve_btfids:
    
    $ make -sj"$(nproc)" oldconfig prepare
      MKDIR     .../tools/bpf/resolve_btfids/libbpf/
      MKDIR     .../tools/bpf/resolve_btfids//libsubcmd
      LINK     resolve_btfids
    
    Silent mode means that no information should be emitted about what is
    currently being done. Use the $(silent) variable from Makefile.include
    to avoid defining the msg macro so that there is no information printed.
    
    Fixes: fbbb68d ("bpf: Add resolve_btfids tool to resolve BTF IDs in ELF object")
    Signed-off-by: Nathan Chancellor <nathan@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/bpf/20220201212503.731732-1-nathan@kernel.org
    nathanchance authored and borkmann committed Feb 3, 2022
  12. Revert "mm/gup: small refactoring: simplify try_grab_page()"

    This reverts commit 54d516b
    
    That commit did a refactoring that effectively combined fast and slow
    gup paths (again).  And that was again incorrect, for two reasons:
    
     a) Fast gup and slow gup get reference counts on pages in different
        ways and with different goals: see Linus' writeup in commit
        cd1adf1 ("Revert "mm/gup: remove try_get_page(), call
        try_get_compound_head() directly""), and
    
     b) try_grab_compound_head() also has a specific check for
        "FOLL_LONGTERM && !is_pinned(page)", that assumes that the caller
        can fall back to slow gup. This resulted in new failures, as
        recently reported by Will McVicker [1].
    
    But (a) has problems too, even though they may not have been reported
    yet.  So just revert this.
    
    Link: https://lore.kernel.org/r/20220131203504.3458775-1-willmcvicker@google.com [1]
    Fixes: 54d516b ("mm/gup: small refactoring: simplify try_grab_page()")
    Reported-and-tested-by: Will McVicker <willmcvicker@google.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Minchan Kim <minchan@google.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Christian Borntraeger <borntraeger@de.ibm.com>
    Cc: Heiko Carstens <hca@linux.ibm.com>
    Cc: Vasily Gorbik <gor@linux.ibm.com>
    Cc: stable@vger.kernel.org # 5.15
    Signed-off-by: John Hubbard <jhubbard@nvidia.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    johnhubbard authored and torvalds committed Feb 3, 2022
  13. Merge tag 'mips-fixes-5.17_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

    Pull MIPS fixes from Thomas Bogendoerfer:
    
     - fix missed change for PTR->PTR_WD conversion
    
     - kernel-doc fixes
    
    * tag 'mips-fixes-5.17_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
      MIPS: KVM: fix vz.c kernel-doc notation
      MIPS: octeon: Fix missed PTR->PTR_WD conversion
    torvalds committed Feb 3, 2022
  14. bpf: Use VM_MAP instead of VM_ALLOC for ringbuf

    After commit 2fd3fb0 ("kasan, vmalloc: unpoison VM_ALLOC pages
    after mapping"), non-VM_ALLOC mappings will be marked as accessible
    in __get_vm_area_node() when KASAN is enabled. But the flag for the
    ringbuf area is currently VM_ALLOC, so KASAN will complain about
    out-of-bounds access after vmap() returns. Because the ringbuf area is
    created by mapping allocated pages, use VM_MAP instead.
    
    After the change, info in /proc/vmallocinfo also changes from
      [start]-[end]   24576 ringbuf_map_alloc+0x171/0x290 vmalloc user
    to
      [start]-[end]   24576 ringbuf_map_alloc+0x171/0x290 vmap user
    
    Fixes: 457f443 ("bpf: Implement BPF ring buffer and verifier support for it")
    Reported-by: syzbot+5ad567a418794b9b5983@syzkaller.appspotmail.com
    Signed-off-by: Hou Tao <houtao1@huawei.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20220202060158.6260-1-houtao1@huawei.com
    htbegin authored and anakryiko committed Feb 3, 2022
  15. net, neigh: Do not trigger immediate probes on NUD_FAILED from neigh_managed_work

    syzkaller was able to trigger a deadlock for NTF_MANAGED entries [0]:
    
      kworker/0:16/14617 is trying to acquire lock:
      ffffffff8d4dd370 (&tbl->lock){++-.}-{2:2}, at: ___neigh_create+0x9e1/0x2990 net/core/neighbour.c:652
      [...]
      but task is already holding lock:
      ffffffff8d4dd370 (&tbl->lock){++-.}-{2:2}, at: neigh_managed_work+0x35/0x250 net/core/neighbour.c:1572
    
    The neighbor entry turned to NUD_FAILED state, where __neigh_event_send()
    triggered an immediate probe as per commit cd28ca0 ("neigh: reduce
    arp latency") via neigh_probe() while the table lock was held.
    
    One option to fix this situation is to defer the neigh_probe() back to
    the neigh_timer_handler(), as was done before cd28ca0. For the case
    of NTF_MANAGED, this deferral is acceptable given this only happens on
    actual failure state and regular / expected state is NUD_VALID with the
    entry already present.
    
    The fix adds a parameter to __neigh_event_send() in order to communicate
    whether immediate probe is allowed or disallowed. Existing call-sites
    of neigh_event_send() default as-is to immediate probe. However, the
    neigh_managed_work() disables it via use of neigh_event_send_probe().
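The API change follows a common C pattern: the core function gains a flag, and thin wrappers keep existing callers unchanged. The sketch below models only that shape; `struct neigh`, the `N_VALID`/`N_FAILED` states, the counters and the functions `event_send_common()`, `event_send()` and `event_send_probe()` are hypothetical and only loosely correspond to __neigh_event_send(), neigh_event_send() and neigh_event_send_probe().

```c
#include <stdbool.h>

enum n_state { N_VALID, N_FAILED }; /* reduced, illustrative states */

struct neigh {
    enum n_state state;
    int probes_now;      /* probes sent synchronously */
    int probes_deferred; /* probes left for the timer handler */
};

/* Core routine: the caller says whether an immediate probe is allowed. */
static int event_send_common(struct neigh *n, bool immediate_ok)
{
    if (n->state == N_VALID)
        return 0;               /* nothing to resolve */

    if (immediate_ok)
        n->probes_now++;        /* like calling the probe right away */
    else
        n->probes_deferred++;   /* leave it to the timer, no lock re-entry */
    return 1;
}

/* Existing callers: behaviour unchanged, immediate probe allowed. */
static int event_send(struct neigh *n)
{
    return event_send_common(n, true);
}

/* Managed-entry worker passes false while holding the table lock. */
static int event_send_probe(struct neigh *n, bool immediate_ok)
{
    return event_send_common(n, immediate_ok);
}
```

Keeping the default wrapper means only the one caller that holds the table lock has to opt out; every other call site compiles and behaves as before.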
    
    [0] <TASK>
      __dump_stack lib/dump_stack.c:88 [inline]
      dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
      print_deadlock_bug kernel/locking/lockdep.c:2956 [inline]
      check_deadlock kernel/locking/lockdep.c:2999 [inline]
      validate_chain kernel/locking/lockdep.c:3788 [inline]
      __lock_acquire.cold+0x149/0x3ab kernel/locking/lockdep.c:5027
      lock_acquire kernel/locking/lockdep.c:5639 [inline]
      lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5604
      __raw_write_lock_bh include/linux/rwlock_api_smp.h:202 [inline]
      _raw_write_lock_bh+0x2f/0x40 kernel/locking/spinlock.c:334
      ___neigh_create+0x9e1/0x2990 net/core/neighbour.c:652
      ip6_finish_output2+0x1070/0x14f0 net/ipv6/ip6_output.c:123
      __ip6_finish_output net/ipv6/ip6_output.c:191 [inline]
      __ip6_finish_output+0x61e/0xe90 net/ipv6/ip6_output.c:170
      ip6_finish_output+0x32/0x200 net/ipv6/ip6_output.c:201
      NF_HOOK_COND include/linux/netfilter.h:296 [inline]
      ip6_output+0x1e4/0x530 net/ipv6/ip6_output.c:224
      dst_output include/net/dst.h:451 [inline]
      NF_HOOK include/linux/netfilter.h:307 [inline]
      ndisc_send_skb+0xa99/0x17f0 net/ipv6/ndisc.c:508
      ndisc_send_ns+0x3a9/0x840 net/ipv6/ndisc.c:650
      ndisc_solicit+0x2cd/0x4f0 net/ipv6/ndisc.c:742
      neigh_probe+0xc2/0x110 net/core/neighbour.c:1040
      __neigh_event_send+0x37d/0x1570 net/core/neighbour.c:1201
      neigh_event_send include/net/neighbour.h:470 [inline]
      neigh_managed_work+0x162/0x250 net/core/neighbour.c:1574
      process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307
      worker_thread+0x657/0x1110 kernel/workqueue.c:2454
      kthread+0x2e9/0x3a0 kernel/kthread.c:377
      ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
      </TASK>
    
    Fixes: 7482e38 ("net, neigh: Add NTF_MANAGED flag for managed neighbor entries")
    Reported-by: syzbot+5239d0e1778a500d477a@syzkaller.appspotmail.com
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Cc: Eric Dumazet <edumazet@google.com>
    Cc: Roopa Prabhu <roopa@nvidia.com>
    Tested-by: syzbot+5239d0e1778a500d477a@syzkaller.appspotmail.com
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://lore.kernel.org/r/20220201193942.5055-1-daniel@iogearbox.net
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    borkmann authored and Jakub Kicinski committed Feb 3, 2022
  16. tcp: add missing tcp_skb_can_collapse() test in tcp_shift_skb_data()

    tcp_shift_skb_data() might collapse three packets into a larger one.
    
    P_A, P_B, P_C  -> P_ABC
    
    Historically, it used a single tcp_skb_can_collapse_to(P_A) call,
    because that was sufficient.
    
    In commit 8571248 ("tcp: coalesce/collapse must respect MPTCP extensions"),
    this call was replaced by a call to tcp_skb_can_collapse(P_A, P_B).
    
    However, the test on P_C, which is now needed, was missed.
    
    This probably broke MPTCP.
    
    Then later, commit 9b65b17 ("net: avoid double accounting for pure zerocopy skbs")
    added an extra condition to tcp_skb_can_collapse(), but the missing call
    from tcp_shift_skb_data() is also breaking TCP zerocopy, because P_A and P_C
    might have different skb_zcopy_pure() status.
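The following is a hypothetical sketch of the check described above. It models only the two properties the message mentions: MPTCP extension identity and skb_zcopy_pure() status. The toy_* types and functions are illustrative assumptions, not the kernel's tcp_skb_can_collapse(). The point is that folding P_C into P_A needs its own compatibility test, in addition to the P_A/P_B test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb: only the two properties that decide
 * whether packets may be merged are modelled. */
struct toy_skb {
	int mptcp_ext_id;   /* packets with different MPTCP extensions must not merge */
	bool zcopy_pure;    /* models skb_zcopy_pure() status */
};

/* Loosely modelled on tcp_skb_can_collapse(to, from): both must agree. */
static bool toy_can_collapse(const struct toy_skb *to, const struct toy_skb *from)
{
	return to->mptcp_ext_id == from->mptcp_ext_id &&
	       to->zcopy_pure == from->zcopy_pure;
}

/* Merging P_A, P_B and optionally P_C into one packet is only safe if P_A
 * is compatible with every packet folded into it, not just with P_B. */
static bool toy_can_shift(const struct toy_skb *a, const struct toy_skb *b,
			  const struct toy_skb *c /* may be NULL */)
{
	if (!toy_can_collapse(a, b))
		return false;
	if (c && !toy_can_collapse(a, c))   /* the previously missing P_C test */
		return false;
	return true;
}
```

With only the P_A/P_B test, a P_C that differs in zerocopy status would be merged silently. The extra check rejects it.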
    
    Fixes: 8571248 ("tcp: coalesce/collapse must respect MPTCP extensions")
    Fixes: 9b65b17 ("net: avoid double accounting for pure zerocopy skbs")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Cc: Talal Ahmad <talalahmad@google.com>
    Cc: Arjun Roy <arjunroy@google.com>
    Cc: Willem de Bruijn <willemb@google.com>
    Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Link: https://lore.kernel.org/r/20220201184640.756716-1-eric.dumazet@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    neebe000 authored and Jakub Kicinski committed Feb 3, 2022

Commits on Feb 2, 2022

  1. Merge tag 'nfsd-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
    
    Pull nfsd fixes from Chuck Lever:
     "Notable bug fixes:
    
       - Ensure SM_NOTIFY doesn't crash the NFS server host
    
       - Ensure NLM locks are cleaned up after client reboot
    
       - Fix a leak of internal NFSv4 lease information"
    
    * tag 'nfsd-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
      nfsd: nfsd4_setclientid_confirm mistakenly expires confirmed client.
      lockd: fix failure to cleanup client locks
      lockd: fix server crash on reboot of client holding lock
    torvalds committed Feb 2, 2022
  2. Merge tag 'fsnotify_for_v5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
    
    Pull fanotify fix from Jan Kara:
     "Fix stale file descriptor in copy_event_to_user"
    
    * tag 'fsnotify_for_v5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
      fanotify: Fix stale file descriptor in copy_event_to_user()
    torvalds committed Feb 2, 2022
  3. Merge tag 'linux-kselftest-kunit-fixes-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
    
    Pull KUnit fixes from Shuah Khan:
     "A single fix to an error seen on qemu due to a missing import"
    
    * tag 'linux-kselftest-kunit-fixes-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
      kunit: tool: Import missing importlib.abc
    torvalds committed Feb 2, 2022
  4. Merge tag 'pinctrl-v5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
    
    Pull pin control fixes from Linus Walleij:
     "Most interesting and urgent is the Intel stuff affecting Chromebooks
      and laptops.
    
       - Fix up group name building on the Intel Thunderbay
    
       - Fix interrupt problems on the Intel Cherryview
    
       - Fix some pin data on the Sunxi H616
    
       - Fix up the CONFIG_PINCTRL_ST Kconfig sort order as noted during the
         merge window
    
       - Fix an unexpected interrupt problem on the Intel Sunrisepoint
    
       - Fix a glitch when updating IRQ flags on all Intel pin controllers
    
       - Revert a Zynqmp patch to unify the pin naming, let's find some
         better solution
    
       - Fix some error paths in the Broadcom BCM2835 driver
    
       - Fix a Kconfig problem pertaining to the BCM63XX drivers
    
       - Fix the regmap support in the Microchip SGPIO driver"
    
    * tag 'pinctrl-v5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
      pinctrl: microchip-sgpio: Fix support for regmap
      pinctrl: bcm63xx: fix unmet dependency on REGMAP for GPIO_REGMAP
      pinctrl: bcm2835: Fix a few error paths
      pinctrl: zynqmp: Revert "Unify pin naming"
      pinctrl: intel: Fix a glitch when updating IRQ flags on a preconfigured line
      pinctrl: intel: fix unexpected interrupt
      pinctrl: Place correctly CONFIG_PINCTRL_ST in the Makefile
      pinctrl: sunxi: Fix H616 I2S3 pin data
      pinctrl: cherryview: Trigger hwirq0 for interrupt-lines without a mapping
      pinctrl: thunderbay: rework loops looking for groups names
      pinctrl: thunderbay: comment process of building functions a bit
    torvalds committed Feb 2, 2022
  5. net: sparx5: do not refer to skb after passing it on

    Do not try to use any SKB fields after the packet has been passed up the
    receive stack.
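The following is a minimal sketch of the pattern, built on stated assumptions. A hypothetical toy_pkt stands in for struct sk_buff. A hypothetical toy_pass_up() frees the buffer, as the stack may do once a driver hands a packet over (for example via netif_rx()). The sketch does not reproduce the sparx5 driver's actual receive code. Anything needed after the hand-off, such as the length used for statistics, is read beforehand.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical packet buffer standing in for struct sk_buff. */
struct toy_pkt {
	unsigned int len;
};

struct toy_stats {
	unsigned long rx_packets;
	unsigned long rx_bytes;
};

/* Stand-in for handing the packet to the stack: after this call the
 * receiver owns the buffer and may already have freed it. */
static void toy_pass_up(struct toy_pkt *pkt)
{
	free(pkt);
}

static void toy_rx_one(struct toy_pkt *pkt, struct toy_stats *st)
{
	assert(pkt != NULL);
	unsigned int len = pkt->len;   /* read what is needed before the hand-off */

	toy_pass_up(pkt);
	/* pkt must not be dereferenced from here on */
	st->rx_packets++;
	st->rx_bytes += len;
}
```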
    
    Reported-by: kernel test robot <lkp@intel.com>
    Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
    Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
    Link: https://lore.kernel.org/r/20220202083039.3774851-1-steen.hegelund@microchip.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    steen-hegelund-mchp authored and Jakub Kicinski committed Feb 2, 2022
  6. selinux: fix double free of cond_list on error paths

    On the error paths of cond_read_list() and duplicate_policydb_cond_list(),
    cond_list_destroy() gets called a second time by the caller functions,
    resulting in a NULL pointer dereference. Fix this by resetting
    cond_list_len to 0 in cond_list_destroy(), making subsequent calls a
    no-op.
    
    Also consistently reset the cond_list pointer to NULL after freeing.
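The following is a simplified sketch of an idempotent destructor along the lines described above. The toy_policydb structure and toy_cond_list_destroy() are hypothetical and only mirror the two fields the message names. Resetting both the pointer and the length means a repeated call from an error path iterates over nothing and passes NULL to free(). That call therefore neither dereferences a NULL array nor frees memory twice.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified conditional node and policy database. */
struct toy_cond_node {
	int expr;
};

struct toy_policydb {
	struct toy_cond_node *cond_list;
	unsigned int cond_list_len;
};

static void toy_cond_list_destroy(struct toy_policydb *p)
{
	/* Per-node cleanup is bounded by cond_list_len; once that is 0,
	 * nothing here touches cond_list. */
	for (unsigned int i = 0; i < p->cond_list_len; i++)
		p->cond_list[i].expr = 0;   /* stands in for per-node teardown */
	free(p->cond_list);             /* free(NULL) is a no-op */
	p->cond_list = NULL;
	p->cond_list_len = 0;           /* makes any later call a no-op */
}
```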
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Vratislav Bendel <vbendel@redhat.com>
    [PM: fix line lengths in the description]
    Signed-off-by: Paul Moore <paul@paul-moore.com>
    vbendel authored and pcmoore committed Feb 2, 2022
  7. Partially revert "net/smc: Add netlink net namespace support"

    The change of sizeof(struct smc_diag_linkinfo) by commit 79d39fc
    ("net/smc: Add netlink net namespace support") introduced an ABI
    regression: since struct smc_diag_lgrinfo contains an object of
    type "struct smc_diag_linkinfo", that change also shifted the offsets
    of all subsequent members of struct smc_diag_lgrinfo.
    
    As a result, applications compiled with the old version
    of struct smc_diag_linkinfo will receive garbage in
    struct smc_diag_lgrinfo.role if the kernel implements
    this new version of struct smc_diag_linkinfo.
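The following is a small sketch showing why growing an embedded struct breaks the layout of the outer one. The toy_* structures and their fields are hypothetical and do not match the real smc_diag definitions. offsetof() reports where role lands for binaries built against each header version. When the two offsets differ, an old binary reading a new kernel's reply looks in the wrong place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "old" and "new" versions of an embedded info struct. */
struct toy_linkinfo_v1 {
	uint8_t  link_id;
	uint32_t ibport;
};

struct toy_linkinfo_v2 {          /* same as v1 plus one extra member */
	uint8_t  link_id;
	uint32_t ibport;
	uint32_t extra;
};

/* Outer structs that embed the info struct before a 'role' member. */
struct toy_lgrinfo_v1 {
	struct toy_linkinfo_v1 lnk[1];
	uint8_t role;
};

struct toy_lgrinfo_v2 {
	struct toy_linkinfo_v2 lnk[1];
	uint8_t role;
};

/* Where 'role' lives for binaries built against each header version. */
static size_t toy_role_offset_v1(void)
{
	return offsetof(struct toy_lgrinfo_v1, role);
}

static size_t toy_role_offset_v2(void)
{
	return offsetof(struct toy_lgrinfo_v2, role);
}
```

Because role directly follows the embedded array and needs no alignment padding, its offset equals the size of the embedded struct. Any size change of the inner struct therefore moves it.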
    
    Fix this regression by reverting the part of commit 79d39fc that
    changes struct smc_diag_linkinfo. After all, the SMC_GEN_NETLINK
    interface is good enough, so there is probably no need to touch
    the smc_diag ABI in the first place.
    
    Fixes: 79d39fc ("net/smc: Add netlink net namespace support")
    Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
    Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
    Link: https://lore.kernel.org/r/20220202030904.GA9742@altlinux.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    ldv-alt authored and Jakub Kicinski committed Feb 2, 2022
  8. Merge tag 'mlx5-fixes-2022-02-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
    
    Saeed Mahameed says:
    
    ====================
    mlx5 fixes 2022-02-01
    
    This series provides bug fixes to mlx5 driver.
    Please pull and let me know if there is any problem.
    
    Sorry about the long series, but I had to move the top two patches from
    net-next to net to help avoid a build break when the kspp branch is merged
    into linus-next during the next merge window.
    ====================
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    davem330 committed Feb 2, 2022
  9. Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
    
    Tony Nguyen says:
    
    ====================
    Intel Wired LAN Driver Updates 2022-02-01
    
    This series contains updates to e1000e driver only.
    
    Sasha removes the CSME handshake with the TGL platform, as it is not
    supported and causes hardware unit hangs to be reported.
    
    * '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
      e1000e: Handshake with CSME starts from ADL platforms
      e1000e: Separate ADP board type from TGP
    ====================
    
    Link: https://lore.kernel.org/r/20220201173754.580305-1-anthony.l.nguyen@intel.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Jakub Kicinski committed Feb 2, 2022