Commits on Jul 18, 2021

  1. libbpf: Remove from kernel tree.

    libbpf shipped by the kernel is outdated and has problems. Remove it.
    
    Current version of libbpf is available at
    
    https://github.com/libbpf/libbpf
    
    Link: https://lore.kernel.org/bpf/b07015ebd7bbadb06a95a5105d9f6b4ed5817b2f.camel@debian.org/
    Signed-off-by: Michal Suchanek <msuchanek@suse.de>
    hramrach authored and intel-lab-lkp committed Jul 18, 2021

Commits on Jul 16, 2021

  1. bpf, selftests: Add test cases for pointer alu from multiple paths

    Add several test cases for checking update_alu_sanitation_state() under
    multiple paths:
    
      # ./test_verifier
      [...]
      #1061/u map access: known scalar += value_ptr unknown vs const OK
      #1061/p map access: known scalar += value_ptr unknown vs const OK
      #1062/u map access: known scalar += value_ptr const vs unknown OK
      #1062/p map access: known scalar += value_ptr const vs unknown OK
      #1063/u map access: known scalar += value_ptr const vs const (ne) OK
      #1063/p map access: known scalar += value_ptr const vs const (ne) OK
      #1064/u map access: known scalar += value_ptr const vs const (eq) OK
      #1064/p map access: known scalar += value_ptr const vs const (eq) OK
      #1065/u map access: known scalar += value_ptr unknown vs unknown (eq) OK
      #1065/p map access: known scalar += value_ptr unknown vs unknown (eq) OK
      #1066/u map access: known scalar += value_ptr unknown vs unknown (lt) OK
      #1066/p map access: known scalar += value_ptr unknown vs unknown (lt) OK
      #1067/u map access: known scalar += value_ptr unknown vs unknown (gt) OK
      #1067/p map access: known scalar += value_ptr unknown vs unknown (gt) OK
      [...]
      Summary: 1762 PASSED, 0 SKIPPED, 0 FAILED
    
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Alexei Starovoitov <ast@kernel.org>
    borkmann committed Jul 16, 2021
  2. bpf: Fix pointer arithmetic mask tightening under state pruning

    In 7fedb63 ("bpf: Tighten speculative pointer arithmetic mask") we
    narrowed the offset mask for unprivileged pointer arithmetic in order to
    mitigate a corner case where in the speculative domain it is possible to
    advance, for example, the map value pointer by up to value_size-1 out-of-
    bounds in order to leak kernel memory via side-channel to user space.
    
    The verifier's state pruning for scalars leaves one corner case open
    where in the first verification path R_x holds an unknown scalar with an
    aux->alu_limit of e.g. 7, and in a second verification path that same
    register R_x, here denoted as R_x', holds an unknown scalar which has
    tighter bounds and would thus satisfy range_within(R_x, R_x') as well as
    tnum_in(R_x, R_x') for state pruning, yielding an aux->alu_limit of 3:
    Given the second path fits the register constraints for pruning, the final
    generated mask from aux->alu_limit will remain at 7. While technically
    not wrong for the non-speculative domain, it would however be possible
    to craft similar cases where the mask would be too wide as in 7fedb63.
    
    One way to fix it is to detect the presence of unknown scalar map pointer
    arithmetic and force a deeper search on unknown scalars to ensure that
    we do not run into a masking mismatch.
    
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Alexei Starovoitov <ast@kernel.org>
    borkmann committed Jul 16, 2021
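
    The pruning mismatch described in this commit can be modelled in a few
    lines. This is a simplified sketch, not verifier code: struct reg_model,
    range_within_model() and prune_keeps_wider_mask() are hypothetical names,
    only unsigned bounds are tracked, and alu_limit is treated as a value
    already recorded for each path (7 and 3, as in the commit message).

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical, simplified register state: unsigned bounds plus the
     * masking limit that was derived on this verification path. */
    struct reg_model {
        uint64_t umin, umax;
        uint32_t alu_limit;
    };

    /* Pruning-style check: the already-verified (old) range must contain
     * the current one for the current path to be treated as equivalent. */
    static bool range_within_model(const struct reg_model *old,
                                   const struct reg_model *cur)
    {
        return old->umin <= cur->umin && old->umax >= cur->umax;
    }

    /* True when pruning would accept cur even though the mask recorded for
     * old is wider than the one cur would need. */
    static bool prune_keeps_wider_mask(const struct reg_model *old,
                                       const struct reg_model *cur)
    {
        return range_within_model(old, cur) && old->alu_limit > cur->alu_limit;
    }

    static void alu_limit_demo(void)
    {
        struct reg_model first = { .umin = 0, .umax = 7, .alu_limit = 7 };
        struct reg_model second = { .umin = 0, .umax = 3, .alu_limit = 3 };

        /* The second path is pruned, so the mask stays at 7 instead of 3. */
        assert(prune_keeps_wider_mask(&first, &second));
    }
    ```

    In this model the second path satisfies the range check, so it would be
    pruned while the wider limit from the first path remains in effect.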
  3. bpf: Remove superfluous aux sanitation on subprog rejection

    Follow-up to fe9a5ca ("bpf: Do not mark insn as seen under speculative
    path verification"). The sanitize_insn_aux_data() helper does not serve a
    particular purpose in today's code. The original intention for the helper
    was that if function-by-function verification fails, a given program would
    be cleared from temporary insn_aux_data[], and then its verification would
    be re-attempted in the context of the main program a second time.
    
    However, a failure in do_check_subprogs() will skip do_check_main() and
    propagate the error to the user instead, so such a situation can never occur.
    Given that its interaction is not compatible with the Spectre v1 mitigation
    (due to comparing aux->seen with env->pass_cnt), just remove
    sanitize_insn_aux_data() to avoid future bugs in this area.
    
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Alexei Starovoitov <ast@kernel.org>
    borkmann committed Jul 16, 2021

Commits on Jul 15, 2021

  1. Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

    Andrii Nakryiko says:
    
    ====================
    pull-request: bpf 2021-07-15
    
    The following pull-request contains BPF updates for your *net* tree.
    
    We've added 9 non-merge commits during the last 5 day(s) which contain
    a total of 9 files changed, 37 insertions(+), 15 deletions(-).
    
    The main changes are:
    
    1) Fix NULL pointer dereference in BPF_TEST_RUN for BPF_XDP_DEVMAP and
       BPF_XDP_CPUMAP programs, from Xuan Zhuo.
    
    2) Fix use-after-free of net_device in XDP bpf_link, from Xuan Zhuo.
    
    3) Follow-up fix to subprog poke descriptor use-after-free problem, from
       Daniel Borkmann and John Fastabend.
    
    4) Fix out-of-range array access in s390 BPF JIT backend, from Colin Ian King.
    
    5) Fix memory leak in BPF sockmap, from John Fastabend.
    
    6) Fix for sockmap to prevent proc stats reporting bug, from John Fastabend
       and Jakub Sitnicki.
    
    7) Fix NULL pointer dereference in bpftool, from Tobias Klauser.
    
    8) AF_XDP documentation fixes, from Baruch Siach.
    ====================
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    davem330 committed Jul 15, 2021
  2. usb: hso: fix error handling code of hso_create_net_device

    The error handling code of hso_create_net_device() currently calls
    hso_free_net_device() no matter which error occurred. This can trigger,
    for example, a WARNING in hso_free_net_device() [1].
    
    Fix this by refactoring the error handling code of
    hso_create_net_device() so that each error is unwound by its own code.
    
    [1] https://syzkaller.appspot.com/bug?id=66eff8d49af1b28370ad342787413e35bbe76efe
    
    Reported-by: syzbot+44d53c7255bb1aea22d2@syzkaller.appspotmail.com
    Fixes: 5fcfb6d ("hso: fix bailout in error case of probe")
    Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    mudongliang authored and davem330 committed Jul 15, 2021
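
    Handling different errors by different code is commonly done in kernel C
    with staged goto labels, where each label undoes only the steps that have
    already succeeded. The following is a generic sketch of that pattern, not
    the hso driver's code: struct dev_model, its fields, and the fail_at
    parameter are hypothetical and exist only to exercise the unwinding.

    ```c
    #include <assert.h>
    #include <stdlib.h>

    struct dev_model {
        void *rx_buf;
        void *tx_buf;
        int registered;
    };

    /* Set up a device in three steps. fail_at (1..3) forces that step to
     * fail so the unwinding can be exercised; 0 means no forced failure. */
    static int create_dev_model(struct dev_model *d, int fail_at)
    {
        d->rx_buf = NULL;
        d->tx_buf = NULL;
        d->registered = 0;

        d->rx_buf = (fail_at == 1) ? NULL : malloc(64);
        if (!d->rx_buf)
            goto err;

        d->tx_buf = (fail_at == 2) ? NULL : malloc(64);
        if (!d->tx_buf)
            goto err_free_rx;

        if (fail_at == 3)
            goto err_free_tx;
        d->registered = 1;
        return 0;

    err_free_tx:
        free(d->tx_buf);
        d->tx_buf = NULL;
    err_free_rx:
        free(d->rx_buf);
        d->rx_buf = NULL;
    err:
        return -1;
    }

    static void create_dev_model_demo(void)
    {
        struct dev_model d;

        /* A failure in step 2 releases only what step 1 allocated. */
        assert(create_dev_model(&d, 2) == -1 && d.rx_buf == NULL);

        assert(create_dev_model(&d, 0) == 0 && d.registered == 1);
        free(d.tx_buf);
        free(d.rx_buf);
    }
    ```

    Because the labels fall through in reverse order of setup, a failure at
    any step frees exactly the resources acquired before it.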
  3. qed: fix possible unpaired spin_{un}lock_bh in _qed_mcp_cmd_and_union()

    Lijian reported a BUG_ON hit on a ThunderX2 arm64 server with a FastLinQ
    QL41000 ethernet controller:
     BUG: scheduling while atomic: kworker/0:4/531/0x00000200
      [qed_probe:488()]hw prepare failed
      kernel BUG at mm/vmalloc.c:2355!
      Internal error: Oops - BUG: 0 [#1] SMP
      CPU: 0 PID: 531 Comm: kworker/0:4 Tainted: G W 5.4.0-77-generic #86-Ubuntu
      pstate: 00400009 (nzcv daif +PAN -UAO)
     Call trace:
      vunmap+0x4c/0x50
      iounmap+0x48/0x58
      qed_free_pci+0x60/0x80 [qed]
      qed_probe+0x35c/0x688 [qed]
      __qede_probe+0x88/0x5c8 [qede]
      qede_probe+0x60/0xe0 [qede]
      local_pci_probe+0x48/0xa0
      work_for_cpu_fn+0x24/0x38
      process_one_work+0x1d0/0x468
      worker_thread+0x238/0x4e0
      kthread+0xf0/0x118
      ret_from_fork+0x10/0x18
    
    In this case, qed_hw_prepare() returns an error due to a hw/fw error, but
    in theory the work queue should run in process context, not interrupt
    context.
    
    The root cause might be the unpaired spin_{un}lock_bh() in
    _qed_mcp_cmd_and_union(), which leaves the bottom half incorrectly disabled.
    
    Reported-by: Lijian Zhang <Lijian.Zhang@arm.com>
    Signed-off-by: Jia He <justin.he@arm.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    justin-he authored and davem330 committed Jul 15, 2021
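
    One way such a lock imbalance can arise is an early return that skips the
    unlock. The sketch below models this with a plain counter; it is not the
    qed code, and bh_disable_depth, model_lock_bh(), model_unlock_bh() and the
    two cmd_*() functions are hypothetical stand-ins.

    ```c
    #include <assert.h>

    static int bh_disable_depth;    /* stand-in for the bottom-half disable count */

    static void model_lock_bh(void)   { bh_disable_depth++; }
    static void model_unlock_bh(void) { bh_disable_depth--; }

    /* Unbalanced shape: the early return skips the unlock. */
    static int cmd_unbalanced(int hw_error)
    {
        model_lock_bh();
        if (hw_error)
            return -1;      /* leaves bh_disable_depth elevated */
        model_unlock_bh();
        return 0;
    }

    /* Balanced shape: every exit path goes through the unlock. */
    static int cmd_balanced(int hw_error)
    {
        int ret = 0;

        model_lock_bh();
        if (hw_error)
            ret = -1;
        model_unlock_bh();
        return ret;
    }

    static void lock_balance_demo(void)
    {
        bh_disable_depth = 0;
        cmd_unbalanced(1);
        assert(bh_disable_depth == 1);  /* still "disabled" after returning */

        bh_disable_depth = 0;
        cmd_balanced(1);
        assert(bh_disable_depth == 0);
    }
    ```

    With the count left elevated, later code that expects process context
    would still appear to be running with bottom halves disabled.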
  4. net: fix uninit-value in caif_seqpkt_sendmsg

    When nr_segs is zero in iovec_from_user(), the object
    msg->msg_iter.iov, which is defined in ___sys_sendmsg(), is
    uninitialized stack memory by the time it reaches caif_seqpkt_sendmsg().
    So we cannot simply test msg->msg_iter.iov->base directly. Instead, use
    nr_segs in caif_seqpkt_sendmsg() to decide whether msg has data buffers.
    
    =====================================================
    BUG: KMSAN: uninit-value in caif_seqpkt_sendmsg+0x693/0xf60 net/caif/caif_socket.c:542
    Call Trace:
     __dump_stack lib/dump_stack.c:77 [inline]
     dump_stack+0x1c9/0x220 lib/dump_stack.c:118
     kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
     __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215
     caif_seqpkt_sendmsg+0x693/0xf60 net/caif/caif_socket.c:542
     sock_sendmsg_nosec net/socket.c:652 [inline]
     sock_sendmsg net/socket.c:672 [inline]
     ____sys_sendmsg+0x12b6/0x1350 net/socket.c:2343
     ___sys_sendmsg net/socket.c:2397 [inline]
     __sys_sendmmsg+0x808/0xc90 net/socket.c:2480
     __compat_sys_sendmmsg net/compat.c:656 [inline]
    
    Reported-by: syzbot+09a5d591c1f98cf5efcb@syzkaller.appspotmail.com
    Link: https://syzkaller.appspot.com/bug?id=1ace85e8fc9b0d5a45c08c2656c3e91762daa9b8
    Fixes: bece7b2 ("caif: Rewritten socket implementation")
    Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Ziyang Xuan authored and davem330 committed Jul 15, 2021
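
    The idea of consulting the segment count before touching the segment
    array can be sketched in user space. This is an illustration only:
    struct msg_iter_model and msg_has_data() are hypothetical, and the
    sketch uses the POSIX struct iovec rather than the kernel's iterator.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <sys/uio.h>

    /* Hypothetical stand-in for the iterator part of a message. */
    struct msg_iter_model {
        const struct iovec *iov;    /* only meaningful when nr_segs > 0 */
        size_t nr_segs;
    };

    /* Decide whether the message carries data by looking at the segment
     * count first; iov is dereferenced only when a segment exists. */
    static bool msg_has_data(const struct msg_iter_model *it)
    {
        if (it->nr_segs == 0)
            return false;
        return it->iov[0].iov_base != NULL;
    }

    static void msg_has_data_demo(void)
    {
        char buf[4] = "abc";
        struct iovec seg = { .iov_base = buf, .iov_len = sizeof(buf) };
        struct msg_iter_model empty = { .iov = NULL, .nr_segs = 0 };
        struct msg_iter_model full = { .iov = &seg, .nr_segs = 1 };

        assert(!msg_has_data(&empty));  /* iov is never read here */
        assert(msg_has_data(&full));
    }
    ```

    The empty case returns before iov is read, so its contents, initialized
    or not, never influence the result.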
  5. bpftool: Check malloc return value in mount_bpffs_for_pin

    Add the missing NULL check for the prior malloc() call.
    
    Fixes: 49a086c ("bpftool: implement prog load command")
    Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Reviewed-by: Quentin Monnet <quentin@isovalent.com>
    Acked-by: Roman Gushchin <guro@fb.com>
    Link: https://lore.kernel.org/bpf/20210715110609.29364-1-tklauser@distanz.ch
    tklauser authored and borkmann committed Jul 15, 2021
  6. bpf, sockmap, udp: sk_prot needs inuse_idx set for proc stats

    The proc socket stats use the sk_prot->inuse_idx value to record in-use
    sock stats. We currently do not set this correctly from the sockmap side.
    As a result, reading sock stats from '/proc/net/sockstat' gives incorrect
    values. The socket counter is incremented correctly, but because we don't
    set the counter correctly when we replace sk_prot, we may omit the
    decrement.
    
    To get the correct inuse_idx value, move the core_initcall that initializes
    the UDP proto handlers to late_initcall. This way it is initialized after
    UDP has had the chance to assign the inuse_idx value from the register
    protocol handler.
    
    Fixes: edc6741 ("bpf: Add sockmap hooks for UDP sockets")
    Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Reviewed-by: Cong Wang <cong.wang@bytedance.com>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Link: https://lore.kernel.org/bpf/20210714154750.528206-1-jakub@cloudflare.com
    jsitnicki authored and borkmann committed Jul 15, 2021
  7. bpf, sockmap, tcp: sk_prot needs inuse_idx set for proc stats

    The proc socket stats use the sk_prot->inuse_idx value to record in-use
    sock stats. We currently do not set this correctly from the sockmap side.
    As a result, reading sock stats from '/proc/net/sockstat' gives incorrect
    values. The socket counter is incremented correctly, but because we don't
    set the counter correctly when we replace sk_prot, we may omit the
    decrement.
    
    To get the correct inuse_idx value, move the core_initcall that initializes
    the TCP proto handlers to late_initcall. This way it is initialized after
    TCP has had the chance to assign the inuse_idx value from the register
    protocol handler.
    
    Fixes: 604326b ("bpf, sockmap: convert to generic sk_msg interface")
    Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
    Signed-off-by: John Fastabend <john.fastabend@gmail.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Reviewed-by: Cong Wang <cong.wang@bytedance.com>
    Link: https://lore.kernel.org/bpf/20210712195546.423990-3-john.fastabend@gmail.com
    jrfastab authored and borkmann committed Jul 15, 2021
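
    The ordering problem addressed by this commit and the UDP one before it
    can be illustrated with a toy model. It assumes that the sockmap proto
    handlers are built as a copy of the base protocol structure, so a copy
    taken before registration misses the index. struct proto_model,
    ASSIGNED_IDX and copied_idx() are hypothetical, and the index value is
    arbitrary.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    #define ASSIGNED_IDX 5  /* arbitrary index for the sketch */

    struct proto_model {
        int inuse_idx;
    };

    /* Returns the inuse_idx seen by the derived copy, depending on whether
     * the copy is taken before or after protocol registration. */
    static int copied_idx(bool copy_first)
    {
        struct proto_model base = { .inuse_idx = 0 };
        struct proto_model copy = { .inuse_idx = 0 };

        if (copy_first)
            copy = base;                /* early init: copy taken too soon */

        base.inuse_idx = ASSIGNED_IDX;  /* registration assigns the index */

        if (!copy_first)
            copy = base;                /* late init: copy sees the index */

        return copy.inuse_idx;
    }

    static void copied_idx_demo(void)
    {
        assert(copied_idx(true) == 0);              /* stale index */
        assert(copied_idx(false) == ASSIGNED_IDX);  /* correct index */
    }
    ```

    In the model, deferring the copy until after registration is what makes
    the derived structure carry the same index as the base protocol.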
  8. bpf, sockmap: Fix potential memory leak on unlikely error case

    If skb_linearize() is needed and fails, we could leak a msg in the error
    handling path. To fix this, ensure we kfree the msg block before returning
    the error. Found during code review.
    
    Fixes: 4363023 ("bpf, sockmap: Avoid failures from skb_to_sgvec when skb has frag_list")
    Signed-off-by: John Fastabend <john.fastabend@gmail.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Reviewed-by: Cong Wang <cong.wang@bytedance.com>
    Link: https://lore.kernel.org/bpf/20210712195546.423990-2-john.fastabend@gmail.com
    jrfastab authored and borkmann committed Jul 15, 2021
  9. s390/bpf: Perform r1 range checking before accessing jit->seen_reg[r1]

    Currently the array jit->seen_reg[r1] is accessed before the range
    check of index r1. The range check on r1 should be performed
    first, since it avoids any potential out-of-range accesses on the
    array seen_reg[], and it is also more efficient to check r1
    before fetching data from the array. Fix this by swapping the order
    of the checks so the range check precedes the array access.
    
    Fixes: 0546231 ("s390/bpf: Add s390x eBPF JIT compiler backend")
    Signed-off-by: Colin Ian King <colin.king@canonical.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
    Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
    Link: https://lore.kernel.org/bpf/20210715125712.24690-1-colin.king@canonical.com
    Colin Ian King authored and borkmann committed Jul 15, 2021
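
    Placing the range check first works because && evaluates its operands
    left to right and stops at the first false one. A minimal sketch of the
    idea follows; SEEN_REG_COUNT and reg_is_seen() are hypothetical and do
    not reproduce the actual condition in the s390 JIT.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    #define SEEN_REG_COUNT 16   /* hypothetical array size for the sketch */

    /* Returns whether register r1 is in range and marked as seen. Thanks
     * to short-circuit evaluation, seen_reg[r1] is read only after the
     * bounds check has passed. */
    static bool reg_is_seen(const bool *seen_reg, size_t r1)
    {
        return r1 < SEEN_REG_COUNT && seen_reg[r1];
    }

    static void reg_is_seen_demo(void)
    {
        bool seen[SEEN_REG_COUNT] = { false };

        seen[3] = true;
        assert(reg_is_seen(seen, 3));
        assert(!reg_is_seen(seen, 4));
        assert(!reg_is_seen(seen, 100));    /* no array access happens */
    }
    ```

    Writing the operands in the opposite order would read the array before
    knowing whether the index is valid.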
  10. net_sched: introduce tracepoint trace_qdisc_enqueue()

    Tracepoint trace_qdisc_enqueue() is introduced to trace skb at
    the entrance of TC layer on TX side. This is similar to
    trace_qdisc_dequeue():
    
    1. For both we only trace successful cases. The failure cases
       can be traced via trace_kfree_skb().
    
    2. They are called at entrance or exit of TC layer, not for each
       ->enqueue() or ->dequeue(). This is intentional, because
       we want to make trace_qdisc_enqueue() symmetric to
       trace_qdisc_dequeue(), which is easier to use.
    
    The return value of qdisc_enqueue() is not interesting here. Some Qdiscs
    drop packets in ->dequeue(), so it is impossible to trace those drops
    even if we have the return value; the only way to trace them is by
    tracing kfree_skb().
    
    We only add the information we need to the trace ring buffer. If any other
    information is needed, it is easy to extend it without breaking ABI,
    see commit 3dd344e ("net: tracepoint: exposing sk_family in all
    tcp:tracepoints").
    
    Reviewed-by: Cong Wang <cong.wang@bytedance.com>
    Signed-off-by: Qitao Xu <qitao.xu@bytedance.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Qitao Xu authored and davem330 committed Jul 15, 2021
  11. net_sched: use %px to print skb address in trace_qdisc_dequeue()

    Print format of skbaddr is changed to %px from %p, because we want
    to use skb address as a quick way to identify a packet.
    
    Note, trace ring buffer is only accessible to privileged users,
    it is safe to use a real kernel address here.
    
    Reviewed-by: Cong Wang <cong.wang@bytedance.com>
    Signed-off-by: Qitao Xu <qitao.xu@bytedance.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Qitao Xu authored and davem330 committed Jul 15, 2021
  12. net: use %px to print skb address in trace_netif_receive_skb

    The print format of skb address in tracepoint class net_dev_template
    is changed to %px from %p, because we want to use skb address
    as a quick way to identify a packet.
    
    Note, trace ring buffer is only accessible to privileged users,
    it is safe to use a real kernel address here.
    
    Reviewed-by: Cong Wang <cong.wang@bytedance.com>
    Signed-off-by: Qitao Xu <qitao.xu@bytedance.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Qitao Xu authored and davem330 committed Jul 15, 2021
  13. liquidio: Fix unintentional sign extension issue on left shift of u16

    When the u16 integer oct->pcie_port is shifted left by
    CN23XX_PKT_INPUT_CTL_MAC_NUM_POS (29) bits, it is first promoted to a
    32 bit signed int and the result is then sign-extended to a u64. In the
    cases where bit 2 of oct->pcie_port is set (e.g. 4..7), the shifted value
    is sign-extended and the top 32 bits of the result are set.
    
    Fix this by casting the u16 values to a u64 before the 29 bit left shift.
    
    Addresses-Coverity: ("Unintended sign extension")
    
    Fixes: 3451b97 ("liquidio: CN23XX register setup")
    Signed-off-by: Colin Ian King <colin.king@canonical.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Colin Ian King authored and davem330 committed Jul 15, 2021
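
    The effect of the promotion can be reproduced in user space. The sketch
    below is illustrative only: MAC_NUM_POS mirrors the 29-bit shift quoted
    above, the function names are hypothetical, and the narrow variant
    emulates the 32-bit signed intermediate with an unsigned shift followed
    by a cast, so it does not depend on signed-overflow behaviour.

    ```c
    #include <assert.h>
    #include <stdint.h>

    #define MAC_NUM_POS 29  /* shift width quoted in the commit message */

    /* Models the problematic expression: the 16-bit value is shifted as a
     * 32-bit quantity, reinterpreted as signed, and only then widened. */
    static uint64_t shift_as_int(uint16_t port)
    {
        int32_t narrow = (int32_t)((uint32_t)port << MAC_NUM_POS);

        return (uint64_t)narrow;    /* sign-extends when bit 31 is set */
    }

    /* Models the fix: widen to 64 bits before shifting. */
    static uint64_t shift_as_u64(uint16_t port)
    {
        return (uint64_t)port << MAC_NUM_POS;
    }

    static void shift_demo(void)
    {
        /* Bit 2 set: bit 31 of the 32-bit result is set and gets extended. */
        assert(shift_as_int(4) == 0xFFFFFFFF80000000ULL);
        assert(shift_as_u64(4) == 0x0000000080000000ULL);

        /* Bit 2 clear: both variants agree. */
        assert(shift_as_int(1) == shift_as_u64(1));
    }
    ```

    Casting before the shift keeps the whole computation in 64-bit unsigned
    arithmetic, so no sign bit is ever involved.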
  14. net: dsa: mv88e6xxx: NET_DSA_MV88E6XXX_PTP should depend on NET_DSA_M…

    …V88E6XXX
    
    Making global2 support mandatory removed the Kconfig symbol
    NET_DSA_MV88E6XXX_GLOBAL2.  This symbol also served as an intermediate
    symbol to make NET_DSA_MV88E6XXX_PTP depend on NET_DSA_MV88E6XXX.  With
    the symbol removed, the user is always asked about PTP support for
    Marvell 88E6xxx switches, even if the latter support is not enabled.
    
    Fix this by reinstating the dependency.
    
    Fixes: 63368a7 ("net: dsa: mv88e6xxx: Make global2 support mandatory")
    Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    geertu authored and davem330 committed Jul 15, 2021

Commits on Jul 14, 2021

  1. Merge branch 'r8152-pm-fixxes'

    Takashi Iwai says:
    
    ====================
    r8152: Fix a couple of PM problems
    
    It seems that the r8152 driver suffers from a deadlock at both runtime
    and system PM.  Formerly, it was seen more often at hibernation
    resume, but now it's triggered more frequently, as reported in SUSE
    Bugzilla:
      https://bugzilla.suse.com/show_bug.cgi?id=1186194
    
    While debugging the problem, I stumbled on a few obvious bugs, and here
    are the results: two patches addressing the resume problem.
    
    ***
    
    However, the story doesn't end here, unfortunately, and those patches
    don't seem sufficient.  The remaining major problem is that the driver
    calls napi_disable() and napi_enable() in the PM suspend callbacks.  This
    makes the system stall at (runtime-)suspend.  If we drop the
    napi_disable() and napi_enable() calls in the PM suspend callbacks, it
    starts working (that was the result in Bugzilla comment 13):
      https://bugzilla.suse.com/show_bug.cgi?id=1186194#c13
    
    So, my patches aren't enough and we still need to investigate
    further.  It'd be appreciated if anyone can give a fix or a hint for
    more debugging.  The usage of napi_disable() at PM callbacks is unique
    in this driver and looks rather suspicious to me; but I'm no expert in
    this area so I might be wrong...
    ====================
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    davem330 committed Jul 14, 2021
  2. r8152: Fix a deadlock by doubly PM resume

    The r8152 driver sets up the MAC address at reset-resume, while
    rtl8152_set_mac_address() has a temporary autopm get/put.  This may
    lead to a deadlock, as the PM lock has already been taken for the
    execution of the runtime PM callback.
    
    This patch adds a workaround to avoid the superfluous autopm when
    called from rtl8152_reset_resume().
    
    Link: https://bugzilla.suse.com/show_bug.cgi?id=1186194
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    tiwai authored and davem330 committed Jul 14, 2021
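
    A workaround of this kind can be sketched as a helper that is told
    whether it is being called from a PM callback and skips the get/put pair
    in that case. Everything below is a hypothetical model, not the driver's
    code: the PM lock is reduced to a flag, and a counter records the
    attempts that would have blocked.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    static bool pm_callback_running;    /* models "PM lock already held" */
    static int would_deadlock;          /* get attempts made under the lock */

    static void model_autopm_get(void)
    {
        if (pm_callback_running)
            would_deadlock++;   /* a real get would block here */
    }

    static void model_autopm_put(void)
    {
    }

    /* Hypothetical helper: in_resume says that a PM callback is the
     * caller, so the temporary get/put pair must be skipped. */
    static void set_mac_model(bool in_resume)
    {
        if (!in_resume)
            model_autopm_get();
        /* ... program the MAC address ... */
        if (!in_resume)
            model_autopm_put();
    }

    /* Models the resume callback; pass_flag selects whether it tells the
     * helper about the resume context. */
    static void reset_resume_model(bool pass_flag)
    {
        pm_callback_running = true;
        set_mac_model(pass_flag);
        pm_callback_running = false;
    }

    static void resume_demo(void)
    {
        would_deadlock = 0;
        reset_resume_model(false);
        assert(would_deadlock == 1);    /* get attempted under the lock */

        reset_resume_model(true);
        assert(would_deadlock == 1);    /* no further attempt */
    }
    ```

    Ordinary callers outside the PM path would still pass false and keep the
    get/put pair.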
  3. r8152: Fix potential PM refcount imbalance

    rtl8152_close() takes the refcount via usb_autopm_get_interface() but
    doesn't release it when the RTL8152_UNPLUG test hits.  This may lead to
    a PM refcount imbalance.  This patch addresses it.
    
    Link: https://bugzilla.suse.com/show_bug.cgi?id=1186194
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    tiwai authored and davem330 committed Jul 14, 2021
  4. Merge tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel…

    …/git/netdev/net
    
    Pull networking fixes from Jakub Kicinski.
     "Including fixes from bpf and netfilter.
    
      Current release - regressions:
    
       - sock: fix parameter order in sock_setsockopt()
    
      Current release - new code bugs:
    
       - netfilter: nft_last:
           - fix incorrect arithmetic when restoring last used
           - honor NFTA_LAST_SET on restoration
    
      Previous releases - regressions:
    
       - udp: properly flush normal packet at GRO time
    
       - sfc: ensure correct number of XDP queues; don't allow enabling the
         feature if there aren't sufficient resources to Tx from any CPU
    
       - dsa: sja1105: fix address learning getting disabled on the CPU port
    
       - mptcp: addresses a rmem accounting issue that could keep packets in
         subflow receive buffers longer than necessary, delaying MPTCP-level
         ACKs
    
       - ip_tunnel: fix mtu calculation for ETHER tunnel devices
    
       - do not reuse skbs allocated from skbuff_fclone_cache in the napi
         skb cache, we'd try to return them to the wrong slab cache
    
       - tcp: consistently disable header prediction for mptcp
    
      Previous releases - always broken:
    
       - bpf: fix subprog poke descriptor tracking use-after-free
    
       - ipv6:
           - allocate enough headroom in ip6_finish_output2() in case
             iptables TEE is used
           - tcp: drop silly ICMPv6 packet too big messages to avoid
             expensive and pointless lookups (which may serve as a DDOS
             vector)
           - make sure fwmark is copied in SYNACK packets
           - fix 'disable_policy' for forwarded packets (align with IPv4)
    
       - netfilter: conntrack:
           - do not renew entry stuck in tcp SYN_SENT state
           - do not mark RST in the reply direction coming after SYN packet
             for an out-of-sync entry
    
       - mptcp: cleanly handle error conditions with MP_JOIN and syncookies
    
       - mptcp: fix double free when rejecting a join due to port mismatch
    
       - validate lwtstate->data before returning from skb_tunnel_info()
    
       - tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path
    
       - mt76: mt7921: continue to probe driver when fw already downloaded
    
       - bonding: fix multiple issues with offloading IPsec to (thru?) bond
    
       - stmmac: ptp: fix issues around Qbv support and setting time back
    
       - bcmgenet: always clear wake-up based on energy detection
    
      Misc:
    
       - sctp: move 198 addresses from unusable to private scope
    
       - ptp: support virtual clocks and timestamping
    
       - openvswitch: optimize operation for key comparison"
    
    * tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (158 commits)
      net: dsa: properly check for the bridge_leave methods in dsa_switch_bridge_leave()
      sfc: add logs explaining XDP_TX/REDIRECT is not available
      sfc: ensure correct number of XDP queues
      sfc: fix lack of XDP TX queues - error XDP TX failed (-22)
      net: fddi: fix UAF in fza_probe
      net: dsa: sja1105: fix address learning getting disabled on the CPU port
      net: ocelot: fix switchdev objects synced for wrong netdev with LAG offload
      net: Use nlmsg_unicast() instead of netlink_unicast()
      octeontx2-pf: Fix uninitialized boolean variable pps
      ipv6: allocate enough headroom in ip6_finish_output2()
      net: hdlc: rename 'mod_init' & 'mod_exit' functions to be module-specific
      net: bridge: multicast: fix MRD advertisement router port marking race
      net: bridge: multicast: fix PIM hello router port marking race
      net: phy: marvell10g: fix differentiation of 88X3310 from 88X3340
      dsa: fix for_each_child.cocci warnings
      virtio_net: check virtqueue_add_sgs() return value
      mptcp: properly account bulk freed memory
      selftests: mptcp: fix case multiple subflows limited by server
      mptcp: avoid processing packet if a subflow reset
      mptcp: fix syncookie process if mptcp can not_accept new subflow
      ...
    torvalds committed Jul 14, 2021
  5. fs: add vfs_parse_fs_param_source() helper

    Add a simple helper that filesystems can use in their parameter parser
    to parse the "source" parameter. A few places open-coded this function
    and that already caused a bug in the cgroup v1 parser that we fixed.
    Let's make it harder to get this wrong by introducing a helper which
    performs all necessary checks.
    
    Link: https://syzkaller.appspot.com/bug?id=6312526aba5beae046fdae8f00399f87aab48b12
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    brauner authored and torvalds committed Jul 14, 2021
  6. cgroup: verify that source is a string

    The following sequence can be used to trigger a UAF:
    
        int fscontext_fd = fsopen("cgroup", 0);
        int fd_null = open("/dev/null", O_RDONLY);
        fsconfig(fscontext_fd, FSCONFIG_SET_FD, "source", NULL, fd_null);
        close_range(3, ~0U, 0);
    
    The cgroup v1 specific fs parser expects a string for the "source"
    parameter.  However, it is perfectly legitimate to e.g.  specify a file
    descriptor for the "source" parameter.  The fs parser doesn't know what
    a filesystem allows there.  So it's a bug to assume that "source" is
    always of type fs_value_is_string when it can reasonably also be
    fs_value_is_file.
    
    This assumption in the cgroup code causes a UAF because struct
    fs_parameter uses a union for the actual value.  Access to that union is
    guarded by the param->type member.  The cgroup parameter parser didn't
    check param->type but unconditionally moved param->string into
    fc->source.  A close on the fscontext_fd would then trigger
    put_fs_context(), which frees fc->source and thereby frees the file
    stashed in param->file, causing a UAF during a close of the fd_null.
    
    Fix this by verifying that param->type is actually a string and report
    an error if not.
    
    In follow-up patches I'll add a new generic helper that can be used here
    and by other filesystems instead of this error-prone copy-pasta fix.
    But fixing it in here first makes backporting it to stable a lot
    easier.
    
    Fixes: 8d2451f ("cgroup1: switch to option-by-option parsing")
    Reported-by: syzbot+283ce5a46486d6acdbaf@syzkaller.appspotmail.com
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: <stable@kernel.org>
    Cc: syzkaller-bugs <syzkaller-bugs@googlegroups.com>
    Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    brauner authored and torvalds committed Jul 14, 2021
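
    The type check described in the cgroup fix, and centralised by the helper
    in the commit before it, amounts to inspecting the tag of a tagged union
    before reading a member. The sketch below is a user-space miniature with
    hypothetical names (struct param_model, struct ctx_model,
    parse_source_model()); rejecting a second "source" is an assumption of
    the sketch, not something stated in the commit messages.

    ```c
    #include <assert.h>
    #include <errno.h>
    #include <stddef.h>

    enum value_type_model { VALUE_IS_STRING, VALUE_IS_FILE };

    /* Hypothetical miniature of a parameter whose payload is a union
     * selected by `type`. */
    struct param_model {
        enum value_type_model type;
        union {
            char *string;
            void *file;
        };
    };

    struct ctx_model {
        char *source;
    };

    /* Take ownership of the "source" string only after confirming that the
     * union really holds a string. */
    static int parse_source_model(struct ctx_model *ctx, struct param_model *p)
    {
        if (p->type != VALUE_IS_STRING)
            return -EINVAL;
        if (ctx->source)
            return -EINVAL;     /* assumption: only one source allowed */

        ctx->source = p->string;
        p->string = NULL;       /* ownership moved to ctx */
        return 0;
    }

    static void parse_source_demo(void)
    {
        char name[] = "cgroup";
        struct ctx_model ctx = { .source = NULL };
        struct param_model as_file = { .type = VALUE_IS_FILE, .file = name };
        struct param_model as_str = { .type = VALUE_IS_STRING, .string = name };

        assert(parse_source_model(&ctx, &as_file) == -EINVAL);
        assert(ctx.source == NULL);     /* file payload never stored */
        assert(parse_source_model(&ctx, &as_str) == 0);
        assert(ctx.source == name);
    }
    ```

    Without the tag check, the file pointer would be stored as if it were a
    string and later freed as one, which is the confusion the fix removes.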

Commits on Jul 13, 2021

  1. net: dsa: properly check for the bridge_leave methods in dsa_switch_b…

    …ridge_leave()
    
    This was not caught because there is no switch driver which implements
    the .port_bridge_join but not .port_bridge_leave method, but it should
    nonetheless be fixed, as in certain conditions (driver development) it
    might lead to NULL pointer dereference.
    
    Fixes: f66a6a6 ("net: dsa: permit cross-chip bridging between all trees in the system")
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    vladimiroltean authored and davem330 committed Jul 13, 2021
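
    The class of bug fixed here is guarding a call with a check on a
    different callback than the one being invoked. A minimal sketch with a
    hypothetical ops structure (struct switch_ops_model and the model_*()
    functions are not DSA code):

    ```c
    #include <assert.h>
    #include <stddef.h>

    struct switch_ops_model {
        int (*port_bridge_join)(int port);
        void (*port_bridge_leave)(int port);
    };

    static int leave_calls;

    static int model_join(int port)
    {
        (void)port;
        return 0;
    }

    static void model_leave(int port)
    {
        (void)port;
        leave_calls++;
    }

    /* Guard the call with the very callback that is about to be invoked;
     * testing port_bridge_join here instead would permit a NULL call. */
    static void bridge_leave_model(const struct switch_ops_model *ops, int port)
    {
        if (ops->port_bridge_leave)
            ops->port_bridge_leave(port);
    }

    static void bridge_leave_demo(void)
    {
        struct switch_ops_model join_only = { .port_bridge_join = model_join };
        struct switch_ops_model both = {
            .port_bridge_join = model_join,
            .port_bridge_leave = model_leave,
        };

        leave_calls = 0;
        bridge_leave_model(&join_only, 0);  /* safely skipped */
        assert(leave_calls == 0);

        bridge_leave_model(&both, 0);
        assert(leave_calls == 1);
    }
    ```

    A driver that implements only the join callback is exactly the case the
    commit message says no in-tree driver exercises today.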
  2. Merge tag 'vboxsf-v5.14-1' of git://git.kernel.org/pub/scm/linux/kern…

    …el/git/hansg/linux
    
    Pull vboxsf fixes from Hans de Goede:
     "This adds support for the atomic_open directory-inode op to vboxsf.
    
      Note this is not just an enhancement; it also fixes an actual issue
      which users are hitting, see the commit message of the "vboxsf: Add
      support for the atomic_open directory-inode op" patch"
    
    * tag 'vboxsf-v5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hansg/linux:
      vboxsf: Add support for the atomic_open directory-inode op
      vboxsf: Add vboxsf_[create|release]_sf_handle() helpers
      vboxsf: Make vboxsf_dir_create() return the handle for the created file
      vboxsf: Honor excl flag to the dir-inode create op
    torvalds committed Jul 13, 2021
  3. Merge tag 'for-5.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/ke…

    …rnel/git/kdave/linux
    
    Pull btrfs zoned mode fixes from David Sterba:
    
     - fix deadlock when allocating system chunk
    
     - fix wrong mutex unlock on an error path
    
     - fix extent map splitting for append operation
    
     - update and fix message reporting unusable chunk space
    
     - don't block when background zone reclaim runs with balance in
       parallel
    
    * tag 'for-5.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
      btrfs: zoned: fix wrong mutex unlock on failure to allocate log root tree
      btrfs: don't block if we can't acquire the reclaim lock
      btrfs: properly split extent_map for REQ_OP_ZONE_APPEND
      btrfs: rework chunk allocation to avoid exhaustion of the system chunk array
      btrfs: fix deadlock with concurrent chunk allocations involving system chunks
      btrfs: zoned: print unusable percentage when reclaiming block groups
      btrfs: zoned: fix types for u64 division in btrfs_reclaim_bgs_work
    torvalds committed Jul 13, 2021
  4. Merge branch 'sfc-tx-queues'

    Íñigo Huguet says:
    
    ====================
    sfc: Fix lack of XDP TX queues
    
    A change introduced in commit e26ca4b ("sfc: reduce the number of
    requested xdp ev queues") created a bug in XDP_TX and XDP_REDIRECT
    because it unintentionally reduced the number of XDP TX queues, leaving
    too few queues to have one per CPU, which led to errors if XDP
    TX/REDIRECT was done from a high numbered CPU.
    
    This patch series makes the following changes:
    - Fix the bug mentioned above
    - Revert commit 99ba0ea ("sfc: adjust efx->xdp_tx_queue_count with
      the real number of initialized queues") which intended to fix a related
      problem, created by mentioned bug, but it's no longer necessary
    - Add a new error log message if there are not enough resources to make
      XDP_TX/REDIRECT work
    
    V1 -> V2: keep the calculation of how many tx queues can handle a single
    event queue, but apply the "max. tx queues per channel" upper limit.
    V2 -> V3: WARN_ON if the number of initialized XDP TXQs differs from the
    expected.
    ====================
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    davem330 committed Jul 13, 2021
  5. sfc: add logs explaining XDP_TX/REDIRECT is not available

    If it's not possible to allocate enough channels for XDP, XDP_TX and
    XDP_REDIRECT don't work. However, the only message shown said that not
    enough channels were available, without explaining the consequences.
    The user didn't know whether XDP could still be used, whether
    performance would be reduced, or what else to expect.
    
    Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Íñigo Huguet authored and davem330 committed Jul 13, 2021
  6. sfc: ensure correct number of XDP queues

    Commit 99ba0ea ("sfc: adjust efx->xdp_tx_queue_count with the real
    number of initialized queues") intended to fix a problem caused by a
    round up when calculating the number of XDP channels and queues.
    However, this was not the real problem. The real problem was that the
    number of XDP TX queues had been reduced to half in
    commit e26ca4b ("sfc: reduce the number of requested xdp ev queues"),
    but the variable xdp_tx_queue_count had remained the same.
    
    Once the correct number of XDP TX queues is created again by the
    previous patch of this series, that commit can also be reverted, since
    the error it addressed no longer exists.
    
    Only a bug in the code could make xdp_queue_number and
    efx->xdp_tx_queue_count differ. Because of this,
    and per Edward Cree's suggestion, I add a WARN_ON instead to catch it
    if it happens again in the future.
    
    Note that the number of allocated queues can be higher than the number
    of used ones due to the round up, as explained in the existing comment
    in the code. That's why we also have to stop increasing xdp_queue_number
    beyond efx->xdp_tx_queue_count.
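
The clamp-and-check behaviour described above can be sketched in plain C. This is a user-space illustration under stated assumptions, not the driver code: `struct nic_model`, `assign_xdp_queues()` and the -1 return value standing in for WARN_ON are hypothetical names introduced here.

```c
#include <assert.h>

/* Hypothetical model of the driver state: only the two counters named
 * in the commit message are represented. */
struct nic_model {
	unsigned int xdp_tx_queue_count; /* queues XDP expects to use */
	unsigned int xdp_queue_number;   /* queues actually assigned */
};

/*
 * Walk over 'allocated' queues (which may exceed the expected count
 * because of the round up) and assign only as many as XDP expects.
 * Returns 0 if both counters match, -1 where the driver would WARN_ON.
 */
static int assign_xdp_queues(struct nic_model *nic, unsigned int allocated)
{
	unsigned int i;

	assert(nic != 0);

	nic->xdp_queue_number = 0;
	for (i = 0; i < allocated; i++) {
		/* Stop increasing once every expected queue is assigned. */
		if (nic->xdp_queue_number < nic->xdp_tx_queue_count)
			nic->xdp_queue_number++;
	}

	return nic->xdp_queue_number == nic->xdp_tx_queue_count ? 0 : -1;
}
```

With 8 allocated queues and 6 expected, the counter stops at 6 and the check passes. With only 4 allocated, the counters differ and the sketch reports the mismatch.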
    
    Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Íñigo Huguet authored and davem330 committed Jul 13, 2021
  7. sfc: fix lack of XDP TX queues - error XDP TX failed (-22)

    Fixes: e26ca4b sfc: reduce the number of requested xdp ev queues
    
    The buggy commit intended to allocate fewer channels for XDP in order to
    make it less likely to reach the driver's limit of 32 channels.
    
    The idea was to use each IRQ/event queue for more XDP TX queues than
    before, by calculating the maximum number of TX queues that one
    event queue can handle. For example, in EF10 each event queue could
    handle up to 8 queues, better than the 4 they were handling before the
    change. This way, it would have to allocate half as many channels as
    before for XDP TX.
    
    The problem is that the TX queues are also contained inside the channel
    structs, and there are only 4 queues per channel. Reducing the number of
    channels also reduces the number of queues, so the desired 1 queue per
    CPU is no longer available.
    
    This leads to errors on XDP_TX and XDP_REDIRECT if they're
    executed from a high-numbered CPU, because queues actually exist only for
    the lower half of the CPUs. If XDP_TX/REDIRECT is executed on a
    low-numbered CPU, the error doesn't happen. This is the error in the logs
    (repeated many times, even when rate limited):
    sfc 0000:5e:00.0 ens3f0np0: XDP TX failed (-22)
    
    This error happens in function efx_xdp_tx_buffers, which expects to
    have a dedicated XDP TX queue per CPU.
    
    Reverting the change again makes it more likely to reach the limit of 32
    channels on machines with many CPUs. If this happens, no XDP_TX/REDIRECT
    will be possible at all, and these error messages will be logged:
    
    At interface probe:
    sfc 0000:5e:00.0: Insufficient resources for 12 XDP event queues (24 other channels, max 32)
    
    At every subsequent XDP_TX/REDIRECT failure, rate limited:
    sfc 0000:5e:00.0 ens3f0np0: XDP TX failed (-22)
    
    However, without reverting the change, the user is led to think that
    everything is OK at probe time, but later XDP fails in an unpredictable
    way, depending on the CPU that handles the packet.
    
    It is better to restore the predictable behaviour. If users see the
    error message at probe time, they can choose the configuration that
    best fits their needs. They have at least 2 options:
    - Accept that XDP_TX/REDIRECT is not available (they may not need it)
    - Load the sfc module with modparam 'rss_cpus' set to a lower number,
      thus creating fewer normal RX queues/channels and leaving more free
      resources for XDP, with some performance penalty.
    
    In any case, keep the calculation of the maximum number of TX queues
    that a single event queue can handle, and use it only if it's less than
    the number of TX queues per channel. This doesn't happen in practice, but
    could happen if some constant values are tweaked in the future, such as
    EFX_MAX_TXQ_PER_CHANNEL, EFX_MAX_EVQ_SIZE or EFX_MAX_DMAQ_SIZE.
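
The arithmetic behind the bug and the fix can be sketched as a small user-space C model. The constants 4 (TX queues per channel), 8 (TX queues per event queue on EF10) and 32 (channel limit) come from the message above. The 48-CPU machine, `struct xdp_plan` and `plan_xdp()` are assumptions made for illustration and are not taken from the driver.

```c
#include <assert.h>

#define TXQ_PER_CHANNEL 4u  /* TX queues held in one channel struct */
#define MAX_CHANNELS    32u /* driver-wide channel limit */

/* Hypothetical result of planning XDP channels for a machine. */
struct xdp_plan {
	unsigned int channels; /* XDP channels requested */
	unsigned int queues;   /* XDP TX queues those channels can hold */
	int fits;              /* 1 if within MAX_CHANNELS, else 0 */
};

static unsigned int div_round_up(unsigned int n, unsigned int d)
{
	return (n + d - 1) / d;
}

/*
 * n_cpus:    CPUs that each need one XDP TX queue
 * n_other:   channels already used for normal traffic
 * tx_per_ev: TX queues a single event queue can handle
 * clamp:     apply the "max TX queues per channel" upper limit (the fix)
 */
static struct xdp_plan plan_xdp(unsigned int n_cpus, unsigned int n_other,
				unsigned int tx_per_ev, int clamp)
{
	struct xdp_plan p;

	assert(tx_per_ev > 0);
	if (clamp && tx_per_ev > TXQ_PER_CHANNEL)
		tx_per_ev = TXQ_PER_CHANNEL;

	p.channels = div_round_up(n_cpus, tx_per_ev);
	p.queues = p.channels * TXQ_PER_CHANNEL;
	p.fits = (n_other + p.channels) <= MAX_CHANNELS;
	return p;
}
```

Under these assumptions, `plan_xdp(48, 24, 8, 0)` models the buggy behaviour: 6 channels fit under the limit but hold only 24 queues, so only the lower half of the CPUs get one. `plan_xdp(48, 24, 8, 1)` models the fix: 12 channels are requested, which exceeds the limit together with the 24 other channels, matching the probe-time message quoted above.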
    
    Related mailing list thread:
    https://lore.kernel.org/bpf/20201215104327.2be76156@carbon/
    
    Signed-off-by: Íñigo Huguet <ihuguet@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Íñigo Huguet authored and davem330 committed Jul 13, 2021
  8. net: fddi: fix UAF in fza_probe

    fp is netdev private data and it cannot be
    used after the free_netdev() call. Using fp after free_netdev()
    can cause a UAF bug. Fix it by moving free_netdev() after the error message.
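
The ordering rule can be illustrated with a user-space sketch. Everything here is a hypothetical stand-in (`struct mock_netdev`, `mock_free_netdev()`, `report_then_free()`); it is not the fza_probe code, only the pattern of reading private data before the owning object is released.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical private data embedded in the device allocation. */
struct mock_priv {
	char name[16];
};

struct mock_netdev {
	struct mock_priv priv;
};

/* Stand-in for free_netdev(): releases the device and its private data. */
static void mock_free_netdev(struct mock_netdev *dev)
{
	free(dev);
}

/*
 * Error path of an imagined probe routine. Writes the error message
 * into 'msg' and returns 0, or returns -1 if the allocation failed.
 */
static int report_then_free(const char *name, char *msg, size_t len)
{
	struct mock_netdev *dev = calloc(1, sizeof(*dev));
	struct mock_priv *fp;

	if (!dev)
		return -1;

	fp = &dev->priv;
	snprintf(fp->name, sizeof(fp->name), "%s", name);

	/* Report the error first: fp still points into live memory. */
	snprintf(msg, len, "%s: device probe failed", fp->name);

	/* Release the device last; fp must not be used after this. */
	mock_free_netdev(dev);
	return 0;
}
```

Swapping the last two steps would read `fp->name` from freed memory, which is the class of bug the commit describes.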
    
    Fixes: 61414f5 ("FDDI: defza: Add support for DEC FDDIcontroller 700
    TURBOchannel adapter")
    Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    pskrgag authored and davem330 committed Jul 13, 2021
  9. net: dsa: sja1105: fix address learning getting disabled on the CPU port

    In May 2019 when commit 640f763 ("net: dsa: sja1105: Add support
    for Spanning Tree Protocol") was introduced, the comment that "STP does
    not get called for the CPU port" was true. This changed after commit
    0394a63 ("net: dsa: enable and disable all ports") in August 2019
    and went largely unnoticed, because the sja1105_bridge_stp_state_set()
    method did nothing different compared to the static setup done by
    sja1105_init_mac_settings().
    
    With the ability to turn address learning off introduced by the blamed
    commit, there is a new priv->learn_ena port mask in the driver. When
    sja1105_bridge_stp_state_set() gets called and we are in
    BR_STATE_LEARNING or later, address learning is enabled or not depending
    on priv->learn_ena & BIT(port).
    
    So what happens is that priv->learn_ena is not being set from anywhere
    for the CPU port, and the static configuration done by
    sja1105_init_mac_settings() is being overwritten.
    
    To solve this, acknowledge that the static configuration of STP state is
    no longer necessary because the STP state is being set by the DSA core
    now, but what is necessary is to set priv->learn_ena for the CPU port.
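
The port-mask check can be sketched in user-space C. `struct sw_priv`, `learning_enabled()` and `setup_cpu_port()` are hypothetical names introduced for illustration. Only the `learn_ena & BIT(port)` test comes from the message above.

```c
#include <assert.h>
#include <stdbool.h>

#define BIT(n) (1UL << (n))

/* Hypothetical model of the driver's private data. */
struct sw_priv {
	unsigned long learn_ena; /* one bit per port with learning enabled */
};

/* What the STP state handler consults once in LEARNING or later. */
static bool learning_enabled(const struct sw_priv *priv, unsigned int port)
{
	return (priv->learn_ena & BIT(port)) != 0;
}

/* The fix in spirit: mark the CPU port as learning during setup. */
static void setup_cpu_port(struct sw_priv *priv, unsigned int cpu_port)
{
	assert(cpu_port < 8 * sizeof(priv->learn_ena));
	priv->learn_ena |= BIT(cpu_port);
}
```

If the bit for the CPU port is never set, `learning_enabled()` returns false for it and the STP state handler turns learning off, which is the failure mode described above.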
    
    Fixes: 4d94235 ("net: dsa: sja1105: offload bridge port flags to device")
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    vladimiroltean authored and davem330 committed Jul 13, 2021
  10. net: ocelot: fix switchdev objects synced for wrong netdev with LAG offload
    
    The point with a *dev and a *brport_dev is that when we have a LAG net
    device that is a bridge port, *dev is an ocelot net device and
    *brport_dev is the bonding/team net device. The ocelot net device
    beneath the LAG does not exist from the bridge's perspective, so we need
    to sync the switchdev objects belonging to the brport_dev and not to the
    dev.
    
    Fixes: e4bd44e ("net: ocelot: replay switchdev events when joining bridge")
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    vladimiroltean authored and davem330 committed Jul 13, 2021
  11. net: Use nlmsg_unicast() instead of netlink_unicast()

    nlmsg_unicast() already contains the 'if (err > 0)' check, so use
    nlmsg_unicast() instead of netlink_unicast(); this is more concise.
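
The difference between the two calls can be modelled in user-space C. The stubs below are hypothetical (`stub_netlink_unicast()` and `stub_nlmsg_unicast()` are not kernel functions). They only mirror the behaviour the message relies on: a positive return value from the send is folded to 0 by the wrapper.

```c
#include <assert.h>

/*
 * Stub standing in for netlink_unicast(): 'result' is returned as-is,
 * positive for bytes delivered, negative for an error code.
 */
static int stub_netlink_unicast(int result)
{
	return result;
}

/* Wrapper with the 'if (err > 0)' check built in. */
static int stub_nlmsg_unicast(int result)
{
	int err = stub_netlink_unicast(result);

	if (err > 0)
		err = 0;

	return err;
}

/* Caller before the change: normalises the return value by hand. */
static int send_old_style(int result)
{
	int err = stub_netlink_unicast(result);

	if (err > 0)
		err = 0;
	return err;
}

/* Caller after the change: the wrapper already does the normalising. */
static int send_new_style(int result)
{
	return stub_nlmsg_unicast(result);
}
```

Both callers return the same value for every input, so the change is a simplification rather than a behaviour change.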
    
    v2: remove the change in netfilter.
    
    Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Yajun Deng authored and davem330 committed Jul 13, 2021