Paolo-Abeni/sk…

Commits on Jul 21, 2021

  1. sk_buff: access secmark via getter/setter

    So we can track the field status and move it after tail.
    
    After this commit the skb lifecycle for simple cases (no ct, no secmark,
    no vlan, no UDP tunnel) uses 3 cachelines instead of the 4 cachelines
    required before this series.
    
    e.g. GRO for non-vlan traffic will consistently use 3 cachelines for
    each packet.
    
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Paolo Abeni authored and intel-lab-lkp committed Jul 21, 2021
  2. sk_buff: move vlan field after tail.

    The validity of these fields is already tracked by the existing
    'vlan_present' bit. Move them after tail and conditionally copy
    as needed.
    
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Paolo Abeni authored and intel-lab-lkp committed Jul 21, 2021
  3. sk_buff: move inner header fields after tail

    All the inner header fields are valid only if the 'encapsulation'
    flag is set, and the relevant fields are always initialized when
    the flag is set: we don't need to initialize them at skb allocation
    time.
    
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Paolo Abeni authored and intel-lab-lkp committed Jul 21, 2021
  4. veth: use skb_prepare_for_gro()

    Leveraging the previous patch we can now avoid orphaning the
    skb in the veth gro path, allowing correct backpressure.
    
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Paolo Abeni authored and intel-lab-lkp committed Jul 21, 2021
  5. skbuff: introduce has_sk state bit.

    This change leverages the infrastructure introduced by the previous
    patches to allow soft devices to pass owned skbs to the GRO engine
    without impacting the fast-path.
    
    It's up to the GRO caller to ensure the bit validity before
    invoking the GRO engine with the new helper skb_prepare_for_gro().
    
    If the bit is set, only skbs with the same sk will be aggregated.
    Additionally, skb truesize on GRO recycle and free is correctly
    updated so that sk wmem is not changed by the GRO processing.
    
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Paolo Abeni authored and intel-lab-lkp committed Jul 21, 2021
  6. net: optimize GRO for the common case.

    After the previous patches, at GRO time, skb->_state is
    usually 0, unless the packets come from some H/W offload
    slowpath or tunnel without rx checksum offload.
    
    We can optimize the GRO code assuming !skb->_state is likely.
    This removes multiple conditionals in the fast-path, at the
    price of an additional one when we hit the above "slow-paths".
    
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Paolo Abeni authored and intel-lab-lkp committed Jul 21, 2021
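The trade-off described in this commit can be sketched in user-space C. This is only an illustration under stated assumptions: the struct, the STATE_* bit names and gro_prepare_sketch() are hypothetical and are not taken from the series, and likely() is defined here through the GCC/Clang builtin. When the state byte is zero a single test is paid; otherwise that same test is the one additional conditional before the per-field checks.

```c
#include <stdint.h>

/* Branch hint; __builtin_expect is a GCC/Clang builtin. */
#define likely(x) __builtin_expect(!!(x), 1)

/* Hypothetical state bits, for illustration only. */
#define STATE_HAS_NFCT (1u << 0)
#define STATE_HAS_DST  (1u << 1)
#define STATE_HAS_SK   (1u << 2)

struct pkt_sketch {
    uint8_t state;      /* all optional-field flags packed in one byte */
    int slow_checks;    /* counts the per-field work done on the slow path */
};

void gro_prepare_sketch(struct pkt_sketch *pkt)
{
    /* Fast path: one test covers every optional field at once. */
    if (likely(!pkt->state))
        return;

    /* Slow path: pays the extra test above, then inspects each flag. */
    if (pkt->state & STATE_HAS_NFCT)
        pkt->slow_checks++;
    if (pkt->state & STATE_HAS_DST)
        pkt->slow_checks++;
    if (pkt->state & STATE_HAS_SK)
        pkt->slow_checks++;
}
```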
  7. sk_buff: move the active_extensions into the state bitfield

    No functional change intended
    
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Paolo Abeni authored and intel-lab-lkp committed Jul 21, 2021
  8. sk_buff: track dst status in skb->_state

    Similar to the previous patch, covering the dst field,
    but limited to tracking only the dst status.
    
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Paolo Abeni authored and intel-lab-lkp committed Jul 21, 2021
  9. sk_buff: track nfct status in newly added skb->_state

    so that we can skip initializing such field at skb
    allocation and move such field after 'tail'.
    
    _state uses one byte hole in the header section.
    
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Paolo Abeni authored and intel-lab-lkp committed Jul 21, 2021

Commits on Jul 17, 2021

  1. netfilter: nf_tables: fix audit memory leak in nf_tables_commit

    In nf_tables_commit, if nf_tables_commit_audit_alloc fails, it does not
    free the adp variable.
    
    Fix this by adding nf_tables_commit_audit_free which frees
    the linked list with the head node adl.
    
    backtrace:
      kmalloc include/linux/slab.h:591 [inline]
      kzalloc include/linux/slab.h:721 [inline]
      nf_tables_commit_audit_alloc net/netfilter/nf_tables_api.c:8439 [inline]
      nf_tables_commit+0x16e/0x1760 net/netfilter/nf_tables_api.c:8508
      nfnetlink_rcv_batch+0x512/0xa80 net/netfilter/nfnetlink.c:562
      nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline]
      nfnetlink_rcv+0x1fa/0x220 net/netfilter/nfnetlink.c:652
      netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
      netlink_unicast+0x2c7/0x3e0 net/netlink/af_netlink.c:1340
      netlink_sendmsg+0x36b/0x6b0 net/netlink/af_netlink.c:1929
      sock_sendmsg_nosec net/socket.c:702 [inline]
      sock_sendmsg+0x56/0x80 net/socket.c:722
    
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Reported-by: kernel test robot <lkp@intel.com>
    Fixes: c520292 ("audit: log nftables configuration change events once per table")
    Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    mudongliang authored and ummakynes committed Jul 17, 2021
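The shape of this fix, freeing an entire linked list when a later allocation fails, can be sketched in plain C. The node type and the audit_add() and audit_free_all() helpers below are hypothetical stand-ins, not the kernel's nf_tables_commit_audit_* functions.

```c
#include <stdlib.h>

/* Hypothetical list node; the kernel code uses its own types. */
struct audit_node {
    struct audit_node *next;
    int table_id;
};

/* Prepends a node; returns 0 on success, -1 if the allocation fails. */
int audit_add(struct audit_node **head, int table_id)
{
    struct audit_node *node = malloc(sizeof(*node));

    if (!node)
        return -1;
    node->table_id = table_id;
    node->next = *head;
    *head = node;
    return 0;
}

/* Frees every node, leaves *head NULL, and returns the number freed. */
int audit_free_all(struct audit_node **head)
{
    int freed = 0;

    while (*head) {
        struct audit_node *next = (*head)->next;

        free(*head);
        *head = next;
        freed++;
    }
    return freed;
}
```

A caller that sees audit_add() fail partway through a batch would call audit_free_all() before returning the error, so the nodes allocated earlier are not leaked.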

Commits on Jul 16, 2021

  1. net: decnet: Fix sleeping inside in af_decnet

    release_sock() is a blocking function; it would change the state
    after sleeping. Use wait_woken() instead.
    
    Fixes: 1da177e ("Linux-2.6.12-rc2")
    Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Yajun Deng authored and davem330 committed Jul 16, 2021
  2. mt7530 fix mt7530_fdb_write vid missing ivl bit

    According to reference guides mt7530 (mt7620) and mt7531:
    
    NOTE: When IVL is reset, MAC[47:0] and FID[2:0] will be used to
    read/write the address table. When IVL is set, MAC[47:0] and CVID[11:0]
    will be used to read/write the address table.
    
    Since the function only fills in CVID and no FID, we need to set the
    IVL bit. The existing code does not set it.
    
    This is a fix for the issue I dropped here earlier:
    
    http://lists.infradead.org/pipermail/linux-mediatek/2021-June/025697.html
    
    With this patch, it is now possible to delete the 'self' fdb entry
    manually. However, wifi roaming still has the same issue, the entry
    does not get deleted automatically. Wifi roaming also needs a fix
    somewhere else to function correctly in combination with vlan.
    
    Signed-off-by: Eric Woudstra <ericwouds@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    ericwoud authored and davem330 committed Jul 16, 2021
  3. skbuff: Fix a potential race while recycling page_pool packets

    As Alexander points out, when we are trying to recycle a cloned/expanded
    SKB we might trigger a race.  The recycling code relies on the
    pp_recycle bit to trigger,  which we carry over to cloned SKBs.
    If that cloned SKB gets expanded or if we get references to the frags,
    call skb_release_data() and overwrite skb->head, we are creating separate
    instances accessing the same page frags.  Since the skb_release_data()
    will first try to recycle the frags,  there's a potential race between
    the original and cloned SKB, since both will have the pp_recycle bit set.
    
    Fix this by explicitly marking those SKBs not recyclable.
    The atomic_sub_return effectively limits us to a single release case,
    and when we are calling skb_release_data we are also releasing the
    option to perform the recycling, or releasing the pages from the page pool.
    
    Fixes: 6a5bcd8 ("page_pool: Allow drivers to hint on SKB recycling")
    Reported-by: Alexander Duyck <alexanderduyck@fb.com>
    Suggested-by: Alexander Duyck <alexanderduyck@fb.com>
    Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
    Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    apalos authored and davem330 committed Jul 16, 2021

Commits on Jul 15, 2021

  1. Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

    Andrii Nakryiko says:
    
    ====================
    pull-request: bpf 2021-07-15
    
    The following pull-request contains BPF updates for your *net* tree.
    
    We've added 9 non-merge commits during the last 5 day(s) which contain
    a total of 9 files changed, 37 insertions(+), 15 deletions(-).
    
    The main changes are:
    
    1) Fix NULL pointer dereference in BPF_TEST_RUN for BPF_XDP_DEVMAP and
       BPF_XDP_CPUMAP programs, from Xuan Zhuo.
    
    2) Fix use-after-free of net_device in XDP bpf_link, from Xuan Zhuo.
    
    3) Follow-up fix to subprog poke descriptor use-after-free problem, from
       Daniel Borkmann and John Fastabend.
    
    4) Fix out-of-range array access in s390 BPF JIT backend, from Colin Ian King.
    
    5) Fix memory leak in BPF sockmap, from John Fastabend.
    
    6) Fix for sockmap to prevent proc stats reporting bug, from John Fastabend
       and Jakub Sitnicki.
    
    7) Fix NULL pointer dereference in bpftool, from Tobias Klauser.
    
    8) AF_XDP documentation fixes, from Baruch Siach.
    ====================
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    davem330 committed Jul 15, 2021
  2. usb: hso: fix error handling code of hso_create_net_device

    The current error handling code of hso_create_net_device is
    hso_free_net_device, no matter which error occurred. For example,
    this triggers a WARNING in hso_free_net_device [1].
    
    Fix this by refactoring the error handling code of
    hso_create_net_device by handling different errors by different code.
    
    [1] https://syzkaller.appspot.com/bug?id=66eff8d49af1b28370ad342787413e35bbe76efe
    
    Reported-by: syzbot+44d53c7255bb1aea22d2@syzkaller.appspotmail.com
    Fixes: 5fcfb6d ("hso: fix bailout in error case of probe")
    Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    mudongliang authored and davem330 committed Jul 15, 2021
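Handling different errors with different cleanup code is commonly written as a ladder of goto labels, each undoing only what had succeeded so far. The sketch below is a user-space illustration of that idiom, not the hso driver code; dev_sketch, create_dev(), destroy_dev() and the fail_at knob are all hypothetical.

```c
#include <stdlib.h>

struct dev_sketch {
    char *rx_buf;
    char *tx_buf;
};

/*
 * fail_at is a hypothetical test knob: 0 = no forced failure,
 * 1 = fail the rx allocation, 2 = fail the tx allocation.
 */
struct dev_sketch *create_dev(int fail_at)
{
    struct dev_sketch *dev = calloc(1, sizeof(*dev));

    if (!dev)
        return NULL;

    dev->rx_buf = (fail_at == 1) ? NULL : malloc(64);
    if (!dev->rx_buf)
        goto err_free_dev;      /* nothing else to undo yet */

    dev->tx_buf = (fail_at == 2) ? NULL : malloc(64);
    if (!dev->tx_buf)
        goto err_free_rx;       /* undo only the rx allocation */

    return dev;

err_free_rx:
    free(dev->rx_buf);
err_free_dev:
    free(dev);
    return NULL;
}

/* Full teardown, valid only for a completely constructed device. */
void destroy_dev(struct dev_sketch *dev)
{
    free(dev->tx_buf);
    free(dev->rx_buf);
    free(dev);
}
```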
  3. qed: fix possible unpaired spin_{un}lock_bh in _qed_mcp_cmd_and_union()

    Lijian reported a bug_on hit on a ThunderX2 arm64 server with FastLinQ
    QL41000 ethernet controller:
     BUG: scheduling while atomic: kworker/0:4/531/0x00000200
      [qed_probe:488()]hw prepare failed
      kernel BUG at mm/vmalloc.c:2355!
      Internal error: Oops - BUG: 0 [#1] SMP
      CPU: 0 PID: 531 Comm: kworker/0:4 Tainted: G W 5.4.0-77-generic #86-Ubuntu
      pstate: 00400009 (nzcv daif +PAN -UAO)
     Call trace:
      vunmap+0x4c/0x50
      iounmap+0x48/0x58
      qed_free_pci+0x60/0x80 [qed]
      qed_probe+0x35c/0x688 [qed]
      __qede_probe+0x88/0x5c8 [qede]
      qede_probe+0x60/0xe0 [qede]
      local_pci_probe+0x48/0xa0
      work_for_cpu_fn+0x24/0x38
      process_one_work+0x1d0/0x468
      worker_thread+0x238/0x4e0
      kthread+0xf0/0x118
      ret_from_fork+0x10/0x18
    
    In this case, qed_hw_prepare() returns error due to hw/fw error, but in
    theory work queue should be in process context instead of interrupt.
    
    The root cause might be the unpaired spin_{un}lock_bh() in
    _qed_mcp_cmd_and_union(), which causes the bottom half to be disabled
    incorrectly.
    
    Reported-by: Lijian Zhang <Lijian.Zhang@arm.com>
    Signed-off-by: Jia He <justin.he@arm.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    justin-he authored and davem330 committed Jul 15, 2021
  4. net: fix uninit-value in caif_seqpkt_sendmsg

    When nr_segs is equal to zero in iovec_from_user, the object
    msg->msg_iter.iov is uninitialized stack memory in caif_seqpkt_sendmsg,
    which is defined in ___sys_sendmsg. So we can't just check
    msg->msg_iter.iov->base directly. We can use nr_segs to check
    whether msg has data buffers in caif_seqpkt_sendmsg.
    
    =====================================================
    BUG: KMSAN: uninit-value in caif_seqpkt_sendmsg+0x693/0xf60 net/caif/caif_socket.c:542
    Call Trace:
     __dump_stack lib/dump_stack.c:77 [inline]
     dump_stack+0x1c9/0x220 lib/dump_stack.c:118
     kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118
     __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215
     caif_seqpkt_sendmsg+0x693/0xf60 net/caif/caif_socket.c:542
     sock_sendmsg_nosec net/socket.c:652 [inline]
     sock_sendmsg net/socket.c:672 [inline]
     ____sys_sendmsg+0x12b6/0x1350 net/socket.c:2343
     ___sys_sendmsg net/socket.c:2397 [inline]
     __sys_sendmmsg+0x808/0xc90 net/socket.c:2480
     __compat_sys_sendmmsg net/compat.c:656 [inline]
    
    Reported-by: syzbot+09a5d591c1f98cf5efcb@syzkaller.appspotmail.com
    Link: https://syzkaller.appspot.com/bug?id=1ace85e8fc9b0d5a45c08c2656c3e91762daa9b8
    Fixes: bece7b2 ("caif: Rewritten socket implementation")
    Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Ziyang Xuan authored and davem330 committed Jul 15, 2021
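The idea of consulting the segment count before touching the iovec can be shown with the POSIX struct iovec. This is a simplified sketch, not the caif code; msg_has_buffer() is a hypothetical helper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Returns true only when there is at least one segment and its base
 * pointer is set. When nr_segs is zero the iov pointer may refer to
 * uninitialized memory, so it is never dereferenced in that case.
 */
bool msg_has_buffer(const struct iovec *iov, size_t nr_segs)
{
    if (nr_segs == 0)
        return false;
    return iov->iov_base != NULL;
}
```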
  5. bpftool: Check malloc return value in mount_bpffs_for_pin

    Fix and add a missing NULL check for the prior malloc() call.
    
    Fixes: 49a086c ("bpftool: implement prog load command")
    Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Reviewed-by: Quentin Monnet <quentin@isovalent.com>
    Acked-by: Roman Gushchin <guro@fb.com>
    Link: https://lore.kernel.org/bpf/20210715110609.29364-1-tklauser@distanz.ch
    tklauser authored and borkmann committed Jul 15, 2021
  6. bpf, sockmap, udp: sk_prot needs inuse_idx set for proc stats

    The proc socket stats use sk_prot->inuse_idx value to record inuse sock
    stats. We currently do not set this correctly from sockmap side. The
    result is reading sock stats '/proc/net/sockstat' gives incorrect values.
    The socket counter is incremented correctly, but because we don't set the
    counter correctly when we replace sk_prot we may omit the decrement.
    
    To get the correct inuse_idx value move the core_initcall that initializes
    the UDP proto handlers to late_initcall. This way it is initialized after
    UDP has the chance to assign the inuse_idx value from the register protocol
    handler.
    
    Fixes: edc6741 ("bpf: Add sockmap hooks for UDP sockets")
    Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Reviewed-by: Cong Wang <cong.wang@bytedance.com>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Link: https://lore.kernel.org/bpf/20210714154750.528206-1-jakub@cloudflare.com
    jsitnicki authored and borkmann committed Jul 15, 2021
  7. bpf, sockmap, tcp: sk_prot needs inuse_idx set for proc stats

    The proc socket stats use sk_prot->inuse_idx value to record inuse sock
    stats. We currently do not set this correctly from sockmap side. The
    result is reading sock stats '/proc/net/sockstat' gives incorrect values.
    The socket counter is incremented correctly, but because we don't set the
    counter correctly when we replace sk_prot we may omit the decrement.
    
    To get the correct inuse_idx value move the core_initcall that initializes
    the TCP proto handlers to late_initcall. This way it is initialized after
    TCP has the chance to assign the inuse_idx value from the register protocol
    handler.
    
    Fixes: 604326b ("bpf, sockmap: convert to generic sk_msg interface")
    Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
    Signed-off-by: John Fastabend <john.fastabend@gmail.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Reviewed-by: Cong Wang <cong.wang@bytedance.com>
    Link: https://lore.kernel.org/bpf/20210712195546.423990-3-john.fastabend@gmail.com
    jrfastab authored and borkmann committed Jul 15, 2021
  8. bpf, sockmap: Fix potential memory leak on unlikely error case

    If skb_linearize is needed and fails we could leak a msg on the error
    handling. To fix this, ensure we kfree the msg block before returning error.
    Found during code review.
    
    Fixes: 4363023 ("bpf, sockmap: Avoid failures from skb_to_sgvec when skb has frag_list")
    Signed-off-by: John Fastabend <john.fastabend@gmail.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Reviewed-by: Cong Wang <cong.wang@bytedance.com>
    Link: https://lore.kernel.org/bpf/20210712195546.423990-2-john.fastabend@gmail.com
    jrfastab authored and borkmann committed Jul 15, 2021
  9. s390/bpf: Perform r1 range checking before accessing jit->seen_reg[r1]

    Currently array jit->seen_reg[r1] is being accessed before the range
    checking of index r1. The range checking on r1 should be performed
    first since it will avoid any potential out-of-range accesses on the
    array seen_reg[] and also it is more optimal to perform checks on r1
    before fetching data from the array. Fix this by swapping the order
    of the checks before the array access.
    
    Fixes: 0546231 ("s390/bpf: Add s390x eBPF JIT compiler backend")
    Signed-off-by: Colin Ian King <colin.king@canonical.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
    Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
    Link: https://lore.kernel.org/bpf/20210715125712.24690-1-colin.king@canonical.com
    Colin Ian King authored and borkmann committed Jul 15, 2021
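The reordering relies on the short-circuit evaluation of `&&`: once the range test comes first, the array is read only for a valid index. A minimal sketch follows, with a hypothetical struct, array size and helper name rather than the s390 JIT's own definitions.

```c
#include <stddef.h>

#define NUM_REGS 16 /* hypothetical array size, for illustration */

struct jit_sketch {
    unsigned char seen_reg[NUM_REGS];
};

/*
 * The range test comes first, so seen_reg[r1] is evaluated only when
 * r1 is a valid index. The reversed order, with the array access on
 * the left of &&, would read out of bounds before the test could
 * reject the index.
 */
void mark_reg_seen(struct jit_sketch *jit, unsigned int r1)
{
    if (r1 < NUM_REGS && !jit->seen_reg[r1])
        jit->seen_reg[r1] = 1;
}
```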
  10. net_sched: introduce tracepoint trace_qdisc_enqueue()

    Tracepoint trace_qdisc_enqueue() is introduced to trace skb at
    the entrance of TC layer on TX side. This is similar to
    trace_qdisc_dequeue():
    
    1. For both we only trace successful cases. The failure cases
       can be traced via trace_kfree_skb().
    
    2. They are called at entrance or exit of TC layer, not for each
       ->enqueue() or ->dequeue(). This is intentional, because
       we want to make trace_qdisc_enqueue() symmetric to
       trace_qdisc_dequeue(), which is easier to use.
    
    The return value of qdisc_enqueue() is not interesting here:
    some Qdiscs drop packets in ->dequeue(), so it is impossible to
    trace those drops even if we have the return value; the only way to
    trace them is tracing kfree_skb().
    
    We only add the information we need to the trace ring buffer. If any other
    information is needed, it is easy to extend it without breaking ABI,
    see commit 3dd344e ("net: tracepoint: exposing sk_family in all
    tcp:tracepoints").
    
    Reviewed-by: Cong Wang <cong.wang@bytedance.com>
    Signed-off-by: Qitao Xu <qitao.xu@bytedance.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Qitao Xu authored and davem330 committed Jul 15, 2021
  11. net_sched: use %px to print skb address in trace_qdisc_dequeue()

    Print format of skbaddr is changed to %px from %p, because we want
    to use skb address as a quick way to identify a packet.
    
    Note, trace ring buffer is only accessible to privileged users,
    it is safe to use a real kernel address here.
    
    Reviewed-by: Cong Wang <cong.wang@bytedance.com>
    Signed-off-by: Qitao Xu <qitao.xu@bytedance.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Qitao Xu authored and davem330 committed Jul 15, 2021
  12. net: use %px to print skb address in trace_netif_receive_skb

    The print format of skb address in tracepoint class net_dev_template
    is changed to %px from %p, because we want to use skb address
    as a quick way to identify a packet.
    
    Note, trace ring buffer is only accessible to privileged users,
    it is safe to use a real kernel address here.
    
    Reviewed-by: Cong Wang <cong.wang@bytedance.com>
    Signed-off-by: Qitao Xu <qitao.xu@bytedance.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Qitao Xu authored and davem330 committed Jul 15, 2021
  13. liquidio: Fix unintentional sign extension issue on left shift of u16

    Shifting the u16 integer oct->pcie_port by CN23XX_PKT_INPUT_CTL_MAC_NUM_POS
    (29) bits promotes it to a 32 bit signed int, and the result is then
    sign-extended to a u64. In the cases where bit 2 of oct->pcie_port is set
    (e.g. 4..7) the shifted value will be sign extended and the top 32 bits of
    the result will be set.
    
    Fix this by casting the u16 values to a u64 before the 29 bit left shift.
    
    Addresses-Coverity: ("Unintended sign extension")
    
    Fixes: 3451b97 ("liquidio: CN23XX register setup")
    Signed-off-by: Colin Ian King <colin.king@canonical.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Colin Ian King authored and davem330 committed Jul 15, 2021
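The effect can be reproduced in user space. The sketch below is an illustration, not the driver code: MAC_NUM_POS stands in for CN23XX_PKT_INPUT_CTL_MAC_NUM_POS, and both function names are hypothetical. The unfixed expression is modelled with explicit casts so that the example avoids signed-overflow undefined behaviour; the conversion to int32_t is implementation-defined and wraps on GCC and Clang.

```c
#include <stdint.h>

/* Stand-in for CN23XX_PKT_INPUT_CTL_MAC_NUM_POS (29 in the commit). */
#define MAC_NUM_POS 29

/*
 * Models `port << MAC_NUM_POS` assigned to a u64 without a cast:
 * the value lives in a 32-bit signed int, so widening it to 64 bits
 * copies bit 31 into the upper 32 bits.
 */
uint64_t shift_promoted(uint16_t port)
{
    int32_t as_int = (int32_t)((uint32_t)port << MAC_NUM_POS);

    return (uint64_t)as_int;
}

/* The fix: widen to 64 bits first, then shift. */
uint64_t shift_widened(uint16_t port)
{
    return (uint64_t)port << MAC_NUM_POS;
}
```

For a port value of 4, bit 2 lands on bit 31, so the first function yields 0xFFFFFFFF80000000 while the second yields 0x80000000. For a value of 3, bit 31 stays clear and both agree.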
  14. net: dsa: mv88e6xxx: NET_DSA_MV88E6XXX_PTP should depend on NET_DSA_MV88E6XXX
    
    Making global2 support mandatory removed the Kconfig symbol
    NET_DSA_MV88E6XXX_GLOBAL2.  This symbol also served as an intermediate
    symbol to make NET_DSA_MV88E6XXX_PTP depend on NET_DSA_MV88E6XXX.  With
    the symbol removed, the user is always asked about PTP support for
    Marvell 88E6xxx switches, even if the latter support is not enabled.
    
    Fix this by reinstating the dependency.
    
    Fixes: 63368a7 ("net: dsa: mv88e6xxx: Make global2 support mandatory")
    Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    geertu authored and davem330 committed Jul 15, 2021

Commits on Jul 14, 2021

  1. Merge branch 'r8152-pm-fixxes'

    Takashi Iwai says:
    
    ====================
    r8152: Fix a couple of PM problems
    
    It seems that the r8152 driver suffers from a deadlock at both runtime
    and system PM.  Formerly, it was seen more often at hibernation
    resume, but now it's triggered more frequently, as reported in SUSE
    Bugzilla:
      https://bugzilla.suse.com/show_bug.cgi?id=1186194
    
    While debugging the problem, I stumbled on a few obvious bugs, and here
    are the results: two patches addressing the resume problem.
    
    ***
    
    However, the story doesn't end here, unfortunately, and those patches
    don't seem sufficient.  The remaining major problem is that the driver
    calls napi_disable() and napi_enable() in the PM suspend callbacks.  This
    makes the system stall at (runtime-)suspend.  If we drop the
    napi_disable() and napi_enable() calls in the PM suspend callbacks, it
    starts working (that was the result in Bugzilla comment 13):
      https://bugzilla.suse.com/show_bug.cgi?id=1186194#c13
    
    So, my patches aren't enough and we still need to investigate
    further.  It'd be appreciated if anyone can give a fix or a hint for
    more debugging.  The usage of napi_disable() at PM callbacks is unique
    in this driver and looks rather suspicious to me; but I'm no expert in
    this area so I might be wrong...
    ====================
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    davem330 committed Jul 14, 2021
  2. r8152: Fix a deadlock by doubly PM resume

    r8152 driver sets up the MAC address at reset-resume, while
    rtl8152_set_mac_address() has the temporary autopm get/put.  This may
    lead to a deadlock as the PM lock has been already taken for the
    execution of the runtime PM callback.
    
    This patch adds the workaround to avoid the superfluous autopm when
    called from rtl8152_reset_resume().
    
    Link: https://bugzilla.suse.com/show_bug.cgi?id=1186194
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    tiwai authored and davem330 committed Jul 14, 2021
  3. r8152: Fix potential PM refcount imbalance

    rtl8152_close() takes the refcount via usb_autopm_get_interface() but
    it doesn't release when RTL8152_UNPLUG test hits.  This may lead to
    the imbalance of PM refcount.  This patch addresses it.
    
    Link: https://bugzilla.suse.com/show_bug.cgi?id=1186194
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    tiwai authored and davem330 committed Jul 14, 2021
  4. Merge tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
    
    Pull networking fixes from Jakub Kicinski.
     "Including fixes from bpf and netfilter.
    
      Current release - regressions:
    
       - sock: fix parameter order in sock_setsockopt()
    
      Current release - new code bugs:
    
       - netfilter: nft_last:
           - fix incorrect arithmetic when restoring last used
           - honor NFTA_LAST_SET on restoration
    
      Previous releases - regressions:
    
       - udp: properly flush normal packet at GRO time
    
       - sfc: ensure correct number of XDP queues; don't allow enabling the
         feature if there isn't sufficient resources to Tx from any CPU
    
       - dsa: sja1105: fix address learning getting disabled on the CPU port
    
       - mptcp: addresses a rmem accounting issue that could keep packets in
         subflow receive buffers longer than necessary, delaying MPTCP-level
         ACKs
    
       - ip_tunnel: fix mtu calculation for ETHER tunnel devices
    
       - do not reuse skbs allocated from skbuff_fclone_cache in the napi
         skb cache, we'd try to return them to the wrong slab cache
    
       - tcp: consistently disable header prediction for mptcp
    
      Previous releases - always broken:
    
       - bpf: fix subprog poke descriptor tracking use-after-free
    
       - ipv6:
           - allocate enough headroom in ip6_finish_output2() in case
             iptables TEE is used
           - tcp: drop silly ICMPv6 packet too big messages to avoid
             expensive and pointless lookups (which may serve as a DDOS
             vector)
           - make sure fwmark is copied in SYNACK packets
           - fix 'disable_policy' for forwarded packets (align with IPv4)
    
       - netfilter: conntrack:
           - do not renew entry stuck in tcp SYN_SENT state
           - do not mark RST in the reply direction coming after SYN packet
             for an out-of-sync entry
    
       - mptcp: cleanly handle error conditions with MP_JOIN and syncookies
    
       - mptcp: fix double free when rejecting a join due to port mismatch
    
       - validate lwtstate->data before returning from skb_tunnel_info()
    
       - tcp: call sk_wmem_schedule before sk_mem_charge in zerocopy path
    
       - mt76: mt7921: continue to probe driver when fw already downloaded
    
       - bonding: fix multiple issues with offloading IPsec to (thru?) bond
    
       - stmmac: ptp: fix issues around Qbv support and setting time back
    
       - bcmgenet: always clear wake-up based on energy detection
    
      Misc:
    
       - sctp: move 198 addresses from unusable to private scope
    
       - ptp: support virtual clocks and timestamping
    
       - openvswitch: optimize operation for key comparison"
    
    * tag 'net-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (158 commits)
      net: dsa: properly check for the bridge_leave methods in dsa_switch_bridge_leave()
      sfc: add logs explaining XDP_TX/REDIRECT is not available
      sfc: ensure correct number of XDP queues
      sfc: fix lack of XDP TX queues - error XDP TX failed (-22)
      net: fddi: fix UAF in fza_probe
      net: dsa: sja1105: fix address learning getting disabled on the CPU port
      net: ocelot: fix switchdev objects synced for wrong netdev with LAG offload
      net: Use nlmsg_unicast() instead of netlink_unicast()
      octeontx2-pf: Fix uninitialized boolean variable pps
      ipv6: allocate enough headroom in ip6_finish_output2()
      net: hdlc: rename 'mod_init' & 'mod_exit' functions to be module-specific
      net: bridge: multicast: fix MRD advertisement router port marking race
      net: bridge: multicast: fix PIM hello router port marking race
      net: phy: marvell10g: fix differentiation of 88X3310 from 88X3340
      dsa: fix for_each_child.cocci warnings
      virtio_net: check virtqueue_add_sgs() return value
      mptcp: properly account bulk freed memory
      selftests: mptcp: fix case multiple subflows limited by server
      mptcp: avoid processing packet if a subflow reset
      mptcp: fix syncookie process if mptcp can not_accept new subflow
      ...
    torvalds committed Jul 14, 2021
  5. fs: add vfs_parse_fs_param_source() helper

    Add a simple helper that filesystems can use in their parameter parser
    to parse the "source" parameter. A few places open-coded this function
    and that already caused a bug in the cgroup v1 parser that we fixed.
    Let's make it harder to get this wrong by introducing a helper which
    performs all necessary checks.
    
    Link: https://syzkaller.appspot.com/bug?id=6312526aba5beae046fdae8f00399f87aab48b12
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    brauner authored and torvalds committed Jul 14, 2021
  6. cgroup: verify that source is a string

    The following sequence can be used to trigger a UAF:
    
        int fscontext_fd = fsopen("cgroup", 0);
        int fd_null = open("/dev/null", O_RDONLY);
        fsconfig(fscontext_fd, FSCONFIG_SET_FD, "source", NULL, fd_null);
        close_range(3, ~0U, 0);
    
    The cgroup v1 specific fs parser expects a string for the "source"
    parameter.  However, it is perfectly legitimate to e.g.  specify a file
    descriptor for the "source" parameter.  The fs parser doesn't know what
    a filesystem allows there.  So it's a bug to assume that "source" is
    always of type fs_value_is_string when it can reasonably also be
    fs_value_is_file.
    
    This assumption in the cgroup code causes a UAF because struct
    fs_parameter uses a union for the actual value.  Access to that union is
    guarded by the param->type member.  Since the cgroup parameter parser
    didn't check param->type but unconditionally moved param->string into
    fc->source a close on the fscontext_fd would trigger a UAF during
    put_fs_context() which frees fc->source thereby freeing the file stashed
    in param->file causing a UAF during a close of the fd_null.
    
    Fix this by verifying that param->type is actually a string and report
    an error if not.
    
    In follow up patches I'll add a new generic helper that can be used here
    and by other filesystems instead of this error-prone copy-pasta fix.
    But fixing it in here first makes backporting it to stable a lot
    easier.
    
    Fixes: 8d2451f ("cgroup1: switch to option-by-option parsing")
    Reported-by: syzbot+283ce5a46486d6acdbaf@syzkaller.appspotmail.com
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Cc: <stable@kernel.org>
    Cc: syzkaller-bugs <syzkaller-bugs@googlegroups.com>
    Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    brauner authored and torvalds committed Jul 14, 2021
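The check this commit adds follows the usual rule for tagged unions: read a member only after the tag confirms it is the active one. A minimal user-space sketch, with hypothetical names in place of struct fs_parameter and its fs_value_is_* tags:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical tags, standing in for fs_value_is_string / fs_value_is_file. */
enum value_type { VALUE_IS_STRING, VALUE_IS_FILE };

struct param_sketch {
    enum value_type type;   /* says which union member is valid */
    union {
        char *string;
        void *file;
    };
};

/*
 * Moves the string into *source only when the tag says the union
 * holds a string; any other type is rejected instead of being
 * reinterpreted as a string pointer.
 */
int take_source(struct param_sketch *param, char **source)
{
    if (param->type != VALUE_IS_STRING)
        return -EINVAL;

    *source = param->string;
    param->string = NULL;   /* ownership moved to the caller */
    return 0;
}
```

Without the type test, a file pointer stored in the union would be handed out as if it were a string and later freed as one, which is the kind of confusion the commit describes.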

Commits on Jul 13, 2021

  1. net: dsa: properly check for the bridge_leave methods in dsa_switch_bridge_leave()
    
    This was not caught because there is no switch driver which implements
    the .port_bridge_join but not .port_bridge_leave method, but it should
    nonetheless be fixed, as in certain conditions (driver development) it
    might lead to NULL pointer dereference.
    
    Fixes: f66a6a6 ("net: dsa: permit cross-chip bridging between all trees in the system")
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    vladimiroltean authored and davem330 committed Jul 13, 2021
  2. Merge tag 'vboxsf-v5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hansg/linux
    
    Pull vboxsf fixes from Hans de Goede:
     "This adds support for the atomic_open directory-inode op to vboxsf.
    
      Note this is not just an enhancement; this also fixes an actual issue
      which users are hitting, see the commit message of the "vboxsf: Add
      support for the atomic_open directory-inode" patch"
    
    * tag 'vboxsf-v5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hansg/linux:
      vboxsf: Add support for the atomic_open directory-inode op
      vboxsf: Add vboxsf_[create|release]_sf_handle() helpers
      vboxsf: Make vboxsf_dir_create() return the handle for the created file
      vboxsf: Honor excl flag to the dir-inode create op
    torvalds committed Jul 13, 2021