Commits on Jun 22, 2021

  1. netfilter: nf_tables: do not allow to delete table with owner by handle

    nft_table_lookup_byhandle() also needs to validate the netlink PortID
    owner when deleting a table by handle.
    
    Fixes: 6001a93 ("netfilter: nftables: introduce table ownership")
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    ummakynes authored and intel-lab-lkp committed Jun 22, 2021
  2. netfilter: nf_tables: skip netlink portID validation if zero

    nft_table_lookup() allows us to obtain the table object by the name and
    the family. The netlink portID validation needs to be skipped for the
    dump path, since the ownership only applies to commands to update the
    given table. Skip validation if the specified netlink PortID is zero
    when calling nft_table_lookup().
    
    Fixes: 6001a93 ("netfilter: nftables: introduce table ownership")
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    ummakynes authored and intel-lab-lkp committed Jun 22, 2021
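
The ownership rule described in the two commits above can be sketched in plain C as follows. This is a user-space illustration under stated assumptions, not the kernel code: `table_sketch`, `table_lookup_sketch` and the field names are hypothetical, and the sketch returns NULL for both "not found" and "not the owner", where a real implementation would likely distinguish the two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a table object. */
struct table_sketch {
    const char *name;
    int family;
    bool has_owner;        /* table was created with an owner */
    uint32_t owner_portid; /* netlink portID of that owner */
};

/*
 * Look up a table by name and family.
 * portid == 0 means "no validation" (e.g. a dump/read-only path);
 * a non-zero portid must match the owner of an owned table.
 */
static const struct table_sketch *
table_lookup_sketch(const struct table_sketch *tables, size_t n,
                    const char *name, int family, uint32_t portid)
{
    assert(name != NULL);

    for (size_t i = 0; i < n; i++) {
        const struct table_sketch *t = &tables[i];

        if (t->family != family || strcmp(t->name, name) != 0)
            continue;
        /* Ownership only restricts callers that identify themselves. */
        if (t->has_owner && portid != 0 && t->owner_portid != portid)
            return NULL;
        return t;
    }
    return NULL;
}
```

With this shape, a dump path passes 0 and always sees the table, while an update path passes the caller's portID and is refused unless it is the owner.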

Commits on Jun 10, 2021

  1. Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

    Pablo Neira Ayuso says:
    
    ====================
    Netfilter fixes for net
    
    The following patchset contains Netfilter fixes for net:
    
    1) Fix a crash when stateful expression with its own gc callback
       is used in a set definition.
    
    2) Skip IPv6 packets from any link-local address in IPv6 fib expression.
       Add a selftest for this scenario, from Florian Westphal.
    ====================
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    davem330 committed Jun 10, 2021
  2. Merge branch 'tcp-options-oob-fixes'

    Maxim Mikityanskiy says:
    
    ====================
    Fix out of bounds when parsing TCP options
    
    This series fixes out-of-bounds access in various places in the kernel
    where parsing of TCP options takes place. Fortunately, many more
    occurrences don't have this bug.
    
    v2 changes:
    
    synproxy: Added an early return when length < 0 to avoid calling
    skb_header_pointer with negative length.
    
    sch_cake: Added doff validation to avoid parsing garbage.
    ====================
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    davem330 committed Jun 10, 2021
  3. sch_cake: Fix out of bounds when parsing TCP options and header

    The TCP option parser in cake qdisc (cake_get_tcpopt and
    cake_tcph_may_drop) could read one byte out of bounds. When the length
    is 1, the execution flow gets into the loop, reads one byte of the
    opcode, and if the opcode is neither TCPOPT_EOL nor TCPOPT_NOP, it reads
    one more byte, which exceeds the length of 1.
    
    This fix is inspired by commit 9609dad ("ipv4: tcp_input: fix stack
    out of bounds when parsing TCP options.").
    
    v2 changes:
    
    Added doff validation in cake_get_tcphdr to avoid parsing garbage as TCP
    header. Although it wasn't strictly an out-of-bounds access (memory was
    allocated), garbage values could be read where CAKE expected the TCP
    header if doff was smaller than 5.
    
    Cc: Young Xiao <92siuyang@gmail.com>
    Fixes: 8b71388 ("sch_cake: Add optional ACK filter")
    Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
    Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    nvmmax authored and davem330 committed Jun 10, 2021
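
The one-byte overread described above, and the kind of bounds check that avoids it, can be illustrated with a small user-space parser. This is a minimal sketch, not the code from sch_cake: `find_tcp_option` and its signature are hypothetical, and only the option constants match the usual TCP option kinds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TCPOPT_EOL 0
#define TCPOPT_NOP 1

/*
 * Return a pointer to the first option of the given kind inside
 * opts[0..length), or NULL if it is absent or the options are malformed.
 * On success *opsize_out (if non-NULL) receives the option's total size.
 */
static const uint8_t *find_tcp_option(const uint8_t *opts, int length,
                                      uint8_t kind, int *opsize_out)
{
    const uint8_t *ptr = opts;

    assert(opts != NULL);

    while (length > 0) {
        int opcode = *ptr++;
        int opsize;

        if (opcode == TCPOPT_EOL)
            return NULL;
        if (opcode == TCPOPT_NOP) { /* single-byte padding */
            length--;
            continue;
        }
        /*
         * Every other option carries a size byte after the opcode.
         * With only one byte left, reading it would be out of bounds.
         */
        if (length < 2)
            return NULL;
        opsize = *ptr++;
        if (opsize < 2 || opsize > length) /* silly or truncated option */
            return NULL;
        if (opcode == kind) {
            if (opsize_out)
                *opsize_out = opsize;
            return ptr - 2;
        }
        ptr += opsize - 2;
        length -= opsize;
    }
    return NULL;
}
```

The `length < 2` test is the piece that matters here: without it, a buffer of length 1 holding any opcode other than EOL or NOP makes the parser read the byte past the end.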
  4. mptcp: Fix out of bounds when parsing TCP options

    The TCP option parser in mptcp (mptcp_get_options) could read one byte
    out of bounds. When the length is 1, the execution flow gets into the
    loop, reads one byte of the opcode, and if the opcode is neither
    TCPOPT_EOL nor TCPOPT_NOP, it reads one more byte, which exceeds the
    length of 1.
    
    This fix is inspired by commit 9609dad ("ipv4: tcp_input: fix stack
    out of bounds when parsing TCP options.").
    
    Cc: Young Xiao <92siuyang@gmail.com>
    Fixes: cec37a6 ("mptcp: Handle MP_CAPABLE options for outgoing connections")
    Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
    Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    nvmmax authored and davem330 committed Jun 10, 2021
  5. netfilter: synproxy: Fix out of bounds when parsing TCP options

    The TCP option parser in synproxy (synproxy_parse_options) could read
    one byte out of bounds. When the length is 1, the execution flow gets
    into the loop, reads one byte of the opcode, and if the opcode is
    neither TCPOPT_EOL nor TCPOPT_NOP, it reads one more byte, which exceeds
    the length of 1.
    
    This fix is inspired by commit 9609dad ("ipv4: tcp_input: fix stack
    out of bounds when parsing TCP options.").
    
    v2 changes:
    
    Added an early return when length < 0 to avoid calling
    skb_header_pointer with negative length.
    
    Cc: Young Xiao <92siuyang@gmail.com>
    Fixes: 48b1de4 ("netfilter: add SYNPROXY core/target")
    Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
    Reviewed-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    nvmmax authored and davem330 committed Jun 10, 2021
  6. net/packet: annotate data race in packet_sendmsg()

    There is a known race in packet_sendmsg(), addressed
    in commit 32d3182 ("net/packet: fix race in tpacket_snd()").
    
    Now that we have data_race(), we can use it to avoid a future KCSAN warning,
    as syzbot loves stressing af_packet sockets :)
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    neebe000 authored and davem330 committed Jun 10, 2021
  7. inet: annotate data races around sk->sk_txhash

    The UDP sendmsg() path can be lockless, so another thread can
    re-connect and change sk->sk_txhash under us.
    
    There is no serious impact, but we can use READ_ONCE()/WRITE_ONCE()
    pair to document the race.
    
    BUG: KCSAN: data-race in __ip4_datagram_connect / skb_set_owner_w
    
    write to 0xffff88813397920c of 4 bytes by task 30997 on cpu 1:
     sk_set_txhash include/net/sock.h:1937 [inline]
     __ip4_datagram_connect+0x69e/0x710 net/ipv4/datagram.c:75
     __ip6_datagram_connect+0x551/0x840 net/ipv6/datagram.c:189
     ip6_datagram_connect+0x2a/0x40 net/ipv6/datagram.c:272
     inet_dgram_connect+0xfd/0x180 net/ipv4/af_inet.c:580
     __sys_connect_file net/socket.c:1837 [inline]
     __sys_connect+0x245/0x280 net/socket.c:1854
     __do_sys_connect net/socket.c:1864 [inline]
     __se_sys_connect net/socket.c:1861 [inline]
     __x64_sys_connect+0x3d/0x50 net/socket.c:1861
     do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
     entry_SYSCALL_64_after_hwframe+0x44/0xae
    
    read to 0xffff88813397920c of 4 bytes by task 31039 on cpu 0:
     skb_set_hash_from_sk include/net/sock.h:2211 [inline]
     skb_set_owner_w+0x118/0x220 net/core/sock.c:2101
     sock_alloc_send_pskb+0x452/0x4e0 net/core/sock.c:2359
     sock_alloc_send_skb+0x2d/0x40 net/core/sock.c:2373
     __ip6_append_data+0x1743/0x21a0 net/ipv6/ip6_output.c:1621
     ip6_make_skb+0x258/0x420 net/ipv6/ip6_output.c:1983
     udpv6_sendmsg+0x160a/0x16b0 net/ipv6/udp.c:1527
     inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:642
     sock_sendmsg_nosec net/socket.c:654 [inline]
     sock_sendmsg net/socket.c:674 [inline]
     ____sys_sendmsg+0x360/0x4d0 net/socket.c:2350
     ___sys_sendmsg net/socket.c:2404 [inline]
     __sys_sendmmsg+0x315/0x4b0 net/socket.c:2490
     __do_sys_sendmmsg net/socket.c:2519 [inline]
     __se_sys_sendmmsg net/socket.c:2516 [inline]
     __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2516
     do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
     entry_SYSCALL_64_after_hwframe+0x44/0xae
    
    value changed: 0xbca3c43d -> 0xfdb309e0
    
    Reported by Kernel Concurrency Sanitizer on:
    CPU: 0 PID: 31039 Comm: syz-executor.2 Not tainted 5.13.0-rc3-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    neebe000 authored and davem330 committed Jun 10, 2021
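
The READ_ONCE()/WRITE_ONCE() pairing mentioned above can be approximated in user space with volatile accesses. The sketch below assumes simplified, hypothetical types (`sock_sketch`, `skb_sketch`) and helper names; the kernel macros are more elaborate. The point it shows is that the reader loads the racy field exactly once and then works on the local copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the socket and packet objects. */
struct sock_sketch { uint32_t sk_txhash; };
struct skb_sketch  { uint32_t hash; unsigned int l4_hash; };

/* Writer side: one store the compiler may not tear or duplicate. */
static inline void sk_set_txhash_sketch(struct sock_sketch *sk, uint32_t hash)
{
    *(volatile uint32_t *)&sk->sk_txhash = hash;
}

/* Reader side: one load the compiler may not repeat. */
static inline uint32_t sk_get_txhash_sketch(const struct sock_sketch *sk)
{
    return *(const volatile uint32_t *)&sk->sk_txhash;
}

/*
 * Copy the socket hash into the packet. The field is read once into a
 * local, so the zero test and the assignment see the same value even if
 * another thread rewrites sk_txhash in between.
 */
static void skb_set_hash_from_sk_sketch(struct skb_sketch *skb,
                                        const struct sock_sketch *sk)
{
    uint32_t txhash;

    assert(skb != NULL && sk != NULL);

    txhash = sk_get_txhash_sketch(sk);
    if (txhash) {
        skb->l4_hash = 1;
        skb->hash = txhash;
    }
}
```

Reading the field twice (once for the test, once for the copy) would still be a benign race here, but it could store a hash of 0 with l4_hash set, which is what the single load rules out.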
  8. net: annotate data race in sock_error()

    sock_error() is known to be racy. The code avoids
    an atomic operation if sk_err is zero, and this field
    could be changed under us; this is fine.
    
    Syzbot reported:
    
    BUG: KCSAN: data-race in sock_alloc_send_pskb / unix_release_sock
    
    write to 0xffff888131855630 of 4 bytes by task 9365 on cpu 1:
     unix_release_sock+0x2e9/0x6e0 net/unix/af_unix.c:550
     unix_release+0x2f/0x50 net/unix/af_unix.c:859
     __sock_release net/socket.c:599 [inline]
     sock_close+0x6c/0x150 net/socket.c:1258
     __fput+0x25b/0x4e0 fs/file_table.c:280
     ____fput+0x11/0x20 fs/file_table.c:313
     task_work_run+0xae/0x130 kernel/task_work.c:164
     tracehook_notify_resume include/linux/tracehook.h:189 [inline]
     exit_to_user_mode_loop kernel/entry/common.c:174 [inline]
     exit_to_user_mode_prepare+0x156/0x190 kernel/entry/common.c:208
     __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline]
     syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:301
     do_syscall_64+0x56/0x90 arch/x86/entry/common.c:57
     entry_SYSCALL_64_after_hwframe+0x44/0xae
    
    read to 0xffff888131855630 of 4 bytes by task 9385 on cpu 0:
     sock_error include/net/sock.h:2269 [inline]
     sock_alloc_send_pskb+0xe4/0x4e0 net/core/sock.c:2336
     unix_dgram_sendmsg+0x478/0x1610 net/unix/af_unix.c:1671
     unix_seqpacket_sendmsg+0xc2/0x100 net/unix/af_unix.c:2055
     sock_sendmsg_nosec net/socket.c:654 [inline]
     sock_sendmsg net/socket.c:674 [inline]
     ____sys_sendmsg+0x360/0x4d0 net/socket.c:2350
     __sys_sendmsg_sock+0x25/0x30 net/socket.c:2416
     io_sendmsg fs/io_uring.c:4367 [inline]
     io_issue_sqe+0x231a/0x6750 fs/io_uring.c:6135
     __io_queue_sqe+0xe9/0x360 fs/io_uring.c:6414
     __io_req_task_submit fs/io_uring.c:2039 [inline]
     io_async_task_func+0x312/0x590 fs/io_uring.c:5074
     __tctx_task_work fs/io_uring.c:1910 [inline]
     tctx_task_work+0x1d4/0x3d0 fs/io_uring.c:1924
     task_work_run+0xae/0x130 kernel/task_work.c:164
     tracehook_notify_signal include/linux/tracehook.h:212 [inline]
     handle_signal_work kernel/entry/common.c:145 [inline]
     exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
     exit_to_user_mode_prepare+0xf8/0x190 kernel/entry/common.c:208
     __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline]
     syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:301
     do_syscall_64+0x56/0x90 arch/x86/entry/common.c:57
     entry_SYSCALL_64_after_hwframe+0x44/0xae
    
    value changed: 0x00000000 -> 0x00000068
    
    Reported by Kernel Concurrency Sanitizer on:
    CPU: 0 PID: 9385 Comm: syz-executor.3 Not tainted 5.13.0-rc4-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    neebe000 authored and davem330 committed Jun 10, 2021
  9. Merge branch 'bridge-egress-fixes'

    Nikolay Aleksandrov says:
    
    ====================
    net: bridge: vlan tunnel egress path fixes
    
    These two fixes take care of tunnel_dst problems in the vlan tunnel egress
    path. Patch 01 fixes a null ptr deref due to the lockless use of tunnel_dst
    pointer without checking it first, and patch 02 fixes a use-after-free
    issue due to wrong dst refcounting (dst_clone() -> dst_hold_safe()).
    
    Both fix the same commit and should be queued for stable backports:
    Fixes: 11538d0 ("bridge: vlan dst_metadata hooks in ingress and egress paths")
    
    v2: no changes, added stable list to CC
    ====================
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    davem330 committed Jun 10, 2021
  10. net: bridge: fix vlan tunnel dst refcnt when egressing

    The egress tunnel code uses dst_clone() and directly sets the result,
    which is wrong because the entry might have 0 refcnt or be already deleted,
    causing a number of problems. It also triggers the WARN_ON() in dst_hold()[1]
    when a refcnt couldn't be taken. Fix it by using dst_hold_safe() and
    checking if a reference was actually taken before setting the dst.
    
    [1] dmesg WARN_ON log and following refcnt errors
     WARNING: CPU: 5 PID: 38 at include/net/dst.h:230 br_handle_egress_vlan_tunnel+0x10b/0x134 [bridge]
     Modules linked in: 8021q garp mrp bridge stp llc bonding ipv6 virtio_net
     CPU: 5 PID: 38 Comm: ksoftirqd/5 Kdump: loaded Tainted: G        W         5.13.0-rc3+ #360
     Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
     RIP: 0010:br_handle_egress_vlan_tunnel+0x10b/0x134 [bridge]
     Code: e8 85 bc 01 e1 45 84 f6 74 90 45 31 f6 85 db 48 c7 c7 a0 02 19 a0 41 0f 94 c6 31 c9 31 d2 44 89 f6 e8 64 bc 01 e1 85 db 75 02 <0f> 0b 31 c9 31 d2 44 89 f6 48 c7 c7 70 02 19 a0 e8 4b bc 01 e1 49
     RSP: 0018:ffff8881003d39e8 EFLAGS: 00010246
     RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
     RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffffa01902a0
     RBP: ffff8881040c6700 R08: 0000000000000000 R09: 0000000000000001
     R10: 2ce93d0054fe0d00 R11: 54fe0d00000e0000 R12: ffff888109515000
     R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000401
     FS:  0000000000000000(0000) GS:ffff88822bf40000(0000) knlGS:0000000000000000
     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     CR2: 00007f42ba70f030 CR3: 0000000109926000 CR4: 00000000000006e0
     Call Trace:
      br_handle_vlan+0xbc/0xca [bridge]
      __br_forward+0x23/0x164 [bridge]
      deliver_clone+0x41/0x48 [bridge]
      br_handle_frame_finish+0x36f/0x3aa [bridge]
      ? skb_dst+0x2e/0x38 [bridge]
      ? br_handle_ingress_vlan_tunnel+0x3e/0x1c8 [bridge]
      ? br_handle_frame_finish+0x3aa/0x3aa [bridge]
      br_handle_frame+0x2c3/0x377 [bridge]
      ? __skb_pull+0x33/0x51
      ? vlan_do_receive+0x4f/0x36a
      ? br_handle_frame_finish+0x3aa/0x3aa [bridge]
      __netif_receive_skb_core+0x539/0x7c6
      ? __list_del_entry_valid+0x16e/0x1c2
      __netif_receive_skb_list_core+0x6d/0xd6
      netif_receive_skb_list_internal+0x1d9/0x1fa
      gro_normal_list+0x22/0x3e
      dev_gro_receive+0x55b/0x600
      ? detach_buf_split+0x58/0x140
      napi_gro_receive+0x94/0x12e
      virtnet_poll+0x15d/0x315 [virtio_net]
      __napi_poll+0x2c/0x1c9
      net_rx_action+0xe6/0x1fb
      __do_softirq+0x115/0x2d8
      run_ksoftirqd+0x18/0x20
      smpboot_thread_fn+0x183/0x19c
      ? smpboot_unregister_percpu_thread+0x66/0x66
      kthread+0x10a/0x10f
      ? kthread_mod_delayed_work+0xb6/0xb6
      ret_from_fork+0x22/0x30
     ---[ end trace 49f61b07f775fd2b ]---
     dst_release: dst:00000000c02d677a refcnt:-1
     dst_release underflow
    
    Cc: stable@vger.kernel.org
    Fixes: 11538d0 ("bridge: vlan dst_metadata hooks in ingress and egress paths")
    Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Nikolay Aleksandrov authored and davem330 committed Jun 10, 2021
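
The difference between an unconditional hold and a "hold only if still alive" operation can be sketched with C11 atomics. The names below (`dst_sketch`, `dst_hold_safe_sketch`, `set_tunnel_dst_sketch`) are hypothetical user-space stand-ins; the sketch only assumes that a refcount of 0 means the object is already on its way to being freed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the dst entry and the packet. */
struct dst_sketch { atomic_int refcnt; };
struct skb_sketch { struct dst_sketch *dst; };

/*
 * Take a reference only if the count is still non-zero.
 * Returns false when the object is already dead (count == 0).
 */
static bool dst_hold_safe_sketch(struct dst_sketch *dst)
{
    int old = atomic_load(&dst->refcnt);

    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&dst->refcnt, &old, old + 1))
            return true;
    }
    return false;
}

/* Attach the tunnel dst to the packet only when a reference was taken. */
static bool set_tunnel_dst_sketch(struct skb_sketch *skb,
                                  struct dst_sketch *tunnel_dst)
{
    assert(skb != NULL);

    if (tunnel_dst && dst_hold_safe_sketch(tunnel_dst)) {
        skb->dst = tunnel_dst;
        return true;
    }
    return false;
}
```

An unconditional increment would turn a count of 0 into 1 and hand out a pointer to an object that is being destroyed; the later release then underflows the counter, which matches the `dst_release underflow` message in the log above.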
  11. net: bridge: fix vlan tunnel dst null pointer dereference

    This patch fixes a tunnel_dst null pointer dereference due to lockless
    access in the tunnel egress path. When deleting a vlan tunnel the
    tunnel_dst pointer is set to NULL without waiting a grace period (i.e.
    while it's still usable) and packets egressing are dereferencing it
    without checking. Use READ/WRITE_ONCE to annotate the lockless use of
    tunnel_id, use RCU for accessing tunnel_dst and make sure it is read
    only once and checked in the egress path. The dst is already properly RCU
    protected so we don't need to do anything fancier than making sure
    tunnel_id and tunnel_dst are read only once and checked in the egress path.
    
    Cc: stable@vger.kernel.org
    Fixes: 11538d0 ("bridge: vlan dst_metadata hooks in ingress and egress paths")
    Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Nikolay Aleksandrov authored and davem330 committed Jun 10, 2021
  12. ping: Check return value of function 'ping_queue_rcv_skb'

    Function 'ping_queue_rcv_skb' does not always return success; it can
    also fail. If its return value is not checked, function `ping_rcv`
    reports success even when queueing failed.
    
    Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Zheng Yongjun authored and davem330 committed Jun 10, 2021
  13. skbuff: fix incorrect msg_zerocopy copy notifications

    msg_zerocopy signals if a send operation required copying with a flag
    in serr->ee.ee_code.
    
    This field can be incorrect as of the below commit, as a result of
    both structs uarg and serr pointing into the same skb->cb[].
    
    uarg->zerocopy must be read before skb->cb[] is reinitialized to hold
    serr. Similar to other fields len, hi and lo, use a local variable to
    temporarily hold the value.
    
    This was not a problem before, when the value was passed as a function
    argument.
    
    Fixes: 7551885 ("skbuff: Push status and refcounts into sock_zerocopy_callback")
    Reported-by: Talal Ahmad <talalahmad@google.com>
    Signed-off-by: Willem de Bruijn <willemb@google.com>
    Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    wdebruij authored and davem330 committed Jun 10, 2021
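
The aliasing problem described above, two structures sharing one scratch area, can be reproduced in a few lines of user-space C. The union, the field layout and the `COPIED_SKETCH` flag below are hypothetical simplifications; the sketch only shows the ordering rule: read everything needed from the old view before reinitializing the storage for the new one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COPIED_SKETCH 1 /* hypothetical "data had to be copied" flag */

/* Two views of the same scratch area, loosely modelled on skb->cb[]. */
struct uarg_sketch { uint32_t id; uint16_t len; bool zerocopy; };
struct serr_sketch { uint32_t ee_info; uint32_t ee_data; uint8_t ee_code; };

union cb_sketch {
    struct uarg_sketch uarg;
    struct serr_sketch serr;
};

/* Turn the uarg view into a completion notification in place. */
static void fill_completion_sketch(union cb_sketch *cb)
{
    bool is_zerocopy;
    uint32_t lo, hi;

    assert(cb != NULL);

    /* Read all inputs while the uarg view is still valid. */
    is_zerocopy = cb->uarg.zerocopy;
    lo = cb->uarg.id;
    hi = lo + cb->uarg.len - 1;

    /* From here on the storage belongs to the serr view. */
    memset(cb, 0, sizeof(*cb));
    cb->serr.ee_info = lo;
    cb->serr.ee_data = hi;
    if (!is_zerocopy)
        cb->serr.ee_code |= COPIED_SKETCH;
}
```

If `cb->uarg.zerocopy` were read after the `memset()`, it would always appear false and every notification would carry the "copied" flag, regardless of what the send path actually did.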
  14. Merge tag 'mlx5-fixes-2021-06-09' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

    Saeed Mahameed says:
    
    ====================
    mlx5-fixes-2021-06-09
    ====================
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    davem330 committed Jun 10, 2021
  15. net/mlx5e: Block offload of outer header csum for GRE tunnel

    The device is able to offload either the outer header csum or inner
    header csum. The driver utilizes the inner csum offload. So, prohibit
    setting of tx-gre-csum-segmentation and let it be: off[fixed].
    
    Fixes: 2729984 ("net/mlx5e: Support TSO and TX checksum offloads for GRE tunnels")
    Signed-off-by: Aya Levin <ayal@nvidia.com>
    Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    Aya Levin authored and Saeed Mahameed committed Jun 10, 2021
  16. net/mlx5e: Block offload of outer header csum for UDP tunnels

    The device is able to offload either the outer header csum or inner
    header csum. The driver utilizes the inner csum offload. Hence, block
    setting of tx-udp_tnl-csum-segmentation and set it to off[fixed].
    
    Fixes: b49663c ("net/mlx5e: Add support for UDP tunnel segmentation with outer checksum offload")
    Signed-off-by: Aya Levin <ayal@nvidia.com>
    Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    Aya Levin authored and Saeed Mahameed committed Jun 10, 2021
  17. Revert "net/mlx5: Arm only EQs with EQEs"

    In the scenario described below, an EQ can remain in FIRED state which
    can result in missing an interrupt generation.
    
    The scenario:
    
    device                       mlx5_core driver
    ------                       ----------------
    EQ1.eqe generated
    EQ1.MSI-X sent
    EQ1.state = FIRED
    EQ2.eqe generated
                                 mlx5_irq()
                                   polls - eq1_eqes()
                                   arm eq1
                                   polls - eq2_eqes()
                                   arm eq2
    EQ2.MSI-X sent
    EQ2.state = FIRED
                                  mlx5_irq()
                                  polls - eq2_eqes() -- no eqes found
                                  driver skips EQ arming;
    
    ->EQ2 remains fired, misses generating interrupt.
    
    Hence, always arm the EQ by reverting the cited commit in fixes tag.
    
    Fixes: d894892 ("net/mlx5: Arm only EQs with EQEs")
    Signed-off-by: Shay Drory <shayd@nvidia.com>
    Reviewed-by: Parav Pandit <parav@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    shayshyi authored and Saeed Mahameed committed Jun 10, 2021
  18. net/mlx5e: Fix select queue to consider SKBTX_HW_TSTAMP

    Steering packets to PTP-SQ should be done only if the SKB has
    SKBTX_HW_TSTAMP set in the tx_flags. While here, take the function into
    a header and inline it.
    Set the whole condition to select the PTP-SQ to unlikely.
    
    Fixes: 24c22dd ("net/mlx5e: Add states to PTP channel")
    Signed-off-by: Aya Levin <ayal@nvidia.com>
    Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
    Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    Aya Levin authored and Saeed Mahameed committed Jun 10, 2021
  19. net/mlx5e: Don't update netdev RQs with PTP-RQ

    Since the driver opens the PTP-RQ under channel 0, it appears to the
    stack as if the SKB was received on rxq0. So from the stack's point of
    view there are still the same number of RX queues.
    
    Fixes: 960fbfe ("net/mlx5e: Allow coexistence of CQE compression and HW TS PTP")
    Signed-off-by: Aya Levin <ayal@nvidia.com>
    Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
    Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    Aya Levin authored and Saeed Mahameed committed Jun 10, 2021
  20. net/mlx5e: Verify dev is present in get devlink port ndo

    When changing eswitch mode, the netdev is detached from the
    hardware resources. So verify dev is present in get devlink
    port ndo. Otherwise, we will hit the following panic:
    
    [241535.973539] RIP: 0010:__devlink_port_phys_port_name_get+0x13/0x1b0
    [241535.976471] RSP: 0018:ffff9eaf0ae1b7c8 EFLAGS: 00010292
    [241535.977471] RAX: 000000000002d370 RBX: 000000000002d370 RCX: 0000000000000000
    [241535.978479] RDX: 0000000000000010 RSI: ffff9eaf0ae1b858 RDI: 000000000002d370
    [241535.979482] RBP: ffff9eaf0ae1b7e0 R08: 000000000000002a R09: ffff8888d54d13da
    [241535.980486] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8888e6700000
    [241535.981491] R13: ffff9eaf0ae1b858 R14: 0000000000000010 R15: 0000000000000000
    [241535.982489] FS:  00007fd374ef3740(0000) GS:ffff88909ea00000(0000) knlGS:0000000000000000
    [241535.983494] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [241535.984487] CR2: 000000000002d444 CR3: 000000089fd26006 CR4: 00000000003706e0
    [241535.985502] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [241535.986499] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [241535.987477] Call Trace:
    [241535.988426]  ? nla_put_64bit+0x71/0xa0
    [241535.989368]  devlink_compat_phys_port_name_get+0x50/0xa0
    [241535.990312]  dev_get_phys_port_name+0x4b/0x60
    [241535.991252]  rtnl_fill_ifinfo+0x57b/0xcb0
    [241535.992192]  rtnl_dump_ifinfo+0x58f/0x6d0
    [241535.993123]  ? ksize+0x14/0x20
    [241535.994033]  ? __alloc_skb+0xe8/0x250
    [241535.994935]  netlink_dump+0x17c/0x300
    [241535.995821]  netlink_recvmsg+0x1de/0x2c0
    [241535.996677]  sock_recvmsg+0x70/0x80
    [241535.997518]  ____sys_recvmsg+0x9b/0x1b0
    [241535.998360]  ? iovec_from_user+0x82/0x120
    [241535.999202]  ? __import_iovec+0x2c/0x130
    [241536.000031]  ___sys_recvmsg+0x94/0x130
    [241536.000850]  ? __handle_mm_fault+0x56d/0x6e0
    [241536.001668]  __sys_recvmsg+0x5f/0xb0
    [241536.002464]  ? syscall_enter_from_user_mode+0x2b/0x80
    [241536.003242]  __x64_sys_recvmsg+0x1f/0x30
    [241536.004008]  do_syscall_64+0x38/0x50
    [241536.004767]  entry_SYSCALL_64_after_hwframe+0x44/0xae
    [241536.005532] RIP: 0033:0x7fd375014f47
    
    Fixes: 2ff349c ("net/mlx5e: Verify dev is present in some ndos")
    Signed-off-by: Roi Dayan <roid@nvidia.com>
    Signed-off-by: Chris Mi <cmi@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    Chris Mi authored and Saeed Mahameed committed Jun 10, 2021
  21. net/mlx5: DR, Don't use SW steering when RoCE is not supported

    SW steering uses RC QP to write/read to/from ICM, hence it cannot be
    supported when RoCE is not supported.
    
    Fixes: 70605ea ("net/mlx5: DR, Expose APIs for direct rule managing")
    Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
    Reviewed-by: Alex Vesker <valex@nvidia.com>
    Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    maorgottlieb authored and Saeed Mahameed committed Jun 10, 2021
  22. net/mlx5: Consider RoCE cap before init RDMA resources

    Check if RoCE is supported by the device before enabling it in
    the vport context and creating all the RDMA steering objects.
    
    Fixes: 80f09df ("net/mlx5: Eswitch, enable RoCE loopback traffic")
    Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    maorgottlieb authored and Saeed Mahameed committed Jun 10, 2021
  23. net/mlx5e: Fix page reclaim for dead peer hairpin

    When adding a hairpin flow, a firmware-side send queue is created for
    the peer net device, which claims some host memory pages for its
    internal ring buffer. If the peer net device is removed/unbound before
    the hairpin flow is deleted, then the send queue is not destroyed which
    leads to a stack trace on pci device remove:
    
    [ 748.005230] mlx5_core 0000:08:00.2: wait_func:1094:(pid 12985): MANAGE_PAGES(0x108) timeout. Will cause a leak of a command resource
    [ 748.005231] mlx5_core 0000:08:00.2: reclaim_pages:514:(pid 12985): failed reclaiming pages: err -110
    [ 748.001835] mlx5_core 0000:08:00.2: mlx5_reclaim_root_pages:653:(pid 12985): failed reclaiming pages (-110) for func id 0x0
    [ 748.002171] ------------[ cut here ]------------
    [ 748.001177] FW pages counter is 4 after reclaiming all pages
    [ 748.001186] WARNING: CPU: 1 PID: 12985 at drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:685 mlx5_reclaim_startup_pages+0x34b/0x460 [mlx5_core]
    [  +0.002771] Modules linked in: cls_flower mlx5_ib mlx5_core ptp pps_core act_mirred sch_ingress openvswitch nsh xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm ib_uverbs ib_core overlay fuse [last unloaded: pps_core]
    [ 748.007225] CPU: 1 PID: 12985 Comm: tee Not tainted 5.12.0+ #1
    [ 748.001376] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
    [ 748.002315] RIP: 0010:mlx5_reclaim_startup_pages+0x34b/0x460 [mlx5_core]
    [ 748.001679] Code: 28 00 00 00 0f 85 22 01 00 00 48 81 c4 b0 00 00 00 31 c0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 c7 c7 40 cc 19 a1 e8 9f 71 0e e2 <0f> 0b e9 30 ff ff ff 48 c7 c7 a0 cc 19 a1 e8 8c 71 0e e2 0f 0b e9
    [ 748.003781] RSP: 0018:ffff88815220faf8 EFLAGS: 00010286
    [ 748.001149] RAX: 0000000000000000 RBX: ffff8881b4900280 RCX: 0000000000000000
    [ 748.001445] RDX: 0000000000000027 RSI: 0000000000000004 RDI: ffffed102a441f51
    [ 748.001614] RBP: 00000000000032b9 R08: 0000000000000001 R09: ffffed1054a15ee8
    [ 748.001446] R10: ffff8882a50af73b R11: ffffed1054a15ee7 R12: fffffbfff07c1e30
    [ 748.001447] R13: dffffc0000000000 R14: ffff8881b492cba8 R15: 0000000000000000
    [ 748.001429] FS:  00007f58bd08b580(0000) GS:ffff8882a5080000(0000) knlGS:0000000000000000
    [ 748.001695] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 748.001309] CR2: 000055a026351740 CR3: 00000001d3b48006 CR4: 0000000000370ea0
    [ 748.001506] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 748.001483] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 748.001654] Call Trace:
    [ 748.000576]  ? mlx5_satisfy_startup_pages+0x290/0x290 [mlx5_core]
    [ 748.001416]  ? mlx5_cmd_teardown_hca+0xa2/0xd0 [mlx5_core]
    [ 748.001354]  ? mlx5_cmd_init_hca+0x280/0x280 [mlx5_core]
    [ 748.001203]  mlx5_function_teardown+0x30/0x60 [mlx5_core]
    [ 748.001275]  mlx5_uninit_one+0xa7/0xc0 [mlx5_core]
    [ 748.001200]  remove_one+0x5f/0xc0 [mlx5_core]
    [ 748.001075]  pci_device_remove+0x9f/0x1d0
    [ 748.000833]  device_release_driver_internal+0x1e0/0x490
    [ 748.001207]  unbind_store+0x19f/0x200
    [ 748.000942]  ? sysfs_file_ops+0x170/0x170
    [ 748.001000]  kernfs_fop_write_iter+0x2bc/0x450
    [ 748.000970]  new_sync_write+0x373/0x610
    [ 748.001124]  ? new_sync_read+0x600/0x600
    [ 748.001057]  ? lock_acquire+0x4d6/0x700
    [ 748.000908]  ? lockdep_hardirqs_on_prepare+0x400/0x400
    [ 748.001126]  ? fd_install+0x1c9/0x4d0
    [ 748.000951]  vfs_write+0x4d0/0x800
    [ 748.000804]  ksys_write+0xf9/0x1d0
    [ 748.000868]  ? __x64_sys_read+0xb0/0xb0
    [ 748.000811]  ? filp_open+0x50/0x50
    [ 748.000919]  ? syscall_enter_from_user_mode+0x1d/0x50
    [ 748.001223]  do_syscall_64+0x3f/0x80
    [ 748.000892]  entry_SYSCALL_64_after_hwframe+0x44/0xae
    [ 748.001026] RIP: 0033:0x7f58bcfb22f7
    [ 748.000944] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
    [ 748.003925] RSP: 002b:00007fffd7f2aaa8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
    [ 748.001732] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f58bcfb22f7
    [ 748.001426] RDX: 000000000000000d RSI: 00007fffd7f2abc0 RDI: 0000000000000003
    [ 748.001746] RBP: 00007fffd7f2abc0 R08: 0000000000000000 R09: 0000000000000001
    [ 748.001631] R10: 00000000000001b6 R11: 0000000000000246 R12: 000000000000000d
    [ 748.001537] R13: 00005597ac2c24a0 R14: 000000000000000d R15: 00007f58bd084700
    [ 748.001564] irq event stamp: 0
    [ 748.000787] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
    [ 748.001399] hardirqs last disabled at (0): [<ffffffff813132cf>] copy_process+0x146f/0x5eb0
    [ 748.001854] softirqs last  enabled at (0): [<ffffffff8131330e>] copy_process+0x14ae/0x5eb0
    [ 748.013431] softirqs last disabled at (0): [<0000000000000000>] 0x0
    [ 748.001492] ---[ end trace a6fabd773d1c51ae ]---
    
    Fix by destroying the send queue of a hairpin peer net device that is
    being removed/unbound, which returns the allocated ring buffer pages to
    the host.
    
    Fixes: 4d8fcf2 ("net/mlx5e: Avoid unbounded peer devices when unpairing TC hairpin rules")
    Signed-off-by: Dima Chumak <dchumak@nvidia.com>
    Reviewed-by: Roi Dayan <roid@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    chumakd authored and Saeed Mahameed committed Jun 10, 2021
  24. net/mlx5e: Remove dependency in IPsec initialization flows

    Currently, IPsec feature is disabled because mlx5e_build_nic_netdev
    is required to be called after mlx5e_ipsec_init. This requirement is
    invalid as mlx5e_build_nic_netdev and mlx5e_ipsec_init initialize
    independent resources.
    
    Remove ipsec pointer check in mlx5e_build_nic_netdev so that the
    two functions can be called in any order.
    
    Fixes: 547eede ("net/mlx5e: IPSec, Innova IPSec offload infrastructure")
    Signed-off-by: Huy Nguyen <huyn@nvidia.com>
    Reviewed-by: Raed Salem <raeds@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    Huy Nguyen authored and Saeed Mahameed committed Jun 10, 2021
  25. net/mlx5e: Fix use-after-free of encap entry in neigh update handler

    Function mlx5e_rep_neigh_update() wasn't updated to accommodate rtnl lock
    removal from the TC filter update path and to properly handle concurrent
    encap entry insertion/deletion, which can lead to the following use-after-free:
    
     [23827.464923] ==================================================================
     [23827.469446] BUG: KASAN: use-after-free in mlx5e_encap_take+0x72/0x140 [mlx5_core]
     [23827.470971] Read of size 4 at addr ffff8881d132228c by task kworker/u20:6/21635
     [23827.472251]
     [23827.472615] CPU: 9 PID: 21635 Comm: kworker/u20:6 Not tainted 5.13.0-rc3+ #5
     [23827.473788] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
     [23827.475639] Workqueue: mlx5e mlx5e_rep_neigh_update [mlx5_core]
     [23827.476731] Call Trace:
     [23827.477260]  dump_stack+0xbb/0x107
     [23827.477906]  print_address_description.constprop.0+0x18/0x140
     [23827.478896]  ? mlx5e_encap_take+0x72/0x140 [mlx5_core]
     [23827.479879]  ? mlx5e_encap_take+0x72/0x140 [mlx5_core]
     [23827.480905]  kasan_report.cold+0x7c/0xd8
     [23827.481701]  ? mlx5e_encap_take+0x72/0x140 [mlx5_core]
     [23827.482744]  kasan_check_range+0x145/0x1a0
     [23827.493112]  mlx5e_encap_take+0x72/0x140 [mlx5_core]
     [23827.494054]  ? mlx5e_tc_tun_encap_info_equal_generic+0x140/0x140 [mlx5_core]
     [23827.495296]  mlx5e_rep_neigh_update+0x41e/0x5e0 [mlx5_core]
     [23827.496338]  ? mlx5e_rep_neigh_entry_release+0xb80/0xb80 [mlx5_core]
     [23827.497486]  ? read_word_at_a_time+0xe/0x20
     [23827.498250]  ? strscpy+0xa0/0x2a0
     [23827.498889]  process_one_work+0x8ac/0x14e0
     [23827.499638]  ? lockdep_hardirqs_on_prepare+0x400/0x400
     [23827.500537]  ? pwq_dec_nr_in_flight+0x2c0/0x2c0
     [23827.501359]  ? rwlock_bug.part.0+0x90/0x90
     [23827.502116]  worker_thread+0x53b/0x1220
     [23827.502831]  ? process_one_work+0x14e0/0x14e0
     [23827.503627]  kthread+0x328/0x3f0
     [23827.504254]  ? _raw_spin_unlock_irq+0x24/0x40
     [23827.505065]  ? __kthread_bind_mask+0x90/0x90
     [23827.505912]  ret_from_fork+0x1f/0x30
     [23827.506621]
     [23827.506987] Allocated by task 28248:
     [23827.507694]  kasan_save_stack+0x1b/0x40
     [23827.508476]  __kasan_kmalloc+0x7c/0x90
     [23827.509197]  mlx5e_attach_encap+0xde1/0x1d40 [mlx5_core]
     [23827.510194]  mlx5e_tc_add_fdb_flow+0x397/0xc40 [mlx5_core]
     [23827.511218]  __mlx5e_add_fdb_flow+0x519/0xb30 [mlx5_core]
     [23827.512234]  mlx5e_configure_flower+0x191c/0x4870 [mlx5_core]
     [23827.513298]  tc_setup_cb_add+0x1d5/0x420
     [23827.514023]  fl_hw_replace_filter+0x382/0x6a0 [cls_flower]
     [23827.514975]  fl_change+0x2ceb/0x4a51 [cls_flower]
     [23827.515821]  tc_new_tfilter+0x89a/0x2070
     [23827.516548]  rtnetlink_rcv_msg+0x644/0x8c0
     [23827.517300]  netlink_rcv_skb+0x11d/0x340
     [23827.518021]  netlink_unicast+0x42b/0x700
     [23827.518742]  netlink_sendmsg+0x743/0xc20
     [23827.519467]  sock_sendmsg+0xb2/0xe0
     [23827.520131]  ____sys_sendmsg+0x590/0x770
     [23827.520851]  ___sys_sendmsg+0xd8/0x160
     [23827.521552]  __sys_sendmsg+0xb7/0x140
     [23827.522238]  do_syscall_64+0x3a/0x70
     [23827.522907]  entry_SYSCALL_64_after_hwframe+0x44/0xae
     [23827.523797]
     [23827.524163] Freed by task 25948:
     [23827.524780]  kasan_save_stack+0x1b/0x40
     [23827.525488]  kasan_set_track+0x1c/0x30
     [23827.526187]  kasan_set_free_info+0x20/0x30
     [23827.526968]  __kasan_slab_free+0xed/0x130
     [23827.527709]  slab_free_freelist_hook+0xcf/0x1d0
     [23827.528528]  kmem_cache_free_bulk+0x33a/0x6e0
     [23827.529317]  kfree_rcu_work+0x55f/0xb70
     [23827.530024]  process_one_work+0x8ac/0x14e0
     [23827.530770]  worker_thread+0x53b/0x1220
     [23827.531480]  kthread+0x328/0x3f0
     [23827.532114]  ret_from_fork+0x1f/0x30
     [23827.532785]
     [23827.533147] Last potentially related work creation:
     [23827.534007]  kasan_save_stack+0x1b/0x40
     [23827.534710]  kasan_record_aux_stack+0xab/0xc0
     [23827.535492]  kvfree_call_rcu+0x31/0x7b0
     [23827.536206]  mlx5e_tc_del_fdb_flow+0x577/0xef0 [mlx5_core]
     [23827.537305]  mlx5e_flow_put+0x49/0x80 [mlx5_core]
     [23827.538290]  mlx5e_delete_flower+0x6d1/0xe60 [mlx5_core]
     [23827.539300]  tc_setup_cb_destroy+0x18e/0x2f0
     [23827.540144]  fl_hw_destroy_filter+0x1d2/0x310 [cls_flower]
     [23827.541148]  __fl_delete+0x4dc/0x660 [cls_flower]
     [23827.541985]  fl_delete+0x97/0x160 [cls_flower]
     [23827.542782]  tc_del_tfilter+0x7ab/0x13d0
     [23827.543503]  rtnetlink_rcv_msg+0x644/0x8c0
     [23827.544257]  netlink_rcv_skb+0x11d/0x340
     [23827.544981]  netlink_unicast+0x42b/0x700
     [23827.545700]  netlink_sendmsg+0x743/0xc20
     [23827.546424]  sock_sendmsg+0xb2/0xe0
     [23827.547084]  ____sys_sendmsg+0x590/0x770
     [23827.547850]  ___sys_sendmsg+0xd8/0x160
     [23827.548606]  __sys_sendmsg+0xb7/0x140
     [23827.549303]  do_syscall_64+0x3a/0x70
     [23827.549969]  entry_SYSCALL_64_after_hwframe+0x44/0xae
     [23827.550853]
     [23827.551217] The buggy address belongs to the object at ffff8881d1322200
     [23827.551217]  which belongs to the cache kmalloc-256 of size 256
     [23827.553341] The buggy address is located 140 bytes inside of
     [23827.553341]  256-byte region [ffff8881d1322200, ffff8881d1322300)
     [23827.555747] The buggy address belongs to the page:
     [23827.556847] page:00000000898762aa refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1d1320
     [23827.558651] head:00000000898762aa order:2 compound_mapcount:0 compound_pincount:0
     [23827.559961] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
     [23827.561243] raw: 002ffff800010200 dead000000000100 dead000000000122 ffff888100042b40
     [23827.562653] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
     [23827.564112] page dumped because: kasan: bad access detected
     [23827.565439]
     [23827.565932] Memory state around the buggy address:
     [23827.566917]  ffff8881d1322180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
     [23827.568485]  ffff8881d1322200: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
     [23827.569818] >ffff8881d1322280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
     [23827.571143]                       ^
     [23827.571879]  ffff8881d1322300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
     [23827.573283]  ffff8881d1322380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
     [23827.574654] ==================================================================
    
    Most of the necessary logic is already correctly implemented by the
    mlx5e_get_next_valid_encap() helper that is used in the neigh stats update
    handler. Make the helper generic by renaming it to
    mlx5e_get_next_matching_encap() and use a callback to test whether the
    encap entry matches instead of a hardcoded check for the 'valid' flag
    value. Implement mlx5e_get_next_valid_encap() by calling
    mlx5e_get_next_matching_encap() with a callback that tests the encap
    MLX5_ENCAP_ENTRY_VALID flag. Implement a new mlx5e_get_next_init_encap()
    helper by calling mlx5e_get_next_matching_encap() with a callback that
    tests the encap completion result to be non-error, and use it in
    mlx5e_rep_neigh_update() to safely iterate over nhe->encap_list.
    
    Remove encap completion logic from mlx5e_rep_update_flows() since the encap
    entries passed to this function are already guaranteed to be properly
    initialized by similar code in mlx5e_get_next_init_encap().
    
    Fixes: 2a1f176 ("net/mlx5e: Refactor neigh update for concurrent execution")
    Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
    Reviewed-by: Roi Dayan <roid@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    vbuslov authored and Saeed Mahameed committed Jun 10, 2021
  26. net/mlx5e: Fix an error code in mlx5e_arfs_create_tables()

    When the code executes 'if (!priv->fs.arfs->wq)', the value of err is 0.
    So, use -ENOMEM to indicate that create_singlethread_workqueue()
    returned NULL.
    
    Clean up smatch warning:
    drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c:373
    mlx5e_arfs_create_tables() warn: missing error code 'err'.
    
    Reported-by: Abaci Robot <abaci@linux.alibaba.com>
    Fixes: f6755b8 ("net/mlx5e: Dynamic alloc arfs table for netdev when needed")
    Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    Yang Li authored and Saeed Mahameed committed Jun 10, 2021
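
The missing-error-code pattern described in the mlx5e_arfs_create_tables() commit above can be illustrated with a small userspace sketch. This is not the driver code: struct arfs_state, alloc_wq() and arfs_create() are hypothetical names, the order of allocations is only loosely modeled, and malloc() stands in for the kernel allocators. It only shows why err has to be assigned before jumping to the cleanup label.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's per-device aRFS state. */
struct arfs_state {
    void *wq;      /* stands in for the workqueue pointer */
    void *tables;  /* stands in for the flow tables */
};

/*
 * Hypothetical allocator that can be forced to fail; it plays the role
 * of create_singlethread_workqueue() returning NULL.
 */
static void *alloc_wq(int should_fail)
{
    return should_fail ? NULL : malloc(1);
}

/* Returns 0 on success or a negative errno value on failure. */
static int arfs_create(struct arfs_state *st, int wq_should_fail)
{
    int err = 0;

    st->tables = malloc(1);
    if (!st->tables)
        return -ENOMEM;

    st->wq = alloc_wq(wq_should_fail);
    if (!st->wq) {
        /* Without this assignment err would still be 0 here. */
        err = -ENOMEM;
        goto err_free_tables;
    }

    return 0;

err_free_tables:
    free(st->tables);
    st->tables = NULL;
    return err;
}
```

If the assignment inside the NULL check is dropped, arfs_create() frees its resources and still returns 0, so the caller would treat a failed setup as a success. That is the situation the smatch warning points at.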

Commits on Jun 9, 2021

  1. Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
    
    Tony Nguyen says:
    
    ====================
    Intel Wired LAN Driver Updates 2021-06-09
    
    This series contains updates to ice driver only.
    
    Maciej informs the user when XDP is not supported due to the driver
    being in the 'safe mode' state. He also adds a parameter to Tx queue
    configuration to resolve an issue in configuring XDP queues as it cannot
    rely on using the number of Tx or Rx queues.
    ====================
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    davem330 committed Jun 9, 2021
  2. net/sched: act_ct: handle DNAT tuple collision

    This is the counterpart of 8aa7b52 ("openvswitch: handle DNAT
    tuple collision") for act_ct. From that commit changelog:
    
    """
    With multiple DNAT rules it's possible that after destination
    translation the resulting tuples collide.
    
    ...
    
    Netfilter handles this case by allocating a null binding for SNAT at
    egress by default.  Perform the same operation in openvswitch for DNAT
    if no explicit SNAT is requested by the user and allocate a null binding
    for SNAT for packets in the "original" direction.
    """
    
    Fixes: 95219af ("act_ct: support asymmetric conntrack")
    Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    marceloleitner authored and davem330 committed Jun 9, 2021
  3. rtnetlink: Fix regression in bridge VLAN configuration

    Cited commit started returning errors when notification info is not
    filled by the bridge driver, resulting in the following regression:
    
     # ip link add name br1 type bridge vlan_filtering 1
     # bridge vlan add dev br1 vid 555 self pvid untagged
     RTNETLINK answers: Invalid argument
    
    As long as the bridge driver does not fill notification info for the
    bridge device itself, an empty notification should not be considered as
    an error. This is explained in commit 59ccaaa ("bridge: dont send
    notification when skb->len == 0 in rtnl_bridge_notify").
    
    Fix by removing the error and adding a comment to avoid future bugs.
    
    Fixes: a8db57c ("rtnetlink: Fix missing error code in rtnl_bridge_notify()")
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    idosch authored and davem330 committed Jun 9, 2021
  4. Merge tag 'mac80211-for-net-2021-06-09' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
    
    Johannes Berg says:
    
    ====================
    A fair number of fixes:
     * fix more fallout from RTNL locking changes
     * fixes for some of the bugs found by syzbot
     * drop multicast fragments in mac80211 to align
       with the spec and what drivers are doing now
     * fix NULL-ptr deref in radiotap injection
    ====================
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    davem330 committed Jun 9, 2021
  5. udp: fix race between close() and udp_abort()

    Kaustubh reported and diagnosed a panic in udp_lib_lookup().
    The root cause is udp_abort() racing with close(). Both
    racing functions acquire the socket lock, but udp{v6}_destroy_sock()
    releases it before performing destructive actions.
    
    We can't easily extend the socket lock scope to avoid the race;
    instead, use the SOCK_DEAD flag to prevent udp_abort() from taking
    any action when the critical race happens.
    
    Diagnosed-and-tested-by: Kaustubh Pandey <kapandey@codeaurora.org>
    Fixes: 5d77dca ("net: diag: support SOCK_DESTROY for UDP sockets")
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Paolo Abeni authored and davem330 committed Jun 9, 2021
  6. inet: annotate data race in inet_send_prepare() and inet_dgram_connect()

    Both functions are known to be racy when reading inet_num,
    as we do not want to grab locks for the common case where the socket
    has already been bound. The race is resolved in inet_autobind()
    by reading inet_num again under the socket lock.
    
    syzbot reported:
    BUG: KCSAN: data-race in inet_send_prepare / udp_lib_get_port
    
    write to 0xffff88812cba150e of 2 bytes by task 24135 on cpu 0:
     udp_lib_get_port+0x4b2/0xe20 net/ipv4/udp.c:308
     udp_v6_get_port+0x5e/0x70 net/ipv6/udp.c:89
     inet_autobind net/ipv4/af_inet.c:183 [inline]
     inet_send_prepare+0xd0/0x210 net/ipv4/af_inet.c:807
     inet6_sendmsg+0x29/0x80 net/ipv6/af_inet6.c:639
     sock_sendmsg_nosec net/socket.c:654 [inline]
     sock_sendmsg net/socket.c:674 [inline]
     ____sys_sendmsg+0x360/0x4d0 net/socket.c:2350
     ___sys_sendmsg net/socket.c:2404 [inline]
     __sys_sendmmsg+0x315/0x4b0 net/socket.c:2490
     __do_sys_sendmmsg net/socket.c:2519 [inline]
     __se_sys_sendmmsg net/socket.c:2516 [inline]
     __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2516
     do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
     entry_SYSCALL_64_after_hwframe+0x44/0xae
    
    read to 0xffff88812cba150e of 2 bytes by task 24132 on cpu 1:
     inet_send_prepare+0x21/0x210 net/ipv4/af_inet.c:806
     inet6_sendmsg+0x29/0x80 net/ipv6/af_inet6.c:639
     sock_sendmsg_nosec net/socket.c:654 [inline]
     sock_sendmsg net/socket.c:674 [inline]
     ____sys_sendmsg+0x360/0x4d0 net/socket.c:2350
     ___sys_sendmsg net/socket.c:2404 [inline]
     __sys_sendmmsg+0x315/0x4b0 net/socket.c:2490
     __do_sys_sendmmsg net/socket.c:2519 [inline]
     __se_sys_sendmmsg net/socket.c:2516 [inline]
     __x64_sys_sendmmsg+0x53/0x60 net/socket.c:2516
     do_syscall_64+0x4a/0x90 arch/x86/entry/common.c:47
     entry_SYSCALL_64_after_hwframe+0x44/0xae
    
    value changed: 0x0000 -> 0x9db4
    
    Reported by Kernel Concurrency Sanitizer on:
    CPU: 1 PID: 24132 Comm: syz-executor.2 Not tainted 5.13.0-rc4-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    neebe000 authored and davem330 committed Jun 9, 2021
  7. net: ethtool: clear heap allocations for ethtool function

    Several ethtool functions allocate heap buffers that drivers may
    leave (partially) unwritten. The unused portion of the buffer then
    keeps its previous contents, and the full buffer might be copied
    back to userspace.
    
    Signed-off-by: Austin Kim <austindh.kim@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    austindhkim authored and davem330 committed Jun 9, 2021
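
The exposure described in the ethtool commit above can be sketched in userspace. The commit's diff is not shown on this page, so this is only an illustration under stated assumptions: REPLY_LEN, driver_fill() and build_reply() are hypothetical names, and calloc() plays the role that a zeroing allocator such as kzalloc() plays in the kernel.

```c
#include <stdlib.h>
#include <string.h>

#define REPLY_LEN 64  /* hypothetical size of the reply buffer */

/*
 * Hypothetical "driver" callback that writes only the first few bytes
 * of the buffer and leaves the rest untouched.
 */
static void driver_fill(unsigned char *buf, size_t len)
{
    const char tag[] = "eth0";

    if (len >= sizeof(tag))
        memcpy(buf, tag, sizeof(tag));
}

/*
 * Allocate a zero-filled reply buffer, let the driver fill part of it,
 * and return the whole buffer to the caller, who must free() it.
 * With malloc() instead of calloc(), the bytes the driver did not
 * write would be indeterminate and could carry stale heap data.
 */
static unsigned char *build_reply(void)
{
    unsigned char *buf = calloc(1, REPLY_LEN);

    if (!buf)
        return NULL;

    driver_fill(buf, REPLY_LEN);
    return buf;
}
```

Because the buffer starts out zeroed, copying all REPLY_LEN bytes to the consumer is safe even when the driver writes only a prefix.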