Dongliang-Mu/a…

Commits on Jul 13, 2021

  1. audit: fix memory leak in nf_tables_commit

    In nf_tables_commit, if nf_tables_commit_audit_alloc fails, it does not
    free the adp variable.
    
    Fix this by freeing the linked list with head adl.
    
    backtrace:
      kmalloc include/linux/slab.h:591 [inline]
      kzalloc include/linux/slab.h:721 [inline]
      nf_tables_commit_audit_alloc net/netfilter/nf_tables_api.c:8439 [inline]
      nf_tables_commit+0x16e/0x1760 net/netfilter/nf_tables_api.c:8508
      nfnetlink_rcv_batch+0x512/0xa80 net/netfilter/nfnetlink.c:562
      nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline]
      nfnetlink_rcv+0x1fa/0x220 net/netfilter/nfnetlink.c:652
      netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
      netlink_unicast+0x2c7/0x3e0 net/netlink/af_netlink.c:1340
      netlink_sendmsg+0x36b/0x6b0 net/netlink/af_netlink.c:1929
      sock_sendmsg_nosec net/socket.c:702 [inline]
      sock_sendmsg+0x56/0x80 net/socket.c:722
    
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Fixes: c520292 ("audit: log nftables configuration change events once per table")
    Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
    mudongliang authored and intel-lab-lkp committed Jul 13, 2021
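
The error-path cleanup described in the commit above can be illustrated with a small userspace sketch. It is not the kernel code: `struct audit_entry`, `entry_alloc()`, `list_free()`, `commit()` and the failure-injection counters are hypothetical stand-ins, and only the idea (free the partially built list when a later allocation fails) is taken from the commit message.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-table audit entry; not the kernel struct. */
struct audit_entry {
    struct audit_entry *next;
    int table_id;
};

int live_entries;   /* entries currently allocated, tracked for the demo */
int alloc_calls;    /* number of allocation attempts so far */
int fail_at = -1;   /* 0-based index of the attempt that should fail */

struct audit_entry *entry_alloc(int table_id)
{
    struct audit_entry *e;

    if (alloc_calls++ == fail_at)
        return NULL;            /* injected allocation failure */
    e = calloc(1, sizeof(*e));
    if (e) {
        e->table_id = table_id;
        live_entries++;
    }
    return e;
}

/* Free every entry reachable from *head and leave the list empty. */
void list_free(struct audit_entry **head)
{
    while (*head) {
        struct audit_entry *e = *head;

        *head = e->next;
        free(e);
        live_entries--;
    }
}

/* Returns 0 on success, -1 if an allocation failed. */
int commit(int ntables)
{
    struct audit_entry *head = NULL;
    int i;

    for (i = 0; i < ntables; i++) {
        struct audit_entry *e = entry_alloc(i);

        if (!e) {
            list_free(&head);   /* without this, earlier entries leak */
            return -1;
        }
        e->next = head;
        head = e;
    }
    list_free(&head);           /* normal path: entries consumed, then freed */
    return 0;
}
```

With `fail_at` set to 2, `commit(4)` fails on the third allocation and `live_entries` is back at zero afterwards, which is the property the fix restores.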

Commits on Jul 7, 2021

  1. Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

    Pablo Neira Ayuso says:
    
    ====================
    Netfilter fixes for net
    
    The following patchset contains Netfilter fixes for net:
    
    1) Do not refresh timeout in SYN_SENT for syn retransmissions.
       Add selftest for unreplied TCP connection, from Florian Westphal.
    
    2) Fix null dereference from error path with hardware offload
       in nftables.
    
    3) Remove useless nf_ct_gre_keymap_flush() from netns exit path,
       from Vasily Averin.
    
    4) Missing rcu read-lock side in ctnetlink helper info dump,
       also from Vasily.
    
    5) Do not mark RST in the reply direction coming after SYN packet
       for an out-of-sync entry, from Ali Abdallah and Florian Westphal.
    
    6) Add tcp_ignore_invalid_rst sysctl to disable marking out-of-segment
       RSTs as invalid, from Ali.
    
    7) KCSAN fix for nf_conntrack_all_lock(), from Manfred Spraul.
    
    8) Honor NFTA_LAST_SET in nft_last.
    
    9) Fix incorrect arithmetic when restoring last_jiffies in nft_last.
    ====================
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    davem330 committed Jul 7, 2021
  2. selftests: icmp_redirect: IPv6 PMTU info should be cleared after redirect

    After a redirect, traffic already follows a new path, so the old PMTU
    info should be cleared. The IPv6 test "mtu exception plus redirect" should
    only have redirect info, without the old PMTU.
    
    The IPv4 test cannot be changed, for legacy reasons.
    
    Fixes: ec81053 ("selftests: Add redirect tests")
    Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    liuhangbin authored and davem330 committed Jul 7, 2021
  3. selftests: icmp_redirect: remove from checking for IPv6 route get

    If the kernel doesn't enable option CONFIG_IPV6_SUBTREES, the RTA_SRC
    info will not be exported to userspace in rt6_fill_node(), and the ip
    command will not print "from ::" in the route output. So remove this check.
    
    Fixes: ec81053 ("selftests: Add redirect tests")
    Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    liuhangbin authored and davem330 committed Jul 7, 2021
  4. stmmac: platform: Fix signedness bug in stmmac_probe_config_dt()

    The "plat->phy_interface" variable is an enum and in this context GCC
    will treat it as an unsigned int so the error handling is never
    triggered.
    
    Fixes: b9f0b2f ("net: stmmac: platform: fix probe for ACPI devices")
    Signed-off-by: YueHaibing <yuehaibing@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    YueHaibing authored and davem330 committed Jul 7, 2021
  5. stmmac: dwmac-loongson: Fix unsigned comparison to zero

    plat->phy_interface is an unsigned integer, so the condition can never
    be less than zero and the warning will never be printed.
    
    Fixes: 30bba69 ("stmmac: pci: Add dwmac support for Loongson")
    Signed-off-by: YueHaibing <yuehaibing@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    YueHaibing authored and davem330 committed Jul 7, 2021
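
The two stmmac fixes above address the same pattern: an error code stored into an unsigned field can no longer be detected with a `< 0` test. The sketch below shows one common way to avoid it, checking a signed temporary before assigning. All names (`struct plat_sketch`, `lookup_phy_mode()`, `store_directly()`, `probe_sketch()`) are hypothetical, and the field is declared `unsigned int` to model how the commit says GCC treats the enum.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the platform data; the field is unsigned here
 * to model the behaviour described in the commit message. */
struct plat_sketch {
    unsigned int phy_interface;
};

/* Hypothetical lookup: non-negative mode on success, negative errno on failure. */
int lookup_phy_mode(int available)
{
    return available ? 7 : -ENODEV;
}

/* What the unsigned field would hold if the error were stored directly. */
unsigned int store_directly(int rc)
{
    return (unsigned int)rc;    /* a negative errno wraps to a large positive value */
}

/* Check the signed result first, then assign to the unsigned field. */
int probe_sketch(struct plat_sketch *plat, int available)
{
    int rc = lookup_phy_mode(available);

    if (rc < 0)
        return rc;              /* the error is still visible here */
    plat->phy_interface = (unsigned int)rc;
    return 0;
}
```

Because `store_directly(-ENODEV)` is a positive number, a test such as `plat->phy_interface < 0` can never fire; `probe_sketch()` keeps the comparison on the signed value instead.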
  6. netfilter: uapi: refer to nfnetlink_conntrack.h, not nf_conntrack_netlink.h

    nf_conntrack_netlink.h does not exist, refer to nfnetlink_conntrack.h instead.
    
    Signed-off-by: Duncan Roe <duncan_roe@optusnet.com.au>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    duncan-roe authored and ummakynes committed Jul 7, 2021

Commits on Jul 6, 2021

  1. ipv6: fix 'disable_policy' for fwd packets

    The goal of commit df789fe ("ipv6: Provide ipv6 version of
    "disable_policy" sysctl") was to have the disable_policy from ipv4
    available on ipv6.
    However, it's not exactly the same mechanism. On IPv4, all packets coming
    from an interface that has disable_policy set bypass the policy check.
    For ipv6, this is done only for local packets, i.e. for packets destined to
    an address configured on the incoming interface.
    
    Let's align ipv6 with ipv4 so that the 'disable_policy' sysctl has the same
    effect for both protocols.
    
    My first approach was to create a new kind of route cache entry, to be
    able to set DST_NOPOLICY without modifying routes. This would have added a
    lot of code. Because the local delivery path is already handled, I chose
    to focus on the forwarding path to minimize code churn.
    
    Fixes: df789fe ("ipv6: Provide ipv6 version of "disable_policy" sysctl")
    Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    NicolasDichtel authored and davem330 committed Jul 6, 2021
  2. octeontx2-pf: Fix assigned error return value that is never used

    Currently, when the call to otx2_mbox_alloc_msg_cgx_mac_addr_update fails,
    the error return variable rc is assigned -ENOMEM but the function does not
    return early. rc is then re-assigned and the error case is not handled
    correctly. Fix this by returning -ENOMEM rather than assigning rc.
    
    Addresses-Coverity: ("Unused value")
    Fixes: 79d2be3 ("octeontx2-pf: offload DMAC filters to CGX/RPM block")
    Signed-off-by: Colin Ian King <colin.king@canonical.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Colin Ian King authored and davem330 committed Jul 6, 2021
  3. Merge branch 'bonding-ipsec'

    Taehee Yoo says:
    
    ====================
    net: fix bonding ipsec offload problems
    
    This series fixes some problems related to bonding ipsec offload.
    
    The 1st, 5th, and 8th patches add a missing rcu_read_lock().
    The 2nd patch adds a null check to bond_ipsec_add_sa().
    When a bonding interface doesn't have an active real interface, the
    bond->curr_active_slave pointer is null.
    But bond_ipsec_add_sa() uses that pointer without a null check,
    which results in a null-ptr-deref.
    The 3rd and 4th patches replace xs->xso.dev with xs->xso.real_dev.
    The 6th patch disallows setting ipsec offload if the real interface
    type is bonding.
    The 7th patch adds struct bond_ipsec to manage SAs.
    If the bond mode or the active real interface changes, the SAs should
    be removed from the old active real interface and then added
    to the new active real interface.
    But bonding can't do that, because it doesn't manage SAs.
    The 9th patch fixes the incorrect return value of bond_ipsec_offload_ok().
    
    v1 -> v2:
     - Add 9th patch.
     - Do not print warning when there is no SA in bond_ipsec_add_sa_all().
     - Add comment for ipsec_lock.
    ====================
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    davem330 committed Jul 6, 2021
  4. bonding: fix incorrect return value of bond_ipsec_offload_ok()

    bond_ipsec_offload_ok() is called to check whether the interface supports
    ipsec offload or not.
    A bonding interface supports ipsec offload only in active-backup mode.
    So, if a bond interface is not in active-backup mode, it should return
    false, but it returns true.
    
    Fixes: a3b658c ("bonding: allow xfrm offload setup post-module-load")
    Signed-off-by: Taehee Yoo <ap420073@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    TaeheeYoo authored and davem330 committed Jul 6, 2021
  5. bonding: fix suspicious RCU usage in bond_ipsec_offload_ok()

    bond_ipsec_offload_ok() uses rcu_dereference() to dereference
    bond->curr_active_slave.
    But neither it nor its caller holds the RCU read lock, so a warning occurs.
    So add rcu_read_lock().
    
    Splat looks like:
    WARNING: suspicious RCU usage
    5.13.0-rc6+ #1179 Not tainted
    drivers/net/bonding/bond_main.c:571 suspicious
    rcu_dereference_check() usage!
    
    other info that might help us debug this:
    
    rcu_scheduler_active = 2, debug_locks = 1
    1 lock held by ping/974:
     #0: ffff888109e7db70 (sk_lock-AF_INET){+.+.}-{0:0},
    at: raw_sendmsg+0x1303/0x2cb0
    
    stack backtrace:
    CPU: 2 PID: 974 Comm: ping Not tainted 5.13.0-rc6+ #1179
    Call Trace:
     dump_stack+0xa4/0xe5
     bond_ipsec_offload_ok+0x1f4/0x260 [bonding]
     xfrm_output+0x179/0x890
     xfrm4_output+0xfa/0x410
     ? __xfrm4_output+0x4b0/0x4b0
     ? __ip_make_skb+0xecc/0x2030
     ? xfrm4_udp_encap_rcv+0x800/0x800
     ? ip_local_out+0x21/0x3a0
     ip_send_skb+0x37/0xa0
     raw_sendmsg+0x1bfd/0x2cb0
    
    Fixes: 18cb261 ("bonding: support hardware encryption offload to slaves")
    Signed-off-by: Taehee Yoo <ap420073@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    TaeheeYoo authored and davem330 committed Jul 6, 2021
  6. bonding: Add struct bond_ipesc to manage SA

    bonding already supports ipsec offload.
    When an SA is added, bonding just passes the SA to its own active real
    interface, but it doesn't manage the SA.
    So, when events (add/del real interface, active real interface change, etc.)
    occur, bonding can't handle them well because it doesn't manage the SA,
    and problems (panic, UAF, refcnt leak) occur.
    
    In order to make it stable, bonding should manage SAs.
    That's the reason why struct bond_ipsec is added.
    When a new SA is added to a bonding interface, it is stored in the
    bond_ipsec list, and the SA is passed to the current active real interface.
    If events occur, bonding uses the bond_ipsec data to handle them.
    bond->ipsec_list is protected by bond->ipsec_lock.
    
    If the current active real interface is changed, the following logic applies:
    1. Delete all SAs from the old active real interface.
    2. Add all SAs to the new active real interface.
    3. If the new active real interface doesn't support ipsec offload or the
    SA's option, set real_dev to NULL.
    
    Fixes: 18cb261 ("bonding: support hardware encryption offload to slaves")
    Signed-off-by: Taehee Yoo <ap420073@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    TaeheeYoo authored and davem330 committed Jul 6, 2021
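
The three-step handover described in the commit above can be sketched in plain C. This is only an illustration under simplifying assumptions: the types `struct real_dev`, `struct sa`, `struct bond_ipsec` and `struct bond` are reduced, hypothetical versions of the kernel structures, the helper names are invented, and locking (`bond->ipsec_lock` in the commit) is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified types; the kernel structures differ. */
struct real_dev {
    int supports_offload;           /* can this device take offloaded SAs? */
    int nr_sa;                      /* SAs currently installed on the device */
};

struct sa {
    struct real_dev *real_dev;      /* device holding the SA, or NULL */
};

struct bond_ipsec {                 /* one list node per SA added to the bond */
    struct sa *xs;
    struct bond_ipsec *next;
};

struct bond {
    struct real_dev *active;        /* current active real interface */
    struct bond_ipsec *ipsec_list;  /* guarded by ipsec_lock in the kernel */
};

/* Install one SA on the active device, or leave real_dev NULL. */
void sa_install(struct bond *b, struct sa *xs)
{
    if (b->active && b->active->supports_offload) {
        xs->real_dev = b->active;
        b->active->nr_sa++;
    } else {
        xs->real_dev = NULL;        /* step 3: no usable device */
    }
}

/* Record a new SA in the bond's list, then pass it to the active device. */
void bond_add_sa(struct bond *b, struct bond_ipsec *node, struct sa *xs)
{
    node->xs = xs;
    node->next = b->ipsec_list;
    b->ipsec_list = node;
    sa_install(b, xs);
}

/* The active interface changed: move every tracked SA to the new device. */
void bond_change_active(struct bond *b, struct real_dev *new_active)
{
    struct bond_ipsec *n;

    for (n = b->ipsec_list; n; n = n->next) {   /* step 1: delete from old */
        if (n->xs->real_dev) {
            n->xs->real_dev->nr_sa--;
            n->xs->real_dev = NULL;
        }
    }
    b->active = new_active;
    for (n = b->ipsec_list; n; n = n->next)     /* steps 2 and 3: add to new */
        sa_install(b, n->xs);
}
```

Keeping the list on the bond is what makes the handover possible: without it there is no record of which SAs have to be removed from the old device and re-added to the new one.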
  7. bonding: disallow setting nested bonding + ipsec offload

    A bonding interface can be nested, and it supports ipsec offload,
    so the nested bonding + ipsec scenario can currently be set up.
    But the code does not support this scenario,
    so it should be disallowed.
    
    interface graph:
    bond2
       |
    bond1
       |
    eth0
    
    Nested bonding + ipsec offload may not be a real use case,
    so disallowing this scenario is fine.
    
    Fixes: 18cb261 ("bonding: support hardware encryption offload to slaves")
    Signed-off-by: Taehee Yoo <ap420073@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    TaeheeYoo authored and davem330 committed Jul 6, 2021
  8. bonding: fix suspicious RCU usage in bond_ipsec_del_sa()

    bond_ipsec_del_sa() uses rcu_dereference() to dereference
    bond->curr_active_slave.
    But neither it nor its caller holds the RCU read lock, so a warning occurs.
    So add rcu_read_lock().
    
    Test commands:
        ip netns add A
        ip netns exec A bash
        modprobe netdevsim
        echo "1 1" > /sys/bus/netdevsim/new_device
        ip link add bond0 type bond
        ip link set eth0 master bond0
        ip link set eth0 up
        ip link set bond0 up
        ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi 0x07 mode \
    transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
    0x44434241343332312423222114131211f4f3f2f1 128 sel src 14.0.0.52/24 \
    dst 14.0.0.70/24 proto tcp offload dev bond0 dir in
        ip x s f
    
    Splat looks like:
    =============================
    WARNING: suspicious RCU usage
    5.13.0-rc3+ #1168 Not tainted
    -----------------------------
    drivers/net/bonding/bond_main.c:448 suspicious rcu_dereference_check()
    usage!
    
    other info that might help us debug this:
    
    rcu_scheduler_active = 2, debug_locks = 1
    2 locks held by ip/705:
     #0: ffff888106701780 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{3:3},
    at: xfrm_netlink_rcv+0x59/0x80 [xfrm_user]
     #1: ffff8880075b0098 (&x->lock){+.-.}-{2:2},
    at: xfrm_state_delete+0x16/0x30
    
    stack backtrace:
    CPU: 6 PID: 705 Comm: ip Not tainted 5.13.0-rc3+ #1168
    Call Trace:
     dump_stack+0xa4/0xe5
     bond_ipsec_del_sa+0x16a/0x1c0 [bonding]
     __xfrm_state_delete+0x51f/0x730
     xfrm_state_delete+0x1e/0x30
     xfrm_state_flush+0x22f/0x390
     xfrm_flush_sa+0xd8/0x260 [xfrm_user]
     ? xfrm_flush_policy+0x290/0x290 [xfrm_user]
     xfrm_user_rcv_msg+0x331/0x660 [xfrm_user]
     ? rcu_read_lock_sched_held+0x91/0xc0
     ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
     ? find_held_lock+0x3a/0x1c0
     ? mutex_lock_io_nested+0x1210/0x1210
     ? sched_clock_cpu+0x18/0x170
     netlink_rcv_skb+0x121/0x350
    [ ... ]
    
    Fixes: 18cb261 ("bonding: support hardware encryption offload to slaves")
    Signed-off-by: Taehee Yoo <ap420073@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    TaeheeYoo authored and davem330 committed Jul 6, 2021
  9. ixgbevf: use xso.real_dev instead of xso.dev in callback functions of struct xfrmdev_ops

    There are two pointers in struct xfrm_state_offload: *dev and *real_dev.
    These are used in callback functions of struct xfrmdev_ops.
    *dev points to either the bonding interface or the real interface.
    If bonding ipsec offload is used, it points to the bonding interface;
    if not, it points to the real interface.
    real_dev always points to the real interface.
    So, ixgbevf should always use real_dev instead of dev.
    Of course, real_dev is never null.
    
    Test commands:
        ip link add bond0 type bond
        #eth0 is ixgbevf interface
        ip link set eth0 master bond0
        ip link set bond0 up
        ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi 0x07 mode \
    transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
    0x44434241343332312423222114131211f4f3f2f1 128 sel src 14.0.0.52/24 \
    dst 14.0.0.70/24 proto tcp offload dev bond0 dir in
    
    Splat looks like:
    KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
    CPU: 6 PID: 688 Comm: ip Not tainted 5.13.0-rc3+ #1168
    RIP: 0010:ixgbevf_ipsec_find_empty_idx+0x28/0x1b0 [ixgbevf]
    Code: 00 00 0f 1f 44 00 00 55 53 48 89 fb 48 83 ec 08 40 84 f6 0f 84 9c
    00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02
    84 c0 74 08 3c 01 0f 8e 4c 01 00 00 66 81 3b 00 04 0f
    RSP: 0018:ffff8880089af390 EFLAGS: 00010246
    RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000001
    RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
    RBP: ffff8880089af4f8 R08: 0000000000000003 R09: fffffbfff4287e11
    R10: 0000000000000001 R11: ffff888005de8908 R12: 0000000000000000
    R13: ffff88810936a000 R14: ffff88810936a000 R15: ffff888004d78040
    FS:  00007fdf9883a680(0000) GS:ffff88811a400000(0000)
    knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000055bc14adbf40 CR3: 000000000b87c005 CR4: 00000000003706e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     ixgbevf_ipsec_add_sa+0x1bf/0x9c0 [ixgbevf]
     ? rcu_read_lock_sched_held+0x91/0xc0
     ? ixgbevf_ipsec_parse_proto_keys.isra.9+0x280/0x280 [ixgbevf]
     ? lock_acquire+0x191/0x720
     ? bond_ipsec_add_sa+0x48/0x350 [bonding]
     ? lockdep_hardirqs_on_prepare+0x3e0/0x3e0
     ? rcu_read_lock_held+0x91/0xa0
     ? rcu_read_lock_sched_held+0xc0/0xc0
     bond_ipsec_add_sa+0x193/0x350 [bonding]
     xfrm_dev_state_add+0x2a9/0x770
     ? memcpy+0x38/0x60
     xfrm_add_sa+0x2278/0x3b10 [xfrm_user]
     ? xfrm_get_policy+0xaa0/0xaa0 [xfrm_user]
     ? register_lock_class+0x1750/0x1750
     xfrm_user_rcv_msg+0x331/0x660 [xfrm_user]
     ? rcu_read_lock_sched_held+0x91/0xc0
     ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
     ? find_held_lock+0x3a/0x1c0
     ? mutex_lock_io_nested+0x1210/0x1210
     ? sched_clock_cpu+0x18/0x170
     netlink_rcv_skb+0x121/0x350
    [ ... ]
    
    Fixes: 272c233 ("xfrm: bail early on slave pass over skb")
    Signed-off-by: Taehee Yoo <ap420073@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    TaeheeYoo authored and davem330 committed Jul 6, 2021
  10. net: netdevsim: use xso.real_dev instead of xso.dev in callback functions of struct xfrmdev_ops

    There are two pointers in struct xfrm_state_offload: *dev and *real_dev.
    These are used in callback functions of struct xfrmdev_ops.
    *dev points to either the bonding interface or the real interface.
    If bonding ipsec offload is used, it points to the bonding interface;
    if not, it points to the real interface.
    real_dev always points to the real interface.
    So, netdevsim should always use real_dev instead of dev.
    Of course, real_dev is never null.
    
    Test commands:
        ip netns add A
        ip netns exec A bash
        modprobe netdevsim
        echo "1 1" > /sys/bus/netdevsim/new_device
        ip link add bond0 type bond mode active-backup
        ip link set eth0 master bond0
        ip link set eth0 up
        ip link set bond0 up
        ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi 0x07 mode \
    transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
    0x44434241343332312423222114131211f4f3f2f1 128 sel src 14.0.0.52/24 \
    dst 14.0.0.70/24 proto tcp offload dev bond0 dir in
    
    Splat looks like:
    BUG: spinlock bad magic on CPU#5, kworker/5:1/53
     lock: 0xffff8881068c2cc8, .magic: 11121314, .owner: <none>/-1,
    .owner_cpu: -235736076
    CPU: 5 PID: 53 Comm: kworker/5:1 Not tainted 5.13.0-rc3+ #1168
    Workqueue: events linkwatch_event
    Call Trace:
     dump_stack+0xa4/0xe5
     do_raw_spin_lock+0x20b/0x270
     ? rwlock_bug.part.1+0x90/0x90
     _raw_spin_lock_nested+0x5f/0x70
     bond_get_stats+0xe4/0x4c0 [bonding]
     ? rcu_read_lock_sched_held+0xc0/0xc0
     ? bond_neigh_init+0x2c0/0x2c0 [bonding]
     ? dev_get_alias+0xe2/0x190
     ? dev_get_port_parent_id+0x14a/0x360
     ? rtnl_unregister+0x190/0x190
     ? dev_get_phys_port_name+0xa0/0xa0
     ? memset+0x1f/0x40
     ? memcpy+0x38/0x60
     ? rtnl_phys_switch_id_fill+0x91/0x100
     dev_get_stats+0x8c/0x270
     rtnl_fill_stats+0x44/0xbe0
     ? nla_put+0xbe/0x140
     rtnl_fill_ifinfo+0x1054/0x3ad0
    [ ... ]
    
    Fixes: 272c233 ("xfrm: bail early on slave pass over skb")
    Signed-off-by: Taehee Yoo <ap420073@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    TaeheeYoo authored and davem330 committed Jul 6, 2021
  11. bonding: fix null dereference in bond_ipsec_add_sa()

    If a bond doesn't have a real device, bond->curr_active_slave is null.
    But bond_ipsec_add_sa() dereferences bond->curr_active_slave without
    a null check.
    So, a null-ptr-deref occurs.
    
    Test commands:
        ip link add bond0 type bond
        ip link set bond0 up
        ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi \
    0x07 mode transport reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
    0x44434241343332312423222114131211f4f3f2f1 128 sel src 14.0.0.52/24 \
    dst 14.0.0.70/24 proto tcp offload dev bond0 dir in
    
    Splat looks like:
    KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
    CPU: 4 PID: 680 Comm: ip Not tainted 5.13.0-rc3+ #1168
    RIP: 0010:bond_ipsec_add_sa+0xc4/0x2e0 [bonding]
    Code: 85 21 02 00 00 4d 8b a6 48 0c 00 00 e8 75 58 44 ce 85 c0 0f 85 14
    01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1 ea 03 <80> 3c 02
    00 0f 85 fc 01 00 00 48 8d bb e0 02 00 00 4d 8b 2c 24 48
    RSP: 0018:ffff88810946f508 EFLAGS: 00010246
    RAX: dffffc0000000000 RBX: ffff88810b4e8040 RCX: 0000000000000001
    RDX: 0000000000000000 RSI: ffffffff8fe34280 RDI: ffff888115abe100
    RBP: ffff88810946f528 R08: 0000000000000003 R09: fffffbfff2287e11
    R10: 0000000000000001 R11: ffff888115abe0c8 R12: 0000000000000000
    R13: ffffffffc0aea9a0 R14: ffff88800d7d2000 R15: ffff88810b4e8330
    FS:  00007efc5552e680(0000) GS:ffff888119c00000(0000)
    knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 000055c2530dbf40 CR3: 0000000103056004 CR4: 00000000003706e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     xfrm_dev_state_add+0x2a9/0x770
     ? memcpy+0x38/0x60
     xfrm_add_sa+0x2278/0x3b10 [xfrm_user]
     ? xfrm_get_policy+0xaa0/0xaa0 [xfrm_user]
     ? register_lock_class+0x1750/0x1750
     xfrm_user_rcv_msg+0x331/0x660 [xfrm_user]
     ? rcu_read_lock_sched_held+0x91/0xc0
     ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
     ? find_held_lock+0x3a/0x1c0
     ? mutex_lock_io_nested+0x1210/0x1210
     ? sched_clock_cpu+0x18/0x170
     netlink_rcv_skb+0x121/0x350
     ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
     ? netlink_ack+0x9d0/0x9d0
     ? netlink_deliver_tap+0x17c/0xa50
     xfrm_netlink_rcv+0x68/0x80 [xfrm_user]
     netlink_unicast+0x41c/0x610
     ? netlink_attachskb+0x710/0x710
     netlink_sendmsg+0x6b9/0xb70
    [ ... ]
    
    Fixes: 18cb261 ("bonding: support hardware encryption offload to slaves")
    Signed-off-by: Taehee Yoo <ap420073@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    TaeheeYoo authored and davem330 committed Jul 6, 2021
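
A minimal sketch of the guard this commit describes, assuming reduced hypothetical types (`struct slave_sketch`, `struct bond_sketch`) and an assumed error code of `-ENODEV`; the RCU handling that the real function also needs is not modelled here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, reduced types for illustration only. */
struct slave_sketch {
    int nr_sa;                              /* SAs installed on this device */
};

struct bond_sketch {
    struct slave_sketch *curr_active_slave; /* NULL when the bond has no device */
};

/* Add an SA to the active device, refusing when there is none. */
int add_sa_checked(struct bond_sketch *bond)
{
    struct slave_sketch *slave = bond->curr_active_slave;

    if (!slave)
        return -ENODEV;     /* assumed error code; avoids the null dereference */
    slave->nr_sa++;
    return 0;
}
```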
  12. bonding: fix suspicious RCU usage in bond_ipsec_add_sa()

    bond_ipsec_add_sa() uses rcu_dereference() to dereference
    bond->curr_active_slave.
    But neither it nor its caller holds the RCU read lock, so a warning occurs.
    So add rcu_read_lock().
    
    Test commands:
        ip link add dummy0 type dummy
        ip link add bond0 type bond
        ip link set dummy0 master bond0
        ip link set dummy0 up
        ip link set bond0 up
        ip x s add proto esp dst 14.1.1.1 src 15.1.1.1 spi 0x07 \
    	    mode transport \
    	    reqid 0x07 replay-window 32 aead 'rfc4106(gcm(aes))' \
    	    0x44434241343332312423222114131211f4f3f2f1 128 sel \
    	    src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp offload \
    	    dev bond0 dir in
    
    Splat looks like:
    =============================
    WARNING: suspicious RCU usage
    5.13.0-rc3+ #1168 Not tainted
    -----------------------------
    drivers/net/bonding/bond_main.c:411 suspicious rcu_dereference_check() usage!
    
    other info that might help us debug this:
    
    rcu_scheduler_active = 2, debug_locks = 1
    1 lock held by ip/684:
     #0: ffffffff9a2757c0 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{3:3},
    at: xfrm_netlink_rcv+0x59/0x80 [xfrm_user]
    
    stack backtrace:
    CPU: 0 PID: 684 Comm: ip Not tainted 5.13.0-rc3+ #1168
    Call Trace:
     dump_stack+0xa4/0xe5
     bond_ipsec_add_sa+0x18c/0x1f0 [bonding]
     xfrm_dev_state_add+0x2a9/0x770
     ? memcpy+0x38/0x60
     xfrm_add_sa+0x2278/0x3b10 [xfrm_user]
     ? xfrm_get_policy+0xaa0/0xaa0 [xfrm_user]
     ? register_lock_class+0x1750/0x1750
     xfrm_user_rcv_msg+0x331/0x660 [xfrm_user]
     ? rcu_read_lock_sched_held+0x91/0xc0
     ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
     ? find_held_lock+0x3a/0x1c0
     ? mutex_lock_io_nested+0x1210/0x1210
     ? sched_clock_cpu+0x18/0x170
     netlink_rcv_skb+0x121/0x350
     ? xfrm_user_state_lookup.constprop.39+0x320/0x320 [xfrm_user]
     ? netlink_ack+0x9d0/0x9d0
     ? netlink_deliver_tap+0x17c/0xa50
     xfrm_netlink_rcv+0x68/0x80 [xfrm_user]
     netlink_unicast+0x41c/0x610
     ? netlink_attachskb+0x710/0x710
     netlink_sendmsg+0x6b9/0xb70
    [ ... ]
    
    Fixes: 18cb261 ("bonding: support hardware encryption offload to slaves")
    Signed-off-by: Taehee Yoo <ap420073@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    TaeheeYoo authored and davem330 committed Jul 6, 2021
  13. tcp: fix tcp_init_transfer() to not reset icsk_ca_initialized

    This commit fixes a bug (found by syzkaller) that could cause spurious
    double-initializations for congestion control modules, which could cause
    memory leaks or other problems for congestion control modules (like CDG)
    that allocate memory in their init functions.
    
    The buggy scenario constructed by syzkaller was something like:
    
    (1) create a TCP socket
    (2) initiate a TFO connect via sendto()
    (3) while socket is in TCP_SYN_SENT, call setsockopt(TCP_CONGESTION),
        which calls:
           tcp_set_congestion_control() ->
             tcp_reinit_congestion_control() ->
               tcp_init_congestion_control()
    (4) receive ACK, connection is established, call tcp_init_transfer(),
        set icsk_ca_initialized=0 (without first calling cc->release()),
        call tcp_init_congestion_control() again.
    
    Note that in this sequence tcp_init_congestion_control() is called
    twice without a cc->release() call in between. Thus, for CC modules
    that allocate memory in their init() function, e.g., CDG, a memory leak
    may occur. The syzkaller tool managed to find a reproducer that
    triggered such a leak in CDG.
    
    The bug was introduced when commit 8919a9b ("tcp: Only init
    congestion control if not initialized already")
    introduced icsk_ca_initialized and set icsk_ca_initialized to 0 in
    tcp_init_transfer(), missing the possibility for a sequence like the
    one above, where a process could call setsockopt(TCP_CONGESTION) in
    state TCP_SYN_SENT (i.e. after the connect() or TFO open sendmsg()),
    which would call tcp_init_congestion_control(). It did not intend to
    reset any initialization that the user had already explicitly made;
    it just missed the possibility of that particular sequence (which
    syzkaller managed to find).
    
    Fixes: 8919a9b ("tcp: Only init congestion control if not initialized already")
    Reported-by: syzbot+f1e24a0594d4e3a895d3@syzkaller.appspotmail.com
    Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
    Acked-by: Neal Cardwell <ncardwell@google.com>
    Tested-by: Neal Cardwell <ncardwell@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    ita93 authored and davem330 committed Jul 6, 2021
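
The four-step scenario in the commit above can be modelled with a counter instead of real allocations. This is a userspace sketch, not kernel code: `struct sock_sketch`, `cc_init()`, `cc_release()` and the `reset_flag` parameter are hypothetical, and `outstanding_allocs` stands in for memory a module such as CDG would allocate in its init function.

```c
#include <assert.h>

/* Hypothetical socket state; ca_initialized plays the role of
 * icsk_ca_initialized from the commit message. */
struct sock_sketch {
    int ca_initialized;
};

int outstanding_allocs;     /* models memory owned by the CC module */

void cc_init(struct sock_sketch *sk)
{
    (void)sk;
    outstanding_allocs++;   /* each init "allocates" private state */
}

void cc_release(struct sock_sketch *sk)
{
    (void)sk;
    outstanding_allocs--;   /* release frees one allocation */
}

void init_congestion_control(struct sock_sketch *sk)
{
    cc_init(sk);
    sk->ca_initialized = 1;
}

/* reset_flag != 0 models the old behaviour of clearing the flag first. */
void init_transfer(struct sock_sketch *sk, int reset_flag)
{
    if (reset_flag)
        sk->ca_initialized = 0;
    if (!sk->ca_initialized)
        init_congestion_control(sk);
}

/* Runs steps (3) and (4), then one release; returns what is still allocated. */
int leaked_after_scenario(int reset_flag)
{
    struct sock_sketch sk = { 0 };

    outstanding_allocs = 0;
    init_congestion_control(&sk);       /* step 3: setsockopt(TCP_CONGESTION) */
    init_transfer(&sk, reset_flag);     /* step 4: connection established */
    cc_release(&sk);                    /* teardown releases only once */
    return outstanding_allocs;
}
```

With the reset in place, init runs twice but release runs once, so one allocation is left over; without the reset, the second init is skipped and nothing is left.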
  14. skbuff: Release nfct refcount on napi stolen or re-used skbs

    When multiple SKBs are merged to a new skb under napi GRO,
    or SKB is re-used by napi, if nfct was set for them in the
    driver, it will not be released while freeing their stolen
    head state or on re-use.
    
    Release nfct on napi's stolen or re-used SKBs, and
    in gro_list_prepare, check conntrack metadata diff.
    
    Fixes: 5c6b946 ("net/mlx5e: CT: Handle misses after executing CT action")
    Reviewed-by: Roi Dayan <roid@nvidia.com>
    Signed-off-by: Paul Blakey <paulb@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Paul Blakey authored and davem330 committed Jul 6, 2021
  15. netfilter: nft_last: incorrect arithmetics when restoring last used

    Subtract the jiffies that have elapsed from the current jiffies to fix
    last used restoration.
    
    Fixes: 836382d ("netfilter: nf_tables: add last expression")
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    ummakynes committed Jul 6, 2021
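
The arithmetic can be checked with a small sketch. It assumes the dump side reports the time elapsed since last use and ignores the conversion between jiffies and milliseconds; `dump_elapsed()` and `restore_last_used()` are hypothetical helper names, not functions from nft_last.

```c
#include <assert.h>

/* Elapsed time since last use, as it would be reported when dumping. */
unsigned long dump_elapsed(unsigned long now, unsigned long last_used)
{
    return now - last_used;
}

/* Rebuild the "last used" timestamp on restore: subtract, do not add. */
unsigned long restore_last_used(unsigned long now, unsigned long elapsed)
{
    return now - elapsed;
}
```

Restoring with subtraction keeps the reported elapsed time stable across a dump and restore cycle. Unsigned arithmetic wraps modulo the word size, so the round trip also holds when the counter has wrapped.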
  16. netfilter: nft_last: honor NFTA_LAST_SET on restoration

    NFTA_LAST_SET tells us if this expression has ever seen a packet, do not
    ignore this attribute when restoring the ruleset.
    
    Fixes: 836382d ("netfilter: nf_tables: add last expression")
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    ummakynes committed Jul 6, 2021
  17. netfilter: conntrack: Mark access for KCSAN

    KCSAN detected a data race with ipc/sem.c that is intentional.
    
    nf_conntrack_lock() uses the same algorithm, so update
    nf_conntrack_core as well:
    
    nf_conntrack_lock() contains
      a1) spin_lock()
      a2) smp_load_acquire(nf_conntrack_locks_all).
    
    a1) actually accesses one lock from an array of locks.
    
    nf_conntrack_locks_all() contains
      b1) nf_conntrack_locks_all=true (normal write)
      b2) spin_lock()
      b3) spin_unlock()
    
    b2 and b3 are done for every lock.
    
    This guarantees that nf_conntrack_locks_all() prevents any
    concurrent nf_conntrack_lock() owners:
    If a thread is past a1), then b2) will block until that thread releases
    the lock.
    If the thread is before a1), then b3)+a1) ensure the write b1) is
    visible, thus a2) is guaranteed to see the updated value.
    
    But: This is only the latest time when b1) becomes visible.
    It may also happen that b1) is visible an undefined amount of time
    before the b3). And thus KCSAN will notice a data race.
    
    In addition, the compiler might be too clever.
    
    Solution: Use WRITE_ONCE().
    
    Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    manfred-colorfu authored and ummakynes committed Jul 6, 2021
  18. netfilter: conntrack: add new sysctl to disable RST check

    This patch adds a new sysctl, tcp_ignore_invalid_rst, to disable marking
    out-of-segment RSTs as INVALID.
    
    Signed-off-by: Ali Abdallah <aabdallah@suse.de>
    Acked-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Alix82 authored and ummakynes committed Jul 6, 2021
  19. netfilter: conntrack: improve RST handling when tuple is re-used

    If we receive a SYN packet in original direction on an existing
    connection tracking entry, we let this SYN through because conntrack
    might be out-of-sync.
    
    Conntrack gets back in sync when server responds with SYN/ACK and state
    gets updated accordingly.
    
    However, if server replies with RST, this packet might be marked as
    INVALID because td_maxack value reflects the *old* conntrack state
    and not the state of the originator of the RST.
    
    Avoid td_maxack-based checks if previous packet was a SYN.
    
    Unfortunately that is not enough: an out-of-order ACK in the original
    direction updates last_index, so we still end up marking a valid RST as
    INVALID.
    
    Thus disable the sequence check when we are not in established state and
    the received RST has a sequence of 0.
    
    Because marking RSTs as invalid usually leads to unwanted timeouts,
    also skip RST sequence checks if a conntrack entry is already closing.
    
    Such entries can already be evicted via GC in case the table is full.
    
    Co-developed-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Florian Westphal <fw@strlen.de>
    Signed-off-by: Ali Abdallah <aabdallah@suse.de>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Ali Abdallah authored and ummakynes committed Jul 6, 2021

Commits on Jul 5, 2021

  1. Merge branch 'stmmac-ptp'

    Xiaoliang Yang says:
    
    ====================
    net: stmmac: re-configure tas basetime after ptp time adjust
    
    If the DWMAC Ethernet device has already set the Qbv EST configuration
    before ptp adjusts the time, the Qbv base time may end up in the past
    relative to the new current time. This is not allowed by hardware.
    
    This patch calculates and re-configures the Qbv basetime after ptp time
    adjustment.
    
    v1->v2:
      Update est mutex lock to protect btr/ctr r/w to be atomic.
      Add btr_reserve to store the basetime from qopt and use it as the
    original base time in Qbv re-configuration.
    ====================
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    davem330 committed Jul 5, 2021
  2. net: stmmac: ptp: update tas basetime after ptp adjust

    After adjusting the ptp time, the Qbv base time may be in the past
    relative to the new current time. The dwmac5 hardware does not allow
    the base time to be set to a past time. This patch adds a btr_reserve
    to store the base time taken from qopt, then recalculates the base
    time and resets the Qbv configuration after the ptp time adjustment.
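
One common way to move a stale base time forward is to advance it by whole cycle times until it lies after the current time. The sketch below shows that arithmetic under stated assumptions: the function name, the nanosecond units and the plain `uint64_t` types are hypothetical, and the commit message does not say that the driver computes it exactly this way.

```c
#include <stdint.h>

/*
 * Hypothetical helper: if base_ns is already at or after now_ns, keep it.
 * Otherwise return the first instant on the grid base_ns + n * cycle_ns
 * that lies strictly after now_ns. All values are in nanoseconds and
 * cycle_ns must be non-zero.
 */
uint64_t advance_base_time(uint64_t base_ns, uint64_t now_ns,
			   uint64_t cycle_ns)
{
	uint64_t elapsed, cycles;

	if (base_ns >= now_ns)
		return base_ns;	/* not in the past: nothing to do */

	elapsed = now_ns - base_ns;
	cycles = elapsed / cycle_ns + 1;	/* whole cycles passed, plus one */
	return base_ns + cycles * cycle_ns;
}
```

Stepping by whole cycles keeps the gate schedule phase-aligned with the originally requested base time, which is why the original value has to be remembered separately from the value programmed into the hardware.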
    
    Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Xiaoliang Yang authored and davem330 committed Jul 5, 2021
  3. net: stmmac: add mutex lock to protect est parameters

    Add a mutex lock to protect est structure parameters so that the
    EST parameters can be updated by other threads.
    
    Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Xiaoliang Yang authored and davem330 committed Jul 5, 2021
  4. net: stmmac: separate the tas basetime calculation function

    Separate the TAS basetime calculation function so that it can be
    called by other functions.
    
    Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Xiaoliang Yang authored and davem330 committed Jul 5, 2021
  5. ptp: fix format string mismatch in ptp_sysfs.c

    Fix format string mismatch in ptp_sysfs.c. Use %u for unsigned int.
    
    Fixes: 73f3706 ("ptp: support ptp physical/virtual clocks conversion")
    Reported-by: kernel test robot <lkp@intel.com>
    Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    yangbolu1991 authored and davem330 committed Jul 5, 2021
  6. ptp: fix NULL pointer dereference in ptp_clock_register

    Fix NULL pointer dereference in ptp_clock_register. The argument
    "parent" of ptp_clock_register may be NULL pointer.
    
    Fixes: 73f3706 ("ptp: support ptp physical/virtual clocks conversion")
    Reported-by: kernel test robot <oliver.sang@intel.com>
    Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    yangbolu1991 authored and davem330 committed Jul 5, 2021

Commits on Jul 3, 2021

  1. net: marvell: always set skb_shared_info in mvneta_swbm_add_rx_fragment

    Always set the skb_shared_info data structure in the
    mvneta_swbm_add_rx_fragment routine, even if the fragment contains
    only the ethernet FCS.
    
    Fixes: 039fbc4 ("net: mvneta: alloc skb_shared_info on the mvneta_rx_swbm stack")
    Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    LorenzoBianconi authored and davem330 committed Jul 3, 2021

Commits on Jul 2, 2021

  1. udp: properly flush normal packet at GRO time

    If a UDP packet enters the GRO engine but is not eligible
    for aggregation and is not targeting a UDP tunnel,
    udp_gro_receive() will not set the flush bit, and the packet
    could be delayed until the next napi flush.
    
    Fix the issue by ensuring that non-GROed packets traverse
    skb_gro_flush_final().
    
    Reported-and-tested-by: Matthias Treydte <mt@waldheinz.de>
    Fixes: 18f25dc ("udp: skip L4 aggregation for UDP tunnel packets")
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Paolo Abeni authored and davem330 committed Jul 2, 2021
  2. vmxnet3: fix cksum offload issues for tunnels with non-default udp ports

    Commit dacce2b ("vmxnet3: add geneve and vxlan tunnel offload
    support") added support for encapsulation offload. However, the inner
    offload capability is to be restricted to UDP tunnels with default
    Vxlan and Geneve ports.
    
    This patch fixes the issue for tunnels with non-default ports using
    features check capability and filtering appropriate features for such
    tunnels.
    
    Fixes: dacce2b ("vmxnet3: add geneve and vxlan tunnel offload support")
    Signed-off-by: Ronak Doshi <doshir@vmware.com>
    Acked-by: Guolin Yang <gyang@vmware.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Ronak Doshi authored and davem330 committed Jul 2, 2021