Commits on Feb 8, 2022

  1. octeontx2-pf: PFC config support with DCBx

    Data center bridging (DCB) is designed to eliminate packet loss due to
    queue overflow by adding enhancements to Ethernet networks, such as
    priority flow control. This patch adds support for management
    of Priority Flow Control (PFC) on Octeontx2 and CN10K interfaces.
    
    To enable PFC for all priorities
    	dcb pfc set dev eth0 prio-pfc all:on/off
    
    To enable PFC on selected priorities
    	dcb pfc set dev eth0 prio-pfc 0:on/off 1:on/off ..7:on/off
    
    With the ntuple commands, the user can map a priority to receive queues.
    On queue overflow, NIX will assert backpressure such that PFC pause frames
    are generated with the mapped priority.
    
    To map priority 7 to Queue 1
    ethtool -U eth0 flow-type ether dst xx:xx:xx:xx:xx:xx vlan 0xe00a
    m 0x1fff  queue 1
    
    Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
    Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
    Hariprasad Kelam authored and intel-lab-lkp committed Feb 8, 2022
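
The `vlan 0xe00a m 0x1fff` arguments in the ethtool example above can be read as an 802.1Q Tag Control Information (TCI) value plus a mask. The following is a minimal sketch, assuming the standard 802.1Q layout (3-bit priority, 1-bit DEI, 12-bit VLAN ID) and assuming that bits set in the ethtool mask are ignored by the match. The helper names are hypothetical and are not part of the driver.

```python
PCP_SHIFT = 13     # priority code point: top 3 bits of the 16-bit TCI
DEI_SHIFT = 12     # drop eligible indicator: next bit
VID_MASK = 0x0FFF  # VLAN ID: low 12 bits


def vlan_tci(priority, vlan_id, dei=0):
    """Build a 16-bit 802.1Q TCI from priority, DEI and VLAN ID."""
    if not 0 <= priority <= 7:
        raise ValueError("priority must be in 0..7")
    if dei not in (0, 1):
        raise ValueError("dei must be 0 or 1")
    if not 0 <= vlan_id <= VID_MASK:
        raise ValueError("vlan_id must fit in 12 bits")
    return (priority << PCP_SHIFT) | (dei << DEI_SHIFT) | vlan_id


def priority_of(tci):
    """Extract the 3-bit priority from a TCI."""
    return (tci >> PCP_SHIFT) & 0x7


# Priority 7 with VLAN ID 0x00a, as in the commit message example.
tci = vlan_tci(priority=7, vlan_id=0x00A)
# Assumed mask semantics: 0x1fff covers DEI and VLAN ID, so only
# the three priority bits take part in the match.
compared_bits = ~0x1FFF & 0xFFFF
print(hex(tci), hex(compared_bits))
```

Under these assumptions, only the priority bits of the tag decide which receive queue a frame lands on.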
  2. octeontx2-af: Flow control resource management

    CN10K MAC block (RPM) and Octeontx2 MAC block (CGX) both support
    PFC flow control and 802.3X flow control pause frames.
    
    Each MAC block supports a maximum of 4 LMACs, and the AF driver assigns the
    same (MAC,LMAC) to a PF and its VFs. As a PF and its VFs share the same
    (MAC,LMAC) pair, we need resource management to address the scenarios below:
    
    1. Keep PFC and 802.3X pause frames mutually exclusive.
    2. Reject a disable flow control request if another PF or VF
       enabled it.
    
    Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
    Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
    Hariprasad Kelam authored and intel-lab-lkp committed Feb 8, 2022
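
The two resource-management rules listed in this commit message can be sketched as simple per-(MAC, LMAC) bookkeeping. This is only an illustration under assumptions: the class name, the user labels and the return conventions are hypothetical, and the real AF driver may track state differently.

```python
class LmacFlowControl:
    """Tracks which PF/VF users enabled which pause mode on one (MAC, LMAC)."""

    MODES = ("802.3x", "pfc")

    def __init__(self):
        self.users = {mode: set() for mode in self.MODES}

    def enable(self, user, mode):
        """Return True if the request is accepted."""
        if mode not in self.MODES:
            raise ValueError("unknown flow control mode")
        other = "pfc" if mode == "802.3x" else "802.3x"
        # Rule 1: the two pause mechanisms are mutually exclusive.
        if self.users[other]:
            return False
        self.users[mode].add(user)
        return True

    def disable(self, user, mode):
        """Return True only if the mode can really be switched off."""
        if mode not in self.MODES:
            raise ValueError("unknown flow control mode")
        self.users[mode].discard(user)
        # Rule 2: keep the mode on while another PF or VF still uses it.
        return not self.users[mode]


fc = LmacFlowControl()
fc.enable("pf0", "pfc")
fc.enable("vf1", "pfc")
print(fc.enable("vf2", "802.3x"))  # conflicting mode while PFC is in use
print(fc.disable("pf0", "pfc"))    # vf1 still has PFC enabled
```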
  3. octeontx2-af: Priority flow control configuration support

    The priority based flow control (802.1Qbb) mechanism is similar to
    Ethernet pause frames (802.3x), but instead of pausing all traffic on a link,
    PFC allows the user to selectively pause traffic according to its class.
    
    Octeontx2 MAC block (CGX) and CN10K MAC block (RPM) both support
    PFC. As the upper layer mbox handler is the same for both MACs, this
    patch configures PFC by calling the appropriate callbacks.
    
    Signed-off-by: Sunil Kumar Kori <skori@marvell.com>
    Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
    Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
    SunilKumarKori authored and intel-lab-lkp committed Feb 8, 2022
  4. octeontx2-af: Don't enable Pause frames by default

    The current implementation enables 802.3x pause frames
    by default. As CGX and RPM blocks also support PFC
    (priority flow control), instead of the driver enabling one
    of them, enable them upon request from a PF or its VFs.
    Also add support to disable pause frames on driver unbind.
    
    Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
    Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
    Hariprasad Kelam authored and intel-lab-lkp committed Feb 8, 2022
  5. Merge branch 'inet-separate-dscp-from-ecn-bits-using-new-dscp_t-type'

    Guillaume Nault says:
    
    ====================
    inet: Separate DSCP from ECN bits using new dscp_t type
    
    The networking stack currently doesn't clearly distinguish between DSCP
    and ECN bits. The entire DSCP+ECN bits are stored in u8 variables (or
    structure fields), and each part of the stack handles them in their own
    way, using different macros. This has created several bugs in the past
    and some uncommon code paths are still unfixed.
    
    Such bugs generally manifest by selecting invalid routes because of ECN
    bits interfering with FIB routes and rules lookups (more details in the
    LPC 2021 talk[1] and in the RFC of this series[2]).
    
    This patch series aims at preventing the introduction of such bugs (and
    detecting existing ones), by introducing a dscp_t type, representing
    "sanitised" DSCP values (that is, with no ECN information), as opposed
    to plain u8 values that contain both DSCP and ECN information. dscp_t
    makes it clear for the reader what we're working on, and Sparse can
    flag invalid interactions between dscp_t and plain u8.
    
    This series converts only a few variables and structures:
    
      * Patch 1 converts the tclass field of struct fib6_rule. It
        effectively forbids the use of ECN bits in the tos/dsfield option
        of ip -6 rule. Rules now match packets solely based on their DSCP
        bits, so ECN doesn't influence the result any more. This contrasts
        with the previous behaviour where all 8 bits of the Traffic Class
        field were used. It is believed that this change is acceptable as
        matching ECN bits wasn't usable for IPv4, so only IPv6-only
        deployments could be depending on it. Also the previous behaviour
        made DSCP-based ip6-rules fail for packets with both a DSCP and an
        ECN mark, which is another reason why any such deployment is unlikely.
    
      * Patch 2 converts the tos field of struct fib4_rule. This one too
        effectively forbids defining ECN bits, this time in ip -4 rule.
        Before that, setting ECN bit 1 was accepted, while ECN bit 0 was
        rejected. But even when accepted, the rule would never match, as
        the packets would have their ECN bits cleared before doing the
        rule lookup.
    
      * Patch 3 converts the fc_tos field of struct fib_config. This is
        equivalent to patch 2, but for IPv4 routes. Routes using a
        tos/dsfield option with any ECN bit set are now rejected. Before
        this patch, they were accepted but, as with ip4 rules, these routes
        couldn't match any packet, since their ECN bits are cleared before
        the lookup.
    
      * Patch 4 converts the fa_tos field of struct fib_alias. This one is
        pure internal u8 to dscp_t conversion. While patches 1-3 had user
        facing consequences, this patch shouldn't have any side effect and
        is there to give an overview of what future conversion patches will
        look like. Conversions are quite mechanical, but imply some code
        churn, which is the price for the extra clarity and possibility of
        type checking.
    
    To summarise, all the behaviour changes required for the dscp_t type
    approach to work should be contained in patches 1-3. These changes are
    edge cases of ip-route and ip-rule that don't currently work properly.
    So they should be safe. Also, a kernel selftest is added for each of
    them.
    
    Finally, this work also paves the way for allowing the usage of the 3
    high order DSCP bits in IPv4 (a few call paths already handle them, but
    in general the stack clears them before IPv4 rule and route lookups).
    
    References:
      [1] LPC 2021 talk:
            - https://linuxplumbersconf.org/event/11/contributions/943/
            - Direct link to slide deck:
                https://linuxplumbersconf.org/event/11/contributions/943/attachments/901/1780/inet_tos_lpc2021.pdf
      [2] RFC version of this series:
          - https://lore.kernel.org/netdev/cover.1638814614.git.gnault@redhat.com/
    ====================
    
    Link: https://lore.kernel.org/r/cover.1643981839.git.gnault@redhat.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Jakub Kicinski committed Feb 8, 2022
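
The idea behind dscp_t in the cover letter above can be illustrated with a small wrapper type. This is a sketch under assumptions, not the kernel implementation: it assumes the DSCP occupies the upper six bits of the TOS / Traffic Class byte and the ECN the lower two, and the names `Dscp` and `validate_rule_tos` are hypothetical.

```python
DSCP_MASK = 0xFC  # upper six bits of the TOS / Traffic Class byte
ECN_MASK = 0x03   # lower two bits


class Dscp:
    """Holds a DSCP value with the ECN bits always cleared."""

    __slots__ = ("value",)

    def __init__(self, dsfield):
        self.value = dsfield & DSCP_MASK

    def __eq__(self, other):
        # Loosely mimics a type checker flagging a mix of the sanitised
        # type with a plain integer that may still carry ECN bits.
        if not isinstance(other, Dscp):
            raise TypeError("compare Dscp only with Dscp")
        return self.value == other.value

    def __hash__(self):
        return hash(self.value)


def validate_rule_tos(tos):
    """Reject user-supplied values with ECN bits set, as patches 1-3 describe."""
    if tos & ECN_MASK:
        raise ValueError("ECN bits must be cleared")
    return Dscp(tos)


rule = validate_rule_tos(0x28)
# A packet with the same DSCP but an ECN mark still matches the rule.
print(rule == Dscp(0x29))
```

With a plain 8-bit comparison, the same rule would have needed one entry per possible ECN value to match every packet of that DSCP.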
  6. ipv4: Use dscp_t in struct fib_alias

    Use the new dscp_t type to replace the fa_tos field of fib_alias. This
    ensures ECN bits are ignored and makes the field compatible with the
    fc_dscp field of struct fib_config.
    
    Converting old *tos variables and fields to dscp_t allows sparse to
    flag incorrect uses of DSCP and ECN bits. This patch is entirely about
    type annotation and shouldn't change any existing behaviour.
    
    Signed-off-by: Guillaume Nault <gnault@redhat.com>
    Acked-by: David Ahern <dsahern@kernel.org>
    Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Guillaume Nault authored and Jakub Kicinski committed Feb 8, 2022
  7. ipv4: Reject routes specifying ECN bits in rtm_tos

    Use the new dscp_t type to replace the fc_tos field of fib_config, to
    ensure IPv4 routes aren't influenced by ECN bits when configured with
    non-zero rtm_tos.
    
    Before this patch, IPv4 routes specifying an rtm_tos with some of the
    ECN bits set were accepted. However they wouldn't work (never match) as
    IPv4 normally clears the ECN bits with IPTOS_RT_MASK before doing a FIB
    lookup (although a few buggy code paths don't).
    
    After this patch, IPv4 routes specifying an rtm_tos with any ECN bit
    set are rejected.
    
    Note: IPv6 routes ignore rtm_tos altogether, any rtm_tos is accepted,
    but treated as if it were 0.
    
    Signed-off-by: Guillaume Nault <gnault@redhat.com>
    Acked-by: David Ahern <dsahern@kernel.org>
    Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Guillaume Nault authored and Jakub Kicinski committed Feb 8, 2022
  8. ipv4: Stop taking ECN bits into account in fib4-rules

    Use the new dscp_t type to replace the tos field of struct fib4_rule,
    so that fib4-rules consistently ignore ECN bits.
    
    Before this patch, fib4-rules did accept rules with the high order ECN
    bit set (but not the low order one). Also, it relied on its callers
    masking the ECN bits of ->flowi4_tos to prevent those from influencing
    the result. This was brittle and a few call paths still do the lookup
    without masking the ECN bits first.
    
    After this patch fib4-rules only compare the DSCP bits. ECN can't
    influence the result anymore, even if the caller didn't mask these
    bits. Also, fib4-rules now must have both ECN bits cleared or they will
    be rejected.
    
    Signed-off-by: Guillaume Nault <gnault@redhat.com>
    Acked-by: David Ahern <dsahern@kernel.org>
    Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Guillaume Nault authored and Jakub Kicinski committed Feb 8, 2022
  9. ipv6: Define dscp_t and stop taking ECN bits into account in fib6-rules

    Define a dscp_t type and its appropriate helpers that ensure ECN bits
    are not taken into account when handling DSCP.
    
    Use this new type to replace the tclass field of struct fib6_rule, so
    that fib6-rules don't get influenced by ECN bits anymore.
    
    Before this patch, fib6-rules didn't make any distinction between the
    DSCP and ECN bits. Therefore, rules specifying a DSCP (tos or dsfield
    options in iproute2) stopped working as soon as a packet had at least one
    of its ECN bits set (as a workaround one could create four rules for
    each DSCP value to match, one for each possible ECN value).
    
    After this patch fib6-rules only compare the DSCP bits. ECN doesn't
    influence the result anymore. Also, fib6-rules now must have the ECN
    bits cleared or they will be rejected.
    
    Signed-off-by: Guillaume Nault <gnault@redhat.com>
    Acked-by: David Ahern <dsahern@kernel.org>
    Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Guillaume Nault authored and Jakub Kicinski committed Feb 8, 2022
  10. net: stmmac: optimize locking around PTP clock reads

    Reading the PTP clock is a simple operation requiring only 3 register
    reads. Under a PREEMPT_RT kernel, protecting those reads by a spin_lock is
    counter-productive: if the 2nd task preempting the 1st has a higher prio
    but needs to read time as well, it will require 2 context switches, which
    will pretty much always be more costly than just disabling preemption for
    the duration of the reads. Moreover, with the code logic recently added
    to get_systime(), disabling preemption is not even required anymore:
    reads and writes just need to be protected from each other, to prevent a
    clock read while the clock is being updated.
    
    Improve the above situation by replacing the PTP spinlock with an rwlock, and
    using read_lock for PTP clock reads so simultaneous reads do not block
    each other.
    
    Signed-off-by: Yannick Vignon <yannick.vignon@nxp.com>
    Link: https://lore.kernel.org/r/20220204135545.2770625-1-yannick.vignon@oss.nxp.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Yackou authored and Jakub Kicinski committed Feb 8, 2022
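
The locking pattern described in this commit, where concurrent clock reads do not block each other but are excluded while the clock is being updated, can be sketched in user space. This is only an analogy built on `threading.Condition`: the `RWLock` and `PtpClock` classes are hypothetical and say nothing about kernel preemption or the actual stmmac code.

```python
import threading


class RWLock:
    """Minimal readers-writer lock: many readers or one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


class PtpClock:
    """Clock whose reads share the lock and whose updates take it exclusively."""

    def __init__(self):
        self._lock = RWLock()
        self._sec = 0
        self._nsec = 0

    def get_time(self):
        self._lock.acquire_read()
        try:
            return self._sec, self._nsec
        finally:
            self._lock.release_read()

    def set_time(self, sec, nsec):
        self._lock.acquire_write()
        try:
            self._sec, self._nsec = sec, nsec
        finally:
            self._lock.release_write()
```

A reader never observes a half-updated (seconds, nanoseconds) pair, while any number of readers may hold the lock at the same time.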
  11. net: typhoon: include <net/vxlan.h>

    We need this to get vxlan_features_check() definition.
    
    Fixes: d2692ee ("net: typhoon: implement ndo_features_check method")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Link: https://lore.kernel.org/r/20220208003502.1799728-1-eric.dumazet@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    neebe000 authored and Jakub Kicinski committed Feb 8, 2022

Commits on Feb 7, 2022

  1. net: dsa: mv88e6xxx: Unlock on error in mv88e6xxx_port_bridge_join()

    Call mv88e6xxx_reg_unlock(chip) before returning on this error path.
    
    Fixes: 7af4a36 ("net: dsa: mv88e6xxx: Improve isolation of standalone ports")
    Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    error27 authored and davem330 committed Feb 7, 2022
  2. net: dsa: mv88e6xxx: Fix off by one in mv88e6185_phylink_get_caps()

    The <= ARRAY_SIZE() needs to be < ARRAY_SIZE() to prevent an out of
    bounds error.
    
    Fixes: d4ebf12 ("net: dsa: mv88e6xxx: populate supported_interfaces and mac_capabilities")
    Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
    Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    error27 authored and davem330 committed Feb 7, 2022
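
The difference between the `<=` and `<` bounds checks in this fix can be shown with a short sketch. The table contents and function names are hypothetical. In Python the bad index raises an exception, whereas in C it would read past the end of the array.

```python
def lookup_buggy(table, index):
    # Mirrors `index <= ARRAY_SIZE(table)`: index == len(table) slips through.
    if 0 <= index <= len(table):
        return table[index]
    return None


def lookup_fixed(table, index):
    # Mirrors `index < ARRAY_SIZE(table)`: only valid slots are accessed.
    if 0 <= index < len(table):
        return table[index]
    return None


modes = ["mode-a", "mode-b", "mode-c"]
print(lookup_fixed(modes, 2))
print(lookup_fixed(modes, 3))
```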
  3. net: hns3: add support for TX push mode

    For the device that supports the TX push capability, the BD can
    be directly copied to the device memory. However, due to hardware
    restrictions, the push mode can be used only when there are no
    more than two BDs, otherwise, the doorbell mode based on device
    memory is used.
    
    Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
    Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Yufeng Mo authored and davem330 committed Feb 7, 2022
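
The mode selection described in this commit message amounts to a small decision rule. A minimal sketch, assuming only what the message states (push needs device support and at most two BDs); the function and constant names are hypothetical.

```python
MAX_PUSH_BDS = 2  # limit taken from the commit message


def select_tx_mode(device_supports_push, bd_count):
    """Pick how a packet's buffer descriptors (BDs) reach the device."""
    if device_supports_push and bd_count <= MAX_PUSH_BDS:
        return "push"      # BDs are copied directly to device memory
    return "doorbell"      # fall back to ringing the doorbell


print(select_tx_mode(True, 2))
print(select_tx_mode(True, 3))
```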
  4. net: asix: add proper error handling of usb read errors

    Syzbot once again hit an uninit value in the asix driver. The problem is
    still the same: asix_read_cmd() reads fewer bytes than the caller requested.
    
    Since all read requests are performed via asix_read_cmd(), let's catch
    USB-related errors there and add a __must_check annotation to be sure all
    callers actually check the return value.
    
    So, this patch adds a sanity check inside asix_read_cmd() that simply
    checks that the number of bytes read is not less than was requested, and
    adds the missing error handling of asix_read_cmd() all across the driver code.
    
    Fixes: d9fe64e ("net: asix: Add in_pm parameter")
    Reported-and-tested-by: syzbot+6ca9f7867b77c2d316ac@syzkaller.appspotmail.com
    Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
    Tested-by: Oleksij Rempel <o.rempel@pengutronix.de>
    Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    pskrgag authored and davem330 committed Feb 7, 2022
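
The sanity check described in this commit, treating a short read as an error in one central helper, can be sketched as follows. The wrapper and exception names are hypothetical, and `io.BytesIO` merely stands in for a device that returns fewer bytes than requested.

```python
import io


class ShortReadError(IOError):
    """Raised when fewer bytes arrive than the caller asked for."""


def checked_read(read_fn, size):
    """Call read_fn(size) and fail loudly on a short read."""
    data = read_fn(size)
    if len(data) < size:
        # Returning the partial buffer would leave the tail uninitialised
        # from the caller's point of view, so report an error instead.
        raise ShortReadError(f"requested {size} bytes, got {len(data)}")
    return data


device = io.BytesIO(b"\x01\x02")
print(checked_read(device.read, 2))
```

Centralising the check means each caller only has to handle one error path instead of validating lengths itself.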
  5. r8169: factor out redundant RTL8168d PHY config functionality to rtl8168d_1_common()

    rtl8168d_2_hw_phy_config() shares quite some functionality with
    rtl8168d_1_hw_phy_config(), so let's factor out the common part to a
    new function rtl8168d_1_common(). In addition improve the code a little.
    
    Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    hkallweit authored and davem330 committed Feb 7, 2022
  6. ip6mr: fix use-after-free in ip6mr_sk_done()

    Apparently addrconf_exit_net() is called before igmp6_net_exit()
    and ndisc_net_exit() at netns dismantle time:
    
     net_namespace: call ip6table_mangle_net_exit()
     net_namespace: call ip6_tables_net_exit()
     net_namespace: call ipv6_sysctl_net_exit()
     net_namespace: call ioam6_net_exit()
     net_namespace: call seg6_net_exit()
     net_namespace: call ping_v6_proc_exit_net()
     net_namespace: call tcpv6_net_exit()
     ip6mr_sk_done sk=ffffa354c78a74c0
     net_namespace: call ipv6_frags_exit_net()
     net_namespace: call addrconf_exit_net()
     net_namespace: call ip6addrlbl_net_exit()
     net_namespace: call ip6_flowlabel_net_exit()
     net_namespace: call ip6_route_net_exit_late()
     net_namespace: call fib6_rules_net_exit()
     net_namespace: call xfrm6_net_exit()
     net_namespace: call fib6_net_exit()
     net_namespace: call ip6_route_net_exit()
     net_namespace: call ipv6_inetpeer_exit()
     net_namespace: call if6_proc_net_exit()
     net_namespace: call ipv6_proc_exit_net()
     net_namespace: call udplite6_proc_exit_net()
     net_namespace: call raw6_exit_net()
     net_namespace: call igmp6_net_exit()
     ip6mr_sk_done sk=ffffa35472b2a180
     ip6mr_sk_done sk=ffffa354c78a7980
     net_namespace: call ndisc_net_exit()
     ip6mr_sk_done sk=ffffa35472b2ab00
     net_namespace: call ip6mr_net_exit()
     net_namespace: call inet6_net_exit()
    
    This was fine because ip6mr_sk_done() would not reach the point decreasing
    net->ipv6.devconf_all->mc_forwarding until my patch in ip6mr_sk_done().
    
    To fix this without changing struct pernet_operations ordering,
    we can clear net->ipv6.devconf_dflt and net->ipv6.devconf_all
    when they are freed from addrconf_exit_net()
    
    BUG: KASAN: use-after-free in instrument_atomic_read include/linux/instrumented.h:71 [inline]
    BUG: KASAN: use-after-free in atomic_read include/linux/atomic/atomic-instrumented.h:27 [inline]
    BUG: KASAN: use-after-free in ip6mr_sk_done+0x11b/0x410 net/ipv6/ip6mr.c:1578
    Read of size 4 at addr ffff88801ff08688 by task kworker/u4:4/963
    
    CPU: 0 PID: 963 Comm: kworker/u4:4 Not tainted 5.17.0-rc2-syzkaller-00650-g5a8fb33e5305 #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    Workqueue: netns cleanup_net
    Call Trace:
     <TASK>
     __dump_stack lib/dump_stack.c:88 [inline]
     dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
     print_address_description.constprop.0.cold+0x8d/0x336 mm/kasan/report.c:255
     __kasan_report mm/kasan/report.c:442 [inline]
     kasan_report.cold+0x83/0xdf mm/kasan/report.c:459
     check_region_inline mm/kasan/generic.c:183 [inline]
     kasan_check_range+0x13d/0x180 mm/kasan/generic.c:189
     instrument_atomic_read include/linux/instrumented.h:71 [inline]
     atomic_read include/linux/atomic/atomic-instrumented.h:27 [inline]
     ip6mr_sk_done+0x11b/0x410 net/ipv6/ip6mr.c:1578
     rawv6_close+0x58/0x80 net/ipv6/raw.c:1201
     inet_release+0x12e/0x280 net/ipv4/af_inet.c:428
     inet6_release+0x4c/0x70 net/ipv6/af_inet6.c:478
     __sock_release net/socket.c:650 [inline]
     sock_release+0x87/0x1b0 net/socket.c:678
     inet_ctl_sock_destroy include/net/inet_common.h:65 [inline]
     igmp6_net_exit+0x6b/0x170 net/ipv6/mcast.c:3173
     ops_exit_list+0xb0/0x170 net/core/net_namespace.c:168
     cleanup_net+0x4ea/0xb00 net/core/net_namespace.c:600
     process_one_work+0x9ac/0x1650 kernel/workqueue.c:2307
     worker_thread+0x657/0x1110 kernel/workqueue.c:2454
     kthread+0x2e9/0x3a0 kernel/kthread.c:377
     ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
     </TASK>
    
    Fixes: f2f2325 ("ip6mr: ip6mr_sk_done() can exit early in common cases")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    neebe000 authored and davem330 committed Feb 7, 2022
  7. caif: cleanup double word in comment

    Replace the second 'so' with 'free'.
    
    Signed-off-by: Tom Rix <trix@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    trixirt authored and davem330 committed Feb 7, 2022
  8. Merge branch 'mlxsw-dip-sip-mangling'

    Ido Schimmel says:
    
    ====================
    mlxsw: Add SIP and DIP mangling support
    
    Danielle says:
    
    On Spectrum-2 onwards, it is possible to overwrite the SIP and DIP addresses
    of an IPv4 or IPv6 packet in the ACL engine. That corresponds to pedit
    munges of, respectively, ip src and ip dst fields, and likewise for ip6.
    Offload these munges on the systems where they are supported.
    
    Patchset overview:
    Patch #1: introduces SIP_DIP_ACTION and its fields.
    Patch #2-#3: adds the new pedit fields, and dispatches on them on
    	     Spectrum-2 and above.
    Patch #4 adds a selftest.
    ====================
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    davem330 committed Feb 7, 2022
  9. selftests: forwarding: Add a test for pedit munge SIP and DIP

    Add a test that checks that pedit adjusts source and destination
    addresses of IPv4 and IPv6 packets.
    
    Output example:
    
    $ ./pedit_ip.sh
    TEST: ping                                                          [ OK ]
    TEST: ping6                                                         [ OK ]
    TEST: dev swp2 ingress pedit ip src set 198.51.100.1                [ OK ]
    TEST: dev swp3 egress pedit ip src set 198.51.100.1                 [ OK ]
    TEST: dev swp2 ingress pedit ip dst set 198.51.100.1                [ OK ]
    TEST: dev swp3 egress pedit ip dst set 198.51.100.1                 [ OK ]
    TEST: dev swp2 ingress pedit ip6 src set 2001:db8:2::1              [ OK ]
    TEST: dev swp3 egress pedit ip6 src set 2001:db8:2::1               [ OK ]
    TEST: dev swp2 ingress pedit ip6 dst set 2001:db8:2::1              [ OK ]
    TEST: dev swp3 egress pedit ip6 dst set 2001:db8:2::1               [ OK ]
    
    Signed-off-by: Danielle Ratson <danieller@nvidia.com>
    Reviewed-by: Petr Machata <petrm@nvidia.com>
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Danielle Ratson authored and davem330 committed Feb 7, 2022
  10. mlxsw: Support FLOW_ACTION_MANGLE for SIP and DIP IPv6 addresses

    Spectrum-2 supports an ACL action SIP_DIP, which allows changing IPv4 and
    IPv6 source and destination addresses. Offload suitable mangles to
    the IPv6 address change action.
    
    Signed-off-by: Danielle Ratson <danieller@nvidia.com>
    Reviewed-by: Petr Machata <petrm@nvidia.com>
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Danielle Ratson authored and davem330 committed Feb 7, 2022
  11. mlxsw: Support FLOW_ACTION_MANGLE for SIP and DIP IPv4 addresses

    Spectrum-2 supports an ACL action SIP_DIP, which allows changing IPv4 and
    IPv6 source and destination addresses. Offload suitable mangles to
    the IPv4 address change action.
    
    Signed-off-by: Danielle Ratson <danieller@nvidia.com>
    Reviewed-by: Petr Machata <petrm@nvidia.com>
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Danielle Ratson authored and davem330 committed Feb 7, 2022
  12. mlxsw: core_acl_flex_actions: Add SIP_DIP_ACTION

    Add fields related to SIP_DIP_ACTION, which is used for changing SIP
    and DIP addresses.
    
    Signed-off-by: Danielle Ratson <danieller@nvidia.com>
    Reviewed-by: Petr Machata <petrm@nvidia.com>
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Danielle Ratson authored and davem330 committed Feb 7, 2022
  13. Merge branch 'ipv6-kfree_skb_reason'

    Menglong Dong says:
    
    ====================
    net: use kfree_skb_reason() for ip/udp packet receive
    
    In this patch series, kfree_skb() is replaced with kfree_skb_reason()
    in the ipv4 and udp4 packet receive path, and the following drop reasons
    are introduced:
    
    SKB_DROP_REASON_SOCKET_FILTER
    SKB_DROP_REASON_NETFILTER_DROP
    SKB_DROP_REASON_OTHERHOST
    SKB_DROP_REASON_IP_CSUM
    SKB_DROP_REASON_IP_INHDR
    SKB_DROP_REASON_IP_RPFILTER
    SKB_DROP_REASON_UNICAST_IN_L2_MULTICAST
    SKB_DROP_REASON_XFRM_POLICY
    SKB_DROP_REASON_IP_NOPROTO
    SKB_DROP_REASON_SOCKET_RCVBUFF
    SKB_DROP_REASON_PROTO_MEM
    
    TCP is more complex, so I left it in the next series.
    
    I just figured out how __print_symbolic() works. It isn't based on the
    array index, but searches for symbols in a loop. So I'm a little worried
    about its performance.
    
    Changes since v3:
    - fix some small problems in the third patch (net: ipv4: use
      kfree_skb_reason() in ip_rcv_core()), as David Ahern said
    
    Changes since v2:
    - use SKB_DROP_REASON_PKT_TOO_SMALL for a path in ip_rcv_core()
    
    Changes since v1:
    - add documentation for all drop reasons, as David advised
    - remove unrelated cleanup
    - remove EARLY_DEMUX and IP_ROUTE_INPUT drop reason
    - replace {UDP, TCP}_FILTER with SOCKET_FILTER
    ====================
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    davem330 committed Feb 7, 2022
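
The performance remark in the cover letter above contrasts a lookup by array index with a search through (value, name) pairs. The sketch below illustrates that difference only. The numeric values assigned to the drop reasons are hypothetical, and the functions are not the kernel's tracing helpers.

```python
# Drop reasons named in the cover letter; the numbering is illustrative.
DROP_REASONS = [
    "SOCKET_FILTER", "NETFILTER_DROP", "OTHERHOST", "IP_CSUM",
    "IP_INHDR", "IP_RPFILTER", "UNICAST_IN_L2_MULTICAST",
    "XFRM_POLICY", "IP_NOPROTO", "SOCKET_RCVBUFF", "PROTO_MEM",
]
SYMBOLS = list(enumerate(DROP_REASONS))  # (value, name) pairs


def symbolic_linear(value):
    """Search the pairs one by one, as the cover letter describes."""
    for sym_value, name in SYMBOLS:
        if sym_value == value:
            return name
    return None


def symbolic_indexed(value):
    """Direct index lookup, the alternative the author had expected."""
    if 0 <= value < len(DROP_REASONS):
        return DROP_REASONS[value]
    return None


print(symbolic_linear(2), symbolic_indexed(2))
```

The linear variant costs time proportional to the number of symbols, which is the concern raised as the list of drop reasons grows.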
  14. net: udp: use kfree_skb_reason() in __udp_queue_rcv_skb()

    Replace kfree_skb() with kfree_skb_reason() in __udp_queue_rcv_skb().
    The following new drop reasons are introduced:
    
    SKB_DROP_REASON_SOCKET_RCVBUFF
    SKB_DROP_REASON_PROTO_MEM
    
    Signed-off-by: Menglong Dong <imagedong@tencent.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    xmmgithub authored and davem330 committed Feb 7, 2022
  15. net: udp: use kfree_skb_reason() in udp_queue_rcv_one_skb()

    Replace kfree_skb() with kfree_skb_reason() in udp_queue_rcv_one_skb().
    
    Signed-off-by: Menglong Dong <imagedong@tencent.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    xmmgithub authored and davem330 committed Feb 7, 2022
  16. net: ipv4: use kfree_skb_reason() in ip_protocol_deliver_rcu()

    Replace kfree_skb() with kfree_skb_reason() in ip_protocol_deliver_rcu().
    The following new drop reasons are introduced:
    
    SKB_DROP_REASON_XFRM_POLICY
    SKB_DROP_REASON_IP_NOPROTO
    
    Signed-off-by: Menglong Dong <imagedong@tencent.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    xmmgithub authored and davem330 committed Feb 7, 2022
  17. net: ipv4: use kfree_skb_reason() in ip_rcv_finish_core()

    Replace kfree_skb() with kfree_skb_reason() in ip_rcv_finish_core().
    The following drop reasons are introduced:
    
    SKB_DROP_REASON_IP_RPFILTER
    SKB_DROP_REASON_UNICAST_IN_L2_MULTICAST
    
    Signed-off-by: Menglong Dong <imagedong@tencent.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    xmmgithub authored and davem330 committed Feb 7, 2022
  18. net: ipv4: use kfree_skb_reason() in ip_rcv_core()

    Replace kfree_skb() with kfree_skb_reason() in ip_rcv_core(). Three new
    drop reasons are introduced:
    
    SKB_DROP_REASON_OTHERHOST
    SKB_DROP_REASON_IP_CSUM
    SKB_DROP_REASON_IP_INHDR
    
    Signed-off-by: Menglong Dong <imagedong@tencent.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    xmmgithub authored and davem330 committed Feb 7, 2022
  19. net: netfilter: use kfree_skb_reason() for NF_DROP

    Replace kfree_skb() with kfree_skb_reason() in nf_hook_slow() when
    the skb is dropped by reason of NF_DROP. The following new drop reason
    is introduced:
    
    SKB_DROP_REASON_NETFILTER_DROP
    
    Signed-off-by: Menglong Dong <imagedong@tencent.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    xmmgithub authored and davem330 committed Feb 7, 2022
  20. net: skb_drop_reason: add document for drop reasons

    Add documentation for the following existing drop reasons:
    
    SKB_DROP_REASON_NOT_SPECIFIED
    SKB_DROP_REASON_NO_SOCKET
    SKB_DROP_REASON_PKT_TOO_SMALL
    SKB_DROP_REASON_TCP_CSUM
    SKB_DROP_REASON_SOCKET_FILTER
    SKB_DROP_REASON_UDP_CSUM
    
    Signed-off-by: Menglong Dong <imagedong@tencent.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    xmmgithub authored and davem330 committed Feb 7, 2022

Commits on Feb 6, 2022

  1. ref_tracker: remove filter_irq_stacks() call

    After commit e940066 ("lib/stackdepot: always do filter_irq_stacks()
    in stack_depot_save()") it became unnecessary to filter the stack
    before calling stack_depot_save().
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Marco Elver <elver@google.com>
    Cc: Alexander Potapenko <glider@google.com>
    Cc: Dmitry Vyukov <dvyukov@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    neebe000 authored and davem330 committed Feb 6, 2022
  2. net: initialize init_net earlier

    While testing a patch that will follow later
    ("net: add netns refcount tracker to struct nsproxy")
    I found that devtmpfs_init() was called before init_net
    was initialized.
    
    This is a bug, because devtmpfs_setup() calls
    ksys_unshare(CLONE_NEWNS);
    
    This has the effect of increasing init_net refcount,
    which will be later overwritten to 1, as part of setup_net(&init_net)
    
    We had too many prior patches [1] trying to work around the root cause.
    
    Really, make sure init_net is in BSS section, and that net_ns_init()
    is called earlier at boot time.
    
    Note that another patch ("vfs: add netns refcount tracker
    to struct fs_context") also will need net_ns_init() being called
    before vfs_caches_init()
    
    As a bonus, this patch saves around 4KB in .data section.
    
    [1]
    
    f8c46cb ("netns: do not call pernet ops for not yet set up init_net namespace")
    b5082df ("net: Initialise init_net.count to 1")
    734b654 ("net: Statically initialize init_net.dev_base_head")
    
    v2: fixed a build error reported by kernel build bots (CONFIG_NET=n)
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    neebe000 authored and davem330 committed Feb 6, 2022
  3. net: hsr: use hlist_head instead of list_head for mac addresses

    Currently, HSR manages MAC addresses of known HSR nodes by using list_head.
    It takes a lot of time when there are a lot of registered nodes, because
    finding a specific MAC address node requires a linear search. We can
    reduce the time by using hlist. Thus, this patch moves from list_head to
    hlist_head for MAC addresses, which allows for further improvement of
    network performance.
    
        Condition: registered 10,000 known HSR nodes
        Before:
        # iperf3 -c 192.168.10.1 -i 1 -t 10
        Connecting to host 192.168.10.1, port 5201
        [  5] local 192.168.10.2 port 59442 connected to 192.168.10.1 port 5201
        [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
        [  5]   0.00-1.49   sec  3.75 MBytes  21.1 Mbits/sec    0    158 KBytes
        [  5]   1.49-2.05   sec  1.25 MBytes  18.7 Mbits/sec    0    166 KBytes
        [  5]   2.05-3.06   sec  2.44 MBytes  20.3 Mbits/sec   56   16.9 KBytes
        [  5]   3.06-4.08   sec  1.43 MBytes  11.7 Mbits/sec   11   38.0 KBytes
        [  5]   4.08-5.00   sec   951 KBytes  8.49 Mbits/sec    0   56.3 KBytes
    
        After:
        # iperf3 -c 192.168.10.1 -i 1 -t 10
        Connecting to host 192.168.10.1, port 5201
        [  5] local 192.168.10.2 port 36460 connected to 192.168.10.1 port 5201
        [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
        [  5]   0.00-1.00   sec  7.39 MBytes  62.0 Mbits/sec    3    130 KBytes
        [  5]   1.00-2.00   sec  5.06 MBytes  42.4 Mbits/sec   16    113 KBytes
        [  5]   2.00-3.00   sec  8.58 MBytes  72.0 Mbits/sec   42   94.3 KBytes
        [  5]   3.00-4.00   sec  7.44 MBytes  62.4 Mbits/sec    2    131 KBytes
        [  5]   4.00-5.07   sec  8.13 MBytes  63.5 Mbits/sec   38   92.9 KBytes
    
    Signed-off-by: Juhee Kang <claudiajkang@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    ClaudiaJKang authored and davem330 committed Feb 6, 2022
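
The lookup cost behind this change can be illustrated by counting comparisons for a flat list versus hash buckets. This is a sketch under assumptions: the bucket count and the byte-sum hash are hypothetical and are not the hash used by the HSR code.

```python
HASH_BUCKETS = 256  # hypothetical table size


def mac_hash(mac):
    """Hypothetical hash: sum of the address bytes modulo the bucket count."""
    return sum(mac) % HASH_BUCKETS


def linear_find(nodes, mac):
    """Scan a flat list; returns (found, comparisons)."""
    for count, node in enumerate(nodes, start=1):
        if node == mac:
            return True, count
    return False, len(nodes)


class NodeTable:
    """Hash table of MAC addresses, one short list per bucket."""

    def __init__(self):
        self.buckets = [[] for _ in range(HASH_BUCKETS)]

    def add(self, mac):
        self.buckets[mac_hash(mac)].append(mac)

    def find(self, mac):
        """Scan only the matching bucket; returns (found, comparisons)."""
        bucket = self.buckets[mac_hash(mac)]
        for count, node in enumerate(bucket, start=1):
            if node == mac:
                return True, count
        return False, len(bucket)


# 10,000 synthetic node addresses, matching the test condition's scale.
nodes = [bytes([0x02, 0, 0, 0, i >> 8, i & 0xFF]) for i in range(10000)]
table = NodeTable()
for mac in nodes:
    table.add(mac)

print(linear_find(nodes, nodes[-1])[1], table.find(nodes[-1])[1])
```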

Commits on Feb 5, 2022

  1. skmsg: convert struct sk_msg_sg::copy to a bitmap

    We have plans for increasing MAX_SKB_FRAGS, but sk_msg_sg::copy
    is currently an unsigned long, limiting MAX_SKB_FRAGS to 30 on 32bit arches.
    
    Convert it to a bitmap, as Jakub suggested.
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    neebe000 authored and davem330 committed Feb 5, 2022
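
The limitation mentioned in this commit comes from keeping per-fragment flags in a single machine word. A bitmap spreads the flags over as many words as needed. The sketch below assumes 32-bit words and uses a hypothetical `Bitmap` class; it is not the kernel's bitmap API.

```python
BITS_PER_LONG = 32  # word size assumed for a 32-bit architecture


class Bitmap:
    """Fixed-size bit array stored as a list of BITS_PER_LONG-bit words."""

    def __init__(self, nbits):
        self.nbits = nbits
        nwords = (nbits + BITS_PER_LONG - 1) // BITS_PER_LONG
        self.words = [0] * nwords

    def _check(self, n):
        if not 0 <= n < self.nbits:
            raise IndexError("bit index out of range")

    def set_bit(self, n):
        self._check(n)
        self.words[n // BITS_PER_LONG] |= 1 << (n % BITS_PER_LONG)

    def test_bit(self, n):
        self._check(n)
        return bool((self.words[n // BITS_PER_LONG] >> (n % BITS_PER_LONG)) & 1)


# 47 flags do not fit in one 32-bit word, but fit in a two-word bitmap.
copy_flags = Bitmap(47)
copy_flags.set_bit(40)
print(len(copy_flags.words), copy_flags.test_bit(40))
```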