Felix-Fietkau/…

Commits on Nov 12, 2021

  1. mac80211: fix throughput LED trigger

    The codepaths for rx with decap offload and tx with itxq were not updating
    the counters for the throughput led trigger.
    
    Signed-off-by: Felix Fietkau <nbd@nbd.name>
    nbd168 authored and intel-lab-lkp committed Nov 12, 2021

Commits on Oct 27, 2021

  1. cfg80211: move offchan_cac_event to a dedicated work

    In order to make cfg80211_offchan_cac_abort() (renamed from
    cfg80211_offchan_cac_event) callable in other contexts and
    without so much locking restrictions, make it trigger a new
    work instead of operating directly.
    
    Do some other renames while at it to clarify.
    
    Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
    Link: https://lore.kernel.org/r/6145c3d0f30400a568023f67981981d24c7c6133.1635325205.git.lorenzo@kernel.org
    [rewrite commit log]
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    LorenzoBianconi authored and jmberg-intel committed Oct 27, 2021

Commits on Oct 26, 2021

  1. mac80211_hwsim: Fix spelling mistake "Droping" -> "Dropping"

    There is a spelling mistake in a comment, fix it.
    
    Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
    Link: https://lore.kernel.org/r/20211026094000.209463-1-colin.i.king@gmail.com
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Colin Ian King authored and jmberg-intel committed Oct 26, 2021
  2. mac80211: introduce set_radar_offchan callback

    Similar to cfg80211, introduce set_radar_offchan callback in mac80211_ops
    in order to configure a dedicated offchannel chain available on some hw
    (e.g. mt7915) to perform offchannel CAC detection and avoid tx/rx downtime.
    
    Tested-by: Evelyn Tsai <evelyn.tsai@mediatek.com>
    Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
    Link: https://lore.kernel.org/r/201110606d4f3a7dfdf31440e351f2e2c375d4f0.1634979655.git.lorenzo@kernel.org
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    LorenzoBianconi authored and jmberg-intel committed Oct 26, 2021
  3. cfg80211: implement APIs for dedicated radar detection HW

    If a dedicated (off-channel) radar detection hardware (chain)
    is available in the hardware/driver, allow this to be used by
    calling the NL80211_CMD_RADAR_DETECT command with a new flag
    attribute requesting that off-channel radar detection be used.
    
    Offchannel CAC (channel availability check) avoids the CAC
    downtime when switching to a radar channel or when turning on
    the AP.
    
    Drivers advertise support for this using the new feature flag
    NL80211_EXT_FEATURE_RADAR_OFFCHAN.
    
    Tested-by: Evelyn Tsai <evelyn.tsai@mediatek.com>
    Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
    Link: https://lore.kernel.org/r/7468e291ef5d05d692c1738d25b8f778d8ea5c3f.1634979655.git.lorenzo@kernel.org
    Link: https://lore.kernel.org/r/1e60e60fef00e14401adae81c3d49f3e5f307537.1634979655.git.lorenzo@kernel.org
    Link: https://lore.kernel.org/r/85fa50f57fc3adb2934c8d9ca0be30394de6b7e8.1634979655.git.lorenzo@kernel.org
    Link: https://lore.kernel.org/r/4b6c08671ad59aae0ac46fc94c02f31b1610eb72.1634979655.git.lorenzo@kernel.org
    Link: https://lore.kernel.org/r/241849ccaf2c228873c6f8495bf87b19159ba458.1634979655.git.lorenzo@kernel.org
    [remove offchan_mutex, fix cfg80211_stop_offchan_radar_detection(),
     remove gfp_t argument, fix documentation, fix tracing]
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    LorenzoBianconi authored and jmberg-intel committed Oct 26, 2021
  4. Merge branch 'phy-supported-interfaces-bitmap'

    Russell King says:
    
    ====================
    Introduce supported interfaces bitmap
    
    This series introduces a new bitmap to allow us to indicate which
    phy_interface_t modes are supported.
    
    Currently, phylink will call ->validate with PHY_INTERFACE_MODE_NA to
    request all link mode capabilities from the MAC driver before choosing
    an interface to use. This leads in some cases to some rather hairy
    code. This can be simplified if phylink is aware of the interface modes
    that the MAC supports, and it can instead walk those modes, calling
    ->validate for each one, and combining the results.
    
    This series merely introduces the support; there is no change of
    behaviour until MAC drivers populate their supported_interfaces bitmap.
    ====================
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    davem330 committed Oct 26, 2021
  5. net: phylink: use supported_interfaces for phylink validation

    If the network device supplies a supported interface bitmap, we can use
    that during phylink's validation to simplify MAC drivers in two ways by
    using the supported_interfaces bitmap to:
    
    1. reject unsupported interfaces before calling into the MAC driver.
    2. generate the set of all supported link modes across all supported
       interfaces (used mainly for SFP, but also some 10G PHYs.)
    
    Suggested-by: Sean Anderson <sean.anderson@seco.com>
    Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Russell King (Oracle) authored and davem330 committed Oct 26, 2021
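
The two uses of the bitmap listed in the commit above can be sketched in plain userspace C. This is a minimal model under stated assumptions, not phylink's code: `validate_with_bitmap()`, `example_mac_validate()`, the `IFACE_*` and `LINK_*` names, and the use of `uint32_t` masks are all hypothetical stand-ins for the kernel's types.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical, shortened set of interface modes (not the kernel enum). */
enum iface_mode { IFACE_NA, IFACE_SGMII, IFACE_1000BASEX, IFACE_10GBASER, IFACE_MAX };

/* Hypothetical link mode bits. */
#define LINK_1000_FULL  (1u << 0)
#define LINK_10000_FULL (1u << 1)

/* Stand-in for the MAC driver's validate callback. */
typedef uint32_t (*mac_validate_fn)(enum iface_mode iface);

static int mac_calls;	/* counts calls into the MAC callback */

static uint32_t example_mac_validate(enum iface_mode iface)
{
	mac_calls++;
	switch (iface) {
	case IFACE_SGMII:
	case IFACE_1000BASEX:
		return LINK_1000_FULL;
	case IFACE_10GBASER:
		return LINK_10000_FULL;
	default:
		return 0;
	}
}

static int validate_with_bitmap(uint32_t supported_ifaces, enum iface_mode iface,
				mac_validate_fn mac_validate, uint32_t *link_modes)
{
	assert(mac_validate && link_modes);
	*link_modes = 0;

	if (iface == IFACE_NA) {
		/* Use 2: union of link modes over every supported interface. */
		for (int i = IFACE_NA + 1; i < IFACE_MAX; i++)
			if (supported_ifaces & (1u << i))
				*link_modes |= mac_validate((enum iface_mode)i);
		return *link_modes ? 0 : -EINVAL;
	}

	/* Use 1: reject an unsupported interface without calling the MAC. */
	if (!(supported_ifaces & (1u << iface)))
		return -EINVAL;

	*link_modes = mac_validate(iface);
	return *link_modes ? 0 : -EINVAL;
}
```

In this model the MAC callback only ever sees interfaces that are set in the bitmap, which is what lets a driver drop its own "is this interface supported" checks.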
  6. net: phylink: add MAC phy_interface_t bitmap

    Add a phy_interface_t bitmap so the MAC driver can specify which PHY
    interface modes it supports.
    
    Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Russell King authored and davem330 committed Oct 26, 2021
  7. net: phy: add phy_interface_t bitmap support

    Add support for a bitmap for phy interface modes, which includes:
    - a macro to declare the interface bitmap
    - an inline helper to zero the interface bitmap
    - an inline helper to detect an empty interface bitmap
    - inline helpers to do bitwise AND and OR operations on two interface
      bitmaps
    
    Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Russell King (Oracle) authored and davem330 committed Oct 26, 2021
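
The helpers listed in the commit above (a declaration macro, zero, empty, AND, OR) can be modelled in userspace roughly as follows. This is a sketch only: the `iface_*` names, `DECLARE_IFACE_MASK`, and the shortened mode list are hypothetical, and `iface_set()`/`iface_test()` are added purely so the example can be exercised.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, shortened list of interface modes for illustration. */
enum iface_mode {
	IFACE_MODE_NA,
	IFACE_MODE_SGMII,
	IFACE_MODE_RGMII,
	IFACE_MODE_1000BASEX,
	IFACE_MODE_10GBASER,
	IFACE_MODE_MAX,
};

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS ((IFACE_MODE_MAX + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Declare a bitmap with one bit per interface mode. */
#define DECLARE_IFACE_MASK(name) unsigned long name[MASK_WORDS]

static inline void iface_zero(unsigned long *mask)
{
	memset(mask, 0, MASK_WORDS * sizeof(*mask));
}

static inline void iface_set(unsigned long *mask, enum iface_mode mode)
{
	assert(mode < IFACE_MODE_MAX);
	mask[mode / BITS_PER_WORD] |= 1UL << (mode % BITS_PER_WORD);
}

static inline bool iface_test(const unsigned long *mask, enum iface_mode mode)
{
	assert(mode < IFACE_MODE_MAX);
	return (mask[mode / BITS_PER_WORD] & (1UL << (mode % BITS_PER_WORD))) != 0;
}

/* True when no interface mode is set. */
static inline bool iface_empty(const unsigned long *mask)
{
	for (size_t i = 0; i < MASK_WORDS; i++)
		if (mask[i])
			return false;
	return true;
}

/* dst = a AND b, word by word. */
static inline void iface_and(unsigned long *dst, const unsigned long *a,
			     const unsigned long *b)
{
	for (size_t i = 0; i < MASK_WORDS; i++)
		dst[i] = a[i] & b[i];
}

/* dst = a OR b, word by word. */
static inline void iface_or(unsigned long *dst, const unsigned long *a,
			    const unsigned long *b)
{
	for (size_t i = 0; i < MASK_WORDS; i++)
		dst[i] = a[i] | b[i];
}
```

A typical use would be to AND a MAC's mask with a PHY's or SFP module's mask and check the result with `iface_empty()` to see whether the two share any interface mode.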
  8. Merge branch 'dsa-isolation-prep'

    Vladimir Oltean says:
    
    ====================
    DSA preparations for FDB isolation between bridges
    
    This series makes 2 small changes to DSA's SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE
    handler, which will make it possible to offer switch drivers a stable
    association between a FDB entry and a bridge device in a future series.
    ====================
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    davem330 committed Oct 26, 2021
  9. net: dsa: stop calling dev_hold in dsa_slave_fdb_event

    Now that we guarantee that SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE events have
    finished executing by the time we leave our bridge upper interface,
    we've established a stronger boundary condition for how long the
    dsa_slave_switchdev_event_work() might run.
    
    As such, it is no longer possible for DSA slave interfaces to become
    unregistered, since they are still bridge ports.
    
    So delete the unnecessary dev_hold() and dev_put().
    
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    vladimiroltean authored and davem330 committed Oct 26, 2021
  10. net: dsa: flush switchdev workqueue when leaving the bridge

    DSA is preparing to offer switch drivers an API through which they can
    associate each FDB entry with a struct net_device *bridge_dev. This can
    be used to perform FDB isolation (the FDB lookup performed on the
    ingress of a standalone, or bridged port, should not find an FDB entry
    that is present in the FDB of another bridge).
    
    In preparation of that work, DSA needs to ensure that by the time we
    call the switch .port_fdb_add and .port_fdb_del methods, the
    dp->bridge_dev pointer is still valid, i.e. the port is still a bridge
    port.
    
    This is not guaranteed because the SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE API
    requires drivers that must have sleepable context to handle those events
    to schedule the deferred work themselves. DSA does this through the
    dsa_owq.
    
    It can happen that a port leaves a bridge, del_nbp() flushes the FDB on
    that port, SWITCHDEV_FDB_DEL_TO_DEVICE is notified in atomic context,
    DSA schedules its deferred work, but del_nbp() finishes unlinking the
    bridge as a master from the port before DSA's deferred work is run.
    
    Fundamentally, the port must not be unlinked from the bridge until all
    FDB deletion deferred work items have been flushed. The bridge must wait
    for the completion of these hardware accesses.
    
    An attempt has been made to address this issue centrally in switchdev by
    making SWITCHDEV_FDB_DEL_TO_DEVICE deferred (=> blocking) at the switchdev
    level, which would offer implicit synchronization with del_nbp:
    
    https://patchwork.kernel.org/project/netdevbpf/cover/20210820115746.3701811-1-vladimir.oltean@nxp.com/
    
    but it seems that any attempt to modify switchdev's behavior and make
    the events blocking there would introduce undesirable side effects in
    other switchdev consumers.
    
    The most undesirable behavior seems to be that
    switchdev_deferred_process_work() takes the rtnl_mutex itself, which
    would be worse off than having the rtnl_mutex taken individually from
    drivers which is what we have now (except DSA which has removed that
    lock since commit 0faf890 ("net: dsa: drop rtnl_lock from
    dsa_slave_switchdev_event_work")).
    
    So to offer the needed guarantee to DSA switch drivers, I have come up
    with a compromise solution that does not require switchdev rework:
    we already have a hook at the last moment in time when the bridge is
    still an upper of ours: the NETDEV_PRECHANGEUPPER handler. We can flush
    the dsa_owq manually from there, which makes all FDB deletions
    synchronous.
    
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    vladimiroltean authored and davem330 committed Oct 26, 2021
  11. ifb: Depend on netfilter alternatively to tc

    IFB originally depended on NET_CLS_ACT for traffic redirection.
    But since v4.5, that may be achieved with NFT_FWD_NETDEV as well.
    
    Fixes: 39e6dea ("netfilter: nf_tables: add forward expression to the netdev family")
    Signed-off-by: Lukas Wunner <lukas@wunner.de>
    Cc: <stable@vger.kernel.org> # v4.5+: bcfabee: netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress
    Cc: <stable@vger.kernel.org> # v4.5+
    Signed-off-by: David S. Miller <davem@davemloft.net>
    l1k authored and davem330 committed Oct 26, 2021
  12. mctp: Implement extended addressing

    This change allows an extended address struct - struct sockaddr_mctp_ext
    - to be passed to sendmsg/recvmsg. This allows userspace to specify
    output ifindex and physical address information (for sendmsg) or receive
    the input ifindex/physaddr for incoming messages (for recvmsg). This is
    typically used by userspace for MCTP address discovery and assignment
    operations.
    
    The extended addressing facility is conditional on a new sockopt:
    MCTP_OPT_ADDR_EXT; userspace must explicitly enable addressing before
    the kernel will consume/populate the extended address data.
    
    Includes a fix for an uninitialised var:
    Reported-by: kernel test robot <lkp@intel.com>
    
    Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    jk-ozlabs authored and davem330 committed Oct 26, 2021
  13. net: ax88796c: Remove pointless check in ax88796c_open()

    Clang warns:
    
    drivers/net/ethernet/asix/ax88796c_main.c:851:24: error: address of
    array 'ax_local->phydev->advertising' will always evaluate to 'true'
    [-Werror,-Wpointer-bool-conversion]
            if (ax_local->phydev->advertising &&
                ~~~~~~~~~~~~~~~~~~^~~~~~~~~~~ ~~
    
    advertising cannot be NULL here if ax_local is not NULL, which cannot
    happen due to the check in ax88796c_probe(). Remove the check.
    
    Link: ClangBuiltLinux#1492
    Signed-off-by: Nathan Chancellor <nathan@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    nathanchance authored and davem330 committed Oct 26, 2021
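
To illustrate the warning in the commit above: when `advertising` is an array embedded in the structure, its address is part of the enclosing object and cannot be NULL, so only the array's contents carry information. A minimal sketch, assuming a hypothetical `struct fake_phy` in place of the real PHY device structure:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a PHY device with an embedded bitmap array. */
struct fake_phy {
	unsigned long advertising[2];
};

static bool has_advertising(const struct fake_phy *phy)
{
	assert(phy != NULL);
	/*
	 * 'phy->advertising' decays to the address of an array that lives
	 * inside *phy, so testing that address would always be true.
	 * The meaningful test is on the bits stored in the array.
	 */
	return phy->advertising[0] != 0 || phy->advertising[1] != 0;
}
```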
  14. net: ax88796c: Fix clang -Wimplicit-fallthrough in ax88796c_set_mac()

    Clang warns:
    
    drivers/net/ethernet/asix/ax88796c_main.c:696:2: error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough]
            case SPEED_10:
            ^
    drivers/net/ethernet/asix/ax88796c_main.c:696:2: note: insert 'break;' to avoid fall-through
            case SPEED_10:
            ^
            break;
    drivers/net/ethernet/asix/ax88796c_main.c:706:2: error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough]
            case DUPLEX_HALF:
            ^
    drivers/net/ethernet/asix/ax88796c_main.c:706:2: note: insert 'break;' to avoid fall-through
            case DUPLEX_HALF:
            ^
            break;
    
    Clang is a little more pedantic than GCC, which permits implicit
    fallthroughs to cases that contain just break or return. Clang's version
    is more in line with the kernel's own stance in deprecated.rst, which
    states that all switch/case blocks must end in either break,
    fallthrough, continue, goto, or return. Add the missing breaks to fix
    the warning.
    
    Link: ClangBuiltLinux#1491
    Signed-off-by: Nathan Chancellor <nathan@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    nathanchance authored and davem330 committed Oct 26, 2021
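
The shape of the fix in the commit above can be sketched as follows. The `DEMO_*` constants and `demo_speed_to_mode()` are hypothetical and only mirror the pattern: a case that does work, followed by a case that contains just `break`.

```c
#include <assert.h>

/* Hypothetical constants, not the driver's register definitions. */
#define DEMO_SPEED_10   10
#define DEMO_SPEED_100  100
#define DEMO_MODE_100M  0x1u

static unsigned int demo_speed_to_mode(int speed)
{
	unsigned int mode = 0;

	assert(speed == DEMO_SPEED_10 || speed == DEMO_SPEED_100);

	switch (speed) {
	case DEMO_SPEED_100:
		mode |= DEMO_MODE_100M;
		break;	/* without this break, clang reports the fall-through */
	case DEMO_SPEED_10:
		break;
	}

	return mode;
}
```

Behaviour is the same with or without the first `break`, since the next case only breaks; the explicit statement exists to satisfy clang's stricter check.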
  15. net: mana: Allow setting the number of queues while the NIC is down

    The existing code doesn't allow setting the number of queues while the
    NIC is down.
    
    Update the ethtool handler functions to support setting the number of
    queues while the NIC is down.
    
    Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    haiyangz authored and davem330 committed Oct 26, 2021
  16. net: hsr: Add support for redbox supervision frames

    Add support for the redbox supervision frames as defined in
    IEC 62439-3:2018.
    
    Signed-off-by: Andreas Oetken <andreas.oetken@siemens-energy.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    oetken authored and davem330 committed Oct 26, 2021
  17. Merge branch 'tcp_stream_alloc_skb'

    Eric Dumazet says:
    
    ====================
    tcp: tcp_stream_alloc_skb() changes
    
    sk_stream_alloc_skb() is only used by TCP.
    
    Rename it to tcp_stream_alloc_skb() and apply small
    optimizations.
    ====================
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    davem330 committed Oct 26, 2021
  18. tcp: remove unneeded code from tcp_stream_alloc_skb()

    Aligning the @size argument to 4 bytes is not needed.
    
    The header alignment has nothing to do with @size.
    
    It really depends on skb->head alignment and MAX_TCP_HEADER.
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    neebe000 authored and davem330 committed Oct 26, 2021
  19. tcp: use MAX_TCP_HEADER in tcp_stream_alloc_skb

    Both IPv4 and IPv6 use the same reserve, so there is no need to risk
    cache line misses to fetch its value.
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    neebe000 authored and davem330 committed Oct 26, 2021
  20. tcp: rename sk_stream_alloc_skb

    sk_stream_alloc_skb() is only used by TCP.
    
    Rename it to make this clear, and move its declaration
    to include/net/tcp.h
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    neebe000 authored and davem330 committed Oct 26, 2021
  21. net: annotate data-race in neigh_output()

    neigh_output() reads n->nud_state and hh->hh_len locklessly.
    
    This is fine, but we need to add annotations and document this.
    
    We evaluate skip_cache first to avoid reading these fields
    if the cache has to be bypassed.
    
    syzbot report:
    
    BUG: KCSAN: data-race in __neigh_event_send / ip_finish_output2
    
    write to 0xffff88810798a885 of 1 bytes by interrupt on cpu 1:
     __neigh_event_send+0x40d/0xac0 net/core/neighbour.c:1128
     neigh_event_send include/net/neighbour.h:444 [inline]
     neigh_resolve_output+0x104/0x410 net/core/neighbour.c:1476
     neigh_output include/net/neighbour.h:510 [inline]
     ip_finish_output2+0x80a/0xaa0 net/ipv4/ip_output.c:221
     ip_finish_output+0x3b5/0x510 net/ipv4/ip_output.c:309
     NF_HOOK_COND include/linux/netfilter.h:296 [inline]
     ip_output+0xf3/0x1a0 net/ipv4/ip_output.c:423
     dst_output include/net/dst.h:450 [inline]
     ip_local_out+0x164/0x220 net/ipv4/ip_output.c:126
     __ip_queue_xmit+0x9d3/0xa20 net/ipv4/ip_output.c:525
     ip_queue_xmit+0x34/0x40 net/ipv4/ip_output.c:539
     __tcp_transmit_skb+0x142a/0x1a00 net/ipv4/tcp_output.c:1405
     tcp_transmit_skb net/ipv4/tcp_output.c:1423 [inline]
     tcp_xmit_probe_skb net/ipv4/tcp_output.c:4011 [inline]
     tcp_write_wakeup+0x4a9/0x810 net/ipv4/tcp_output.c:4064
     tcp_send_probe0+0x2c/0x2b0 net/ipv4/tcp_output.c:4079
     tcp_probe_timer net/ipv4/tcp_timer.c:398 [inline]
     tcp_write_timer_handler+0x394/0x520 net/ipv4/tcp_timer.c:626
     tcp_write_timer+0xb9/0x180 net/ipv4/tcp_timer.c:642
     call_timer_fn+0x2e/0x1d0 kernel/time/timer.c:1421
     expire_timers+0x135/0x240 kernel/time/timer.c:1466
     __run_timers+0x368/0x430 kernel/time/timer.c:1734
     run_timer_softirq+0x19/0x30 kernel/time/timer.c:1747
     __do_softirq+0x12c/0x26e kernel/softirq.c:558
     invoke_softirq kernel/softirq.c:432 [inline]
     __irq_exit_rcu kernel/softirq.c:636 [inline]
     irq_exit_rcu+0x4e/0xa0 kernel/softirq.c:648
     sysvec_apic_timer_interrupt+0x69/0x80 arch/x86/kernel/apic/apic.c:1097
     asm_sysvec_apic_timer_interrupt+0x12/0x20
     native_safe_halt arch/x86/include/asm/irqflags.h:51 [inline]
     arch_safe_halt arch/x86/include/asm/irqflags.h:89 [inline]
     acpi_safe_halt drivers/acpi/processor_idle.c:109 [inline]
     acpi_idle_do_entry drivers/acpi/processor_idle.c:553 [inline]
     acpi_idle_enter+0x258/0x2e0 drivers/acpi/processor_idle.c:688
     cpuidle_enter_state+0x2b4/0x760 drivers/cpuidle/cpuidle.c:237
     cpuidle_enter+0x3c/0x60 drivers/cpuidle/cpuidle.c:351
     call_cpuidle kernel/sched/idle.c:158 [inline]
     cpuidle_idle_call kernel/sched/idle.c:239 [inline]
     do_idle+0x1a3/0x250 kernel/sched/idle.c:306
     cpu_startup_entry+0x15/0x20 kernel/sched/idle.c:403
     secondary_startup_64_no_verify+0xb1/0xbb
    
    read to 0xffff88810798a885 of 1 bytes by interrupt on cpu 0:
     neigh_output include/net/neighbour.h:507 [inline]
     ip_finish_output2+0x79a/0xaa0 net/ipv4/ip_output.c:221
     ip_finish_output+0x3b5/0x510 net/ipv4/ip_output.c:309
     NF_HOOK_COND include/linux/netfilter.h:296 [inline]
     ip_output+0xf3/0x1a0 net/ipv4/ip_output.c:423
     dst_output include/net/dst.h:450 [inline]
     ip_local_out+0x164/0x220 net/ipv4/ip_output.c:126
     __ip_queue_xmit+0x9d3/0xa20 net/ipv4/ip_output.c:525
     ip_queue_xmit+0x34/0x40 net/ipv4/ip_output.c:539
     __tcp_transmit_skb+0x142a/0x1a00 net/ipv4/tcp_output.c:1405
     tcp_transmit_skb net/ipv4/tcp_output.c:1423 [inline]
     tcp_xmit_probe_skb net/ipv4/tcp_output.c:4011 [inline]
     tcp_write_wakeup+0x4a9/0x810 net/ipv4/tcp_output.c:4064
     tcp_send_probe0+0x2c/0x2b0 net/ipv4/tcp_output.c:4079
     tcp_probe_timer net/ipv4/tcp_timer.c:398 [inline]
     tcp_write_timer_handler+0x394/0x520 net/ipv4/tcp_timer.c:626
     tcp_write_timer+0xb9/0x180 net/ipv4/tcp_timer.c:642
     call_timer_fn+0x2e/0x1d0 kernel/time/timer.c:1421
     expire_timers+0x135/0x240 kernel/time/timer.c:1466
     __run_timers+0x368/0x430 kernel/time/timer.c:1734
     run_timer_softirq+0x19/0x30 kernel/time/timer.c:1747
     __do_softirq+0x12c/0x26e kernel/softirq.c:558
     invoke_softirq kernel/softirq.c:432 [inline]
     __irq_exit_rcu kernel/softirq.c:636 [inline]
     irq_exit_rcu+0x4e/0xa0 kernel/softirq.c:648
     sysvec_apic_timer_interrupt+0x69/0x80 arch/x86/kernel/apic/apic.c:1097
     asm_sysvec_apic_timer_interrupt+0x12/0x20
     native_safe_halt arch/x86/include/asm/irqflags.h:51 [inline]
     arch_safe_halt arch/x86/include/asm/irqflags.h:89 [inline]
     acpi_safe_halt drivers/acpi/processor_idle.c:109 [inline]
     acpi_idle_do_entry drivers/acpi/processor_idle.c:553 [inline]
     acpi_idle_enter+0x258/0x2e0 drivers/acpi/processor_idle.c:688
     cpuidle_enter_state+0x2b4/0x760 drivers/cpuidle/cpuidle.c:237
     cpuidle_enter+0x3c/0x60 drivers/cpuidle/cpuidle.c:351
     call_cpuidle kernel/sched/idle.c:158 [inline]
     cpuidle_idle_call kernel/sched/idle.c:239 [inline]
     do_idle+0x1a3/0x250 kernel/sched/idle.c:306
     cpu_startup_entry+0x15/0x20 kernel/sched/idle.c:403
     rest_init+0xee/0x100 init/main.c:734
     arch_call_rest_init+0xa/0xb
     start_kernel+0x5e4/0x669 init/main.c:1142
     secondary_startup_64_no_verify+0xb1/0xbb
    
    value changed: 0x20 -> 0x01
    
    Reported by Kernel Concurrency Sanitizer on:
    CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.0-rc6-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    neebe000 authored and davem330 committed Oct 26, 2021
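
The annotation pattern described in the commit above can be sketched in userspace. This is an illustration under assumptions, not the kernel code: `READ_ONCE()` here is a simplified stand-in built on the GNU `__typeof__` extension, and `struct neigh_model`, `DEMO_STATE_CONNECTED`, and `can_use_cached_header()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified userspace stand-in for the kernel's READ_ONCE(). */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical mask of states in which the cached header may be used. */
#define DEMO_STATE_CONNECTED 0xc2u

/* Hypothetical model of the two fields that are read without a lock. */
struct neigh_model {
	uint8_t nud_state;
	unsigned int hh_len;
};

static bool can_use_cached_header(struct neigh_model *n, bool skip_cache)
{
	assert(n);
	/*
	 * skip_cache is tested first, so neither shared field is read
	 * when the cache has to be bypassed. Each lockless read is
	 * marked so the compiler performs exactly one load and tools
	 * can tell the race is intentional.
	 */
	return !skip_cache &&
	       (READ_ONCE(n->nud_state) & DEMO_STATE_CONNECTED) &&
	       READ_ONCE(n->hh_len);
}
```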
  22. Merge branch 'mlxsw-rif-mac-prefixes'

    Ido Schimmel says:
    
    ====================
    mlxsw: Support multiple RIF MAC prefixes
    
    Currently, mlxsw enforces that all the netdevs used as router interfaces
    (RIFs) have the same MAC prefix (e.g., same 38 MSBs in Spectrum-1).
    Otherwise, an error is returned to user space with extack. This patchset
    relaxes the limitation through the use of RIF MAC profiles.
    
    A RIF MAC profile is a hardware entity that represents a particular MAC
    prefix which multiple RIFs can reference. Therefore, the number of
    possible MAC prefixes is no longer one, but the number of profiles
    supported by the device.
    
    The ability to change the MAC of a particular netdev is useful, for
    example, for users who use the netdev to connect to an upstream provider
    that performs MAC filtering. Currently, such users are either forced to
    negotiate with the provider or change the MAC address of all other
    netdevs so that they share the same prefix.
    
    Patchset overview:
    
    Patches #1-#3 are preparations.
    
    Patch #4 adds actual support for RIF MAC profiles.
    
    Patch #5 exposes RIF MAC profiles as a devlink resource, so that user
    space has visibility into the maximum number of profiles and current
    occupancy. Useful for debugging and testing (next 3 patches).
    
    Patches #6-#8 add both scale and functional tests.
    
    Patch #9 removes tests that validated the previous limitation. It is now
    covered by patch #6 for devices that support a single profile.
    ====================
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    davem330 committed Oct 26, 2021
  23. selftests: mlxsw: Remove deprecated test cases

    After adding the previous patches, the constraint that all the router
    interface MAC addresses have the same prefix is no longer relevant.
    
    Remove the test cases that validated that this constraint is honored.
    
    Signed-off-by: Danielle Ratson <danieller@nvidia.com>
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Danielle Ratson authored and davem330 committed Oct 26, 2021
  24. selftests: Add an occupancy test for RIF MAC profiles

    When all the RIF MAC profiles are in use, test that it is possible to
    change the MAC of a netdev (i.e., a RIF) when its MAC profile is not
    shared with other RIFs. Test that replacement fails when the MAC profile
    is shared.
    
    Signed-off-by: Danielle Ratson <danieller@nvidia.com>
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Danielle Ratson authored and davem330 committed Oct 26, 2021
  25. selftests: mlxsw: Add forwarding test for RIF MAC profiles

    Verify that MAC profile changes are indeed applied and that packets are
    forwarded with the correct source MAC.
    
    Output example:
    
    $ ./rif_mac_profiles.sh
    TEST: h1->h2: new mac profile                                       [ OK ]
    TEST: h2->h1: new mac profile                                       [ OK ]
    TEST: h1->h2: edit mac profile                                      [ OK ]
    TEST: h2->h1: edit mac profile                                      [ OK ]
    
    Signed-off-by: Danielle Ratson <danieller@nvidia.com>
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Danielle Ratson authored and davem330 committed Oct 26, 2021
  26. selftests: mlxsw: Add a scale test for RIF MAC profiles

    Query the maximum number of supported RIF MAC profiles using
    devlink-resource and verify that all available MAC profiles can be utilized
    and that an error is generated when user space tries to exceed this number.
    
    Output example in Spectrum-2:
    
    $ TESTS='rif_mac_profile' ./resource_scale.sh
    TEST: 'rif_mac_profile' 4                                           [ OK ]
    TEST: 'rif_mac_profile' overflow 5                                  [ OK ]
    
    Signed-off-by: Danielle Ratson <danieller@nvidia.com>
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Danielle Ratson authored and davem330 committed Oct 26, 2021
  27. mlxsw: spectrum_router: Expose RIF MAC profiles to devlink resource

    Expose via devlink-resource the maximum number of RIF MAC profiles and
    their current occupancy, so it can be used for debug and writing generic
    tests, like in the next patch.
    
    Example for Spectrum-2 output:
    
    $ devlink resource show pci/0000:06:00.0
    ...
      name rif_mac_profiles size 4 occ 0 unit entry dpipe_tables none
    
    Signed-off-by: Danielle Ratson <danieller@nvidia.com>
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Danielle Ratson authored and davem330 committed Oct 26, 2021
  28. mlxsw: spectrum_router: Add RIF MAC profiles support

    Currently, mlxsw enforces that all the router interfaces (RIFs) have the
    same MAC prefix.
    
    Relax this limitation by using RIF MAC profiles. Each profile is
    associated with a particular MAC prefix and multiple RIFs can use the
    same profile. Therefore, the number of possible MAC prefixes is no
    longer one, but the number of profiles supported by the device.
    
    Store the profiles in an IDR and reference count them according to the
    number of RIFs using them.
    
    Associate a RIF with a profile when the RIF is created and remove the
    association when the RIF is deleted.
    
    Change the association following 'NETDEV_CHANGEADDR' events, except when
    only one RIF is using the profile. In which case, change the MAC prefix
    of the profile itself instead of associating the RIF with a new profile.
    
    Signed-off-by: Danielle Ratson <danieller@nvidia.com>
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Danielle Ratson authored and davem330 committed Oct 26, 2021
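
The reference-counting scheme described in the commit above can be sketched as follows. This is a userspace model under stated assumptions, not the driver's implementation: a fixed array replaces the IDR, `MAX_PROFILES` and `PREFIX_MASK` (38 most significant bits of a 48-bit MAC) are illustrative values, and all function names and error codes are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_PROFILES 4			/* hypothetical device limit */
#define PREFIX_MASK 0xFFFFFFFFFC00ULL	/* assumes the 38 MSBs form the prefix */

struct mac_profile {
	uint64_t prefix;
	unsigned int refcount;		/* 0 means the slot is free */
};

static int profile_find(const struct mac_profile *tbl, uint64_t prefix)
{
	for (int i = 0; i < MAX_PROFILES; i++)
		if (tbl[i].refcount && tbl[i].prefix == prefix)
			return i;
	return -ENOENT;
}

/* Take a reference on the profile matching 'mac', creating it if needed. */
static int profile_get(struct mac_profile *tbl, uint64_t mac)
{
	uint64_t prefix = mac & PREFIX_MASK;
	int id = profile_find(tbl, prefix);

	if (id >= 0) {
		tbl[id].refcount++;
		return id;
	}
	for (int i = 0; i < MAX_PROFILES; i++) {
		if (!tbl[i].refcount) {
			tbl[i].prefix = prefix;
			tbl[i].refcount = 1;
			return i;
		}
	}
	return -ENOSPC;			/* all profiles are in use */
}

static void profile_put(struct mac_profile *tbl, int id)
{
	assert(id >= 0 && id < MAX_PROFILES && tbl[id].refcount > 0);
	tbl[id].refcount--;
}

/* Handle a MAC change for a RIF currently using profile 'old_id'. */
static int rif_change_mac(struct mac_profile *tbl, int old_id, uint64_t new_mac)
{
	uint64_t prefix = new_mac & PREFIX_MASK;
	int new_id;

	assert(old_id >= 0 && old_id < MAX_PROFILES && tbl[old_id].refcount > 0);

	if (tbl[old_id].prefix == prefix)
		return old_id;		/* prefix unchanged */

	/* Sole user and no matching profile: edit the profile in place. */
	if (tbl[old_id].refcount == 1 && profile_find(tbl, prefix) < 0) {
		tbl[old_id].prefix = prefix;
		return old_id;
	}

	new_id = profile_get(tbl, new_mac);
	if (new_id < 0)
		return new_id;
	profile_put(tbl, old_id);
	return new_id;
}
```

Editing in place when a RIF is the only user is what allows a MAC change to succeed even when every profile is already occupied.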
  29. mlxsw: spectrum_router: Propagate extack further

    The next patch will set the MAC profile of a router interface (RIF) as
    part of its configure() callback. The operation can fail in case the
    maximum number of profiles was exceeded.
    
    Add extack to mlxsw_sp_rif_ops::configure() in order to communicate such
    failures to user space.
    
    In addition, the MAC profile of a RIF can change following a
    'NETDEV_CHANGEADDR' notification. Propagate extack to
    mlxsw_sp_router_port_change_event() so that failures could be
    communicated in this path as well.
    
    No functional changes intended.
    
    Signed-off-by: Danielle Ratson <danieller@nvidia.com>
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Danielle Ratson authored and davem330 committed Oct 26, 2021
  30. mlxsw: resources: Add resource identifier for RIF MAC profiles

    Add a resource identifier for maximum RIF MAC profiles so that it could
    be later used to query the information from firmware.
    
    Signed-off-by: Danielle Ratson <danieller@nvidia.com>
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Danielle Ratson authored and davem330 committed Oct 26, 2021
  31. mlxsw: reg: Add MAC profile ID field to RITR register

    Add MAC profile ID field to RITR register so that it could be used for
    associating a RIF with a MAC profile ID by a later patch.
    
    Signed-off-by: Danielle Ratson <danieller@nvidia.com>
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Danielle Ratson authored and davem330 committed Oct 26, 2021
  32. Merge branch 'netfilter-vrf-rework'

    Florian Westphal says:
    
    ====================
    vrf: rework interaction with netfilter/conntrack
    
    V2:
    - fix 'plain integer as null pointer' warning
    - reword commit message in patch 2 to clarify loss of 'ct set untracked'
    
    This patch series aims to solve the to-be-reverted change 09e856d
    ("vrf: Reset skb conntrack connection on VRF rcv") in a different way.
    
    Rather than have skbs pass through conntrack and nat hooks twice, suppress
    conntrack invocation if the conntrack/nat hook is called from the vrf driver.
    
    First patch deals with 'incoming connection' case:
    1. suppress NAT transformations
    2. skip conntrack confirmation
    
    NAT and conntrack confirmation are done when the ip/ipv6 stack calls
    the postrouting hook.
    
    Second patch deals with local packets:
    in vrf driver, mark the skbs as 'untracked', so conntrack output
    hook ignores them.  This skips all nat hooks as well.
    
    Afterwards, remove the untracked state again so the second
    round will pick them up.
    
    One alternative to the chosen implementation would be to add a 'caller
    id' field to 'struct nf_hook_state' and then use that, these patches
    use the more straightforward check of VRF flag on the state->out device.
    
    The two patches apply to both net and net-next. I am targeting -next
    because snat has not worked correctly for so long that I think we can
    take the longer route.  If you disagree, apply to net at your
    discretion.
    
    The patches apply both with 09e856d reverted or still
    in-place, but only with the revert in place ingress conntrack settings
    (zone, notrack etc) start working again.
    
    I've already submitted selftests for vrf+nfqueue and conntrack+vrf.
    ====================
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    davem330 committed Oct 26, 2021
  33. vrf: run conntrack only in context of lower/physdev for locally generated packets
    
    The VRF driver invokes netfilter for output+postrouting hooks so that users
    can create rules that check for 'oif $vrf' rather than lower device name.
    
    This is a problem when NAT rules are configured.
    
    To avoid any conntrack involvement in round 1, tag skbs as 'untracked'
    to prevent conntrack from picking them up.
    
    This gets cleared before the packet gets handed to the ip stack so
    conntrack will be active on the second iteration.
    
    One remaining issue is that a rule like
    
      output ... oif $vrfname notrack
    
    won't propagate to the second round because we can't tell
    'notrack set via ruleset' and 'notrack set by vrf driver' apart.
    However, this isn't a regression: the 'notrack' removal happens
    instead of unconditional nf_reset_ct().
    I'd also like to avoid leaking more vrf specific conditionals into the
    netfilter infra.
    
    For ingress, conntrack has already been done before the packet makes it
    to the vrf driver. With this patch, egress does connection tracking with
    the lower/physical device as well.
    
    Signed-off-by: Florian Westphal <fw@strlen.de>
    Acked-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Florian Westphal authored and davem330 committed Oct 26, 2021