Eric-Dumazet/t…

Commits on Oct 14, 2021

  1. tcp: switch orphan_count to bare per-cpu counters

    Use of percpu_counter structure to track count of orphaned
    sockets is causing problems on modern hosts with 256 cpus
    or more.
    
    Stefan Bach reported a serious spinlock contention in real workloads,
    that I was able to reproduce with a netfilter rule dropping
    incoming FIN packets.
    
        53.56%  server  [kernel.kallsyms]      [k] queued_spin_lock_slowpath
                |
                ---queued_spin_lock_slowpath
                   |
                    --53.51%--_raw_spin_lock_irqsave
                              |
                               --53.51%--__percpu_counter_sum
                                         tcp_check_oom
                                         |
                                         |--39.03%--__tcp_close
                                         |          tcp_close
                                         |          inet_release
                                         |          inet6_release
                                         |          sock_close
                                         |          __fput
                                         |          ____fput
                                         |          task_work_run
                                         |          exit_to_usermode_loop
                                         |          do_syscall_64
                                         |
                                          --14.48%--tcp_out_of_resources
                                                    tcp_write_timeout
                                                    tcp_retransmit_timer
                                                    tcp_write_timer_handler
                                                    tcp_write_timer
                                                    call_timer_fn
                                                    expire_timers
                                                    __run_timers
                                                    run_timer_softirq
                                                    __softirqentry_text_start
    
    As explained in commit cf86a08 ("net/dst: use a smaller percpu_counter
    batch for dst entries accounting"), default batch size is too big
    for the default value of tcp_max_orphans (262144).
    
    But even if we reduce batch sizes, there would still be cases
    where the estimated count of orphans is beyond the limit,
    and where tcp_too_many_orphans() has to call the expensive
    percpu_counter_sum_positive().
    
    One solution is to use plain per-cpu counters, and have
    a timer to periodically refresh this cache.
    
    Updating this cache every 100ms seems about right: tcp pressure
    state is not radically changing over shorter periods.
    
    percpu_counter was nice 15 years ago, when hosts had fewer
    than 16 cpus; not anymore by current standards.
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reported-by: Stefan Bach <sfb@google.com>
    Cc: Neal Cardwell <ncardwell@google.com>
    neebe000 authored and intel-lab-lkp committed Oct 14, 2021
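
The approach described in the commit above (plain per-cpu counters plus a periodically refreshed cache) can be sketched in user-space C. This is only an illustrative model, not the kernel patch: `NR_CPUS`, `orphan_percpu`, `orphan_cache`, and every function name below are assumptions made for the example, and the periodic timer is replaced by an explicit call to `orphan_cache_refresh()`.

```c
#include <assert.h>

/* Hypothetical model: one array slot stands in for each CPU's counter. */
#define NR_CPUS 256

static int orphan_percpu[NR_CPUS]; /* plain per-cpu counters, no shared lock */
static int orphan_cache;           /* last computed sum, read on hot paths */

/* Fast path: only the caller's own slot is touched. */
static void orphan_count_add(int cpu, int delta)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    orphan_percpu[cpu] += delta;
}

/*
 * Slow path: walk every slot and store the total. In the commit message
 * this work is done from a timer roughly every 100ms; here the caller
 * invokes it by hand.
 */
static void orphan_cache_refresh(void)
{
    int sum = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += orphan_percpu[cpu];

    /* A single slot may be negative (inc on one CPU, dec on another). */
    orphan_cache = sum > 0 ? sum : 0;
}

/* Limit check reads the cached value only; it never walks the slots. */
static int too_many_orphans(int max_orphans)
{
    return orphan_cache > max_orphans;
}
```

The trade-off in this sketch is that the limit check may see a value that is stale by up to one refresh period, in exchange for never taking a lock or summing all slots on the hot path.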

Commits on Oct 13, 2021

  1. net: dsa: unregister cross-chip notifier after ds->ops->teardown

    To be symmetric with the error unwind path of dsa_switch_setup(), call
    dsa_switch_unregister_notifier() after ds->ops->teardown.
    
    The implication is that ds->ops->teardown cannot emit cross-chip
    notifiers. For example, currently the dsa_tag_8021q_unregister() call
    from sja1105_teardown() does not propagate to the entire tree due to
    this reason. However, I cannot find an actual issue caused by this;
    it was observed through code inspection.
    
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Link: https://lore.kernel.org/r/20211012123735.2545742-1-vladimir.oltean@nxp.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    vladimiroltean authored and Jakub Kicinski committed Oct 13, 2021
  2. marvell: octeontx2: build error: unknown type name 'u64'

    When building an arm64 allmodconfig kernel, the following build error
    shows up:
    
    In file included from drivers/crypto/marvell/octeontx2/cn10k_cpt.c:4:
    include/linux/soc/marvell/octeontx2/asm.h:38:15: error: unknown type name 'u64'
       38 | static inline u64 otx2_atomic64_fetch_add(u64 incr, u64 *ptr)
          |               ^~~
    
    Include linux/types.h in asm.h so the compiler knows what the type
    'u64' is.
    
    Fixes: af3826d ("octeontx2-pf: Use hardware register for CQE count")
    Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
    Link: https://lore.kernel.org/r/20211013135743.3826594-1-anders.roxell@linaro.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    roxell authored and Jakub Kicinski committed Oct 13, 2021
  3. net: remove single-byte netdev->dev_addr writes

    Make the drivers which use single-byte netdev addresses
    (netdev->addr_len == 1) use the appropriate address setting
    helpers.
    
    arcnet copies from int variables and io reads a lot, so
    add a helper for arcnet drivers to use.
    
    Similar helper could be reused for phonet and appletalk
    but there isn't any good central location where we could
    put it, and netdevice.h is already very crowded.
    
    Acked-by: Sebastian Reichel <sebastian.reichel@collabora.com> # for HSI
    Link: https://lore.kernel.org/r/20211012142757.4124842-1-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Jakub Kicinski committed Oct 13, 2021
  4. Merge branch 'net-use-dev_addr_set-in-hamradio-and-ip-tunnels'

    Jakub Kicinski says:
    
    ====================
    net: use dev_addr_set() in hamradio and ip tunnels
    
    Commit 406f42f ("net-next: When a bond have a massive amount
    of VLANs...") introduced a rbtree for faster Ethernet address look
    up. To maintain netdev->dev_addr in this tree we need to make all
    the writes to it go through appropriate helpers.
    ====================
    
    Link: https://lore.kernel.org/r/20211012160634.4152690-1-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Jakub Kicinski committed Oct 13, 2021
  5. ip: use dev_addr_set() in tunnels

    Use dev_addr_set() instead of writing to netdev->dev_addr
    directly in ip tunnels drivers.
    
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Jakub Kicinski committed Oct 13, 2021
  6. hamradio: use dev_addr_set() for setting device address

    Use dev_addr_set() instead of writing to netdev->dev_addr
    directly in hamradio drivers.
    
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Jakub Kicinski committed Oct 13, 2021
  7. netdevice: demote the type of some dev_addr_set() helpers

    __dev_addr_set() and dev_addr_mod() are pretty low level,
    let the arguments be void, there's no chance for confusion
    in callers converted to use them. Keep u8 in dev_addr_set()
    because some of the callers are converted from a loop
    and we want to make sure assignments are not from an array
    of a different type.
    
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Jakub Kicinski committed Oct 13, 2021
  8. Merge branch 'net-constify-dev_addr-passing-for-protocols'

    Jakub Kicinski says:
    
    ====================
    net: constify dev_addr passing for protocols
    
    Commit 406f42f ("net-next: When a bond have a massive amount
    of VLANs...") introduced a rbtree for faster Ethernet address look
    up. To maintain netdev->dev_addr in this tree we need to make all
    the writes to it go through appropriate helpers.
    
    netdev->dev_addr will be made const to prevent direct writes.
    This set sprinkles const across variables and arguments in protocol
    code which are used to hold references to netdev->dev_addr.
    ====================
    
    Link: https://lore.kernel.org/r/20211012155840.4151590-1-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Jakub Kicinski committed Oct 13, 2021
  9. decnet: constify dev_addr passing

    In preparation for netdev->dev_addr being constant
    make all relevant arguments in decnet constant.
    
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Jakub Kicinski committed Oct 13, 2021
  10. tipc: constify dev_addr passing

    In preparation for netdev->dev_addr being constant
    make all relevant arguments in tipc constant.
    
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Jakub Kicinski committed Oct 13, 2021
  11. ipv6: constify dev_addr passing

    In preparation for netdev->dev_addr being constant
    make all relevant arguments in ndisc constant.
    
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Jakub Kicinski committed Oct 13, 2021
  12. llc/snap: constify dev_addr passing

    In preparation for netdev->dev_addr being constant
    make all relevant arguments in LLC and SNAP constant.
    
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Jakub Kicinski committed Oct 13, 2021
  13. rose: constify dev_addr passing

    In preparation for netdev->dev_addr being constant
    make all relevant arguments in rose constant.
    
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Jakub Kicinski committed Oct 13, 2021
  14. ax25: constify dev_addr passing

    In preparation for netdev->dev_addr being constant
    make all relevant arguments in AX25 constant.
    
    Modify callers as well (netrom, rose).
    
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Jakub Kicinski committed Oct 13, 2021
  15. Merge branch 'add-functional-support-for-gigabit-ethernet-driver'

    Biju Das says:
    
    ====================
    Add functional support for Gigabit Ethernet driver
    
    The DMAC and EMAC blocks of Gigabit Ethernet IP found on RZ/G2L SoC are
    similar to the R-Car Ethernet AVB IP.
    
    The Gigabit Ethernet IP consists of Ethernet controller (E-MAC), Internal
    TCP/IP Offload Engine (TOE) and Dedicated Direct memory access controller
    (DMAC).
    
    With a few changes in the driver we can support both IPs.
    
    This patch series aims to add functional support for Gigabit Ethernet
    driver by filling all the stubs except set_features.
    
    The set_feature patch will be sent as a separate RFC patch along with the rx_checksum
    patch, as it needs further discussion related to HW checksum.
    
    With this series, we can boot the kernel with rootFS mounted on NFS on
    RZ/G2L platforms.
    ====================
    
    Link: https://lore.kernel.org/r/20211012163613.30030-1-biju.das.jz@bp.renesas.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Jakub Kicinski committed Oct 13, 2021
  16. ravb: Fix typo AVB->DMAC

    Fix the typo AVB->DMAC in comment, as the code following the comment
    is for DMAC on Gigabit Ethernet IP.
    
    Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
    Suggested-by: Sergey Shtylyov <s.shtylyov@omp.ru>
    Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Biju Das authored and Jakub Kicinski committed Oct 13, 2021
  17. ravb: Update ravb_emac_init_gbeth()

    This patch enables Receive/Transmit port of TOE and removes
    the setting of promiscuous bit from EMAC configuration mode register.
    
    This patch also updates EMAC configuration mode comment from
    "PAUSE prohibition" to "EMAC Mode: PAUSE prohibition; Duplex; TX;
    RX; CRC Pass Through".
    
    Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
    Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Biju Das authored and Jakub Kicinski committed Oct 13, 2021
  18. ravb: Document PFRI register bit

    Document PFRI register bit, as it is documented on R-Car Gen3 and
    RZ/G2L hardware manuals.
    
    Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
    Suggested-by: Sergey Shtylyov <s.shtylyov@omp.ru>
    Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Biju Das authored and Jakub Kicinski committed Oct 13, 2021
  19. ravb: Rename "nc_queue" feature bit

    Rename the feature bit "nc_queue" to "nc_queues" as AVB DMAC has
    RX and TX NC queues.
    
    There is no functional change.
    
    Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
    Suggested-by: Sergey Shtylyov <s.shtylyov@omp.ru>
    Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Biju Das authored and Jakub Kicinski committed Oct 13, 2021
  20. ravb: Optimize ravb_emac_init_gbeth function

    Optimize CXR31 register initialization on ravb_emac_init_gbeth
    function.
    
    Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
    Suggested-by: Sergey Shtylyov <s.shtylyov@omp.ru>
    Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Biju Das authored and Jakub Kicinski committed Oct 13, 2021
  21. ravb: Rename "tsrq" variable

    Rename the variable "tsrq" to "tccr_mask" as we are passing
    TCCR mask to the ravb_wait() function.
    
    There is no functional change.
    
    Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
    Suggested-by: Sergey Shtylyov <s.shtylyov@omp.ru>
    Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Biju Das authored and Jakub Kicinski committed Oct 13, 2021
  22. ravb: Add support to retrieve stats for GbEthernet

    Add support for retrieving stats information for GbEthernet.
    
    Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
    Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
    Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Biju Das authored and Jakub Kicinski committed Oct 13, 2021
  23. ravb: Add carrier_counters to struct ravb_hw_info

    RZ/G2L E-MAC supports carrier counters.
    Add a carrier_counter hw feature bit to struct ravb_hw_info
    to add this feature only for RZ/G2L.
    
    Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
    Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Biju Das authored and Jakub Kicinski committed Oct 13, 2021
  24. ravb: Fillup ravb_rx_gbeth() stub

    Fillup ravb_rx_gbeth() function to support RZ/G2L.
    
    This patch also renames ravb_rcar_rx to ravb_rx_rcar to be
    consistent with the naming convention used in sh_eth driver.
    
    Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
    Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
    Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Biju Das authored and Jakub Kicinski committed Oct 13, 2021
  25. ravb: Fillup ravb_rx_ring_format_gbeth() stub

    Fillup ravb_rx_ring_format_gbeth() function to support RZ/G2L.
    
    This patch also renames ravb_rx_ring_format to ravb_rx_ring_format_rcar
    to be consistent with the naming convention used in sh_eth driver.
    
    Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
    Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
    Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Biju Das authored and Jakub Kicinski committed Oct 13, 2021
  26. ravb: Fillup ravb_rx_ring_free_gbeth() stub

    Fillup ravb_rx_ring_free_gbeth() function to support RZ/G2L.
    
    This patch also renames ravb_rx_ring_free to ravb_rx_ring_free_rcar
    to be consistent with the naming convention used in sh_eth driver.
    
    Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
    Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
    Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Biju Das authored and Jakub Kicinski committed Oct 13, 2021
  27. ravb: Fillup ravb_alloc_rx_desc_gbeth() stub

    Fillup ravb_alloc_rx_desc_gbeth() function to support RZ/G2L.
    
    This patch also renames ravb_alloc_rx_desc to ravb_alloc_rx_desc_rcar
    to be consistent with the naming convention used in sh_eth driver.
    
    Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
    Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
    Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Biju Das authored and Jakub Kicinski committed Oct 13, 2021
  28. ravb: Add rx_max_buf_size to struct ravb_hw_info

    R-Car AVB-DMAC has maximum 2K size on RX buffer, whereas on RZ/G2L
    it is 8K. We need to allow for changing the MTU within the limit
    of the maximum size of a descriptor.
    
    Add a rx_max_buf_size variable to struct ravb_hw_info to handle
    this difference.
    
    Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
    Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
    Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Biju Das authored and Jakub Kicinski committed Oct 13, 2021
  29. ravb: Use ALIGN macro for max_rx_len

    Use ALIGN macro for calculating the value for max_rx_len.
    
    Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
    Suggested-by: Sergey Shtylyov <s.shtylyov@omp.ru>
    Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Biju Das authored and Jakub Kicinski committed Oct 13, 2021
  30. net: qed_debug: fix check of false (grc_param < 0) expression

    The type of enum dbg_grc_params has the enumerator list starting from 0.
    When grc_param is declared by enum dbg_grc_params, (grc_param < 0) is
    always false.  We should remove the check of this expression.
    
    Signed-off-by: Jean Sacren <sakiwit@gmail.com>
    Acked-by: Shai Malin <smalin@marvell.com>
    Link: https://lore.kernel.org/r/20211012074645.12864-1-sakiwit@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    sacren authored and Jakub Kicinski committed Oct 13, 2021
  31. net: enetc: include ip6_checksum.h for csum_ipv6_magic

    For those architectures which do not define _HAVE_ARCH_IPV6_CSUM, we need
    to include ip6_checksum.h which provides the csum_ipv6_magic() function.
    
    Fixes: fb8629e ("net: enetc: add support for software TSO")
    Reported-by: kernel test robot <lkp@intel.com>
    Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
    Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Link: https://lore.kernel.org/r/20211012121358.16641-1-ioana.ciornei@nxp.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    IoanaCiornei authored and Jakub Kicinski committed Oct 13, 2021
  32. ionic: no devlink_unregister if not registered

    Don't try to unregister the devlink if it hasn't been registered
    yet.  This bit of error cleanup code got missed in the recent
    devlink registration changes.
    
    Fixes: 7911c8b ("ionic: Move devlink registration to be last devlink command")
    Signed-off-by: Shannon Nelson <snelson@pensando.io>
    Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
    Link: https://lore.kernel.org/r/20211012231520.72582-1-snelson@pensando.io
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    emusln authored and Jakub Kicinski committed Oct 13, 2021
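
The ionic fix above concerns an error path that tried to undo a registration that had not yet happened. A minimal user-space sketch of that general pattern follows. It is not the ionic driver code: `struct fake_dev` and all `fake_*` functions are hypothetical stand-ins, with registration placed last in the probe sequence, as the title of the commit named in the Fixes: tag describes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device state; these are not real ionic or devlink symbols. */
struct fake_dev {
    bool resources_ready;
    bool registered;
};

static void fake_alloc(struct fake_dev *d)      { d->resources_ready = true; }
static void fake_free(struct fake_dev *d)       { d->resources_ready = false; }
static int  fake_configure(bool fail)           { return fail ? -1 : 0; }
static void fake_register(struct fake_dev *d)   { d->registered = true; }

/* Unregistering something that was never registered is treated as a bug. */
static void fake_unregister(struct fake_dev *d)
{
    assert(d->registered);
    d->registered = false;
}

/* Probe: registration is the final step, after everything that can fail. */
static int fake_probe(struct fake_dev *d, bool fail_configure)
{
    int err;

    fake_alloc(d);

    err = fake_configure(fail_configure);
    if (err)
        goto err_free;

    fake_register(d);
    return 0;

err_free:
    /* No fake_unregister() here: registration has not run on this path. */
    fake_free(d);
    return err;
}

/* Remove: undo the probe steps in reverse order. */
static void fake_remove(struct fake_dev *d)
{
    fake_unregister(d);
    fake_free(d);
}
```

Because registration is the last step in this sketch, every error label only has to release what was set up before it, and the unregister call belongs to the normal remove path alone.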

Commits on Oct 12, 2021

  1. Merge branch 'devlink-reload-simplification'

    Leon Romanovsky says:
    
    ====================
    devlink reload simplification
    
    Simplify devlink reload APIs.
    ====================
    
    Link: https://lore.kernel.org/r/cover.1634044267.git.leonro@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Jakub Kicinski committed Oct 12, 2021
  2. devlink: Delete reload enable/disable interface

    Commit a0c7634 ("devlink: disallow reload operation during device
    cleanup") added devlink_reload_{enable,disable}() APIs to prevent reload
    operation from racing with device probe/dismantle.
    
    After recent changes to move devlink_register() to the end of device
    probe and devlink_unregister() to the beginning of device dismantle,
    these races can no longer happen. Reload operations will be denied if
    the devlink instance is unregistered and devlink_unregister() will block
    until all in-flight operations are done.
    
    Therefore, remove these devlink_reload_{enable,disable}() APIs.
    
    Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Leon Romanovsky authored and Jakub Kicinski committed Oct 12, 2021