
Commits on Jul 13, 2021

  1. tools/bpf/bpftool: xlated dump from ELF file directly

    bpftool can dump an xlated or jitted representation
    of the programs already loaded into the kernel.
    That capability is very useful for understanding which
    instructions the kernel will execute for that program.
    
    However, sometimes the verifier does not load the program and
    one cannot use this feature until changes are made to make the
    verifier happy again.
    
    This patch reuses the same dump function to dump the program
    from an ELF file directly instead of loading the instructions
    from a loaded file descriptor. In this way, the user
    can use all the bpftool features for "xlated" without loading.
    
    In particular, the "visual" command is very useful when combined
    with this because the dot graph makes it easy to spot bad
    instruction sequences.
    
    Usage:
    
      bpftool prog dump xlated elf program.o
    
    It also works with other commands such as 'visual' to print
    a dot representation of the program.
    
      bpftool prog dump xlated elf program.o visual
    
    Signed-off-by: Lorenzo Fontana <fontanalorenz@gmail.com>
    fntlnz authored and intel-lab-lkp committed Jul 13, 2021
  2. tools/lib/bpf: bpf_program__insns allow to retrieve insns in libbpf

    This allows consumers of libbpf to iterate through the insns
    of a program directly after ELF parsing, without loading it first.
    
    Being able to do that is useful to create tooling that can show
    the structure of a BPF program using libbpf without having to
    parse the ELF separately.
    
    Usage:
      struct bpf_insn *insn;
      insn = bpf_program__insns(prog);
    
    Signed-off-by: Lorenzo Fontana <fontanalorenz@gmail.com>
    fntlnz authored and intel-lab-lkp committed Jul 13, 2021
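
A minimal sketch of the kind of tooling the bpf_program__insns commit above has in mind: walking an instruction array and tallying instructions per class. It assumes the caller already holds a pointer to the instructions and knows how many there are (the commit message does not say how the count is obtained, so it is a plain parameter here). `struct insn`, `insn_class` and `count_classes` are illustrative names, not libbpf API; the struct only mirrors the 8-byte layout of the kernel's `struct bpf_insn` so that the sketch compiles without kernel headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the kernel's struct bpf_insn layout (8 bytes per
 * instruction). Defined here only so the sketch is self-contained. */
struct insn {
	uint8_t code;        /* opcode */
	uint8_t dst_reg : 4; /* destination register */
	uint8_t src_reg : 4; /* source register */
	int16_t off;         /* signed offset */
	int32_t imm;         /* signed immediate */
};

/* The low three bits of the opcode select the instruction class
 * (for example 0x05 is JMP and 0x07 is ALU64). */
static unsigned int insn_class(const struct insn *insn)
{
	return insn->code & 0x07;
}

/* Tally instructions per class; counts[] must have 8 entries. */
static void count_classes(const struct insn *insns, size_t cnt,
			  unsigned int counts[8])
{
	for (size_t i = 0; i < 8; i++)
		counts[i] = 0;
	for (size_t i = 0; i < cnt; i++)
		counts[insn_class(&insns[i])]++;
}
```

With the two-instruction program "r0 = 0; exit" (opcodes 0xb7 and 0x95), the tally shows one ALU64 and one JMP instruction.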

Commits on Jul 12, 2021

  1. libbpf: Fix reuse of pinned map on older kernel

    When loading a BPF program with a pinned map, the loader checks whether
    the pinned map can be reused, i.e. whether its properties match. To
    derive the properties of the pinned map, the loader invokes
    BPF_OBJ_GET_INFO_BY_FD and then does the comparison.
    
    Unfortunately, on < 4.12 kernels the BPF_OBJ_GET_INFO_BY_FD is not
    available, so loading the program fails with the following error:
    
    	libbpf: failed to get map info for map FD 5: Invalid argument
    	libbpf: couldn't reuse pinned map at
    		'/sys/fs/bpf/tc/globals/cilium_call_policy': parameter
    		mismatch"
    	libbpf: map 'cilium_call_policy': error reusing pinned map
    	libbpf: map 'cilium_call_policy': failed to create:
    		Invalid argument(-22)
    	libbpf: failed to load object 'bpf_overlay.o'
    
    To fix this, fall back to deriving the map properties from
    /proc/$PID/fdinfo/$MAP_FD if BPF_OBJ_GET_INFO_BY_FD fails with EINVAL,
    which can be used as an indicator that the kernel doesn't support
    the latter.
    
    Signed-off-by: Martynas Pumputis <m@lambda.lt>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Link: https://lore.kernel.org/bpf/20210712125552.58705-1-m@lambda.lt
    brb authored and anakryiko committed Jul 12, 2021
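
The fdinfo fallback described in the pinned-map commit above can be sketched as a small text parser. This is not the libbpf implementation: `struct map_props` and `parse_map_fdinfo` are hypothetical names, the parser works on an in-memory string rather than opening /proc/$PID/fdinfo/$MAP_FD, and the field names (`map_type`, `key_size`, `value_size`, `max_entries`, `map_flags`) are assumed to match what the kernel prints for BPF map file descriptors.

```c
#include <stdio.h>
#include <string.h>

/* Map properties that a loader would compare before reusing a pinned map. */
struct map_props {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
	unsigned int map_flags;
};

/* Parse fdinfo-style "name:<tab>value" lines from text into *out.
 * Unknown lines are skipped. Returns the number of recognised fields. */
static int parse_map_fdinfo(const char *text, struct map_props *out)
{
	const char *line = text;
	int found = 0;

	memset(out, 0, sizeof(*out));
	while (line && *line) {
		unsigned int val;

		if (sscanf(line, "map_type: %u", &val) == 1) {
			out->type = val;
			found++;
		} else if (sscanf(line, "key_size: %u", &val) == 1) {
			out->key_size = val;
			found++;
		} else if (sscanf(line, "value_size: %u", &val) == 1) {
			out->value_size = val;
			found++;
		} else if (sscanf(line, "max_entries: %u", &val) == 1) {
			out->max_entries = val;
			found++;
		} else if (sscanf(line, "map_flags: %x", &val) == 1) {
			/* assumed to be printed in hex, e.g. 0x1 */
			out->map_flags = val;
			found++;
		}

		/* advance to the start of the next line, if any */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return found;
}
```

A caller would treat fewer than five recognised fields as a failure to derive the properties, and otherwise compare them against the map definition from the object file.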

Commits on Jul 8, 2021

  1. samples/bpf: xdp_redirect_cpu_user: Cpumap qsize set larger default

    Experience from production shows that a queue size of 192 is too small,
    as it caused packet drops during cpumap-enqueue on the RX-CPU. This can
    be diagnosed with the xdp_monitor sample program.
    
    This bpftrace program was used to diagnose the problem in more detail:
    
     bpftrace -e '
      tracepoint:xdp:xdp_cpumap_kthread { @deq_bulk = lhist(args->processed,0,10,1); @drop_net = lhist(args->drops,0,10,1) }
      tracepoint:xdp:xdp_cpumap_enqueue { @enq_bulk = lhist(args->processed,0,10,1); @enq_drops = lhist(args->drops,0,10,1); }'
    
    Watch out for the @enq_drops counter. The @drop_net counter can
    increase when the netstack gets invalid packets, so don't despair, it
    can be natural. That counter will likely disappear in newer kernels as
    it was a source of confusion (look at netstat info for the reason
    behind the netstack @drop_net counters).
    
    The production system was configured with CPU power-saving C6 state.
    Learn more in this blogpost[1].
    
    And wakeup latency in usec for the states are:
    
     # grep -H . /sys/devices/system/cpu/cpu0/cpuidle/*/latency
     /sys/devices/system/cpu/cpu0/cpuidle/state0/latency:0
     /sys/devices/system/cpu/cpu0/cpuidle/state1/latency:2
     /sys/devices/system/cpu/cpu0/cpuidle/state2/latency:10
     /sys/devices/system/cpu/cpu0/cpuidle/state3/latency:133
    
    The deepest state takes 133 usec to wake up from (133/10^6 sec). The
    link speed is 25Gbit/s ((25*10^9/8) in bytes/sec). How many bytes can
    arrive within 133 usec at this speed: (25*10^9/8)*(133/10^6) = 415625
    bytes. With MTU size packets this is 275 packets, and with minimum
    Ethernet frames (84 bytes incl. inter-frame gap overhead) it is 4948
    packets. Clearly the default queue size is too small.
    
    Set the default cpumap queue size to 2048: the worst case (small
    packets) at 10Gbit/s is 1979 packets within the 133 usec wakeup time,
    plus 64 packets before the kthread wakeup call (due to xdp_do_flush),
    giving a worst case of 2043 packets.
    
    Thus, a packet burst on the RX-CPU that enqueues packets to a remote
    cpumap CPU in deep-sleep state can overrun the cpumap queue.
    
    The production system was also configured to avoid deep-sleep via:
     tuned-adm profile network-latency
    
    [1] https://jeremyeder.com/2013/08/30/oh-did-you-expect-the-cpu/
    
    Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Song Liu <songliubraving@fb.com>
    Link: https://lore.kernel.org/bpf/162523477604.786243.13372630844944530891.stgit@firesoul
    netoptimizer authored and Alexei Starovoitov committed Jul 8, 2021
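
The queue-size arithmetic in the cpumap qsize commit above can be reproduced with a few lines of integer math. The helper names below are illustrative and not part of the sample program. Rounding to the nearest packet is an assumption, chosen because it reproduces the 4948 and 1979 figures quoted in the commit message.

```c
#include <stdint.h>

/* Bytes that can arrive on a link of link_bps bits/sec during
 * latency_usec microseconds. */
static uint64_t bytes_in_window(uint64_t link_bps, uint64_t latency_usec)
{
	return link_bps / 8 * latency_usec / 1000000;
}

/* Packets of wire_bytes each that fit into that window, rounded to
 * the nearest whole packet. */
static uint64_t packets_in_window(uint64_t link_bps, uint64_t latency_usec,
				  uint64_t wire_bytes)
{
	uint64_t bytes = bytes_in_window(link_bps, latency_usec);

	return (bytes + wire_bytes / 2) / wire_bytes;
}
```

For example, `packets_in_window(10000000000ULL, 133, 84)` plus the 64 packets that can be queued before the kthread wakeup gives the 2043-packet worst case that motivates the 2048 default.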
  2. Merge branch 'Generic XDP improvements'

    Kumar Kartikeya says:
    
    ====================
    
    This small series makes some improvements to generic XDP mode and brings it
    closer to native XDP. Patch 1 splits out generic XDP processing into reusable
    parts, patch 2 adds pointer-friendly wrappers for bitops (so the address of a
    local pointer need not be cast back and forth to unsigned long *), patch 3
    implements generic cpumap support (details in commit) and patch 4 allows
    devmap bpf prog execution before generic_xdp_tx is called.
    
    Patch 5 just updates a couple of selftests to adapt to changes in behavior (in
    that specifying devmap/cpumap prog fd in generic mode is now allowed).
    
    Changelog:
    ----------
    v5 -> v6
    v5: https://lore.kernel.org/bpf/20210701002759.381983-1-memxor@gmail.com
     * Put rcpu->prog check before RCU-bh section to avoid do_softirq (Jesper)
    
    v4 -> v5
    v4: https://lore.kernel.org/bpf/20210628114746.129669-1-memxor@gmail.com
     * Add comments and examples for new bitops macros (Alexei)
    
    v3 -> v4
    v3: https://lore.kernel.org/bpf/20210622202835.1151230-1-memxor@gmail.com
     * Add detach now that attach of XDP program succeeds (Toke)
     * Clean up the test to use new ASSERT macros
    
    v2 -> v3
    v2: https://lore.kernel.org/bpf/20210622195527.1110497-1-memxor@gmail.com
     * list_for_each_entry -> list_for_each_entry_safe (due to deletion of skb)
    
    v1 -> v2
    v1: https://lore.kernel.org/bpf/20210620233200.855534-1-memxor@gmail.com
     * Move __ptr_{set,clear,test}_bit to bitops.h (Toke)
       Also changed argument order to match the bit op they wrap.
     * Remove map value size checking functions for cpumap/devmap (Toke)
     * Rework prog run for skb in cpu_map_kthread_run (Toke)
     * Set skb->dev to dst->dev after devmap prog has run
     * Don't set xdp rxq that will be overwritten in cpumap prog run
    ====================
    
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Alexei Starovoitov committed Jul 8, 2021
  3. bpf: Tidy xdp attach selftests

    Support for cpumap and devmap entry progs in previous commits means the
    test needs to be updated for the new semantics. Also take this
    opportunity to convert it from CHECK macros to the new ASSERT macros.
    
    Since xdp_cpumap_attach has no subtest, put the sole test inside the
    test_xdp_cpumap_attach function.
    
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Link: https://lore.kernel.org/bpf/20210702111825.491065-6-memxor@gmail.com
    kkdwivedi authored and Alexei Starovoitov committed Jul 8, 2021
  4. bpf: devmap: Implement devmap prog execution for generic XDP

    This lifts the restriction on running devmap BPF progs in generic
    redirect mode. To match native XDP behavior, it is invoked right before
    generic_xdp_tx is called, and only supports XDP_PASS/XDP_ABORTED/
    XDP_DROP actions.
    
    We also return 0 even if devmap program drops the packet, as
    semantically redirect has already succeeded and the devmap prog is the
    last point before TX of the packet to device where it can deliver a
    verdict on the packet.
    
    This also means it must take care of freeing the skb, as
    xdp_do_generic_redirect callers only do that in case an error is
    returned.
    
    Since devmap entry prog is supported, remove the check in
    generic_xdp_install entirely.
    
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Link: https://lore.kernel.org/bpf/20210702111825.491065-5-memxor@gmail.com
    kkdwivedi authored and Alexei Starovoitov committed Jul 8, 2021
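
The return-value contract described in the devmap commit above (report success even when the devmap program drops the packet, and free the skb in that case) can be illustrated with a small userspace model. Everything here is hypothetical scaffolding: `struct skb_sketch`, `enum act` and `generic_devmap_redirect` are not kernel symbols, the enum only mirrors the XDP action values, and treating unsupported actions like a drop is an assumption of this sketch rather than something the commit message states.

```c
#include <stdbool.h>

/* Local stand-ins for the XDP action codes (same numeric order). */
enum act { ACT_ABORTED, ACT_DROP, ACT_PASS, ACT_TX, ACT_REDIRECT };

/* Minimal model of a packet buffer: we only track what happened to it. */
struct skb_sketch {
	bool freed;
	bool transmitted;
};

/* Model of the generic redirect path once the devmap program has
 * delivered its verdict. */
static int generic_devmap_redirect(struct skb_sketch *skb, enum act verdict)
{
	switch (verdict) {
	case ACT_PASS:
		/* hand the packet to the TX path */
		skb->transmitted = true;
		break;
	default:
		/* unsupported action: handled like a drop (assumption) */
	case ACT_ABORTED:
	case ACT_DROP:
		/* callers only free the skb on error, so free it here */
		skb->freed = true;
		break;
	}

	/* the redirect itself succeeded, whatever the verdict was */
	return 0;
}
```

The point of the design is that a drop by the devmap program is a verdict, not a redirect failure, so the caller sees 0 and must not touch the skb afterwards.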
  5. bpf: cpumap: Implement generic cpumap

    This change implements CPUMAP redirect support for generic XDP programs.
    The idea is to reuse the cpu map entry's queue that is used to push
    native xdp frames for redirecting skb to a different CPU. This will
    match native XDP behavior (in that RPS is invoked again for packet
    reinjected into networking stack).
    
    To be able to determine whether the incoming skb is from the driver or
    cpumap, we reuse skb->redirected bit that skips generic XDP processing
    when it is set. To always make use of this, CONFIG_NET_REDIRECT guard on
    it has been lifted and it is always available.
    
    From the redirect side, we add the skb to ptr_ring with its lowest bit
    set to 1.  This should be safe as skb is not 1-byte aligned. This allows
    kthread to discern between xdp_frames and sk_buff. On consumption of the
    ptr_ring item, the lowest bit is unset.
    
    In the end, the skb is simply added to the list that kthread is anyway
    going to maintain for xdp_frames converted to skb, and then received
    again by using netif_receive_skb_list.
    
    Bulking optimization for generic cpumap is left as an exercise for a
    future patch for now.
    
    Since cpumap entry progs are now supported, also remove check in
    generic_xdp_install for the cpumap.
    
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Link: https://lore.kernel.org/bpf/20210702111825.491065-4-memxor@gmail.com
    kkdwivedi authored and Alexei Starovoitov committed Jul 8, 2021
  6. bitops: Add non-atomic bitops for pointers

    cpumap needs to set, clear, and test the lowest bit in skb pointer in
    various places. To make these checks less noisy, add pointer friendly
    bitop macros that also do some typechecking to sanitize the argument.
    
    These wrap the non-atomic bitops __set_bit, __clear_bit, and test_bit
    but for pointer arguments. Pointer's address has to be passed in and it
    is treated as an unsigned long *, since width and representation of
    pointer and unsigned long match on targets Linux supports. They are
    prefixed with double underscore to indicate lack of atomicity.
    
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Link: https://lore.kernel.org/bpf/20210702111825.491065-3-memxor@gmail.com
    kkdwivedi authored and Alexei Starovoitov committed Jul 8, 2021
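
The low-bit tagging used by generic cpumap, and wrapped by the pointer bitops in the commit above, can be sketched in userspace C. The changelog names the kernel macros `__ptr_set_bit`, `__ptr_clear_bit` and `__ptr_test_bit`; the helpers below are not those macros. They are illustrative functions with made-up names that use `uintptr_t` casts to show the same idea: an object aligned to more than one byte always has a zero lowest address bit, so that bit is free to carry a type flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mark a pointer by setting its lowest bit (producer side). */
static void *ptr_tag(void *p)
{
	return (void *)((uintptr_t)p | (uintptr_t)1);
}

/* Check whether the lowest bit is set (consumer side). */
static bool ptr_is_tagged(const void *p)
{
	return ((uintptr_t)p & (uintptr_t)1) != 0;
}

/* Recover the original pointer by clearing the lowest bit. */
static void *ptr_untag(void *p)
{
	return (void *)((uintptr_t)p & ~(uintptr_t)1);
}
```

A consumer of a mixed ring would call `ptr_is_tagged()` on each item, treat tagged items as one type after `ptr_untag()`, and treat untagged items as the other type. Like the double-underscore kernel bitops, these helpers are not atomic.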
  7. net: core: Split out code to run generic XDP prog

    This helper can later be utilized in code that runs cpumap and devmap
    programs in generic redirect mode and adjust skb based on changes made
    to xdp_buff.
    
    When returning XDP_REDIRECT/XDP_TX, it invokes __skb_push, so whenever a
    generic redirect path invokes devmap/cpumap prog if set, it must
    __skb_pull again as we expect mac header to be pulled.
    
    It also drops the skb_reset_mac_len call after do_xdp_generic, as the
    mac_header and network_header are advanced by the same offset, so the
    difference (mac_len) remains constant.
    
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Link: https://lore.kernel.org/bpf/20210702111825.491065-2-memxor@gmail.com
    kkdwivedi authored and Alexei Starovoitov committed Jul 8, 2021
  8. Merge branch 'bpf: support input xdp_md context in BPF_PROG_TEST_RUN'

    Zvi Effron says:
    
    ====================
    
    This patchset adds support for passing an xdp_md via ctx_in/ctx_out in
    bpf_attr for BPF_PROG_TEST_RUN of XDP programs.
    
    Patch 1 adds a function to validate XDP meta data lengths.
    
    Patch 2 adds initial support for passing XDP meta data in addition to
    packet data.
    
    Patch 3 adds support for also specifying the ingress interface and
    rx queue.
    
    Patch 4 adds selftests to ensure functionality is correct.
    
    Changelog:
    ----------
    v7->v8
    v7: https://lore.kernel.org/bpf/20210624211304.90807-1-zeffron@riotgames.com/
    
     * Fix too long comment line in patch 3
    
    v6->v7
    v6: https://lore.kernel.org/bpf/20210617232904.1899-1-zeffron@riotgames.com/
    
     * Add Yonghong Song's Acked-by to commit message in patch 1
     * Add Yonghong Song's Acked-by to commit message in patch 2
     * Extracted the post-update of the xdp_md context into a function (again)
     * Validate that the rx queue was registered with XDP info
     * Decrement the reference count on a found netdevice on failure to find
      a valid rx queue
     * Decrement the reference count on a found netdevice after the XDP
      program is run
     * Drop Yonghong Song's Acked-By for patch 3 because of patch changes
     * Improve a comment in the selftests
     * Drop Yonghong Song's Acked-By for patch 4 because of patch changes
    
    v5->v6
    v5: https://lore.kernel.org/bpf/20210616224712.3243-1-zeffron@riotgames.com/
    
     * Correct commit messages in patches 1 and 3
     * Add Acked-by to commit message in patch 4
     * Use gotos instead of returns to correctly free resources in
      bpf_prog_test_run_xdp
     * Rename xdp_metalen_valid to xdp_metalen_invalid
     * Improve the function signature for xdp_metalen_invalid
     * Merged declaration of ingress_ifindex and rx_queue_index into one line
    
    v4->v5
    v4: https://lore.kernel.org/bpf/20210604220235.6758-1-zeffron@riotgames.com/
    
     * Add new patch to introduce xdp_metalen_valid inline function to avoid
      duplicated code from net/core/filter.c
     * Correct size of bad_ctx in selftests
     * Make all declarations reverse Christmas tree
     * Move data check from xdp_convert_md_to_buff to bpf_prog_test_run_xdp
     * Merge xdp_convert_buff_to_md into bpf_prog_test_run_xdp
     * Fix line too long
     * Extracted common checks in selftests to a helper function
     * Removed redundant assignment in selftests
     * Reordered test cases in selftests
     * Check data against 0 instead of data_meta in selftests
     * Made selftests use EINVAL instead of hardcoded 22
     * Dropped "_" from XDP function name
     * Changed casts in XDP program from unsigned long to long
     * Added a comment explaining the use of the loopback interface in selftests
     * Change parameter order in xdp_convert_md_to_buff to be input first
     * Assigned xdp->ingress_ifindex and xdp->rx_queue_index to local variables in
      xdp_convert_md_to_buff
     * Made use of "meta data" versus "metadata" consistent in comments and commit
      messages
    
    v3->v4
    v3: https://lore.kernel.org/bpf/20210602190815.8096-1-zeffron@riotgames.com/
    
     * Clean up nits
     * Validate xdp_md->data_end in bpf_prog_test_run_xdp
     * Remove intermediate metalen variables
    
    v2 -> v3
    v2: https://lore.kernel.org/bpf/20210527201341.7128-1-zeffron@riotgames.com/
    
     * Check errno first in selftests
     * Use DECLARE_LIBBPF_OPTS
     * Rename tattr to opts in selftests
     * Remove extra new line
     * Rename convert_xdpmd_to_xdpb to xdp_convert_md_to_buff
     * Rename convert_xdpb_to_xdpmd to xdp_convert_buff_to_md
     * Move declaration of device and rxqueue in xdp_convert_md_to_buff to
      patch 2
     * Reorder the kfree calls in bpf_prog_test_run_xdp
    
    v1 -> v2
    v1: https://lore.kernel.org/bpf/20210524220555.251473-1-zeffron@riotgames.com
    
     * Fix null pointer dereference with no context
     * Use the BPF skeleton and replace CHECK with ASSERT macros
    ====================
    
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Alexei Starovoitov committed Jul 8, 2021
  9. selftests/bpf: Add test for xdp_md context in BPF_PROG_TEST_RUN

    Add a test for using xdp_md as a context to BPF_PROG_TEST_RUN for XDP
    programs.
    
    The test uses a BPF program that takes in a return value from XDP
    meta data, then reduces the size of the XDP meta data by 4 bytes.
    
    Test cases validate the possible failure cases for passing in invalid
    xdp_md contexts, that the return value is successfully passed
    in, and that the adjusted meta data is successfully copied out.
    
    Co-developed-by: Cody Haas <chaas@riotgames.com>
    Co-developed-by: Lisa Watanabe <lwatanabe@riotgames.com>
    Signed-off-by: Cody Haas <chaas@riotgames.com>
    Signed-off-by: Lisa Watanabe <lwatanabe@riotgames.com>
    Signed-off-by: Zvi Effron <zeffron@riotgames.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/bpf/20210707221657.3985075-5-zeffron@riotgames.com
    zeffron authored and Alexei Starovoitov committed Jul 8, 2021
  10. bpf: Support specifying ingress via xdp_md context in BPF_PROG_TEST_RUN

    Support specifying the ingress_ifindex and rx_queue_index of xdp_md
    contexts for BPF_PROG_TEST_RUN.
    
    The intended use case is to allow testing XDP programs that make decisions
    based on the ingress interface or RX queue.
    
    If ingress_ifindex is specified, look up the device by the provided index
    in the current namespace and use its xdp_rxq for the xdp_buff. If the
    rx_queue_index is out of range, or is non-zero when the ingress_ifindex is
    0, return -EINVAL.
    
    Co-developed-by: Cody Haas <chaas@riotgames.com>
    Co-developed-by: Lisa Watanabe <lwatanabe@riotgames.com>
    Signed-off-by: Cody Haas <chaas@riotgames.com>
    Signed-off-by: Lisa Watanabe <lwatanabe@riotgames.com>
    Signed-off-by: Zvi Effron <zeffron@riotgames.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/bpf/20210707221657.3985075-4-zeffron@riotgames.com
    zeffron authored and Alexei Starovoitov committed Jul 8, 2021
  11. bpf: Support input xdp_md context in BPF_PROG_TEST_RUN

    Support passing an xdp_md via ctx_in/ctx_out in bpf_attr for
    BPF_PROG_TEST_RUN.
    
    The intended use case is to pass some XDP meta data to the test runs of
    XDP programs that are used as tail calls.
    
    For programs that use bpf_prog_test_run_xdp, support xdp_md input and
    output. Unlike with an actual xdp_md during a non-test run, data_meta must
    be 0 because it must point to the start of the provided user data. From
    the initial xdp_md, use data and data_end to adjust the pointers in the
    generated xdp_buff. All other non-zero fields are prohibited (with
    EINVAL). If the user has set ctx_out/ctx_size_out, copy the (potentially
    different) xdp_md back to the userspace.
    
    We require all fields of input xdp_md except the ones we explicitly
    support to be set to zero. The expectation is that in the future we might
    add support for more fields and we want to fail explicitly if the user
    runs the program on the kernel where we don't yet support them.
    
    Co-developed-by: Cody Haas <chaas@riotgames.com>
    Co-developed-by: Lisa Watanabe <lwatanabe@riotgames.com>
    Signed-off-by: Cody Haas <chaas@riotgames.com>
    Signed-off-by: Lisa Watanabe <lwatanabe@riotgames.com>
    Signed-off-by: Zvi Effron <zeffron@riotgames.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/bpf/20210707221657.3985075-3-zeffron@riotgames.com
    zeffron authored and Alexei Starovoitov committed Jul 8, 2021
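
The input rules stated in the two xdp_md commits above can be summarised as a small validation function. This is a sketch of the rules as the commit messages describe them, not the kernel's code: `struct xdp_md_sketch` is a local mirror of the xdp_md fields, `check_test_run_ctx` is a hypothetical name, and `num_rx_queues` stands in for the result of the device lookup by ingress_ifindex. Checks on `data` and `data_end` are left out because the messages do not spell them out.

```c
#include <errno.h>
#include <stdint.h>

/* Local mirror of the xdp_md context fields. */
struct xdp_md_sketch {
	uint32_t data;
	uint32_t data_end;
	uint32_t data_meta;
	uint32_t ingress_ifindex;
	uint32_t rx_queue_index;
	uint32_t egress_ifindex;
};

/* Return 0 if the context is acceptable as test-run input, -EINVAL
 * otherwise. num_rx_queues is the RX queue count of the device found
 * for ingress_ifindex (hypothetical parameter). */
static int check_test_run_ctx(const struct xdp_md_sketch *md,
			      uint32_t num_rx_queues)
{
	/* data_meta must point to the start of the user data */
	if (md->data_meta != 0)
		return -EINVAL;

	/* fields that are not explicitly supported must be zero */
	if (md->egress_ifindex != 0)
		return -EINVAL;

	/* an RX queue makes no sense without an ingress device */
	if (md->ingress_ifindex == 0 && md->rx_queue_index != 0)
		return -EINVAL;

	/* the RX queue must exist on the ingress device */
	if (md->ingress_ifindex != 0 && md->rx_queue_index >= num_rx_queues)
		return -EINVAL;

	return 0;
}
```

Rejecting unsupported non-zero fields, rather than ignoring them, is what lets a future kernel add support for more fields without silently changing the behaviour of existing tests.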
  12. bpf: Add function for XDP meta data length check

    This commit prepares to use the XDP meta data length check in multiple
    places by making it into a static inline function instead of a literal.
    
    Co-developed-by: Cody Haas <chaas@riotgames.com>
    Co-developed-by: Lisa Watanabe <lwatanabe@riotgames.com>
    Signed-off-by: Cody Haas <chaas@riotgames.com>
    Signed-off-by: Lisa Watanabe <lwatanabe@riotgames.com>
    Signed-off-by: Zvi Effron <zeffron@riotgames.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/bpf/20210707221657.3985075-2-zeffron@riotgames.com
    zeffron authored and Alexei Starovoitov committed Jul 8, 2021

Commits on Jul 1, 2021

  1. Merge branch 'dsa-mv88e6xxx-topaz-fixes'

    Marek Behún says:
    
    ====================
    dsa: mv88e6xxx: Topaz fixes
    
    here come some fixes for the Topaz family (Marvell 88E6141 / 88E6341)
    which I found when I compared Topaz's operations structure with that
    of Peridot (6390).
    
    This is v2. In v1, I accidentally sent patches generated from the wrong
    branch, and the 5th patch did not contain a necessary change in
    serdes.c.
    
    Changes from v1:
    - the fifth patch, "enable SerDes RX stats for Topaz", needs another
      change in serdes.c
    - Andrew's Reviewed-by to 1,2,3,4 and 6
    ====================
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    davem330 committed Jul 1, 2021
  2. net: dsa: mv88e6xxx: enable SerDes PCS register dump via ethtool -d on Topaz
    
    Commit bf3504c ("net: dsa: mv88e6xxx: Add 6390 family PCS
    registers to ethtool -d") added support for dumping SerDes PCS registers
    via ethtool -d for Peridot.
    
    The same implementation is also valid for Topaz, but was not
    enabled at the time.
    
    Signed-off-by: Marek Behún <kabel@kernel.org>
    Fixes: bf3504c ("net: dsa: mv88e6xxx: Add 6390 family PCS registers to ethtool -d")
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    elkablo authored and davem330 committed Jul 1, 2021
  3. net: dsa: mv88e6xxx: enable SerDes RX stats for Topaz

    Commit 0df9528 ("mv88e6xxx: Add serdes Rx statistics") added
    support for RX statistics on SerDes ports for Peridot.
    
    This same implementation is also valid for Topaz, but was not enabled
    at the time.
    
    We need to use the generic .serdes_get_lane() method instead of the
    Peridot specific one in the stats methods so that on Topaz the proper
    one is used.
    
    Signed-off-by: Marek Behún <kabel@kernel.org>
    Fixes: 0df9528 ("mv88e6xxx: Add serdes Rx statistics")
    Signed-off-by: David S. Miller <davem@davemloft.net>
    elkablo authored and davem330 committed Jul 1, 2021
  4. net: dsa: mv88e6xxx: enable devlink ATU hash param for Topaz

    Commit 23e8b47 ("net: dsa: mv88e6xxx: Add devlink param for ATU
    hash algorithm.") introduced ATU hash algorithm access via devlink, but
    did not enable it for Topaz.
    
    Enable this feature also for Topaz.
    
    Signed-off-by: Marek Behún <kabel@kernel.org>
    Fixes: 23e8b47 ("net: dsa: mv88e6xxx: Add devlink param for ATU hash algorithm.")
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    elkablo authored and davem330 committed Jul 1, 2021
  5. net: dsa: mv88e6xxx: enable .rmu_disable() on Topaz

    Commit 9e5baf9 ("net: dsa: mv88e6xxx: add RMU disable op")
    introduced .rmu_disable() method with implementation for several models,
    but forgot to add Topaz, which can use the Peridot implementation.
    
    Use the Peridot implementation of .rmu_disable() on Topaz.
    
    Signed-off-by: Marek Behún <kabel@kernel.org>
    Fixes: 9e5baf9 ("net: dsa: mv88e6xxx: add RMU disable op")
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    elkablo authored and davem330 committed Jul 1, 2021
  6. net: dsa: mv88e6xxx: use correct .stats_set_histogram() on Topaz

    Commit 40cff8f ("net: dsa: mv88e6xxx: Fix stats histogram mode")
    introduced the wrong .stats_set_histogram() method for the Topaz family.
    
    The Peridot method should be used instead.
    
    Signed-off-by: Marek Behún <kabel@kernel.org>
    Fixes: 40cff8f ("net: dsa: mv88e6xxx: Fix stats histogram mode")
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    elkablo authored and davem330 committed Jul 1, 2021
  7. net: dsa: mv88e6xxx: enable .port_set_policy() on Topaz

    Commit f3a2cd3 ("net: dsa: mv88e6xxx: introduce .port_set_policy")
    introduced .port_set_policy() method with implementation for several
    models, but forgot to add Topaz, which can use the 6352 implementation.
    
    Use the 6352 implementation of .port_set_policy() on Topaz.
    
    Signed-off-by: Marek Behún <kabel@kernel.org>
    Fixes: f3a2cd3 ("net: dsa: mv88e6xxx: introduce .port_set_policy")
    Reviewed-by: Andrew Lunn <andrew@lunn.ch>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    elkablo authored and davem330 committed Jul 1, 2021
  8. net: dsa: return -EOPNOTSUPP when driver does not implement .port_lag_join
    
    The DSA core has a layered structure, and even though we end up
    returning 0 (success) to user space when setting a bonding/team upper
    that can't be offloaded, some parts of the framework actually need to
    know that we couldn't offload that.
    
    For example, if dsa_switch_lag_join returns 0 as it currently does,
    dsa_port_lag_join has no way to tell a successful offload from a
    software fallback, and it will call dsa_port_bridge_join afterwards.
    Then we'll think we're offloading the bridge master of the LAG, when in
    fact we're not even offloading the LAG. In turn, this will make us set
    skb->offload_fwd_mark = true, which is incorrect and the bridge doesn't
    like it.
    
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    vladimiroltean authored and davem330 committed Jul 1, 2021
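
The layering problem in the -EOPNOTSUPP commit above can be modelled in a few lines. This is a deliberately flattened sketch: the real DSA code spreads these steps over several layers, and every name below (`struct switch_ops_sketch`, `struct port_sketch`, `switch_lag_join`, `port_lag_join`, `hw_lag_join`) is a hypothetical stand-in. It assumes that the software fallback is still reported as success to the caller, as the commit message says user space sees 0.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Driver operations; a NULL port_lag_join means "cannot offload LAG". */
struct switch_ops_sketch {
	int (*port_lag_join)(int port);
};

/* What the framework believes is offloaded for a port. */
struct port_sketch {
	bool lag_offloaded;
	bool bridge_offloaded;
};

/* Example driver op that always succeeds. */
static int hw_lag_join(int port)
{
	(void)port;
	return 0;
}

/* Lower layer: report -EOPNOTSUPP instead of 0 when the op is missing. */
static int switch_lag_join(const struct switch_ops_sketch *ops, int port)
{
	if (!ops->port_lag_join)
		return -EOPNOTSUPP;
	return ops->port_lag_join(port);
}

/* Upper layer: only continue with the bridge join after a real offload. */
static int port_lag_join(const struct switch_ops_sketch *ops,
			 struct port_sketch *p, int port)
{
	int err = switch_lag_join(ops, port);

	if (err == -EOPNOTSUPP) {
		/* software fallback: nothing is offloaded for this port */
		p->lag_offloaded = false;
		p->bridge_offloaded = false;
		return 0;
	}
	if (err)
		return err;

	p->lag_offloaded = true;
	p->bridge_offloaded = true; /* stands in for the bridge join */
	return 0;
}
```

If `switch_lag_join()` returned 0 for a missing op, the upper layer could not tell the two cases apart and would mark the bridge as offloaded, which is the bug the commit describes.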
  9. Merge branch 'octeopntx2-LMTST-regions'

    Geetha sowjanya says:
    
    ====================
    Dynamic LMTST region setup
    
    This patch series allows RVU PF/VF to allocate memory for
    LMTST operations instead of using memory reserved by firmware
    which is mapped as device memory.
    The LMTST mapping table contains the RVU PF/VF LMTST memory base
    address entries. This table is used by hardware for LMTST operations.
    Patch1 introduces new mailbox message to update the LMTST table with
    the new allocated memory address.
    ====================
    
    Signed-off-by: David S. Miller <davem@davemloft.net>
    davem330 committed Jul 1, 2021
  10. octeontx2-pf: cn10k: Use runtime allocated LMTLINE region

    The current driver uses a static LMTST region allocated by firmware.
    This memory gets populated as PF/VF BAR2. The RVU PF/VF driver ioremaps
    the memory as device memory for NIX/NPA operation. Since the memory
    is mapped as device memory, we see performance degradation. To address
    this issue, this patch implements runtime memory allocation.
    RVU PF/VF allocates memory during device probe and shares the base
    address with RVU AF. RVU AF then configures the LMT MAP table
    accordingly.
    
    Signed-off-by: Geetha sowjanya <gakula@marvell.com>
    Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Geetha sowjanya authored and davem330 committed Jul 1, 2021
  11. octeontx2-af: cn10k: Support configurable LMTST regions

    This patch extends the lmtst_tbl_setup_req mbox to support run time
    LMTST configuration.
    RVU PF/VF and DPDK/ODP allocate an LMT region and create a translation
    entry for a device via VFIO IOCTLs.
    This IOVA is shared with AF through the above mbox. AF then uses the
    RVU_SMMU translation widget to get the PA for the IOVA and updates
    the LMT table entry for that device.
    
    Signed-off-by: Geetha sowjanya <gakula@marvell.com>
    Signed-off-by: Sunil Kovvuri Goutham <sgoutham@marvell.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Geetha sowjanya authored and davem330 committed Jul 1, 2021
  12. octeontx2-af: cn10k: Setting up lmtst map table

    Introduce a new mailbox to support updating lmt entries
    and a common lmt base address scheme, i.e. multiple pcifuncs
    can share an lmt region to reduce L1 cache pressure for the application.
    Parameters passed to the mailbox include the primary pcifunc
    value whose lmt regions will be shared by other secondary
    pcifuncs. Here the secondary pcifunc is the one that
    calls the mailbox.
    For example:
    By default each pcifunc has its own LMT base address:
            PCIFUNC1    LMT_BASE_ADDR A
            PCIFUNC2    LMT_BASE_ADDR B
            PCIFUNC3    LMT_BASE_ADDR C
            PCIFUNC4    LMT_BASE_ADDR D
    The application will choose PCIFUNC1 as the base/primary pcifunc,
    and as other pcifuncs (secondary pcifuncs) get probed,
    this mailbox will be called and the LMTST table will
    be updated as:
            PCIFUNC1    LMT_BASE_ADDR A
            PCIFUNC2    LMT_BASE_ADDR A
            PCIFUNC3    LMT_BASE_ADDR A
            PCIFUNC4    LMT_BASE_ADDR A
    
    On FLR the lmtst map table is reset to the default lmt
    base addresses for all secondary pcifuncs.
    
    Signed-off-by: Harman Kalra <hkalra@marvell.com>
    Signed-off-by: Geetha sowjanya <gakula@marvell.com>
    Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Harman Kalra authored and davem330 committed Jul 1, 2021
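
The sharing scheme in the lmtst map table commit above amounts to a small table update. The sketch below models it with plain arrays. `struct lmt_map`, `lmt_map_init`, `lmt_map_share` and `lmt_map_flr` are hypothetical names, the table size of four matches the example in the commit message only, and the real table is maintained by the AF in response to mailbox messages rather than by direct function calls.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_PCIFUNCS 4

/* Per-pcifunc LMT base addresses: the defaults and the current mapping. */
struct lmt_map {
	uint64_t default_base[NUM_PCIFUNCS];
	uint64_t base[NUM_PCIFUNCS];
};

/* Start with every pcifunc using its own default base address. */
static void lmt_map_init(struct lmt_map *m,
			 const uint64_t defaults[NUM_PCIFUNCS])
{
	for (size_t i = 0; i < NUM_PCIFUNCS; i++) {
		m->default_base[i] = defaults[i];
		m->base[i] = defaults[i];
	}
}

/* Mailbox request from a secondary pcifunc: share the primary's region. */
static int lmt_map_share(struct lmt_map *m, size_t secondary, size_t primary)
{
	if (secondary >= NUM_PCIFUNCS || primary >= NUM_PCIFUNCS)
		return -1;
	m->base[secondary] = m->base[primary];
	return 0;
}

/* FLR of a pcifunc: fall back to its default base address. */
static int lmt_map_flr(struct lmt_map *m, size_t pcifunc)
{
	if (pcifunc >= NUM_PCIFUNCS)
		return -1;
	m->base[pcifunc] = m->default_base[pcifunc];
	return 0;
}
```

After PCIFUNC2 to PCIFUNC4 each call `lmt_map_share()` with PCIFUNC1 as primary, all four entries hold base address A, which matches the second table in the commit message.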

Commits on Jun 30, 2021

  1. Merge tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
    
    Pull networking updates from Jakub Kicinski:
     "Core:
    
       - BPF:
          - add syscall program type and libbpf support for generating
            instructions and bindings for in-kernel BPF loaders (BPF loaders
            for BPF), this is a stepping stone for signed BPF programs
          - infrastructure to migrate TCP child sockets from one listener to
            another in the same reuseport group/map to improve flexibility
            of service hand-off/restart
          - add broadcast support to XDP redirect
    
       - allow bypass of the lockless qdisc to improve performance (for
         pktgen: +23% with one thread, +44% with 2 threads)
    
       - add a simpler version of "DO_ONCE()" which does not require jump
         labels, intended for slow-path usage
    
       - virtio/vsock: introduce SOCK_SEQPACKET support
    
       - add getsockopt to retrieve netns cookie
    
       - ip: treat lowest address of an IPv4 subnet as ordinary unicast
         address allowing reclaiming of precious IPv4 addresses
    
       - ipv6: use prandom_u32() for ID generation
    
       - ip: add support for more flexible field selection for hashing
         across multi-path routes (w/ offload to mlxsw)
    
       - icmp: add support for extended RFC 8335 PROBE (ping)
    
       - seg6: add support for SRv6 End.DT46 behavior
    
       - mptcp:
          - DSS checksum support (RFC 8684) to detect middlebox meddling
          - support Connection-time 'C' flag
          - time stamping support
    
       - sctp: Packetization Layer Path MTU Discovery (RFC 8899)
    
       - xfrm: speed up state addition with seq set
    
       - WiFi:
          - hidden AP discovery on 6 GHz and other HE 6 GHz improvements
          - aggregation handling improvements for some drivers
          - minstrel improvements for no-ack frames
          - deferred rate control for TXQs to improve reaction times
          - switch from round robin to virtual time-based airtime scheduler
    
       - add trace points:
          - tcp checksum errors
          - openvswitch - action execution, upcalls
          - socket errors via sk_error_report
    
      Device APIs:
    
       - devlink: add rate API for hierarchical control of max egress rate
         of virtual devices (VFs, SFs etc.)
    
       - don't require RCU read lock to be held around BPF hooks in NAPI
         context
    
       - page_pool: generic buffer recycling
    
      New hardware/drivers:
    
       - mobile:
          - iosm: PCIe Driver for Intel M.2 Modem
          - support for Qualcomm MSM8998 (ipa)
    
       - WiFi: Qualcomm QCN9074 and WCN6855 PCI devices
    
       - sparx5: Microchip SparX-5 family of Enterprise Ethernet switches
    
       - Mellanox BlueField Gigabit Ethernet (control NIC of the DPU)
    
       - NXP SJA1110 Automotive Ethernet 10-port switch
    
       - Qualcomm QCA8327 switch support (qca8k)
    
       - Mikrotik 10/25G NIC (atl1c)
    
      Driver changes:
    
       - ACPI support for some MDIO, MAC and PHY devices from Marvell and
         NXP (our first foray into MAC/PHY description via ACPI)
    
       - HW timestamping (PTP) support: bnxt_en, ice, sja1105, hns3, tja11xx
    
       - Mellanox/Nvidia NIC (mlx5)
          - NIC VF offload of L2 bridging
          - support IRQ distribution to Sub-functions
    
       - Marvell (prestera):
          - add flower and match all
          - devlink trap
          - link aggregation
    
       - Netronome (nfp): connection tracking offload
    
       - Intel 1GE (igc): add AF_XDP support
    
       - Marvell DPU (octeontx2): ingress ratelimit offload
    
       - Google vNIC (gve): new ring/descriptor format support
    
       - Qualcomm mobile (rmnet & ipa): inline checksum offload support
    
       - MediaTek WiFi (mt76)
          - mt7915 MSI support
          - mt7915 Tx status reporting
          - mt7915 thermal sensors support
          - mt7921 decapsulation offload
          - mt7921 enable runtime pm and deep sleep
    
       - Realtek WiFi (rtw88)
          - beacon filter support
          - Tx antenna path diversity support
          - firmware crash information via devcoredump
    
       - Qualcomm WiFi (wcn36xx)
          - Wake-on-WLAN support with magic packets and GTK rekeying
    
       - Micrel PHY (ksz886x/ksz8081): add cable test support"
    
    * tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2168 commits)
      tcp: change ICSK_CA_PRIV_SIZE definition
      tcp_yeah: check struct yeah size at compile time
      gve: DQO: Fix off by one in gve_rx_dqo()
      stmmac: intel: set PCI_D3hot in suspend
      stmmac: intel: Enable PHY WOL option in EHL
      net: stmmac: option to enable PHY WOL with PMT enabled
      net: say "local" instead of "static" addresses in ndo_dflt_fdb_{add,del}
      net: use netdev_info in ndo_dflt_fdb_{add,del}
      ptp: Set lookup cookie when creating a PTP PPS source.
      net: sock: add trace for socket errors
      net: sock: introduce sk_error_report
      net: dsa: replay the local bridge FDB entries pointing to the bridge dev too
      net: dsa: ensure during dsa_fdb_offload_notify that dev_hold and dev_put are on the same dev
      net: dsa: include fdb entries pointing to bridge in the host fdb list
      net: dsa: include bridge addresses which are local in the host fdb list
      net: dsa: sync static FDB entries on foreign interfaces to hardware
      net: dsa: install the host MDB and FDB entries in the master's RX filter
      net: dsa: reference count the FDB addresses at the cross-chip notifier level
      net: dsa: introduce a separate cross-chip notifier type for host FDBs
      net: dsa: reference count the MDB entries at the cross-chip notifier level
      ...
    torvalds committed Jun 30, 2021
  2. Merge tag 'sched-urgent-2021-06-30' of git://git.kernel.org/pub/scm/l…

    …inux/kernel/git/tip/tip
    
    Pull scheduler fixes from Ingo Molnar:
    
     - Fix a small inconsistency (bug) in load tracking, caught by a new
       warning that several people reported.
    
     - Flip CONFIG_SCHED_CORE to default-disabled, and update the Kconfig
       help text.
    
    * tag 'sched-urgent-2021-06-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      sched/core: Disable CONFIG_SCHED_CORE by default
      sched/fair: Ensure _sum and _avg values stay consistent
    torvalds committed Jun 30, 2021
  3. Merge tag 'microblaze-v5.14' of git://git.monstr.eu/linux-2.6-microblaze

    Pull microblaze updates from Michal Simek:
    
     - Remove unused PAGE_UP/DOWN macros
    
     - Fix trivial spelling mistake
    
    * tag 'microblaze-v5.14' of git://git.monstr.eu/linux-2.6-microblaze:
      arch: microblaze: Fix spelling mistake "vesion" -> "version"
      microblaze: Cleanup unused functions
    torvalds committed Jun 30, 2021
  4. Merge tag 'safesetid-5.14' of git://github.com/micah-morton/linux

    Pull SafeSetID update from Micah Morton:
     "One very minor code cleanup change that marks a variable as
      __initdata"
    
    * tag 'safesetid-5.14' of git://github.com/micah-morton/linux:
      LSM: SafeSetID: Mark safesetid_initialized as __initdata
    torvalds committed Jun 30, 2021
  5. Merge tag 'Smack-for-5.14' of git://github.com/cschaufler/smack-next

    Pull smack updates from Casey Schaufler:
     "There is nothing more significant than an improvement to a byte count
      check in smackfs.
    
      All changes have been in next for weeks"
    
    * tag 'Smack-for-5.14' of git://github.com/cschaufler/smack-next:
      Smack: fix doc warning
      Revert "Smack: Handle io_uring kernel thread privileges"
      smackfs: restrict bytes count in smk_set_cipso()
      security/smack/: fix misspellings using codespell tool
    torvalds committed Jun 30, 2021
  6. Merge tag 'audit-pr-20210629' of git://git.kernel.org/pub/scm/linux/k…

    …ernel/git/pcmoore/audit
    
    Pull audit updates from Paul Moore:
     "Another merge window, another small audit pull request.
    
      Four patches in total: one is cosmetic, one removes an unnecessary
      initialization, one renames some enum values to prevent name
      collisions, and one converts list_del()/list_add() to list_move().
    
      None of these are earth shattering and all pass the audit-testsuite
      tests while merging cleanly on top of your tree from earlier today"
    
    * tag 'audit-pr-20210629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
      audit: remove unnecessary 'ret' initialization
      audit: remove trailing spaces and tabs
      audit: Use list_move instead of list_del/list_add
      audit: Rename enum audit_state constants to avoid AUDIT_DISABLED redefinition
      audit: add blank line after variable declarations
    torvalds committed Jun 30, 2021
  7. Merge tag 'selinux-pr-20210629' of git://git.kernel.org/pub/scm/linux…

    …/kernel/git/pcmoore/selinux
    
    Pull SELinux updates from Paul Moore:
    
     - The slow_avc_audit() function is now non-blocking so we can remove
       the AVC_NONBLOCKING tricks; this also includes the 'flags' variant of
       avc_has_perm().
    
     - Use kmemdup() instead of kcalloc()+copy when copying parts of the
       SELinux policydb.
    
     - The InfiniBand device name is now passed by reference when possible
       in the SELinux code, removing a strncpy().
    
     - Minor cleanups including: constification of avtab function args,
       removal of useless LSM/XFRM function args, SELinux kdoc fixes, and
       removal of redundant assignments.
    
    * tag 'selinux-pr-20210629' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
      selinux: kill 'flags' argument in avc_has_perm_flags() and avc_audit()
      selinux: slow_avc_audit has become non-blocking
      selinux: Fix kernel-doc
      selinux: use __GFP_NOWARN with GFP_NOWAIT in the AVC
      lsm_audit,selinux: pass IB device name by reference
      selinux: Remove redundant assignment to rc
      selinux: Corrected comment to match kernel-doc comment
      selinux: delete selinux_xfrm_policy_lookup() useless argument
      selinux: constify some avtab function arguments
      selinux: simplify duplicate_policydb_cond_list() by using kmemdup()
    torvalds committed Jun 30, 2021
  8. Merge tag 'clang-features-v5.14-rc1' of git://git.kernel.org/pub/scm/…

    …linux/kernel/git/kees/linux
    
    Pull clang feature updates from Kees Cook:
    
     - Add CC_HAS_NO_PROFILE_FN_ATTR in preparation for PGO support in the
       face of the noinstr attribute, paving the way for PGO and fixing
       GCOV. (Nick Desaulniers)
    
     - x86_64 LTO coverage is expanded to 32-bit x86. (Nathan Chancellor)
    
     - Small fixes to CFI. (Mark Rutland, Nathan Chancellor)
    
    * tag 'clang-features-v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
      qemu_fw_cfg: Make fw_cfg_rev_attr a proper kobj_attribute
      Kconfig: Introduce ARCH_WANTS_NO_INSTR and CC_HAS_NO_PROFILE_FN_ATTR
      compiler_attributes.h: cleanups for GCC 4.9+
      compiler_attributes.h: define __no_profile, add to noinstr
      x86, lto: Enable Clang LTO for 32-bit as well
      CFI: Move function_nocfi() into compiler.h
      MAINTAINERS: Add Clang CFI section
    torvalds committed Jun 30, 2021