
Commits on Nov 10, 2021

  1. MAINTAINERS: added new files

    The fec_phy.h and fec_phy.c files are added in the net/fec branch.
    
    Signed-off-by: Sachin Saxena <sachin.saxena@nxp.com>
    Signed-off-by: Apeksha Gupta <apeksha.gupta@nxp.com>
    Apeksha Gupta authored and intel-lab-lkp committed Nov 10, 2021
  2. fec_main: removed PHY functions

    Moved the PHY-related APIs into a separate 'fec_phy.c' file,
    splitting the PHY functionality out of the main FEC driver.
    
    Signed-off-by: Sachin Saxena <sachin.saxena@nxp.com>
    Signed-off-by: Apeksha Gupta <apeksha.gupta@nxp.com>
    Apeksha Gupta authored and intel-lab-lkp committed Nov 10, 2021
  3. fec_phy: add new PHY file

    Added a common file shared by the fec and fec_uio drivers.
    fec_phy.h and fec_phy.c contain the PHY-related APIs, so the
    PHY functions can now be used independently by both the FEC
    and FEC_UIO drivers.
    
    Signed-off-by: Sachin Saxena <sachin.saxena@nxp.com>
    Signed-off-by: Apeksha Gupta <apeksha.gupta@nxp.com>
    Apeksha Gupta authored and intel-lab-lkp committed Nov 10, 2021

Commits on Nov 2, 2021

  1. Revert "net: avoid double accounting for pure zerocopy skbs"

    This reverts commit f1a456f.
    
      WARNING: CPU: 1 PID: 6819 at net/core/skbuff.c:5429 skb_try_coalesce+0x78b/0x7e0
      CPU: 1 PID: 6819 Comm: xxxxxxx Kdump: loaded Tainted: G S                5.15.0-04194-gd852503f7711 #16
      RIP: 0010:skb_try_coalesce+0x78b/0x7e0
      Code: e8 2a bf 41 ff 44 8b b3 bc 00 00 00 48 8b 7c 24 30 e8 19 c0 41 ff 44 89 f0 48 03 83 c0 00 00 00 48 89 44 24 40 e9 47 fb ff ff <0f> 0b e9 ca fc ff ff 4c 8d 70 ff 48 83 c0 07 48 89 44 24 38 e9 61
      RSP: 0018:ffff88881f449688 EFLAGS: 00010282
      RAX: 00000000fffffe96 RBX: ffff8881566e4460 RCX: ffffffff82079f7e
      RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffff8881566e47b0
      RBP: ffff8881566e46e0 R08: ffffed102619235d R09: ffffed102619235d
      R10: ffff888130c91ae3 R11: ffffed102619235c R12: ffff88881f4498a0
      R13: 0000000000000056 R14: 0000000000000009 R15: ffff888130c91ac0
      FS:  00007fec2cbb9700(0000) GS:ffff88881f440000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fec1b060d80 CR3: 00000003acf94005 CR4: 00000000003706e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <IRQ>
       tcp_try_coalesce+0xeb/0x290
       ? tcp_parse_options+0x610/0x610
       ? mark_held_locks+0x79/0xa0
       tcp_queue_rcv+0x69/0x2f0
       tcp_rcv_established+0xa49/0xd40
       ? tcp_data_queue+0x18a0/0x18a0
       tcp_v6_do_rcv+0x1c9/0x880
       ? rt6_mtu_change_route+0x100/0x100
       tcp_v6_rcv+0x1624/0x1830
    
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Jakub Kicinski committed Nov 2, 2021
  2. Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

    Merge in the fixes we had queued in case there was another -rc.
    
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Jakub Kicinski committed Nov 2, 2021
  3. Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

    Alexei Starovoitov says:
    
    ====================
    pull-request: bpf-next 2021-11-01
    
    We've added 181 non-merge commits during the last 28 day(s) which contain
    a total of 280 files changed, 11791 insertions(+), 5879 deletions(-).
    
    The main changes are:
    
    1) Fix bpf verifier propagation of 64-bit bounds, from Alexei.
    
    2) Parallelize bpf test_progs, from Yucong and Andrii.
    
    3) Deprecate various libbpf apis including af_xdp, from Andrii, Hengqi, Magnus.
    
    4) Improve bpf selftests on s390, from Ilya.
    
    5) bloomfilter bpf map type, from Joanne.
    
    6) Big improvements to JIT tests especially on Mips, from Johan.
    
    7) Support kernel module function calls from bpf, from Kumar.
    
    8) Support typeless and weak ksym in light skeleton, from Kumar.
    
    9) Disallow unprivileged bpf by default, from Pawan.
    
    10) BTF_KIND_DECL_TAG support, from Yonghong.
    
    11) Various bpftool cleanups, from Quentin.
    
    * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (181 commits)
      libbpf: Deprecate AF_XDP support
      kbuild: Unify options for BTF generation for vmlinux and modules
      selftests/bpf: Add a testcase for 64-bit bounds propagation issue.
      bpf: Fix propagation of signed bounds from 64-bit min/max into 32-bit.
      bpf: Fix propagation of bounds from 64-bit min/max into 32-bit and var_off.
      selftests/bpf: Fix also no-alu32 strobemeta selftest
      bpf: Add missing map_delete_elem method to bloom filter map
      selftests/bpf: Add bloom map success test for userspace calls
      bpf: Add alignment padding for "map_extra" + consolidate holes
      bpf: Bloom filter map naming fixups
      selftests/bpf: Add test cases for struct_ops prog
      bpf: Add dummy BPF STRUCT_OPS for test purpose
      bpf: Factor out helpers for ctx access checking
      bpf: Factor out a helper to prepare trampoline for struct_ops prog
      selftests, bpf: Fix broken riscv build
      riscv, libbpf: Add RISC-V (RV64) support to bpf_tracing.h
      tools, build: Add RISC-V to HOSTARCH parsing
      riscv, bpf: Increase the maximum number of iterations
      selftests, bpf: Add one test for sockmap with strparser
      selftests, bpf: Fix test_txmsg_ingress_parser error
      ...
    ====================
    
    Link: https://lore.kernel.org/r/20211102013123.9005-1-alexei.starovoitov@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Jakub Kicinski committed Nov 2, 2021
  4. Merge branch 'make-neighbor-eviction-controllable-by-userspace'

    James Prestwood says:
    
    ====================
    Make neighbor eviction controllable by userspace
    ====================
    
    Link: https://lore.kernel.org/r/20211101173630.300969-1-prestwoj@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Jakub Kicinski committed Nov 2, 2021
  5. selftests: net: add arp_ndisc_evict_nocarrier

    This tests the sysctl options for ARP/ND:
    
    /net/ipv4/conf/<iface>/arp_evict_nocarrier
    /net/ipv4/conf/all/arp_evict_nocarrier
    /net/ipv6/conf/<iface>/ndisc_evict_nocarrier
    /net/ipv6/conf/all/ndisc_evict_nocarrier
    
    Signed-off-by: James Prestwood <prestwoj@gmail.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    jprestwo authored and Jakub Kicinski committed Nov 2, 2021
  6. net: ndisc: introduce ndisc_evict_nocarrier sysctl parameter

    In most situations the neighbor discovery cache should be cleared on
    a NOCARRIER event, which is currently done unconditionally. But for
    wireless roams the neighbor discovery cache can and should remain
    intact, since the underlying network has not changed.
    
    This patch introduces a sysctl option ndisc_evict_nocarrier which can
    be disabled by a wireless supplicant during a roam. This allows packets
    to be sent after a roam immediately without having to wait for
    neighbor discovery.
    
    A user reported roughly a 1 second delay after a roam before packets
    could be sent out (note: on IPv4). This delay was due to the ARP
    cache being cleared. During testing of the same scenario using IPv6
    no delay was observed, but regardless there is no reason to clear
    the ndisc cache for wireless roams.
    
    Signed-off-by: James Prestwood <prestwoj@gmail.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    jprestwo authored and Jakub Kicinski committed Nov 2, 2021
  7. net: arp: introduce arp_evict_nocarrier sysctl parameter

    This change introduces a new sysctl parameter, arp_evict_nocarrier.
    When set (the default) the ARP cache will be cleared on a NOCARRIER
    event. The option defaults to '1', which maintains the existing
    behavior.
    
    Clearing the ARP cache on NOCARRIER is relatively new, introduced by:
    
    commit 859bd2e
    Author: David Ahern <dsahern@gmail.com>
    Date:   Thu Oct 11 20:33:49 2018 -0700
    
        net: Evict neighbor entries on carrier down
    
    The reason for this change is to prevent the ARP cache from being
    cleared when a wireless device roams. Specifically for wireless roams
    the ARP cache should not be cleared because the underlying network has not
    changed. Clearing the ARP cache in this case can introduce significant
    delays sending out packets after a roam.
    
    A user reported such a situation here:
    
    https://lore.kernel.org/linux-wireless/CACsRnHWa47zpx3D1oDq9JYnZWniS8yBwW1h0WAVZ6vrbwL_S0w@mail.gmail.com/
    
    After some investigation it was found that the kernel was holding onto
    packets until ARP finished, which resulted in this 1 second delay. It
    was also found that the first ARP who-has was never responded to,
    which is actually what causes the delay. This change is more or less
    working around this behavior, but again, there is no reason to clear
    the cache on a roam anyway.
    
    As for the unanswered who-has, we know the packet made it OTA since
    it was seen while monitoring. Why it never received a response is
    unknown. In any case, since this is a problem on the AP side of things
    all that can be done is to work around it until it is solved.
    
    Some background on testing/reproducing the packet delay:
    
    Hardware:
     - 2 access points configured for Fast BSS Transition (Though I don't
       see why regular reassociation wouldn't have the same behavior)
     - Wireless station running IWD as supplicant
     - A device on network able to respond to pings (I used one of the APs)
    
    Procedure:
     - Connect to first AP
     - Ping once to establish an ARP entry
     - Start a tcpdump
     - Roam to second AP
     - Wait for operstate UP event, and note the timestamp
     - Start pinging
    
    Results:
    
    Below is the tcpdump after UP. It was recorded the interface went UP at
    10:42:01.432875.
    
    10:42:01.461871 ARP, Request who-has 192.168.254.1 tell 192.168.254.71, length 28
    10:42:02.497976 ARP, Request who-has 192.168.254.1 tell 192.168.254.71, length 28
    10:42:02.507162 ARP, Reply 192.168.254.1 is-at ac:86:74:55:b0:20, length 46
    10:42:02.507185 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 1, length 64
    10:42:02.507205 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 2, length 64
    10:42:02.507212 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 3, length 64
    10:42:02.507219 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 4, length 64
    10:42:02.507225 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 5, length 64
    10:42:02.507232 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 6, length 64
    10:42:02.515373 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 1, length 64
    10:42:02.521399 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 2, length 64
    10:42:02.521612 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 3, length 64
    10:42:02.521941 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 4, length 64
    10:42:02.522419 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 5, length 64
    10:42:02.523085 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 6, length 64
    
    You can see the first ARP who-has went out very quickly after UP, but
    was never responded to. Nearly a second later the kernel retries and
    gets a response. Only then do the ping packets go out. If an ARP entry
    is manually added prior to UP (after the cache is cleared), the first
    ping is never responded to either, so it's not only an issue with
    ARP but with data packets in general.
    
    As mentioned prior, the wireless interface was also monitored to verify
    the ping/ARP packet made it OTA which was observed to be true.
    
    Signed-off-by: James Prestwood <prestwoj@gmail.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    jprestwo authored and Jakub Kicinski committed Nov 2, 2021
  8. libbpf: Deprecate AF_XDP support

    Deprecate AF_XDP support in libbpf ([0]). This has been moved to
    libxdp as it is a better fit for that library. The AF_XDP support only
    uses public libbpf functions and can therefore use libbpf as a
    library from libxdp. The libxdp APIs are exactly the same, so it
    should just be a matter of linking with libxdp instead of libbpf for
    the AF_XDP functionality. If not, please submit a bug report. Linking
    with both libraries is supported, but make sure you link in the
    correct order so that the new functions in libxdp are used instead of
    the deprecated ones in libbpf.
    
    Libxdp can be found at https://github.com/xdp-project/xdp-tools.
    
      [0] Closes: libbpf/libbpf#270
    
    Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
    Link: https://lore.kernel.org/bpf/20211029090111.4733-1-magnus.karlsson@gmail.com
    magnus-karlsson authored and anakryiko committed Nov 2, 2021
  9. kbuild: Unify options for BTF generation for vmlinux and modules

    Use a new PAHOLE_FLAGS variable to pass extra arguments to
    pahole for both vmlinux and module BTF data generation.
    
    Add a new scripts/pahole-flags.sh script that detects and
    prints pahole options.
    
    [ fixed issues found by kernel test robot ]
    
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20211029125729.70002-1-jolsa@kernel.org
    Jiri Olsa authored and anakryiko committed Nov 2, 2021
  10. selftests/bpf: Add a testcase for 64-bit bounds propagation issue.

    ./test_progs-no_alu32 -vv -t twfw
    
    Before the 64-bit_into_32-bit fix:
    19: (25) if r1 > 0x3f goto pc+6
     R1_w=inv(id=0,umax_value=63,var_off=(0x0; 0xff),s32_max_value=255,u32_max_value=255)
    
    and eventually:
    
    invalid access to map value, value_size=8 off=7 size=8
    R6 max value is outside of the allowed memory range
    libbpf: failed to load object 'no_alu32/twfw.o'
    
    After the fix:
    19: (25) if r1 > 0x3f goto pc+6
     R1_w=inv(id=0,umax_value=63,var_off=(0x0; 0x3f))
    
    verif_twfw:OK
    
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/bpf/20211101222153.78759-3-alexei.starovoitov@gmail.com
    Alexei Starovoitov authored and anakryiko committed Nov 2, 2021
  11. bpf: Fix propagation of signed bounds from 64-bit min/max into 32-bit.

    Similar to the unsigned bounds propagation, fix the signed bounds.
    The 'Fixes' tag is a hint. There is no security bug here.
    The verifier was too conservative.
    
    Fixes: 3f50f13 ("bpf: Verifier, do explicit ALU32 bounds tracking")
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/bpf/20211101222153.78759-2-alexei.starovoitov@gmail.com
    Alexei Starovoitov authored and anakryiko committed Nov 2, 2021
  12. bpf: Fix propagation of bounds from 64-bit min/max into 32-bit and var_off.
    
    Before this fix:
    166: (b5) if r2 <= 0x1 goto pc+22
    from 166 to 189: R2=invP(id=1,umax_value=1,var_off=(0x0; 0xffffffff))
    
    After this fix:
    166: (b5) if r2 <= 0x1 goto pc+22
    from 166 to 189: R2=invP(id=1,umax_value=1,var_off=(0x0; 0x1))
    
    While processing BPF_JLE the reg_set_min_max() would set true_reg->umax_value = 1
    and call __reg_combine_64_into_32(true_reg).
    
    Without the fix it would not pass the condition:
    if (__reg64_bound_u32(reg->umin_value) && __reg64_bound_u32(reg->umax_value))
    
    since umin_value == 0 at this point.
    Before commit 10bf4e8 the umin was incorrectly ignored.
    Commit 10bf4e8 fixed the correctness issue, but pessimized
    propagation of 64-bit min max into 32-bit min max and corresponding var_off.
    
    Fixes: 10bf4e8 ("bpf: Fix propagation of 32 bit unsigned bounds from 64 bit bounds")
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/bpf/20211101222153.78759-1-alexei.starovoitov@gmail.com
    Alexei Starovoitov authored and anakryiko committed Nov 2, 2021

Commits on Nov 1, 2021

  1. net: vmxnet3: remove multiple false checks in vmxnet3_ethtool.c

    In one if branch, (ec->rx_coalesce_usecs != 0) is checked.  When it is
    checked again in two more places, it is always false and has no effect
    on the whole check expression.  We should remove it in both places.
    
    In another if branch, (ec->use_adaptive_rx_coalesce != 0) is checked.
    When it is checked again, it is always false.  We should remove the
    entire branch with it.
    
    In addition, we might as well let C operator precedence dictate by
    getting rid of two pairs of parentheses on the neighboring lines, so
    that the expressions on both sides of '||' stay in balance and the
    checkpatch warning is silenced.
    
    Signed-off-by: Jean Sacren <sakiwit@gmail.com>
    Link: https://lore.kernel.org/r/20211031012728.8325-1-sakiwit@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    sacren authored and Jakub Kicinski committed Nov 1, 2021
  2. Merge branch 'accurate-memory-charging-for-msg_zerocopy'

    Talal Ahmad says:
    
    ====================
    Accurate Memory Charging For MSG_ZEROCOPY
    
    This series improves the accuracy of msg_zerocopy memory accounting.
    At present, when msg_zerocopy is used memory is charged twice for the
    data - once when user space allocates it, and then again within
    __zerocopy_sg_from_iter. The memory charging in the kernel is excessive
    because data is held in user pages and is never actually copied to skb
    fragments. This leads to incorrectly inflated memory statistics for
    programs passing MSG_ZEROCOPY.
    
    We reduce this inaccuracy by introducing the notion of "pure" zerocopy
    SKBs - where all the frags in the SKB are backed by pinned userspace
    pages, and none are backed by copied pages. For such SKBs, tracked via
    the new SKBFL_PURE_ZEROCOPY flag, we elide sk_mem_charge/uncharge
    calls, leading to more accurate accounting.
    
    However, SKBs can also be coalesced by the stack at present,
    potentially leading to "impure" SKBs. We restrict this coalescing so
    it can only happen within the sendmsg() system call itself, for the
    most recently allocated SKB. While this can lead to a small degree of
    double-charging of memory, this case does not arise often in practice
    for workloads that set MSG_ZEROCOPY.
    
    Testing verified that memory usage in the kernel is lowered.
    Instrumentation with counters also showed that charging and
    uncharging remain balanced over time.
    ====================
    
    Link: https://lore.kernel.org/r/20211030020542.3870542-1-mailtalalahmad@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Jakub Kicinski committed Nov 1, 2021
  3. net: avoid double accounting for pure zerocopy skbs

    Track skbs with only zerocopy data and avoid charging them to kernel
    memory to correctly account the memory utilization for msg_zerocopy.
    All of the data in such skbs is held in user pages which are already
    accounted to user. Before this change, they are charged again in
    kernel in __zerocopy_sg_from_iter. The charging in kernel is
    excessive because data is not being copied into skb frags. This
    excessive charging can lead to kernel going into memory pressure
    state which impacts all sockets in the system adversely. Mark pure
    zerocopy skbs with a SKBFL_PURE_ZEROCOPY flag and remove
    charge/uncharge for data in such skbs.
    
    Initially, an skb is marked pure zerocopy when it is empty and in
    zerocopy path. skb can then change from a pure zerocopy skb to mixed
    data skb (zerocopy and copy data) if it is at tail of write queue and
    there is room available in it and non-zerocopy data is being sent in
    the next sendmsg call. At this time sk_mem_charge is done for the pure
    zerocopied data and the pure zerocopy flag is unmarked. We found that
    this happens very rarely on workloads that pass MSG_ZEROCOPY.
    
    A pure zerocopy skb can later be coalesced into normal skb if they are
    next to each other in queue but this patch prevents coalescing from
    happening. This avoids complexity of charging when skb downgrades from
    pure zerocopy to mixed. This is also rare.
    
    In sk_wmem_free_skb, if it is a pure zerocopy skb, an sk_mem_uncharge
    for SKB_TRUESIZE(MAX_TCP_HEADER) is done for sk_mem_charge in
    tcp_skb_entail for an skb without data.
    
    Testing with the msg_zerocopy.c benchmark between two hosts(100G nics)
    with zerocopy showed that before this patch the 'sock' variable in
    memory.stat for cgroup2 that tracks sum of sk_forward_alloc,
    sk_rmem_alloc and sk_wmem_queued is around 1822720 and with this
    change it is 0. This is due to no charge to sk_forward_alloc for
    zerocopy data and shows memory utilization for kernel is lowered.
    
    Signed-off-by: Talal Ahmad <talalahmad@google.com>
    Acked-by: Arjun Roy <arjunroy@google.com>
    Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
    Signed-off-by: Willem de Bruijn <willemb@google.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Talal Ahmad authored and Jakub Kicinski committed Nov 1, 2021
  4. tcp: rename sk_wmem_free_skb

    sk_wmem_free_skb() is only used by TCP.
    
    Rename it to make this clear, and move its declaration to
    include/net/tcp.h
    
    Signed-off-by: Talal Ahmad <talalahmad@google.com>
    Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
    Acked-by: Arjun Roy <arjunroy@google.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Talal Ahmad authored and Jakub Kicinski committed Nov 1, 2021
  5. netdevsim: fix uninit value in nsim_drv_configure_vfs()

    Build bot points out that I missed initializing ret
    after refactoring.
    
    Reported-by: kernel test robot <lkp@intel.com>
    Fixes: 1c40107 ("netdevsim: move details of vf config to dev")
    Link: https://lore.kernel.org/r/20211101221845.3188490-1-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Jakub Kicinski committed Nov 1, 2021
  6. selftests/bpf: Fix also no-alu32 strobemeta selftest

    The previous fix added a bpf_clamp_umax() helper use to re-validate
    boundaries. While that works correctly, it introduces more branches,
    which blows past 1 million instructions in the no-alu32 variant of
    the strobemeta selftest.
    
    Switching the len variable from u32 to u64 also fixes the issue and
    reduces the number of validated instructions, so use that instead.
    With this patch applied and bpf_clamp_umax() removed, both the alu32
    and no-alu32 selftests pass.
    
    Fixes: 0133c20 ("selftests/bpf: Fix strobemeta selftest regression")
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20211101230118.1273019-1-andrii@kernel.org
    anakryiko authored and Alexei Starovoitov committed Nov 1, 2021
  7. bpf: Add missing map_delete_elem method to bloom filter map

    Without it, the kernel crashes in map_delete_elem(), as reported
    by syzbot.
    
    BUG: kernel NULL pointer dereference, address: 0000000000000000
    PGD 72c97067 P4D 72c97067 PUD 1e20c067 PMD 0
    Oops: 0010 [#1] PREEMPT SMP KASAN
    CPU: 0 PID: 6518 Comm: syz-executor196 Not tainted 5.15.0-rc3-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:0x0
    Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
    RSP: 0018:ffffc90002bafcb8 EFLAGS: 00010246
    RAX: dffffc0000000000 RBX: 1ffff92000575f9f RCX: 0000000000000000
    RDX: 1ffffffff1327aba RSI: 0000000000000000 RDI: ffff888025a30c00
    RBP: ffffc90002baff08 R08: 0000000000000000 R09: 0000000000000001
    R10: ffffffff818525d8 R11: 0000000000000000 R12: ffffffff8993d560
    R13: ffff888025a30c00 R14: ffff888024bc0000 R15: 0000000000000000
    FS:  0000555557491300(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffffffffffffffd6 CR3: 0000000070189000 CR4: 00000000003506f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     map_delete_elem kernel/bpf/syscall.c:1220 [inline]
     __sys_bpf+0x34f1/0x5ee0 kernel/bpf/syscall.c:4606
     __do_sys_bpf kernel/bpf/syscall.c:4719 [inline]
     __se_sys_bpf kernel/bpf/syscall.c:4717 [inline]
     __x64_sys_bpf+0x75/0xb0 kernel/bpf/syscall.c:4717
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    
    Fixes: 9330986 ("bpf: Add bloom filter map implementation")
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/bpf/20211031171353.4092388-1-eric.dumazet@gmail.com
    neebe000 authored and Alexei Starovoitov committed Nov 1, 2021
  8. Merge branch '"map_extra" and bloom filter fixups'

    Joanne Koong says:
    
    ====================
    
    There are 3 patches in this patchset:
    
    1/3 - Bloom filter naming fixups (kernel/bpf/bloom_filter.c)
    
    2/3 - Add alignment padding for map_extra, rearrange fields in
    bpf_map struct to consolidate holes
    
    3/3 - Bloom filter tests (prog_tests/bloom_filter_map):
    Add test for successful userspace calls, some refactoring to
    use bpf_create_map instead of bpf_create_map_xattr
    
    v1 -> v2:
        * In prog_tests/bloom_filter_map: remove unneeded line break,
    	also change the inner_map_test to use bpf_create_map instead
    	of bpf_create_map_xattr.
        * Add acked-bys to commit messages
    ====================
    
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Alexei Starovoitov committed Nov 1, 2021
  9. selftests/bpf: Add bloom map success test for userspace calls

    This patch has two changes:
    1) Add a new function "test_success_cases" to test
    successfully creating + adding + looking up a value
    in a bloom filter map from the userspace side.
    
    2) Use bpf_create_map instead of bpf_create_map_xattr in
    "test_fail_cases" and test_inner_map to make the
    code look cleaner.
    
    Signed-off-by: Joanne Koong <joannekoong@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/bpf/20211029224909.1721024-4-joannekoong@fb.com
    jkoong-fb authored and Alexei Starovoitov committed Nov 1, 2021
  10. bpf: Add alignment padding for "map_extra" + consolidate holes

    This patch makes 2 changes regarding alignment padding
    for the "map_extra" field.
    
    1) In the kernel header, "map_extra" and "btf_value_type_id"
    are rearranged to consolidate the hole.
    
    Before:
    struct bpf_map {
    	...
            u32		max_entries;	/*    36     4	*/
            u32		map_flags;	/*    40     4	*/
    
            /* XXX 4 bytes hole, try to pack */
    
            u64		map_extra;	/*    48     8	*/
            int		spin_lock_off;	/*    56     4	*/
            int		timer_off;	/*    60     4	*/
            /* --- cacheline 1 boundary (64 bytes) --- */
            u32		id;		/*    64     4	*/
            int		numa_node;	/*    68     4	*/
    	...
            bool		frozen;		/*   117     1	*/
    
            /* XXX 10 bytes hole, try to pack */
    
            /* --- cacheline 2 boundary (128 bytes) --- */
    	...
            struct work_struct	work;	/*   144    72	*/
    
            /* --- cacheline 3 boundary (192 bytes) was 24 bytes ago --- */
    	struct mutex	freeze_mutex;	/*   216   144 	*/
    
            /* --- cacheline 5 boundary (320 bytes) was 40 bytes ago --- */
            u64		writecnt; 	/*   360     8	*/
    
        /* size: 384, cachelines: 6, members: 26 */
        /* sum members: 354, holes: 2, sum holes: 14 */
        /* padding: 16 */
        /* forced alignments: 2, forced holes: 1, sum forced holes: 10 */
    
    } __attribute__((__aligned__(64)));
    
    After:
    struct bpf_map {
    	...
            u32		max_entries;	/*    36     4	*/
            u64		map_extra;	/*    40     8 	*/
            u32		map_flags;	/*    48     4	*/
            int		spin_lock_off;	/*    52     4	*/
            int		timer_off;	/*    56     4	*/
            u32		id;		/*    60     4	*/
    
            /* --- cacheline 1 boundary (64 bytes) --- */
            int		numa_node;	/*    64     4	*/
    	...
    	bool		frozen		/*   113     1  */
    
            /* XXX 14 bytes hole, try to pack */
    
            /* --- cacheline 2 boundary (128 bytes) --- */
    	...
            struct work_struct	work;	/*   144    72	*/
    
            /* --- cacheline 3 boundary (192 bytes) was 24 bytes ago --- */
            struct mutex	freeze_mutex;	/*   216   144	*/
    
            /* --- cacheline 5 boundary (320 bytes) was 40 bytes ago --- */
            u64		writecnt;       /*   360     8	*/
    
        /* size: 384, cachelines: 6, members: 26 */
        /* sum members: 354, holes: 1, sum holes: 14 */
        /* padding: 16 */
        /* forced alignments: 2, forced holes: 1, sum forced holes: 14 */
    
    } __attribute__((__aligned__(64)));
    
    2) Add alignment padding to the bpf_map_info struct.
    More details can be found in commit 36f9814 ("bpf: fix uapi hole
    for 32 bit compat applications").
    
    Signed-off-by: Joanne Koong <joannekoong@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/bpf/20211029224909.1721024-3-joannekoong@fb.com
    jkoong-fb authored and Alexei Starovoitov committed Nov 1, 2021
  11. bpf: Bloom filter map naming fixups

    This patch has two changes in the kernel bloom filter map
    implementation:
    
    1) Change the names of map-ops functions to include the
    "bloom_map" prefix.
    
    As Martin pointed out on a previous patchset, having generic
    map-ops names may be confusing in tracing and in perf-report.
    
    2) Drop the "& 0xF" when getting nr_hash_funcs, since we
    already ascertain that no other bits in map_extra beyond the
    first 4 bits can be set.
    
    Signed-off-by: Joanne Koong <joannekoong@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/bpf/20211029224909.1721024-2-joannekoong@fb.com
    jkoong-fb authored and Alexei Starovoitov committed Nov 1, 2021
  12. Merge branch 'introduce dummy BPF STRUCT_OPS'

    Hou Tao says:
    
    ====================
    
    Hi,
    
    Currently the test of BPF STRUCT_OPS depends on the specific bpf
    implementation (e.g., tcp_congestion_ops), but it cannot cover all
    basic functionalities (e.g., return value handling), so introduce
    a dummy BPF STRUCT_OPS for testing purposes.
    
    Instead of loading a userspace-implemented bpf_dummy_ops map into
    the kernel and calling the specific function by writing to a sysfs
    file provided by bpf_testmode.ko, only bpf_dummy_ops related progs
    are loaded into the kernel, and these progs are run through
    bpf_prog_test_run(). The latter is more flexible and has no
    dependency on an extra kernel module.
    
    Now the return value handling is supported by test_1(...) ops,
    and passing multiple arguments is supported by test_2(...) ops.
    If more is needed, test_x(...) ops can be added afterwards.
    
    Comments are always welcome.
    Regards,
    Hou
    
    Change Log:
    v4:
     * add Acked-by tags in patch 1~4
     * patch 2: remove unnecessary comments and update commit message
                accordingly
     * patch 4: remove unnecessary nr checking in dummy_ops_init_args()
    
    v3: https://www.spinics.net/lists/bpf/msg48303.html
     * rebase on bpf-next
     * address comments from Martin, mainly: merge patch 3 &
       patch 4 in v2, fix names of btf ctx access check helpers,
       handle CONFIG_NET, fix leak in dummy_ops_init_args(), and
       simplify bpf_dummy_init()
     * patch 4: use a loop to check args in test_dummy_multiple_args()
    
    v2: https://www.spinics.net/lists/bpf/msg47948.html
     * rebase on bpf-next
     * add test_2(...) ops to test the passing of multiple arguments
     * a new patch (patch #2) is added to factor out ctx access helpers
     * address comments from Martin & Andrii
    
    v1: https://www.spinics.net/lists/bpf/msg46787.html
    
    RFC: https://www.spinics.net/lists/bpf/msg46117.html
    ====================
    
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Alexei Starovoitov committed Nov 1, 2021
  13. selftests/bpf: Add test cases for struct_ops prog

    Run BPF_PROG_TYPE_STRUCT_OPS progs for dummy_st_ops::test_N()
    through bpf_prog_test_run(). Four test cases are added:
    (1) attach dummy_st_ops should fail
    (2) function return value of bpf_dummy_ops::test_1() is expected
    (3) pointer argument of bpf_dummy_ops::test_1() works as expected
    (4) multiple arguments passed to bpf_dummy_ops::test_2() are correct
    
    Signed-off-by: Hou Tao <houtao1@huawei.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Link: https://lore.kernel.org/bpf/20211025064025.2567443-5-houtao1@huawei.com
    Hou Tao authored and Alexei Starovoitov committed Nov 1, 2021
  14. bpf: Add dummy BPF STRUCT_OPS for test purpose

    Currently the test of BPF STRUCT_OPS depends on the specific bpf
    implementation of tcp_congestion_ops, but it cannot cover all
    basic functionalities (e.g., return value handling), so introduce
    a dummy BPF STRUCT_OPS for testing purposes.
    
    Loading a bpf_dummy_ops implementation from userspace is prohibited,
    and its only purpose is to run BPF_PROG_TYPE_STRUCT_OPS program
    through bpf(BPF_PROG_TEST_RUN). Now programs for test_1() & test_2()
    are supported. The following three cases are exercised in
    bpf_dummy_struct_ops_test_run():
    
    (1) test and check the value returned through the state arg in test_1(state)
    The content of state is copied in from a userspace pointer and copied back
    after calling test_1(state). The user pointer is saved in a u64 array
    and the array address is passed through ctx_in.
    
    (2) test and check the return value of test_1(NULL)
    Just simulate the case in which an invalid input argument is passed in.
    
    (3) test passing multiple arguments in test_2(state, ...)
    Five arguments are passed through ctx_in in the form of a u64 array. The
    first element of the array is the userspace pointer to state, and the
    other 4 arguments follow.
    
    Signed-off-by: Hou Tao <houtao1@huawei.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Link: https://lore.kernel.org/bpf/20211025064025.2567443-4-houtao1@huawei.com
    Hou Tao authored and Alexei Starovoitov committed Nov 1, 2021
  15. bpf: Factor out helpers for ctx access checking

    Factor out two helpers to check read access to ctx for raw
    tracepoints and BTF functions. bpf_tracing_ctx_access() checks
    whether read access to an argument is valid, and
    bpf_tracing_btf_ctx_access() additionally checks whether the BTF
    type of the argument is valid. bpf_tracing_btf_ctx_access() will
    be used by a following patch.
    
    Signed-off-by: Hou Tao <houtao1@huawei.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Link: https://lore.kernel.org/bpf/20211025064025.2567443-3-houtao1@huawei.com
    Hou Tao authored and Alexei Starovoitov committed Nov 1, 2021
  16. bpf: Factor out a helper to prepare trampoline for struct_ops prog

    Factor out a helper, bpf_struct_ops_prepare_trampoline(), to prepare
    the trampoline for a BPF_PROG_TYPE_STRUCT_OPS prog. It will be used
    by the .test_run callback in a following patch.
    
    Signed-off-by: Hou Tao <houtao1@huawei.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Link: https://lore.kernel.org/bpf/20211025064025.2567443-2-houtao1@huawei.com
    Hou Tao authored and Alexei Starovoitov committed Nov 1, 2021
  17. selftests, bpf: Fix broken riscv build

    This patch is closely related to commit 6016df8 ("selftests/bpf:
    Fix broken riscv build"). When clang includes the system include
    directories but targets BPF programs, __BITS_PER_LONG defaults to
    32 unless explicitly set. Work around this problem by explicitly
    setting __BITS_PER_LONG to __riscv_xlen.
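    In Makefile terms the workaround amounts to a one-line CFLAGS addition; the variable name below is an assumption for illustration, not the literal selftests change:

```make
# Illustrative: pin __BITS_PER_LONG for BPF-target compiles on riscv,
# where clang would otherwise default it to 32.
BPF_CFLAGS += -D__BITS_PER_LONG=__riscv_xlen
```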
    
    Signed-off-by: Björn Töpel <bjorn@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/bpf/20211028161057.520552-5-bjorn@kernel.org
    Björn Töpel authored and borkmann committed Nov 1, 2021
  18. riscv, libbpf: Add RISC-V (RV64) support to bpf_tracing.h

    Add macros for 64-bit RISC-V PT_REGS to bpf_tracing.h.
    
    Signed-off-by: Björn Töpel <bjorn@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/bpf/20211028161057.520552-4-bjorn@kernel.org
    Björn Töpel authored and borkmann committed Nov 1, 2021
  19. tools, build: Add RISC-V to HOSTARCH parsing

    Add RISC-V to the HOSTARCH parsing, so that ARCH is "riscv", and not
    "riscv32" or "riscv64".
    
    This affects the perf and libbpf builds, so that arch specific
    includes are correctly picked up for RISC-V.
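    The normalization described above is the usual uname-through-sed pattern; the exact expression in tools/scripts/Makefile.arch may differ, so treat this as a sketch:

```shell
# Normalize "riscv32"/"riscv64" (as reported by uname -m) to plain
# "riscv", leaving other architectures untouched.
for m in riscv32 riscv64 x86_64; do
  echo "$m" | sed -e 's/riscv.*/riscv/'
done
```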
    
    Signed-off-by: Björn Töpel <bjorn@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/bpf/20211028161057.520552-3-bjorn@kernel.org
    Björn Töpel authored and borkmann committed Nov 1, 2021
  20. riscv, bpf: Increase the maximum number of iterations

    Now that BPF programs can be up to 1M instructions, it is not uncommon
    that a program requires more than the current 16 iterations to
    converge.
    
    Bump it to 32, which is enough for selftests/bpf, and test_bpf.ko.
    
    Signed-off-by: Björn Töpel <bjorn@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/bpf/20211028161057.520552-2-bjorn@kernel.org
    Björn Töpel authored and borkmann committed Nov 1, 2021