Commits on Sep 14, 2021

  1. bpf, selftests: Add basic test for module kfunc call

    This also tests support for the invalid kfunc calls added in prior
    changes: the verifier accepts an invalid call as long as it is
    removed by the dead code elimination pass (before fixup_kfunc_call). A
    separate test for libbpf is added, which exercises the failure during
    load.
    
    Also adjust the verifier selftests which assume a 512 byte stack to
    now assume a 768 byte stack.
    
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    kkdwivedi authored and intel-lab-lkp committed Sep 14, 2021
  2. libbpf: Update gen_loader to emit BTF_KIND_FUNC relocations

    This change updates the BPF syscall loader to process BTF_KIND_FUNC
    relocations, with support for weak kfunc relocations.
    
    One of the disadvantages of gen_loader is that, due to the stack size
    limitation, the BTF fd array is clamped to a smaller size than what
    the kernel allows. Also, finding an existing BTF fd's slot is not
    trivial, because that would require opening all module BTFs and
    matching on the open fds (as we do for libbpf), so we do the next
    best thing: deduplicate slots for the same symbol.
    
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    kkdwivedi authored and intel-lab-lkp committed Sep 14, 2021
  3. libbpf: Resolve invalid weak kfunc calls with imm = 0, off = 0

    Preserve these calls, as this allows the verifier to succeed in
    loading the program if they are determined to be unreachable after
    dead code elimination during program load. If they are not
    eliminated, the load fails. This is done for ext->is_weak symbols,
    similar to the case of variable ksyms.
    
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    kkdwivedi authored and intel-lab-lkp committed Sep 14, 2021
  4. libbpf: Support kernel module function calls

    This patch adds libbpf support for kernel module function calls. The
    fd_array parameter used during BPF program load passes the module
    BTFs referenced by the program. insn->off is set to the index into
    this array plus 1, because insn->off == 0 is reserved for
    btf_vmlinux. The kernel subtracts 1 from insn->off when indexing into
    env->fd_array.
    
    We try to reuse an existing insn->off for a module, since the kernel
    limits the number of distinct module BTFs for kfuncs to 256, and also
    because the index must never exceed the maximum value that fits in
    insn->off (INT16_MAX). In the future, if the kernel interprets the
    signed offset as unsigned for kfunc calls, this limit can be raised
    to UINT16_MAX.
    
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    kkdwivedi authored and intel-lab-lkp committed Sep 14, 2021
  5. bpf: Bump MAX_BPF_STACK size to 768 bytes

    Increase the maximum stack size accessible to BPF programs to 768
    bytes. This is done so that the loader_stack struct can expand into
    the remaining space, giving gen_loader room for 94 additional fds for
    kfunc BTFs that it passes in to fd_array.
    
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    kkdwivedi authored and intel-lab-lkp committed Sep 14, 2021
  6. bpf: Enable TCP congestion control kfunc from modules

    This commit moves the BTF ID lookup into the newly added registration
    helper, such that the bbr, cubic, and dctcp implementations set up
    their sets in the bpf_tcp_ca kfunc_btf_set list, while the IDs not
    dependent on modules are looked up from the wrapper function.
    
    This lifts the restriction that they be compiled as built-in objects;
    they can now be loaded as modules if required. Also modify
    Makefile.modfinal to run resolve_btfids on TCP congestion control
    modules if the config option is set, using the base BTF support added
    in the previous commit.
    
    See following commits for background on use of:
    
     CONFIG_X86 ifdef:
     569c484 (bpf: Limit static tcp-cc functions in the .BTF_ids list to x86)
    
     CONFIG_DYNAMIC_FTRACE ifdef:
     7aae231 (bpf: tcp: Limit calling some tcp cc functions to CONFIG_DYNAMIC_FTRACE)
    
    [ resolve_btfids uses --no-fail because some crypto kernel modules
      under arch/x86/crypto generated from ASM do not have the .BTF sections ]
    
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    kkdwivedi authored and intel-lab-lkp committed Sep 14, 2021
  7. tools: Allow specifying base BTF file in resolve_btfids

    This commit allows specifying the base BTF for resolving BTF id
    lists/sets at link time in the resolve_btfids tool. The base BTF is
    set to NULL if no path is passed. This allows resolving BTF ids for
    kernel module objects.
    
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    kkdwivedi authored and intel-lab-lkp committed Sep 14, 2021
  8. bpf: btf: Introduce helpers for dynamic BTF set registration

    This adds helpers for registering a btf_id_set from modules, and a
    check_kfunc_call callback that can be used to look them up.
    
    With in-kernel sets, the way this is supposed to work is: the
    in-kernel callback first looks up the BTF id in the in-kernel kfunc
    whitelist, and then defers to the dynamic BTF set lookup if it does
    not find it. If there is no in-kernel BTF id set, this callback can
    be used directly.
    
    Also fix the includes for btf.h and bpfptr.h so that they can be
    included in isolation. This is in preparation for their use in the
    tcp_bbr, tcp_cubic, and tcp_dctcp modules in the next patch.
    
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    kkdwivedi authored and intel-lab-lkp committed Sep 14, 2021
  9. bpf: Be conservative while processing invalid kfunc calls

    This patch modifies the BPF verifier to return an error for invalid
    kfunc calls specially marked by userspace (insn->imm == 0,
    insn->off == 0) only after the verifier has eliminated dead
    instructions. This can be handled in the fixup stage, skipping
    processing during the add and check stages.
    
    If such an invalid call is dropped by dead code elimination, the
    fixup stage will not encounter insn->imm == 0; otherwise it bails out
    and returns an error.
    
    This will be exposed as weak ksym support in libbpf in a subsequent
    patch.
    
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    kkdwivedi authored and intel-lab-lkp committed Sep 14, 2021
  10. bpf: Introduce BPF support for kernel module function calls

    This change adds support on the kernel side to allow BPF programs to
    call kernel module functions. Userspace prepares an array of module
    BTF fds that is passed in during BPF_PROG_LOAD via the fd_array
    parameter. In the kernel, the module BTFs are placed in the auxiliary
    struct for bpf_prog, and loaded as needed.
    
    The verifier then uses insn->off to index into fd_array, subtracting
    one from it, as userspace has to store the array index incremented by
    1 in insn->off. This lets us denote the vmlinux BTF by
    insn->off == 0, and a module kfunc by insn->off > 0. The descriptors
    are sorted by offset in an array, each offset corresponding to one
    descriptor, with a limit of up to 256 such module BTFs.
    
    Another change is to the check_kfunc_call callback, which now takes a
    struct module * pointer. A later patch uses it to match the kfunc_id
    and module pointer for dynamically registered BTF sets from loadable
    modules, so that the same kfunc_id in two modules does not lead to
    check_kfunc_call succeeding for the wrong one. For the duration of
    check_kfunc_call, the reference to struct module exists, as it
    returns the pointer stored in kfunc_btf_tab.
    
    Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
    kkdwivedi authored and intel-lab-lkp committed Sep 14, 2021
  11. libbpf: Introduce legacy kprobe events support

    Allow kprobe tracepoint event creation through the legacy interface,
    as the kprobe dynamic PMU support used by default was only added in
    v4.17.
    
    Store the legacy kprobe name in struct bpf_perf_link, instead of
    creating a new "subclass" of bpf_perf_link. This is fine as it is
    just two new fields, which will also be reused for legacy uprobe
    support in follow-up patches.
    
    Signed-off-by: Rafael David Tinoco <rafaeldtinoco@gmail.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20210912064844.3181742-1-rafaeldtinoco@gmail.com
    rafaeldtinoco authored and anakryiko committed Sep 14, 2021

Commits on Sep 13, 2021

  1. libbpf: Make libbpf_version.h non-auto-generated

    Turn the previously auto-generated libbpf_version.h header into a
    normal header file. This prevents various tricky Makefile integration
    issues, simplifies the overall build process, and also allows the
    header to be further extended with more versioning-related APIs in
    the future.
    
    To prevent accidentally out-of-sync versions between libbpf.map and
    libbpf_version.h, the Makefile checks their consistency at build
    time.
    
    Simultaneously with this change, bump libbpf.map to v0.6.
    
    Also undo adding libbpf's output directory to the include path for
    kernel/bpf/preload, bpftool, and resolve_btfids; this is not
    necessary because libbpf_version.h is just a normal header like any
    other.
    
    Fixes: 0b46b75 ("libbpf: Add LIBBPF_DEPRECATED_SINCE macro for scheduling API deprecations")
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Link: https://lore.kernel.org/bpf/20210913222309.3220849-1-andrii@kernel.org
    anakryiko authored and Alexei Starovoitov committed Sep 13, 2021
  2. bpf, selftests: Replicate tailcall limit test for indirect call case

    The tailcall_3 test program uses bpf_tail_call_static() where the JIT
    would patch a direct jump. Add a new tailcall_6 test program replicating
    exactly the same test just ensuring that bpf_tail_call() uses a map
    index where the verifier cannot make assumptions this time.
    
    In other words, this now covers both cases on the x86-64 JIT: JIT
    images with emit_bpf_tail_call_direct() emission as well as JIT
    images with emit_bpf_tail_call_indirect() emission.
    
      # echo 1 > /proc/sys/net/core/bpf_jit_enable
      # ./test_progs -t tailcalls
      #136/1 tailcalls/tailcall_1:OK
      #136/2 tailcalls/tailcall_2:OK
      #136/3 tailcalls/tailcall_3:OK
      #136/4 tailcalls/tailcall_4:OK
      #136/5 tailcalls/tailcall_5:OK
      #136/6 tailcalls/tailcall_6:OK
      #136/7 tailcalls/tailcall_bpf2bpf_1:OK
      #136/8 tailcalls/tailcall_bpf2bpf_2:OK
      #136/9 tailcalls/tailcall_bpf2bpf_3:OK
      #136/10 tailcalls/tailcall_bpf2bpf_4:OK
      #136/11 tailcalls/tailcall_bpf2bpf_5:OK
      #136 tailcalls:OK
      Summary: 1/11 PASSED, 0 SKIPPED, 0 FAILED
    
      # echo 0 > /proc/sys/net/core/bpf_jit_enable
      # ./test_progs -t tailcalls
      #136/1 tailcalls/tailcall_1:OK
      #136/2 tailcalls/tailcall_2:OK
      #136/3 tailcalls/tailcall_3:OK
      #136/4 tailcalls/tailcall_4:OK
      #136/5 tailcalls/tailcall_5:OK
      #136/6 tailcalls/tailcall_6:OK
      [...]
    
    For the interpreter, the tailcall_1-6 tests pass as well. The later
    tailcall_bpf2bpf_* tests fail due to lack of bpf2bpf + tailcall
    support in the interpreter, so this is expected.
    
    Also, manual inspection shows that both loaded programs from tailcall_3
    and tailcall_6 test case emit the expected opcodes:
    
    * tailcall_3 disasm, emit_bpf_tail_call_direct():
    
      [...]
       b:   push   %rax
       c:   push   %rbx
       d:   push   %r13
       f:   mov    %rdi,%rbx
      12:   movabs $0xffff8d3f5afb0200,%r13
      1c:   mov    %rbx,%rdi
      1f:   mov    %r13,%rsi
      22:   xor    %edx,%edx                 _
      24:   mov    -0x4(%rbp),%eax          |  limit check
      2a:   cmp    $0x20,%eax               |
      2d:   ja     0x0000000000000046       |
      2f:   add    $0x1,%eax                |
      32:   mov    %eax,-0x4(%rbp)          |_
      38:   nopl   0x0(%rax,%rax,1)
      3d:   pop    %r13
      3f:   pop    %rbx
      40:   pop    %rax
      41:   jmpq   0xffffffffffffe377
      [...]
    
    * tailcall_6 disasm, emit_bpf_tail_call_indirect():
    
      [...]
      47:   movabs $0xffff8d3f59143a00,%rsi
      51:   mov    %edx,%edx
      53:   cmp    %edx,0x24(%rsi)
      56:   jbe    0x0000000000000093        _
      58:   mov    -0x4(%rbp),%eax          |  limit check
      5e:   cmp    $0x20,%eax               |
      61:   ja     0x0000000000000093       |
      63:   add    $0x1,%eax                |
      66:   mov    %eax,-0x4(%rbp)          |_
      6c:   mov    0x110(%rsi,%rdx,8),%rcx
      74:   test   %rcx,%rcx
      77:   je     0x0000000000000093
      79:   pop    %rax
      7a:   mov    0x30(%rcx),%rcx
      7e:   add    $0xb,%rcx
      82:   callq  0x000000000000008e
      87:   pause
      89:   lfence
      8c:   jmp    0x0000000000000087
      8e:   mov    %rcx,(%rsp)
      92:   retq
      [...]
    
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
    Acked-by: Yonghong Song <yhs@fb.com>
    Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com>
    Acked-by: Paul Chaignon <paul@cilium.io>
    Link: https://lore.kernel.org/bpf/CAM1=_QRyRVCODcXo_Y6qOm1iT163HoiSj8U2pZ8Rj3hzMTT=HQ@mail.gmail.com
    Link: https://lore.kernel.org/bpf/20210910091900.16119-1-daniel@iogearbox.net
    borkmann authored and Alexei Starovoitov committed Sep 13, 2021
  3. Merge branch 'bpf: introduce bpf_get_branch_snapshot'

    Song Liu says:
    
    ====================
    
    Changes v6 => v7:
    1. Improve/fix intel_pmu_snapshot_branch_stack() logic. (Peter).
    
    Changes v5 => v6:
    1. Add local_irq_save/restore to intel_pmu_snapshot_branch_stack. (Peter)
    2. Remove buf and size check in bpf_get_branch_snapshot, move flags check
       to later in the function. (Peter, Andrii)
    3. Revise comments for bpf_get_branch_snapshot in bpf.h (Andrii)
    
    Changes v4 => v5:
    1. Modify perf_snapshot_branch_stack_t to save some memcpy. (Andrii)
    2. Minor fixes in selftests. (Andrii)
    
    Changes v3 => v4:
    1. Do not reshuffle intel_pmu_disable_all(). Use some inline to save LBR
       entries. (Peter)
    2. Move static_call(perf_snapshot_branch_stack) to the helper. (Alexei)
    3. Add argument flags to bpf_get_branch_snapshot. (Andrii)
    4. Make MAX_BRANCH_SNAPSHOT an enum (Andrii). And rename it as
       PERF_MAX_BRANCH_SNAPSHOT
    5. Make bpf_get_branch_snapshot similar to bpf_read_branch_records.
       (Andrii)
    6. Move the test target function to bpf_testmod. Updated kallsyms_find_next
       to work properly with modules. (Andrii)
    
    Changes v2 => v3:
    1. Fix the use of static_call. (Peter)
    2. Limit the use to perfmon version >= 2. (Peter)
    3. Modify intel_pmu_snapshot_branch_stack() to use intel_pmu_disable_all
       and intel_pmu_enable_all().
    
    Changes v1 => v2:
    1. Rename the helper as bpf_get_branch_snapshot;
    2. Fix/simplify the use of static_call;
    3. Instead of percpu variables, let intel_pmu_snapshot_branch_stack output
       branch records to an output argument of type perf_branch_snapshot.
    
    Branch stack can be very useful in understanding software events. For
    example, when a long function, e.g. sys_perf_event_open, returns an
    errno, it is not obvious why the function failed. Branch stack can
    provide very helpful information in this type of scenario.
    
    This set adds support for reading the branch stack with a new BPF
    helper, bpf_get_branch_snapshot(). Currently, this is only supported
    on Intel systems. It should also be possible to support the same
    feature on PowerPC.
    
    The hardware that records the branch stack is not stopped
    automatically on software events. Therefore, it is necessary to stop
    it in software soon after the event; otherwise, the hardware
    buffers/registers will be flushed. One of the key design
    considerations in this set is to minimize the number of branch record
    entries between the event trigger and the hardware recorder being
    stopped. Based on this goal, the current design differs from the
    discussions in the original RFC [1]:
     1) Static call is used when supported, to save function pointer
        dereference;
     2) intel_pmu_lbr_disable_all is used instead of perf_pmu_disable(),
        because the latter uses about 10 entries before stopping LBR.
    
    With current code, on Intel CPU, LBR is stopped after 7 branch entries
    after fexit triggers:
    
    ID: 0 from bpf_get_branch_snapshot+18 to intel_pmu_snapshot_branch_stack+0
    ID: 1 from __brk_limit+477143934 to bpf_get_branch_snapshot+0
    ID: 2 from __brk_limit+477192263 to __brk_limit+477143880  # trampoline
    ID: 3 from __bpf_prog_enter+34 to __brk_limit+477192251
    ID: 4 from migrate_disable+60 to __bpf_prog_enter+9
    ID: 5 from __bpf_prog_enter+4 to migrate_disable+0
    ID: 6 from bpf_testmod_loop_test+20 to __bpf_prog_enter+0
    ID: 7 from bpf_testmod_loop_test+20 to bpf_testmod_loop_test+13
    ID: 8 from bpf_testmod_loop_test+20 to bpf_testmod_loop_test+13
    ...
    
    [1] https://lore.kernel.org/bpf/20210818012937.2522409-1-songliubraving@fb.com/
    ====================
    
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Alexei Starovoitov committed Sep 13, 2021
  4. selftests/bpf: Add test for bpf_get_branch_snapshot

    This test uses bpf_get_branch_snapshot from a fexit program. The test
    uses a target function (bpf_testmod_loop_test) and compares the
    records against kallsyms. If there are not enough records matching
    kallsyms, the test fails.
    
    Signed-off-by: Song Liu <songliubraving@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Link: https://lore.kernel.org/bpf/20210910183352.3151445-4-songliubraving@fb.com
    liu-song-6 authored and Alexei Starovoitov committed Sep 13, 2021
  5. bpf: Introduce helper bpf_get_branch_snapshot

    Introduce bpf_get_branch_snapshot(), which allows a tracing program
    to get the branch trace from hardware (e.g. Intel LBR). To use the
    feature, the user needs to create a perf_event with proper
    branch_record filtering on each cpu, and then call
    bpf_get_branch_snapshot in the bpf program. On Intel CPUs, the VLBR
    event (raw event 0x1b00) can be used for this.
    
    Signed-off-by: Song Liu <songliubraving@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20210910183352.3151445-3-songliubraving@fb.com
    liu-song-6 authored and Alexei Starovoitov committed Sep 13, 2021
  6. perf: Enable branch record for software events

    The typical way to access branch records (e.g. Intel LBR) is via a
    hardware perf_event. For CPUs with FREEZE_LBRS_ON_PMI support, the
    PMI can capture reliable LBR data. On the other hand, the LBR can
    also be useful in non-PMI scenarios. For example, in a kretprobe or
    bpf fexit program, the LBR can provide a lot of information on what
    happened within the function. Add an API to use branch records for
    software use.
    
    Note that, when the software event triggers, it is necessary to stop
    the branch record hardware asap. Therefore, static_call is used to
    remove some branch instructions in this process.
    
    Suggested-by: Peter Zijlstra <peterz@infradead.org>
    Signed-off-by: Song Liu <songliubraving@fb.com>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lore.kernel.org/bpf/20210910183352.3151445-2-songliubraving@fb.com
    liu-song-6 authored and Alexei Starovoitov committed Sep 13, 2021

Commits on Sep 10, 2021

  1. selftests/bpf: Test new __sk_buff field hwtstamp

    Analogous to the gso_segs selftests introduced in commit d9ff286
    ("bpf: allow BPF programs access skb_shared_info->gso_segs field").
    
    Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Link: https://lore.kernel.org/bpf/20210909220409.8804-3-vfedorenko@novek.ru
    vvfedorenko authored and borkmann committed Sep 10, 2021
  2. bpf: Add hardware timestamp field to __sk_buff

    BPF programs may want to know hardware timestamps if the NIC supports
    such timestamping.
    
    Expose this data as the hwtstamp field of __sk_buff, the same way as
    gso_segs/gso_size. This field can be accessed from the same programs
    as the tstamp field, but it is read-only. An explicit test to deny
    access to the padding data is added to bpf_skb_is_valid_access.
    
    Also update BPF_PROG_TEST_RUN tests of the feature.
    
    Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Martin KaFai Lau <kafai@fb.com>
    Link: https://lore.kernel.org/bpf/20210909220409.8804-2-vfedorenko@novek.ru
    vvfedorenko authored and borkmann committed Sep 10, 2021
  3. Merge branch 'bpf-xsk-selftests'

    Magnus Karlsson says:
    
    ====================
    This patch set facilitates adding new tests, as well as describing
    existing ones, in the xsk selftests suite, and adds 3 new test suites
    at the end. The idea is to isolate the run-time that executes the
    test from the actual implementation of the test. Today, implementing
    a test amounts to adding test-specific if-statements all around the
    run-time, which is neither scalable nor amenable to reuse. This patch
    set instead introduces a test specification that is the only thing a
    test fills in. The run-time then gets this specification and acts
    upon it, completely unaware of which test it is executing. This way,
    we can get rid of all test-specific if-statements in the run-time,
    and the implementation of a test can be contained in a single
    function. This hopefully makes it easier to add tests and for users
    to understand what a test accomplishes.
    
    As a recap of what the run-time does: each test is based on the
    run-time launching two threads and connecting a veth link between the
    two threads. Each thread opens an AF_XDP socket on that veth interface
    and one of them sends traffic that the other one receives and
    validates. Each thread has its own umem. Note that this behavior is
    not changed by this patch set.
    
    A test specification consists of several items. Most importantly:
    
    * Two packet streams. One for the Tx thread that specifies what
      traffic to send, and one for the Rx thread that specifies what that
      thread should receive. If it receives exactly what is specified,
      the test passes, otherwise it fails. A packet stream can also
      specify which buffers in the umem should be used by the Rx and Tx
      threads.
    
    * What kind of AF_XDP sockets it should create and bind to what
      interfaces
    
    * How many times it should repeat the socket creation and destruction
    
    * The name of the test
    
    The interface for the test spec is the following:
    
    void test_spec_init(struct test_spec *test, struct ifobject *ifobj_tx,
                        struct ifobject *ifobj_rx, enum test_mode mode);
    
    /* Reset everything but the interface specifications and the mode */
    void test_spec_reset(struct test_spec *test);
    
    void test_spec_set_name(struct test_spec *test, const char *name);
    
    Packet streams have the following interfaces:
    
    struct pkt *pkt_stream_get_pkt(struct pkt_stream *pkt_stream, u32 pkt_nb)
    
    struct pkt *pkt_stream_get_next_rx_pkt(struct pkt_stream *pkt_stream)
    
    struct pkt_stream *pkt_stream_generate(struct xsk_umem_info *umem,
                                           u32 nb_pkts, u32 pkt_len);
    
    void pkt_stream_delete(struct pkt_stream *pkt_stream);
    
    struct pkt_stream *pkt_stream_clone(struct xsk_umem_info *umem,
                                        struct pkt_stream *pkt_stream);
    
    /* Replaces all packets in the stream */
    void pkt_stream_replace(struct test_spec *test, u32 nb_pkts, u32 pkt_len);
    
    /* Replaces every other packet in the stream */
    void pkt_stream_replace_half(struct test_spec *test, u32 pkt_len, u32 offset);
    
    /* For creating custom made packet streams */
    void pkt_stream_generate_custom(struct test_spec *test, struct pkt *pkts,
                                    u32 nb_pkts);
    
    /* Restores the default packet stream */
    void pkt_stream_restore_default(struct test_spec *test);
    
    A test can then, in the most basic case, be described like this
    (provided the test specification has been created before calling the
    function):
    
    static bool testapp_aligned(struct test_spec *test)
    {
            test_spec_set_name(test, "RUN_TO_COMPLETION");
            testapp_validate_traffic(test);
            return true;
    }
    
    Running the same test in unaligned mode would then look like this:
    
    static bool testapp_unaligned(struct test_spec *test)
    {
            if (!hugepages_present(test->ifobj_tx)) {
                    ksft_test_result_skip("No 2M huge pages present.\n");
                    return false;
            }
    
            test_spec_set_name(test, "UNALIGNED_MODE");
            test->ifobj_tx->umem->unaligned_mode = true;
            test->ifobj_rx->umem->unaligned_mode = true;
            /* Let half of the packets straddle a buffer boundary */
            pkt_stream_replace_half(test, PKT_SIZE,
                                    XSK_UMEM__DEFAULT_FRAME_SIZE - 32);
            /* Populate fill ring with addresses in the packet stream */
            test->ifobj_rx->pkt_stream->use_addr_for_fill = true;
            testapp_validate_traffic(test);
    
            pkt_stream_restore_default(test);
            return true;
    }
    
    3 of the last 4 patches in the set add 3 new test suites, one for
    unaligned mode, one for testing the rejection of tricky invalid
    descriptors plus the acceptance of some valid ones in the Tx ring, and
    one for testing 2K frame sizes (the default is 4K).
    
    What is left to do for follow-up patches:
    
    * Convert the statistics tests to the new framework.
    
    * Implement a way of registering new tests without having the enum
      test_type. Once this has been done (together with the previous
      bullet), all the test types can be dropped from the header
      file. This means that we should be able to add tests by just writing
      a single function with a new test specification, which is one of the
      goals.
    
    * Introduce functions for manipulating parts of the test or interface
      spec instead of direct manipulations such as
      test->ifobj_rx->pkt_stream->use_addr_for_fill = true; which is kind
      of awkward.
    
    * Move the run-time and its interface to its own .c and .h files. Then
      we can have all the tests in a separate file.
    
    * Better error reporting if a test fails. Today it does not state
      which test failed and might not continue executing the rest of the
      tests due to the failure. Failures are not propagated upwards
      through the functions, so a failed test is also counted as a passed
      test, which messes up the stats counting. This needs to be changed.
    
    * Add option to run specific test instead of all of them
    
    * Introduce pacing of sent packets so that they are never dropped
      by the receiver even if it is stalled for some reason. If you run
      the current tests on a heavily loaded system, they might fail in SKB
      mode due to packets being dropped by the driver on Tx. Though I have
      never seen it, it might happen.
    
    v1 -> v2:
    
    * Fixed a number of spelling errors [Maciej]
    * Fixed use after free bug in pkt_stream_replace() [Maciej]
    * pkt_stream_set -> pkt_stream_generate_custom [Maciej]
    * Fixed formatting problem in testapp_invalid_desc() [Maciej]
    ====================
    
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    borkmann committed Sep 10, 2021
  4. selftests: xsk: Add tests for 2K frame size

    Add tests for 2K frame size. Both a standard send and receive test and
    one testing for invalid descriptors when the frame size is 2K.
    
    Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    Link: https://lore.kernel.org/bpf/20210907071928.9750-21-magnus.karlsson@gmail.com
    magnus-karlsson authored and borkmann committed Sep 10, 2021
  5. selftests: xsk: Add tests for invalid xsk descriptors

    Add tests for invalid xsk descriptors in the Tx ring. A number of
    handcrafted nasty invalid descriptors are created and submitted to the
    tx ring to check that they are validated correctly. Corner case valid
    ones are also sent. The tests are run for both aligned and unaligned
    mode.
    
    pkt_stream_set() is introduced to be able to create a hand-crafted
    packet stream where every single packet is specified in detail.
    
    Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    Link: https://lore.kernel.org/bpf/20210907071928.9750-20-magnus.karlsson@gmail.com
    magnus-karlsson authored and borkmann committed Sep 10, 2021
  6. selftests: xsk: Eliminate test specific if-statement in test runner

    Eliminate a test-specific if-statement for the RX_FILL_EMPTY stats
    test that is present in the test runner. We can do this as we now
    have the use_addr_for_fill option. Just create an empty Rx packet
    stream and indicate that the test runner should use the addresses in
    it to populate the fill ring. As there are no packets in the stream,
    the fill ring will be empty and we will get the error stats that we
    want to test.
    
    Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    Link: https://lore.kernel.org/bpf/20210907071928.9750-19-magnus.karlsson@gmail.com
    magnus-karlsson authored and borkmann committed Sep 10, 2021
  7. selftests: xsk: Add test for unaligned mode

    Add a test for unaligned mode in which packet buffers can be placed
    anywhere within the umem. Some packets are made to straddle page
    boundaries in order to check for correctness. On the Tx side, buffers
    are now allocated according to the addresses found in the packet
    stream. Thus, the placement of buffers can be controlled with the
    boolean use_addr_for_fill in the packet stream.
    
    One new pkt_stream interface is introduced: pkt_stream_replace_half()
    that replaces every other packet in the default packet stream with the
    specified new packet. The constant DEFAULT_OFFSET is also
    introduced. It specifies at what offset from the start of a chunk a Tx
    packet is placed by the sending thread. This is just to be able to
    test that it is possible to send packets at an offset not equal to
    zero.
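The pkt_stream_replace_half() idea can be sketched like this. The struct and loop are illustrative only, assuming "every other packet" means the odd-indexed ones; the real selftest operates on its own packet-stream type.

```c
#include <stddef.h>

/* Illustrative packet record; the real stream carries more state. */
struct pkt {
	unsigned long addr;	/* umem address, may straddle a page in
				 * unaligned mode */
	unsigned int len;
};

/* Replace every other packet in the default stream with the
 * specified new address/length, e.g. one that straddles a page
 * boundary to exercise unaligned mode. */
static void pkt_stream_replace_half(struct pkt *pkts, size_t nb,
				    unsigned long new_addr,
				    unsigned int new_len)
{
	for (size_t i = 1; i < nb; i += 2) {
		pkts[i].addr = new_addr;
		pkts[i].len = new_len;
	}
}
```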
    
    Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    Link: https://lore.kernel.org/bpf/20210907071928.9750-18-magnus.karlsson@gmail.com
    magnus-karlsson authored and borkmann committed Sep 10, 2021
  8. selftests: xsk: Introduce replacing the default packet stream

    Introduce the concept of a default packet stream, the set of packets
    sent by most tests. Then add the ability to replace it when a test
    would like to send or receive something else, using the function
    pkt_stream_replace(), and to restore it afterwards with
    pkt_stream_restore_default(). These are then used to convert
    STAT_TEST_TX_INVALID to the new APIs.
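The replace/restore pattern can be sketched as below. The function names mirror the commit text, but the struct layout is illustrative, not the selftest's actual one.

```c
/* Dummy stream type; the real one carries the packet array. */
struct pkt_stream { int id; };

struct test_spec {
	struct pkt_stream *pkt_stream;		/* stream currently in use */
	struct pkt_stream *default_pkt_stream;	/* shared default stream */
};

/* A test swaps in its own stream... */
static void pkt_stream_replace(struct test_spec *test, struct pkt_stream *ps)
{
	test->pkt_stream = ps;
}

/* ...and puts the shared default back when it is done, so the next
 * test starts from a known state. */
static void pkt_stream_restore_default(struct test_spec *test)
{
	test->pkt_stream = test->default_pkt_stream;
}
```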
    
    Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    Link: https://lore.kernel.org/bpf/20210907071928.9750-17-magnus.karlsson@gmail.com
    magnus-karlsson authored and borkmann committed Sep 10, 2021
  9. selftests: xsk: Allow for invalid packets

    Allow for invalid packets to be sent. The Rx thread verifies that
    they are not received; put another way, if they are received, the
    test fails. This feature will be used to eliminate an if-statement
    in a stats test and will also be used by other tests in later
    patches. The previous code could only deal with valid packets.
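A minimal sketch of the idea, with hypothetical names: packets carry a validity flag, invalid ones are sent but must never reach the Rx socket, so the Rx thread only expects the valid subset.

```c
#include <stdbool.h>
#include <stddef.h>

struct pkt {
	unsigned long addr;
	unsigned int len;
	bool valid;	/* invalid packets must be dropped before Rx */
};

/* Rx-side check: seeing an invalid packet means test failure. */
static bool rx_accept_pkt(const struct pkt *p)
{
	return p->valid;
}

/* How many packets the Rx thread should wait for: only the valid
 * ones, since the invalid ones are expected to be dropped. */
static size_t expected_rx_pkts(const struct pkt *pkts, size_t nb)
{
	size_t n = 0;

	for (size_t i = 0; i < nb; i++)
		if (pkts[i].valid)
			n++;
	return n;
}
```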
    
    Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    Link: https://lore.kernel.org/bpf/20210907071928.9750-16-magnus.karlsson@gmail.com
    magnus-karlsson authored and borkmann committed Sep 10, 2021
  10. selftests: xsk: Eliminate MAX_SOCKS define

    Remove the MAX_SOCKS define as it will always be one for the
    foreseeable future and the code does not work for any other value
    anyway.
    
    Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    Link: https://lore.kernel.org/bpf/20210907071928.9750-15-magnus.karlsson@gmail.com
    magnus-karlsson authored and borkmann committed Sep 10, 2021
  11. selftests: xsk: Make pthreads local scope

    Make the pthread_t variables local scope instead of global. No reason
    for them to be global.
    
    Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    Link: https://lore.kernel.org/bpf/20210907071928.9750-14-magnus.karlsson@gmail.com
    magnus-karlsson authored and borkmann committed Sep 10, 2021
  12. selftests: xsk: Make xdp_flags and bind_flags local

    Make xdp_flags and bind_flags local instead of global by moving them
    into the interface object. These flags decide if the socket should be
    created in SKB mode or in DRV mode and therefore they are sticky and
    will survive a test_spec_reset. Since every test is first run in SKB
    mode then in DRV mode, this change only happens once. With this
    change, the configured_mode global variable can also be
    eradicated. The first test_spec_init() also becomes superfluous and
    can be eliminated.
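The sticky-flags behavior can be sketched as follows. The flag values and reset function are illustrative stand-ins; the real code uses the kernel's XDP_FLAGS_SKB_MODE/XDP_FLAGS_DRV_MODE values.

```c
#include <stdint.h>

#define SKB_MODE_FLAG 0x2	/* illustrative stand-ins for the */
#define DRV_MODE_FLAG 0x4	/* kernel's XDP_FLAGS_* values */

/* The mode flags now live in the per-interface object... */
struct ifobject {
	uint32_t xdp_flags;	/* SKB vs DRV mode, sticky across resets */
	uint16_t bind_flags;
};

/* ...and a reset clears per-test state while deliberately leaving
 * the sticky mode flags alone, so both runs of a test keep their
 * mode without a configured_mode global. */
static void test_spec_reset(struct ifobject *ifobj, int *per_test_state)
{
	*per_test_state = 0;
	(void)ifobj;	/* xdp_flags/bind_flags intentionally untouched */
}
```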
    
    Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    Link: https://lore.kernel.org/bpf/20210907071928.9750-13-magnus.karlsson@gmail.com
    magnus-karlsson authored and borkmann committed Sep 10, 2021
  13. selftests: xsk: Specify number of sockets to create

    Add the ability in the test specification to specify the number of
    sockets to create. The default is one socket. This is then used to
    remove test specific if-statements around the bpf_res tests.
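A sketch of this, with hypothetical names: the spec carries nb_sockets (defaulting to 1) and the runner simply loops, so no per-test special case is needed.

```c
#include <stddef.h>

#define MAX_SOCKETS 2	/* illustrative upper bound */

struct test_spec {
	size_t nb_sockets;	/* how many sockets the runner creates */
};

static void test_spec_init(struct test_spec *test)
{
	test->nb_sockets = 1;	/* default: one socket per interface */
}

/* Runner loop: create the requested number of sockets. Stands in for
 * the real xsk_socket__create() calls. */
static size_t create_sockets(const struct test_spec *test)
{
	size_t created = 0;

	for (size_t i = 0; i < test->nb_sockets && i < MAX_SOCKETS; i++)
		created++;
	return created;
}
```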
    
    Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    Link: https://lore.kernel.org/bpf/20210907071928.9750-12-magnus.karlsson@gmail.com
    magnus-karlsson authored and borkmann committed Sep 10, 2021
  14. selftests: xsk: Replace second_step global variable

    Replace the second_step global variable with a test specification
    variable called total_steps that a test can set to indicate how
    many times the packet stream should be sent without reinitializing any
    sockets. This eliminates test specific code in the test runner around
    the bidirectional test.
    
    The total_steps variable is 1 by default as most tests only need a
    single round of packets.
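The runner-side change can be sketched as a single loop, with illustrative names; a test like the bidirectional one just sets total_steps = 2.

```c
struct test_spec {
	unsigned int total_steps;	/* rounds of the packet stream, 1 by
					 * default */
};

/* Sockets are created once before this loop; the stream is then sent
 * (and verified) total_steps times without reinitializing anything,
 * replacing the old second_step global and its special case. */
static unsigned int run_test(const struct test_spec *test)
{
	unsigned int rounds = 0;

	for (unsigned int step = 0; step < test->total_steps; step++)
		rounds++;	/* send + verify one full packet stream */
	return rounds;
}
```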
    
    Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    Link: https://lore.kernel.org/bpf/20210907071928.9750-11-magnus.karlsson@gmail.com
    magnus-karlsson authored and borkmann committed Sep 10, 2021
  15. selftests: xsk: Introduce rx_on and tx_on in ifobject

    Introduce rx_on and tx_on in the ifobject so that we can describe
    whether the thread should create a socket with only Tx, only Rx, or
    both. This eliminates some test-specific if-statements from the
    code. We can also eliminate the flow vector structure now as this is
    fully specified by the tx_on and rx_on variables.
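The point that the two booleans fully determine the configuration can be shown directly; the enum here is an illustrative stand-in for the removed flow vector.

```c
#include <stdbool.h>

struct ifobject {
	bool rx_on;	/* create an Rx ring for this socket */
	bool tx_on;	/* create a Tx ring for this socket */
};

enum sock_cfg { CFG_NONE, CFG_RX_ONLY, CFG_TX_ONLY, CFG_RX_TX };

/* Every flow-vector case maps onto a (rx_on, tx_on) pair, so the
 * separate structure carries no extra information. */
static enum sock_cfg socket_cfg(const struct ifobject *ifobj)
{
	if (ifobj->rx_on && ifobj->tx_on)
		return CFG_RX_TX;
	if (ifobj->rx_on)
		return CFG_RX_ONLY;
	if (ifobj->tx_on)
		return CFG_TX_ONLY;
	return CFG_NONE;
}
```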
    
    Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    Link: https://lore.kernel.org/bpf/20210907071928.9750-10-magnus.karlsson@gmail.com
    magnus-karlsson authored and borkmann committed Sep 10, 2021
  16. selftests: xsk: Add use_poll to ifobject

    Add a use_poll option to the ifobject so that we do not need to use a
    test specific if-statement in the test runner.
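A sketch of what the option selects between, using the standard poll(2) API; the function name and timeout are illustrative.

```c
#include <poll.h>
#include <stdbool.h>

/* When use_poll is set, the receive path waits with poll(2) before
 * reading descriptors; otherwise it busy-polls and tries to read
 * immediately. Returns >0 when data is (assumed) readable. */
static int wait_for_rx(int fd, bool use_poll)
{
	if (!use_poll)
		return 1;	/* busy-poll path: attempt the read now */

	struct pollfd pfd = { .fd = fd, .events = POLLIN };

	return poll(&pfd, 1, 1000);	/* >0 ready, 0 timeout, <0 error */
}
```

Pushing the choice into the ifobject means the runner calls one function unconditionally instead of branching per test.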
    
    Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    Link: https://lore.kernel.org/bpf/20210907071928.9750-9-magnus.karlsson@gmail.com
    magnus-karlsson authored and borkmann committed Sep 10, 2021
  17. selftests: xsk: Introduce test name in test spec

    Introduce the test name in the test specification. This is so we can
    set
    the name locally in the test function and simplify the logic for
    printing out test results.
    
    Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    Link: https://lore.kernel.org/bpf/20210907071928.9750-8-magnus.karlsson@gmail.com
    magnus-karlsson authored and borkmann committed Sep 10, 2021
  18. selftests: xsk: Make frame_size configurable

    Make the frame size configurable instead of it being hard coded to a
    default. This is a property of the umem and will make it possible to
    implement tests for different umem frame sizes in a later patch.
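A sketch of the umem property, assuming illustrative field names and a page-sized default:

```c
#include <stdint.h>

#define DEFAULT_FRAME_SIZE 4096	/* illustrative default (page size) */

/* frame_size is now a property of the umem instead of a hard-coded
 * constant, so later tests can run with e.g. 2048-byte frames. */
struct xsk_umem_info {
	uint32_t frame_size;
	uint32_t num_frames;
};

/* Total size of the umem region derives from the configured frame
 * size rather than a compile-time default. */
static uint64_t umem_size(const struct xsk_umem_info *umem)
{
	return (uint64_t)umem->frame_size * umem->num_frames;
}
```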
    
    Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    Link: https://lore.kernel.org/bpf/20210907071928.9750-7-magnus.karlsson@gmail.com
    magnus-karlsson authored and borkmann committed Sep 10, 2021