
Commits on Aug 18, 2021

  1. bpf: lbr: enable reading LBR from tracing bpf programs

    The typical way to access LBR is via a hardware perf_event. For CPUs with
    FREEZE_LBRS_ON_PMI support, the PMI can capture reliable LBR data. On the
    other hand, LBR can also be useful in non-PMI scenarios. For example, in a
    kretprobe or bpf fexit program, LBR can provide a lot of information
    about what happened in the function.
    
    In this RFC, we try to enable LBR for BPF programs. This works as follows:
      1. Create a hardware perf_event with PERF_SAMPLE_BRANCH_* on each CPU;
      2. Call a new bpf helper (bpf_get_branch_trace) from the BPF program;
      3. Before calling this bpf program, the kernel stops LBR on local CPU,
         make a copy of LBR, and resumes LBR;
      4. In the BPF program, the helper accesses the copy made in #3.
    
    Please see tools/testing/selftests/bpf/[progs|prog_tests]/get_call_trace.c
    for a detailed example. Note that this process is far from ideal, but it
    allows quick prototyping of this feature.
    
    AFAICT, the biggest challenge here is that we are now sharing LBR in PMI
    and out of PMI, which could trigger some interesting race conditions.
    However, if we allow some level of missed/corrupted samples, this should
    still be very useful.
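    The stop/copy/resume flow in step 3 can be sketched in plain C; all names
    here are illustrative stand-ins for this RFC's idea, not the kernel's actual
    LBR interfaces:

```c
#include <assert.h>
#include <string.h>

#define MAX_LBR_ENTRIES 32

/* Hypothetical shape of one LBR record: from/to addresses plus flags. */
struct lbr_entry {
    unsigned long long from, to, flags;
};

/* Simulated live hardware state and the per-CPU snapshot buffer that the
 * helper would later read from. */
static struct lbr_entry lbr_hw[MAX_LBR_ENTRIES];
static struct lbr_entry lbr_snapshot[MAX_LBR_ENTRIES];
static int lbr_frozen;

/* Step 3 from the description: stop LBR on the local CPU, copy it, resume. */
static void snapshot_lbr(void)
{
    lbr_frozen = 1;                               /* stop LBR */
    memcpy(lbr_snapshot, lbr_hw, sizeof(lbr_hw)); /* make a copy */
    lbr_frozen = 0;                               /* resume LBR */
}
```

    The BPF helper (step 4) would then only ever touch lbr_snapshot, never the
    live hardware state, which is why a stale-but-consistent copy is acceptable.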
    
    Please share your thoughts and comments on this. Thanks in advance!
    
    Cc: Kan Liang <kan.liang@linux.intel.com>
    Cc: Like Xu <like.xu@linux.intel.com>
    Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
    Signed-off-by: Song Liu <songliubraving@fb.com>
    liu-song-6 authored and intel-lab-lkp committed Aug 18, 2021

Commits on Aug 17, 2021

  1. bpf: Remove redundant initialization of variable allow

    The variable allow is initialized with a value that is never read; it is
    updated later on. The assignment is redundant and can be removed.
    
    Addresses-Coverity: ("Unused value")
    
    Signed-off-by: Colin Ian King <colin.king@canonical.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20210817170842.495440-1-colin.king@canonical.com
    Colin Ian King authored and anakryiko committed Aug 17, 2021
  2. Merge branch 'selftests/bpf: fix flaky send_signal test'

    Yonghong Song says:
    
    ====================
    
    The bpf selftest send_signal() is flaky in its subtests that try to
    send signals in softirq/nmi context. To reduce flakiness, the priority
    of the signal-targeted process is boosted, which should minimize
    preemption of that process and improve the likelihood that
    the underlying task in softirq/nmi context is the one
    bpf_send_signal() targets.
    
    Patch #1 did a refactoring to use ASSERT_* instead of the old CHECK macros.
    Patch #2 did the actual change of boosting the priority.
    
    Changelog:
      v1 -> v2:
        remove skip logic where the underlying task in interrupt context
        is not the intended one.
    ====================
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    anakryiko committed Aug 17, 2021
  3. selftests/bpf: Fix flaky send_signal test

    libbpf CI has reported that the send_signal test is flaky, although
    I am not able to reproduce it in my local environment.
    I am, however, able to reproduce it with the on-demand libbpf CI ([1]).
    
    Code analysis suggests the following possible reason.
    The failed subtest runs its bpf program in softirq context, and
    bpf_send_signal() only sends to a fork of the "test_progs"
    process. If the underlying current task is not "test_progs",
    bpf_send_signal() will not be triggered and the subtest will fail.
    
    To reduce the chances that the underlying process is not
    the intended one, this patch boosts the scheduling priority to
    -20 (the highest allowed by the setpriority() call). With this
    patch, I did 10 runs with the on-demand libbpf CI and did not
    observe any failures.
    
     [1] https://github.com/libbpf/libbpf/actions/workflows/ondemand.yml
    
    Signed-off-by: Yonghong Song <yhs@fb.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20210817190923.3186725-1-yhs@fb.com
    yonghong-song authored and anakryiko committed Aug 17, 2021
  4. selftests/bpf: Replace CHECK with ASSERT_* macros in send_signal.c

    Replace CHECK in send_signal.c with ASSERT_* macros, as
    ASSERT_* macros are generally preferred. There is no
    functionality change.
    
    Signed-off-by: Yonghong Song <yhs@fb.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20210817190918.3186400-1-yhs@fb.com
    yonghong-song authored and anakryiko committed Aug 17, 2021
  5. Merge branch 'selftests/bpf: Improve the usability of test_progs'

    Yucong Sun says:
    
    ====================
    
    This short series adds two new switches to test_progs, "-a" and "-d",
    adding support for both exact string matching, as well as '*' wildcards.
    It also cleans up the output to make it possible to generate
    allowlist/denylist using common cli tools.
    ====================
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    anakryiko committed Aug 17, 2021
  6. selftests/bpf: Support glob matching for test selector.

    This patch adds '-a' and '-d' arguments supporting both exact string match as
    well as the '*' wildcard in test/subtest selection. '-a' and '-t' can
    co-exist, as can '-d' and '-b', in which case they just add to the list of
    allowed or denied test selectors.
    
    Caveat: as with the current substring matching mechanism, test and subtest
    selectors apply independently: 'a*/b*' will execute all tests matching "a*"
    with subtest names matching "b*", but tests matching "a*" that have no
    subtests will also be executed.
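    The '*' wildcard semantics described above can be illustrated with a
    minimal, self-contained matcher; this is a sketch of the technique, not the
    actual test_progs implementation:

```c
#include <assert.h>
#include <stdbool.h>

/* Greedy backtracking matcher supporting only the '*' wildcard.
 * '*' matches any (possibly empty) run of characters. */
static bool glob_match(const char *pat, const char *str)
{
    const char *star = 0, *backtrack = 0;

    while (*str) {
        if (*pat == *str) {
            pat++; str++;
        } else if (*pat == '*') {
            star = pat++;        /* remember '*'; try matching empty first */
            backtrack = str;
        } else if (star) {
            pat = star + 1;      /* let the last '*' absorb one more char */
            str = ++backtrack;
        } else {
            return false;
        }
    }
    while (*pat == '*')
        pat++;                   /* trailing '*' matches the empty tail */
    return *pat == '\0';
}
```

    With this, a selector such as "a*" accepts "align", while "b*" rejects it,
    matching the test/subtest selection behavior described above.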
    
    Signed-off-by: Yucong Sun <fallentree@fb.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20210817044732.3263066-5-fallentree@fb.com
    thefallentree authored and anakryiko committed Aug 17, 2021
  7. selftests/bpf: Also print test name in subtest status message

    This patch adds the test name to the subtest status message line, making it
    possible to grep ':OK' in the output to generate a list of passed
    test+subtest names, which can then be processed into an argument list for
    "-a"/"-d" exact string matching.
    
    Example:
    
     #1/1 align/mov:OK
     ..
     #1/12 align/pointer variable subtraction:OK
     #1 align:OK
    
    Signed-off-by: Yucong Sun <fallentree@fb.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20210817044732.3263066-4-fallentree@fb.com
    thefallentree authored and anakryiko committed Aug 17, 2021
  8. selftests/bpf: Correctly display subtest skip status

    In skip_account(), test->skip_cnt is set to 0 at the end, which prevents the
    next print statement from ever displaying SKIP status for the subtest. This
    patch moves the accounting logic after the print statement, fixing the issue.
    
    This patch also adds SKIP status display for normal tests.
    
    Signed-off-by: Yucong Sun <fallentree@fb.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20210817044732.3263066-3-fallentree@fb.com
    thefallentree authored and anakryiko committed Aug 17, 2021
  9. selftests/bpf: Skip loading bpf_testmod when using -l to list tests.

    When using "-l", test_progs is often executed as a non-root user, so
    load_bpf_testmod() fails and prints errors. This patch skips loading
    bpf_testmod when "-l" is specified, making the output cleaner.
    
    Signed-off-by: Yucong Sun <fallentree@fb.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20210817044732.3263066-2-fallentree@fb.com
    thefallentree authored and anakryiko committed Aug 17, 2021
  10. selftests/bpf: Add exponential backoff to map_delete_retriable in test_maps
    
    Using a fixed delay of 1 microsecond has proven flaky in slow CPU
    environments, e.g. the GitHub Actions CI system. This patch adds exponential
    backoff with a cap of 50ms to reduce the flakiness of the test. The initial
    delay is chosen at random in the range [0ms, 5ms).
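    A minimal userspace sketch of such a backoff policy; the function and
    constant names are hypothetical, not taken from test_maps:

```c
#include <assert.h>

#define BACKOFF_CAP_US 50000U  /* 50ms cap, as described above */

/* Compute the delay for a given retry attempt. Attempt 0 uses the
 * caller-chosen initial delay (the test picks it at random in
 * [0us, 5000us)); each further attempt doubles it, capped at 50ms.
 * A zero initial delay is bumped to 1us so doubling can take effect. */
static unsigned int backoff_us(int attempt, unsigned int initial_us)
{
    unsigned long long d = initial_us ? initial_us : 1;

    while (attempt-- > 0) {
        d *= 2;
        if (d > BACKOFF_CAP_US)
            return BACKOFF_CAP_US;
    }
    return (unsigned int)d;
}
```

    The retry loop would then sleep for backoff_us(attempt, initial) before
    each new map operation, so slow CI machines get progressively more slack.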
    
    Signed-off-by: Yucong Sun <fallentree@fb.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20210817045713.3307985-1-fallentree@fb.com
    thefallentree authored and anakryiko committed Aug 17, 2021
  11. selftests/bpf: Add exponential backoff to map_update_retriable in test_maps
    
    Using a fixed delay of 1 microsecond has proven flaky in slow CPU
    environments, e.g. the GitHub Actions CI system. This patch adds exponential
    backoff with a cap of 50ms to reduce the flakiness of the test. The initial
    delay is chosen at random in the range [0ms, 5ms).
    
    Signed-off-by: Yucong Sun <fallentree@fb.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Acked-by: Song Liu <songliubraving@fb.com>
    Link: https://lore.kernel.org/bpf/20210816175250.296110-1-fallentree@fb.com
    thefallentree authored and anakryiko committed Aug 17, 2021
  12. Merge branch 'sockmap: add sockmap support for unix stream socket'

    Jiang Wang says:
    
    ====================
    
    This patch series adds support for the unix stream type
    in sockmap. Sockmap already supports the TCP, UDP,
    and unix dgram types. The unix stream support is similar
    to unix dgram.
    
    Also add selftests for unix stream type in sockmap tests.
    ====================
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    anakryiko committed Aug 17, 2021
  13. selftest/bpf: Add new tests in sockmap for unix stream to tcp.

    Add two new test cases in sockmap tests, where unix stream is
    redirected to tcp and vice versa.
    
    Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Reviewed-by: Cong Wang <cong.wang@bytedance.com>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
    Link: https://lore.kernel.org/bpf/20210816190327.2739291-6-jiang.wang@bytedance.com
    Jiang Wang authored and anakryiko committed Aug 17, 2021
  14. selftest/bpf: Change udp to inet in some function names

    This is to prepare for adding new unix stream tests.
    Mostly renames, also pass the socket types as an argument.
    
    Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Reviewed-by: Cong Wang <cong.wang@bytedance.com>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
    Link: https://lore.kernel.org/bpf/20210816190327.2739291-5-jiang.wang@bytedance.com
    Jiang Wang authored and anakryiko committed Aug 17, 2021
  15. selftest/bpf: Add tests for sockmap with unix stream type.

    Add two tests for unix stream to unix stream redirection
    in sockmap tests.
    
    Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Reviewed-by: Cong Wang <cong.wang@bytedance.com>
    Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Link: https://lore.kernel.org/bpf/20210816190327.2739291-4-jiang.wang@bytedance.com
    Jiang Wang authored and anakryiko committed Aug 17, 2021
  16. af_unix: Add unix_stream_proto for sockmap

    Previously, sockmap for the AF_UNIX protocol only supported the
    dgram type. This patch adds unix stream type support, which
    is similar to unix_dgram_proto. To support sockmap, dgram
    and stream can no longer share the same unix_proto, because
    they have different implementations, such as unhash for the stream
    type (which removes closed or disconnected sockets from the map);
    so rename unix_proto to unix_dgram_proto and add a new
    unix_stream_proto.
    
    Also implement the stream-related sockmap functions,
    and add the dgram keyword to the dgram-specific function names.
    
    Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Reviewed-by: Cong Wang <cong.wang@bytedance.com>
    Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Link: https://lore.kernel.org/bpf/20210816190327.2739291-3-jiang.wang@bytedance.com
    Jiang Wang authored and anakryiko committed Aug 17, 2021
  17. af_unix: Add read_sock for stream socket types

    To support sockmap for af_unix stream type, implement
    read_sock, which is similar to the read_sock for unix
    dgram sockets.
    
    Signed-off-by: Jiang Wang <jiang.wang@bytedance.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Reviewed-by: Cong Wang <cong.wang@bytedance.com>
    Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
    Acked-by: John Fastabend <john.fastabend@gmail.com>
    Link: https://lore.kernel.org/bpf/20210816190327.2739291-2-jiang.wang@bytedance.com
    Jiang Wang authored and anakryiko committed Aug 17, 2021
  18. selftests/bpf: Test btf__load_vmlinux_btf/btf__load_module_btf APIs

    Add a test for the btf__load_vmlinux_btf/btf__load_module_btf APIs. The test
    loads the bpf_testmod module BTF and checks the existence of a symbol that
    is known to exist.
    
    Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20210815081035.205879-1-hengqi.chen@gmail.com
    chenhengqi authored and anakryiko committed Aug 17, 2021
  19. bpf: Reconfigure libbpf docs to remove unversioned API

    This removes the libbpf_api.rst file from the kernel documentation.
    The intention for this file was to pull documentation from comments
    above API functions in libbpf. However, due to limitations of the
    kernel documentation system, this API documentation could not be
    versioned, which is counterintuitive to how users expect to use it.
    There are also currently no doc comments, making this a blank page.
    
    Once the kernel comment documentation is actually contributed, it
    will still exist in the kernel repository, just in the code itself.
    
    A separate site is being spun up to generate documentation from those
    comments in a way that can be versioned properly.
    
    This also reconfigures the bpf documentation index page to make it
    easier to sync with the previously mentioned documentation site.
    
    Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20210810020508.280639-1-grantseltzer@gmail.com
    grantseltzer authored and anakryiko committed Aug 17, 2021

Commits on Aug 16, 2021

  1. Merge branch 'bpf-perf-link'

    Andrii Nakryiko says:
    
    ====================
    This patch set implements an ability for users to specify custom black box u64
    value for each BPF program attachment, bpf_cookie, which is available to BPF
    program at runtime. This is a feature that's critically missing for cases when
    some sort of generic processing needs to be done by the common BPF program
    logic (or even exactly the same BPF program) across multiple BPF hooks (e.g.,
    many uniformly handled kprobes) and it's important to be able to distinguish
    between each BPF hook at runtime (e.g., for additional configuration lookup).
    
    The choice of restricting this to a fixed-size 8-byte u64 value is an explicit
    design decision. Making this configurable by users adds unnecessary complexity
    (extra memory allocations, extra complications on the verifier side to validate
    accesses to variable-sized data area) while not really opening up new
    possibilities. If a user's use case requires storing more data per
    attachment, it's possible to use either a global array or ARRAY/HASHMAP BPF
    maps, where bpf_cookie would be used as an index into the respective
    storage, populated by user-space code before creating the BPF link. This
    gives the user all the flexibility and control while keeping the BPF
    verifier and BPF helper API simple.
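    The suggested pattern of using bpf_cookie as an index into user-populated
    storage can be simulated in plain C; the table and lookup here are
    illustrative, not kernel or libbpf API:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simulated per-attachment configuration table: user space fills it in
 * before creating the links, and each link's bpf_cookie is simply the
 * row index. Names and fields are hypothetical. */
struct hook_cfg {
    const char *name;
    int sample_rate;
};

static struct hook_cfg cfg_table[] = {
    { "vfs_read",  1 },
    { "vfs_write", 10 },
};

/* What a shared BPF program would do with the value returned by
 * bpf_get_attach_cookie(): treat it as an index into the table. */
static const struct hook_cfg *lookup_cfg(uint64_t cookie)
{
    if (cookie >= sizeof(cfg_table) / sizeof(cfg_table[0]))
        return 0;
    return &cfg_table[cookie];
}
```

    In a real program cfg_table would be an ARRAY BPF map and the lookup a
    bpf_map_lookup_elem() keyed by the cookie, but the indexing idea is the same.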
    
    Currently, similar functionality can only be achieved through:
    
      - code-generation and BPF program cloning, which is very complicated and
        unmaintainable;
      - on-the-fly C code generation and further runtime compilation, which is
        what BCC uses and which makes this fairly simple. The big downside is a
        very heavy-weight Clang/LLVM dependency and inefficient memory usage
        (due to many BPF program clones and the compilation process itself);
      - in some cases (kprobes and sometimes uprobes) it's possible to do function
        IP lookup to get function-specific configuration. This doesn't work for
        all the cases (e.g., when attaching uprobes to shared libraries) and has
        higher runtime overhead and additional programming complexity due to
        BPF_MAP_TYPE_HASHMAP lookups. Up until recently, before bpf_get_func_ip()
        BPF helper was added, it was also very complicated and unstable (API-wise)
        to get traced function's IP from fentry/fexit and kretprobe.
    
    With libbpf and BPF CO-RE, runtime compilation is not an option, so to be
    able to build generic tracing tooling simply and efficiently, the ability
    to provide an additional bpf_cookie value for each *attachment* (as opposed
    to each BPF program) is extremely important. Two immediate users of this
    functionality are going to be a libbpf-based USDT library (currently in
    development) and retsnoop ([0]), but I'm sure more applications will come
    once users get this feature in their kernels.
    
    To achieve the above, all perf_event-based BPF hooks are made available
    through a new BPF_LINK_TYPE_PERF_EVENT BPF link, which allows using the
    common LINK_CREATE command for program attachments and generally brings
    perf_event-based attachments into the common BPF link infrastructure.
    
    With that, LINK_CREATE gains the ability to pass through a bpf_cookie value
    during link creation (BPF program attachment) time. A bpf_get_attach_cookie()
    BPF helper is added to allow fetching this value at runtime from the BPF
    program side.
    BPF cookie is stored either on struct perf_event itself and fetched from the
    BPF program context, or is passed through ambient BPF run context, added in
    c7603cf ("bpf: Add ambient BPF runtime context stored in current").
    
    On the libbpf side of things, BPF perf link is utilized whenever is supported
    by the kernel instead of using PERF_EVENT_IOC_SET_BPF ioctl on perf_event FD.
    All the tracing attach APIs are extended with OPTS and bpf_cookie is passed
    through corresponding opts structs.
    
    Last part of the patch set adds few self-tests utilizing new APIs.
    
    There are also a few refactorings along the way to make things cleaner and
    easier to work with, both in kernel (BPF_PROG_RUN and BPF_PROG_RUN_ARRAY), and
    throughout libbpf and selftests.
    
    Follow-up patches will extend bpf_cookie to fentry/fexit programs.
    
    While adding uprobe_opts, also extend it with ref_ctr_offset for specifying
    USDT semaphore (reference counter) offset. Update attach_probe selftests to
    validate its functionality. This is another feature (along with bpf_cookie)
    required for implementing libbpf-based USDT solution.
    
      [0] https://github.com/anakryiko/retsnoop
    
    v4->v5:
      - rebase on latest bpf-next to resolve merge conflict;
      - add ref_ctr_offset to uprobe_opts and corresponding selftest;
    v3->v4:
      - get rid of BPF_PROG_RUN macro in favor of bpf_prog_run() (Daniel);
      - move #ifdef CONFIG_BPF_SYSCALL check into bpf_set_run_ctx (Daniel);
    v2->v3:
      - user_ctx -> bpf_cookie, bpf_get_user_ctx -> bpf_get_attach_cookie (Peter);
      - fix BPF_LINK_TYPE_PERF_EVENT value (Jiri);
      - use bpf_prog_run() from bpf_prog_run_pin_on_cpu() (Yonghong);
    v1->v2:
      - fix build failures on non-x86 arches by gating on CONFIG_PERF_EVENTS.
    ====================
    
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    borkmann committed Aug 16, 2021
  2. selftests/bpf: Add ref_ctr_offset selftests

    Extend attach_probe selftests to specify ref_ctr_offset for uprobe/uretprobe
    and validate that its value is incremented from zero.
    
    It turns out that once a uprobe is attached with ref_ctr_offset, a uretprobe
    for the same location/function *has* to use ref_ctr_offset as well, otherwise
    perf_event_open() fails with -EINVAL. So this test uses ref_ctr_offset for
    both the uprobe and the uretprobe, even though for the purpose of the test
    the uprobe alone would be enough.
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/bpf/20210815070609.987780-17-andrii@kernel.org
    anakryiko authored and borkmann committed Aug 16, 2021
  3. libbpf: Add uprobe ref counter offset support for USDT semaphores

    When attaching to uprobes through the perf subsystem, it's possible to
    specify the offset of a so-called USDT semaphore, which is just a
    reference-counted u16 used by the kernel to keep track of how many tracers
    are attached to a given location. Support for this feature was added in [0],
    so just wire it through uprobe_opts. This is important for implementing USDT
    attachment and tracing through libbpf's bpf_program__attach_uprobe_opts() API.
    
      [0] a6ca88b ("trace_uprobe: support reference counter in fd-based uprobe")
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/bpf/20210815070609.987780-16-andrii@kernel.org
    anakryiko authored and borkmann committed Aug 16, 2021
  4. selftests/bpf: Add bpf_cookie selftests for high-level APIs

    Add selftest with few subtests testing proper bpf_cookie usage.
    
    Kprobe and uprobe subtests are pretty straightforward and just validate that
    the same BPF program attached with different bpf_cookie will be triggered with
    those different bpf_cookie values.
    
    The tracepoint subtest is a bit more interesting, as it is the only
    perf_event-based BPF hook that shares a bpf_prog_array between multiple
    perf_events internally. This means that the same BPF program can't be
    attached to the same tracepoint multiple times, so we have 3 identical
    copies. This arrangement allows testing bpf_prog_array_copy()'s handling of
    bpf_prog_array list manipulation logic when programs are attached and
    detached. The test validates that bpf_cookie isn't mixed up and isn't lost
    during such list manipulations.
    
    Perf_event subtest validates that two BPF links can be created against the
    same perf_event (but not at the same time, only one BPF program can be
    attached to perf_event itself), and that for each we can specify different
    bpf_cookie value.
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/bpf/20210815070609.987780-15-andrii@kernel.org
    anakryiko authored and borkmann committed Aug 16, 2021
  5. selftests/bpf: Extract uprobe-related helpers into trace_helpers.{c,h}

    Extract two helpers used for working with uprobes into trace_helpers.{c,h} to
    be re-used between multiple uprobe-using selftests. Also rename get_offset()
    into more appropriate get_uprobe_offset().
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/bpf/20210815070609.987780-14-andrii@kernel.org
    anakryiko authored and borkmann committed Aug 16, 2021
  6. selftests/bpf: Test low-level perf BPF link API

    Add tests utilizing low-level bpf_link_create() API to create perf BPF link.
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/bpf/20210815070609.987780-13-andrii@kernel.org
    anakryiko authored and borkmann committed Aug 16, 2021
  7. libbpf: Add bpf_cookie to perf_event, kprobe, uprobe, and tp attach APIs

    Wire through bpf_cookie for all attach APIs that use perf_event_open under the
    hood:
      - for kprobes, extend existing bpf_kprobe_opts with bpf_cookie field;
      - for perf_event, uprobe, and tracepoint APIs, add their _opts variants and
        pass bpf_cookie through opts.
    
    For kernels that don't support BPF_LINK_CREATE for perf_events, and thus
    don't support bpf_cookie either, return an error and log a warning for the user.
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/bpf/20210815070609.987780-12-andrii@kernel.org
    anakryiko authored and borkmann committed Aug 16, 2021
  8. libbpf: Add bpf_cookie support to bpf_link_create() API

    Add ability to specify bpf_cookie value when creating BPF perf link with
    bpf_link_create() low-level API.
    
    Given that the BPF_LINK_CREATE command is growing and keeps getting new
    fields that are specific to the type of BPF link, extend the libbpf side of
    the bpf_link_create() API and the corresponding OPTS struct to accommodate
    such changes. Add extra checks to prevent using incompatible/unexpected
    combinations of fields.
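    The general extensible-OPTS idea (the caller records the struct size so the
    library can detect which fields are present, letting new fields be appended
    without breaking old callers) can be sketched as follows; names are
    illustrative, not libbpf's actual definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opts struct: sz is set by the caller to sizeof() the struct
 * they were compiled against; fields are only appended, never reordered. */
struct link_create_opts {
    size_t sz;              /* size of this struct, set by the caller */
    uint64_t bpf_cookie;    /* a newly added field */
};

/* True if the caller's struct is large enough to contain `field`, i.e. the
 * caller was compiled against a version that knows about it. */
#define OPTS_HAS(opts, field) \
    ((opts) && (opts)->sz >= offsetof(struct link_create_opts, field) + \
               sizeof((opts)->field))
```

    A library built this way can safely read bpf_cookie only when
    OPTS_HAS(opts, bpf_cookie) holds, which is how old binaries keep working
    against a newer library.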
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/bpf/20210815070609.987780-11-andrii@kernel.org
    anakryiko authored and borkmann committed Aug 16, 2021
  9. libbpf: Use BPF perf link when supported by kernel

    Detect kernel support for BPF perf link and prefer it when attaching to
    perf_event, tracepoint, kprobe/uprobe. Underlying perf_event FD will be kept
    open until BPF link is destroyed, at which point both perf_event FD and BPF
    link FD will be closed.
    
    This preserves the current behavior in which the perf_event FD is open for
    the duration of the bpf_link's lifetime and the user is able to "disconnect"
    the bpf_link from the underlying FD (with bpf_link__disconnect()), so that
    bpf_link__destroy() doesn't close the underlying perf_event FD. When a BPF
    perf link is used, disconnect will keep both the perf_event and bpf_link FDs
    open, so it will be up to the (advanced) user to close them. This approach
    is demonstrated in the bpf_cookie.c selftests, added in this patch set.
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/bpf/20210815070609.987780-10-andrii@kernel.org
    anakryiko authored and borkmann committed Aug 16, 2021
  10. libbpf: Remove unused bpf_link's destroy operation, but add dealloc

    bpf_link->destroy() isn't used by any code, so remove it. Instead, add ability
    to override deallocation procedure, with default doing plain free(link). This
    is necessary for cases when we want to "subclass" struct bpf_link to keep
    extra information, as is the case in the next patch adding struct
    bpf_link_perf.
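    The "subclass via an overridable dealloc" pattern described above can be
    sketched as follows; struct and field names are illustrative, not libbpf's
    actual internals:

```c
#include <assert.h>
#include <stdlib.h>

/* Base "class": a NULL dealloc means the default plain free(link). */
struct link {
    void (*dealloc)(struct link *l);
    int fd;
};

/* "Subclass": base must be the first member so a struct link * pointing at
 * a link_perf can be cast back to the full object. */
struct link_perf {
    struct link base;
    int perf_event_fd;                 /* extra subclass state */
};

static int perf_deallocs;              /* test hook: counts subclass frees */

static void link_perf_dealloc(struct link *l)
{
    struct link_perf *lp = (struct link_perf *)l;
    perf_deallocs++;                   /* subclass-specific cleanup goes here */
    free(lp);
}

static void link_destroy(struct link *l)
{
    if (l->dealloc)
        l->dealloc(l);                 /* overridden deallocation */
    else
        free(l);                       /* default: plain free(link) */
}
```

    The common destroy path stays generic, while a perf link can free its larger
    allocation and clean up extra state, which is the motivation given above.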
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Link: https://lore.kernel.org/bpf/20210815070609.987780-9-andrii@kernel.org
    anakryiko authored and borkmann committed Aug 16, 2021
  11. libbpf: Re-build libbpf.so when libbpf.map changes

    Ensure libbpf.so is re-built whenever libbpf.map is modified. Without this,
    changes to libbpf.map are not detected and a versioned symbols mismatch
    error will be reported until `make clean && make` is used, which is a
    suboptimal developer experience.
    
    Fixes: 306b267 ("libbpf: Verify versioned symbols")
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/bpf/20210815070609.987780-8-andrii@kernel.org
    anakryiko authored and borkmann committed Aug 16, 2021
  12. bpf: Add bpf_get_attach_cookie() BPF helper to access bpf_cookie value

    Add new BPF helper, bpf_get_attach_cookie(), which can be used by BPF programs
    to get access to a user-provided bpf_cookie value, specified during BPF
    program attachment (BPF link creation) time.
    
    Naming is hard, though. With the concept being named "BPF cookie", I've
    considered calling the helper:
      - bpf_get_cookie() -- seems too unspecific and easily mistaken with socket
        cookie;
      - bpf_get_bpf_cookie() -- too much tautology;
      - bpf_get_link_cookie() -- would be ok, but while we create a BPF link to
        attach a BPF program to a BPF hook, it's still an "attachment", and the
        bpf_cookie is associated with the BPF program's attachment to a hook,
        not with a BPF link itself. Technically, we could support bpf_cookie
        with old-style cgroup programs. So I ultimately rejected it in favor of
        bpf_get_attach_cookie().
    
    Currently all perf_event-backed BPF program types support
    bpf_get_attach_cookie() helper. Follow-up patches will add support for
    fentry/fexit programs as well.
    
    While at it, mark bpf_tracing_func_proto() as static to make it obvious that
    it's only used within kernel/trace/bpf_trace.c.
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/bpf/20210815070609.987780-7-andrii@kernel.org
    anakryiko authored and borkmann committed Aug 16, 2021
  13. bpf: Allow to specify user-provided bpf_cookie for BPF perf links

    Add ability for users to specify custom u64 value (bpf_cookie) when creating
    BPF link for perf_event-backed BPF programs (kprobe/uprobe, perf_event,
    tracepoints).
    
    This is useful when the same BPF program is used to attach to and process
    invocations of different tracepoints/kprobes/uprobes in a generic fashion,
    while still distinguishing each invocation from the others (e.g., the BPF
    program can look up additional information associated with a specific
    kernel function without having to rely on function IP lookups). This
    enables new use cases to be implemented simply and efficiently, where
    previously they were possible only through code generation (and thus
    multiple instances of almost identical BPF programs) or compilation at
    runtime (BCC-style) on target hosts (even more expensive resource-wise).
    For uprobes it is sometimes not even possible to know the function IP
    beforehand (e.g., when attaching to a shared library without PID filtering,
    in which case the base load address of the library is not known).
    
    This is done by storing the u64 bpf_cookie in struct bpf_prog_array_item,
    corresponding to each attached and run BPF program. Given that cgroup BPF
    programs already use two 8-byte pointers there for their needs and don't
    (yet?) support bpf_cookie, reuse that space through a union of
    cgroup_storage and the new bpf_cookie field.
    
    Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx.
    This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF
    program execution code, which luckily is now also split from
    BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper
    giving access to this user-provided cookie value from inside a BPF program.
    Generic perf_event BPF programs will access this value from the perf_event
    itself through the passed-in BPF program context.
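    A simplified sketch of the space reuse described here; the field names and
    array size are condensed from the kernel's struct bpf_prog_array_item, so
    treat the exact layout as illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* One entry per attached-and-run BPF program. The bpf_cookie shares the
 * slot previously used only by cgroup storage pointers: cgroup programs
 * use cgroup_storage, perf_event-backed programs use bpf_cookie, and the
 * two never need the space at the same time. */
struct prog_array_item {
    void *prog;
    union {
        void *cgroup_storage[2];   /* used by cgroup BPF programs */
        uint64_t bpf_cookie;       /* used by perf_event-backed programs */
    };
};
```

    Because the union overlays the fields, adding bpf_cookie does not grow the
    per-item footprint, which is the point of reusing that space.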
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Yonghong Song <yhs@fb.com>
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lore.kernel.org/bpf/20210815070609.987780-6-andrii@kernel.org
    anakryiko authored and borkmann committed Aug 16, 2021
  14. bpf: Implement minimal BPF perf link

    Introduce a new type of BPF link - the BPF perf link. This brings
    perf_event-based BPF program attachments (perf_event, tracepoints, kprobes,
    and uprobes) into the common BPF link infrastructure, allowing users to list
    all active perf_event-based attachments, auto-detach the BPF program from
    the perf_event when the link's FD is closed, and get generic BPF link
    fdinfo/get_info functionality.
    
    BPF_LINK_CREATE command expects perf_event's FD as target_fd. No extra flags
    are currently supported.
    
    Force-detaching and atomic BPF program updates are not yet implemented, but
    with perf_event-based BPF links we now have a common framework for this
    without needing to extend the ioctl()-based perf_event interface.
    
    One interesting consideration is a new value for bpf_attach_type, which
    BPF_LINK_CREATE command expects. Generally, it's either 1-to-1 mapping from
    bpf_attach_type to bpf_prog_type, or many-to-1 mapping from a subset of
    bpf_attach_types to one bpf_prog_type (e.g., see BPF_PROG_TYPE_SK_SKB or
    BPF_PROG_TYPE_CGROUP_SOCK). In this case, though, we have three different
    program types (KPROBE, TRACEPOINT, PERF_EVENT) using the same perf_event-based
    mechanism, so it's many bpf_prog_types to one bpf_attach_type. I chose to
    define a single BPF_PERF_EVENT attach type for all of them and adjust
    link_create()'s logic for checking correspondence between attach type and
    program type.
    
    The alternative would be to define three new attach types (e.g., BPF_KPROBE,
    BPF_TRACEPOINT, and BPF_PERF_EVENT), but that seemed like unnecessary
    overkill: it would proliferate bpf_attach_type enum values, and BPF_KPROBE
    would cause a naming conflict with the BPF_KPROBE() macro defined by libbpf.
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Yonghong Song <yhs@fb.com>
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Link: https://lore.kernel.org/bpf/20210815070609.987780-5-andrii@kernel.org
    anakryiko authored and borkmann committed Aug 16, 2021
  15. bpf: Refactor perf_event_set_bpf_prog() to use struct bpf_prog input

    Make the internal perf_event_set_bpf_prog() take a struct bpf_prog pointer
    as its input argument, which makes it easier to re-use for other internal
    purposes (coming up for BPF link in the next patch). A BPF program FD is not
    as convenient, and in some cases it's not available at all. So switch to
    struct bpf_prog, move the refcounting out, and let the caller do
    bpf_prog_put() in case of an error. This follows the approach of most other
    internal BPF functions.
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Acked-by: Yonghong Song <yhs@fb.com>
    Link: https://lore.kernel.org/bpf/20210815070609.987780-4-andrii@kernel.org
    anakryiko authored and borkmann committed Aug 16, 2021