Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DLPX-91640 bpftrace: Automatic merge failed #35

Merged

Conversation

manoj-joseph
Copy link
Contributor

@manoj-joseph manoj-joseph commented Jul 9, 2024

Background

DLPX-91538 a.k.a PR #30 was a bad merge. I reverted it with #34.

Problem

Git still thinks that the upstream changes pulled in by PR 30 are still in. So, I had to redo the merge locally and pulled the changes back in with this commit.

commit 6ac072a2769baeb3cef3d53e0992c5f749cd057a
Author: Manoj Joseph <manoj.joseph@delphix.com>
Date:   Mon Jul 15 12:49:46 2024 -0700

    Redo DLPX-91538

Once that was done, the remaining upstream changes merged without conflict.

mjoseph@Manoj-Josephs-MacBook-Pro bpftrace % git checkout merge-3   
Switched to branch 'merge-3'
Your branch is ahead of 'origin/develop' by 1 commit.
  (use "git push" to publish your local commits)
mjoseph@Manoj-Josephs-MacBook-Pro bpftrace % git merge --no-edit --no-stat upstreams-develop
Auto-merging .github/workflows/ci.yml
Auto-merging CMakeLists.txt
Auto-merging flake.nix
Auto-merging src/CMakeLists.txt
Auto-merging src/ast/CMakeLists.txt
Auto-merging src/btf.h
Auto-merging src/output.cpp
Merge made by the 'ort' strategy.
mjoseph@Manoj-Josephs-MacBook-Pro bpftrace % 

Solution

mjoseph@Manoj-Josephs-MacBook-Pro bpftrace % git lg -30    
*   3dc3760a - (HEAD -> merge-3, origin/dlpx/pr/manoj-joseph/68dda4ce-4238-4683-951a-0c0d7cdf9023) DLPX-91640 bpftrace: Automatic merge failed (8 minutes ago) <Manoj Joseph>
|\  
| * 36c4b72c - (origin/upstreams/develop, upstreams-develop) Add global vars which can be set from userspace (2 days ago) <Jordan Rome>
| * 6cc27984 - Codegen Tests: Remove unnecessary uses of count() function (3 days ago) <Alastair Robertson>
| * 0c72e1be - Tests: disable verbose logging after log tests (3 days ago) <Viktor Malik>
| * 32ca92eb - For loops: Allow sharing variables with main program (6 days ago) <Alastair Robertson>
| * fe9c5b46 - Cleanups: Use "const" and reduce string copies (8 days ago) <Alastair Robertson>
| * af32e6bf - Fix unaligned map key access (8 days ago) <Alastair Robertson>
| * c8b4cd27 - Enable for-loops in multiple probes (10 days ago) <Viktor Malik>
| * f5792f73 - Codegen: use string constants for BPF fmt strings (10 days ago) <Viktor Malik>
| * 66eac00c - Load BPF programs and maps via libbpf bpf_object (10 days ago) <Viktor Malik>
| * ca1b216b - codegen: Fix segfault of CodegenLLVM::generateProbe() (13 days ago) <Rong Tao>
| * 7b9b4e93 - Standardize async id creation and setting (2 weeks ago) <Jordan Rome>
| * 8b0a62a7 - CHANGELOG: Move `buf` entry to the correct version (i.e. unreleased) (2 weeks ago) <Alastair Robertson>
| * 95e322a2 - docs: remove mention of unsupported character literals (2 weeks ago) <Marco Iorio>
| * 87616e09 - Fix racy per-cpu runtime test (2 weeks ago) <Jordan Rome>
| * b3997d15 - Add per-cpu aggregations for avg (2 weeks ago) <Jordan Rome>
| * fb1ecb3c - Add --dry-run CLI option (3 weeks ago) <Viktor Malik>
| * b92cb694 - Refactor verbose and debug outputs (3 weeks ago) <Viktor Malik>
| * e1c1f426 - CI: use new version of tcpdrop (3 weeks ago) <Viktor Malik>
* | 6ac072a2 - (merge-2) Redo DLPX-91538 (20 minutes ago) <Manoj Joseph>
* |   c4edc4ac - (origin/release, origin/develop, origin/HEAD, develop) Merge pull request #34 from delphix/dlpx/pr/manoj-joseph/a54da1fd-c183-4f6f-a7fb-8a236a94dd7d (6 days ago) <Manoj Joseph>
|\ \  
| * | 00815a21 - (origin/dlpx/pr/manoj-joseph/a54da1fd-c183-4f6f-a7fb-8a236a94dd7d) DLPX-91603 Revert "DLPX-91538 Upstream merge conflict in bpftrace" (6 days ago) <Manoj Joseph>
|/ /  
* |   b615f00e - Merge pull request #30 from delphix/dlpx/pr/manoj-joseph/b2d64e98-0d65-474d-8532-84c91330818a (2 weeks ago) <Manoj Joseph>
|\ \  
| * | 52c9c328 - (origin/dlpx/pr/manoj-joseph/b2d64e98-0d65-474d-8532-84c91330818a) DLPX-91538 Upstream merge conflict in bpftrace (2 weeks ago) <Manoj Joseph>
|/| | 
| |/  
| * 4e4f7ac2 - (upstreams-develop-old) codegen: iter: Allocate an array rather than a buffer (3 weeks ago) <Daniel Xu>
| * e4b4ebb7 - types: buf: Fix buffer size calculation (3 weeks ago) <Daniel Xu>
| * e357b493 - Load all BPF programs at once (3 weeks ago) <Viktor Malik>
| * 0c3a9492 - Move BPF program loading to BpfProgram (3 weeks ago) <Viktor Malik>
| * 19bae030 - Create BpfPrograms from struct bpf_program (3 weeks ago) <Viktor Malik>
| * 946ddeb4 - Cleanup: reduce warnings from partially-initialised LIBBPF_OPTS (3 weeks ago) <Holger Hoffstätte>
mjoseph@Manoj-Josephs-MacBook-Pro bpftrace % 

Testing Done

http://ops.jenkins.delphix.com/job/linux-pkg/job/develop/job/build-package/job/bpftrace/job/pre-push/204/console SUCCESS

viktormalik and others added 15 commits June 28, 2024 07:04
It seems that the CI machines were updated to a kernel which has the
tcp_drop function inlined. Therefore, we need to use the new version of
tcpdrop.bt.
Consolidate and cleanup information that is being printed in the
verbose (-v) and debugging output (-d). The main idea is:
- the verbose output is intended for users to get more information on
  what bpftrace is doing and where it is (possibly) failing,
- the debug output is intended for developers to help them debugging
  bpftrace by providing very detailed outputs of individual stages of
  bon pftrace runtime.

The main (user-facing) change this brings is that it introduces a
mandatory argument to the -d option which allows to pick the stage whose
output should be printed.

The currently supported stages are:
- 'ast'         - prints the AST after each pass
- 'codegen'     - prints the LLVM IR code as emit by CodegenLLVM
- 'codegen-opt' - prints the LLVM IR code after it is optimized by LLVM
                  (i.e. what is actually compiled to BPF bytecode)
- 'libbpf'      - captures and prints libbpf log for all libbpf
                  operations that we use
- 'verifier'    - captures and prints the BPF verifier log

On top of that, -d can be used multiple times with different arguments
and the argument 'all' activates all of the above.

Another change is that -d no longer executes a dry run. The reason is
that we're printing information from later stages so we need to let
bpftrace run all the time. A new option for dry run will be added in the
following patch.

In addition, there are more user-facing changes:
- remove the -dd option,
- allow simultaneous use of -v and -d,
- add --debug as a long version of -d.

Also, some minor refactorings were done:
- make sure the verbosity output always goes to stderr while the
  debugging output always goes to stdout,
- improve some formatting of verbose outputs.
The option terminates bpftrace right after all the probes are attached.
This can be useful to test that the script can be parsed, loaded, and
attached, without actually executing it.

We use this in the tools parsing test since the -d option no longer does
a dry run.
Similar to what was done for count and sum
allow avg to be used in expressions.

This is the last map type that will be
supported for this functionality.

Issue: bpftrace/bpftrace#3126
Issue: bpftrace/bpftrace#3216
"print_per_cpu_map_vals" is a racy test due to the use of
a very active probe and per-cpu maps. Let's just use 'BEGIN'
to test printing even though this doesn't really exercise
the cpu aggregation path as 'BEGIN' only runs on one CPU.
Character literals appear to not be supported at this time. Hence,
let's clarify this aspect in the documentation.

Fixes: fd0461d ("docs: entries for new int syntax")
Fixes: #3278
- Remove entry for #3281, as it's a fix for a bug that hasn't made it to
  a release
- Quote a bunch of things with backticks to make the markdown render
  more nicely
This will make sure we don't forget to reset async ids
in 'create_reset_ids' and also makes sure we're incrementing
the id whenever we get it (instead of doing it at some
later, random instruction).

This is follow up to a conversation here:
bpftrace/bpftrace#3249 (comment)
If it is dummy, then the visit() operation should not be executed. For
example, on aarch64, there is no syscalls:sys_{enter,exit}_open
tracepoint, however, opensnoop.bt tool will try to attch them and visit
args's field. We expect to get a warning, but instead we get a segmentation
fault. As follows:

    $ sudo gdb bpftrace
    ...
    (gdb) set args opensnoop.bt
    (gdb) r
    opensnoop.bt:22-24: WARNING: tracepoint not found: syscalls:sys_enter_open
    opensnoop.bt:28-30: WARNING: tracepoint not found: syscalls:sys_exit_open

    Thread 1 "bpftrace" received signal SIGSEGV, Segmentation fault.
    0x000000000052bbd8 in __gnu_cxx::__normal_iterator<bpftrace::Field const*, std::vector<bpftrace::Field, std::allocator<bpftrace::Field> > >::__normal_iterator (this=0xffffffffb0b8, __i=<error reading variable: Cannot access memory at address 0x10>)
        at /usr/include/c++/14/bits/stl_iterator.h:1068
    1068              : _M_current(__i) { }
    (gdb) bt
    #0  0x000000000052bbd8 in __gnu_cxx::__normal_iterator<bpftrace::Field const*, std::vector<bpftrace::Field, std::allocator<bpftrace::Field> > >::__normal_iterator (this=0xffffffffb0b8,
        __i=<error reading variable: Cannot access memory at address 0x10>) at /usr/include/c++/14/bits/stl_iterator.h:1068
    #1  0x0000000000525bdc in std::vector<bpftrace::Field, std::allocator<bpftrace::Field> >::begin (this=0x10)
        at /usr/include/c++/14/bits/stl_vector.h:884
    #2  0x000000000059ce8c in bpftrace::Struct::GetField (this=0x0, name="filename")
        at /home/rongtao/Git/bpftrace/bpftrace/src/struct.cpp:131
    #3  0x00000000005a70a0 in bpftrace::SizedType::GetField (this=0xe9b998, name="filename")
        at /home/rongtao/Git/bpftrace/bpftrace/src/types.cpp:538
    #4  0x00000000007c7178 in bpftrace::ast::CodegenLLVM::visit (this=0xffffffffc650, acc=...)
        at /home/rongtao/Git/bpftrace/bpftrace/src/ast/passes/codegen_llvm.cpp:1973
    #5  0x0000000000847db0 in bpftrace::ast::FieldAccess::accept (this=0xe9be30, v=...)
        at /home/rongtao/Git/bpftrace/bpftrace/src/ast/ast.cpp:31
    #6  0x00000000007d2404 in bpftrace::ast::CodegenLLVM::accept (this=0xffffffffc650, node=0xe9be30)
        at /home/rongtao/Git/bpftrace/bpftrace/src/ast/passes/codegen_llvm.cpp:3778
    #7  0x00000000007c8bec in bpftrace::ast::CodegenLLVM::visit (this=0xffffffffc650, assignment=...)
        at /home/rongtao/Git/bpftrace/bpftrace/src/ast/passes/codegen_llvm.cpp:2218
    #8  0x0000000000847eb4 in bpftrace::ast::AssignMapStatement::accept (this=0xffffe4429250, v=...)
        at /home/rongtao/Git/bpftrace/bpftrace/src/ast/ast.cpp:36
    #9  0x00000000007d2404 in bpftrace::ast::CodegenLLVM::accept (this=0xffffffffc650, node=0xffffe4429250)
        at /home/rongtao/Git/bpftrace/bpftrace/src/ast/passes/codegen_llvm.cpp:3778
    #10 0x00000000007ca4c4 in bpftrace::ast::CodegenLLVM::generateProbe (this=0xffffffffc650, probe=..., full_func_id="dummy",
        name="dummy", func_type=0xff2b50, usdt_location_index=std::optional [no contained value], dummy=true)
        at /home/rongtao/Git/bpftrace/bpftrace/src/ast/passes/codegen_llvm.cpp:2539
    #11 0x00000000007cb4e4 in bpftrace::ast::CodegenLLVM::visit (this=0xffffffffc650, probe=...)
        at /home/rongtao/Git/bpftrace/bpftrace/src/ast/passes/codegen_llvm.cpp:2734
    #12 0x00000000008480f0 in bpftrace::ast::Probe::accept (this=0xffffe4427650, v=...)
        at /home/rongtao/Git/bpftrace/bpftrace/src/ast/ast.cpp:47
    #13 0x00000000007d2404 in bpftrace::ast::CodegenLLVM::accept (this=0xffffffffc650, node=0xffffe4427650)
        at /home/rongtao/Git/bpftrace/bpftrace/src/ast/passes/codegen_llvm.cpp:3778
    #14 0x00000000007cb7a4 in bpftrace::ast::CodegenLLVM::visit (this=0xffffffffc650, program=...)
        at /home/rongtao/Git/bpftrace/bpftrace/src/ast/passes/codegen_llvm.cpp:2752
    #15 0x000000000084818c in bpftrace::ast::Program::accept (this=0xe90f50, v=...)
        at /home/rongtao/Git/bpftrace/bpftrace/src/ast/ast.cpp:50
    #16 0x00000000007d2404 in bpftrace::ast::CodegenLLVM::accept (this=0xffffffffc650, node=0xe90f50)
        at /home/rongtao/Git/bpftrace/bpftrace/src/ast/passes/codegen_llvm.cpp:3778
    #17 0x00000000007d03fc in bpftrace::ast::CodegenLLVM::generate_ir (this=0xffffffffc650)
        at /home/rongtao/Git/bpftrace/bpftrace/src/ast/passes/codegen_llvm.cpp:3450
    #18 0x0000000000478138 in main (argc=2, argv=0xfffffffff4c8) at /home/rongtao/Git/bpftrace/bpftrace/src/main.cpp:898

We can simplify opensnoop.bt to:

    tracepoint:syscalls:sys_enter_open_not_exist,
    tracepoint:syscalls:sys_enter_openat
    {
        @ = args.filename;
    }

This will produce the following error:

    stdin:1:1-45: WARNING: tracepoint not found: syscalls:sys_enter_open_not_exist
    tracepoint:syscalls:sys_enter_open_not_exist,tracepoint:syscalls:sys_enter_openat {@ = args.filename;}
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Segmentation fault

We need to generate the dummy probe only for cases when none of the probe
attach points exists.

Link: bpftrace/bpftrace#3274
Signed-off-by: Viktor Malik <viktor.malik@gmail.com>
Signed-off-by: Rong Tao <rongtao@cestc.cn>
Instead of using bpf_prog_load to load BPF programs and bpf_map_create
to create BPF maps, use bpf_object__load to load everything at once,
just as libbpf does. This leverages the fact that codegen now produces
an ELF compliant with libbpf.

Required properties of BPF programs (namely program type, expected
attach type, and attach target) still need to be fixed manually (as
we're not using libbpf-compatible section names), this is now done in
BpfBytecode.

Verifier log is now kept per-program as the verifier overrides it for
every program load. This also means that when the user sets
config.log_size to "size", the actual amount of memory allocated is
(size * number_of_probes).

Thanks to this change, a lot of code could have been dropped:
- the entire ELF parsing and BpfBytecode::sections_ (ELF is now directly
  passed to libbpf and all data is accessed via operations over
  bpf_object and bpf_program),
- custom relocations of BPF programs (now done by libbpf),
- custom fixups of BTF data (now done by libbpf).

The change also brought some problems:
- kfunc probes require libbpf commit dd589c3b31c1 ("libbpf: support
  "module: Function" syntax for tracing programs") [1]. Since this is
  not in a released version of libbpf, yet, let's pin Nix to the
  specific commit and leave a TODO in CMakeLists.txt to require the
  libbpf version which contains this support (probably 1.5.0).
- It broke two builtin calls: debugf and printf for iter probes. For
  now, disable the corresponding tests and fix them in the following
  commit.
- Since we now call one libbpf function to do a lot of things (load all
  programs, create all maps, load BTF info, etc.), we have less options
  to figure out what went wrong in case of a failure. Unfortunately,
  libbpf doesn't set program/map fds to error codes so it is hard to
  locate the program which actually failed to load. What we do here is
  we assume that a program failed to load if the verifier log is
  non-empty. If all verifier logs are empty, we advice user to check
  libbpf debugging output to find out more.

[1] libbpf/libbpf@dd589c3
BPF helpers that use fmt strings (bpf_trace_printk, bpf_seq_printf)
expect the string passed in a data map. Currently, we create that map
manually, however libbpf is able to create it internally if an internal
global constant string is used in its place.

Creating such a constant string allows us to get rid of a good amount of
code for maintaining the data map and we now only store the format
strings themselves in RequiredResources.

Re-enable tests for builtins using such helpers (debugf, printf in iter)
which were disabled in the previous commit.
For-loops in multiple probes are now supported without problems b/c
libbpf takes care of program relocations.

This reverts commit cdf9e9d and
re-enables a disabled runtime test for the feature.
If we have a map with key type [string[9], int64], then the int64 should
be treated as an unaligned access.

Previously the alignment calculations worked off the length of the
string value, not the maximum size allocated for the string key.

Example:
  @mapa["ccccdddd", 0] = 1;
  @mapa["aaaabbb", 0] = 1;

The string key's size is taken from the maximum string length, which is
9 bytes. Fields after this key should be considered unaligned.

The second line contains a string of 8 bytes. Padding was not taken
into account and the following key(s) were incorrectly treated as
aligned.
1. Determine which variables need to be shared with the loop callback
2. Pack pointers to them into a context struct
3. Pass pointer to the context struct to the callback function
4. In the callback, override the shared variables so that they read and
   write through the context pointers instead of directly from their
   original addresses

See the comment in semantic_analyser.cpp for pseudo code of this
transformation.
@manoj-joseph manoj-joseph force-pushed the dlpx/pr/manoj-joseph/68dda4ce-4238-4683-951a-0c0d7cdf9023 branch from 5425bc3 to 408d238 Compare July 9, 2024 22:45
viktormalik and others added 3 commits July 12, 2024 15:12
LogStream.basic test runs ENABLE_LOG(V1) to test the verbose logging
output but forgets to run DISABLE_LOG(V1) afterwards. This causes all
subsequent tests (e.g. when running all unit tests) to print verbose
error messages which pollute the output.

Add the missing line.
Simplify the test cases such that each is only testing the minimal
functionality that it needs to. Minimises noise from changing expected
results in unrelated patches.
This adds the ability to create const volatile global
variables in bpf and allow them to be updated in userspace
after program opening but before program loading.

This is useful for things like AOT, where we don't know
know the number of CPUs on the machine where the bpftrace
program will run at the time of codegen.
@manoj-joseph manoj-joseph force-pushed the dlpx/pr/manoj-joseph/68dda4ce-4238-4683-951a-0c0d7cdf9023 branch 2 times, most recently from 93e9887 to fe27d92 Compare July 15, 2024 19:40
@manoj-joseph manoj-joseph force-pushed the dlpx/pr/manoj-joseph/68dda4ce-4238-4683-951a-0c0d7cdf9023 branch 2 times, most recently from 98d0d9f to 889aa9d Compare July 15, 2024 19:55
@manoj-joseph manoj-joseph changed the title Merge branch 'upstreams-develop' into merge DLPX-91640 bpftrace: Automatic merge failed Jul 15, 2024
@manoj-joseph manoj-joseph force-pushed the dlpx/pr/manoj-joseph/68dda4ce-4238-4683-951a-0c0d7cdf9023 branch from 889aa9d to 3dc3760 Compare July 15, 2024 20:03
@manoj-joseph manoj-joseph marked this pull request as ready for review July 15, 2024 20:31
@manoj-joseph manoj-joseph merged commit 430423d into develop Jul 16, 2024
29 of 37 checks passed
@manoj-joseph manoj-joseph deleted the dlpx/pr/manoj-joseph/68dda4ce-4238-4683-951a-0c0d7cdf9023 branch July 16, 2024 04:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

8 participants