Allow working with all probe params (kfunc, uprobe) #2477

viktormalik · 2023-01-19T09:34:55Z

Allows to use *args in multiple constructions such as:

printing (resolves get all arguments values of a function as a string in uprobe and kprobe. something like $$parms in systemtap do. #2436)

# bpftrace -e 'kfunc:vfs_read { print(*args); exit(); }'
Attaching 1 probe...
{ .file = 0xffff9fbe928a7e00, .buf = 0x7ffddb859ee0, .count = 16, .pos = 0xffffb203068afee0 }

storing in map

# bpftrace -e 'kfunc:vfs_read { @ = *args; exit(); }'
Attaching 1 probe...

@: { .file = 0xffff9fbe928a7e00, .buf = 0x7ffddb859ee0, .count = 16, .pos = 0xffffb203068afe88 }

map keys

# bpftrace -e 'kfunc:vfs_read { @[*args] = count(); } i:ms:100 { exit(); }'
Attaching 2 probes...

[...]
@[{.file=0xffff9fbe8164df00,.buf=0x3434008bd400,.count=1024,.pos=0xffffb2030f24ff00}]: 1
@[{.file=0xffff9fbe4496e400,.buf=0x7ffe84009f10,.count=64,.pos=0xffffb20300757e60}]: 2
@[{.file=0xffff9fbf0c666100,.buf=0x7fff1d7d8577,.count=1,.pos=0x0}]: 2

The main change is that probe args are now represented using a special record with struct <probename>_args type and hence can be treated as any other record type. Construction of the type is different for kfuncs and uprobes:

kfunc: the ctx pointer points to the struct of args which has exactly the same layout as our representation, so the implementation is straightforward.
uprobes: args are stored in registers, so when *args occurs in the code, the args record is constructed on BPF stack and filled from registers. Note: this is done every time, even when only args->x is used (and it would be sufficient to read one register). The reason for this is that creating entire *args elegantly fits the current implementation and since there are at most 6 args supported, this should be quite a cheap operation.

This required some minor changes, especially to codegen tests, see commit messages for more details.

Checklist

Language changes are updated in man/adoc/bpftrace.adoc and if needed in docs/reference_guide.md
User-visible and non-trivial changes updated in CHANGELOG.md
The new behaviour is covered by tests

viktormalik · 2023-01-19T09:38:00Z

ab7681e will go away when #2474 is resolved (do not merge before that).

fbs · 2023-01-22T11:37:40Z

looks good!

viktormalik · 2023-01-25T09:12:07Z

Rebased on master, ready for merge.

viktormalik · 2023-03-27T09:32:29Z

@fbs you didn't put an explicit approval - can we merge this?

@danobi, @ajor want to have a look before that?

Thanks!

ajor

First of all I just want to say that I really like the idea behind this change - the examples look great!

Some of the commits in this PR are simple and unobjectionable and could be merged independently - that would make the core change here a lot less intimidating to go through :)

Language Design

I know that in the current implementation, args is treated as a pointer, but does it need to be?

I don't remember, but I'm assuming I made it like that just because it was easiest - just pass the context pointer directly around and call it "args".

Ideally the bpftrace language won't become full of cruft because of lazy decisions made in the past. It seems to me that a pointer doesn't actually make sense, since we don't allow using it without first dereferencing.

What do you think about allowing use of args without dereferencing? e.g.:

kfunc:vfs_read { @a = args }
kfunc:vfs_read { @b = args.file }

The old syntax of "args->file" could be kept for backwards compatibility (at least for a while).

Bugs

I came across a couple of bugs when playing around with this new builtin:

Error dereferencing args in tracepoint

# ./build/src/bpftrace -e 'tracepoint:syscalls:sys_enter_read { @ = *args }'
Attaching 1 probe...                                                                                      
ERROR: failed to create map: '@': Invalid argument                                                        
Creation of the required BPF maps has failed.                                                             
Make sure you have all the required permissions and are not confined (e.g. like                           
snapcraft does). `dmesg` will likely have useful output for further troubleshooting

Segfault dereferencing args when function is not defined

void foo(int a, int b);
int main() { }

# ./build/src/bpftrace -e 'uprobe:/tmp/a.out:foo { @ = *args }'
Segmentation fault

ajor · 2023-03-27T16:54:58Z

src/output.cpp

+    std::ostringstream res;
+    res << "0x" << std::hex << read_data<uint64_t>(value.data());
+    return res.str();


I see this is a common pattern in output.cpp, but for future reference I think we'll need to change this at some point (not now), as output is one place where we actually need to care about the performance of bpftrace. Taking longer here can mean we don't keep up with events from the kernel.

fbs · 2023-03-28T09:50:42Z

Language Design

I know that in the current implementation, args is treated as a pointer, but does it need to be?

I don't remember, but I'm assuming I made it like that just because it was easiest - just pass the context pointer directly around and call it "args".

Ideally the bpftrace language won't become full of cruft because of lazy decisions made in the past. It seems to me that a pointer doesn't actually make sense, since we don't allow using it without first dereferencing.

What do you think about allowing use of args without dereferencing? e.g.:
kfunc:vfs_read { @a = args }
kfunc:vfs_read { @b = args.file }
The old syntax of "args->file" could be kept for backwards compatibility (at least for a while).

Maybe we should just drop the -> syntax completely? Languages like rust and go are fine without it. I think the type system is good enough to know when we have pointers by now. $x.a can just be transformed into (*$x).a without the user having to remember?

ajor · 2023-03-29T09:48:36Z

Interesting idea. I'll need to think about it some more but I like simplifying the language.

My main concern is unintentionally breaking some part of the C language, like happened when we didn't require the "struct" prefix when casting: #682
A minor concern is confusing C programmers, but I'm sure they'll get used to it.

viktormalik · 2023-03-29T10:18:53Z

My main concern is unintentionally breaking some part of the C language, like happened when we didn't require the "struct" prefix when casting: #682 A minor concern is confusing C programmers, but I'm sure they'll get used to it.

Well, we'd have to support -> for backwards compatibility so that shouldn't be an issue. Basically -> and . would have exactly the same effect.

What would change is that users wouldn't be able to rely on the type system to warn them if they use . to access pointer variables and vice-versa. But I doubt that anyone relies on that in bpftrace.

viktormalik · 2023-04-06T14:24:41Z

Per agreement with @ajor, I'm splitting this into smaller PRs.

The first three commits are now separated into #2556 and #2557.

ajor · 2023-04-07T09:59:01Z

I'm having second thoughts about the move from "->" to "." now. What would we do about dereferencing pointers when not accessing fields?

C++ and Rust are two languages that mainly use the "." syntax to access struct fields, but that's because they mainly use references instead of pointers. Reading a reference gives the value being pointed to rather than the address it holds (it's implicitly dereferenced). We wouldn't want this behaviour for pointers in bpftrace though, since users will often want and expect an address if they're reading a pointer.

Maybe we should keep pointers and references as separate ideas - we can introduce reference types for higher level language support later, but it feels like this change would be creating something half way between a reference and a pointer. Opinions?

I'm also not sure that it will help with this PR for printing args - if args remains as a pointer type, then printing it will still require *args, right? But that dereference was the part I was hoping we could get rid of.

danobi · 2023-04-08T13:12:02Z

I didn't take a very close look at the PR, but the runtime test examples look great.

I'm having second thoughts about the move from "->" to "." now. What would we do about dereferencing pointers when not accessing fields?

I think simplicity is a good north star for the bpftrace language. While I like the idea of a unifying . operator to dereference pointers, I think it is too late to deprecate ->. We have countless examples and scripts out in the wild. While we could retain support but "soft deprecate it", I suspect that'll just increase confusion. At least for me, I get distrustful of docs/examples that use deprecated features.

C++ and Rust are two languages that mainly use the "." syntax to access struct fields, but that's because they mainly use references instead of pointers. Reading a reference gives the value being pointed to rather than the address it holds (it's implicitly dereferenced). We wouldn't want this behaviour for pointers in bpftrace though, since users will often want and expect an address if they're reading a pointer.

Maybe we should keep pointers and references as separate ideas - we can introduce reference types for higher level language support later, but it feels like this change would be creating something half way between a reference and a pointer. Opinions?

I think reference types would require some more discussion, perhaps at an office hours. I have lots of thoughts on that :).

I'm also not sure that it will help with this PR for printing args - if args remains as a pointer type, then printing it will still require *args, right? But that dereference was the part I was hoping we could get rid of.

I don't think we can get rid of it easily. For example rust even supports *ptr deref syntax.

viktormalik · 2023-04-11T06:31:41Z

I'm having second thoughts about the move from "->" to "." now. What would we do about dereferencing pointers when not accessing fields?

I think simplicity is a good north star for the bpftrace language. While I like the idea of a unifying . operator to dereference pointers, I think it is too late to deprecate ->. We have countless examples and scripts out in the wild. While we could retain support but "soft deprecate it", I suspect that'll just increase confusion. At least for me, I get distrustful of docs/examples that use deprecated features.

I agree, deprecating -> doesn't seem like a good idea. I'd still consider allowing . where -> can be used, though. It may be a bit confusing but we'd basically allow the . operator to dereference if applied to a pointer, for the sake of language simplicity.

I think reference types would require some more discussion, perhaps at an office hours. I have lots of thoughts on that :).

Yes, let's discuss on the next office hours :-)

I'm also not sure that it will help with this PR for printing args - if args remains as a pointer type, then printing it will still require *args, right? But that dereference was the part I was hoping we could get rid of.

I don't think we can get rid of it easily. For example rust even supports *ptr deref syntax.

But do we need to keep args to be a pointer? It's a language builtin so I don't see a problem making it a struct (and perhaps allowing -> on it, for backwards compatibility).

Anyways, allowing . to dereference when necessary would resolve the problem even if args remains a pointer.

viktormalik · 2023-04-24T14:43:27Z

After giving it more thought, I realized that we're talking about two independent problems here:

allowing . in place of -> for generic struct access,
making args be a struct instead of a pointer (inherently allowing to use . on it).

While I'm not sure if the former is a good idea (as it may introduce confusion), I like the latter as args is a bpftrace language builtin and the value of the raw args pointer should never be accessed. The implementation is in #2578.

viktormalik · 2023-06-06T13:15:16Z

Rebased onto master which now includes #2578 which started treating args as a struct. This PR now adds capabilities to use args in variables and maps (for kfunc and uprobe probes).

danobi

nice!

ajor

I have a couple of questions to help me understand this area better:
Q1: Will we able to do something similar to this for tracepoint args in the future or is there some fundamental limitation for them?

Q2: Why are uprobe args handled differently from kfunc args? (put on the stack)

I also found a few more problems which we'll need to sort out before merging:

uprobe arg collisions

#include <stdio.h>

void __attribute__((noinline)) bar(int a, int b, int c, int d, int e, int f) {                                                                                        
  printf("hi\n");
}
void __attribute__((noinline)) baz(char g, char h, char i) {                                                                                                          
  printf("hi\n");
}

int main() {                                                                                                                                                          
  bar(11,12,13,14,15,16);
  baz(21, 22, 23);
}

gcc test.c -g -O0

Works fine with just one uprobe:

sudo ./src/bpftrace -e 'uprobe:/tmp/a.out:baz { print(args); }'                                                
Attaching 1 probe...                                                                                                                                                  
{ .g = 21, .h = 22, .i = 23 }                                                                                                                                         
^C

Wrong results with multiple uprobes:

sudo ./src/bpftrace -e 'uprobe:/tmp/a.out:bar { print(args); } uprobe:/tmp/a.out:baz { print(args); }'         
Attaching 2 probes...                                                                                                                                                 
{ .a = 11, .b = 12, .c = 13, .d = 14, .e = 15, .f = 16 }                                                                                                              
{ .g = 21, .h = 0, .i = 0 }                                                                                                                                           
^C

Crash when uprobe order is swapped:

sudo ./src/bpftrace -e 'uprobe:/tmp/a.out:baz { print(args); } uprobe:/tmp/a.out:bar { print(args); }' 
bpftrace: /usr/include/llvm/IR/Instructions.h:922: llvm::Type* llvm::checkGEPType(llvm::Type*): Assertion `Ty && "Invalid GetElementPtrInst indices for type!"' failed.
Aborted

Error message

The error message for uprobes with lots of arguments isn't very nice:

sudo ./src/bpftrace -e 'uprobe:/tmp/a.out:foo { print(args); }'
ERROR: Failed to compile: array::at: __n (which is 6) >= _Nm (which is 6)

Repro:

void __attribute__((noinline)) foo(int a, int b, int c, int d, int e, int f, int g) {                                                                                 
}

int main() {                                                                                                                                                          
  foo(1,2,3,4,5,6,7);                                                                                                                                            
}

gcc test.c -g -O0

src/ast/passes/semantic_analyser.cpp

src/ast/irbuilderbpf.cpp

viktormalik · 2023-07-17T07:13:33Z

Thanks for the review! I added fixup commits to simplify another reviewing and will squash them before merge.

I have a couple of questions to help me understand this area better:

Q1: Will we able to do something similar to this for tracepoint args in the future or is there some fundamental
limitation for them?

We should be. Tracepoint args are passed in a similar way to kfuncs, but they have some specialties, like extra fields at the beginning of the struct which we will want to omit or __data_loc fields. Once this is merged, I'll open an issue to track support for tracepoints.

Q2: Why are uprobe args handled differently from kfunc args? (put on the stack)

That's b/c for kfuncs, the BPF program gets passed a pointer to a struct containing all the arguments (pretty much exactly what we want). OTOH, uprobe args are passed via registers (the passed context is pointer to pt_regs) so we need to build that "struct of args" on stack to have the same codegen representation of args for both kfuncs and uprobes (and eventually tracepoints).

ajor

Nice!

ajor · 2023-07-17T08:53:55Z

src/ast/irbuilderbpf.cpp

+
+llvm::Type *IRBuilderBPF::UprobeArgsType(const SizedType &args_type)
+{
+  auto type_name = args_type.GetName().substr(strlen("struct "));


Suggested change

auto type_name = args_type.GetName().substr(strlen("struct "));

auto type_name = args_type.GetName();

type_name.erase(0, strlen("struct "));

Removes the prefix in-place to avoid creating a temporary std::string with substr(), eliminating an unnecessary malloc/memcpy/free

Good idea, fixed.

When trying to access the 'args' builtin for a probe in which the arguments cannot be parsed (e.g. are not present in DWARF/BTF), throw an error in semantic analyser (instead of segfaulting in codegen).

This was disabled by commit 722c5c7 ("Error if trying to assign args to a variable in kfunc") since it was not supported at the time. Now, FieldAnalyser supports this so it can be enabled.

Since args is now representing a record containing all function args, it may be used as a map key/value since the it will be proberead and copied into the map. For now, this only works for k(ret)func probes.

Allows to pull all arguments of a uprobe from DWARF into a single structure (record) using "args". The record can be used as any other record - printed, stored inside maps/variables, used as a map key, etc. This enables constructions like: print(args); @[args] = count(); @[tid] = args; The implementation builds on the fact that probe args are now stored inside struct manager under a special "struct <probename>_args" type. When accessing the "args" builtin in a uprobe, this builds a new record (having the above type) on stack and fills it with probe args (from registers). One effect of this is that all uprobe args are always read onto stack, even when a single arg is necessary (e.g. "args.a" is used). This is a bit inefficient but much better fits the current implementation. Since there are at most 6 args, the slowdown should be small. Also a difference from other records in bpftrace is that we don't use byte arrays to represent the new record with an LLVM type. The reason is that it is easier to use GEPs with indices to access individual args, instead of using byte offsets inside the new record. Adds runtime tests for the above constructions.

Add entry to the reference guide and manpage about the `args` builtin.

viktormalik · 2023-07-17T09:16:26Z

Squashed the fixup commits, ready for merge.

viktormalik requested review from ajor, danobi, fbs and BurntBrunch as code owners January 19, 2023 09:34

viktormalik force-pushed the print-args branch from 17065de to 324fe35 Compare January 25, 2023 09:10

fbs removed the request for review from BurntBrunch March 27, 2023 11:42

ajor self-assigned this Mar 27, 2023

ajor requested changes Mar 27, 2023

View reviewed changes

ajor added the waiting-on-author label Apr 6, 2023

viktormalik marked this pull request as draft April 6, 2023 18:34

viktormalik mentioned this pull request Apr 26, 2023

Make args a structure (instead of a pointer) #2578

Merged

3 tasks

viktormalik force-pushed the print-args branch 2 times, most recently from b507cc3 to be7487f Compare June 6, 2023 12:53

viktormalik marked this pull request as ready for review June 6, 2023 13:15

viktormalik requested a review from ajor June 6, 2023 13:43

viktormalik removed the waiting-on-author label Jun 6, 2023

viktormalik added the waiting-for-feedback label Jun 14, 2023

danobi approved these changes Jun 16, 2023

View reviewed changes

viktormalik force-pushed the print-args branch from be7487f to 49f634f Compare July 11, 2023 06:46

ajor requested changes Jul 14, 2023

View reviewed changes

src/ast/passes/semantic_analyser.cpp Show resolved Hide resolved

src/ast/irbuilderbpf.cpp Outdated Show resolved Hide resolved

viktormalik force-pushed the print-args branch from 45e736d to b396c34 Compare July 17, 2023 07:30

ajor approved these changes Jul 17, 2023

View reviewed changes

viktormalik force-pushed the print-args branch from b396c34 to edb1aa5 Compare July 17, 2023 09:03

viktormalik added 6 commits July 17, 2023 11:15

Semantic analyser: fail on accessing empty args

2410173

When trying to access the 'args' builtin for a probe in which the arguments cannot be parsed (e.g. are not present in DWARF/BTF), throw an error in semantic analyser (instead of segfaulting in codegen).

Allow assigning args to a local var in kfunc

5f23edc

This was disabled by commit 722c5c7 ("Error if trying to assign args to a variable in kfunc") since it was not supported at the time. Now, FieldAnalyser supports this so it can be enabled.

Allow args as map key/value in k(ret)func

2375d26

Since args is now representing a record containing all function args, it may be used as a map key/value since the it will be proberead and copied into the map. For now, this only works for k(ret)func probes.

Document the args builtin

77ea10a

Add entry to the reference guide and manpage about the `args` builtin.

Update CHANGELOG (improved work with args)

44ca3aa

viktormalik force-pushed the print-args branch from edb1aa5 to 44ca3aa Compare July 17, 2023 09:15

viktormalik merged commit 3c417c8 into bpftrace:master Jul 18, 2023
21 checks passed

viktormalik mentioned this pull request Jul 20, 2023

Allow working with entire args for tracepoints #2688

Open

viktormalik deleted the print-args branch July 21, 2023 08:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow working with all probe params (kfunc, uprobe) #2477

Allow working with all probe params (kfunc, uprobe) #2477

viktormalik commented Jan 19, 2023 •

edited

Loading

viktormalik commented Jan 19, 2023

fbs commented Jan 22, 2023

viktormalik commented Jan 25, 2023

viktormalik commented Mar 27, 2023

ajor left a comment

ajor Mar 27, 2023

fbs commented Mar 28, 2023

Language Design

ajor commented Mar 29, 2023

viktormalik commented Mar 29, 2023

viktormalik commented Apr 6, 2023

ajor commented Apr 7, 2023

danobi commented Apr 8, 2023 •

edited

Loading

viktormalik commented Apr 11, 2023

viktormalik commented Apr 24, 2023

viktormalik commented Jun 6, 2023

danobi left a comment

ajor left a comment

viktormalik commented Jul 17, 2023

ajor left a comment

ajor Jul 17, 2023

viktormalik Jul 17, 2023

viktormalik commented Jul 17, 2023

	auto type_name = args_type.GetName().substr(strlen("struct "));
	auto type_name = args_type.GetName();
	type_name.erase(0, strlen("struct "));

Allow working with all probe params (kfunc, uprobe) #2477

Allow working with all probe params (kfunc, uprobe) #2477

Conversation

viktormalik commented Jan 19, 2023 • edited Loading

Checklist

viktormalik commented Jan 19, 2023

fbs commented Jan 22, 2023

viktormalik commented Jan 25, 2023

viktormalik commented Mar 27, 2023

ajor left a comment

Choose a reason for hiding this comment

Language Design

Bugs

Error dereferencing args in tracepoint

Segfault dereferencing args when function is not defined

ajor Mar 27, 2023

Choose a reason for hiding this comment

fbs commented Mar 28, 2023

Language Design

ajor commented Mar 29, 2023

viktormalik commented Mar 29, 2023

viktormalik commented Apr 6, 2023

ajor commented Apr 7, 2023

danobi commented Apr 8, 2023 • edited Loading

viktormalik commented Apr 11, 2023

viktormalik commented Apr 24, 2023

viktormalik commented Jun 6, 2023

danobi left a comment

Choose a reason for hiding this comment

ajor left a comment

Choose a reason for hiding this comment

uprobe arg collisions

Error message

viktormalik commented Jul 17, 2023

ajor left a comment

Choose a reason for hiding this comment

ajor Jul 17, 2023

Choose a reason for hiding this comment

viktormalik Jul 17, 2023

Choose a reason for hiding this comment

viktormalik commented Jul 17, 2023

viktormalik commented Jan 19, 2023 •

edited

Loading

danobi commented Apr 8, 2023 •

edited

Loading