RFC: Kernel Runtime BPF support checks #1087

dalehamel · 2020-01-14T01:30:47Z

Currently the features provided by the bpftrace binary are a product of the system that bpftrace is built on. If the headers available at build-time have a particular feature, then bpftrace uses them.

This hurts the portability of bpftrace, as it has to be built for each system that it runs on. Now that #1041 has merged to help address #342, this is a more relevant issue: a prebuilt bpftrace can end up running on systems where it tries to use features their kernel doesn't support. This causes issues like #1014 and requires various headers to be bundled in. This issue was first raised by @danobi in #342 (comment).

The alternative paths a solution might take, that I've thought of so far, are:

1) We build bpftrace for each kernel version which adds to or modifies the BPF API, and users can distribute bpftrace per kernel.
2) We teach bpftrace to detect this at runtime and take fallback paths for compatibility.
3) We throw an exception / catch whatever exception arises, and warn the user that the feature is not supported, or perhaps not fully supported.

I think that 1) is probably going to be a lot of work and a bit of a pain. The checks we have at build-time are still good for ensuring that bpftrace can build the superset of functionality, but I'd prefer if it were able to still work robustly when running against an older kernel. For this reason, I favor 2) or 3).

I'm not actually sure what issues we'll run into, but I'd guess that bugs reported against the 0.9.4 semi-static binaries could give some insight. We can at least look at the cases where we do #ifdef in the code now; for instance, someone trying to use the cgroup builtin on a system that doesn't support it could run into this problem.

So, if we go the path of checking kernel compatibility, there will probably be more code complexity as we have to handle fallbacks / stubbing functionality / warning on older systems.

So, the error handling path might be the best one, as it is already a runtime system for handling fallbacks. I'd guess that we can just design a new exception KernelVersionException or BpfApiException or something to that effect, and either warn or throw a fatal error depending on what is appropriate.

For a lot of users, a fatal exception for a script that tries to use unhandled features isn't degrading their experience, because they couldn't run the script anyway; at least now they know why. In other cases, if functionality is missing but not critical (i.e., there is a reasonable fallback), then a warning message can be printed, or the fallback to something less efficient could happen transparently.

@fbs @mmarchini @danobi any thoughts on how we should tackle this?


fbs · 2020-01-14T08:27:40Z

This might be tricky to implement, as you'd have to inspect the error the verifier gives (e.g. for loops it's 'invalid back edge', but others might not be unique) and keep the code in sync with that. If you have bugs there, you will throw errors that make users think the kernel is missing features while it is just us reading the verifier log incorrectly / bad codegen.

I think 2) is the way to go. The overhead of loading a few small test programs into the kernel shouldn't be too bad. The bcc helpers already try loading your programs a few times in case of failures.

@olsajiri already implemented a feature detector in #871 which I've stolen for #1066 (b92a5c4). I wanted to clean that up and put that up as a separate PR to get some feedback on it. The idea is that you write a few lines of "raw bpf" that will be loaded in the kernel and give you a yes/no which you can then use in the semantic analyser to give the user a nice error.
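The yes/no idea could look roughly like the sketch below. It is not the code from #871 or #1066: `FeatureDetector` and `check_builtin` are hypothetical names, and each probe is a callback standing in for "load a few instructions of raw BPF and see whether the kernel accepts them", so the actual load is left out.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical runtime feature detector. Each probe stands in for loading
// a tiny raw BPF program and reporting whether the kernel accepted it.
class FeatureDetector {
public:
  using Probe = std::function<bool()>;

  void add_probe(const std::string &name, Probe probe) {
    probes_[name] = std::move(probe);
  }

  // Runs the probe at most once per feature and caches the answer, so the
  // kernel is not asked again for every use of a builtin in a script.
  // Features without a registered probe are reported as unsupported.
  bool has(const std::string &name) {
    auto cached = cache_.find(name);
    if (cached != cache_.end())
      return cached->second;
    auto probe = probes_.find(name);
    if (probe == probes_.end())
      return false;
    bool supported = probe->second();
    cache_[name] = supported;
    return supported;
  }

private:
  std::map<std::string, Probe> probes_;
  std::map<std::string, bool> cache_;
};

// What a semantic analysis pass could do with the answer: return an empty
// string when the builtin is usable, otherwise a readable error message.
inline std::string check_builtin(FeatureDetector &features,
                                 const std::string &builtin) {
  if (features.has(builtin))
    return "";
  return "ERROR: builtin '" + builtin + "' is not supported by this kernel";
}
```

Caching matters here because each probe costs a program load; with it, the overhead stays at one load per feature per run.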

I'm not sure how we should handle the BPF_FUNC_ enum (which provides the kernel helper function ID). If the ID is considered "part of the stable API" and won't change between kernel versions, we could consider copying it from the kernel. That way we will always have the correct ID there when building bpftrace and avoid ugliness like: #966 (comment). For every new helper we add we can also update the ID. (If the kernel ever changes these we'd have to update the codegen tests anyway.) Detection should be fairly easy:

bpf: Failed to load program: Invalid argument
0: (85) call unknown#109
invalid func unknown#109
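As a rough illustration of detecting this case, the sketch below pulls the helper ID out of a log like the one above. The function name `unknown_helper_id` is hypothetical, and it assumes the verifier always words the line as "invalid func unknown#<id>", which is only known from this one excerpt.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Looks for the "invalid func unknown#<id>" line in a verifier log and
// returns the helper ID, or -1 if the log has no such line. The marker is
// taken from the excerpt above; whether every kernel version words it this
// way is an assumption.
inline int unknown_helper_id(const std::string &verifier_log) {
  static const std::string marker = "invalid func unknown#";
  std::size_t pos = verifier_log.find(marker);
  if (pos == std::string::npos)
    return -1;
  pos += marker.size();
  std::size_t end = pos;
  // Read at most 9 digits so the conversion below cannot overflow an int.
  while (end < verifier_log.size() && end - pos < 9 &&
         std::isdigit(static_cast<unsigned char>(verifier_log[end])))
    ++end;
  if (end == pos)
    return -1; // marker present but no digits followed it
  return std::stoi(verifier_log.substr(pos, end - pos));
}
```

Comparing the returned ID against the one bpftrace emitted for a given helper would tell a missing helper apart from other load failures.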

If we land something like the above it should also be easy to add a feature test command that will show you what the build is capable of, e.g.:

$ bpftrace --info
bpftrace version 0.9.3
builddate 1-1-2020 13:37:00
Build options:
bcc version 0.11
HAVE_BCC_ELF_FOREACH_SYM

Kernel support:
cgroupid: yes
signal: yes
loops: no
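The "Kernel support" part of such a report could be produced by something as small as the sketch below. `kernel_support_report` is a hypothetical helper, and the feature names and answers are supplied by the caller rather than probed here.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Formats the "Kernel support" section of a hypothetical --info report.
// Each entry pairs a feature name with the result of its runtime check.
inline std::string kernel_support_report(
    const std::vector<std::pair<std::string, bool>> &features) {
  std::ostringstream out;
  out << "Kernel support:\n";
  for (const auto &feature : features)
    out << feature.first << ": " << (feature.second ? "yes" : "no") << "\n";
  return out.str();
}
```

Passing the entries `cgroupid: true`, `signal: true`, `loops: false` reproduces the three lines shown in the example output.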

mmarchini · 2020-01-15T01:55:38Z

+1 for option 2

> I'm not sure how we should handle the BPF_FUNC_ enum

We already have a C parser using clang, we can probably use that to get the enum during runtime.

fbs · 2020-08-19T16:57:43Z

I think we fixed this.

fbs mentioned this issue Jan 17, 2020

Runtime feature checking #1088

Merged

fbs closed this as completed Aug 19, 2020
