RFC: Kernel Runtime BPF support checks #1087

Closed
dalehamel opened this issue Jan 14, 2020 · 3 comments

@dalehamel
Contributor

Currently the features provided by the bpftrace binary are a product of the system that bpftrace is built on. If the headers available at build-time have a particular feature, then bpftrace uses them.

This hurts the portability of bpftrace, as it means that bpftrace has to be built for each system it runs on. Now that #1041 has merged to help address #342, this issue is more pressing: systems can now end up running a bpftrace that tries to use features their kernel doesn't support. This causes issues like #1014, and means various headers have to be bundled in. This issue was first raised by @danobi in #342 (comment).

The alternative paths that a solution might take, that I've thought of so far are:

  1. We build bpftrace for each kernel version which adds to or modifies the BPF API, and users can distribute bpftrace per kernel
  2. We teach bpftrace to detect this at runtime, and take fallback paths for compatibility
  3. Throw an exception / catch whatever exception arises, and warn the user that the feature is not supported, or perhaps not fully supported.

I think that 1) is probably going to be a lot of work and a bit of a pain. The checks we have at build time are still good for ensuring that bpftrace can build the superset of functionality, but I'd prefer that it still work robustly when running against an older kernel. For this reason, I favor 2) or 3).

I'm not actually sure what issues we'll run into. I'd guess that some bugs will be reported against the 0.9.4 semi-static binaries that could give some insight. We can at least look at the places where we use #ifdef in the code now; for instance, someone trying to use the cgroup builtin on a system that doesn't support it could run into this problem.

So, if we go down the path of checking kernel compatibility, there will probably be more code complexity, as we have to handle fallbacks / stubbing functionality / warning on older systems.

So the error-handling path might be the best one, as it is already a runtime mechanism for handling fallbacks. I'd guess that we can just design a new exception, KernelVersionException or BpfApiException or something to that effect, and either warn or throw a fatal error depending on what is appropriate.

For a lot of users, a fatal exception for a script that tries to use unhandled features doesn't degrade their experience, because they couldn't run the script anyway; at least now they know why. In other cases, if functionality is missing but not critical (i.e., there is a reasonable fallback), a warning can be printed, or bpftrace could transparently fall back to something less efficient.
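A minimal C++ sketch of the exception-based path, assuming a hypothetical BpfApiException (one of the names floated above) that records the missing feature and whether a fallback exists. The Outcome struct and handle_missing_feature are also hypothetical illustrations, not existing bpftrace code: a missing feature with a fallback becomes a warning, one without becomes a fatal error.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical exception for a BPF feature the running kernel lacks.
class BpfApiException : public std::runtime_error {
public:
  BpfApiException(const std::string &feature, bool has_fallback)
      : std::runtime_error("kernel does not support " + feature),
        feature_(feature), has_fallback_(has_fallback) {
    assert(!feature.empty()); // every report should name the feature
  }

  const std::string &feature() const { return feature_; }
  bool has_fallback() const { return has_fallback_; }

private:
  std::string feature_;
  bool has_fallback_;
};

// What the caller should do after catching a BpfApiException.
struct Outcome {
  bool fatal;          // true: abort the script
  std::string message; // text to print to the user
};

// Missing-but-not-critical features only warn; everything else is fatal.
Outcome handle_missing_feature(const BpfApiException &e) {
  if (e.has_fallback())
    return {false, std::string("WARNING: ") + e.what() +
                       "; falling back to a slower implementation"};
  return {true, std::string("ERROR: ") + e.what()};
}
```

In this sketch the decision about whether a fallback exists is made where the exception is thrown, so the top-level handler stays a single catch block.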

@fbs @mmarchini @danobi any thoughts on how we should tackle this?

@fbs
Contributor

fbs commented Jan 14, 2020

3) might be tricky to implement, as you'd have to inspect the error the verifier gives (e.g. for loops it's 'invalid back edge', but others might not be unique) and keep the code in sync with that. If there are bugs in that, we'll throw errors that make users think the kernel is missing features when it is really just us reading the verifier log incorrectly, or bad codegen.

I think 2) is the way to go. The overhead of loading a few small test programs into the kernel shouldn't be too bad. The bcc helpers already try loading your programs a few times in case of failures.

@olsajiri already implemented a feature detector in #871, which I've stolen for #1066 (b92a5c4). I want to clean that up and put it up as a separate PR to get some feedback on it. The idea is that you write a few lines of "raw bpf" that get loaded into the kernel and give you a yes/no answer, which the semantic analyser can then use to give the user a nice error.
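A minimal sketch of such a probe in C++, calling the bpf(2) syscall directly. It loads a three-instruction program (call the helper, set r0 = 0, exit) and treats EINVAL as "not supported". The function name probe_helper, the tri-state ProbeResult, and the choice of BPF_PROG_TYPE_TRACEPOINT are assumptions for illustration; this is not the detector from #871 or #1066. Loading tracing programs normally requires root, so a permission error is reported as Unknown rather than as missing support.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstring>
#include <linux/bpf.h>
#include <sys/syscall.h>
#include <unistd.h>

enum class ProbeResult { Supported, Unsupported, Unknown };

// Try to load "call <helper_id>; r0 = 0; exit" and report whether the
// kernel accepted it. Unknown means we could not tell (e.g. EPERM).
ProbeResult probe_helper(int32_t helper_id) {
  struct bpf_insn insns[3];
  std::memset(insns, 0, sizeof(insns));
  insns[0].code = BPF_JMP | BPF_CALL;          // call helper_id (opcode 0x85)
  insns[0].imm = helper_id;
  insns[1].code = BPF_ALU64 | BPF_MOV | BPF_K; // r0 = 0 (return value)
  insns[1].dst_reg = BPF_REG_0;
  insns[1].imm = 0;
  insns[2].code = BPF_JMP | BPF_EXIT;          // exit

  static const char license[] = "GPL";
  union bpf_attr attr;
  std::memset(&attr, 0, sizeof(attr));
  attr.prog_type = BPF_PROG_TYPE_TRACEPOINT;
  attr.insn_cnt = 3;
  attr.insns = static_cast<uint64_t>(reinterpret_cast<uintptr_t>(insns));
  attr.license = static_cast<uint64_t>(reinterpret_cast<uintptr_t>(license));

  long fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
  if (fd >= 0) {
    close(static_cast<int>(fd)); // only needed to know the load worked
    return ProbeResult::Supported;
  }
  // The verifier rejects unknown helper ids with EINVAL
  // ("invalid func unknown#N"); other errors are inconclusive.
  return errno == EINVAL ? ProbeResult::Unsupported : ProbeResult::Unknown;
}
```

A semantic-analyser check could then call, say, probe_helper(BPF_FUNC_get_current_cgroup_id) once at startup, assuming the build headers define that enum value. EINVAL can also mean the probe program itself is wrong, and on kernels that charge BPF memory to RLIMIT_MEMLOCK a load can fail with EPERM even as root; this sketch reports that case as Unknown instead of retrying the way the bcc helpers do.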

I'm not sure how we should handle the BPF_FUNC_ enum (which provides the kernel helper function IDs). If the IDs are considered part of the stable API and won't change between kernel versions, we could consider copying the enum from the kernel. That way we will always have the correct ID when building bpftrace, and we avoid ugliness like #966 (comment). For every new helper we add, we can also update the ID. (If the kernel ever changes these, we'd have to update the codegen tests anyway.) Detection should be fairly easy:

bpf: Failed to load program: Invalid argument
0: (85) call unknown#109
invalid func unknown#109
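A sketch of that detection step, assuming the verifier log has already been captured as a string: it pulls the helper ID out of an "invalid func unknown#N" line and maps it to a name through a small, hand-copied subset of the BPF_FUNC_ IDs from the kernel's UAPI linux/bpf.h, in the spirit of copying the enum into bpftrace. The names unknown_helper_id and helper_name are hypothetical.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Return N from an "invalid func unknown#N" verifier line, if present.
std::optional<int> unknown_helper_id(std::string_view log) {
  constexpr std::string_view marker = "invalid func unknown#";
  std::size_t pos = log.find(marker);
  if (pos == std::string_view::npos)
    return std::nullopt;
  const char *first = log.data() + pos + marker.size();
  const char *last = log.data() + log.size();
  int id = 0;
  auto res = std::from_chars(first, last, id); // rejects non-digits/overflow
  if (res.ec != std::errc())
    return std::nullopt;
  return id;
}

// Hand-copied subset of BPF_FUNC_* ids (values from include/uapi/linux/bpf.h).
const char *helper_name(int id) {
  switch (id) {
  case 5:
    return "ktime_get_ns";
  case 14:
    return "get_current_pid_tgid";
  case 80:
    return "get_current_cgroup_id";
  case 109:
    return "send_signal";
  default:
    return "unknown";
  }
}
```

With the log above, this yields ID 109, which the table maps to send_signal, so the user-facing error can name the signal builtin instead of a raw number.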

If we land something like ^^, it should also be easy to add a feature test command that shows what the build is capable of, e.g.:

$ bpftrace --info
bpftrace version 0.9.3
builddate 1-1-2020 13:37:00
Build options:
bcc version 0.11
HAVE_BCC_ELF_FOREACH_SYM

Kernel support:
cgroupid: yes
signal: yes
loops: no
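A minimal sketch of how the "Kernel support:" section of such a report could be assembled, assuming each feature check is a callable run once at startup (in practice a small test program load). FeatureProbe and kernel_support_report are hypothetical names; only the output layout follows the example above.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// One runtime check, e.g. a wrapper around loading a tiny test program.
struct FeatureProbe {
  std::string name;
  std::function<bool()> probe;
};

// Run each probe once and render "name: yes/no" lines in the order given.
std::string kernel_support_report(const std::vector<FeatureProbe> &probes) {
  std::string out = "Kernel support:\n";
  for (const auto &p : probes)
    out += p.name + ": " + (p.probe() ? "yes" : "no") + "\n";
  return out;
}
```

Keeping the probes in one list means the same results could feed both this report and the semantic analyser's error messages.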

@mmarchini
Contributor

+1 for option 2

"I'm not sure how we should handle the BPF_FUNC_ enum"

We already have a C parser using clang, we can probably use that to get the enum during runtime.

@fbs
Copy link
Contributor

fbs commented Aug 19, 2020

I think we fixed this.

fbs closed this as completed Aug 19, 2020