Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BTF for kernel modules #2315

Merged
merged 14 commits into from
Nov 16, 2022
Merged

BTF for kernel modules #2315

merged 14 commits into from
Nov 16, 2022

Conversation

viktormalik
Copy link
Contributor

@viktormalik viktormalik commented Aug 3, 2022

This introduces support for BTF information for kernel modules.

BTF for kernel modules is nowadays shipped by many distros (it is exposed via /sys/kernel/btf/<module-name>). Using it enables BTF-depending features for modules, in particular:

  • k(ret)func probes for module functions including access to function arguments by name
  • automatic type resolution of types defined in modules
  • listing of types defined in modules

This is quite tricky to test, though, as it either requires to insmod a custom module before running the tests or to rely on some common module that should be available on most systems. In the end, I went with the second way - the runtime tests hook into the nft_trans_alloc_gfp function from the nf_tables module which is pretty common and is invoked by the nft utility. The module is now also loaded in CI to make sure that the tests are executed there.

This PR also features one important change - FieldAnalyser now fully resolves structure types from BTF/DWARF (including struct fields) and Clang parser is only called if there are some user-defined types/includes or if FieldAnalyser was not able to resolve something (for instance, it doesn't support bitfields at the moment). This should bring some execution speedup and is also crucial for type resolution from kernel modules since BTF dumping doesn't work properly for them.

In addition, there are several minor changes done:

  • BTF parsing is refactored to use as many functions from libbpf as possible
  • runtime tests now allow multiple REQUIRES fields (all must be satisfied for the test to run)
  • unit tests for FieldAnalyser were moved into a separate test suite (they were previously scattered between ClangParser and SemanticAnalyser tests)
Checklist
  • Language changes are updated in man/adoc/bpftrace.adoc and if needed in docs/reference_guide.md
  • User-visible and non-trivial changes updated in CHANGELOG.md
  • The new behaviour is covered by tests

@lgtm-com
Copy link

lgtm-com bot commented Aug 3, 2022

This pull request fixes 1 alert when merging e130b1b into 05cfa8e - view on LGTM.com

fixed alerts:

  • 1 for Time-of-check time-of-use filesystem race condition

Copy link

@BurntBrunch BurntBrunch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionality-wise, this is great! I do worry about how it scales with modules/complexity of BTF though, a lot of the work is eager/iterates over everything. There may be a future "make all this lazy" issue, especially if we start dumping module types.

src/btf.cpp Show resolved Hide resolved
tests/runtime/engine/runner.py Outdated Show resolved Hide resolved
src/attached_probe.cpp Outdated Show resolved Hide resolved
src/btf.cpp Outdated Show resolved Hide resolved
@viktormalik
Copy link
Contributor Author

I do worry about how it scales with modules/complexity of BTF though, a lot of the work is eager/iterates over everything. There may be a future "make all this lazy" issue, especially if we start dumping module types.

Indeed, I have the same impression. I gave it some more thought and came with some observations/ideas:

  • We have to iterate everything at least once to find out which module (or let's say "BTF source") contains the function that we're attaching to. After that, we could store that BTF source within the probe and use it for the rest of the BTF queries (e.g. type resolution).
  • This may be a problem if one probe attaches to multiple modules (e.g. via wildcards). Not sure if we want to allow something like that, but we could store a list of BTF sources, which could be fine as it'll usually be of size 1.
  • In general, two modules may define functions/types with the same name. We should probably handle that, so I'd suggest to:
    • print a warning when user tries to access a function/type defined in multiple modules
    • allow the user to explicitly specify which module he wants to access (e.g. kfunc:modname:function)

Does this make sense? Any ideas how we could do even better?

@BurntBrunch
Copy link

Okay, how about this:

  1. Parse kallsyms and keep the module names that symbols come from
  2. When resolving wildcards, resolve against kallsyms, remembering the module, then filter to only attachable.
  3. When we're ready to attach, we now know the module names we need, so we don't have to parse everything.

This would save us from walking all BTF types, though we'd still have to iterate over all modules because of the ID, FD tuple.
The downside is that kallsyms gets pretty big but honestly, I think it's useful enough to be worth it.

allow the user to explicitly specify which module he wants to access (e.g. kfunc:modname:function)

I think there's value in this syntax extension regardless! However, I think it should be function:modname so you don't have to edit the middle of the probe spec when clarifying the module name.

@viktormalik
Copy link
Contributor Author

Okay, how about this:

  1. Parse kallsyms and keep the module names that symbols come from
  2. When resolving wildcards, resolve against kallsyms, remembering the module, then filter to only attachable.

And what about parsing sys/kernel/debug/tracing/available_filter_functions? It also contains the module name and we're parsing it anyway to get the list of traceable functions. And it's also considerably smaller than kallsyms.

I think there's value in this syntax extension regardless! However, I think it should be function:modname so you don't have to edit the middle of the probe spec when clarifying the module name.

While I see your point, I think we should stay consistent with other probe types. E.g. tracepoints have the form tracepoint:subsystem:event, which corresponds more to kfunc:module:function.

@BurntBrunch
Copy link

And what about parsing sys/kernel/debug/tracing/available_filter_functions?

That's what I meant by "filter to attachable" above. I didn't realize it contains the module name, so, sure, we don't need kallsyms for this filtering.

Most reasons to look at kallsyms are relevant for kprobe anyway, which doesn't matter here.

While I see your point, I think we should stay consistent with other probe types.

Sure, my opinion is very weakly held. 😃 Besides, this module specifier would be mostly useful for wildcard resolution (duplicate symbols are not something that people would normally encounter), in which case function/pattern at the end makes sense.

@viktormalik
Copy link
Contributor Author

Version 2 with several major improvements:

  • before parsing BTF, collect names of all modules that we're attaching to from available_filter_functions, then parse BTF only for those modules (vmlinux BTF is always parsed), which reduces the amount of BTF that must be iterated
  • enable kfunc:module:function syntax which allows to attach to functions from specific module(s) only, supports also wildcards
  • minimal required libbpf is now 0.6.0
  • addressed all the PR comments and added several more fixes

More implementation details can be found in commit messages. The PR is now quite large, so I'm happy to break it if it simplifies reviewing.

Note: there is a BPF verifier bug which causes incorrect attachment when dealing with multiple functions from different modules (or vmlinux) having the same name. In practice, it will always attach to the first function of that name found in kallsyms. I'm working on a fix but I think that we can proceed with this independently as the situation should occur rarely. If someone finds it useful, we can show a warning or mention it in the reference guide.

@lgtm-com
Copy link

lgtm-com bot commented Nov 2, 2022

This pull request fixes 1 alert when merging 834f846 into d470220 - view on LGTM.com

fixed alerts:

  • 1 for Time-of-check time-of-use filesystem race condition

Copy link

@BurntBrunch BurntBrunch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the as-lazy-as-possible approach! The new interactions between FieldAnalyser and ClangParser were a little hard to figure out but I think that's just me not being familiar with the older code.

More implementation details can be found in commit messages.

I wish there was a nice architectural diagram somewhere that we could change as part of this rework. Oh well.

I'm very much of the opinion that we should merge this and bake it pre-release instead of trying to iron out every possible edge case in the PR.

CHANGELOG.md Outdated Show resolved Hide resolved
.github/workflows/ci.yml Show resolved Hide resolved
src/attached_probe.cpp Show resolved Hide resolved
src/btf.cpp Outdated Show resolved Hide resolved
src/btf.cpp Outdated Show resolved Hide resolved
src/btf.cpp Show resolved Hide resolved
src/btf.h Outdated Show resolved Hide resolved
src/probe_matcher.cpp Outdated Show resolved Hide resolved
src/struct.h Outdated Show resolved Hide resolved
src/utils.cpp Outdated Show resolved Hide resolved
When listing kfunc params, the params are indented with two spaces. The
test checks that there exists a line starting with whitespaces, but that
also matches the newline character from the previous line, leading to
false negative results.

Fix this by checking for spaces only (instead of any whitespace).
Also, remove an incorrect grep call which would filter out the lines
with parameters.
There are 2 places where we basically reimplement the same thing that
libbpf does:
- finding and parsing vmlinux BTF data - this is implemented by
  btf__load_vmlinux_btf
- finding BTF id of a function type in BTF::resolve_args - this can be
  done using btf__find_by_name_kind

Using the mentioned libbpf's functions allows to remove large portions
of code.
This is a preparation step for supporting BTF for kernel modules.

Several notable refactorings are done:
- Operations based on iterating through all the types in vmlinux BTF are
  refactored in a way that the BTF to iterate may be supplied as a
  parameter. The original methods are called with the vmlinux BTF, i.e.,
  there should be no functional change for now.  At the same time,
  BTF::btf is renamed to BTF::vmlinux_btf.
- A function for looking up a BTF entry is created. It'll be later
  expanded by searching the kernel modules BTF, too.
- The above changes often require to pass a BTF id along with the source
  BTF containing the id definition. To facilitate this, a new type
  BTF::BTFId is created, which encapsulates these information.
When parsing /sys/kernel/debug/tracing/available_filter_functions, store
also the module name for each symbol. Note that multiple modules may
define a function with the same name.

BPFtrace::traceable_funcs_ becomes a map function-->set of modules.

This will be useful for handling BTF for modules as we'll know which
modules are used and won't have to iterate BTF for all of them.
Almost all the features that require BTF are now supported for kernel
modules. These include namely:
- k(ret)func are available for module functions, including access to
  function arguments by name
- listing of types defined in modules

The only feature that is not supported is automatic resolution of types
defined in modules. The reason is that this requires BTF type dumping,
which is complicated for modules as we would have to deal with type
redefinition conflicts. Also, for a lot of modules, dumping may take too
much time, so it is better to resolve this differently anyway. The
solution will come in the upcoming commits.

There is a major change in the way BTF is parsed. Instead of parsing it
directly from the constructor of BPFtrace, it is rather parsed on-demand
(BTF is now stored inside a unique_ptr in the BPFtrace class). The
reason for this is that we first parse the attachpoints, gather the list
of kernel modules that are used, and then only parse BTF for those
modules. This keeps the approach quite fast as we usually need to parse
only several modules. BTF for vmlinux is parsed every time.

The list of modules for the attached functions is obtained from the list
of traceable functions (available_filter_functions from tracefs). When
wildcards are used, this approach causes a chicken-and-egg problem in
the probe matcher as it doesn't have BTF when parsing the attachpoints
for the first time (to get the list of modules). So, we let it read the
list of symbols for k(ret)func from available_filter_functions if BTF is
not available. Note that this is not entirely precise (not all traceable
functions have a BTF entry) but it is sufficient to get the list of
modules.
Even though bpftrace represents enums using int type, the size of an
enum must be parsed in a different way from BTF.
Resolve struct fields directly when parsing BTF instead of offloading
the job to ClangParser. This allows to work with types whose definitions
cannot be resolved using the Clang parser, e.g. types defined in kernel
modules (as struct def dumping from module BTFs is not supported now).

Bitfields and data_loc fields are not supported for now.

Also, since querying BTF may now take longer with kernel modules,
FieldAnalyser always tries to resolve types from DWARF first.
If ClangParser has nothing to resolve (i.e. there are no user includes,
no user type definitions, and no types to resolve from BTF), it doesn't
have to run, which will save some execution time.

This requires some changes to FieldAnalyser, the most important one is
that it only adds those types to BPFtrace::btf_set_ which it cannot
resolve on its own.

In practice, this means that ClangParser will run in a much smaller
amount of cases than before.

Note: this brings some speedup in FieldAnalyser thanks to the fact that
it now tracks types using SizedType (instead of type name) and doesn't
have to call BTF::type_of that often.
The tests check basic BTF-based features, namely attachement to kfunc,
access to args by name, and resolution of struct types from BTF.

The tests attach to the nft_trans_alloc_gfp function from the nf_tables
module which is called every time a new nftables table is created.
Each test creates a temporary table called "bpftrace" which is
afterwards deleted.

Requires nf_tables module to be loaded and 'nft' command to be installed
in /usr/sbin/nft, otherwise the tests are skipped. The module is also
manually loaded in CI to make sure that the tests are executed.
Since FieldAnalyser is getting more and more functionality, it makes
sense to make a separate unit test suite for it and move existing tests
there.

There are 2 kinds of tests corresponding to the main responsibilities of
FieldAnalyser:
- Tests for resolution of function args from BTF/DWARF (originally
  located in SemanticAnalyser tests).
- Tests for resolution of struct type fields from BTF/DWARF (originally
  located in ClangParser tests). These are also expanded to check that
  FieldAnalyser is now able to correctly parse struct types (without
  running ClangParser).
When BTF for a kfunc cannot be loaded, fail with appropriate error
message instead of generic "loading program failed".

In addition, when trying to attach to multiple attachpoints, do not fail
hard if one of them does not exist (this behaviour is same to kprobes).
With BTF for kernel modules supported, it may be useful to allow users
to specify the module along with the function name for kfuncs.
This brings several new possibilities such as:
- attaching to all functions in a given module
- specifying a module in case there are multiple functions/types with
  the same name

"vmlinux" can be used as module name to pull functions from vmlinux.

Specifying the module is optional - if no module is specified, all
loaded modules are searched. This should guarantee backwards
compatibility of all existing scripts.

The only visible change from the current behaviour is in function
listing since the format "kfunc:module:function" is yielded now:

    # bpftrace -l 'kfunc:vfs_*'
    kfunc:vmlinux:vfs_cancel_lock
    kfunc:vmlinux:vfs_clean_context
    kfunc:vmlinux:vfs_cleanup_quota_inode
    ...

This is also reflected in the reference guide and the manpage.

There is one known issue - when trying to attach to multiple functions
of the same name (in different modules), BPF verifier only allows to
attach the first one and fails on attaching the others. We try to detect
such a situation and ask the user to specify a module explicitly, to at
least partially mitigate the issue.
The BTF class sets the libbpf print function which is not the right
place to do it for 2 reasons: (1) when using BTF as a library, there is
no reason to change libbpf's behaviour and (2) other parts of bpftrace
use libbpf, too.

Therefore, move setting of the print function to main.cpp.
@viktormalik
Copy link
Contributor Author

I wish there was a nice architectural diagram somewhere that we could change as part of this rework. Oh well.

There is a nice picture in internals_development.md, although it will be partially outdated by now. Still, I can have a look into it while updating the development guide.

@lgtm-com
Copy link

lgtm-com bot commented Nov 7, 2022

This pull request fixes 1 alert when merging f0a9daf into 216d2cb - view on LGTM.com

fixed alerts:

  • 1 for Time-of-check time-of-use filesystem race condition

@danobi
Copy link
Member

danobi commented Nov 15, 2022

I read through most of this PR and it looks quite reasonable.

@viktormalik
Copy link
Contributor Author

Thanks for the reviews, let's not prolong this and merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants