Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Report helper function errors #1276

Merged
merged 2 commits into from
May 18, 2020
Merged

Conversation

mmisono
Copy link
Collaborator

@mmisono mmisono commented Apr 17, 2020

Some helper functions return error codes when failing, but currently,
bpftrace does not check and discards it. Several times I had to use bcc
to check error codes for debugging.

This patch adds an error checking mechanism, and helper function errors
are reported in runtime using '-k' option. For example:

% sudo BPFTRACE_MAP_KEYS_MAX=2 ./src/bpftrace -e \
  'BEGIN { @[0] = 0; @[1] = 1; @[2] = 2;} -k'
Attaching 1 probe...
stdin:1:35-36: WARNING: Failed to map_update_elem: Argument list too long (-7)
BEGIN { @[0] = 0; @[1] = 1; @[2] = 2;}
                                 ~

^C

@[0]: 0
@[1]: 1

This mechanism is implemented as an async action. By default, errors of
bpf_probe_read()-related functions and bpf_map_lookup_elem() are not
checked. This is because when these functions fail, bpftrace returns
zero as a result, and some scripts already check it, for example, by
using predicate / @[tid] /. -kk option forces to check all errors
including these functions.

Note that the error check increases code size. All codegen tests are
performed with the error check disabled.

BasicBlock *helper_merge_block = BasicBlock::Create(module_.getContext(),
"helper_merge",
parent);
auto *ret = CreateIntCast(return_value, getInt32Ty(), true);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that a return type of most helper functions was Int64Ty but they actually returned int and we needed to use Int32Ty value to check if a return value was negative. I cast the value here to keep code changes of this PR as small as possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you leave that as a comment in the code? Will help next person who wonders why there's a cast.

@mmisono
Copy link
Collaborator Author

mmisono commented Apr 17, 2020

I think this solves #464.

I forgot to mention but this uses perf_event_output() for error reporting, so an error of perf_event_output() is not reported. An error message is printed using bpftrace.error() and is not included in json if an output format is json.

@mmisono
Copy link
Collaborator Author

mmisono commented Apr 17, 2020

This is the comparison of processed insn sizes (not the actual code size. Since the verifier explorers all possible code path, processed insn size becomes larger than the code size). Multiple values mean that the scripts have multiple functions.

So, basically error checking increases the number of processed insn by tens or hundreds, but some scripts (especially which have many map updates) increase code size by thousands (e.g., capable.bt). I also measured the compilation times but I didn't see noticeable differences with and without error checking. I didn't measure runtime speed.

I'm still wondering what is the best default behavior, and I'd appreciate any feedback.

tools default --check-all-helper --disable-helper-check
tcpsynbl.bt 14,14,185,185 14,14,478,478 14,14,139,139
bashreadline.bt 161,14 161,14 161,14
tcpaccept.bt 394,391 394,1166 394,331
tcpconnect.bt 362,14 362,14 362,14
runqlen.bt 14,92 14,251 14,70
writeback.bt 2094,34,516,14 2094,34,546,14 828,14,453,14
setuids.bt 270,154,154,272,200,201,314 270,154,154,272,298,299,513 270,45,45,71,123,124,197
mdflush.bt 149,14 149,14 149,14
loads.bt 14,66 14,215 14,66
biosnoop.bt 159,14,253,198 159,14,352,198 159,14,182,42
cpuwalk.bt 14,62 14,77 14,41
opensnoop.bt 211,36,36,266,266,14 211,36,36,318,318,14 211,15,15,193,193,14
capable.bt 10463,14,365 10515,14,379 3382,14,335
naptime.bt 256,182 256,633 256,154
dcsnoop.bt 193,175,14,42 193,223,14,86 193,121,14,21
biostacks.bt 14,14,278,278,132 14,14,341,341,132 14,14,185,185,27
xfsdist.bt 14,250,250,250,250,14,94,94,94,94 14,305,305,305,305,14,94,94,94,94 14,150,150,150,150,14,30,30,30,30
pidpersec.bt 14,42,41,14 14,57,41,14 14,21,41,14
biolatency.bt 14,14,191,34 14,14,236,34 14,14,134,14
runqlat.bt 14,36,36,363,14 14,36,36,388,14 14,16,16,135,14
tcpretrans.bt 3262,14,875 3262,14,1769 1229,14,875
threadsnoop.bt 117,14 117,14 117,14
statsnoop.bt 181,36,36,36,36,161,161,161,161,14 181,36,36,36,36,195,195,195,195,14 181,15,15,15,15,108,108,108,108,14
tcpdrop.bt 3278,14,975 3279,14,1915 1246,14,894
vfsstat.bt 14,41,14,42,42,42,42,42,42,42,42 14,41,14,57,57,57,57,57,57,57,57 14,41,14,21,21,21,21,21,21,21,21
oomkill.bt 14,14 14,14 14,14
syncsnoop.bt 177,70,72,70,72,72,72 177,70,72,70,72,72,72 177,43,44,43,44,44,44
syscount.bt 14,162,14 14,203,14 14,53,14
swapin.bt 41,14 41,14 41,14
execsnoop.bt 101,218 101,707 101,218
killsnoop.bt 227,93,171 227,93,270 227,29,115
bitesize.bt 14,198,14 14,221,14 14,177,14
gethostlatency.bt 193,14,216,216,216,94,94 193,14,275,275,275,94,94 193,14,163,163,163,30,30
tcplife.bt 490,14,1015 490,14,2110 490,14,635

@@ -141,6 +141,8 @@ OPTIONS:
-p PID enable USDT probes on PID
-c 'CMD' run CMD and enable USDT probes on resulting process
-v verbose messages
--check-all-helper check return codes of all helper functions
Copy link
Contributor

@fbs fbs Apr 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
--check-all-helper check return codes of all helper functions
--check-helper-errors emit a warning when a bpf helper returns an error

I think this is a bit more descriptive.

Also alignment :)

src/ast/irbuilderbpf.cpp Outdated Show resolved Hide resolved
src/ast/irbuilderbpf.cpp Outdated Show resolved Hide resolved
src/ast/irbuilderbpf.cpp Outdated Show resolved Hide resolved
src/main.cpp Outdated
@@ -57,14 +57,16 @@ void usage()
std::cerr << " --unsafe allow unsafe builtin functions" << std::endl;
std::cerr << " -v verbose messages" << std::endl;
std::cerr << " --info Print information about kernel BPF support" << std::endl;
std::cerr << " --check-all-helper check return codes of all helper functions" << std::endl;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment and alignment thing here

@fbs
Copy link
Contributor

fbs commented Apr 17, 2020

Nice, this is useful :)

Seems like the processed instruction count is actually reduced in the --disable-helper-check case, what is going on there?

@mmisono
Copy link
Collaborator Author

mmisono commented Apr 17, 2020

Thanks for the review, will fix.

"default" in the above table means the default behavior when this patch applied (so it checks errors other than bpf_probe_read_xxx / bpf_map_lookup_elem). --disable-helper-check does not check error at all, and this is the same behavior as the current.

@mmisono
Copy link
Collaborator Author

mmisono commented Apr 22, 2020

I examined the generated code. Basically, this error check creates one branch and adds 10 instructions per helper function call. A problem is a branch creates another state and the verifier needs to explore paths both that take a branch and not. This is the reason the processed instructions sharply increase in scripts that call many helper functions. The verifier prunes a search if two states are equivalent, but unfortunately not.

I'm wondering if we can implement this functionality using bpf-to-bpf call. I guess it reduces states. I'll give it a try later. (there is little information about this..)

@mmisono mmisono force-pushed the report_helper_errors branch 2 times, most recently from a0c1e8d to 027c544 Compare April 27, 2020 12:32
@mmisono
Copy link
Collaborator Author

mmisono commented Apr 27, 2020

I fixed the reviewed points and changed the option since the initial version was a bit confusing:

  • -k : emit a warning when a bpf helper returns an error, except read functions (bpf_probe_read_xxx(), bpf_map_lookup_elem()).
    • The reason excludes these functions is described in the commit message.
  • -kk: check all helper function errors.

Therefore, if these options are not given, program behavior is the same as the current. What do you think about this? @fbs

(As I wrote, I'm seeing if bpf-to-bpf call is usable, but I don't plan to include it in this PR even if it works.)

@danobi
Copy link
Member

danobi commented May 13, 2020

Nice, this looks really useful. Code looks good to me. Can you add some codegen tests with the error checks and also a runtime test or two? Just so we don't accidentally break this later.

Sorry about the delay, didn't see this PR. In the future feel free to ping me if PR get delayed too much.

src/bpftrace.h Outdated Show resolved Hide resolved
BasicBlock *helper_merge_block = BasicBlock::Create(module_.getContext(),
"helper_merge",
parent);
auto *ret = CreateIntCast(return_value, getInt32Ty(), true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you leave that as a comment in the code? Will help next person who wonders why there's a cast.

@mmisono
Copy link
Collaborator Author

mmisono commented May 17, 2020

Updated:

  • Rebased onto master
  • Use unordered map instead of map
  • Add comments about IntCast
  • Add codegen and runtime tests

Please take a look when you have time @danobi

Some helper functions return error codes when failing, but currently,
bpftrace does not check and discards it. Several times I had to use bcc
to check error codes for debugging.

This patch adds an error checking mechanism, and helper function errors
are reported in runtime using '-k' option. For example:

```
% sudo BPFTRACE_MAP_KEYS_MAX=2 ./src/bpftrace -e \
  'BEGIN { @[0] = 0; @[1] = 1; @[2] = 2;} -k'
Attaching 1 probe...
stdin:1:35-36: WARNING: Failed to map_update_elem: Argument list too long (-7)
BEGIN { @[0] = 0; @[1] = 1; @[2] = 2;}
                                 ~

^C

@[0]: 0
@[1]: 1
```

This mechanism is implemented as an async action. By default, errors of
`bpf_probe_read()`-related functions and `bpf_map_lookup_elem()` are not
checked. This is because when these functions fail, bpftrace returns
zero as a result, and some scripts already check it, for example, by
using predicate `/ @[tid] /`. `-kk` option forces to check all errors
including these functions.

Note that the error check increases code size. All codegen tests are
performed with the error check disabled.
Helper error checking increases code size and some scripts requires more
bigger log size.
tests/runtime/other Outdated Show resolved Hide resolved
@mmisono mmisono force-pushed the report_helper_errors branch 2 times, most recently from 811059e to 181c0cb Compare May 18, 2020 21:45
@danobi danobi merged commit 5ec9148 into bpftrace:master May 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants