
Releases: anakryiko/retsnoop

retsnoop v0.8.2

08 Jul 18:15

A few more usability improvements:

  • allow tracing functions from kernel modules;
  • default LBR mode to any_return instead of the more detailed any;
  • improve error and return value output to distinguish pointers, integers, and errors, where possible;
  • slight cleanup of LBR output.
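The pointer vs integer vs error distinction can be illustrated with a small C sketch. It assumes the return type kind is already known (the hypothetical `enum ret_kind` stands in for however that is determined) and relies on the kernel convention that values in [-4095, -1], the range checked by `IS_ERR_VALUE()`, are error codes. `format_ret` and its short errno name table are illustrative only, not retsnoop's actual code.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the traced function's return type, if known. */
enum ret_kind { RET_INT, RET_PTR, RET_UNKNOWN };

/* Kernel convention: values in [-MAX_ERRNO, -1] are error codes (see IS_ERR_VALUE). */
#define MAX_ERRNO 4095

static int is_err_value(long long ret)
{
	return ret < 0 && ret >= -MAX_ERRNO;
}

/* A few symbolic names for illustration; other codes fall back to numbers. */
static const char *err_name(long long err)
{
	switch (err) {
	case -ENOENT: return "-ENOENT";
	case -ENOMEM: return "-ENOMEM";
	case -EINVAL: return "-EINVAL";
	default: return NULL;
	}
}

/* Format a raw 64-bit return value according to its (assumed) type kind. */
static void format_ret(enum ret_kind kind, long long ret, char *buf, size_t n)
{
	if (is_err_value(ret)) {
		const char *name = err_name(ret);

		if (name)
			snprintf(buf, n, "[%s]", name);
		else
			snprintf(buf, n, "[%lld]", ret);
	} else if (kind == RET_PTR) {
		/* NULL is reported specially; other pointers are shown in hex */
		if (ret == 0)
			snprintf(buf, n, "NULL");
		else
			snprintf(buf, n, "0x%llx", (unsigned long long)ret);
	} else if (kind == RET_INT) {
		snprintf(buf, n, "%lld", ret);
	} else {
		/* unknown type: show both the signed and the hex interpretation */
		snprintf(buf, n, "%lld (0x%llx)", ret, (unsigned long long)ret);
	}
}
```

Checking the error range first means an `ERR_PTR(-ENOMEM)` returned from a pointer-returning function is still reported as an error rather than as a bogus address.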

retsnoop v0.8.1

04 Jul 03:53

A few usability improvements:

  • improved glob support: '?' is now supported, and multiple '*' wildcards can appear anywhere in the pattern;
  • fixed multi-kprobe kernel support detection (take into account CONFIG_FPROBE);
  • filter out __ftrace_invalid_address___xxx fake kprobe entries;
  • don't filter out valid bpf_prog_xxx kernel functions from stack traces.

retsnoop v0.8

28 May 16:27

A few pretty big usability improvements:

  1. More flexible and compact stack trace formatting. Retsnoop now tries to determine the minimal width of each output column, so that stack traces stay aligned while using as little horizontal space as possible.
  2. The same logic is reused for formatting LBR stacks, which also allows the "from" and "to" sides of each branch to be emitted side by side ("horizontally") instead of one after the other as before. This improves comprehension significantly.
  3. Retsnoop now recognizes symbolic LBR flag aliases, making it much easier to tune what kind of LBR data to capture. E.g., --lbr=any_return will capture only returns from functions, letting you see further back into an unknown sequence of kernel function calls. This is very useful when trying to discover what's going on without knowing the particular area of the kernel you are trying to debug. By default, retsnoop effectively uses --lbr=any.
  4. --lbr-max-count N was added to limit output to the last N useful LBR records. It's not always necessary to see all 32 of them; the last 5 or so might be more than enough.
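The column-width idea from item 1 can be sketched as two passes: first measure the widest cell in each column, then pad every cell except the last to that width. This is a minimal illustration with hypothetical names (`calc_widths`, `format_row`, a fixed `NCOLS`), not retsnoop's formatter; it uses a two-space column separator as an assumption.

```c
#include <stdio.h>
#include <string.h>

#define NCOLS 3

/* Pass 1: find the widest cell in each column. */
static void calc_widths(const char *rows[][NCOLS], int nrows, int widths[NCOLS])
{
	for (int c = 0; c < NCOLS; c++) {
		widths[c] = 0;
		for (int r = 0; r < nrows; r++) {
			int len = (int)strlen(rows[r][c]);

			if (len > widths[c])
				widths[c] = len;
		}
	}
}

/*
 * Pass 2: format one row, padding each column to its width so columns line up.
 * The last column is left unpadded to avoid trailing whitespace.
 */
static void format_row(char *buf, size_t n, const char *row[NCOLS], const int widths[NCOLS])
{
	size_t off = 0;

	buf[0] = '\0';
	for (int c = 0; c < NCOLS && off < n; c++) {
		int width = (c == NCOLS - 1) ? 0 : widths[c];

		off += (size_t)snprintf(buf + off, n - off, "%s%-*s", c ? "  " : "", width, row[c]);
	}
}
```

With rows such as `{"34us [-ENOENT]", "__x64_sys_bpf+0x1c", "(kernel/bpf/syscall.c:4749:1)"}` and `{"", "do_syscall_64+0x2d", "(arch/x86/entry/common.c:46:12)"}`, every symbol starts at the same offset, and no column is wider than its longest cell.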

With all the above changes, here's an example of one captured error with LBR stack traces included. Retsnoop is run as:

$ sudo ./retsnoop -e '*sys_bpf' -a ':kernel/bpf/syscall.c' -n simfail --lbr=any_return --lbr-max-count=5

Failure is simulated with simfail:

$ sudo ./simfail bpf-bad-map-lookup-value

And here's the result:

09:24:54.846 PID 336615 (simfail):
                    entry_SYSCALL_64_after_hwframe+0x44  (arch/x86/entry/entry_64.S:112:0)
                    do_syscall_64+0x2d                   (arch/x86/entry/common.c:46:12)
    34us [-ENOENT]  __x64_sys_bpf+0x1c                   (kernel/bpf/syscall.c:4749:1)
    27us [-ENOENT]  __sys_bpf+0x1a42                     (kernel/bpf/syscall.c:4632:9)
                    . map_lookup_elem                    (kernel/bpf/syscall.c:1113:5)
!    7us [-ENOENT]  bpf_map_copy_value

[#07] migrate_disable+0x3c       (kernel/sched/core.c:1755:1)      ->  bpf_map_copy_value+0x31       (kernel/bpf/syscall.c:241:2)
[#07]                                                                  . bpf_disable_instrumentation (include/linux/bpf.h:1453:2)

[#06] array_map_lookup_elem+0x24 (kernel/bpf/arraymap.c:168:1)     ->  bpf_map_copy_value+0x1ed      (kernel/bpf/syscall.c:269:10)

[#05] rcu_read_unlock_strict+0x5 (kernel/rcu/tree_plugin.h:797:1)  ->  bpf_map_copy_value+0x18c      (include/linux/rcupdate.h:724:2)

[#04] migrate_enable+0x59        (kernel/sched/core.c:1783:1)      ->  bpf_map_copy_value+0x9e       (kernel/bpf/syscall.c:288:2)
[#04]                                                                  . maybe_wait_bpf_programs     (kernel/bpf/syscall.c:170:49)

[#03] bpf_map_copy_value+0xba    (kernel/bpf/syscall.c:291:1)      ->  __kretprobe_trampoline+0x0

retsnoop v0.7

13 Apr 03:47

Two major features:

  1. Extremely fast multi-kprobe attachment is used automatically if the kernel supports it (requires a 5.18+ kernel). This speeds up attachment, and especially detachment, immensely. It's hard to overstate this: it's seconds, potentially minutes (when attaching to a lot of functions), versus a couple of milliseconds with multi-kprobe.
  2. Error filter support. Use -x ENOMEM to report stacks that return -ENOMEM. Use -X ENOMEM to skip stacks that return -ENOMEM. NULL is treated as an error, so -x NULL and -X NULL are also supported. Multiple -x and -X options can be combined. -X takes precedence (i.e., if some error is disabled with -X, enabling it with -x won't help).
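The -x/-X precedence rule can be expressed as a small decision function. This C sketch assumes that when no -x option is given, every error that isn't excluded by -X is reported; `struct err_filter` and `should_report` are hypothetical names, and errors are compared by name (e.g. "ENOMEM", "NULL") for simplicity.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical parsed form of the -x (allow) and -X (deny) options. */
struct err_filter {
	const char **allow;
	int allow_cnt;
	const char **deny;
	int deny_cnt;
};

static bool in_list(const char **list, int cnt, const char *err)
{
	for (int i = 0; i < cnt; i++) {
		if (strcmp(list[i], err) == 0)
			return true;
	}
	return false;
}

/* Decide whether a stack that failed with `err` (e.g. "ENOMEM", "NULL") is reported. */
static bool should_report(const struct err_filter *f, const char *err)
{
	/* -X takes precedence: a denied error is never reported */
	if (in_list(f->deny, f->deny_cnt, err))
		return false;
	/* assumption: with no -x given, all (non-denied) errors are reported */
	if (f->allow_cnt == 0)
		return true;
	return in_list(f->allow, f->allow_cnt, err);
}
```

Checking the deny list first is what makes -X win: with `-x ENOMEM -x NULL -X NULL`, a NULL result is skipped even though it was also explicitly allowed.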

retsnoop v0.6

29 Nov 04:23

Lots of quality-of-life improvements:

  • Ability to specify functions by their source code locations. Use the following syntax in -e, -a and -d: ':fs/btrfs/*.c'.
  • Default to the safer kprobe mode. This can be overridden with the -F argument.
  • Symbolization with line info and inline functions is now on by default, so there's no more need to specify -ss. If the vmlinux image can't be located, retsnoop falls back to -s none (-sn), meaning no extra symbolization beyond using /proc/kallsyms.
  • Dry-run mode (--dry-run) was added, which does everything except loading and attaching BPF programs. Very useful to figure out what retsnoop will try to trace without risking affecting the system.
  • -V (--version) now prints retsnoop version.

retsnoop v0.5.1

01 Nov 22:17

Fixes potential issues with the LBR perf event by using a hardware event. No other changes compared to v0.5.

retsnoop v0.5

29 Sep 06:09

A huge milestone for retsnoop: LBR capturing!

When the kernel supports capturing LBR entries from BPF kprobe/fexit programs,
retsnoop captures those LBR records and emits the relevant ones after the captured
stack trace. This allows tracing back inside the last failed/traced function, including
logic inside inlined functions, so you can see where exactly inside a potentially large
function the error happened. Use the --lbr flag to enable this feature. If the kernel
doesn't support it, retsnoop reports this with a warning, visible in verbose mode (-v).

Relevant kernel feature was added by Song Liu in
Linux kernel commit 856c02dbce4f ("bpf: Introduce helper bpf_get_branch_snapshot").

retsnoop v0.4.1-alpha

10 Aug 00:09
Pre-release

Force line-oriented output in stdout.

retsnoop v0.4-alpha

17 Jul 07:40
Pre-release

retsnoop: add ability to log stacks with duration longer than specified

Allow requesting capture of only those stacks for which the entry function was
executing for at least the specified number of milliseconds. Use '-L 123' to
request logging stacks that took at least 123ms. This will most probably be used
with --success-stacks (-S), but should work with error-only stacks as well.
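A minimal C sketch of the -L check, assuming entry and exit timestamps are available in nanoseconds: convert the millisecond threshold to nanoseconds and compare it against the measured duration. The function name `long_enough` and the `NSEC_PER_MSEC` constant are hypothetical, not taken from retsnoop.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_MSEC 1000000ULL

/* Report the stack only if the entry function ran for at least min_ms milliseconds. */
static bool long_enough(uint64_t entry_ns, uint64_t exit_ns, uint64_t min_ms)
{
	uint64_t duration_ns = exit_ns - entry_ns;

	return duration_ns >= min_ms * NSEC_PER_MSEC;
}
```

With '-L 123', a stack whose entry function ran for exactly 123ms is logged, while one that ran a single nanosecond less is not.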

Along the way, fix an annoying issue of emitting one extra, unnecessary, duplicate
intermediate stack.

Plus success stacks should have better stack traces now.

Also clean up the usage string.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

retsnoop v0.3-alpha

16 May 06:21
Pre-release

retsnoop: report real clock timestamp

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>