@pzakha pzakha commented Oct 27, 2020

This updates bcc to the latest changes from upstream.

Testing

ab-pre-push: http://selfservice.jenkins.delphix.com/job/devops-gate/job/master/job/appliance-build-orchestrator-pre-push/4217/

gfx and others added 30 commits March 8, 2020 22:27
… value.

In an environment with a high rate of software interrupts:
<idle>-0     [003] ..s1   106.421020: softirq_entry: vec=6 [action=TASKLET]
<idle>-0     [000] ..s1   106.421063: softirq_entry: vec=3 [action=NET_RX]
<idle>-0     [003] ..s1   106.421083: softirq_exit: vec=6 [action=TASKLET]

From the ftrace log above, we can see that the correct vec=6 start timestamp is overwritten by the vec=3 entry,
because both events carry the same PID (idle-0) even though they run on different CPUs. Calculating the delta then produces a wrong result.
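
The overwrite can be reproduced with a small simulation. This is only an illustrative sketch, not the code from softirqs.py: the event tuples are transcribed from the ftrace excerpt above (timestamps converted to integer microseconds), and `deltas` is a hypothetical helper.

```python
# (pid, cpu, timestamp_us, kind, vec), transcribed from the ftrace excerpt
events = [
    (0, 3, 106421020, "entry", 6),
    (0, 0, 106421063, "entry", 3),
    (0, 3, 106421083, "exit", 6),
]

def deltas(events, key_fn):
    """Pair each exit with the entry timestamp stored under the same key
    (hypothetical helper, for illustration only)."""
    start = {}
    out = []
    for pid, cpu, ts, kind, vec in events:
        key = key_fn(pid, cpu)
        if kind == "entry":
            start[key] = ts          # a later entry with the same key overwrites
        elif key in start:
            out.append((vec, ts - start.pop(key)))
    return out

# Keyed by PID only: the vec=3 entry on CPU 0 clobbers the vec=6 start time.
by_pid = deltas(events, lambda pid, cpu: pid)
# Keyed by (PID, CPU): each CPU keeps its own start time.
by_pid_cpu = deltas(events, lambda pid, cpu: (pid, cpu))

print(by_pid)
print(by_pid_cpu)
```

With the PID-only key the vec=6 latency is measured from the wrong entry; with the CPU in the key it is measured from the matching one.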
Documentation (man page and example text) updated.
Some processes can do a lot of security capability checks, generating a
lot of output. In this case, the --unique option is useful to print each
combination of capability, pid (or cgroup if --cgroupmap is used)
and kernel/user stacks (if -K or -U are used) only once.

  # ./capable.py -K -U --unique

Documentation (man page and example text) updated.
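
The de-duplication idea behind --unique can be sketched in user space as follows. This is an assumption-laden illustration, not how capable.py implements it: the event dictionaries, their field names, and `unique_events` are all hypothetical.

```python
def unique_events(events, use_cgroup=False):
    """Yield only the first event for each (capability, pid or cgroup,
    kernel stack, user stack) combination. Hypothetical helper."""
    seen = set()
    for ev in events:
        owner = ev["cgroup"] if use_cgroup else ev["pid"]
        key = (ev["cap"], owner, ev.get("kernel_stack"), ev.get("user_stack"))
        if key in seen:
            continue
        seen.add(key)
        yield ev

# Made-up sample events, for illustration only.
sample = [
    {"cap": 21, "pid": 100, "cgroup": 7, "kernel_stack": 1, "user_stack": 2},
    {"cap": 21, "pid": 100, "cgroup": 7, "kernel_stack": 1, "user_stack": 2},
    {"cap": 21, "pid": 101, "cgroup": 7, "kernel_stack": 1, "user_stack": 2},
]
per_pid = list(unique_events(sample))
per_cgroup = list(unique_events(sample, use_cgroup=True))
```

Keying on the cgroup instead of the pid collapses events from different processes in the same cgroup into one line.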
Before this patch, Dockerfile.ubuntu was generating an oversized image.

                       Uncompressed size      Compressed size
Before this patch:     1.3GB                  390MB
After this patch:      250MB                  110MB
* libbpf-tools: add CO-RE opensnoop

* libbpf-tools/opensnoop: feedback
Update vmlinux.h to a version generated from same v5.5 tag and default config
with cherry-picked 1aae4bdd7879 ("bpf: Switch BPF UAPI #define constants used
from BPF program side to enums") on top of it. This adds lots of BPF helper
flags often useful from BPF program side.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
softirqs: Combined CPU as part of the key is necessary to avoid amiss…
Sync libbpf to latest revision. It brings latest BPF headers, among other
things.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Remove BPF_F_CURRENT_CPU definitions, which are now provided by vmlinux.h
after 1aae4bdd7879 ("bpf: Switch BPF UAPI #define constants used from BPF
program side to enums") commit in kernel.

Fix potential uninitialized read warning in opensnoop.

Also add opensnoop to .gitignore.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
BPF code is compiled with -target bpf, but for the PT_REGS_PARM macros (and by
extension the BPF_KPROBE/BPF_KRETPROBE macros as well), it's important to know
what the original architecture of the target host is, in order to use the correct definition of
struct pt_regs. Determine that based on the output of `uname -m` (taking into
account that both x86_64 and x86 are defined as x86 internally in the kernel).

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
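
The mapping described above can be sketched like this. It is a minimal illustration rather than the actual build logic: `kernel_arch` is a hypothetical helper, `platform.machine()` stands in for `uname -m`, and the aarch64 entry is an assumption that goes beyond what the commit message states.

```python
import platform
import re

def kernel_arch(machine=None):
    """Map a `uname -m` style machine name to a kernel architecture name.
    Hypothetical helper, for illustration only."""
    machine = machine or platform.machine()
    # x86_64 and 32-bit x86 are both "x86" inside the kernel tree.
    if re.fullmatch(r"x86_64|x86|i[3-6]86", machine):
        return "x86"
    # Assumption (not stated in the commit message): aarch64 is "arm64".
    if machine == "aarch64":
        return "arm64"
    return machine

print(kernel_arch("x86_64"))
```

The result would typically be handed to the compiler as a target-architecture define so that the right struct pt_regs layout is selected.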
The tracepoint inet_sock_set_state only exists in kernels 4.15 and later.
Backported the BPF tracepoint logic to use kprobes on older kernels.
Default v5.5 kernel config doesn't have most of BPF-related functionality
enabled, which leads to vmlinux.h not containing a lot of useful constants.
This patch contains re-generated vmlinux.h from kernel built with default
config plus minimal changes to enable most (all?) BPF-relevant parts of
kernel. Here's a list of added options:

CONFIG_BPF_EVENTS=y
CONFIG_BPFILTER_UMH=m
CONFIG_BPFILTER=y
CONFIG_BPF_JIT=y
CONFIG_BPF_KPROBE_OVERRIDE=y
CONFIG_BPF_STREAM_PARSER=y
CONFIG_BPF_SYSCALL=y
CONFIG_CC_HAS_ASM_INLINE=y
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_CC_HAS_SANCOV_TRACE_PC=y
CONFIG_CGROUP_BPF=y
CONFIG_GCC_VERSION=70300
CONFIG_IPV6_MULTIPLE_TABLES=y
CONFIG_IPV6_SEG6_BPF=y
CONFIG_IPV6_SEG6_LWTUNNEL=y
CONFIG_LIBCRC32C=y
CONFIG_LWTUNNEL_BPF=y
CONFIG_LWTUNNEL=y
CONFIG_NET_ACT_BPF=y
CONFIG_NET_CLS_BPF=y
CONFIG_NETFILTER_ADVANCED=y
CONFIG_NETFILTER_XT_MATCH_BPF=y
CONFIG_NET_SOCK_MSG=y
CONFIG_NF_CT_PROTO_DCCP=y
CONFIG_NF_CT_PROTO_SCTP=y
CONFIG_NF_CT_PROTO_UDPLITE=y
CONFIG_SOCK_CGROUP_DATA=y
CONFIG_STREAM_PARSER=y
CONFIG_XDP_SOCKETS_DIAG=y
CONFIG_XDP_SOCKETS=y

To make this vmlinux.h generation process easier for future adjustments (e.g.,
if some of the tools need types that the default config compiles out), check
in the Kconfig that was used alongside vmlinux.h itself.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
And moving it to common.cc in order to be able to make
automated tests for it. Following patches are adding
automated test for this function and it seems too much
to link in all the clang/llvm stuff to the test binary
just for single function test.

Adding ebpf::parse_tracepoint that takes istream of the
tracepoint format data and returns tracepoint struct
as std::string.

No functional change is intended or expected.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
The test prepares a tracepoint format file and runs
ebpf::parse_tracepoint on it. Then it compares the
expected struct output with the actual function result.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
There's an issue with the tracepoint format in the current RHEL real-time kernel,
which makes bcc-tools return wrong data.

Two new 'common_' fields were added, which causes two issues for the tracepoint
format parser.

First issue
  - is the gap between the common fields and the other fields, which is not
    picked up by the parser, so the resulting struct is not aligned with
    the data.

Second issue
  - is the fact that current parser covers common fields with:
      u64 __do_not_use__
    so the new common fields are not accounted for.

    This issue is solved in the following patch. I kept both
    issues and fixes separated to make the change readable.

There's a 'not described gap' in the sched_wakeup's format file and
probably in other formats as well:

Having:
  # cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/format
  name: sched_wakeup
  ID: 310
  format:
          field:unsigned short common_type;       offset:0;       size:2; signed:0;
          field:unsigned char common_flags;       offset:2;       size:1; signed:0;
          field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
          field:int common_pid;   offset:4;       size:4; signed:1;
          field:unsigned char common_migrate_disable;     offset:8;       size:1; signed:0;
          field:unsigned char common_preempt_lazy_count;  offset:9;       size:1; signed:0;

          field:char comm[16];    offset:12;      size:16;        signed:1;
          field:pid_t pid;        offset:28;      size:4; signed:1;
          field:int prio; offset:32;      size:4; signed:1;
          field:int success;      offset:36;      size:4; signed:1;
          field:int target_cpu;   offset:40;      size:4; signed:1;

There's "common_preempt_lazy_count" field on offset 9 with size 1:
        common_preempt_lazy_count;  offset:9;       size:1;

and it's followed by "comm" field on offset 12:
        field:char comm[16];    offset:12;      size:16;        signed:1;

which leaves a 2-byte gap in between that might confuse applications
such as bpftrace or the bcc-tools library.

The tracepoint parser builds a struct out of the field descriptions,
but does not account for such gaps.

I posted a patch to fix this [1] in the RT kernel, but that might take a while,
and we can easily fix our tracepoint parser to work around this issue.

Add code to detect these gaps and insert 1-byte __pad_X fields, where X is
the offset number.

[1] https://lore.kernel.org/linux-rt-users/20200221153541.681468-1-jolsa@kernel.org/
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
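
The gap detection described in the commit message above can be sketched in Python. This is a rough model under stated assumptions, not the C++ parser in bcc: `FIELD_RE` and `struct_members` are hypothetical names, and the padding type `char` is assumed from the "1 byte" wording.

```python
import re

# Matches one "field:...; offset:N; size:M;" line of a tracepoint format file.
FIELD_RE = re.compile(
    r"field:(?P<decl>[^;]+);\s*offset:(?P<offset>\d+);\s*size:(?P<size>\d+);"
)

def struct_members(format_text):
    """Return struct member declarations, inserting one-byte __pad_X members
    wherever a field starts after the end of the previous one.
    Hypothetical helper, for illustration only."""
    members = []
    expected = None  # offset at which the next field should start
    for m in FIELD_RE.finditer(format_text):
        offset = int(m.group("offset"))
        size = int(m.group("size"))
        if expected is not None:
            for pad in range(expected, offset):
                members.append("char __pad_%d;" % pad)
        members.append(m.group("decl").strip() + ";")
        expected = offset + size
    return members

# The two adjacent fields quoted from the sched_wakeup format file.
sample = """
field:unsigned char common_preempt_lazy_count;  offset:9;       size:1; signed:0;

field:char comm[16];    offset:12;      size:16;        signed:1;
"""
members = struct_members(sample)
print("\n".join(members))
```

For the sched_wakeup excerpt, the sketch emits padding for offsets 10 and 11 so that `comm` lands at offset 12.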
Current parser covers common fields with:
  u64 __do_not_use__
so the new common fields are not accounted for.

Keeping the 'u64 __do_not_use__' field for backward compatibility
(who knows who's actually using it) and adding new fields, like:
  char __do_not_use__X

for each byte of extra common fields, where X is the offset of the
field.

With this fix, bcc-tools correctly parses tracepoints on the RT kernel
and is usable again.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Pull in latest libbpf changes. In particular, containing BPF_KRETPROBE fix.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Aggregates incoming network traffic and
outputs the source IP, the destination IP, their traffic count, and the current time.

Co-authored-by: gist-banana <gist.banana@gist.ac.kr>
* make -DCMAKE_INSTALL_PREFIX=/usr the default
* remove -DCMAKE_INSTALL_PREFIX=/usr from INSTALL.md, as it is now the default
Signed-off-by: Wenbo Zhang <ethercflow@gmail.com>
Commit c347fe6 ("Support kfunc in opensnoop.py") introduces an
alternative probe on do_sys_open() with kfuncs instead of kprobes. This
new implementation is used if the kernel supports it. But it removed the
--cgroupmap filter added in commit b2aa29f ("tools: cgroup
filtering in execsnoop/opensnoop").

This patch adds the --cgroupmap filter in the kfunc implementation.
Fixes the following error on aarch64:

bpf: Failed to load program: Permission denied
; struct sock *sk = ctx->regs[0]; int copied = ctx->regs[1];
0: (79) r8 = *(u64 *)(r1 +8)
...
; struct ipv6_key_t ipv6_key = {.pid = pid};
79: (63) *(u32 *)(r10 -48) = r7
; struct ipv6_key_t ipv6_key = {.pid = pid};
80: (7b) *(u64 *)(r10 +8) = r9
invalid stack off=8 size=8
processed 96 insns (limit 1000000) max_states_per_insn 0 total_states 7 peak_states 7 mark_read 4
Currently, with the ascending sort, the useful information
is commonly beyond the bottom of the terminal window,
and it is necessary to reverse the sort manually on every execution.

Signed-off-by: Mark Kogan <mkogan@redhat.com>
sam-lunt and others added 15 commits October 14, 2020 00:02
The bcc packages were recently moved from the AUR to the standard Arch
repos, so the installation instructions for Arch should be updated to
reflect that.

Additionally, the Arch "linux-lts" package was upgraded to 4.4.7 in
April 2016, so it is safe to assume that any Arch installations that are
still in use have been upgraded by this point. Therefore, the note about
upgrading to kernel 4.3.1 is removed.
I tested test_usdt2.py on aarch64 with kernel-5.9-rc5,
gcc-9.2.1 and glibc-2.32.
With the current range of 3, the test always fails:
======================================================================
FAIL: test_attach1 (__main__.TestUDST)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_usdt2.py", line 170, in test_attach1
    self.assertTrue(self.evt_st_6 == 1)
AssertionError: False is not true

----------------------------------------------------------------------
Ran 1 test in 1.068s

FAILED (failures=1)

Signed-off-by: Chunmei Xu <xuchunmei@linux.alibaba.com>
Signed-off-by: Juraj Vijtiuk <juraj.vijtiuk@sartura.hr>
Signed-off-by: Wenbo Zhang <ethercflow@gmail.com>
* Catch TypeError raised by Python3 glob and fall back to Python2 interface
Otherwise, we have:
  ...
    File "bcc/tools/netqtop.py", line 54
      print(hd.center(COL_WIDTH)),
                                 ^
  TabError: inconsistent use of tabs and spaces in indentation
  ...

    File "bcc/tools/tcprtt.py", line 117, in <module>
    bpf_text = bpf_text.replace(b'LPORTFILTER', b'')
  TypeError: replace() argument 1 must be str, not bytes
  ...

Signed-off-by: Yonghong Song <yhs@fb.com>
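
The second traceback can be reproduced in isolation on Python 3. The snippet below is only a sketch: the `bpf_text` contents are made up, and only the `replace()` call mirrors the failing line from tcprtt.py.

```python
# Made-up program text; only the LPORTFILTER placeholder matters here.
bpf_text = "int filter = LPORTFILTER;"

# On Python 3, str.replace() rejects bytes arguments with a TypeError.
raised = False
try:
    bpf_text.replace(b'LPORTFILTER', b'')
except TypeError as err:
    raised = True
    print("TypeError:", err)

# Using str arguments on a str object works.
fixed = bpf_text.replace('LPORTFILTER', '')
print(fixed)
```

Keeping the program text and the replacement arguments of the same type (both str or both bytes) avoids the error.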
* fix netqtop Python 3 compatibility
* delete import types
This adds support to push docker images to quay.io, like other projects in
the iovisor org.

It separates docker image builds into a separate github workflow, and
refactors the package building process slightly, to be generic, in order to
create builds for both ubuntu 16.04 and ubuntu 18.04.

This provides a means to distribute intermediate apt packages between releases,
and also enables uploading these as CI artifacts.

As recent releases have not annotated their tags, it drops the requirement for
tags to be annotated in selecting the version to use.
Importing Abstract Base Classes (ABCs) from the collections module
has been deprecated since Python 3.3, has emitted a warning since Python 3.8,
and will stop working in Python 3.10.

Try importing MutableMapping from collections.abc (the preferred way since
Python 3.3) and, in case of error (Python < 3.3) fall back to importing it
from collections.
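
The import fallback described above follows a common pattern, sketched here. The `Registry` class is a hypothetical example added only to show the imported name in use.

```python
try:
    # Preferred location since Python 3.3.
    from collections.abc import MutableMapping
except ImportError:
    # Fallback for Python < 3.3, where collections.abc does not exist.
    from collections import MutableMapping

class Registry(MutableMapping):
    """Minimal dict-backed mapping (hypothetical, for illustration only)."""
    def __init__(self):
        self._data = {}
    def __getitem__(self, key):
        return self._data[key]
    def __setitem__(self, key, value):
        self._data[key] = value
    def __delitem__(self, key):
        del self._data[key]
    def __iter__(self):
        return iter(self._data)
    def __len__(self):
        return len(self._data)

reg = Registry()
reg["probe"] = 1
print(len(reg))
```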
@sebroy sebroy left a comment

On testing: Can you check that our bcc-based estat tools, as well as the analytics back-end are all healthy post this change? Blackbox has an analytics scope that you can run manually, it's not run by default as part of pre-push.

@pzakha
Contributor Author

pzakha commented Oct 27, 2020

@sebroy Do you happen to know what's the name of the test suite that includes analytics?

@sebroy
sebroy commented Oct 27, 2020

Yeah, @pzakha it's analytics_positive, which is part of os_tests.

@pzakha
Contributor Author

pzakha commented Oct 27, 2020

Contributor

@prakashsurya prakashsurya left a comment

I didn't look through all changes, but scanned the files modified, and this LGTM.. My only concern is pulling in the github workflow file, which doesn't look relevant to us.

@pzakha
Contributor Author

pzakha commented Oct 28, 2020

@prakashsurya I've disabled the publish.yml workflow by moving it to a different directory.
@sebroy Could you take another look at this?

@delphix-devops-bot delphix-devops-bot merged commit d37492a into delphix:master Oct 29, 2020
prakashsurya pushed a commit that referenced this pull request Nov 8, 2022
…for -v option

Add additional information and change format of backtrace
- add symbol base offset, dso name, dso base offset
- symbol and dso info is included if it's available in target binary
- changed format:
INDEX ADDR [SYMBOL+OFFSET] (MODULE+OFFSET)

Print backtrace of ip if it failed to get syms.

Before:
  # offcputime -v
    psiginfo
    vscanf
    __snprintf_chk
    [unknown]
    [unknown]
    [unknown]
    [unknown]
    [unknown]
    sd_event_exit
    sd_event_dispatch
    sd_event_run
    [unknown]
    __libc_start_main
    [unknown]
    -                systemd-journal (204)
        1

    xas_load
    xas_find
    filemap_map_pages
    __handle_mm_fault
    handle_mm_fault
    do_page_fault
    do_translation_fault
    do_mem_abort
    do_el0_ia_bp_hardening
    el0_ia
    xas_load
    --
failed to get syms
      -                PmLogCtl (138757)
        1

After:
  # offcputime -v
    #0  0xffffffc01018b7e8 __arm64_sys_clock_nanosleep+0x0
    #1  0xffffffc01009a93c el0_svc_handler+0x34
    #2  0xffffffc010084a08 el0_svc+0x8
    #3  0xffffffc01018b7e8 __arm64_sys_clock_nanosleep+0x0
    --
    #4  0x0000007fa0bffd14 clock_nanosleep+0x94 (/usr/lib/libc-2.31.so+0x9ed14)
    #5  0x0000007fa0c0530c nanosleep+0x1c (/usr/lib/libc-2.31.so+0xa430c)
    #6  0x0000007fa0c051e4 sleep+0x34 (/usr/lib/libc-2.31.so+0xa41e4)
    #7  0x000000558a5a9608 flb_loop+0x28 (/usr/bin/fluent-bit+0x52608)
    #8  0x000000558a59f1c4 flb_main+0xa84 (/usr/bin/fluent-bit+0x481c4)
    #9  0x0000007fa0b85124 __libc_start_main+0xe4 (/usr/lib/libc-2.31.so+0x24124)
    #10 0x000000558a59d828 _start+0x34 (/usr/bin/fluent-bit+0x46828)
    -                fluent-bit (1238)
        1

    #0  0xffffffc01027daa4 generic_copy_file_checks+0x334
    #1  0xffffffc0102ba634 __handle_mm_fault+0x8dc
    #2  0xffffffc0102baa20 handle_mm_fault+0x168
    #3  0xffffffc010ad23c0 do_page_fault+0x148
    #4  0xffffffc010ad27c0 do_translation_fault+0xb0
    #5  0xffffffc0100816b0 do_mem_abort+0x50
    #6  0xffffffc0100843b0 el0_da+0x1c
    #7  0xffffffc01027daa4 generic_copy_file_checks+0x334
    --
    #8  0x0000007f8dc12648 [unknown]
    #9  0x0000007f8dc0aef8 [unknown]
    #10 0x0000007f8dc1c990 [unknown]
    #11 0x0000007f8dc08b0c [unknown]
    #12 0x0000007f8dc08e48 [unknown]
    #13 0x0000007f8dc081c8 [unknown]
    -                PmLogCtl (2412)
        1

Fixed: #3884
Signed-off-by: Eunseon Lee <es.lee@lge.com>
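
The "INDEX ADDR [SYMBOL+OFFSET] (MODULE+OFFSET)" layout from the commit message above can be mocked up in Python. This is an illustrative sketch only, not the tool's implementation: `format_frame` and its parameters are hypothetical, and the column widths are inferred from the sample output.

```python
def format_frame(index, addr, symbol=None, sym_offset=0, module=None, mod_offset=0):
    """Render one backtrace line as INDEX ADDR [SYMBOL+OFFSET] (MODULE+OFFSET).
    Hypothetical helper, for illustration only."""
    line = "#%-2d 0x%016x" % (index, addr)
    # Symbol information is printed only when it could be resolved.
    line += " %s+0x%x" % (symbol, sym_offset) if symbol else " [unknown]"
    # Module (DSO) information is appended only when available.
    if module:
        line += " (%s+0x%x)" % (module, mod_offset)
    return line

resolved = format_frame(4, 0x7fa0bffd14, "clock_nanosleep", 0x94,
                        "/usr/lib/libc-2.31.so", 0x9ed14)
unresolved = format_frame(8, 0x7f8dc12648)
print(resolved)
print(unresolved)
```

Frames without symbol information still print their address, which matches the intent of printing the backtrace of the ip when symbols cannot be resolved.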