Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

silent hash map truncation to 4096 #2809

Open
mjguzik opened this issue Oct 21, 2023 · 5 comments
Open

silent hash map truncation to 4096 #2809

mjguzik opened this issue Oct 21, 2023 · 5 comments
Labels

Comments

@mjguzik
Copy link

mjguzik commented Oct 21, 2023

I tried to find out how many chains in a hash table are unused in a certain workload and added a probe to record all computed hash values, then used it like so:

kprobe:hashval { @hash[arg0] = count(); }

Then I found the list is truncated to 4096 elements, without any warning. I only spotted the problem because the number is round.

Since one only gets a partial result without being told this is the case, I consider this to be a bug. It is easy to trip over it and not realize unless you count entries.

So, bare minimum this should warn entries got dropped. Best case is autosizing as RAM permits and only warning if it can't be done. I'm guessing this would require patching eBPF, but one has to start with a report somewhere.

bpftrace --info output

System
OS: Linux 6.6.0-rc6+ #367 SMP PREEMPT_DYNAMIC Sat Oct 21 06:52:28 EDT 2023
Arch: x86_64

Build
version: v0.17.0
LLVM: 14.0.6
unsafe uprobe: no
bfd: no
libdw (DWARF support): yes

Kernel helpers
probe_read: yes
probe_read_str: yes
probe_read_user: yes
probe_read_user_str: yes
probe_read_kernel: yes
probe_read_kernel_str: yes
get_current_cgroup_id: yes
send_signal: yes
override_return: no
get_boot_ns: yes
dpath: yes
skboutput: no

Kernel features
Instruction limit: 1000000
Loop support: yes
btf: yes
map batch: yes
uprobe refcount (depends on Build:bcc bpf_attach_uprobe refcount): yes

Map types
hash: yes
percpu hash: yes
array: yes
percpu array: yes
stack_trace: yes
perf_event_array: yes

Probe types
kprobe: yes
tracepoint: yes
perf_event: yes
kfunc: yes
iter:task: yes
iter:task_file: yes
kprobe_multi: no
raw_tp_special: yes

@mjguzik mjguzik added the bug Something isn't working label Oct 21, 2023
@fbs
Copy link
Contributor

fbs commented Oct 21, 2023

Have you tried setting BPFTRACE_MAP_KEYS_MAX

Do agree that it needs better warnings, silently 'corrupting' data is bad

@danobi
Copy link
Member

danobi commented Oct 21, 2023

-k will report this:

dxu@fedora~ $ sudo BPFTRACE_MAP_KEYS_MAX=1 bpftrace -e 'BEGIN { @[1] = 1; @[2] = 2; exit() }'
Attaching 1 probe...


@[1]: 1
dxu@fedora~ $ sudo BPFTRACE_MAP_KEYS_MAX=1 bpftrace -e 'BEGIN { @[1] = 1; @[2] = 2; exit() }' -k
Attaching 1 probe...
stdin:1:24-25: WARNING: Failed to map_update_elem: Argument list too long (-7)
BEGIN { @[1] = 1; @[2] = 2; exit() }
                       ~


@[1]: 1

Perhaps this should be on by default?

@mjguzik
Copy link
Author

mjguzik commented Oct 21, 2023

I did not know about BPFTRACE_MAP_KEYS_MAX, googling did not show it. Thanks for the tip.

-k will report this:
stdin:1:24-25: WARNING: Failed to map_update_elem: Argument list too long (-7)
Perhaps this should be on by default?

It does not have to outright fail, a warning would be sufficient.

Perhaps the message can be trivially adjusted to tell the user what to tweak?

@jordalgo
Copy link
Contributor

jordalgo commented Jan 7, 2024

Perhaps the message can be trivially adjusted to tell the user what to tweak?

The error message "Failed to map_update_elem" doesn't come from bpftrace (it comes from libbpf I believe) so updating it is not trivial. Maybe we could detect the return code (E2BIG) and log an additional message there.

jordalgo pushed a commit to jordalgo/bpftrace that referenced this issue Jan 7, 2024
This is for additional helpful messaging for specific errors e.g.
for when the number of bpf map entries is exceeded.

Follow up to: bpftrace#2809
jordalgo pushed a commit to jordalgo/bpftrace that referenced this issue Jan 7, 2024
This is for additional helpful messaging for specific errors e.g.
for when the number of bpf map entries is exceeded.

Follow up to: bpftrace#2809
@danobi
Copy link
Member

danobi commented Jan 10, 2024

Additional error msg seems reasonable

jordalgo pushed a commit to jordalgo/bpftrace that referenced this issue Jan 10, 2024
This is for additional helpful messaging for specific errors e.g.
for when the number of bpf map entries is exceeded.

Follow up to: bpftrace#2809
jordalgo pushed a commit to jordalgo/bpftrace that referenced this issue Jan 10, 2024
This is for additional helpful messaging for specific errors e.g.
for when the number of bpf map entries is exceeded.

Follow up to: bpftrace#2809
ajor pushed a commit that referenced this issue Jan 15, 2024
This is for additional helpful messaging for specific errors e.g.
for when the number of bpf map entries is exceeded.

Follow up to: #2809
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants