[Bug]: failed on debian/bullseye64 #158

Closed
1 task done
b1tg opened this issue Feb 13, 2023 · 2 comments · Fixed by #162
b1tg commented Feb 13, 2023

Contact Details

No response

What happened?

I tested pulsar on debian/bullseye64 (kernel 5.10) and the file-system-monitor module failed to start.

Relevant log output

vagrant@bullseye:~$ curl --proto '=https' --tlsv1.2 -sSf https://raw.githubusercontent.com/Exein-io/pulsar/main/pulsar-install.sh | sh
info: downloading files
info: installing files
info: generating configuration
info: basic rules
info: cleaning
info: installation complete
vagrant@bullseye:~$ uname -a
Linux bullseye 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux
vagrant@bullseye:~$ sudo pulsard
[2023-02-13T11:17:38Z INFO  pulsar::pulsard::daemon] Starting module process-monitor
[2023-02-13T11:17:38Z INFO  pulsar::pulsard::daemon] Starting module file-system-monitor
[2023-02-13T11:17:38Z WARN  bpf_common::feature_autodetect::lsm] LSM not supported: eBPF LSM programs disabled
[2023-02-13T11:17:38Z ERROR pulsar::pulsard::module_manager] Module error in file-system-monitor: failed program load kprobe security_path_mknod
    
    Caused by:
        0: the BPF_PROG_LOAD syscall failed. Verifier output: number of funcs in func_info doesn't match number of subprogs
           verification time 25 usec
           stack depth 0+0
           processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
           
        1: Invalid argument (os error 22)
[2023-02-13T11:17:38Z INFO  pulsar::pulsard::daemon] Starting module network-monitor
[2023-02-13T11:17:38Z WARN  bpf_common::feature_autodetect::lsm] LSM not supported: eBPF LSM programs disabled
[2023-02-13T11:17:38Z INFO  pulsar::pulsard::daemon] Starting module logger
[2023-02-13T11:17:38Z INFO  pulsar::pulsard::daemon] Starting module rules-engine
[2023-02-13T11:17:39Z INFO  pulsar::pulsard::daemon] Starting module desktop-notifier

Code of Conduct

  • I agree to follow this project's Code of Conduct
@b1tg b1tg added the bug Something isn't working label Feb 13, 2023
@JuxhinDB JuxhinDB assigned MatteoNardi and banditopazzo and unassigned JuxhinDB Feb 13, 2023
@MatteoNardi
Contributor

Thanks for the bug report.

The bug is reproducible on kernel 5.10 with the test suite; the other modules work.
After some analysis, it seems to be a regression from the introduction of bpf_loop in #146.
Some notes (mostly for myself and @banditopazzo):

  • The removal of __always_inline from get_path_str causes
    the BPF_PROG_LOAD syscall failed. Verifier output: number of funcs in func_info doesn't match number of subprogs
  • Adding __always_inline back causes new relocation errors (even on my 5.15 machine). These are caused by the various BPF_CORE_READ(c.mnt_p, mnt_parent) calls. Copying the fields out of struct get_path_ctx before the BPF_CORE_READ fixes the issue. E.g. struct mount *mnt_p = c.mnt_p; BPF_CORE_READ(mnt_p, mnt_parent) will work. I wonder why the code without inline was working; it probably has to do with some compiler optimization.
  • Fixing the second issue brings back the BPF_PROG_LOAD syscall failed. Verifier output: number of funcs in func_info doesn't match number of subprogs error on 5.10. This seems to come from the callback in BPF_LOOP. Manually editing LOOP to always use the unrolled loop and reducing MAX_PATH_COMPONENTS to 10 finally solves the problem.
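
To make the third note concrete, here is a minimal userspace sketch of a LOOP macro with two expansions. It is not the pulsar source: walk_ctx, walk_step, walk_depth and fake_bpf_loop are hypothetical names, fake_bpf_loop only stands in for the real bpf_loop helper, and NOLOOP is assumed to be a compile-time switch. The point it illustrates is that the fallback calls the loop body directly, so the function's address is never taken.

```c
#define MAX_PATH_COMPONENTS 10

/* Hypothetical state carried across iterations. */
struct walk_ctx {
    int depth;      /* path components visited so far */
    int root_depth; /* depth at which the walk should stop */
};

/* Loop body: returns 1 to stop and 0 to continue. */
static int walk_step(unsigned int i, void *data)
{
    struct walk_ctx *c = data;
    (void)i;
    if (c->depth >= c->root_depth)
        return 1;
    c->depth++;
    return 0;
}

#ifdef NOLOOP
/* Fallback: bounded loop with direct calls, no function pointer involved.
 * In an eBPF build this loop would typically be unrolled by the compiler. */
#define LOOP(max, fn, ctx)                              \
    do {                                                \
        for (unsigned int _i = 0; _i < (max); _i++)     \
            if (fn(_i, (ctx)))                          \
                break;                                  \
    } while (0)
#else
/* Userspace stand-in for a callback-based loop helper (illustration only). */
static long fake_bpf_loop(unsigned int n,
                          int (*cb)(unsigned int, void *), void *data)
{
    unsigned int i;
    for (i = 0; i < n; i++)
        if (cb(i, data))
            return (long)i + 1;
    return (long)n;
}
/* Callback variant: passes the address of fn to the helper. */
#define LOOP(max, fn, ctx) fake_bpf_loop((max), fn, (ctx))
#endif

/* Walks at most MAX_PATH_COMPONENTS components and returns the depth reached. */
int walk_depth(int root_depth)
{
    struct walk_ctx c = { .depth = 0, .root_depth = root_depth };
    LOOP(MAX_PATH_COMPONENTS, walk_step, &c);
    return c.depth;
}
```

Both expansions give the same result for the caller. The fallback also shows why MAX_PATH_COMPONENTS has to stay small: the iteration bound is fixed at compile time, so deeper paths are cut off at that limit.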

The first 2 issues are easy to solve. Issue number 3 is harder. I'll do some research on the best solution.

@MatteoNardi
Contributor

The regression broke pulsar on kernels < 5.13.

On kernels that predate torvalds/linux@69c087b, taking the address of a function results in the reported verifier error.
Unfortunately, guarding the bpf_loop call with if (LINUX_KERNEL_VERSION >= KERNEL_VERSION(5,17,0)) {} is not enough: the verifier performs this check before dead code elimination.

As a solution, we'll embed two different eBPF programs: one for kernels < 5.13 and one for kernels >= 5.13.
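
Choosing between the two embedded programs at startup could look roughly like the sketch below. It is only an illustration under stated assumptions: bpf_variant, parse_kernel_release and pick_variant are hypothetical names, and falling back to the no-loop build when the release string cannot be parsed is a choice made here, not something this thread specifies.

```c
#include <stdio.h>

/* Hypothetical identifiers for the two embedded builds. */
enum bpf_variant { VARIANT_NOLOOP, VARIANT_LOOP };

/* Parses "major.minor" from a release string such as "5.10.0-21-amd64".
 * Returns 0 on success, -1 if the string does not start with "N.N". */
int parse_kernel_release(const char *release, int *major, int *minor)
{
    if (sscanf(release, "%d.%d", major, minor) != 2)
        return -1;
    return 0;
}

/* Picks the build to load: kernels >= 5.13 get the LOOP build,
 * everything else (including unparsable strings) gets the NOLOOP build. */
enum bpf_variant pick_variant(const char *release)
{
    int major, minor;

    if (parse_kernel_release(release, &major, &minor) != 0)
        return VARIANT_NOLOOP; /* assumption: prefer the conservative build */
    if (major > 5 || (major == 5 && minor >= 13))
        return VARIANT_LOOP;
    return VARIANT_NOLOOP;
}
```

The release string can be obtained from the release field filled in by uname(2). For the kernel in the report, pick_variant("5.10.0-21-amd64") selects the no-loop build.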

MatteoNardi added a commit that referenced this issue Feb 17, 2023
Compile and embed two eBPF programs for each *.bpf.c source:
- On kernel < 5.13 NOLOOP is defined and we won't take the address
  of functions.
- On kernel >= 5.13, the regular LOOP macro can be used.

Fix #158
@MatteoNardi MatteoNardi linked a pull request Feb 17, 2023 that will close this issue
4 tasks