Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: improve performance of magic_write event #3899

Merged
merged 1 commit into from
Feb 25, 2024

Conversation

yanivagman
Copy link
Collaborator

@yanivagman yanivagman commented Feb 25, 2024

Currently, when magic_write event is selected on its own it introduces a very high overhead on the system. This is due to the reason that the logic used for magic_write event is mixed with the logic used for vfs_write(v) events and capture write, thus a lot of extra unnecessary calculations are being performed.

Fix that issue by seperating the logic used for magic_write from any other logic.

To compare the results, following is the first few lines of output of
the command:
sudo bpftool prog show | grep run_time | awk '{print $1" "$2" "$3" "$4" "$8" "$9" "$10" "$11}' | sort -g -r -k 6

Without the change:

23: lsm name restrict_filesystems run_time_ns 1116047113 run_cnt 2425037
4197: kprobe name trace_vfs_write run_time_ns 352683397 run_cnt 174415
4143: raw_tracepoint name tracepoint__raw_syscalls__sys_exit run_time_ns 210434425 run_cnt 909820
4198: kprobe name trace_ret_vfs_write run_time_ns 178971146 run_cnt 174553
4139: raw_tracepoint name tracepoint__raw_syscalls__sys_enter run_time_ns 145962090 run_cnt 907805
4201: kprobe name trace_ret_vfs_writev run_time_ns 138241100 run_cnt 7525
4240: kprobe name trace_security_file_permission run_time_ns 75948425 run_cnt 297712
4200: kprobe name trace_vfs_writev run_time_ns 71354527 run_cnt 7525
4179: kprobe name trace_security_file_open run_time_ns 28983767 run_cnt 3522

With the change:

23: lsm name restrict_filesystems run_time_ns 1107658754 run_cnt 2394467
3998: raw_tracepoint name tracepoint__raw_syscalls__sys_exit run_time_ns 213293883 run_cnt 1042331
3994: raw_tracepoint name tracepoint__raw_syscalls__sys_enter run_time_ns 141677521 run_cnt 1039926
4101: kprobe name trace_security_file_permission run_time_ns 77304179 run_cnt 352252
4053: kprobe name vfs_write_magic_enter run_time_ns 59810106 run_cnt 215034
4055: kprobe name vfs_write_magic_return run_time_ns 39428950 run_cnt 215608
4034: kprobe name trace_security_file_open run_time_ns 34191232 run_cnt 4281
4072: kprobe name vfs_writev_magic_enter run_time_ns 22570188 run_cnt 4071
4058: kprobe name vfs_writev_magic_return run_time_ns 4235941 run_cnt 4069

As can be seen, vfs_write_magic programs are not the "top" programs
anymore. Also, while all other events have more or less the same runtime
during this period, the magic_write event total time has changed:
Total runtime of magic_write programs before the change is 741250170 ns.
Total runtime of magic_write programs after the change is 126045185 ns.
So assuming the load on the system was the same during this period,
there is more than x5 improvement.

1. Explain what the PR does

"Replace me with make check-pr output"

2. Explain how to test it

3. Other comments

close #3445

Copy link
Collaborator

@NDStrahilevitz NDStrahilevitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic looks correct overall. Would be good to include the performance improvement and the benchmark method in the commit or PR description before merging.

Currently, when magic_write event is selected on its own it introduces a
very high overhead on the system. This is due to the reason that the
logic used for magic_write event is mixed with the logic used for
vfs_write(v) events and capture write, thus a lot of extra unnecessary
calculations are being performed.

Fix that issue by seperating the logic used for magic_write from any
other logic.

To compare the results, following is the first few lines of output of
the command:
"sudo bpftool prog show | grep run_time | awk '{print $1" "$2" "$3" "$4" "$8" "$9" "$10" "$11}' | sort -g -r -k 6"

Without the change:
23: lsm name restrict_filesystems run_time_ns 1116047113 run_cnt 2425037
4197: kprobe name trace_vfs_write run_time_ns 352683397 run_cnt 174415
4143: raw_tracepoint name tracepoint__raw_syscalls__sys_exit run_time_ns 210434425 run_cnt 909820
4198: kprobe name trace_ret_vfs_write run_time_ns 178971146 run_cnt 174553
4139: raw_tracepoint name tracepoint__raw_syscalls__sys_enter run_time_ns 145962090 run_cnt 907805
4201: kprobe name trace_ret_vfs_writev run_time_ns 138241100 run_cnt 7525
4240: kprobe name trace_security_file_permission run_time_ns 75948425 run_cnt 297712
4200: kprobe name trace_vfs_writev run_time_ns 71354527 run_cnt 7525
4179: kprobe name trace_security_file_open run_time_ns 28983767 run_cnt 3522

With the change:
23: lsm name restrict_filesystems run_time_ns 1107658754 run_cnt 2394467
3998: raw_tracepoint name tracepoint__raw_syscalls__sys_exit run_time_ns 213293883 run_cnt 1042331
3994: raw_tracepoint name tracepoint__raw_syscalls__sys_enter run_time_ns 141677521 run_cnt 1039926
4101: kprobe name trace_security_file_permission run_time_ns 77304179 run_cnt 352252
4053: kprobe name vfs_write_magic_enter run_time_ns 59810106 run_cnt 215034
4055: kprobe name vfs_write_magic_return run_time_ns 39428950 run_cnt 215608
4034: kprobe name trace_security_file_open run_time_ns 34191232 run_cnt 4281
4072: kprobe name vfs_writev_magic_enter run_time_ns 22570188 run_cnt 4071
4058: kprobe name vfs_writev_magic_return run_time_ns 4235941 run_cnt 4069

As can be seen, vfs_write_magic programs are not the "top" programs
anymore. Also, while all other events have more or less the same runtime
during this period, the magic_write event total time has changed:
Total runtime of magic_write programs before the change is 741250170 ns.
Total runtime of magic_write programs after the change is 126045185 ns.
So assuming the load on the system was the same during this period,
there is more than x5 improvement.
@yanivagman yanivagman force-pushed the fix_magic_write_overhead branch from f132733 to 23152c1 Compare February 25, 2024 11:02
@yanivagman yanivagman merged commit 95a841b into aquasecurity:main Feb 25, 2024
32 checks passed
@yanivagman yanivagman deleted the fix_magic_write_overhead branch February 25, 2024 13:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Running./tracee has a huge impact on the performance of the system call 'write'
2 participants