
tetragon: Hook exit sensor on acct_process #1509

Merged
kkourt merged 2 commits into cilium:main from olsajiri:another_exit_fix on Sep 28, 2023

Conversation

olsajiri
Contributor

@olsajiri olsajiri commented Sep 26, 2023

Djalal and Anastasios found another way we could race in the exit
event hook, causing us to receive multiple exit events with the same
pid value.

Anastasios suggested hooking acct_process instead, which is
called only for the last task in the thread group.

acct_process depends on the CONFIG_BSD_PROCESS_ACCT config
option, but it seems to be present on all supported kernels.
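
For illustration, a minimal sketch of what the new hook can look like, reusing the get_current_pid_tgid() and event_exit_send() helpers that appear in the diff quoted later in this thread; this is a sketch of the idea, not necessarily the PR's exact code:

    __attribute__((section("kprobe/acct_process"), used)) int
    event_exit(struct pt_regs *ctx)
    {
            __u64 pid_tgid = get_current_pid_tgid();

            /* The kernel calls acct_process only from the last exiting
             * task of the thread group, so we get exactly one event per
             * tgid without having to inspect signal->live ourselves. */
            event_exit_send(ctx, pid_tgid >> 32);
            return 0;
    }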

@olsajiri olsajiri added the release-note/minor This PR introduces a minor user-visible change label Sep 26, 2023
@jrfastab
Contributor

Would also be nice to get a test for this failing case, if possible.

@olsajiri olsajiri force-pushed the another_exit_fix branch 4 times, most recently from e215ca1 to 19ee35a Compare September 27, 2023 10:30
@tixxdz
Member

tixxdz commented Sep 27, 2023

Hi @olsajiri, I've been testing this on various kernels and it seems to work. Here is a simple test that works at the BPF side, so we don't mix with userspace; maybe add it too: 05b2a94. Up to you.

@olsajiri olsajiri marked this pull request as ready for review September 27, 2023 11:44
@olsajiri olsajiri requested a review from a team as a code owner September 27, 2023 11:44
@olsajiri olsajiri changed the title from "exit fix" to "tetragon: Hook exit sensor on acct_process" Sep 27, 2023
Member

@tixxdz tixxdz left a comment


LGTM Jiri, thanks. Can you please later add, as a follow-up, the same test but one that runs on the perf ring buffer?

In case we change some refcount logic so that the gRPC exit event is no longer exported, we would still be covered at the BPF side.

Djalal and Anastasios found another way we could race in the exit
event hook, causing us to receive multiple exit events with the same
pid value.

Anastasios suggested hooking acct_process instead, which is
called only for the last task in the thread group.

acct_process depends on the CONFIG_BSD_PROCESS_ACCT config
option, but it seems to be present on all supported kernels.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
The previous commit fixes the exit event race that might cause
tetragon to receive multiple exit events with the same pid value.

The contrib/tester-progs/threads-exit program tries to exploit
this by creating multiple threads and synchronizing all their exit
calls, so it's likely to hit the race window.

The TestEventExitThreads test itself spawns several executions of
the threads-exit program (to push the luck a bit and hit the race
window at least once), records their pid values, and then checks
that we receive a single exit event for any given pid value.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
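
For readers without the tree at hand, a hypothetical sketch of the tester's core idea (not the actual contrib/tester-progs/threads-exit source): threads line up on a barrier and then all issue the raw exit syscall at once, maximizing the chance that several of them race on the thread group's live counter:

    #include <pthread.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    #define NTHREADS 10

    static pthread_barrier_t barrier;

    static void *thread_exit(void *arg)
    {
            (void)arg;
            pthread_barrier_wait(&barrier); /* line all threads up... */
            syscall(SYS_exit, 0);           /* ...then exit simultaneously */
            return NULL;                    /* never reached */
    }

    int main(void)
    {
            pthread_t threads[NTHREADS];

            pthread_barrier_init(&barrier, NULL, NTHREADS + 1);
            for (int i = 0; i < NTHREADS; i++)
                    pthread_create(&threads[i], NULL, thread_exit, NULL);

            pthread_barrier_wait(&barrier); /* release everyone at once */
            syscall(SYS_exit, 0);           /* main thread exits the same way */
    }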
@tixxdz
Member

tixxdz commented Sep 28, 2023

@olsajiri just for the record:

If we don't want to depend on that config, then an alternative could be:

+++ b/bpf/process/bpf_exit.c
@@ -46,6 +46,8 @@ char _license[] __attribute__((section("license"), used)) = "GPL";
  * won't race with another clone, because there's no other thread to call
  * it (current thread is in do_exit).
  */
+#define SIGNAL_GROUP_EXIT      0x00000004
+
 __attribute__((section("kprobe/do_task_dead"), used)) int
 event_exit(struct pt_regs *ctx)
 {
@@ -53,11 +55,30 @@ event_exit(struct pt_regs *ctx)
        __u64 pid_tgid = get_current_pid_tgid();
        struct signal_struct *signal;
        atomic_t live;
+       __u32 flags, tgid;
 
        probe_read(&signal, sizeof(signal), _(&task->signal));
        probe_read(&live, sizeof(live), _(&signal->live));
+       probe_read(&flags, sizeof(flags), _(&signal->flags));
+
+       tgid = pid_tgid >> 32;
+       /* If it is a group thread exit then check SIGNAL_GROUP_EXIT
+        * as it is set early to sync all threads where signal is locked.
+        */
+       if (flags & SIGNAL_GROUP_EXIT) {
+               if (tgid == (__u32)pid_tgid) {
+                       DEBUG("sending bpf exit:  pid=%d   tid=%d", pid_tgid >> 32, (u32)pid_tgid);
+                       event_exit_send(ctx, pid_tgid >> 32);
+               }
+               return 0;
+       }
 
-       if (live.counter == 0)
-               event_exit_send(ctx, pid_tgid >> 32);
+       /* If single thread exit then send exit on counter zero, still separated threads
+        * can race and have the live.counter zero.
+        */
+       if (live.counter == 0) {
+               DEBUG("sending bpf exit:  pid=%d   tid=%d", pid_tgid >> 32, (u32)pid_tgid);
+               event_exit_send(ctx, tgid);
+       }
        return 0;
 }

This will close the current group exit issue, but we still have a smaller window where separate threads exit on their own and could race.

For that case, the sensor handler (or the gRPC one, where the event parsing happens) plus the GC need to be bulletproof: they should collect process entries when we receive one single exit event (live.counter == 0 is a good indication anyway), and ignore any following BPF exit events if they ever happen.
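
A hypothetical sketch of that idempotent handling (the struct layout and helper names here are illustrative, not tetragon's actual userspace types): the first exit event for a pid marks the entry and drops its reference; any later exit event for the same pid is ignored:

    #include <stdbool.h>

    struct process_entry {
            unsigned int pid;
            bool exited;   /* set when the first exit event is seen */
            int refcount;  /* entry is collected when this drops to zero */
    };

    /* Returns true if the event was processed, false if it was a duplicate. */
    static bool handle_exit_event(struct process_entry *entry)
    {
            if (entry->exited)
                    return false; /* duplicate BPF exit event: ignore it */
            entry->exited = true;
            entry->refcount--;    /* release the process's own reference once */
            return true;
    }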

However, I identified some other issues in the GC logic (#1517), where we underflow the refcount and won't collect entries... I will try to fix that separately anyway, as this is another bug.

Maybe then, if we feel we are fine, we could adopt that code and drop the dependency on CONFIG_BSD_PROCESS_ACCT.

For the time being, I think the current solution is OK.

@kkourt kkourt merged commit 9a8f892 into cilium:main Sep 28, 2023
28 checks passed
Labels: needs-backport/0.9, needs-backport/0.10, release-note/minor (This PR introduces a minor user-visible change)