
tetragon: Hook exit sensor on acct_process #1509

Merged
kkourt merged 2 commits into cilium:main from olsajiri:another_exit_fix on Sep 28, 2023

Conversation

olsajiri
Contributor

@olsajiri olsajiri commented Sep 26, 2023

Djalal and Anastasios found another way we could race in the exit
event hook, causing us to receive multiple exit events with the same
pid value.

Anastasios suggested hooking acct_process instead, which is
called only for the last task in the thread group.

acct_process depends on the CONFIG_BSD_PROCESS_ACCT config
option, but it seems to be present on all supported kernels.
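
For illustration, a minimal sketch of what the new hook can look like, reusing the get_current_pid_tgid() and event_exit_send() helpers that appear in the diff quoted later in this thread; this is a sketch of the idea, not necessarily the PR's exact code:

    __attribute__((section("kprobe/acct_process"), used)) int
    event_exit(struct pt_regs *ctx)
    {
            __u64 pid_tgid = get_current_pid_tgid();

            /* The kernel calls acct_process only from the last exiting
             * task of the thread group, so we get exactly one event per
             * tgid without having to inspect signal->live ourselves. */
            event_exit_send(ctx, pid_tgid >> 32);
            return 0;
    }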

@olsajiri olsajiri added the release-note/minor This PR introduces a minor user-visible change label Sep 26, 2023
@jrfastab
Contributor

Would also be nice to get a test for this failing case, if possible.

@olsajiri olsajiri force-pushed the another_exit_fix branch 4 times, most recently from e215ca1 to 19ee35a Compare September 27, 2023 10:30
@tixxdz
Member

tixxdz commented Sep 27, 2023

Hi @olsajiri, I've been testing this on various kernels and it seems to work. Here is a simple test that works at the BPF side, so we don't mix with userspace; maybe add it too: 05b2a94. Up to you.

@olsajiri olsajiri marked this pull request as ready for review September 27, 2023 11:44
@olsajiri olsajiri requested a review from a team as a code owner September 27, 2023 11:44
@olsajiri olsajiri changed the title from "exit fix" to "tetragon: Hook exit sensor on acct_process" Sep 27, 2023
Member

@tixxdz tixxdz left a comment


LGTM Jiri, thanks. Can you please later add, as a follow-up, the same test but one that runs on the perf ring buffer?

In case we change some refcount logic so that the gRPC exit event is no longer exported, we would still be covered at the BPF side.

Djalal and Anastasios found another way we could race in the exit
event hook, causing us to receive multiple exit events with the same
pid value.

Anastasios suggested hooking acct_process instead, which is
called only for the last task in the thread group.

acct_process depends on the CONFIG_BSD_PROCESS_ACCT config
option, but it seems to be present on all supported kernels.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
The previous commit fixes the exit event race that might cause
tetragon to receive multiple exit events with the same pid value.

The contrib/tester-progs/threads-exit program tries to exploit
this by creating multiple threads and synchronizing all their exit
calls, so it's likely to hit the race window.

The TestEventExitThreads test itself spawns several executions of
the threads-exit program (to push the luck a bit and hit the race
window at least once), records their pid values, and then checks
that we receive a single exit event for any given pid value.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
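
For readers without the tree at hand, a hypothetical sketch of the tester's core idea (not the actual contrib/tester-progs/threads-exit source): threads line up on a barrier and then all issue the raw exit syscall at once, maximizing the chance that several of them race on the thread group's live counter:

    #include <pthread.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    #define NTHREADS 10

    static pthread_barrier_t barrier;

    static void *thread_exit(void *arg)
    {
            (void)arg;
            pthread_barrier_wait(&barrier); /* line all threads up... */
            syscall(SYS_exit, 0);           /* ...then exit simultaneously */
            return NULL;                    /* never reached */
    }

    int main(void)
    {
            pthread_t threads[NTHREADS];

            pthread_barrier_init(&barrier, NULL, NTHREADS + 1);
            for (int i = 0; i < NTHREADS; i++)
                    pthread_create(&threads[i], NULL, thread_exit, NULL);

            pthread_barrier_wait(&barrier); /* release everyone at once */
            syscall(SYS_exit, 0);           /* main thread exits the same way */
    }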
@tixxdz
Member

tixxdz commented Sep 28, 2023

@olsajiri just for the record:

If we don't want to depend on that config, then an alternative could be:

+++ b/bpf/process/bpf_exit.c
@@ -46,6 +46,8 @@ char _license[] __attribute__((section("license"), used)) = "GPL";
  * won't race with another clone, because there's no other thread to call
  * it (current thread is in do_exit).
  */
+#define SIGNAL_GROUP_EXIT      0x00000004
+
 __attribute__((section("kprobe/do_task_dead"), used)) int
 event_exit(struct pt_regs *ctx)
 {
@@ -53,11 +55,30 @@ event_exit(struct pt_regs *ctx)
        __u64 pid_tgid = get_current_pid_tgid();
        struct signal_struct *signal;
        atomic_t live;
+       __u32 flags, tgid;
 
        probe_read(&signal, sizeof(signal), _(&task->signal));
        probe_read(&live, sizeof(live), _(&signal->live));
+       probe_read(&flags, sizeof(flags), _(&signal->flags));
+
+       tgid = pid_tgid >> 32;
+       /* If it is a group thread exit then check SIGNAL_GROUP_EXIT
+        * as it is set early to sync all threads where signal is locked.
+        */
+       if (flags & SIGNAL_GROUP_EXIT) {
+               if (tgid == (__u32)pid_tgid) {
+                       DEBUG("sending bpf exit:  pid=%d   tid=%d", pid_tgid >> 32, (u32)pid_tgid);
+                       event_exit_send(ctx, pid_tgid >> 32);
+               }
+               return 0;
+       }
 
-       if (live.counter == 0)
-               event_exit_send(ctx, pid_tgid >> 32);
+       /* If single thread exit then send exit on counter zero, still separated threads
+        * can race and have the live.counter zero.
+        */
+       if (live.counter == 0) {
+               DEBUG("sending bpf exit:  pid=%d   tid=%d", pid_tgid >> 32, (u32)pid_tgid);
+               event_exit_send(ctx, tgid);
+       }
        return 0;
 }

This will close the current group exit issue, but we still have a smaller window where separate threads exit on their own and could race.

For that case, the sensor handler (or the gRPC one, where the event parsing happens) plus the GC need to be bulletproof: they should collect process entries when we receive one single exit event (live.counter == 0 is a good indication anyway), and ignore any following BPF exit events if they ever happen.
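
A hypothetical sketch of that idempotent handling (the struct layout and helper names here are illustrative, not tetragon's actual userspace types): the first exit event for a pid marks the entry and drops its reference; any later exit event for the same pid is ignored:

    #include <stdbool.h>

    struct process_entry {
            unsigned int pid;
            bool exited;   /* set when the first exit event is seen */
            int refcount;  /* entry is collected when this drops to zero */
    };

    /* Returns true if the event was processed, false if it was a duplicate. */
    static bool handle_exit_event(struct process_entry *entry)
    {
            if (entry->exited)
                    return false; /* duplicate BPF exit event: ignore it */
            entry->exited = true;
            entry->refcount--;    /* release the process's own reference once */
            return true;
    }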

However, I identified some other issues in the GC logic (#1517), where we underflow the refcount and won't collect entries... I will try to fix that separately anyway, as this is another bug.

Maybe then, if we feel we are fine, we could adopt that code and drop the dependency on CONFIG_BSD_PROCESS_ACCT.

For the time being, I think the current solution is OK.

@kkourt kkourt merged commit 9a8f892 into cilium:main Sep 28, 2023
28 checks passed
Labels: needs-backport/0.9, needs-backport/0.10, release-note/minor (This PR introduces a minor user-visible change)