fix (cgroups): already dead edge case #3325

NDStrahilevitz · 2023-07-18T15:03:55Z

commit b3d043c558b50526fef1c7e65c8d89db8f984d28
Author: Nadav Strahilevitz <nadav.strahilevitz@aquasec.com>
Date:   Tue Jul 18 14:59:12 2023 +0000

    fix (cgroups): already dead edge case
    
    Certain Kubernetes versions make use of short-lived cgroups for various
    tasks (for example log rotation). These cgroups generate events, followed
    very quickly by a cgroup_rmdir event.
    As a result, many events related to the cgroup attempt to query its
    directory through the recursive path search and do not find it.
    
    This commit adds a "Dead" field to the cgroup info to indicate a cgroup
    that has already been removed. Various logical sections can refer to it
    if it is relevant to them and, more importantly, no additional queries
    will be attempted.
    
    Bonus: optimize away an additional Stat syscall in containers by
    returning the directory ctime in cgroup.GetCgroupPath.

Fix #3324

@rafaeldtinoco I want to backport this to v0.16.0 branch if that's ok with you.
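
To make the idea behind the "Dead" field concrete, here is a minimal sketch and not the actual tracee code. `CgroupInfo`, `Cache`, `NewCache`, and the `resolve` callback are hypothetical stand-ins: `resolve` plays the role of the recursive directory search, and the cache remembers a failed search as a dead entry so that later events for the same cgroup ID do not repeat it.

```go
package main

import (
	"errors"
	"fmt"
)

// CgroupInfo is a hypothetical, simplified cgroup info entry.
type CgroupInfo struct {
	Path string
	Dead bool // true once the cgroup directory is known to be gone
}

// Cache is a hypothetical map-backed cache keyed by cgroup ID.
type Cache struct {
	entries map[uint64]*CgroupInfo
	lookups int                             // number of directory searches performed
	resolve func(id uint64) (string, error) // stand-in for the recursive path search
}

var errNotFound = errors.New("cgroup directory not found")

// NewCache builds a cache around the given (assumed) resolver.
func NewCache(resolve func(id uint64) (string, error)) *Cache {
	return &Cache{entries: map[uint64]*CgroupInfo{}, resolve: resolve}
}

// Get returns the cached entry if one exists, dead or alive.
// Only a cache miss triggers a directory search; a failed search
// is recorded as a dead entry instead of being retried later.
func (c *Cache) Get(id uint64) *CgroupInfo {
	if info, ok := c.entries[id]; ok {
		return info
	}
	c.lookups++
	path, err := c.resolve(id)
	info := &CgroupInfo{Path: path, Dead: err != nil}
	c.entries[id] = info
	return info
}

func main() {
	// Simulate a cgroup whose directory was already removed.
	c := NewCache(func(id uint64) (string, error) { return "", errNotFound })
	for i := 0; i < 3; i++ {
		info := c.Get(42)
		fmt.Println(info.Dead, c.lookups)
	}
}
```

In this sketch, three events for the same removed cgroup cause only one search; the remaining two are answered from the dead entry.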


yanivagman · 2023-07-19T11:24:38Z

pkg/ebpf/controlplane/controller.go

@@ -137,7 +137,7 @@ func (p *Controller) processCgroupMkdir(args []trace.Argument) error {
 	if err != nil {
 		return errfmt.WrapError(err)
 	}
-	if info.Container.ContainerId == "" {
+	if info.Container.ContainerId == "" && !info.Dead {


Can this ever happen?
It means that we get mkdir event after rmdir event, doesn't it?

It does, and it was the original indicator for the error (although it was on a tracee v0.11.1 installation).
This can happen because it's a race between userspace and ebpf. The entry can be deleted either here OR in the cgroup_rmdir program.
I guess this would also require events to be received out of order, so this doesn't fully resolve the case, but the result here is only an annoying (otherwise harmless) error log.
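
A rough sketch of the race described above, under stated assumptions: `containersMap`, `remove`, and `handleMkdir` are hypothetical names and this is not the controller code. The map stands in for the entry that can be deleted from either side; `handleMkdir` applies the same guard as the diff (`ContainerID == "" && !Dead`).

```go
package main

import (
	"errors"
	"fmt"
)

// cgroupInfo is a hypothetical, reduced view of the cgroup info.
type cgroupInfo struct {
	ContainerID string
	Dead        bool
}

var errNoEntry = errors.New("no such entry")

// containersMap is a hypothetical stand-in for the shared map whose
// entry can be deleted by userspace or by the cgroup_rmdir side.
type containersMap map[uint64]struct{}

// remove deletes the entry and reports an error if it is already gone.
func (m containersMap) remove(id uint64) error {
	if _, ok := m[id]; !ok {
		return errNoEntry
	}
	delete(m, id)
	return nil
}

// handleMkdir removes the entry for non-container cgroups, but skips
// the removal when the cgroup is already known to be dead.
func handleMkdir(m containersMap, id uint64, info cgroupInfo) error {
	if info.ContainerID == "" && !info.Dead {
		return m.remove(id)
	}
	return nil
}

func main() {
	m := containersMap{7: {}}

	// The rmdir side wins the race and deletes the entry first.
	_ = m.remove(7)

	// Userspace does not know yet that the cgroup is dead: the delete fails.
	fmt.Println(handleMkdir(m, 7, cgroupInfo{Dead: false}))

	// With the Dead flag set, the redundant delete is skipped.
	fmt.Println(handleMkdir(m, 7, cgroupInfo{Dead: true}))
}
```

The first call shows the remaining window mentioned above: if the entry is gone before the cgroup is marked dead, the delete still fails, which in this sketch surfaces only as a returned error.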

NDStrahilevitz added the milestone/v0.17.0 label Jul 18, 2023

NDStrahilevitz requested review from rafaeldtinoco and yanivagman July 18, 2023 15:03

NDStrahilevitz self-assigned this Jul 18, 2023

NDStrahilevitz marked this pull request as ready for review July 18, 2023 15:04

github-actions bot added the area/ebpf label Jul 18, 2023

NDStrahilevitz force-pushed the dead_cgroups_fix branch from 9435e5d to b3d043c Compare July 18, 2023 15:05

yanivagman reviewed Jul 19, 2023

yanivagman approved these changes Jul 19, 2023

NDStrahilevitz merged commit 0c5719a into aquasecurity:main Jul 19, 2023
25 checks passed

NDStrahilevitz mentioned this pull request Jul 19, 2023

[v0.16.0] backport: dead cgroups fix #3326

Merged

josedonizetti added backport/v0.17.0 and removed milestone/v0.17.0 labels Jul 19, 2023

rafaeldtinoco mentioned this pull request Oct 26, 2023

kubernetes enrichment not working on kind/minikube #3593

Closed

josedonizetti mentioned this pull request Oct 26, 2023

fix: enrichment for kind and minikube #3598

Merged
