metric: Avoid memory leak/increase
This commit ensures that each item processed from the pod deletion queue
is released by explicitly calling the Done() function, as required by the
godoc[^1].

Without this change, memory usage of the cilium agent keeps increasing
when Hubble metrics are enabled. In a typical Cilium deployment this can
take days (if not weeks) to notice, because Pod deletion events are
relatively infrequent; in a high-churn environment, memory grows much
faster.
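To illustrate why memory grows with the number of Pod deletion events, here is a minimal sketch built on a hypothetical toyQueue. It is loosely modeled on the Get/Done contract described in the client-go godoc and is not client-go's implementation. Get hands out an item and records it as being processed, and only Done removes that record, so a consumer that never calls Done retains one entry per deleted Pod. The pod type, the drain helper and the item counts are illustrative assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// toyQueue is a hypothetical, simplified work queue that mimics the Get/Done
// contract: Get hands out an item and records it as "processing"; only Done
// forgets it again. It is NOT client-go's workqueue implementation.
type toyQueue struct {
	mu         sync.Mutex
	items      []any
	processing map[any]struct{}
}

func newToyQueue() *toyQueue {
	return &toyQueue{processing: make(map[any]struct{})}
}

// Add appends an item to the queue.
func (q *toyQueue) Add(item any) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.items = append(q.items, item)
}

// Get pops the next item and marks it as being processed. ok is false when
// the queue is empty (a real work queue would block instead).
func (q *toyQueue) Get() (item any, ok bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.items) == 0 {
		return nil, false
	}
	item, q.items = q.items[0], q.items[1:]
	q.processing[item] = struct{}{}
	return item, true
}

// Done releases the item from the processing set.
func (q *toyQueue) Done(item any) {
	q.mu.Lock()
	defer q.mu.Unlock()
	delete(q.processing, item)
}

// InFlight reports how many items are still held in the processing set.
func (q *toyQueue) InFlight() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return len(q.processing)
}

// pod stands in for a deleted Pod object; every deletion is a distinct pointer.
type pod struct{ name string }

// drain simulates n pod deletion events, consumes them all, and returns how
// many items the queue still retains afterwards.
func drain(n int, callDone bool) int {
	q := newToyQueue()
	for i := 0; i < n; i++ {
		q.Add(&pod{name: fmt.Sprintf("pod-%d", i)})
	}
	for {
		item, ok := q.Get()
		if !ok {
			break
		}
		_ = item.(*pod) // stands in for enabledMetrics.ProcessPodDeletion
		if callDone {
			q.Done(item)
		}
	}
	return q.InFlight()
}

func main() {
	fmt.Printf("without Done: %d items still in flight\n", drain(1000, false))
	fmt.Printf("with Done: %d items still in flight\n", drain(1000, true))
}
```

Running it reports 1000 retained items for the consumer that skips Done and 0 for the one that calls it; in this model the retained set grows linearly with the number of deletion events, which is why low-churn clusters take much longer to show the increase.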

Testing was done before and after the change, as described below.

Sample workload to simulate a high number of pod deletion events:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pod-churn-job
spec:
  completions: 50000000
  parallelism: 100
  template:
    metadata:
      labels:
        app: pod-churn-job
    spec:
      containers:
      - name: churn-app
        image: sandeshkv92/highpodchurn:linux_amd64
      restartPolicy: Never
```

Before this change, cilium agent memory increased from 150MB to ~500MB
in less than 3 hours. With the same workload and this change applied,
memory remained stable over a longer period (e.g. 5 hours).

[^1]: https://pkg.go.dev/k8s.io/client-go@v0.29.3/util/workqueue#Type.Get

Fixes: 782f934
Signed-off-by: Tam Mach <tam.mach@cilium.io>
sayboras committed Apr 2, 2024
1 parent f77e831 commit 4d05e9f
Showing 1 changed file with 1 addition and 0 deletions.
pkg/hubble/metrics/metrics.go
```diff
@@ -117,6 +117,7 @@ func initPodDeletionHandler() {
 				return
 			}
 			enabledMetrics.ProcessPodDeletion(pod.(*slim_corev1.Pod))
+			podDeletionHandler.queue.Done(pod)
 		}
 	}()
 }
```
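The commit calls Done inline, right after ProcessPodDeletion. A common Go alternative, sketched here under stated assumptions, is to handle each item in its own function and defer Done, so the item is released on every exit path, including an early return or a panic in the handler. The workQueue interface and the fakeQueue, processOne and safeProcess names are hypothetical; workQueue mirrors only the Get and Done methods of client-go's workqueue.Interface, whose Get likewise returns (item, shutdown).

```go
package main

import "fmt"

// workQueue is a hypothetical interface covering only the two methods the
// consumer loop needs; client-go's workqueue.Interface has more methods.
type workQueue interface {
	Get() (item any, shutdown bool)
	Done(item any)
}

// fakeQueue is a hypothetical in-memory queue used only to exercise the loop.
type fakeQueue struct {
	items    []any
	inFlight map[any]bool
}

func (f *fakeQueue) Get() (any, bool) {
	if len(f.items) == 0 {
		// Treat an empty queue as shut down; a real queue blocks until an
		// item arrives or ShutDown is called.
		return nil, true
	}
	item := f.items[0]
	f.items = f.items[1:]
	f.inFlight[item] = true
	return item, false
}

func (f *fakeQueue) Done(item any) { delete(f.inFlight, item) }

// processOne handles a single item; the deferred Done runs on every exit
// path, including a panic inside handle.
func processOne(q workQueue, handle func(any)) bool {
	item, shutdown := q.Get()
	if shutdown {
		return true
	}
	defer q.Done(item)
	handle(item)
	return false
}

// safeProcess recovers a handler panic so the loop keeps running. By the time
// recover runs, processOne's deferred Done has already released the item, and
// the named result keeps its zero value (false), so the loop continues.
func safeProcess(q workQueue, handle func(any)) (shutdown bool) {
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered:", r)
		}
	}()
	return processOne(q, handle)
}

func main() {
	q := &fakeQueue{items: []any{"pod-a", "pod-b", "pod-c"}, inFlight: map[any]bool{}}
	handle := func(item any) {
		if item == "pod-b" {
			panic("failed to process " + item.(string))
		}
		fmt.Println("processed", item)
	}
	for !safeProcess(q, handle) {
	}
	fmt.Println("items still in flight:", len(q.inFlight))
}
```

Running main prints "processed pod-a", a recovered panic for pod-b, "processed pod-c", and finally 0 items still in flight, showing that the deferred Done released pod-b even though its handler panicked.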
