pipeline memory efficiency using pool #3297
Conversation
Since it's difficult to call …

Results:
Force-pushed from 19adaff to a17cc3f.
Accumulating results of 500 * N events per run:

```go
go func() {
	for i := 0; i < b.N; i++ {
		for j := 0; j < 500; j++ {
			ctxChan <- &ctx
		}
	}
}()
```
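For context, the producer loop above can be wrapped into a self-contained sketch of the benchmark's feeding pattern. Names like `eventContext`, `produce`, and `drain` are illustrative only, not tracee's actual types:

```go
package main

import "fmt"

// eventContext is a hypothetical stand-in for the shared context
// the benchmark pushes through the pipeline channel.
type eventContext struct{ pid int }

// produce mimics the benchmark's producer goroutine: for each of the
// n iterations it sends 500 pointers to the same context into the channel.
func produce(n int) <-chan *eventContext {
	ctx := eventContext{pid: 1}
	out := make(chan *eventContext)
	go func() {
		defer close(out)
		for i := 0; i < n; i++ {
			for j := 0; j < 500; j++ {
				out <- &ctx
			}
		}
	}()
	return out
}

// drain plays the pipeline stage under test, counting received contexts.
func drain(n int) int {
	count := 0
	for range produce(n) {
		count++
	}
	return count
}

func main() {
	fmt.Println("received", drain(2), "contexts") // 2 * 500
}
```

In the real benchmark `n` would be `b.N`, so total work scales with the iteration count chosen by the testing framework.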
And multi-core:
Nice work! I toyed with this optimization before but wasn't able to benchmark a difference, so it's nice to see it has an effect. If and when memory arenas (added in Go 1.20) graduate from experimental to the standard library, we may be able to further optimize event allocations.

Edit: Would it be possible to add before and after pprofs? With the Pyroscope makefile they should be relatively easy to generate.
I see that the pool isn't passed to some of the stages (like sorting, enrichment, etc.), I assume because events aren't dropped there, right? Have you tested with these additional pipeline stages (at least enrichment, since we enable it by default in Kubernetes)?
Exactly.

So far, I've only simulated (roughly) the pipeline with events_pipeline_bench_test.go. As you asked, I'll generate Pyroscope outputs to take a look in the wild.
The previous test is tainted. Warming up the event pool is useless because we were waiting 90s after tracee starts before emitting events, and the pool cools down as soon as the GC is triggered in the meantime. So, let's test without warming the pool and with different sleeps.

Test:

```shell
#!/bin/bash

sleep 10 # cool down after building
sudo ./dist/tracee --metrics --pprof --pyroscope -f e=sched_process_exec -f comm=who -o none &
sleep 5
i=0; while ((i < 700000)); do who; ((i++)); done
sleep 30
sudo pkill tracee
```

(Pyroscope graphs: main branch vs. event-pool branch)

Results: the malloc, free, and heap graphs show less volatility, which is the purpose of using the event pool. So, warming the event pool doesn't make sense, since event arrival isn't deterministic. I'll remove that logic.
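The cool-down effect described above is easy to reproduce: a `sync.Pool` is drained by the garbage collector (fully after two GC cycles, due to the victim cache introduced in Go 1.13), so pre-warmed objects don't survive an idle window. A minimal sketch, using a byte array as a stand-in for an event object:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// warmedAllocsAfterGC warms a pool, forces GC, then counts how many
// times Get has to fall back to New: non-zero means the warm-up was lost.
func warmedAllocsAfterGC() int64 {
	var allocs int64
	pool := sync.Pool{
		New: func() any {
			atomic.AddInt64(&allocs, 1)
			return new([256]byte) // stand-in for an event object
		},
	}
	// "Warm" the pool with pre-allocated objects.
	for i := 0; i < 100; i++ {
		pool.Put(new([256]byte))
	}
	// Two GC cycles clear both the pool's primary and victim caches.
	runtime.GC()
	runtime.GC()
	pool.Get() // the warmed objects are gone; New runs again
	return atomic.LoadInt64(&allocs)
}

func main() {
	fmt.Println("New calls after warm-up + GC:", warmedAllocsAfterGC())
}
```

Any GC between warm-up and first use throws the pre-allocated objects away, which is why warming only pays off when events start flowing immediately.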
This commit employs sync.Pool to improve memory performance in event handling. The benchmarking file, events_pipeline_bench_test.go, simulates the pipeline execution and measures sync.Pool's effectiveness.

Benchmark command:

```shell
go test \
    -benchmem -benchtime=100x \
    -run=^$ -bench '^(BenchmarkGetEventFromPool|BenchmarkNewEventObject)$' \
    github.com/aquasecurity/tracee/pkg/ebpf
```

Environment:

- OS: Linux
- Arch: amd64
- Package: github.com/aquasecurity/tracee/pkg/ebpf
- CPU: Intel Core i7-12700H

Benchmarking results:

| Benchmark | Time/op | Memory/op | Allocs/op |
|---|---|---|---|
| BenchmarkGetEventFromPool (with pooling) | 1771.92 ms | 243.8 KB | 542 |
| BenchmarkNewEventObject (without pooling) | 1773.15 ms | 896.2 KB | 2001 |

Using object pooling reduces both memory allocated and allocation count by ~3.7x. In conclusion, sync.Pool yields a substantial memory optimization without measurable impact on runtime. This matters in a pipeline that must sustain high throughput, as it eases pressure on the garbage collector and promotes resource efficiency. For more information, see the benchmarking results of tracee actually running on aquasecurity#3297.
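The pattern the commit message describes boils down to a get/reset/put cycle around the pool. This sketch uses a hypothetical `event` struct with made-up fields, not tracee's actual event type:

```go
package main

import (
	"fmt"
	"sync"
)

// event is a hypothetical stand-in for tracee's event type.
type event struct {
	ID   int
	Args []any
}

var eventPool = sync.Pool{
	New: func() any { return &event{} },
}

// getEvent hands out an event, either recycled from the pool
// or freshly allocated by New.
func getEvent() *event { return eventPool.Get().(*event) }

// putEvent zeroes the event and returns it to the pool, keeping the
// Args backing array so later appends don't reallocate.
func putEvent(e *event) {
	*e = event{Args: e.Args[:0]}
	eventPool.Put(e)
}

func main() {
	e := getEvent()
	e.ID = 42
	e.Args = append(e.Args, "comm=who")
	fmt.Println("processed event", e.ID)
	putEvent(e)

	// Whatever Get returns next looks freshly zeroed, whether it
	// was recycled or newly allocated.
	fmt.Println("recycled event ID:", getEvent().ID)
}
```

Resetting before Put (rather than after Get) is the important design choice: it guarantees no stage ever observes stale data from a previous event.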
Needs a cleanup of leftovers. The function move is your call, style-wise; otherwise LGTM.
Can I take a quick look before this is merged?
For sure. e2e 980 is green.
As discussed offline with @NDStrahilevitz, there are other stages in the pipeline that can also make use of the event pool, such as instantiating derived and sigs events.
Amended commit: same message as above, with re-run benchmark numbers:

| Benchmark | Time/op | Memory/op | Allocs/op |
|---|---|---|---|
| BenchmarkGetEventFromPool (with pooling) | 1740.88 ms | 243.5 KB | 543 |
| BenchmarkNewEventObject (without pooling) | 1754.69 ms | 896.2 KB | 2001 |
LGTM. I think this was a great example of a PR. The benchmark was theoretical, yet ingenious. Nice work @geyslan!
1. Explain what the PR does
04bf853 perf(pipeline): memory efficiency using pool (2023/jul/01) Geyslan Gregório <geyslan@gmail.com>
2. Explain how to test it
3. Other comments