
pipeline memory efficiency using pool #3297

Merged
merged 1 commit into aquasecurity:main from event-pool on Jul 11, 2023

Conversation


@geyslan geyslan commented Jul 3, 2023

1. Explain what the PR does

04bf853 perf:(pipeline) memory efficiency using pool (2023/jul/01) Geyslan Gregório <geyslan@gmail.com>

This commit employs sync.Pool to bolster memory performance in event
handling. The benchmarking file, events_pipeline_bench_test.go,
simulates the pipeline execution and measures sync.Pool's effectiveness.

Benchmark command:

go test \
  -benchmem -benchtime=100x \
  -run=^$ -bench ^(BenchmarkGetEventFromPool|BenchmarkNewEventObject)$ \
  github.com/aquasecurity/tracee/pkg/ebpf

Findings:

Environment:
- OS: Linux
- Arch: amd64
- Package: github.com/aquasecurity/tracee/pkg/ebpf
- CPU: Intel Core i7-12700H

Benchmarking Results:

BenchmarkGetEventFromPool (with pooling):
- Time per Op: 1740.88 ms
- Memory per Op: 243.5 KB
- Allocations per Op: 543

BenchmarkNewEventObject (without pooling):
- Time per Op: 1754.69 ms
- Memory per Op: 896.2 KB
- Allocations per Op: 2001

Using object pooling reduces memory allocation and allocation count
by ~3.7x.

In conclusion, sync.Pool delivers a substantial memory optimization with
no meaningful impact on runtime. This matters in a high-throughput
pipeline, since fewer allocations ease garbage-collection pressure and
improve resource efficiency.

For more information, see the benchmarking results of tracee actually
running on #3297.

2. Explain how to test it

3. Other comments


geyslan commented Jul 3, 2023

Since it's difficult to call decodeEvents() directly without simulating all prior steps, the benchmark tests were implemented in a separate file, events_pipeline_bench_test.go, which simulates the pipeline execution and measures the performance impact of sync.Pool.

Results:

go test \
  -benchmem -benchtime=10000x -cpu=1 \
  -run=^$ -bench ^(BenchmarkEventPool|BenchmarkEventNew)$ \
  github.com/aquasecurity/tracee/pkg/ebpf
| Test # | EventPool ns/op | EventNew ns/op | % Improvement in Time | EventPool B/op | EventNew B/op | % Improvement in Memory | EventPool allocs/op | EventNew allocs/op |
|---|---|---|---|---|---|---|---|---|
| 1 | 1,178,529 | 1,184,807 | +0.53% | 26 | 448 | +94.20% | 0 | 1 |
| 2 | 1,187,255 | 1,175,401 | -1.01% | 26 | 448 | +94.20% | 0 | 1 |
| 3 | 1,186,611 | 1,169,843 | -1.43% | 26 | 448 | +94.20% | 0 | 1 |
| 4 | 1,184,051 | 1,196,323 | +1.03% | 26 | 448 | +94.20% | 0 | 1 |
| 5 | 1,180,822 | 1,182,371 | +0.13% | 26 | 448 | +94.20% | 0 | 1 |
| 6 | 1,181,872 | 1,186,390 | +0.38% | 26 | 448 | +94.20% | 0 | 1 |

@geyslan geyslan requested a review from yanivagman July 3, 2023 18:10
@geyslan geyslan force-pushed the event-pool branch 2 times, most recently from 19adaff to a17cc3f Compare July 3, 2023 18:27

geyslan commented Jul 3, 2023

Accumulating results over 500 * N iterations (-benchtime=100x), which keeps the simulated pipeline warm, e.g.:

```go
go func() {
	for i := 0; i < b.N; i++ {
		for j := 0; j < 500; j++ {
			ctxChan <- &ctx
		}
	}
}()
```
| Benchmark Name | Time per Operation (ns/op) | Total Time Elapsed (s) | Memory Allocation (B/op) | Memory Allocation Improvement (%) | Number of Allocations (allocs/op) | Allocation Improvement (%) |
|---|---|---|---|---|---|---|
| BenchmarkEventNew | 601,467,798 | 60.15 | 224,007 | Reference | 500 | Reference |
| BenchmarkEventPool | 598,330,196 | 59.83 | 2,649 | ~98.8% improvement | 0 | 100% improvement |

And multi-core:

| Benchmark Name | Time per Operation (ns/op) | Total Time Elapsed (s) | Memory Allocation (B/op) | Memory Allocation Improvement (%) | Number of Allocations (allocs/op) | Allocation Improvement (%) |
|---|---|---|---|---|---|---|
| BenchmarkEventNew-20 | 599,962,102 | 60.00 | 224,150 | Reference | 501 | Reference |
| BenchmarkEventPool-20 | 594,689,167 | 59.47 | 1,788 | ~99.2% improvement | 1 | ~99.8% improvement |


NDStrahilevitz commented Jul 4, 2023

Nice work! I toyed with this optimization before but wasn't able to benchmark a difference, so it's nice to see it has an effect. If and when memory arenas (added in Go 1.20) are promoted from experimental to the standard library, we may be able to further optimize event allocations.

Edit: Would it be possible to add before and after pprofs? With the pyroscope makefile they should be relatively easy to generate.

@NDStrahilevitz NDStrahilevitz self-requested a review July 4, 2023 06:10

@NDStrahilevitz NDStrahilevitz left a comment


I see that the pool isn't passed to some of the stages (like sorting, enrichment, etc.), I assume because events aren't dropped there, right? Have you tested with these additional pipeline stages (at least enrichment, since we enable it by default in Kubernetes)?


geyslan commented Jul 4, 2023

> I see that the pool isn't passed to some of the stages (like sorting, enrichment, etc.), I assume because events aren't dropped there, right?

Exactly.

> Have you tested with these additional pipeline stages (at least enrichment, since we enable it by default in Kubernetes)?

So far, I've only simulated (roughly) the pipeline with events_pipeline_bench_test.go. As you asked, I'll generate pyroscope outputs to take a look at it in the wild.



geyslan commented Jul 4, 2023

The previous test was tainted: warming up the event pool is useless because we waited 90s after tracee started before emitting events, and the pool cools down as soon as GC is triggered in the meantime.

So, let's test without warming the pool and with different sleeps.

Test

#!/bin/bash

sleep 10 # cool down after building

sudo ./dist/tracee --metrics --pprof --pyroscope -f e=sched_process_exec -f comm=who -o none &

sleep 5

i=0; while ((i < 700000)); do who; ((i++)) ; done

sleep 30

sudo pkill tracee

main branch

[pprof/pyroscope memory profile screenshot]

event-pool branch

[pprof/pyroscope memory profile screenshot]

Results

  • Fewer mallocs: 1 - (38707896/39296721) ≈ 1.49% reduction
  • Heap (in use and objects) seems to be better utilized on average:
    • Lower malloc rate: 1 - (50049/55230) ≈ 9.38% reduction
    • More heap objects in use: (206973/182242) - 1 ≈ +13.57% (good in this case)
  • GC pressure data (avg) appears to be a bit lower: 53.3MB
  • GC pause time (avg) shows an improvement: 1 - (8.4/9.47) ≈ 11.29% reduction

The malloc, free, and heap graphs show less volatility, which is the purpose of using the event pool.

So, warming the event pool doesn't make sense, since event arrival isn't deterministic. I'll remove that logic.

geyslan added a commit to geyslan/tracee that referenced this pull request Jul 4, 2023
@NDStrahilevitz NDStrahilevitz left a comment


Needs a cleanup of leftovers. The function move is your style call; otherwise LGTM.

@rafaeldtinoco
Contributor

Can I give a quick look before this is merged?


geyslan commented Jul 5, 2023

Can I give a quick look before this is merged ?

For sure.

e2e 980 is green.


geyslan commented Jul 5, 2023

As discussed offline with @NDStrahilevitz, there are other stages in the pipeline that can also make use of the event pool, such as instantiating derived and sigs events.

geyslan added a commit to geyslan/tracee that referenced this pull request Jul 5, 2023
@geyslan geyslan changed the title perf:(pipeline) memory efficiency using pool pipeline memory efficiency using pool Jul 6, 2023
@rafaeldtinoco rafaeldtinoco left a comment


LGTM. I think this was a great example of a PR. The benchmark was theoretical, yet ingenious. Nice work @geyslan!

@rafaeldtinoco rafaeldtinoco merged commit 9c4dd04 into aquasecurity:main Jul 11, 2023
25 checks passed
@geyslan geyslan deleted the event-pool branch July 31, 2023 22:18