
pipeline memory efficiency using pool #3297

Merged
merged 1 commit into aquasecurity:main from event-pool on Jul 11, 2023

Conversation


@geyslan geyslan commented Jul 3, 2023

1. Explain what the PR does

04bf853 perf:(pipeline) memory efficiency using pool (2023/jul/01) Geyslan Gregório <geyslan@gmail.com>

This commit employs sync.Pool to bolster memory performance in event
handling. The benchmarking file, events_pipeline_bench_test.go,
simulates the pipeline execution and measures sync.Pool's effectiveness.

Benchmark command:

go test \
  -benchmem -benchtime=100x \
  -run=^$ -bench ^(BenchmarkGetEventFromPool|BenchmarkNewEventObject)$ \
  github.com/aquasecurity/tracee/pkg/ebpf

Findings:

Environment:
- OS: Linux
- Arch: amd64
- Package: github.com/aquasecurity/tracee/pkg/ebpf
- CPU: Intel Core i7-12700H

Benchmarking Results:

BenchmarkGetEventFromPool (with pooling):
- Time per Op: 1740.88 ms
- Memory per Op: 243.5 KB
- Allocations per Op: 543

BenchmarkNewEventObject (without pooling):
- Time per Op: 1754.69 ms
- Memory per Op: 896.2 KB
- Allocations per Op: 2001

Using object pooling reduces memory allocation and allocation count
by ~3.7x.

In conclusion, sync.Pool delivers a substantial memory optimization with
no meaningful impact on runtime. This matters in a high-throughput
pipeline, since fewer allocations ease garbage-collection pressure and
improve resource efficiency.

For more information, see the benchmarking results of tracee actually
running on #3297.

2. Explain how to test it

3. Other comments


geyslan commented Jul 3, 2023

Since it's difficult to call decodeEvents() directly without simulating all prior steps, the benchmark tests were implemented in a separate file, events_pipeline_bench_test.go, which simulates the pipeline execution and measures the performance impact of sync.Pool.

Results:

go test \
  -benchmem -benchtime=10000x -cpu=1 \
  -run=^$ -bench ^(BenchmarkEventPool|BenchmarkEventNew)$ \
  github.com/aquasecurity/tracee/pkg/ebpf
| Test # | EventPool ns/op | EventNew ns/op | % Improvement in Time | EventPool B/op | EventNew B/op | % Improvement in Memory | EventPool allocs/op | EventNew allocs/op |
|---|---|---|---|---|---|---|---|---|
| 1 | 1,178,529 | 1,184,807 | +0.53% | 26 | 448 | +94.20% | 0 | 1 |
| 2 | 1,187,255 | 1,175,401 | -1.01% | 26 | 448 | +94.20% | 0 | 1 |
| 3 | 1,186,611 | 1,169,843 | -1.43% | 26 | 448 | +94.20% | 0 | 1 |
| 4 | 1,184,051 | 1,196,323 | +1.03% | 26 | 448 | +94.20% | 0 | 1 |
| 5 | 1,180,822 | 1,182,371 | +0.13% | 26 | 448 | +94.20% | 0 | 1 |
| 6 | 1,181,872 | 1,186,390 | +0.38% | 26 | 448 | +94.20% | 0 | 1 |

@geyslan geyslan requested a review from yanivagman July 3, 2023 18:10
@geyslan geyslan force-pushed the event-pool branch 2 times, most recently from 19adaff to a17cc3f Compare July 3, 2023 18:27

geyslan commented Jul 3, 2023

Accumulating results over 500 * N iterations (-benchtime=100x), which keeps the simulated pipeline warm, e.g.:

```go
go func() {
	for i := 0; i < b.N; i++ {
		for j := 0; j < 500; j++ {
			ctxChan <- &ctx
		}
	}
}()
```
| Benchmark Name | Time per Operation (ns/op) | Total Time Elapsed (s) | Memory Allocation (B/op) | Memory Allocation Improvement (%) | Number of Allocations (allocs/op) | Allocation Improvement (%) |
|---|---|---|---|---|---|---|
| BenchmarkEventNew | 601,467,798 | 60.15 | 224,007 | Reference | 500 | Reference |
| BenchmarkEventPool | 598,330,196 | 59.83 | 2,649 | ~98.8% improvement | 0 | 100% improvement |

And multi-core:

| Benchmark Name | Time per Operation (ns/op) | Total Time Elapsed (s) | Memory Allocation (B/op) | Memory Allocation Improvement (%) | Number of Allocations (allocs/op) | Allocation Improvement (%) |
|---|---|---|---|---|---|---|
| BenchmarkEventNew-20 | 599,962,102 | 60.00 | 224,150 | Reference | 501 | Reference |
| BenchmarkEventPool-20 | 594,689,167 | 59.47 | 1,788 | ~99.2% improvement | 1 | ~99.8% improvement |


NDStrahilevitz commented Jul 4, 2023

Nice work! I toyed with this optimization before but wasn't able to benchmark a difference, so it's nice to see it has an effect. If and when memory arenas (added in Go 1.20) are promoted from experimental to the standard library, we may be able to further optimize event allocations.

Edit: Would it be possible to add before and after pprofs? With the pyroscope makefile they should be relatively easy to generate.

@NDStrahilevitz NDStrahilevitz self-requested a review July 4, 2023 06:10

@NDStrahilevitz NDStrahilevitz left a comment


I see that the pool isn't passed to some of the stages (like sorting, enrichment, etc.), I assume because events aren't dropped there, right? Have you tested with these additional pipeline stages (at least enrichment, since we enable it by default in Kubernetes)?


geyslan commented Jul 4, 2023

> I see that the pool isn't passed to some of the stages (like sorting, enrichment, etc.), I assume because events aren't dropped there, right?

Exactly.

> Have you tested with these additional pipeline stages (at least enrichment, since we enable it by default in Kubernetes)?

So far, I've only simulated (roughly) the pipeline with events_pipeline_bench_test.go. As you asked, I'll generate pyroscope outputs to take a look at it in the wild.



geyslan commented Jul 4, 2023

The previous test was tainted: warming up the event pool is useless because we waited 90s after tracee started before emitting events, and the pool cools down as soon as GC is triggered in the meantime.

So, let's test without warming the pool and with different sleeps.

Test

#!/bin/bash

sleep 10 # cool down after building

sudo ./dist/tracee --metrics --pprof --pyroscope -f e=sched_process_exec -f comm=who -o none &

sleep 5

i=0; while ((i < 700000)); do who; ((i++)) ; done

sleep 30

sudo pkill tracee

main branch

[pprof/pyroscope memory profile screenshot]

event-pool branch

[pprof/pyroscope memory profile screenshot]

Results

  • Fewer mallocs: 1 - (38707896/39296721) ≈ 1.49% reduction
  • Heap (in use and objects) seems to be better utilized on average:
    • Lower malloc rate: 1 - (50049/55230) ≈ 9.38% reduction
    • More heap objects in use: (206973/182242) - 1 ≈ +13.57% (good in this case)
  • GC pressure data (avg) appears to be a bit lower: 53.3MB
  • GC pause time (avg) shows an improvement: 1 - (8.4/9.47) ≈ 11.29% reduction

The malloc, free, and heap graphs show less volatility, which is the purpose of using the event pool.

So, warming the event pool doesn't make sense, since event arrival isn't deterministic. I'll remove that logic.

geyslan added a commit to geyslan/tracee that referenced this pull request Jul 4, 2023
@NDStrahilevitz NDStrahilevitz left a comment


Needs a cleanup of leftovers. The function move is your style call; otherwise LGTM.

@rafaeldtinoco
Contributor

Can I give a quick look before this is merged?


geyslan commented Jul 5, 2023

Can I give a quick look before this is merged ?

For sure.

e2e 980 is green.


geyslan commented Jul 5, 2023

As discussed offline with @NDStrahilevitz, there are other stages in the pipeline that can also make use of the event pool, such as instantiating derived and sigs events.

geyslan added a commit to geyslan/tracee that referenced this pull request Jul 5, 2023
@geyslan geyslan changed the title perf:(pipeline) memory efficiency using pool pipeline memory efficiency using pool Jul 6, 2023
@rafaeldtinoco rafaeldtinoco left a comment


LGTM. I think this was a great example of a PR. The benchmark was theoretical, yet ingenious. Nice work @geyslan!

@rafaeldtinoco rafaeldtinoco merged commit 9c4dd04 into aquasecurity:main Jul 11, 2023
25 checks passed
@geyslan geyslan deleted the event-pool branch July 31, 2023 22:18