daemon, fqdn: Add flag to control FQDN regex LRU size #19383
Conversation
```go
b.Skip()

bb, err := ioutil.ReadFile("testdata/cnps-large.yaml")
if err != nil {
	b.Fatal(err)
}
```
For now this benchmark will be skipped until we have a proper fixture for CNPs checked in.
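The skip guard discussed above can be sketched as a small helper. This is a hypothetical `loadFixture` function (not Cilium's code) showing how a benchmark could skip itself until the CNP fixture is checked in; the fixture path is the one from the diff.

```go
package main

import (
	"fmt"
	"os"
)

// loadFixture returns the fixture bytes and true if the file exists.
// A Go benchmark can call b.Skip() when ok is false, instead of an
// unconditional skip.
func loadFixture(path string) (data []byte, ok bool) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, false
	}
	return data, true
}

func main() {
	if _, ok := loadFixture("testdata/cnps-large.yaml"); !ok {
		fmt.Println("fixture missing, benchmark would be skipped")
	}
}
```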
/test

Job 'Cilium-PR-K8s-1.21-kernel-5.4' failed. If it is a flake and a GitHub issue doesn't already exist to track it, comment accordingly.
/test-1.23-net-next

Job 'Cilium-PR-K8s-GKE' failed. If it is a flake and a GitHub issue doesn't already exist to track it, comment accordingly. Edit: hit #16159
/test-1.21-5.4
Will push shortly to resolve the conflict. Noting that CI has passed besides known flakes.
Advanced users can configure the LRU size for the cache holding the compiled regular expressions of FQDN match{Pattern,Name}. This is useful if users are experiencing high memory usage spikes with many FQDN policies that have repeated matchPattern or matchName across many different policies.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
When comparing the efficiency of increasing the LRU size from 128 to 1024 with ~22k CNPs, we see the following results:

```
# LRU size 128.
$ go test -tags privileged_tests -v -run '^$' -bench Benchmark_perEPAllow_setPortRulesForID_large -benchmem -benchtime 1x -memprofile memprofile.out ./pkg/fqdn/dnsproxy > old.txt

# LRU size 1024.
$ go test -tags privileged_tests -v -run '^$' -bench Benchmark_perEPAllow_setPortRulesForID_large -benchmem -benchtime 1x -memprofile memprofile.out ./pkg/fqdn/dnsproxy > new.txt

$ benchcmp old.txt new.txt
benchcmp is deprecated in favor of benchstat: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat
benchmark                                        old ns/op      new ns/op      delta
Benchmark_perEPAllow_setPortRulesForID_large-8   3954101340     3010934555     -23.85%

benchmark                                        old allocs     new allocs     delta
Benchmark_perEPAllow_setPortRulesForID_large-8   26480632       24167742       -8.73%

benchmark                                        old bytes      new bytes      delta
Benchmark_perEPAllow_setPortRulesForID_large-8   2899811832     1824062992     -37.10%
```

Here's the raw test run with LRU size at 128:

```
Before (N=1) Alloc = 31 MiB  HeapInuse = 45 MiB  Sys = 1260 MiB NumGC = 15
After  (N=1) Alloc = 445 MiB HeapInuse = 459 MiB Sys = 1260 MiB NumGC = 40
```

Here's the raw test run with LRU size at 1024:

```
Before (N=1) Alloc = 31 MiB  HeapInuse = 48 MiB  Sys = 1177 MiB NumGC = 17
After  (N=1) Alloc = 78 MiB  HeapInuse = 93 MiB  Sys = 1177 MiB NumGC = 53
```

We can see that it's saving ~300 MB. Furthermore, if we compare the memory profiles from the benchmark run via

```
go tool pprof -http :8080 -diff_base memprofile.out memprofile.1024.out
```

we see an ~800 MB reduction in the regex compilation.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
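For illustration, the caching idea behind the flag can be sketched as a minimal LRU of compiled regexes. This is a self-contained sketch, not Cilium's implementation (which uses a shared LRU library); the type and method names here are hypothetical.

```go
package main

import (
	"container/list"
	"fmt"
	"regexp"
)

// regexLRU caches compiled regexes up to a fixed capacity, evicting the
// least recently used entry when full. Repeated matchPattern/matchName
// strings across policies then compile only once.
type regexLRU struct {
	cap   int
	order *list.List               // front = most recently used
	items map[string]*list.Element // pattern -> element in order
}

type entry struct {
	pattern string
	re      *regexp.Regexp
}

func newRegexLRU(capacity int) *regexLRU {
	return &regexLRU{
		cap:   capacity,
		order: list.New(),
		items: make(map[string]*list.Element),
	}
}

// Get returns the cached compiled regex for pattern, compiling and
// caching it on a miss.
func (c *regexLRU) Get(pattern string) (*regexp.Regexp, error) {
	if el, ok := c.items[pattern]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry).re, nil
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	if c.order.Len() >= c.cap {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).pattern)
	}
	c.items[pattern] = c.order.PushFront(&entry{pattern: pattern, re: re})
	return re, nil
}

func main() {
	c := newRegexLRU(2)
	re, _ := c.Get(`(^|\.)example\.com$`)
	fmt.Println(re.MatchString("foo.example.com"))
	c.Get(`^api\.`)
	c.Get(`^db\.`) // capacity 2: evicts the example.com pattern
	fmt.Println(len(c.items))
}
```

A larger capacity trades a few extra MB of cache for avoiding repeated `regexp.Compile` calls, which is where the benchmark's allocation savings come from.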
Following the previous commit's benchmark result, let's update the LRU default size to be 1024, given that it only results in a few tens of MB increase when the cache nears full.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Force-pushed from c8cb648 to 676e87c (compare).
Travis passed as a sanity check for the rebase. Marking ready to merge given #19383 (comment).
Previously, the printed memory values depended greatly on both the size of the loaded CNPs in memory and on when the GC last ran. That favored solutions that allocated a lot of memory at the start of the test (but still inside the actual testing section), while solutions allocating a lot of memory at the end were punished, even if they removed all references to it, due to the fact that we printed the memory stats before forcing a GC.

For heap, the diff between "Before Setup" and "After Test" should be inspected, since "Before Test" has all the CNPs in memory. When we hit "After Test", those CNPs have been garbage collected.

New output:

```
root@10fc3f693c5d:/code# go test -tags privileged_tests -v -run '^$' -bench Benchmark_perEPAllow_setPortRulesForID_large -benchmem -benchtime 1x ./pkg/fqdn/dnsproxy
goos: linux
goarch: amd64
pkg: github.com/cilium/cilium/pkg/fqdn/dnsproxy
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
Benchmark_perEPAllow_setPortRulesForID_large
Before Setup (N=1,EPs=20,cache=128) Alloc = 3 MiB   HeapInuse = 5 MiB   Sys = 20 MiB   NumGC = 6
Before Test  (N=1,EPs=20,cache=128) Alloc = 32 MiB  HeapInuse = 50 MiB  Sys = 987 MiB  NumGC = 16
After Test   (N=1,EPs=20,cache=128) Alloc = 341 MiB HeapInuse = 374 MiB Sys = 987 MiB  NumGC = 38
Before Setup (N=1,EPs=20,cache=128) Alloc = 3 MiB   HeapInuse = 6 MiB   Sys = 987 MiB  NumGC = 40
Before Test  (N=1,EPs=20,cache=128) Alloc = 32 MiB  HeapInuse = 56 MiB  Sys = 1020 MiB NumGC = 49
After Test   (N=1,EPs=20,cache=128) Alloc = 344 MiB HeapInuse = 378 MiB Sys = 1020 MiB NumGC = 71
Benchmark_perEPAllow_setPortRulesForID_large-7    1    4301736800 ns/op    3047525016 B/op    26526920 allocs/op
PASS
ok      github.com/cilium/cilium/pkg/fqdn/dnsproxy      20.796s
```

Fixes: cilium#19383
Fixes: 38c0036 ("dnsproxy: Add benchmark for large FQDN-based CNPs")

Signed-off-by: Odin Ugedal <ougedal@palantir.com>
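The fix the commit above describes can be sketched in a few lines. This is a minimal standalone sketch (not Cilium's code, and `printMemStats` is a hypothetical name): force a GC before sampling `runtime.MemStats`, so the printed numbers reflect live memory rather than garbage awaiting collection.

```go
package main

import (
	"fmt"
	"runtime"
)

// printMemStats forces a GC before reading memory stats, making the
// readings comparable regardless of when the GC last ran on its own.
func printMemStats(label string) runtime.MemStats {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("%s Alloc = %d MiB HeapInuse = %d MiB Sys = %d MiB NumGC = %d\n",
		label, m.Alloc>>20, m.HeapInuse>>20, m.Sys>>20, m.NumGC)
	return m
}

var sink []byte // package-level so the allocation stays live until dropped

func main() {
	printMemStats("Before Test")
	sink = make([]byte, 64<<20) // simulate the test body allocating 64 MiB
	printMemStats("During Test")
	sink = nil // drop the reference; the forced GC reclaims it below
	printMemStats("After Test")
}
```

Without the `runtime.GC()` call, the "After Test" line would still count the dropped 64 MiB if the collector happened not to have run yet, which is exactly the bias this commit removes.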
Advanced users can configure the LRU size for the cache holding the
compiled regular expressions of FQDN match{Pattern,Name}. This is useful
if users are experiencing high memory usage spikes with many FQDN
policies that have repeated matchPattern or matchName across many
different policies.
Please see second commit msg for the performance benchmark results.
Signed-off-by: Chris Tarazi <chris@isovalent.com>
Related: #17224