dnsproxy: Improve regex pattern #20246
0830dc1 to 311ed71
311ed71 to caed21b
Thanks for the PR! It would be great if we could get benchmarking code added to the tests. There should already be a framework that tests at a high level inside `pkg/fqdn/dnsproxy/proxy_test.go`. However, it would be good to add more piecemeal benchmarks corresponding to the code changes in this PR.
`Benchmark_perEPAllow_setPortRulesForID_large` is skipped by default, but if you remove the `t.Skip()`, add the CNPs as YAML to `testdata/cnps-large.yaml`, and run `go test`, you should be able to get some results.
Hey @christarazi! Thanks! I did the initial testing on the v1.9 release branch, since that's what we are using at the moment, so I didn't see those benchmarks. I don't have a set of policies at hand right now; I've just been creating a set of domains to push through. Do you have the CNP list you were using in that PR? Also, I discussed this with a colleague, and we decided to try using a set (essentially a
Did a quick test now with this PR vs. master, with an LRU size of 128 and the same dataset that @christarazi used in his initial testing:
The heap usage is clearly much lower with this PR. It still needs some tweaking, though. I will try to grab a bigger dataset and check the diff.
Thanks for the PR, this is a great step towards a massive performance improvement in the Agent!
I understand this is a WIP :), and I have several comments about the code, but overall the idea is sound!
@odinuge I've marked the PR as draft since it's still WIP. Feel free to mark it as ready for review once it's done. Thank you!
caed21b to 34533b8
Thanks @aanm! And thanks @christarazi for the review. I did an overall cleanup, but as discussed AFK, we will look at #20396 first. Sneak peek of the benchmarks, where the heap size goes from ~1800MiB to ~550MiB:
With this PR:
ed8b2cb to b240780
One thing I just realized: once this lands, will it change the behavior of existing policies? Could it break existing matchPattern rules in user environments?
/test
Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed.
Hi, thanks!
This should be a no-op for users, since what the regex matches is the same; it's just the regex pattern that is different. It should also be fully backwards compatible with older versions with the restore feature (unless I'm missing anything obvious). I think the unit tests here and here should cover all cases and make sure that the end-user behavior is the same.
This pull request has been automatically marked as stale because it |
b240780 to 605d95b
/test
This now wraps the overall regex in anchors and keeps the distinct rules/FQDNs as normal non-anchored regexes. It also removes the regex groups since we don't use them. This makes the final regex much smaller (the heap in the benchmark is ~20% smaller) and makes it more performant when used for matching.

    $ benchstat <pre-this-commit> <this-commit>
    name                             old time/op          new time/op          delta
    _perEPAllow_setPortRulesForID-7  77.4s ± 8%           84.8s ±21%           ~        (p=0.310 n=5+5)

    name                             old B(HeapInUse)/op  new B(HeapInUse)/op  delta
    _perEPAllow_setPortRulesForID-7  97.3M ± 1%           78.3M ± 1%           -19.51%  (p=0.008 n=5+5)

    name                             old alloc/op         new alloc/op         delta
    _perEPAllow_setPortRulesForID-7  78.6GB ± 0%          77.2GB ± 0%          -1.86%   (p=0.008 n=5+5)

    name                             old allocs/op        new allocs/op        delta
    _perEPAllow_setPortRulesForID-7  167M ± 0%            113M ± 0%            -32.47%  (p=0.008 n=5+5)

Signed-off-by: Odin Ugedal <ougedal@palantir.com>
If there is a wildcard rule in the ruleset, return a regex matching everything early. This will reduce the size in memory and improve the performance of the dnsproxy with such rules in general.

    $ benchstat <pre-this-commit> <this-commit>
    name                             old time/op          new time/op          delta
    _perEPAllow_setPortRulesForID-7  84.8s ±21%           83.0s ±17%           ~        (p=0.690 n=5+5)

    name                             old B(HeapInUse)/op  new B(HeapInUse)/op  delta
    _perEPAllow_setPortRulesForID-7  78.3M ± 1%           74.2M ± 1%           -5.27%   (p=0.008 n=5+5)

    name                             old alloc/op         new alloc/op         delta
    _perEPAllow_setPortRulesForID-7  77.2GB ± 0%          73.0GB ± 0%          -5.42%   (p=0.008 n=5+5)

    name                             old allocs/op        new allocs/op        delta
    _perEPAllow_setPortRulesForID-7  113M ± 0%            108M ± 0%            -4.28%   (p=0.008 n=5+5)

Signed-off-by: Odin Ugedal <ougedal@palantir.com>
If we know the regex matches all input, we don't really need to run the matching. This will reduce the overall memory usage and allocations in this code path. Signed-off-by: Odin Ugedal <ougedal@palantir.com>
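One way to skip the matching for a known match-all regex is to record that fact when the matcher is built. The sketch below assumes a sentinel pattern string and a small wrapper type; `matchAllPattern`, `nameMatcher`, `newNameMatcher` and `allows` are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchAllPattern is a hypothetical sentinel meaning "allow every name".
const matchAllPattern = "(?:)"

// nameMatcher remembers whether its pattern is the match-all sentinel so the
// hot path can answer without touching the regex engine.
type nameMatcher struct {
	matchAll bool
	re       *regexp.Regexp
}

// newNameMatcher only compiles the pattern when it is not the sentinel.
func newNameMatcher(pattern string) (nameMatcher, error) {
	if pattern == matchAllPattern {
		return nameMatcher{matchAll: true}, nil
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nameMatcher{}, err
	}
	return nameMatcher{re: re}, nil
}

// allows short-circuits for match-all rules and falls back to MatchString
// for everything else.
func (m nameMatcher) allows(name string) bool {
	if m.matchAll {
		return true
	}
	return m.re.MatchString(name)
}

func main() {
	all, _ := newNameMatcher(matchAllPattern)
	one, _ := newNameMatcher(`^(?:cilium[.]io[.])$`)
	fmt.Println(all.allows("example.com."), one.allows("example.com."), one.allows("cilium.io."))
}
```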
Also switch to %q to get quotes around it as well. Signed-off-by: Odin Ugedal <ougedal@palantir.com>
Reuse the escapeRegexpCharacters func, which implements the same escaping, to avoid duplication. Signed-off-by: Odin Ugedal <ougedal@palantir.com>
605d95b to 562ef20
/test
Travis failed with #23314
Thanks @aanm! We deployed this to production here over the last few weeks, and on the hosts with the biggest L7 rulesets and the most DNS queries, including a
TL;DR: This is now V3 of #20214.
The full list of changes can be found in the commits (though this is based off #21288):
.MatchString on wildcard regexes.
The overall perf change for all commits in this PR vs. #21288 (all without the LRU cache; ignore the time/op, since the benchmarks are running in Docker for Mac 😢):
After this we plan on doing:
- GeneratePattern
- map[string]struct{} for matchNames.
- [.] with \. to reduce the regex size
- * replacement from [-a-zA-Z0-9_]* to [^.]*