Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

labels/cidr: Memoize labels for already seen prefixes #28465

Merged

Conversation

pippolo84
Copy link
Member

@pippolo84 pippolo84 commented Oct 9, 2023

GetCIDRLabels turns a CIDR into a set of labels representing the cidr itself and all broader CIDRS which include the specified CIDR in them.

In case of ToFQDNs policies matching many names that maps to many IPs with very short TTLs, this function can easily become a hot path.

To improve performance, the PR adds a LRU cache to memoize intermediate results.

See individual commit messages for further details.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Oct 9, 2023
@pippolo84 pippolo84 added area/daemon Impacts operation of the Cilium daemon. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. release-note/misc This PR makes changes that have no direct user impact. labels Oct 9, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Oct 9, 2023
@pippolo84 pippolo84 added kind/performance There is a performance impact of this. dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Oct 9, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Oct 9, 2023
@pippolo84

This comment was marked as outdated.

@pippolo84 pippolo84 force-pushed the pr/pippolo84/cidr-labels-memoization branch 3 times, most recently from 892ef82 to c9d3119 Compare October 9, 2023 09:59
@pippolo84

This comment was marked as outdated.

@pippolo84 pippolo84 force-pushed the pr/pippolo84/cidr-labels-memoization branch 7 times, most recently from 70f8516 to a424d09 Compare October 14, 2023 14:51
@pippolo84 pippolo84 marked this pull request as ready for review October 16, 2023 09:32
@pippolo84 pippolo84 requested review from a team as code owners October 16, 2023 09:32
@pippolo84 pippolo84 force-pushed the pr/pippolo84/cidr-labels-memoization branch 2 times, most recently from bcfd0a0 to 0b5c1d2 Compare October 16, 2023 09:36
@joestringer joestringer merged commit 85aef68 into cilium:main Oct 17, 2023
61 of 62 checks passed
@odinuge
Copy link
Member

odinuge commented Oct 24, 2023

Hi!

It does seem like this introduces some strange behavior. My colleague @sjdot discovered that this change results in excessive numbers of false positive policy drops, for legit traffic that should not be dropped.

By adding this test

diff --git a/pkg/labels/cidr/cidr_test.go b/pkg/labels/cidr/cidr_test.go
index 8ee0dbd605..9a6dd722c4 100644
--- a/pkg/labels/cidr/cidr_test.go
+++ b/pkg/labels/cidr/cidr_test.go
@@ -4,6 +4,7 @@
 package cidr

 import (
+       "fmt"
        "net/netip"
        "testing"

@@ -309,6 +310,18 @@ func BenchmarkLabels_SortedListCIDRIDs(b *testing.B) {
        }
 }

+func TestThisThing(t *testing.T) {
+       for _, ip := range []string{
+               "10.0.0.0/1",
+               "10.0.0.0/3",
+       } {
+               t.Run(ip, func(b *testing.T) {
+                       lbls := GetCIDRLabels(netip.MustParsePrefix(ip))
+                       fmt.Printf("%q: %+v\n", netip.MustParsePrefix(ip), lbls.LabelArray())
+               })
+       }
+}
+
 func BenchmarkIPStringToLabel(b *testing.B) {
        for _, ip := range []string{
                "0.0.0.0/0",

, we get the following output;

=== RUN   TestThisThing
=== RUN   TestThisThing/10.0.0.0/1
"10.0.0.0/1": [cidr:0.0.0.0/0 cidr:0.0.0.0/1 reserved:world-ipv4]
=== RUN   TestThisThing/10.0.0.0/3
"10.0.0.0/3": [cidr:0.0.0.0/0 cidr:0.0.0.0/1 reserved:world-ipv4]
--- PASS: TestThisThing (0.00s)
  --- PASS: TestThisThing/10.0.0.0/1 (0.00s)
  --- PASS: TestThisThing/10.0.0.0/3 (0.00s)
PASS

When disabling the cache, we instead get this that make more sense.

=== RUN   TestThisThing
=== RUN   TestThisThing/10.0.0.0/1
"10.0.0.0/1": [cidr:0.0.0.0/0 cidr:0.0.0.0/1 reserved:world-ipv4]
=== RUN   TestThisThing/10.0.0.0/3
"10.0.0.0/3": [cidr:0.0.0.0/0 cidr:0.0.0.0/1 cidr:0.0.0.0/2 cidr:0.0.0.0/3 reserved:world-ipv4]
--- PASS: TestThisThing (0.00s)
    --- PASS: TestThisThing/10.0.0.0/1 (0.00s)
    --- PASS: TestThisThing/10.0.0.0/3 (0.00s)
PASS

So it overall seems like there is some issue in the way things are cached here, or is this expected..? @pippolo84 @joamaki

@pippolo84
Copy link
Member Author

pippolo84 commented Oct 24, 2023

Hi @odinuge , thank you for bringing this up. 🙏
I confirm this was not expected. We are working on a fix, together with better unit testing.

runtime.ReadMemStats(&m2)

usage := m2.HeapAlloc - m1.HeapAlloc
b.Logf("Memoization map heap usage: %.2f KiB", float64(usage)/1024)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless you were aware, you can also plumb this into the benchmark result to make it easier to use things like benchstat to compare it; https://github.com/cilium/cilium/pull/21895/files#diff-a5084c37ba77e5fe28cbc8d15ffe1e5c312e2daf79c8a363ec52df758818d26dR1091

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the heads up, but I think that in this case it is better to switch to a simple test instead of a benchmark. That's because we want to measure the memory usage when the cache is full and to do that with the benchmark we would need to either:

  • pass benchtime=1x to complete only a single pass
  • reset the cache at each pass as part of the code to benchmark (but this resulted in longer benchmark run, probably for the time needed to stabilize the time measure)

@pippolo84 pippolo84 added release-blocker/1.12 This issue will prevent the release of the next version of Cilium. needs-backport/1.12 release-blocker/1.13 This issue will prevent the release of the next version of Cilium. needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch release-blocker/1.14 This issue will prevent the release of the next version of Cilium. needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Oct 30, 2023
@pippolo84 pippolo84 mentioned this pull request Oct 30, 2023
9 tasks
@pippolo84 pippolo84 added backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. and removed needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Oct 30, 2023
@pippolo84 pippolo84 mentioned this pull request Oct 30, 2023
6 tasks
@pippolo84 pippolo84 added backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. and removed needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Oct 30, 2023
@pippolo84 pippolo84 mentioned this pull request Oct 31, 2023
4 tasks
@pippolo84 pippolo84 added backport-pending/1.12 backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. and removed needs-backport/1.12 backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. labels Oct 31, 2023
@jibi jibi added backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. and removed backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. labels Nov 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/daemon Impacts operation of the Cilium daemon. area/fqdn Affects the FQDN policies feature backport-done/1.12 The backport for Cilium 1.12.x for this PR is done. backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. kind/performance There is a performance impact of this. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-blocker/1.12 This issue will prevent the release of the next version of Cilium. release-blocker/1.13 This issue will prevent the release of the next version of Cilium. release-blocker/1.14 This issue will prevent the release of the next version of Cilium. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants