Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't expand CIDR labels, match smartly in Labels instead #30897

Merged
merged 6 commits into from
Apr 3, 2024

Conversation

squeed
Copy link
Contributor

@squeed squeed commented Feb 21, 2024

CIDR labels are expensive. Very expensive. For an IPv6 prefix, GetCidrLabels() can take over 2 KiB of memory.

This change refactors the label internals so that CIDR labels match when the cidr is larger as well. Then, we no longer have to expand CIDR labels at all; the matching logic handles it correctly.

Note to reviewers: Please review by commit; earlier commits add some test invariants. Another commit lightly refactors the label matching API to try and bring some sanity. Only the final two commits change any logic.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 21, 2024
Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle this seems like a really neat win. I can't currently think of any issues with encoding this into the selectors to identify labels with prefixes that are more specific than the selector.

I didn't review in deep detail beyond the second patch, primarily just for time and because I think you were interested in high level feedback at this time. Perhaps another @cilium/sig-policy member could do further indepth review or I can reschedule a subsequent follow-up review later on.

pkg/policy/selectorcache_selector.go Outdated Show resolved Hide resolved
pkg/policy/selectorcache_selector.go Outdated Show resolved Hide resolved
pkg/labels/array.go Show resolved Hide resolved
pkg/labels/labels.go Outdated Show resolved Hide resolved
@squeed
Copy link
Contributor Author

squeed commented Feb 22, 2024

Probably the biggest issue this change "tickles" is the inconsistency around "directionality" in the Label / Labels / LabelArray APIs. This is because label equality, as written, is non-commutative.

a := ParseLabel("any:foo=bar")
b := ParseLabel("k8s:foo=bar")

// What is this, Javascript?
a.Equals(b) // == true
b.Equals(a) // == false
a == b // false

Oh, and Labels.Equals() doesn't use Label.Equals():

aa := labels.FromSlice({a}); bb := labels.FromSlice({b})
aa.Equals(bb) // == false
bb.Equals(aa) // == false

But LabelArray.Equals() does:

aa.LabelArray().Equals(bb.LabelArray()) // true
bb.LabelArray().Equals(aa.LabelArray()) // false

For more fun, LabelArray.Has() and Labels.Has() are also inverted in direction:

aa.LabelArray().Has(b) // false
aa.Has(b) // true

My eventual goal with all of this:

  • .Equals() is always strict and commutative, for Label, Labels, and LabelArray. In fact, we should get rid of Label.Equals()
  • ["1.1.1.1/32"].Has("1.1.1.0/24") == true for Labels and LabelArray
  • ["k8s:foo=bar"].Has("any:foo=bar") == true for Labels and LabelArray
  • Contains() and Lacks() are consistent with Has() (I believe they are, but this needs to be tested)
  • Nothing breaks 😬

So, the first step is a consistent .Matches() function.

@squeed
Copy link
Contributor Author

squeed commented Feb 22, 2024

One thing this change relies on is the current limitation that CIDR labels don't have a value. We might fail the case for

["cidr:1.1.1.0/24=foo", "cidr:1.1.1.1/32=bar"].Get("cidr:1.1.1.1/32")
["cidr:1.1.1.0/24=foo", "cidr:1.1.1.1/32=bar"].Get("cidr:1.1.1.0/31")

@pippolo84
Copy link
Member

I second what Joe said regarding this being a neat win. I remember thinking about something similar while working on #28788 but got overwhelmed by the potential impact of such change.

As you said, the CIDR labels are way too expensive, and in the IPv6 case, this also limits the effectiveness of the CIDR labels cache introduced in #28788, as it is not possible to get a meaningful hit ratio without a very large memory footprint.
Besides, this seems a lot easier to reason about, compared with the memoization approach of the CIDR cache.

My only concern is the current inconsistencies with the API you already highlighted, but the plan you depicted here seems sound and might also give us a chance to clean this up, besides improving performance. 👍

@squeed squeed added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Feb 26, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 26, 2024
@squeed squeed added release-note/misc This PR makes changes that have no direct user impact. kind/performance There is a performance impact of this. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. and removed release-note/minor This PR changes functionality that users may find relevant to operating Cilium. labels Feb 26, 2024
@squeed squeed marked this pull request as ready for review February 26, 2024 15:57
@squeed squeed requested review from a team as code owners February 26, 2024 15:57
@squeed
Copy link
Contributor Author

squeed commented Feb 26, 2024

/test

@squeed
Copy link
Contributor Author

squeed commented Feb 27, 2024

/test

pkg/labels/array.go Outdated Show resolved Hide resolved
Copy link
Member

@nathanjsweet nathanjsweet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very impressive! Thanks! Two questions.

pkg/labels/labels.go Outdated Show resolved Hide resolved
pkg/labels/labels.go Show resolved Hide resolved
pkg/labels/labels.go Show resolved Hide resolved
@nathanjsweet nathanjsweet added affects/v1.13 This issue affects v1.13 branch affects/v1.14 This issue affects v1.14 branch affects/v1.15 This issue affects v1.15 branch labels Feb 27, 2024
pkg/labels/array.go Show resolved Hide resolved
pkg/labels/array.go Outdated Show resolved Hide resolved
pkg/labels/cidr.go Outdated Show resolved Hide resolved
pkg/labels/labels.go Show resolved Hide resolved
pkg/labels/cidr_test.go Outdated Show resolved Hide resolved
@squeed squeed added this pull request to the merge queue Apr 1, 2024
@squeed squeed removed this pull request from the merge queue due to a manual request Apr 1, 2024
Copy link
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes LGTM and I can't think of any adverse effects from the PR. I am curious though if we have an understanding of what the actual performance impact is with this change. AFAIU, GetCIDRLabels consumed a lot of CPU and memory to generate CIDR labels constantly when the DNS proxy sees many DNS requests, but how much has this PR really changed? Have we done a pprof comparison, for example? I don't doubt that this PR improved things, but it's also useful to quantify how much has changed and worst case, potentially point out a flaw that we didn't detect previously.

pkg/labels/cidr.go Show resolved Hide resolved
@maintainer-s-little-helper maintainer-s-little-helper bot added ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels Apr 1, 2024
@christarazi christarazi removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Apr 1, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Apr 2, 2024
@squeed squeed added dont-merge/discussion A discussion is ongoing and should be resolved before merging, regardless of reviews & tests status. and removed ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels Apr 2, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Apr 2, 2024
We actually rely quite heavliy on the LabelSourceAny mechanism.
EndpointSelectors in CiliumNetworkPolicies always have LabelSourceAny
added.

For example, the block
```yaml
toEndpoints:
  - matchLabels:
      io.kubernetes.pod.namespace: kube-system
      k8s-app: kube-dns
```

converts to the label selector

`{any.io.kubernetes.pod.namespace: kube-system,any.k8s-app: kube-dns,}`

So, explicitly mention this in comments and update the SelectorCache
tests to capture this behavior.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Adding some invariants that should not be broken during coming
refactors.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
This tests adds a very specific invariant that is needed by the policy
engine. Specifically, the expanded set of CIDR labels must always
`.Has()` a CIDR label that contains it.

This will be relevant when we stop expanding CIDR labels and, instead,
logically compute CIDR matching.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
The label matching API is complicated and inconsistent. This change
tries to bring some sanity to the API going forward, without changing
existing behavior.

Label matching is directional / non-communtative. Specifically,
`"any:foo=bar".Equals("k8s:foo=bar")` is true, whereas
`"k8s:foo=bar".Equals("any:foo=bar")` is false.

So, with the eventual goal of removing `Label.Equals()`, this commit
adds a new `Label.Has()` and `Label.HasKey()` api, with clear
documentation around directionality.

The fixed point here is `LabelArray.Has()`, which needs a specific
directionality as required by the k8s label selector library. Everything
else is based off of that.

This also changes `Labels.Has()` to match directionality w.r.t
`any`-source selectors. In theory this is a breaking change, in
actuality `Labels.Has()` is never passed `any` selectors, so this is
moot.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
This changes the Labels API to be CIDR-aware. It then logically
"expands" CIDR labels when computing matches, so that selectors can
match CIDRs even when not present. It does this by parsing CIDRs on
label creation, then checking CIDR overlap in the `MatchesKey()`
function.

The API contract we expose to the policy engine is unchanged:
```
GetCIDRLabels("10.0.0.0/24").LabelArray().Has("cidr.10.0.0.0/8") == true
```

The goal is to stop manually expanding CIDR labels, which is very
inefficient. This will follow in a subsequent commit.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Previously, we would expand a CIDR in to the full set of possible CIDRs
that could select it. For example "1.1.1.1/32" would be expanded in to
[0.0.0.0/0, 0.0.0.0/1, ... 1.1.1.0/31, 1.1.1.0/32]. This causes
significant memory and CPUusage, especially for circumstances such as ToFQDN
policies where many /32 and /128 identities are created.

Now that CIDR selectors are prefix-aware, rather than just string
matches, we can stop generating the complete list of CIDRs. This is safe
because CIDRs labels now select CIDRs that are contained within.

Benchmark results:

                     │ ../bench_main.out │         ../bench_cidr.out
                     │      sec/op       │   sec/op    vs base
UpdateGenerateDNS-12          4.972 ± 2%   2.882 ± 3%  -42.02% (p=0.000 n=10)

                     │ ../bench_main.out │          ../bench_cidr.out
                     │       B/op        │     B/op      vs base
UpdateGenerateDNS-12        77.26Mi ± 0%   24.52Mi ± 0%  -68.26% (p=0.000 n=10)

                     │ ../bench_main.out │          ../bench_cidr.out
                     │     allocs/op     │  allocs/op   vs base
UpdateGenerateDNS-12         508.0k ± 0%   291.7k ± 0%  -42.59% (p=0.000 n

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
@squeed squeed removed dont-merge/discussion A discussion is ongoing and should be resolved before merging, regardless of reviews & tests status. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels Apr 2, 2024
@squeed
Copy link
Contributor Author

squeed commented Apr 2, 2024

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Apr 3, 2024
@squeed squeed added this pull request to the merge queue Apr 3, 2024
Merged via the queue into cilium:main with commit 24dd20d Apr 3, 2024
62 checks passed
@squeed squeed deleted the cidr-label-cover branch April 3, 2024 09:39
@squeed squeed removed the affects/v1.13 This issue affects v1.13 branch label Apr 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects/v1.14 This issue affects v1.14 branch affects/v1.15 This issue affects v1.15 branch kind/performance There is a performance impact of this. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants