repository: shard rules by namespace #27163
base: main
Conversation
This central ruleSlice has for us been one of the main bottlenecks for throughput, and one of the main contributors to CPU usage. With this patch, we now shard it by namespace transparently, so that we don't need to evaluate rules unnecessarily. The main benefit is that when calculating rules for pod X in namespace A, we don't need to evaluate rules that only apply to pods in namespace B. The same applies to rule addition and removal; when a rule is removed in namespace B, we don't need to evaluate rules in namespace A. This is all transparent to users, and an implementation detail, so the sharding mechanism can easily be changed at a later stage.

Based on BenchmarkParseLabel with 10 namespaces/shards, this is the benchstat diff:

```
name          old time/op    new time/op    delta
ParseLabel-7    5.21s ±33%     1.02s ±40%   -80.36%  (p=0.000 n=10+10)

name          old alloc/op   new alloc/op   delta
ParseLabel-7   26.9MB ± 0%    26.9MB ± 0%    -0.33%  (p=0.001 n=9+10)

name          old allocs/op  new allocs/op  delta
ParseLabel-7     520k ± 0%      520k ± 0%    +0.02%  (p=0.000 n=9+9)
```

Signed-off-by: Odin Ugedal <ougedal@palantir.com>
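To make the mechanism concrete, here is a minimal sketch of the sharded layout. The field name `rulesByShard` matches the PR, but the surrounding types are simplified assumptions, not the actual cilium code:

```go
// Sketch only: simplified stand-ins for the real cilium types.
type rule struct{}
type ruleSlice []*rule

// Repository keeps rules in per-namespace shards instead of one flat slice.
type Repository struct {
	// rulesByShard maps a shard key (the policy's namespace, or ""
	// for non-namespaced rules) to the rules in that shard.
	rulesByShard map[string]ruleSlice
}

// add places a rule in the shard for its namespace, so lookups for
// other namespaces never have to visit it.
func (p *Repository) add(shardKey string, r *rule) {
	p.rulesByShard[shardKey] = append(p.rulesByShard[shardKey], r)
}
```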
cc @christarazi. I'll do some more testing, but it would be nice to get some feedback on the overall idea. I think this should reduce our cilium-agent CPU load quite significantly, and it should reduce lock contention as well.
Commit c9919e9f7ae70eb5fbb79abddd5e407e50f26ce1 does not contain "Signed-off-by". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin
Thanks for the PR, this is a great idea!
A few comments on the initial approach.
pkg/policy/repository.go (outdated)

```diff
@@ -139,6 +141,18 @@ type Repository struct {
 	getEnvoyHTTPRules func(certificatemanager.SecretManager, *api.L7Rules, string) (*cilium.HttpNetworkPolicyRules, bool)
 }
 
+// getShardKey returns the key to be used for the sharing of Repository.rulesByShard.
```
Suggested change:

```diff
-// getShardKey returns the key to be used for the sharing of Repository.rulesByShard.
+// getShardKey returns the key to be used for the sharding of Repository.rulesByShard.
```

Subtle typo, or did you mean sharing? If the latter, the sentence could somehow be clarified, as it will potentially confuse other readers if said typo was made. :)
Heh, yeah, sharding it should be!
pkg/policy/repository.go (outdated)

```go
func getShardKey(lbls labels.LabelArray) (string, bool) {
	for _, lbl := range lbls {
		if lbl.Key == k8sConst.PolicyLabelNamespace && lbl.Source == labels.LabelSourceK8s {
			return lbl.Value, true
		}
	}
	return "", false
}
```
AFAICT, this returns the label value, but the function is called getShardKey. So for these labels:

```
k8s:foo=bar
k8s:baz=net
```

the result returned would be bar and net, depending on the order of the iteration. Is this the expected behavior? Wouldn't this cause a problem for [1]?
Good point!

I think it should be fine, given it doesn't really make sense to have more than one label with k8s:io.cilium.k8s.policy.namespace=<xyz>. It is set based on the namespace here:

cilium/pkg/k8s/apis/cilium.io/utils/utils.go, lines 345 to 346 (at 6ba173d):

```go
policyLbls := GetPolicyLabels(namespace, name, uid, resourceType)
return append(policyLbls, ruleLbs...).Sort()
```

(for CNP and NetworkPolicy). For CCNP it's not present, and we can fall back to "".

But maybe users can manually set the spec.Labels field with the same source and key, resulting in two of them. I'll look at that case, and I guess we'll have to handle it properly if that's the case. 😄 The same applies to non-namespaced CCNPs: users could manually add a label to those, "tricking" us into thinking it's a namespaced policy when it's not. As I understand it, this should not affect what the policy applies to, just how it's stored in this policy "repo".
Ah yeah, it's a strange edge case. Maybe we can just document that if there is ever more than one namespace label, the one that appears first will be used.
If Cilium prepends the namespace label and we use the first one, then there may be no need to document it at all?
On one hand, a threat actor who could set these labels is a fairly privileged Kubernetes API server attacker; if they can customize the labels of a policy, they already have significant privileges to allow actual traffic. On the other hand, if they only have access to policies in one namespace, they shouldn't be able to "poison" policy shards for other namespaces, even if the result is only some form of additional policy cost (a cost which is currently borne by all users in all namespaces in current versions of Cilium).
pkg/policy/repository.go (outdated)

```go
key, ok := getShardKey(lbls)
relevantShards := p.rulesByShard
if ok {
	relevantShards = map[string]ruleSlice{
		key: p.rulesByShard[key],
	}
}
```
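For context, the excerpt above narrows the lookup to a single shard when a namespace label is present; a simplified sketch of how the surrounding code might then consume relevantShards (the loop body is an assumption, not the actual PR code):

```go
// Iterate only the shards that can possibly match: the single shard
// for the rule's namespace, or all shards when no namespace label was
// found (e.g. cluster-wide policies).
for _, shard := range relevantShards {
	for _, r := range shard {
		// ... evaluate r against the search context ...
		_ = r
	}
}
```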
[1]
Neat proposal! A couple of brief comments from an initial look.

Also, the danger here would be a bug for some corner case of rule format that somehow ends up not getting selected by the rule engine, and hence the policy doesn't apply. I don't have a good grasp on the corner cases, but it'd be nice to see some testing to evaluate those cases. One basic one would be to continue to ensure that CCNPs that select all namespaces will apply to all Pods; see the test sketch below.
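To make that concrete, such a test could look roughly like the following; the helper names are hypothetical, not the cilium test API:

```go
// Hypothetical regression test: a cluster-wide policy (no namespace
// label) must keep matching pods in every namespace after sharding.
func TestClusterwidePolicyAppliesToAllNamespaces(t *testing.T) {
	repo := newTestRepository()             // hypothetical helper
	repo.addClusterwideRule(allowAllRule()) // should land in the "" shard

	for _, ns := range []string{"ns-a", "ns-b"} {
		if !repo.matchesPodInNamespace(ns) { // hypothetical helper
			t.Fatalf("cluster-wide rule did not apply in namespace %s", ns)
		}
	}
}
```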
```diff
@@ -109,7 +109,9 @@ func (p *policyContext) SetDeny(deny bool) bool {
 type Repository struct {
 	// Mutex protects the whole policy tree
 	Mutex lock.RWMutex
-	rules ruleSlice
+	// Internally we shard the rule list to improve performance and maximize throughput
```
I think that this is specifically targeted for policy update performance, is that correct? Also, what's the difference between improving update performance vs. maximizing throughput? Aren't they the same thing?
Suggested change:

```diff
-	// Internally we shard the rule list to improve performance and maximize throughput
+	// Internally we shard the rule list to improve update performance
```
Ah, based on the changes in getMatchingRules(), it looks like this should also improve policy evaluation performance by excluding rule iteration for namespaces that are known to be different from the "current" namespace under iteration. So maybe 'update and evaluation performance'?
Yeah, this started as an experiment where I wanted to improve the policy addition/update/deletion flow, but yeah, it will also reduce policy evaluation time as well! I'll try updating the description when we all agree on the approach and implementation. 😄
Thanks for looking! And thanks for the initial feedback (both @joestringer and @christarazi)!
Yeah, this is my main concern now, so I really want to poke at all those. My main concern is related to #27163 (comment), where users can add extra "namespace" labels to the spec.Labels field.
TODO: will squash when time comes. Signed-off-by: Odin Ugedal <ougedal@palantir.com>
SGTM. Also, when we've narrowed down the user value, it'll be worth updating the release note to say something about improving policy processing performance.
key := "" | ||
matched := false | ||
for _, lbl := range lbls { | ||
if lbl.Key == k8sConst.PolicyLabelNamespace && lbl.Source == labels.LabelSourceK8s { | ||
// If we find two, ignore then and put them into the non-namespaced for easier housekeeping | ||
if matched { | ||
return "", false | ||
} | ||
key = lbl.Value | ||
matched = true | ||
} | ||
} | ||
return key, matched |
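A quick test sketch of the intended fallback behavior; the test itself is an assumption, with names following the thread:

```go
func TestGetShardKey(t *testing.T) {
	nsLabel := func(ns string) labels.Label {
		return labels.NewLabel(k8sConst.PolicyLabelNamespace, ns, labels.LabelSourceK8s)
	}

	// A single namespace label yields that namespace as the shard key.
	if key, ok := getShardKey(labels.LabelArray{nsLabel("ns-a")}); !ok || key != "ns-a" {
		t.Fatalf("got (%q, %v), want (\"ns-a\", true)", key, ok)
	}

	// Two namespace labels fall back to the non-namespaced shard.
	if key, ok := getShardKey(labels.LabelArray{nsLabel("ns-a"), nsLabel("ns-b")}); ok || key != "" {
		t.Fatalf("got (%q, %v), want (\"\", false)", key, ok)
	}
}
```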
The logic in getNamespaceFromLabels and getShardKey seems duplicative; it might be worth implementing one in terms of the other.
key := "" | |
matched := false | |
for _, lbl := range lbls { | |
if lbl.Key == k8sConst.PolicyLabelNamespace && lbl.Source == labels.LabelSourceK8s { | |
// If we find two, ignore then and put them into the non-namespaced for easier housekeeping | |
if matched { | |
return "", false | |
} | |
key = lbl.Value | |
matched = true | |
} | |
} | |
return key, matched | |
if key := getNamespaceFromLabels(lbls); key != "" { | |
return key, true | |
} | |
return "", false |
Sadly, those two functions use different labels; there is one label that is present in endpoint selectors (io.kubernetes.pod.namespace), while there is another one in the rule.Labels field (io.cilium.k8s.policy.namespace).
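For reference, a sketch of the two distinct label keys in play; the constant names below are illustrative, not the actual cilium identifiers:

```go
const (
	// Present in endpoint selectors, identifying the pod's namespace.
	podNamespaceLabel = "io.kubernetes.pod.namespace"
	// Present in rule.Labels, identifying the policy's namespace.
	policyLabelNamespace = "io.cilium.k8s.policy.namespace"
)
```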
I was thinking we could reuse the k8s label search logic by adjusting the function definition for reuse. The refactor I had in mind is a bit more than fits in a GitHub suggestion, so I created odinuge#94.
This pull request has been automatically marked as stale because it has not had recent activity.
I would still like to see this PR move forward.
This pull request has been automatically marked as stale because it has not had recent activity.
Hey! Yes, will rebase and fix this as soon as we get #30338 sorted. :)
This pull request has been automatically marked as stale because it has not had recent activity.
@odinuge I'm picking this up to try and get it merged for v1.16.
The easy alternative is reverting 4cb0940, but I'm not too sure how happy people would be with that. IMO, we should not introduce more "indexes" here until we are 100% sure the current state handles all the edge cases, with tests ensuring that is the case.
This seems like a good chance to convert the PolicyRepository to StateDB over the coming months.
When converting more critical components over to StateDB, we'll need to exercise a lot of care and restraint. I'm open to the idea, but we should roll StateDB out incrementally through the tree, starting with non-critical components and moving towards critical ones, so we can ensure correctness and properly understand the performance characteristics.
Also, @odinuge, it looks like what this PR does is improve the time to insert/replace rules in the repository, and it doesn't particularly improve endpoint validation. Which is fine, but the description is misleading ;-). From your tests, which is more critical: rule addition/deletion time, or endpoint policy calculation?
I filed #32703, which was inspired by this but takes a different tack.
This pull request has been automatically marked as stale because it has not had recent activity.
TBD: Still some stuff missing here, so this is mostly a proof-of-concept.
The more rules, the bigger the gain will be! We also see that the locking in (*Repository).getMatchingRules takes a huge chunk of CPU time, as well as causing a lot of lock contention.
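A rough sketch of the kind of micro-benchmark this refers to; the helpers and shape are assumptions (BenchmarkParseLabel in the cilium tree is the real one):

```go
// Hypothetical benchmark: resolve rules for endpoints spread across
// 10 namespaces, so per-namespace sharding can show its effect.
func BenchmarkResolveAcrossNamespaces(b *testing.B) {
	repo := newTestRepository(10 /* namespaces */, 1000 /* rules each */) // hypothetical helper
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		ns := fmt.Sprintf("ns-%d", i%10)
		repo.resolveRulesForNamespace(ns) // hypothetical stand-in for the real lookup
	}
}
```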