
repository: shard rules by namespace #27163

Draft · wants to merge 3 commits into main
Conversation

odinuge
Member

@odinuge odinuge commented Jul 31, 2023

This central ruleSlice has been one of our main bottlenecks for throughput, and one of the main contributors to CPU usage. With this patch, we now shard it by namespace transparently, so that we don't evaluate rules unnecessarily. The main benefit is that when calculating rules for pod X in namespace A, we don't need to evaluate rules that only apply to pods in namespace B. The same applies to rule addition and removal; when a rule is removed in namespace B, we don't need to evaluate rules in namespace A.

This is all transparent to users and is an implementation detail, so the sharding mechanism can easily be changed at a later stage.

TBD: some pieces are still missing here, so this is mostly a proof of concept.

Based on BenchmarkParseLabel with 10 namespaces/shards, this is the benchstat diff:

name          old time/op    new time/op    delta
ParseLabel-7     5.21s ±33%     1.02s ±40%  -80.36%  (p=0.000 n=10+10)

name          old alloc/op   new alloc/op   delta
ParseLabel-7    26.9MB ± 0%    26.9MB ± 0%   -0.33%  (p=0.001 n=9+10)

name          old allocs/op  new allocs/op  delta
ParseLabel-7      520k ± 0%      520k ± 0%   +0.02%  (p=0.000 n=9+9)

The more rules, the bigger the gain will be! We also see that the locking in (*Repository).getMatchingRules accounts for a huge chunk of CPU time, as well as a lot of lock contention.
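
For illustration, here is a minimal, self-contained sketch of the idea in Go. It is not the actual repository code: the Rule and shardedRules types and the helper names are made up for this example. Rules are stored per namespace, with the empty key "" holding non-namespaced (cluster-wide) rules, and lookups for a pod only touch its own shard plus the cluster-wide one.

package main

import "fmt"

// Rule is a stand-in for a policy rule carrying its namespace ("" = cluster-wide).
type Rule struct {
	Name      string
	Namespace string
}

// shardedRules keeps one slice of rules per namespace shard.
type shardedRules map[string][]Rule

// add places the rule into its namespace shard (or the "" shard for cluster-wide rules).
func (s shardedRules) add(r Rule) {
	s[r.Namespace] = append(s[r.Namespace], r)
}

// remove drops a rule by name, touching only its own shard.
func (s shardedRules) remove(r Rule) {
	shard := s[r.Namespace]
	for i, existing := range shard {
		if existing.Name == r.Name {
			s[r.Namespace] = append(shard[:i], shard[i+1:]...)
			return
		}
	}
}

// forNamespace returns only the rules that can apply to a pod in ns:
// its own shard plus the cluster-wide "" shard; other namespaces are never visited.
func (s shardedRules) forNamespace(ns string) []Rule {
	out := append([]Rule{}, s[ns]...)
	if ns != "" {
		out = append(out, s[""]...)
	}
	return out
}

func main() {
	s := shardedRules{}
	s.add(Rule{Name: "frontend-ingress", Namespace: "app-a"})
	s.add(Rule{Name: "db-egress", Namespace: "app-b"})
	s.add(Rule{Name: "cluster-wide-deny", Namespace: ""})

	// Evaluating a pod in app-a never iterates app-b's rules.
	fmt.Println(s.forNamespace("app-a"))
}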


@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jul 31, 2023
Signed-off-by: Odin Ugedal <ougedal@palantir.com>
@odinuge
Member Author

odinuge commented Jul 31, 2023

cc @christarazi.

I'll do some more testing, but it would be nice to get some feedback on the overall idea. I think this should reduce our cilium-agent CPU load quite significantly, and it should also significantly reduce lock contention.

@maintainer-s-little-helper

Commit c9919e9f7ae70eb5fbb79abddd5e407e50f26ce1 does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Jul 31, 2023
Signed-off-by: Odin Ugedal <ougedal@palantir.com>

Member

@christarazi christarazi left a comment

Thanks for the PR, this is a great idea!

A few comments on the initial approach.

@@ -139,6 +141,18 @@ type Repository struct {
getEnvoyHTTPRules func(certificatemanager.SecretManager, *api.L7Rules, string) (*cilium.HttpNetworkPolicyRules, bool)
}

// getShardKey returns the key to be used for the sharing of Repository.rulesByShard.
Member

Suggested change
// getShardKey returns the key to be used for the sharing of Repository.rulesByShard.
// getShardKey returns the key to be used for the sharding of Repository.rulesByShard.

Subtle typo, or did you mean "sharing"? If the latter, then the sentence could be clarified somehow, as it may confuse other readers if that typo was made. :)

Member Author

Heh, yeah, sharding it should be!

Comment on lines 147 to 150
func getShardKey(lbls labels.LabelArray) (string, bool) {
for _, lbl := range lbls {
if lbl.Key == k8sConst.PolicyLabelNamespace && lbl.Source == labels.LabelSourceK8s {
return lbl.Value, true
Member

AFAICT, this returns the label value, but the function is called getShardKey. So for labels like

k8s:foo=bar
k8s:baz=net

the result returned would be bar or net, depending on the order of iteration.

Is this the expected behavior? Wouldn't this cause a problem for [1]?

Member Author

Good point!

I think it should be fine, given that it doesn't really make sense to have more than one label with k8s:io.cilium.k8s.policy.namespace=<xyz>. It is set based on the namespace here:

policyLbls := GetPolicyLabels(namespace, name, uid, resourceType)
return append(policyLbls, ruleLbs...).Sort()
(and the same for the k8s version of NetworkPolicy). For CCNP it's not present, and we can fall back to "".

But maybe users can manually set the spec.Labels field with the same source and key, resulting in two of them. I'll look at that case, and I guess we have to handle it properly if that's the case. 😄 The same applies to non-namespaced CCNPs: users could manually add a label to those, "tricking" us into thinking it's a namespaced policy while it's not. As I understand it, this should not affect what the policy applies to, just how it's stored in this policy "repo".

Member

Ah yeah, it's a strange edge case. Maybe we can just document that if there is ever more than one namespace label, the one that appears first will be used.

Member

If Cilium prepends the namespace label and we use the first one, then there may be no need to document it at all?

On one hand, the threat actor who could set these labels is a fairly privileged Kubernetes API server attacker, and if they could customize the labels of the policy then they already have significant privileges to allow actual traffic. But on the other hand, if they only have access to policies in one namespace, they shouldn't be able to "poison" policy shards for other namespaces, even if the result is only some form of additional policy cost (a cost which is currently borne by all users in all namespaces in current versions of Cilium).

Comment on lines 404 to 410
key, ok := getShardKey(lbls)
relevantShards := p.rulesByShard
if ok {
relevantShards = map[string]ruleSlice{
key: p.rulesByShard[key],
}
}
Member

[1]

Member

@joestringer joestringer left a comment

Neat proposal! Couple of brief comments from initial look.

Also, the danger here would be if there could be a bug for some corner case of rule format that somehow ends up not getting selected by the rule engine and hence the policy doesn't apply. I don't have a good grasp on the corner cases but it'd be nice to see some testing to evaluate those cases. One basic one would be to continue to ensure that CCNPs that select all namespaces will apply to all Pods.
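
For example, roughly the kind of check I have in mind, as a standalone sketch against a simplified namespace-keyed shard map (not the actual repository API; rulesFor and the rule names are made up): a CCNP carries no namespace label, so it should land in the "" shard and be consulted for every namespace.

package policyshard

import "testing"

// rulesFor returns the rules consulted for a pod in ns: its own shard plus the cluster-wide "" shard.
func rulesFor(shards map[string][]string, ns string) []string {
	rules := append([]string{}, shards[ns]...)
	if ns != "" {
		rules = append(rules, shards[""]...)
	}
	return rules
}

func TestClusterwideRuleAppliesToAllNamespaces(t *testing.T) {
	shards := map[string][]string{
		"":      {"ccnp-allow-dns"}, // CCNP: no namespace label, so it lands in the "" shard
		"app-a": {"cnp-frontend"},
	}
	for _, ns := range []string{"app-a", "app-b", "kube-system"} {
		found := false
		for _, r := range rulesFor(shards, ns) {
			if r == "ccnp-allow-dns" {
				found = true
			}
		}
		if !found {
			t.Errorf("cluster-wide rule not consulted for namespace %q", ns)
		}
	}
}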

@@ -109,7 +109,9 @@ func (p *policyContext) SetDeny(deny bool) bool {
type Repository struct {
// Mutex protects the whole policy tree
Mutex lock.RWMutex
rules ruleSlice
// Internally we shard the rule list to improve performance and maximize throughput
Member

I think that this is specifically targeted for policy update performance, is that correct? Also, what's the difference between improving update performance vs. maximizing throughput? Aren't they the same thing?

Suggested change
// Internally we shard the rule list to improve performance and maximize throughput
// Internally we shard the rule list to improve update performance

Member

Ah, based on the changes in getMatchingRules() it looks like this should also improve policy evaluation performance by excluding rule iteration for namespaces that are known to be different from the "current" namespace under iteration.

So maybe 'update and evaluation performance'?

Member Author

Yeah, this started as an experiment where I wanted to improve the policy addition/update/deletion flow, but it will also reduce policy evaluation cost! I'll try updating the description once we all agree on the approach and implementation. 😄

@nathanjsweet nathanjsweet self-requested a review August 1, 2023 02:02
@learnitall learnitall self-requested a review August 1, 2023 13:50
@odinuge
Member Author

odinuge commented Aug 3, 2023

Neat proposal! Couple of brief comments from initial look.

Also, the danger here would be if there could be a bug for some corner case of rule format that somehow ends up not getting selected by the rule engine and hence the policy doesn't apply. I don't have a good grasp on the corner cases but it'd be nice to see some testing to evaluate those cases. One basic one would be to continue to ensure that CCNPs that select all namespaces will apply to all Pods.

Thanks for looking! And thanks for the initial feedback (both @joestringer and @christarazi)!

I don't have a good grasp on the corner cases but it'd be nice to see some testing to evaluate those cases.

Yeah, this is my main concern now, so I really want to poke at all of those. In particular it relates to #27163 (comment), where users can add extra "namespace" labels to the spec.labels part of a CNP or CCNP, making it harder to reason about how those work. I'll push a new version now, but this is pretty far from production ready.

TODO: will squash when time comes.
Signed-off-by: Odin Ugedal <ougedal@palantir.com>
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Aug 3, 2023
@joestringer
Member

SGTM. Also, once we've narrowed down the user value, it'll be worth updating the release note to say something about improving policy processing performance.

@joestringer joestringer added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Aug 3, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Aug 3, 2023
@christarazi christarazi added kind/enhancement This would improve or streamline existing functionality. area/daemon Impacts operation of the Cilium daemon. kind/performance There is a performance impact of this. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. labels Aug 3, 2023
Comment on lines +148 to +160
key := ""
matched := false
for _, lbl := range lbls {
if lbl.Key == k8sConst.PolicyLabelNamespace && lbl.Source == labels.LabelSourceK8s {
// If we find two, ignore then and put them into the non-namespaced for easier housekeeping
if matched {
return "", false
}
key = lbl.Value
matched = true
}
}
return key, matched
Contributor

The logic in getNamespaceFromLabels and getShardKey seems duplicative; it might be worth implementing one in terms of the other.

Suggested change
key := ""
matched := false
for _, lbl := range lbls {
if lbl.Key == k8sConst.PolicyLabelNamespace && lbl.Source == labels.LabelSourceK8s {
// If we find two, ignore then and put them into the non-namespaced for easier housekeeping
if matched {
return "", false
}
key = lbl.Value
matched = true
}
}
return key, matched
if key := getNamespaceFromLabels(lbls); key != "" {
return key, true
}
return "", false

Member Author

Sadly those two functions use different labels: there is one label present in endpoint selectors (io.kubernetes.pod.namespace), while there is another one in the rule.Labels field (io.cilium.k8s.policy.namespace).

Contributor

I was thinking we could reuse the k8s label search logic by adjusting the function definition. The refactor I had in mind is a bit more than fits in a GitHub suggestion, so I created odinuge#94.
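
Roughly the shape I have in mind, with simplified stand-in types rather than the Cilium labels package (the label key strings come from the comment above; namespaceByKey is a made-up name): a single search helper parameterized by the label key, which both call sites can reuse.

package main

import "fmt"

type label struct {
	Source, Key, Value string
}

// namespaceByKey returns the value of the first k8s-sourced label with the given key, if any.
func namespaceByKey(lbls []label, key string) (string, bool) {
	for _, lbl := range lbls {
		if lbl.Source == "k8s" && lbl.Key == key {
			return lbl.Value, true
		}
	}
	return "", false
}

func main() {
	ruleLabels := []label{{Source: "k8s", Key: "io.cilium.k8s.policy.namespace", Value: "app-a"}}
	selectorLabels := []label{{Source: "k8s", Key: "io.kubernetes.pod.namespace", Value: "app-a"}}

	// Sharding side: derive the shard key from the rule's own labels.
	shardKey, _ := namespaceByKey(ruleLabels, "io.cilium.k8s.policy.namespace")
	// Lookup side: derive the namespace from the endpoint selector labels.
	endpointNS, _ := namespaceByKey(selectorLabels, "io.kubernetes.pod.namespace")
	fmt.Println(shardKey, endpointNS)
}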

@github-actions

This pull request has been automatically marked as stale because it
has not had recent activity. It will be closed if no further activity
occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Sep 16, 2023
@schlosna
Contributor

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

I would still like to see this PR move forward

@github-actions github-actions bot removed the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Sep 17, 2023
@christarazi christarazi removed the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Nov 21, 2023
@aanm aanm added dont-merge/blocked Another PR must be merged before this one. dont-merge/wait-until-release Freeze window for current release is blocking non-bugfix PRs and removed dont-merge/blocked Another PR must be merged before this one. labels Dec 4, 2023
@joestringer joestringer removed the dont-merge/wait-until-release Freeze window for current release is blocking non-bugfix PRs label Dec 15, 2023

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Jan 14, 2024
@christarazi christarazi removed the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Jan 17, 2024
@odinuge
Member Author

odinuge commented Jan 19, 2024

Hey! Yes, will rebase and fix this as soon as we get #30338 sorted. :)


@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Feb 19, 2024
@christarazi christarazi removed the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Feb 26, 2024

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Mar 28, 2024
@christarazi christarazi removed the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Apr 1, 2024

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label May 2, 2024
@christarazi christarazi removed the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label May 2, 2024
@squeed
Contributor

squeed commented May 17, 2024

@odinuge I'm picking this up to try and get it merged for v1.16.

@odinuge
Member Author

odinuge commented May 17, 2024

Hey @squeed! Happy to push this myself. However, we need a proper solution to #30338 first, and we need a lot of tests for that to avoid security regressions. If you want to start with that first, that would be very much appreciated!

@odinuge
Member Author

odinuge commented May 17, 2024

The easy alternative is reverting 4cb0940, but I'm not too sure how happy people would be with that. IMO, we should not introduce more "indexes" here until we are 100% sure the current state handles all the edge cases, with tests ensuring that is the case.

@squeed
Contributor

squeed commented May 17, 2024

we should not introduce more "indexes"

This seems like a good chance to convert the PolicyRepository to StateDB over the coming months.

@joestringer
Member

When converting more critical components over to StateDB, we'll need to exercise a lot of care and restraint. I'm open to the idea, but we should roll StateDB out incrementally through the tree, starting with non-critical components and moving towards critical components, so we can ensure correctness and properly understand the performance characteristics.

@squeed
Contributor

squeed commented May 17, 2024

Also, @odinuge, it looks like what this PR does is improve the time to insert / replace rules in the repository, and doesn't particularly improve endpoint validation. Which is fine, but the description is misleading ;-).

From your tests, which is more critical? Rule addition / deletion time, or endpoint policy calculation?

@squeed
Contributor

squeed commented May 24, 2024

I filed #32703, which was inspired by this but takes a different tack.

@squeed
Contributor

squeed commented May 31, 2024

#32703 should have removed this bottleneck; I believe this can be closed, @odinuge.


@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Jul 1, 2024
Labels

area/daemon Impacts operation of the Cilium daemon.
dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts.
kind/enhancement This would improve or streamline existing functionality.
kind/performance There is a performance impact of this.
release-note/minor This PR changes functionality that users may find relevant to operating Cilium.
sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.
stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale.