CFP: Dynamic Labels Filter #32576
The core idea of improving usability here sounds interesting to try. Relaying some context from Slack (link): we discussed this a little with @marseel and @dlapcevic at KubeCon NA a while back, and several concerns were raised about the practical impacts. We highlighted a few areas that would benefit from a more detailed discussion via the CFP:
Hey @joestringer,
I have read the document and have a few general questions, but also an alternative idea, so I will add my feedback here (I didn't want to put all of that in the gdoc 😅).

Problem statement:
Currently, we use all namespace and pod labels in CIDs. This sometimes results in a high number of CIDs and high churn in the cluster in the following cases:
So I think these are two similar but slightly different issues:
Proposed design in CFP:
I have a feeling this CFP just moves the problem to another place. Let's imagine a case where someone adds a NetworkPolicy with the label selector
or did I miss something? I am basing this on the line in the doc: "For both cases the left-hand key must be extracted and marked as relevant".

Alternative solution for (1):
I know that currently, we consider Identities as non-mutable.

Clustermesh incompatibility:
❗ With Clustermesh, the dynamic label filter won't work, as we don't have information about remote peer clusters' NetworkPolicies.
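To make the churn concern concrete: if identities are derived only from policy-relevant label keys, a new policy that selects on an additional key can split existing identities. The following is a minimal Go sketch, assuming an identity is simply the distinct projection of a pod's labels onto the relevant keys. It ignores namespace labels, label sources, and the real allocator; `identityKey` and `countIdentities` are hypothetical helpers, not Cilium APIs.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// identityKey projects a pod's labels onto the relevant keys and returns a
// canonical string. In this simplified model (an assumption, not Cilium's
// allocator), pods with the same key would share one identity.
func identityKey(labels map[string]string, relevant map[string]bool) string {
	parts := make([]string, 0, len(labels))
	for k, v := range labels {
		if relevant[k] {
			parts = append(parts, k+"="+v)
		}
	}
	sort.Strings(parts) // map iteration order is random; sort for a stable key
	return strings.Join(parts, ",")
}

// countIdentities returns how many distinct identities the pods would need.
func countIdentities(pods []map[string]string, relevant map[string]bool) int {
	seen := make(map[string]struct{})
	for _, p := range pods {
		seen[identityKey(p, relevant)] = struct{}{}
	}
	return len(seen)
}

func main() {
	pods := []map[string]string{
		{"app": "web", "version": "v1"},
		{"app": "web", "version": "v2"},
		{"app": "db", "version": "v1"},
	}
	// Before: policies only select on "app".
	before := map[string]bool{"app": true}
	// After: someone adds a policy that also selects on "version".
	after := map[string]bool{"app": true, "version": true}
	fmt.Println(countIdentities(pods, before), countIdentities(pods, after)) // prints: 2 3
}
```

In a real cluster, every pod whose projected key changes would need a new identity, and every node would need to recompute policy for it, which is the cluster-wide impact described above.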
I share @marseel's concern that this dynamic reaction to Network Policies could result in massive impacts to the cluster that represent a risk to reliable forwarding.
Almost. We already eliminate several common causes for identity cardinality increase where the labels aren't expected to be used for security:
I realize this is possible today, though my initial inclination would be to say that doing it safely for large namespaces is difficult, and to recommend that users avoid it wherever possible. Instead, create new namespaces with different labels and migrate apps over. This would be more incremental, and it would make it easier to control how the cluster state changes over time, rather than making a "small" change to a namespace that could have massive impacts which may or may not be reversible. That said, I think that problem (1) is probably solvable if we properly think through the lifecycle and limits. More specifically, I am skeptical about a "fire and forget" style solution where every Cilium agent can reallocate the identities of every Pod in the cluster on startup. The change rate needs to be more controlled than that. It would help to have more details in the CFP on some of these aspects.
I don't love the idea of extending mutability into Identity label sets; I think it works against some of the properties we've worked to establish. Just a few problems I see:

For instance, Identity Foo at time A would be different from Identity Foo at time B. So when you debug a problem relating to Identity Foo, you must now know not only the Identity and the Policies, but also the time when the event occurred and every time a mutation may have occurred to the Identity (plus those timestamps).

Furthermore, newly respected labels could change policy behaviour on remote nodes, depending on the policies. For instance, a policy could match on labels A and B to reject traffic. You could add label B to a namespace, and now all the remote nodes need to recalculate the policy posture for those existing Identities. You've saved the Identity number propagation, but the network policy impact on those Identities still takes time to propagate and must be calculated to enforce a proper posture.

Overall, Cilium defines a set of security-relevant labels, and it has fairly sane defaults for these that work for the vast majority of users. For some advanced users who operate large and complex environments, the defaults may not work. Typically there is no real way out for those users other than learning the operational knowledge required to operate those environments, especially when it comes to scalability limitations.

I think the core idea here for problem (2) raised by @marseel is that the user defines the security-relevant labels by writing Network Policies that use certain labels. If Cilium can use this input, then it can optimize the use of labels in Identities automatically, without explicitly asking the user to provide them. However, if we look at some of the feedback, I think this is exactly what's causing some of the concerns.
If the user suddenly starts using additional labels in Network Policies, then we suddenly need to perform significant security recalculations across the entire cluster to account for the change. That disruption and risk are introduced because the original set of labels to consider was inaccurate: it wasn't actually the set of security-relevant labels, because the proposed implementation just assumed it rather than getting that information directly from the user.
Cilium Feature Proposal
Is your proposed feature related to a problem?
Operators with larger environments have to manually limit the set of identity-relevant labels to avoid the frequent creation of new security identities. They need to compute all the relevant identity labels by hand and keep them in sync (cilium.io/identity-relevant-labels).
Describe the feature you'd like
Dynamically limit the set of identity-relevant labels to avoid frequently creating new security identities. Because many labels are not useful for policy enforcement or visibility, the objective is to include in CIDs only the labels used in network policies, greatly reducing the number of CIDs.
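One plausible reading of "labels used in network policies" is "every label key referenced by a policy selector". Under that assumption, the Go sketch below collects the left-hand keys from both `matchLabels` and `matchExpressions` entries. The `Selector` and `Requirement` types are hypothetical stand-ins that only mirror the shape of a Kubernetes `LabelSelector`; real Cilium policies also carry label source prefixes and namespace-label selectors, which this sketch ignores.

```go
package main

import (
	"fmt"
	"sort"
)

// Requirement is a hypothetical stand-in mirroring a Kubernetes
// matchExpressions entry.
type Requirement struct {
	Key      string
	Operator string // e.g. In, NotIn, Exists, DoesNotExist
	Values   []string
}

// Selector is a hypothetical stand-in mirroring a Kubernetes LabelSelector.
type Selector struct {
	MatchLabels      map[string]string
	MatchExpressions []Requirement
}

// relevantKeys returns the sorted, de-duplicated set of label keys that the
// given selectors reference. Values and operators are ignored: only the key
// decides whether a label must be kept in identities.
func relevantKeys(selectors []Selector) []string {
	set := make(map[string]struct{})
	for _, s := range selectors {
		for k := range s.MatchLabels {
			set[k] = struct{}{}
		}
		for _, r := range s.MatchExpressions {
			set[r.Key] = struct{}{}
		}
	}
	keys := make([]string, 0, len(set))
	for k := range set {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	sels := []Selector{
		{MatchLabels: map[string]string{"app": "web"}},
		{MatchExpressions: []Requirement{{Key: "tier", Operator: "In", Values: []string{"frontend"}}}},
	}
	fmt.Println(relevantKeys(sels)) // prints: [app tier]
}
```

Note that `Exists` and `NotIn` requirements still make their key relevant, because a pod's identity must record whether it carries the label at all.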
Describe your proposed solution
The current implementation supports limiting identity-relevant labels by manually configuring the labels as a regex string in the ConfigMap (see cilium.io/identity-relevant-labels). The existing approach requires manual work: determining which labels are relevant, updating the ConfigMap, and restarting the Cilium agent and the affected pods in order to regenerate the CEPs/CIDs. To reduce customer friction, the goal is to complete the above steps dynamically and update all the required resources. The feature should be placed behind a feature flag, enabled by default, so that it is easy to switch between implementations.
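The manual regex configuration and its dynamic replacement meet at one step: turning a computed set of relevant keys into a filter over each pod's labels. The following Go sketch assumes a single anchored regular expression over label keys. `buildPattern` and `filterLabels` are hypothetical helpers, and the pattern format here is not the syntax of Cilium's actual label-filter configuration.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// buildPattern turns a set of label keys into one anchored regular
// expression that matches exactly those keys (hypothetical helper).
func buildPattern(keys []string) *regexp.Regexp {
	quoted := make([]string, len(keys))
	for i, k := range keys {
		// QuoteMeta keeps characters such as '.' literal.
		quoted[i] = regexp.QuoteMeta(k)
	}
	return regexp.MustCompile("^(?:" + strings.Join(quoted, "|") + ")$")
}

// filterLabels keeps only the labels whose key matches the pattern.
func filterLabels(labels map[string]string, re *regexp.Regexp) map[string]string {
	out := make(map[string]string)
	for k, v := range labels {
		if re.MatchString(k) {
			out[k] = v
		}
	}
	return out
}

func main() {
	re := buildPattern([]string{"app", "app.kubernetes.io/name"})
	got := filterLabels(map[string]string{
		"app":                    "web",
		"version":                "v2",
		"app.kubernetes.io/name": "shop",
	}, re)

	// Sort the surviving keys for deterministic output.
	keys := make([]string, 0, len(got))
	for k := range got {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	fmt.Println(keys) // prints: [app app.kubernetes.io/name]
}
```

Quoting each key and anchoring the whole expression matters: an unquoted `app.kubernetes.io/name` would also match keys where any character stands in for each dot, and an unanchored `app` would match every key that merely contains "app".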
CFP GDoc: CFP-2024-05-16 CFP: Dynamic Labels Filter
Proposed PRs:
#32501
#32574
Once the CFP is close to being finalized, please add it as a PR to the design-cfps repo for final approval.