chore: only label nodes with kubernetes.io/role if they need it #4827
Conversation
I wonder if this is correct. This script just screams of hack. Is there a reason the labels do not come up to start with? Why do we add them later?

This does not work well with dynamic clusters, especially large ones. If we add 100 new nodes per minute for a while (during fast scaling periods), those nodes will need the labels before they are fully functional, and thus we have these blind spots (or maybe we don't need the labels at all, in which case we can just stop doing this).

I only noticed that this is running because of the amount of log output this script produces in syslog. It is rather voluminous, especially in the larger clusters. It also puts stress on the API server in clusters where we already have API server stress from our workloads.
I think this comment suggests we should not be doing this at all: kubernetes/kubernetes#84912 (comment). I will remove this entire node-labeling foo. I've confirmed that the nodes are automatically labeled thusly:
Unfortunately, this may break some existing customers who are relying upon these labels:
But, in advance of deprecating the project, we should definitely do the right thing and stop breaking the rules (this has been a core principle of node labeling since 1.16).
Hmm, never mind, this label key is all over k/k code, including cloud-provider-azure: https://github.com/kubernetes-sigs/cloud-provider-azure/blob/master/pkg/consts/consts.go#L86 AKS nodes have these labels, so I take back everything above; we do need them.
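For reference, one quick way to inspect which of these role labels nodes already carry (this is just an illustrative check, not part of the PR):

```bash
# Show each node with the role-related labels as extra columns.
kubectl get nodes \
  -L kubernetes.azure.com/role \
  -L kubernetes.io/role \
  -L node-role.kubernetes.io/agent
```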
@Michael-Sinz read this thread: |
O.K., so after going through the effort to re-acquaint myself with what's going on here:
I like the concept - let me try this on one of my small test clusters to see what the behavior is.
This seems to work rather well. Thank you for your quick response.
@Michael-Sinz a possible edge case going forward with this solution is if the `kubernetes.azure.com/role=agent` label doesn't make it onto a node for some reason: that node would never match the new selector and so would never get the other role labels. I don't want to over-engineer this to death, and so I'm not anticipating that edge case being present in the real world, but if you observe something weird in nodes not having the desired labels, let us know!
I think you are fine - given the mechanism involved, I would expect this to be an all-or-nothing failure.
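A minimal illustration of that edge case (the node name here is hypothetical, not from the PR): a node that never received the kubelet-delivered `kubernetes.azure.com/role=agent` label simply never matches the selector, so this mechanism never picks it up.

```bash
# Hypothetical node that registered without the kubelet-delivered azure role label:
kubectl get node aks-agentpool-0 --show-labels   # no kubernetes.azure.com/role label present

# The new selector requires kubernetes.azure.com/role=agent, so this query never
# returns that node, and the kubernetes.io/role / node-role labels are never added:
kubectl get nodes -o name \
  -l 'kubernetes.azure.com/role=agent,kubernetes.io/role!=agent,node-role.kubernetes.io/agent!='
```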
Reason for Change:
This PR updates the way we idempotently enforce `kubernetes.io/role` node labels. Instead of redoing it blindly every minute forever, we only do it if the required labels aren't present.

The key part of the change is the way we filter (by label) the set of nodes to be labeled. As an example:
Was:
`kubernetes.azure.com/role!=master,kubernetes.io/role!=master`
Now:
`kubernetes.azure.com/role=agent,kubernetes.io/role!=agent,node-role.kubernetes.io/agent!=`

This selects nodes that have the `kubernetes.azure.com/role=agent` label (delivered via kubelet config args when the node joins the cluster), but are missing (or have incorrect) `kubernetes.io/role` and `node-role.kubernetes.io/agent` labels.
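As a rough sketch of what this filter-then-label flow amounts to (the script structure and variable names here are assumptions for illustration, not the literal aks-engine script):

```bash
#!/usr/bin/env bash
# Illustrative sketch only: query with the narrower selector, and issue label
# writes only when at least one node still needs the role labels.
set -euo pipefail

SELECTOR='kubernetes.azure.com/role=agent,kubernetes.io/role!=agent,node-role.kubernetes.io/agent!='

# "-o name" yields "node/<name>" entries, which kubectl label accepts directly.
nodes=$(kubectl get nodes -l "${SELECTOR}" -o name)

if [ -n "${nodes}" ]; then
  # shellcheck disable=SC2086  # intentional word splitting over the node list
  kubectl label ${nodes} kubernetes.io/role=agent node-role.kubernetes.io/agent='' --overwrite
fi
```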
The key point is that once we apply the desired labels to the set of nodes matched by the new filters, those recently labeled nodes are no longer included the next time we query for nodes. This results in far fewer actual `kubectl label nodes` write operations, as the label selector we use to find nodes to label will only return a non-empty set when a new node has recently come online.

Issue Fixed:
Credit Where Due:
Does this change contain code from or inspired by another project?
If "Yes," did you notify that project's maintainers and provide attribution?
Requirements:
Notes: