This repository has been archived by the owner on Oct 24, 2023. It is now read-only.

chore: only label nodes with kubernetes.io/role if they need it #4827

Merged: 3 commits from label-nodes-restart-on-failure into Azure:master on Feb 11, 2022

Conversation

@jackfrancis (Member) commented Feb 9, 2022

Reason for Change:

This PR updates the way we idempotently enforce the kubernetes.io/role node labels. Instead of blindly re-applying them every minute forever, we only apply them when the required labels aren't present.

The key part of the change is how we filter the nodes (by label) that we select for labeling. As an example:

Was:

  • Filter by nodes with label kubernetes.azure.com/role!=master,kubernetes.io/role!=master
    • This filter always returns the full set of non-control-plane nodes, whether or not they already have the desired role labels.

Now:

  • Filter by nodes with label kubernetes.azure.com/role=agent,kubernetes.io/role!=agent,node-role.kubernetes.io/agent!=
    • This filter returns nodes that have the kubernetes.azure.com/role=agent label (delivered via kubelet config args when the node joins the cluster), but are missing (or have an incorrect value for) the kubernetes.io/role and node-role.kubernetes.io/agent labels.

The key point is that once we apply the desired labels to the set of nodes matched by the new filter, those recently labeled nodes no longer match, so they are excluded from the next query. This results in far fewer actual kubectl label nodes write operations, as the label selector we use will only match nodes when a new node has recently come online.
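
To make the difference concrete, here is a minimal sketch of the two approaches expressed as kubectl commands, built from the selectors quoted above (illustrative only; the actual script invocation may differ):

  # Was (illustrative): the selector matches every non-control-plane node, so
  # the role labels are rewritten every minute whether or not they are set.
  kubectl label nodes --overwrite \
    -l 'kubernetes.azure.com/role!=master,kubernetes.io/role!=master' \
    kubernetes.io/role=agent node-role.kubernetes.io/agent=

  # Now (illustrative): the selector only matches agent nodes that are missing
  # (or have the wrong value for) the role labels; once labeled, a node stops
  # matching, so later runs perform no writes until a new node joins.
  kubectl label nodes --overwrite \
    -l 'kubernetes.azure.com/role=agent,kubernetes.io/role!=agent,node-role.kubernetes.io/agent!=' \
    kubernetes.io/role=agent node-role.kubernetes.io/agent=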


@Michael-Sinz (Collaborator) commented:

I wonder if this is correct.
If these labels really are needed, we may have a case where, during scaling, new nodes show up without their labels set correctly.

This script just screams of a hack. Is there a reason the labels do not come up to start with? Why can we add them later?

This does not really work well for dynamic clusters, especially large ones. If we add 100 new nodes per minute for a while (during fast scaling periods), those nodes will need the labels before they are fully functional, and thus we have these blind spots (or maybe we don't need the labels at all, in which case we can just stop doing this).

I only noticed that this is running because of the amount of log output this script produces in syslog. It is rather voluminous, especially in the larger clusters. It also puts stress on the API server in those clusters where we already have API server stress due to our workloads.

@jackfrancis (Member, Author) commented:

I think this comment suggests we should not be doing this at all:

kubernetes/kubernetes#84912 (comment)

I will remove this entire node-labeling foo. I've confirmed that the nodes are automatically labeled as follows:

  • control plane node: kubernetes.azure.com/role=master
  • worker node: kubernetes.azure.com/role=agent

Unfortunately, this may break some existing customers who are relying upon these labels:

  • kubernetes.io/role=master
  • node-role.kubernetes.io/master=
  • kubernetes.io/role=agent
  • node-role.kubernetes.io/agent=
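
A quick, read-only way to check which of these role labels each node currently carries (an illustrative command, not part of this change):

  # Show the role-related labels as columns for every node.
  kubectl get nodes \
    -L kubernetes.azure.com/role \
    -L kubernetes.io/role \
    -L node-role.kubernetes.io/master \
    -L node-role.kubernetes.io/agent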

But, in advance of deprecating the project, we should finally do the right thing and stop breaking the rules (this has been a core principle of node labeling since 1.16).

cc @mboersma @devigned

@jackfrancis (Member, Author) commented:

Hmm, never mind, this label key is all over k/k and related code, including cloud-provider-azure:

https://github.com/kubernetes-sigs/cloud-provider-azure/blob/master/pkg/consts/consts.go#L86

AKS nodes have these labels, so I take back everything above; we do need them.

@jackfrancis (Member, Author) commented:

@Michael-Sinz read this thread:

eksctl-io/eksctl#2363 (comment)

@jackfrancis (Member, Author) commented:

O.K., so after going through the effort to re-acquaint myself with what's going on here:

  1. we definitely need to preserve this behavior
  2. understood that applying these role labels in the kubernetes-approved way (via a privileged k8s API operation, not via setting kubelet --node-labels configuration on the nodes themselves; see the sketch after this list) can definitely be optimized relative to the existing "run this script on all control plane nodes every minute forever" implementation, but that optimization work is out of scope for aks-engine given the project's deprecation
  3. we can do a better job of identifying which nodes are new (by their not having certain labels), and I will turn this PR into such an improvement
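
A rough sketch of the distinction in point 2, as I understand the kubelet label restrictions (the node name is a placeholder):

  # At node registration, the kubelet may self-assign vendor-prefixed labels,
  # which is how kubernetes.azure.com/role=agent arrives:
  #
  #   kubelet ... --node-labels=kubernetes.azure.com/role=agent
  #
  # Since the 1.16 node-labeling restrictions, a kubelet cannot self-assign the
  # kubernetes.io/role or node-role.kubernetes.io/* labels, so they have to be
  # applied afterwards by a privileged client, e.g.:
  kubectl label node <node-name> \
    kubernetes.io/role=agent \
    node-role.kubernetes.io/agent= \
    --overwrite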

@jackfrancis changed the title from "chore: only retry node label job if it fails" to "chore: only label nodes with kubernetes.io/role if they need it" on Feb 10, 2022
@Michael-Sinz (Collaborator) left a comment:

I like the concept - let me try this on one of my small test clusters to see what the behavior is.

@Michael-Sinz (Collaborator) left a comment:

This seems to work rather well. Thank you for your quick response.

@jackfrancis (Member, Author) commented Feb 10, 2022

@Michael-Sinz a possible edge case going forward with this solution is if the kubectl label command is not atomic and one of the two labels is applied but not the other. Because our new query looks for all of the included labels (it's an AND query), if a node is missing just one of them we won't select it for a follow-up kubectl label operation, and that missing label will stay missing.

I don't want to over-engineer this to death, so I'm not anticipating that edge case showing up in the real world, but if you observe something weird, like nodes not having the desired labels, let us know!
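
If that edge case ever did show up, one way to close it (hypothetical, not what this PR does) would be to issue one kubectl label call per label, each with its own selector, so a node missing only one of the two labels still gets selected:

  # Hypothetical per-label variant: each label has its own selector.
  kubectl label nodes --overwrite \
    -l 'kubernetes.azure.com/role=agent,kubernetes.io/role!=agent' \
    kubernetes.io/role=agent

  kubectl label nodes --overwrite \
    -l 'kubernetes.azure.com/role=agent,node-role.kubernetes.io/agent!=' \
    node-role.kubernetes.io/agent=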

@Michael-Sinz (Collaborator) commented:

(quoting the previous comment about the possible non-atomic kubectl label edge case)

I think you are fine - given the mechanism involved, I would expect this to be an all-or-nothing failure.

@jackfrancis jackfrancis merged commit fe9e4d4 into Azure:master Feb 11, 2022
@jackfrancis jackfrancis deleted the label-nodes-restart-on-failure branch February 11, 2022 00:05