📖 Propagate taints from MachineTemplates down to Nodes #12329

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Open

nrb wants to merge 1 commit into kubernetes-sigs:main from nrb:taints-propagation-proposal

+255 −0

Contributor

nrb commented Jun 6, 2025

What this PR does / why we need it:

This PR contains a proposal to allow Cluster API to sync taints from resources referencing a MachineTemplate to the managed Machines and ultimately Nodes.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
In support of #11175

/area machine
/area machineset
/area machinedeployment
/area machinepool


          First draft of taint propagation proposal

b793d37

Signed-off-by: Nolan Brubaker <nolan@nbrubaker.com>

k8s-ci-robot added area/machine area/machineset area/machinedeployment area/machinepool cncf-cla: yes labels

Contributor

k8s-ci-robot commented Jun 6, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign justinsb for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot requested review from elmiko and richardcase

June 6, 2025 20:46

k8s-ci-robot added the size/L label

nrb mentioned this pull request

Support taint propagation to the underlying nodes #11175

Open

Member

damdo commented Jun 26, 2025

@nrb should we tag some people to review the taints proposal?

I am going to tag in the ones that were participating to the original discussion over at #11175 for a review

/assign @fabriziopandini @sbueringer @chrischdi @JoelSpeed @richardcase

cc. @Archisman-Mridha

k8s-ci-robot assigned chrischdi, fabriziopandini, JoelSpeed, richardcase and sbueringer

JoelSpeed reviewed

View reviewed changes

docs/proposals/20250513-propogate-taints.md

+              ## Summary
+              Users should be able to taint `Node` resources created via Cluster API using Cluster API's higher order resources such as `MachineSet`s, `MachineDeployment`s, and `MachinePool`s.
+              These taints should be additive and continuously reconciled.

Contributor

JoelSpeed Jul 7, 2025

Continuously reconciled concerns me. If you look at other ways to add taints (e.g. --register-with-taints as a kubelet argument), they generally only register once.

If we don't have a section in this document explaining why continuous reconciliation is preferred, we should add that.

Is there a use case for wanting to add a taint at startup and then be able to remove it later (lazy initialization of a node?)

docs/proposals/20250513-propogate-taints.md


		## Proposal

		### User Stories

Contributor

JoelSpeed Jul 7, 2025

When applying taints, i'm guessing we are expecting this to happen when the node has already joined the cluster?

What would happen if CAPI is slow to respond and the taint takes some time to be applied? I assume then that the wrong workloads could end up on the node. Would they continue to be there once CAPI has caught up and applied the taints?

Member

sbueringer Jul 7, 2025 •

edited

Loading

For labels/annotations we have an unitialized taint that is only removed after we synced lables/annotations. Would the same work for taints as well?

xref: https://cluster-api.sigs.k8s.io/developer/providers/contracts/bootstrap-config#taint-nodes-at-creation

Contributor

JoelSpeed Jul 7, 2025

Yes, i think we should call out that the uninitialized taint would resolve this race and should be used in cases where you need to guarantee the taints are applied before workloads are scheduled

docs/proposals/20250513-propogate-taints.md

+                  // +listType=map
+              	// +listMapKey=key
+              	// +listMapKey=effect
+              	// +mapType=granular

Contributor

JoelSpeed Jul 7, 2025

maptype doesn't apply to lists, only map[T]S

docs/proposals/20250513-propogate-taints.md

+                  // This list is not necessarily complete: other Kubernetes components may add or remove other taints.
+                  // Only those taints defined in this list will be added or removed by Cluster API.
+                  //
+                  // NOTE: This list is implemented as a "granular map" type, meaning that individual elements can be managed by different owners.

Contributor

JoelSpeed Jul 7, 2025

That is listType=map, not mapType=granular

docs/proposals/20250513-propogate-taints.md


		This risk can be mitigated by ensuring Cluster API only modifies taints that it owns on nodes, as decribed in the implementaton above.

		## Alternatives

Contributor

JoelSpeed Jul 7, 2025

The race condition example I mentioned in a previous example might also be a good argument for continuous reconciliation. I think ultimately there are different use cases for taints, and some of those must go through kubelets --register-with-taints to be safe, in which case, they must go through the bootstrap provider.

For day 2 operation, taints on Machines make sense and to have them continuously reconciled. I'm starting to believe that having two ways to manage taints is actually required and they compliment each other.

Contributor Author

nrb Jul 7, 2025

You might be right. As was pointed out in the other thread, there is the CAPI uninitialized taint and possibly similar ones with the cloud provider.

In discussing this feature with Fabrizio on a call, he mentioned the ability to have worker and infrastructure groupings within a ClusterClass. These currently propagate their labels and annotations to the Machines that are created using the templates for these groups. Right now, some providers offer a way to pass taints down with their ClusterClass templates. The kubeadm provider also passes arbitrary entries to the --register-with-taints argument as you pointed out.

Fabrizio expressed a desire to rationalize all of these ways of specifying taints so that hopefully, there aren't multiple ways to do it within CAPI.

I also recently spoke to @elmiko who mentioned there wasn't a way to specify taints generically for his Kubemark provider.

We may want to split up the use cases as you suggest. Perhaps we have a field for node bootstrapping that will indeed be removed, and a field for "runtime" taints.

Contributor

JoelSpeed Jul 8, 2025

Or perhaps just adjust the structured so that taints has a different shape, where users can choose for each taint how they want it to be reconciled

taints:
- taint: ... // As before
  persistence: OnInitialization | Always

Contributor

elmiko Jul 9, 2025

fwiw, in the kubemark case, i ended up adding a new field to the infra machine template because kubemark does not use kubeadm for initialization and i needed to pass flags to the kubemark binary. kubemark is definitely an outlier here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/machine area/machinedeployment area/machinepool area/machineset cncf-cla: yes size/L