Skip to content

📖 Propagate taints from MachineTemplates down to Nodes #12329

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

nrb
Copy link
Contributor

@nrb nrb commented Jun 6, 2025

What this PR does / why we need it:

This PR contains a proposal to allow Cluster API to sync taints from resources referencing a MachineTemplate to the managed Machines and ultimately Nodes.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
In support of #11175

/area machine
/area machineset
/area machinedeployment
/area machinepool

Signed-off-by: Nolan Brubaker <nolan@nbrubaker.com>
@k8s-ci-robot k8s-ci-robot added area/machine Issues or PRs related to machine lifecycle management area/machineset Issues or PRs related to machinesets area/machinedeployment Issues or PRs related to machinedeployments area/machinepool Issues or PRs related to machinepools cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 6, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign justinsb for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 6, 2025
@damdo
Copy link
Member

damdo commented Jun 26, 2025

@nrb should we tag some people to review the taints proposal?

I am going to tag in the ones that were participating to the original discussion over at #11175 for a review

/assign @fabriziopandini @sbueringer @chrischdi @JoelSpeed @richardcase

cc. @Archisman-Mridha

## Summary

Users should be able to taint `Node` resources created via Cluster API using Cluster API's higher order resources such as `MachineSet`s, `MachineDeployment`s, and `MachinePool`s.
These taints should be additive and continuously reconciled.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Continuously reconciled concerns me. If you look at other ways to add taints (e.g. --register-with-taints as a kubelet argument), they generally only register once.

If we don't have a section in this document explaining why continuous reconciliation is preferred, we should add that.

Is there a use case for wanting to add a taint at startup and then be able to remove it later (lazy initialization of a node?)


## Proposal

### User Stories
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When applying taints, i'm guessing we are expecting this to happen when the node has already joined the cluster?

What would happen if CAPI is slow to respond and the taint takes some time to be applied? I assume then that the wrong workloads could end up on the node. Would they continue to be there once CAPI has caught up and applied the taints?

Copy link
Member

@sbueringer sbueringer Jul 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For labels/annotations we have an unitialized taint that is only removed after we synced lables/annotations. Would the same work for taints as well?

xref: https://cluster-api.sigs.k8s.io/developer/providers/contracts/bootstrap-config#taint-nodes-at-creation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, i think we should call out that the uninitialized taint would resolve this race and should be used in cases where you need to guarantee the taints are applied before workloads are scheduled

// +listType=map
// +listMapKey=key
// +listMapKey=effect
// +mapType=granular
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maptype doesn't apply to lists, only map[T]S

// This list is not necessarily complete: other Kubernetes components may add or remove other taints.
// Only those taints defined in this list will be added or removed by Cluster API.
//
// NOTE: This list is implemented as a "granular map" type, meaning that individual elements can be managed by different owners.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is listType=map, not mapType=granular


This risk can be mitigated by ensuring Cluster API only modifies taints that it owns on nodes, as decribed in the implementaton above.

## Alternatives
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The race condition example I mentioned in a previous example might also be a good argument for continuous reconciliation. I think ultimately there are different use cases for taints, and some of those must go through kubelets --register-with-taints to be safe, in which case, they must go through the bootstrap provider.

For day 2 operation, taints on Machines make sense and to have them continuously reconciled. I'm starting to believe that having two ways to manage taints is actually required and they compliment each other.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might be right. As was pointed out in the other thread, there is the CAPI uninitialized taint and possibly similar ones with the cloud provider.

In discussing this feature with Fabrizio on a call, he mentioned the ability to have worker and infrastructure groupings within a ClusterClass. These currently propagate their labels and annotations to the Machines that are created using the templates for these groups. Right now, some providers offer a way to pass taints down with their ClusterClass templates. The kubeadm provider also passes arbitrary entries to the --register-with-taints argument as you pointed out.

Fabrizio expressed a desire to rationalize all of these ways of specifying taints so that hopefully, there aren't multiple ways to do it within CAPI.

I also recently spoke to @elmiko who mentioned there wasn't a way to specify taints generically for his Kubemark provider.

We may want to split up the use cases as you suggest. Perhaps we have a field for node bootstrapping that will indeed be removed, and a field for "runtime" taints.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or perhaps just adjust the structured so that taints has a different shape, where users can choose for each taint how they want it to be reconciled

taints:
- taint: ... // As before
  persistence: OnInitialization | Always

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw, in the kubemark case, i ended up adding a new field to the infra machine template because kubemark does not use kubeadm for initialization and i needed to pass flags to the kubemark binary. kubemark is definitely an outlier here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/machine Issues or PRs related to machine lifecycle management area/machinedeployment Issues or PRs related to machinedeployments area/machinepool Issues or PRs related to machinepools area/machineset Issues or PRs related to machinesets cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants