📖 Propagate taints from MachineTemplates down to Nodes #12329
Conversation
Signed-off-by: Nolan Brubaker <nolan@nbrubaker.com>
@nrb should we tag some people to review the taints proposal? I am going to tag the ones who participated in the original discussion over at #11175 for a review.
/assign @fabriziopandini @sbueringer @chrischdi @JoelSpeed @richardcase
## Summary

Users should be able to taint `Node` resources created via Cluster API using Cluster API's higher order resources such as `MachineSet`s, `MachineDeployment`s, and `MachinePool`s.
These taints should be additive and continuously reconciled.
"Continuously reconciled" concerns me. If you look at other ways to add taints (e.g. `--register-with-taints` as a kubelet argument), they generally only register once.
If we don't have a section in this document explaining why continuous reconciliation is preferred, we should add that.
Is there a use case for wanting to add a taint at startup and then be able to remove it later (lazy initialization of a node)?
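For context, a minimal sketch of the register-once path through the kubeadm bootstrap provider (resource name and taint are hypothetical); kubeadm renders `joinConfiguration.nodeRegistration.taints` into the kubelet's `--register-with-taints` flag, so the taints are applied exactly once, at registration:

```yaml
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: example-workers            # hypothetical name
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          # Applied once at node registration via --register-with-taints;
          # removing a taint later means editing the Node directly.
          taints:
          - key: example.com/dedicated   # hypothetical taint
            value: gpu
            effect: NoSchedule
```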
## Proposal

### User Stories
When applying taints, I'm guessing we are expecting this to happen when the node has already joined the cluster?
What would happen if CAPI is slow to respond and the taint takes some time to be applied? I assume the wrong workloads could then end up on the node. Would they continue to run there once CAPI has caught up and applied the taints?
For labels/annotations we have an uninitialized taint that is only removed after we have synced labels/annotations. Would the same work for taints as well?
Yes, I think we should call out that the uninitialized taint would resolve this race, and that it should be used in cases where you need to guarantee the taints are applied before workloads are scheduled.
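For reference, a sketch of the existing uninitialized taint as it appears on a freshly registered Node (node name hypothetical); the Machine controller removes it once the initial sync completes, so nothing is scheduled before the taints land:

```yaml
apiVersion: v1
kind: Node
metadata:
  name: example-node               # hypothetical name
spec:
  taints:
  # Added at registration; removed by the Machine controller after the
  # initial label/annotation (and, under this proposal, taint) sync.
  - key: node.cluster.x-k8s.io/uninitialized
    effect: NoSchedule
```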
```go
// +listType=map
// +listMapKey=key
// +listMapKey=effect
// +mapType=granular
```
`mapType` doesn't apply to lists, only to `map[T]S`.
```go
// This list is not necessarily complete: other Kubernetes components may add or remove other taints.
// Only those taints defined in this list will be added or removed by Cluster API.
//
// NOTE: This list is implemented as a "granular map" type, meaning that individual elements can be managed by different owners.
```
That is `listType=map`, not `mapType=granular`.
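To illustrate the distinction with hypothetical taints: with `listType=map` and the two `listMapKey`s, each list entry is identified by its `(key, effect)` pair, which is what lets server-side apply track a separate field manager per entry:

```yaml
taints:
- key: example.com/dedicated       # could be owned by Cluster API
  value: gpu
  effect: NoSchedule
- key: example.com/dedicated       # same key, different effect:
  value: gpu                       # a distinct entry, possibly another owner
  effect: NoExecute
```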
This risk can be mitigated by ensuring Cluster API only modifies taints that it owns on nodes, as described in the implementation above.

## Alternatives
The race condition example I mentioned previously might also be a good argument for continuous reconciliation. I think ultimately there are different use cases for taints, and some of those must go through the kubelet's `--register-with-taints` to be safe, in which case they must go through the bootstrap provider.
For day-2 operations, taints on Machines make sense, and having them continuously reconciled makes sense too. I'm starting to believe that having two ways to manage taints is actually required and that they complement each other.
You might be right. As was pointed out in the other thread, there is the CAPI uninitialized taint, and possibly similar ones from the cloud provider.
In discussing this feature with Fabrizio on a call, he mentioned the ability to have `worker` and `infrastructure` groupings within a ClusterClass. These currently propagate their labels and annotations to the Machines that are created using the templates for these groups. Right now, some providers offer a way to pass taints down with their ClusterClass templates. The kubeadm provider also passes arbitrary entries to the `--register-with-taints` argument, as you pointed out.
Fabrizio expressed a desire to rationalize all of these ways of specifying taints so that, hopefully, there aren't multiple ways to do it within CAPI.
I also recently spoke to @elmiko, who mentioned there wasn't a way to specify taints generically for his Kubemark provider.
We may want to split up the use cases as you suggest. Perhaps we have a field for node bootstrapping, whose taints will indeed be removed, and a field for "runtime" taints.
Or perhaps just adjust the structure so that taints have a different shape, where users can choose for each taint how they want it to be reconciled:

```yaml
taints:
- taint: ... # As before
  persistence: OnInitialization | Always
```
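A fuller, purely illustrative rendering of that shape (taints hypothetical, enum values taken from the sketch above):

```yaml
taints:
- taint:
    key: example.com/dedicated
    value: gpu
    effect: NoSchedule
  persistence: OnInitialization    # applied once, at node registration
- taint:
    key: example.com/maintenance
    effect: NoExecute
  persistence: Always              # continuously reconciled afterwards
```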
FWIW, in the Kubemark case, I ended up adding a new field to the infra machine template because Kubemark does not use kubeadm for initialization and I needed to pass flags to the kubemark binary. Kubemark is definitely an outlier here.
What this PR does / why we need it:
This PR contains a proposal to allow Cluster API to sync taints from resources referencing a MachineTemplate down to the managed Machines and ultimately Nodes; a rough sketch of the intended flow follows.
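The sketch below assumes a hypothetical `taints` field in the Machine template spec; the field name and placement are illustrative only, not part of any released API:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: example-md                 # hypothetical name
spec:
  template:
    spec:
      taints:                      # hypothetical field, per this proposal
      - key: example.com/dedicated
        value: gpu
        effect: NoSchedule
# The taints would propagate MachineDeployment -> MachineSet -> Machine,
# and the Machine controller would sync them onto the backing Node.
```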
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
In support of #11175
/area machine
/area machineset
/area machinedeployment
/area machinepool