Node Interruption should be aware of deprovisioning #2917
Comments
Talked with @bwagner5. I'm not really sure what the difference is here for pre-spin. I see the value of this being integrated into the deprovisioning flow, but we will always have to start by deleting the node and kicking off graceful pod termination. Once we start a delete, we consider all pods on that node to be in a pseudo-pending state, meaning that replacement capacity will be scheduled for them (sketched below). The main benefits of pre-spinning were:
Let me know if I am missing some point that you are making here, though.
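For illustration, a minimal sketch of that pseudo-pending idea using standard client-go types. This is not Karpenter's actual implementation; `pseudoPendingPods` and `isOwnedByDaemonSet` are hypothetical helpers:

```go
package sketch

import (
	corev1 "k8s.io/api/core/v1"
)

// isOwnedByDaemonSet reports whether a pod is managed by a DaemonSet.
func isOwnedByDaemonSet(p *corev1.Pod) bool {
	for _, ref := range p.OwnerReferences {
		if ref.Kind == "DaemonSet" {
			return true
		}
	}
	return false
}

// pseudoPendingPods returns the pods on a deleting node that should be
// treated as pending, so replacement capacity can be provisioned for them
// before they are actually evicted.
func pseudoPendingPods(node *corev1.Node, pods []corev1.Pod) []corev1.Pod {
	if node.DeletionTimestamp == nil {
		return nil // node is not being deleted; nothing is pseudo-pending
	}
	var result []corev1.Pod
	for i := range pods {
		// DaemonSet pods run on every node, so they don't drive new capacity.
		if isOwnedByDaemonSet(&pods[i]) {
			continue
		}
		result = append(result, pods[i])
	}
	return result
}
```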
This SGTM. This issue was intended to track the work to integrate node interruption into the deprovisioning flow. I'll rename.
I'd like to revisit this tradeoff. Do we want to maximize the graceful time a pod gets to drain, or minimize the time a drained pod sits pending? We've heard recently about customers facing situations where pods are able to drain very rapidly, but their applications become unavailable for the duration of the replacement node launch.
This seems strange to me. We could perhaps do a bit better here by timing the delete so that we reduce application unavailability while still allowing pods their full termination grace period. I would expect that if a user sets up PDBs for their applications, that should be sufficient to ensure application stability while new nodes are coming up, unless I'm missing something in my understanding.
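For reference, a minimal PDB that keeps at least one replica available during voluntary disruptions, built with the standard `k8s.io/api/policy/v1` types (the `myapp` name and selector are made-up examples):

```go
package sketch

import (
	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// examplePDB keeps at least one "app: myapp" pod available during voluntary
// disruptions such as a drain, so a replacement must be running elsewhere
// before the last remaining replica can be evicted.
func examplePDB() *policyv1.PodDisruptionBudget {
	minAvailable := intstr.FromInt(1)
	return &policyv1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{Name: "myapp-pdb", Namespace: "default"},
		Spec: policyv1.PodDisruptionBudgetSpec{
			MinAvailable: &minAvailable,
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"app": "myapp"},
			},
		},
	}
}
```

Note that for a single-replica deployment, `minAvailable: 1` blocks eviction outright, which is exactly the limitation raised in the next comment.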
IIRC this was to help the case of single-pod deployments, which can't use PDBs.
Could we somehow look at the termination grace period of all pods on the node, and make sure we leave at least that much time before triggering the deletion?
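A rough sketch of that calculation, assuming the interruption comes with a fixed deadline (e.g. the two-minute Spot warning) and that pods without an explicit grace period get the 30-second Kubernetes default. `latestDeleteTime` is a hypothetical helper, not an existing API:

```go
package sketch

import (
	"time"

	corev1 "k8s.io/api/core/v1"
)

// maxGracePeriod returns the longest terminationGracePeriodSeconds across
// the given pods, falling back to the 30s Kubernetes default when unset.
func maxGracePeriod(pods []corev1.Pod) time.Duration {
	longest := int64(30) // Kubernetes default terminationGracePeriodSeconds
	for _, p := range pods {
		if p.Spec.TerminationGracePeriodSeconds != nil && *p.Spec.TerminationGracePeriodSeconds > longest {
			longest = *p.Spec.TerminationGracePeriodSeconds
		}
	}
	return time.Duration(longest) * time.Second
}

// latestDeleteTime computes the latest moment node deletion could be
// triggered while still giving every pod its full grace period before the
// interruption deadline (e.g. warning time + 2 minutes for Spot).
func latestDeleteTime(deadline time.Time, pods []corev1.Pod) time.Time {
	return deadline.Add(-maxGracePeriod(pods))
}
```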
This still seems tenuous to me due to the ordered deletion that we currently execute. Even if all of the workload pods get a chance to drain, that might not leave enough time for the DaemonSet pods to run through their termination grace periods. To get this right, we would basically have to reason about our ordered deletion and use that to orchestrate the timing from the moment we receive the interruption (see the sketch below). I'd be more interested to hear how many users are constrained by the inability to use PDBs here. This seems like an edge case to me, and I'm skeptical that it's worth the complexity of reasoning about exactly when we terminate the node on the interruption side.
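To make the ordering concern concrete: if deletion proceeds in waves (e.g. workload pods first, then DaemonSet pods), the required budget is the sum of the per-wave maxima rather than a single maximum. A sketch, reusing the hypothetical `maxGracePeriod` helper from the previous comment:

```go
package sketch

import (
	"time"

	corev1 "k8s.io/api/core/v1"
)

// orderedDrainBudget returns the time needed to drain pods wave by wave,
// where each wave must fully terminate before the next one begins. With
// ordered deletion the per-wave grace periods add up, so deletion must be
// triggered earlier than a single max-grace-period estimate would suggest.
func orderedDrainBudget(waves ...[]corev1.Pod) time.Duration {
	var total time.Duration
	for _, wave := range waves {
		total += maxGracePeriod(wave)
	}
	return total
}
```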
:thinking: Do we have to evict some pods before others? I know that we do critical after non-critical, but I can't remember why we have this sequencing requirement. It's a good point, though. However, if we could make this simple, I could see benefits for reducing downtime on singletons that run on spot. E.g., for a Jupyter notebook running on spot, the developer might not even notice the disruption if it were only a few seconds.
An interesting idea that was floated around among some of the maintainers offline: if we decide to set […]. The rub for this is that the […]. On top of that, it's extremely elegant.
Thinking on this more, I'm realizing that this idea isn't perfect. The […]. I'm also not convinced that we want to leverage […].
It's worth noting here that setting […].
Tell us about your request
Right now, a spot interruption triggers a cordon/drain on the node.
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
When an interruption happens, the deprovisioning logic should be aware of it so as not to voluntarily terminate additional capacity while the cluster is in flux.
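As a sketch of what "aware" could mean here (hypothetical, not an existing Karpenter mechanism): voluntary deprovisioning could hold off whenever any node is already being deleted:

```go
package sketch

import (
	corev1 "k8s.io/api/core/v1"
)

// clusterInFlux reports whether any node is already being deleted, e.g. due
// to a spot interruption. Voluntary deprovisioning (consolidation, expiry)
// could check this and hold off on terminating additional capacity.
func clusterInFlux(nodes []corev1.Node) bool {
	for _, n := range nodes {
		if n.DeletionTimestamp != nil {
			return true
		}
	}
	return false
}
```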
Are you currently working around this issue?
Eating dirt.
Additional Context
No response
Attachments
No response