Cordon all outdated nodes before any rolling update action #41

Closed

LaCodon opened this issue Nov 14, 2022 · 6 comments · Fixed by #42
Labels
enhancement New feature or request

Comments

@LaCodon
Contributor

LaCodon commented Nov 14, 2022

Describe the feature request

The current behaviour is to iterate over every outdated node, cordon it, and then drain it immediately afterwards. I think the behaviour should instead be to first cordon all outdated instances before doing anything else, and then proceed as usual (see the sketch below).
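
A minimal sketch (in Go, the handler's language) contrasting the two flows; `cordonNode` and `drainNode` are hypothetical placeholders for the real cordon/drain logic, not the project's actual functions:

```go
package rolling

import "context"

// Node is a stand-in for the handler's node representation (hypothetical).
type Node struct{ Name string }

// cordonNode and drainNode are hypothetical placeholders for the real logic
// (marking the node unschedulable, then evicting its pods).
func cordonNode(ctx context.Context, n Node) error { return nil }
func drainNode(ctx context.Context, n Node) error  { return nil }

// rollPerNode mirrors the current behaviour: cordon and drain each outdated
// node one at a time, so not-yet-processed outdated nodes stay schedulable
// and can receive the evicted pods.
func rollPerNode(ctx context.Context, outdated []Node) error {
	for _, n := range outdated {
		if err := cordonNode(ctx, n); err != nil {
			return err
		}
		if err := drainNode(ctx, n); err != nil {
			return err
		}
	}
	return nil
}

// rollCordonFirst mirrors the proposed behaviour: cordon every outdated node
// up front so evicted pods can only land on up-to-date nodes, then drain as usual.
func rollCordonFirst(ctx context.Context, outdated []Node) error {
	for _, n := range outdated {
		if err := cordonNode(ctx, n); err != nil {
			return err
		}
	}
	for _, n := range outdated {
		if err := drainNode(ctx, n); err != nil {
			return err
		}
	}
	return nil
}
```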

Why do you personally want this feature to be implemented?

I wish for this feature to be implemented because the current behaviour often (in my experience) leads to pods being rescheduled onto an outdated instance. This causes a lot of pod restarts during rolling updates, as pods get replaced more than once. This is especially bad for pods with a long terminationGracePeriod or a long startup period. Sometimes a pod doesn't even become ready after a replacement before it gets replaced again.

How long have you been using this project?

~3-4 months

Additional information

I would volunteer to implement this feature, including keeping it backward compatible if required.

LaCodon added the enhancement (New feature or request) label on Nov 14, 2022
@TwiN
Owner

TwiN commented Nov 16, 2022

The reason I haven't designed the algorithm to cordon everything before the draining phase begins is that if you cordon everything and a spike in load occurs, scaling will be delayed by at least the time it takes for a new node to be spun up, and possibly longer, because the rolling update handler may also be draining a node while the scheduler is trying to schedule pods that an HPA scaled up.

TL;DR: This was done on purpose, and cordoning all nodes, while making the upgrade faster, also increases the risk that the upgrade will no longer be "transparent"/"graceful" and may cause degraded application performance.

That said, I'm not completely against the idea of implementing it as an optional feature, if you believe it to be necessary for your use case(s).

@LaCodon
Contributor Author

LaCodon commented Nov 16, 2022

I understand; that totally makes sense.
So I will try to implement the idea as an optional feature with corresponding README documentation and propose it as a pull request to you :)
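
A minimal sketch of how the opt-in could be wired, building on the helpers in the sketch above (with "os" added to the imports); the environment variable name is an assumption for illustration, not the handler's confirmed configuration:

```go
// runRollingUpdate gates the new behaviour behind an opt-in flag.
// EAGER_CORDONING is a hypothetical variable name, not the handler's
// confirmed configuration.
func runRollingUpdate(ctx context.Context, outdated []Node) error {
	if os.Getenv("EAGER_CORDONING") == "true" {
		// Opt-in: cordon all outdated nodes before draining any of them.
		return rollCordonFirst(ctx, outdated)
	}
	// Default: keep the existing cordon-then-drain-per-node behaviour.
	return rollPerNode(ctx, outdated)
}
```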

@TwiN
Owner

TwiN commented Nov 17, 2022

Sounds good!

@TwiN
Owner

TwiN commented Dec 3, 2022

Resolved by #42

TwiN closed this as completed on Dec 3, 2022
@someone-stole-my-name
Contributor

Hey @TwiN, is this getting released any time soon?

@TwiN
Owner

TwiN commented Dec 6, 2022

@someone-stole-my-name It was already available through the latest tag, but I've just released it in v1.7.0.
