Mega Issue: Deprovisioning Controls #1738

Open
8 of 18 tasks
njtran opened this issue Apr 28, 2022 · 37 comments
Labels: consolidation, feature (New feature or request), v1 (Issues requiring resolution by the v1 milestone)

Comments

@njtran
Contributor

njtran commented Apr 28, 2022

Tell us about your request
Karpenter provisions nodes and deprovisions nodes as described in the docs.

Some users are asking for more control over when Karpenter disrupts nodes, and for ways to rate-limit these disruptions. Opening this issue to aggregate the existing issues.

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?

Karpenter disruption conditions:

Control over how Karpenter should disrupt nodes:

Control over Karpenter's Eviction Policy:

Additional context

Attachments
If you think you might have additional information that you'd like to include via an attachment, please do - we'll take a look. (Remember to remove any personally-identifiable information.)

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@njtran added the feature (New feature or request) label on Apr 28, 2022
@vasylenko

Heavy ➕ for #1716 - Karpenter should watch for AMI version updates and roll nodes to update to the new version.

Use case: rotate (refresh) running instances when a new security or other important update is released for the AMI used by the project/organization.

@prashil-g

+1

@aavileli

aavileli commented Jul 12, 2022

What does an AMI or OS refresh look like in Karpenter, given that it does not use ASG node groups?
One of our needs is to refresh nodes every 3 weeks. We would use a public script similar to https://github.com/hellofresh/eks-rolling-update, which used the ASG and Cluster Autoscaler to cordon, drain, and delete the nodes. With Karpenter we could use ttlSecondsUntilExpired to expire the nodes. But what happens if a new vulnerability is discovered and we need to refresh all nodes?
One solution is to change ttlSecondsUntilExpired. For a small cluster that is fine, but for a large cluster it would create high churn, so one idea is to have a max_number_of_node flag to control the refresh capacity. Is there a better way to achieve this?
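
The batching idea in this comment could be sketched roughly as follows. This is only an illustration of the request, not Karpenter behavior: the `Node` type, the `select_nodes_to_refresh` function, and the `max_nodes` parameter (standing in for the proposed max_number_of_node flag) are all hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a cluster node; not a Karpenter type."""
    name: str
    created_at: datetime


def select_nodes_to_refresh(nodes, ttl, max_nodes, now):
    """Return at most max_nodes nodes whose age has reached ttl, oldest first."""
    expired = [n for n in nodes if now - n.created_at >= ttl]
    expired.sort(key=lambda n: n.created_at)
    return expired[:max_nodes]


now = datetime(2022, 7, 12, tzinfo=timezone.utc)
nodes = [
    Node("node-a", now - timedelta(days=30)),
    Node("node-b", now - timedelta(days=25)),
    Node("node-c", now - timedelta(days=22)),
    Node("node-d", now - timedelta(days=5)),
]

# Three nodes are older than 3 weeks, but only two are refreshed per pass.
batch = select_nodes_to_refresh(nodes, ttl=timedelta(weeks=3), max_nodes=2, now=now)
print([n.name for n in batch])  # ['node-a', 'node-b']
```

Lowering the TTL after a vulnerability is disclosed would then expire every node, while the cap keeps each pass to a bounded number of replacements.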

@hajdukd

hajdukd commented Aug 6, 2022

Any update on this, please?

@blakepettersson

Would it be possible to do something like a subset of #1841, which doesn't necessarily have to re-provision nodes? For example, updating instance tags shouldn't need to re-provision new nodes.

@hawkesn
Contributor

hawkesn commented Aug 26, 2022

Another example of #1716: I upgraded the EKS control plane and would like the nodes to upgrade accordingly.

@njtran
Contributor Author

njtran commented Oct 3, 2022

Hey all. Check out #2569 for the design doc for node upgrades.

@karma-git

@ellistarn Hi,

Could you give an ETA for the NodeDrift implementation?
Thank you

@johngmyers

johngmyers commented Mar 20, 2023

My view is that if a node could have been created from the current spec then it has not drifted, whereas if it could not have been created it should be deprovisioned (subject to rate limits, etc.)

So if the zone requirements changed from ["a"] to ["a", "b"] then no nodes need be deprovisioned. But if the opposite change is made then all nodes in zone b should be deprovisioned.
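
As a rough sketch of the rule described in this comment, the check below treats a node as drifted only when it could not have been created from the current requirements. The `has_drifted` function and the dict-based representation of node labels and requirements are hypothetical and are not Karpenter's API; the zone values "a" and "b" come from the example above.

```python
def has_drifted(node_labels, requirements):
    """Return True if the node could not have been created from the current spec.

    requirements maps a label key to the set of values the spec allows.
    """
    for key, allowed in requirements.items():
        if node_labels.get(key) not in allowed:
            return True
    return False


ZONE = "topology.kubernetes.io/zone"
node_in_a = {ZONE: "a"}
node_in_b = {ZONE: "b"}

# Widening ["a"] to ["a", "b"]: a node in zone a still satisfies the spec.
print(has_drifted(node_in_a, {ZONE: {"a", "b"}}))  # False

# Narrowing ["a", "b"] to ["a"]: a node in zone b no longer satisfies the spec.
print(has_drifted(node_in_b, {ZONE: {"a"}}))  # True
```

Any rate limiting would sit on top of this check, deciding how quickly the drifted nodes are actually deprovisioned.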

@njtran
Contributor Author

njtran commented Mar 20, 2023

In these cases drift is undesirable.

@johngmyers, sorry I'm a little confused here by your comment. I thought previously you were advocating for label and tag drift, but in your message here it looks like you're saying drift is not desired?

@johngmyers

I am saying I want drift to be remediated by karpenter.

@ellistarn added the v1 (Issues requiring resolution by the v1 milestone) label on Apr 18, 2023
@grandich

@ellistarn Regarding "ability to control when and when not to expire nodes":

Just to understand: will the proposed approach be to use Node Disruption Budgets, with k8s CronJobs updating them to achieve "maintenance windows"?

@njtran changed the title from "Drift Detection" to "Deprovisioning Controls" on Jun 5, 2023
@njtran changed the title from "Deprovisioning Controls" to "Mega Issue: Deprovisioning Controls" on Jun 5, 2023
@njtran
Contributor Author

njtran commented Jun 5, 2023

Changed the title and updated this issue to more accurately reflect the current state of affairs. Added a design doc on how we think Drift should work for the rest of the known fields here: kubernetes-sigs/karpenter#366

@hendryanw

Hi, I’ve searched online and on GitHub, but can’t find documentation covering node patches / updates. This relates to #1716, which I see has already been ticked.

Can you please point me to where I can find the documentation?

@njtran
Contributor Author

njtran commented Jul 10, 2023

@hendryanw you can drive node patches/updates by deprovisioning. As of v0.29.0, Karpenter automatically upgrades nodes through Drift for AMIs, Security Groups, and Subnets.

@calvinbui

Can we add kubernetes-sigs/karpenter#735 to this list?

@engedaam
Contributor

Just released v0.30.0-rc.0, which contains the full set of drift expansions. Check out the full release notes in karpenter and karpenter-core.

@njtran
Contributor Author

njtran commented Sep 20, 2023

Hey all, I've linked an RFC here that has some of the API decisions we're thinking about for some of the linked items in this mega issue.

Please take a read and give a review if you can!
