Proposal: introduce opt-in support for orphaned node destruction #1268
Comments
Thanks for writing this up! My first concern: when the cluster is being torn down, what happens if the Karpenter process becomes permanently unavailable because its ASG is already gone? There would be nothing left to execute the node termination logic (including the EC2 terminate-instances call).
Isn't it kind of the same right now? (Assuming that if we do terminate the node running Karpenter, dependent nodes won't be deleted because their finalizers remain, which blocks destruction of the cluster.) This feature would be opt-in, and I assume it would be used by people with a strong IaC pipeline (with great power, etc.), so hopefully they would enforce a proper order of execution during the destruction stage.
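The finalizer situation described above can be recovered from by hand: if Karpenter is uninstalled while its termination finalizer is still set on Node objects, those nodes can never finish deleting. A minimal sketch of a last-resort cleanup, assuming the `karpenter.sh/termination` finalizer key; it only *prints* the `kubectl` commands so they can be reviewed before running:

```shell
# emit_finalizer_cleanup: reads node names (one per line) on stdin and
# prints, for each node, a patch command that clears all finalizers from
# the Node object (a strategic-merge patch setting "finalizers" to null).
emit_finalizer_cleanup() {
  while read -r node; do
    printf 'kubectl patch node %s --type=merge -p %s\n' \
      "$node" "'{\"metadata\":{\"finalizers\":null}}'"
  done
}

# In a live cluster the node names would come from something like:
#   kubectl get nodes -l karpenter.sh/provisioner-name -o name
# Here we feed a hypothetical node name for illustration:
printf 'ip-10-0-1-23.ec2.internal\n' | emit_finalizer_cleanup
```

Clearing *all* finalizers is a blunt instrument; it is only safe here because the cluster is being destroyed anyway.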
Wouldn't it be possible to orchestrate cleanup against EC2 directly (e.g. your workaround command)? I love the idea of Karpenter handling this on uninstall, but unless Karpenter install/uninstall is codified more explicitly and/or run outside of the cluster it operates on, I don't see us being able to provide a robust solution. I've mentioned elsewhere (k8s Slack, I think) that I think this is a great feature for Kubernetes installers (eks/kops/etc.).
Completely agree this belongs with installers. kops already deletes all instances on cluster deletion, including Karpenter-managed instances. Right now, it doesn't delete instances on provisioner deletion, but that's a trivial thing to implement and an expected feature, given that we support this with ASGs.
We intentionally don't delete nodes on provisioner deletion. We think of provisioners as forward looking. It's the same reason that we don't apply labels to nodes if you update the provisioner's labels after a node is launched. |
Semantically, users would be deleting the instance group, so it's not entirely the same as deleting the provisioner resource itself. It would be a much more deliberate action with clearer intent. |
Tell us about your request
Introduce opt-in support for orphaned node destruction (both nodes still registered in the cluster and instances left outside it)
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
Right now, Karpenter's approach to provisioner deletion is "don't delete provisioned nodes", and that's a sensible approach in most cases.
However, I believe that some operators do want to ensure that once the provisioner is deleted, all nodes belonging to it are deleted as well.
For example, in our pipeline for cluster bootstrapping, the approach is the following:
Create cluster with an ASG consisting of one node tainted for Karpenter -> install Karpenter -> install provisioner -> install ArgoCD -> install apps
On cluster destruction, the inverse process does not work, because Karpenter is sometimes deleted before it gets to deal with the empty nodes.
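The inverse teardown order could be sketched as follows. This is a hypothetical helper, not part of Karpenter: the Helm release name, namespaces, and the use of Argo CD `Application` resources are assumptions for illustration. It defaults to a dry run that only prints the plan; set `DRY_RUN=0` to actually execute.

```shell
# run: executes its arguments, or just prints them when DRY_RUN=1 (default).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

teardown() {
  # 1. Remove apps and Argo CD first so nothing re-creates pods.
  run kubectl delete applications --all -n argocd
  # 2. Delete the provisioners while Karpenter is still running.
  run kubectl delete provisioners --all
  # 3. Wait for Karpenter-managed nodes to drain and terminate
  #    *before* Karpenter itself is uninstalled.
  run kubectl wait --for=delete node -l karpenter.sh/provisioner-name --timeout=10m
  # 4. Only now remove Karpenter, then the rest of the infrastructure.
  run helm uninstall karpenter -n karpenter
}

teardown   # prints the plan; set DRY_RUN=0 to execute for real
```

The key ordering constraint is step 3: Karpenter must outlive its nodes, which is exactly what breaks when the destruction pipeline removes Karpenter first.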
It would be nice if Karpenter offered an opt-in way to destroy a provisioner's orphaned nodes when the provisioner is deleted.
Are you currently working around this issue?
I have to run
aws ec2 describe-instances --filters Name=instance-state-name,Values=running Name=tag:eks:cluster-name,Values=${TF_VAR_CLUSTER_NAME} Name=tag-key,Values=karpenter.sh/provisioner-name --query "Reservations[*].Instances[*].InstanceId" --output text | xargs aws ec2 terminate-instances --instance-ids || true
to delete all orphaned instances. (Note: `--filters` must be passed once with all `Name=...,Values=...` pairs; repeating the flag, as in the original command, makes the AWS CLI keep only the last filter.)