Cluster Autoscaler not backing off exhausted node group #6240
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

/remove-lifecycle stale
Is there something we need to do as the […]
@apricote Looking at the relevant provider code, it seems possible that on failure it logs a message, but neglects to return the error from […]
Thanks for the hint @tallaxes! I opened a PR to properly return encountered errors.
Which component are you using?:
Cluster Autoscaler
What version of the component are you using?:

Component version: registry.k8s.io/autoscaling/cluster-autoscaler:v1.26.4
What k8s version are you using (kubectl version)?:

(kubectl version output omitted)

What environment is this in?:
Hetzner Cloud
What did you expect to happen?:
When the cluster autoscaler is configured with the priority expander and multiple node groups of differing priorities are provided, and the cloud provider fails to provision nodes in the high-priority node group due to resource unavailability, the cluster autoscaler should back off after some time and proceed to lower-priority node groups.
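For reference, a priority-expander setup of the kind described above is configured through a ConfigMap named `cluster-autoscaler-priority-expander` in the autoscaler's namespace, mapping priority values (higher wins) to lists of node-group name regexes. The pool names and priority values below are illustrative, matching the node groups mentioned in this issue:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    20:
      - pool1.*
    10:
      - pool2.*
```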
What happened instead?:
The high-priority node group (pool1 in the log below) currently has no resources available to provision the requested nodes. The cluster autoscaler is stuck in a loop trying to provision nodes in the high-priority group and does not proceed to pool2 (lower priority, resources available). I have also tried setting --max-node-group-backoff-duration=1m, with no effect.

How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?: