Description
Which component are you using?: Cluster Autoscaler
What version of the component are you using?: v1.22
What k8s version are you using (kubectl version)?: v1.22.17
What environment is this in?: Forked Kubernetes
What did you expect to happen?: Once unneeded nodes have sat idle longer than --scale-down-unneeded-time + --scale-down-delay-after-add, CA should proceed with scale-down even if a scale-up is still in flight.
What happened instead?: CA will not scale down any nodes in a group if it believes a scale-up for that group is still in progress. It keeps waiting until either the pending node registers or the ASG reaches maxSize and CA cancels the scale-up.
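For reference, the timing flags involved look roughly like the snippet below in our Cluster Autoscaler deployment. This is only a sketch; the values and the cloud-provider setting are illustrative, not the exact configuration from our cluster.

```yaml
# Illustrative Cluster Autoscaler container spec (flag values are examples only).
spec:
  containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.22.3
    command:
    - ./cluster-autoscaler
    - --cloud-provider=aws                  # assumed; the issue involves ASGs
    - --scale-down-unneeded-time=10m        # idle time before a node is eligible for scale-down
    - --scale-down-delay-after-add=10m      # cooldown after a scale-up before scale-down resumes
    - --max-node-provision-time=15m         # how long CA waits for a requested node to register
```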
How to reproduce it (as minimally and precisely as possible):
- Submit a pod (with a custom node selector) that triggers scale-up of a node group (e.g. node-group-A); a sketch of such a pod is shown after this list
- CA scales up 1 node in node-group-A
- The scheduler fails to place the pod on the new node because the node selector constraint is not satisfied, so the pod stays Pending
- Ensure the pod remains Pending and that each newly added node takes >5 min to register
- During the next scan, CA scales up another node in node-group-A, resulting in a continuous scale-up loop that lasts several hours
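A minimal sketch of the triggering pod, assuming node-group-A's template (e.g. its ASG tags) advertises a label that the registered nodes never actually carry; the label key/value, pod name, and image are illustrative only:

```yaml
# Hypothetical pod used to trigger the scale-up: the nodeSelector matches the
# node-group template that CA simulates, but not the labels on the real nodes,
# so the pod stays Pending even after the new node joins.
apiVersion: v1
kind: Pod
metadata:
  name: selector-mismatch-repro
spec:
  nodeSelector:
    example.com/pool: node-group-a    # assumed label; absent from the actual kubelet labels
  containers:
  - name: pause
    image: registry.k8s.io/pause:3.7
    resources:
      requests:
        cpu: 500m
        memory: 256Mi
```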
Anything else we need to know?:
Other Findings:
- The scale-up loop for our unschedulable pod takes ~5 min to complete (three 90 s scheduler retries at 30 s intervals).
- Every time CA attempts to satisfy the pod, it increments CloudProviderTarget on node-group “nodes-4…”, leaving Ready < Target, so CA holds off node deletions.
- Scale-down cooldown timers (--scale-down-unneeded-time, --scale-down-delay-after-add, etc.) have all elapsed, but CA’s group-level state remains “ScaleUp: InProgress,” so ScaleDown stays forbidden.
- There is no current flag to allow scale-down while a scale-up is still pending.