Commit 6182275

Fix exhausted nodes not being retried when hard failure fills threshold
When all nodes are excluded/exhausted and a hard failure (not ResourceExhausted) is the last error that fills the threshold, the exhausted nodes were never cleared for retry. This fix checks and clears the exhausted pool before returning 'no nodes available', allowing capacity-exhausted nodes to be retried as intended.
1 parent c8b6882 commit 6182275

1 file changed

Lines changed: 6 additions & 1 deletion

packages/api/internal/orchestrator/placement/placement.go

@@ -62,7 +62,12 @@ func PlaceSandbox(ctx context.Context, algorithm Algorithm, clusterNodes []*node
 			skip[id] = struct{}{}
 		}
 		if len(skip) >= len(clusterNodes) {
-			return nil, errors.New("no nodes available")
+			if len(nodesExhausted) > 0 {
+				clear(nodesExhausted)
+				attempt++
+			} else {
+				return nil, errors.New("no nodes available")
+			}
 		}
 
 		node, err = algorithm.chooseNode(ctx, clusterNodes, skip, nodemanager.SandboxResources{CPUs: sbxRequest.GetSandbox().GetVcpu(), MiBMemory: sbxRequest.GetSandbox().GetRamMb()}, buildMachineInfo, labelFilteringEnabled, requiredLabels, affinityScores...)
