Kubernetes: request all suitable GPUs #3259

un-def · 2025-11-05T14:59:07Z

Previously, KubernetesCompute used only the GPU from the first offer to set node affinity. If that GPU type was not available (e.g., another job or even some non-dstack pod had already taken it), the job eventually failed with FAILED_TO_START_DUE_TO_NO_CAPACITY, even if other GPUs matched the run spec requirements.

Now, we inspect all nodes and request all suitable GPUs, so that any one of them can satisfy the job.
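For illustration only, here is a minimal sketch of how an "any of" GPU requirement could be expressed as a node affinity term. It is not the code from this PR: the helper name `build_gpu_node_affinity`, the plain-dict node representation, and the use of the `nvidia.com/gpu.product` label as the GPU identifier are all assumptions made for the example.

```python
from typing import Iterable, Optional

# Assumption: nodes expose their GPU model via this label
# (the label NVIDIA GPU Feature Discovery sets); the PR may use another one.
GPU_LABEL = "nvidia.com/gpu.product"


def build_gpu_node_affinity(
    nodes: Iterable[dict], suitable_gpu_names: set
) -> Optional[dict]:
    """Hypothetical helper: collect every suitable GPU model found on the
    given nodes and return an affinity that matches any of them."""
    values = sorted(
        {
            node["labels"][GPU_LABEL]
            for node in nodes
            if node.get("labels", {}).get(GPU_LABEL) in suitable_gpu_names
        }
    )
    if not values:
        # No node carries a suitable GPU, so there is nothing to request.
        return None
    return {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            # "In" matches a node whose label equals any listed value.
                            {"key": GPU_LABEL, "operator": "In", "values": values}
                        ]
                    }
                ]
            }
        }
    }
```

With a single `In` expression listing several values, the scheduler can place the pod on a node with any of the listed GPU models instead of being pinned to the first one.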

In addition, we now use the upper bounds of Ranges (CPU, memory, disk) as limits, except for GPU, which cannot have request != limit (as it cannot be overcommitted).
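The request/limit rule can be sketched as follows. This is a hedged illustration rather than the PR's implementation: the `Range` dataclass, the `build_resources` helper, the GiB units, the mapping of disk to `ephemeral-storage`, and the `nvidia.com/gpu` resource name are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Range:
    """Hypothetical stand-in for a resource range; either bound may be open."""
    min: Optional[int] = None
    max: Optional[int] = None


def build_resources(
    cpu: Range, memory_gib: Range, disk_gib: Range, gpu_count: int
) -> dict:
    """Lower bounds become requests, upper bounds become limits."""
    requests: dict = {}
    limits: dict = {}
    for name, rng, suffix in (
        ("cpu", cpu, ""),
        ("memory", memory_gib, "Gi"),
        # Assumption: disk is mapped to ephemeral storage.
        ("ephemeral-storage", disk_gib, "Gi"),
    ):
        if rng.min is not None:
            requests[name] = f"{rng.min}{suffix}"
        if rng.max is not None:
            limits[name] = f"{rng.max}{suffix}"
    if gpu_count > 0:
        # GPUs are an extended resource and cannot be overcommitted,
        # so the request and the limit must be equal.
        requests["nvidia.com/gpu"] = str(gpu_count)
        limits["nvidia.com/gpu"] = str(gpu_count)
    return {"requests": requests, "limits": limits}
```

For example, `build_resources(Range(2, 8), Range(16, None), Range(100, 200), 1)` sets a CPU limit of 8 and a disk limit of 200Gi, leaves memory without a limit because the range has no upper bound, and pins the GPU request and limit to the same value.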

Part-of: #3126

un-def requested review from jvstme and r4victor November 5, 2025 15:05

r4victor approved these changes Nov 6, 2025

un-def merged commit ea555f3 into master Nov 6, 2025
28 checks passed

un-def deleted the issue_3126_k8s_request_all_gpu_models branch November 6, 2025 07:56
