orchestrator-kubernetes: suppress minDomains when availability_zones is set #36516

Merged
jubrad merged 3 commits into MaterializeInc:main from jubrad:jubrad/fix-min-domains-az-pinning on May 12, 2026
@jubrad jubrad commented May 11, 2026

Problem

When availability_zones pins pods to specific AZs via node affinity, minDomains was still being applied to the topology spread constraint. minDomains tells the scheduler "there must be at least N eligible topology domains." If minDomains exceeds the number of pinned zones, Kubernetes treats the global minimum as 0. With maxSkew=1, each pinned zone can then hold at most one matching pod (a single pod in total when pinned to one AZ), leaving additional replicas stuck pending indefinitely.
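The failure mode can be reproduced with a small model of the skew check. This is a simplified sketch of the rule as described in the Kubernetes documentation for hard (`DoNotSchedule`) constraints, not scheduler code; the names `can_place` and `schedule` are made up for illustration, and the model assumes every eligible zone has room for more pods.

```rust
/// Simplified model of the skew check for a hard topology spread constraint.
/// `pods_per_zone` holds the matching pods already running in each eligible zone.
fn can_place(
    pods_per_zone: &[u32],
    target: usize,
    max_skew: u32,
    min_domains: Option<u32>,
) -> bool {
    let eligible = pods_per_zone.len() as u32;
    // With fewer eligible domains than minDomains, the global minimum is treated as 0.
    let global_min = match min_domains {
        Some(n) if eligible < n => 0,
        _ => pods_per_zone.iter().copied().min().unwrap_or(0),
    };
    // Skew is evaluated as if the incoming pod were already placed in `target`.
    pods_per_zone[target] + 1 - global_min <= max_skew
}

/// Greedily places up to `replicas` pods and returns how many were scheduled.
fn schedule(zones: usize, replicas: u32, max_skew: u32, min_domains: Option<u32>) -> u32 {
    let mut pods = vec![0u32; zones];
    let mut placed = 0;
    for _ in 0..replicas {
        match (0..zones).find(|&z| can_place(&pods, z, max_skew, min_domains)) {
            Some(z) => {
                pods[z] += 1;
                placed += 1;
            }
            // No zone satisfies the constraint: the remaining replicas stay pending.
            None => break,
        }
    }
    placed
}
```

Under this model, `schedule(1, 3, 1, Some(2))` places one of three replicas when pinned to a single zone, while `schedule(1, 3, 1, None)` places all three once minDomains is omitted.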

Fixes CLO-74.

Solution

Suppress minDomains in the TopologySpreadConstraint when availability_zones is set, mirroring the existing behavior for soft spread constraints. A warning is logged when min_domains is configured but will be ignored.
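A minimal sketch of the suppression rule follows. The helper name `topology_spread_min_domains` and its first two arguments appear in this PR's diff; the third parameter, the `Option<i32>` types, and the body are assumptions made for illustration, and the warning log is left out.

```rust
/// Returns the `minDomains` value to emit on the constraint, or `None` to omit it.
/// Signature and body are assumed; only the name and first two arguments are
/// taken from the PR's diff.
fn topology_spread_min_domains(
    soft: bool,
    az_pinned: bool,
    min_domains: Option<i32>,
) -> Option<i32> {
    if soft || az_pinned {
        // Soft spread constraints and AZ-pinned pods both drop minDomains.
        None
    } else {
        min_domains
    }
}
```

With this shape, the four cases named in the follow-up commit (soft spread, az-pinned, both, and neither) reduce to one check: only the "neither" case passes the configured value through.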

Testing

See follow-up commit for unit test.

jubrad and others added 3 commits May 11, 2026 15:42
…is set

When `availability_zones` pins pods to specific AZs via node affinity,
the number of eligible topology domains is constrained. If `minDomains`
exceeds the number of pinned zones, Kubernetes treats the global minimum
as 0, and with `maxSkew=1` only one pod can be scheduled — leaving
additional replicas stuck pending.

Fix by suppressing `minDomains` in the topology spread constraint
whenever `availability_zones` is set, matching the existing behavior for
soft spread constraints.

Fixes CLO-74

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Extract `topology_spread_min_domains` helper and add unit test covering
the four suppression cases: soft spread, az-pinned, both, and neither.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@jubrad jubrad marked this pull request as ready for review May 12, 2026 02:25
@jubrad jubrad requested a review from a team as a code owner May 12, 2026 02:25
@jubrad jubrad requested a review from rjimeno-mz May 12, 2026 02:25
},
min_domains: topology_spread_min_domains(
config.soft,
availability_zones.is_some(),
Contributor

This is fine for clusters of our size, but if we were to pin to, say, 4 AZs, then setting minDomains to 2 or 3 would be totally reasonable.

Contributor Author

Totally! It's possible this should just be a var on environmentd to make it more configurable.
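One possible direction for the case raised in this thread is to cap minDomains at the number of pinned zones instead of dropping it. The sketch below is hypothetical: it is not what this PR merges and not the environmentd variable mentioned above, and `clamped_min_domains` and its parameters are invented for illustration.

```rust
/// Hypothetical alternative: keep minDomains for AZ-pinned pods, but never let
/// it exceed the number of pinned zones. Soft constraints still omit it.
fn clamped_min_domains(
    soft: bool,
    pinned_zones: Option<usize>,
    min_domains: Option<i32>,
) -> Option<i32> {
    if soft {
        return None;
    }
    match (min_domains, pinned_zones) {
        (Some(n), Some(zones)) => {
            // Cap at the pinned zone count so the eligible domains never fall short.
            let cap = zones.min(i32::MAX as usize) as i32;
            let clamped = n.min(cap);
            // Kubernetes requires minDomains to be greater than 0 when it is set.
            (clamped > 0).then_some(clamped)
        }
        // Not pinned: pass the configured value through unchanged.
        (n, None) => n,
        // Pinned but nothing configured: nothing to emit.
        (None, Some(_)) => None,
    }
}
```

With four pinned AZs and a configured value of 3, this returns `Some(3)`; with one pinned AZ it returns `Some(1)`, which avoids the case where minDomains exceeds the pinned zones.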

@jubrad jubrad merged commit 41d1ef9 into MaterializeInc:main May 12, 2026
118 checks passed