
orchestratord: version-gate pod security context UID/GID for distroless #36101

Draft
jasonhernandez wants to merge 1 commit into main from jason/distroless-orchestratord-uid

@jasonhernandez
Contributor

Summary

Distroless images run as the nonroot user (UID 65534) instead of UID 999, which the Debian-based images use. Version-gate the pod security context in orchestratord so that it sets runAsUser/runAsGroup to match the Materialize version being deployed, avoiding UID mismatches during rolling upgrades.

Part of the distroless migration, split from #35859.
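The summary's version gate can be sketched as a pure function from a release version to the pod's runAsUser/runAsGroup. This is a minimal sketch, not orchestratord's code. The `Version` and `PodSecurityContext` structs and the `pod_security_context` function are hypothetical stand-ins. Returning `None` when `enable_security_context` is off is an assumed meaning for that flag. Reusing 999 as the legacy GID is also an assumption, because the PR only cites UID 999.

```rust
// Hypothetical sketch; not orchestratord's real types or functions.

/// Minimal stand-in for a release version (ignores pre-release tags).
/// Derived ordering compares major, then minor, then patch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

impl Version {
    const fn new(major: u64, minor: u64, patch: u64) -> Self {
        Version { major, minor, patch }
    }
}

/// Local stand-in for the two PodSecurityContext fields this PR gates.
/// Kubernetes defines runAsUser/runAsGroup as int64, hence i64 here.
#[derive(Debug, Clone, PartialEq, Eq)]
struct PodSecurityContext {
    run_as_user: Option<i64>,
    run_as_group: Option<i64>,
}

/// UID/GID of the `nonroot` user in distroless images.
const NONROOT_ID: i64 = 65534;
/// UID of the Debian-based images per this PR; using it as the GID too is an assumption.
const LEGACY_ID: i64 = 999;

/// Security context for a pod running `version`: nonroot once `version`
/// reaches `distroless_gate`, legacy before that, and `None` when security
/// contexts are disabled (assumed semantics of the flag).
fn pod_security_context(
    version: Version,
    distroless_gate: Version,
    enable_security_context: bool,
) -> Option<PodSecurityContext> {
    if !enable_security_context {
        return None;
    }
    let id = if version >= distroless_gate {
        NONROOT_ID
    } else {
        LEGACY_ID
    };
    Some(PodSecurityContext {
        run_as_user: Some(id),
        run_as_group: Some(id),
    })
}

fn main() {
    let gate = Version::new(26, 28, 0);
    // A pod for an older release keeps the legacy UID/GID...
    println!("{:?}", pod_security_context(Version::new(26, 27, 0), gate, true));
    // ...while a pod for the gated release gets the nonroot UID/GID.
    println!("{:?}", pod_security_context(Version::new(26, 28, 0), gate, true));
}
```

Gating on the image's version, rather than on the orchestratord version, is what keeps old and new pods consistent mid-rollout: each pod gets the UID its own image expects. The stand-in ignores pre-release tags. Under semver precedence, `26.28.0-dev` sorts below `26.28.0`, so how a real gate treats dev builds is not covered here.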

Test plan

  • cargo test -p mz-orchestratord passes
  • Pods created for new versions get nonroot UID/GID
  • Pods created for old versions keep the pre-distroless UID/GID

🤖 Generated with Claude Code

@github-actions
Contributor

Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone.

PR title guidelines

  • Use imperative mood: "Fix X" not "Fixed X" or "Fixes X"
  • Be specific: "Fix panic in catalog sync when controller restarts" not "Fix bug" or "Update catalog code"
  • Prefix with area if helpful: compute: , storage: , adapter: , sql:

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

Distroless images run as nonroot (UID 65534) instead of root. Add
version-gating so orchestratord sets the correct runAsUser/runAsGroup
based on the Materialize version, avoiding UID mismatches during
rolling upgrades from Debian-based to distroless images.

Gate versions (verified against release history, 2026-06):
- balancerd: V26_18_0. Its ci/Dockerfile switched to distroless-prod-base
  in v26.18.0 (prod-base in v26.17.x). The original V26_19_0 was off by
  one and would have forced UID 999 onto v26.18.x balancerd pods that
  actually run as 65534.
- environmentd/clusterd: V26_28_0, matching the release expected to ship
  their distroless migration (#36099). The original V26_20_0 was ~8 releases
  too early (main is now 26.28-dev) and would have applied UID 65534 to
  v26.20-v26.27 images that still run as UID 999.
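The corrected gates amount to a per-service lookup followed by a single version comparison. This is a hedged sketch that assumes a local `Version` stand-in with lexicographic (major, minor, patch) ordering. The `Service` enum and the `distroless_gate` and `run_as_id` helpers are illustrative. Only the `V26_18_0`/`V26_28_0` names come from the PR.

```rust
// Hypothetical sketch of per-service gates; only the V26_18_0 / V26_28_0
// constant names are taken from the PR.

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

const fn v(major: u64, minor: u64, patch: u64) -> Version {
    Version { major, minor, patch }
}

const V26_18_0: Version = v(26, 18, 0);
const V26_28_0: Version = v(26, 28, 0);

#[derive(Debug, Clone, Copy)]
enum Service {
    Balancerd,
    Environmentd,
    Clusterd,
}

/// First release whose image for `service` runs as the distroless nonroot user.
fn distroless_gate(service: Service) -> Version {
    match service {
        // balancerd's ci/Dockerfile switched to distroless-prod-base in v26.18.0.
        Service::Balancerd => V26_18_0,
        // Assumes #36099 lands in the 26.28 cycle; bump this if it slips.
        Service::Environmentd | Service::Clusterd => V26_28_0,
    }
}

/// UID (and, by assumption, GID) to set for `service` at `version`.
fn run_as_id(service: Service, version: Version) -> i64 {
    if version >= distroless_gate(service) {
        65534 // distroless `nonroot`
    } else {
        999 // Debian-based image
    }
}

fn main() {
    let cases = [
        (Service::Balancerd, v(26, 17, 0)),
        (Service::Balancerd, v(26, 18, 0)),
        (Service::Environmentd, v(26, 27, 0)),
        (Service::Clusterd, v(26, 28, 0)),
    ];
    for (service, version) in cases {
        println!(
            "{:?} v{}.{}.{} -> {}",
            service,
            version.major,
            version.minor,
            version.patch,
            run_as_id(service, version)
        );
    }
}
```

Keeping one gate per service makes a slip of #36099 a one-constant change that cannot affect balancerd.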

NOTE: the env/clusterd gate assumes #36099 lands in the 26.28 cycle. If it
slips, bump V26_28_0 to the actual release. The three distroless PRs
(#36099 image, #36100 SIGTERM, #36101 this) must ship in the same release.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jasonhernandez
Contributor Author

🔗 Distroless migration — coordination note

One of three PRs that must ship together in the same release: #36099 (distroless image), #36100 (SIGTERM handler), #36101 (this, orchestratord UID/GID gating).

⚠️ Version gates corrected during rebase (they had gone stale — verified against release tags 2026-06):

| Service | Old gate | Corrected | Why |
|---|---|---|---|
| balancerd | V26_19_0 | V26_18_0 | ci/Dockerfile switched to distroless-prod-base in v26.18.0 (prod-base in v26.17.x). The old gate was off by one → it would force UID 999 onto v26.18.x balancerd pods that run as 65534. |
| environmentd/clusterd | V26_20_0 | V26_28_0 | v26.20 shipped ~8 releases ago; env/clusterd only become distroless when #36099 lands (main is 26.28-dev). Old gate → UID 65534 applied to v26.20–v26.27 images that still run as 999. |
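The table's reasoning can be checked mechanically. Compare the UID a gate assigns with the UID implied by the release that actually switched the image. For env/clusterd, that release is the expected 26.28. This is a minimal sketch that checks patch-0 releases only. `assigned_uid` and `mismatched_minors` are hypothetical helpers, not part of orchestratord.

```rust
// Hypothetical helpers for auditing a gate against the real switch release.

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

const fn v(major: u64, minor: u64, patch: u64) -> Version {
    Version { major, minor, patch }
}

const NONROOT: i64 = 65534;
const LEGACY: i64 = 999;

/// UID the orchestrator would assign to `version` under `gate`.
fn assigned_uid(version: Version, gate: Version) -> i64 {
    if version >= gate { NONROOT } else { LEGACY }
}

/// Minor versions 26.lo..=26.hi (patch 0 only) where `gate` assigns a
/// different UID than the image actually runs as, given that the image
/// switched to distroless at `actual_switch`.
fn mismatched_minors(gate: Version, actual_switch: Version, lo: u64, hi: u64) -> Vec<u64> {
    (lo..=hi)
        .filter(|&minor| {
            let version = v(26, minor, 0);
            assigned_uid(version, gate) != assigned_uid(version, actual_switch)
        })
        .collect()
}

fn main() {
    // balancerd: old gate V26_19_0 vs. the actual switch in v26.18.0.
    println!("balancerd, old gate: {:?}", mismatched_minors(v(26, 19, 0), v(26, 18, 0), 15, 30));
    // environmentd/clusterd: old gate V26_20_0 vs. the expected switch in v26.28.0.
    println!("env/clusterd, old gate: {:?}", mismatched_minors(v(26, 20, 0), v(26, 28, 0), 15, 30));
    // With a corrected gate, the gate and the switch coincide.
    println!("balancerd, corrected: {:?}", mismatched_minors(v(26, 18, 0), v(26, 18, 0), 15, 30));
}
```

With the old gates, this flags 26.18 for balancerd and 26.20 through 26.27 for environmentd/clusterd, matching the table. With the corrected gates, both lists are empty.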

Action required before merge: the env/clusterd gate V26_28_0 assumes #36099 lands in the 26.28 cycle. If it slips, bump this constant to the actual landing release.

Possible pre-existing issue (separate from this PR): balancerd has been distroless (UID 65534) since v26.18.0, but this orchestrator gating is still unmerged, so v26.18–v26.27 may have a live UID mismatch unless enable_security_context is disabled in practice. Worth a separate look.

Rebased onto current main; cargo check -p mz-orchestratord green and test_pod_uid_gid passes with the corrected boundary.
