feat: webhook HA, security hardening, and RBAC naming cleanup #92

Merged: scotwells merged 4 commits into main from feat/webhook-ha on May 1, 2026

@scotwells scotwells commented Apr 30, 2026

Summary

  • Renames the manager Deployment from compute to compute-manager to avoid ambiguity with the broader system name
  • Adds a new config/components/high-availability kustomize component with:
    • replicas: 2 — eliminates the single-replica webhook availability gap on pod restart
    • PodDisruptionBudget (minAvailable: 1) — prevents simultaneous eviction of both pods during node maintenance
    • topologySpreadConstraints (kubernetes.io/hostname, DoNotSchedule) — ensures the two replicas land on different nodes so a node failure or drain cannot take down both pods at once
  • Wires the high-availability component into config/overlays/single-cluster
  • Hardens the manager security context to the Kubernetes restricted pod security standard:
    • seccompProfile: RuntimeDefault at the pod level
    • runAsUser/runAsGroup: 65532 pinned to match the distroless nonroot image USER
    • readOnlyRootFilesystem: true at the container level
  • Prefixes all RBAC ClusterRole and ClusterRoleBinding names with compute- to prevent collisions with other controllers in the same cluster
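
A minimal sketch of what the hardened security context described above might look like on the manager Deployment. The three settings named in this PR are marked; the container name `manager`, the pod-level placement of `runAsUser`/`runAsGroup`, and the remaining fields (which the restricted standard also requires) are assumptions for illustration, not taken from the actual diff.

```yaml
# Illustrative fragment only; not the manifest from this PR.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: compute-manager
spec:
  template:
    spec:
      securityContext:
        seccompProfile:
          type: RuntimeDefault          # from this PR (pod level)
        runAsUser: 65532                # from this PR; matches the distroless nonroot USER
        runAsGroup: 65532               # from this PR
        runAsNonRoot: true              # assumed to be set already
      containers:
        - name: manager                 # hypothetical container name
          securityContext:
            readOnlyRootFilesystem: true       # from this PR (container level)
            allowPrivilegeEscalation: false    # assumed to be set already
            capabilities:
              drop: ["ALL"]                    # assumed to be set already
```

With `readOnlyRootFilesystem: true`, any path the process writes to at runtime would need a writable volume such as an `emptyDir`; whether the manager needs one is not stated in this PR.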

Consumers that want a single replica (e.g. staging) simply omit the high-availability component from their overlay.
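
As a rough sketch of how the component and its wiring could be laid out: the block below shows several files separated by `---`, each labelled with a path comment. Only the directory names `config/components/high-availability`, `config/overlays/single-cluster`, and `config/base/manager` come from this PR; the file names `pdb.yaml` and `deployment-patch.yaml`, the `app.kubernetes.io/name` label, `maxSkew: 1`, and the relative path to the base are assumptions.

```yaml
# config/components/high-availability/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component
resources:
  - pdb.yaml                        # hypothetical file name
patches:
  - path: deployment-patch.yaml     # hypothetical file name
    target:
      kind: Deployment
      name: compute-manager
---
# config/components/high-availability/pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: compute-manager
spec:
  minAvailable: 1                   # at least one webhook pod survives voluntary evictions
  selector:
    matchLabels:
      app.kubernetes.io/name: compute-manager   # assumed pod label
---
# config/components/high-availability/deployment-patch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: compute-manager
spec:
  replicas: 2
  template:
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                            # assumed value
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule      # refuse to co-locate rather than best-effort
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: compute-manager   # assumed pod label
---
# config/overlays/single-cluster/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base/manager              # assumed relative path to the base
components:
  - ../../components/high-availability   # drop this entry for a single-replica deployment
```

Because `DoNotSchedule` is a hard constraint, a cluster with only one schedulable node would leave the second replica Pending, which is one more reason a small staging overlay might leave the component out.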

Closes #91

Test plan

  • kustomize build config/overlays/single-cluster/ produces a Deployment named compute-manager with replicas: 2, a PodDisruptionBudget with minAvailable: 1, and topologySpreadConstraints on kubernetes.io/hostname
  • kustomize build config/base/manager/ still builds cleanly with replicas: 1 and no PDB
  • All ClusterRole and ClusterRoleBinding names in the build output are prefixed with compute-
  • Deploy to a test cluster and verify both pods become Ready and the webhook remains available during a kubectl rollout restart
  • Verify pods are scheduled on separate nodes

🤖 Generated with Claude Code

scotwells and others added 3 commits April 30, 2026 16:16
…oyment

Adds a `high-availability` kustomize component that sets replicas to 2
and adds a PodDisruptionBudget (minAvailable: 1) to prevent webhook
downtime during node maintenance or pod restarts.

Also renames the manager Deployment from `compute` to `compute-manager`
for clarity — the old name was ambiguous given the broader system name.

Closes #91

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ensures the two replicas land on different nodes so a single node
failure or drain cannot take down both webhook pods simultaneously.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ndard

Adds seccompProfile RuntimeDefault (pod-level), pins runAsUser/runAsGroup
to 65532 to match the distroless nonroot image USER, and sets
readOnlyRootFilesystem on the container.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@scotwells scotwells changed the title from "feat: add high-availability component for webhook HA" to "feat: webhook HA, deployment rename, and security context hardening" Apr 30, 2026
@scotwells scotwells requested review from a team, kevwilliams and privateip April 30, 2026 21:28
…ute-

Ensures role names are unique to this service and won't collide with
other controllers installed in the same cluster.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@scotwells scotwells changed the title from "feat: webhook HA, deployment rename, and security context hardening" to "feat: webhook HA, security hardening, and RBAC naming cleanup" Apr 30, 2026
@scotwells scotwells merged commit ed687eb into main May 1, 2026
9 checks passed
@scotwells scotwells deleted the feat/webhook-ha branch May 1, 2026 16:28

Development

Successfully merging this pull request may close these issues.

Add replicas and PodDisruptionBudget to base/manager for webhook HA