Problem
The compute manager serves both controller reconciliation and admission webhooks from the same Deployment. With a single replica, any pod restart creates a gap where webhook requests fail — resources cannot be created or updated until the pod is Ready again (minimum 5s due to initialDelaySeconds: 5 on the readiness probe).
Proposed Solution
In base/manager:
- Set replicas: 2 on the Deployment (the --leader-elect flag already handles multi-replica safely for the controller side; the webhook is stateless)
- Add a PodDisruptionBudget with minAvailable: 1 to prevent both replicas from being evicted simultaneously during node maintenance
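As a rough sketch, the two changes above might look like the following in base/manager. The resource name compute-manager and the control-plane: controller-manager label are assumptions for illustration; the selector has to match whatever labels the manager pods actually carry.

```yaml
# Deployment fragment: only the changed field is shown.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: compute-manager            # assumed name
spec:
  replicas: 2
---
# PodDisruptionBudget keeping at least one manager pod available
# during voluntary disruptions such as node drains.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: compute-manager            # assumed name
spec:
  minAvailable: 1
  selector:
    matchLabels:
      control-plane: controller-manager   # assumed label; must match the pod template labels
```

The PodDisruptionBudget would also need to be listed under resources in the base kustomization for it to be rendered.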
The cert-manager CSI driver approach is compatible with multiple replicas — each pod gets its own certificate from the same ClusterIssuer, so any pod's cert is trusted by the webhook configurations.
Consumers that want a single replica for cost reasons (e.g., staging) can override with a patch in their own overlay.
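One way a consumer overlay could express that override is a JSON 6902 patch in its kustomization.yaml. This is a sketch only: the relative path to the base and the Deployment name compute-manager are hypothetical.

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base/manager             # hypothetical path to the base
patches:
  - target:
      kind: Deployment
      name: compute-manager        # assumed Deployment name
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 1
```

Note that with a single replica, a PodDisruptionBudget of minAvailable: 1 blocks voluntary eviction of that pod, so an overlay that drops to one replica would likely also want to remove or relax the budget.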