Add replicas and PodDisruptionBudget to base/manager for webhook HA #91

@scotwells

Description

Problem

The compute manager serves both controller reconciliation and admission webhooks from the same Deployment. With a single replica, any pod restart creates a gap during which webhook requests fail: resources cannot be created or updated until the pod is Ready again (at least 5s, because the readiness probe sets initialDelaySeconds: 5).

Proposed Solution

In base/manager:

  • Set replicas: 2 on the Deployment (the --leader-elect flag already makes running multiple replicas safe for the controller side; the webhook is stateless)
  • Add a PodDisruptionBudget with minAvailable: 1 to prevent both replicas from being evicted simultaneously during node maintenance
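
A minimal sketch of what the two changes above might look like. The resource name `controller-manager` and the `control-plane: controller-manager` label are assumptions (common kubebuilder defaults), not taken from this repository; the Deployment is shown as a fragment containing only the field this issue touches.

```yaml
# Fragment of the existing Deployment in base/manager (name is assumed).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: controller-manager
spec:
  replicas: 2  # two pods, so one can keep serving webhooks while the other restarts
---
# New PodDisruptionBudget; the selector must match the Deployment's pod labels.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: controller-manager
spec:
  minAvailable: 1  # voluntary evictions may not take the last available pod
  selector:
    matchLabels:
      control-plane: controller-manager  # assumed pod label
```

Note that a PodDisruptionBudget only constrains voluntary disruptions such as node drains; it does not protect against both pods crashing at the same time.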

The cert-manager CSI driver approach is compatible with multiple replicas: each pod gets its own certificate from the same ClusterIssuer, so any pod's cert is trusted by the webhook configurations.
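
For context, a per-pod certificate volume with the cert-manager CSI driver typically looks something like the fragment below. This is a sketch, not the manifest from this repository: the volume name, issuer name, and DNS name are hypothetical placeholders.

```yaml
# Pod template fragment: each replica mounts its own CSI volume, and the
# driver requests a separate certificate for each pod that mounts it.
volumes:
  - name: webhook-certs  # hypothetical volume name
    csi:
      driver: csi.cert-manager.io
      readOnly: true
      volumeAttributes:
        csi.cert-manager.io/issuer-kind: ClusterIssuer
        csi.cert-manager.io/issuer-name: example-issuer  # hypothetical issuer
        csi.cert-manager.io/dns-names: webhook-service.example-namespace.svc  # hypothetical Service DNS name
```

Because the volume is defined in the pod template, adding a second replica should require no change to this part of the manifest.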

Consumers that want a single replica for cost reasons (e.g., staging) can override with a patch in their own overlay.
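
As an illustration, a consumer overlay could scale back to one replica with kustomize's built-in `replicas` field. The relative path and resource names below are assumptions for the sketch.

```yaml
# overlays/staging/kustomization.yaml (hypothetical location)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base/manager  # assumed relative path to the base
replicas:
  - name: controller-manager  # assumed Deployment name
    count: 1
patches:
  # With a single replica, minAvailable: 1 would block voluntary evictions
  # such as node drains, so this sketch drops the PDB in the overlay as well.
  - patch: |-
      $patch: delete
      apiVersion: policy/v1
      kind: PodDisruptionBudget
      metadata:
        name: controller-manager
```

Whether to delete the PodDisruptionBudget or keep it and accept blocked drains is a choice for the overlay owner; the base would stay at two replicas either way.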
