fix(mgmt): fail loud on missing federation kubeconfig; rename federation client #120

Merged
scotwells merged 1 commit into feat/federated-deployment-scheduling from fix/mgmt-controller-fail-loud on May 29, 2026

scotwells commented May 29, 2026

Summary

Management controllers now fail fast at startup — with a clear error and immediate exit — when --enable-management-controllers is set but --federation-kubeconfig is not. Previously, WorkloadDeploymentFederator and InstanceProjector were silently skipped, so federation and instance-status projection were invisibly broken: workloads appeared to schedule but never actually federated to edge cells, with no log signal and no alert. This is the same fail-open-silent pattern as the quota P1 (fixed in #118).

Why it matters

An operator who sets --enable-management-controllers=true but mis-wires the federation kubeconfig env var gets a hard failure at pod startup — visible in pod logs, surfaced immediately in a rollout — rather than a degraded system that silently does nothing for hours or until a workload is traced end-to-end.

What changed

Fail-loud guard (cmd/main.go): immediately after kubeconfig loading, if --enable-management-controllers is true and --federation-kubeconfig was not provided, the process logs "management controllers enabled but no federation kubeconfig configured" with the hint "set --federation-kubeconfig" and exits with status 1. The guard fires before any manager or controller setup, so the failure is instant and unambiguous. The --enable-cell-controllers path is unaffected.

Federation client rename: the Karmada client was named UpstreamClient / --upstream-kubeconfig / UPSTREAM_KUBECONFIG — all directional names that only make sense from one vantage point. These are renamed to FederationClient / --federation-kubeconfig / FEDERATION_KUBECONFIG across cmd/main.go, the three controller structs (WorkloadDeploymentFederator, InstanceProjector, InstanceReconciler), their tests, and the kustomize base manifests. The setupManagementControllers helper introduced in #118 is updated in place. All comments describing the Karmada plane as the "downstream control plane" are corrected.

Deployment coordination required: once this image is deployed, management-plane and edge/lab deployments must supply FEDERATION_KUBECONFIG (not UPSTREAM_KUBECONFIG). Paired with infra #2622 (management plane) and #2623 (edge/lab), which already set FEDERATION_KUBECONFIG.

Test plan

  • go build ./... — clean
  • go vet ./... — clean
  • make test — all tests pass (controller, config, validation, stateful instancecontrol)
  • make lint (golangci-lint v2.12.2) — 0 issues
  • Rebased onto c1c6261 (after #118, "fix(quota): Enforce and harden project quota for edge-cell Instances"); staticcheck and gocyclo issues from the original base are resolved
  • Smoke: start with --enable-management-controllers and no --federation-kubeconfig → expect immediate exit 1 with clear error in logs
  • Smoke: start with both flags → expect normal startup; federation and projection operate as before

🤖 Generated with Claude Code

…Client

Management controllers (WorkloadDeploymentFederator, InstanceProjector) now
refuse to start when --enable-management-controllers is set but
--federation-kubeconfig is omitted, logging a clear error and exiting 1.
Previously the controllers were silently skipped — the same fail-open-silent
class as the quota P1 issue — leaving federation and instance projection
broken with no operator-visible signal.

Alongside the fail-loud guard, rename the Karmada/federation client
identifiers to a neutral "federation" framing (FederationClient,
federationRestConfig, --federation-kubeconfig / FEDERATION_KUBECONFIG) across
all three controllers, cmd/main.go, and the kustomize base manifests. The
previous --upstream-kubeconfig flag is removed; deployments must migrate to
--federation-kubeconfig. Update all comments to match.

Coordination note: once this artifact is deployed, management-plane and
edge/lab deployments must set FEDERATION_KUBECONFIG (infra PRs in parallel).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
scotwells force-pushed the fix/mgmt-controller-fail-loud branch from cfddb1d to 6ae41d4 on May 29, 2026 02:15
scotwells merged commit 553af62 into feat/federated-deployment-scheduling on May 29, 2026
8 checks passed
scotwells deleted the fix/mgmt-controller-fail-loud branch on May 29, 2026 02:19