Problem
The federation e2e harness introduced with the distributed-scheduling work runs the operators locally on the host (go run ./cmd/main.go, two processes pointed at the Kind clusters via kubeconfig) rather than deploying them into the cell clusters as Deployments.
That means the e2e suite does not exercise the production deployment surface:
RBAC: operators run with the kubeconfig's admin credentials rather than their ServiceAccount Roles, so a missing or incorrect grant passes e2e but breaks in production. This has bitten us before: hub RBAC gaps silently killed edge mounts, and cluster-wide informers first needed extra RBAC and then OOM-ed.
Manifests: the config/overlays Deployments, resource limits, env wiring, and cell vs. management overlays are never applied.
Image and entrypoint, in-cluster DNS and networking, and leader election are all skipped.
Only the pop-dfw cell operator runs (no pop-ord), and it runs with --leader-elect=false.
So the harness validates federation control logic but not deployability, and deployability is the gap most likely to ship a broken release.
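To make the RBAC point concrete, here is a minimal Go sketch of simplified Kubernetes RBAC rule matching (API group, resource, and verb, each matching an exact value or the "*" wildcard). The `PolicyRule` type is a hypothetical stand-in for the real `rbacv1.PolicyRule`, and the operator Role below is invented for illustration: it forgets `watch` on configmaps, which an informer needs alongside `list`. Under admin-style rules, as the local-run harness effectively uses, the request is allowed; under the ServiceAccount's Role it is denied. Against a live cluster, `kubectl auth can-i watch configmaps --as=system:serviceaccount:<namespace>:<serviceaccount>` asks the API server the same question.

```go
package main

import "fmt"

// PolicyRule is a simplified, hypothetical stand-in for rbacv1.PolicyRule
// (the real type also has ResourceNames and NonResourceURLs).
type PolicyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// contains reports whether list holds want or the "*" wildcard.
func contains(list []string, want string) bool {
	for _, v := range list {
		if v == "*" || v == want {
			return true
		}
	}
	return false
}

// Allows reports whether any rule grants verb on the given group/resource.
// RBAC is purely additive, so one matching rule is enough.
func Allows(rules []PolicyRule, group, resource, verb string) bool {
	for _, r := range rules {
		if contains(r.APIGroups, group) && contains(r.Resources, resource) && contains(r.Verbs, verb) {
			return true
		}
	}
	return false
}

func main() {
	// Admin-style rules: what the operators effectively get via the host kubeconfig.
	admin := []PolicyRule{{APIGroups: []string{"*"}, Resources: []string{"*"}, Verbs: []string{"*"}}}

	// Hypothetical operator Role that grants get/list but forgot watch.
	// "" is the core API group, where configmaps live.
	operator := []PolicyRule{{APIGroups: []string{""}, Resources: []string{"configmaps"}, Verbs: []string{"get", "list"}}}

	// An informer needs both list and watch on the resource it caches.
	for _, verb := range []string{"list", "watch"} {
		fmt.Printf("%s configmaps: admin=%v operator=%v\n", verb,
			Allows(admin, "", "configmaps", verb), Allows(operator, "", "configmaps", verb))
	}
}
```

The watch check passes under admin and fails under the Role, which is exactly the class of gap that only an in-cluster run under the real ServiceAccount surfaces.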
Decision
Defer the e2e work and rebuild it properly: an in-cluster harness that deploys the operators to the cell clusters via the real overlays (kustomize build → load image → kubectl apply), exercises RBAC, manifests, and image, runs operators on both POP cells, and then runs the chainsaw suites against that deployment.
The local-run foundation and the existing suites have been removed from the federation PR stack to cut its size; unit-test coverage is unaffected.
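A minimal sketch of how the in-cluster harness could express its steps as data, assuming Kind clusters named after the POP cells and kind's default `kind-<cluster>` kubeconfig contexts. The overlay path `config/overlays/cell`, the namespace `federation-system`, the Deployment name `cell-operator`, the image tag, and the test directory are all hypothetical placeholders, not names taken from the repo. The image is loaded before the `kustomize build | kubectl apply -f -` pipe (kustomize build does not touch the cluster, so this ordering is equivalent), and a `kubectl rollout status` wait is added so a broken image, entrypoint, or manifest fails the run before chainsaw starts. Multi-cluster wiring for chainsaw itself is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// Step is one command in the harness plan. When PipeTo is set, stdout of
// Args is piped into PipeTo (kustomize build | kubectl apply -f -).
type Step struct {
	Args   []string
	PipeTo []string
}

func (s Step) String() string {
	out := strings.Join(s.Args, " ")
	if s.PipeTo != nil {
		out += " | " + strings.Join(s.PipeTo, " ")
	}
	return out
}

// Cell describes one POP cell cluster. All field values used below are
// hypothetical placeholders.
type Cell struct {
	KindCluster string // kind cluster name, e.g. "pop-dfw"
	Overlay     string // kustomize overlay directory for this cell
	Namespace   string // namespace the overlay deploys the operator into
	Deployment  string // operator Deployment name to wait on
}

// Plan returns the ordered steps: per cell, load the operator image into the
// kind cluster, apply the real overlay, and wait for the rollout; then run the
// chainsaw suites once every cell is deployed.
func Plan(image string, cells []Cell, testDir string) []Step {
	var steps []Step
	for _, c := range cells {
		ctx := "kind-" + c.KindCluster // kind's default context naming
		steps = append(steps,
			Step{Args: []string{"kind", "load", "docker-image", image, "--name", c.KindCluster}},
			Step{
				Args:   []string{"kustomize", "build", c.Overlay},
				PipeTo: []string{"kubectl", "--context", ctx, "apply", "-f", "-"},
			},
			Step{Args: []string{"kubectl", "--context", ctx, "-n", c.Namespace,
				"rollout", "status", "deployment/" + c.Deployment, "--timeout=180s"}},
		)
	}
	steps = append(steps, Step{Args: []string{"chainsaw", "test", "--test-dir", testDir}})
	return steps
}

func main() {
	// Both POP cells, unlike the local-run harness (pop-dfw only).
	cells := []Cell{
		{KindCluster: "pop-dfw", Overlay: "config/overlays/cell", Namespace: "federation-system", Deployment: "cell-operator"},
		{KindCluster: "pop-ord", Overlay: "config/overlays/cell", Namespace: "federation-system", Deployment: "cell-operator"},
	}
	for _, s := range Plan("federation-operator:e2e", cells, "test/e2e") {
		fmt.Println(s)
	}
}
```

Keeping the plan as data makes the ordering unit-testable without a cluster; a runner could execute each Step with os/exec and stop at the first non-zero exit. Leader election is left at whatever the overlay sets rather than forced off.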
Preserved work (to restore/adapt)
archive/e2e-local-deferred branch: the full stack with the e2e foundation and the refdata e2e suites.
split/federation-e2e retained.
The existing suites (referenced-data-mounts, referenced-data-delete-cascade) live on the archive branch.
Scope when picked up