feat: Deliver referenced ConfigMaps and Secrets to instances#129

Draft
scotwells wants to merge 64 commits into feat/federated-deployment-scheduling from feat/configmap-secret-mounts-federated

@scotwells scotwells commented Jun 1, 2026

Summary

Workloads can now reference ConfigMaps and Secrets and have that data delivered to their instances — in every POP cell the workload runs in — without the user knowing federation exists. Previously a Workload could only set literal environment variables; configuration and credentials had to be baked into images or pasted in as plaintext.

The referenced data is resolved in the trusted management plane and delivered to the edge as derived companion objects, so secret values never enter the Workload or Instance spec and are never projected back to the user.

What this enables

  • Reference a ConfigMap (app config) and Secret (credentials) by name from the Workload template; the platform makes them available to every instance, in every placed cell.
  • Both env-var injection (key references) and file mounts are supported via the runtime's native Pod-spec consumption.

How it works

A management-plane resolver collects the referenced objects, materializes one labeled companion per source into the project's federation namespace, and records the expected set on the WorkloadDeployment. The federator's PropagationPolicy carries the companions to the same cells as the deployment, and a ReferencedData scheduling gate holds each instance until its companions land. Admission verifies the submitting user can read the referenced objects.

The whole path is behind the enableReferencedDataGate feature flag (default off), so behavior is unchanged until it's enabled fleet-wide after the cell and provider are deployed.

Deployment requirement: Karmada-hub RBAC

Delivery depends on the management-plane controller being able to read the referenced objects and write companions in the project's federation namespace on the Karmada hub. This requires the compute-manager ClusterRole on the hub (config/base/downstream-rbac/rbac.yaml) to grant configmaps/secrets access — added in this PR.

This is an easy gap to miss because it fails silently: the resolver returns forbidden on every reconcile, no companions are ever materialized, and mounts simply do nothing on the edge while the Kustomization reports Ready and the controller looks healthy. When promoting to a new environment, confirm this RBAC lands on the hub before enabling the feature.

Test plan

  • Unit + envtest coverage (resolver, validation, cell gate-clearing, federation routing), including concurrency, optional-source, and rolling-update regression tests
  • Federated e2e green on Kind + Karmada (companions reach the hub and the cell; gate clears; ReferencedDataReady=True)
  • Two rounds of code review; all blockers fixed and re-verified
  • Verified end-to-end on the edge lab: a pristine workload referencing a brand-new ConfigMap + Secret materializes companions, propagates them to the cell with matching contents, clears the gate, and the instance comes up Running/Ready with the mounts in its pod spec

Stacked on feat/federated-deployment-scheduling; merge after it. Consumer-side mounting lands in the unikraft-provider PR.

scotwells and others added 22 commits May 31, 2026 15:37
Engineering breakdown of the five workstreams, the reconciled cross-workstream
contracts, the phased critical path, the settled decisions, and a concrete
end-to-end testing-environment design.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase 0 of the cross-plane referenced-data delivery design: the API surface,
validation, and the management-plane building blocks every later phase compiles
against.

- API: EnvFrom on SandboxContainer; ReferencedDataReady condition + reasons;
  ReferencedData scheduling gate; referenced-data label and
  expected-referenced-data/restartedAt annotations.
- Validation: secret-volume validation, ConfigMap/Secret items key->path
  projection, file-mode range and path safety, EnvFrom validation, and an
  admission SubjectAccessReview that the submitter can read each referenced
  ConfigMap/Secret.
- internal/referenceddata seam: reference collector, deterministic companion
  naming, and the scoped ProjectConfigSecretReader (+ single-cluster variant).
- Feature-flagged gate insertion (EnableReferencedDataGate, default off) so
  behavior is unchanged until the delivery and consumption phases are deployed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase 1: the management-plane ReferencedDataController. For each
WorkloadDeployment it collects referenced ConfigMaps/Secrets, reads them with a
scoped reader, enforces per-object/aggregate size limits, materializes one
shared companion per source as a local copy, records the expected companion set
on the deployment, reference-counts across deployments with finalizer cleanup,
and surfaces resolution failures on ReferencedDataReady. Source watches refresh
companions on rotation.

Delivery is abstracted behind a companionWriter seam; the federated writer and
PropagationPolicy extension are deferred to Phase 1b (federation merge). Sizes
are configurable (referencedData.perObjectLimitBytes / aggregateLimitBytes).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
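The per-object and aggregate size limits described in Phase 1 could be enforced roughly as follows. This is a minimal sketch using only the standard library: the `source` type, the `errTooLarge` sentinel, and the way size is measured (key bytes plus value bytes) are assumptions for illustration, not the controller's actual types.

```go
package main

import (
	"errors"
	"fmt"
)

// errTooLarge is a hypothetical sentinel; the controller's real error type is not shown in this PR text.
var errTooLarge = errors.New("referenced data too large")

// source is a simplified stand-in for a resolved ConfigMap or Secret.
type source struct {
	Name string
	Data map[string][]byte
}

// size counts key and value bytes (an assumed way of measuring an object).
func (s source) size() int {
	n := 0
	for k, v := range s.Data {
		n += len(k) + len(v)
	}
	return n
}

// checkLimits rejects any single oversized source first, then the combined total.
func checkLimits(sources []source, perObjectLimit, aggregateLimit int) error {
	total := 0
	for _, s := range sources {
		sz := s.size()
		if sz > perObjectLimit {
			return fmt.Errorf("%s: %d bytes exceeds per-object limit %d: %w", s.Name, sz, perObjectLimit, errTooLarge)
		}
		total += sz
	}
	if total > aggregateLimit {
		return fmt.Errorf("aggregate %d bytes exceeds limit %d: %w", total, aggregateLimit, errTooLarge)
	}
	return nil
}

func main() {
	srcs := []source{
		{Name: "app-config", Data: map[string][]byte{"k": []byte("0123456789")}},
		{Name: "app-secret", Data: map[string][]byte{"p": []byte("01234")}},
	}
	fmt.Println(checkLimits(srcs, 64, 128)) // within both limits
	fmt.Println(checkLimits(srcs, 8, 128))  // app-config exceeds the per-object limit
	fmt.Println(checkLimits(srcs, 64, 16))  // combined size exceeds the aggregate limit
}
```

Wrapping a sentinel with `%w` lets a caller distinguish "too large" from other resolution failures with `errors.Is`, which is one plausible way to surface a specific reason on ReferencedDataReady.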
Phase 2: the cell-side InstanceReconciler now resolves the referenced-data
scheduling gate. For a gated Instance it reads the expected companion set from
the owning WorkloadDeployment, lists the labeled companions in the namespace,
and clears the gate once they are all present — surfacing ReferencedDataReady
(Resolving/AwaitingPropagation/Ready) with the missing set on the message. A
companion watch re-checks waiting Instances, with a requeue safety net.

Adds Kubernetes Events on transitions and compute_referenced_data_* metrics
(companions present/expected, gate-wait duration, condition transitions), and
counts referenced-data-blocked replicas in the deployment status.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fix type mismatches and test setup issues that arose when landing the
referenced-data commits onto the federated-deployment-scheduling base:

- Use multicluster.ClusterName (not string) in Watches handler signatures
  for referenceddata_controller.go and instance_controller.go companion
  watches; update enqueueWDsForSource to accept multicluster.ClusterName
- Cast req.ClusterName to string when passing to resolveAndValidateSources
- Replace New(Options{}) with NewWithOptions(Options{}) in stateful_control_test.go
  after re-introducing the no-arg New() on the combined Options struct
- Initialize r.finalizers in newRefDataReconciler test helper; add
  instanceControllerFinalizer to test instance fixtures so the finalizer
  framework does not short-circuit reconcile on first pass
- Update TestReferencedDataEventEmittedOnClear to expect single-pass
  gate-clearing (federation branch clears both status and gate in same pass)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Phase 1b: the resolver now has a federated companion writer. When a federation
client is configured it materializes companions into the project's downstream
namespace (ns-{project-uid}) on the Karmada hub via the same
MappedNamespaceResourceStrategy the federator uses; with no federation client it
falls back to the single-cluster local copy.

The federator's PropagationPolicy now selects the referenced-data-labeled
ConfigMaps and Secrets alongside the WorkloadDeployment, so companions
co-propagate to the same cells, and it forwards the expected-referenced-data
annotation to the downstream deployment so the cell can clear the gate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The ReferencedDataController was registered unconditionally, so it also started
in the cell operator and collided with the cell's WorkloadDeploymentReconciler
(both default their controller name to "workloaddeployment"). Gate it to the
management controller set and give it a distinct name. Found by running the
operators in the multi-cluster e2e environment.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ss fixes

Adds the Chainsaw e2e scenario that validates the full cross-plane delivery
chain for ConfigMap and Secret mounts (Hops 1-5): source objects on the
control-plane → companions materialised in ns-{uid} on the Karmada hub →
propagated to the pop-dfw cell → Instance created with the ReferencedData
scheduling gate → gate cleared and ReferencedDataReady=True set.

Test assertion bug fixed: companions live on the Karmada hub and cell, NOT
on the control-plane. The two "assert-companion-*-on-control-plane" steps
now target cluster:downstream instead.

PropagationPolicy assertion updated to include all three resource selectors
(WorkloadDeployment + ConfigMap + Secret with namespace field) to satisfy
Chainsaw's exact array-length check.

Harness fixes committed alongside:
- _e2e:karmada:build-kubeconfig: cp karmada.yaml → downstream.yaml so
  cluster:downstream steps work after task e2e:up.
- e2e:operator:start: --karmada-kubeconfig → --federation-kubeconfig (correct
  flag name) for both operators; add --enable-management-controllers=true to
  the management operator invocation.
- e2e:crds:install: new _e2e:crds:quota step installs Milo quota CRDs from
  the module cache to all clusters and the Karmada API server.
- e2e:operator:start:referenced-data: new task that starts both operators
  with --server-config=hack/e2e/operator-config-referenced-data.yaml
  (featureFlags.enableReferencedDataGate:true).
- hack/e2e/operator-config-referenced-data.yaml: server config with the
  feature flag enabled (referenced by README and the new task).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The WorkloadDeploymentFederator and ReferencedDataController both call
Update() on the same WorkloadDeployment to add/remove their respective
finalizers, producing optimistic-lock conflicts under concurrent
reconciliation. The companions DO converge, but the errors are noisy.

Wrap all three finalizer mutation sites (add on first reconcile, remove
on deletion, remove on empty-refs) in retry.RetryOnConflict so that a
concurrent federator update causes a transparent re-read + retry rather
than a logged error. Each retry re-GETs the latest object version before
re-applying only the single finalizer change.

Adds two unit tests using interceptor.Funcs to inject a conflict on the
first Update and assert that the finalizer is correctly applied/removed
on retry.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
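The re-read-then-reapply behavior described above can be illustrated without a cluster. The sketch below is a standard-library stand-in for the pattern, not the controller code: `store`, `object`, `addFinalizer`, and the finalizer names are hypothetical, and the hand-written loop plays the role that `retry.RetryOnConflict` plays in the real change.

```go
package main

import (
	"errors"
	"fmt"
)

var errConflict = errors.New("conflict: stale resourceVersion")

// object is a toy stand-in for a WorkloadDeployment.
type object struct {
	ResourceVersion int
	Finalizers      []string
}

// store mimics an API server that enforces optimistic locking on Update.
type store struct{ current object }

// Get returns a copy so callers cannot mutate the stored object directly.
func (s *store) Get() object {
	o := s.current
	o.Finalizers = append([]string(nil), s.current.Finalizers...)
	return o
}

// Update succeeds only if the caller's copy is based on the latest version.
func (s *store) Update(o object) error {
	if o.ResourceVersion != s.current.ResourceVersion {
		return errConflict
	}
	o.ResourceVersion++
	s.current = o
	return nil
}

func hasFinalizer(o object, name string) bool {
	for _, f := range o.Finalizers {
		if f == name {
			return true
		}
	}
	return false
}

// addFinalizer re-GETs the object on every attempt and applies only its own
// finalizer, so a concurrent writer's changes are never overwritten.
// afterGet is a test hook used here to simulate a racing writer.
func addFinalizer(s *store, name string, attempts int, afterGet func()) error {
	for i := 0; i < attempts; i++ {
		o := s.Get()
		if afterGet != nil {
			afterGet()
		}
		if hasFinalizer(o, name) {
			return nil
		}
		o.Finalizers = append(o.Finalizers, name)
		if err := s.Update(o); !errors.Is(err, errConflict) {
			return err // nil on success, or a non-conflict error
		}
	}
	return errConflict
}

func main() {
	s := &store{}
	raced := false
	err := addFinalizer(s, "referenced-data", 3, func() {
		if raced {
			return
		}
		raced = true
		// Simulated federator write landing between our GET and Update.
		o := s.Get()
		o.Finalizers = append(o.Finalizers, "federator")
		_ = s.Update(o)
	})
	fmt.Println(err, s.current.Finalizers)
}
```

The first attempt conflicts because the simulated federator write bumps the version; the second attempt starts from the fresh object and both finalizers end up on it.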
…cret-mounts

Fixes six issues identified in code review of the federated configmap/secret
mounts feature:

1. [BLOCKER] ValidateUpdate was a no-op stub: an attacker could create a
   Workload with no refs then edit in refs to ConfigMaps/Secrets they cannot
   read. Fixed by implementing ValidateUpdate to run the same full template
   validation (including SAR) as ValidateCreate on the new object.

2. [MAJOR] SAR for referenced-data was missing UserInfo.Extra, unlike the
   sibling Network SAR. Fixed by populating Extra from AdmissionRequest.

3. [MAJOR] Collector still collected both refs from an envFrom entry that had
   both configMapRef and secretRef set, even though validateEnvFrom forbids it.
   Fixed by short-circuiting such entries before collection in CollectFromTemplate.

4. [MAJOR] validateReferencedDataAccess maintained a hand-rolled traversal
   (collectRefsFromSpec) parallel to referenceddata.CollectFromTemplate, with
   drift risk. Fixed by adding CollectFromSpec to collector.go and making the
   validation call it as the single source of truth. No import cycle (validation
   -> referenceddata, not the reverse).

5. [MINOR] names.go: after TrimRight("-.", ...) the truncated segment could be
   empty (e.g. source name is all '-' or '.'), producing "<prefix>.-<hash>"
   which is invalid DNS-1123 (segment starts with '-'). Fixed by emitting
   "<prefix>.<hash>" when truncated is empty.

6. [MINOR] Tests added: both-refs-set envFrom rejection + no SAR for invalid
   entry; SAR InternalError/fail-closed path; ValidateUpdate SAR path;
   names edge cases (all-dashes, all-dots, name ending on dot, valid prefix).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fixes 10 issues found in the code review of the configmap/secret mounts
federation feature.

**Blockers:**

Fix 1 (federator status sync clobbers resolver conditions): syncStatusFromDownstream
now merges downstream status into the project WD while preserving the
resolver-owned ReferencedDataReady condition, rather than replacing .Status
wholesale. Wrapped in RetryOnConflict for concurrent-write safety.

Fix 2 (ref-count annotation race): materialiseOne and releaseOneCompanion now
wrap their read-modify-write of the ref-count annotation in
retry.RetryOnConflict loops so that two WDs reconciling concurrently against
the same companion cannot drop each other's ref entries.

**Majors:**

Fix 3 (optional source escape hatch): resolveAndValidateSources accepts the WD
template spec and uses a new isOptionalRef helper to determine per-source
optionality. NotFound and TooLarge errors for optional sources are silently
skipped; the WD proceeds with the remaining required companions.

Fix 4 (unparseable ref-count): decodeRefCount now returns ([]string, error).
An unmarshal failure propagates as a transient error; release paths return
the error rather than computing an empty remaining set that would incorrectly
delete the companion.

Fix 5 (nil-source panic): resolveOneSource nil-guards the reader's return
value and maps a (nil, nil) return to a SourceNotFound condition error.

Fix 6 (non-conflict-tolerant writes): the expected-set annotation Patch and
Status().Update are each wrapped in retry.RetryOnConflict; residual conflicts
after retries return (Result{}, nil) to requeue cleanly.

Fix 7 (annotation/label overwrite): ApplyConfigMap and ApplySecret in both
localCompanionWriter and downstreamCompanionWriter now use mergeLabels /
mergeAnnotations helpers that only upsert controller-owned keys, preserving
third-party entries such as Karmada bookkeeping annotations.

**Minors:**

Fix 8 (unused scheme field): ProjectReader.scheme field and the second
parameter of NewProjectReader removed; client.Options{} is sufficient.

Fix 9 (misleading localCompanionWriter comment): clarified that
localCompanionWriter is reachable only when FederationClient is nil, which
never happens under the management-plane federation wiring on this branch.

Fix 10 (literal "v1" APIVersion): PropagationPolicy ConfigMap and Secret
selectors now use corev1.SchemeGroupVersion.String() instead of the literal
string "v1".

**Regression tests added:**
- Two WDs sharing a source, interleaved reconciles → both ref-count entries preserved
- Optional missing source → skipped, WD not failed
- Unparseable ref-count annotation → companion NOT deleted
- Federator status sync preserves ReferencedDataReady condition

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
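Fix 4 is easiest to see in isolation. The sketch below assumes the ref-count annotation is a JSON array of strings, which this PR text does not state explicitly; `release` is a hypothetical helper standing in for the release path. The point is that a decode failure is returned to the caller rather than being treated as an empty set.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeRefCount parses the annotation value. The JSON-array encoding is an
// assumption made for this sketch.
func decodeRefCount(raw string) ([]string, error) {
	if raw == "" {
		return nil, nil
	}
	var refs []string
	if err := json.Unmarshal([]byte(raw), &refs); err != nil {
		return nil, fmt.Errorf("decode ref-count annotation: %w", err)
	}
	return refs, nil
}

// release removes owner from the ref-count and reports whether the companion
// may be deleted. On a decode error it never reports deletion.
func release(raw, owner string) ([]string, bool, error) {
	refs, err := decodeRefCount(raw)
	if err != nil {
		return nil, false, err
	}
	var remaining []string
	for _, r := range refs {
		if r != owner {
			remaining = append(remaining, r)
		}
	}
	return remaining, len(remaining) == 0, nil
}

func main() {
	fmt.Println(release(`["default/wd-a","default/wd-b"]`, "default/wd-a"))
	fmt.Println(release(`["default/wd-a"]`, "default/wd-a"))
	fmt.Println(release(`not-json`, "default/wd-a"))
}
```

With the old single-return shape, the third call would have looked like "no referrers remain" and the companion would have been deleted; returning the error keeps the companion in place until the annotation can be read.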
…etrics

- [BLOCKER] Restore ObservedGeneration guard on gate clearing: both the
  quota and referenced-data gate removals in reconcileSchedulingGates now
  check cond.ObservedGeneration == instance.Generation, preventing a stale
  True condition from generation N clearing a freshly-stamped gate at N+1.

- [MAJOR] Suppress AwaitingPropagation event flood: the Warning event is
  now emitted only on a reason transition (prev != AwaitingPropagation),
  not on every reconcile where the missing-set message changes.

- [MAJOR] Delete dead removeQuotaSchedulingGate: its generation guard has
  been ported to reconcileSchedulingGates; the standalone function is gone.

- [MAJOR] Drop high-cardinality instance label from CompanionsPresent and
  CompanionsExpected metrics: both gauges now aggregate per namespace only,
  eliminating the unbounded per-instance series set.

- [MINOR] Delete drainEvents test helper (never called).

- [MINOR] Remove instanceByWorkloadDeploymentUIDIndex: the constant, its
  IndexField registration, and the index func are all deleted; the companion
  watch enqueue path already lists the full namespace and never queried it.
  Updated addInstanceIndexers doc-comment accordingly.

- [MINOR] Route checkForNetworkCreationFailure through
  fetchOwnerWorkloadDeployment to avoid a duplicate WD Get; fix the stale
  doc-comment on enqueueInstancesInNamespace.

- [MINOR] Fix no-gate self-heal: the branch that flips a non-True
  ReferencedDataReady condition to True when the gate is absent now
  validates that all expected companions are actually present before
  marking Ready, avoiding a false-Ready status when the gate was stripped
  out-of-band while companions were still missing.

- ADD TEST: TestReferencedDataStaleConditionGuard verifies that a stale
  True condition at ObservedGeneration N does not clear a gate at generation
  N+1; only the second reconcile (after condition is re-evaluated at N+1)
  removes the gate.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
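The ObservedGeneration guard amounts to a small predicate. The following sketch uses simplified, hypothetical types (`instance`, `condition`, `clearGateIfCurrent`) rather than the real API structs, and shows a True condition from generation 1 being ignored for an Instance at generation 2.

```go
package main

import "fmt"

// condition is a pared-down stand-in for a Kubernetes status condition.
type condition struct {
	Type               string
	Status             string
	ObservedGeneration int64
}

// instance is a simplified stand-in for the Instance resource.
type instance struct {
	Generation      int64
	SchedulingGates []string
	Conditions      []condition
}

// clearGateIfCurrent removes gate only when the named condition is True AND
// was computed for the instance's current generation.
func clearGateIfCurrent(in *instance, gate, condType string) bool {
	for _, c := range in.Conditions {
		if c.Type != condType {
			continue
		}
		if c.Status != "True" || c.ObservedGeneration != in.Generation {
			return false // stale or not ready: leave the gate in place
		}
		kept := make([]string, 0, len(in.SchedulingGates))
		for _, g := range in.SchedulingGates {
			if g != gate {
				kept = append(kept, g)
			}
		}
		in.SchedulingGates = kept
		return true
	}
	return false
}

func main() {
	in := &instance{
		Generation:      2,
		SchedulingGates: []string{"ReferencedData"},
		Conditions: []condition{
			{Type: "ReferencedDataReady", Status: "True", ObservedGeneration: 1},
		},
	}
	cleared := clearGateIfCurrent(in, "ReferencedData", "ReferencedDataReady")
	fmt.Println(cleared, in.SchedulingGates) // stale condition: gate stays

	in.Conditions[0].ObservedGeneration = 2 // condition re-evaluated at N+1
	cleared = clearGateIfCurrent(in, "ReferencedData", "ReferencedDataReady")
	fmt.Println(cleared, in.SchedulingGates) // gate removed
}
```

This mirrors the two-reconcile sequence that TestReferencedDataStaleConditionGuard is described as asserting.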
The companion ref-count (companionRefCountAnnotation) read-modify-write
was non-atomic: materialiseOne fetched the companion (R1), computed the
updated ref-count, built a desired object carrying that ref-count, then
called ApplyConfigMap which issued a second independent GET (R2) before
calling Update. mergeAnnotations then overwrote the referenced-by key
on R2 with the R1-derived value. If a concurrent WD had committed its
ref-count entry between R1 and R2, R2 already contained that entry, but
the merge silently discarded it. Because Apply's own GET+Update targeted
R2, no conflict was raised and RetryOnConflict never fired.

Fix (approach a): remove the second internal GET from ApplyConfigMap /
ApplySecret. Both methods now accept an `existing` argument — the
already-fetched object (or nil when the companion does not yet exist).
When existing is non-nil, the implementation writes directly onto it and
calls Update, targeting the same resourceVersion that the caller used to
compute the ref-count. A concurrent change between GET and Update will
now produce a Conflict that propagates out to the materialiseOne /
releaseOneCompanion RetryOnConflict loop, which re-reads and recomputes.

The same double-GET path existed in releaseOneCompanion and is fixed
with the same pattern: pass the already-fetched companion as the
`existing` argument.

Add TestReferencedData_RefCount_ConflictForcesReread, which injects a
concurrent write of a third WD's key (wd3Key) into the companion on
realCl immediately after the outer GET returns. Under the fixed code the
subsequent Update at R1 conflicts with R2, RetryOnConflict re-reads R2
(with wd3Key), and all three keys appear in the final annotation. Under
the old code Apply's internal GET would return R2 with wd3Key, but
mergeAnnotations would overwrite referenced-by with the R1-derived value
([wd1Key, wd2Key]), silently dropping wd3Key, and the Update at R2 would
succeed — the test would fail with only 2 keys instead of 3.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace manual membership loop in refCountAdd with slices.Contains.
Replace two manual map-copy loops in mergeLabels/mergeAnnotations with
maps.Copy (semantics preserved: copies wanted INTO existing, never
replaces the map).

Add ReferencedDataLabelValue = "true" to api/v1alpha/labels.go and use
it everywhere in place of the repeated string literal. Use kindSecret
and kindConfigMap constants in indexers.go and the controller instead of
repeated string literals. Extract isOptionalInVolumes and
isOptionalInContainers helpers from isOptionalRef to bring gocyclo from
32 down to within the limit. Define named constants for all repeated test
strings (rdTestWD1, rdTestBlobKey, testKindConfigMap, etc.) introduced
by the configmap/secret change set to clear the goconst and gocyclo lint
issues it brought in.

The instance_controller.go GetEventRecorderFor call retains its existing
//nolint:staticcheck comment: the replacement API (GetEventRecorder)
has an incompatible Eventf signature requiring a migration of all emit
sites, which is deferred as a separate task.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…olve

Companions were named "<lower-kind>.<source-name>" (e.g. configmap.app-config)
but Instances and Pods reference sources by their verbatim source name
(e.g. app-config). The runtime found only the prefixed companion → 404
→ mounts/env vars failed.

Option B fix: companion objects are now named exactly by their source name.
Cross-kind collision is safe (ConfigMap and Secret are distinct resource
types in the same namespace). Pod/Instance volume and envFrom references
resolve naturally without any translation layer.

Kind-disambiguation is preserved in the expected-referenced-data annotation
as "Kind/name" tokens (e.g. ["ConfigMap/app-config","Secret/app-secret"])
so the cell InstanceReconciler can verify each companion by kind+name
without probing both resource types.

Changes:
- internal/referenceddata/names.go: CompanionName drops the kind prefix;
  adds CompanionToken("Kind", "name") → "Kind/name" helper
- internal/referenceddata/names_test.go: update expected values; add
  TestCompanionName_SourceNameContract (contract test that would fail
  against the old prefixed code) and TestCompanionName_SameSourceDifferentKind
- internal/controller/referenceddata_controller.go: annotation stores
  kind-qualified tokens; releaseOneCompanion accepts explicit kind param
- internal/controller/referenceddata_controller_test.go: update assertions
  to kind-qualified tokens; fix stale-RV and same-name source/companion cases
- internal/controller/instance_controller.go: gate-clearing parses kind-
  qualified tokens via listPresentCompanionsByKindName
- internal/controller/instance_referenced_data_test.go: align to new names
  and annotation token format
- test/e2e/referenced-data-mounts/chainsaw-test.yaml: update companion name
  assertions (configmap.app-config → app-config, secret.app-secret → app-secret)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
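A rough sketch of the "Kind/name" token scheme follows. `CompanionToken` is named in the change list above, but its body here is a guess; `parseCompanionToken` and `missingCompanions` are hypothetical helpers added only to show why kind-qualified tokens let a ConfigMap and a Secret share a name.

```go
package main

import (
	"fmt"
	"strings"
)

// CompanionToken builds the kind-qualified token stored in the
// expected-referenced-data annotation, e.g. "ConfigMap/app-config".
func CompanionToken(kind, name string) string {
	return kind + "/" + name
}

// parseCompanionToken splits a token back into kind and name.
// Hypothetical helper; malformed tokens report ok=false.
func parseCompanionToken(token string) (string, string, bool) {
	kind, name, found := strings.Cut(token, "/")
	if !found || kind == "" || name == "" {
		return "", "", false
	}
	return kind, name, true
}

// missingCompanions returns the expected tokens that are malformed or not
// present. present is keyed by "Kind/name".
func missingCompanions(expected []string, present map[string]bool) []string {
	var missing []string
	for _, tok := range expected {
		if _, _, ok := parseCompanionToken(tok); !ok || !present[tok] {
			missing = append(missing, tok)
		}
	}
	return missing
}

func main() {
	// Same source name, different kinds: unambiguous once kind-qualified.
	expected := []string{
		CompanionToken("ConfigMap", "app-config"),
		CompanionToken("Secret", "app-config"),
	}
	present := map[string]bool{"ConfigMap/app-config": true}
	fmt.Println(missingCompanions(expected, present))
}
```

Because the object names themselves are now the verbatim source names, the kind has to travel in the annotation; otherwise the cell could not tell which of the two same-named objects is still outstanding.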
The federation base this branch builds on predates the earlier guard
(6bcd822), so reconcileInstanceGates dereferenced instance.Status.Controller
(a nilable pointer the infra provider populates independently of the
Programmed condition) while counting current replicas. An instance
reporting Programmed=True with Status.Controller still nil panicked the
reconcile before the status write, freezing the WorkloadDeployment status
and hot-looping the reconcile. Observed panicking live in the lab.

Guard the dereference so reconciliation completes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a regression test that constructs an Instance with Programmed=True
but Status.Controller==nil and asserts that reconcileInstanceGates neither
panics nor counts it as a current replica. Without the nil guard, the test
panics with a nil-pointer dereference, confirming that the guard is
load-bearing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Spec.Controller is a nilable pointer the infra provider populates
independently of an Instance's networking readiness. reconcileInstanceGates
dereferenced instance.Spec.Controller.SchedulingGates in the network
gate-clearing path without a nil guard, so an Instance that reached
networkReady before its controller spec was populated panic-looped the
WorkloadDeployment reconcile and froze status.

This mirrors the existing Status.Controller guard. Extends the
nil-controller regression test to drive networkReady=true with a nil
Spec.Controller, the case the first guard missed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
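The two nil guards can be pictured with a stripped-down model. Everything below is illustrative: the struct layout is flattened, `TemplateHash` is a hypothetical field standing in for whatever the replica count actually compares, and `countCurrentReplicas` and `hasGate` are not the real function names.

```go
package main

import "fmt"

// controllerSpec and controllerStatus model the nilable Spec.Controller and
// Status.Controller pointers. TemplateHash is a hypothetical field.
type controllerSpec struct{ SchedulingGates []string }
type controllerStatus struct{ TemplateHash string }

// instance flattens the real nesting for brevity.
type instance struct {
	Programmed       bool
	SpecController   *controllerSpec
	StatusController *controllerStatus
}

// countCurrentReplicas skips instances whose controller status has not been
// populated yet instead of dereferencing a nil pointer.
func countCurrentReplicas(instances []instance, wantHash string) int {
	n := 0
	for _, in := range instances {
		if !in.Programmed || in.StatusController == nil {
			continue
		}
		if in.StatusController.TemplateHash == wantHash {
			n++
		}
	}
	return n
}

// hasGate reports whether gate is set, treating a nil controller spec as
// "no gates" rather than panicking.
func hasGate(in instance, gate string) bool {
	if in.SpecController == nil {
		return false
	}
	for _, g := range in.SpecController.SchedulingGates {
		if g == gate {
			return true
		}
	}
	return false
}

func main() {
	instances := []instance{
		{Programmed: true}, // both controller pointers still nil
		{Programmed: true, StatusController: &controllerStatus{TemplateHash: "abc"}},
	}
	fmt.Println(countCurrentReplicas(instances, "abc"))
	fmt.Println(hasGate(instances[0], "Network"))
}
```

Both helpers treat "not populated yet" as a normal state, which is the property the regression test is described as pinning down.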
The referenced-data controller materializes companion copies of the
ConfigMaps and Secrets a Workload references into its hub namespace, where
Karmada propagates them to the cell so kraftlet can mount them. The
compute-manager ClusterRole on the Karmada hub had no core ConfigMap/Secret
rule, so every reconcile failed with a forbidden error: companions were
never written, the expected-referenced-data annotation was never stamped,
and nothing propagated to the cell. ConfigMap/Secret mounts silently did
nothing on the edge.

Grant the controller full lifecycle access (it owns the companions,
including ref-count deletion) to configmaps and secrets in its namespace.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ed before Karmada finishes GC

Part 1 — ordering guard: WorkloadDeploymentFederator.Finalize now checks
whether any companion ConfigMaps or Secrets (referenced-data=true label)
remain in the downstream namespace before calling
cleanupPropagationPolicyIfUnused. The guard only fires when this is the last
WD for its city code (mirroring cleanupPropagationPolicyIfUnused's condition),
so deleting WD-A cannot block on a live WD-B's companion in the shared
namespace. If companions are present Finalize returns an errCompanionsStillPresent
sentinel; Reconcile intercepts it (walking the kerrors.Aggregate the finalizer
framework returns), logs at Info, and sets RequeueAfter — no error-metric
inflation. After companionGuardTimeout (2 min) the guard bypasses itself so a
wedged referenced-data controller cannot permanently block deletion.

Part 2 — authoritative cell-side GC: CompanionGCReconciler watches
WorkloadDeployment events on each cell and deletes companion ConfigMaps/Secrets
whose every referenced-by entry resolves to no live WD on the local cell.
Critical fix: the referenced-by annotation is written by the hub
ReferencedDataController as "projectNamespace/wdName" (e.g.
"default/mount-pristine-default-dfw"), but the cell WD lives in ns-{uid}.
hasLiveReferrer now ignores the namespace in the key and looks up by NAME
ONLY in the companion's own namespace — preventing false "referrer absent"
conclusions that would delete an actively-mounted companion. SetupWithManager
uses WithEngageWithLocalCluster(false) to match all other cell controllers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…r-wide informer OOM

The CompanionGCReconciler previously registered For(&corev1.ConfigMap{}) which
established a cluster-wide ConfigMap informer with no label-scoped cache on the
cell manager. On a cell cluster this caused controller-runtime to sync and hold
every ConfigMap (and, via lazy cache reads, every Secret) in the cluster, pushing
the cell compute-manager well past its 128Mi limit → OOMKilled / CrashLoopBackOff.

Fix: change the For type to WorkloadDeployment. WorkloadDeployments are already
cached on the cell by the sibling WorkloadDeploymentReconciler, so this adds no
new cluster-wide informer. A WD delete event still enqueues the object's namespace
and fires Reconcile, which is the deletion trigger the GC needs.

All companion reads (ConfigMap/Secret List) inside Reconcile go through
cl.GetAPIReader() (uncached). A one-shot List via the API reader does not
establish a persistent informer, so it does not re-introduce the OOM. WD
liveness checks use the cached client because WDs are already in the cell cache.

Reconcile now sweeps the entire WD namespace (listing companions by label via
the uncached reader) rather than reconciling a single named object, and returns
ctrl.Result{} (event-driven only, no per-WD requeue).

The periodic backstop coverage gap is closed by a companionGCBackstop Runnable
(implements mcmanager.Runnable = manager.Runnable + multicluster.Aware). On
Engage it records each cell cluster; on each ticker interval it lists ALL
companions cluster-wide via the uncached APIReader, collects distinct namespaces,
and sends namespace-keyed GenericEvents into a buffered channel wired via
WatchesRawSource(source.TypedChannel). This covers namespaces whose last WD was
deleted before the controller started, which For(WD) would never enqueue. The
steady-state load is bounded to one cross-namespace List per interval regardless
of WD history, and no persistent CM/Secret informer is ever created.

The federator's companionsStillPresent check (workloaddeployment_federator.go)
reads from FederationClient (the Karmada hub client) and is unaffected by this
change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@scotwells scotwells force-pushed the feat/configmap-secret-mounts-federated branch from e0b8ee7 to 9a064cc Compare June 2, 2026 00:02
scotwells and others added 7 commits June 1, 2026 20:04
Supersede the cell-side CompanionGCReconciler design with a hub/federation-side
approach: tear down the companion's Karmada ResourceBinding at companion-deletion
time so the deletion cascades to the edge, plus a hub-side orphan-RB sweep as a
self-healing backstop. The cell-side GC fought Karmada's execution controller and
could not win against a live Work.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Remove the cell-side CompanionGCReconciler — deleting a Karmada-owned cell
copy directly causes a permanent delete/recreate thrash because the Work
object immediately re-applies the manifest. Hub-side cleanup is the only
correct layer.

Component 5: Delete companion_gc_controller.go and its test, and remove the
--enable-cell-controllers registration block from cmd/main.go.

Component 3: After deleting a hub companion (ref-count reaches zero), the
downstreamCompanionWriter now also deletes the companion's Karmada
ResourceBinding via hubClient. RB name follows the Karmada binding-controller
convention: "{companionName}-{configmap|secret}". Deleting the RB cascades:
binding-controller removes the Work, execution-controller removes the cell
copy permanently. IgnoreNotFound tolerates Karmada beating us to it.
localCompanionWriter (single-cluster dev mode) is a no-op.

Component 4: New OrphanRBReconciler runs on the Karmada hub federation
manager (alongside InstanceProjector). It watches ResourceBindings and
deletes any that are orphaned companion RBs — name ends with "-configmap" or
"-secret" AND propagationpolicy.karmada.io/name starts with "city-" AND the
hub companion no longer exists. WD RBs (suffix "-workloaddeployment") and
non-city-PP RBs are never touched. A periodic sweep Runnable fires every 5
minutes to catch RBs orphaned before the controller started, which will
automatically clean the existing stranded lab RBs on first deployment.

RBAC: add delete to work.karmada.io/resourcebindings in downstream-rbac
(hub ClusterRole). karmada-io/api work/v1alpha2 types added to global
scheme so the federation client can serialize ResourceBinding objects.

Tests: unit tests for Component 3 (RB teardown fires on ConfigMap/Secret
companion deletion, tolerates NotFound, no-op on localCompanionWriter) and
Component 4 (orphan detection, skip live/terminating companions, tight scope
guards against WD RBs and non-city-PP RBs, name pattern parsing).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The cell-side CompanionGCReconciler was removed in favor of hub-side cleanup;
update the federator ordering-guard comments and timeout-bypass log to reference
the OrphanRBReconciler as the authoritative backstop instead.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
OrphanRBReconciler scoped referenced-data companion ResourceBindings by reading
propagationpolicy.karmada.io/name from metadata.labels, but the running Karmada
version stores that value in metadata.annotations (labels carry only permanent-id
UUIDs). The label read always returned "", so the scope predicate rejected every
ResourceBinding and the controller never reclaimed an orphaned binding — confirmed
live in the lab, where the stranded cm-pristine/secret-pristine copies were never
cleaned.

Read from annotations in both isInScope and the watch predicate. The unit-test
fixtures previously set the PP name as a label (matching the bug, which is why they
passed against broken code); they now use the production annotation shape, so they
fail against the label read and pass against the fix.
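
A minimal sketch of the corrected scope check, assuming a simplified stand-in
for the ResourceBinding metadata. The bindingMeta type and the sample names are
hypothetical; only the annotation key, the name suffixes, and the "city-" prefix
come from the commit messages. The "hub companion no longer exists" half of
orphan detection is deliberately left out.

```go
package main

import (
	"fmt"
	"strings"
)

// ppNameAnnotation is the key the commit message says Karmada stores in
// metadata.annotations (not metadata.labels).
const ppNameAnnotation = "propagationpolicy.karmada.io/name"

// bindingMeta is a hypothetical stand-in for a ResourceBinding's metadata.
type bindingMeta struct {
	Name        string
	Labels      map[string]string
	Annotations map[string]string
}

// isInScope reports whether rb looks like a referenced-data companion binding:
// the name ends in "-configmap" or "-secret" and the PropagationPolicy name,
// read from annotations, starts with "city-".
func isInScope(rb bindingMeta) bool {
	companion := strings.HasSuffix(rb.Name, "-configmap") ||
		strings.HasSuffix(rb.Name, "-secret")
	return companion && strings.HasPrefix(rb.Annotations[ppNameAnnotation], "city-")
}

func main() {
	// Production shape: the PP name is an annotation.
	fixed := bindingMeta{
		Name:        "cm-pristine-configmap",
		Annotations: map[string]string{ppNameAnnotation: "city-example"},
	}
	// Old fixture shape: the PP name is a label, which the fixed code ignores.
	stale := bindingMeta{
		Name:   "cm-pristine-configmap",
		Labels: map[string]string{ppNameAnnotation: "city-example"},
	}
	fmt.Println(isInScope(fixed), isInScope(stale))
}
```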

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copies the design documents from the working-tree scratch directory into
the branch so this branch is self-contained and reviewable without
needing access to untracked files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a new const block to api/v1alpha/instance_types.go with the
reason constants for the top-level readiness conditions
(Instance.Ready, WorkloadDeployment.Available, Workload.Available):

  WorkloadReasonNetworkNotFound
  WorkloadDeploymentReasonNetworkProvisioning   (replaces "ProvisioningNetwork")
  WorkloadDeploymentReasonInstancesProvisioning (replaces "ProvisioningInstances")
  WorkloadDeploymentReasonStableInstanceFound
  WorkloadDeploymentReasonReferencedDataNotReady (new)
  WorkloadDeploymentReasonQuotaNotGranted        (new)
  WorkloadReasonNoAvailablePlacements
  WorkloadReasonNoAvailableDeployments

Reason-string renames (deliberate, approved):
  "ProvisioningNetwork"   → "NetworkProvisioning"
  "ProvisioningInstances" → "InstancesProvisioning"

These renames align the emitted strings with the RFC-agreed vocabulary.
No client currently consumes these conditions; the rename is safe.

Replaces all inline string literals in workload_controller.go and
workloaddeployment_controller.go with the new named constants. No
behavior change; logic wiring happens in subsequent commits.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements the evaluate-all-then-pick logic in reconcileInstanceReadyCondition
so that the most actionable blocking cause is surfaced on Instance.Ready
instead of always collapsing to SchedulingGatesPresent.

Changes:
- reconcileReferencedDataCondition: when the owning WD carries a terminal
  ReferencedDataReady reason (SourceNotFound, SourceUnauthorized,
  SourceTooLarge), the Instance inherits the WD's reason+message verbatim.
  The companion will never arrive for a terminally missing source, so the
  WD's authoritative resolver verdict supersedes the cell-side "waiting for
  propagation" message. Zero extra API calls (WD already fetched).

- reconcileInstanceReadyCondition (scheduling-gates branch): evaluates ALL
  blocking sub-conditions (ReferencedDataReady, network failure) before
  selecting the winner via instanceBlockingReasonPriority. The previous
  code short-circuited on the first match, which could hide a
  higher-priority error behind a lower-priority one.

- isTerminalReferencedDataReason: helper predicate for the three terminal
  referenced-data reasons.

- instanceBlockingReasonPriority: private priority function implementing
  the RFC §5.4 table. Duplicate of wdBlockingReasonPriority (intentional per
  the RFC, to avoid coupling the two controller packages).
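
The evaluate-all-then-pick flow described in the bullets above could be
sketched roughly as follows. The three terminal reasons are the ones named in
this commit; the blocker type, pickBlocker, and the ranks in examplePriority
are hypothetical placeholders and do not reproduce the RFC §5.4 table.

```go
package main

import "fmt"

// isTerminalReferencedDataReason matches the three terminal reasons listed in
// the commit message.
func isTerminalReferencedDataReason(reason string) bool {
	switch reason {
	case "SourceNotFound", "SourceUnauthorized", "SourceTooLarge":
		return true
	}
	return false
}

// blocker is a hypothetical (reason, message) pair for one blocking
// sub-condition.
type blocker struct {
	Reason  string
	Message string
}

// examplePriority is an illustrative ranking only (lower wins). The real
// ordering lives in the RFC and is not reproduced here.
func examplePriority(reason string) int {
	switch {
	case isTerminalReferencedDataReason(reason):
		return 0
	case reason == "SchedulingGatesPresent":
		return 100
	default:
		return 50
	}
}

// pickBlocker inspects every candidate before choosing, so an early
// low-priority match cannot hide a later high-priority one.
func pickBlocker(candidates []blocker, priority func(string) int) (blocker, bool) {
	if len(candidates) == 0 {
		return blocker{}, false
	}
	best := candidates[0]
	for _, c := range candidates[1:] {
		if priority(c.Reason) < priority(best.Reason) {
			best = c
		}
	}
	return best, true
}

func main() {
	winner, _ := pickBlocker([]blocker{
		{Reason: "SchedulingGatesPresent", Message: "gates present"},
		{Reason: "SourceNotFound", Message: "referenced source is missing"},
	}, examplePriority)
	fmt.Println(winner.Reason)
}
```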

Adds unit tests:
  TestReconcileInstanceReadyCondition_ReferencedDataEnrichment
  TestReconcileInstanceReadyCondition_EvaluateAllThenPick
  TestInstanceBlockingReasonPriority

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
scotwells and others added 3 commits June 2, 2026 14:23
Relocate the runnable hello-go/rust/node/python/ruby/php examples from
docs/compute/examples/ to a top-level examples/ directory for discoverability,
and fix the README links to the guides accordingly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-PIE

The Go guide and hello-go example were wrong: a plain CGO_ENABLED=0 build is
ET_EXEC (non-PIE) and the base:latest app-elfloader rejects it at boot with
"ELF executable is not position-independent". -buildmode=pie alone adds an
INTERP segment and is also rejected. The working recipe links statically against
musl (CGO_ENABLED=1, musl-gcc, -buildmode=pie -extldflags "-static-pie") to
produce a static PIE with no interpreter — verified booting on the lab. Update
the Dockerfile (with a build-time static-PIE self-check), the guide prose and
troubleshooting, and the example README. The --platform=$BUILDPLATFORM stage
keeps the Go toolchain native on arm64 hosts to avoid a qemu assembler segfault.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Rename the workload/image/module from hello-datum to hello-go to match the
hello-rust/node/python/ruby/php convention, and return "Hello from Datum (Go)"
like the other examples' language-tagged responses. Also correct two stale lines
in the guide (the verified-against reference and "fully static" -> "static-PIE").

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
scotwells and others added 2 commits June 2, 2026 15:59
The Go, Rust, Node.js, Python, PHP, and Ruby deploy guides and their
runnable examples/ apps have moved to their own focused PRs (#130-#135).
Dropping them here keeps this PR scoped to the ConfigMap/Secret mount
feature and shrinks the diff for reviewers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The status-blocking-reason RFC has been lifted into a product-focused
enhancement on main (docs/enhancements/). The remaining plans/ working
notes and the superseded referenced-data-edge-cleanup RFC (whose central
proposal to remove the cell-side companion GC was not adopted) are
dropped -- git history is their durable record. Also restores the
configmap-secret-mounts RFC that this branch had deleted, since main
keeps design docs as RFCs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
scotwells and others added 5 commits June 2, 2026 21:39
The WorkloadDeploymentFederator mirrors the downstream Karmada
WorkloadDeployment status onto the project (VCP) WorkloadDeployment, but
SetupWithManager only watched the project WD via For(). Nothing watched
the downstream WD whose status it mirrors, so when Karmada aggregated new
status onto the downstream object the federator was not notified — it
only caught up on the next informer resync (~10h default) or an
incidental project-WD spec write. This is why a freshly created
workload's replica counts stayed empty on the VCP long after its
projected Instance had already appeared (the InstanceProjector holds the
analogous downstream watch and so propagates immediately).

Add a downstream watch using the same cross-plane mechanism the
InstanceProjector and unikraft-provider use (milosource cluster source +
TypedEnqueueRequestsFromMapFunc). The map function correlates a
downstream WD event back to its project WD reconcile request: name is
stable across planes, namespace comes from the UpstreamOwnerNamespace
label the federator stamps, and the project cluster name is recovered by
decoding the UpstreamOwnerClusterName label on the downstream namespace
(the exact inverse of the encoding applied in ensureDownstreamNamespace).

The federation manager already constructed for the InstanceProjector is
reused as the watchable source, so there is no additional manager or
informer-cache cost beyond the new WD and Namespace informers.

Karmada's own status-aggregation interval (edge cell → downstream WD)
remains outside this repo; once Karmada writes the aggregated status, the
new watch reacts immediately.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…emory

Two Instance-controller correctness changes:

- Blocking-reason rollup: surface the most specific provider sub-condition
  (ImageUnavailable, InstanceCrashing, ConfigurationError, Provisioning) and its
  message onto the Instance Ready condition instead of a generic "Instance has
  not been programmed", so e.g. an image-pull failure reads as ImageUnavailable
  with the real message. Adds the reason constants and ranks them in the
  blocking-reason priority.

- Quota sizing: resolve vCPU/memory for instanceType-sized instances from a new
  instanceTypeCatalog (datumcloud/d1-standard-2 = 1 vCPU / 2 GiB) so the quota
  ResourceClaim requests vcpus + memory, not just instance count. Explicit
  container limits / instance requests still take precedence.
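
As an illustration of the quota-sizing lookup, here is a small sketch. The
instanceSize type and resolveSize helper are hypothetical; only the catalog
name, the datumcloud/d1-standard-2 entry (1 vCPU / 2 GiB), and the rule that
explicit sizing takes precedence come from the commit message.

```go
package main

import "fmt"

// instanceSize is a hypothetical shape for a catalog entry.
type instanceSize struct {
	VCPUs       int64
	MemoryBytes int64
}

// instanceTypeCatalog holds the one entry named in the commit message.
var instanceTypeCatalog = map[string]instanceSize{
	"datumcloud/d1-standard-2": {VCPUs: 1, MemoryBytes: 2 << 30}, // 2 GiB
}

// resolveSize prefers explicit sizing (container limits or instance requests)
// and falls back to the catalog. The bool is false when neither applies, in
// which case a caller could request instance count only.
func resolveSize(instanceType string, explicit *instanceSize) (instanceSize, bool) {
	if explicit != nil {
		return *explicit, true
	}
	size, ok := instanceTypeCatalog[instanceType]
	return size, ok
}

func main() {
	size, ok := resolveSize("datumcloud/d1-standard-2", nil)
	fmt.Println(size.VCPUs, size.MemoryBytes, ok)
}
```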

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The downstream WorkloadDeployment status watch mapped events to a
reconcile request whose ClusterName was the full decoded org/project path
(decodeUpstreamClusterName turned the "cluster-<org>_<project>" namespace
label into "<org>/<project>"). But the Milo multicluster provider keys
project clusters by bare project name only. As a result every project
except the org-less "datum-cloud" failed to resolve: mcmanager routed the
unmatched name (ultimately the empty string) to the local host cluster,
which has no compute CRDs, so Reconcile failed with "no matches for kind
WorkloadDeployment" in a hot loop (~2 errors/sec observed on staging).

Extract the bare project name (final path segment) so it matches the
provider key, and guard the mapping with GetCluster: if the project
cluster isn't engaged yet, drop the event instead of enqueuing a request
that falls back to the host cluster and errors. Dropping is safe — once
the provider engages the cluster, the For watch reconciles it and the
next downstream status event maps cleanly.

Rename decodeUpstreamClusterName to projectClusterNameFromLabel to
reflect that it now returns the provider cluster key, and add the
not-engaged drop case to the mapping test.
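
A rough sketch of the mapping fix, under stated assumptions: the label value is
taken to be "cluster-<org>_<project>" (or "cluster-<project>" for an org-less
project), the function signature is a guess, and the engaged callback is a
hypothetical stand-in for the GetCluster guard.

```go
package main

import (
	"fmt"
	"strings"
)

// clusterLabelPrefix is the assumed prefix of the namespace label value.
const clusterLabelPrefix = "cluster-"

// projectClusterNameFromLabel returns the bare project name (the final path
// segment), which is what the provider is described as keying clusters by.
func projectClusterNameFromLabel(value string) (string, bool) {
	if !strings.HasPrefix(value, clusterLabelPrefix) {
		return "", false
	}
	path := strings.TrimPrefix(value, clusterLabelPrefix)
	// "_" separates org and project in the encoded form; keep what follows it.
	if i := strings.LastIndex(path, "_"); i >= 0 {
		path = path[i+1:]
	}
	return path, path != ""
}

// mapToClusterName drops the event (returns false) when the label is malformed
// or the project cluster is not engaged yet, instead of enqueuing a request
// that would fall back to the host cluster.
func mapToClusterName(labelValue string, engaged func(string) bool) (string, bool) {
	name, ok := projectClusterNameFromLabel(labelValue)
	if !ok || !engaged(name) {
		return "", false
	}
	return name, true
}

func main() {
	engaged := func(name string) bool { return name == "web" }
	fmt.Println(mapToClusterName("cluster-acme_web", engaged))
	fmt.Println(mapToClusterName("cluster-acme_api", engaged))
}
```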

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The downstream WorkloadDeployment status watch was a complete no-op and
the source of a steady ~130 errors/min on the management plane. Two
layered causes:

milosource.NewClusterSource binds the raw source to the empty cluster
name, and the default mchandler.TypedEnqueueRequestsFromMapFunc wraps the
map in TypedInjectCluster, which overwrites each request's ClusterName
with that bound empty name. So the project cluster name computed by
mapDownstreamDeploymentToRequest (and validated by its GetCluster guard)
was discarded at enqueue time; every downstream event reached Reconcile
with ClusterName="". mcmanager routes the empty name to the local host
management cluster, which has no compute CRDs, so the Get failed with
"no matches for kind WorkloadDeployment" and requeued in a hot loop —
while the watch's actual purpose (immediate status mirror-back) never
ran for any project.

Switch the handler to TypedEnqueueRequestsFromMapFuncWithClusterPreservation
so the map's project cluster name survives to Reconcile, making the
downstream watch functional. Add a defensive guard at the top of Reconcile
that drops (returns nil, not an error) any request with an empty cluster
name, so a host-cluster fallback can never again spin in a requeue loop.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
A small static-PIE Go probe that boots as a Datum compute instance, reads back
its mounted ConfigMap/Secret files and injected env vars, and prints each item's
sha256 to the console — the tool used to verify that referenced ConfigMaps and
Secrets are delivered byte-exact to instances (secret values are hashed and
redacted, never printed). Includes the Dockerfile, Kraftfile, and a sample
Workload manifest.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tance name

An Instance could wedge Pending forever (QuotaGranted=Unknown/QuotaNoBudget,
Quota scheduling gate never removed) even though its Milo ResourceClaim was
granted: the Instance reconciled once while the claim was still pending, and
nothing re-triggered it when the grant landed a beat later. The ResourceClaim
watch mapped a claim to its Spec.ResourceRef — the Project — so the grant
enqueued the project name, never the owning Instance.

Fix the watch to enqueue the owning Instance: its namespace is carried on a new
compute.datumapis.com/instance-namespace label (the claim lives in the project
quota namespace, not the Instance's), and its name is the claim name with the
resource-kind prefix stripped.

Also name the claim after the Instance (unique among Instances in the project
control plane) with an "instance-" prefix so it cannot collide with other
resource kinds' claims sharing the quota namespace, replacing the previous
"<namespace>--<name>" scheme.
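
The claim-to-Instance mapping could look roughly like the sketch below. The
label key and the "instance-" prefix come from the commit message; objectKey
and both helper names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	// instanceNamespaceLabel carries the Instance's namespace on the claim.
	instanceNamespaceLabel = "compute.datumapis.com/instance-namespace"
	// claimNamePrefix is the resource-kind prefix on the claim name.
	claimNamePrefix = "instance-"
)

// objectKey is a hypothetical namespace/name pair for a reconcile request.
type objectKey struct {
	Namespace string
	Name      string
}

// claimNameForInstance names the claim after the Instance with the kind prefix.
func claimNameForInstance(instanceName string) string {
	return claimNamePrefix + instanceName
}

// instanceForClaim recovers the owning Instance from the claim's name and
// labels. It returns false for claims that belong to other resource kinds or
// that lack the namespace label.
func instanceForClaim(claimName string, labels map[string]string) (objectKey, bool) {
	if !strings.HasPrefix(claimName, claimNamePrefix) {
		return objectKey{}, false
	}
	name := strings.TrimPrefix(claimName, claimNamePrefix)
	ns := labels[instanceNamespaceLabel]
	if name == "" || ns == "" {
		return objectKey{}, false
	}
	return objectKey{Namespace: ns, Name: name}, true
}

func main() {
	claim := claimNameForInstance("web-0")
	key, ok := instanceForClaim(claim, map[string]string{instanceNamespaceLabel: "default"})
	fmt.Println(claim, key.Namespace, key.Name, ok)
}
```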

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
scotwells added a commit that referenced this pull request Jun 3, 2026
Drop the pre-merge dependency framing (the #129 'not yet on main' note)
and repoint example references from PR URLs to relative paths
(examples/serverless-js-configmap, examples/config-secret-probe), as if
the referenced-data delivery and example PRs are merged. Keep the
upstream base/base-compat ROM-enablement limitation, which is a Unikraft
runtime matter independent of these PRs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
scotwells and others added 2 commits June 3, 2026 19:03
… them

A template-hash change (an image update, or a restartedAt annotation from
`datumctl compute restart`) previously resolved to an in-place Update of the
Instance. The unikraft provider bakes the pod at creation time and never
recomputes an existing pod's spec, so the in-place update silently failed to
roll the running workload — instances kept their old pod.

Emit a delete (recreate) for drifted Ready instances instead. The next
reconcile refills the slot via the create path with the new template, and the
provider's finalizer-gated teardown plus create-on-new-Instance roll the pod
with no provider changes. Ordered one-at-a-time pacing is preserved by the
existing descending-ordinal sort, skip-all-but-first, and the
DeletionTimestamp WaitAction.
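
One way to picture the one-at-a-time recreate pacing is the sketch below. The
instance type, the string action kinds, and nextRollAction are hypothetical,
and the handling of drifted instances that are not Ready is left out because
the commit message does not describe it.

```go
package main

import (
	"fmt"
	"sort"
)

// instance is a hypothetical view of one slot in the deployment.
type instance struct {
	Ordinal      int
	TemplateHash string
	Ready        bool
	Deleting     bool // a DeletionTimestamp is set
}

// nextRollAction returns at most one action per reconcile:
//   - "wait" while any instance is still being deleted,
//   - "delete" for the highest-ordinal drifted Ready instance,
//   - "none" when nothing has drifted.
func nextRollAction(instances []instance, desiredHash string) (string, int) {
	// Copy before sorting so the caller's slice order is untouched.
	sorted := append([]instance(nil), instances...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Ordinal > sorted[j].Ordinal })

	for _, in := range sorted {
		if in.Deleting {
			return "wait", in.Ordinal
		}
	}
	for _, in := range sorted {
		if in.Ready && in.TemplateHash != desiredHash {
			return "delete", in.Ordinal // skip all but the first match
		}
	}
	return "none", -1
}

func main() {
	kind, ordinal := nextRollAction([]instance{
		{Ordinal: 0, TemplateHash: "old", Ready: true},
		{Ordinal: 1, TemplateHash: "old", Ready: true},
	}, "new")
	fmt.Println(kind, ordinal)
}
```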

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rvedGeneration

A restart/rolling update was invisible from the project plane: there was no
status field representing how many instances are on the new template revision.
Add UpdatedReplicas (instances whose observed template hash matches the desired
template, regardless of readiness) and ObservedGeneration to both
WorkloadDeployment and Workload (plus placement) status.

UpdatedReplicas is computed on the cell WD reconcile alongside CurrentReplicas
(which is now its Programmed subset), aggregated up into the Workload, and rides
the existing status sync to the project plane. Repoint the "Up-to-date"
printcolumn to .status.updatedReplicas to match `kubectl get deployment`
semantics, so a roll is visible as the count dips below Replicas and recovers.
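
The relationship between the two counters can be sketched as follows. The
instanceStatus type and replicaCounts helper are hypothetical; the definitions
(UpdatedReplicas counts hash matches regardless of readiness, CurrentReplicas
is the Programmed subset of those) follow the commit message.

```go
package main

import "fmt"

// instanceStatus is a hypothetical view of what the cell reconcile observes.
type instanceStatus struct {
	ObservedTemplateHash string
	Programmed           bool
}

// replicaCounts returns (updated, current): updated counts instances on the
// desired template hash, and current counts the Programmed ones among them.
func replicaCounts(instances []instanceStatus, desiredHash string) (updated, current int32) {
	for _, in := range instances {
		if in.ObservedTemplateHash != desiredHash {
			continue
		}
		updated++
		if in.Programmed {
			current++
		}
	}
	return updated, current
}

func main() {
	updated, current := replicaCounts([]instanceStatus{
		{ObservedTemplateHash: "new", Programmed: true},
		{ObservedTemplateHash: "new", Programmed: false},
		{ObservedTemplateHash: "old", Programmed: true},
	}, "new")
	fmt.Println(updated, current)
}
```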

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
scotwells added a commit that referenced this pull request Jun 4, 2026
Surface rolling-update / restart progress in `datumctl compute workloads` by
showing updated/desired replica counts next to ready. UP-TO-DATE counts
instances on the latest template revision (status.updatedReplicas), so a roll
is visible as the count dips below desired and then recovers.

Includes a byte-identical copy of the UpdatedReplicas/ObservedGeneration
WorkloadDeployment status fields in api/v1alpha so the plugin can read them.
These fields are defined identically on the controller branch (PR #129); the
duplicate resolves cleanly once both land on main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
scotwells and others added 6 commits June 3, 2026 20:26
The Instance "Running" status condition is renamed to "Available" (wire
value "Available"). An instance can be available while not actively
running a pod (e.g. scaled to zero), so "Running" was misleading as a
serving/health signal.

Renamed constants:
  InstanceRunning                -> InstanceAvailable               ("Available")
  InstanceReadyReasonRunning     -> InstanceReadyReasonAvailable    ("Available")
  InstanceRunningReasonRunning   -> InstanceAvailableReasonAvailable ("Available")
  InstanceRunningReasonStopped   -> InstanceAvailableReasonStopped
  InstanceRunningReasonStarting  -> InstanceAvailableReasonStarting
  InstanceRunningReasonStopping  -> InstanceAvailableReasonStopping

BREAKING CHANGE: the on-the-wire Instance condition type changes from
"Running" to "Available". Consumers reading conditions[type=="Running"]
must switch to "Available". Existing Instances self-heal on the next
provider reconcile (the provider re-asserts the condition under its new
name); the stale "Running" entry lingers cosmetically until then and is
no longer read by the Ready derivation.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eals

The instance controller is re-queued by a ResourceClaim watch when the
claim is granted, but that grant event lives on the project control plane
and can be missed (informer engagement races, watch relist gaps),
wedging the instance at QuotaGranted!=True indefinitely (observed: claim
Granted, instance stuck QuotaNoBudget until a manual reconcile cleared
it). The pending-quota path returned no RequeueAfter, so there was no
safety net.

Add a backing-off requeue while QuotaGranted is not True, anchored on the
condition's last transition:

  <60s    : 1s     (catch a grant landing almost immediately)
  60s–5m  : 15s
  5m–10m  : 60s
  >=10m   : 300s

Folded into the existing referenced-data requeue (soonest wins). The
ResourceClaim watch remains the fast path; this only guarantees a missed
grant self-heals instead of wedging.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…roof

The pending-quota safety-net requeue was wired only at the tail of
Reconcile, so an early return during the pending window (a status-update
or upstream-writeback conflict) silently dropped it onto controller-
runtime's exponential error-backoff — which can stretch to minutes,
leaving an instance wedged at QuotaGranted!=True even though its
ResourceClaim was granted (observed: the 2nd instance in a rapid burst
consistently wedged).

- Compute the requeue once, up front, so every return path honors it.
- On a Conflict during the pending window, requeue at the bounded quota
  interval instead of returning the error (which would back off).
- Log the requeue decision (and conflict-driven requeues) so the path is
  observable: a re-firing requeue prints every pass while pending, a
  dropped one does not.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… LTT

Observability revealed the safety-net requeue was firing every reconcile
but always at the slowest tier (300s): elapsed was measured from the
QuotaGranted condition's LastTransitionTime, which stays at the
1970-01-01 CRD default while quota is pending (PendingEvaluation and
NoBudget are both Unknown, so SetStatusCondition never bumps it). Result:
a watch-missed instance waited up to 5 minutes for the safety net instead
of ~1s, appearing wedged.

Anchor elapsed on instance.CreationTimestamp, which reflects actual wait
time, so the fast tiers (1s/15s) apply early as intended.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The instance controller emits Warning events on Instances (QuotaNoBudget,
ImageUnavailable, InstanceCrashing, ConfigurationError, NetworkFailedToCreate,
…) via the event recorder, but no RBAC rule granted it permission to create events. Every write was
rejected — "events is forbidden: ... cannot create resource events in API
group \"\" in the namespace ns-<uid>" — so the user-facing signals explaining
why an instance is stuck never reached the Instance (kubectl describe /
activity timeline). Reconciliation was unaffected; this is an observability gap.

Add the kubebuilder marker and regenerate the role. The regen also syncs a
pre-existing work.karmada.io/resourcebindings rule (from an existing marker
that wasn't reflected in the committed role).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Lint job had been red on the branch from pre-existing findings,
unrelated to the rename/quota work:
- gofmt: re-align a struct in the instance-sizing test.
- goconst: extract the repeated "datumcloud/d1-standard-2", "app", and
  "test/image:latest" literals into test constants.

Tests and lint both pass locally.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>