feat: Deliver referenced ConfigMaps and Secrets to instances #129
Draft
scotwells wants to merge 64 commits into
Engineering breakdown of the five workstreams, the reconciled cross-workstream contracts, the phased critical path, the settled decisions, and a concrete end-to-end testing-environment design. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase 0 of the cross-plane referenced-data delivery design: the API surface, validation, and the management-plane building blocks every later phase compiles against.
- API: EnvFrom on SandboxContainer; ReferencedDataReady condition + reasons; ReferencedData scheduling gate; referenced-data label and expected-referenced-data/restartedAt annotations.
- Validation: secret-volume validation, ConfigMap/Secret items key->path projection, file-mode range and path safety, EnvFrom validation, and an admission SubjectAccessReview that the submitter can read each referenced ConfigMap/Secret.
- internal/referenceddata seam: reference collector, deterministic companion naming, and the scoped ProjectConfigSecretReader (+ single-cluster variant).
- Feature-flagged gate insertion (EnableReferencedDataGate, default off) so behavior is unchanged until the delivery and consumption phases are deployed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase 1: the management-plane ReferencedDataController. For each WorkloadDeployment it collects referenced ConfigMaps/Secrets, reads them with a scoped reader, enforces per-object/aggregate size limits, materializes one shared companion per source as a local copy, records the expected companion set on the deployment, reference-counts across deployments with finalizer cleanup, and surfaces resolution failures on ReferencedDataReady. Source watches refresh companions on rotation. Delivery is abstracted behind a companionWriter seam; the federated writer and PropagationPolicy extension are deferred to Phase 1b (federation merge). Sizes are configurable (referencedData.perObjectLimitBytes / aggregateLimitBytes). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase 2: the cell-side InstanceReconciler now resolves the referenced-data scheduling gate. For a gated Instance it reads the expected companion set from the owning WorkloadDeployment, lists the labeled companions in the namespace, and clears the gate once they are all present — surfacing ReferencedDataReady (Resolving/AwaitingPropagation/Ready) with the missing set on the message. A companion watch re-checks waiting Instances, with a requeue safety net. Adds Kubernetes Events on transitions and compute_referenced_data_* metrics (companions present/expected, gate-wait duration, condition transitions), and counts referenced-data-blocked replicas in the deployment status. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fix type mismatches and test setup issues that arose when landing the
referenced-data commits onto the federated-deployment-scheduling base:
- Use multicluster.ClusterName (not string) in Watches handler signatures
for referenceddata_controller.go and instance_controller.go companion
watches; update enqueueWDsForSource to accept multicluster.ClusterName
- Cast req.ClusterName to string when passing to resolveAndValidateSources
- Replace New(Options{}) with NewWithOptions(Options{}) in stateful_control_test.go
after re-introducing the no-arg New() on the combined Options struct
- Initialize r.finalizers in newRefDataReconciler test helper; add
instanceControllerFinalizer to test instance fixtures so the finalizer
framework does not short-circuit reconcile on first pass
- Update TestReferencedDataEventEmittedOnClear to expect single-pass
gate-clearing (federation branch clears both status and gate in same pass)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Phase 1b: the resolver now has a federated companion writer. When a federation
client is configured it materializes companions into the project's downstream
namespace (ns-{project-uid}) on the Karmada hub via the same
MappedNamespaceResourceStrategy the federator uses; with no federation client it
falls back to the single-cluster local copy.
The federator's PropagationPolicy now selects the referenced-data-labeled
ConfigMaps and Secrets alongside the WorkloadDeployment, so companions
co-propagate to the same cells, and it forwards the expected-referenced-data
annotation to the downstream deployment so the cell can clear the gate.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The ReferencedDataController was registered unconditionally, so it also started in the cell operator and collided with the cell's WorkloadDeploymentReconciler (both default their controller name to "workloaddeployment"). Gate it to the management controller set and give it a distinct name. Found by running the operators in the multi-cluster e2e environment. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ss fixes
Adds the Chainsaw e2e scenario that validates the full cross-plane delivery
chain for ConfigMap and Secret mounts (Hops 1-5): source objects on the
control-plane → companions materialised in ns-{uid} on the Karmada hub →
propagated to the pop-dfw cell → Instance created with the ReferencedData
scheduling gate → gate cleared and ReferencedDataReady=True set.
Test assertion bug fixed: companions live on the Karmada hub and cell, NOT
on the control-plane. The two "assert-companion-*-on-control-plane" steps
now target cluster:downstream instead.
PropagationPolicy assertion updated to include all three resource selectors
(WorkloadDeployment + ConfigMap + Secret with namespace field) to satisfy
Chainsaw's exact array-length check.
Harness fixes committed alongside:
- _e2e:karmada:build-kubeconfig: cp karmada.yaml → downstream.yaml so
cluster:downstream steps work after task e2e:up.
- e2e:operator:start: --karmada-kubeconfig → --federation-kubeconfig (correct
flag name) for both operators; add --enable-management-controllers=true to
the management operator invocation.
- e2e:crds:install: new _e2e:crds:quota step installs Milo quota CRDs from
the module cache to all clusters and the Karmada API server.
- e2e:operator:start:referenced-data: new task that starts both operators
with --server-config=hack/e2e/operator-config-referenced-data.yaml
(featureFlags.enableReferencedDataGate:true).
- hack/e2e/operator-config-referenced-data.yaml: server config with the
feature flag enabled (referenced by README and the new task).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The WorkloadDeploymentFederator and ReferencedDataController both call Update() on the same WorkloadDeployment to add/remove their respective finalizers, producing optimistic-lock conflicts under concurrent reconciliation. The companions DO converge, but the errors are noisy. Wrap all three finalizer mutation sites (add on first reconcile, remove on deletion, remove on empty-refs) in retry.RetryOnConflict so that a concurrent federator update causes a transparent re-read + retry rather than a logged error. Each retry re-GETs the latest object version before re-applying only the single finalizer change. Adds two unit tests using interceptor.Funcs to inject a conflict on the first Update and assert that the finalizer is correctly applied/removed on retry. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…cret-mounts
Fixes six issues identified in code review of the federated configmap/secret
mounts feature:
1. [BLOCKER] ValidateUpdate was a no-op stub: an attacker could create a
Workload with no refs then edit in refs to ConfigMaps/Secrets they cannot
read. Fixed by implementing ValidateUpdate to run the same full template
validation (including SAR) as ValidateCreate on the new object.
2. [MAJOR] SAR for referenced-data was missing UserInfo.Extra, unlike the
sibling Network SAR. Fixed by populating Extra from AdmissionRequest.
3. [MAJOR] Collector still collected both refs from an envFrom entry that had
both configMapRef and secretRef set, even though validateEnvFrom forbids it.
Fixed by short-circuiting such entries before collection in CollectFromTemplate.
4. [MAJOR] validateReferencedDataAccess maintained a hand-rolled traversal
(collectRefsFromSpec) parallel to referenceddata.CollectFromTemplate, with
drift risk. Fixed by adding CollectFromSpec to collector.go and making the
validation call it as the single source of truth. No import cycle (validation
-> referenceddata, not the reverse).
5. [MINOR] names.go: after TrimRight("-.", ...) the truncated segment could be
empty (e.g. source name is all '-' or '.'), producing "<prefix>.-<hash>"
which is invalid DNS-1123 (segment starts with '-'). Fixed by emitting
"<prefix>.<hash>" when truncated is empty.
6. [MINOR] Tests added: both-refs-set envFrom rejection + no SAR for invalid
entry; SAR InternalError/fail-closed path; ValidateUpdate SAR path;
names edge cases (all-dashes, all-dots, name ending on dot, valid prefix).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
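The empty-segment fix in item 5 can be sketched roughly as follows. This is an illustrative stand-in rather than the repository's names.go: the `maxSegment` cap, the SHA-256 hash, and its 8-character length are assumptions made only so the example runs.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// maxSegment is a hypothetical cap on the readable part of the name.
const maxSegment = 40

// companionName is a sketch, not the real CompanionName. It builds
// "<prefix>.<truncated>-<hash>", and falls back to "<prefix>.<hash>" when
// trimming leaves nothing, so no segment ever starts with '-'.
func companionName(prefix, source string) string {
	sum := sha256.Sum256([]byte(source))
	hash := hex.EncodeToString(sum[:])[:8] // assumed hash length

	truncated := source
	if len(truncated) > maxSegment {
		truncated = truncated[:maxSegment]
	}
	truncated = strings.TrimRight(truncated, "-.")
	if truncated == "" {
		return prefix + "." + hash
	}
	return prefix + "." + truncated + "-" + hash
}

func main() {
	fmt.Println(companionName("configmap", "app-config"))
	fmt.Println(companionName("configmap", "---")) // all-dashes edge case
}
```

The all-dashes and all-dots inputs are the cases the added tests in item 6 are described as covering.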
Fixes 10 issues found in the code review of the configmap/secret mounts
federation feature.
**Blockers:**
Fix 1 (federator status sync clobbers resolver conditions): syncStatusFromDownstream
now merges downstream status into the project WD while preserving the
resolver-owned ReferencedDataReady condition, rather than replacing .Status
wholesale. Wrapped in RetryOnConflict for concurrent-write safety.
Fix 2 (ref-count annotation race): materialiseOne and releaseOneCompanion now
wrap their read-modify-write of the ref-count annotation in
retry.RetryOnConflict loops so that two WDs reconciling concurrently against
the same companion cannot drop each other's ref entries.
**Majors:**
Fix 3 (optional source escape hatch): resolveAndValidateSources accepts the WD
template spec and uses a new isOptionalRef helper to determine per-source
optionality. NotFound and TooLarge errors for optional sources are silently
skipped; the WD proceeds with the remaining required companions.
Fix 4 (unparseable ref-count): decodeRefCount now returns ([]string, error).
An unmarshal failure propagates as a transient error; release paths return
the error rather than computing an empty remaining set that would incorrectly
delete the companion.
Fix 5 (nil-source panic): resolveOneSource nil-guards the reader's return
value and maps a (nil, nil) return to a SourceNotFound condition error.
Fix 6 (non-conflict-tolerant writes): the expected-set annotation Patch and
Status().Update are each wrapped in retry.RetryOnConflict; residual conflicts
after retries return (Result{}, nil) to requeue cleanly.
Fix 7 (annotation/label overwrite): ApplyConfigMap and ApplySecret in both
localCompanionWriter and downstreamCompanionWriter now use mergeLabels /
mergeAnnotations helpers that only upsert controller-owned keys, preserving
third-party entries such as Karmada bookkeeping annotations.
**Minors:**
Fix 8 (unused scheme field): ProjectReader.scheme field and the second
parameter of NewProjectReader removed; client.Options{} is sufficient.
Fix 9 (misleading localCompanionWriter comment): clarified that
localCompanionWriter is reachable only when FederationClient is nil, a state
that management-plane federation wiring never produces on this branch.
Fix 10 (literal "v1" APIVersion): PropagationPolicy ConfigMap and Secret
selectors now use corev1.SchemeGroupVersion.String() instead of the literal
string "v1".
**Regression tests added:**
- Two WDs sharing a source, interleaved reconciles → both ref-count entries preserved
- Optional missing source → skipped, WD not failed
- Unparseable ref-count annotation → companion NOT deleted
- Federator status sync preserves ReferencedDataReady condition
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
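A minimal sketch of the fail-safe behaviour described in Fix 4, assuming the ref-count annotation holds a JSON array of referrer keys (the actual encoding is not stated here). `shouldDeleteCompanion` is a hypothetical helper used only to show that a decode error must stop the release path instead of being read as "no referrers left".

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeRefCount parses the annotation value. The JSON-array encoding is an
// assumption made for this sketch.
func decodeRefCount(raw string) ([]string, error) {
	if raw == "" {
		return nil, nil
	}
	var refs []string
	if err := json.Unmarshal([]byte(raw), &refs); err != nil {
		return nil, fmt.Errorf("decode ref-count annotation: %w", err)
	}
	return refs, nil
}

// shouldDeleteCompanion (hypothetical) reports whether releasing owner leaves
// the companion without referrers. A decode failure is returned as an error
// and never treated as an empty set.
func shouldDeleteCompanion(raw, owner string) (bool, error) {
	refs, err := decodeRefCount(raw)
	if err != nil {
		return false, err
	}
	remaining := 0
	for _, r := range refs {
		if r != owner {
			remaining++
		}
	}
	return remaining == 0, nil
}

func main() {
	fmt.Println(shouldDeleteCompanion(`["default/wd1"]`, "default/wd1"))
	fmt.Println(shouldDeleteCompanion(`not-json`, "default/wd1"))
}
```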
…etrics
- [BLOCKER] Restore ObservedGeneration guard on gate clearing: both the quota and referenced-data gate removals in reconcileSchedulingGates now check cond.ObservedGeneration == instance.Generation, preventing a stale True condition from generation N clearing a freshly-stamped gate at N+1.
- [MAJOR] Suppress AwaitingPropagation event flood: the Warning event is now emitted only on a reason transition (prev != AwaitingPropagation), not on every reconcile where the missing-set message changes.
- [MAJOR] Delete dead removeQuotaSchedulingGate: its generation guard has been ported to reconcileSchedulingGates; the standalone function is gone.
- [MAJOR] Drop high-cardinality instance label from CompanionsPresent and CompanionsExpected metrics: both gauges now aggregate per namespace only, eliminating the unbounded per-instance series set.
- [MINOR] Delete drainEvents test helper (never called).
- [MINOR] Remove instanceByWorkloadDeploymentUIDIndex: the constant, its IndexField registration, and the index func are all deleted; the companion watch enqueue path already lists the full namespace and never queried it. Updated addInstanceIndexers doc-comment accordingly.
- [MINOR] Route checkForNetworkCreationFailure through fetchOwnerWorkloadDeployment to avoid a duplicate WD Get; fix the stale doc-comment on enqueueInstancesInNamespace.
- [MINOR] Fix no-gate self-heal: the branch that flips a non-True ReferencedDataReady condition to True when the gate is absent now validates that all expected companions are actually present before marking Ready, avoiding a false-Ready status when the gate was stripped out-of-band while companions were still missing.
- ADD TEST: TestReferencedDataStaleConditionGuard verifies that a stale True condition at ObservedGeneration N does not clear a gate at generation N+1; only the second reconcile (after condition is re-evaluated at N+1) removes the gate.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
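The ObservedGeneration guard from the blocker item can be illustrated with a small predicate. The `condition` struct and `mayClearGate` name are simplified stand-ins invented for this sketch; the real code operates on the Instance's status conditions.

```go
package main

import "fmt"

// condition is a simplified stand-in for a status condition.
type condition struct {
	Status             string
	ObservedGeneration int64
}

// mayClearGate (hypothetical name) allows gate removal only when the
// condition is True AND was evaluated against the current generation.
func mayClearGate(cond *condition, generation int64) bool {
	return cond != nil &&
		cond.Status == "True" &&
		cond.ObservedGeneration == generation
}

func main() {
	stale := &condition{Status: "True", ObservedGeneration: 1}
	fresh := &condition{Status: "True", ObservedGeneration: 2}
	fmt.Println(mayClearGate(stale, 2)) // stale verdict from generation 1
	fmt.Println(mayClearGate(fresh, 2)) // re-evaluated at generation 2
}
```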
The companion ref-count (companionRefCountAnnotation) read-modify-write was non-atomic: materialiseOne fetched the companion (R1), computed the updated ref-count, built a desired object carrying that ref-count, then called ApplyConfigMap which issued a second independent GET (R2) before calling Update. mergeAnnotations then overwrote the referenced-by key on R2 with the R1-derived value. If a concurrent WD had committed its ref-count entry between R1 and R2, R2 already contained that entry, but the merge silently discarded it. Because Apply's own GET+Update targeted R2, no conflict was raised and RetryOnConflict never fired.
Fix (approach a): remove the second internal GET from ApplyConfigMap / ApplySecret. Both methods now accept an `existing` argument: the already-fetched object (or nil when the companion does not yet exist). When existing is non-nil, the implementation writes directly onto it and calls Update, targeting the same resourceVersion that the caller used to compute the ref-count. A concurrent change between GET and Update will now produce a Conflict that propagates out to the materialiseOne / releaseOneCompanion RetryOnConflict loop, which re-reads and recomputes.
The same double-GET path existed in releaseOneCompanion and is fixed with the same pattern: pass the already-fetched companion as the `existing` argument.
Add TestReferencedData_RefCount_ConflictForcesReread, which injects a concurrent write of a third WD's key (wd3Key) into the companion on realCl immediately after the outer GET returns. Under the fixed code the subsequent Update at R1 conflicts with R2, RetryOnConflict re-reads R2 (with wd3Key), and all three keys appear in the final annotation. Under the old code Apply's internal GET would return R2 with wd3Key, but mergeAnnotations would overwrite referenced-by with the R1-derived value ([wd1Key, wd2Key]), silently dropping wd3Key, and the Update at R2 would succeed, so the test would fail with only 2 keys instead of 3.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
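The single-GET pattern can be sketched against an in-memory store that mimics resourceVersion checks. Everything here (`store`, `companion`, `addRef`, the `afterGet` hook, the retry bound of 5) is hypothetical scaffolding for illustration; the real code uses the Kubernetes client and retry.RetryOnConflict.

```go
package main

import (
	"errors"
	"fmt"
	"slices"
)

var errConflict = errors.New("conflict: resourceVersion is stale")

type companion struct {
	ResourceVersion int
	ReferencedBy    []string
}

// store mimics an API server that rejects writes made against a stale version.
type store struct{ current companion }

func (s *store) Get() companion {
	c := s.current
	c.ReferencedBy = slices.Clone(c.ReferencedBy)
	return c
}

func (s *store) Update(c companion) error {
	if c.ResourceVersion != s.current.ResourceVersion {
		return errConflict
	}
	c.ResourceVersion++
	s.current = c
	return nil
}

// addRef reads once per attempt and writes onto that same object, so a
// concurrent write surfaces as a conflict and forces a re-read.
// afterGet is a test hook that runs once, right after the first read.
func addRef(s *store, key string, afterGet func()) error {
	for attempt := 0; attempt < 5; attempt++ {
		existing := s.Get()
		if afterGet != nil {
			afterGet()
			afterGet = nil
		}
		if !slices.Contains(existing.ReferencedBy, key) {
			existing.ReferencedBy = append(existing.ReferencedBy, key)
		}
		err := s.Update(existing)
		if err == nil {
			return nil
		}
		if !errors.Is(err, errConflict) {
			return err
		}
	}
	return errConflict
}

func main() {
	s := &store{current: companion{ResourceVersion: 1, ReferencedBy: []string{"wd1"}}}
	err := addRef(s, "wd2", func() {
		c := s.Get() // a concurrent writer adds wd3 between our GET and Update
		c.ReferencedBy = append(c.ReferencedBy, "wd3")
		_ = s.Update(c)
	})
	fmt.Println(err, s.current.ReferencedBy)
}
```

Because the write targets the version that was read, the interleaved writer's entry survives instead of being overwritten.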
Replace manual membership loop in refCountAdd with slices.Contains. Replace two manual map-copy loops in mergeLabels/mergeAnnotations with maps.Copy (semantics preserved: copies wanted INTO existing, never replaces the map). Add ReferencedDataLabelValue = "true" to api/v1alpha/labels.go and use it everywhere in place of the repeated string literal. Use kindSecret and kindConfigMap constants in indexers.go and the controller instead of repeated string literals. Extract isOptionalInVolumes and isOptionalInContainers helpers from isOptionalRef to bring gocyclo from 32 down to within the limit. Define named constants for all repeated test strings (rdTestWD1, rdTestBlobKey, testKindConfigMap, etc.) introduced by the configmap/secret change set to clear the goconst and gocyclo lint issues it brought in. The instance_controller.go GetEventRecorderFor call retains its existing //nolint:staticcheck comment: the replacement API (GetEventRecorder) has an incompatible Eventf signature requiring a migration of all emit sites, which is deferred as a separate task. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
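As a rough illustration of the maps.Copy semantics noted above (wanted keys are copied INTO the existing map, which is never replaced), here is a sketch with a hypothetical `mergeInto` helper. The label keys are invented for the example.

```go
package main

import (
	"fmt"
	"maps"
)

// mergeInto (hypothetical) upserts wanted into existing and returns it.
// Entries already on existing that wanted does not mention are preserved.
func mergeInto(existing, wanted map[string]string) map[string]string {
	if existing == nil {
		existing = make(map[string]string, len(wanted))
	}
	maps.Copy(existing, wanted)
	return existing
}

func main() {
	existing := map[string]string{"example.com/third-party": "keep-me"}
	wanted := map[string]string{"example.com/owned": "true"}
	fmt.Println(mergeInto(existing, wanted))
}
```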
…olve
Companions were named "<lower-kind>.<source-name>" (e.g. configmap.app-config)
but Instances and Pods reference sources by their verbatim source name
(e.g. app-config). The runtime found only the prefixed companion → 404
→ mounts/env vars failed.
Option B fix: companion objects are now named exactly by their source name.
Cross-kind collision is safe (ConfigMap and Secret are distinct resource
types in the same namespace). Pod/Instance volume and envFrom references
resolve naturally without any translation layer.
Kind-disambiguation is preserved in the expected-referenced-data annotation
as "Kind/name" tokens (e.g. ["ConfigMap/app-config","Secret/app-secret"])
so the cell InstanceReconciler can verify each companion by kind+name
without probing both resource types.
Changes:
- internal/referenceddata/names.go: CompanionName drops the kind prefix;
adds CompanionToken("Kind", "name") → "Kind/name" helper
- internal/referenceddata/names_test.go: update expected values; add
TestCompanionName_SourceNameContract (contract test that would fail
against the old prefixed code) and TestCompanionName_SameSourceDifferentKind
- internal/controller/referenceddata_controller.go: annotation stores
kind-qualified tokens; releaseOneCompanion accepts explicit kind param
- internal/controller/referenceddata_controller_test.go: update assertions
to kind-qualified tokens; fix stale-RV and same-name source/companion cases
- internal/controller/instance_controller.go: gate-clearing parses kind-
qualified tokens via listPresentCompanionsByKindName
- internal/controller/instance_referenced_data_test.go: align to new names
and annotation token format
- test/e2e/referenced-data-mounts/chainsaw-test.yaml: update companion name
assertions (configmap.app-config → app-config, secret.app-secret → app-secret)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
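The "Kind/name" token format can be sketched as below. `companionToken` mirrors the described CompanionToken helper in spirit, while `parseCompanionToken` and `missingCompanions` are hypothetical names for the parsing and presence check the cell side would need.

```go
package main

import (
	"fmt"
	"strings"
)

// companionToken joins kind and name into a "Kind/name" token.
func companionToken(kind, name string) string {
	return kind + "/" + name
}

// parseCompanionToken (hypothetical) splits a token and rejects malformed input.
func parseCompanionToken(token string) (kind, name string, ok bool) {
	kind, name, ok = strings.Cut(token, "/")
	if !ok || kind == "" || name == "" {
		return "", "", false
	}
	return kind, name, true
}

// missingCompanions (hypothetical) returns expected tokens not yet present.
func missingCompanions(expected []string, present map[string]bool) []string {
	var missing []string
	for _, tok := range expected {
		if !present[tok] {
			missing = append(missing, tok)
		}
	}
	return missing
}

func main() {
	expected := []string{
		companionToken("ConfigMap", "app-config"),
		companionToken("Secret", "app-secret"),
	}
	present := map[string]bool{"ConfigMap/app-config": true}
	fmt.Println(missingCompanions(expected, present))
}
```

Keeping the kind in the token is what lets a ConfigMap and a Secret share the same name without ambiguity.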
The federation base this branch builds on predates the earlier guard (6bcd822), so reconcileInstanceGates dereferenced instance.Status.Controller — a nilable pointer the infra provider populates independently of the Programmed condition — while counting current replicas. An instance reporting Programmed=True with Status.Controller still nil panicked the reconcile before the status write, which freezes the WorkloadDeployment status and hot-loops the reconcile. Observed panicking live in the lab. Guard the dereference so reconciliation completes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a regression test that constructs an Instance with Programmed=True but Status.Controller==nil and asserts that reconcileInstanceGates neither panics nor counts it as a current replica. Without the nil guard introduced in the same area the test panics with a nil-pointer dereference, confirming the guard is load-bearing. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Spec.Controller is a nilable pointer the infra provider populates independently of an Instance's networking readiness. reconcileInstanceGates dereferenced instance.Spec.Controller.SchedulingGates in the network gate-clearing path without a nil guard, so an Instance that reached networkReady before its controller spec was populated panic-looped the WorkloadDeployment reconcile and froze status. This mirrors the existing Status.Controller guard. Extends the nil-controller regression test to drive networkReady=true with a nil Spec.Controller, the case the first guard missed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
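The guard amounts to a nil-safe accessor. The types below are simplified stand-ins for the Instance API (only the fields needed for the example), and `schedulingGates` is a hypothetical helper name.

```go
package main

import "fmt"

type schedulingGate struct{ Name string }

// controllerSpec stands in for the nilable Spec.Controller pointer.
type controllerSpec struct{ SchedulingGates []schedulingGate }

type instanceSpec struct{ Controller *controllerSpec }

type instance struct{ Spec instanceSpec }

// schedulingGates (hypothetical) returns the gates, or nil when the
// controller spec has not been populated yet, instead of dereferencing nil.
func schedulingGates(i *instance) []schedulingGate {
	if i == nil || i.Spec.Controller == nil {
		return nil
	}
	return i.Spec.Controller.SchedulingGates
}

func main() {
	unpopulated := &instance{}
	fmt.Println(len(schedulingGates(unpopulated))) // no panic, zero gates
}
```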
The referenced-data controller materializes companion copies of the ConfigMaps and Secrets a Workload references into its hub namespace, where Karmada propagates them to the cell so kraftlet can mount them. The compute-manager ClusterRole on the Karmada hub had no core ConfigMap/Secret rule, so every reconcile failed with a forbidden error: companions were never written, the expected-referenced-data annotation was never stamped, and nothing propagated to the cell. ConfigMap/Secret mounts silently did nothing on the edge. Grant the controller full lifecycle access (it owns the companions, including ref-count deletion) to configmaps and secrets in its namespace. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ed before Karmada finishes GC
Part 1 — ordering guard: WorkloadDeploymentFederator.Finalize now checks
whether any companion ConfigMaps or Secrets (referenced-data=true label)
remain in the downstream namespace before calling
cleanupPropagationPolicyIfUnused. The guard only fires when this is the last
WD for its city code (mirroring cleanupPropagationPolicyIfUnused's condition),
so deleting WD-A cannot block on a live WD-B's companion in the shared
namespace. If companions are present Finalize returns an errCompanionsStillPresent
sentinel; Reconcile intercepts it (walking the kerrors.Aggregate the finalizer
framework returns), logs at Info, and sets RequeueAfter — no error-metric
inflation. After companionGuardTimeout (2 min) the guard bypasses itself so a
wedged referenced-data controller cannot permanently block deletion.
Part 2 — authoritative cell-side GC: CompanionGCReconciler watches
WorkloadDeployment events on each cell and deletes companion ConfigMaps/Secrets
whose every referenced-by entry resolves to no live WD on the local cell.
Critical fix: the referenced-by annotation is written by the hub
ReferencedDataController as "projectNamespace/wdName" (e.g.
"default/mount-pristine-default-dfw"), but the cell WD lives in ns-{uid}.
hasLiveReferrer now ignores the namespace in the key and looks up by NAME
ONLY in the companion's own namespace — preventing false "referrer absent"
conclusions that would delete an actively-mounted companion. SetupWithManager
uses WithEngageWithLocalCluster(false) to match all other cell controllers.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
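The name-only lookup in this commit can be sketched as follows. `referrerName` and the `liveWDs` set are illustrative assumptions; the real hasLiveReferrer queries WorkloadDeployments in the companion's own namespace.

```go
package main

import (
	"fmt"
	"strings"
)

// referrerName drops the hub-side namespace from a "namespace/name" key,
// since the cell WD lives in a different namespace (ns-{uid}).
func referrerName(key string) string {
	if i := strings.LastIndex(key, "/"); i >= 0 {
		return key[i+1:]
	}
	return key
}

// hasLiveReferrer reports whether any referenced-by entry matches, by name
// only, a WD that is live in the companion's namespace. liveWDs is a
// stand-in for that lookup.
func hasLiveReferrer(referencedBy []string, liveWDs map[string]bool) bool {
	for _, key := range referencedBy {
		if liveWDs[referrerName(key)] {
			return true
		}
	}
	return false
}

func main() {
	refs := []string{"default/mount-pristine-default-dfw"}
	live := map[string]bool{"mount-pristine-default-dfw": true}
	fmt.Println(hasLiveReferrer(refs, live))
}
```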
…r-wide informer OOM
The CompanionGCReconciler previously registered For(&corev1.ConfigMap{}) which
established a cluster-wide ConfigMap informer with no label-scoped cache on the
cell manager. On a cell cluster this caused controller-runtime to sync and hold
every ConfigMap (and, via lazy cache reads, every Secret) in the cluster, pushing
the cell compute-manager well past its 128Mi limit → OOMKilled / CrashLoopBackOff.
Fix: change the For type to WorkloadDeployment. WorkloadDeployments are already
cached on the cell by the sibling WorkloadDeploymentReconciler, so this adds no
new cluster-wide informer. A WD delete event still enqueues the object's namespace
and fires Reconcile, which is the deletion trigger the GC needs.
All companion reads (ConfigMap/Secret List) inside Reconcile go through
cl.GetAPIReader() (uncached). A one-shot List via the API reader does not
establish a persistent informer, so it does not re-introduce the OOM. WD
liveness checks use the cached client because WDs are already in the cell cache.
Reconcile now sweeps the entire WD namespace (listing companions by label via
the uncached reader) rather than reconciling a single named object, and returns
ctrl.Result{} (event-driven only, no per-WD requeue).
The periodic backstop coverage gap is closed by a companionGCBackstop Runnable
(implements mcmanager.Runnable = manager.Runnable + multicluster.Aware). On
Engage it records each cell cluster; on each ticker interval it lists ALL
companions cluster-wide via the uncached APIReader, collects distinct namespaces,
and sends namespace-keyed GenericEvents into a buffered channel wired via
WatchesRawSource(source.TypedChannel). This covers namespaces whose last WD was
deleted before the controller started, which For(WD) would never enqueue. The
steady-state load is bounded to one cross-namespace List per interval regardless
of WD history, and no persistent CM/Secret informer is ever created.
The federator's companionsStillPresent check (workloaddeployment_federator.go)
reads from FederationClient (the Karmada hub client) and is unaffected by this
change.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
e0b8ee7 to
9a064cc
Supersede the cell-side CompanionGCReconciler design with a hub/federation-side approach: tear down the companion's Karmada ResourceBinding at companion-deletion time so the deletion cascades to the edge, plus a hub-side orphan-RB sweep as a self-healing backstop. The cell-side GC fought Karmada's execution controller and could not win against a live Work. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Remove the cell-side CompanionGCReconciler — deleting a Karmada-owned cell
copy directly causes a permanent delete/recreate thrash because the Work
object immediately re-applies the manifest. Hub-side cleanup is the only
correct layer.
Component 5: Delete companion_gc_controller.go and its test, and remove the
--enable-cell-controllers registration block from cmd/main.go.
Component 3: After deleting a hub companion (ref-count reaches zero), the
downstreamCompanionWriter now also deletes the companion's Karmada
ResourceBinding via hubClient. RB name follows the Karmada binding-controller
convention: "{companionName}-{configmap|secret}". Deleting the RB cascades:
binding-controller removes the Work, execution-controller removes the cell
copy permanently. IgnoreNotFound tolerates Karmada beating us to it.
localCompanionWriter (single-cluster dev mode) is a no-op.
Component 4: New OrphanRBReconciler runs on the Karmada hub federation
manager (alongside InstanceProjector). It watches ResourceBindings and
deletes any that are orphaned companion RBs — name ends with "-configmap" or
"-secret" AND propagationpolicy.karmada.io/name starts with "city-" AND the
hub companion no longer exists. WD RBs (suffix "-workloaddeployment") and
non-city-PP RBs are never touched. A periodic sweep Runnable fires every 5
minutes to catch RBs orphaned before the controller started, which will
automatically clean the existing stranded lab RBs on first deployment.
RBAC: add delete to work.karmada.io/resourcebindings in downstream-rbac
(hub ClusterRole). karmada-io/api work/v1alpha2 types added to global
scheme so the federation client can serialize ResourceBinding objects.
Tests: unit tests for Component 3 (RB teardown fires on ConfigMap/Secret
companion deletion, tolerates NotFound, no-op on localCompanionWriter) and
Component 4 (orphan detection, skip live/terminating companions, tight scope
guards against WD RBs and non-city-PP RBs, name pattern parsing).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The cell-side CompanionGCReconciler was removed in favor of hub-side cleanup; update the federator ordering-guard comments and timeout-bypass log to reference the OrphanRBReconciler as the authoritative backstop instead. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
OrphanRBReconciler scoped referenced-data companion ResourceBindings by reading propagationpolicy.karmada.io/name from metadata.labels, but the running Karmada version stores that value in metadata.annotations (labels carry only permanent-id UUIDs). The label read always returned "", so the scope predicate rejected every ResourceBinding and the controller never reclaimed an orphaned binding — confirmed live in the lab, where the stranded cm-pristine/secret-pristine copies were never cleaned. Read from annotations in both isInScope and the watch predicate. The unit-test fixtures previously set the PP name as a label (matching the bug, which is why they passed against broken code); they now use the production annotation shape, so they fail against the label read and pass against the fix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
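A sketch of the scope predicate after this fix, assuming a simplified `resourceBinding` struct in place of the Karmada type. It covers only the name-suffix and policy-name checks; the "hub companion no longer exists" check is omitted, and the `city-dfw` policy name is a made-up example.

```go
package main

import (
	"fmt"
	"strings"
)

const ppNameKey = "propagationpolicy.karmada.io/name"

// resourceBinding is a simplified stand-in for the Karmada ResourceBinding.
type resourceBinding struct {
	Name        string
	Labels      map[string]string
	Annotations map[string]string
}

// isInScope reads the policy name from annotations (not labels) and requires
// a companion-style name suffix.
func isInScope(rb resourceBinding) bool {
	companionSuffix := strings.HasSuffix(rb.Name, "-configmap") ||
		strings.HasSuffix(rb.Name, "-secret")
	return companionSuffix && strings.HasPrefix(rb.Annotations[ppNameKey], "city-")
}

func main() {
	rb := resourceBinding{
		Name:        "app-config-configmap",
		Annotations: map[string]string{ppNameKey: "city-dfw"},
	}
	fmt.Println(isInScope(rb))
}
```

A fixture that sets the policy name only as a label falls out of scope here, which is the shape that previously masked the bug in the unit tests.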
Copies the design documents from the working-tree scratch directory into the branch so this branch is self-contained and reviewable without needing access to untracked files. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a new const block to api/v1alpha/instance_types.go with the reason constants for the top-level readiness conditions (Instance.Ready, WorkloadDeployment.Available, Workload.Available):
- WorkloadReasonNetworkNotFound
- WorkloadDeploymentReasonNetworkProvisioning (replaces "ProvisioningNetwork")
- WorkloadDeploymentReasonInstancesProvisioning (replaces "ProvisioningInstances")
- WorkloadDeploymentReasonStableInstanceFound
- WorkloadDeploymentReasonReferencedDataNotReady (new)
- WorkloadDeploymentReasonQuotaNotGranted (new)
- WorkloadReasonNoAvailablePlacements
- WorkloadReasonNoAvailableDeployments
Reason-string renames (deliberate, approved):
- "ProvisioningNetwork" → "NetworkProvisioning"
- "ProvisioningInstances" → "InstancesProvisioning"
These renames align the emitted strings with the RFC-agreed vocabulary. No client currently consumes these conditions; the rename is safe.
Replaces all inline string literals in workload_controller.go and workloaddeployment_controller.go with the new named constants. No behavior change; logic wiring happens in subsequent commits.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements the evaluate-all-then-pick logic in reconcileInstanceReadyCondition so that the most actionable blocking cause is surfaced on Instance.Ready instead of always collapsing to SchedulingGatesPresent.
Changes:
- reconcileReferencedDataCondition: when the owning WD carries a terminal ReferencedDataReady reason (SourceNotFound, SourceUnauthorized, SourceTooLarge), the Instance inherits the WD's reason+message verbatim. The companion will never arrive for a terminally missing source, so the WD's authoritative resolver verdict supersedes the cell-side "waiting for propagation" message. Zero extra API calls (WD already fetched).
- reconcileInstanceReadyCondition (scheduling-gates branch): evaluates ALL blocking sub-conditions (ReferencedDataReady, network failure) before selecting the winner via instanceBlockingReasonPriority. The previous code short-circuited on the first match, which could hide a higher-priority error behind a lower-priority one.
- isTerminalReferencedDataReason: helper predicate for the three terminal referenced-data reasons.
- instanceBlockingReasonPriority: private priority function implementing RFC §5.4 table. Duplicate of wdBlockingReasonPriority (intentional per RFC: avoids coupling the two controller packages).
Adds unit tests:
- TestReconcileInstanceReadyCondition_ReferencedDataEnrichment
- TestReconcileInstanceReadyCondition_EvaluateAllThenPick
- TestInstanceBlockingReasonPriority
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
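The evaluate-all-then-pick idea can be sketched as below. The numeric ranking is an assumption made for illustration only; the actual ordering comes from the RFC §5.4 table, which is not reproduced here. `blocker`, `priority`, and `pickBlocker` are hypothetical names.

```go
package main

import "fmt"

type blocker struct{ Reason, Message string }

// priority assigns an ASSUMED rank: terminal source errors outrank
// propagation waits, which outrank everything else.
func priority(reason string) int {
	switch reason {
	case "SourceNotFound", "SourceUnauthorized", "SourceTooLarge":
		return 3
	case "AwaitingPropagation":
		return 2
	default:
		return 1
	}
}

// pickBlocker inspects every candidate before choosing, instead of
// returning on the first match. Ties keep the earlier candidate.
func pickBlocker(candidates []blocker) (blocker, bool) {
	var best blocker
	found := false
	for _, c := range candidates {
		if !found || priority(c.Reason) > priority(best.Reason) {
			best, found = c, true
		}
	}
	return best, found
}

func main() {
	winner, ok := pickBlocker([]blocker{
		{Reason: "AwaitingPropagation", Message: "waiting for companions"},
		{Reason: "SourceNotFound", Message: "ConfigMap app-config not found"},
	})
	fmt.Println(winner.Reason, ok)
}
```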
Relocate the runnable hello-go/rust/node/python/ruby/php examples from docs/compute/examples/ to a top-level examples/ directory for discoverability, and fix the README links to the guides accordingly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-PIE
The Go guide and hello-go example were wrong: a plain CGO_ENABLED=0 build is ET_EXEC (non-PIE) and the base:latest app-elfloader rejects it at boot with "ELF executable is not position-independent". -buildmode=pie alone adds an INTERP segment and is also rejected. The working recipe links statically against musl (CGO_ENABLED=1, musl-gcc, -buildmode=pie -extldflags "-static-pie") to produce a static PIE with no interpreter; verified booting on the lab. Update the Dockerfile (with a build-time static-PIE self-check), the guide prose and troubleshooting, and the example README. The --platform=$BUILDPLATFORM stage keeps the Go toolchain native on arm64 hosts to avoid a qemu assembler segfault.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Rename the workload/image/module from hello-datum to hello-go to match the hello-rust/node/python/ruby/php convention, and return "Hello from Datum (Go)" like the other examples' language-tagged responses. Also correct two stale lines in the guide (the verified-against reference and "fully static" -> "static-PIE"). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This was referenced Jun 2, 2026
The Go, Rust, Node.js, Python, PHP, and Ruby deploy guides and their runnable examples/ apps have moved to their own focused PRs (#130-#135). Dropping them here keeps this PR scoped to the ConfigMap/Secret mount feature and shrinks the diff for reviewers. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The status-blocking-reason RFC has been lifted into a product-focused enhancement on main (docs/enhancements/). The remaining plans/ working notes and the superseded referenced-data-edge-cleanup RFC (whose central proposal to remove the cell-side companion GC was not adopted) are dropped -- git history is their durable record. Also restores the configmap-secret-mounts RFC that this branch had deleted, since main keeps design docs as RFCs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The WorkloadDeploymentFederator mirrors the downstream Karmada WorkloadDeployment status onto the project (VCP) WorkloadDeployment, but SetupWithManager only watched the project WD via For(). Nothing watched the downstream WD whose status it mirrors, so when Karmada aggregated new status onto the downstream object the federator was not notified — it only caught up on the next informer resync (~10h default) or an incidental project-WD spec write. This is why a freshly created workload's replica counts stayed empty on the VCP long after its projected Instance had already appeared (the InstanceProjector holds the analogous downstream watch and so propagates immediately). Add a downstream watch using the same cross-plane mechanism the InstanceProjector and unikraft-provider use (milosource cluster source + TypedEnqueueRequestsFromMapFunc). The map function correlates a downstream WD event back to its project WD reconcile request: name is stable across planes, namespace comes from the UpstreamOwnerNamespace label the federator stamps, and the project cluster name is recovered by decoding the UpstreamOwnerClusterName label on the downstream namespace (the exact inverse of the encoding applied in ensureDownstreamNamespace). The federation manager already constructed for the InstanceProjector is reused as the watchable source, so there is no additional manager or informer-cache cost beyond the new WD and Namespace informers. Karmada's own status-aggregation interval (edge cell → downstream WD) remains outside this repo; once Karmada writes the aggregated status, the new watch reacts immediately. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…emory
Two Instance-controller correctness changes:
- Blocking-reason rollup: surface the most specific provider sub-condition (ImageUnavailable, InstanceCrashing, ConfigurationError, Provisioning) and its message onto the Instance Ready condition instead of a generic "Instance has not been programmed", so e.g. an image-pull failure reads as ImageUnavailable with the real message. Adds the reason constants and ranks them in the blocking-reason priority.
- Quota sizing: resolve vCPU/memory for instanceType-sized instances from a new instanceTypeCatalog (datumcloud/d1-standard-2 = 1 vCPU / 2 GiB) so the quota ResourceClaim requests vcpus + memory, not just instance count. Explicit container limits / instance requests still take precedence.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The downstream WorkloadDeployment status watch mapped events to a reconcile request whose ClusterName was the full decoded org/project path (decodeUpstreamClusterName turned the "cluster-<org>_<project>" namespace label into "<org>/<project>"). But the Milo multicluster provider keys project clusters by bare project name only. As a result every project except the org-less "datum-cloud" failed to resolve: mcmanager routed the unmatched name (ultimately the empty string) to the local host cluster, which has no compute CRDs, so Reconcile failed with "no matches for kind WorkloadDeployment" in a hot loop (~2 errors/sec observed on staging). Extract the bare project name (final path segment) so it matches the provider key, and guard the mapping with GetCluster: if the project cluster isn't engaged yet, drop the event instead of enqueuing a request that falls back to the host cluster and errors. Dropping is safe — once the provider engages the cluster, the For watch reconciles it and the next downstream status event maps cleanly. Rename decodeUpstreamClusterName to projectClusterNameFromLabel to reflect that it now returns the provider cluster key, and add the not-engaged drop case to the mapping test. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
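The label-to-provider-key mapping described in this commit might look roughly like the sketch below. It assumes the label value has the form `cluster-<org>_<project>` (or `cluster-<project>` for an org-less project) and that `_` is the only separator; `projectKeyFromLabel`, `mapToCluster`, and the `engaged` set (a stand-in for the GetCluster guard) are hypothetical names, not the repository's own.

```go
package main

import (
	"fmt"
	"strings"
)

// projectKeyFromLabel derives the bare project name from a namespace label
// value. Assumption: "cluster-<org>_<project>" decodes to "<org>/<project>",
// and the provider key is the final path segment.
func projectKeyFromLabel(label string) (string, bool) {
	rest, ok := strings.CutPrefix(label, "cluster-")
	if !ok || rest == "" {
		return "", false
	}
	decoded := strings.ReplaceAll(rest, "_", "/")
	// LastIndex returns -1 when there is no "/", so the slice starts at 0
	// and the whole value is used for org-less projects.
	key := decoded[strings.LastIndex(decoded, "/")+1:]
	return key, key != ""
}

// mapToCluster drops the event (ok=false) when the project cluster is not
// engaged yet, rather than enqueuing a request that cannot be routed.
func mapToCluster(label string, engaged map[string]bool) (string, bool) {
	key, ok := projectKeyFromLabel(label)
	if !ok || !engaged[key] {
		return "", false
	}
	return key, true
}

func main() {
	engaged := map[string]bool{"web": true}
	fmt.Println(mapToCluster("cluster-acme_web", engaged))
	fmt.Println(mapToCluster("cluster-acme_api", engaged))
}
```

Returning `false` for a not-yet-engaged cluster mirrors the "drop instead of enqueue" behavior described above: a later event maps cleanly once the cluster is known.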
The downstream WorkloadDeployment status watch was a complete no-op and the source of a steady ~130 errors/min on the management plane. Two layered causes: milosource.NewClusterSource binds the raw source to the empty cluster name, and the default mchandler.TypedEnqueueRequestsFromMapFunc wraps the map in TypedInjectCluster, which overwrites each request's ClusterName with that bound empty name. So the project cluster name computed by mapDownstreamDeploymentToRequest (and validated by its GetCluster guard) was discarded at enqueue time; every downstream event reached Reconcile with ClusterName="". mcmanager routes the empty name to the local host management cluster, which has no compute CRDs, so the Get failed with "no matches for kind WorkloadDeployment" and requeued in a hot loop — while the watch's actual purpose (immediate status mirror-back) never ran for any project. Switch the handler to TypedEnqueueRequestsFromMapFuncWithClusterPreservation so the map's project cluster name survives to Reconcile, making the downstream watch functional. Add a defensive guard at the top of Reconcile that drops (returns nil, not an error) any request with an empty cluster name, so a host-cluster fallback can never again spin in a requeue loop. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
A small static-PIE Go probe that boots as a Datum compute instance, reads back its mounted ConfigMap/Secret files and injected env vars, and prints each item's sha256 to the console — the tool used to verify that referenced ConfigMaps and Secrets are delivered byte-exact to instances (secret values are hashed and redacted, never printed). Includes the Dockerfile, Kraftfile, and a sample Workload manifest. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tance name An Instance could wedge Pending forever (QuotaGranted=Unknown/QuotaNoBudget, Quota scheduling gate never removed) even though its Milo ResourceClaim was granted: the Instance reconciled once while the claim was still pending, and nothing re-triggered it when the grant landed a beat later. The ResourceClaim watch mapped a claim to its Spec.ResourceRef — the Project — so the grant enqueued the project name, never the owning Instance. Fix the watch to enqueue the owning Instance: its namespace is carried on a new compute.datumapis.com/instance-namespace label (the claim lives in the project quota namespace, not the Instance's), and its name is the claim name with the resource-kind prefix stripped. Also name the claim after the Instance (unique among Instances in the project control plane) with an "instance-" prefix so it cannot collide with other resource kinds' claims sharing the quota namespace, replacing the previous "<namespace>--<name>" scheme. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
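A minimal sketch of the claim naming and reverse mapping described in this commit is shown below. The `instance-` prefix and the `compute.datumapis.com/instance-namespace` label key come from the commit message; the function names and the sample namespace value are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	claimPrefix            = "instance-"
	instanceNamespaceLabel = "compute.datumapis.com/instance-namespace"
)

// claimNameForInstance names the claim after its Instance, prefixed by the
// resource kind so it cannot collide with other kinds' claims.
func claimNameForInstance(instanceName string) string {
	return claimPrefix + instanceName
}

// instanceKeyFromClaim recovers the owning Instance from a claim: the name is
// the claim name with the prefix stripped, the namespace comes from the label.
func instanceKeyFromClaim(claimName string, labels map[string]string) (namespace, name string, ok bool) {
	name, hasPrefix := strings.CutPrefix(claimName, claimPrefix)
	namespace = labels[instanceNamespaceLabel]
	if !hasPrefix || name == "" || namespace == "" {
		return "", "", false
	}
	return namespace, name, true
}

func main() {
	claim := claimNameForInstance("web-0")
	labels := map[string]string{instanceNamespaceLabel: "default"}
	fmt.Println(instanceKeyFromClaim(claim, labels))
}
```

Claims without the prefix or the label are skipped rather than guessed at, so claims belonging to other resource kinds in the shared quota namespace never enqueue an Instance.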
scotwells added a commit that referenced this pull request Jun 3, 2026
Drop the pre-merge dependency framing (the #129 'not yet on main' note) and repoint example references from PR URLs to relative paths (examples/serverless-js-configmap, examples/config-secret-probe), as if the referenced-data delivery and example PRs are merged. Keep the upstream base/base-compat ROM-enablement limitation, which is a Unikraft runtime matter independent of these PRs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… them A template-hash change (an image update, or a restartedAt annotation from `datumctl compute restart`) previously resolved to an in-place Update of the Instance. The unikraft provider bakes the pod at creation time and never recomputes an existing pod's spec, so the in-place update silently failed to roll the running workload — instances kept their old pod. Emit a delete (recreate) for drifted Ready instances instead. The next reconcile refills the slot via the create path with the new template, and the provider's finalizer-gated teardown plus create-on-new-Instance roll the pod with no provider changes. Ordered one-at-a-time pacing is preserved by the existing descending-ordinal sort, skip-all-but-first, and the DeletionTimestamp WaitAction. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
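One way the one-at-a-time recreate pacing could combine is sketched below: wait while any instance is being deleted, otherwise delete only the highest-ordinal drifted Ready instance. The `inst` type, the `nextRollAction` helper, and the string action names are hypothetical simplifications, and the exact interplay of the sort, skip, and wait steps in the controller may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// inst is a hypothetical, trimmed-down view of an Instance.
type inst struct {
	Ordinal      int
	TemplateHash string
	Ready        bool
	Deleting     bool // stands in for a non-nil DeletionTimestamp
}

// nextRollAction returns "wait", "delete", or "none" and the ordinal the
// action applies to (-1 when there is no target).
func nextRollAction(desiredHash string, instances []inst) (string, int) {
	var drifted []inst
	for _, i := range instances {
		if i.Deleting {
			// A teardown is in flight; do not start another one.
			return "wait", i.Ordinal
		}
		if i.Ready && i.TemplateHash != desiredHash {
			drifted = append(drifted, i)
		}
	}
	if len(drifted) == 0 {
		return "none", -1
	}
	// Descending ordinal, then act on the first entry only.
	sort.Slice(drifted, func(a, b int) bool { return drifted[a].Ordinal > drifted[b].Ordinal })
	return "delete", drifted[0].Ordinal
}

func main() {
	instances := []inst{
		{Ordinal: 0, TemplateHash: "old", Ready: true},
		{Ordinal: 1, TemplateHash: "old", Ready: true},
		{Ordinal: 2, TemplateHash: "new", Ready: true},
	}
	fmt.Println(nextRollAction("new", instances))
}
```

After the delete completes, the create path refills the slot with the new template and the next pass picks the next drifted instance.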
…rvedGeneration A restart/rolling update was invisible from the project plane: there was no status field representing how many instances are on the new template revision. Add UpdatedReplicas (instances whose observed template hash matches the desired template, regardless of readiness) and ObservedGeneration to both WorkloadDeployment and Workload (plus placement) status. UpdatedReplicas is computed on the cell WD reconcile alongside CurrentReplicas (which is now its Programmed subset), aggregated up into the Workload, and rides the existing status sync to the project plane. Repoint the "Up-to-date" printcolumn to .status.updatedReplicas to match `kubectl get deployment` semantics, so a roll is visible as the count dips below Replicas and recovers. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
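The relationship between the two counters can be sketched as follows, assuming a simplified `instance` view with a template hash and a Programmed flag (both field names and the `countReplicas` helper are hypothetical). UpdatedReplicas ignores readiness; CurrentReplicas is the Programmed subset of it.

```go
package main

import "fmt"

// instance is a hypothetical, trimmed-down view of an Instance's status.
type instance struct {
	TemplateHash string
	Programmed   bool
}

// countReplicas returns (updated, current): updated counts instances on the
// desired template regardless of readiness, current counts those of them
// that are also Programmed.
func countReplicas(desiredHash string, instances []instance) (updated, current int32) {
	for _, i := range instances {
		if i.TemplateHash != desiredHash {
			continue
		}
		updated++
		if i.Programmed {
			current++
		}
	}
	return updated, current
}

func main() {
	updated, current := countReplicas("new", []instance{
		{TemplateHash: "new", Programmed: true},
		{TemplateHash: "new", Programmed: false},
		{TemplateHash: "old", Programmed: true},
	})
	fmt.Println(updated, current)
}
```

During a roll the updated count dips below the desired replica count and recovers as recreated instances come up on the new template.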
scotwells added a commit that referenced this pull request Jun 4, 2026
Surface rolling-update / restart progress in `datumctl compute workloads` by showing updated/desired replica counts next to ready. UP-TO-DATE counts instances on the latest template revision (status.updatedReplicas), so a roll is visible as the count dips below desired and then recovers. Includes a byte-identical copy of the UpdatedReplicas/ObservedGeneration WorkloadDeployment status fields in api/v1alpha so the plugin can read them. These fields are defined identically on the controller branch (PR #129); the duplicate resolves cleanly once both land on main. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Instance "Running" status condition is renamed to "Available" (wire
value "Available"). An instance can be available while not actively
running a pod (e.g. scaled to zero), so "Running" was misleading as a
serving/health signal.
Renamed constants:
InstanceRunning -> InstanceAvailable ("Available")
InstanceReadyReasonRunning -> InstanceReadyReasonAvailable ("Available")
InstanceRunningReasonRunning -> InstanceAvailableReasonAvailable ("Available")
InstanceRunningReasonStopped -> InstanceAvailableReasonStopped
InstanceRunningReasonStarting -> InstanceAvailableReasonStarting
InstanceRunningReasonStopping -> InstanceAvailableReasonStopping
BREAKING CHANGE: the on-the-wire Instance condition type changes from
"Running" to "Available". Consumers reading conditions[type=="Running"]
must switch to "Available". Existing Instances self-heal on the next
provider reconcile (the provider re-asserts the condition under its new
name); the stale "Running" entry lingers cosmetically until then and is
no longer read by the Ready derivation.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
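For consumers affected by the rename, a lookup keyed on the new condition type might look like the sketch below. The `condition` struct is a hypothetical stand-in for the Kubernetes condition type, and the helper names are illustrative; the point is only that a lingering "Running" entry is ignored.

```go
package main

import "fmt"

// condition is a hypothetical stand-in for a status condition entry.
type condition struct {
	Type   string
	Status string
	Reason string
}

// instanceAvailable is the new on-the-wire condition type.
const instanceAvailable = "Available"

// findCondition returns the first condition with the given type.
func findCondition(conds []condition, condType string) (condition, bool) {
	for _, c := range conds {
		if c.Type == condType {
			return c, true
		}
	}
	return condition{}, false
}

// isAvailable reads only the "Available" condition; a stale "Running"
// entry left over from before the rename has no effect on the result.
func isAvailable(conds []condition) bool {
	c, ok := findCondition(conds, instanceAvailable)
	return ok && c.Status == "True"
}

func main() {
	conds := []condition{
		{Type: "Running", Status: "True", Reason: "Running"}, // stale entry
		{Type: "Available", Status: "False", Reason: "Starting"},
	}
	fmt.Println(isAvailable(conds))
}
```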
…eals
The instance controller is re-queued by a ResourceClaim watch when the claim is granted, but that grant event lives on the project control plane and can be missed (informer engagement races, watch relist gaps), wedging the instance at QuotaGranted!=True indefinitely (observed: claim Granted, instance stuck QuotaNoBudget until a manual reconcile cleared it). The pending-quota path returned no RequeueAfter, so there was no safety net.
Add a backing-off requeue while QuotaGranted is not True, anchored on the condition's last transition:
<60s : 1s (catch a grant landing almost immediately)
60s–5m : 15s
5m–10m : 60s
>=10m : 300s
Folded into the existing referenced-data requeue (soonest wins). The ResourceClaim watch remains the fast path; this only guarantees a missed grant self-heals instead of wedging.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…roof
The pending-quota safety-net requeue was wired only at the tail of Reconcile, so an early return during the pending window (a status-update or upstream-writeback conflict) silently dropped it onto controller-runtime's exponential error-backoff, which can stretch to minutes, leaving an instance wedged at QuotaGranted!=True even though its ResourceClaim was granted (observed: the 2nd instance in a rapid burst consistently wedged).
- Compute the requeue once, up front, so every return path honors it.
- On a Conflict during the pending window, requeue at the bounded quota interval instead of returning the error (which would back off).
- Log the requeue decision (and conflict-driven requeues) so the path is observable: a re-firing requeue prints every pass while pending, a dropped one does not.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… LTT Observability revealed the safety-net requeue was firing every reconcile but always at the slowest tier (300s): elapsed was measured from the QuotaGranted condition's LastTransitionTime, which stays at the 1970-01-01 CRD default while quota is pending (PendingEvaluation and NoBudget are both Unknown, so SetStatusCondition never bumps it). Result: a watch-missed instance waited up to 5 minutes for the safety net instead of ~1s, appearing wedged. Anchor elapsed on instance.CreationTimestamp, which reflects actual wait time, so the fast tiers (1s/15s) apply early as intended. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The instance controller emits Warning events on Instances (QuotaNoBudget, ImageUnavailable, InstanceCrashing, ConfigurationError, NetworkFailedToCreate, …) via the event recorder, but no RBAC rule granted it. Every write was rejected — "events is forbidden: ... cannot create resource events in API group \"\" in the namespace ns-<uid>" — so the user-facing signals explaining why an instance is stuck never reached the Instance (kubectl describe / activity timeline). Reconciliation was unaffected; this is an observability gap. Add the kubebuilder marker and regenerate the role. The regen also syncs a pre-existing work.karmada.io/resourcebindings rule (from an existing marker that wasn't reflected in the committed role). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Lint job had been red on the branch from pre-existing findings, unrelated to the rename/quota work:
- gofmt: re-align a struct in the instance-sizing test.
- goconst: extract the repeated "datumcloud/d1-standard-2", "app", and "test/image:latest" literals into test constants.
Tests and lint both pass locally.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Workloads can now reference ConfigMaps and Secrets and have that data delivered to their instances — in every POP cell the workload runs in — without the user knowing federation exists. Previously a Workload could only set literal environment variables; configuration and credentials had to be baked into images or pasted in as plaintext.
The referenced data is resolved in the trusted management plane and delivered to the edge as derived companion objects, so secret values never enter the Workload or Instance spec and are never projected back to the user.
What this enables
`ConfigMap` (app config) and `Secret` (credentials) by name from the Workload template; the platform makes them available to every instance, in every placed cell.
How it works
A management-plane resolver collects the referenced objects, materializes one labeled companion per source into the project's federation namespace, and records the expected set on the WorkloadDeployment. The federator's PropagationPolicy carries the companions to the same cells as the deployment, and a `ReferencedData` scheduling gate holds each instance until its companions land. Admission verifies the submitting user can read the referenced objects.
The whole path is behind the `enableReferencedDataGate` feature flag (default off), so behavior is unchanged until it's enabled fleet-wide after the cell and provider are deployed.
Deployment requirement: Karmada-hub RBAC
Delivery depends on the management-plane controller being able to read the referenced objects and write companions in the project's federation namespace on the Karmada hub. This requires the `compute-manager` ClusterRole on the hub (`config/base/downstream-rbac/rbac.yaml`) to grant `configmaps`/`secrets` access (added in this PR).
This is an easy gap to miss because it fails silently: the resolver returns `forbidden` on every reconcile, no companions are ever materialized, and mounts simply do nothing on the edge while the Kustomization reports Ready and the controller looks healthy. When promoting to a new environment, confirm this RBAC lands on the hub before enabling the feature.
Test plan
ReferencedDataReady=True)