feat(decofile): cascade-delete via Revision ownerReference#15
Merged
Conversation
The DecofileReconciler now adds the Knative Revision matching a Decofile's deploymentId as an ownerReference (controller=false) on the Decofile itself. When the Revision is later garbage-collected — either by Knative GC (maxNonActiveRevisions) or by the cluster's knative-clean-revisions CronJob — Kubernetes garbage collection cascades through Revision -> Decofile -> ConfigMap, eliminating orphan accumulation. Mechanism: * Reconcile path: after fetching the Decofile, syncRevisionOwnerRefs lists Revisions in the same namespace whose app.deco/deploymentId label matches the Decofile's deploymentId (or metadata.name when spec.deploymentId is empty), and appends any not-yet-present Revision as an ownerReference. Refetch-before-update avoids optimistic concurrency conflicts with the status update that happens later in the same Reconcile. Failures are logged but non-fatal — the operator continues without owner refs rather than blocking ConfigMap creation. * Watch path: Revision Create events enqueue any Decofile in the same namespace whose effective deploymentId matches the new Revision's label, so newly-created Revisions trigger ownerRef sync even when the corresponding Decofile already existed (the common case where the admin/build pipeline creates the Decofile slightly before the KSvc). Edge cases covered by the implementation and the unit tests: * No matching Revision yet: nothing is added; reconcile completes. * Revision in DeletionTimestamp: skipped (don't link to dying owners). * Multiple Revisions with same deploymentId (rollback): both become owners. Kubernetes GC waits for ALL owners to be deleted before reclaiming the Decofile. * Re-running Reconcile is idempotent — existing UIDs are detected and not duplicated. * Explicit spec.deploymentId is respected; decoys matching only the Decofile name are ignored. Defense-in-depth: the controller-cluster CronJob (PR decocms/infra_applications#92, deferred) remains useful as a fallback for legacy Decofiles created before this change, and for the rare case where the Revision is deleted before the operator manages to patch the ownerRef. Those orphans are still detectable by the script in infra_applications. Backfill of the existing fleet (~2276 Decofiles cluster-wide): the new operator will retroactively patch ownerRefs to any Decofile that still has a matching Revision, the next time each one is reconciled. Decofiles whose Revisions are already gone (likely the majority of the backlog) won't gain an owner — those require a one-off cleanup. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hugo-ccabral
previously approved these changes
May 22, 2026
- prealloc: pre-size toAdd slice to len(revs.Items) - unparam: drop unused namespace param from test helpers (always "sites-foo")
hugo-ccabral
approved these changes
May 22, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The
DecofileReconcilernow sets the matching KnativeRevisionas anownerReference(controller=false) on theDecofile. When the Revision is later garbage-collected — either by Knative GC (maxNonActiveRevisions: 1) or by the cluster'sknative-clean-revisionsCronJob — Kubernetes garbage collection cascades:No CronJob, no script, no heuristics. Just declarative Kubernetes GC.
Why
Today, every build produces a new Decofile + new Revision pair. When the Revision is later cleaned up (either by Knative or by
knative-clean-revisions), the Decofile and its ConfigMap are left as orphans. The cluster currently carries ~2276 Decofiles, the majority of which are orphans of long-deleted Revisions.The alternative considered was a periodic cleanup CronJob (https://github.com/decocms/infra_applications/pull/92 — kept open as fallback). It works but accumulates complexity (multi-pass with safety cap, TOCTOU guard, dry-run gates). OwnerReference is one mechanism instead of three, native to the platform.
Mechanism
Reconcile path
After fetching the Decofile,
syncRevisionOwnerRefslists Revisions in the same namespace whoseapp.deco/deploymentIdlabel matches the Decofile's effective deploymentId (the explicitspec.deploymentIdif set, otherwisemetadata.name), and appends any not-yet-present Revision as anownerReference. Refetch-before-update prevents optimistic concurrency conflicts with the later status update.Failures are logged but non-fatal — if Revisions can't be listed for any reason (RBAC drift, API hiccup, transient timeout), the rest of the reconcile (ConfigMap creation, pod notification) continues without the ownerRef. The legacy cleanup fallback handles those rare cases.
Watch path
A new
Watches(&servingv1.Revision{}, ...)with aCreateFunc-only predicate enqueues any Decofile in the same namespace whose effective deploymentId matches a newly-created Revision's label. This covers the common case where the admin/build pipeline creates the Decofile slightly before the KSvc, so the Revision doesn't exist yet at the time the Decofile is reconciled.Update/Delete events on Revisions are intentionally NOT watched — they would only trigger no-op reconciles (existing ownerRef already in place; Delete is handled by K8s GC natively).
Multiple owners (rollback case)
A rollback that re-uses an existing deploymentId creates a new Revision with a different UID but the same label. Both Revisions become owners; the Decofile is only GC'd once both are gone. This is the correct semantics — the Decofile is needed as long as either Revision can scale up.
Tests
internal/controller/decofile_owner_test.go(new file, fake-client based — no envtest extension needed) covers:AddsOwnerWhenRevisionExists— happy pathNoMatchingRevisionDoesNothing— early-life Decofile with no Revision yetIsIdempotent— running 3× produces 1 ownerRefSkipsRevisionBeingDeleted— Revision with DeletionTimestamp is ignoredMultipleRevisionsBecomeMultipleOwners— rollback / canary caseRespectsExplicitDeploymentId—spec.deploymentIdwins overmetadata.namemapRevisionToDecofile— finds Decofile by defaulted ID, by explicit ID, ignores Revisions without the labelAll 9 unit tests pass locally; the existing Ginkgo suite continues to pass (
make test).RBAC
The kubebuilder marker
+kubebuilder:rbac:groups=serving.knative.dev,resources=revisions,verbs=get;list;watchwas added.make generate manifestsregeneratedconfig/rbac/role.yamlandchart/templates/clusterrole-operator-manager-role.yaml(note: therevisionspermission joins the existingservicespermission already granted to the same ClusterRole by the Service webhook).Backfill of existing Decofiles
The new operator will retroactively patch ownerRefs to any Decofile whose Revision is still alive, on its next reconcile. For Decofiles whose Revision has already been GC'd (likely the majority of the cluster's ~2276), no Revision exists to point to — those remain orphans and need a one-off manual cleanup (separate PR / script in
infra_applications).Test plan
decocms/infra_applications/provisioning/deco-operator/main/Chart.yamleks-serverlesskubectl -n sites-abracadabra get decofile mhsygflbgo -o jsonpath='{.metadata.ownerReferences}'Risks
kubectl get decofile -A.BlockOwnerDeletion— Revisions are not blocked from deletion by their Decofile ownership; cascading deletes proceed normally.🤖 Generated with Claude Code
Summary by cubic
Adds cascade deletion for Decofiles by setting the matching Knative Revision as an ownerReference, so when a Revision is GC’d the Decofile and its ConfigMap are removed too. Existing Decofiles get ownerRefs on their next reconcile; run a one-off cleanup for legacy orphans whose Revisions are already gone.
New Features
Bug Fixes
golangci-lint: preallocate slices and remove an unused test helper param; no behavior change.Written for commit 2585560. Summary will update on new commits. Review in cubic