Backup registry auth secret must not be owned by any workspace by akurinnoy · Pull Request #1631 · devfile/devworkspace-operator

akurinnoy · 2026-05-13T11:51:31Z

What does this PR do?

This PR removes the controller ownerReference from the backup registry auth secret so that it is not garbage-collected when a workspace is deleted. It also makes the restore path fall back to copying the secret from the operator namespace when the secret is missing from the workspace namespace.
The PR includes an ADR documenting why the auth secret must not be owned by any workspace.
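The garbage-collection failure mode can be illustrated with a toy, stdlib-only Go model. This is not controller-runtime code. Objects carry a simplified list of owner names, and deleting an object removes every dependent whose owners are all gone, loosely mirroring Kubernetes cascading deletion. The types `object` and `cluster` and the helpers `deleteWithGC`, `buildCluster` and `secretSurvives` are hypothetical. Only the secret name comes from this PR.

```go
package main

import "fmt"

// object is a toy stand-in for a Kubernetes object: a name plus the names
// of the objects listed in its (simplified) ownerReferences.
type object struct {
	name   string
	owners []string
}

// cluster is a toy in-memory object store keyed by name.
type cluster map[string]object

// deleteWithGC deletes name and then repeatedly removes every object that
// has owners but none of them still exist, loosely mirroring cascading
// deletion by the Kubernetes garbage collector.
func (c cluster) deleteWithGC(name string) {
	delete(c, name)
	for changed := true; changed; {
		changed = false
		for n, obj := range c {
			if len(obj.owners) == 0 {
				continue // unowned objects are never garbage-collected
			}
			ownerAlive := false
			for _, o := range obj.owners {
				if _, ok := c[o]; ok {
					ownerAlive = true
					break
				}
			}
			if !ownerAlive {
				delete(c, n)
				changed = true
			}
		}
	}
}

// buildCluster creates two workspaces and the shared auth secret, optionally
// making workspace ws-a the secret's owner (the pre-fix behavior).
func buildCluster(ownedByWorkspace bool) cluster {
	c := cluster{"ws-a": {name: "ws-a"}, "ws-b": {name: "ws-b"}}
	s := object{name: "devworkspace-backup-registry-auth"}
	if ownedByWorkspace {
		s.owners = []string{"ws-a"}
	}
	c[s.name] = s
	return c
}

// secretSurvives reports whether the auth secret still exists after ws-a is deleted.
func secretSurvives(ownedByWorkspace bool) bool {
	c := buildCluster(ownedByWorkspace)
	c.deleteWithGC("ws-a")
	_, ok := c["devworkspace-backup-registry-auth"]
	return ok
}

func main() {
	fmt.Println("owned by ws-a:", secretSurvives(true)) // owned by ws-a: false
	fmt.Println("no owner:", secretSurvives(false))     // no owner: true
}
```

With the ownerReference in place, deleting `ws-a` removes the secret even though `ws-b` still needs it. Without an owner, the secret lives until the namespace itself is deleted.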

What issues does this PR fix or reference?

Fixes https://redhat.atlassian.net/browse/CRW-10760

Is it tested? How?

New unit tests were added. Validated manually on a CRC cluster (DWO 0.40.1, private quay.io registry):

- Backup job creates auth secret without ownerReferences
- Deleting a workspace does not garbage-collect the auth secret
- Restore path copies the secret from operator namespace when missing

PR Checklist

E2E tests pass (when PR is ready, comment /test v8-devworkspace-operator-e2e, v8-che-happy-path to trigger)
- v8-devworkspace-operator-e2e: DevWorkspace e2e test
- v8-che-happy-path: Happy path for verification integration with Che

Summary by CodeRabbit

Bug Fixes
- Fixed the backup list disappearing for the remaining workspaces in a namespace after an individual workspace is deleted when an external registry is used.
Behavior Change
- Backup auth secret is no longer tied to a specific workspace; if missing, the operator will locate and copy it from the operator namespace on demand.
Documentation
- Added an ADR documenting the backup auth secret lifecycle and garbage-collection behavior.

The backup registry auth secret (devworkspace-backup-registry-auth) is a namespace singleton shared by all workspaces. Setting a controller ownerReference to a single workspace caused Kubernetes garbage collection to delete the secret when that workspace was deleted, breaking backup/restore for all remaining workspaces in the namespace.

Remove the SetControllerReference call so the secret persists independently of any workspace lifecycle. The secret is cleaned up naturally when the namespace is deleted.

Assisted-by: Claude Code
Signed-off-by: Oleksii Kurinnyi <okurynny@redhat.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Oleksii Kurinnyi <okurinny@redhat.com>

When the backup registry auth secret is missing from the workspace namespace (e.g. after GC on upgrade), the restore path now resolves the operator namespace via infrastructure.GetNamespace() and copies the secret from there, matching the backup path behavior. Previously the restore path returned nil when the secret was missing, causing restore init containers to fail on private registries.

Assisted-by: Claude Code
Signed-off-by: Oleksii Kurinnyi <okurynny@redhat.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Oleksii Kurinnyi <okurinny@redhat.com>

openshift-ci · 2026-05-13T11:51:38Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: akurinnoy
Once this PR has been reviewed and has the lgtm label, please assign dkwon17 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.


Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai · 2026-05-13T11:55:11Z

📝 Walkthrough


Removes workspace controller ownerReference from the namespace-scoped backup registry auth secret, adds operator-namespace fallback via infrastructure.GetNamespace() for restore when operatorConfigNamespace is empty, and updates/extends tests to validate both behaviors and data preservation.

Changes

Backup Auth Secret Lifecycle

Layer / File(s)	Summary
Architectural Decision & Problem Statement `docs/adr-backup-auth-secret-lifecycle.md`	ADR documents the ownerReference GC issue tying a namespace singleton secret to workspace lifecycle and the decision to stop setting controller ownerReferences in `CopySecret()` while preserving sync semantics and describing restore-on-demand via operator namespace resolution.
Secret Lifecycle Implementation Fix `pkg/secrets/backup.go`	Imports updated, `HandleRegistryAuthSecret` resolves operator namespace via `infrastructure.GetNamespace()` when `operatorConfigNamespace` is empty and returns an error on failure; `CopySecret` no longer calls `controllerutil.SetControllerReference` and retains create + AlreadyExists handling.
Test Coverage for Lifecycle Changes `pkg/secrets/backup_test.go`	Tests import `os` and `infrastructure`; existing copy test updated to expect no `ownerReferences`; new suites validate restore-path fallback to operator namespace when workspace secret is missing and that `CopySecret` creates workspace secret without ownerReferences while preserving data keys and `Type`.
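The copy behavior the new tests check can be modeled with a small in-memory sketch: no ownerReferences, data keys and `Type` preserved, and create with AlreadyExists tolerated. This is not the operator's `CopySecret`. The `secret`, `store` and `copySecret` names are hypothetical. The destination name `devworkspace-backup-registry-auth` is taken from this PR. Returning the existing secret on AlreadyExists is a simplifying assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// workspaceSecretName is the namespace-singleton name used in this PR.
const workspaceSecretName = "devworkspace-backup-registry-auth"

// secret is a toy model of corev1.Secret with only the fields the tests check.
type secret struct {
	Namespace, Name, Type string
	Data                  map[string][]byte
	OwnerReferences       []string
}

var errAlreadyExists = errors.New("already exists")

// store is a toy stand-in for the API server, keyed by "namespace/name".
type store map[string]*secret

func (s store) create(sec *secret) error {
	key := sec.Namespace + "/" + sec.Name
	if _, ok := s[key]; ok {
		return errAlreadyExists
	}
	s[key] = sec
	return nil
}

// copySecret copies src into destNS under workspaceSecretName, preserving
// Type and a deep copy of Data, and deliberately sets no owner references.
func copySecret(s store, src *secret, destNS string) (*secret, error) {
	data := make(map[string][]byte, len(src.Data))
	for k, v := range src.Data {
		data[k] = append([]byte(nil), v...)
	}
	dst := &secret{Namespace: destNS, Name: workspaceSecretName, Type: src.Type, Data: data}
	if err := s.create(dst); err != nil {
		if errors.Is(err, errAlreadyExists) {
			return s[destNS+"/"+workspaceSecretName], nil // treat as success
		}
		return nil, err
	}
	return dst, nil
}

func main() {
	s := store{}
	src := &secret{Namespace: "operator-ns", Name: "quay-backup-auth", Type: "kubernetes.io/dockerconfigjson",
		Data: map[string][]byte{".dockerconfigjson": []byte(`{"auths":{}}`)}}
	got, err := copySecret(s, src, "user-ns")
	fmt.Println(got.Namespace, got.Name, got.Type, len(got.OwnerReferences), err)
}
```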

Sequence Diagram

sequenceDiagram
    participant HandleAuth as HandleRegistryAuthSecret
    participant Infrastructure as infrastructure.GetNamespace()
    participant CopySecret as CopySecret
    participant Client as c.Create()
    participant WorkspaceNS as Workspace Namespace

    HandleAuth->>HandleAuth: operatorConfigNamespace empty?
    alt operatorConfigNamespace is empty
        HandleAuth->>Infrastructure: GetNamespace() resolve operator NS
        Infrastructure-->>HandleAuth: operator namespace
        HandleAuth->>CopySecret: source: operator NS\ndest: workspace NS
    end
    CopySecret->>Client: Create secret (no SetControllerReference)
    Client->>WorkspaceNS: secret created without ownerReferences
    WorkspaceNS-->>CopySecret: result

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

devfile/devworkspace-operator#1618: Also modifies HandleRegistryAuthSecret / CopySecret behavior and related tests for copying the registry auth secret.
devfile/devworkspace-operator#1614: Also adjusts restore-path behavior for registry auth secret lookup and copying when configured names/namespaces differ.

Suggested labels

lgtm, approved

Suggested reviewers

rohanKanojia
dkwon17
ibuziuk

Poem

🐰 A secret unbound, no owner in sight,
From operator's stash it springs into light,
Copied with care, its data held true,
No garbage to sweep when a workspace bids adieu,
Hops of relief—fresh tests say, "Woo-hoo!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically summarizes the main change: removing the workspace controller ownership from the backup registry auth secret to prevent garbage collection.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/secrets/backup_test.go`:
- Around line 279-287: BeforeEach currently calls
os.Setenv(infrastructure.WatchNamespaceEnvVar, operatorNS) without checking the
error and AfterEach unconditionally calls os.Unsetenv; instead, in BeforeEach
capture the prior value with os.LookupEnv, set the env using os.Setenv and
handle any error (fail the test via the test framework), and in AfterEach
restore the original state: if the prior value existed, call
os.Setenv(originalKey, originalVal) and check the error, otherwise call
os.Unsetenv and check the error; reference the BeforeEach/AfterEach blocks and
the use of infrastructure.WatchNamespaceEnvVar and operatorNS to locate where to
add the lookup, error checks, and restoration logic.

In `@pkg/secrets/backup.go`:
- Around line 64-69: The code currently swallows namespace resolution failures
by returning nil, nil when infrastructure.GetNamespace() returns an error;
update the error path in pkg/secrets/backup.go so that when nsErr != nil you
return the error (or a wrapped error) instead of nil, nil, and ensure
operatorConfigNamespace is only set after a successful GetNamespace() call;
reference GetNamespace(), nsErr, and operatorConfigNamespace to locate and fix
the failing branch so restore fails fast with a clear cause.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a4ce1a03-5397-4980-a568-430bf14ef1e2

📥 Commits

Reviewing files that changed from the base of the PR and between 1e949fc and 925f3bb.

📒 Files selected for processing (3)

docs/adr-backup-auth-secret-lifecycle.md
pkg/secrets/backup.go
pkg/secrets/backup_test.go

coderabbitai · 2026-05-13T12:01:25Z

+	BeforeEach(func() {
+		ctx = context.Background()
+		scheme = buildScheme()
+		os.Setenv(infrastructure.WatchNamespaceEnvVar, operatorNS)
+	})
+
+	AfterEach(func() {
+		os.Unsetenv(infrastructure.WatchNamespaceEnvVar)
+	})


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -nP 'os\.(Setenv|Unsetenv)\(' pkg/secrets/backup_test.go
rg -nP 'WatchNamespaceEnvVar|BeforeEach|AfterEach' pkg/secrets/backup_test.go

Repository: devfile/devworkspace-operator

Length of output: 437

🏁 Script executed:

cd pkg/secrets && sed -n '279,290p' backup_test.go

Repository: devfile/devworkspace-operator

Length of output: 448

Handle errors from os.Setenv and os.Unsetenv, and restore prior environment state

Lines 282 and 286 ignore the errors returned by os.Setenv() and os.Unsetenv(), which violates the error-handling requirement. In addition, AfterEach unconditionally unsets the environment variable instead of restoring the value it had before the test, which can break test isolation. Store the original value before the test, restore it in AfterEach, and check and handle any errors from these calls.

🧰 Tools

🪛 golangci-lint (2.12.2)

[error] 282-282: Error return value of os.Setenv is not checked

(errcheck)

[error] 286-286: Error return value of os.Unsetenv is not checked

(errcheck)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/secrets/backup_test.go` around lines 279 - 287, BeforeEach currently calls os.Setenv(infrastructure.WatchNamespaceEnvVar, operatorNS) without checking the error and AfterEach unconditionally calls os.Unsetenv; instead, in BeforeEach capture the prior value with os.LookupEnv, set the env using os.Setenv and handle any error (fail the test via the test framework), and in AfterEach restore the original state: if the prior value existed, call os.Setenv(originalKey, originalVal) and check the error, otherwise call os.Unsetenv and check the error; reference the BeforeEach/AfterEach blocks and the use of infrastructure.WatchNamespaceEnvVar and operatorNS to locate where to add the lookup, error checks, and restoration logic.

Return an error instead of silently returning nil when infrastructure.GetNamespace() fails on the restore path. This makes auth failures visible immediately rather than causing a confusing image pull error later. Also properly save and restore the WATCH_NAMESPACE env var in tests.

Assisted-by: Claude Code
Signed-off-by: Oleksii Kurinnyi <okurynny@redhat.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Oleksii Kurinnyi <okurinny@redhat.com>

Signed-off-by: Oleksii Kurinnyi <okurinny@redhat.com>

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

pkg/secrets/backup_test.go (1)

286-287: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle and assert errors from env mutation calls.

os.Setenv / os.Unsetenv errors are still ignored in setup/teardown, which breaks errcheck and weakens test isolation guarantees.

#!/bin/bash
# Verify unchecked env mutation calls in this test file
rg -n -C2 'os\.(Setenv|Unsetenv)\(' pkg/secrets/backup_test.go

As per coding guidelines, "Don't ignore errors. Always handle or propagate errors explicitly."

Also applies to: 290-294

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/secrets/backup_test.go` around lines 286 - 287, The test currently
ignores errors from os.Setenv/os.Unsetenv (e.g., the call setting
infrastructure.WatchNamespaceEnvVar to operatorNS), which fails errcheck; update
the test to either use t.Setenv(...) (preferred) or check the returned error and
call t.Fatalf/require.NoError to fail the test on failure, and do the same for
the corresponding Unsetenv calls (and other occurrences around the same block at
the 290-294 region). Ensure you reference the environment variable symbol
infrastructure.WatchNamespaceEnvVar and the operatorNS value when updating the
setup/teardown so errors are handled/asserted.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/secrets/backup_test.go`:
- Around line 120-129: The test is flaky because it assumes WATCH_NAMESPACE is
unset and real infrastructure detection; make it deterministic by explicitly
setting the env var to an empty string (or saving and restoring it) within the
test and by initializing test infrastructure via
infrastructure.InitializeForTesting() so the test does not consult real
environment/infrastructure; update the spec around the call to
secrets.HandleRegistryAuthSecret (and helper calls makeWorkspace/makeConfig if
needed) to call infrastructure.InitializeForTesting() at start and ensure
WATCH_NAMESPACE is explicitly cleared/controlled for the duration of the test,
restoring prior state afterwards.

In `@pkg/secrets/backup.go`:
- Around line 63-69: The code resolves operatorConfigNamespace unconditionally
which causes failures even when no auth is needed; change the logic so
infrastructure.GetNamespace() is only called when AuthSecret is non-empty: wrap
the operatorConfigNamespace resolution inside the branch that checks
cfg.AuthSecret (or AuthSecret variable) and only attempt to resolve/set
operatorConfigNamespace when AuthSecret != ""; apply the same change for the
later block that currently resolves namespace (the code around
operatorConfigNamespace and infrastructure.GetNamespace) so anonymous (no-auth)
flows skip namespace resolution entirely.

---

Duplicate comments:
In `@pkg/secrets/backup_test.go`:
- Around line 286-287: The test currently ignores errors from
os.Setenv/os.Unsetenv (e.g., the call setting
infrastructure.WatchNamespaceEnvVar to operatorNS), which fails errcheck; update
the test to either use t.Setenv(...) (preferred) or check the returned error and
call t.Fatalf/require.NoError to fail the test on failure, and do the same for
the corresponding Unsetenv calls (and other occurrences around the same block at
the 290-294 region). Ensure you reference the environment variable symbol
infrastructure.WatchNamespaceEnvVar and the operatorNS value when updating the
setup/teardown so errors are handled/asserted.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d993c0c2-89ec-4642-997c-d9ed96e27aa5

📥 Commits

Reviewing files that changed from the base of the PR and between 925f3bb and 7189cc3.

📒 Files selected for processing (3)

docs/adr-backup-auth-secret-lifecycle.md
pkg/secrets/backup.go
pkg/secrets/backup_test.go

✅ Files skipped from review due to trivial changes (1)

docs/adr-backup-auth-secret-lifecycle.md

coderabbitai · 2026-05-13T14:59:04Z

+	It("returns error when secret is missing and operator namespace cannot be resolved", func() {
+		By("using a fake client with no secrets and no WATCH_NAMESPACE set")
 		fakeClient := fake.NewClientBuilder().WithScheme(scheme).Build()
 		workspace := makeWorkspace(workspaceNS)
 		config := makeConfig("quay-backup-auth")

 		result, err := secrets.HandleRegistryAuthSecret(ctx, fakeClient, workspace, config, "", scheme, log)
-		Expect(err).NotTo(HaveOccurred())
+		Expect(err).To(HaveOccurred())
+		Expect(err.Error()).To(ContainSubstring("cannot resolve operator namespace"))
 		Expect(result).To(BeNil())


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make this failure-path test independent of ambient WATCH_NAMESPACE.

This spec assumes the env var is unset but does not enforce it locally, so it can become environment-dependent and flaky. Explicitly control env state for this case.

As per coding guidelines, "In test code, use 'infrastructure.InitializeForTesting()' to mock infrastructure type instead of relying on actual infrastructure detection".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/secrets/backup_test.go` around lines 120 - 129, The test is flaky because it assumes WATCH_NAMESPACE is unset and real infrastructure detection; make it deterministic by explicitly setting the env var to an empty string (or saving and restoring it) within the test and by initializing test infrastructure via infrastructure.InitializeForTesting() so the test does not consult real environment/infrastructure; update the spec around the call to secrets.HandleRegistryAuthSecret (and helper calls makeWorkspace/makeConfig if needed) to call infrastructure.InitializeForTesting() at start and ensure WATCH_NAMESPACE is explicitly cleared/controlled for the duration of the test, restoring prior state afterwards.

coderabbitai · 2026-05-13T14:59:04Z

 	if operatorConfigNamespace == "" {
-		return nil, nil
+		resolvedNS, nsErr := infrastructure.GetNamespace()
+		if nsErr != nil {
+			return nil, fmt.Errorf("cannot resolve operator namespace to copy registry auth secret: %w", nsErr)
+		}
+		operatorConfigNamespace = resolvedNS
 	}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid resolving operator namespace before confirming auth is required.

If AuthSecret is empty, the function should proceed anonymously, but it currently tries to resolve operator namespace first and can fail early with an unrelated error. Move namespace resolution to only run when AuthSecret is non-empty.

Also applies to: 72-79

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/secrets/backup.go` around lines 63 - 69, The code resolves operatorConfigNamespace unconditionally which causes failures even when no auth is needed; change the logic so infrastructure.GetNamespace() is only called when AuthSecret is non-empty: wrap the operatorConfigNamespace resolution inside the branch that checks cfg.AuthSecret (or AuthSecret variable) and only attempt to resolve/set operatorConfigNamespace when AuthSecret != ""; apply the same change for the later block that currently resolves namespace (the code around operatorConfigNamespace and infrastructure.GetNamespace) so anonymous (no-auth) flows skip namespace resolution entirely.

akurinnoy and others added 2 commits May 13, 2026 14:32

akurinnoy self-assigned this May 13, 2026

akurinnoy requested review from btjd, dkwon17, ibuziuk and rohanKanojia as code owners May 13, 2026 11:51

coderabbitai Bot reviewed May 13, 2026


akurinnoy and others added 2 commits May 13, 2026 17:18

fixup! fix: do not set ownerReference on backup registry auth secret

7189cc3

Signed-off-by: Oleksii Kurinnyi <okurinny@redhat.com>

akurinnoy force-pushed the fix/CRW-10760 branch from da631e3 to 7189cc3 Compare May 13, 2026 14:46

coderabbitai Bot reviewed May 13, 2026

akurinnoy wants to merge 4 commits into devfile:main from akurinnoy:fix/CRW-10760
