Backup registry auth secret must not be owned by any workspace#1631
Backup registry auth secret must not be owned by any workspace#1631akurinnoy wants to merge 4 commits into
Conversation
The backup registry auth secret (devworkspace-backup-registry-auth) is a namespace singleton shared by all workspaces. Setting a controller ownerReference to a single workspace caused Kubernetes garbage collection to delete the secret when that workspace was deleted, breaking backup/restore for all remaining workspaces in the namespace. Remove the SetControllerReference call so the secret persists independently of any workspace lifecycle. The secret is cleaned up naturally when the namespace is deleted. Assisted-by: Claude Code Signed-off-by: Oleksii Kurinnyi <okurynny@redhat.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Oleksii Kurinnyi <okurinny@redhat.com>
When the backup registry auth secret is missing from the workspace namespace (e.g. after GC on upgrade), the restore path now resolves the operator namespace via infrastructure.GetNamespace() and copies the secret from there, matching the backup path behavior. Previously the restore path returned nil when the secret was missing, causing restore init containers to fail on private registries. Assisted-by: Claude Code Signed-off-by: Oleksii Kurinnyi <okurynny@redhat.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Oleksii Kurinnyi <okurinny@redhat.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: akurinnoy The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
📝 WalkthroughWalkthroughRemoves workspace controller ownerReference from the namespace-scoped backup registry auth secret, adds operator-namespace fallback via infrastructure.GetNamespace() for restore when operatorConfigNamespace is empty, and updates/extends tests to validate both behaviors and data preservation. ChangesBackup Auth Secret Lifecycle
Sequence DiagramsequenceDiagram
participant HandleAuth as HandleRegistryAuthSecret
participant Infrastructure as infrastructure.GetNamespace()
participant CopySecret as CopySecret
participant Client as c.Create()
participant WorkspaceNS as Workspace Namespace
HandleAuth->>HandleAuth: operatorConfigNamespace empty?
alt operatorConfigNamespace is empty
HandleAuth->>Infrastructure: GetNamespace() resolve operator NS
Infrastructure-->>HandleAuth: operator namespace
HandleAuth->>CopySecret: source: operator NS\ndest: workspace NS
end
CopySecret->>Client: Create secret (no SetControllerReference)
Client->>WorkspaceNS: secret created without ownerReferences
WorkspaceNS-->>CopySecret: result
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/secrets/backup_test.go`:
- Around line 279-287: BeforeEach currently calls
os.Setenv(infrastructure.WatchNamespaceEnvVar, operatorNS) without checking the
error and AfterEach unconditionally calls os.Unsetenv; instead, in BeforeEach
capture the prior value with os.LookupEnv, set the env using os.Setenv and
handle any error (fail the test via the test framework), and in AfterEach
restore the original state: if the prior value existed, call
os.Setenv(originalKey, originalVal) and check the error, otherwise call
os.Unsetenv and check the error; reference the BeforeEach/AfterEach blocks and
the use of infrastructure.WatchNamespaceEnvVar and operatorNS to locate where to
add the lookup, error checks, and restoration logic.
In `@pkg/secrets/backup.go`:
- Around line 64-69: The code currently swallows namespace resolution failures
by returning nil, nil when infrastructure.GetNamespace() returns an error;
update the error path in pkg/secrets/backup.go so that when nsErr != nil you
return the error (or a wrapped error) instead of nil, nil, and ensure
operatorConfigNamespace is only set after a successful GetNamespace() call;
reference GetNamespace(), nsErr, and operatorConfigNamespace to locate and fix
the failing branch so restore fails fast with a clear cause.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a4ce1a03-5397-4980-a568-430bf14ef1e2
📒 Files selected for processing (3)
docs/adr-backup-auth-secret-lifecycle.mdpkg/secrets/backup.gopkg/secrets/backup_test.go
| BeforeEach(func() { | ||
| ctx = context.Background() | ||
| scheme = buildScheme() | ||
| os.Setenv(infrastructure.WatchNamespaceEnvVar, operatorNS) | ||
| }) | ||
|
|
||
| AfterEach(func() { | ||
| os.Unsetenv(infrastructure.WatchNamespaceEnvVar) | ||
| }) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
rg -nP 'os\.(Setenv|Unsetenv)\(' pkg/secrets/backup_test.go
rg -nP 'WatchNamespaceEnvVar|BeforeEach|AfterEach' pkg/secrets/backup_test.goRepository: devfile/devworkspace-operator
Length of output: 437
🏁 Script executed:
cd pkg/secrets && sed -n '279,290p' backup_test.goRepository: devfile/devworkspace-operator
Length of output: 448
Handle errors from os.Setenv and os.Unsetenv, and restore prior environment state
Lines 282 and 286 ignore errors returned by os.Setenv() and os.Unsetenv(). Additionally, the AfterEach unconditionally unsets the environment variable instead of restoring its original value before the test, which violates the error-handling requirement and can cause test isolation issues. Store the original value before the test and restore it in AfterEach, or explicitly check and handle any errors.
🧰 Tools
🪛 golangci-lint (2.12.2)
[error] 282-282: Error return value of os.Setenv is not checked
(errcheck)
[error] 286-286: Error return value of os.Unsetenv is not checked
(errcheck)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/secrets/backup_test.go` around lines 279 - 287, BeforeEach currently
calls os.Setenv(infrastructure.WatchNamespaceEnvVar, operatorNS) without
checking the error and AfterEach unconditionally calls os.Unsetenv; instead, in
BeforeEach capture the prior value with os.LookupEnv, set the env using
os.Setenv and handle any error (fail the test via the test framework), and in
AfterEach restore the original state: if the prior value existed, call
os.Setenv(originalKey, originalVal) and check the error, otherwise call
os.Unsetenv and check the error; reference the BeforeEach/AfterEach blocks and
the use of infrastructure.WatchNamespaceEnvVar and operatorNS to locate where to
add the lookup, error checks, and restoration logic.
Return an error instead of silently returning nil when infrastructure.GetNamespace() fails on the restore path. This makes auth failures visible immediately rather than causing a confusing image pull error later. Also properly save and restore the WATCH_NAMESPACE env var in tests. Assisted-by: Claude Code Signed-off-by: Oleksii Kurinnyi <okurynny@redhat.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Oleksii Kurinnyi <okurinny@redhat.com>
Signed-off-by: Oleksii Kurinnyi <okurinny@redhat.com>
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
pkg/secrets/backup_test.go (1)
286-287:⚠️ Potential issue | 🟠 Major | ⚡ Quick winHandle and assert errors from env mutation calls.
os.Setenv/os.Unsetenverrors are still ignored in setup/teardown, which breaks errcheck and weakens test isolation guarantees.#!/bin/bash # Verify unchecked env mutation calls in this test file rg -n -C2 'os\.(Setenv|Unsetenv)\(' pkg/secrets/backup_test.goAs per coding guidelines, "Don't ignore errors. Always handle or propagate errors explicitly."
Also applies to: 290-294
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/secrets/backup_test.go` around lines 286 - 287, The test currently ignores errors from os.Setenv/os.Unsetenv (e.g., the call setting infrastructure.WatchNamespaceEnvVar to operatorNS), which fails errcheck; update the test to either use t.Setenv(...) (preferred) or check the returned error and call t.Fatalf/require.NoError to fail the test on failure, and do the same for the corresponding Unsetenv calls (and other occurrences around the same block at the 290-294 region). Ensure you reference the environment variable symbol infrastructure.WatchNamespaceEnvVar and the operatorNS value when updating the setup/teardown so errors are handled/asserted.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/secrets/backup_test.go`:
- Around line 120-129: The test is flaky because it assumes WATCH_NAMESPACE is
unset and real infrastructure detection; make it deterministic by explicitly
setting the env var to an empty string (or saving and restoring it) within the
test and by initializing test infrastructure via
infrastructure.InitializeForTesting() so the test does not consult real
environment/infrastructure; update the spec around the call to
secrets.HandleRegistryAuthSecret (and helper calls makeWorkspace/makeConfig if
needed) to call infrastructure.InitializeForTesting() at start and ensure
WATCH_NAMESPACE is explicitly cleared/controlled for the duration of the test,
restoring prior state afterwards.
In `@pkg/secrets/backup.go`:
- Around line 63-69: The code resolves operatorConfigNamespace unconditionally
which causes failures even when no auth is needed; change the logic so
infrastructure.GetNamespace() is only called when AuthSecret is non-empty: wrap
the operatorConfigNamespace resolution inside the branch that checks
cfg.AuthSecret (or AuthSecret variable) and only attempt to resolve/set
operatorConfigNamespace when AuthSecret != ""; apply the same change for the
later block that currently resolves namespace (the code around
operatorConfigNamespace and infrastructure.GetNamespace) so anonymous (no-auth)
flows skip namespace resolution entirely.
---
Duplicate comments:
In `@pkg/secrets/backup_test.go`:
- Around line 286-287: The test currently ignores errors from
os.Setenv/os.Unsetenv (e.g., the call setting
infrastructure.WatchNamespaceEnvVar to operatorNS), which fails errcheck; update
the test to either use t.Setenv(...) (preferred) or check the returned error and
call t.Fatalf/require.NoError to fail the test on failure, and do the same for
the corresponding Unsetenv calls (and other occurrences around the same block at
the 290-294 region). Ensure you reference the environment variable symbol
infrastructure.WatchNamespaceEnvVar and the operatorNS value when updating the
setup/teardown so errors are handled/asserted.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d993c0c2-89ec-4642-997c-d9ed96e27aa5
📒 Files selected for processing (3)
docs/adr-backup-auth-secret-lifecycle.mdpkg/secrets/backup.gopkg/secrets/backup_test.go
✅ Files skipped from review due to trivial changes (1)
- docs/adr-backup-auth-secret-lifecycle.md
| It("returns error when secret is missing and operator namespace cannot be resolved", func() { | ||
| By("using a fake client with no secrets and no WATCH_NAMESPACE set") | ||
| fakeClient := fake.NewClientBuilder().WithScheme(scheme).Build() | ||
| workspace := makeWorkspace(workspaceNS) | ||
| config := makeConfig("quay-backup-auth") | ||
|
|
||
| result, err := secrets.HandleRegistryAuthSecret(ctx, fakeClient, workspace, config, "", scheme, log) | ||
| Expect(err).NotTo(HaveOccurred()) | ||
| Expect(err).To(HaveOccurred()) | ||
| Expect(err.Error()).To(ContainSubstring("cannot resolve operator namespace")) | ||
| Expect(result).To(BeNil()) |
There was a problem hiding this comment.
Make this failure-path test independent of ambient WATCH_NAMESPACE.
This spec assumes the env var is unset but does not enforce it locally, so it can become environment-dependent and flaky. Explicitly control env state for this case.
As per coding guidelines, "In test code, use 'infrastructure.InitializeForTesting()' to mock infrastructure type instead of relying on actual infrastructure detection".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/secrets/backup_test.go` around lines 120 - 129, The test is flaky because
it assumes WATCH_NAMESPACE is unset and real infrastructure detection; make it
deterministic by explicitly setting the env var to an empty string (or saving
and restoring it) within the test and by initializing test infrastructure via
infrastructure.InitializeForTesting() so the test does not consult real
environment/infrastructure; update the spec around the call to
secrets.HandleRegistryAuthSecret (and helper calls makeWorkspace/makeConfig if
needed) to call infrastructure.InitializeForTesting() at start and ensure
WATCH_NAMESPACE is explicitly cleared/controlled for the duration of the test,
restoring prior state afterwards.
| if operatorConfigNamespace == "" { | ||
| return nil, nil | ||
| resolvedNS, nsErr := infrastructure.GetNamespace() | ||
| if nsErr != nil { | ||
| return nil, fmt.Errorf("cannot resolve operator namespace to copy registry auth secret: %w", nsErr) | ||
| } | ||
| operatorConfigNamespace = resolvedNS | ||
| } |
There was a problem hiding this comment.
Avoid resolving operator namespace before confirming auth is required.
If AuthSecret is empty, the function should proceed anonymously, but it currently tries to resolve operator namespace first and can fail early with an unrelated error. Move namespace resolution to only run when AuthSecret is non-empty.
Also applies to: 72-79
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/secrets/backup.go` around lines 63 - 69, The code resolves
operatorConfigNamespace unconditionally which causes failures even when no auth
is needed; change the logic so infrastructure.GetNamespace() is only called when
AuthSecret is non-empty: wrap the operatorConfigNamespace resolution inside the
branch that checks cfg.AuthSecret (or AuthSecret variable) and only attempt to
resolve/set operatorConfigNamespace when AuthSecret != ""; apply the same change
for the later block that currently resolves namespace (the code around
operatorConfigNamespace and infrastructure.GetNamespace) so anonymous (no-auth)
flows skip namespace resolution entirely.
What does this PR do?
This PR removes the controller ownerReference from the backup registry auth secret so it is not garbage-collected when a workspace is deleted. Also makes the restore path fall back to copying the secret from the operator namespace when it is missing in the workspace namespace.
The PR includes an ADR documenting why the auth secret must not be owned by any workspace.
What issues does this PR fix or reference?
Fixes https://redhat.atlassian.net/browse/CRW-10760
Is it tested? How?
New unit tests added. Validated manually on CRC cluster (DWO 0.40.1, quay.io private registry):
PR Checklist
/test v8-devworkspace-operator-e2e, v8-che-happy-pathto trigger)v8-devworkspace-operator-e2e: DevWorkspace e2e testv8-che-happy-path: Happy path for verification integration with CheSummary by CodeRabbit
Bug Fixes
Behavior Change
Documentation