Skip to content

fix(backend): eliminate race condition in feature flag override writes#839

Merged
Gkrumbach07 merged 1 commit intomainfrom
fix/feature-flag-override-race-condition
Mar 11, 2026
Merged

fix(backend): eliminate race condition in feature flag override writes#839
Gkrumbach07 merged 1 commit intomainfrom
fix/feature-flag-override-race-condition

Conversation

@maskarb
Copy link
Copy Markdown
Contributor

@maskarb maskarb commented Mar 6, 2026

Summary

  • Replace read-modify-write pattern with merge patches for ConfigMap updates, eliminating a race condition that caused 500 errors during batch saves
  • setFlagOverride: now uses MergePatchType to atomically set a single key; falls back to Create + retry if the ConfigMap doesn't exist yet
  • DeleteFeatureFlagOverride: now uses MergePatchType with null value to atomically remove a key
  • Permission checks changed from "update"/"create" to "patch" to match the new write method

Problem

When workspace admins saved multiple feature flag overrides via the batch save UI, the frontend sent parallel PUT requests. The old setFlagOverride used a GET → modify → Update pattern. All concurrent requests read the same resourceVersion, and every request after the first failed with a Kubernetes 409 Conflict (surfaced as HTTP 500).

Test plan

  • Existing Ginkgo feature flag admin tests pass (3 previously-failing create-from-scratch tests now pass)
  • Manual: toggle 3+ flags in Workspace Settings and click Save — all should succeed

Fixes: RHOAIENG-52262

🤖 Generated with Claude Code

Replace read-modify-write pattern with merge patches for ConfigMap
updates in setFlagOverride and DeleteFeatureFlagOverride. The old
pattern (GET → modify → Update) caused 500 errors when the frontend
batch-saved multiple flag overrides in parallel, because concurrent
requests would read the same resourceVersion and the second Update
would fail with a conflict.

Now uses MergePatchType which atomically sets/removes a single key
without reading the full ConfigMap first. Concurrent patches to
different keys never conflict. Permission checks changed from
"update"/"create" to "patch" to match the new write method.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 6, 2026

Claude Code Review - PR 839

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 6, 2026

Claude Code Review

Summary

This PR replaces the error-prone read-modify-write pattern in setFlagOverride and DeleteFeatureFlagOverride with atomic JSON Merge Patches, eliminating the 409 Conflict race condition that caused HTTP 500s during concurrent batch flag saves. The approach is correct and all error-handling paths are covered. RBAC manifests already grant the patch verb for project admins so the permission verb change is safe.

Issues by Severity

Blocker Issues: None

Critical Issues: None

Major Issues: None

Minor Issues:

M1 - Use encoding/json instead of fmt.Sprintf to build patch bodies

File: components/backend/handlers/featureflags_admin.go (both setFlagOverride and DeleteFeatureFlagOverride)

Go's %q verb uses non-JSON hex notation for invalid UTF-8 sequences. Flag names and bool-string values never contain invalid UTF-8 in practice so this is not a production risk, but json.Marshal is the robust alternative that guarantees valid JSON output. Ref: error-handling.md -- prefer type-safe operations over manual string construction.

M2 - Concurrent-create fallback is correct but partially documented

File: components/backend/handlers/featureflags_admin.go, inner IsAlreadyExists branch

The logic is correct: non-AlreadyExists create errors propagate to the outer error guard. A one-line comment to that effect would help the next reader verify the error flow without tracing every path manually.

Positive Highlights

  • Race condition is correctly eliminated: MergePatchType is applied atomically by the API server so concurrent requests touching different keys in the same ConfigMap never conflict.
  • All six execution paths through the nested if blocks (patch ok / 404->create ok / 404->create 409->retry ok / retry fail / create non-409 fail / non-404 patch fail) are correctly handled by the outer error guard.
  • Auth pattern is correct: reqK8s (user token) for the RBAC check, K8sClient (service account) for the actual write -- the validate-then-escalate pattern from k8s-client-usage.md.
  • Permission verb now matches the operation: checking patch is consistent with what is performed, and ambient-project-admin-clusterrole already grants patch on ConfigMaps -- no manifest changes needed.
  • Error messages follow project standards: generic message to user, full context in server logs -- matches error-handling.md.

Recommendations

  1. (Minor) Replace fmt.Sprintf patch-body construction with json.Marshal to eliminate the theoretical invalid-UTF-8 edge case.
  2. (Cosmetic) Add a note in the inner IsAlreadyExists block that non-AlreadyExists create errors propagate to the outer guard.

Overall: clean, well-scoped fix. Safe to merge after considering item 1 above.

@ambient-code ambient-code bot added this to the Merge Queue milestone Mar 10, 2026
@Gkrumbach07 Gkrumbach07 merged commit 3f8f78b into main Mar 11, 2026
29 checks passed
@Gkrumbach07 Gkrumbach07 deleted the fix/feature-flag-override-race-condition branch March 11, 2026 20:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants