test(jobs): assert reconciler periodic-jobs bind to the reconcile queue#3
Merged
Merged
Conversation
The previous TestQueueReconcileConst only guarded the const string
"reconcile" — but a typo could still route the periodic-job closures
to a different queue if the InsertOpts struct literal was edited in
isolation.
This PR extracts reconcileInsertOpts() — a single helper that returns
the InsertOpts every reconciler closure must use. Both periodic-job
builders (custom-domain, deploy-status) now call the helper, and a
new test exercises it directly:
TestReconcileInsertOpts_BindsToReconcileQueue verifies:
- returns non-nil (a nil opts falls back to QueueDefault, which
is exactly the starvation bug fix/reconcile-queue prevented)
- Queue equals queueReconcile
- Queue resolves to the literal "reconcile" (matches the
QueueConfig key in the river.Config)
Closure code path is identical to before — production behavior
unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mastermanas805
added a commit
that referenced
this pull request
May 20, 2026
PROBLEM. The prod orphan instant-deploy-04dc0b31 (2026-05-14) sat in
ImagePullBackOff for 9h+. PASS 3 left it alone because the deployments
row was status='deploying', not 'deleted', and its team was active.
deploy_status_reconcile flips a row to 'failed' only when k8s reports
DeploymentReplicaFailure=True — a stuck pod never trips that.
PASS 3 ENHANCEMENTS
- Per-namespace reason labels (team_tombstoned, no_db_row,
failed_old_deployment) drive new Prometheus metric
instant_orphan_sweep_reaped_total{reason}.
- no_db_row now applies a 1h grace via GetNamespaceAge to avoid racing
with in-flight provisions.
- failed_old_deployment reaps instant-deploy-* whose row is status='failed'
AND created_at > 6h ago (autopsy stays in deployment_events).
- Proposed-reap structured log lands BEFORE every delete with full
evidence (constraint #3: operator must see what is about to happen).
PASS 6 (NEW) — STUCK-BUILD DETECTION
Catches deployments stuck in 'building'/'deploying' for >30min whose
only pod is in ImagePullBackOff/ErrImagePull/CrashLoopBackOff.
Flips the row to 'failed' + sets error_message. The autopsy is captured
by the next deploy_status_reconcile tick (one source of truth).
SAFETY
- Whitelist on the three prefixes (instant-deploy-*, instant-customer-*,
instant-stack-*) enforced by the ListNamespacesWithPrefix seam.
- All grace thresholds (1h/6h/30min) conservative on purpose.
- Per-namespace fail-open posture matches the existing passes.
- Pure classifier extracted (classifyDeployOrphan) for direct table-driven testing.
TESTS
- TestOrphanSweep_NamespaceWithoutDBRow_ReapsAfterGrace
- TestOrphanSweep_FailedDeployment_ReapedAfter6h
- TestOrphanSweep_Pass6_StuckBuild_FlipsToFailed
- TestOrphanSweep_Pass6_RunningPod_DoesNotFlip
- TestOrphanSweep_PrefixWhitelist_RefusesUnknownNamespace
- TestOrphanSweep_ClassifyDeployOrphan_TableDriven (10 row shapes)
- TestOrphanSweep_StuckBuildWaitingReasons_Registry (registry-iterating
per CLAUDE.md rule 18)
METRICS
- instant_orphan_sweep_reaped_total{reason} — PASS 3/4/5/6 reaps
- instant_orphan_sweep_reap_failed_total{reason} — k8s/DB failures
Companion infra PR adds the Prom alerts (no_db_row > 0 over 1h → P0).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
`TestQueueReconcileConst` guarded the const string but not the actual InsertOpts binding. A typo in the periodic-job closures could route reconcilers back to the default queue and `TestQueueReconcileConst` would still pass.
Extracts `reconcileInsertOpts()` as the single call site every reconciler closure uses, and adds a test that exercises it directly.
Test plan
🤖 Generated with Claude Code