
fix(operator): defer workflow reconciliation until runner is ready #1487

Merged
mergify[bot] merged 2 commits into main from fix/workflow-reconciliation-retry on Apr 30, 2026

@ambient-code ambient-code Bot commented Apr 30, 2026

Summary

  • Remove premature workflow reconciliation from the Pending phase handler — reconcileActiveWorkflowWithPatch() was called before the runner pod existed, causing DNS failures that were silently discarded via _ =
  • Add workflow retry in Running phase — new NeedsWorkflowReconciliation() check detects WorkflowReconciled != True and triggers ReconcileWorkflow() with 5-second backoff requeue
  • Fix swallowed errors — replace _ = statusPatch.Apply() with proper error logging in ReconcileSpecChanges, and add requeue on spec reconciliation failure

Root Cause

Three-part failure chain (as described in #1486):

  1. reconcileActiveWorkflowWithPatch() attempted HTTP POST to runner before pod existed → DNS error
  2. Error silently discarded (_ = reconcileActiveWorkflowWithPatch(...))
  3. observedGeneration set despite failure, preventing Running phase from retrying

Changes

| File | What |
| --- | --- |
| handlers/sessions.go | Remove reconcileActiveWorkflowWithPatch() call from Pending phase; log repo reconciliation errors |
| handlers/reconciler.go | Add ReconcileWorkflow() for targeted workflow retry; fix _ = statusPatch.Apply() |
| handlers/helpers.go | Add NeedsWorkflowReconciliation() to detect unreconciled workflows |
| handlers/helpers_test.go | 6 test cases for NeedsWorkflowReconciliation |
| controller/reconcile_phases.go | Add workflow retry check in reconcileRunning(); requeue on ReconcileSpecChanges failure |

Test plan

  • go build ./... passes
  • go vet ./... passes
  • TestNeedsWorkflowReconciliation — 6 cases covering: no workflow, empty gitUrl, no conditions, True condition, False condition, missing condition
  • Pre-existing handler tests still pass (registry tests fail due to missing config file — pre-existing)
  • Deploy to dev cluster and create session with activeWorkflow — verify WorkflowReconciled becomes True
  • Verify sessions without workflows are unaffected
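For reference, the six cases listed for TestNeedsWorkflowReconciliation could look something like the sketch below. The types, the lowercase `needsWorkflowReconciliation` helper, and the expected values are assumptions inferred from the description ("detects WorkflowReconciled != True"), not the code in handlers/helpers.go.

```go
package main

import "fmt"

// Condition, Workflow, and Session are hypothetical simplified types.
type Condition struct {
	Type   string
	Status string
}

type Workflow struct {
	GitURL string
}

type Session struct {
	ActiveWorkflow *Workflow
	Conditions     []Condition
}

// needsWorkflowReconciliation reports whether a session has a workflow
// whose WorkflowReconciled condition is not yet "True".
func needsWorkflowReconciliation(s Session) bool {
	if s.ActiveWorkflow == nil || s.ActiveWorkflow.GitURL == "" {
		return false // no workflow to apply
	}
	for _, c := range s.Conditions {
		if c.Type == "WorkflowReconciled" {
			return c.Status != "True"
		}
	}
	return true // condition never recorded, so the workflow was never applied
}

func main() {
	wf := &Workflow{GitURL: "https://example.invalid/workflow.git"} // placeholder URL
	cases := []struct {
		name string
		in   Session
		want bool
	}{
		{"no workflow", Session{}, false},
		{"empty gitUrl", Session{ActiveWorkflow: &Workflow{}}, false},
		{"no conditions", Session{ActiveWorkflow: wf}, true},
		{"True condition", Session{ActiveWorkflow: wf, Conditions: []Condition{{"WorkflowReconciled", "True"}}}, false},
		{"False condition", Session{ActiveWorkflow: wf, Conditions: []Condition{{"WorkflowReconciled", "False"}}}, true},
		{"missing condition", Session{ActiveWorkflow: wf, Conditions: []Condition{{"Ready", "True"}}}, true},
	}
	for _, tc := range cases {
		got := needsWorkflowReconciliation(tc.in)
		fmt.Printf("%-18s got=%v want=%v\n", tc.name, got, tc.want)
	}
}
```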

Closes #1486

🤖 Generated with Claude Code


🤖 Ambient Session

Ambient Code Bot and others added 2 commits April 30, 2026 12:42
Workflow reconciliation was attempted during the Pending phase before the
runner pod existed, causing DNS failures that were silently discarded.
The observedGeneration was then set, preventing any retry in the Running
phase.

Move workflow reconciliation to the Running phase where the runner HTTP
endpoint is reachable. Add NeedsWorkflowReconciliation check and
ReconcileWorkflow to retry failed or missing workflow application with
5-second backoff. Also fix swallowed errors in ReconcileSpecChanges and
add requeue on spec reconciliation failure.

Closes #1486

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@ambient-code ambient-code Bot added the ambient-code:managed (PR managed by AI automation) and ambient-code:self-reviewed (Self-reviewed by Ambient agent) labels Apr 30, 2026
netlify Bot commented Apr 30, 2026

Deploy Preview for cheerful-kitten-f556a0 canceled.

Name Link
🔨 Latest commit fed6387
🔍 Latest deploy log https://app.netlify.com/projects/cheerful-kitten-f556a0/deploys/69f34e76a0ff5c0008ef54f1

@mergify mergify Bot added the queued label Apr 30, 2026
mergify Bot commented Apr 30, 2026

Merge Queue Status

  • Entered queue 2026-04-30 13:14 UTC · Rule: default
  • Checks skipped · PR is already up-to-date
  • Merged 2026-04-30 13:15 UTC · at fed63879c23a752225dbdfaa6382f0ce7647d252 · squash

This pull request spent 15 seconds in the queue, including 2 seconds running CI.

@mergify mergify Bot merged commit 6347f32 into main Apr 30, 2026
65 of 66 checks passed
@mergify mergify Bot deleted the fix/workflow-reconciliation-retry branch April 30, 2026 13:15
@mergify mergify Bot removed the queued label Apr 30, 2026

Development

Successfully merging this pull request may close these issues.

bug(operator): workflow reconciliation fails on session creation and is never retried