Implementation Plan: Queue Ejection Loop Fix #628
… in wait_for_ci

Gap 2: DefaultMergeQueueWatcher._make_result now accepts ejection_cause; it returns pr_state='ejected_ci_failure' + ejection_cause='ci_failure' when checks_state=='FAILURE' at both the CLOSED terminal and the confirmed-not-in-queue ejection path.

Gap 5: the wait_for_ci tool handler now includes head_sha in the returned JSON when git rev-parse HEAD succeeds, enabling orchestrators to verify that CI results correspond to the current HEAD after a force-push.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…, 3, 4, 6)

- Insert ci_watch_post_queue_fix step in implementation.yaml, remediation.yaml, and implementation-groups.yaml between re_push_queue_fix and reenter_merge_queue to validate CI on the rebased feature branch before re-entering the merge queue
- Add ejected_ci_failure route in wait_for_queue.on_result (before ejected) in all three recipes to route CI-failure ejections to diagnose_ci instead of the conflict-resolution path
- Add Steps 5a and 5b to resolve-merge-conflicts/SKILL.md: language-aware manifest validation and duplicate key scanning after pre-commit, with escalation on failure
- Update clean-rebase success routing in Step 2 to go through Step 5a instead of skipping directly to Step 7
- Add regression tests for all structural invariants across all three recipe fixtures

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Trecek
left a comment
AutoSkillit PR Review — Verdict: changes_requested
8 actionable findings (warning severity) require fixes before merge.
cwd=cwd,
)

# Include head_sha used for this CI check so orchestrators can verify
[info] slop: Two-line comment restates what the code does. The second sentence duplicates the first. A single concise comment is sufficient.
# Include head_sha used for this CI check so orchestrators can verify
# CI results correspond to the current HEAD after a force-push.
if scope.head_sha:
[info] defense: scope.head_sha type contract is not visible here. If the attribute is Optional[str], the truthiness guard is correct. If it can be an empty string sentinel, document whether absence means 'not applicable' vs 'could not determine'.
assert "Invalid repo format" in result["reason"]


class TestEjectionEnrichment:
[info] tests: No test covers checks_state='ERROR' at ejection. The production code only checks == 'FAILURE'; any other non-None value falls through to plain 'ejected'. A test for ERROR would prevent regressions if the condition is accidentally widened.
]
assert ci_failure_routes, (
    f"wait_for_queue.on_result must route ejected_ci_failure to diagnose_ci"
    f" in {recipe_fixture}"
[warning] tests: The ejected_ci_failure condition is matched with 'in' (substring) while the plain ejected condition uses exact string equality. This asymmetry is fragile: minor whitespace variation in the recipe YAML would cause ejected_idx to be None, producing a misleading 'ejected route must still exist' failure rather than an ordering failure.
Investigated — this is intentional. The actual YAML condition string is '${{ result.pr_state }} == ejected_ci_failure' (a full template expression). Substring match ('ejected_ci_failure' in w) is the only correct approach to match this full expression. The ejected condition uses exact equality because its full expression '${{ result.pr_state }} == ejected' can be matched exactly and is unambiguous. The asymmetry is deliberate: substring is needed for ejected_ci_failure because it appears inside a longer expression string, not as a standalone value.
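One way to keep both arms on the same footing, without giving up the unambiguity the reply relies on, is to parse the right-hand `pr_state` literal out of each condition and compare it exactly. A minimal sketch, assuming every arm has the `${{ result.pr_state }} == <value>` shape quoted in the reply; `ON_RESULT`, `pr_state_value`, and `route_index` are hypothetical test helpers, not code from this PR.

```python
import re
from typing import Dict, List, Optional

# Hypothetical on_result arms, shaped like the condition strings quoted in the reply.
ON_RESULT: List[Dict[str, str]] = [
    {"when": "${{ result.pr_state }} == merged", "route": "release_issue_success"},
    {"when": "${{ result.pr_state }} == ejected_ci_failure", "route": "diagnose_ci"},
    {"when": "${{ result.pr_state }} == ejected", "route": "queue_ejected_fix"},
]

# Accepts "${{ result.pr_state }} == <value>" with arbitrary whitespace around each token.
_PR_STATE_COND = re.compile(r"^\s*\$\{\{\s*result\.pr_state\s*\}\}\s*==\s*(\S+)\s*$")


def pr_state_value(when: str) -> Optional[str]:
    """Return the pr_state literal on the right of '==', or None if the condition does not parse."""
    match = _PR_STATE_COND.match(when)
    return match.group(1) if match else None


def route_index(arms: List[Dict[str, str]], value: str) -> Optional[int]:
    """Index of the first arm whose condition compares pr_state to exactly `value`."""
    for i, arm in enumerate(arms):
        if pr_state_value(arm["when"]) == value:
            return i
    return None


# A raw substring test is ambiguous: "ejected" also occurs inside the CI-failure arm.
assert "ejected" in ON_RESULT[1]["when"]
# Token-level matching tells the two arms apart and tolerates whitespace drift.
assert route_index(ON_RESULT, "ejected") == 2
assert pr_state_value("${{result.pr_state}}   ==   ejected") == "ejected"
```

With both arms resolved through the same helper, a test can first assert that each index is not `None` (a clear "arm missing" failure) and only then assert that the `ejected_ci_failure` index is smaller than the `ejected` index, so whitespace drift no longer masquerades as a missing route.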
assert "subtype" not in result


# ---------------------------------------------------------------------------
[info] tests: The assertion only verifies head_sha is present and correct. It does not assert that the CI watcher result fields (run_id, conclusion, failed_jobs) are also present in the merged output. A bug where the merge returns only {head_sha: ...} would not be caught.
SubprocessResult(
    returncode=0,
    stdout="deadbeef1234\n",
    stderr="",
[info] tests: The test checks that head_sha and exit_code are absent on git failure, but does not verify the CI watcher result fields are still present. If the implementation accidentally returns an empty dict on git failure, this test would still pass.
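Both test comments above ask for the same thing: assert on the whole merged payload, not on `head_sha` alone. A minimal sketch, assuming the handler copies the watcher's result dict and adds `head_sha` only when `git rev-parse HEAD` succeeds. `build_wait_for_ci_payload`, `get_head_sha`, and the sample `WATCHER` values are hypothetical; only the field names `run_id`, `conclusion`, and `failed_jobs` come from the review.

```python
import json
from typing import Any, Callable, Dict, Optional


def build_wait_for_ci_payload(
    watcher_result: Dict[str, Any],
    get_head_sha: Callable[[], Optional[str]],
) -> str:
    """Return the watcher result as JSON, adding head_sha only when it is known.

    get_head_sha stands in for running `git rev-parse HEAD`; it returns None when
    that command fails, in which case no head_sha key is added.
    """
    payload = dict(watcher_result)  # shallow copy: never mutate the watcher's dict
    head_sha = get_head_sha()
    if head_sha:
        payload["head_sha"] = head_sha
    return json.dumps(payload)


WATCHER = {"run_id": 123, "conclusion": "success", "failed_jobs": []}

# Success path: every watcher field survives alongside head_sha.
ok = json.loads(build_wait_for_ci_payload(WATCHER, lambda: "deadbeef1234"))
assert ok == {**WATCHER, "head_sha": "deadbeef1234"}

# git failure: head_sha is absent, but the watcher fields are still there.
failed = json.loads(build_wait_for_ci_payload(WATCHER, lambda: None))
assert failed == WATCHER
```

Comparing against the full expected dict catches both failure modes the comments describe: a merge that returns only `{head_sha: ...}` and a git-failure path that returns an empty dict. The truthiness check also treats an empty-string SHA the same as `None`.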
on_failure: release_issue_failure
skip_when_false: "inputs.open_pr"

ci_watch_post_queue_fix:
[info] cohesion: ci_watch_post_queue_fix in remediation.yaml is byte-for-byte identical to implementation.yaml, including skip_when_false: 'inputs.open_pr'. The remediation flow operates on already-open PRs — confirm whether this skip_when_false guard is semantically correct for the remediation recipe context.
Trecek
left a comment
AutoSkillit review found 8 blocking issues (verdict: changes_requested). See inline comments for details.
… in merge_queue.py
…and add state-transition test
…queue_fix'] access
## Summary

`DefaultMergeQueueWatcher.wait` in `src/autoskillit/execution/merge_queue.py:164-270` classifies PR states **by elimination**: the final `return _make_result(False, "ejected", ...)` at line 260 fires whenever no other gate (merged, closed, stalled, pending, CI-failed) has matched. Nothing in the pipeline requires a **positive signal** of ejection. Three earlier fixes (PRs #422, #519, #628) each patched one new signal into the elimination tree without removing the elimination itself, and #802 is the latest recurrence.

Part A delivers architectural immunity at three layers:

1. **Data layer immunity**: `_QUERY` selects a new `mergeable` field, and `PRFetchState` gains `mergeable` and `auto_merge_present`. A module-level check (mirroring `recipe/io.py:126-161`) raises `RuntimeError` at import time if the `_QUERY` selections do not match `PRFetchState.__required_keys__`, so adding a field to one without the other fails on every import, every test run, and every server startup.
2. **Classifier layer immunity**: Extract a pure `_classify_pr_state(state) -> ClassificationResult` function from `wait()`. The classifier uses **positive-signal gates only**: every terminal return originates from a direct positive signal, and nothing falls through to `EJECTED`. When no positive signal matches, the classifier raises `ClassifierInconclusive`, which `wait()` handles by continuing to poll within a bounded retry budget. `DROPPED_HEALTHY` is introduced as a positive-signal terminal for "PR is healthy but auto_merge was cleared."
3. **Type system immunity**: Introduce `PRState(StrEnum)` in `_type_enums.py` with every terminal value, replace the free-form strings in `_make_result`, the classifier, and tests, and wire the enum into protocol docstrings.

Part A also ships the minimal recipe routing needed to make the fix usable: a new `DROPPED_HEALTHY → reenter_merge_queue_cheap` arm in all three recipe YAMLs, plus the new `reenter_merge_queue_cheap` step itself. Full cross-recipe parity enforcement is deferred to the follow-up part.
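The positive-signal classifier in layer 2 can be sketched as a pure function over a frozen snapshot. This is a simplified sketch, not the PR's code: the gate order follows the process-flow diagram in this PR, `Snapshot` is a hypothetical stand-in for `PRFetchState` (with `auto_merge_present` standing in for the `auto_merge_enabled_at` check), and a `str`-mixin `Enum` replaces `StrEnum` so it also runs on Python versions before 3.11.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional


class PRState(str, Enum):
    """Terminal values; a str-mixin Enum stands in for StrEnum (Python 3.11+)."""
    MERGED = "merged"
    EJECTED = "ejected"
    EJECTED_CI_FAILURE = "ejected_ci_failure"
    STALLED = "stalled"
    DROPPED_HEALTHY = "dropped_healthy"
    TIMEOUT = "timeout"  # produced by wait(), never by the classifier
    ERROR = "error"      # produced by wait(), never by the classifier


@dataclass(frozen=True)
class Snapshot:
    """Simplified per-poll snapshot; field names are assumptions loosely modeled on PRFetchState."""
    merged: bool
    state: str                   # "OPEN" / "CLOSED"
    mergeable: str               # "MERGEABLE" / "CONFLICTING" / "UNKNOWN"
    merge_state_status: str      # "CLEAN" / "HAS_HOOKS" / "BLOCKED" / ...
    in_queue: bool
    checks_state: Optional[str]  # "SUCCESS" / "FAILURE" / "ERROR" / "PENDING" / "EXPECTED" / None
    auto_merge_present: bool     # stands in for auto_merge_enabled_at != None


@dataclass(frozen=True)
class ClassificationResult:
    terminal: PRState
    reason: str


class ClassifierInconclusive(Exception):
    """Raised when no positive signal matches; the caller keeps polling."""

    def __init__(self, reason: str, state: Snapshot) -> None:
        super().__init__(reason)
        self.state = state  # snapshot retained for logging


_CI_FAILED = {"FAILURE", "ERROR"}


def classify_pr_state(s: Snapshot) -> ClassificationResult:
    """Each return is backed by a positive signal; nothing falls through to EJECTED."""
    if s.merged:
        return ClassificationResult(PRState.MERGED, "merged flag set")
    if s.state == "CLOSED":
        if s.checks_state in _CI_FAILED:
            return ClassificationResult(PRState.EJECTED_CI_FAILURE, "closed with failing checks")
        return ClassificationResult(PRState.EJECTED, "closed")
    if s.checks_state in _CI_FAILED:
        return ClassificationResult(PRState.EJECTED_CI_FAILURE, "checks failed")
    if s.auto_merge_present and s.merge_state_status in {"CLEAN", "HAS_HOOKS"}:
        return ClassificationResult(PRState.STALLED, "auto-merge armed and clean, but not queued")
    if s.mergeable == "CONFLICTING":
        return ClassificationResult(PRState.EJECTED, "merge conflict")
    if (
        s.state == "OPEN"
        and s.mergeable == "MERGEABLE"
        and s.merge_state_status == "CLEAN"
        and s.checks_state in {None, "SUCCESS"}
        and not s.auto_merge_present
        and not s.in_queue
    ):
        return ClassificationResult(PRState.DROPPED_HEALTHY, "healthy, but auto-merge was cleared")
    if s.checks_state in {"PENDING", "EXPECTED"}:
        raise ClassifierInconclusive("checks still running", s)
    raise ClassifierInconclusive("no positive signal", s)


# Usage: derive variants from one healthy snapshot.
healthy = Snapshot(merged=False, state="OPEN", mergeable="MERGEABLE",
                   merge_state_status="CLEAN", in_queue=False,
                   checks_state="SUCCESS", auto_merge_present=False)
print(classify_pr_state(healthy).terminal.value)  # dropped_healthy
print(classify_pr_state(replace(healthy, state="CLOSED", checks_state="FAILURE")).terminal.value)  # ejected_ci_failure
```

The point of this shape is that an unrecognized snapshot raises `ClassifierInconclusive` instead of returning `EJECTED`, so a new GitHub signal degrades into extra polling (bounded by the caller's retry budget) rather than into a misrouted ejection.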
## Architecture Impact

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 65, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    %% ENTRY %%
    ENTRY(["Tool call:<br/>wait_for_merge_queue"])

    subgraph Gates ["VALIDATION GATES — ordered"]
        direction TB
        G0["Import-time Schema Guard<br/>━━━━━━━━━━<br/>Key-set parity: PRFetchState ↔ _QUERY_FIELD_MAP<br/>GraphQL coverage: every field in _QUERY<br/>Raises RuntimeError — module won't load"]
        G1["● _require_enabled()<br/>━━━━━━━━━━<br/>tools_ci.py gate<br/>Short-circuits ALL tools if plugin disabled<br/>Returns structured error string"]
        G2["Runtime Input Guard<br/>━━━━━━━━━━<br/>repo non-empty AND contains '/'<br/>Returns {success:false, pr_state:'error'}"]
        G3["Watcher Presence Guard<br/>━━━━━━━━━━<br/>merge_queue_watcher is not None<br/>Returns {pr_state:'error', reason:'not configured'}"]
    end

    subgraph SnapshotLayer ["INIT_ONLY — PRFetchState (per-cycle snapshot)"]
        direction LR
        F1["● merged: bool<br/>━━━━━━━━━━<br/>INIT_ONLY"]
        F2["● state: str<br/>━━━━━━━━━━<br/>INIT_ONLY"]
        F3["● mergeable: str<br/>━━━━━━━━━━<br/>INIT_ONLY"]
        F4["● in_queue: bool<br/>━━━━━━━━━━<br/>INIT_ONLY"]
        F5["● queue_state: str|None<br/>━━━━━━━━━━<br/>INIT_ONLY"]
        F6["● checks_state: str|None<br/>━━━━━━━━━━<br/>INIT_ONLY"]
        F7["● auto_merge_present: bool<br/>━━━━━━━━━━<br/>INIT_ONLY"]
    end

    subgraph PollLoop ["MUTABLE — Loop-scoped counters (wait() local vars)"]
        direction LR
        C1["● stall_retries_attempted<br/>━━━━━━━━━━<br/>+1 after each toggle attempt<br/>Never decremented<br/>Budget ceiling: max_stall_retries"]
        C2["● not_in_queue_cycles<br/>━━━━━━━━━━<br/>+1 when in_queue=False<br/>Reset→0 on re-entry OR post-toggle<br/>Confirmation window gate"]
        C3["● inconclusive_count<br/>━━━━━━━━━━<br/>+1 only after confirmation window met<br/>Never decremented<br/>Ceiling: _max_inconclusive_retries"]
    end

    subgraph ClassResult ["INIT_ONLY — ClassificationResult (frozen=True)"]
        direction LR
        CR1["● terminal: PRState<br/>━━━━━━━━━━<br/>frozen=True<br/>structurally immutable"]
        CR2["● reason: str<br/>━━━━━━━━━━<br/>frozen=True<br/>structurally immutable"]
    end

    subgraph PRStateEnum ["● PRState Enum — all 7 terminal values"]
        direction LR
        PS1["merged<br/>━━━━━━━━━━<br/>positive signal required"]
        PS2["ejected<br/>━━━━━━━━━━<br/>state=CLOSED or<br/>mergeable=CONFLICTING"]
        PS3["ejected_ci_failure<br/>━━━━━━━━━━<br/>CI checks failed<br/>sub-classification of ejected"]
        PS4["stalled<br/>━━━━━━━━━━<br/>stall retries exhausted"]
        PS5["dropped_healthy<br/>━━━━━━━━━━<br/>auto_merge cleared,<br/>no structural cause"]
        PS6["timeout<br/>━━━━━━━━━━<br/>deadline exceeded"]
        PS7["error<br/>━━━━━━━━━━<br/>unexpected exception<br/>or config missing"]
    end

    subgraph RecipeRules ["● Recipe Semantic Rules — rules_ci.py (scoped)"]
        direction TB
        RR1["● Rule 5: wait-for-merge-queue-routing-covers-all-pr-states<br/>━━━━━━━━━━<br/>ERROR: missing explicit arm for any non-ERROR PRState<br/>SCOPED: only recipes with 'release_issue_timeout' step"]
        RR2["● Rule 6: wait-for-merge-queue-routing-conforms-to-expected-targets<br/>━━━━━━━━━━<br/>ERROR: fallback/on_failure must route → release_issue_timeout<br/>SCOPED: same scope guard as Rule 5"]
        SCOPE["● _recipe_uses_release_issue_timeout()<br/>━━━━━━━━━━<br/>New scope guard — early-exit if recipe<br/>does NOT define 'release_issue_timeout' step<br/>Prevents false positives on merge-prs.yaml family"]
    end

    subgraph RecipeRouting ["DERIVED — Recipe Routing on PRState (remediation / implementation)"]
        direction TB
        RR_merged["pr_state=merged<br/>→ release_issue_success"]
        RR_ejci["pr_state=ejected_ci_failure<br/>→ diagnose_ci"]
        RR_drop["pr_state=dropped_healthy<br/>→ reenter_merge_queue_cheap"]
        RR_ej["pr_state=ejected<br/>→ queue_ejected_fix"]
        RR_stall["pr_state=stalled<br/>→ reenroll_stalled_pr"]
        RR_to["pr_state=timeout<br/>→ release_issue_timeout"]
        RR_fb["(fallback / on_failure)<br/>→ release_issue_timeout"]
    end

    subgraph NoResume ["RESUME DETECTION — None"]
        NR["No checkpointing<br/>━━━━━━━━━━<br/>All loop counters are process-local<br/>Reset to zero on any restart<br/>Stall exhaustion NOT sticky across restarts<br/>Fresh timeout budget on every invocation"]
    end

    %% FLOW %%
    ENTRY --> G0
    G0 --> G1
    G1 --> G2
    G2 --> G3
    G3 --> SnapshotLayer
    SnapshotLayer --> PollLoop
    PollLoop --> SnapshotLayer
    PollLoop --> ClassResult
    ClassResult --> PRStateEnum
    PRStateEnum --> RecipeRouting
    SCOPE --> RR1
    SCOPE --> RR2
    RR1 -.->|"validates"| RecipeRouting
    RR2 -.->|"validates"| RecipeRouting
    G3 -.->|"no resume path"| NoResume

    %% CLASS ASSIGNMENTS %%
    class ENTRY terminal;
    class G0 detector;
    class G1,G2,G3 detector;
    class F1,F2,F3,F4,F5,F6,F7 stateNode;
    class C1,C2,C3 phase;
    class CR1,CR2 stateNode;
    class PS1,PS2,PS3,PS4,PS5,PS6,PS7 handler;
    class RR1,RR2 detector;
    class SCOPE newComponent;
    class RR_merged,RR_ejci,RR_drop,RR_ej,RR_stall output;
    class RR_to,RR_fb gap;
    class NR cli;
```

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 44, 'rankSpacing': 54, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;

    %% ── TERMINALS ── %%
    START([START: recipe reaches wait_for_queue step])
    DONE([DONE: recipe proceeds to terminal step])

    %% ── RECIPE LAYER ── %%
    subgraph RecipeLayer ["● Recipe Layer (remediation / implementation / implementation-groups)"]
        direction TB
        wait_step["● wait_for_queue step<br/>━━━━━━━━━━<br/>tool: wait_for_merge_queue<br/>timeout_seconds: 900<br/>poll_interval: 15"]
        route_mq{"● route on<br/>result.pr_state"}
    end

    %% ── TOOL GATE ── %%
    subgraph ToolGate ["● tools_ci.py · wait_for_merge_queue MCP Tool"]
        direction TB
        gate_chk{"_require_enabled()?"}
        gate_err["return gate error msg"]
        watcher_chk{"merge_queue_watcher<br/>configured?"}
        watcher_err["return pr_state=error"]
        delegate["delegate to<br/>DefaultMergeQueueWatcher.wait()"]
    end

    %% ── POLLING LOOP ── %%
    subgraph PollingLoop ["● merge_queue.py · DefaultMergeQueueWatcher.wait() — main polling loop"]
        direction TB
        repo_ok{"repo format<br/>valid?"}
        repo_err_exit["return pr_state=error"]
        deadline{"monotonic() <<br/>deadline?"}
        fetch["_fetch_pr_and_queue_state()<br/>━━━━━━━━━━<br/>POST github.com/graphql<br/>reads: merged, state, mergeable,<br/>merge_state_status, auto_merge_*,<br/>in_queue, queue_state, checks_state"]
        fetch_ok{"fetch OK?"}
        sleep_retry["sleep poll_interval<br/>continue loop"]
        in_queue_chk{"state.in_queue<br/>== True?"}
        unmergeable_chk{"queue_state ==<br/>UNMERGEABLE?"}
        reset_cycles["reset not_in_queue_cycles = 0<br/>sleep poll_interval, continue"]
        inc_cycles["not_in_queue_cycles += 1"]
        classify_call["_classify_pr_state(state)<br/>━━━━━━━━━━<br/>pure function — no I/O"]
        inconcl_chk{"ClassifierInconclusive<br/>raised?"}
        win_chk{"not_in_queue_cycles >=<br/>confirmation_cycles (2)?"}
        budget_chk{"inconclusive_count <<br/>max_inconclusive_retries (5)?"}
        inconcl_sleep["inconclusive_count += 1<br/>sleep poll_interval, continue"]
        dispatch{"terminal state?"}
        merged_bypass["MERGED: bypass confirmation<br/>return immediately"]
        closed_bypass["CLOSED (EJECTED / EJECTED_CI_FAILURE):<br/>bypass confirmation<br/>return immediately"]
        timeout_exit["return pr_state=timeout"]
    end

    %% ── CLASSIFIER ── %%
    subgraph Classifier ["● _classify_pr_state() — pure decision tree (no I/O)"]
        direction TB
        c_merged{"state.merged?"}
        c_closed{"state == CLOSED?"}
        c_ci_closed{"checks_state in<br/>FAILURE / ERROR?"}
        c_ci_open{"checks_state in<br/>FAILURE / ERROR?"}
        c_stall["_is_positive_stall()<br/>━━━━━━━━━━<br/>auto_merge_enabled_at != None<br/>AND merge_state_status in<br/>CLEAN / HAS_HOOKS"]
        c_conflict{"mergeable ==<br/>CONFLICTING?"}
        c_dropped["● _is_positive_dropped_healthy()<br/>━━━━━━━━━━<br/>OPEN + MERGEABLE + CLEAN<br/>+ checks OK + no auto_merge<br/>+ not in queue"]
        c_pending{"checks_state in<br/>PENDING / EXPECTED?"}
        c_inconclusive["raise ClassifierInconclusive<br/>reason: no positive signal"]
        c_inconclusive2["raise ClassifierInconclusive<br/>reason: checks still running"]
        c_ret_merged["return MERGED"]
        c_ret_ejci_closed["return EJECTED_CI_FAILURE"]
        c_ret_ej_closed["return EJECTED"]
        c_ret_ejci_open["return EJECTED_CI_FAILURE"]
        c_ret_stall["return STALLED"]
        c_ret_ej_conflict["return EJECTED"]
        c_ret_dropped["● return DROPPED_HEALTHY"]
    end

    %% ── STALL RECOVERY ── %%
    subgraph StallRecovery ["Stall Recovery Sub-flow"]
        direction TB
        stall_grace{"stall_duration <<br/>stall_grace_period (60 s)?"}
        stall_budget{"stall_retries_attempted <<br/>max_stall_retries (3)?"}
        do_toggle["_toggle_auto_merge()<br/>━━━━━━━━━━<br/>writes: GitHub disable auto_merge<br/>sleep 2 s<br/>writes: GitHub enable auto_merge SQUASH<br/>stall_retries_attempted += 1<br/>reset not_in_queue_cycles = 0<br/>sleep backoff: min(30×2ⁿ, 120 s)"]
        stall_grace_sleep["still in grace period<br/>sleep poll_interval, continue"]
    end

    %% ── RECIPE ROUTING TARGETS ── %%
    subgraph RecipeRouting ["● Recipe-level Routing (on_result from wait_for_queue step)"]
        direction LR
        r_merged["release_issue_success<br/>━━━━━━━━━━<br/>→ register_clone_success → done"]
        r_ejci["diagnose_ci<br/>━━━━━━━━━━<br/>→ resolve_ci (retries:2)<br/>→ re_push → ci_watch"]
        r_dropped["● reenter_merge_queue_cheap<br/>━━━━━━━━━━<br/>→ wait_for_queue (loop)"]
        r_ejected["queue_ejected_fix<br/>━━━━━━━━━━<br/>→ re_push → ci_watch<br/>→ reenter_merge_queue<br/>→ wait_for_queue (loop)"]
        r_stall["reenroll_stalled_pr<br/>━━━━━━━━━━<br/>(toggle_auto_merge)<br/>→ wait_for_queue (loop)"]
        r_timeout["release_issue_timeout<br/>━━━━━━━━━━<br/>→ register_clone_success → done"]
        r_default["release_issue_timeout<br/>━━━━━━━━━━<br/>(fallback / ERROR state)"]
    end

    %% ── VALIDATION LAYER (static, not runtime) ── %%
    subgraph ValidationLayer ["● rules_ci.py · Static Validation (recipe load time, not runtime)"]
        direction LR
        rule5["● wait-for-merge-queue-routing-covers-all-pr-states<br/>━━━━━━━━━━<br/>ERROR if any PRState (excl. ERROR) missing<br/>from on_result when conditions"]
        rule6["● wait-for-merge-queue-routing-conforms-to-expected-targets<br/>━━━━━━━━━━<br/>ERROR if fallback ≠ release_issue_timeout<br/>or on_failure ≠ release_issue_timeout"]
        rule_gate["_recipe_uses_release_issue_timeout()<br/>━━━━━━━━━━<br/>scopes rules to implementation/<br/>remediation recipe family only"]
    end

    %% ── PRSTATE ENUM ── %%
    subgraph PRStateEnum ["● PRState (StrEnum) — _type_enums.py"]
        direction LR
        ps_merged["MERGED<br/>positive signal"]
        ps_ejected["EJECTED<br/>positive signal"]
        ps_ejci["EJECTED_CI_FAILURE<br/>positive signal"]
        ps_stalled["STALLED<br/>positive signal"]
        ps_dropped["● DROPPED_HEALTHY<br/>positive signal"]
        ps_timeout["TIMEOUT<br/>produced by wait() only"]
        ps_error["ERROR<br/>produced by wait() only"]
    end

    %% CONNECTIONS %%
    START --> wait_step
    wait_step --> gate_chk

    %% Tool gate
    gate_chk -->|"disabled"| gate_err
    gate_chk -->|"enabled"| watcher_chk
    gate_err --> DONE
    watcher_chk -->|"not configured"| watcher_err
    watcher_err --> DONE
    watcher_chk -->|"configured"| delegate

    %% Polling loop entry
    delegate --> repo_ok
    repo_ok -->|"invalid"| repo_err_exit
    repo_err_exit --> route_mq
    repo_ok -->|"valid"| deadline

    %% Main while loop
    deadline -->|"exceeded"| timeout_exit
    timeout_exit --> route_mq
    deadline -->|"time remaining"| fetch

    %% Fetch result
    fetch --> fetch_ok
    fetch_ok -->|"failed (HTTP/GraphQL error)"| sleep_retry
    sleep_retry -->|"loop back"| deadline
    fetch_ok -->|"success"| in_queue_chk

    %% Queue membership gate
    in_queue_chk -->|"in queue"| unmergeable_chk
    unmergeable_chk -->|"UNMERGEABLE"| closed_bypass
    unmergeable_chk -->|"other queue state"| reset_cycles
    reset_cycles -->|"loop back"| deadline
    in_queue_chk -->|"not in queue"| inc_cycles

    %% Classification
    inc_cycles --> classify_call
    classify_call --> inconcl_chk
    inconcl_chk -->|"inconclusive"| win_chk

    %% Confirmation window — inconclusive path
    win_chk -->|"window not met yet"| sleep_retry
    win_chk -->|"window met"| budget_chk
    budget_chk -->|"budget remaining"| inconcl_sleep
    inconcl_sleep -->|"loop back"| deadline
    budget_chk -->|"budget exhausted"| timeout_exit

    %% Classification result dispatch
    inconcl_chk -->|"MERGED"| merged_bypass
    inconcl_chk -->|"CLOSED (EJECTED/EJECTED_CI_FAILURE)"| closed_bypass
    inconcl_chk -->|"other terminal"| dispatch

    %% Confirmation window — other terminals
    dispatch -->|"not_in_queue_cycles < confirmation_cycles"| sleep_retry
    dispatch -->|"confirmation met: STALLED"| stall_grace
    dispatch -->|"confirmation met: EJECTED_CI_FAILURE"| route_mq
    dispatch -->|"confirmation met: EJECTED"| route_mq
    dispatch -->|"confirmation met: DROPPED_HEALTHY"| route_mq

    %% Stall recovery
    stall_grace -->|"in grace period"| stall_grace_sleep
    stall_grace_sleep -->|"loop back"| deadline
    stall_grace -->|"grace expired"| stall_budget
    stall_budget -->|"retries remaining"| do_toggle
    do_toggle -->|"loop back"| deadline
    stall_budget -->|"retries exhausted"| route_mq

    %% Bypasses feed routing
    merged_bypass --> route_mq
    closed_bypass --> route_mq

    %% Classifier internal edges
    classify_call --> c_merged
    c_merged -->|"yes"| c_ret_merged
    c_merged -->|"no"| c_closed
    c_closed -->|"yes"| c_ci_closed
    c_ci_closed -->|"CI failure"| c_ret_ejci_closed
    c_ci_closed -->|"no CI failure"| c_ret_ej_closed
    c_closed -->|"no"| c_ci_open
    c_ci_open -->|"CI failure"| c_ret_ejci_open
    c_ci_open -->|"no"| c_stall
    c_stall -->|"positive stall"| c_ret_stall
    c_stall -->|"no"| c_conflict
    c_conflict -->|"CONFLICTING"| c_ret_ej_conflict
    c_conflict -->|"no"| c_dropped
    c_dropped -->|"positive dropped"| c_ret_dropped
    c_dropped -->|"no"| c_pending
    c_pending -->|"checks running"| c_inconclusive2
    c_pending -->|"no positive signal"| c_inconclusive

    %% Recipe routing on each PRState
    route_mq -->|"merged"| r_merged
    route_mq -->|"ejected_ci_failure"| r_ejci
    route_mq -->|"dropped_healthy"| r_dropped
    route_mq -->|"ejected"| r_ejected
    route_mq -->|"stalled (after max retries)"| r_stall
    route_mq -->|"timeout"| r_timeout
    route_mq -->|"default / error"| r_default

    %% Recipe routing → loops
    r_dropped -->|"loop: re-enqueue"| wait_step
    r_ejected -->|"loop: fix + re-enqueue"| wait_step
    r_stall -->|"loop: toggle + re-enqueue"| wait_step
    r_merged --> DONE
    r_ejci --> DONE
    r_timeout --> DONE
    r_default --> DONE

    %% Validation (static, separate from runtime flow)
    rule_gate --> rule5
    rule_gate --> rule6

    %% CLASS ASSIGNMENTS %%
    class START,DONE terminal;
    class wait_step,dispatch,route_mq stateNode;
    class gate_chk,watcher_chk,repo_ok,deadline,fetch_ok,in_queue_chk,unmergeable_chk,inconcl_chk,win_chk,budget_chk stateNode;
    class fetch,delegate,classify_call,inc_cycles,reset_cycles,inconcl_sleep,do_toggle handler;
    class merged_bypass,closed_bypass,sleep_retry phase;
    class gate_err,watcher_err,repo_err_exit,timeout_exit,c_inconclusive,c_inconclusive2 detector;
    class c_merged,c_closed,c_ci_closed,c_ci_open,c_stall,c_conflict,c_dropped,c_pending stateNode;
    class c_ret_merged,c_ret_ejci_closed,c_ret_ej_closed,c_ret_ejci_open,c_ret_stall,c_ret_ej_conflict,c_ret_dropped output;
    class stall_grace,stall_budget stateNode;
    class stall_grace_sleep phase;
    class r_merged,r_ejci,r_dropped,r_ejected,r_stall,r_timeout,r_default handler;
    class rule5,rule6,rule_gate detector;
    class ps_merged,ps_ejected,ps_ejci,ps_stalled,ps_dropped,ps_timeout,ps_error output;
```

### Error/Resilience Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 52, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    START(["TOOL CALL<br/>wait_for_merge_queue"])

    %% ── IMPORT-TIME VALIDATION ── %%
    subgraph ImportGates ["IMPORT-TIME VALIDATION GATES (merge_queue.py ●)"]
        direction LR
        FIELD_SYNC["● _QUERY_FIELD_MAP sync<br/>━━━━━━━━━━<br/>keys == PRFetchState keys<br/>→ RuntimeError on mismatch"]
        FIELD_PRES["● _QUERY field presence<br/>━━━━━━━━━━<br/>all map paths in GraphQL query<br/>→ RuntimeError on mismatch"]
    end

    %% ── MCP ENTRY GATES ── %%
    subgraph MCPGates ["MCP ENTRY GATES (tools_ci.py ●)"]
        direction LR
        KITCHEN["● Kitchen gate<br/>━━━━━━━━━━<br/>_require_enabled()<br/>→ error JSON if not open"]
        WATCHER["● Watcher null check<br/>━━━━━━━━━━<br/>merge_queue_watcher is None<br/>→ pr_state='error' JSON"]
    end

    %% ── POLLING LOOP ── %%
    subgraph PollingLoop ["POLLING LOOP (DefaultMergeQueueWatcher.wait ●)"]
        DEADLINE{"Deadline<br/>exceeded?"}
        FETCH["_fetch_pr_and_queue_state()<br/>━━━━━━━━━━<br/>GraphQL round-trip<br/>raise_for_status()"]
        FETCH_FAIL["● Fetch Exception Handler<br/>━━━━━━━━━━<br/>catch Exception<br/>log warning + sleep(poll_interval)<br/>→ retry poll cycle"]
        IN_QUEUE{"In merge<br/>queue?"}
        UNMERGE["UNMERGEABLE in queue<br/>━━━━━━━━━━<br/>queue_state == UNMERGEABLE<br/>→ immediate EJECTED"]
    end

    %% ── CLASSIFIER ── %%
    subgraph ClassifierSG ["CLASSIFIER (_classify_pr_state ●) — positive signals only"]
        direction TB
        CLASSIFY["● _classify_pr_state(state)<br/>━━━━━━━━━━<br/>evaluates positive gates in order<br/>never falls through to EJECTED"]
        MERGED_GATE["merged == True<br/>→ MERGED<br/>(definitive — bypasses window)"]
        CLOSED_GATE["state == CLOSED<br/>━━━━━━━━━━<br/>checks FAILURE/ERROR → EJECTED_CI_FAILURE<br/>else → EJECTED<br/>(definitive — bypasses window)"]
        CHECKS_GATE["checks_state FAILURE/ERROR<br/>→ EJECTED_CI_FAILURE<br/>(requires confirmation window)"]
        STALL_GATE["● stall signal<br/>━━━━━━━━━━<br/>auto_merge_enabled_at present<br/>+ merge_state CLEAN/HAS_HOOKS<br/>→ STALLED"]
        CONFLICT_GATE["mergeable == CONFLICTING<br/>→ EJECTED<br/>(requires confirmation window)"]
        DROP_GATE["● _is_positive_dropped_healthy()<br/>━━━━━━━━━━<br/>OPEN+MERGEABLE+CLEAN<br/>checks None/SUCCESS<br/>auto_merge=False, not in queue<br/>→ DROPPED_HEALTHY"]
        INCONCLUSIVE["● ClassifierInconclusive<br/>━━━━━━━━━━<br/>no positive signal matched<br/>.state snapshot for logging"]
    end

    %% ── CONFIRMATION WINDOW ── %%
    subgraph ConfirmWindow ["CONFIRMATION WINDOW (●)"]
        CONFIRM{"● not_in_queue_cycles<br/>< confirmation_cycles (2)?"}
        INCONCL{"● inconclusive_count<br/>>= max_retries (5)?"}
    end

    %% ── STALL RECOVERY ── %%
    subgraph StallRec ["STALL RECOVERY (●)"]
        GRACE{"stall_duration<br/>< grace_period?"}
        STALL_BUDGET{"● stall_retries<br/>< max_retries (3)?"}
        TOGGLE["● toggle_auto_merge<br/>━━━━━━━━━━<br/>disable → sleep 2s → re-enable<br/>backoff: min(30×2ⁿ, 120)s<br/>stall_retries++ / reset window"]
        TOGGLE_FAIL["● Toggle Exception Handler<br/>━━━━━━━━━━<br/>catch Exception<br/>log warning → continue"]
    end

    %% ── TERMINAL STATES ── %%
    T_MERGED(["● MERGED<br/>success=True"])
    T_EJECTED(["● EJECTED"])
    T_ECI(["● EJECTED_CI_FAILURE"])
    T_STALLED(["● STALLED"])
    T_DROP(["● DROPPED_HEALTHY"])
    T_TIMEOUT(["● TIMEOUT"])
    T_ERROR(["● ERROR<br/>success=False"])

    %% ── RECIPE ROUTING ── %%
    subgraph RecipeRouting ["RECIPE ROUTING (remediation.yaml ●)"]
        direction TB
        R_SUCCESS["release_issue_success<br/>━━━━━━━━━━<br/>apply staged label<br/>→ register_clone_success"]
        R_DIAGNOSE["diagnose_ci<br/>━━━━━━━━━━<br/>CI failure analysis path"]
        R_CHEAP["● reenter_merge_queue_cheap<br/>━━━━━━━━━━<br/>re-enroll healthy PR"]
        R_FIX["queue_ejected_fix<br/>━━━━━━━━━━<br/>conflict resolution cycle"]
        R_REENROLL["● reenroll_stalled_pr<br/>━━━━━━━━━━<br/>toggle_auto_merge tool<br/>→ wait_for_queue"]
        R_TIMEOUT["release_issue_timeout<br/>━━━━━━━━━━<br/>fallback + on_failure route<br/>release claim, no staged label"]
    end

    %% ── SEMANTIC VALIDATION RULES ── %%
    subgraph SemanticRules ["STATIC VALIDATION RULES (rules_ci.py ●) — recipe-load-time"]
        direction LR
        RULE_COV["● routing-covers-all-pr-states<br/>━━━━━━━━━━<br/>ERROR: any non-error PRState<br/>lacks explicit when arm<br/>scope: release_issue_timeout recipes"]
        RULE_CONF["● routing-conforms-to-expected-targets<br/>━━━━━━━━━━<br/>ERROR: fallback or on_failure<br/>≠ release_issue_timeout<br/>scope: release_issue_timeout recipes"]
    end

    END_OK(["PIPELINE CONTINUES<br/>issue released"])
    END_FAIL(["ESCALATION<br/>release_issue_timeout / error"])

    %% CONNECTIONS %%
    %% Entry
    START --> KITCHEN
    KITCHEN -->|"not open"| END_FAIL
    KITCHEN -->|"open"| WATCHER
    WATCHER -->|"None"| T_ERROR
    WATCHER -->|"configured"| DEADLINE

    %% Polling core
    DEADLINE -->|"exceeded"| T_TIMEOUT
    DEADLINE -->|"budget"| FETCH
    FETCH -->|"HTTP/network/GraphQL/null/date error"| FETCH_FAIL
    FETCH -->|"PRFetchState ok"| IN_QUEUE
    FETCH_FAIL -->|"sleep + retry"| DEADLINE

    %% In-queue branch
    IN_QUEUE -->|"yes"| UNMERGE
    UNMERGE -->|"UNMERGEABLE"| T_EJECTED
    UNMERGE -->|"other states"| DEADLINE
    IN_QUEUE -->|"no"| CLASSIFY

    %% Classifier dispatch
    CLASSIFY --> MERGED_GATE
    CLASSIFY --> CLOSED_GATE
    CLASSIFY --> CHECKS_GATE
    CLASSIFY --> STALL_GATE
    CLASSIFY --> CONFLICT_GATE
    CLASSIFY --> DROP_GATE
    CLASSIFY --> INCONCLUSIVE

    %% Definitive terminals (bypass confirmation window)
    MERGED_GATE --> T_MERGED
    CLOSED_GATE --> T_EJECTED
    CLOSED_GATE --> T_ECI

    %% Non-definitive → confirmation window
    CHECKS_GATE --> CONFIRM
    STALL_GATE --> CONFIRM
    CONFLICT_GATE --> CONFIRM
    DROP_GATE --> CONFIRM
    INCONCLUSIVE --> CONFIRM
    CONFIRM -->|"in window"| DEADLINE

    %% Post-confirmation dispatch
    CONFIRM -->|"window passed: EJECTED_CI_FAILURE"| T_ECI
    CONFIRM -->|"window passed: EJECTED"| T_EJECTED
    CONFIRM -->|"window passed: DROPPED_HEALTHY"| T_DROP
    CONFIRM -->|"window passed: STALLED"| GRACE
    CONFIRM -->|"window passed: inconclusive"| INCONCL
    INCONCL -->|"exhausted"| T_TIMEOUT
    INCONCL -->|"budget ok"| DEADLINE

    %% Stall recovery
    GRACE -->|"too recent"| DEADLINE
    GRACE -->|"expired"| STALL_BUDGET
    STALL_BUDGET -->|"exhausted"| T_STALLED
    STALL_BUDGET -->|"budget ok"| TOGGLE
    TOGGLE -->|"success"| DEADLINE
    TOGGLE -->|"exception"| TOGGLE_FAIL
    TOGGLE_FAIL --> DEADLINE

    %% Terminal → recipe routing
    T_MERGED --> R_SUCCESS
    T_ECI --> R_DIAGNOSE
    T_DROP --> R_CHEAP
    T_EJECTED --> R_FIX
    T_STALLED --> R_REENROLL
    T_TIMEOUT --> R_TIMEOUT
    T_ERROR --> END_FAIL
    R_SUCCESS --> END_OK
    R_DIAGNOSE --> END_FAIL
    R_CHEAP --> END_FAIL
    R_FIX --> END_FAIL
    R_REENROLL -->|"re-enters wait_for_queue"| DEADLINE
    R_TIMEOUT --> END_FAIL

    %% CLASS ASSIGNMENTS %%
    class START,END_OK,END_FAIL terminal;
    class FIELD_SYNC,FIELD_PRES,KITCHEN,WATCHER detector;
    class FETCH integration;
    class FETCH_FAIL,TOGGLE_FAIL gap;
    class DEADLINE,IN_QUEUE,CONFIRM,INCONCL,GRACE,STALL_BUDGET stateNode;
    class UNMERGE,CLASSIFY detector;
    class MERGED_GATE,CLOSED_GATE,CHECKS_GATE,CONFLICT_GATE phase;
    class STALL_GATE,DROP_GATE,INCONCLUSIVE newComponent;
    class TOGGLE handler;
    class T_MERGED,T_EJECTED,T_ECI,T_STALLED,T_DROP,T_TIMEOUT,T_ERROR terminal;
    class R_SUCCESS,R_DIAGNOSE,R_CHEAP,R_FIX,R_REENROLL,R_TIMEOUT output;
    class RULE_COV,RULE_CONF newComponent;
```

Closes #802

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/remediation-802-20260413-075451-958993/.autoskillit/temp/rectify/rectify_merge_queue_classifier_immunity_2026-04-13_110523_part_a.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| investigate | 200 | 21.1k | 986.6k | 121.4k | 1 | 8m 24s |
| rectify | 289 | 72.1k | 1.4M | 241.5k | 2 | 24m 36s |
| review | 52 | 9.8k | 176.6k | 56.1k | 1 | 9m 36s |
| dry_walkthrough | 360 | 50.5k | 3.1M | 165.8k | 2 | 16m 17s |
| implement | 1.3k | 117.5k | 16.1M | 239.7k | 2 | 37m 55s |
| assess | 310 | 18.9k | 2.1M | 69.8k | 1 | 9m 54s |
| prepare_pr | 70 | 7.5k | 256.5k | 36.9k | 1 | 2m 22s |
| run_arch_lenses | 215 | 34.1k | 868.7k | 131.3k | 3 | 11m 48s |
| compose_pr | 59 | 14.3k | 249.6k | 42.9k | 1 | 3m 15s |
| **Total** | 2.9k | 345.8k | 25.3M | 1.1M | | 2h 4m |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Fixes a 3-iteration ejection loop in the merge queue pipeline with three changes:

- ejection-cause enrichment: a new `ejected_ci_failure` state and an `ejection_cause` field in `wait_for_merge_queue`;
- a CI gate after every force-push: the `ci_watch_post_queue_fix` step;
- two post-rebase manifest validation gates in `resolve-merge-conflicts`: a language-aware validity check and a duplicate key scan.

Together these close all six gaps identified in #627, which fall into four areas: blind CI ejection routing, a missing CI gate after re-push, absent manifest/semantic validation, and a missing `head_sha` in CI results.

Individual Group Plans
Group 1: Implementation Plan: Queue Ejection Loop Fix — PART A ONLY
This part addresses the Python code layer for the queue ejection loop fix (Gaps 2 and 5 from issue #627).
Gap 2: `execution/merge_queue.py` currently returns `pr_state="ejected"` for every ejection, regardless of cause. When GitHub's CI fails on a merge-group commit, the recipe cannot tell a CI-failure ejection from a conflict ejection. It therefore retries conflict resolution indefinitely (a no-op rebase loop).

The fix: when the ejection is confirmed and `checks_state == "FAILURE"`, return `pr_state="ejected_ci_failure"` plus an `ejection_cause="ci_failure"` field. Recipe `on_result` routing can then send CI failures directly to `diagnose_ci` instead of `queue_ejected_fix`.

Gap 5: `server/tools_ci.py` infers `head_sha` from `git rev-parse HEAD` but never includes it in the JSON response. As a result, recipe orchestrators cannot verify that CI results correspond to the current HEAD after a force-push. The fix: include `head_sha` in the `wait_for_ci` return dict whenever it was resolved.

Group 2: Implementation Plan: Queue Ejection Loop Fix — PART B ONLY

This part addresses the recipe and skill layer of the queue ejection loop fix (Gaps 1, 3, 4 and 6 from issue #627). Part A (code layer) must be implemented first, because this part routes on `pr_state="ejected_ci_failure"`, which Part A introduces.

Gap 1: `re_push_queue_fix` routes directly to `reenter_merge_queue` after the force-push, bypassing CI. Fix: insert a new `ci_watch_post_queue_fix` step between `re_push_queue_fix` and `reenter_merge_queue`, mirroring the existing `ci_watch` step.

Gap 6: `wait_for_queue` routes every `ejected` state to `queue_ejected_fix` (conflict resolution). This happens even when the ejection was caused by a CI failure that conflict resolution cannot fix. Fix: add an `ejected_ci_failure` route before `ejected` in `wait_for_queue.on_result`, routing to `diagnose_ci` instead.

Gap 3: The `resolve-merge-conflicts` SKILL.md runs only `pre-commit run --all-files` after the rebase. Fix: add Step 5a, a language-detected manifest validation that uses fast, non-compiling checks.

Gap 4: Even a clean rebase can produce duplicate keys when both branches independently added the same dependency. Fix: add Step 5b, a targeted duplicate key scan of TOML/JSON manifest files.
Applied to: `recipes/implementation.yaml`, `recipes/remediation.yaml`, `recipes/implementation-groups.yaml`, `skills_extended/resolve-merge-conflicts/SKILL.md`.

Architecture Impact
Process Flow Diagram
```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([wait_for_queue\nrecipe step])
    END_OK([release_issue_success])
    END_FAIL([release_issue_failure])
    END_TIMEOUT([release_issue_timeout])
    END_DIAG([diagnose_ci])

    subgraph MQPoll ["● Merge Queue Watcher (merge_queue.py)"]
        direction TB
        POLL["poll GitHub GraphQL\n━━━━━━━━━━\nPR state + queue state\n+ checks_state"]
        MERGED{"merged?"}
        CI_FAIL{"● checks_state\n== 'FAILURE'?"}
        CONFIRM["confirmation window\n━━━━━━━━━━\nnot_in_queue_cycles++"]
        CONFIRMED{"cycles ≥ threshold?"}
        STALL{"stall retries\nexhausted?"}
        TIMEOUT{"deadline\nexceeded?"}
    end

    subgraph EjectRoute ["● Recipe Ejection Routing (implementation.yaml)"]
        direction TB
        ROUTE{"● pr_state?"}
        REENROLL["reenroll_stalled_pr\n━━━━━━━━━━\ntoggle_auto_merge tool"]
    end

    subgraph ConflictFix ["● Conflict Fix Sub-Flow (implementation.yaml)"]
        direction TB
        QFIX["queue_ejected_fix\n━━━━━━━━━━\nresolve-merge-conflicts skill"]
        ESC{"escalation_required?"}
        REPUSH["re_push_queue_fix\n━━━━━━━━━━\npush_to_remote force=true"]
        CI_WATCH["● ci_watch_post_queue_fix\n━━━━━━━━━━\nwait_for_ci tool\ntimeout=300s"]
        CI_PASS{"CI pass?"}
        DETECT["detect_ci_conflict\n━━━━━━━━━━\ndiagnose-ci skill"]
        REENTER["reenter_merge_queue\n━━━━━━━━━━\ngh pr merge --squash --auto"]
    end

    subgraph WFCITool ["● wait_for_ci tool handler (tools_ci.py)"]
        direction LR
        INFER["infer head_sha\n━━━━━━━━━━\ngit rev-parse HEAD"]
        CIWAIT["ci_watcher.wait(scope)"]
        ENRICH["● result includes head_sha\n━━━━━━━━━━\nverifies SHA matches HEAD\nafter force-push"]
    end

    %% MAIN FLOW %%
    START --> POLL
    POLL --> MERGED
    MERGED -->|"yes"| END_OK
    MERGED -->|"no"| CONFIRM
    CONFIRM --> CONFIRMED
    CONFIRMED -->|"no"| STALL
    CONFIRMED -->|"yes (not in queue)"| CI_FAIL
    STALL -->|"yes"| END_TIMEOUT
    STALL -->|"no"| TIMEOUT
    TIMEOUT -->|"yes"| END_TIMEOUT
    TIMEOUT -->|"no"| POLL
    CI_FAIL -->|"yes"| ROUTE
    CI_FAIL -->|"no"| ROUTE
    ROUTE -->|"ejected_ci_failure\n(● new route)"| END_DIAG
    ROUTE -->|"ejected"| QFIX
    ROUTE -->|"stalled"| REENROLL
    ROUTE -->|"timeout"| END_TIMEOUT
    REENROLL -->|"success"| START
    REENROLL -->|"failure"| END_FAIL
    QFIX --> ESC
    ESC -->|"true"| END_FAIL
    ESC -->|"false"| REPUSH
    REPUSH -->|"failure"| END_FAIL
    REPUSH -->|"success"| CI_WATCH
    CI_WATCH --> INFER --> CIWAIT --> ENRICH
    ENRICH --> CI_PASS
    CI_PASS -->|"failure"| DETECT
    CI_PASS -->|"success"| REENTER
    DETECT --> END_FAIL
    REENTER -->|"success"| START
    REENTER -->|"failure"| END_FAIL

    %% CLASS ASSIGNMENTS %%
    class START terminal;
    class END_OK,END_FAIL,END_TIMEOUT,END_DIAG terminal;
    class POLL,CONFIRM handler;
    class MERGED,CONFIRMED,STALL,TIMEOUT stateNode;
    class CI_FAIL,ROUTE,ESC,CI_PASS detector;
    class QFIX,REPUSH,REENTER handler;
    class REENROLL,DETECT handler;
    class CI_WATCH,INFER,CIWAIT,ENRICH newComponent;
```

State Lifecycle Diagram
```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    subgraph MQResult ["wait_for_merge_queue Return Dict (merge_queue.py)"]
        direction TB
        PS["● pr_state : str\n━━━━━━━━━━\nmerged | ejected\nejected_ci_failure | stalled\ntimeout | error\n(bare literals, no StrEnum)"]
        SUC["success : bool\n━━━━━━━━━━\ntrue only for 'merged'"]
        REASON["reason : str\n━━━━━━━━━━\nhuman-readable\nalways present"]
        STALL["stall_retries_attempted : int\n━━━━━━━━━━\nalways present\nexcept 'error' path"]
        EC["● ejection_cause : str\n━━━━━━━━━━\n'ci_failure' only\nwhen pr_state==ejected_ci_failure\nCONDITIONAL FIELD"]
    end

    subgraph InternalPoll ["PRFetchState — Internal Polling State (not returned)"]
        direction LR
        CHECKS["checks_state : str|None\n━━━━━━━━━━\nGitHub StatusCheckRollup\nNone = no checks configured"]
        INQUEUE["in_queue : bool\n━━━━━━━━━━\nPR in mergeQueue.entries"]
        QSTATE["queue_state : str|None\n━━━━━━━━━━\nUNMERGEABLE | AWAITING_CHECKS\n| LOCKED | null"]
    end

    subgraph Gate1 ["● Ejection Decision Gate (merge_queue.py)"]
        direction TB
        CFAIL{"checks_state\n== 'FAILURE'?"}
        SET_ECI["● set pr_state='ejected_ci_failure'\n━━━━━━━━━━\nejection_cause='ci_failure'\nINJECTED into result"]
        SET_EJ["set pr_state='ejected'\n━━━━━━━━━━\nno ejection_cause field\n(absent, not null)"]
    end

    subgraph CIScope ["CIRunScope — Frozen Input Scope (core/types)"]
        direction LR
        WF["workflow : str|None\n━━━━━━━━━━\ne.g. 'tests.yml'"]
        HS["● head_sha : str|None\n━━━━━━━━━━\ngit rev-parse HEAD\nor caller-supplied"]
    end

    subgraph CIResult ["● wait_for_ci Return Dict (tools_ci.py)"]
        direction TB
        RUNID["run_id : int|None\n━━━━━━━━━━\nGitHub Actions run ID"]
        CONC["conclusion : str\n━━━━━━━━━━\nsuccess|failure|cancelled\naction_required|timed_out\nno_runs|error|unknown"]
        FJOBS["failed_jobs : list\n━━━━━━━━━━\nalways present\nempty on billing errors"]
        HSHA["● head_sha : str\n━━━━━━━━━━\nCONDITIONAL: present only\nwhen scope.head_sha truthy\ninjected by tool layer"]
    end

    subgraph ConsumerGate ["Recipe Routing Gate (on_result)"]
        direction TB
        ROUTE{"pr_state value?"}
        R1["ejected_ci_failure\n→ diagnose_ci"]
        R2["ejected\n→ queue_ejected_fix"]
        R3["merged|stalled|timeout\n→ other routes"]
    end

    %% FLOW %%
    CHECKS --> CFAIL
    INQUEUE --> CFAIL
    QSTATE --> CFAIL
    CFAIL -->|"FAILURE"| SET_ECI
    CFAIL -->|"other"| SET_EJ
    SET_ECI --> PS
    SET_ECI --> EC
    SET_EJ --> PS
    PS --> SUC
    PS --> REASON
    PS --> STALL
    HS --> CIResult
    WF --> CIResult
    RUNID --> CONC
    CONC --> FJOBS
    FJOBS --> HSHA
    PS --> ROUTE
    EC --> ROUTE
    ROUTE --> R1
    ROUTE --> R2
    ROUTE --> R3
    HSHA -.->|"verifies HEAD\nafter force-push"| R2

    %% CLASS ASSIGNMENTS %%
    class PS,EC,HSHA,SET_ECI,HS,CFAIL gap;
    class SUC,REASON,STALL,RUNID,CONC,FJOBS output;
    class CHECKS,INQUEUE,QSTATE,WF stateNode;
    class SET_EJ handler;
    class ROUTE,R1,R2,R3 detector;
    class InternalPoll phase;
```

Error/Resilience Diagram
```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    END_OK([release_issue_success])
    END_FAIL([release_issue_failure\n━━━━━━━━━━\nhuman escalation\nclone preserved])
    END_DIAG([diagnose_ci])

    subgraph MQLoop ["● Merge Queue Poll Loop (merge_queue.py)"]
        direction TB
        POLL["GraphQL fetch\n━━━━━━━━━━\nPR + queue state"]
        POLL_ERR{"Exception\ncaught?"}
        TIMEOUT_CHK{"deadline\nexceeded?"}
        STALL_CHK{"stall retries\n≥ max (3)?"}
    end

    subgraph EjectGate ["● Ejection Classification Gate (merge_queue.py)"]
        direction TB
        EJECT_DECISION{"● checks_state\n== 'FAILURE'?"}
        CI_EJ["● ejected_ci_failure\n━━━━━━━━━━\nejection_cause=ci_failure\nskips conflict resolution"]
        CONF_EJ["ejected\n━━━━━━━━━━\nno cause field\nconflict resolution"]
    end

    subgraph StallBreaker ["Stall Circuit Breaker (merge_queue.py)"]
        direction LR
        TOGGLE["_toggle_auto_merge\n━━━━━━━━━━\ndisable → 2s → re-enable\nbackoff: 30/60/120s"]
        TOGGLE_ERR{"Exception\ncaught?"}
    end

    subgraph ConflictPath ["Conflict Resolution Path (implementation.yaml)"]
        direction TB
        QFIX["queue_ejected_fix\n━━━━━━━━━━\nresolve-merge-conflicts skill"]
        ESC_CHK{"escalation\nrequired?"}
        REPUSH["re_push_queue_fix\n━━━━━━━━━━\nforce-push"]
        REPUSH_FAIL{"push\nfailed?"}
    end

    subgraph CIGate ["● CI Gate After Re-Push (implementation.yaml + tools_ci.py)"]
        direction TB
        CI_WATCH["● ci_watch_post_queue_fix\n━━━━━━━━━━\nwait_for_ci, timeout=300s\nincludes head_sha"]
        CI_CONC{"conclusion\n== success?"}
        DETECT["detect_ci_conflict\n━━━━━━━━━━\ngit merge-base check\n(stale base?)"]
        DETECT_CHK{"stale\nbase?"}
        CI_CF["ci_conflict_fix\n━━━━━━━━━━\nresolve-merge-conflicts"]
    end

    subgraph ManifestGates ["● Post-Rebase Manifest Validation (SKILL.md)"]
        direction TB
        STEP5A["● Step 5a: manifest validity\n━━━━━━━━━━\ncargo metadata / node JSON.parse\nuv lock --check / tomllib"]
        STEP5A_CHK{"manifest\nvalid?"}
        STEP5B["● Step 5b: duplicate key scan\n━━━━━━━━━━\nTOML dep sections\nJSON object_pairs_hook"]
        STEP5B_CHK{"duplicates\nfound?"}
        REBASE_ABORT["git rebase --abort\n━━━━━━━━━━\nescalation_required=true"]
    end

    %% POLL LOOP FLOW %%
    POLL --> POLL_ERR
    POLL_ERR -->|"yes: log + retry"| POLL
    POLL_ERR -->|"no"| TIMEOUT_CHK
    TIMEOUT_CHK -->|"yes"| END_FAIL
    TIMEOUT_CHK -->|"no"| STALL_CHK
    STALL_CHK -->|"yes: stalled"| END_FAIL
    STALL_CHK -->|"no: stall attempt"| TOGGLE
    TOGGLE --> TOGGLE_ERR
    TOGGLE_ERR -->|"yes: log + increment"| STALL_CHK
    TOGGLE_ERR -->|"no: success"| POLL

    %% EJECTION GATE %%
    STALL_CHK -->|"ejection confirmed"| EJECT_DECISION
    EJECT_DECISION -->|"FAILURE"| CI_EJ
    EJECT_DECISION -->|"other"| CONF_EJ
    CI_EJ --> END_DIAG
    CONF_EJ --> QFIX

    %% CONFLICT PATH %%
    QFIX --> STEP5A
    STEP5A --> STEP5A_CHK
    STEP5A_CHK -->|"invalid"| REBASE_ABORT
    STEP5A_CHK -->|"valid"| STEP5B
    STEP5B --> STEP5B_CHK
    STEP5B_CHK -->|"duplicates"| REBASE_ABORT
    STEP5B_CHK -->|"clean"| ESC_CHK
    REBASE_ABORT --> ESC_CHK
    ESC_CHK -->|"true"| END_FAIL
    ESC_CHK -->|"false"| REPUSH
    REPUSH --> REPUSH_FAIL
    REPUSH_FAIL -->|"yes"| END_FAIL
    REPUSH_FAIL -->|"no"| CI_WATCH

    %% CI GATE %%
    CI_WATCH --> CI_CONC
    CI_CONC -->|"yes"| END_OK
    CI_CONC -->|"no"| DETECT
    DETECT --> DETECT_CHK
    DETECT_CHK -->|"yes: stale base"| CI_CF
    DETECT_CHK -->|"no: code failure"| END_DIAG
    CI_CF --> ESC_CHK

    %% CLASS ASSIGNMENTS %%
    class END_OK,END_FAIL,END_DIAG terminal;
    class POLL,TOGGLE handler;
    class POLL_ERR,TOGGLE_ERR,TIMEOUT_CHK,STALL_CHK gap;
    class EJECT_DECISION,CI_CONC,DETECT_CHK,STEP5A_CHK,STEP5B_CHK,ESC_CHK,REPUSH_FAIL detector;
    class CI_EJ,CONF_EJ,REBASE_ABORT output;
    class QFIX,REPUSH,CI_WATCH,DETECT,CI_CF handler;
    class STEP5A,STEP5B phase;
```

Closes #627
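The routing drawn in the Process Flow and Error/Resilience diagrams can be summarized as two lookups:

- the `wait_for_queue` arms, including the new `ejected_ci_failure` arm;
- the decision taken after `ci_watch_post_queue_fix`.

This is a hypothetical Python sketch. The real recipes express these routes as YAML `on_result` arms, and the function names here are illustrative. Two further simplifications are assumptions:

- The fallback to `release_issue_failure` for unmapped states such as `error` is assumed.
- The `stale_base` flag stands in for the outcome of the `detect_ci_conflict` step.

```python
# Hypothetical: mirrors the wait_for_queue routes drawn in the diagrams.
WAIT_FOR_QUEUE_ROUTES: dict[str, str] = {
    "merged": "release_issue_success",
    "ejected_ci_failure": "diagnose_ci",  # new route: CI failures skip conflict resolution
    "ejected": "queue_ejected_fix",
    "stalled": "reenroll_stalled_pr",
    "timeout": "release_issue_timeout",
}


def route_wait_for_queue(pr_state: str) -> str:
    # Assumption: states without an explicit arm (e.g. "error") escalate.
    return WAIT_FOR_QUEUE_ROUTES.get(pr_state, "release_issue_failure")


def route_after_queue_fix_ci(conclusion: str, stale_base: bool) -> str:
    """Next step after ci_watch_post_queue_fix (hypothetical helper)."""
    if conclusion == "success":
        return "reenter_merge_queue"
    # Non-success: detect_ci_conflict separates a stale base from a real code failure.
    return "ci_conflict_fix" if stale_base else "diagnose_ci"
```

A dict lookup on the exact `pr_state` value makes the CI-failure route independent of the conflict route. In the recipe YAML, the same intent is expressed by placing the `ejected_ci_failure` arm before `ejected`.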
Implementation Plan
Plan files:
`/home/talon/projects/autoskillit-runs/impl-20260405-122055-152199/.autoskillit/temp/make-plan/queue_ejection_loop_plan_2026-04-05_122055_part_a.md`

`/home/talon/projects/autoskillit-runs/impl-20260405-122055-152199/.autoskillit/temp/make-plan/queue_ejection_loop_plan_2026-04-05_122055_part_b.md`

🤖 Generated with Claude Code via AutoSkillit
Token Usage Summary