Implementation Plan: Queue Ejection Loop Fix #628

Merged
Trecek merged 5 commits into integration from queue-ejection-loop-missing-ci-gate-blind-ejection-routing-a/627
Apr 5, 2026

@Trecek Trecek commented Apr 5, 2026

Summary

Fixes a 3-iteration ejection loop in the merge queue pipeline by introducing ejection-cause enrichment (ejected_ci_failure state and ejection_cause field in wait_for_merge_queue), a CI gate after every force-push (ci_watch_post_queue_fix step), and two post-rebase manifest validation gates (language-aware validity check and duplicate key scan) in resolve-merge-conflicts. Closes all six gaps identified in #627: blind CI ejection routing, missing CI gate after re-push, absent manifest/semantic validation, and missing head_sha in CI results.

Individual Group Plans

Group 1: Implementation Plan: Queue Ejection Loop Fix — PART A ONLY

This part addresses the Python code layer for the queue ejection loop fix (Gaps 2 and 5 from issue #627).

Gap 2: execution/merge_queue.py currently returns pr_state="ejected" for every ejection regardless of cause. When GitHub's CI fails on a merge-group commit, the recipe cannot distinguish a CI failure ejection from a conflict ejection, so it retries conflict resolution indefinitely (no-op rebase loop). The fix: when the ejection is confirmed and checks_state == "FAILURE", return pr_state="ejected_ci_failure" plus an ejection_cause="ci_failure" field, allowing recipe on_result routing to send CI failures directly to diagnose_ci instead of queue_ejected_fix.
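A minimal sketch of the Gap 2 enrichment, assuming a simplified result builder. The names make_result and classify_confirmed_ejection and the reason strings are hypothetical stand-ins, not the actual merge_queue.py code; only the pr_state values and the conditional ejection_cause field come from the plan above.

```python
from typing import Any, Dict, Optional


def make_result(
    success: bool,
    pr_state: str,
    reason: str,
    stall_retries_attempted: int = 0,
    ejection_cause: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the return dict; ejection_cause is added only when a cause is known."""
    result: Dict[str, Any] = {
        "success": success,
        "pr_state": pr_state,
        "reason": reason,
        "stall_retries_attempted": stall_retries_attempted,
    }
    if ejection_cause is not None:
        # Conditional field: absent (not null) for a plain ejection.
        result["ejection_cause"] = ejection_cause
    return result


def classify_confirmed_ejection(checks_state: Optional[str]) -> Dict[str, Any]:
    """Hypothetical helper: call only after the ejection has been confirmed."""
    if checks_state == "FAILURE":
        return make_result(
            False,
            "ejected_ci_failure",
            "PR left the queue with failing checks",  # illustrative wording
            ejection_cause="ci_failure",
        )
    # Any other value (None, "ERROR", "SUCCESS", ...) stays a plain ejection.
    return make_result(False, "ejected", "PR left the queue")
```

Because the check is an exact comparison against "FAILURE", every other checks_state value keeps the existing "ejected" behaviour, so existing on_result routes are unaffected.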

Gap 5: server/tools_ci.py infers head_sha from git rev-parse HEAD but never includes it in the JSON response. Recipe orchestrators cannot verify that CI results correspond to the current HEAD after a force-push. The fix: include head_sha in the wait_for_ci return dict when it was resolved.
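The Gap 5 change can be sketched as follows. This is an illustration under assumptions, not the tools_ci.py handler: resolve_head_sha, with_head_sha and result_matches_head are hypothetical names, and the CI result is modelled as a plain dict.

```python
import subprocess
from typing import Any, Dict, Optional


def resolve_head_sha(cwd: str) -> Optional[str]:
    """Return the SHA of HEAD in cwd, or None when git cannot resolve it."""
    try:
        proc = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=cwd,
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # git is missing or cwd does not exist.
        return None
    sha = proc.stdout.strip()
    return sha if proc.returncode == 0 and sha else None


def with_head_sha(ci_result: Dict[str, Any], head_sha: Optional[str]) -> Dict[str, Any]:
    """Copy the CI result and add head_sha only when it was resolved."""
    merged = dict(ci_result)
    if head_sha:
        merged["head_sha"] = head_sha
    return merged


def result_matches_head(ci_result: Dict[str, Any], current_head: str) -> bool:
    """Orchestrator-side check: was CI run against the commit we just pushed?"""
    return ci_result.get("head_sha") == current_head
```

Copying the watcher result before adding the field keeps run_id, conclusion and failed_jobs intact whether or not the SHA was resolved.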

Group 2: Implementation Plan: Queue Ejection Loop Fix — PART B ONLY

This part addresses the recipe and skill layer of the queue ejection loop fix (Gaps 1, 3, 4, 6 from issue #627). Part A (code layer) must be implemented first — this part routes on pr_state="ejected_ci_failure" which Part A introduces.

Gap 1: re_push_queue_fix routes directly to reenter_merge_queue after force-push, bypassing CI. Fix: insert a new ci_watch_post_queue_fix step between re_push_queue_fix and reenter_merge_queue, mirroring the existing ci_watch step.

Gap 6: wait_for_queue routes all ejected states to queue_ejected_fix (conflict resolution), even when the ejection was caused by a CI failure that conflict resolution cannot fix. Fix: add an ejected_ci_failure route before ejected in wait_for_queue.on_result, routing to diagnose_ci instead.
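The intended routing can be modelled in a few lines of Python. This is only a sketch: the real routes live in the recipe YAML, the route table below is an assumed flattening of on_result, and route and ci_failure_precedes_ejected are hypothetical helpers.

```python
from typing import List, Optional, Tuple

# Ordered (pr_state, target step) pairs; the first match wins.
WAIT_FOR_QUEUE_ROUTES: List[Tuple[str, str]] = [
    ("merged", "release_issue_success"),
    ("ejected_ci_failure", "diagnose_ci"),  # new route, listed before "ejected"
    ("ejected", "queue_ejected_fix"),
    ("stalled", "reenroll_stalled_pr"),
    ("timeout", "release_issue_timeout"),
]


def route(pr_state: str, routes: List[Tuple[str, str]] = WAIT_FOR_QUEUE_ROUTES) -> Optional[str]:
    """Return the target step for pr_state, or None when no route matches."""
    for expected, target in routes:
        if pr_state == expected:
            return target
    return None


def ci_failure_precedes_ejected(routes: List[Tuple[str, str]]) -> bool:
    """Structural invariant: the CI-failure route is declared before the plain one."""
    states = [state for state, _ in routes]
    return states.index("ejected_ci_failure") < states.index("ejected")
```

A check like ci_failure_precedes_ejected is the kind of ordering invariant a recipe regression test could assert for each of the three recipes.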

Gap 3: resolve-merge-conflicts SKILL.md runs only pre-commit run --all-files post-rebase. Fix: add Step 5a — language-detected manifest validation using fast non-compiling checks.

Gap 4 — Even a clean rebase can produce duplicate keys when both branches independently added the same dependency. Fix: add Step 5b — targeted duplicate key scan in TOML/JSON manifest files.
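For JSON manifests, a duplicate key scan of the kind Step 5b describes can be sketched with the standard library's object_pairs_hook. The function name and the sample manifest are illustrative assumptions; TOML manifests are not covered here.

```python
import json
from typing import Any, List, Tuple


def find_duplicate_json_keys(text: str) -> List[str]:
    """Return every key that appears more than once within the same JSON object."""
    duplicates: List[str] = []

    def record_duplicates(pairs: List[Tuple[str, Any]]) -> dict:
        # json calls this hook once per object with the pairs in document order,
        # before duplicates are collapsed into a dict.
        seen = set()
        for key, _value in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)

    json.loads(text, object_pairs_hook=record_duplicates)
    return duplicates


# Both branches added the same dependency; the rebase applied cleanly.
manifest = '{"dependencies": {"left-pad": "1.0.0", "left-pad": "1.3.0"}}'
print(find_duplicate_json_keys(manifest))
```

Plain json.loads accepts this manifest and silently keeps the last value, which is why a clean rebase plus a passing parse is not enough to rule out the duplicate.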

Applied to: recipes/implementation.yaml, recipes/remediation.yaml, recipes/implementation-groups.yaml, skills_extended/resolve-merge-conflicts/SKILL.md.

Architecture Impact

Process Flow Diagram

%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;

    %% TERMINALS %%
    START([wait_for_queue\nrecipe step])
    END_OK([release_issue_success])
    END_FAIL([release_issue_failure])
    END_TIMEOUT([release_issue_timeout])
    END_DIAG([diagnose_ci])

    subgraph MQPoll ["● Merge Queue Watcher (merge_queue.py)"]
        direction TB
        POLL["poll GitHub GraphQL\n━━━━━━━━━━\nPR state + queue state\n+ checks_state"]
        MERGED{"merged?"}
        CI_FAIL{"● checks_state\n== 'FAILURE'?"}
        CONFIRM["confirmation window\n━━━━━━━━━━\nnot_in_queue_cycles++"]
        CONFIRMED{"cycles ≥ threshold?"}
        STALL{"stall retries\nexhausted?"}
        TIMEOUT{"deadline\nexceeded?"}
    end

    subgraph EjectRoute ["● Recipe Ejection Routing (implementation.yaml)"]
        direction TB
        ROUTE{"● pr_state?"}
        REENROLL["reenroll_stalled_pr\n━━━━━━━━━━\ntoggle_auto_merge tool"]
    end

    subgraph ConflictFix ["● Conflict Fix Sub-Flow (implementation.yaml)"]
        direction TB
        QFIX["queue_ejected_fix\n━━━━━━━━━━\nresolve-merge-conflicts skill"]
        ESC{"escalation_required?"}
        REPUSH["re_push_queue_fix\n━━━━━━━━━━\npush_to_remote force=true"]
        CI_WATCH["● ci_watch_post_queue_fix\n━━━━━━━━━━\nwait_for_ci tool\ntimeout=300s"]
        CI_PASS{"CI pass?"}
        DETECT["detect_ci_conflict\n━━━━━━━━━━\ndiagnose-ci skill"]
        REENTER["reenter_merge_queue\n━━━━━━━━━━\ngh pr merge --squash --auto"]
    end

    subgraph WFCITool ["● wait_for_ci tool handler (tools_ci.py)"]
        direction LR
        INFER["infer head_sha\n━━━━━━━━━━\ngit rev-parse HEAD"]
        CIWAIT["ci_watcher.wait(scope)"]
        ENRICH["● result includes head_sha\n━━━━━━━━━━\nverifies SHA matches HEAD\nafter force-push"]
    end

    %% MAIN FLOW %%
    START --> POLL
    POLL --> MERGED
    MERGED -->|"yes"| END_OK
    MERGED -->|"no"| CONFIRM
    CONFIRM --> CONFIRMED
    CONFIRMED -->|"no"| STALL
    CONFIRMED -->|"yes (not in queue)"| CI_FAIL
    STALL -->|"yes"| END_TIMEOUT
    STALL -->|"no"| TIMEOUT
    TIMEOUT -->|"yes"| END_TIMEOUT
    TIMEOUT -->|"no"| POLL

    CI_FAIL -->|"yes"| ROUTE
    CI_FAIL -->|"no"| ROUTE

    ROUTE -->|"ejected_ci_failure\n(● new route)"| END_DIAG
    ROUTE -->|"ejected"| QFIX
    ROUTE -->|"stalled"| REENROLL
    ROUTE -->|"timeout"| END_TIMEOUT
    REENROLL -->|"success"| START
    REENROLL -->|"failure"| END_FAIL

    QFIX --> ESC
    ESC -->|"true"| END_FAIL
    ESC -->|"false"| REPUSH
    REPUSH -->|"failure"| END_FAIL
    REPUSH -->|"success"| CI_WATCH

    CI_WATCH --> INFER --> CIWAIT --> ENRICH
    ENRICH --> CI_PASS
    CI_PASS -->|"failure"| DETECT
    CI_PASS -->|"success"| REENTER
    DETECT --> END_FAIL
    REENTER -->|"success"| START
    REENTER -->|"failure"| END_FAIL

    %% CLASS ASSIGNMENTS %%
    class START terminal;
    class END_OK,END_FAIL,END_TIMEOUT,END_DIAG terminal;
    class POLL,CONFIRM handler;
    class MERGED,CONFIRMED,STALL,TIMEOUT stateNode;
    class CI_FAIL,ROUTE,ESC,CI_PASS detector;
    class QFIX,REPUSH,REENTER handler;
    class REENROLL,DETECT handler;
    class CI_WATCH,INFER,CIWAIT,ENRICH newComponent;

State Lifecycle Diagram

%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    subgraph MQResult ["wait_for_merge_queue Return Dict (merge_queue.py)"]
        direction TB
        PS["● pr_state : str\n━━━━━━━━━━\nmerged | ejected\nejected_ci_failure | stalled\ntimeout | error\n(bare literals, no StrEnum)"]
        SUC["success : bool\n━━━━━━━━━━\ntrue only for 'merged'"]
        REASON["reason : str\n━━━━━━━━━━\nhuman-readable\nalways present"]
        STALL["stall_retries_attempted : int\n━━━━━━━━━━\nalways present\nexcept 'error' path"]
        EC["● ejection_cause : str\n━━━━━━━━━━\n'ci_failure' only\nwhen pr_state==ejected_ci_failure\nCONDITIONAL FIELD"]
    end

    subgraph InternalPoll ["PRFetchState — Internal Polling State (not returned)"]
        direction LR
        CHECKS["checks_state : str|None\n━━━━━━━━━━\nGitHub StatusCheckRollup\nNone = no checks configured"]
        INQUEUE["in_queue : bool\n━━━━━━━━━━\nPR in mergeQueue.entries"]
        QSTATE["queue_state : str|None\n━━━━━━━━━━\nUNMERGEABLE | AWAITING_CHECKS\n| LOCKED | null"]
    end

    subgraph Gate1 ["● Ejection Decision Gate (merge_queue.py)"]
        direction TB
        CFAIL{"checks_state\n== 'FAILURE'?"}
        SET_ECI["● set pr_state='ejected_ci_failure'\n━━━━━━━━━━\nejection_cause='ci_failure'\nINJECTED into result"]
        SET_EJ["set pr_state='ejected'\n━━━━━━━━━━\nno ejection_cause field\n(absent, not null)"]
    end

    subgraph CIScope ["CIRunScope — Frozen Input Scope (core/types)"]
        direction LR
        WF["workflow : str|None\n━━━━━━━━━━\ne.g. 'tests.yml'"]
        HS["● head_sha : str|None\n━━━━━━━━━━\ngit rev-parse HEAD\nor caller-supplied"]
    end

    subgraph CIResult ["● wait_for_ci Return Dict (tools_ci.py)"]
        direction TB
        RUNID["run_id : int|None\n━━━━━━━━━━\nGitHub Actions run ID"]
        CONC["conclusion : str\n━━━━━━━━━━\nsuccess|failure|cancelled\naction_required|timed_out\nno_runs|error|unknown"]
        FJOBS["failed_jobs : list\n━━━━━━━━━━\nalways present\nempty on billing errors"]
        HSHA["● head_sha : str\n━━━━━━━━━━\nCONDITIONAL: present only\nwhen scope.head_sha truthy\ninjected by tool layer"]
    end

    subgraph ConsumerGate ["Recipe Routing Gate (on_result)"]
        direction TB
        ROUTE{"pr_state value?"}
        R1["ejected_ci_failure\n→ diagnose_ci"]
        R2["ejected\n→ queue_ejected_fix"]
        R3["merged|stalled|timeout\n→ other routes"]
    end

    %% FLOW %%
    CHECKS --> CFAIL
    INQUEUE --> CFAIL
    QSTATE --> CFAIL
    CFAIL -->|"FAILURE"| SET_ECI
    CFAIL -->|"other"| SET_EJ
    SET_ECI --> PS
    SET_ECI --> EC
    SET_EJ --> PS
    PS --> SUC
    PS --> REASON
    PS --> STALL

    HS --> CIResult
    WF --> CIResult
    RUNID --> CONC
    CONC --> FJOBS
    FJOBS --> HSHA

    PS --> ROUTE
    EC --> ROUTE
    ROUTE --> R1
    ROUTE --> R2
    ROUTE --> R3

    HSHA -.->|"verifies HEAD\nafter force-push"| R2

    %% CLASS ASSIGNMENTS %%
    class PS,EC,HSHA,SET_ECI,HS,CFAIL gap;
    class SUC,REASON,STALL,RUNID,CONC,FJOBS output;
    class CHECKS,INQUEUE,QSTATE,WF stateNode;
    class SET_EJ handler;
    class ROUTE,R1,R2,R3 detector;
    class InternalPoll phase;

Error/Resilience Diagram

%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 50, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;

    END_OK([release_issue_success])
    END_FAIL([release_issue_failure\n━━━━━━━━━━\nhuman escalation\nclone preserved])
    END_DIAG([diagnose_ci])

    subgraph MQLoop ["● Merge Queue Poll Loop (merge_queue.py)"]
        direction TB
        POLL["GraphQL fetch\n━━━━━━━━━━\nPR + queue state"]
        POLL_ERR{"Exception\ncaught?"}
        TIMEOUT_CHK{"deadline\nexceeded?"}
        STALL_CHK{"stall retries\n≥ max (3)?"}
    end

    subgraph EjectGate ["● Ejection Classification Gate (merge_queue.py)"]
        direction TB
        EJECT_DECISION{"● checks_state\n== 'FAILURE'?"}
        CI_EJ["● ejected_ci_failure\n━━━━━━━━━━\nejection_cause=ci_failure\nskips conflict resolution"]
        CONF_EJ["ejected\n━━━━━━━━━━\nno cause field\nconflict resolution"]
    end

    subgraph StallBreaker ["Stall Circuit Breaker (merge_queue.py)"]
        direction LR
        TOGGLE["_toggle_auto_merge\n━━━━━━━━━━\ndisable → 2s → re-enable\nbackoff: 30/60/120s"]
        TOGGLE_ERR{"Exception\ncaught?"}
    end

    subgraph ConflictPath ["Conflict Resolution Path (implementation.yaml)"]
        direction TB
        QFIX["queue_ejected_fix\n━━━━━━━━━━\nresolve-merge-conflicts skill"]
        ESC_CHK{"escalation\nrequired?"}
        REPUSH["re_push_queue_fix\n━━━━━━━━━━\nforce-push"]
        REPUSH_FAIL{"push\nfailed?"}
    end

    subgraph CIGate ["● CI Gate After Re-Push (implementation.yaml + tools_ci.py)"]
        direction TB
        CI_WATCH["● ci_watch_post_queue_fix\n━━━━━━━━━━\nwait_for_ci, timeout=300s\nincludes head_sha"]
        CI_CONC{"conclusion\n== success?"}
        DETECT["detect_ci_conflict\n━━━━━━━━━━\ngit merge-base check\n(stale base?)"]
        DETECT_CHK{"stale\nbase?"}
        CI_CF["ci_conflict_fix\n━━━━━━━━━━\nresolve-merge-conflicts"]
    end

    subgraph ManifestGates ["● Post-Rebase Manifest Validation (SKILL.md)"]
        direction TB
        STEP5A["● Step 5a: manifest validity\n━━━━━━━━━━\ncargo metadata / node JSON.parse\nuv lock --check / tomllib"]
        STEP5A_CHK{"manifest\nvalid?"}
        STEP5B["● Step 5b: duplicate key scan\n━━━━━━━━━━\nTOML dep sections\nJSON object_pairs_hook"]
        STEP5B_CHK{"duplicates\nfound?"}
        REBASE_ABORT["git rebase --abort\n━━━━━━━━━━\nescalation_required=true"]
    end

    %% POLL LOOP FLOW %%
    POLL --> POLL_ERR
    POLL_ERR -->|"yes: log + retry"| POLL
    POLL_ERR -->|"no"| TIMEOUT_CHK
    TIMEOUT_CHK -->|"yes"| END_FAIL
    TIMEOUT_CHK -->|"no"| STALL_CHK
    STALL_CHK -->|"yes: stalled"| END_FAIL
    STALL_CHK -->|"no: stall attempt"| TOGGLE
    TOGGLE --> TOGGLE_ERR
    TOGGLE_ERR -->|"yes: log + increment"| STALL_CHK
    TOGGLE_ERR -->|"no: success"| POLL

    %% EJECTION GATE %%
    STALL_CHK -->|"ejection confirmed"| EJECT_DECISION
    EJECT_DECISION -->|"FAILURE"| CI_EJ
    EJECT_DECISION -->|"other"| CONF_EJ
    CI_EJ --> END_DIAG
    CONF_EJ --> QFIX

    %% CONFLICT PATH %%
    QFIX --> STEP5A
    STEP5A --> STEP5A_CHK
    STEP5A_CHK -->|"invalid"| REBASE_ABORT
    STEP5A_CHK -->|"valid"| STEP5B
    STEP5B --> STEP5B_CHK
    STEP5B_CHK -->|"duplicates"| REBASE_ABORT
    STEP5B_CHK -->|"clean"| ESC_CHK
    REBASE_ABORT --> ESC_CHK
    ESC_CHK -->|"true"| END_FAIL
    ESC_CHK -->|"false"| REPUSH
    REPUSH --> REPUSH_FAIL
    REPUSH_FAIL -->|"yes"| END_FAIL
    REPUSH_FAIL -->|"no"| CI_WATCH

    %% CI GATE %%
    CI_WATCH --> CI_CONC
    CI_CONC -->|"yes"| END_OK
    CI_CONC -->|"no"| DETECT
    DETECT --> DETECT_CHK
    DETECT_CHK -->|"yes: stale base"| CI_CF
    DETECT_CHK -->|"no: code failure"| END_DIAG
    CI_CF --> ESC_CHK

    %% CLASS ASSIGNMENTS %%
    class END_OK,END_FAIL,END_DIAG terminal;
    class POLL,TOGGLE handler;
    class POLL_ERR,TOGGLE_ERR,TIMEOUT_CHK,STALL_CHK gap;
    class EJECT_DECISION,CI_CONC,DETECT_CHK,STEP5A_CHK,STEP5B_CHK,ESC_CHK,REPUSH_FAIL detector;
    class CI_EJ,CONF_EJ,REBASE_ABORT output;
    class QFIX,REPUSH,CI_WATCH,DETECT,CI_CF handler;
    class STEP5A,STEP5B phase;

Closes #627

Implementation Plan

Plan files:

  • /home/talon/projects/autoskillit-runs/impl-20260405-122055-152199/.autoskillit/temp/make-plan/queue_ejection_loop_plan_2026-04-05_122055_part_a.md
  • /home/talon/projects/autoskillit-runs/impl-20260405-122055-152199/.autoskillit/temp/make-plan/queue_ejection_loop_plan_2026-04-05_122055_part_b.md

🤖 Generated with Claude Code via AutoSkillit

Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|---|---|---|---|---|---|---|
| plan | 37 | 31.7k | 1.9M | 113.2k | 1 | 11m 19s |
| review | 3.4k | 5.6k | 147.3k | 41.5k | 1 | 5m 45s |
| verify | 44 | 35.4k | 1.9M | 144.8k | 2 | 11m 15s |
| implement | 100 | 33.5k | 4.6M | 123.5k | 2 | 12m 17s |
| audit_impl | 15 | 14.0k | 279.5k | 44.2k | 1 | 3m 46s |
| open_pr | 33 | 30.5k | 1.2M | 68.1k | 1 | 10m 58s |
| Total | 3.6k | 150.8k | 9.9M | 535.3k | | 55m 23s |

Trecek and others added 2 commits April 5, 2026 12:47
… in wait_for_ci

Gap 2: DefaultMergeQueueWatcher._make_result now accepts ejection_cause; returns
pr_state='ejected_ci_failure' + ejection_cause='ci_failure' when checks_state=='FAILURE'
at both the CLOSED terminal and the confirmed-not-in-queue ejection path.

Gap 5: wait_for_ci tool handler now includes head_sha in the returned JSON when
git rev-parse HEAD succeeds, enabling orchestrators to verify CI results correspond
to the current HEAD after a force-push.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…, 3, 4, 6)

- Insert ci_watch_post_queue_fix step in implementation.yaml, remediation.yaml,
  and implementation-groups.yaml between re_push_queue_fix and reenter_merge_queue
  to validate CI on the rebased feature branch before re-entering the merge queue
- Add ejected_ci_failure route in wait_for_queue.on_result (before ejected) in all
  three recipes to route CI-failure ejections to diagnose_ci instead of the
  conflict-resolution path
- Add Steps 5a and 5b to resolve-merge-conflicts/SKILL.md: language-aware manifest
  validation and duplicate key scanning after pre-commit, with escalation on failure
- Update clean-rebase success routing in Step 2 to go through Step 5a instead of
  skipping directly to Step 7
- Add regression tests for all structural invariants across all three recipe fixtures

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@Trecek Trecek left a comment

AutoSkillit PR Review — Verdict: changes_requested

8 actionable findings (warning severity) require fixes before merge.

Comment thread src/autoskillit/execution/merge_queue.py Outdated
Comment thread src/autoskillit/execution/merge_queue.py Outdated
cwd=cwd,
)

# Include head_sha used for this CI check so orchestrators can verify

[info] slop: Two-line comment restates what the code does. The second sentence duplicates the first. A single concise comment is sufficient.


# Include head_sha used for this CI check so orchestrators can verify
# CI results correspond to the current HEAD after a force-push.
if scope.head_sha:

[info] defense: scope.head_sha type contract is not visible here. If the attribute is Optional[str], the truthiness guard is correct. If it can be an empty string sentinel, document whether absence means 'not applicable' vs 'could not determine'.

assert "Invalid repo format" in result["reason"]


class TestEjectionEnrichment:

[info] tests: No test covers checks_state='ERROR' at ejection. The production code only checks == 'FAILURE'; any other non-None value falls through to plain 'ejected'. A test for ERROR would prevent regressions if the condition is accidentally widened.

Comment thread tests/recipe/test_merge_prs_queue.py
]
assert ci_failure_routes, (
f"wait_for_queue.on_result must route ejected_ci_failure to diagnose_ci"
f" in {recipe_fixture}"

[warning] tests: The ejected_ci_failure condition is matched with 'in' (substring) while the plain ejected condition uses exact string equality. This asymmetry is fragile: minor whitespace variation in the recipe YAML would cause ejected_idx to be None, producing a misleading 'ejected route must still exist' failure rather than an ordering failure.

Investigated — this is intentional. The actual YAML condition string is '${{ result.pr_state }} == ejected_ci_failure' (a full template expression). Substring match ('ejected_ci_failure' in w) is the only correct approach to match this full expression. The ejected condition uses exact equality because its full expression '${{ result.pr_state }} == ejected' can be matched exactly and is unambiguous. The asymmetry is deliberate: substring is needed for ejected_ci_failure because it appears inside a longer expression string, not as a standalone value.

assert "subtype" not in result


# ---------------------------------------------------------------------------

[info] tests: The assertion only verifies head_sha is present and correct. It does not assert that the CI watcher result fields (run_id, conclusion, failed_jobs) are also present in the merged output. A bug where the merge returns only {head_sha: ...} would not be caught.

SubprocessResult(
returncode=0,
stdout="deadbeef1234\n",
stderr="",

[info] tests: The test checks that head_sha and exit_code are absent on git failure, but does not verify the CI watcher result fields are still present. If the implementation accidentally returns an empty dict on git failure, this test would still pass.

on_failure: release_issue_failure
skip_when_false: "inputs.open_pr"

ci_watch_post_queue_fix:

[info] cohesion: ci_watch_post_queue_fix in remediation.yaml is byte-for-byte identical to implementation.yaml, including skip_when_false: 'inputs.open_pr'. The remediation flow operates on already-open PRs — confirm whether this skip_when_false guard is semantically correct for the remediation recipe context.

@Trecek Trecek left a comment

AutoSkillit review found 8 blocking issues (verdict: changes_requested). See inline comments for details.

@Trecek Trecek added this pull request to the merge queue Apr 5, 2026
Merged via the queue into integration with commit 03ff0c7 Apr 5, 2026
2 checks passed
@Trecek Trecek deleted the queue-ejection-loop-missing-ci-gate-blind-ejection-routing-a/627 branch April 5, 2026 20:58
github-merge-queue bot pushed a commit that referenced this pull request Apr 13, 2026
## Summary

`DefaultMergeQueueWatcher.wait` in
`src/autoskillit/execution/merge_queue.py:164-270` classifies PR states
**by elimination**: the final `return _make_result(False, "ejected",
...)` at line 260 fires whenever no other gate (merged, closed, stalled,
pending, CI-failed) has matched. Nothing in the pipeline requires a
**positive signal** of ejection, and three separate recurrences (PRs
#422, #519, #628, now #802) have each patched one new signal into the
elimination tree without eliminating the elimination.

Part A delivers architectural immunity at three layers:

1. **Data layer immunity** — `_QUERY` selects a new `mergeable` field;
`PRFetchState` gains `mergeable` and `auto_merge_present`; a
module-level `RuntimeError` (mirroring `recipe/io.py:126-161`) asserts
`_QUERY` selections match `PRFetchState.__required_keys__` at import
time. Adding a field to one without the other fails at every import,
every test run, every server startup.
2. **Classifier layer immunity** — Extract a pure
`_classify_pr_state(state) -> ClassificationResult` function from
`wait()`. The classifier uses **positive-signal gates only**: every
terminal return originates from a direct positive signal. No
fall-through to `EJECTED`. When no positive signal matches, the
classifier raises `ClassifierInconclusive`, which `wait()` handles by
continuing to poll within a bounded retry budget. `DROPPED_HEALTHY` is
introduced as a positive-signal terminal for "PR is healthy but
auto_merge was cleared."
3. **Type system immunity** — Introduce `PRState(StrEnum)` in
`_type_enums.py` with every terminal value. Replace free-form strings in
`_make_result`, the classifier, and tests. Wire into protocol
docstrings.
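A minimal sketch of layers 2 and 3, assuming simplified gates. The gate conditions, their order, and the names `FetchState` and `classify_pr_state` are illustrative assumptions rather than the shipped `_classify_pr_state`; the enum values and snapshot fields follow the description above. The `StrEnum` fallback only exists so the sketch also runs on Python older than 3.11.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

try:
    from enum import StrEnum
except ImportError:  # Python < 3.11: approximate StrEnum for this sketch
    class StrEnum(str, Enum):
        pass


class PRState(StrEnum):
    MERGED = "merged"
    EJECTED = "ejected"
    EJECTED_CI_FAILURE = "ejected_ci_failure"
    STALLED = "stalled"
    DROPPED_HEALTHY = "dropped_healthy"
    TIMEOUT = "timeout"
    ERROR = "error"


@dataclass(frozen=True)
class FetchState:  # hypothetical stand-in for PRFetchState
    merged: bool
    state: str
    mergeable: str
    in_queue: bool
    queue_state: Optional[str]
    checks_state: Optional[str]
    auto_merge_present: bool


@dataclass(frozen=True)
class ClassificationResult:
    terminal: PRState
    reason: str


class ClassifierInconclusive(Exception):
    """No positive signal matched; the caller keeps polling within its budget."""


def classify_pr_state(s: FetchState) -> ClassificationResult:
    """Every return is backed by a positive signal; there is no default branch."""
    if s.merged:
        return ClassificationResult(PRState.MERGED, "PR reports merged")
    if s.in_queue:
        raise ClassifierInconclusive("still queued, nothing terminal to report")
    if s.checks_state == "FAILURE":
        return ClassificationResult(PRState.EJECTED_CI_FAILURE, "checks failed")
    if s.state == "CLOSED" or s.mergeable == "CONFLICTING":
        return ClassificationResult(PRState.EJECTED, "closed or conflicting")
    if s.state == "OPEN" and s.mergeable == "MERGEABLE" and not s.auto_merge_present:
        return ClassificationResult(PRState.DROPPED_HEALTHY, "auto_merge cleared")
    raise ClassifierInconclusive("no positive signal matched")
```

In this sketch `stalled`, `timeout` and `error` are never produced by the classifier; they are assumed to come from the polling loop's own counters and exception handling.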

Part A also ships the minimal recipe routing to make the fix usable: a
new `DROPPED_HEALTHY → reenter_merge_queue_cheap` arm on all three
recipe YAMLs, plus the new `reenter_merge_queue_cheap` step itself. Full
cross-recipe parity enforcement is deferred to the follow-up part.

## Architecture Impact

### State Lifecycle Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 65, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;

    %% ENTRY %%
    ENTRY(["Tool call:<br/>wait_for_merge_queue"])

    subgraph Gates ["VALIDATION GATES — ordered"]
        direction TB
        G0["Import-time Schema Guard<br/>━━━━━━━━━━<br/>Key-set parity: PRFetchState ↔ _QUERY_FIELD_MAP<br/>GraphQL coverage: every field in _QUERY<br/>Raises RuntimeError — module won't load"]
        G1["● _require_enabled()<br/>━━━━━━━━━━<br/>tools_ci.py gate<br/>Short-circuits ALL tools if plugin disabled<br/>Returns structured error string"]
        G2["Runtime Input Guard<br/>━━━━━━━━━━<br/>repo non-empty AND contains '/'<br/>Returns {success:false, pr_state:'error'}"]
        G3["Watcher Presence Guard<br/>━━━━━━━━━━<br/>merge_queue_watcher is not None<br/>Returns {pr_state:'error', reason:'not configured'}"]
    end

    subgraph SnapshotLayer ["INIT_ONLY — PRFetchState (per-cycle snapshot)"]
        direction LR
        F1["● merged: bool<br/>━━━━━━━━━━<br/>INIT_ONLY"]
        F2["● state: str<br/>━━━━━━━━━━<br/>INIT_ONLY"]
        F3["● mergeable: str<br/>━━━━━━━━━━<br/>INIT_ONLY"]
        F4["● in_queue: bool<br/>━━━━━━━━━━<br/>INIT_ONLY"]
        F5["● queue_state: str|None<br/>━━━━━━━━━━<br/>INIT_ONLY"]
        F6["● checks_state: str|None<br/>━━━━━━━━━━<br/>INIT_ONLY"]
        F7["● auto_merge_present: bool<br/>━━━━━━━━━━<br/>INIT_ONLY"]
    end

    subgraph PollLoop ["MUTABLE — Loop-scoped counters (wait() local vars)"]
        direction LR
        C1["● stall_retries_attempted<br/>━━━━━━━━━━<br/>+1 after each toggle attempt<br/>Never decremented<br/>Budget ceiling: max_stall_retries"]
        C2["● not_in_queue_cycles<br/>━━━━━━━━━━<br/>+1 when in_queue=False<br/>Reset→0 on re-entry OR post-toggle<br/>Confirmation window gate"]
        C3["● inconclusive_count<br/>━━━━━━━━━━<br/>+1 only after confirmation window met<br/>Never decremented<br/>Ceiling: _max_inconclusive_retries"]
    end

    subgraph ClassResult ["INIT_ONLY — ClassificationResult (frozen=True)"]
        direction LR
        CR1["● terminal: PRState<br/>━━━━━━━━━━<br/>frozen=True<br/>structurally immutable"]
        CR2["● reason: str<br/>━━━━━━━━━━<br/>frozen=True<br/>structurally immutable"]
    end

    subgraph PRStateEnum ["● PRState Enum — all 7 terminal values"]
        direction LR
        PS1["merged<br/>━━━━━━━━━━<br/>positive signal required"]
        PS2["ejected<br/>━━━━━━━━━━<br/>state=CLOSED or<br/>mergeable=CONFLICTING"]
        PS3["ejected_ci_failure<br/>━━━━━━━━━━<br/>CI checks failed<br/>sub-classification of ejected"]
        PS4["stalled<br/>━━━━━━━━━━<br/>stall retries exhausted"]
        PS5["dropped_healthy<br/>━━━━━━━━━━<br/>auto_merge cleared,<br/>no structural cause"]
        PS6["timeout<br/>━━━━━━━━━━<br/>deadline exceeded"]
        PS7["error<br/>━━━━━━━━━━<br/>unexpected exception<br/>or config missing"]
    end

    subgraph RecipeRules ["● Recipe Semantic Rules — rules_ci.py (scoped)"]
        direction TB
        RR1["● Rule 5: wait-for-merge-queue-routing-covers-all-pr-states<br/>━━━━━━━━━━<br/>ERROR: missing explicit arm for any non-ERROR PRState<br/>SCOPED: only recipes with 'release_issue_timeout' step"]
        RR2["● Rule 6: wait-for-merge-queue-routing-conforms-to-expected-targets<br/>━━━━━━━━━━<br/>ERROR: fallback/on_failure must route → release_issue_timeout<br/>SCOPED: same scope guard as Rule 5"]
        SCOPE["● _recipe_uses_release_issue_timeout()<br/>━━━━━━━━━━<br/>New scope guard — early-exit if recipe<br/>does NOT define 'release_issue_timeout' step<br/>Prevents false positives on merge-prs.yaml family"]
    end

    subgraph RecipeRouting ["DERIVED — Recipe Routing on PRState (remediation / implementation)"]
        direction TB
        RR_merged["pr_state=merged<br/>→ release_issue_success"]
        RR_ejci["pr_state=ejected_ci_failure<br/>→ diagnose_ci"]
        RR_drop["pr_state=dropped_healthy<br/>→ reenter_merge_queue_cheap"]
        RR_ej["pr_state=ejected<br/>→ queue_ejected_fix"]
        RR_stall["pr_state=stalled<br/>→ reenroll_stalled_pr"]
        RR_to["pr_state=timeout<br/>→ release_issue_timeout"]
        RR_fb["(fallback / on_failure)<br/>→ release_issue_timeout"]
    end

    subgraph NoResume ["RESUME DETECTION — None"]
        NR["No checkpointing<br/>━━━━━━━━━━<br/>All loop counters are process-local<br/>Reset to zero on any restart<br/>Stall exhaustion NOT sticky across restarts<br/>Fresh timeout budget on every invocation"]
    end

    %% FLOW %%
    ENTRY --> G0
    G0 --> G1
    G1 --> G2
    G2 --> G3
    G3 --> SnapshotLayer
    SnapshotLayer --> PollLoop
    PollLoop --> SnapshotLayer
    PollLoop --> ClassResult
    ClassResult --> PRStateEnum
    PRStateEnum --> RecipeRouting

    SCOPE --> RR1
    SCOPE --> RR2
    RR1 -.->|"validates"| RecipeRouting
    RR2 -.->|"validates"| RecipeRouting

    G3 -.->|"no resume path"| NoResume

    %% CLASS ASSIGNMENTS %%
    class ENTRY terminal;
    class G0 detector;
    class G1,G2,G3 detector;
    class F1,F2,F3,F4,F5,F6,F7 stateNode;
    class C1,C2,C3 phase;
    class CR1,CR2 stateNode;
    class PS1,PS2,PS3,PS4,PS5,PS6,PS7 handler;
    class RR1,RR2 detector;
    class SCOPE newComponent;
    class RR_merged,RR_ejci,RR_drop,RR_ej,RR_stall output;
    class RR_to,RR_fb gap;
    class NR cli;
```

### Process Flow Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 44, 'rankSpacing': 54, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef terminal  fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler   fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase     fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef detector  fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef output    fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;

    %% ── TERMINALS ──────────────────────────────────────────────────────── %%
    START([START: recipe reaches wait_for_queue step])
    DONE([DONE: recipe proceeds to terminal step])

    %% ── RECIPE LAYER ────────────────────────────────────────────────────── %%
    subgraph RecipeLayer ["● Recipe Layer  (remediation / implementation / implementation-groups)"]
        direction TB
        wait_step["● wait_for_queue step<br/>━━━━━━━━━━<br/>tool: wait_for_merge_queue<br/>timeout_seconds: 900<br/>poll_interval: 15"]
        route_mq{"● route on<br/>result.pr_state"}
    end

    %% ── TOOL GATE ───────────────────────────────────────────────────────── %%
    subgraph ToolGate ["● tools_ci.py · wait_for_merge_queue MCP Tool"]
        direction TB
        gate_chk{"_require_enabled()?"}
        gate_err["return gate error msg"]
        watcher_chk{"merge_queue_watcher<br/>configured?"}
        watcher_err["return pr_state=error"]
        delegate["delegate to<br/>DefaultMergeQueueWatcher.wait()"]
    end

    %% ── POLLING LOOP ────────────────────────────────────────────────────── %%
    subgraph PollingLoop ["● merge_queue.py · DefaultMergeQueueWatcher.wait()  — main polling loop"]
        direction TB
        repo_ok{"repo format<br/>valid?"}
        repo_err_exit["return pr_state=error"]
        deadline{"monotonic() &lt;<br/>deadline?"}
        fetch["_fetch_pr_and_queue_state()<br/>━━━━━━━━━━<br/>POST github.com/graphql<br/>reads: merged, state, mergeable,<br/>merge_state_status, auto_merge_*,<br/>in_queue, queue_state, checks_state"]
        fetch_ok{"fetch OK?"}
        sleep_retry["sleep poll_interval<br/>continue loop"]
        in_queue_chk{"state.in_queue<br/>== True?"}
        unmergeable_chk{"queue_state ==<br/>UNMERGEABLE?"}
        reset_cycles["reset not_in_queue_cycles = 0<br/>sleep poll_interval, continue"]
        inc_cycles["not_in_queue_cycles += 1"]
        classify_call["_classify_pr_state(state)<br/>━━━━━━━━━━<br/>pure function — no I/O"]
        inconcl_chk{"ClassifierInconclusive<br/>raised?"}
        win_chk{"not_in_queue_cycles &gt;=<br/>confirmation_cycles (2)?"}
        budget_chk{"inconclusive_count &lt;<br/>max_inconclusive_retries (5)?"}
        inconcl_sleep["inconclusive_count += 1<br/>sleep poll_interval, continue"]
        dispatch{"terminal state?"}
        merged_bypass["MERGED: bypass confirmation<br/>return immediately"]
        closed_bypass["CLOSED (EJECTED / EJECTED_CI_FAILURE):<br/>bypass confirmation<br/>return immediately"]
        timeout_exit["return pr_state=timeout"]
    end

    %% ── CLASSIFIER ──────────────────────────────────────────────────────── %%
    subgraph Classifier ["● _classify_pr_state()  — pure decision tree (no I/O)"]
        direction TB
        c_merged{"state.merged?"}
        c_closed{"state == CLOSED?"}
        c_ci_closed{"checks_state in<br/>FAILURE / ERROR?"}
        c_ci_open{"checks_state in<br/>FAILURE / ERROR?"}
        c_stall["_is_positive_stall()<br/>━━━━━━━━━━<br/>auto_merge_enabled_at != None<br/>AND merge_state_status in<br/>CLEAN / HAS_HOOKS"]
        c_conflict{"mergeable ==<br/>CONFLICTING?"}
        c_dropped["● _is_positive_dropped_healthy()<br/>━━━━━━━━━━<br/>OPEN + MERGEABLE + CLEAN<br/>+ checks OK + no auto_merge<br/>+ not in queue"]
        c_pending{"checks_state in<br/>PENDING / EXPECTED?"}
        c_inconclusive["raise ClassifierInconclusive<br/>reason: no positive signal"]
        c_inconclusive2["raise ClassifierInconclusive<br/>reason: checks still running"]
        c_ret_merged["return MERGED"]
        c_ret_ejci_closed["return EJECTED_CI_FAILURE"]
        c_ret_ej_closed["return EJECTED"]
        c_ret_ejci_open["return EJECTED_CI_FAILURE"]
        c_ret_stall["return STALLED"]
        c_ret_ej_conflict["return EJECTED"]
        c_ret_dropped["● return DROPPED_HEALTHY"]
    end

    %% ── STALL RECOVERY ──────────────────────────────────────────────────── %%
    subgraph StallRecovery ["Stall Recovery Sub-flow"]
        direction TB
        stall_grace{"stall_duration &lt;<br/>stall_grace_period (60 s)?"}
        stall_budget{"stall_retries_attempted &lt;<br/>max_stall_retries (3)?"}
        do_toggle["_toggle_auto_merge()<br/>━━━━━━━━━━<br/>writes: GitHub disable auto_merge<br/>sleep 2 s<br/>writes: GitHub enable auto_merge SQUASH<br/>stall_retries_attempted += 1<br/>reset not_in_queue_cycles = 0<br/>sleep backoff: min(30×2ⁿ, 120 s)"]
        stall_grace_sleep["still in grace period<br/>sleep poll_interval, continue"]
    end

    %% ── RECIPE ROUTING TARGETS ──────────────────────────────────────────── %%
    subgraph RecipeRouting ["● Recipe-level Routing (on_result from wait_for_queue step)"]
        direction LR
        r_merged["release_issue_success<br/>━━━━━━━━━━<br/>→ register_clone_success → done"]
        r_ejci["diagnose_ci<br/>━━━━━━━━━━<br/>→ resolve_ci (retries:2)<br/>→ re_push → ci_watch"]
        r_dropped["● reenter_merge_queue_cheap<br/>━━━━━━━━━━<br/>→ wait_for_queue (loop)"]
        r_ejected["queue_ejected_fix<br/>━━━━━━━━━━<br/>→ re_push → ci_watch<br/>→ reenter_merge_queue<br/>→ wait_for_queue (loop)"]
        r_stall["reenroll_stalled_pr<br/>━━━━━━━━━━<br/>(toggle_auto_merge)<br/>→ wait_for_queue (loop)"]
        r_timeout["release_issue_timeout<br/>━━━━━━━━━━<br/>→ register_clone_success → done"]
        r_default["release_issue_timeout<br/>━━━━━━━━━━<br/>(fallback / ERROR state)"]
    end

    %% ── VALIDATION LAYER (static, not runtime) ──────────────────────────── %%
    subgraph ValidationLayer ["● rules_ci.py · Static Validation (recipe load time, not runtime)"]
        direction LR
        rule5["● wait-for-merge-queue-routing-covers-all-pr-states<br/>━━━━━━━━━━<br/>ERROR if any PRState (excl. ERROR) missing<br/>from on_result when conditions"]
        rule6["● wait-for-merge-queue-routing-conforms-to-expected-targets<br/>━━━━━━━━━━<br/>ERROR if fallback ≠ release_issue_timeout<br/>or on_failure ≠ release_issue_timeout"]
        rule_gate["_recipe_uses_release_issue_timeout()<br/>━━━━━━━━━━<br/>scopes rules to implementation/<br/>remediation recipe family only"]
    end

    %% ── PRSTATE ENUM ────────────────────────────────────────────────────── %%
    subgraph PRStateEnum ["● PRState (StrEnum) — _type_enums.py"]
        direction LR
        ps_merged["MERGED<br/>positive signal"]
        ps_ejected["EJECTED<br/>positive signal"]
        ps_ejci["EJECTED_CI_FAILURE<br/>positive signal"]
        ps_stalled["STALLED<br/>positive signal"]
        ps_dropped["● DROPPED_HEALTHY<br/>positive signal"]
        ps_timeout["TIMEOUT<br/>produced by wait() only"]
        ps_error["ERROR<br/>produced by wait() only"]
    end

    %% ═══════════════════════════════════════════════════════════════════════ %%
    %% CONNECTIONS
    %% ═══════════════════════════════════════════════════════════════════════ %%

    START --> wait_step
    wait_step --> gate_chk

    %% Tool gate
    gate_chk -->|"disabled"| gate_err
    gate_chk -->|"enabled"| watcher_chk
    gate_err --> DONE
    watcher_chk -->|"not configured"| watcher_err
    watcher_err --> DONE
    watcher_chk -->|"configured"| delegate

    %% Polling loop entry
    delegate --> repo_ok
    repo_ok -->|"invalid"| repo_err_exit
    repo_err_exit --> route_mq
    repo_ok -->|"valid"| deadline

    %% Main while loop
    deadline -->|"exceeded"| timeout_exit
    timeout_exit --> route_mq
    deadline -->|"time remaining"| fetch

    %% Fetch result
    fetch --> fetch_ok
    fetch_ok -->|"failed (HTTP/GraphQL error)"| sleep_retry
    sleep_retry -->|"loop back"| deadline
    fetch_ok -->|"success"| in_queue_chk

    %% Queue membership gate
    in_queue_chk -->|"in queue"| unmergeable_chk
    unmergeable_chk -->|"UNMERGEABLE"| closed_bypass
    unmergeable_chk -->|"other queue state"| reset_cycles
    reset_cycles -->|"loop back"| deadline
    in_queue_chk -->|"not in queue"| inc_cycles

    %% Classification
    inc_cycles --> classify_call
    classify_call --> inconcl_chk
    inconcl_chk -->|"inconclusive"| win_chk

    %% Confirmation window — inconclusive path
    win_chk -->|"window not met yet"| sleep_retry
    win_chk -->|"window met"| budget_chk
    budget_chk -->|"budget remaining"| inconcl_sleep
    inconcl_sleep -->|"loop back"| deadline
    budget_chk -->|"budget exhausted"| timeout_exit

    %% Classification result dispatch
    inconcl_chk -->|"MERGED"| merged_bypass
    inconcl_chk -->|"CLOSED (EJECTED/EJECTED_CI_FAILURE)"| closed_bypass
    inconcl_chk -->|"other terminal"| dispatch

    %% Confirmation window — other terminals
    dispatch -->|"not_in_queue_cycles &lt; confirmation_cycles"| sleep_retry
    dispatch -->|"confirmation met: STALLED"| stall_grace
    dispatch -->|"confirmation met: EJECTED_CI_FAILURE"| route_mq
    dispatch -->|"confirmation met: EJECTED"| route_mq
    dispatch -->|"confirmation met: DROPPED_HEALTHY"| route_mq

    %% Stall recovery
    stall_grace -->|"in grace period"| stall_grace_sleep
    stall_grace_sleep -->|"loop back"| deadline
    stall_grace -->|"grace expired"| stall_budget
    stall_budget -->|"retries remaining"| do_toggle
    do_toggle -->|"loop back"| deadline
    stall_budget -->|"retries exhausted"| route_mq

    %% Bypasses feed routing
    merged_bypass --> route_mq
    closed_bypass --> route_mq

    %% Classifier internal edges
    classify_call --> c_merged
    c_merged -->|"yes"| c_ret_merged
    c_merged -->|"no"| c_closed
    c_closed -->|"yes"| c_ci_closed
    c_ci_closed -->|"CI failure"| c_ret_ejci_closed
    c_ci_closed -->|"no CI failure"| c_ret_ej_closed
    c_closed -->|"no"| c_ci_open
    c_ci_open -->|"CI failure"| c_ret_ejci_open
    c_ci_open -->|"no"| c_stall
    c_stall -->|"positive stall"| c_ret_stall
    c_stall -->|"no"| c_conflict
    c_conflict -->|"CONFLICTING"| c_ret_ej_conflict
    c_conflict -->|"no"| c_dropped
    c_dropped -->|"positive dropped"| c_ret_dropped
    c_dropped -->|"no"| c_pending
    c_pending -->|"checks running"| c_inconclusive2
    c_pending -->|"no positive signal"| c_inconclusive

    %% Recipe routing on each PRState
    route_mq -->|"merged"| r_merged
    route_mq -->|"ejected_ci_failure"| r_ejci
    route_mq -->|"dropped_healthy"| r_dropped
    route_mq -->|"ejected"| r_ejected
    route_mq -->|"stalled (after max retries)"| r_stall
    route_mq -->|"timeout"| r_timeout
    route_mq -->|"default / error"| r_default

    %% Recipe routing → loops
    r_dropped -->|"loop: re-enqueue"| wait_step
    r_ejected -->|"loop: fix + re-enqueue"| wait_step
    r_stall -->|"loop: toggle + re-enqueue"| wait_step

    r_merged --> DONE
    r_ejci --> DONE
    r_timeout --> DONE
    r_default --> DONE

    %% Validation (static, separate from runtime flow)
    rule_gate --> rule5
    rule_gate --> rule6

    %% CLASS ASSIGNMENTS %%
    class START,DONE terminal;
    class wait_step,dispatch,route_mq stateNode;
    class gate_chk,watcher_chk,repo_ok,deadline,fetch_ok,in_queue_chk,unmergeable_chk,inconcl_chk,win_chk,budget_chk stateNode;
    class fetch,delegate,classify_call,inc_cycles,reset_cycles,inconcl_sleep,do_toggle handler;
    class merged_bypass,closed_bypass,sleep_retry phase;
    class gate_err,watcher_err,repo_err_exit,timeout_exit,c_inconclusive,c_inconclusive2 detector;
    class c_merged,c_closed,c_ci_closed,c_ci_open,c_stall,c_conflict,c_dropped,c_pending stateNode;
    class c_ret_merged,c_ret_ejci_closed,c_ret_ej_closed,c_ret_ejci_open,c_ret_stall,c_ret_ej_conflict,c_ret_dropped output;
    class stall_grace,stall_budget stateNode;
    class stall_grace_sleep phase;
    class r_merged,r_ejci,r_dropped,r_ejected,r_stall,r_timeout,r_default handler;
    class rule5,rule6,rule_gate detector;
    class ps_merged,ps_ejected,ps_ejci,ps_stalled,ps_dropped,ps_timeout,ps_error output;
```
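
The classifier subgraph above is an ordered decision tree, which may be easier to follow as code. This is a sketch under stated assumptions, not the contents of `merge_queue.py`: the `Snapshot` dataclass, its field defaults, and the `classify` function are hypothetical stand-ins for `PRFetchState` and `_classify_pr_state`, and `PRState` is written as `str, Enum` rather than `StrEnum` so the block runs on older Python versions. The gate order follows the diagram.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PRState(str, Enum):
    MERGED = "merged"
    EJECTED = "ejected"
    EJECTED_CI_FAILURE = "ejected_ci_failure"
    STALLED = "stalled"
    DROPPED_HEALTHY = "dropped_healthy"


class ClassifierInconclusive(Exception):
    """Raised when no positive signal matches the snapshot."""


@dataclass
class Snapshot:
    # Hypothetical stand-in for PRFetchState; field names follow the diagram.
    merged: bool = False
    state: str = "OPEN"
    mergeable: str = "MERGEABLE"
    merge_state_status: str = "CLEAN"
    checks_state: Optional[str] = None
    auto_merge_enabled_at: Optional[str] = None
    in_queue: bool = False


CI_FAILED = ("FAILURE", "ERROR")


def classify(s: Snapshot) -> PRState:
    """Pure decision tree with no I/O; gates are evaluated in diagram order."""
    if s.merged:
        return PRState.MERGED
    if s.state == "CLOSED":
        return PRState.EJECTED_CI_FAILURE if s.checks_state in CI_FAILED else PRState.EJECTED
    if s.checks_state in CI_FAILED:
        return PRState.EJECTED_CI_FAILURE
    if s.auto_merge_enabled_at is not None and s.merge_state_status in ("CLEAN", "HAS_HOOKS"):
        return PRState.STALLED
    if s.mergeable == "CONFLICTING":
        return PRState.EJECTED
    if (
        s.state == "OPEN"
        and s.mergeable == "MERGEABLE"
        and s.merge_state_status == "CLEAN"
        and s.checks_state in (None, "SUCCESS")
        and s.auto_merge_enabled_at is None
        and not s.in_queue
    ):
        return PRState.DROPPED_HEALTHY
    if s.checks_state in ("PENDING", "EXPECTED"):
        raise ClassifierInconclusive("checks still running")
    raise ClassifierInconclusive("no positive signal")
```

Note that there is no final `return PRState.EJECTED`: a snapshot that matches no gate raises `ClassifierInconclusive`, leaving the polling loop to decide whether to retry or time out.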

### Error/Resilience Diagram

```mermaid
%%{init: {'flowchart': {'nodeSpacing': 40, 'rankSpacing': 52, 'curve': 'basis'}}}%%
flowchart TB
    %% CLASS DEFINITIONS %%
    classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
    classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
    classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
    classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
    classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
    classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
    classDef terminal fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
    classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;

    START(["TOOL CALL<br/>wait_for_merge_queue"])

    %% ── IMPORT-TIME VALIDATION ─────────────────────────── %%
    subgraph ImportGates ["IMPORT-TIME VALIDATION GATES (merge_queue.py ●)"]
        direction LR
        FIELD_SYNC["● _QUERY_FIELD_MAP sync<br/>━━━━━━━━━━<br/>keys == PRFetchState keys<br/>→ RuntimeError on mismatch"]
        FIELD_PRES["● _QUERY field presence<br/>━━━━━━━━━━<br/>all map paths in GraphQL query<br/>→ RuntimeError on mismatch"]
    end

    %% ── MCP ENTRY GATES ────────────────────────────────── %%
    subgraph MCPGates ["MCP ENTRY GATES (tools_ci.py ●)"]
        direction LR
        KITCHEN["● Kitchen gate<br/>━━━━━━━━━━<br/>_require_enabled()<br/>→ error JSON if not open"]
        WATCHER["● Watcher null check<br/>━━━━━━━━━━<br/>merge_queue_watcher is None<br/>→ pr_state='error' JSON"]
    end

    %% ── POLLING LOOP ───────────────────────────────────── %%
    subgraph PollingLoop ["POLLING LOOP (DefaultMergeQueueWatcher.wait ●)"]
        DEADLINE{"Deadline<br/>exceeded?"}
        FETCH["_fetch_pr_and_queue_state()<br/>━━━━━━━━━━<br/>GraphQL round-trip<br/>raise_for_status()"]
        FETCH_FAIL["● Fetch Exception Handler<br/>━━━━━━━━━━<br/>catch Exception<br/>log warning + sleep(poll_interval)<br/>→ retry poll cycle"]
        IN_QUEUE{"In merge<br/>queue?"}
        UNMERGE["UNMERGEABLE in queue<br/>━━━━━━━━━━<br/>queue_state == UNMERGEABLE<br/>→ immediate EJECTED"]
    end

    %% ── CLASSIFIER ─────────────────────────────────────── %%
    subgraph ClassifierSG ["CLASSIFIER (_classify_pr_state ●) — positive signals only"]
        direction TB
        CLASSIFY["● _classify_pr_state(state)<br/>━━━━━━━━━━<br/>evaluates positive gates in order<br/>never falls through to EJECTED"]
        MERGED_GATE["merged == True<br/>→ MERGED<br/>(definitive — bypasses window)"]
        CLOSED_GATE["state == CLOSED<br/>━━━━━━━━━━<br/>checks FAILURE/ERROR → EJECTED_CI_FAILURE<br/>else → EJECTED<br/>(definitive — bypasses window)"]
        CHECKS_GATE["checks_state FAILURE/ERROR<br/>→ EJECTED_CI_FAILURE<br/>(requires confirmation window)"]
        STALL_GATE["● stall signal<br/>━━━━━━━━━━<br/>auto_merge_enabled_at present<br/>+ merge_state CLEAN/HAS_HOOKS<br/>→ STALLED"]
        CONFLICT_GATE["mergeable == CONFLICTING<br/>→ EJECTED<br/>(requires confirmation window)"]
        DROP_GATE["● _is_positive_dropped_healthy()<br/>━━━━━━━━━━<br/>OPEN+MERGEABLE+CLEAN<br/>checks None/SUCCESS<br/>auto_merge=False, not in queue<br/>→ DROPPED_HEALTHY"]
        INCONCLUSIVE["● ClassifierInconclusive<br/>━━━━━━━━━━<br/>no positive signal matched<br/>.state snapshot for logging"]
    end

    %% ── CONFIRMATION WINDOW ────────────────────────────── %%
    subgraph ConfirmWindow ["CONFIRMATION WINDOW (●)"]
        CONFIRM{"● not_in_queue_cycles<br/>< confirmation_cycles (2)?"}
        INCONCL{"● inconclusive_count<br/>>= max_retries (5)?"}
    end

    %% ── STALL RECOVERY ─────────────────────────────────── %%
    subgraph StallRec ["STALL RECOVERY (●)"]
        GRACE{"stall_duration<br/>< grace_period?"}
        STALL_BUDGET{"● stall_retries<br/>< max_retries (3)?"}
        TOGGLE["● toggle_auto_merge<br/>━━━━━━━━━━<br/>disable → sleep 2s → re-enable<br/>backoff: min(30×2ⁿ, 120)s<br/>stall_retries++ / reset window"]
        TOGGLE_FAIL["● Toggle Exception Handler<br/>━━━━━━━━━━<br/>catch Exception<br/>log warning → continue"]
    end

    %% ── TERMINAL STATES ────────────────────────────────── %%
    T_MERGED(["● MERGED<br/>success=True"])
    T_EJECTED(["● EJECTED"])
    T_ECI(["● EJECTED_CI_FAILURE"])
    T_STALLED(["● STALLED"])
    T_DROP(["● DROPPED_HEALTHY"])
    T_TIMEOUT(["● TIMEOUT"])
    T_ERROR(["● ERROR<br/>success=False"])

    %% ── RECIPE ROUTING ─────────────────────────────────── %%
    subgraph RecipeRouting ["RECIPE ROUTING (remediation.yaml ●)"]
        direction TB
        R_SUCCESS["release_issue_success<br/>━━━━━━━━━━<br/>apply staged label<br/>→ register_clone_success"]
        R_DIAGNOSE["diagnose_ci<br/>━━━━━━━━━━<br/>CI failure analysis path"]
        R_CHEAP["● reenter_merge_queue_cheap<br/>━━━━━━━━━━<br/>re-enroll healthy PR"]
        R_FIX["queue_ejected_fix<br/>━━━━━━━━━━<br/>conflict resolution cycle"]
        R_REENROLL["● reenroll_stalled_pr<br/>━━━━━━━━━━<br/>toggle_auto_merge tool<br/>→ wait_for_queue"]
        R_TIMEOUT["release_issue_timeout<br/>━━━━━━━━━━<br/>fallback + on_failure route<br/>release claim, no staged label"]
    end

    %% ── SEMANTIC VALIDATION RULES ──────────────────────── %%
    subgraph SemanticRules ["STATIC VALIDATION RULES (rules_ci.py ●) — recipe-load-time"]
        direction LR
        RULE_COV["● routing-covers-all-pr-states<br/>━━━━━━━━━━<br/>ERROR: any non-error PRState<br/>lacks explicit when arm<br/>scope: release_issue_timeout recipes"]
        RULE_CONF["● routing-conforms-to-expected-targets<br/>━━━━━━━━━━<br/>ERROR: fallback or on_failure<br/>≠ release_issue_timeout<br/>scope: release_issue_timeout recipes"]
    end

    END_OK(["PIPELINE CONTINUES<br/>issue released"])
    END_FAIL(["ESCALATION<br/>release_issue_timeout / error"])

    %% ═══════════════════════════════════
    %% CONNECTIONS
    %% ═══════════════════════════════════

    %% Entry
    START --> KITCHEN
    KITCHEN -->|"not open"| END_FAIL
    KITCHEN -->|"open"| WATCHER
    WATCHER -->|"None"| T_ERROR
    WATCHER -->|"configured"| DEADLINE

    %% Polling core
    DEADLINE -->|"exceeded"| T_TIMEOUT
    DEADLINE -->|"budget"| FETCH
    FETCH -->|"HTTP/network/GraphQL/null/date error"| FETCH_FAIL
    FETCH -->|"PRFetchState ok"| IN_QUEUE
    FETCH_FAIL -->|"sleep + retry"| DEADLINE

    %% In-queue branch
    IN_QUEUE -->|"yes"| UNMERGE
    UNMERGE -->|"UNMERGEABLE"| T_EJECTED
    UNMERGE -->|"other states"| DEADLINE
    IN_QUEUE -->|"no"| CLASSIFY

    %% Classifier dispatch
    CLASSIFY --> MERGED_GATE
    CLASSIFY --> CLOSED_GATE
    CLASSIFY --> CHECKS_GATE
    CLASSIFY --> STALL_GATE
    CLASSIFY --> CONFLICT_GATE
    CLASSIFY --> DROP_GATE
    CLASSIFY --> INCONCLUSIVE

    %% Definitive terminals (bypass confirmation window)
    MERGED_GATE --> T_MERGED
    CLOSED_GATE --> T_EJECTED
    CLOSED_GATE --> T_ECI

    %% Non-definitive → confirmation window
    CHECKS_GATE --> CONFIRM
    STALL_GATE --> CONFIRM
    CONFLICT_GATE --> CONFIRM
    DROP_GATE --> CONFIRM
    INCONCLUSIVE --> CONFIRM

    CONFIRM -->|"in window"| DEADLINE

    %% Post-confirmation dispatch
    CONFIRM -->|"window passed: EJECTED_CI_FAILURE"| T_ECI
    CONFIRM -->|"window passed: EJECTED"| T_EJECTED
    CONFIRM -->|"window passed: DROPPED_HEALTHY"| T_DROP
    CONFIRM -->|"window passed: STALLED"| GRACE
    CONFIRM -->|"window passed: inconclusive"| INCONCL
    INCONCL -->|"exhausted"| T_TIMEOUT
    INCONCL -->|"budget ok"| DEADLINE

    %% Stall recovery
    GRACE -->|"too recent"| DEADLINE
    GRACE -->|"expired"| STALL_BUDGET
    STALL_BUDGET -->|"exhausted"| T_STALLED
    STALL_BUDGET -->|"budget ok"| TOGGLE
    TOGGLE -->|"success"| DEADLINE
    TOGGLE -->|"exception"| TOGGLE_FAIL
    TOGGLE_FAIL --> DEADLINE

    %% Terminal → recipe routing
    T_MERGED --> R_SUCCESS
    T_ECI --> R_DIAGNOSE
    T_DROP --> R_CHEAP
    T_EJECTED --> R_FIX
    T_STALLED --> R_REENROLL
    T_TIMEOUT --> R_TIMEOUT
    T_ERROR --> END_FAIL

    R_SUCCESS --> END_OK
    R_DIAGNOSE --> END_FAIL
    R_CHEAP --> END_FAIL
    R_FIX --> END_FAIL
    R_REENROLL -->|"re-enters wait_for_queue"| DEADLINE
    R_TIMEOUT --> END_FAIL

    %% ═══════════════════════════════════
    %% CLASS ASSIGNMENTS
    %% ═══════════════════════════════════
    class START,END_OK,END_FAIL terminal;
    class FIELD_SYNC,FIELD_PRES,KITCHEN,WATCHER detector;
    class FETCH integration;
    class FETCH_FAIL,TOGGLE_FAIL gap;
    class DEADLINE,IN_QUEUE,CONFIRM,INCONCL,GRACE,STALL_BUDGET stateNode;
    class UNMERGE,CLASSIFY detector;
    class MERGED_GATE,CLOSED_GATE,CHECKS_GATE,CONFLICT_GATE phase;
    class STALL_GATE,DROP_GATE,INCONCLUSIVE newComponent;
    class TOGGLE handler;
    class T_MERGED,T_EJECTED,T_ECI,T_STALLED,T_DROP,T_TIMEOUT,T_ERROR terminal;
    class R_SUCCESS,R_DIAGNOSE,R_CHEAP,R_FIX,R_REENROLL,R_TIMEOUT output;
    class RULE_COV,RULE_CONF newComponent;
```
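
The stall recovery gates and the `min(30×2ⁿ, 120)` backoff from the diagrams can be sketched as two small functions. This is a minimal illustration, assuming `n` is the number of toggles already attempted before the current one (the diagrams do not say whether the increment happens before or after the sleep). `StallAction`, `stall_backoff_seconds`, and `stall_action` are hypothetical names; the defaults of 60 s grace and 3 retries are the values shown in the process flow diagram.

```python
from enum import Enum


class StallAction(Enum):
    WAIT = "wait"        # still inside the grace period: sleep and poll again
    TOGGLE = "toggle"    # disable and re-enable auto-merge, then back off
    GIVE_UP = "give_up"  # retries exhausted: return pr_state=stalled


def stall_backoff_seconds(retries_attempted: int, base: float = 30.0, cap: float = 120.0) -> float:
    """Exponential backoff min(base * 2**n, cap), with n assumed to be prior attempts."""
    return min(base * (2 ** retries_attempted), cap)


def stall_action(
    stall_duration: float,
    retries_attempted: int,
    grace_period: float = 60.0,
    max_retries: int = 3,
) -> StallAction:
    """Apply the grace-period gate first, then the retry-budget gate."""
    if stall_duration < grace_period:
        return StallAction.WAIT
    if retries_attempted < max_retries:
        return StallAction.TOGGLE
    return StallAction.GIVE_UP


if __name__ == "__main__":
    for n in range(4):
        print(n, stall_backoff_seconds(n))
```

Under that assumption the three permitted toggles sleep for 30 s, 60 s, and 120 s, so the cap is only reached on the last retry.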

Closes #802

## Implementation Plan

Plan file:
`/home/talon/projects/autoskillit-runs/remediation-802-20260413-075451-958993/.autoskillit/temp/rectify/rectify_merge_queue_classifier_immunity_2026-04-13_110523_part_a.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code) via
AutoSkillit

## Token Usage Summary

| Step | uncached | output | cache_read | cache_write | count | time |
|------|----------|--------|------------|-------------|-------|------|
| investigate | 200 | 21.1k | 986.6k | 121.4k | 1 | 8m 24s |
| rectify | 289 | 72.1k | 1.4M | 241.5k | 2 | 24m 36s |
| review | 52 | 9.8k | 176.6k | 56.1k | 1 | 9m 36s |
| dry_walkthrough | 360 | 50.5k | 3.1M | 165.8k | 2 | 16m 17s |
| implement | 1.3k | 117.5k | 16.1M | 239.7k | 2 | 37m 55s |
| assess | 310 | 18.9k | 2.1M | 69.8k | 1 | 9m 54s |
| prepare_pr | 70 | 7.5k | 256.5k | 36.9k | 1 | 2m 22s |
| run_arch_lenses | 215 | 34.1k | 868.7k | 131.3k | 3 | 11m 48s |
| compose_pr | 59 | 14.3k | 249.6k | 42.9k | 1 | 3m 15s |
| **Total** | 2.9k | 345.8k | 25.3M | 1.1M | | 2h 4m |

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>