Dev pipeline: agent writes broken 'until gh pr checks' loops to wait for CI #85

## Symptom

Observed live on `dev-AInvirion-ctrlrelay-79-51f35e21`. Agent finished the code, pushed the branch, opened PR #84, then tried to wait for CI before returning "done" to the orchestrator. It wrote this (truncated) shell loop:

```bash
until gh pr checks 84 --repo AInvirion/ctrlrelay 2>&1 \
        | grep -qE 'pending|queued|in_progress' \
        | head -1 ; \
      [ $? -ne 0 ] && [ "$(gh pr checks 84 ...)" ... ] ; do ... done
```

Two structural problems with that one-liner:

1. **Inverted semantics.** `until CMD; do BODY; done` runs `BODY` while `CMD` returns non-zero (i.e. **until success**). `grep -qE 'pending|queued|in_progress'` returns 0 when it **finds** a pending check, so a loop gated on that `grep` alone would exit as soon as a pending check is found and keep running once everything has gone green. Exactly backwards.

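2. **Pipe-to-`head` eats the meaningful exit code.** Without `set -o pipefail`, a pipeline's exit status is that of its last command, so `grep -qE ... | head -1` discards `grep`'s status and reports `head`'s. `grep -q` prints nothing, `head -1` succeeds on the empty input, and the pipeline returns 0 whether or not anything is pending. In the snippet as shown, the following `[ $? -ne 0 ]` test therefore always fails, the `&&` short-circuits, and the `until` condition never returns 0, so the loop never exits on its own. It is not waiting on CI; it is just spinning.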
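
The two exit-status behaviours above can be checked in isolation. This is a minimal sketch that uses canned text instead of real `gh pr checks` output; the `fake_checks` helper and its row contents are made up for illustration.

```bash
#!/usr/bin/env bash
# Stand-in for `gh pr checks` output; the check name and columns are invented.
fake_checks() { printf 'build\tpending\t0\thttps://example.invalid\n'; }

# 1. grep -q returns 0 when a pending check IS present, so `until grep ...`
#    would stop waiting at exactly the wrong moment.
fake_checks | grep -qE 'pending|queued|in_progress'
grep_status=$?

# 2. With pipefail off (the default), a pipeline reports the status of its
#    last command. Here grep finds nothing and fails, yet head succeeds on
#    the empty input, so grep's answer is discarded.
printf 'build\tpass\n' | grep -qE 'pending|queued|in_progress' | head -1
pipeline_status=$?

# 3. A guard of the form `[ $? -ne 0 ]` placed after that pipeline can
#    therefore never pass.
if [ "$pipeline_status" -ne 0 ]; then guard=passed; else guard=failed; fi

echo "grep=$grep_status pipeline=$pipeline_status guard=$guard"
# prints: grep=0 pipeline=0 guard=failed
```

If the elided parts of the agent's loop behave the way the visible parts suggest, that failing guard is the last command of the `until` condition on every pass, which matches the spin seen on #79.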

Net effect on #79: CI went green within ~1 minute, but the agent's shell spun for the full `default_timeout_seconds` (1800s / 30 min). The dispatcher SIGKILL'd the child, the orchestrator marked the session `failed` with "No checkpoint state returned", and Telegram sent `❌ Failed on #79` even though PR #84 was sitting merge-ready.

## Why it matters

- Eats 30 minutes of dispatcher runtime per affected session.
- Sends misleading "❌ Failed" Telegram pings for PRs that actually shipped fine, which confuses the operator and erodes trust in the notification stream.
- Compounds with #83 (PR watcher gated on session success): a timed-out session loses its merge-detection coverage entirely, so no "Issue closed" follow-up either.

## Root cause

The dev pipeline's prompt to Claude asks it to "wait for CI to be green before closing the session" without telling it *how*. Claude then improvises a bash loop and gets it wrong. We're outsourcing shell correctness to the model under time pressure — a bad idea when the session's own timeout is the enforcement mechanism.

## Proposed fix — pick one (or both)

### Option A: first-class CLI helper

Ship `ctrlrelay pr wait-for-checks <PR>` (or `ctrlrelay ci wait --pr <N>`) that:
- Polls `gh pr checks <N> --json <fields>` at a fixed interval (a few seconds, configurable).
- Treats `IN_PROGRESS` / `QUEUED` / `PENDING` as "keep waiting."
- Exits 0 on all-pass, 1 on any fail, 2 on timeout.
- Hard timeout (e.g. 600s / 10 min default, overridable) so it can't eat the full session budget.

Then the dev-pipeline prompt instructs Claude to call that one command and check its exit code. No bash improvisation.
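
As a rough illustration of the polling logic Option A describes, here is a bash sketch. It is not the proposed `ctrlrelay` implementation: `wait_for_checks` is a hypothetical function name, the 15-second interval is an arbitrary default, and the sketch assumes a `gh` version whose `gh pr checks --json` output includes the `bucket` field (`pass`, `fail`, `pending`, `skipping`, `cancel`).

```bash
#!/usr/bin/env bash
# Sketch only: wait_for_checks is a hypothetical helper, not an existing command.
# Usage: wait_for_checks <PR> [timeout_seconds] [poll_interval_seconds]
# Returns 0 when no check is pending or failed, 1 on any failure, 2 on timeout.
wait_for_checks() {
  local pr="$1" limit="${2:-600}" interval="${3:-15}"
  local deadline buckets
  deadline=$(( $(date +%s) + limit ))

  while :; do
    # gh may exit non-zero while checks are pending or failing, so the
    # decision is based on the printed buckets, not on gh's exit status.
    # Assumption: `bucket` is an available --json field in the installed gh.
    buckets=$(gh pr checks "$pr" --json bucket --jq '.[].bucket' 2>/dev/null) || true

    # Any failed or cancelled check ends the wait immediately.
    if printf '%s\n' "$buckets" | grep -qxE 'fail|cancel'; then
      return 1
    fi
    # Done once at least one check is reported and none is still pending.
    if [ -n "$buckets" ] && ! printf '%s\n' "$buckets" | grep -qx 'pending'; then
      return 0
    fi
    # Hard timeout so the helper cannot eat the whole session budget.
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 2
    fi
    sleep "$interval"
  done
}

# Example call (commented out so that sourcing this file has no side effects):
# wait_for_checks 84 600 15; echo "exit code: $?"
```

Two design choices here are judgment calls rather than requirements from the list above: the sketch fails fast on the first failed check instead of waiting for the rest, and it treats an empty result (no checks reported yet, or a `gh` error) as "keep waiting" until the timeout.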

### Option B: use `gh pr checks --watch`

Recent `gh` versions have `gh pr checks <N> --watch`, which blocks until all checks complete, then exits with a code matching pass/fail. It already does most of Option A. Downside: no built-in hard timeout, so you'd wrap it with `timeout 600 gh pr checks <N> --watch` (GNU `timeout` exits with status 124 when the limit is hit).
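
A thin wrapper could fold that `timeout` call into the same 0 / 1 / 2 exit-code contract as Option A. The sketch below assumes GNU coreutils `timeout` is on the PATH; `watch_checks` is a hypothetical name, and mapping every other non-zero status to 1 is a simplification.

```bash
#!/usr/bin/env bash
# Sketch only: watch_checks is a hypothetical wrapper, not an existing command.
# Usage: watch_checks <PR> [timeout_seconds]
# Returns 0 when checks pass, 2 when the time limit is hit, 1 otherwise.
watch_checks() {
  local pr="$1" limit="${2:-600}" status=0

  # timeout(1) from GNU coreutils sends SIGTERM to the command after
  # $limit seconds and reports that case with exit status 124.
  timeout "$limit" gh pr checks "$pr" --watch || status=$?

  case "$status" in
    0)   return 0 ;;  # all checks passed
    124) return 2 ;;  # time limit reached before checks completed
    *)   return 1 ;;  # failed checks, or gh / timeout itself could not run
  esac
}

# Example call (commented out so that sourcing this file has no side effects):
# watch_checks 84 600; echo "exit code: $?"
```

Because the catch-all branch also covers cases such as `gh` not being installed, a real helper would probably want to distinguish those from genuine check failures.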

### Prompt fix either way

Update the dev pipeline's prompt (wherever `run_dev_issue` constructs the session prompt, likely in `src/ctrlrelay/pipelines/dev.py`) to explicitly say:

> To wait for CI, run `<the approved command>`. Do NOT write bash `until` / `while` loops by hand.

## Severity

Medium. Affects any session that hits this specific waiting pattern. Right now the incidence rate is maybe 50% of dev-pipeline sessions, though that is a rough guess (I saw it on #79 after not seeing it on earlier ones; probably model RNG). As dev-pipeline adoption grows, this is likely to become the top source of misleading `❌ Failed` notifications.

## Related

- #83: post-session resume failures (composite session-id vs Claude UUID) — same net symptom (❌ Failed Telegram, session marked failed in state_db) but different entry path. Fixing one doesn't fix the other.
