Dev pipeline: agent writes broken 'until gh pr checks' loops to wait for CI #85

@oscarvalenzuelab

Symptom

Observed live on `dev-AInvirion-ctrlrelay-79-51f35e21`. Agent finished the code, pushed the branch, opened PR #84, then tried to wait for CI before returning "done" to the orchestrator. It wrote this (truncated) shell loop:

```bash
until gh pr checks 84 --repo AInvirion/ctrlrelay 2>&1
| grep -qE 'pending|queued|in_progress'
| head -1 ;
[ $? -ne 0 ] && [ "$(gh pr checks 84 ...)" ... ] ; do ... done
```

Two structural problems with that one-liner:

  1. Inverted semantics. `until CMD; do BODY; done` runs `BODY` for as long as `CMD` returns non-zero, i.e. until `CMD` succeeds. `grep -qE 'pending|queued|in_progress'` returns 0 when it finds a pending check, so the loop exits while checks are still pending and keeps running once everything has gone green. Exactly backwards.

  2. Pipe-to-`head` eats the meaningful exit code. Without `pipefail`, a pipeline's exit status is that of its last command, so `grep -qE ... | head -1` discards `grep`'s status and reports `head`'s. `grep -q` prints nothing, `head -1` reads empty input and exits 0, so `$?` is always 0 afterwards. As far as the truncated snippet shows, the `until` condition then ends with `[ $? -ne 0 ] && ...`, which is always false, so the loop keeps spinning regardless of what CI is doing. It's not waiting on CI.
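Both problems can be reproduced without touching GitHub. The sketch below is an illustration only: `fake_checks` is a hypothetical stand-in for `gh pr checks` that prints an all-green table (not real `gh` output), and it assumes the shell is not running with `pipefail`.

```bash
# Hypothetical stub for `gh pr checks`: no pending rows, i.e. CI is green.
fake_checks() { printf 'build\tpass\ntest\tpass\n'; }

# 1. grep -q succeeds only when a pending row exists. On green CI it fails,
#    which is exactly the case where `until` keeps looping.
grep_status=0
fake_checks | grep -qE 'pending|queued|in_progress' || grep_status=$?

# 2. A pipeline reports the status of its last command. head exits 0 on
#    empty input, so grep's answer is thrown away.
pipe_status=0
fake_checks | grep -qE 'pending|queued|in_progress' | head -1 || pipe_status=$?

# 3. With the pipeline status stuck at 0, a `[ $? -ne 0 ]` test placed after
#    it is always false, so an `until` built on it never sees success.
cond_status=0
[ "$pipe_status" -ne 0 ] || cond_status=$?

echo "grep=$grep_status pipeline=$pipe_status until-condition=$cond_status"
# prints: grep=1 pipeline=0 until-condition=1
```

Swapping the stub for one that prints a `pending` row changes only the first number; the pipeline and the final test report the same values, which is why the loop cannot react to CI state.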

Net effect on #79: CI went green within ~1 minute, but the agent's shell spun for the full `default_timeout_seconds` (1800s / 30 min). Dispatcher SIGKILL'd the child, orchestrator marked the session `failed` with "No checkpoint state returned", Telegram sent `❌ Failed on #79` even though PR #84 was sitting merge-ready.

Why it matters

Root cause

The dev pipeline's prompt to Claude asks it to "wait for CI to be green before closing the session" without telling it how. Claude then improvises a bash loop and gets it wrong. We're outsourcing shell correctness to the model under time pressure — a bad idea when the session's own timeout is the enforcement mechanism.

Proposed fix — pick one (or both)

Option A: first-class CLI helper

Ship `ctrlrelay pr wait-for-checks ` (or `ctrlrelay ci wait --pr `) that:

  • Polls `gh pr checks --json` every N seconds.
  • Treats `IN_PROGRESS` / `QUEUED` / `PENDING` as "keep waiting."
  • Exits 0 on all-pass, 1 on any fail, 2 on timeout.
  • Hard timeout (e.g. 600s / 10 min default, overridable) so it can't eat the full session budget.

Then the dev-pipeline prompt instructs Claude to call that one command and check its exit code. No bash improvisation.
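A minimal shell sketch of the polling behaviour described for Option A, to make the exit-code contract concrete. This is not the proposed `ctrlrelay` implementation. The function names `fetch_states` and `wait_for_checks` are hypothetical, the sketch assumes a `gh` version whose `pr checks` supports `--json state` with `--jq`, and the set of passing states (`SUCCESS`, `SKIPPED`, `NEUTRAL`) is an assumption rather than something the issue specifies.

```bash
# Print one check state per line for PR $1 in repo $2.
# Assumption: this gh version supports `pr checks --json state --jq`.
fetch_states() {
  gh pr checks "$1" --repo "$2" --json state --jq '.[].state'
}

# wait_for_checks PR REPO [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Returns 0 on all-pass, 1 on any failure, 2 on timeout.
wait_for_checks() {
  local pr="$1" repo="$2" limit="${3:-600}" interval="${4:-15}"
  local deadline states
  deadline=$(( $(date +%s) + limit ))
  while :; do
    # gh may exit non-zero while checks are pending or failing; keep the output.
    states=$(fetch_states "$pr" "$repo") || true
    # Empty output (no checks reported yet) is treated as "keep waiting".
    if [ -n "$states" ]; then
      if printf '%s\n' "$states" | grep -qE '^(IN_PROGRESS|QUEUED|PENDING)$'; then
        :  # at least one check is still running
      elif printf '%s\n' "$states" | grep -qvE '^(SUCCESS|SKIPPED|NEUTRAL)$'; then
        return 1  # nothing pending, but some check did not pass
      else
        return 0  # every check passed or was skipped
      fi
    fi
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 2  # hard timeout, well below the session budget
    fi
    sleep "$interval"
  done
}
```

A call such as `wait_for_checks 84 AInvirion/ctrlrelay 600 15` would then be the only thing the agent runs. This sketch waits for every check to settle before reporting a failure; failing fast on the first red check is an equally reasonable design.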

Option B: use `gh pr checks --watch`

Recent `gh` versions have `gh pr checks --watch`, which blocks until all checks complete and then exits with a status that reflects pass/fail. It already does most of Option A. Downside: no built-in hard timeout, so you'd wrap it, e.g. `timeout 600 gh pr checks --watch`. GNU `timeout` exits 124 when the limit is hit, so a timeout stays distinguishable from a failed check.
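One way the wrapped command could be folded into the same 0 / 1 / 2 contract as Option A is sketched below. The helper names `normalize_wait_status` and `wait_with_watch` are hypothetical, and the sketch assumes GNU coreutils `timeout` is on the PATH (it is not installed by default on macOS).

```bash
# Map the exit status of `timeout ... gh pr checks --watch` onto
# 0 (pass) / 1 (fail) / 2 (timeout). Hypothetical helper.
normalize_wait_status() {
  case "$1" in
    0)   return 0 ;;  # all checks passed
    124) return 2 ;;  # GNU timeout hit its limit
    *)   return 1 ;;  # gh reported a failing check, or some other error
  esac
}

# wait_with_watch PR REPO [TIMEOUT_SECONDS]
wait_with_watch() {
  local rc=0
  timeout "${3:-600}" gh pr checks "$1" --repo "$2" --watch || rc=$?
  normalize_wait_status "$rc"
}
```

Everything other than 0 and 124 is deliberately collapsed into "fail", so an authentication or network error is never mistaken for green CI.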

Prompt fix either way

Update the dev pipeline's prompt (wherever `run_dev_issue` constructs the session prompt — likely in `src/ctrlrelay/pipelines/dev.py`) to explicitly say:

To wait for CI, run ``. Do NOT write bash `until` / `while` loops by hand.

Severity

Medium. Affects any session that hits this specific waiting pattern. Right now the incidence rate is maybe 50% of dev-pipeline sessions (I saw it on #79 after not seeing it on earlier ones — probably model RNG). As dev-pipeline adoption grows, this will become the top source of misleading `❌ Failed` notifications.

Related
