refactor(runner): shim-owned retry policy (closes PR #87 no_output bypass + moonshot gap)#91
Merged
Merged
Conversation
…pass) PR #87 added a lineage-dispatch escape valve in isRetryableErrorKind for opencode null-with-no-kind. The chorus self-audit (6 request-changes / 2 approves) found two real problems: 1. THE BUG (antigravity + gemini, HIGH): The COMMON opencode-go failure is empty-stdout exit with eventCount=0. The runner's finally block synthesises kind='no_output' for that case — and no_output is universally terminal — so the new opencode- null retry was UNREACHABLE for the bug it shipped to fix. It only helped the rare case where opencode emits some events but no usable content. 2. SLIPPERY SLOPE (every reviewer): Central classifier with a lineage exception is a one-off escape valve that grows into a table. Moonshot (also opencode-go) was already an unprotected gap (codex flagged via isOpenCodeFamily idiom). Fix: move the retry-on-null-kind and retry-on-no-output policies out of the central classifier and into the shim layer. Each AgentShim declares its own `retryPolicy: { extraKinds?, onNullKind?, onNoOutput? }` block. - opencode shim: opts into both flags (closes the no_output bypass) - kimi shim (moonshot): same opts-in (closes the moonshot gap) - codex/claude/gemini/grok/antigravity/openrouter shims: unchanged — fall through to the universal transient list as before isRetryableErrorKind signature changes from (kind, lineage?) to (kind, shim?). Universal terminal/transient layer unchanged; shim policies layer on top. Universal terminal kinds (quota_exhausted, token_refresh_lost, opencode_db_corrupt) still beat shim opt-ins — explicit terminal kinds cannot be overridden by a shim. Tests: 978 -> 982 passing. Test cases now use tiny shim stubs instead of lineage strings, matching the production call shape. Out of scope (separate follow-ups flagged in audit): - Telemetry: record retry-succeeded rows so the policy's effectiveness is measurable. Universal across reviewers. - Verdict-parse-failure false positive (codex finding #5): opencode-cli-7 traced this and found non-empty content always returns a real verdict, so apology-shaped responses don't reach the retry path. Closed as non-issue.
…logy shapes PR #91 audit catch (multiple reviewers, MEDIUM-HIGH severity): when a reviewer writes non-empty prose with no approve/reject keyword, verdictFromReviewerText returns null. Pre-fix, lastError.kind stayed undefined → the new opencode shim's onNullKind=true policy fired a retry that wasted tokens on the same prose shape. This isn't the transport flake retry was designed for — it's a successful model run that just didn't match the verdict heuristic (apology, refusal, off-topic continuation). Classify it as `verdict_ambiguous` (not in any retry list) so the driver advances the chain cleanly. Guarded on `!lastError.kind` so a real classified failure earlier in the run (e.g. stream_failure with the finally block writing a REVIEWER FAILED summary to answer.md) survives the verdict check. Closes my own PR description claim (PR #91) that opencode-cli-7 had closed this — that earlier trace was incomplete. The audit correctly re-traced it and found the path IS reachable. Tests: 983 -> 984 (+1 contract test for verdict_ambiguous + 1 classifier test confirming it's terminal even on opencode shim).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Follow-up to the PR #87 chorus self-audit (6 request-changes / 2 approves).
What the audit caught
THE BUG (antigravity + gemini, HIGH severity): the COMMON opencode-go failure mode is empty-stdout exit with `eventCount === 0`. The runner's finally block synthesises `kind: 'no_output'` for that case, and `no_output` is universally terminal — so the new opencode-null retry was unreachable for the bug PR feat(runner): one-shot retry on opencode null-with-no-errorKind #87 was shipped to fix. The retry only helped the rare case where opencode emits some events but no usable content.
Slippery slope (every reviewer): a lineage exception in the central classifier is a one-off escape valve that grows. Moonshot (also opencode-go) was an unprotected gap (codex flagged via the `isOpenCodeFamily` idiom used throughout error-detector.ts).
Fix
Move retry-on-null-kind and retry-on-no-output out of the central classifier and into the shim layer. Each `AgentShim` declares its own `retryPolicy`:
```ts
readonly retryPolicy?: {
readonly extraKinds?: readonly string[];
readonly onNullKind?: boolean;
readonly onNoOutput?: boolean;
};
```
`isRetryableErrorKind` signature changes from `(kind, lineage?)` to `(kind, shim?)`. Universal terminal kinds (`quota_exhausted`, `token_refresh_lost`, `opencode_db_corrupt`) still beat shim opt-ins — explicit terminal kinds cannot be overridden.
Tests
978 → 982 passing. Test stubs use `Pick<AgentShim, 'retryPolicy'>` instead of lineage strings — matches production call shape.
Out of scope (audit follow-ups for a separate PR)
Test plan