Reframe stuck-loop failures as graceful capability-gap handoffs #10
Merged
When the bounded replan loop stops because the agent is structurally stuck (the user asked for something that needs a capability fcli does not have, a path that does not exist, or the planner spun without progress), the chat surface now shows a calm "here's what's blocking this + options" message instead of a raw error suffix. Options include a constrained retry, "report it so it can be fixed" (writes a structured gap report + a pre-filled GitHub issue link), and stop. In the REPL, choosing the retry resumes work in place.

The underlying failure is hidden from the chat UI, not discarded: it stays in execution_results and is recorded to the trace + NDJSON event log via a new capability_gap event.

- models/orchestration.py: CapabilityGapKind/Option/Report/Handoff + OrchestrationResult.gap_handoff
- services/gap_handoff.py: classify failure -> build handoff, issue URL, write_gap_report
- orchestrator.py: build the handoff at finalize, swap the message, emit EVENT_CAPABILITY_GAP (only for FATAL_EXECUTION_FAILURE / NO_PROGRESS; soft no-progress with changes is unaffected)
- cli.py: prompt the handoff options, submit reports, resume on retry; one-shot runs surface options + report link as a notice
- 12 new tests (gap builder, orchestrator reframe, CLI dispatch); 417 pass

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
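The report path described above can be sketched roughly as follows. This is a minimal illustration, not the code in services/gap_handoff.py: the function names build_issue_url and write_gap_report come from the commit message, but their signatures, the GapReport fields, and the repository URL are assumptions made for the example. The `title` and `body` query parameters are the ones GitHub accepts on its new-issue page.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from pathlib import Path
from urllib.parse import urlencode

# Assumption: placeholder repository; the real issue tracker is not named in this PR.
ISSUES_URL = "https://github.com/example-org/example-repo/issues/new"


@dataclass(frozen=True)
class GapReport:
    """Hypothetical structured report; the field names are illustrative."""

    gap_id: str
    kind: str  # e.g. "missing_capability", "missing_path", "no_progress"
    user_request: str
    failure_summary: str


def build_issue_url(report: GapReport) -> str:
    """Return a new-issue link with the title and body pre-filled."""
    query = urlencode(
        {
            "title": f"Capability gap ({report.kind}): {report.user_request[:60]}",
            "body": f"Gap id: {report.gap_id}\n\n{report.failure_summary}",
        }
    )
    return f"{ISSUES_URL}?{query}"


def write_gap_report(report: GapReport, state_dir: Path) -> Path:
    """Write gap-<id>.json under <state_dir>/gaps/ and return its path."""
    gaps_dir = state_dir / "gaps"
    gaps_dir.mkdir(parents=True, exist_ok=True)
    path = gaps_dir / f"gap-{report.gap_id}.json"
    path.write_text(json.dumps(asdict(report), indent=2), encoding="utf-8")
    return path
```

Keeping the report a plain dataclass means the same object can feed both the JSON file and the issue link, so the two cannot drift apart.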
The gap handoff now asks the provider to phrase the user-facing message in natural language, while the options and structured report stay deterministic. If the model call fails, or returns output that doesn't look like plain prose (JSON, a plan object, fenced code, empty), it falls back to the heuristic template; phrasing can never crash the handoff it decorates.

- gap_handoff.py: GapMessagePhraser type, make_provider_phraser(provider), _build_phrasing_prompt (TEXT completion), _sanitize_phrased_message (rejects JSON/plan-like output, collapses whitespace, caps at 400 chars)
- build_gap_handoff gains an optional phraser; orchestrator wires its provider
- 7 new tests (override, fallback, sanitize, JSON-reject, provider-error, truncate, end-to-end model-phrased orchestrator turn); 424 pass

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A wholly empty chat completion (no content, no thinking; typically a cold model load or worker glitch on Ollama Cloud, seen as eval_count=1) was raised as a non-retryable INVALID_RESPONSE, so a single transient empty killed the whole turn. Mark that case retryable so the existing retry loop re-sends the prompt (up to max_attempts) and usually recovers.

Scoped to the empty case only: the "thinking tokens but no JSON" case stays non-retryable (re-sending reproduces it), and done_reason=length truncation stays a TRUNCATED signal (the plan should shrink, not retry blindly).

- 2 tests: empty-then-valid recovers in 2 attempts; persistent empty gives up after max_attempts. 426 pass, ruff clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
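The retry scoping in this commit can be illustrated with a small sketch. Every name here (ProviderError, RawCompletion, check_completion, complete_with_retry) is hypothetical, and the "thinking tokens but no JSON" case is simplified to "thinking present, content empty"; checking truncation first is also an assumption. The point is only that a wholly empty reply is the single case flagged as retryable.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


class ProviderError(Exception):
    """Hypothetical error type carrying a code and a retryable flag."""

    def __init__(self, code: str, retryable: bool) -> None:
        super().__init__(code)
        self.code = code
        self.retryable = retryable


@dataclass(frozen=True)
class RawCompletion:
    """Hypothetical, simplified view of a chat completion."""

    content: str = ""
    thinking: str = ""
    done_reason: str = "stop"


def check_completion(raw: RawCompletion) -> str:
    """Return the content, or raise with the retry decision attached."""
    if raw.done_reason == "length":
        # Truncation: the plan should shrink, so do not retry blindly.
        raise ProviderError("TRUNCATED", retryable=False)
    if not raw.content.strip():
        # Wholly empty (no content, no thinking) is treated as transient.
        # Thinking without usable content would just reproduce on re-send.
        wholly_empty = not raw.thinking.strip()
        raise ProviderError("INVALID_RESPONSE", retryable=wholly_empty)
    return raw.content


def complete_with_retry(send: Callable[[], RawCompletion], max_attempts: int = 3) -> str:
    """Re-send the prompt while the failure is marked retryable."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: ProviderError | None = None
    for _ in range(max_attempts):
        try:
            return check_completion(send())
        except ProviderError as exc:
            if not exc.retryable:
                raise
            last_error = exc
    assert last_error is not None  # loop ran at least once without returning
    raise last_error
```

With this shape, a single empty reply followed by a valid one succeeds on the second attempt, while a persistently empty provider still fails after max_attempts with the original error.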
Handle malformed deferred file-write hints, repeated read-only loops, gh api raw-flag plans, and stale command-error final messages. Also tightens live-turn status typing and documents the fixes.

Verification:
- ./scripts/uv run ruff check src tests
- ./scripts/uv run ruff format --check src tests
- ./scripts/uv run mypy
- ./scripts/uv run pytest
- ./scripts/uv run foundation doctor
What & why
When the bounded replan loop stops because the agent is structurally stuck — the user asked for something needing a capability fcli doesn't have, a path that doesn't exist, or the planner spun without progress — the chat surface now shows a calm "here's what's blocking this + options" message instead of a raw error suffix.
The motivating case (maintainer's words): when you ask for something impossible — a missing input/capability, or an outcome path this young project never wired up — and the model keeps trying, fcli should say what's happening and offer "we can do X / report it to fix it" without it reading as an error or broken loop.
Behavior
Triggers on FATAL_EXECUTION_FAILURE and NO_PROGRESS only (soft no-progress with cumulative changes keeps its existing soft-completion suffix). The user sees a calm explanation of what is blocking the request, followed by the options: a constrained retry, report, or stop. Choosing report writes gap-<id>.json under <state_dir>/gaps/ and prints a pre-filled GitHub issue link to file it.

The failure is hidden from the chat UI, not discarded: it stays in execution_results and is recorded to the trace + NDJSON event log via a new capability_gap event.

Message phrasing (hybrid)
The user-facing message is model-phrased: the provider rewrites the explanation in natural language, while the options and structured report stay deterministic. If the model call fails or returns non-prose (JSON, a plan object, fenced code, empty), it falls back to the heuristic template — phrasing can never crash the handoff. Phrasing only runs when a handoff is actually built (no extra calls on normal turns).
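The fallback rule can be sketched as below. This is an illustration under assumptions, not the actual _sanitize_phrased_message: the phraser signature, the function names, and the rejection heuristics (leading `{` or `[`, any triple backtick) are invented for the example, and the 400-character cap is the one mentioned in the commit message.

```python
from __future__ import annotations

from typing import Callable, Optional

MAX_MESSAGE_CHARS = 400  # cap taken from the commit message

# Hypothetical shape: takes the heuristic message, returns a rephrased one.
GapMessagePhraser = Callable[[str], str]


def sanitize_phrased_message(raw: str) -> Optional[str]:
    """Return cleaned prose, or None when the output should be rejected."""
    text = " ".join(raw.split())  # collapse every whitespace run to one space
    if not text:
        return None  # empty output
    if "```" in text:
        return None  # fenced code
    if text[0] in "{[":
        return None  # looks like JSON or a plan object
    if len(text) > MAX_MESSAGE_CHARS:
        text = text[: MAX_MESSAGE_CHARS - 3].rstrip() + "..."
    return text


def phrase_or_fallback(template: str, phraser: Optional[GapMessagePhraser]) -> str:
    """Use the model's phrasing when it is usable, else the heuristic template."""
    if phraser is None:
        return template
    try:
        cleaned = sanitize_phrased_message(phraser(template))
    except Exception:
        # Assumption: any provider failure is swallowed so the handoff survives.
        return template
    return cleaned if cleaned is not None else template
```

Because the template is computed before the phraser is called, the handoff always has a message to show; the model can only improve the wording, never remove it.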
Changes
- models/orchestration.py: CapabilityGapKind/Option/Report/Handoff + OrchestrationResult.gap_handoff
- services/gap_handoff.py (new): classify failure → build handoff; make_provider_phraser + prompt/sanitizer; build_issue_url, write_gap_report
- orchestrator.py: build the handoff at finalize, swap the message, emit EVENT_CAPABILITY_GAP, wire the provider phraser
- cli.py: prompt options, submit reports, resume on retry; one-shot notice
- observability.py: EVENT_CAPABILITY_GAP

Tests
19 new tests (gap builder, classification, model-phrasing + sanitizer/fallback, orchestrator reframe + model-phrased turn, CLI dispatch). 424 pass, ruff clean.
🤖 Generated with Claude Code