Reframe stuck-loop failures as graceful capability-gap handoffs #10

Merged
Anmolnoor merged 6 commits into main from feat/capability-gap-handoff
Jun 1, 2026

Anmolnoor commented May 30, 2026

What & why

When the bounded replan loop stops because the agent is structurally stuck — the user asked for something needing a capability fcli doesn't have, a path that doesn't exist, or the planner spun without progress — the chat surface now shows a calm "here's what's blocking this + options" message instead of a raw error suffix.

The motivating case (maintainer's words): when you ask for something impossible — a missing input/capability, or an outcome path this young project never wired up — and the model keeps trying, fcli should say what's happening and offer "we can do X / report it to fix it" without it reading as an error or broken loop.

Behavior

Triggers on FATAL_EXECUTION_FAILURE and NO_PROGRESS only (soft no-progress with cumulative changes keeps its existing soft-completion suffix). The user sees, e.g.:

That data store isn't wired into fcli yet, so I couldn't finish — it's a gap in the tool, not anything you did.

  1. Have fcli retry using only the tools it does have
  2. Report this so it can be fixed
  3. Stop here
  • Retry → resumes work in place (queued as the next REPL turn).
  • Report → writes a structured gap-<id>.json under <state_dir>/gaps/ and prints a pre-filled GitHub issue link to file it.
  • Stop → graceful end.
  • One-shot / non-TTY runs surface the options + report link as a notice.
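A minimal Python sketch of the trigger rule and option menu described above. The StopReason enum is hypothetical (only FATAL_EXECUTION_FAILURE and NO_PROGRESS are named in this PR), and the sketch assumes "soft" no-progress means NO_PROGRESS with a nonzero count of cumulative changes; the real orchestrator may model that case as a separate stop reason.

```python
from enum import Enum


class StopReason(Enum):
    # Hypothetical enum: only the first two member names come from the PR.
    FATAL_EXECUTION_FAILURE = "fatal_execution_failure"
    NO_PROGRESS = "no_progress"
    COMPLETED = "completed"


class GapOption(Enum):
    # Iteration order matches the numbered menu shown to the user.
    RETRY = "retry"
    REPORT = "report"
    STOP = "stop"


OPTION_LABELS = {
    GapOption.RETRY: "Have fcli retry using only the tools it does have",
    GapOption.REPORT: "Report this so it can be fixed",
    GapOption.STOP: "Stop here",
}

HANDOFF_TRIGGERS = {StopReason.FATAL_EXECUTION_FAILURE, StopReason.NO_PROGRESS}


def should_build_handoff(reason: StopReason, cumulative_changes: int) -> bool:
    """Return True when a stopped loop should be reframed as a capability gap."""
    if reason not in HANDOFF_TRIGGERS:
        return False
    # Assumed "soft" no-progress: the loop stalled after already changing
    # things, so the existing soft-completion suffix is kept instead.
    if reason is StopReason.NO_PROGRESS and cumulative_changes > 0:
        return False
    return True


def render_options() -> str:
    """Render the numbered option menu in enum order."""
    return "\n".join(
        f"{i}. {OPTION_LABELS[opt]}" for i, opt in enumerate(GapOption, start=1)
    )
```

Keeping the trigger set explicit makes it easy to see that normal completions and soft stalls never pay for a handoff.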

The failure is hidden from the chat UI, not discarded — it stays in execution_results and is recorded to the trace + NDJSON event log via a new capability_gap event.
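The "hidden, not discarded" split can be sketched as follows: the raw failure stays in the results list and is appended to an NDJSON log (one JSON object per line), while only the handoff text is returned for display. The function names, record fields, and log layout here are assumptions for illustration; only the capability_gap event name comes from the PR.

```python
import json
import time
from typing import Any, TextIO

EVENT_CAPABILITY_GAP = "capability_gap"


def emit_event(stream: TextIO, event: str, payload: dict[str, Any]) -> None:
    """Append one event to an NDJSON log: exactly one JSON object per line."""
    record = {"event": event, "ts": time.time(), **payload}
    stream.write(json.dumps(record, sort_keys=True) + "\n")


def finalize_turn(
    execution_results: list[dict[str, Any]], handoff_message: str, log: TextIO
) -> str:
    """Record the raw failure, then return the text the chat UI should show.

    Hypothetical helper: the failure is left in execution_results untouched.
    """
    failure = execution_results[-1]
    emit_event(
        log,
        EVENT_CAPABILITY_GAP,
        {"error": failure.get("error"), "step": len(execution_results)},
    )
    # The chat surface gets the calm handoff instead of the raw error suffix.
    return handoff_message
```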

Message phrasing (hybrid)

The user-facing message is model-phrased: the provider rewrites the explanation in natural language, while the options and structured report stay deterministic. If the model call fails or returns non-prose (JSON, a plan object, fenced code, empty), it falls back to the heuristic template — phrasing can never crash the handoff. Phrasing only runs when a handoff is actually built (no extra calls on normal turns).
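The fallback contract can be sketched as a small wrapper. GapMessagePhraser is named in the PR, but its signature here (template string in, prose out) is an assumption, as is passing the sanitizer in as a callable that returns None for unusable output. The wrapper is meant to be called only while building a handoff, so normal turns make no extra model calls.

```python
from typing import Callable, Optional

# Assumed signature: heuristic explanation in, model-written prose out.
GapMessagePhraser = Callable[[str], str]
# Assumed sanitizer contract: cleaned prose, or None when output is unusable.
Sanitizer = Callable[[str], Optional[str]]


def phrase_or_fallback(
    template: str, phraser: Optional[GapMessagePhraser], sanitize: Sanitizer
) -> str:
    """Let the model phrase the message; any failure falls back to the template."""
    if phraser is None:
        return template
    try:
        raw = phraser(template)
    except Exception:
        # Provider errors must never crash the handoff they decorate.
        return template
    cleaned = sanitize(raw)
    return cleaned if cleaned else template
```

Catching a broad Exception is deliberate in this sketch: phrasing is decoration, so any provider failure degrades to the deterministic template rather than surfacing a new error.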

Changes

  • models/orchestration.py: CapabilityGapKind/Option/Report/Handoff + OrchestrationResult.gap_handoff
  • services/gap_handoff.py (new): classify failure → build handoff; make_provider_phraser + prompt/sanitizer; build_issue_url, write_gap_report
  • orchestrator.py: build the handoff at finalize, swap the message, emit EVENT_CAPABILITY_GAP, wire the provider phraser
  • cli.py: prompt options, submit reports, resume on retry; one-shot notice
  • observability.py: EVENT_CAPABILITY_GAP
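One plausible shape for the new models, written as frozen dataclasses. The class names come from the Changes list, but every field and every CapabilityGapKind member below is a hypothetical guess based on the three stuck cases described under "What & why", not the project's actual definitions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CapabilityGapKind(Enum):
    # Hypothetical members mirroring the three stuck cases in the PR text.
    MISSING_CAPABILITY = "missing_capability"
    MISSING_PATH = "missing_path"
    NO_PROGRESS = "no_progress"


@dataclass(frozen=True)
class CapabilityGapOption:
    key: str  # e.g. "retry", "report", "stop"
    label: str  # text shown in the numbered menu


@dataclass(frozen=True)
class CapabilityGapReport:
    gap_id: str
    kind: CapabilityGapKind
    request: str  # what the user asked for
    detail: str  # the underlying failure, kept for the report


@dataclass(frozen=True)
class CapabilityGapHandoff:
    message: str  # user-facing (possibly model-phrased) explanation
    options: tuple[CapabilityGapOption, ...]
    report: CapabilityGapReport


@dataclass
class OrchestrationResult:
    final_message: str
    # None on normal turns; set only when the loop stopped structurally stuck.
    gap_handoff: Optional[CapabilityGapHandoff] = None
```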

Tests

19 new tests (gap builder, classification, model-phrasing + sanitizer/fallback, orchestrator reframe + model-phrased turn, CLI dispatch). 424 pass, ruff clean.

🤖 Generated with Claude Code

Anmolnoor and others added 6 commits May 30, 2026 16:17
When the bounded replan loop stops because the agent is structurally stuck
— the user asked for something that needs a capability fcli does not have, a
path that does not exist, or the planner spun without progress — the chat
surface now shows a calm "here's what's blocking this + options" message
instead of a raw error suffix. Options include a constrained retry, "report
it so it can be fixed" (writes a structured gap report + a pre-filled GitHub
issue link), and stop. In the REPL, choosing the retry resumes work in place.

The underlying failure is hidden from the chat UI, not discarded: it stays
in execution_results and is recorded to the trace + NDJSON event log via a
new capability_gap event.

- models/orchestration.py: CapabilityGapKind/Option/Report/Handoff +
  OrchestrationResult.gap_handoff
- services/gap_handoff.py: classify failure -> build handoff, issue URL,
  write_gap_report
- orchestrator.py: build the handoff at finalize, swap the message, emit
  EVENT_CAPABILITY_GAP (only for FATAL_EXECUTION_FAILURE / NO_PROGRESS;
  soft no-progress with changes is unaffected)
- cli.py: prompt the handoff options, submit reports, resume on retry;
  one-shot runs surface options + report link as a notice
- 12 new tests (gap builder, orchestrator reframe, CLI dispatch); 417 pass

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
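A sketch of the report path from the first commit: write gap-&lt;id&gt;.json under &lt;state_dir&gt;/gaps/ and build a pre-filled issue link. The function bodies, report fields, and ISSUES_NEW_URL are assumptions (the repository URL is a placeholder, since the PR does not name it); the title and body query parameters follow GitHub's documented way of pre-filling a new issue from a URL.

```python
import json
import uuid
from pathlib import Path
from urllib.parse import urlencode

# Placeholder repository; the real issue tracker URL is not given in the PR.
ISSUES_NEW_URL = "https://github.com/OWNER/REPO/issues/new"


def write_gap_report(state_dir: Path, report: dict) -> Path:
    """Write <state_dir>/gaps/gap-<id>.json and return its path (sketch)."""
    gap_id = report.get("gap_id") or uuid.uuid4().hex[:12]
    gaps_dir = state_dir / "gaps"
    gaps_dir.mkdir(parents=True, exist_ok=True)
    path = gaps_dir / f"gap-{gap_id}.json"
    payload = {**report, "gap_id": gap_id}
    path.write_text(json.dumps(payload, indent=2, sort_keys=True), encoding="utf-8")
    return path


def build_issue_url(report: dict) -> str:
    """Return a new-issue link with the title and body pre-filled (sketch)."""
    title = f"Capability gap: {report['summary']}"
    body = "Gap report:\n" + json.dumps(report, indent=2, sort_keys=True)
    return f"{ISSUES_NEW_URL}?{urlencode({'title': title, 'body': body})}"
```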
The gap handoff now asks the provider to phrase the user-facing message in
natural language, while the options and structured report stay deterministic.
If the model call fails, or returns output that doesn't look like plain prose
(JSON, a plan object, fenced code, empty), it falls back to the heuristic
template — phrasing can never crash the handoff it decorates.

- gap_handoff.py: GapMessagePhraser type, make_provider_phraser(provider),
  _build_phrasing_prompt (TEXT completion), _sanitize_phrased_message
  (rejects JSON/plan-like output, collapses whitespace, caps at 400 chars)
- build_gap_handoff gains an optional phraser; orchestrator wires its provider
- 7 new tests (override, fallback, sanitize, JSON-reject, provider-error,
  truncate, end-to-end model-phrased orchestrator turn); 424 pass

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
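The sanitizer rules listed in this commit (reject JSON/plan-like output and fenced code, collapse whitespace, cap at 400 characters) can be approximated as below. This is a hypothetical stand-in for _sanitize_phrased_message: the "starts with { or [" test is a heuristic assumption for spotting JSON or plan objects, and returning None is how this sketch signals "fall back to the template".

```python
import re
from typing import Optional

MAX_CHARS = 400
FENCE = "`" * 3  # a Markdown code fence marker


def sanitize_phrased_message(raw: Optional[str]) -> Optional[str]:
    """Return cleaned prose, or None so the caller uses the heuristic template."""
    if not raw or not raw.strip():
        return None  # empty output
    text = raw.strip()
    if FENCE in text:
        return None  # fenced code is not prose
    if text[0] in "{[":
        return None  # looks like JSON or a plan object
    text = re.sub(r"\s+", " ", text)  # collapse newlines and runs of spaces
    if len(text) > MAX_CHARS:
        text = text[: MAX_CHARS - 3].rstrip() + "..."
    return text
```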
A wholly empty chat completion (no content, no thinking — typically a cold
model load or worker glitch on Ollama Cloud, seen as eval_count=1) was raised
as a non-retryable INVALID_RESPONSE, so a single transient empty killed the
whole turn. Mark that case retryable so the existing retry loop re-sends the
prompt (up to max_attempts) and usually recovers.

Scoped to the empty case only: the "thinking tokens but no JSON" case stays
non-retryable (re-sending reproduces it), and done_reason=length truncation
stays a TRUNCATED signal (the plan should shrink, not retry blindly).

- 2 tests: empty-then-valid recovers in 2 attempts; persistent empty gives up
  after max_attempts. 426 pass, ruff clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
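The retry scoping in this commit can be sketched like this. Completion, ProviderError, and the use of a "TRUNCATED" error code are hypothetical stand-ins for the provider layer; the point is that only a wholly empty completion is marked retryable, while thinking-without-content and done_reason=length are not re-sent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Completion:
    content: str
    thinking: str
    done_reason: str  # e.g. "stop" or "length"


class ProviderError(Exception):
    def __init__(self, code: str, retryable: bool) -> None:
        super().__init__(code)
        self.code = code
        self.retryable = retryable


def check_completion(c: Completion) -> str:
    if c.done_reason == "length":
        # Truncation: the plan should shrink, so never retry blindly.
        raise ProviderError("TRUNCATED", retryable=False)
    if not c.content.strip():
        if not c.thinking.strip():
            # Wholly empty (cold load / worker glitch): worth re-sending.
            raise ProviderError("INVALID_RESPONSE", retryable=True)
        # Thinking tokens but no content: re-sending reproduces it.
        raise ProviderError("INVALID_RESPONSE", retryable=False)
    return c.content


def complete_with_retry(
    send: Callable[[], Completion], max_attempts: int = 3
) -> tuple[str, int]:
    """Return (content, attempts_used); re-raise once retries are exhausted."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return check_completion(send()), attempt
        except ProviderError as err:
            if not err.retryable or attempt == max_attempts:
                raise
    raise AssertionError("unreachable: the loop always returns or raises")
```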
Handle malformed deferred file-write hints, repeated read-only loops, gh api raw-flag plans, and stale command-error final messages. Also tightens live-turn status typing and documents the fixes.

Verification: ./scripts/uv run ruff check src tests; ./scripts/uv run ruff format --check src tests; ./scripts/uv run mypy; ./scripts/uv run pytest; ./scripts/uv run foundation doctor
Anmolnoor marked this pull request as ready for review June 1, 2026 10:46
Anmolnoor merged commit 2c7b32c into main Jun 1, 2026
1 check passed
Anmolnoor deleted the feat/capability-gap-handoff branch June 1, 2026 12:55