refactor(llm): raise_on_error mode + migrate fail-loud sites (A-12 tranche 2c) #70

Merged
AVADSA25 merged 1 commit into main from fix/pr3-a12-tranche2c-raise-mode
May 22, 2026

@AVADSA25

Summary

PR-3E-2c. Adds the raise-on-failure contract that tranche 2 deferred, then migrates the 4 sites that must fail loud. Design-first → docs/PR3E2C-RAISE-MODE-DESIGN.md.

New

  • codec_llm.LLMError + codec_llm.call(raise_on_error=True) — when True, raises LLMError on every non-success path (non-200 after retries, request exception after retries, or a 200 with empty/unparseable content). Default False keeps the never-raise → "" contract, so the streaming/best-effort callers (codec.py, qwen_call, compaction, dictate) are untouched — pinned by a regression-guard test.
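The two contracts can be sketched as follows. This is a minimal illustration, not the code in codec_llm: the `post` transport parameter, the failure messages, and the choice not to retry an empty 200 are assumptions made so the example runs without a network.

```python
from typing import Callable, Tuple


class LLMError(RuntimeError):
    """Raised in raise mode when the call yields no usable content."""


# Hypothetical transport: takes a prompt, returns (status_code, body_text).
Transport = Callable[[str], Tuple[int, str]]


def call(prompt: str, post: Transport, retries: int = 1,
         raise_on_error: bool = False) -> str:
    failure = "no attempt made"
    for _ in range(max(1, retries)):
        try:
            status, body = post(prompt)
        except Exception as exc:  # request-level failure; retry if attempts remain
            failure = f"request exception: {exc!r}"
            continue
        if status != 200:
            failure = f"HTTP {status}"
            continue
        content = body.strip()
        if content:
            return content
        # Assumption: an empty 200 is treated as final and not retried.
        failure = "200 with empty content"
        break
    if raise_on_error:
        raise LLMError(failure)
    return ""  # default contract: never raise, return ""
```

With the default `raise_on_error=False`, every failure path collapses to `""`, which is what best-effort callers rely on. Fail-loud callers opt in per call, so no existing call site changes behavior.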

Migrated (the 4 fail-loud sites)

  • codec_textassist.call_qwen: fixes a real bug. On LLM failure the never-raise path would pbcopy "" + ⌘V, pasting an empty string over the user's selection and still showing "Text replaced!". Now the caller's except shows the Error overlay (also on empty-200). The "### FINAL ANSWER:" strip stays at the call site; the <think> strip is now handled by codec_llm.
  • scripts/regen_skill_descriptions._llm — fail-loud preserved (LLMError propagates like the old raise_for_status; empty-200 now raises instead of writing an empty description).
  • codec_agent_plan._qwen_chat + codec_agent_runner._qwen_chat → call(raise_on_error=True) behind a thin adapter that maps LLMError → their public QwenUnavailableError, so the daemon's except QwenUnavailableError retry/abort/resume logic is unchanged. Added a parallel _qwen_base() resolver (call-time config). They also gain <think> strip + enable_thinking=False → more robust downstream JSON parsing.
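The adapter pattern used for the two agent sites might look roughly like this. It is a sketch under assumptions: both exception classes are local stand-ins, `qwen_chat` and the injected `llm_call` parameter are hypothetical, and the wrapped message format is invented for illustration.

```python
from typing import Callable, Dict, List


class LLMError(RuntimeError):
    """Stand-in for codec_llm.LLMError."""


class QwenUnavailableError(RuntimeError):
    """Stand-in for the agents' public exception type."""


Messages = List[Dict[str, str]]


def qwen_chat(messages: Messages, llm_call: Callable[..., str]) -> str:
    """Call the LLM in raise mode and translate failures to the public type."""
    try:
        return llm_call(messages, raise_on_error=True)
    except LLMError as exc:
        # Only the message changes; callers still catch QwenUnavailableError.
        raise QwenUnavailableError(f"Qwen unavailable: {exc}") from exc
```

Chaining with `from exc` keeps the original LLMError on `__cause__`, so the underlying failure stays visible in tracebacks while callers only ever see the public type.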

Behavior deltas (documented)

  • All 4: empty-200 now raises (was: empty paste / empty desc / parse-"") — strict improvement, fail-loud is the intent.
  • agent_plan/runner: exception message changes but the type QwenUnavailableError is preserved (adapter) — daemon logic unaffected.
  • No added retries for the agents (retries=1 default = single attempt, matching their old single POST).
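The first delta, seen from a textassist-style call site, can be illustrated as below. Every name here is hypothetical: `replace_selection`, the injected `llm_call`, `paste`, and `show_overlay` callables stand in for the real clipboard and overlay code, which this PR page does not show.

```python
from typing import Callable


class LLMError(RuntimeError):
    """Stand-in for codec_llm.LLMError."""


def replace_selection(selection: str,
                      llm_call: Callable[[str], str],
                      paste: Callable[[str], None],
                      show_overlay: Callable[[str], None]) -> bool:
    """Paste the LLM result over the selection, or show an error and paste nothing."""
    try:
        text = llm_call(selection)  # raise-mode call: raises instead of returning ""
    except LLMError as exc:
        show_overlay(f"Error: {exc}")
        return False
    paste(text)
    show_overlay("Text replaced!")
    return True
```

Because the failure arrives as an exception, the paste step is never reached with an empty string, which is the behavior the bug fix is after.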

Test plan

  • tests/test_llm_raise_mode.py: 14 tests covering raise-mode success / non-200 / exception / empty-200; a default-still-never-raises regression guard; agent adapters that map LLMError → QwenUnavailableError (asserting the wrapped message) and pass content through on success; and source invariants (the 4 sites call codec_llm.call(, and the inline POST / raise_for_status are gone).
  • 109 agent tests (test_agent_plan / test_agent_runner / test_chat_plan_persistence) still green — the QwenUnavailableError contract holds.
  • Full suite: 1423 passed, 23 known-baseline failures, zero new, 74 skipped.
  • Ruff: codec_llm 0 errors; per-file F-delta vs origin/main = 0 on all changed files (pre-existing debt untouched).
  • No skills/ touched → no manifest regen.
  • Manual (Mac Studio): a textassist proofread with the LLM down shows the Error overlay (no empty paste); an agent plan/run surfaces QwenUnavailableError when Qwen is down.
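A source-invariant check of the kind listed above could be sketched like this. The helper name `check_migrated_source` is hypothetical, and treating "inline POST" as a `requests.post(` call is an assumption; the actual tests in tests/test_llm_raise_mode.py may check different patterns.

```python
import re
from typing import List


def check_migrated_source(source: str) -> List[str]:
    """Return a list of invariant violations for one migrated module's source text."""
    problems = []
    if "codec_llm.call(" not in source:
        problems.append("does not call codec_llm.call(")
    if "raise_for_status" in source:
        problems.append("still uses raise_for_status")
    # Assumption: the old inline POST was made via requests.post(.
    if re.search(r"\brequests\.post\(", source):
        problems.append("still has an inline POST")
    return problems
```

In a test, the source text would typically come from reading each migrated file and asserting that the returned list is empty.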

🤖 Generated with Claude Code

…s (A-12 tranche 2c)

PR-3E-2c. Adds the raise-on-failure contract that tranche 2 deferred, then
migrates the 4 sites that MUST fail loud.

New: codec_llm.LLMError + codec_llm.call(raise_on_error=True). When True, call()
raises LLMError on EVERY non-success path — non-200 (after retries), request
exception (after retries), and a 200 with empty/unparseable content. Default
stays False (never-raise -> ""), so the existing streaming/best-effort callers
(codec.py, qwen_call, compaction, dictate) are untouched — pinned by a
regression guard test.

Migrated:
- codec_textassist.call_qwen -> call(raise_on_error=True). Fixes a real bug: on
  LLM failure the never-raise path would pbcopy "" + Cmd-V, pasting EMPTY over
  the user's selection and showing "Text replaced!". Now the caller's except
  shows the Error overlay (also on empty-200). FINAL-ANSWER strip kept at the
  call site; <think> strip now handled by codec_llm.
- scripts/regen_skill_descriptions._llm -> call(raise_on_error=True). Fail-loud
  preserved (LLMError propagates like the old raise_for_status; empty-200 now
  raises instead of writing an empty description).
- codec_agent_plan._qwen_chat + codec_agent_runner._qwen_chat -> call(
  raise_on_error=True) behind a thin adapter that maps LLMError onto their
  PUBLIC QwenUnavailableError, so the daemon's `except QwenUnavailableError`
  retry/abort/resume logic is unchanged. Added a parallel _qwen_base() resolver
  (call-time config). These also gain <think> strip + enable_thinking=False ->
  more robust JSON parsing downstream.

Tests: tests/test_llm_raise_mode.py (14 — raise-mode success/non-200/exception/
empty-200, default-still-never-raises regression guard, agent adapters map to
QwenUnavailableError + pass content through, source invariants). 109 agent
tests (test_agent_plan/runner/chat_plan_persistence) still green. Full suite
1423 passing, 23 known-baseline failures, zero new. Zero net-new ruff (per-file
delta vs origin/main = 0). No skills/ touched -> no manifest regen.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@AVADSA25 merged commit 1ddaa4f into main on May 22, 2026
1 check passed