
feat: extract JSON from thinking preamble in ThinkingAwareOpenAILike (#238)

Merged

neoneye merged 1 commit into PlanExeOrg:main from VoynichLabs:feat/extract-json-from-thinking on Mar 10, 2026


Conversation

@82deutschmark
Collaborator

Problem

When LM Studio's Qwen 3.5-35B runs with thinking enabled, reasoning_content contains the full thinking preamble concatenated with the final JSON answer. PR #231 correctly falls back to reasoning_content when content is empty — but then passes the raw mixed text to Pydantic's model_validate_json, which fails because it sees 12,000+ chars of thinking prose before the JSON object.

This caused CreateWBSLevel3Task to fail mid-pipeline on real runs with Qwen 3.5-35B thinking mode.

Solution

Add _extract_json_from_thinking() with a 3-strategy approach:

  1. Direct parse — fast path for short responses where reasoning_content is already valid JSON
  2. </think> tag — extract content after the tag (DeepSeek and some other models emit this marker to cleanly separate thinking from output)
  3. Right-to-left scan — find the rightmost { that yields a valid JSON parse (handles Qwen 3.5-35B which does not emit </think>)

If all strategies fail, the original text is returned unchanged (existing behavior — caller handles the failure).

Testing

Tested on Qwen 3.5-35B (Q4_K_M, LM Studio 0.3.x) with thinking always-on. The function correctly extracts JSON from reasoning_content strings of 12,000–19,000 chars. Fixed CreateWBSLevel3Task failures on Batman RICO v10 pipeline run (Hartford CT variant).

Compatibility

  • Models without thinking tokens: reasoning_content is not present → code path unchanged
  • Models that emit </think>: strategy 2 handles them efficiently
  • Models that don't emit </think> (Qwen 3.5-35B via LM Studio): strategy 3 handles them
  • If reasoning_content is already valid JSON (short non-thinking responses): strategy 1 returns immediately

When LM Studio's Qwen 3.5-35B runs with thinking enabled, reasoning_content
contains the full thinking preamble concatenated with the final JSON answer.
Passing this raw text to Pydantic's model_validate_json fails because it sees
thinking prose + JSON rather than just JSON.

Add _extract_json_from_thinking() with a 3-strategy approach:
1. Direct parse — fast path for responses where reasoning_content is already JSON
2. </think> tag — extract content after the tag (some models emit this marker)
3. Right-to-left scan — find the rightmost '{' that yields a valid JSON parse

The extracted JSON (or original text on failure) replaces reasoning_content
in the ChatResponse, so downstream Pydantic validation works correctly.

Tested on Qwen 3.5-35B (Q4_K_M, LM Studio 0.3.x) with thinking always-on:
fixed CreateWBSLevel3Task failures in real pipeline runs where the task was
receiving 12,000+ char reasoning_content with JSON buried at the end.
neoneye merged commit e1c8d38 into PlanExeOrg:main on Mar 10, 2026
3 checks passed
neoneye deleted the feat/extract-json-from-thinking branch on March 10, 2026 at 21:25
neoneye added a commit that referenced this pull request Mar 10, 2026
Bring in latest main changes including usage metrics (#110),
LLMChatError traceability (#237), ThinkingAwareOpenAILike (#238),
pipeline versioning, plan resume improvements, and error
classification in usage metrics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
