
feat: extract JSON from thinking preamble in ThinkingAwareOpenAILike (#238)

Merged

neoneye merged 1 commit into PlanExeOrg:main from VoynichLabs:feat/extract-json-from-thinking on Mar 10, 2026


Conversation

@82deutschmark
Collaborator

Problem

When LM Studio's Qwen 3.5-35B runs with thinking enabled, reasoning_content contains the full thinking preamble concatenated with the final JSON answer. PR #231 correctly falls back to reasoning_content when content is empty — but then passes the raw mixed text to Pydantic's model_validate_json, which fails because it sees 12,000+ chars of thinking prose before the JSON object.

This caused CreateWBSLevel3Task to fail mid-pipeline on real runs with Qwen 3.5-35B thinking mode.

Solution

Add _extract_json_from_thinking() with a 3-strategy approach:

  1. Direct parse — fast path for short responses where reasoning_content is already valid JSON
  2. </think> tag — extract content after the tag (DeepSeek and some other models emit this marker to cleanly separate thinking from output)
  3. Right-to-left scan — find the rightmost { that yields a valid JSON parse (handles Qwen 3.5-35B which does not emit </think>)

If all strategies fail, the original text is returned unchanged (existing behavior — caller handles the failure).

Testing

Tested on Qwen 3.5-35B (Q4_K_M, LM Studio 0.3.x) with thinking always-on. The function correctly extracts JSON from reasoning_content strings of 12,000–19,000 chars. Fixed CreateWBSLevel3Task failures on Batman RICO v10 pipeline run (Hartford CT variant).

Compatibility

  • Models without thinking tokens: reasoning_content is not present → code path unchanged
  • Models that emit </think>: strategy 2 handles them efficiently
  • Models that don't emit </think> (Qwen 3.5-35B via LM Studio): strategy 3 handles them
  • If reasoning_content is already valid JSON (short non-thinking responses): strategy 1 returns immediately

When LM Studio's Qwen 3.5-35B runs with thinking enabled, reasoning_content
contains the full thinking preamble concatenated with the final JSON answer.
Passing this raw text to Pydantic's model_validate_json fails because it sees
thinking prose + JSON rather than just JSON.

Add _extract_json_from_thinking() with a 3-strategy approach:
1. Direct parse — fast path for responses where reasoning_content is already JSON
2. </think> tag — extract content after the tag (some models emit this marker)
3. Right-to-left scan — find the rightmost '{' that yields a valid JSON parse

The extracted JSON (or original text on failure) replaces reasoning_content
in the ChatResponse, so downstream Pydantic validation works correctly.

Tested on Qwen 3.5-35B (Q4_K_M, LM Studio 0.3.x) with thinking always-on:
fixed CreateWBSLevel3Task failures in real pipeline runs where the task was
receiving 12,000+ char reasoning_content with JSON buried at the end.
neoneye merged commit e1c8d38 into PlanExeOrg:main on Mar 10, 2026
3 checks passed
neoneye deleted the feat/extract-json-from-thinking branch on March 10, 2026 at 21:25
neoneye added a commit that referenced this pull request Mar 10, 2026
Bring in latest main changes including usage metrics (#110),
LLMChatError traceability (#237), ThinkingAwareOpenAILike (#238),
pipeline versioning, plan resume improvements, and error
classification in usage metrics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
