fix: Complete tool call handling fixes for SDK mode and conversation history #26

Closed
saxyguy81 wants to merge 5 commits into CaddyGlow:main from saxyguy81:fix/tool-call-streaming-and-validation


@saxyguy81 saxyguy81 commented Dec 19, 2025

Real-World Use Case

I'm using ccproxy-api to route requests from the superdesign.dev VSCode plugin to my Claude Max subscription. This setup works great for simple conversations, but breaks on longer conversations that trigger tool calls, especially after the client implicitly compacts the conversation history.

Symptoms I Encountered

  1. First few messages work fine, tool calls complete successfully
  2. After multiple tool-heavy exchanges, responses start failing:
    • Tool call results missing or incomplete
    • Error: "unexpected tool_use_id found in tool_result blocks"
  3. STDIO MCP tools (filesystem, chrome-extension) were particularly affected

Root Causes Found

After deep investigation, I found 6 bugs in the tool call handling code:

  1. Tool calls only sent for SSE format, not dict format (SDK mode)
  2. Race condition: messages discarded between listener check and broadcast
  3. Race condition: worker starts before listener is registered
  4. No error handling around streaming yields
  5. Fire-and-forget cleanup loses in-flight messages
  6. Orphaned tool_result blocks after conversation compaction

Summary

This PR fixes all the above bugs affecting tool call handling in ccproxy-api, particularly when using SDK mode with STDIO MCP tools and when conversation history contains compacted/summarized messages.

Issues Fixed:

  • Tool calls not being sent in SDK mode (dict format)
  • Race conditions causing message loss with fast STDIO tools
  • Orphaned tool_result blocks causing API errors after conversation compaction

Changes

1. OpenAI Streaming Adapter (adapters/openai/streaming.py)

Bug: Tool calls were only yielded for SSE format, not dict format (SDK mode).

# Before (broken):
elif self.tool_calls and self.enable_tool_calls and self.output_format == "sse":

# After (fixed):
elif self.tool_calls and self.enable_tool_calls:
  • Removed SSE-only condition so tool calls work in SDK mode
  • Added proper tool call indexing for multiple simultaneous calls
  • Added clearing of tool_calls dict after yielding to prevent duplicates
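
The three changes above can be sketched as follows. This is a minimal illustration, not the adapter's actual code: the `ToolCallEmitter` class, the `flush_tool_calls` method, and the chunk shape are hypothetical, and only the attribute names `tool_calls`, `enable_tool_calls`, and `output_format` come from the snippet above.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass
class ToolCallEmitter:
    """Hypothetical stand-in for the adapter's streaming state."""

    output_format: str = "dict"  # "dict" (SDK mode) or "sse"
    enable_tool_calls: bool = True
    tool_calls: dict[str, dict[str, Any]] = field(default_factory=dict)

    def flush_tool_calls(self) -> Iterator[dict[str, Any]]:
        # No output_format check: both dict and SSE modes emit tool calls.
        if not (self.tool_calls and self.enable_tool_calls):
            return
        # Give each pending call its own index so parallel calls stay distinct.
        for index, (call_id, call) in enumerate(self.tool_calls.items()):
            yield {
                "index": index,
                "id": call_id,
                "type": "function",
                "function": {
                    "name": call["name"],
                    "arguments": call["arguments"],
                },
            }
        # Clear the buffer so a later flush cannot emit the same calls again.
        self.tool_calls.clear()
```

Because the buffer is cleared only after the last call has been yielded, a consumer that stops iterating early would see the remaining calls on the next flush; whether that is desirable depends on the real adapter's semantics.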

2. SDK Streaming Race Conditions (claude_sdk/stream_worker.py, stream_handle.py)

Bug #1: Non-atomic check-and-broadcast caused message loss

  • Removed separate has_listeners() check
  • Always call broadcast(), which performs the listener check and the delivery atomically
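
A rough sketch of the idea, assuming an asyncio-based fan-out. The `Broadcaster` class and `worker_loop` function are hypothetical names for illustration and are not taken from ccproxy; only `has_listeners()` and `broadcast()` are named in this PR.

```python
import asyncio


class Broadcaster:
    """Hypothetical fan-out helper; not ccproxy's real class."""

    def __init__(self) -> None:
        self._listeners: list[asyncio.Queue] = []
        self._lock = asyncio.Lock()

    async def add_listener(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        async with self._lock:
            self._listeners.append(queue)
        return queue

    async def broadcast(self, message: object) -> int:
        # Listener lookup and delivery happen under one lock, so a listener
        # cannot register between a separate check and the send.
        async with self._lock:
            for queue in self._listeners:
                queue.put_nowait(message)
            return len(self._listeners)


async def worker_loop(broadcaster: Broadcaster, messages: list) -> int:
    delivered = 0
    for message in messages:
        # Before the fix (paraphrased): check has_listeners() first and
        # discard the message if it returned False. Now: always broadcast.
        delivered += await broadcaster.broadcast(message)
    return delivered
```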

Bug #2: Worker started before listener was registered

  • Split worker lifecycle into create/start phases
  • Pre-register listener before starting worker
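
The create/start split might look roughly like this. `StreamWorker` and `open_stream` are hypothetical names used only for this sketch, and the `None` sentinel is an assumption about how end-of-stream is signalled.

```python
import asyncio
from typing import Optional


class StreamWorker:
    """Hypothetical worker with separate create and start phases."""

    def __init__(self, source: list) -> None:
        self._source = source
        self._listeners: list[asyncio.Queue] = []
        self._task: Optional[asyncio.Task] = None

    def add_listener(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._listeners.append(queue)
        return queue

    def start(self) -> None:
        # Nothing is produced until start() is called explicitly.
        self._task = asyncio.create_task(self._run())

    async def _run(self) -> None:
        for message in self._source:
            for queue in self._listeners:
                queue.put_nowait(message)
        for queue in self._listeners:
            queue.put_nowait(None)  # sentinel: stream finished


async def open_stream(source: list) -> list:
    worker = StreamWorker(source)  # phase 1: create
    queue = worker.add_listener()  # register before any message exists
    worker.start()                 # phase 2: start
    received = []
    while (message := await queue.get()) is not None:
        received.append(message)
    return received
```

With this ordering, even a source that produces its first message immediately cannot outrun the listener, because the listener's queue exists before the worker task is scheduled.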

Bug #3: Fire-and-forget cleanup lost in-flight messages

  • Added 100ms drain period before interrupt
  • Changed asyncio.create_task() to await
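
One way to picture the drain-then-interrupt cleanup is shown below. This is a sketch under assumptions: `interrupt_with_drain` is a hypothetical helper, cancelling the task stands in for whatever "interrupt" means in the real handle, and only the 100ms default and the switch from `asyncio.create_task()` to `await` come from the list above.

```python
import asyncio


async def interrupt_with_drain(
    queue: asyncio.Queue,
    worker_task: asyncio.Task,
    drain_seconds: float = 0.1,
) -> list:
    """Hypothetical cleanup: collect in-flight messages, then stop the worker."""
    drained = []
    loop = asyncio.get_running_loop()
    deadline = loop.time() + drain_seconds
    # Drain period: keep reading until the deadline passes.
    while (remaining := deadline - loop.time()) > 0:
        try:
            item = await asyncio.wait_for(queue.get(), timeout=remaining)
        except asyncio.TimeoutError:
            break
        drained.append(item)
    worker_task.cancel()
    # Await the shutdown instead of scheduling it with create_task(), so
    # cleanup has finished by the time this coroutine returns.
    await asyncio.gather(worker_task, return_exceptions=True)
    return drained
```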

3. SDK Streaming Yield Guarantees (claude_sdk/streaming.py)

Bug: No error handling around yields

  • Added try-except for GeneratorExit at 5 locations
  • Added completion tracking (chunks_sent/total_chunks)
  • Added logging for interrupted blocks
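
As an illustration of the pattern, here is a minimal async generator that tracks delivery and logs when the consumer closes it early. The `stream_block` function and the `stats` dict are hypothetical; only the `chunks_sent`/`total_chunks` names and the `GeneratorExit` handling come from the list above.

```python
import logging
from typing import AsyncIterator

logger = logging.getLogger("streaming_sketch")


async def stream_block(chunks: list[str], stats: dict) -> AsyncIterator[str]:
    """Hypothetical block streamer that records how much was delivered."""
    stats["total_chunks"] = len(chunks)
    stats["chunks_sent"] = 0
    try:
        for chunk in chunks:
            # Count the chunk as it is handed to the consumer.
            stats["chunks_sent"] += 1
            yield chunk
    except GeneratorExit:
        # Raised at the suspended yield when the consumer closes the
        # generator, for example after a client disconnect.
        logger.warning(
            "block interrupted after %d/%d chunks",
            stats["chunks_sent"],
            stats["total_chunks"],
        )
        raise  # re-raise so the generator shuts down normally
```

Re-raising `GeneratorExit` matters: a generator that swallows it and then yields again causes a `RuntimeError` on close.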

4. Tool Result Sanitization (adapters/openai/adapter.py)

Bug: Orphaned tool_result blocks caused API errors when conversation history was compacted.

Error: "unexpected tool_use_id found in tool_result blocks"

  • Added _sanitize_tool_results() method
  • Detects orphaned tool_result blocks, i.e. those with no matching tool_use in the immediately preceding assistant message
  • Replaces each orphaned block with a text block so the result's information is preserved
  • Logs warnings when sanitization occurs
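
The sanitization logic can be sketched as below, assuming Anthropic-style message dicts with `tool_use` and `tool_result` content blocks. The standalone `sanitize_tool_results` function and the wording of the replacement text are illustrative assumptions, not the actual `_sanitize_tool_results()` implementation.

```python
import logging
from typing import Any

logger = logging.getLogger("sanitize_sketch")


def sanitize_tool_results(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical sketch: turn orphaned tool_result blocks into text."""
    sanitized: list[dict[str, Any]] = []
    for message in messages:
        content = message.get("content")
        if message.get("role") != "user" or not isinstance(content, list):
            sanitized.append(message)
            continue

        # Valid IDs come only from the immediately preceding assistant message.
        valid_ids: set = set()
        previous = sanitized[-1] if sanitized else None
        if (
            previous is not None
            and previous.get("role") == "assistant"
            and isinstance(previous.get("content"), list)
        ):
            valid_ids = {
                block.get("id")
                for block in previous["content"]
                if isinstance(block, dict) and block.get("type") == "tool_use"
            }

        new_content = []
        for block in content:
            is_result = isinstance(block, dict) and block.get("type") == "tool_result"
            if is_result and block.get("tool_use_id") not in valid_ids:
                logger.warning(
                    "orphaned tool_result %s converted to text",
                    block.get("tool_use_id"),
                )
                text = f"[Previous tool result: {block.get('content')}]"
                new_content.append({"type": "text", "text": text})
            else:
                new_content.append(block)
        sanitized.append({**message, "content": new_content})
    return sanitized
```

Converting rather than dropping the block keeps the tool output visible to the model while removing the dangling `tool_use_id` reference that triggers the API error.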

Why These Bugs Occurred

  1. SDK mode uses dict format, but tool call yielding was SSE-only
  2. STDIO tools respond in <1ms, hitting race windows that slower API calls don't
  3. Conversation compaction (by clients like superdesign.dev) removes assistant messages with tool_use blocks while keeping user messages with tool_result references

Test Plan

New Test Files

  • tests/test_tool_call_streaming_fix.py - OpenAI adapter tool call tests
  • tests/test_tool_result_sanitization.py - Orphaned tool_result tests (17 test cases)

Run Tests

# Tool result sanitization (standalone)
python -m pytest tests/test_tool_result_sanitization.py --noconftest -c /dev/null -v

Manual Testing Done

  1. Started ccproxy with SDK mode on port 4000
  2. Connected superdesign.dev VSCode plugin
  3. Had extended conversations with multiple tool calls
  4. Verified no more "unexpected tool_use_id" errors
  5. Verified tool calls complete successfully in long conversations

Files Changed

| File | Lines | Description |
|------|-------|-------------|
| ccproxy/adapters/openai/streaming.py | +29/-8 | SSE-only fix, indexing, clearing |
| ccproxy/adapters/openai/adapter.py | +81 | Tool result sanitization |
| ccproxy/claude_sdk/stream_worker.py | +13 | Atomic broadcast |
| ccproxy/claude_sdk/stream_handle.py | +58/-22 | Listener pre-registration, drain |
| ccproxy/claude_sdk/streaming.py | +101/-10 | Yield guarantees |

Breaking Changes

None. All changes are backward compatible.


🤖 Generated with Claude Code

smhanan added 5 commits December 18, 2025 23:31
Tool calls were only being yielded for SSE format, not dict format
(used by SDK mode). This caused SDK mode clients to miss tool call
responses entirely.

Changes:
- Remove SSE-only condition from tool call yielding
- Add proper tool call indexing for multiple simultaneous calls
- Clear tool_calls dict after yielding to prevent duplicates
- Add debug logging for tool call processing

🤖 Generated with [Claude Code](https://claude.ai/code)
Fixes multiple race conditions affecting fast STDIO MCP tools:

1. Message discarding race (stream_worker.py)
   - Removed separate has_listeners() check
   - Always call broadcast() which handles atomically

2. Listener setup race (stream_handle.py)
   - Split worker lifecycle into create/start phases
   - Pre-register listener before starting worker

3. Interrupt timing issues (stream_handle.py)
   - Added 100ms drain period before interrupt
   - Changed asyncio.create_task() to await

These race conditions were particularly problematic for STDIO tools
(filesystem, chrome-extension) which respond in <1ms, hitting the
race windows that slower API calls don't.

🤖 Generated with [Claude Code](https://claude.ai/code)
Adds error handling and completion tracking around yield statements
in the SDK streaming processor:

- Added try-except for GeneratorExit at 5 locations
- Added completion tracking (chunks_sent/total_chunks)
- Added logging for interrupted blocks with completion ratio
- Ensures partial block delivery is tracked and logged

This helps diagnose issues when clients disconnect mid-stream
and ensures proper cleanup of streaming resources.

🤖 Generated with [Claude Code](https://claude.ai/code)
Adds _sanitize_tool_results() to handle cases where conversation
history is compacted and tool_result blocks become orphaned (their
matching tool_use blocks are removed).

The Anthropic API requires each tool_result to have a matching
tool_use in the immediately preceding assistant message. When
clients compact history, this invariant can be violated.

Fix:
- Scan for valid tool_use IDs in preceding assistant message
- Remove orphaned tool_result blocks that don't match
- Convert orphaned results to text blocks to preserve information
- Log warnings when sanitization occurs

Error fixed:
"unexpected tool_use_id found in tool_result blocks"

🤖 Generated with [Claude Code](https://claude.ai/code)
Adds comprehensive test coverage for the tool call fixes:

1. test_tool_call_streaming_fix.py
   - Tests tool calls work in dict format (SDK mode)
   - Tests multiple tool calls are properly indexed
   - Tests SSE format still works (no regression)
   - Tests complex tool call arguments

2. test_tool_result_sanitization.py (17 test cases)
   - Tests valid tool_results are preserved
   - Tests orphaned tool_results are removed
   - Tests mixed valid/orphaned scenarios
   - Tests conversation compaction scenario
   - Tests edge cases (empty, complex content, etc.)

🤖 Generated with [Claude Code](https://claude.ai/code)
@CaddyGlow
Owner

Hello, did you try v0.2 first? https://github.com/CaddyGlow/ccproxy-api/tree/dev/v0.2 I still need to switch main to this version, but I haven't had time to finish cleaning up the log output. It should be ready to use, though. I put a lot of effort into improving how transformations between APIs are handled. I'll happily apply any fixes to this branch.

uvx --with "ccproxy-api[all]==0.2.0" ccproxy serve --port 8000

@saxyguy81
Author

This PR has been superseded by a new PR targeting the dev/v0.2 branch. Please see #31 for the updated implementation, which works with the dev/v0.2 architecture.

@CaddyGlow CaddyGlow force-pushed the main branch 2 times, most recently from c982f22 to e8d882b on January 4, 2026 23:41
@CaddyGlow
Owner

Fixed in v0.2.0

@CaddyGlow CaddyGlow closed this Jan 5, 2026
@saxyguy81
Author

Superseded by the dev/v0.2 PR: #31 (merged).
