DatabricksAdapter yields text content during tool-calling rounds, causing response duplication
Problem
When using DatabricksAdapter with Claude models via Databricks Model Serving (OpenAI-compatible endpoint), the model produces substantial text content alongside tool calls in the same response turn. The adapter's streamCompletion() method yields all delta.content text as message_delta events regardless of whether the same response also contains tool calls.
In a multi-step ReAct loop, this causes the model's intermediate text (which often includes the full answer) to be emitted and displayed to the user on every tool-calling turn — resulting in the same answer appearing 3-4× in the final output.
Root Cause Analysis
Claude's native API uses a trained-in tool-use system prompt that shapes the model to produce brief commentary before tool_use blocks. The Databricks OpenAI-compatible translation layer does not include this trained-in prompt, so Claude produces more verbose text during tool-calling rounds.
The adapter loop (run() method, line ~196) correctly breaks when no tool calls are present, but during tool-calling rounds, streamCompletion() yields ALL delta.content as message_delta events — there's no mechanism to suppress or buffer text that arrives in a turn that also contains tool calls.
Expected Behavior
Text content from tool-calling rounds should either:
- Be suppressed (not yielded as message_delta) when the same response also contains tool_calls
- Be buffered until streamCompletion returns, and only yielded if no tool calls were detected
- Be marked with metadata so consumers can distinguish intermediate text from final-answer text
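The buffering option could look roughly like the sketch below. This is not the adapter's actual code: the AdapterEvent type, the tool_call event name, and the bufferRoundText helper are hypothetical, introduced only to illustrate holding back one round's text until it is known whether that round contains tool calls.

```typescript
// Hypothetical event shapes; the adapter's real types may differ.
type AdapterEvent =
  | { type: "message_delta"; content: string }
  | { type: "tool_call"; name: string; args: string };

// Buffers the text of one completion round and releases it only if the
// round produced no tool calls. Tool-call events pass through immediately.
async function* bufferRoundText(
  round: AsyncIterable<AdapterEvent>,
): AsyncGenerator<AdapterEvent> {
  const bufferedText: AdapterEvent[] = [];
  let sawToolCall = false;

  for await (const event of round) {
    if (event.type === "message_delta") {
      bufferedText.push(event); // hold text until the round is complete
    } else {
      sawToolCall = true;
      yield event;
    }
  }

  // Only a round without tool calls is treated as the final answer.
  if (!sawToolCall) {
    yield* bufferedText;
  }
}
```

The trade-off is that final-answer text is no longer streamed token by token; it is released only once the round ends.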
Current Workaround
We added:
- A system prompt instruction telling the model to only write brief status notes during tool-calling turns
- Client-side text buffering in the SSE transport layer that discards accumulated text when a function_call event arrives in the same round
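A minimal sketch of the client-side part of this workaround, for illustration only. The SseEvent shapes, the round_end marker, and the RoundTextBuffer class are assumptions, not our actual transport code; only the function_call event name comes from the description above.

```typescript
// Hypothetical SSE event shapes; real event names may differ.
type SseEvent =
  | { kind: "text"; text: string }
  | { kind: "function_call"; name: string }
  | { kind: "round_end" }; // assumed marker for the end of one model turn

class RoundTextBuffer {
  private pending = "";
  private discard = false;

  // Returns text that is safe to display, or null if nothing should be shown.
  handle(event: SseEvent): string | null {
    if (event.kind === "text") {
      if (!this.discard) this.pending += event.text;
      return null;
    }
    if (event.kind === "function_call") {
      // A tool call in this round marks all of its text as intermediate.
      this.discard = true;
      this.pending = "";
      return null;
    }
    // round_end: flush text only if no tool call was seen, then reset state.
    const out = this.discard || this.pending === "" ? null : this.pending;
    this.pending = "";
    this.discard = false;
    return out;
  }
}
```

This only hides the duplication on the client. The adapter still emits the intermediate text, so other consumers of the same stream would need the same logic.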
Environment
- AppKit version: 0.38.1
- Model: databricks-claude-opus-4-6 via Model Serving
- Adapter: DatabricksAdapter.fromServingEndpoint()
Reproduction
- Create an agent with maxSteps: 10 and multiple tools (e.g., describeTable, executeSql)
- Ask a question that triggers 3+ tool calls
- Observe that the assistant's text response contains the full answer repeated once per tool-calling round