Add generate_stream method to LLM clients for streaming responses

- Add StreamChunk schema for partial response chunks (thinking, content, tool_call_delta, tool_call_complete, done)
- Add generate_stream abstract method to LLMClientBase
- Implement streaming in OpenAIClient via chat.completions.create(stream=True)
- Implement streaming in AnthropicClient via messages.stream()
- Add generate_stream to LLMClient wrapper
- Buffer partial tool calls and emit tool_call_complete when JSON is complete
- Add tests for streaming functionality

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
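The StreamChunk schema and abstract method described above might look roughly like the following. This is a hypothetical sketch: the field names, defaults, and method signature are assumptions, not the repository's actual code.

```python
# Hypothetical sketch of StreamChunk and the generate_stream contract.
# Field and parameter names are illustrative assumptions.
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import AsyncIterator, Literal, Optional


@dataclass
class StreamChunk:
    # One of the five chunk types listed in the commit message.
    type: Literal["thinking", "content", "tool_call_delta",
                  "tool_call_complete", "done"]
    text: str = ""                    # partial thinking/content text
    tool_call: Optional[dict] = None  # populated on tool_call_complete


class LLMClientBase(ABC):
    @abstractmethod
    def generate_stream(self, messages: list[dict]) -> AsyncIterator[StreamChunk]:
        """Yield StreamChunk objects as the model produces output."""
        ...
```

Each concrete client (OpenAIClient, AnthropicClient) would translate its SDK's native stream events into this common chunk type, so the agent loop only has to handle one event vocabulary.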
…back

- Add `stream` parameter to Agent (default True) enabling real-time token output in the agent loop
- Refactor run() to dispatch to _run_step_stream() or _run_step_nonstream()
- Extract tool call execution into _execute_tool_calls() helper
- Streaming: buffers thinking/content, emits tool_call_complete when JSON parses, executes all tools after done event
- Add --no-stream CLI flag to disable streaming and use generate()
- All 12 streaming tests pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
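The "emits tool_call_complete when JSON parses" behavior can be sketched as a small buffer that accumulates argument fragments and attempts a parse after each one. The class and method names here are invented for illustration; only the buffering-until-valid-JSON idea comes from the commit message.

```python
# Minimal sketch of buffering partial tool-call arguments and detecting
# completion by attempting a JSON parse after each delta.
import json
from typing import Optional


class ToolCallBuffer:
    """Hypothetical helper: accumulate argument fragments for one tool call."""

    def __init__(self) -> None:
        self._parts: list[str] = []

    def add_delta(self, fragment: str) -> Optional[dict]:
        """Append a fragment; return parsed arguments once the buffered
        text is valid JSON, else None (still incomplete)."""
        self._parts.append(fragment)
        try:
            return json.loads("".join(self._parts))
        except json.JSONDecodeError:
            return None


buf = ToolCallBuffer()
assert buf.add_delta('{"path": ') is None          # incomplete JSON so far
assert buf.add_delta('"a.txt"}') == {"path": "a.txt"}
```

Trying a full parse on every delta is simple and fine for typical argument sizes; a production implementation might instead wait for the SDK's block-stop event before parsing.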
The Anthropic SDK sends:

- type="text"/"thinking" as TOP-LEVEL event types (content in event.text/event.thinking)
- type="content_block_delta" wraps content in block.delta.text / block.delta.thinking
- type="signature" events carry no streamable content

Also fix unit test mocks to match real SDK event structure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
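The three event shapes above suggest a dispatch like the following. This is a simplified sketch: the events are stand-in objects, not real SDK types, and the helper name is invented.

```python
# Sketch of extracting streamable text from the three event shapes the
# commit describes. SimpleNamespace stands in for real SDK event objects.
from types import SimpleNamespace
from typing import Optional


def extract_text(event) -> Optional[str]:
    etype = getattr(event, "type", None)
    if etype in ("text", "thinking"):
        # Top-level events carry content directly in event.text / event.thinking.
        return getattr(event, etype)
    if etype == "content_block_delta":
        # Wrapper events nest the content under the delta object.
        delta = event.delta
        return getattr(delta, "text", None) or getattr(delta, "thinking", None)
    # signature (and anything else) carries no streamable content.
    return None


assert extract_text(SimpleNamespace(type="text", text="hello")) == "hello"
assert extract_text(SimpleNamespace(type="signature")) is None
```

Mocks in unit tests would need to mirror exactly this nesting, which is presumably what the "fix unit test mocks" part of the commit addressed.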
…tput

- Add sys.stdout.reconfigure(line_buffering=False) in cli.py to force unbuffered output at the file descriptor level
- Use sys.stdout.write() + flush() instead of print() in the streaming loop in agent.py for immediate character-by-character display
- This ensures thinking and content chunks appear live rather than buffering until the stream completes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
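The two-part approach above can be sketched as follows; the function name and the shape of the chunks iterable are assumptions.

```python
# Sketch of the unbuffered streaming output described above.
import sys

# Disable line buffering so output is not held until a newline.
# (Guarded because captured/redirected stdout may lack reconfigure.)
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(line_buffering=False)


def print_stream(chunks) -> None:
    """Write each text fragment immediately as it arrives."""
    for chunk in chunks:
        # write() + flush() shows fragments as soon as they arrive,
        # unlike print(), which may buffer until a newline.
        sys.stdout.write(chunk)
        sys.stdout.flush()
```

The explicit flush() per fragment is what makes character-by-character display work even when stdout is a pipe rather than a TTY.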
MiniMax API sends BOTH top-level text/thinking events AND content_block_delta events with identical content. Previously both were yielded, creating duplicate chunks.

Fix: use top-level text/thinking events only (they arrive first with complete content), skip content_block_delta for text/thinking, use content_block_delta only for tool_use blocks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
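The filtering rule in the fix can be sketched as a predicate over the event type and the type of the content block currently being streamed. Names and the dict-based event shape are illustrative assumptions.

```python
# Sketch of the dedup rule: keep top-level text/thinking events, and
# accept content_block_delta only while inside a tool_use block.
def should_yield(event: dict, current_block_type: str) -> bool:
    etype = event.get("type")
    if etype in ("text", "thinking"):
        # Top-level events arrive first with complete content; keep them.
        return True
    if etype == "content_block_delta":
        # Skip text/thinking deltas (duplicates of the top-level events);
        # keep deltas only for tool_use blocks.
        return current_block_type == "tool_use"
    return False


assert should_yield({"type": "text"}, "text") is True
assert should_yield({"type": "content_block_delta"}, "text") is False
assert should_yield({"type": "content_block_delta"}, "tool_use") is True
```

Tracking `current_block_type` would come from the SDK's block-start events; the key point is that one of the two duplicate channels must be picked consistently per block type.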
Summary
- `generate_stream()` async generator method across all LLM clients (Anthropic, OpenAI)
- `StreamChunk` schema with chunk types: `thinking`, `content`, `tool_call_delta`, `tool_call_complete`, `done`
- `_run_step_stream()` in agent.py that streams thinking and content live
- `--no-stream` CLI flag to disable streaming
- Unbuffered stdout (`sys.stdout.reconfigure(line_buffering=False)`) for real-time terminal output

Test plan
Closes #71