feat: streaming LLM responses #84

Open
TumCucTom wants to merge 10 commits into MiniMax-AI:main from TumCucTom:feat/streaming-llm

Conversation


@TumCucTom TumCucTom commented Apr 5, 2026

Summary

  • Implement generate_stream() async generator method across all LLM clients (Anthropic, OpenAI)
  • Add StreamChunk schema with chunk types: thinking, content, tool_call_delta, tool_call_complete, done
  • Add _run_step_stream() in agent.py that streams thinking and content live
  • Add --no-stream CLI flag to disable streaming
  • Force unbuffered stdout (sys.stdout.reconfigure(line_buffering=False)) for real-time terminal output
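The StreamChunk schema with its five chunk types could be sketched roughly as below; only the type names come from this PR, and the other field names are assumptions for illustration:

```python
from dataclasses import dataclass
from typing import Literal, Optional

# The five chunk types listed in the PR summary.
ChunkType = Literal["thinking", "content", "tool_call_delta", "tool_call_complete", "done"]

@dataclass
class StreamChunk:
    """Hypothetical sketch of a partial-response chunk; `text` and
    `tool_call` are assumed field names, not confirmed by the PR."""
    type: ChunkType
    text: str = ""                     # partial thinking/content text
    tool_call: Optional[dict] = None   # populated on tool_call_complete

chunk = StreamChunk(type="content", text="Hello")
```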

Test plan

  • 12 streaming unit tests passing
  • Manual testing confirms live output appears progressively

Closes #71

TumCucTom and others added 10 commits on April 5, 2026 at 18:14
Add generate_stream method to LLM clients for streaming responses:

- Add StreamChunk schema for partial response chunks (thinking, content,
  tool_call_delta, tool_call_complete, done)
- Add generate_stream abstract method to LLMClientBase
- Implement streaming in OpenAIClient via chat.completions.create(stream=True)
- Implement streaming in AnthropicClient via messages.stream()
- Add generate_stream to LLMClient wrapper
- Buffer partial tool calls and emit tool_call_complete when JSON is complete
- Add tests for streaming functionality

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
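The "buffer partial tool calls and emit tool_call_complete when JSON is complete" strategy can be illustrated with a minimal standalone helper (the function name and plumbing are hypothetical; the real clients wire this into their event loops):

```python
import json

def buffer_tool_call_deltas(deltas):
    """Accumulate partial tool-call JSON fragments and return the parsed
    arguments once the buffered string becomes valid JSON; returns None
    if the stream ends while the JSON is still incomplete."""
    buf = ""
    for fragment in deltas:
        buf += fragment
        try:
            return json.loads(buf)   # complete: emit tool_call_complete
        except json.JSONDecodeError:
            continue                 # still partial: keep buffering
    return None

args = buffer_tool_call_deltas(['{"path": "a', '.txt", "mode": ', '"read"}'])
# args == {"path": "a.txt", "mode": "read"}
```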
…back

- Add `stream` parameter to Agent (default True) enabling real-time
  token output in the agent loop
- Refactor run() to dispatch to _run_step_stream() or _run_step_nonstream()
- Extract tool call execution into _execute_tool_calls() helper
- Streaming: buffers thinking/content, emits tool_call_complete when JSON
  parses, executes all tools after done event
- Add --no-stream CLI flag to disable streaming and use generate()
- All 12 streaming tests pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
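The dispatch described above (a `stream` flag defaulting to True, with `run()` choosing between the streaming and non-streaming step) might look like this; the method bodies here are stand-ins, since the real agent calls the LLM client and executes tools:

```python
class Agent:
    """Minimal sketch of the streaming dispatch; not the actual agent."""

    def __init__(self, stream: bool = True):
        self.stream = stream

    def run(self, prompt: str) -> str:
        # Dispatch to the streaming or non-streaming step implementation.
        if self.stream:
            return self._run_step_stream(prompt)
        return self._run_step_nonstream(prompt)

    def _run_step_stream(self, prompt: str) -> str:
        return f"streamed: {prompt}"      # placeholder body

    def _run_step_nonstream(self, prompt: str) -> str:
        return f"non-streamed: {prompt}"  # placeholder body
```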
The Anthropic SDK sends:
- type="text"/"thinking" as TOP-LEVEL event types (content in event.text/event.thinking)
- type="content_block_delta" wraps content in block.delta.text / block.delta.thinking
- type="signature" events carry no streamable content

Also fix unit test mocks to match real SDK event structure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
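Based on the event shapes described in this commit, a normalizer over these three cases might look as follows (attribute access patterns are assumptions inferred from the commit message, not the SDK's documented API):

```python
from types import SimpleNamespace

def extract_text(event):
    """Map a stream event to (kind, text), or None for events with no
    streamable content, per the three shapes listed above."""
    if event.type == "text":
        return ("content", event.text)        # top-level text event
    if event.type == "thinking":
        return ("thinking", event.thinking)   # top-level thinking event
    if event.type == "content_block_delta":
        delta = event.delta                   # content wrapped in a delta
        if getattr(delta, "text", None) is not None:
            return ("content", delta.text)
        if getattr(delta, "thinking", None) is not None:
            return ("thinking", delta.thinking)
    return None  # e.g. type == "signature": nothing to stream

extract_text(SimpleNamespace(type="text", text="hi"))  # ("content", "hi")
```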
…tput

- Add sys.stdout.reconfigure(line_buffering=False) in cli.py to force
  unbuffered output at the file descriptor level
- Use sys.stdout.write() + flush() instead of print() in the
  streaming loop in agent.py for immediate character-by-character display
- This ensures thinking and content chunks appear live rather than
  buffering until the stream completes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
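The write-and-flush pattern can be shown with a small helper (hypothetical name; the real loop inlines this in agent.py). Passing an explicit stream makes it testable with a StringIO:

```python
import io
import sys

def emit(chunk_text, out=None):
    """Write a chunk and flush immediately so it appears live, rather
    than waiting on print()'s line buffering."""
    out = out or sys.stdout
    out.write(chunk_text)
    out.flush()

buf = io.StringIO()
for piece in ["Thin", "king", "..."]:
    emit(piece, out=buf)
# buf.getvalue() == "Thinking..."
```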
MiniMax API sends BOTH top-level text/thinking events AND
content_block_delta events with identical content. Previously both
were yielded, creating duplicate chunks.

Fix: use top-level text/thinking events only (they arrive first
with complete content), skip content_block_delta for text/thinking,
use content_block_delta only for tool_use blocks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
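The dedup rule (prefer top-level text/thinking events, drop their duplicate content_block_delta counterparts, keep deltas only for tool_use blocks) can be sketched with plain dicts standing in for SDK event objects:

```python
def dedupe_events(events):
    """Yield (kind, text) once per chunk, applying the dedup fix above.
    Events are illustrative dicts; the real code consumes SDK objects."""
    for ev in events:
        if ev["type"] in ("text", "thinking"):
            yield (ev["type"], ev["value"])           # authoritative copy
        elif ev["type"] == "content_block_delta":
            if ev.get("block_type") == "tool_use":
                yield ("tool_call_delta", ev["value"])
            # text/thinking deltas duplicate the top-level events: skip

events = [
    {"type": "text", "value": "hi"},
    {"type": "content_block_delta", "block_type": "text", "value": "hi"},
    {"type": "content_block_delta", "block_type": "tool_use", "value": "{"},
]
# list(dedupe_events(events)) == [("text", "hi"), ("tool_call_delta", "{")]
```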


Development

Successfully merging this pull request may close these issues.

Any plans to support streaming output for LLM responses?

1 participant