Streaming Communication in Multi-Agent Reasoning
Zhen Yang1 ·
Xiaogang Xu3 ·
Wen Wang3 ·
Cong Chen3 ·
Xander Xu2* ·
Ying-Cong Chen1,4*
1HKUST(GZ) · 2Alibaba Group · 3ZJU · 4HKUST
*Co-corresponding authors
pip install openai
python StreamMA.py

Or in your own script:
import asyncio

from StreamMA import (
    StreamMA, RunLogger,
    PROMPT_A_CHAIN, PROMPT_B_CHAIN, PROMPT_C_CHAIN,
)

config = {
    "Agent_A": {"system_prompt": PROMPT_A_CHAIN, "next": ["Agent_B"]},
    "Agent_B": {"system_prompt": PROMPT_B_CHAIN, "next": ["Agent_C"]},
    "Agent_C": {"system_prompt": PROMPT_C_CHAIN},
}

async def main():
    logger = RunLogger(input="***", logger_path="logger.json")
    logger.start()
    await StreamMA.run(
        config, "your problem here",
        api_key="***", base_url="***", model="***",
        logger=logger,
    )
    logger.finish(config)

asyncio.run(main())

The DAG is fully driven by the config dict. Change the topology by editing each agent's next: [...] list, and pair every agent with its own system_prompt:
# Chain A → B → C
{"A": {..., "next": ["B"]}, "B": {..., "next": ["C"]}, "C": {...}}
# Tree A → {B, C}
{"A": {..., "next": ["B", "C"]}, "B": {...}, "C": {...}}
# Graph A → B → C, with shortcut A → C
{"A": {..., "next": ["B", "C"]}, "B": {..., "next": ["C"]}, "C": {...}}

RunLogger records per-agent token counts, KV-cache hits, API time, and an ASCII timeline of streaming segments:
{
  "summary": {
    "agents": {
      "Agent_A": {"segments": 1, "prefill_tokens": 190, "cached_tokens": 0, "kv_cache_hit_ratio": 0.0, "decode_tokens": 4084, "api_time": 123.30},
      "Agent_B": {"segments": 4, "prefill_tokens": 30373, "cached_tokens": 10624, "kv_cache_hit_ratio": 0.3498, "decode_tokens": 10175, "api_time": 308.74},
      "Agent_C": {"segments": 4, "prefill_tokens": 42755, "cached_tokens": 23040, "kv_cache_hit_ratio": 0.5389, "decode_tokens": 7197, "api_time": 221.08}
    },
    "agent_count": 3,
    "total_prefill_tokens": 73318,
    "total_cached_tokens": 33664,
    "total_decode_tokens": 21456,
    "total_tokens": 94774,
    "speedup_analysis": {
      "api_time": 653.12,
      "wall_time": 376.02,
      "speedup": 1.74,
      "total_kv_cache_hit_ratio": 0.4592,
      "critical_path_time": 653.12,
      "streaming_speedup": 1.74,
      "timeline": [
        "[Timeline]",
        " Agent Agent_A:█████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 123.3s",
        " Agent Agent_B:░░░░░███████████████████████████████████████████████░░░ 308.7s",
        " Agent Agent_C:░░░░░░░░░░░░░████████████░░░░░░███████████████████ 221.1s",
        " ──────────────────────────────────────────────────",
        " │ │ │ │ │ │ │ │ │ │ ",
        " 0.0s 40.0s 98.0s 183.7s 240.6s 288.6s 353.1s ",
        "",
        " Legend: █ = processing, ░ = idle"
      ]
    }
  }
}

If you find StreamMA useful, please cite:
@article{yang2026streamma,
  title={Streaming Communication in Multi-Agent Reasoning},
  author={Yang, Zhen and Xu, Xiaogang and Wang, Wen and Chen, Cong and Xu, Xander and Chen, Ying-Cong},
  journal={arXiv preprint arXiv:2606.05158},
  year={2026}
}
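
The aggregate fields in the sample RunLogger summary above appear to follow directly from the per-agent entries. The sketch below recomputes them under that assumption. The formulas (speedup as summed API time over wall time, total_tokens as prefill plus decode tokens, hit ratio as cached over prefill tokens) are inferred from the sample numbers rather than taken from the StreamMA source, and summarize is a hypothetical helper, not part of the package.

```python
# Per-agent figures copied from the sample RunLogger summary.
agents = {
    "Agent_A": {"prefill_tokens": 190, "cached_tokens": 0,
                "decode_tokens": 4084, "api_time": 123.30},
    "Agent_B": {"prefill_tokens": 30373, "cached_tokens": 10624,
                "decode_tokens": 10175, "api_time": 308.74},
    "Agent_C": {"prefill_tokens": 42755, "cached_tokens": 23040,
                "decode_tokens": 7197, "api_time": 221.08},
}
wall_time = 376.02  # seconds, from "speedup_analysis" in the sample log


def summarize(agents, wall_time):
    """Recompute aggregate fields (assumed formulas, not StreamMA's own code)."""
    prefill = sum(a["prefill_tokens"] for a in agents.values())
    cached = sum(a["cached_tokens"] for a in agents.values())
    decode = sum(a["decode_tokens"] for a in agents.values())
    api_time = sum(a["api_time"] for a in agents.values())
    return {
        "total_prefill_tokens": prefill,
        "total_cached_tokens": cached,
        "total_decode_tokens": decode,
        # Assumption: cached tokens are already counted inside prefill tokens.
        "total_tokens": prefill + decode,
        "api_time": round(api_time, 2),
        # Assumption: speedup = time if agents ran back to back / observed wall time.
        "speedup": round(api_time / wall_time, 2),
        "total_kv_cache_hit_ratio": round(cached / prefill, 4),
    }


summary = summarize(agents, wall_time)
for key, value in summary.items():
    print(f"{key}: {value}")
```

With the sample values this gives a speedup of 1.74 (653.12 s of summed API time over 376.02 s of wall time) and an overall KV-cache hit ratio of 0.4592, matching the log.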