| Wiki | Documentation | Blog | Paper |
ThunderAgent is a fast and easy-to-use library for agentic inference and rollout.
ThunderAgent is fast with:
- Agentic program-aware scheduler that increases KV-cache hit rate and reduces memory imbalance across nodes, increasing agentic inference throughputs 1.5-3.6x across multiple agentic workflows.
- Tool-call lifecycle management with automatic resource reclaim for more stable and reliable long-running rollouts
ThunderAgent is flexible and easy to use with:
-
OpenAI-compatible API passthrough with only one changing, adding
Program_idto the sending API. -
Multiple agentic RL training example like Search-R1 agent with slime and mini-swe-agent with SkyRL.
-
Real-time visualization of agentic trajectory metrics including total tokens, tool-use time, and per-program profiling.
ThunderAgent sits between agent clients and the infrastructure layer as an agentic workflow scheduler. On one hand, it improves inference throughput of vLLM/SGLang across multiple GPU nodes through program-aware scheduling. On the other hand, it provides a unified tool management interface for resources like Docker containers and remote APIs.
ThunderAgent improves vLLM throughput by 1.5–3.6× across diverse agentic workloads including SWE-Agent, OpenHands, and ToolOrchestra.
Install ThunderAgent from source:
git clone git@github.com:HaoKang-Timmy/ThunderAgent.git
cd ThunderAgent
pip install -e .How to use? Choose one backend you like, for example vllm.
uv pip install vllm --torch-backend=auto # install vllm
vllm serve Qwen/Qwen3-32B --port 8000 # serve a model
thunderagent --backend-type vllm --backends http://localhost:8000 --port 9000 --metrics --profile # launch ThunderAgent, make sure to send request through 9000.How to embed with your own agentic workflow?
# original openai sender
openai.client.chat.completions.create(
model=self.config.model_name,
messages=messages,
)
# ThunderAgent openai sender
extra_body = {}
extra_body["program_id"] = "unique_id"
# if you use docker for your agentic workflow
# extra_body["docker_ids"] = ["docker_id1", "docker_id2", ...]
openai.client.chat.completions.create(
model=self.config.model_name,
messages=messages,
extra_body = extra_body
)We welcome and value any contributions and collaborations. Please create a pull request.
If you use ThunderAgent for your research, please cite our paper:
For enterprises interested in adopting or deploying ThunderAgent at scale, including technical consulting, sponsorship opportunities, or partnership inquiries, please contact us at hkang342@gatech.edu or Simran@together.ai


