
ThunderAgent

Fast, simple and program-aware agentic inference system.

| Wiki | Documentation | Blog | Paper |


About

ThunderAgent is a fast and easy-to-use library for agentic inference and rollout.

ThunderAgent is fast with:

  • A program-aware scheduler that increases the KV-cache hit rate and reduces memory imbalance across nodes, improving agentic inference throughput by 1.5–3.6× across multiple agentic workflows (see the conceptual sketch after this list).
  • Tool-call lifecycle management with automatic resource reclamation, for more stable and reliable long-running rollouts.
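
To make the scheduling idea concrete, here is a minimal conceptual sketch of program-aware routing (an illustration only, not ThunderAgent's actual scheduler): requests carrying the same program_id are pinned to one backend, so successive turns of an agent reuse that backend's KV cache, while new programs land on the least-loaded backend to keep memory balanced.

# Conceptual sketch only, not ThunderAgent's real implementation:
# pin each program_id to a backend so successive turns of the same
# agent hit a warm prefix/KV cache; place new programs on the
# least-loaded backend to reduce memory imbalance across nodes.
class ProgramAwareRouter:
    def __init__(self, backends):
        self.backends = backends              # e.g. ["http://node1:8000", ...]
        self.assignment = {}                  # program_id -> backend URL
        self.load = {b: 0 for b in backends}  # in-flight requests per backend

    def route(self, program_id):
        if program_id not in self.assignment:
            self.assignment[program_id] = min(self.backends, key=self.load.get)
        backend = self.assignment[program_id]
        self.load[backend] += 1
        return backend

    def finish(self, program_id):
        # Called when a request completes; dropping a program's entry
        # once it ends lets the backend's cache memory be reclaimed.
        self.load[self.assignment[program_id]] -= 1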

ThunderAgent is flexible and easy to use with:

  • OpenAI-compatible API passthrough that requires only one change: adding a program_id field to each outgoing request.

  • Support for multiple inference backends: vLLM and SGLang.

  • Multiple agentic RL training examples, such as a Search-R1 agent with slime and mini-swe-agent with SkyRL.

  • Real-time visualization of agentic trajectory metrics including total tokens, tool-use time, and per-program profiling.

Overview

ThunderAgent sits between agent clients and the infrastructure layer as an agentic workflow scheduler. On one hand, it improves inference throughput of vLLM/SGLang across multiple GPU nodes through program-aware scheduling. On the other hand, it provides a unified tool management interface for resources like Docker containers and remote APIs.
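
For example, a multi-node deployment would run one inference server per GPU node and point ThunderAgent at all of them. The sketch below assumes the --backends flag accepts several URLs (suggested by its plural name; check the documentation for the exact syntax), and the node1/node2 hostnames are placeholders:

# one vLLM server per GPU node
vllm serve Qwen/Qwen3-32B --port 8000   # on node1
vllm serve Qwen/Qwen3-32B --port 8000   # on node2

# ThunderAgent schedules programs across both backends (assumed multi-URL syntax)
thunderagent --backend-type vllm --backends http://node1:8000 http://node2:8000 --port 9000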

ThunderAgent Architecture

Inference & Evaluation Results

ThunderAgent improves vLLM throughput by 1.5–3.6× across diverse agentic workloads including SWE-Agent, OpenHands, and ToolOrchestra.

Inference Pipeline Results

Getting Started

Install ThunderAgent from source:

git clone git@github.com:HaoKang-Timmy/ThunderAgent.git
cd ThunderAgent
pip install -e .

How do you use it? Choose a backend, for example vLLM.

uv pip install vllm --torch-backend=auto # install vllm

vllm serve Qwen/Qwen3-32B --port 8000 # serve a model

thunderagent --backend-type vllm --backends http://localhost:8000 --port 9000 --metrics --profile # launch ThunderAgent; send requests through port 9000
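
Because ThunderAgent is an OpenAI-compatible passthrough, you can smoke-test the deployment with a plain chat-completions request. This sketch assumes the standard OpenAI endpoint path that vLLM also serves, with program_id passed as a top-level body field (the same field that extra_body injects in the Python client below):

curl http://localhost:9000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen3-32B",
        "messages": [{"role": "user", "content": "Hello"}],
        "program_id": "smoke-test"
      }'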

How do you embed ThunderAgent in your own agentic workflow?

from openai import OpenAI

# Point the client at the ThunderAgent endpoint launched above (port 9000)
client = OpenAI(base_url="http://localhost:9000/v1", api_key="EMPTY")

# Original OpenAI sender
client.chat.completions.create(
    model=self.config.model_name,
    messages=messages,
)

# ThunderAgent OpenAI sender: the only change is passing program_id via extra_body
extra_body = {"program_id": "unique_id"}
# If your agentic workflow uses Docker containers:
# extra_body["docker_ids"] = ["docker_id1", "docker_id2", ...]
client.chat.completions.create(
    model=self.config.model_name,
    messages=messages,
    extra_body=extra_body,
)
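
The program_id groups every request from one agent trajectory, which is what lets the scheduler keep a program's turns on the same node with a warm KV cache; the optional docker_ids tie a program's containers to its lifecycle, so their resources can be reclaimed automatically when the program finishes.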

Contributing

We welcome and value contributions and collaborations. To contribute, please open a pull request.

Citation

If you use ThunderAgent for your research, please cite our paper:

Contact Us

For enterprises interested in adopting or deploying ThunderAgent at scale, including technical consulting, sponsorship opportunities, or partnership inquiries, please contact us at hkang342@gatech.edu or Simran@together.ai.
