AgentHub is the LLM API Hub for the Agent era, built for high-precision autonomous agents.
📢 Follow us on X or join our Discord community.
- 🔗 Unified: A consistent and intuitive interface for developing agents across different LLMs.
- 🎯 Precise: Automatically handles interleaved thinking during multi-step tool calls, preventing performance degradation.
- 🧭 Traceable: Provides lightweight yet fine-grained tracing for debugging and auditing LLM executions.
Switch different LLMs with zero code changes and no performance loss.
Audit LLM executions by adding a single trace_id parameter, no database required.
| Model Name | Vendor | Reasoning | Tool Use | Image Understanding | Image Generation | Speech Generation |
|---|---|---|---|---|---|---|
| Gemini 3/3.1 | Official/Google Vertex AI | ✅ | ✅ | ✅ | ✅ | ✅ |
| Claude 4.6 | Official/Amazon Bedrock/UModelVerse | ✅ | ✅ | ✅ | ❌ | ❌ |
| GPT-5.4/5.5 | Official/UModelVerse | ✅ | ✅ | ✅ | ❌ | ❌ |
| Kimi-K2.5 | Official/OpenRouter/SiliconFlow | ✅ | ✅ | ✅ | ❌ | ❌ |
| GLM-5 | Official/OpenRouter/SiliconFlow | ✅ | ✅ | ❌ | ❌ | ❌ |
| Qwen3 | OpenRouter/SiliconFlow/vLLM | ✅ | ✅ | ❌ | ❌ | ❌ |
Install from PyPI:
uv add agenthub-python
# or
pip install agenthub-python
Build from source:
cd src_py && make
See src_py/README.md for comprehensive usage examples and API documentation.
Install from npm:
npm install @prismshadow/agenthub
Build from source:
cd src_ts && make install && make build
See src_ts/README.md for comprehensive usage examples and API documentation.
AutoLLMClient is the main class for interacting with the AgentHub SDK. It provides the following methods:
- (async) streaming_response(messages, config): Streams the response of LLMs in a stateless manner.
- (async) streaming_response_stateful(message, config): Streams the response of LLMs in a stateful manner.
- clear_history(): Clears the history of the stateful LLM client.
- get_history(): Returns the history of the stateful LLM client.
- set_history(history): Replaces the history of the stateful LLM client with a copy of the provided list.
Note: We recommend using the stateful interface when calling the AgentHub SDK.
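To make the difference between the two interfaces concrete, here is a minimal sketch of the bookkeeping a caller of the stateless interface would otherwise do by hand: folding streamed text deltas into one assistant message and appending it to a history list. The helper name `fold_text_events` is hypothetical and not part of AgentHub, the events are hand-written in the shape shown in this README's examples, and a real client would also need to handle thinking and tool-call items.

```python
from typing import Any, Dict, Iterable, List

Message = Dict[str, Any]


def fold_text_events(events: Iterable[Dict[str, Any]]) -> Message:
    """Join the text deltas of a finished stream into one assistant message.

    Hypothetical helper for illustration; it ignores non-text content items.
    """
    parts: List[str] = []
    for event in events:
        for item in event.get("content_items", []):
            if item.get("type") == "text":
                parts.append(item["text"])
    return {
        "role": "assistant",
        "content_items": [{"type": "text", "text": "".join(parts)}],
    }


# Hand-written events (no network call), shaped like the streaming output
# printed in this README's examples.
events = [
    {"role": "assistant", "event_type": "delta",
     "content_items": [{"type": "text", "text": "Hello"}]},
    {"role": "assistant", "event_type": "delta",
     "content_items": [{"type": "text", "text": ","}]},
    {"role": "assistant", "event_type": "delta",
     "content_items": [{"type": "text", "text": " World"}]},
    {"role": "assistant", "event_type": "delta",
     "content_items": [{"type": "text", "text": "!"}]},
    {"role": "assistant", "event_type": "stop", "content_items": []},
]

# The caller-side history: the user turn, then the folded assistant turn.
history: List[Message] = [
    {"role": "user",
     "content_items": [{"type": "text", "text": "Say 'Hello, World!'"}]}
]
history.append(fold_text_events(events))

print(history[-1]["content_items"][0]["text"])  # Hello, World!
```

With the stateful interface, this kind of history tracking is what the client is described as doing for you, which is one reason it is the recommended entry point.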
Python Example (GPT-5.5 via the OpenAI API):
import asyncio
import os

from agenthub import AutoLLMClient

# Replace the placeholder with a real key, or export the variable in your shell.
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"

async def main():
    client = AutoLLMClient(model="gpt-5.5")
    async for event in client.streaming_response_stateful(
        message={
            "role": "user",
            "content_items": [{"type": "text", "text": "Say 'Hello, World!'"}]
        },
        config={"temperature": 1.0}
    ):
        print(event)

asyncio.run(main())
# {'role': 'assistant', 'event_type': 'delta', 'content_items': [{'type': 'text', 'text': 'Hello'}], 'usage_metadata': None, 'finish_reason': None}
# {'role': 'assistant', 'event_type': 'delta', 'content_items': [{'type': 'text', 'text': ','}], 'usage_metadata': None, 'finish_reason': None}
# {'role': 'assistant', 'event_type': 'delta', 'content_items': [{'type': 'text', 'text': ' World'}], 'usage_metadata': None, 'finish_reason': None}
# {'role': 'assistant', 'event_type': 'delta', 'content_items': [{'type': 'text', 'text': '!'}], 'usage_metadata': None, 'finish_reason': None}
# {'role': 'assistant', 'event_type': 'stop', 'content_items': [], 'usage_metadata': {'cached_tokens': 0, 'prompt_tokens': 12, 'thoughts_tokens': 0, 'response_tokens': 8}, 'finish_reason': 'stop'}
TypeScript Example (GPT-5.5 via the OpenAI API):
import { AutoLLMClient } from "@prismshadow/agenthub";

process.env.OPENAI_API_KEY = "your-openai-api-key";

async function main() {
  const client = new AutoLLMClient({ model: "gpt-5.5" });
  for await (const event of client.streamingResponseStateful({
    message: {
      role: "user",
      content_items: [{ type: "text", text: "Say 'Hello, World!'" }]
    },
    config: {}
  })) {
    console.log(event);
  }
}

main().catch(console.error);
// {'role': 'assistant', 'event_type': 'delta', 'content_items': [{'type': 'text', 'text': 'Hello'}], 'usage_metadata': null, 'finish_reason': null}
// {'role': 'assistant', 'event_type': 'delta', 'content_items': [{'type': 'text', 'text': ','}], 'usage_metadata': null, 'finish_reason': null}
// {'role': 'assistant', 'event_type': 'delta', 'content_items': [{'type': 'text', 'text': ' World'}], 'usage_metadata': null, 'finish_reason': null}
// {'role': 'assistant', 'event_type': 'delta', 'content_items': [{'type': 'text', 'text': '!'}], 'usage_metadata': null, 'finish_reason': null}
// {'role': 'assistant', 'event_type': 'stop', 'content_items': [], 'usage_metadata': {'cached_tokens': 0, 'prompt_tokens': 12, 'thoughts_tokens': 0, 'response_tokens': 8}, 'finish_reason': 'stop'}
Python Example (Claude Sonnet 4.6 via the Anthropic API):
import asyncio
import os

from agenthub import AutoLLMClient

os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-api-key"

async def main():
    client = AutoLLMClient(model="claude-sonnet-4-6")
    async for event in client.streaming_response_stateful(
        message={
            "role": "user",
            "content_items": [{"type": "text", "text": "Say 'Hello, World!'"}]
        },
        config={}
    ):
        print(event)

asyncio.run(main())
TypeScript Example (Claude Sonnet 4.6 via the Anthropic API):
import { AutoLLMClient } from "@prismshadow/agenthub";

process.env.ANTHROPIC_API_KEY = "your-anthropic-api-key";

async function main() {
  const client = new AutoLLMClient({ model: "claude-sonnet-4-6" });
  for await (const event of client.streamingResponseStateful({
    message: {
      role: "user",
      content_items: [{ type: "text", text: "Say 'Hello, World!'" }]
    },
    config: {}
  })) {
    console.log(event);
  }
}

main().catch(console.error);
Python Example (GLM-5 via OpenRouter):
import asyncio
import os

from agenthub import AutoLLMClient

# The GLM_* variables point the client at OpenRouter's OpenAI-compatible endpoint.
os.environ["GLM_API_KEY"] = "your-openrouter-api-key"
os.environ["GLM_BASE_URL"] = "https://openrouter.ai/api/v1"

async def main():
    client = AutoLLMClient(model="z-ai/glm-5")
    async for event in client.streaming_response_stateful(
        message={
            "role": "user",
            "content_items": [{"type": "text", "text": "Say 'Hello, World!'"}]
        },
        config={}
    ):
        print(event)

asyncio.run(main())
TypeScript Example (GLM-5 via OpenRouter):
import { AutoLLMClient } from "@prismshadow/agenthub";

process.env.GLM_API_KEY = "your-openrouter-api-key";
process.env.GLM_BASE_URL = "https://openrouter.ai/api/v1";

async function main() {
  const client = new AutoLLMClient({ model: "z-ai/glm-5" });
  for await (const event of client.streamingResponseStateful({
    message: {
      role: "user",
      content_items: [{ type: "text", text: "Say 'Hello, World!'" }]
    },
    config: {}
  })) {
    console.log(event);
  }
}

main().catch(console.error);
Python Example (Qwen3-8B via SiliconFlow):
import asyncio
import os

from agenthub import AutoLLMClient

os.environ["QWEN3_API_KEY"] = "your-siliconflow-api-key"
os.environ["QWEN3_BASE_URL"] = "https://api.siliconflow.cn/v1"

async def main():
    client = AutoLLMClient(model="Qwen/Qwen3-8B")
    async for event in client.streaming_response_stateful(
        message={
            "role": "user",
            "content_items": [{"type": "text", "text": "Say 'Hello, World!'"}]
        },
        config={}
    ):
        print(event)

asyncio.run(main())
TypeScript Example (Qwen3-8B via SiliconFlow):
import { AutoLLMClient } from "@prismshadow/agenthub";

process.env.QWEN3_API_KEY = "your-siliconflow-api-key";
process.env.QWEN3_BASE_URL = "https://api.siliconflow.cn/v1";

async function main() {
  const client = new AutoLLMClient({ model: "Qwen/Qwen3-8B" });
  for await (const event of client.streamingResponseStateful({
    message: {
      role: "user",
      content_items: [{ type: "text", text: "Say 'Hello, World!'" }]
    },
    config: {}
  })) {
    console.log(event);
  }
}

main().catch(console.error);
UniConfig is an object that contains the configuration for LLMs.
Example UniConfig (a string such as "auto | required | none" lists the accepted values; pass exactly one):
{
  "max_tokens": 1024,
  "temperature": 1.0,
  "tools": [
    {
      "name": "get_current_weather",
      "description": "Get the current weather in a given location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA"
          }
        },
        "required": ["location"]
      }
    }
  ],
  "thinking_summary": true,
  "thinking_level": "none | low | medium | high",
  "tool_choice": "auto | required | none",
  "system_prompt": "You are a helpful assistant.",
  "prompt_caching": "enable | disable | enhance",
  "image_config": {"aspect_ratio": "4:3", "image_size": "1K"},
  "tts_config": [{"voice": "Kore"}],
  "trace_id": null
}
UniMessage is an object that contains the input for LLMs.
Example UniMessage (role is either "user" or "assistant"; the list shows one item of each content type):
{
  "role": "user | assistant",
  "content_items": [
    {"type": "text", "text": "How are you doing?"},
    {"type": "image_url", "image_url": "https://example.com/image.jpg"},
    {"type": "inline_data", "mime_type": "image/jpeg", "data": "base64-encoded-image"},
    {"type": "thinking", "thinking": "I am thinking.", "signature": "0x123456"},
    {"type": "inline_thinking", "mime_type": "image/jpeg", "data": "base64-encoded-image"},
    {"type": "tool_call", "name": "math", "arguments": {"expression": "2 + 3"}, "tool_call_id": "123"},
    {"type": "tool_result", "text": "2 + 3 = 5", "images": [], "tool_call_id": "123"}
  ]
}
UniEvent is an object that contains the streaming output of LLMs.
Example UniEvent:
{
  "role": "assistant",
  "event_type": "delta",
  "content_items": [
    {"type": "partial_tool_call", "name": "math", "arguments": "", "tool_call_id": "123"}
  ],
  "usage_metadata": {
    "cached_tokens": null,
    "prompt_tokens": 10,
    "thoughts_tokens": null,
    "response_tokens": 1
  },
  "finish_reason": null,
  "created_at": 1694502400000
}
AgentHub provides detailed token usage information through the usage_metadata field in streaming events.
The usage_metadata object contains four fields:
- cached_tokens: Cached input tokens
- prompt_tokens: Non-cached input tokens
- thoughts_tokens: Chain-of-thought output tokens
- response_tokens: Non-chain-of-thought output tokens
You can calculate the total token usage as follows:
input_tokens = cached_tokens + prompt_tokens
output_tokens = thoughts_tokens + response_tokens
total_tokens = input_tokens + output_tokens
█████████████ ░░░░░░░░░░░░░ → LLM → ███████████████ ░░░░░░░░░░░░░░░
cached_tokens prompt_tokens         thoughts_tokens response_tokens
       input_tokens                          output_tokens
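The totals above can be computed from a single usage_metadata object as in the following minimal sketch. The helper name `summarize_usage` is hypothetical and not part of AgentHub, and treating a null (None) field as zero is an assumption made here because the UniEvent example shows null values.

```python
from typing import Dict, Optional


def summarize_usage(usage: Dict[str, Optional[int]]) -> Dict[str, int]:
    """Derive input, output, and total token counts from usage_metadata.

    Hypothetical helper; missing or None fields are counted as 0 (assumption).
    """
    def count(key: str) -> int:
        return usage.get(key) or 0

    input_tokens = count("cached_tokens") + count("prompt_tokens")
    output_tokens = count("thoughts_tokens") + count("response_tokens")
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
    }


# usage_metadata taken from the final 'stop' event of the Hello World example.
usage = {
    "cached_tokens": 0,
    "prompt_tokens": 12,
    "thoughts_tokens": 0,
    "response_tokens": 8,
}
print(summarize_usage(usage))
# {'input_tokens': 12, 'output_tokens': 8, 'total_tokens': 20}
```

In the examples in this README, only the final stop event carries a non-null usage_metadata, so calling the helper on that event is enough for a single request.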
We provide a tracer to help you monitor and debug your LLM executions. You can enable tracing by setting the trace_id parameter to a unique identifier in the config object.
# Inside an async function, with client created as in the earlier examples:
async for event in client.streaming_response_stateful(
    message={
        "role": "user",
        "content_items": [{"type": "text", "text": "Say 'Hello, World!'"}]
    },
    config={"trace_id": "unique-trace-id"}
):
    print(event)
Start the tracer from either SDK:
cd src_py && uv run python -m agenthub.integration.tracer --host 127.0.0.1 --port 25750
# or
cd src_ts && npm run tracer
Then you can view the tracing output in the dashboard at http://localhost:25750/.
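Since trace_id has to be unique, one way to produce it is to generate it rather than hard-code it. The sketch below is only an illustration: `with_trace_id` and its `prefix` parameter are hypothetical names, not part of AgentHub, and it assumes that any unique string is an acceptable trace_id.

```python
import uuid
from typing import Any, Dict


def with_trace_id(config: Dict[str, Any], prefix: str = "agent") -> Dict[str, Any]:
    """Return a copy of config with a freshly generated trace_id.

    Hypothetical helper; assumes trace_id may be any unique string.
    """
    traced = dict(config)  # shallow copy so the caller's config is not mutated
    traced["trace_id"] = f"{prefix}-{uuid.uuid4().hex}"
    return traced


base_config = {"temperature": 1.0}
traced_config = with_trace_id(base_config, prefix="hello-world")
print(traced_config["trace_id"])  # e.g. hello-world-<32 hex characters>
```

The resulting dictionary can be passed as the config argument in place of the literal shown above; reusing one traced config across the turns of a conversation is a choice left to the caller.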
We provide an LLM playground to help you test your LLMs.
cd src_py && uv run python -m agenthub.integration.playground --host 127.0.0.1 --port 25751
# or
cd src_ts && npm run playground
You can access the playground at http://localhost:25751/.
Licensed under the Apache License, Version 2.0. See LICENSE for details.



