A production Agent framework built around Scenarios — pre-compiled execution paths that skip the ReAct loop entirely on known intents.
Other frameworks treat every request as an open-ended reasoning problem. SwiftAgentX disagrees. In production, 80% of traffic is predictable: "check my order status", "what's your return policy", "book a slot at 3pm". For these, a ReAct loop is overkill — three to five LLM calls, several seconds of latency, a token bill that nobody can explain.
A Scenario is a pre-compiled execution path:
agent.register_scenario("order_status", ScenarioConfig(
name="Order Status",
triggers=["order", "where is my", "shipment"],
tool_chain=[
ToolChainStep(tool="order_db", query_template="$order_id"),
ToolChainStep(tool="courier_api", condition="status=in_transit"),
],
cache_ttl=120,
output_type="direct", # no second LLM call to "format" the answer
))

When the LIGHT model classifies a request as a weather / order_status / balance_check scenario, SwiftAgentX executes the chain directly: no ReAct loop, no second LLM call. One classification step (LIGHT model, ~200 ms), one tool chain, done.
This is the framework's biggest design bet, and the place it pulls ahead of LangChain / AutoGen / CrewAI by a margin that actually matters in production.
Scenarios sit in the middle of a four-tier execution model. All numbers
below are measured against DashScope Qwen — 20 iterations per scenario,
LIGHT=qwen-flash, HEAVY=qwen-turbo (v0.3.3). Reproducible from this
repo with one command (see benchmarks/).
| Request type | Path | P50 latency | P95 | LLM calls |
|---|---|---|---|---|
| KB exact match / cache hit | Pipeline short-circuit | 0.02 ms | 0.1 ms | 0 |
| Known intent (Scenario) | Pre-compiled tool chain | 526 ms | 1.6 s | 1 (LIGHT only) |
| Open conversation | Direct LLM | 1.2 s | 2.4 s | 2–3 |
| Multi-step reasoning | Full ReAct loop | 1.8 s | 2.6 s | 2–3 |
Measured 2026-05-29 on swiftagentx 0.3.3, 20 iterations/scenario. The two cheap tiers (cache + scenario) cost 0–1 LLM calls — that's the headline. The ReAct dedup guard keeps multi-step loops short, so even the deepest tier rarely pays for more than 2–3 calls.
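For readers who want to sanity-check P50/P95 figures like the ones in the table, here is a minimal sketch of how such percentiles can be derived from raw latency samples. It assumes linear interpolation via the standard library's `statistics.quantiles`; the benchmark runner's actual method is not documented here, and the sample values are hypothetical.

```python
import statistics


def latency_percentiles(samples_ms):
    """Return (P50, P95) for a list of per-request latencies in milliseconds."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns the 99 cut points for the 1st..99th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return cuts[49], cuts[94]


# Hypothetical run: 20 iterations with latencies of 1..20 ms.
samples = [float(i) for i in range(1, 21)]
p50, p95 = latency_percentiles(samples)
```

With only 20 samples per scenario, P95 is interpolated between the two slowest requests, so a single outlier can move it noticeably.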
A LIGHT model picks the path. A HEAVY model runs only when the request genuinely needs open-ended reasoning. Together, the cache and Scenario tiers cover the predictable bulk of production traffic. Reproduce the numbers with:
git clone https://github.com/Caxson/swiftagentx.git
cd swiftagentx
pip install -e ".[dev,openai,benchmark]"
export DASHSCOPE_API_KEY=sk-...
python benchmarks/real_runner.py --iterations 30

A Scenario is not just a static tool list. Steps in a chain can be:
- A native Python Tool
- (v0.3+) An MCP tool: any Model Context Protocol server's exposed tools, no Python wrapper required
- (v0.3+) A hook: a conditional handler that branches into an LLM call, a sub-agent dispatch, or external shell logic when the chain hits a particular state
This is how Scenarios stay fast and extensible: the routing decision is cheap, but each step can reach into the full agent toolkit when needed.
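To make the idea of a pre-compiled chain concrete, here is a toy executor written in plain Python. It is not SwiftAgentX's engine. It assumes that `query_template` uses `$name` placeholders filled from request parameters and that `condition` is a simple `key=value` equality check against earlier results; both are interpretations of the `order_status` example, and the two tool functions are hypothetical stand-ins.

```python
from string import Template


def order_db(query, state):
    # Hypothetical stand-in for a real order lookup.
    return {"order_id": query, "status": "in_transit"}


def courier_api(query, state):
    # Hypothetical stand-in for a courier tracking call.
    return {"eta": "2 days"}


TOOLS = {"order_db": order_db, "courier_api": courier_api}

CHAIN = [
    {"tool": "order_db", "query_template": "$order_id"},
    {"tool": "courier_api", "condition": "status=in_transit"},
]


def condition_holds(condition, state):
    """Evaluate a 'key=value' condition against the accumulated state."""
    if not condition:
        return True
    key, _, expected = condition.partition("=")
    return str(state.get(key)) == expected


def run_chain(chain, params):
    """Run each step in order, skipping steps whose condition is not met."""
    state = dict(params)
    for step in chain:
        if not condition_holds(step.get("condition"), state):
            continue
        query = Template(step.get("query_template", "")).safe_substitute(state)
        state.update(TOOLS[step["tool"]](query, state))
    return state


result = run_chain(CHAIN, {"order_id": "A-1001"})
```

No model is consulted inside `run_chain`: once the intent is known, the path is fixed, which is where the latency saving comes from.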
| | SwiftAgentX | LangChain | AutoGen | CrewAI |
|---|---|---|---|---|
| Pre-compiled Scenario shortcut | ✅ core differentiator | ❌ no equivalent | ❌ no equivalent | ❌ no equivalent |
| FAQ / cache-hit returns with 0 LLM calls | ✅ | 1-3 LLM calls | 2+ LLM calls | 2+ LLM calls |
| Built-in three-level cache (KB / tool / session) | ✅ | partial | ❌ | ❌ |
| Dual-model routing (LIGHT/HEAVY) baked in | ✅ | DIY | DIY | DIY |
| Pipeline stage short-circuit (KB / security / feature flags) | ✅ | DIY | ❌ | ❌ |
| Streaming with fine-grained event types | ✅ 12 types | ✅ | partial | ✅ |
| Framework-agnostic core (no HTTP in core/) | ✅ | n/a | n/a | n/a |
| Test suite size | 218 tests, < 0.5 s | huge | huge | medium |
LangChain is broader. SwiftAgentX is sharper for the predictable-traffic production patterns where latency and per-request LLM cost actually move the needle.
- You ship an Agent product where most requests are predictable (customer service, order ops, FAQ, internal copilots, AI outbound) and only a small tail needs real open-ended reasoning.
- You care about P95 latency and per-request LLM cost as first-class metrics, not afterthoughts.
- You want a framework you can read in one afternoon (4k lines of source) and modify without fear.
- You're comfortable wiring tools, KBs, and scenarios in Python instead of YAML/DSL.
If you want a kitchen-sink toolkit with every integration imaginable, use LangChain. If you want a small, fast, opinionated core where Scenarios are the unit of design, read on.
| | Feature | What it does |
|---|---|---|
| 🎯 | Scenarios | Pre-compiled execution paths that skip the ReAct loop on known intents — the headline abstraction. Each chain step is a Python tool, an MCP tool, or a conditional hook. |
| 🪜 | Tiered execution | Pipeline short-circuit → Scenario → ReAct → Direct, picked per request by a LIGHT classifier. |
| ⚖️ | Dual-model routing | ModelTier.LIGHT for classification, ModelTier.HEAVY for reasoning — ~30× cost spread on real providers. |
| ⚡ | Three-level cache | KB exact match (global), tool result (per-user), session variables. Independent TTLs, periodic cleanup. |
| 🚦 | Pipeline stages | KB short-circuit, security checks, feature flags, or any custom logic before cache/route. Stages CONTINUE, SHORT_CIRCUIT, or ABORT. |
| 📚 | Knowledge base ABC | Built-in TF-IDF MemoryKnowledgeBase for local dev; bring your own (Weaviate, Elasticsearch, pgvector) via a 3-method ABC. |
| 📡 | SSE streaming | 12 event types (THINKING, ACTION, OBSERVATION, ANSWER, …) with heartbeats. |
| 🛠️ | Admin API | Status, tools, cache, config, KB endpoints as a Flask blueprint and a FastAPI router. Framework-agnostic core. |
| 🧅 | Middleware pipeline | Tracing, retries, input validation, error sanitization. Hook into any stage. |
| 🪶 | No HTTP in core | httpx is optional — run SwiftAgentX in a Lambda, a Celery worker, or a notebook. |
The v0.2.0 release hardens what's already here. v0.3+ goes after the 2026-era patterns from frameworks like Claude Code:
- MCP server support — Scenarios and ReAct can use tools from any MCP server. One-line registration.
- 4-layer Memory — Current question / last-4-turns verbatim / reference window / incremental rolling summary. Topic-change detection triggers re-summarization.
- Hook system — Lifecycle hooks (pre/post tool, pre/post classify) and semantic hooks (topic change, scenario step conditional).
- Sub-agent dispatch — From inside ReAct or a Scenario step, spawn a focused sub-agent with isolated context. Parallel dispatch supported.
- Skill-in-ReAct — Markdown-defined workflows the ReAct loop can pull in on demand (different from Scenarios, which are pre-compiled and fast).
- Worktree-style workspace — File sandbox per session for agents that generate documents.
- Cache-friendly prompt order — Anthropic / OpenAI prompt cache optimization wired into the framework.
- Lazy tool loading — When a registry grows past a threshold, LIGHT model picks the relevant category before HEAVY sees schemas.
pip install swiftagentx

With optional dependencies (quote the extras so shells such as zsh do not expand the brackets):
pip install "swiftagentx[openai]"   # httpx for async OpenAI-compatible calls
pip install "swiftagentx[flask]"    # Flask SSE adapter
pip install "swiftagentx[fastapi]"  # FastAPI SSE adapter
pip install "swiftagentx[all]"      # Everything

import asyncio
from swiftagentx import Agent, DummyModelClient

async def main():
    agent = Agent(model=DummyModelClient(api_key="test", model="dummy"))
    response = await agent.run("Hello!")
    print(response.answer)

asyncio.run(main())

Needs `pip install "swiftagentx[openai]"` (brings in httpx + SOCKS support). Inside mainland China, also prepend `HTTP_PROXY= HTTPS_PROXY= ALL_PROXY=` when calling China-based endpoints (Aliyun DashScope, etc.) so httpx doesn't try to tunnel through your foreign proxy.
import os, asyncio
from swiftagentx import Agent
from swiftagentx.providers.openai_compatible import OpenAICompatibleProvider
async def main():
    agent = Agent(
        # OpenAI:
        # model=OpenAICompatibleProvider(
        #     api_key=os.environ["OPENAI_API_KEY"], model="gpt-4o",
        #     api_base="https://api.openai.com/v1",
        # ),
        # Aliyun DashScope (Qwen, used by the benchmarks above):
        model=OpenAICompatibleProvider(
            api_key=os.environ["DASHSCOPE_API_KEY"],
            model="qwen-flash",
            api_base="https://dashscope.aliyuncs.com/compatible-mode/v1",
        ),
        # DeepSeek:
        # model=OpenAICompatibleProvider(
        #     api_key=os.environ["DEEPSEEK_API_KEY"], model="deepseek-chat",
        #     api_base="https://api.deepseek.com/v1",
        # ),
    )
    # `session_id` is optional: a single Agent instance shares one default
    # session across calls, so a simple CLI bot has memory out of the box.
    # Multi-user servers should pass an explicit session_id per user.
    response = await agent.run("Explain quantum computing in one sentence.")
    print(response.answer)

asyncio.run(main())

Works with any OpenAI-compatible endpoint: OpenAI, Azure OpenAI, DeepSeek, DashScope, Together, Fireworks, etc. Pick the snippet that matches your provider and set the matching env var.
Agent.run(text) accepts session_id= and user_id= keyword arguments.
Every turn that shares the same session_id shares one LayeredMemory
(L1 current / L2 last 4 turns verbatim / L3 reference / L4 rolling summary).
Without a session_id, the agent uses one stable default session id
generated at construction time — so a simple CLI chatbot with a single
Agent instance "just works":
import asyncio

async def main():
    agent = Agent(model=OpenAICompatibleProvider(...))
    while user_input := input("You: "):
        response = await agent.run(user_input)  # default session shared across turns
        print("Bot:", response.answer)

asyncio.run(main())

For a multi-user server, pass an explicit session_id per user instead.
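The per-session memory described above can be pictured with a small stand-alone model. This is only a sketch of the idea, not the framework's LayeredMemory: the class name `SessionMemory` is hypothetical, the "summary" is a plain list of truncated questions where a real system would call a model, and the reference layer is left out.

```python
from collections import defaultdict, deque


class SessionMemory:
    """Toy per-session memory: last N turns verbatim plus a rolling summary."""

    def __init__(self, verbatim_turns=4):
        self.recent = deque(maxlen=verbatim_turns)
        self.summary = []

    def add_turn(self, user_text, answer):
        if len(self.recent) == self.recent.maxlen:
            # The oldest turn is about to leave the verbatim window,
            # so fold it into the summary before it is dropped.
            old_user, _ = self.recent[0]
            self.summary.append(old_user[:40])
        self.recent.append((user_text, answer))


# One independent memory per session_id, created on first use.
memories = defaultdict(SessionMemory)

for i in range(6):
    memories["alice"].add_turn(f"question {i}", f"answer {i}")
memories["bob"].add_turn("hi", "hello")
```

Keying the store by session_id is what keeps one user's history out of another user's prompt.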
from swiftagentx import Agent, Tool, ToolOutput, DummyModelClient

class WeatherTool(Tool):
    def __init__(self):
        super().__init__(name="weather", description="Get weather for a city")

    async def execute(self, context, **kwargs):
        city = kwargs.get("city", "unknown")
        return ToolOutput(success=True, result=f"Sunny, 25C in {city}")

async def main():
    agent = Agent(model=DummyModelClient(api_key="test", model="dummy"))
    agent.register_tool(WeatherTool())
    response = await agent.run("What's the weather in Beijing?")
    print(response.answer)

Use a fast, cheap model for intent classification and a powerful model for reasoning:
from swiftagentx import Agent, ModelTier
from swiftagentx.providers.openai_compatible import OpenAICompatibleProvider
light = OpenAICompatibleProvider(api_key=key, model="gpt-3.5-turbo", api_base=base)
heavy = OpenAICompatibleProvider(api_key=key, model="gpt-4", api_base=base)
agent = Agent(
models={
ModelTier.LIGHT: light, # Intent classification (~200ms)
ModelTier.HEAVY: heavy, # ReAct reasoning & response generation
},
)

Skip the ReAct loop for common request patterns:
from swiftagentx import Agent, ScenarioConfig, ToolChainStep, DummyModelClient
agent = Agent(model=DummyModelClient(api_key="test", model="dummy"))
agent.register_tool(WeatherTool())
agent.register_scenario("weather", ScenarioConfig(
name="Weather Query",
description="Get weather information",
triggers=["weather", "temperature", "forecast"],
tool_chain=[
ToolChainStep(tool="weather", query_template="$city"),
],
cache_ttl=1800,
output_type="direct",
))

When the light model classifies a request as a "weather" scenario, the framework executes the tool chain directly: no ReAct loop, no extra LLM calls.
from swiftagentx import Agent, AgentRequest, SSEStreamAdapter, DummyModelClient

async def main():
    agent = Agent(model=DummyModelClient(api_key="test", model="dummy"))
    request = AgentRequest(user_id="u1", session_id="s1", user_input="Hello")
    adapter = SSEStreamAdapter()
    response = await agent.run_stream(request, adapter)
    # Events are available via adapter.event_generator()
    # In a web context, pipe this to an SSE response

Attach a knowledge base to your agent. Exact matches are returned instantly, skipping LLM processing entirely:
from swiftagentx import Agent, DummyModelClient, MemoryKnowledgeBase, Document

async def main():
    agent = Agent(model=DummyModelClient(api_key="test", model="dummy"))
    kb = MemoryKnowledgeBase()
    await kb.add_documents([
        Document(doc_id="faq-1", content="Return policy: 7-day no-questions-asked returns"),
        Document(doc_id="faq-2", content="Points can be redeemed in the member store"),
    ])
    agent.set_knowledge_base(kb)  # Auto-registers KnowledgeBaseTool
    response = await agent.run("Return policy: 7-day no-questions-asked returns")
    # → Exact match (score=1.0), returned directly without LLM call

Use KnowledgeBaseStage in the pipeline for pre-processing short-circuit:
from swiftagentx import KnowledgeBaseStage
agent.pipeline.add_stage(KnowledgeBaseStage(kb=kb, threshold=0.95))

Implement the KnowledgeBase ABC to integrate with Weaviate, Elasticsearch, or any vector store. See Knowledge Base Guide.
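To show why an exact match scores close to 1.0 and clears a 0.95 threshold, here is a simplified similarity check. It is a sketch under stated assumptions: it uses plain term-frequency cosine similarity without the IDF weighting, it is not MemoryKnowledgeBase's scoring code, and `kb_lookup` is a hypothetical helper name.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between the term-frequency vectors of two strings."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def kb_lookup(query, docs, threshold=0.95):
    """Return the best-matching document, or None if it scores below threshold."""
    best = max(docs, key=lambda d: cosine(query, d), default=None)
    if best is not None and cosine(query, best) >= threshold:
        return best
    return None


docs = [
    "Return policy: 7-day no-questions-asked returns",
    "Points can be redeemed in the member store",
]
hit = kb_lookup("Return policy: 7-day no-questions-asked returns", docs)
miss = kb_lookup("How do I redeem points?", docs)
```

A high threshold keeps the short-circuit conservative: anything that is not a near-verbatim match falls through to the later tiers.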
Monitor and manage your agent at runtime:
from swiftagentx.admin import AdminService, create_flask_admin_blueprint
service = AdminService(agent)
# Flask
bp = create_flask_admin_blueprint(service)
app.register_blueprint(bp, url_prefix="/admin")
# FastAPI
from swiftagentx.admin import create_fastapi_admin_router
router = create_fastapi_admin_router(service)
app.include_router(router, prefix="/admin")

Available endpoints:
| Method | Path | Description |
|---|---|---|
| GET | /admin/status | Agent status, tool count, cache stats, uptime |
| GET | /admin/tools | Registered tools with JSON Schema |
| GET | /admin/cache/stats | Cache hit statistics |
| POST | /admin/cache/clear | Clear cache (all or by level) |
| GET | /admin/config | Current config (secrets masked) |
| PUT | /admin/config | Update config at runtime |
| POST | /admin/kb/search | Search knowledge base |
| POST | /admin/kb/documents | Add documents |
| DELETE | /admin/kb/documents/:id | Delete a document |
| GET | /admin/kb/stats | KB document count and provider |
Security: Admin endpoints have no built-in authentication. Add your own middleware in production. See Admin Guide.
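Because the admin endpoints ship without authentication, one simple option is a shared bearer token checked before every admin request. The helper below is a framework-independent sketch. The function name `is_authorized` and the header format are assumptions, not part of SwiftAgentX.

```python
import hmac


def is_authorized(headers, expected_token):
    """Check an 'Authorization: Bearer <token>' header in constant time."""
    value = headers.get("Authorization", "")
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(token.encode(), expected_token.encode())


# In Flask, a check like this could run from a before_request handler
# registered on the admin blueprint, aborting with 401 when it fails.
ok = is_authorized({"Authorization": "Bearer s3cret"}, "s3cret")
denied = is_authorized({"Authorization": "Bearer wrong"}, "s3cret")
```

In production the expected token would come from a secret store or environment variable rather than a literal.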
from flask import Flask
from swiftagentx import Agent, DummyModelClient
from swiftagentx.web.flask_adapter import create_flask_blueprint
app = Flask(__name__)
agent = Agent(model=DummyModelClient(api_key="test", model="dummy"))
app.register_blueprint(create_flask_blueprint(agent))
# POST /api/v1/agent/sse - SSE streaming endpoint
# GET  /api/v1/agent/health - Health check

from fastapi import FastAPI
from swiftagentx import Agent, DummyModelClient
from swiftagentx.web.fastapi_adapter import create_fastapi_router
app = FastAPI()
agent = Agent(model=DummyModelClient(api_key="test", model="dummy"))
app.include_router(create_fastapi_router(agent))

Two ways to hook into the request lifecycle.
A. Subclass Agent and override — simplest for project-local logic:
from swiftagentx import Agent
class MyAgent(Agent):
    async def on_request_start(self, context): ...              # request received
    async def on_before_classify(self, context): ...            # before intent classification
    async def on_after_classify(self, context, intent): ...     # after intent classification
    async def on_before_tool_call(self, context, tool_name, params): ...
    async def on_after_tool_call(self, context, tool_name, result): ...
    async def on_before_respond(self, context, answer):         # may rewrite answer
        return answer
    async def on_request_end(self, context, response): ...      # request finished

Each override is optional; the framework calls the base no-op when you don't override.
B. HookRegistry — declarative, no subclassing (v0.3+):
from swiftagentx import HookEvent, HookResult, PythonHook
async def log_tool(ctx):
    print(f"tool {ctx.tool_name}({ctx.tool_args}) → {ctx.tool_result}")
    return HookResult()

agent.hooks.register(PythonHook(
    name="log_tools", events={HookEvent.AFTER_TOOL_CALL}, handler=log_tool,
))

Twelve lifecycle events are dispatched: SESSION_START, REQUEST_START, BEFORE_CLASSIFY, AFTER_CLASSIFY, BEFORE_SCENARIO_STEP, AFTER_SCENARIO_STEP, BEFORE_TOOL_CALL, AFTER_TOOL_CALL, BEFORE_REACT_ITER, AFTER_REACT_ITER, BEFORE_RESPOND, REQUEST_END, plus semantic events like TOPIC_CHANGE. Handlers can return HookResult(action="short_circuit", answer=...) to bypass the rest of the request (useful for security policies, rate limiters, and quota checks).

Both styles coexist and fire at the same boundary: subclass methods first, then registered hooks.
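As an illustration of the kind of policy a short-circuiting hook could enforce, here is a small sliding-window rate limiter. It is a stand-alone sketch: the class name `SlidingWindowLimiter` is hypothetical, and the hook wiring is left out because the fields available on the hook context are not documented here. A handler could call `allow()` and return the short-circuit HookResult when it comes back False.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per user within a rolling time window."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self.hits = defaultdict(deque)

    def allow(self, user_id):
        now = self.clock()
        q = self.hits[user_id]
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.max_requests:
            return False
        q.append(now)
        return True


# Deterministic demo with a fake clock: 2 requests per 10 seconds.
fake_now = [0.0]
limiter = SlidingWindowLimiter(2, 10, clock=lambda: fake_now[0])
decisions = [limiter.allow("u1") for _ in range(3)]
fake_now[0] = 10.0
after_window = limiter.allow("u1")
```

Rejecting at the hook boundary means a throttled request never reaches the classifier, so it costs no LLM call.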
from swiftagentx import Agent, Middleware, DummyModelClient
class LoggingMiddleware(Middleware):
    async def process(self, context, next_handler):
        print(f"[LOG] Processing: {context.get('user_input', '')}")
        result = await next_handler(context)
        print("[LOG] Done")
        return result

agent = Agent(model=DummyModelClient(api_key="test", model="dummy"))
agent.use(LoggingMiddleware())

from swiftagentx import Agent, SwiftAgentConfig, DummyModelClient
agent = Agent(
model=DummyModelClient(api_key="test", model="dummy"),
config=SwiftAgentConfig(
name="MyAgent",
max_iterations=5,
enable_cache=True,
max_input_length=5000,
debug=False, # Set True to expose error details
sse_heartbeat_interval=5.0,
max_cache_entries_per_level=10000,
),
)

A request descends through the tiers and stops at the shallowest one that can answer it: cache and KB hits return with zero LLM calls, known intents fire a pre-compiled Scenario for one, and only genuinely open-ended requests pay for the full ReAct loop.
Full execution pipeline (text)
User Request
|
v
[Middleware Chain] ──> TracingMiddleware, custom middleware, ...
|
v
[Pipeline Stages]
├─ [KnowledgeBaseStage] ─── exact match? ──> SHORT_CIRCUIT (return directly)
├─ [Custom Stages] ─── security check, feature flags, ...
|
v
[Input Validation] ─── too long? ──> Reject
|
v
[Cache Check] ─── hit? ──> Return cached answer (0ms)
|
v
[Intent Classification] (Light Model, ~200ms)
|
├─ SCENARIO ──> Scenario Toolchain ──> Direct / LLM-formatted response
├─ REACT ────> ReAct Loop (Heavy Model) ──> Thought → Action → Observation → ... → Answer
└─ DIRECT ───> Direct LLM Response (Heavy Model)
|
v
[Lifecycle Hooks] ──> on_before_respond
|
v
[SSE Stream / Response]
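The routing part of the diagram above can be condensed into a few lines of plain Python. This is a toy model, not the framework's router: middleware, input validation, and hooks are omitted, the cache is keyed by raw text for simplicity, and every function and value below is a hypothetical stand-in so that the sketch runs without an LLM.

```python
def route(text, kb, cache, classify, scenarios, react, direct):
    """Return (tier, answer), stopping at the shallowest tier that can answer."""
    if text in kb:                    # KB exact match: 0 LLM calls
        return "kb", kb[text]
    if text in cache:                 # cache hit: 0 LLM calls
        return "cache", cache[text]
    intent = classify(text)           # stand-in for the single LIGHT-model call
    if intent in scenarios:           # known intent: run the pre-built chain
        answer = scenarios[intent](text)
        cache[text] = answer
        return "scenario", answer
    if intent == "react":             # multi-step reasoning on the HEAVY model
        return "react", react(text)
    return "direct", direct(text)     # open conversation on the HEAVY model


def classify(text):
    return "order_status" if "order" in text else "direct"


def order_status_chain(text):
    return "Order A-1001 is in transit"


def react(text):
    return "react answer"


def direct(text):
    return "direct answer"


kb = {"what's your return policy": "7-day no-questions-asked returns"}
cache = {}
scenarios = {"order_status": order_status_chain}

first = route("where is my order", kb, cache, classify, scenarios, react, direct)
second = route("where is my order", kb, cache, classify, scenarios, react, direct)
```

The second identical request is served from the cache, so it never reaches the classifier.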
| Level | Scope | Key | TTL | Use Case |
|---|---|---|---|---|
| L1 - KB | Global | Query hash | Configurable (default 1h) | Knowledge base exact match |
| L2 - Code | Per-user + platform | User + platform + query hash | Configurable (default 5m) | Tool execution results |
| L3 - Dynamic | Per-session | Variable name | No expiry | Session state variables |
| Scenario | Per-scenario | Custom template | Configurable | Toolchain results |
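The table's notion of independent TTLs per cache level can be sketched with a small expiring dictionary. This is an illustrative model only: the class name `TTLCache` is hypothetical, the TTL values mirror the defaults in the table, and the framework's own cache (including its periodic cleanup and entry limits) may work differently.

```python
import time


class TTLCache:
    """Dictionary-backed cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds  # None means entries never expire
        self.clock = clock      # injectable so expiry can be tested
        self.data = {}

    def set(self, key, value):
        self.data[key] = (value, self.clock())

    def get(self, key):
        entry = self.data.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self.ttl is not None and self.clock() - stored_at >= self.ttl:
            del self.data[key]  # lazily evict on read
            return None
        return value


# One cache per level, each with its own TTL (seconds).
levels = {
    "kb": TTLCache(3600),      # L1: global, default 1h
    "tool": TTLCache(300),     # L2: per-user tool results, default 5m
    "session": TTLCache(None), # L3: session variables, no expiry
}
levels["tool"].set(("u1", "web", "order A-1001"), "in transit")
cached = levels["tool"].get(("u1", "web", "order A-1001"))
```

Including the user and platform in the L2 key, as the table describes, keeps one user's tool results from being served to another.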
swiftagentx/
├── core/ # Agent, memory, model client, cache, prompt, parameter, router, pipeline
├── models/ # Pydantic schemas (AgentRequest, AgentResponse, config)
├── tools/ # Tool base class, registry, executor, termination checker, scenario engine
├── knowledge_base/ # KnowledgeBase ABC, MemoryKB (TF-IDF), KnowledgeBaseTool, KnowledgeBaseStage
├── admin/ # AdminService, Flask Blueprint, FastAPI Router
├── stream/ # SSE adapter and event builder
├── providers/ # LLM providers (OpenAI-compatible, DummyModelClient)
├── storage/ # Storage backend abstraction (memory, extensible)
├── middleware/ # Middleware chain (tracing, custom)
└── web/ # Web framework adapters (Flask, FastAPI)
| Document | Description |
|---|---|
| Architecture | System overview, dual-model strategy, cache, pipeline, ReAct loop |
| Tools Guide | Custom tool development |
| Scenarios Guide | Scenario toolchain configuration |
| Knowledge Base Guide | KB integration, MemoryKB, custom backends |
| Streaming Guide | SSE events, Flask/FastAPI integration, frontend examples |
| Admin Guide | Admin API, authentication, endpoints |
| Deployment Guide | Gunicorn, Uvicorn, Docker, Nginx |
- Python >= 3.9
- Core dependencies: pydantic >= 2.0, PyYAML >= 6.0
- No HTTP dependency in core: httpx is optional (for OpenAICompatibleProvider)
Apache-2.0

