draft: Implement tool search tool mode, aligned with the official Claude/OpenAI tool search design #6557

Draft
w31r4 wants to merge 19 commits into AstrBotDevs:master from w31r4:feature/tool-search-tool

Conversation

@w31r4 w31r4 commented Mar 18, 2026

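Implements the tool_search mode, which lets the LLM discover and load tools on demand. This addresses the context bloat and the drop in tool-selection accuracy that come with a growing number of tools.

The design follows the official Claude tool search and OpenAI tool search specifications and uses a three-path architecture:

  • Claude path: sends the full tool catalog with defer_loading: true; search results are returned as tool_reference content blocks, and the tools parameter stays identical on every turn to maximize prompt cache hits
  • Generic path: local BM25 emulation that physically filters the tools parameter (only core + tool_search + already-discovered tools are sent); the list only grows, so the prefix stays stable
  • Full/Skills-like modes: completely unaffected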
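
The difference between the first two paths can be sketched as follows. This is a minimal illustration, not the PR's code: tools are plain dicts, and the function names `claude_tools_param` and `generic_tools_param` are hypothetical. Only `defer_loading`, `tool_search`, and the core/discovered split come from the description above.

```python
from typing import Any

Tool = dict[str, Any]


def claude_tools_param(core: list[Tool], deferred: list[Tool]) -> list[Tool]:
    """Claude path: the whole catalog on every turn, deferred tools flagged."""
    return core + [{**tool, "defer_loading": True} for tool in deferred]


def generic_tools_param(
    core: list[Tool], search_tool: Tool, deferred: list[Tool], discovered: list[str]
) -> list[Tool]:
    """Generic path: core + tool_search + only the tools discovered so far."""
    by_name = {tool["name"]: tool for tool in deferred}
    return core + [search_tool] + [by_name[n] for n in discovered if n in by_name]


core = [{"name": "set_reminder"}]
deferred = [{"name": "get_weather"}, {"name": "translate_text"}]
search = {"name": "tool_search"}

turn1 = generic_tools_param(core, search, deferred, [])
turn2 = generic_tools_param(core, search, deferred, ["translate_text"])
# The list sent on turn 1 is an unchanged prefix of the list sent on turn 2.
assert turn2[: len(turn1)] == turn1
```

On the Claude path the list never changes between turns, while on the generic path it only grows at the tail, which is what keeps earlier prompt prefixes reusable.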

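Modifications

New modules (astrbot/core/tools/):

  • tool_catalog.py: immutable ToolCatalog that splits tools into two groups, core (built-in / HandoffTool / user-configured) and deferred
  • tool_search_index.py: stateless BM25 search index (jieba + rank-bm25); search covers tool names, descriptions, parameter names, and parameter descriptions
  • discovery_state.py: session-level tracking of discovered tools (append-only), independent of message history and unaffected by context compression
  • tools_assembler.py: stateless assembler for the tools parameter that guarantees the stable order core → tool_search → discovered
  • tool_search_tool.py: LLM-callable FunctionTool that returns structured JSON search results
  • strategy.py: ToolSearchStrategy abstract base class (two-method interface)
  • generic_strategy.py: generic-path strategy, physical filtering + JSON text results
  • claude_strategy.py: Claude-path strategy, defer_loading + tool_reference content blocks + schema override

Modified files:

  • tool_loop_agent_runner.py: replaces the temporary fallback with full tool_search initialization (provider detection → catalog construction → strategy selection → tool assembly → prompt injection); on provider fallback the strategy is rebuilt and already-discovered tools are reused
  • astr_main_agent.py: system prompt assembly for tool_search/auto modes (injected by the runner after mode resolution to avoid duplicate injection)
  • astr_main_agent_resources.py: adds the TOOL_CALL_PROMPT_TOOL_SEARCH_MODE constant
  • tool.py: supports schema override, the hook for Anthropic-specific serialization on the Claude path
  • anthropic_source.py: converts tool_search results into tool_reference content blocks
  • config/default.py: threshold default changed from 10 to 25 (to avoid accidental triggering by subagents); config descriptions updated
  • internal.py: tool_search/auto kept as valid config values
  • Dashboard i18n (zh-CN / en-US / ru-RU): updated hint text for the tool_search config

Core design points:

  1. The tool catalog is immutable (ToolCatalog is a frozen dataclass), the search index is stateless, and the discovery state is append-only → the tools parameter prefix stays stable within a session
  2. The Claude path sends the full catalog every turn (identical tools parameter → prompt cache hit); the generic path grows monotonically
  3. When provider fallback switches providers, the serialization strategy is rebuilt but the discovery state is reused, so discovered tools are not lost
  4. The auto mode threshold is based on the number of active tools (inactive tools excluded); default 25
  5. No new dependencies introduced (existing jieba + rank-bm25 are reused)
  • This is NOT a breaking change.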

Screenshots or Test Results

$ python -m pytest tests/unit/test_tool_catalog.py tests/unit/test_tool_search_index.py \
    tests/unit/test_discovery_state.py tests/unit/test_tools_assembler.py \
    tests/unit/test_tool_search_tool.py tests/unit/test_generic_strategy.py \
    tests/unit/test_claude_strategy.py tests/unit/test_tool_search_integration.py -q

134 passed, 4 warnings in 20.45s

Test coverage:

| Module | Test file | Tests |
| --- | --- | --- |
| ToolCatalog | test_tool_catalog.py | 16 |
| ToolSearchIndex | test_tool_search_index.py | 15 |
| DiscoveryState | test_discovery_state.py | 13 |
| ToolsAssembler | test_tools_assembler.py | 8 |
| ToolSearchTool | test_tool_search_tool.py | 17 |
| GenericStrategy | test_generic_strategy.py | 13 |
| ClaudeStrategy | test_claude_strategy.py | 27 |
| Integration | test_tool_search_integration.py | 25 |
| Total | 8 test files | 134 |

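Checklist

  • 😊 If this PR adds new features, they have been discussed with the authors via an Issue, email, or similar.
  • 👀 My changes are well tested, and the "verification steps" and "screenshots" are provided above.
  • 🤓 I have made sure no new dependencies are introduced (existing jieba + rank-bm25 are reused).
  • 😮 My changes do not introduce malicious code.
  • ⚠️ I have carefully read and understood all of the above and confirm this submission follows the guidelines.
  • 🚀 I confirm this work is based on the master branch and will be merged into the main branch.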

w31r4 added 19 commits March 18, 2026 16:35
…l fallback

- Delete tool_search.py and tool_search_index.py (never committed, removed from disk)
- Remove _init_tool_search_mode, _fallback_to_full_mode, _partition_tools from runner
- Remove tool_search_config parameter from runner reset() signature
- Remove TOOL_CALL_PROMPT_TOOL_SEARCH_MODE constant and all references
- Add fallback: tool_search/auto modes now degrade to full mode with warning log
- Preserve skills_like mode code completely intact
- Update internal.py to accept tool_search/auto as valid config values
- Change tool_search threshold default from 10 to 25
- Update default.py metadata hints with threshold guidance
- Update zh-CN, en-US, ru-RU i18n hints to mention default 25
- All other tool_search config keys preserved unchanged
- 16 test functions covering CAT-01 (immutability), CAT-02 (partition logic), CAT-03 (deterministic ordering)
- Helper functions for FunctionTool, HandoffTool, MCPTool creation
- All tests fail with ModuleNotFoundError (RED state)
- Frozen pydantic dataclass with core_tools and deferred_tools tuples
- _is_core() classifies: HandoffTool always core, builtins core when auto_always_load_builtin=True, MCPTool always deferred, plugins deferred
- from_tool_set() factory with alphabetical sorting and inactive filtering
- get_tool() O(1) lookup via _by_name index built in model_validator
- All 16 tests pass, no regressions
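
A rough sketch of the partitioning described in this commit, under stated assumptions: the PR uses a frozen pydantic dataclass and real FunctionTool/HandoffTool/MCPTool classes, whereas this sketch uses the standard-library `dataclasses` module and a hypothetical `kind` tag. The precedence between "MCPTool always deferred" and the user-configured always-loaded list is also an assumption here.

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True)
class Tool:
    name: str
    kind: str = "plugin"  # hypothetical tag: "handoff", "builtin", "mcp" or "plugin"
    active: bool = True


@dataclass(frozen=True)
class ToolCatalog:
    core_tools: tuple[Tool, ...]
    deferred_tools: tuple[Tool, ...]
    _by_name: dict[str, Tool] = field(init=False, repr=False, compare=False)

    def __post_init__(self) -> None:
        # Frozen dataclasses block normal assignment, so set the index directly.
        index = {t.name: t for t in (*self.core_tools, *self.deferred_tools)}
        object.__setattr__(self, "_by_name", index)

    @classmethod
    def from_tools(cls, tools, always_loaded=(), auto_always_load_builtin=True):
        core, deferred = [], []
        for tool in sorted(tools, key=lambda t: t.name):  # deterministic order
            if not tool.active:
                continue  # inactive tools never enter the catalog
            if tool.kind == "handoff":
                is_core = True
            elif tool.kind == "mcp":
                is_core = False  # assumption: checked before always_loaded
            else:
                is_core = tool.name in always_loaded or (
                    tool.kind == "builtin" and auto_always_load_builtin
                )
            (core if is_core else deferred).append(tool)
        return cls(tuple(core), tuple(deferred))

    def get_tool(self, name: str):
        return self._by_name.get(name)  # O(1) dictionary lookup


tools = [
    Tool("web_search", "mcp"),
    Tool("handoff_writer", "handoff"),
    Tool("cron", "builtin"),
    Tool("old", "plugin", active=False),
    Tool("dice", "plugin"),
]
catalog = ToolCatalog.from_tools(tools)

try:
    catalog.core_tools = ()  # any reassignment is rejected
except FrozenInstanceError:
    pass
```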
- 4 test classes: TestIndexBuild, TestSearch, TestImmutability, TestMaxResults
- 15 test methods covering IDX-01 through IDX-04 plus edge cases
- Shared corpus of 12 tools across diverse domains for meaningful BM25 IDF
- Tests import from non-existent module (RED phase confirmed)
- Frozen pydantic dataclass matching ToolCatalog immutability pattern
- BM25 index built from name + description + param names + param descriptions
- Tokenization via jieba + shared hit_stopwords.txt (loaded once at module level)
- search() returns ranked (FunctionTool, float) tuples via get_scores() with score > 0 filtering
- max_results parameter (default 5) limits returned results
- Empty corpus, empty query, and small corpus edge cases handled gracefully
- No get_top_n, no inject_into, no loaded_tool_names, no mutable external references
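
To make the ranking step concrete, here is a self-contained sketch of BM25 scoring with `score > 0` filtering and a `max_results` cap. It deliberately swaps in a hand-written BM25 and a simple regex tokenizer so that it runs without third-party packages; the PR itself uses rank-bm25 and jieba with a stopword list, and its exact IDF variant may differ from the always-positive one assumed here.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Stand-in for jieba: lowercase alphanumeric runs, so "get_weather" -> get, weather.
    return re.findall(r"[a-z0-9]+", text.lower())


class ToolSearchIndexSketch:
    def __init__(self, docs: dict[str, str], k1: float = 1.5, b: float = 0.75):
        # docs maps tool name -> name + description + parameter names/descriptions.
        self.names = list(docs)
        self.term_counts = [Counter(tokenize(docs[name])) for name in self.names]
        self.lengths = [sum(c.values()) for c in self.term_counts]
        self.avg_len = sum(self.lengths) / len(self.lengths) if self.lengths else 0.0
        self.doc_freq = Counter(term for c in self.term_counts for term in c)
        self.k1, self.b = k1, b

    def search(self, query: str, max_results: int = 5) -> list[tuple[str, float]]:
        n = len(self.names)
        terms = set(tokenize(query))
        scored = []
        for name, counts, length in zip(self.names, self.term_counts, self.lengths):
            score = 0.0
            for term in terms:
                freq = counts.get(term, 0)
                if freq == 0:
                    continue
                df = self.doc_freq[term]
                idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
                norm = freq + self.k1 * (1 - self.b + self.b * length / self.avg_len)
                score += idf * freq * (self.k1 + 1) / norm
            if score > 0:  # tools with no matching term are dropped
                scored.append((name, score))
        scored.sort(key=lambda item: (-item[1], item[0]))
        return scored[:max_results]


index = ToolSearchIndexSketch(
    {
        "get_weather": "get_weather look up the weather forecast for a city",
        "translate_text": "translate_text translate text between two languages",
        "send_email": "send_email send an email message to a recipient",
    }
)
hits = index.search("weather forecast")
```

The index holds no per-session state: `search()` only reads the precomputed statistics, so one instance can be shared across turns.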
- 6 test classes covering DSC-01, DSC-02, DSC-03, ASM-01, ASM-02, ASM-03
- Tests fail on import (RED) because implementation modules do not exist yet
- TestAppendOnly: add/get, insertion order, len, contains, empty initial
- TestMonotonicAppend: dedup, no remove/clear/pop methods
- TestIndependence: standalone construction, immutable snapshots
- TestAssemblyOrdering: core+search+discovered order, missing tool skip
- TestStablePrefix: identical object references across turns
- TestMonotonicGrowth: prefix invariant across turns, new ToolSet per call
- DiscoveryState: append-only session tracker with O(1) dedup via set+list
- DiscoveryState: add() returns bool, get_discovered_names() returns tuple
- DiscoveryState: no remove/clear/pop methods (monotonic append by design)
- ToolsAssembler: static build_tools() produces ToolSet with stable prefix
- ToolsAssembler: assembly order is core + tool_search + discovered
- ToolsAssembler: missing catalog tools silently skipped (graceful degradation)
- All 21 tests pass (GREEN); no regressions in existing suite
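
The append-only tracker and the assembly order from this commit can be illustrated roughly as below. This is a sketch under assumptions: tools are plain strings, and `build_tools` returns a list rather than the PR's ToolSet. The method names `add()` and `get_discovered_names()` follow the commit message.

```python
class DiscoveryState:
    """Append-only record of discovered tool names (no remove/clear/pop)."""

    def __init__(self) -> None:
        self._order: list[str] = []  # preserves insertion order
        self._seen: set[str] = set()  # O(1) duplicate check

    def add(self, name: str) -> bool:
        if name in self._seen:
            return False
        self._seen.add(name)
        self._order.append(name)
        return True

    def get_discovered_names(self) -> tuple[str, ...]:
        return tuple(self._order)  # immutable snapshot

    def __len__(self) -> int:
        return len(self._order)

    def __contains__(self, name: object) -> bool:
        return name in self._seen


def build_tools(core, search_tool, deferred, state):
    """Assemble core + tool_search + discovered, skipping unknown names."""
    discovered = [n for n in state.get_discovered_names() if n in deferred]
    return [*core, search_tool, *discovered]


state = DiscoveryState()
deferred = {"get_weather", "translate_text"}

turn1 = build_tools(["cron"], "tool_search", deferred, state)
state.add("translate_text")
state.add("translate_text")  # duplicate, ignored
state.add("not_in_catalog")  # skipped at assembly time
turn2 = build_tools(["cron"], "tool_search", deferred, state)
```

Because nothing is ever removed or reordered, each turn's tool list is a prefix of the next turn's list.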
…-04)

- 4 test classes covering registration, structured result, discovery, no-mutation
- 14 test methods with comprehensive coverage of all requirements
- Tests fail with ImportError (RED phase -- implementation not yet created)
- Pydantic @dataclass subclass of FunctionTool with name="tool_search"
- Delegates to ToolSearchIndex.search() for BM25-ranked matches
- Registers discoveries in DiscoveryState.add() (append-only)
- Returns JSON string with query echo, matches, total_found
- Handles empty query and missing index with error JSON
- All 17 unit tests pass (TST-01 through TST-04)
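
A possible shape for the tool's result handling is sketched below. The keys `query`, `matches`, and `total_found` come from the commit message; the `error` key, the fields inside each match, and the function name `run_tool_search` are assumptions made for illustration.

```python
import json
from typing import Callable, Optional

# A search function returns (name, description, score) triples, best first.
SearchFn = Callable[[str], list[tuple[str, str, float]]]


def run_tool_search(query: str, search: Optional[SearchFn], discovered: list[str]) -> str:
    if search is None:
        return json.dumps({"error": "tool search index is not available"})
    if not query.strip():
        return json.dumps({"error": "query must not be empty"})
    hits = search(query)
    for name, _, _ in hits:
        if name not in discovered:  # append-only registration
            discovered.append(name)
    payload = {
        "query": query,
        "matches": [
            {"name": name, "description": desc, "score": round(score, 4)}
            for name, desc, score in hits
        ],
        "total_found": len(hits),
    }
    return json.dumps(payload, ensure_ascii=False)


def fake_search(query: str) -> list[tuple[str, str, float]]:
    # Stand-in for the BM25 index: one fixed hit for any query mentioning "weather".
    if "weather" in query:
        return [("get_weather", "Look up the forecast", 1.25)]
    return []


discovered: list[str] = []
result = json.loads(run_tool_search("weather in Paris", fake_search, discovered))
```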
- TestBuildToolSet: GEN-01 build_tool_set returns filtered tools
- TestToolSearchResult: GEN-02 tool_search returns structured JSON
- TestMultiTurnDiscovery: GEN-03 discovered tools appear on next turn only
- TestNoProviderSpecificFields: GEN-04 no provider-specific fields
…rategy

- ToolSearchStrategy ABC with build_tool_set() and get_tool_search_tool()
- GenericToolSearchStrategy wires Phase 2-5 components behind clean interface
- Session-scoped DiscoveryState and ToolSearchTool ownership
- All 13 tests pass covering GEN-01 through GEN-04
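
The two-method contract can be sketched with `abc`. The method names `build_tool_set()` and `get_tool_search_tool()` come from the commit message; the return types, the string-based tools, and the `record_discovery` helper are simplifications assumed for this example.

```python
from abc import ABC, abstractmethod


class ToolSearchStrategy(ABC):
    @abstractmethod
    def build_tool_set(self) -> list[str]:
        """Return the tools to send on the next LLM call."""

    @abstractmethod
    def get_tool_search_tool(self) -> str:
        """Return the tool_search tool owned by this strategy."""


class GenericStrategySketch(ToolSearchStrategy):
    def __init__(self, core: list[str], deferred: list[str]) -> None:
        self._core = list(core)
        self._deferred = set(deferred)
        self._discovered: list[str] = []  # session-scoped, append-only

    def record_discovery(self, name: str) -> None:
        # Hypothetical helper; in the PR the tool_search tool registers discoveries.
        if name in self._deferred and name not in self._discovered:
            self._discovered.append(name)

    def get_tool_search_tool(self) -> str:
        return "tool_search"

    def build_tool_set(self) -> list[str]:
        return [*self._core, self.get_tool_search_tool(), *self._discovered]


strategy = GenericStrategySketch(core=["cron"], deferred=["get_weather"])
before = strategy.build_tool_set()  # get_weather not visible yet
strategy.record_discovery("get_weather")
after = strategy.build_tool_set()  # visible from the next turn onward
```

Keeping the interface this small is what lets the runner swap strategies on provider fallback without touching the rest of the loop.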
- TestBuildToolDicts: CLN-01 full catalog with defer_loading
- TestFormatToolResult: CLN-02 tool_reference content blocks
- TestToolDictsStability: CLN-03 same object identity every call
- TestServerToolBlocks: CLN-04 server block type recognition
- TestABCCompliance: ABC contract satisfaction
- Full catalog serialization with defer_loading: true on deferred tools only
- format_tool_result() converts ToolSearchTool JSON to tool_reference blocks
- build_tool_dicts() returns same pre-computed list on every call (CLN-03)
- is_server_tool_block() recognizes server_tool_use and tool_search_tool_result
- Satisfies ToolSearchStrategy ABC with build_tool_set() and get_tool_search_tool()
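
The Claude-side behavior listed in this commit might look roughly like the sketch below. The method names follow the commit message; the class name, the dict-based tools, the `tool_name` key of the `tool_reference` block, and the JSON field `matches[].name` are assumptions and should be checked against the Anthropic documentation and the PR's actual result format.

```python
import json
from typing import Any


class ClaudeStrategySketch:
    def __init__(self, core: list[dict[str, Any]], deferred: list[dict[str, Any]]) -> None:
        # Serialize once; only deferred tools carry defer_loading.
        self._tool_dicts = core + [{**t, "defer_loading": True} for t in deferred]

    def build_tool_dicts(self) -> list[dict[str, Any]]:
        return self._tool_dicts  # same pre-computed list on every call

    @staticmethod
    def format_tool_result(result_json: str) -> list[dict[str, str]]:
        payload = json.loads(result_json)
        return [
            {"type": "tool_reference", "tool_name": match["name"]}
            for match in payload.get("matches", [])
        ]

    @staticmethod
    def is_server_tool_block(block: dict[str, Any]) -> bool:
        return block.get("type") in {"server_tool_use", "tool_search_tool_result"}


claude = ClaudeStrategySketch(
    core=[{"name": "cron"}],
    deferred=[{"name": "get_weather"}],
)
blocks = claude.format_tool_result(
    json.dumps({"query": "weather", "matches": [{"name": "get_weather"}], "total_found": 1})
)
```

Returning the identical list object on every call means the serialized tools parameter cannot drift between turns, which is the property the prompt cache depends on.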
- Add TOOL_CALL_PROMPT_TOOL_SEARCH_MODE constant in resources
- Add system prompt branching for tool_search/auto modes
- Pass tool_search_config kwarg to runner.reset()
- Create test file covering all 8 requirements (PRV-01, PRV-02, MOD-01..04, SYS-01, SYS-02)
- Add _is_claude_provider() helper for provider type detection (PRV-01, PRV-02)
- Replace fallback stub with full tool_search init: auto threshold check,
  ToolCatalog partitioning, ToolSearchIndex construction, strategy selection
- Route to ClaudeToolSearchStrategy or GenericToolSearchStrategy based on provider
- Fallback to full mode on no deferred tools (MOD-03) or init exception (MOD-04)
- Add per-turn tool set reassembly via strategy.build_tool_set() before each LLM call
- Append tool_search system prompt only after mode resolves to tool_search (SYS-01, SYS-02)
- All 19 integration tests pass (PRV-01, PRV-02, MOD-01..04, SYS-01, SYS-02)
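
The mode resolution described in this commit can be summarized in a small function. This is an illustrative sketch only: the function names and the prompt text are hypothetical, and the strict "greater than threshold" comparison is an assumption.

```python
TOOL_SEARCH_PROMPT = "Use the tool_search tool to find tools that are not listed."  # placeholder text


def resolve_mode(
    configured: str, active_tool_count: int, deferred_count: int, threshold: int = 25
) -> str:
    """Resolve the effective mode before any tool_search prompt is injected."""
    if configured not in ("tool_search", "auto"):
        return configured  # full / skills_like are untouched
    if configured == "auto" and active_tool_count <= threshold:
        return "full"  # too few active tools to justify searching
    if deferred_count == 0:
        return "full"  # nothing to search for
    return "tool_search"


def build_system_prompt(base_prompt: str, effective_mode: str) -> str:
    # Single injection point: append only after the mode has resolved.
    if effective_mode == "tool_search":
        return f"{base_prompt}\n{TOOL_SEARCH_PROMPT}"
    return base_prompt


mode = resolve_mode("auto", active_tool_count=40, deferred_count=30)
prompt = build_system_prompt("You are a helpful assistant.", mode)
```

Resolving the mode first and injecting the prompt second avoids the case where an `auto` session that falls back to full mode still carries tool_search instructions.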
build_main_agent() was injecting TOOL_CALL_PROMPT_TOOL_SEARCH_MODE before
mode resolution, while runner.reset() injected it again after resolution.
This caused the prompt to appear twice when tool_search active, and once
when auto→full fallback (incorrect — should be absent).

Fix: build_main_agent() now uses the generic TOOL_CALL_PROMPT for
tool_search/auto modes. The runner's post-resolution injection in reset()
is the single correct injection point.
@auto-assign auto-assign bot requested review from Soulter and anka-afk March 18, 2026 08:39

@sourcery-ai sourcery-ai bot left a comment

Sorry @w31r4, your pull request is larger than the review limit of 150000 diff characters

@dosubot dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Mar 18, 2026
@gemini-code-assist

Summary of Changes

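Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request aims to significantly improve the ability of large language models to handle complex tasks by introducing a new tool search mode. By letting the LLM dynamically discover and load the tools it needs, it addresses the core problems of context overload and inefficient tool selection in conventional tool integration. The change optimizes resource usage and, by aligning with the official designs of mainstream LLM platforms (such as Claude and OpenAI), keeps the solution robust and forward-compatible, giving users a smarter and more flexible experience.

Highlights

  • New tool search mode: introduces the tool_search mode, which lets the large language model (LLM) discover and load tools on demand, addressing the context bloat and reduced selection accuracy caused by a growing number of tools.
  • Alignment with official design specifications: the design follows the official Claude and OpenAI tool search specifications and uses a three-path architecture: the Claude path, the generic path, and the Full/Skills-like modes.
  • Core modules and architecture: adds tool_catalog.py, tool_search_index.py, discovery_state.py, tools_assembler.py, tool_search_tool.py, strategy.py, generic_strategy.py, claude_strategy.py, and related modules, which make up the core of the tool search feature.
  • Key design principles: the tool catalog is immutable, the search index is stateless, and the discovery state is append-only, so the tools parameter prefix stays stable within a session; strategy rebuilding with state reuse is supported when the provider switches.
  • Configuration and i18n: updates the tool_search options in config/default.py and updates the Dashboard i18n files accordingly, adding hint text for the tool_search mode.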

@dosubot dosubot bot added the area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. label Mar 18, 2026
dosubot bot commented Mar 18, 2026

Related Documentation

1 document(s) may need updating based on files changed in this PR:

AstrBotTeam's Space

Changes in pr4697
View Suggested Changes
@@ -34,6 +34,34 @@
 
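 #### Configuration
 The definition of a SubAgent matches the Persona configuration; tools, skills, name, description, and so on must be specified in the configuration file.
+
+**Tool calling mode configuration (tool_schema_mode)**
+
+The system supports the following four tool calling modes (`provider_settings.tool_schema_mode`):
+
+- **full**: full-schema mode. The complete parameter definitions of all tools are sent at once
+- **skills_like**: two-stage mode. Tool names and descriptions are sent first, and the full parameters are sent after the LLM selects a tool
+- **tool_search**: tool search mode. Aligned with the official Claude/OpenAI tool search design; only core tools are sent, and the LLM discovers more tools on demand through the tool_search tool, which addresses context bloat and reduced selection accuracy as the number of tools grows
+- **auto**: automatic mode. Chooses between full and tool_search based on the number of tools (the threshold value; default threshold is 25 tools)
+
+Recommendation: with more than 25 tools, enable tool_search or auto mode to optimize context usage and tool-selection accuracy.
+
+**Tool search configuration (tool_search)**
+
+When tool_search or auto mode is used, tool search behavior can be customized through the `provider_settings.tool_search` config object:
+
+- `threshold` (int, default 25): trigger threshold for tool search mode; tool search is triggered when the total number of tools exceeds this value
+- `max_results` (int, default 5): maximum number of matching tools returned per tool_search call
+- `always_loaded_tools` (list[str], default []): list of tools that are always loaded; these tools are always visible to the LLM
+- `auto_always_load_builtin` (bool, default True): whether built-in tools (such as scheduled tasks and knowledge base queries) are automatically always loaded
+
+**Tool search architecture**
+
+The tool_search mode uses a three-path architecture:
+
+- **Claude path**: sends the full tool catalog with `defer_loading: true`; search results are returned as `tool_reference` content blocks, and the tools parameter stays identical on every turn to maximize prompt cache hits
+- **Generic path**: local BM25 emulation that physically filters the tools parameter (only core + tool_search + already-discovered tools are sent); the list only grows, so the prefix stays stable
+- **Full/Skills-like modes**: completely unaffected
 
 **Architecture refactor (PR #5722)**
 
@@ -469,6 +497,20 @@
 
 #### Logic improvements
 Tool registration and config loading logic have been optimized to ensure subagent configuration is correct and tools are registered dynamically. FunctionTool gains an `is_background_task` attribute to support asynchronous background tasks.
+
+**Tool calling modes (tool_schema_mode)**
+
+The system supports the following four tool calling modes:
+
+- **full**: full-schema mode. The complete parameter definitions of all tools are sent at once
+- **skills_like**: two-stage mode. Tool names and descriptions are sent first, and the full parameters are sent after the LLM selects a tool
+- **tool_search**: tool search mode. Aligned with the official Claude/OpenAI tool search design, using a three-path architecture:
+  * **Claude path**: sends the full tool catalog with `defer_loading: true`; search results are returned as `tool_reference` content blocks, and the tools parameter stays identical on every turn to maximize prompt cache hits
+  * **Generic path**: local BM25 emulation that physically filters the tools parameter (only core + tool_search + already-discovered tools are sent); the list only grows, so the prefix stays stable
+  * **Full/Skills-like modes**: completely unaffected
+- **auto**: automatic mode. Chooses between full and tool_search based on the number of tools (the threshold value)
+
+The tool_search mode is designed to address context bloat and reduced tool-selection accuracy as the number of tools grows, by letting the LLM discover tools on demand instead of loading all of them at once. Using tool_search or auto mode is recommended when there are more than 25 tools.
 
 #### MCP client initialization (PR #5993)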
 



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

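This pull request introduces a tool search mode to address the context bloat and reduced selection accuracy that come with a growing number of tools. The design is aligned with the official Claude and OpenAI tool search specifications and uses a three-path architecture: the Claude path, the generic path, and the Full/Skills-like modes. It adds the ToolCatalog, ToolSearchIndex, DiscoveryState, ToolsAssembler, ToolSearchTool, and ToolSearchStrategy modules, among others, and modifies existing files to integrate the new feature. Test coverage is good and the core design points are clear: the tools parameter prefix stays stable, the strategy is rebuilt when the provider switches, and discovered tools are reused. Overall, this is a well-thought-out and well-implemented feature.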

Comment on lines +238 to +247
except Exception:
    logger.warning(
        "tool_search initialization failed; falling back to 'full' mode.",
        exc_info=True,
    )
    effective_mode = "full"
    self._tool_search_catalog = None
    self._tool_search_index = None
    self._tool_search_discovery_state = None
    self._tool_search_strategy = None

medium

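The try...except Exception block is too broad: it may catch and hide specific errors raised during initialization, which makes debugging harder. Consider catching more specific exception types, or at least logging the full stack trace in the except block so the problem can be diagnosed more easily.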

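@w31r4 w31r4 changed the title feat: Implement tool search tool mode, aligned with the official Claude/OpenAI tool search design darft: Implement tool search tool mode, aligned with the official Claude/OpenAI tool search design Mar 18, 2026
@RC-CHN RC-CHN changed the title darft: Implement tool search tool mode, aligned with the official Claude/OpenAI tool search design draft: Implement tool search tool mode, aligned with the official Claude/OpenAI tool search design Mar 18, 2026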
@RC-CHN RC-CHN marked this pull request as draft March 18, 2026 08:52
@Soulter Soulter force-pushed the master branch 2 times, most recently from faf411f to 0068960 Compare April 19, 2026 09:50