draft: Implement tool search tool mode, aligned with the official Claude/OpenAI tool search design #6557
w31r4 wants to merge 19 commits into AstrBotDevs:master
…l fallback
- Delete tool_search.py and tool_search_index.py (never committed, removed from disk)
- Remove _init_tool_search_mode, _fallback_to_full_mode, _partition_tools from runner
- Remove tool_search_config parameter from runner reset() signature
- Remove TOOL_CALL_PROMPT_TOOL_SEARCH_MODE constant and all references
- Add fallback: tool_search/auto modes now degrade to full mode with warning log
- Preserve skills_like mode code completely intact
- Update internal.py to accept tool_search/auto as valid config values
- Change tool_search threshold default from 10 to 25
- Update default.py metadata hints with threshold guidance
- Update zh-CN, en-US, ru-RU i18n hints to mention default 25
- All other tool_search config keys preserved unchanged
- 16 test functions covering CAT-01 (immutability), CAT-02 (partition logic), CAT-03 (deterministic ordering)
- Helper functions for FunctionTool, HandoffTool, MCPTool creation
- All tests fail with ModuleNotFoundError (RED state)
- Frozen pydantic dataclass with core_tools and deferred_tools tuples
- _is_core() classifies: HandoffTool always core, builtins core when auto_always_load_builtin=True, MCPTool always deferred, plugins deferred
- from_tool_set() factory with alphabetical sorting and inactive filtering
- get_tool() O(1) lookup via _by_name index built in model_validator
- All 16 tests pass, no regressions
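The partition rules in this commit can be illustrated with a minimal sketch. It is not the PR's code: it uses the standard-library `dataclasses` module instead of pydantic, and the `Tool` class with its `kind` field is a hypothetical stand-in for the real FunctionTool/HandoffTool/MCPTool types.

```python
from dataclasses import dataclass, field, FrozenInstanceError


@dataclass(frozen=True)
class Tool:
    """Hypothetical stand-in for FunctionTool; `kind` replaces isinstance checks."""
    name: str
    kind: str = "plugin"  # "handoff", "builtin", "mcp", or "plugin"
    active: bool = True


@dataclass(frozen=True)
class ToolCatalog:
    core_tools: tuple
    deferred_tools: tuple
    # Name -> tool index, built after construction and excluded from comparisons.
    _by_name: dict = field(default_factory=dict, init=False, repr=False, compare=False)

    def __post_init__(self):
        index = {t.name: t for t in self.core_tools + self.deferred_tools}
        # Frozen dataclasses forbid normal assignment, so bypass it once here.
        object.__setattr__(self, "_by_name", index)

    @staticmethod
    def _is_core(tool, auto_always_load_builtin):
        if tool.kind == "handoff":
            return True  # handoff tools are always core
        if tool.kind == "builtin":
            return auto_always_load_builtin
        return False  # MCP and plugin tools are deferred

    @classmethod
    def from_tools(cls, tools, auto_always_load_builtin=True):
        # Drop inactive tools, then sort by name for deterministic ordering.
        active = sorted((t for t in tools if t.active), key=lambda t: t.name)
        core = tuple(t for t in active if cls._is_core(t, auto_always_load_builtin))
        deferred = tuple(t for t in active if not cls._is_core(t, auto_always_load_builtin))
        return cls(core, deferred)

    def get_tool(self, name):
        """O(1) lookup; returns None for unknown or filtered-out tools."""
        return self._by_name.get(name)
```

Because both groups are tuples on a frozen instance, callers cannot reorder or extend the catalog after it is built, which is what makes a stable tool prefix possible later on.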
- 4 test classes: TestIndexBuild, TestSearch, TestImmutability, TestMaxResults
- 15 test methods covering IDX-01 through IDX-04 plus edge cases
- Shared corpus of 12 tools across diverse domains for meaningful BM25 IDF
- Tests import from non-existent module (RED phase confirmed)
- Frozen pydantic dataclass matching ToolCatalog immutability pattern
- BM25 index built from name + description + param names + param descriptions
- Tokenization via jieba + shared hit_stopwords.txt (loaded once at module level)
- search() returns ranked (FunctionTool, float) tuples via get_scores() with score > 0 filtering
- max_results parameter (default 5) limits returned results
- Empty corpus, empty query, and small corpus edge cases handled gracefully
- No get_top_n, no inject_into, no loaded_tool_names, no mutable external references
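To make the ranking behaviour concrete, here is a self-contained sketch of a BM25 search over tool text. It deliberately swaps in a pure-Python BM25 (with a non-negative IDF variant) and a simple regex tokenizer, where the PR uses rank-bm25 with jieba and a stopword list; the class name `TinyBM25Index` and the `docs` mapping are illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokenizer (an assumption; the PR tokenizes with jieba)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyBM25Index:
    def __init__(self, docs, k1=1.5, b=0.75):
        # docs maps tool name -> searchable text (description, param names, ...).
        self.names = tuple(docs)
        self.term_counts = [Counter(tokenize(n + " " + t)) for n, t in docs.items()]
        self.lengths = [sum(c.values()) for c in self.term_counts]
        self.avg_len = sum(self.lengths) / len(self.lengths) if self.lengths else 0.0
        self.k1, self.b = k1, b
        doc_freq = Counter()
        for counts in self.term_counts:
            doc_freq.update(counts.keys())
        n = len(self.names)
        # log(1 + ...) keeps every IDF value positive, even for common terms.
        self.idf = {
            term: math.log(1 + (n - df + 0.5) / (df + 0.5))
            for term, df in doc_freq.items()
        }

    def search(self, query, max_results=5):
        terms = tokenize(query)
        if not terms or not self.names:
            return []  # empty query or empty corpus
        scored = []
        for name, counts, length in zip(self.names, self.term_counts, self.lengths):
            score = 0.0
            for term in terms:
                tf = counts.get(term, 0)
                if tf == 0:
                    continue
                norm = tf + self.k1 * (1 - self.b + self.b * length / self.avg_len)
                score += self.idf[term] * tf * (self.k1 + 1) / norm
            if score > 0:  # mirror the "score > 0" filtering described above
                scored.append((name, score))
        # Highest score first; ties broken by name for deterministic output.
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        return scored[:max_results]
```

The index holds no session state, so one instance can be shared across turns; only the query and `max_results` vary per call.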
- 6 test classes covering DSC-01, DSC-02, DSC-03, ASM-01, ASM-02, ASM-03
- Tests fail on import (RED) because implementation modules do not exist yet
- TestAppendOnly: add/get, insertion order, len, contains, empty initial
- TestMonotonicAppend: dedup, no remove/clear/pop methods
- TestIndependence: standalone construction, immutable snapshots
- TestAssemblyOrdering: core+search+discovered order, missing tool skip
- TestStablePrefix: identical object references across turns
- TestMonotonicGrowth: prefix invariant across turns, new ToolSet per call
- DiscoveryState: append-only session tracker with O(1) dedup via set+list
- DiscoveryState: add() returns bool, get_discovered_names() returns tuple
- DiscoveryState: no remove/clear/pop methods (monotonic append by design)
- ToolsAssembler: static build_tools() produces ToolSet with stable prefix
- ToolsAssembler: assembly order is core + tool_search + discovered
- ToolsAssembler: missing catalog tools silently skipped (graceful degradation)
- All 21 tests pass (GREEN); no regressions in existing suite
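The append-only tracker and the assembly order described in this commit can be sketched as follows. This is an illustration under simplifying assumptions: tools are plain objects, the catalog is a plain `dict` of name to tool, and `build_tools` returns a list rather than the project's ToolSet.

```python
class DiscoveryState:
    """Append-only record of discovered tool names for one session."""

    def __init__(self):
        self._order = []    # preserves insertion order
        self._seen = set()  # O(1) duplicate check

    def add(self, name):
        """Record a name; return True only if it was new."""
        if name in self._seen:
            return False
        self._seen.add(name)
        self._order.append(name)
        return True

    def get_discovered_names(self):
        # A tuple snapshot, so callers cannot mutate internal state.
        return tuple(self._order)

    def __len__(self):
        return len(self._order)

    def __contains__(self, name):
        return name in self._seen


def build_tools(core_tools, search_tool, catalog, state):
    """Assemble core + tool_search + discovered, in that fixed order."""
    tools = list(core_tools) + [search_tool]
    for name in state.get_discovered_names():
        tool = catalog.get(name)
        if tool is not None:  # names missing from the catalog are skipped
            tools.append(tool)
    return tools
```

Since names are only ever appended, the tool list built on turn N is always a prefix of the list built on turn N+1, which is the property the TestStablePrefix and TestMonotonicGrowth classes check.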
…-04)
- 4 test classes covering registration, structured result, discovery, no-mutation
- 14 test methods with comprehensive coverage of all requirements
- Tests fail with ImportError (RED phase -- implementation not yet created)
- Pydantic @dataclass subclass of FunctionTool with name="tool_search"
- Delegates to ToolSearchIndex.search() for BM25-ranked matches
- Registers discoveries in DiscoveryState.add() (append-only)
- Returns JSON string with query echo, matches, total_found
- Handles empty query and missing index with error JSON
- All 17 unit tests pass (TST-01 through TST-04)
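A rough sketch of the result shape this tool returns is shown below. The top-level keys `query`, `matches`, and `total_found` come from the commit message; the per-match fields, the `error` key, and the function signature are assumptions made for illustration.

```python
import json


def run_tool_search(query, search_fn, discovered, max_results=5):
    """Return a JSON string describing the matches for `query`.

    search_fn(query, max_results) is assumed to yield (name, description, score)
    tuples; `discovered` is an append-only list of tool names.
    """
    if not query or not query.strip():
        return json.dumps({"error": "query must not be empty"})
    if search_fn is None:
        return json.dumps({"error": "search index is not available"})

    matches = list(search_fn(query, max_results))
    for name, _description, _score in matches:
        if name not in discovered:
            discovered.append(name)  # register for the next turn

    payload = {
        "query": query,  # echo the query back to the model
        "matches": [
            {"name": name, "description": description, "score": round(score, 4)}
            for name, description, score in matches
        ],
        "total_found": len(matches),
    }
    return json.dumps(payload, ensure_ascii=False)
```

Returning a string rather than a dict keeps the result usable as an ordinary tool-call result for any provider.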
- TestBuildToolSet: GEN-01 build_tool_set returns filtered tools
- TestToolSearchResult: GEN-02 tool_search returns structured JSON
- TestMultiTurnDiscovery: GEN-03 discovered tools appear on next turn only
- TestNoProviderSpecificFields: GEN-04 no provider-specific fields
…rategy
- ToolSearchStrategy ABC with build_tool_set() and get_tool_search_tool()
- GenericToolSearchStrategy wires Phase 2-5 components behind clean interface
- Session-scoped DiscoveryState and ToolSearchTool ownership
- All 13 tests pass covering GEN-01 through GEN-04
- TestBuildToolDicts: CLN-01 full catalog with defer_loading
- TestFormatToolResult: CLN-02 tool_reference content blocks
- TestToolDictsStability: CLN-03 same object identity every call
- TestServerToolBlocks: CLN-04 server block type recognition
- TestABCCompliance: ABC contract satisfaction
- Full catalog serialization with defer_loading: true on deferred tools only
- format_tool_result() converts ToolSearchTool JSON to tool_reference blocks
- build_tool_dicts() returns same pre-computed list on every call (CLN-03)
- is_server_tool_block() recognizes server_tool_use and tool_search_tool_result
- Satisfies ToolSearchStrategy ABC with build_tool_set() and get_tool_search_tool()
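The Claude-path behaviour listed above might look roughly like the sketch below. The method names follow the commit message, but the class name, the input dict shapes, and in particular the `tool_name` key on `tool_reference` blocks are assumptions that should be checked against Anthropic's tool search documentation.

```python
import json

# Block types named in the commit message as server-side tool blocks.
SERVER_BLOCK_TYPES = frozenset({"server_tool_use", "tool_search_tool_result"})


class ClaudeToolDictsSketch:
    """Pre-computes the full catalog once so every turn sends identical tools."""

    def __init__(self, core, deferred):
        # Each tool is assumed to be a dict with name/description/input_schema.
        self._dicts = [dict(t) for t in core]
        # Only deferred tools carry the defer_loading flag.
        self._dicts += [{**t, "defer_loading": True} for t in deferred]

    def build_tool_dicts(self):
        # Same list object on every call, keeping the request prefix stable.
        return self._dicts


def format_tool_result(result_json):
    """Convert the tool_search JSON result into tool_reference blocks."""
    payload = json.loads(result_json)
    return [
        {"type": "tool_reference", "tool_name": match["name"]}  # key name assumed
        for match in payload.get("matches", [])
    ]


def is_server_tool_block(block):
    return block.get("type") in SERVER_BLOCK_TYPES
```

Sending the same pre-computed list each turn is what lets prompt caching hit, in contrast to the generic path, where the tool list grows as tools are discovered.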
- Add TOOL_CALL_PROMPT_TOOL_SEARCH_MODE constant in resources
- Add system prompt branching for tool_search/auto modes
- Pass tool_search_config kwarg to runner.reset()
- Create test file covering all 8 requirements (PRV-01, PRV-02, MOD-01..04, SYS-01, SYS-02)
- Add _is_claude_provider() helper for provider type detection (PRV-01, PRV-02)
- Replace fallback stub with full tool_search init: auto threshold check, ToolCatalog partitioning, ToolSearchIndex construction, strategy selection
- Route to ClaudeToolSearchStrategy or GenericToolSearchStrategy based on provider
- Fallback to full mode on no deferred tools (MOD-03) or init exception (MOD-04)
- Add per-turn tool set reassembly via strategy.build_tool_set() before each LLM call
- Append tool_search system prompt only after mode resolves to tool_search (SYS-01, SYS-02)
- All 19 integration tests pass (PRV-01, PRV-02, MOD-01..04, SYS-01, SYS-02)
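The mode-resolution rules in this commit can be summarised in a small sketch. The function name and parameters are hypothetical, and it assumes the threshold only gates `auto` mode (an explicit `tool_search` setting skips the count check), which the commit message suggests but does not state outright.

```python
import logging

logger = logging.getLogger("tool_search_sketch")


def resolve_mode(configured_mode, total_tools, deferred_count, threshold=25, init=None):
    """Return the effective tool schema mode for this session.

    `init` is an optional callable standing in for catalog/index/strategy setup.
    """
    if configured_mode not in ("tool_search", "auto"):
        return configured_mode  # full and skills_like are left untouched
    if configured_mode == "auto" and total_tools <= threshold:
        return "full"  # not enough tools to justify searching
    if deferred_count == 0:
        return "full"  # nothing to discover (MOD-03)
    try:
        if init is not None:
            init()
    except Exception:
        # Any setup failure degrades to full mode instead of failing the turn (MOD-04).
        logger.warning(
            "tool_search initialization failed; falling back to 'full' mode.",
            exc_info=True,
        )
        return "full"
    return "tool_search"
```

Only when this returns `"tool_search"` would the tool_search system prompt be appended, which matches the single post-resolution injection point described in the following commit.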
build_main_agent() was injecting TOOL_CALL_PROMPT_TOOL_SEARCH_MODE before mode resolution, while runner.reset() injected it again after resolution. This caused the prompt to appear twice when tool_search was active, and once on auto→full fallback (incorrect: it should be absent). Fix: build_main_agent() now uses the generic TOOL_CALL_PROMPT for tool_search/auto modes. The runner's post-resolution injection in reset() is the single correct injection point.
Sorry @w31r4, your pull request is larger than the review limit of 150000 diff characters
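Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request aims to significantly improve how large language models handle complex tasks by introducing a tool search mode. By letting the LLM dynamically discover and load the tools it needs, it addresses the core problems of context overload and inefficient tool selection in conventional tool integration. The change optimizes resource usage and, by aligning with the official designs of mainstream LLM platforms such as Claude and OpenAI, keeps the solution robust and forward-compatible, giving users a smarter and more flexible interaction experience.
Highlights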
Related Documentation
1 document(s) may need updating based on files changed in this PR:
AstrBotTeam's Space
Changes in pr4697
View Suggested Changes
@@ -34,6 +34,34 @@
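#### Configuration
A SubAgent is defined the same way as a Persona: the configuration file must specify tools, skills, name, description, and so on.
+
+**Tool calling mode configuration (tool_schema_mode)**
+
+The system supports the following four tool calling modes (`provider_settings.tool_schema_mode`):
+
+- **full**: full-parameter mode - sends the complete parameter definitions of all tools at once
+- **skills_like**: two-stage mode - sends tool names and descriptions first, then the full parameters once the LLM has chosen
+- **tool_search**: tool search mode - aligned with the official Claude/OpenAI tool search design; only core tools are sent, and the LLM discovers more tools on demand through the tool_search tool, addressing context bloat and declining selection accuracy as the number of tools grows
+- **auto**: automatic mode - chooses full or tool_search based on the number of tools (the threshold value; the default threshold is 25 tools)
+
+Recommendation: with more than 25 tools, enable tool_search or auto mode to optimize context usage and tool selection accuracy.
+
+**Tool search configuration (tool_search)**
+
+When using tool_search or auto mode, the `provider_settings.tool_search` configuration object customizes tool search behavior:
+
+- `threshold` (int, default 25): trigger threshold for tool search mode; tool search is triggered when the total number of tools exceeds this value
+- `max_results` (int, default 5): maximum number of matching tools returned per tool_search call
+- `always_loaded_tools` (list[str], default []): tools that are always loaded and therefore always visible to the LLM
+- `auto_always_load_builtin` (bool, default True): whether built-in tools (such as scheduled tasks and knowledge base queries) are automatically always loaded
+
+**Tool search architecture**
+
+tool_search mode uses a three-path architecture:
+
+- **Claude path**: sends the full tool catalog + `defer_loading: true`; search results are returned as `tool_reference` content blocks, and the tool parameters stay the same every turn to maximize prompt cache hits
+- **Generic path**: local BM25 emulation that physically filters the tool parameters (only core + tool_search + already discovered tools are sent), growing monotonically to keep the prefix stable
+- **Full/Skills-like modes**: completely unaffected
**Architecture refactor (PR #5722)**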
@@ -469,6 +497,20 @@
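#### Logic improvements
Tool registration and configuration loading have been optimized to ensure sub-agent configuration is correct and tools are registered dynamically. FunctionTool gains an `is_background_task` attribute to support asynchronous background tasks.
+
+**Tool calling modes (tool_schema_mode)**
+
+The system supports the following four tool calling modes:
+
+- **full**: full-parameter mode - sends the complete parameter definitions of all tools at once
+- **skills_like**: two-stage mode - sends tool names and descriptions first, then the full parameters once the LLM has chosen
+- **tool_search**: tool search mode - aligned with the official Claude/OpenAI tool search design, using a three-path architecture:
+ * **Claude path**: sends the full tool catalog + `defer_loading: true`; search results are returned as `tool_reference` content blocks, and the tool parameters stay the same every turn to maximize prompt cache hits
+ * **Generic path**: local BM25 emulation that physically filters the tool parameters (only core + tool_search + already discovered tools are sent), growing monotonically to keep the prefix stable
+ * **Full/Skills-like modes**: completely unaffected
+- **auto**: automatic mode - chooses full or tool_search based on the number of tools (the threshold value)
+
+tool_search mode is designed to address context bloat and declining tool selection accuracy as the number of tools grows, by letting the LLM discover tools on demand instead of loading all of them at once. tool_search or auto mode is recommended when there are more than 25 tools.
#### MCP client initialization (PR #5993)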
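As a quick illustration of the settings described in the suggested documentation changes above, the following sketch merges a user's `provider_settings` with the documented defaults. The key names and default values come from the diff; the helper name, the `"full"` fallback when `tool_schema_mode` is unset, and the error handling are assumptions.

```python
# Defaults as listed in the suggested documentation changes.
DEFAULT_TOOL_SEARCH = {
    "threshold": 25,
    "max_results": 5,
    "always_loaded_tools": [],
    "auto_always_load_builtin": True,
}
VALID_MODES = ("full", "skills_like", "tool_search", "auto")


def load_tool_search_settings(provider_settings):
    """Return (mode, tool_search_config) with defaults filled in."""
    # Falling back to "full" when the key is absent is an assumption.
    mode = provider_settings.get("tool_schema_mode", "full")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown tool_schema_mode: {mode!r}")
    overrides = provider_settings.get("tool_search") or {}
    config = {**DEFAULT_TOOL_SEARCH, **overrides}
    # Copy the list so callers never share the module-level default.
    config["always_loaded_tools"] = list(config["always_loaded_tools"])
    return mode, config


# Example: auto mode with a lower threshold and one pinned tool.
example_settings = {
    "tool_schema_mode": "auto",
    "tool_search": {"threshold": 15, "always_loaded_tools": ["web_search"]},
}
mode, config = load_tool_search_settings(example_settings)
```

Here `web_search` is a made-up tool name; any key omitted from `tool_search` keeps its documented default.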
Code Review
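This pull request introduces a tool search mode to address the context bloat and reduced selection accuracy that come with a growing number of tools. The design aligns with the official Claude and OpenAI tool search specifications and uses a three-path architecture: the Claude path, the generic path, and the Full/Skills-like modes. It adds the ToolCatalog, ToolSearchIndex, DiscoveryState, ToolsAssembler, ToolSearchTool, and ToolSearchStrategy modules and modifies existing files to integrate the new feature. Test coverage is good and the core design points are clear, ensuring a stable tool-parameter prefix, strategy rebuilding on provider switch, and reuse of already discovered tools. Overall, this is a well-thought-out and well-implemented feature.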
    except Exception:
        logger.warning(
            "tool_search initialization failed; falling back to 'full' mode.",
            exc_info=True,
        )
        effective_mode = "full"
        self._tool_search_catalog = None
        self._tool_search_index = None
        self._tool_search_discovery_state = None
        self._tool_search_strategy = None
faf411f to 0068960
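Implements the tool_search tool search mode, which lets the LLM discover and load tools on demand, addressing context bloat and declining selection accuracy as the number of tools grows.
The design aligns with the official Claude tool search and OpenAI tool search specifications and uses a three-path architecture:
- Claude path: sends the full tool catalog + `defer_loading: true`; search results are returned as `tool_reference` content blocks, and the tool parameters stay the same every turn to maximize prompt cache hits
- Generic path: local BM25 emulation that physically filters the tool parameters (only core + tool_search + already discovered tools are sent), growing monotonically to keep the prefix stable
- Full/Skills-like modes: completely unaffected
Modifications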
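New modules (`astrbot/core/tools/`):
- `tool_catalog.py`: immutable ToolCatalog that splits tools into two groups, core (built-in/HandoffTool/user-configured) and deferred
- `tool_search_index.py`: stateless BM25 search index (jieba + rank-bm25) covering tool names, descriptions, parameter names, and parameter descriptions
- `discovery_state.py`: session-level tracker of discovered tools (append-only), independent of message history and unaffected by context compression
- `tools_assembler.py`: stateless tool-parameter assembler that guarantees the stable order core → tool_search → discovered
- `tool_search_tool.py`: LLM-callable FunctionTool that returns structured JSON search results
- `strategy.py`: ToolSearchStrategy abstract base class (two-method interface)
- `generic_strategy.py`: generic-path strategy, physical filtering + JSON text results
- `claude_strategy.py`: Claude-path strategy, defer_loading + `tool_reference` content blocks + schema override
Modified files:
- `tool_loop_agent_runner.py`: replaces the temporary fallback with full tool_search initialization (provider detection → catalog construction → strategy selection → tool assembly → prompt injection); on provider fallback, rebuilds the strategy and reuses already discovered tools
- `astr_main_agent.py`: system prompt assembly for tool_search/auto modes (injected by the runner after mode resolution to avoid duplicate injection)
- `astr_main_agent_resources.py`: adds the `TOOL_CALL_PROMPT_TOOL_SEARCH_MODE` constant
- `tool.py`: supports schema override, the hook for Anthropic-specific serialization on the Claude path
- `anthropic_source.py`: converts tool_search results into `tool_reference` content blocks
- `config/default.py`: threshold default changed from 10 to 25 (to avoid accidental triggering by subagents); updated config descriptions
- `internal.py`: tool_search/auto kept as valid config values
Core design points: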
Screenshots or Test Results
Test coverage:
Checklist