feat(tools): dynamic tool schema filtering to reduce context waste and improve selection accuracy (#2020)#2026

Merged
bug-ops merged 2 commits into main from dynamic-tool-schema-filtering
Mar 20, 2026

Conversation

@bug-ops
Owner

@bug-ops bug-ops commented Mar 20, 2026

Summary

Implement dynamic tool schema filtering based on arXiv:2411.15399 ("Less is More: Better Reasoning for Language Models with Fewer Tools"). The filter selects which tool schemas are exposed to the LLM on a per-turn basis, reducing context waste and improving tool selection accuracy.

Implementation

  • New module: zeph-tools/src/schema_filter.rs with ToolSchemaFilter, InclusionReason enum, cosine similarity ranking
  • Config-driven: [agent.tool_filter] section with enabled, top_k, always_on, min_description_words
  • Ranking strategy: Top-K by similarity (default 6) + always-on tools + name mentions + short MCP descriptions
  • Dynamic filtering: Iteration 0 uses filtered set, iterations 1+ use full set (prevents tool starvation)
  • Graceful degradation: Embedding failure → all tools included
  • Startup: Embeds all tool descriptions at initialization via maybe_init_tool_schema_filter()
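
The similarity-ranking step above can be sketched roughly as follows. This is a minimal illustration, not the code in `schema_filter.rs`: the function `top_k_by_similarity`, its signature, and the choice to return 0.0 for mismatched or zero-norm vectors are assumptions made for the example.

```rust
/// Cosine similarity between two embeddings.
/// Assumption of this sketch: mismatched lengths or zero-norm inputs score 0.0.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() || a.is_empty() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Hypothetical helper: rank tools by similarity to the query embedding
/// and keep the ids of the `top_k` best matches.
pub fn top_k_by_similarity<'a>(
    query: &[f32],
    tools: &'a [(&'a str, Vec<f32>)],
    top_k: usize,
) -> Vec<&'a str> {
    let mut scored: Vec<(&str, f32)> = tools
        .iter()
        .map(|(id, emb)| (*id, cosine_similarity(query, emb)))
        .collect();
    // Highest score first; total_cmp gives a total order even if a score is NaN.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.into_iter().take(top_k).map(|(id, _)| id).collect()
}
```

Selecting by rank rather than by a score cutoff means the result size does not depend on the embedding model's score distribution, which matches the "no threshold tuning" rationale in the commit message.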

Token Savings

  • 30-60% reduction in tool schema tokens per request (tested at 20+ tools)
  • Savings grow with tool count: ~76% reduction at 50 tools
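
A back-of-the-envelope check of these figures, assuming every tool schema costs about the same number of tokens and that the filter exposes 12 tools (the default `top_k = 6` plus the six default `always_on` entries, with no overlap and no extra inclusions). The PR does not state how the numbers were measured; the helper `schema_token_reduction` is hypothetical.

```rust
/// Fraction of tool-schema tokens saved when only `exposed` of `total`
/// tools are sent, assuming uniform schema size (an assumption of this sketch).
pub fn schema_token_reduction(total: usize, exposed: usize) -> f64 {
    if total == 0 {
        return 0.0;
    }
    // The filter can never expose more tools than exist.
    let sent = exposed.min(total);
    1.0 - sent as f64 / total as f64
}
```

Under these assumptions, 12 of 50 tools gives 1 - 12/50 = 0.76 and 12 of 20 gives 0.40, consistent with the ranges quoted above. The number of dropped schemas grows linearly with tool count, while the percentage follows 1 - 12/N.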

Quality

  • ✅ 5956 tests pass (0 failures)
  • ✅ Clippy clean (zero warnings, --features full)
  • ✅ All validators approved (security, performance, testing, code review)
  • ✅ Spec-compliant: all 3 critical issues from architecture review resolved
  • ✅ Disabled by default (`enabled = false`); opt-in until empirically validated
  • ✅ Backward compatible: old configs work without `[agent.tool_filter]`

Related

Commits

  • `6ca9cd71` feat(tools): add dynamic tool schema filtering
  • `2f871977` fix(tools): address validation findings

Test Plan

  • Unit tests: 14 schema_filter + 8 cosine_similarity tests
  • Integration: context builder → native executor
  • Regression: no failures, feature disabled by default
  • Live: enable config, observe token reduction

All validators signed off. Ready to merge.

bug-ops added 2 commits March 20, 2026 12:01
Implement embedding-based tool schema filtering that reduces the number
of tool definitions sent to the LLM per turn. Only the most relevant
tools are selected based on cosine similarity between the user query
and pre-computed tool description embeddings.

Key design decisions:
- Single `compute_filtered_tool_ids()` call per turn in rebuild_system_prompt()
- Cached result consumed by native tool path (iteration 0 only, full set for 1+)
- Config-driven always-on classification (no ToolDef struct changes)
- Rank-based top-K selection (model-agnostic, no threshold tuning)
- Pre-filters: tool name string matching + short MCP description auto-include
- InclusionReason enum for observability (AlwaysOn/NameMentioned/SimilarityRank/ShortDescription)
- Disabled by default (enabled=false), opt-in via [agent.tool_filter]
- Prompt-path providers skip filtering (preserves stable cache block)
- Graceful degradation when embedding is unsupported or fails
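
The "iteration 0 only, full set for 1+" rule and the observability enum can be sketched as below. The variant names come from this commit message; `FilteredTools`, `tools_for_iteration`, and their signatures are hypothetical stand-ins, and a `None` cache models the cases where filtering is disabled or embedding failed.

```rust
use std::collections::BTreeMap;

/// Why a tool schema was kept for this turn (variant names from the PR).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InclusionReason {
    AlwaysOn,
    NameMentioned,
    SimilarityRank,
    ShortDescription,
}

/// Hypothetical per-turn cache: filtered tool ids and the reason each was kept.
pub struct FilteredTools {
    pub included: BTreeMap<String, InclusionReason>,
}

/// Iteration 0 sees only the filtered set; later iterations see every tool,
/// so a tool dropped by the filter is still reachable within the same turn.
pub fn tools_for_iteration<'a>(
    all_ids: &'a [String],
    filtered: Option<&FilteredTools>,
    iteration: usize,
) -> Vec<&'a str> {
    match filtered {
        // The filter ran and this is the first LLM call of the turn.
        Some(f) if iteration == 0 => all_ids
            .iter()
            .filter(|id| f.included.contains_key(id.as_str()))
            .map(String::as_str)
            .collect(),
        // No filter result, or a follow-up iteration: expose everything.
        _ => all_ids.iter().map(String::as_str).collect(),
    }
}
```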

Config section:
  [agent.tool_filter]
  enabled = false
  top_k = 6
  always_on = ["memory_search", "memory_save", "load_skill", "bash", "read", "edit"]
  min_description_words = 5
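
For orientation, a plain-Rust mirror of this section with the same defaults. `ToolFilterConfig` and `is_short_description` are hypothetical names for this sketch; the real config type and its deserialization are not shown in the PR text. Treating "short" as strictly fewer words than `min_description_words` is also an assumption.

```rust
/// Hypothetical mirror of `[agent.tool_filter]`; field names follow the config keys.
#[derive(Debug, Clone, PartialEq)]
pub struct ToolFilterConfig {
    pub enabled: bool,
    pub top_k: usize,
    pub always_on: Vec<String>,
    pub min_description_words: usize,
}

impl Default for ToolFilterConfig {
    fn default() -> Self {
        Self {
            // Opt-in: filtering stays off unless the section enables it.
            enabled: false,
            top_k: 6,
            always_on: ["memory_search", "memory_save", "load_skill", "bash", "read", "edit"]
                .iter()
                .map(|s| s.to_string())
                .collect(),
            min_description_words: 5,
        }
    }
}

impl ToolFilterConfig {
    /// Assumed rule: a description with fewer words than the minimum is too
    /// short to embed meaningfully, so the tool would be auto-included.
    pub fn is_short_description(&self, description: &str) -> bool {
        description.split_whitespace().count() < self.min_description_words
    }
}
```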

- Extract cosine_similarity to zeph-common::math (DRY: single canonical
  source used by zeph-tools and zeph-memory via re-export)
- Add maybe_init_tool_schema_filter() async builder and wire into all 3
  entry points (runner, daemon, acp) so the filter is actually initialized
- Switch find_mentioned_tool_ids to word-boundary-aware matching to prevent
  false positives (e.g. "read" no longer matches inside "thread")
- Add NoEmbedding fallback: tools without cached embeddings (e.g. MCP tools
  added after startup) are auto-included instead of silently dropped
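
The word-boundary fix can be illustrated with a small std-only sketch. `mentions_tool` and `mentioned_tool_ids` are hypothetical stand-ins for `find_mentioned_tool_ids`, whose actual signature is not shown here. Treating alphanumerics and `_` as word characters and matching case-sensitively are assumptions.

```rust
/// Characters that count as part of a word in this sketch.
fn is_word_char(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// True when `name` occurs in `text` with no word character directly
/// before or after it, so "read" matches "read the file" but not "thread".
pub fn mentions_tool(text: &str, name: &str) -> bool {
    if name.is_empty() {
        return false;
    }
    text.match_indices(name).any(|(start, matched)| {
        let before = text[..start].chars().next_back();
        let after = text[start + matched.len()..].chars().next();
        !before.map_or(false, is_word_char) && !after.map_or(false, is_word_char)
    })
}

/// Hypothetical pre-filter: tool ids that the user message names explicitly.
pub fn mentioned_tool_ids<'a>(text: &str, tool_ids: &[&'a str]) -> Vec<&'a str> {
    tool_ids
        .iter()
        .copied()
        .filter(|id| mentions_tool(text, id))
        .collect()
}
```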
@bug-ops bug-ops added labels Mar 20, 2026: tools (Tool execution and MCP integration), research (Research-driven improvement)
@github-actions github-actions bot added labels Mar 20, 2026: documentation (Improvements or additions to documentation), memory (zeph-memory crate (SQLite)), rust (Rust code changes), core (zeph-core crate), enhancement (New feature or request), size/XL (Extra large PR (500+ lines))
@bug-ops bug-ops merged commit 4e2f0f1 into main Mar 20, 2026
25 checks passed
@bug-ops bug-ops deleted the dynamic-tool-schema-filtering branch March 20, 2026 11:36

Development

Successfully merging this pull request may close these issues.

research(tools): dynamic tool schema filtering to reduce context waste and improve selection accuracy
