A Precision-driven Tool Recommendation (PTR) system for filtering MCP (Model Context Protocol) tools based on conversation context. Fetch only the tools relevant to the ongoing conversation, saving cost while increasing the precision of your LLM responses.
Developed by OppieAI
Watch the full explanation of how ToolsFilter works and its impact on LLM performance
- 🎥 Explainer Video
- Why?
- Solution
- Architecture
- Performance
- Quick Start
- Usage Example
- API Endpoints
- Development
- Configuration
- Documentation
- References
- License
Modern LLMs with access to large tool suites face a critical performance degradation issue: the more tools available, the lower the accuracy becomes. This phenomenon is well-documented in research and practical implementations:
Recent studies using MCPGauge evaluated six commercial LLMs with 30 MCP tool suites and revealed alarming findings:
- 9.5% accuracy drop on average when LLMs have automated access to MCP tools
- 3.25× to 236.5× increase in input token volume, creating massive computational overhead
- "Non-trivial friction" between retrieved context and the model's internal reasoning
- Models struggle with instruction compliance when too many tools are available
This accuracy degradation with increased tool count is demonstrated in this analysis video, showing how model performance deteriorates as more tools are introduced.
- 🚀 Multi-Stage Search Pipeline: Semantic + BM25 + Cross-Encoder + LTR ranking
- 🎯 High-Performance Results: Perfect P@1 and MRR across all search strategies
- 🧠 Learning-to-Rank: XGBoost model with 46+ engineered features (NDCG@10: 0.975)
- 🔧 OpenAI Function Calling Compatible: Flat tool structure following OpenAI specification
- ⚡ Multiple Embedding Providers: Voyage AI, OpenAI, Cohere with automatic fallback
- 💾 Intelligent Multi-Layer Caching: Redis for queries, results, and tool indices
- 🎯 Qdrant Vector Database: High-performance vector search with model-specific collections
- 📊 Comprehensive Evaluation: Built-in framework with F1, MRR, NDCG@k metrics
- 🔄 Message Format Compatibility: Claude and OpenAI conversation formats
- 📝 Collection Metadata Tracking: Model versioning and automatic dimension handling
- 🔁 Robust Fallback Mechanisms: Secondary embedding models and graceful degradation
Instead of overwhelming your LLM with 100+ tools, get precisely the 3-5 most relevant ones:
- Before: 236× token overhead, 9.5% accuracy loss
- After: 95%+ precision, perfect recall on relevant tools, minimal token usage
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ FastAPI App │────▶│ Message Parser │────▶│ Search Pipeline │
└────────┬────────┘ └──────────────────┘ └─────────┬───────┘
│ │
│ ┌─────────────────────────────┼─────────┐
│ │ │ │ │
┌────────▼────────┐ ┌──────▼─────┐ ┌─────────▼────────┐ │ ┌───────▼──────┐
│ Redis Cache │ │ Embedding │ │ Qdrant Vector │ │ │ LTR Reranker │
│ │ │ Service │ │ Database │ │ │ (XGBoost) │
│ • Query Cache │ │ (LiteLLM) │ │ │ │ │ │
│ • Results Cache │ │ • Voyage │ │ • Semantic Search│ │ │ • 46 Features│
│ • Tool Index │ │ • OpenAI │ │ • BM25 Hybrid │ │ │ • NDCG@10 Opt│
└─────────────────┘ │ • Fallback │ │ • Cross-Encoder │ │ │ │
└────────────┘ └──────────────────┘ │ └──────────────┘
│
┌───────────────────────┘
│
┌──────▼──────┐
│ Multi-Stage │
│ Filtering │
│ │
│ 1. Semantic │
│ 2. BM25 │
│ 3. Rerank │
│ 4. LTR │
└─────────────┘
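The multi-stage filtering shown above can be sketched in simplified form. This is an illustrative toy, not the project's implementation: real semantic scores come from embedding similarity in Qdrant, BM25 uses corpus statistics, and stages 3 and 4 apply a cross-encoder and an XGBoost LTR model rather than the stand-in scorers below.

```python
from collections import Counter
import math

def tokenize(text):
    return text.lower().split()

def semantic_score(query, doc):
    # Stand-in for embedding cosine similarity: normalized token overlap.
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / math.sqrt(len(q) * len(d)) if q and d else 0.0

def bm25_score(query, doc, k1=1.5, b=0.75, avgdl=10.0):
    # Simplified BM25 with a constant IDF of 1.0 per query term.
    counts = Counter(tokenize(doc))
    dl = sum(counts.values())
    score = 0.0
    for term in tokenize(query):
        tf = counts[term]
        score += (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * dl / avgdl))
    return score

def filter_tools(query, tools, top_k=3):
    # Stages 1+2: blend semantic and BM25 scores, then keep top_k.
    # Stages 3 (cross-encoder rerank) and 4 (LTR) are omitted here.
    scored = [
        (0.7 * semantic_score(query, desc) + 0.3 * bm25_score(query, desc), name)
        for name, desc in tools
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]

tools = [
    ("grep", "search for patterns in files"),
    ("find", "find files by name"),
    ("deploy", "deploy the application to production"),
]
print(filter_tools("search for python files", tools, top_k=2))  # ['grep', 'find']
```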
- semantic_only: Pure vector similarity search
- hybrid_basic: BM25 + semantic search combination
- hybrid_cross_encoder: + Cross-encoder reranking
- hybrid_ltr_full: + Learning-to-Rank optimization
Search Strategy Comparison (with 300+ genuine-API noise tools to resemble real-world conditions):
| Strategy | F1 Score | MRR | P@1 | NDCG@10 | Best For |
|---|---|---|---|---|---|
| hybrid_basic | 0.359 ⭐ | 1.000 | 1.000 | 0.975 ⭐ | General-purpose, balanced performance |
| semantic_only | 0.328 | 1.000 ⭐ | 1.000 ⭐ | 0.870 | Simple queries, exact matches |
| hybrid_cross_encoder | 0.359 | 1.000 | 1.000 | 0.964 | Complex queries requiring reranking |
| hybrid_ltr_full | 0.359 | 1.000 | 1.000 | 0.942 | Learning-based optimization |
⭐ = Best performer for that metric
Key Achievements:
- Perfect Precision@1: All strategies achieve 1.000 P@1
- Perfect MRR: All strategies achieve 1.000 Mean Reciprocal Rank
- Strong NDCG Performance: Up to 0.975 NDCG@10 with hybrid_basic
- Consistent F1 Scores: 0.328-0.359 across different approaches
Learning-to-Rank Training Results:
- Cross-Validation NDCG@10: 0.9167 ± 0.0567
- Training Data: 18,354 samples with 46 features
- Top Features: action_alignment (32.7%), query_type_analyze (33.9%), exact_name_match (19.5%)
- Training Speed: <5 seconds with XGBoost
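To illustrate what one of these engineered features might look like, here is a hypothetical sketch of an exact_name_match signal. The feature name comes from the list above, but this implementation is assumed; the real extraction logic in the training pipeline may differ.

```python
def exact_name_match(query: str, tool_name: str) -> float:
    """Hypothetical sketch of one LTR feature: 1.0 if the tool name
    appears as a whole token in the query, 0.5 for a partial (substring)
    match in either direction, else 0.0."""
    tokens = query.lower().split()
    name = tool_name.lower()
    if name in tokens:
        return 1.0
    if any(name in tok or tok in name for tok in tokens):
        return 0.5
    return 0.0

print(exact_name_match("use grep to search files", "grep"))   # 1.0
print(exact_name_match("find all python files", "file_find"))  # 0.5
```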
✅ Completed:
- Pre-index all tools on startup: implemented vector store caching
- Implement connection pooling: added Redis and Qdrant connection pooling
- Add batch embedding generation: optimized embedding pipeline
- Optimize vector search parameters: tuned similarity thresholds
🎯 In Progress:
- Improve LTR model with better class balancing
- Enhance feature engineering for interaction signals
- Optimize NDCG@5 performance for top-precision use cases
- Python 3.11+
- Docker and Docker Compose
- API keys for embedding providers (Voyage AI, OpenAI, or Cohere)
- Clone the repository:
git clone https://github.com/yourusername/ToolsFilter.git
cd ToolsFilter
- Create a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Copy environment variables:
cp .env.example .env
- Edit .env and add your API keys:
# Embedding Service Keys (at least one required)
VOYAGE_API_KEY=your_voyage_api_key
OPENAI_API_KEY=your_openai_api_key # Optional fallback
COHERE_API_KEY=your_cohere_api_key # Optional
# Important: Include provider prefix in model names
PRIMARY_EMBEDDING_MODEL=voyage/voyage-2
FALLBACK_EMBEDDING_MODEL=openai/text-embedding-3-small
# Start all services including the API
make up
# Or manually:
docker-compose up -d
# View logs
make logs
# Stop services
make down
# Start in development mode
make up-dev
# Or manually:
docker-compose -f docker-compose.yml -f docker-compose.dev.yml up
- Start only Qdrant and Redis:
docker-compose up -d qdrant redis
- Run the API:
python -m src.api.main
The API will be available at http://localhost:8000
Once running, visit:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
import requests
# Filter tools based on conversation
response = requests.post(
"http://localhost:8000/api/v1/tools/filter",
json={
"messages": [
{"role": "user", "content": "I need to search for Python files in the project"}
],
"available_tools": [
{
"type": "function",
"name": "grep",
"description": "Search for patterns in files",
"parameters": {
"type": "object",
"properties": {
"pattern": {"type": "string", "description": "Search pattern"}
},
"required": ["pattern"]
},
"strict": True
},
{
"type": "function",
"name": "find",
"description": "Find files by name",
"parameters": {
"type": "object",
"properties": {
"name": {"type": "string", "description": "File name pattern"}
},
"required": ["name"]
},
"strict": True
}
]
}
)
print(response.json())
# {
# "recommended_tools": [
# {"tool_name": "find", "confidence": 0.95},
# {"tool_name": "grep", "confidence": 0.85}
# ],
# "metadata": {"processing_time_ms": 42}
# }
- POST /api/v1/tools/filter - Filter tools based on conversation context
- GET /api/v1/tools/search - Search tools by text query
- POST /api/v1/tools/register - Register new tools (for batch indexing)
- GET /api/v1/tools/info - Get information about indexed tools
- GET /api/v1/collections - List all vector store collections with metadata
- GET /health - Health check endpoint
{
"recommended_tools": [
{
"tool_name": "find",
"confidence": 0.85,
"reasoning": "High relevance to file search operations"
}
],
"metadata": {
"processing_time_ms": 45.2,
"embedding_model": "voyage/voyage-2",
"total_tools_analyzed": 20,
"conversation_messages": 3,
"request_id": "uuid-here",
"conversation_patterns": ["file_search", "code_analysis"]
}
}
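A response like the one above can be post-processed on the client side, for example to keep only high-confidence recommendations before handing tool names back to the LLM. The field names follow the sample response; the threshold here is arbitrary.

```python
# Sample response body, shaped like the example above.
response_body = {
    "recommended_tools": [
        {"tool_name": "find", "confidence": 0.95},
        {"tool_name": "grep", "confidence": 0.85},
        {"tool_name": "deploy", "confidence": 0.40},
    ],
    "metadata": {"processing_time_ms": 45.2},
}

def select_tools(body, min_confidence=0.7):
    # Keep only recommendations at or above the confidence cutoff,
    # ordered by confidence, returning just the tool names.
    picks = [t for t in body["recommended_tools"] if t["confidence"] >= min_confidence]
    picks.sort(key=lambda t: t["confidence"], reverse=True)
    return [t["tool_name"] for t in picks]

print(select_tools(response_body))  # ['find', 'grep']
```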
# Run unit tests
pytest tests/ -v
# Run comprehensive evaluation with all strategies
docker exec ptr_api python -m src.evaluation.run_evaluation
# Run strategy comparison
docker exec ptr_api python -m src.evaluation.evaluation_framework.comparison
# Train LTR model
docker exec ptr_api python -m src.scripts.train_ltr
# Run ToolBench evaluation
docker exec ptr_api python -m src.evaluation.toolbench_evaluator
# Run simple API test
python test_api.py
Refer to the latest comparison report: evaluation_results/comparison_20250823_153715.markdown
Key findings:
- hybrid_basic strategy performs best overall (F1: 0.359, NDCG@10: 0.975)
- All strategies achieve perfect P@1 and MRR (1.000)
- LTR model shows consistent performance with cross-validation NDCG@10: 0.9167 ± 0.0567
# Linting
ruff check src/
# Type checking
mypy src/
# Formatting
black src/
# Start the load test UI
locust -f tests/load_test.py
Key configuration options in .env:
- PRIMARY_EMBEDDING_MODEL: Main embedding model (default: voyage-2)
- FALLBACK_EMBEDDING_MODEL: Fallback model (default: text-embedding-3-small)
- MAX_TOOLS_TO_RETURN: Maximum tools to return (default: 10)
- SIMILARITY_THRESHOLD: Minimum similarity score (default: 0.7)
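These settings are typically read once at startup. A minimal sketch of loading them with their documented defaults (the project likely uses a settings library, so treat the function and defaults shown here as illustrative; the provider-prefixed model names follow the .env example earlier):

```python
import os

def load_settings(env=os.environ):
    # Read configuration with the documented defaults.
    return {
        "primary_embedding_model": env.get("PRIMARY_EMBEDDING_MODEL", "voyage/voyage-2"),
        "fallback_embedding_model": env.get("FALLBACK_EMBEDDING_MODEL", "openai/text-embedding-3-small"),
        "max_tools_to_return": int(env.get("MAX_TOOLS_TO_RETURN", "10")),
        "similarity_threshold": float(env.get("SIMILARITY_THRESHOLD", "0.7")),
    }

settings = load_settings(env={})
print(settings["max_tools_to_return"], settings["similarity_threshold"])  # 10 0.7
```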
The system automatically creates model-specific collections to handle different embedding dimensions:
- Collections are named tools_<model_name> (e.g., tools_voyage_voyage_3)
- Each collection stores metadata including model name, dimension, and creation time
- Switching between models is seamless; the system will use the appropriate collection
- Use the /api/v1/collections endpoint to view all collections
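The naming scheme implied by the example (tools_voyage_voyage_3 for a voyage/voyage-3 model) can be sketched as a simple sanitization step; the service's actual naming logic may differ.

```python
import re

def collection_name(model_name: str) -> str:
    # Derive a collection name from an embedding model name by replacing
    # every run of non-alphanumeric characters with an underscore.
    return "tools_" + re.sub(r"[^0-9A-Za-z]+", "_", model_name)

print(collection_name("voyage/voyage-3"))  # tools_voyage_voyage_3
print(collection_name("openai/text-embedding-3-small"))
```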
Important: When changing embedding models, you'll need to re-index your tools as embeddings from different models are not compatible.
The system supports automatic fallback to a secondary embedding model when the primary model fails:
- Configure FALLBACK_EMBEDDING_MODEL in your .env file
- Separate vector store collections are maintained for each model
- On primary model failure (e.g., rate limits, API errors), requests automatically use the fallback
- The embedding_model field in responses indicates which model was used
- Both models must be properly configured with valid API keys
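A hedged sketch of this fallback behavior, with dummy embed callables standing in for the real provider calls (the actual service routes through LiteLLM and its own error types):

```python
class EmbeddingError(Exception):
    """Stand-in for a provider failure such as a rate limit or API error."""

def embed_with_fallback(text, primary, fallback):
    # Try the primary embedding callable; on failure, use the fallback.
    # Returns (vector, model_used) so callers can record which
    # model-specific collection the vector belongs to.
    try:
        return primary(text), "primary"
    except EmbeddingError:
        return fallback(text), "fallback"

# Dummy providers for illustration only.
def failing_primary(text):
    raise EmbeddingError("rate limited")

def working_fallback(text):
    return [0.0] * 8  # placeholder vector

vector, model_used = embed_with_fallback("search files", failing_primary, working_fallback)
print(model_used)  # fallback
```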
See the /documentation directory for additional documentation.
Inspired by PTR Paper
This project uses a dual licensing model:
- Non-Commercial Use: Free for research, education, and personal projects
- Commercial Use: Requires a separate commercial license
See LICENSE for full terms.
For commercial licensing, contact: sales@oppie.ai