An intelligent Model Context Protocol (MCP) server that automatically routes queries to the most suitable AI model based on task requirements, cost constraints, and performance characteristics.
- Intelligent Routing: Automatically analyzes queries to determine task type (coding, analysis, creative writing, etc.)
- Cost Optimization: Recommends models based on budget constraints and cost-per-token
- Performance Tiers: Supports premium, standard, fast, and budget model tiers
- Multi-Provider: Includes models from OpenAI, Anthropic, Google, and open-source options
- Flexible Priorities: Optimize for cost, performance, speed, or balanced approach
- Model Comparison: Side-by-side comparison of different models
- Cost Estimation: Calculate estimated costs before running queries
Latest models:

| Model | Provider | Tier | Cost/1K Tokens | Strengths | Vision | Functions |
|---|---|---|---|---|---|---|
| GPT-5 | OpenAI | Premium | $0.050 | Reasoning, coding, analysis, math, creative | ✅ | ✅ |
| Claude Opus 4.1 | Anthropic | Premium | $0.015 | Reasoning, analysis, creative, coding, math | ✅ | ✅ |
| Claude Sonnet 4.5 | Anthropic | Premium | $0.003 | Coding, reasoning, analysis, creative, chat | ✅ | ✅ |
| Gemini 2.5 Pro | Google | Premium | $0.00375 | Reasoning, coding, analysis, math, creative | ✅ | ✅ |
Earlier models:

| Model | Provider | Tier | Cost/1K Tokens | Strengths | Vision | Functions |
|---|---|---|---|---|---|---|
| GPT-4 | OpenAI | Premium | $0.030 | Reasoning, coding, analysis, math | ❌ | ✅ |
| GPT-3.5 Turbo | OpenAI | Fast | $0.002 | Chat, summarization, translation | ❌ | ✅ |
| Claude 3 Opus | Anthropic | Premium | $0.015 | Reasoning, analysis, creative, coding | ✅ | ❌ |
| Claude 3 Sonnet | Anthropic | Standard | $0.003 | Coding, analysis, chat | ✅ | ❌ |
| Claude 3 Haiku | Anthropic | Fast | $0.00025 | Chat, summarization, fast responses | ❌ | ❌ |
| Gemini Pro | Google | Standard | $0.00125 | Reasoning, coding, analysis | ❌ | ❌ |
| Llama 2 70B | Meta | Budget | $0.0008 | Chat, coding, summarization | ❌ | ❌ |
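To put these rates in perspective: a 10,000-token summarization job would cost roughly 10 × $0.00025 = $0.0025 on Claude 3 Haiku versus 10 × $0.030 = $0.30 on GPT-4, about 120 times more, which is why routing by task type and budget matters.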
- Install dependencies:

```bash
pip install -r requirements.txt
```

- Make the script executable:

```bash
chmod +x multi_model_orchestrator.py
```

Add to your `claude_desktop_config.json`:

```json
{
"mcpServers": {
"multi-model-orchestrator": {
"command": "python",
"args": [
"/path/to/multi_model_orchestrator.py"
]
}
}
}
```

Add to your MCP settings:

```json
{
"mcp.servers": {
"multi-model-orchestrator": {
"command": "python",
"args": ["/path/to/multi_model_orchestrator.py"]
}
}
}
```

Get AI model recommendations based on your query and requirements.
Parameters:
- `query` (required): The user query or task description
- `priority` (optional): What to optimize for - "balanced", "cost", "performance", or "speed" (default: "balanced")
- `max_cost_per_1k` (optional): Maximum acceptable cost per 1k tokens
Example:

```json
{
"query": "Write a complex Python function to optimize database queries",
"priority": "performance"
}
```

Response:

```json
{
"analysis": {
"task_type": "coding",
"estimated_tokens": 150,
"complexity": "high",
"requires_vision": false,
"requires_function_calling": false
},
"recommendation": {
"recommended_model": "claude-3-opus",
"provider": "Anthropic",
"tier": "premium",
"estimated_cost_per_1k": 0.015,
"strengths": ["reasoning", "analysis", "creative", "coding"],
"reason": "optimized for coding, premium tier performance",
"alternatives": [...]
}
}
```
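For reference, here is a minimal sketch of calling this tool programmatically with the official `mcp` Python SDK. The registered tool name `recommend_model` is an assumption, taken from the method name referenced later in this README:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    # Launch the orchestrator as a stdio subprocess (path is a placeholder)
    params = StdioServerParameters(
        command="python", args=["/path/to/multi_model_orchestrator.py"]
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # "recommend_model" is the assumed registered tool name
            result = await session.call_tool(
                "recommend_model",
                {
                    "query": "Write a complex Python function to optimize database queries",
                    "priority": "performance",
                },
            )
            print(result.content)


asyncio.run(main())
```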
Compare multiple AI models side by side.

Parameters:
- `models` (required): Array of model names to compare
Example:

```json
{
"models": ["gpt-4", "claude-3-opus", "claude-3-sonnet"]
}
```
Analyze a query without making a recommendation.

Parameters:
- `query` (required): The query to analyze
Example:

```json
{
"query": "Translate this document from English to Spanish"
}
```
Filter models by specific criteria.

Parameters:
- `task_type` (optional): Filter by task type
- `tier` (optional): Filter by performance tier
- `max_cost` (optional): Maximum cost per 1k tokens
- `requires_vision` (optional): Requires vision capabilities
Example:

```json
{
"task_type": "coding",
"max_cost": 0.01,
"tier": "standard"
}
```
Calculate the estimated cost for running a query.

Parameters:
- `model` (required): Model name
- `input_tokens` (required): Estimated input tokens
- `output_tokens` (required): Estimated output tokens
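The math is presumably the catalog's flat per-1K rate applied to the combined token count (note that real providers typically price input and output tokens differently). For the numbers in the example below:

```python
# Hypothetical cost math using the catalog's flat per-1K pricing
cost_per_1k = 0.003                       # claude-3-sonnet, from the table above
input_tokens, output_tokens = 500, 1000
estimated = (input_tokens + output_tokens) / 1000 * cost_per_1k
print(f"${estimated:.4f}")                # -> $0.0045
```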
Example:

```json
{
"model": "claude-3-sonnet",
"input_tokens": 500,
"output_tokens": 1000
}
```

Example routing outcomes:
```
# Query: "Summarize this article in 3 bullet points"
# Priority: cost
# Result: claude-3-haiku (lowest cost, optimized for summarization)
```

```
# Query: "Analyze this codebase and suggest architectural improvements"
# Priority: performance
# Result: gpt-4 or claude-3-opus (premium tier, strong reasoning)
```

```
# Query: "What's the weather like?"
# Priority: speed
# Result: gpt-3.5-turbo or claude-3-haiku (fast response)
```

```
# Query: "Write a blog post about AI"
# Priority: balanced
# max_cost_per_1k: 0.005
# Result: claude-3-sonnet or gemini-pro (within budget, good quality)
```

The orchestrator automatically detects task types (a sketch of the keyword matching follows the list):
- Coding: Keywords like "code", "function", "debug", "programming"
- Analysis: Keywords like "analyze", "compare", "evaluate"
- Creative: Keywords like "write", "story", "poem", "creative"
- Math: Keywords like "calculate", "math", "solve"
- Translation: Keywords like "translate", "translation"
- Summarization: Keywords like "summarize", "summary", "brief"
- Reasoning: Keywords like "reasoning", "logic", "explain why"
- Chat: Default for general conversation
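A minimal sketch of how such keyword matching might look; the keyword sets below mirror the list above, but the server's actual `analyze_query()` logic may differ:

```python
# Hypothetical keyword-based task detection mirroring the list above;
# the real analyze_query() may use different heuristics.
TASK_KEYWORDS = {
    "coding": ["code", "function", "debug", "programming"],
    "analysis": ["analyze", "compare", "evaluate"],
    "creative": ["write", "story", "poem", "creative"],
    "math": ["calculate", "math", "solve"],
    "translation": ["translate", "translation"],
    "summarization": ["summarize", "summary", "brief"],
    "reasoning": ["reasoning", "logic", "explain why"],
}


def detect_task_type(query: str) -> str:
    q = query.lower()
    for task, keywords in TASK_KEYWORDS.items():
        if any(kw in q for kw in keywords):
            return task
    return "chat"  # default for general conversation
```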
The server provides two resources:
- `models://catalog` - Complete model catalog with capabilities
- `models://routing-rules` - Current routing rules and logic
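Both can be read from a client session (same SDK as the earlier sketch):

```python
# Inside the ClientSession block from the earlier client sketch
catalog = await session.read_resource("models://catalog")
rules = await session.read_resource("models://routing-rules")
```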
Edit the `MODELS` dictionary in `multi_model_orchestrator.py`:

```python
MODELS = {
    "your-model-name": ModelInfo(
        name="your-model-name",
        provider="YourProvider",
        tier=ModelTier.STANDARD,
        cost_per_1k_tokens=0.005,
        strengths=["coding", "analysis"],
        max_tokens=8192,
        supports_vision=False,
        supports_function_calling=True,
    )
}
```
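The fields used above imply `ModelInfo` and `ModelTier` definitions roughly like the following (a sketch; the actual definitions live in `multi_model_orchestrator.py`):

```python
from dataclasses import dataclass
from enum import Enum


class ModelTier(Enum):
    PREMIUM = "premium"
    STANDARD = "standard"
    FAST = "fast"
    BUDGET = "budget"


@dataclass
class ModelInfo:
    name: str
    provider: str
    tier: ModelTier
    cost_per_1k_tokens: float
    strengths: list[str]
    max_tokens: int
    supports_vision: bool = False
    supports_function_calling: bool = False
```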
Modify the `recommend_model()` method to adjust scoring:

```python
# Increase weight for task type matching
if task_type.value in model_info.strengths:
    score += 50  # Adjust this value
```
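In context, the scoring pass might look roughly like this; only the task-match bonus comes from the snippet above, and everything else is illustrative guesswork:

```python
# Hypothetical shape of the scoring logic inside recommend_model();
# the real method may weigh additional factors.
def score_model(model_info, task_type, priority, max_cost_per_1k=None):
    if max_cost_per_1k is not None and model_info.cost_per_1k_tokens > max_cost_per_1k:
        return -1  # over budget: exclude from consideration
    score = 0
    if task_type.value in model_info.strengths:
        score += 50  # task-type match dominates the ranking
    if priority == "cost":
        # cheaper models score higher
        score += max(0.0, 30.0 - model_info.cost_per_1k_tokens * 1000)
    elif priority == "performance" and model_info.tier is ModelTier.PREMIUM:
        score += 30
    return score
```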
The overall architecture:

```
┌─────────────────────────────────────────────┐
│         MCP Client (Claude Desktop)         │
└──────────────────────┬──────────────────────┘
                       │
                       │ MCP Protocol
                       │
┌──────────────────────▼──────────────────────┐
│       Multi-Model Orchestrator Server       │
│                                             │
│  ┌───────────────────────────────────────┐  │
│  │         Query Analysis Engine         │  │
│  │  - Task type detection                │  │
│  │  - Complexity assessment              │  │
│  │  - Requirement extraction             │  │
│  └───────────────────────────────────────┘  │
│                                             │
│  ┌───────────────────────────────────────┐  │
│  │      Model Recommendation Engine      │  │
│  │  - Score-based selection              │  │
│  │  - Cost optimization                  │  │
│  │  - Performance matching               │  │
│  └───────────────────────────────────────┘  │
│                                             │
│  ┌───────────────────────────────────────┐  │
│  │            Model Database             │  │
│  │  - Capabilities                       │  │
│  │  - Costs                              │  │
│  │  - Performance tiers                  │  │
│  └───────────────────────────────────────┘  │
└─────────────────────────────────────────────┘
```
Possible future enhancements:

- Real-time cost tracking
- Usage analytics and reporting
- A/B testing between models
- Custom routing rules via configuration
- Integration with actual API providers
- Model performance benchmarking
- Historical query analysis
- Rate limiting support
- Multi-model ensemble responses
Test the server manually:

```bash
# Run the server
python multi_model_orchestrator.py

# In another terminal, test with MCP Inspector
npx @modelcontextprotocol/inspector python multi_model_orchestrator.py
```

If the server fails to start:

- Ensure Python 3.10+ is installed
- Check that all dependencies are installed: `pip install -r requirements.txt`
- Verify the script path in your configuration

If recommendations or tool calls are unexpected:
- Check that your query is being analyzed correctly
- Try different priority modes
- Verify max_cost constraints aren't too restrictive
- Ensure proper JSON format for parameters
- Check the MCP client logs for detailed error messages
To extend this MCP server:
- Add new models to the `MODELS` dictionary
- Enhance task type detection in `analyze_query()`
- Adjust scoring logic in `recommend_model()`
- Add new tools to handle additional use cases (see the sketch below)
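As a starting point, a new tool might look like this if the server were written with the SDK's FastMCP helper; the existing code may use the lower-level `Server` API instead, in which case the registration pattern differs:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("multi-model-orchestrator")


@mcp.tool()
def estimate_latency(model: str) -> str:
    """Hypothetical new tool: classify a model's typical response speed."""
    fast_models = {"gpt-3.5-turbo", "claude-3-haiku"}
    return "fast" if model in fast_models else "standard"
```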
MIT License - Feel free to use and modify for your needs.
Created as a demonstration of MCP server capabilities for intelligent model routing.
Note: This is a routing and recommendation tool. It does not actually call the AI model APIs. You would need to integrate with the respective provider SDKs to execute queries on the recommended models.