An intelligent Model Context Protocol (MCP) server that automatically routes queries to the most suitable AI model based on task requirements, cost constraints, and performance characteristics.
- Intelligent Routing: Automatically analyzes queries to determine task type (coding, analysis, creative writing, etc.)
- Cost Optimization: Recommends models based on budget constraints and cost-per-token
- Performance Tiers: Supports premium, standard, fast, and budget model tiers
- Multi-Provider: Includes models from OpenAI, Anthropic, Google, and open-source options
- Flexible Priorities: Optimize for cost, performance, speed, or balanced approach
- Model Comparison: Side-by-side comparison of different models
- Cost Estimation: Calculate estimated costs before running queries
Latest models:

| Model | Provider | Tier | Cost/1K Tokens | Strengths | Vision | Functions |
|---|---|---|---|---|---|---|
| GPT-5 | OpenAI | Premium | $0.050 | Reasoning, coding, analysis, math, creative | ✅ | ✅ |
| Claude Opus 4.1 | Anthropic | Premium | $0.015 | Reasoning, analysis, creative, coding, math | ✅ | ✅ |
| Claude Sonnet 4.5 | Anthropic | Premium | $0.003 | Coding, reasoning, analysis, creative, chat | ✅ | ✅ |
| Gemini 2.5 Pro | Google | Premium | $0.00375 | Reasoning, coding, analysis, math, creative | ✅ | ✅ |
Earlier models:

| Model | Provider | Tier | Cost/1K Tokens | Strengths | Vision | Functions |
|---|---|---|---|---|---|---|
| GPT-4 | OpenAI | Premium | $0.030 | Reasoning, coding, analysis, math | ❌ | ✅ |
| GPT-3.5 Turbo | OpenAI | Fast | $0.002 | Chat, summarization, translation | ❌ | ✅ |
| Claude 3 Opus | Anthropic | Premium | $0.015 | Reasoning, analysis, creative, coding | ✅ | ❌ |
| Claude 3 Sonnet | Anthropic | Standard | $0.003 | Coding, analysis, chat | ✅ | ❌ |
| Claude 3 Haiku | Anthropic | Fast | $0.00025 | Chat, summarization, fast responses | ❌ | ❌ |
| Gemini Pro | Google | Standard | $0.00125 | Reasoning, coding, analysis | ❌ | ❌ |
| Llama 2 70B | Meta | Budget | $0.0008 | Chat, coding, summarization | ❌ | ❌ |
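To put these rates in perspective: a 10,000-token summarization job would cost roughly 10 × $0.00025 = $0.0025 on Claude 3 Haiku versus 10 × $0.030 = $0.30 on GPT-4, about 120 times more, which is why routing by task type and budget matters.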
- Install dependencies:

```bash
pip install -r requirements.txt
```

- Make the script executable:

```bash
chmod +x multi_model_orchestrator.py
```

Add to your `claude_desktop_config.json`:

```json
{
"mcpServers": {
"multi-model-orchestrator": {
"command": "python",
"args": [
"/path/to/multi_model_orchestrator.py"
]
}
}
}
```

Add to your MCP settings:

```json
{
"mcp.servers": {
"multi-model-orchestrator": {
"command": "python",
"args": ["/path/to/multi_model_orchestrator.py"]
}
}
}
```

Get AI model recommendations based on your query and requirements.
Parameters:
- `query` (required): The user query or task description
- `priority` (optional): What to optimize for - "balanced", "cost", "performance", or "speed" (default: "balanced")
- `max_cost_per_1k` (optional): Maximum acceptable cost per 1k tokens
Example:

```json
{
"query": "Write a complex Python function to optimize database queries",
"priority": "performance"
}
```

Response:

```json
{
"analysis": {
"task_type": "coding",
"estimated_tokens": 150,
"complexity": "high",
"requires_vision": false,
"requires_function_calling": false
},
"recommendation": {
"recommended_model": "claude-3-opus",
"provider": "Anthropic",
"tier": "premium",
"estimated_cost_per_1k": 0.015,
"strengths": ["reasoning", "analysis", "creative", "coding"],
"reason": "optimized for coding, premium tier performance",
"alternatives": [...]
}
}
```
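For reference, here is a minimal sketch of calling this tool programmatically with the official `mcp` Python SDK. The registered tool name `recommend_model` is an assumption, taken from the method name referenced later in this README:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    # Launch the orchestrator as a stdio subprocess (path is a placeholder)
    params = StdioServerParameters(
        command="python", args=["/path/to/multi_model_orchestrator.py"]
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # "recommend_model" is the assumed registered tool name
            result = await session.call_tool(
                "recommend_model",
                {
                    "query": "Write a complex Python function to optimize database queries",
                    "priority": "performance",
                },
            )
            print(result.content)


asyncio.run(main())
```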
Compare multiple AI models side by side.

Parameters:
- `models` (required): Array of model names to compare
Example:

```json
{
"models": ["gpt-4", "claude-3-opus", "claude-3-sonnet"]
}
```
Analyze a query without making a recommendation.

Parameters:
- `query` (required): The query to analyze
Example:

```json
{
"query": "Translate this document from English to Spanish"
}
```
Filter models by specific criteria.

Parameters:
- `task_type` (optional): Filter by task type
- `tier` (optional): Filter by performance tier
- `max_cost` (optional): Maximum cost per 1k tokens
- `requires_vision` (optional): Requires vision capabilities
Example:

```json
{
"task_type": "coding",
"max_cost": 0.01,
"tier": "standard"
}
```
Calculate the estimated cost for running a query.

Parameters:
- `model` (required): Model name
- `input_tokens` (required): Estimated input tokens
- `output_tokens` (required): Estimated output tokens
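The math is presumably the catalog's flat per-1K rate applied to the combined token count (note that real providers typically price input and output tokens differently). For the numbers in the example below:

```python
# Hypothetical cost math using the catalog's flat per-1K pricing
cost_per_1k = 0.003                       # claude-3-sonnet, from the table above
input_tokens, output_tokens = 500, 1000
estimated = (input_tokens + output_tokens) / 1000 * cost_per_1k
print(f"${estimated:.4f}")                # -> $0.0045
```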
Example:

```json
{
"model": "claude-3-sonnet",
"input_tokens": 500,
"output_tokens": 1000
}
```

Example routing outcomes:
```
# Query: "Summarize this article in 3 bullet points"
# Priority: cost
# Result: claude-3-haiku (lowest cost, optimized for summarization)
```

```
# Query: "Analyze this codebase and suggest architectural improvements"
# Priority: performance
# Result: gpt-4 or claude-3-opus (premium tier, strong reasoning)
```

```
# Query: "What's the weather like?"
# Priority: speed
# Result: gpt-3.5-turbo or claude-3-haiku (fast response)
```

```
# Query: "Write a blog post about AI"
# Priority: balanced
# max_cost_per_1k: 0.005
# Result: claude-3-sonnet or gemini-pro (within budget, good quality)
```

The orchestrator automatically detects task types (a sketch of the keyword matching follows the list):
- Coding: Keywords like "code", "function", "debug", "programming"
- Analysis: Keywords like "analyze", "compare", "evaluate"
- Creative: Keywords like "write", "story", "poem", "creative"
- Math: Keywords like "calculate", "math", "solve"
- Translation: Keywords like "translate", "translation"
- Summarization: Keywords like "summarize", "summary", "brief"
- Reasoning: Keywords like "reasoning", "logic", "explain why"
- Chat: Default for general conversation
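A minimal sketch of how such keyword matching might look; the keyword sets below mirror the list above, but the server's actual `analyze_query()` logic may differ:

```python
# Hypothetical keyword-based task detection mirroring the list above;
# the real analyze_query() may use different heuristics.
TASK_KEYWORDS = {
    "coding": ["code", "function", "debug", "programming"],
    "analysis": ["analyze", "compare", "evaluate"],
    "creative": ["write", "story", "poem", "creative"],
    "math": ["calculate", "math", "solve"],
    "translation": ["translate", "translation"],
    "summarization": ["summarize", "summary", "brief"],
    "reasoning": ["reasoning", "logic", "explain why"],
}


def detect_task_type(query: str) -> str:
    q = query.lower()
    for task, keywords in TASK_KEYWORDS.items():
        if any(kw in q for kw in keywords):
            return task
    return "chat"  # default for general conversation
```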
The server provides two resources:
- `models://catalog` - Complete model catalog with capabilities
- `models://routing-rules` - Current routing rules and logic
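Both can be read from a client session (same SDK as the earlier sketch):

```python
# Inside the ClientSession block from the earlier client sketch
catalog = await session.read_resource("models://catalog")
rules = await session.read_resource("models://routing-rules")
```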
Edit the `MODELS` dictionary in `multi_model_orchestrator.py`:

```python
MODELS = {
    "your-model-name": ModelInfo(
        name="your-model-name",
        provider="YourProvider",
        tier=ModelTier.STANDARD,
        cost_per_1k_tokens=0.005,
        strengths=["coding", "analysis"],
        max_tokens=8192,
        supports_vision=False,
        supports_function_calling=True,
    )
}
```
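The fields used above imply `ModelInfo` and `ModelTier` definitions roughly like the following (a sketch; the actual definitions live in `multi_model_orchestrator.py`):

```python
from dataclasses import dataclass
from enum import Enum


class ModelTier(Enum):
    PREMIUM = "premium"
    STANDARD = "standard"
    FAST = "fast"
    BUDGET = "budget"


@dataclass
class ModelInfo:
    name: str
    provider: str
    tier: ModelTier
    cost_per_1k_tokens: float
    strengths: list[str]
    max_tokens: int
    supports_vision: bool = False
    supports_function_calling: bool = False
```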
Modify the `recommend_model()` method to adjust scoring:

```python
# Increase weight for task type matching
if task_type.value in model_info.strengths:
    score += 50  # Adjust this value
```
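In context, the scoring pass might look roughly like this; only the task-match bonus comes from the snippet above, and everything else is illustrative guesswork:

```python
# Hypothetical shape of the scoring logic inside recommend_model();
# the real method may weigh additional factors.
def score_model(model_info, task_type, priority, max_cost_per_1k=None):
    if max_cost_per_1k is not None and model_info.cost_per_1k_tokens > max_cost_per_1k:
        return -1  # over budget: exclude from consideration
    score = 0
    if task_type.value in model_info.strengths:
        score += 50  # task-type match dominates the ranking
    if priority == "cost":
        # cheaper models score higher
        score += max(0.0, 30.0 - model_info.cost_per_1k_tokens * 1000)
    elif priority == "performance" and model_info.tier is ModelTier.PREMIUM:
        score += 30
    return score
```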
The overall architecture:

```
┌─────────────────────────────────────────────┐
│         MCP Client (Claude Desktop)         │
└──────────────────────┬──────────────────────┘
                       │
                       │ MCP Protocol
                       │
┌──────────────────────▼──────────────────────┐
│       Multi-Model Orchestrator Server       │
│                                             │
│  ┌───────────────────────────────────────┐  │
│  │         Query Analysis Engine         │  │
│  │  - Task type detection                │  │
│  │  - Complexity assessment              │  │
│  │  - Requirement extraction             │  │
│  └───────────────────────────────────────┘  │
│                                             │
│  ┌───────────────────────────────────────┐  │
│  │      Model Recommendation Engine      │  │
│  │  - Score-based selection              │  │
│  │  - Cost optimization                  │  │
│  │  - Performance matching               │  │
│  └───────────────────────────────────────┘  │
│                                             │
│  ┌───────────────────────────────────────┐  │
│  │            Model Database             │  │
│  │  - Capabilities                       │  │
│  │  - Costs                              │  │
│  │  - Performance tiers                  │  │
│  └───────────────────────────────────────┘  │
└─────────────────────────────────────────────┘
```
Possible future enhancements:

- Real-time cost tracking
- Usage analytics and reporting
- A/B testing between models
- Custom routing rules via configuration
- Integration with actual API providers
- Model performance benchmarking
- Historical query analysis
- Rate limiting support
- Multi-model ensemble responses
Test the server manually:

```bash
# Run the server
python multi_model_orchestrator.py

# In another terminal, test with MCP Inspector
npx @modelcontextprotocol/inspector python multi_model_orchestrator.py
```

If the server fails to start:

- Ensure Python 3.10+ is installed
- Check that all dependencies are installed: `pip install -r requirements.txt`
- Verify the script path in your configuration

If recommendations or tool calls are unexpected:
- Check that your query is being analyzed correctly
- Try different priority modes
- Verify max_cost constraints aren't too restrictive
- Ensure proper JSON format for parameters
- Check the MCP client logs for detailed error messages
To extend this MCP server:
- Add new models to the `MODELS` dictionary
- Enhance task type detection in `analyze_query()`
- Adjust scoring logic in `recommend_model()`
- Add new tools to handle additional use cases (see the sketch below)
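As a starting point, a new tool might look like this if the server were written with the SDK's FastMCP helper; the existing code may use the lower-level `Server` API instead, in which case the registration pattern differs:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("multi-model-orchestrator")


@mcp.tool()
def estimate_latency(model: str) -> str:
    """Hypothetical new tool: classify a model's typical response speed."""
    fast_models = {"gpt-3.5-turbo", "claude-3-haiku"}
    return "fast" if model in fast_models else "standard"
```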
MIT License - Feel free to use and modify for your needs.
Created as a demonstration of MCP server capabilities for intelligent model routing.
Note: This is a routing and recommendation tool. It does not actually call the AI model APIs. You would need to integrate with the respective provider SDKs to execute queries on the recommended models.