Pakawat-Dev/multi_model_mcp

Multi-Model Orchestrator MCP Server

An intelligent Model Context Protocol (MCP) server that automatically routes queries to the most suitable AI model based on task requirements, cost constraints, and performance characteristics.

Features

  • Intelligent Routing: Automatically analyzes queries to determine task type (coding, analysis, creative writing, etc.)
  • Cost Optimization: Recommends models based on budget constraints and cost-per-token
  • Performance Tiers: Supports premium, standard, fast, and budget model tiers
  • Multi-Provider: Includes models from OpenAI, Anthropic, Google, and open-source options
  • Flexible Priorities: Optimize for cost, performance, speed, or balanced approach
  • Model Comparison: Side-by-side comparison of different models
  • Cost Estimation: Calculate estimated costs before running queries

Supported Models

Latest Generation Models (2024-2025)

| Model | Provider | Tier | Cost/1K Tokens | Strengths | Vision | Functions |
|-------|----------|------|----------------|-----------|--------|-----------|
| GPT-5 | OpenAI | Premium | $0.050 | Reasoning, coding, analysis, math, creative | | |
| Claude Opus 4.1 | Anthropic | Premium | $0.015 | Reasoning, analysis, creative, coding, math | | |
| Claude Sonnet 4.5 | Anthropic | Premium | $0.003 | Coding, reasoning, analysis, creative, chat | | |
| Gemini 2.5 Pro | Google | Premium | $0.00375 | Reasoning, coding, analysis, math, creative | | |

Previous Generation Models

| Model | Provider | Tier | Cost/1K Tokens | Strengths | Vision | Functions |
|-------|----------|------|----------------|-----------|--------|-----------|
| GPT-4 | OpenAI | Premium | $0.030 | Reasoning, coding, analysis, math | | |
| GPT-3.5 Turbo | OpenAI | Fast | $0.002 | Chat, summarization, translation | | |
| Claude 3 Opus | Anthropic | Premium | $0.015 | Reasoning, analysis, creative, coding | | |
| Claude 3 Sonnet | Anthropic | Standard | $0.003 | Coding, analysis, chat | | |
| Claude 3 Haiku | Anthropic | Fast | $0.00025 | Chat, summarization, fast responses | | |
| Gemini Pro | Google | Standard | $0.00125 | Reasoning, coding, analysis | | |
| Llama 2 70B | Meta | Budget | $0.0008 | Chat, coding, summarization | | |

Installation

  1. Install dependencies:

     pip install -r requirements.txt

  2. Make the script executable:

     chmod +x multi_model_orchestrator.py
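The contents of requirements.txt are not shown in this README; assuming the server is built on the official MCP Python SDK, the file would contain at least:

```
mcp
```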

Configuration

Claude Desktop Configuration

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "multi-model-orchestrator": {
      "command": "python",
      "args": [
        "/path/to/multi_model_orchestrator.py"
      ]
    }
  }
}

VS Code Configuration

Add to your MCP settings:

{
  "mcp.servers": {
    "multi-model-orchestrator": {
      "command": "python",
      "args": ["/path/to/multi_model_orchestrator.py"]
    }
  }
}

Available Tools

1. recommend_model

Get AI model recommendations based on your query and requirements.

Parameters:

  • query (required): The user query or task description
  • priority (optional): What to optimize for - "balanced", "cost", "performance", or "speed" (default: "balanced")
  • max_cost_per_1k (optional): Maximum acceptable cost per 1k tokens

Example:

{
  "query": "Write a complex Python function to optimize database queries",
  "priority": "performance"
}

Response:

{
  "analysis": {
    "task_type": "coding",
    "estimated_tokens": 150,
    "complexity": "high",
    "requires_vision": false,
    "requires_function_calling": false
  },
  "recommendation": {
    "recommended_model": "claude-3-opus",
    "provider": "Anthropic",
    "tier": "premium",
    "estimated_cost_per_1k": 0.015,
    "strengths": ["reasoning", "analysis", "creative", "coding"],
    "reason": "optimized for coding, premium tier performance",
    "alternatives": [...]
  }
}

2. compare_models

Compare multiple AI models side by side.

Parameters:

  • models (required): Array of model names to compare

Example:

{
  "models": ["gpt-4", "claude-3-opus", "claude-3-sonnet"]
}

3. analyze_task

Analyze a query without making a recommendation.

Parameters:

  • query (required): The query to analyze

Example:

{
  "query": "Translate this document from English to Spanish"
}

4. list_models_by_criteria

Filter models by specific criteria.

Parameters:

  • task_type (optional): Filter by task type
  • tier (optional): Filter by performance tier
  • max_cost (optional): Maximum cost per 1k tokens
  • requires_vision (optional): Requires vision capabilities

Example:

{
  "task_type": "coding",
  "max_cost": 0.01,
  "tier": "standard"
}
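Conceptually, this tool applies each supplied criterion as a filter over the model catalog. The sketch below is illustrative, not the actual implementation; it uses plain dicts with field names borrowed from the ModelInfo example in the Customization section:

```python
# Illustrative filter over a plain-dict model catalog; the real server
# stores ModelInfo objects, but the filtering logic is the same idea.
def filter_models(models, task_type=None, tier=None, max_cost=None, requires_vision=None):
    matches = []
    for name, m in models.items():
        if task_type and task_type not in m["strengths"]:
            continue  # model is not strong at the requested task
        if tier and m["tier"] != tier:
            continue  # wrong performance tier
        if max_cost is not None and m["cost_per_1k_tokens"] > max_cost:
            continue  # over budget
        if requires_vision and not m["supports_vision"]:
            continue  # lacks a required capability
        matches.append(name)
    return matches

# Sample catalog (values for illustration only)
catalog = {
    "claude-3-sonnet": {"strengths": ["coding", "analysis", "chat"], "tier": "standard",
                        "cost_per_1k_tokens": 0.003, "supports_vision": True},
    "gpt-4": {"strengths": ["reasoning", "coding"], "tier": "premium",
              "cost_per_1k_tokens": 0.030, "supports_vision": False},
}
print(filter_models(catalog, task_type="coding", max_cost=0.01, tier="standard"))
```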

5. estimate_cost

Calculate the estimated cost for running a query.

Parameters:

  • model (required): Model name
  • input_tokens (required): Estimated input tokens
  • output_tokens (required): Estimated output tokens

Example:

{
  "model": "claude-3-sonnet",
  "input_tokens": 500,
  "output_tokens": 1000
}
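The arithmetic behind this example is straightforward. Assuming the single blended rate per 1k tokens listed in the model table (real providers usually price input and output tokens separately), the estimate works out like this:

```python
# Worked example of the cost arithmetic estimate_cost performs.
# Assumes one blended per-1k-token rate, as in the model table above.
COST_PER_1K = {"claude-3-sonnet": 0.003}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for a query."""
    rate = COST_PER_1K[model]
    return (input_tokens + output_tokens) / 1000 * rate

# 500 input + 1000 output tokens at $0.003/1k = 1.5 * $0.003 = $0.0045
print(estimate_cost("claude-3-sonnet", 500, 1000))
```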

Usage Examples

Example 1: Cost-Optimized Query

# Query: "Summarize this article in 3 bullet points"
# Priority: cost
# Result: claude-3-haiku (lowest cost, optimized for summarization)

Example 2: Performance-Optimized Complex Task

# Query: "Analyze this codebase and suggest architectural improvements"
# Priority: performance
# Result: gpt-4 or claude-3-opus (premium tier, strong reasoning)

Example 3: Speed-Optimized Simple Chat

# Query: "What's the weather like?"
# Priority: speed
# Result: gpt-3.5-turbo or claude-3-haiku (fast response)

Example 4: Budget Constraint

# Query: "Write a blog post about AI"
# Priority: balanced
# max_cost_per_1k: 0.005
# Result: claude-3-sonnet or gemini-pro (within budget, good quality)

Task Type Detection

The orchestrator automatically detects task types:

  • Coding: Keywords like "code", "function", "debug", "programming"
  • Analysis: Keywords like "analyze", "compare", "evaluate"
  • Creative: Keywords like "write", "story", "poem", "creative"
  • Math: Keywords like "calculate", "math", "solve"
  • Translation: Keywords like "translate", "translation"
  • Summarization: Keywords like "summarize", "summary", "brief"
  • Reasoning: Keywords like "reasoning", "logic", "explain why"
  • Chat: Default for general conversation
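The keyword matching above can be sketched as a simple lookup table with a "chat" fallback. This is a hypothetical simplification; the actual analyze_query() implementation may weigh or order keywords differently:

```python
# Keyword-based task detection sketch; first matching category wins,
# and anything unmatched falls through to "chat".
TASK_KEYWORDS = {
    "coding": ["code", "function", "debug", "programming"],
    "analysis": ["analyze", "compare", "evaluate"],
    "creative": ["write", "story", "poem", "creative"],
    "math": ["calculate", "math", "solve"],
    "translation": ["translate", "translation"],
    "summarization": ["summarize", "summary", "brief"],
    "reasoning": ["reasoning", "logic", "explain why"],
}

def detect_task_type(query: str) -> str:
    q = query.lower()
    for task, keywords in TASK_KEYWORDS.items():
        if any(kw in q for kw in keywords):
            return task
    return "chat"  # default for general conversation
```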

Resources

The server provides two resources:

  1. models://catalog - Complete model catalog with capabilities
  2. models://routing-rules - Current routing rules and logic

Customization

Adding New Models

Edit the MODELS dictionary in multi_model_orchestrator.py:

MODELS = {
    "your-model-name": ModelInfo(
        name="your-model-name",
        provider="YourProvider",
        tier=ModelTier.STANDARD,
        cost_per_1k_tokens=0.005,
        strengths=["coding", "analysis"],
        max_tokens=8192,
        supports_vision=False,
        supports_function_calling=True
    )
}

Adjusting Routing Logic

Modify the recommend_model() method to adjust scoring:

# Increase weight for task type matching
if task_type.value in model_info.strengths:
    score += 50  # Adjust this value
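For orientation, a stripped-down version of such a scoring loop might look like the following. This is illustrative only; the real recommend_model() also factors in the priority mode, budget constraints, and capability requirements:

```python
# Simplified score-based selection: reward a task-type match, penalize cost.
def score_models(task_type: str, models: dict) -> str:
    best_name, best_score = None, float("-inf")
    for name, info in models.items():
        score = 0.0
        if task_type in info["strengths"]:
            score += 50                              # task-type match (tunable weight)
        score -= info["cost_per_1k_tokens"] * 1000   # cheaper models score higher
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Sample data for illustration
models = {
    "claude-3-haiku": {"strengths": ["chat", "summarization"], "cost_per_1k_tokens": 0.00025},
    "gpt-4": {"strengths": ["reasoning", "coding"], "cost_per_1k_tokens": 0.030},
}
print(score_models("summarization", models))
```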

Architecture

┌─────────────────────────────────────────────────┐
│           MCP Client (Claude Desktop)           │
└────────────────────┬────────────────────────────┘
                     │
                     │ MCP Protocol
                     │
┌────────────────────▼────────────────────────────┐
│         Multi-Model Orchestrator Server         │
│                                                  │
│  ┌────────────────────────────────────────┐    │
│  │       Query Analysis Engine             │    │
│  │  - Task type detection                  │    │
│  │  - Complexity assessment                │    │
│  │  - Requirement extraction               │    │
│  └────────────────────────────────────────┘    │
│                                                  │
│  ┌────────────────────────────────────────┐    │
│  │       Model Recommendation Engine       │    │
│  │  - Score-based selection                │    │
│  │  - Cost optimization                    │    │
│  │  - Performance matching                 │    │
│  └────────────────────────────────────────┘    │
│                                                  │
│  ┌────────────────────────────────────────┐    │
│  │          Model Database                 │    │
│  │  - Capabilities                         │    │
│  │  - Costs                                │    │
│  │  - Performance tiers                    │    │
│  └────────────────────────────────────────┘    │
└─────────────────────────────────────────────────┘

Future Enhancements

  • Real-time cost tracking
  • Usage analytics and reporting
  • A/B testing between models
  • Custom routing rules via configuration
  • Integration with actual API providers
  • Model performance benchmarking
  • Historical query analysis
  • Rate limiting support
  • Multi-model ensemble responses

Testing

Test the server manually:

# Run the server
python multi_model_orchestrator.py

# In another terminal, test with MCP Inspector
npx @modelcontextprotocol/inspector python multi_model_orchestrator.py

Troubleshooting

Server won't start

  • Ensure Python 3.10+ is installed
  • Check that all dependencies are installed: pip install -r requirements.txt
  • Verify the script path in your configuration

No models recommended

  • Check that your query is being analyzed correctly
  • Try different priority modes
  • Verify max_cost constraints aren't too restrictive

Tool calls failing

  • Ensure proper JSON format for parameters
  • Check the MCP client logs for detailed error messages

Contributing

To extend this MCP server:

  1. Add new models to the MODELS dictionary
  2. Enhance task type detection in analyze_query()
  3. Adjust scoring logic in recommend_model()
  4. Add new tools to handle additional use cases

License

MIT License - Feel free to use and modify for your needs.

Author

Created as a demonstration of MCP server capabilities for intelligent model routing.


Note: This is a routing and recommendation tool. It does not actually call the AI model APIs. You would need to integrate with the respective provider SDKs to execute queries on the recommended models.
