
Thormatt/Delphi


Delphi - Multi-model AI consensus

AI Multi-Model Consensus MCP Server

Complex questions deserve multiple perspectives.
Delphi queries diverse AI models, synthesizes their views, and surfaces genuine consensus.


License: MIT | Node >= 18 | TypeScript 5.0 | OpenRouter powered


The Problem

Complex questions rarely have simple answers. A single AI model gives you one perspective shaped by its training data and architecture. For nuanced topics—technical trade-offs, research questions, multi-faceted decisions—one viewpoint isn't enough.

The insight: Different AI models reason differently. When multiple models independently arrive at the same conclusion, your confidence increases. When they disagree, you've found genuine complexity worth exploring.

Delphi is not a truth oracle—it's a structured deliberation workflow that surfaces consensus and fault lines.


Requirements

  • Node.js 18+
  • API key for one of the supported gateways (see below)
  • Claude Desktop or any MCP-compatible client

Supported Gateways

Delphi works with any OpenAI-compatible API:

Gateway | Get API Key | Notes
OpenRouter (default) | openrouter.ai/keys | Widest model selection
Together.ai | together.ai | Open-source focus
Fireworks.ai | fireworks.ai | Fast inference
Groq | groq.com | Ultra-fast, limited models
Deepinfra | deepinfra.com | Budget alternative
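Because every gateway in the table speaks the same OpenAI-style protocol, switching providers only changes the base URL and key. The sketch below shows that shared request shape; it is not taken from Delphi's source. It assumes the standard POST {base URL}/chat/completions endpoint with Bearer authentication, and the helper name buildChatRequest and the model ID are placeholders.

```typescript
// Sketch of an OpenAI-compatible chat completion request (not Delphi's code).
// Assumes the standard POST {baseUrl}/chat/completions endpoint.

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds (but does not send) a request for one panelist model.
function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  messages: ChatMessage[],
): ChatRequest {
  // Drop trailing slashes so ".../v1/" and ".../v1" produce the same URL.
  const root = baseUrl.replace(/\/+$/, "");
  return {
    url: `${root}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// The same shape works for any gateway in the table; only the base URL changes.
const req = buildChatRequest(
  "https://api.groq.com/openai/v1/",
  "your-api-key",
  "some-model-id", // placeholder model ID
  [{ role: "user", content: "Monolith or microservices?" }],
);
console.log(req.url); // https://api.groq.com/openai/v1/chat/completions
// To send it on Node 18+ (global fetch): await fetch(req.url, req.init)
```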

Quick Start

Add to your MCP client config:

Claude Desktop: ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows)

{
  "mcpServers": {
    "delphi": {
      "command": "npx",
      "args": ["-y", "delphi-mcp"],
      "env": {
        "DELPHI_API_KEY": "your-api-key"
      }
    }
  }
}

Restart your MCP client. Done.

Using a different gateway

Set DELPHI_BASE_URL to use Together, Fireworks, Groq, or another OpenAI-compatible gateway:

{
  "mcpServers": {
    "delphi": {
      "command": "npx",
      "args": ["-y", "delphi-mcp"],
      "env": {
        "DELPHI_API_KEY": "your-api-key",
        "DELPHI_BASE_URL": "https://api.together.xyz/v1"
      }
    }
  }
}

Gateway | Base URL
OpenRouter (default) | https://openrouter.ai/api/v1
Together.ai | https://api.together.xyz/v1
Fireworks.ai | https://api.fireworks.ai/inference/v1
Groq | https://api.groq.com/openai/v1
Deepinfra | https://api.deepinfra.com/v1/openai
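A minimal sketch of how a server could resolve the two documented variables, DELPHI_API_KEY and DELPHI_BASE_URL, falling back to the OpenRouter default when no base URL is set. The function name and the choice to throw on a missing key are assumptions, not Delphi's actual code.

```typescript
// Hypothetical config resolver for the documented environment variables.
const DEFAULT_BASE_URL = "https://openrouter.ai/api/v1"; // OpenRouter default from the table

interface GatewayConfig {
  apiKey: string;
  baseUrl: string;
}

function resolveGatewayConfig(env: Record<string, string | undefined>): GatewayConfig {
  const apiKey = env.DELPHI_API_KEY?.trim();
  if (!apiKey) {
    throw new Error("DELPHI_API_KEY is not set");
  }
  // An unset or empty DELPHI_BASE_URL falls back to OpenRouter.
  const baseUrl = env.DELPHI_BASE_URL?.trim() || DEFAULT_BASE_URL;
  return { apiKey, baseUrl };
}

// Mirrors the Together.ai config example; a real server would pass process.env.
const cfg = resolveGatewayConfig({
  DELPHI_API_KEY: "your-api-key",
  DELPHI_BASE_URL: "https://api.together.xyz/v1",
});
console.log(cfg.baseUrl); // https://api.together.xyz/v1
```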

Global install

npm install -g delphi-mcp

Then reference the installed binary directly in your MCP client config:

{
  "mcpServers": {
    "delphi": {
      "command": "delphi-mcp",
      "env": { "DELPHI_API_KEY": "your-api-key" }
    }
  }
}

How It Works

[Figure: Delphi process flow]

  1. Round 1: Independent responses — Each model answers without seeing others (true independence)
  2. Round 2+: Deliberation — Models see the synthesis and can revise, challenge, or hold position
  3. Convergence detection — Stops when 85% agreement is reached
  4. Claim analysis — Single-source claims get flagged for verification
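The four steps above can be sketched as a loop. This is a simplified illustration, not Delphi's implementation: it assumes each panelist returns a short position label, measures agreement as the share of panelists holding the most common position (a crude stand-in for Delphi's own score), and passes a plain summary string as the synthesis. Panelist, deliberate, and agreementOf are hypothetical names.

```typescript
// Simplified sketch of the Delphi round loop; not Delphi's actual code.
// Assumption: a panelist is any function that returns a short position label.
type Panelist = (question: string, synthesis: string | null) => string;

interface DeliberationResult {
  rounds: number;
  agreement: number; // share of panelists on the most common position, 0..1
  converged: boolean;
  positions: string[];
}

const CONVERGENCE_THRESHOLD = 0.85;

// Crude agreement score: fraction of panelists holding the most common position.
function agreementOf(positions: string[]): number {
  const counts: Record<string, number> = Object.create(null); // no prototype keys
  let top = 0;
  for (const p of positions) {
    const next = (counts[p] ?? 0) + 1;
    counts[p] = next;
    top = Math.max(top, next);
  }
  return positions.length === 0 ? 0 : top / positions.length;
}

function deliberate(question: string, panel: Panelist[], maxRounds: number): DeliberationResult {
  let synthesis: string | null = null; // round 1: nobody sees anyone else's answer
  let positions: string[] = [];
  let agreement = 0;
  for (let round = 1; round <= maxRounds; round++) {
    positions = panel.map((ask) => ask(question, synthesis));
    agreement = agreementOf(positions);
    if (agreement >= CONVERGENCE_THRESHOLD) {
      return { rounds: round, agreement, converged: true, positions };
    }
    // Rounds 2+: every panelist sees a synthesis of the previous round.
    synthesis = positions.join("; ");
  }
  return { rounds: maxRounds, agreement, converged: false, positions };
}

// Toy panel: three hold "modular monolith"; one moves after seeing a synthesis.
const stubborn: Panelist = () => "modular monolith";
const persuadable: Panelist = (_question, synthesis) =>
  synthesis === null ? "microservices" : "modular monolith";
const outcome = deliberate("Monolith or microservices?", [stubborn, stubborn, stubborn, persuadable], 4);
console.log(outcome.rounds, outcome.agreement, outcome.converged); // 2 1 true
```

With this crude metric, a four- or five-model panel can only clear 85% by reaching unanimity (3/4 = 75%, 4/5 = 80%), so a real scorer presumably weighs partial agreement more finely.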

[Figure: Hallucination detection]


Example Output

Question: Should we use microservices or a monolith for a new e-commerce platform?

Consensus (87% agreement after 3 rounds)

Round 1 — Initial Positions:

Model | Position
Claude Opus 4 | Monolith first, extract services later
GPT-4o | Microservices for scalability from day one
Gemini 2.0 | Depends on team size and experience
DeepSeek V3 | Modular monolith as middle ground

Round 2 — After seeing each other's reasoning:

  • GPT-4o revised: "Agreed that premature microservices add complexity. Team size matters."
  • Claude maintained position but acknowledged: "Microservices make sense if team is 50+ engineers"
  • All models converged on team size as the key factor

Round 3 — Final Synthesis:

Claim | Strength | Agreement
Start with monolith for teams < 20 engineers | unanimous | 5/5
Modular boundaries enable future extraction | unanimous | 5/5
Microservices add 3-5x operational overhead | strong | 4/5
Extract services only when team/traffic demands | strong | 4/5
Kubernetes required for microservices | disputed | 2/5

Key Disagreement Surfaced:

"Kubernetes required for microservices" — Claude and DeepSeek disagreed, noting alternatives like ECS, Nomad, or even simple VM deployments. This flags an area where the "conventional wisdom" may be overconfident.

Control Drift: 45% — A single model would have given a more opinionated answer without surfacing the team-size nuance or the Kubernetes debate.
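The strength labels in the Round 3 table can be read as a function of how many panelists endorse each claim. The sketch below assumes a 0.75 share as the cut-off for "strong"; that number is a guess chosen so the table's rows classify as shown, not a documented Delphi threshold. The needsVerification flag mirrors the claim-analysis step, which flags single-source claims.

```typescript
// Hypothetical mapping from vote counts to claim-strength labels.
type Strength = "unanimous" | "strong" | "disputed";

interface ClaimTally {
  claim: string;
  supporters: number; // models that endorse the claim
  panelSize: number;
}

interface ClaimVerdict extends ClaimTally {
  strength: Strength;
  needsVerification: boolean; // true when only one model backs the claim
}

const STRONG_CUTOFF = 0.75; // assumption, not a documented threshold

function classifyClaim(t: ClaimTally): ClaimVerdict {
  const share = t.supporters / t.panelSize;
  const strength: Strength =
    t.supporters === t.panelSize ? "unanimous" : share >= STRONG_CUTOFF ? "strong" : "disputed";
  return { ...t, strength, needsVerification: t.supporters === 1 };
}

const verdicts = [
  { claim: "Modular boundaries enable future extraction", supporters: 5, panelSize: 5 },
  { claim: "Microservices add 3-5x operational overhead", supporters: 4, panelSize: 5 },
  { claim: "Kubernetes required for microservices", supporters: 2, panelSize: 5 },
].map(classifyClaim);

for (const v of verdicts) {
  console.log(`${v.claim}: ${v.strength} (${v.supporters}/${v.panelSize})`);
}
```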


Presets

Preset | Tier | Rounds | Grounding | Cost | Use Case
quick | fast | 2 | off | ~$0.04 | Quick checks
balanced | standard | 4 | off | ~$0.20 | General queries
research | premium | 6 | on | ~$0.50 | Deep analysis
factcheck | standard | 3 | on | ~$0.25 | Verify claims
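The presets bundle a few settings. A hypothetical way to model them in TypeScript is shown below. The field names are illustrative rather than Delphi's schema, "rounds" is treated as an upper bound because deliberation can stop early on convergence (an assumption), and resolvePreset is an invented helper that applies optional overrides.

```typescript
// Hypothetical typed version of the preset table (not Delphi's schema).
type PresetName = "quick" | "balanced" | "research" | "factcheck";

interface Preset {
  tier: "fast" | "standard" | "premium";
  rounds: number; // assumed to be a maximum, since convergence can stop early
  grounding: boolean; // web grounding on/off
  approxCostUsd: number; // rough figure copied from the table
}

const PRESETS: Record<PresetName, Preset> = {
  quick: { tier: "fast", rounds: 2, grounding: false, approxCostUsd: 0.04 },
  balanced: { tier: "standard", rounds: 4, grounding: false, approxCostUsd: 0.2 },
  research: { tier: "premium", rounds: 6, grounding: true, approxCostUsd: 0.5 },
  factcheck: { tier: "standard", rounds: 3, grounding: true, approxCostUsd: 0.25 },
};

type PresetOverrides = Partial<Pick<Preset, "tier" | "rounds" | "grounding">>;

// Copies the preset, then applies only overrides that are actually set.
// The cost figure is not recomputed, so it is only meaningful without overrides.
function resolvePreset(name: PresetName, overrides: PresetOverrides = {}): Preset {
  const resolved: Preset = { ...PRESETS[name] };
  if (overrides.tier !== undefined) resolved.tier = overrides.tier;
  if (overrides.rounds !== undefined) resolved.rounds = overrides.rounds;
  if (overrides.grounding !== undefined) resolved.grounding = overrides.grounding;
  return resolved;
}

const opts = resolvePreset("quick", { grounding: true });
console.log(opts.rounds, opts.grounding); // 2 true
```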

When to Use Delphi

Use Delphi for:

  • Complex technical decisions with trade-offs
  • Research questions with multiple valid perspectives
  • High-dimensional problems (many factors to weigh)
  • Topics where experts genuinely disagree
  • Validating important conclusions before acting

Skip Delphi for:

  • Simple factual lookups → single model is fine
  • Creative writing → diversity unhelpful
  • Real-time chat → too slow
  • Well-defined problems with clear answers

Decision rule: If the question has genuine complexity and the answer matters, use Delphi.


Features

  • Multi-Model Consensus — Claude, GPT-4o, Gemini, DeepSeek working together
  • Dynamic Convergence — Iterates until 85% agreement or surfaces disagreement
  • Claim Strength — See which points are unanimous vs genuinely disputed
  • Revision Rounds — Models can challenge and refine each other's reasoning
  • Expert Personas — Frame panelists as domain experts for deeper analysis
  • Diverse Panel Mode — Assign complementary expert roles within a domain
  • Web Grounding — Optionally verify claims against live sources
  • Budget Controls — Token and cost limits for predictable spend
  • Multiple Formats — Markdown, JSON, HTML, plain text
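Budget controls are described only as token and cost limits. One simple way to enforce such limits is to check an estimated charge before each model call, as in this sketch. BudgetGuard, its method names, and the per-million-token pricing model are assumptions for illustration; real prices vary by model and gateway.

```typescript
// Hypothetical token and cost budget guard (not Delphi's implementation).
interface BudgetLimits {
  maxTokens: number;
  maxCostUsd: number;
}

class BudgetGuard {
  private readonly limits: BudgetLimits;
  private tokensUsed = 0;
  private costUsd = 0;

  constructor(limits: BudgetLimits) {
    this.limits = limits;
  }

  // Reserves an estimated charge; returns false (recording nothing) if it would
  // push either the token total or the cost total past its limit.
  tryReserve(tokens: number, usdPerMillionTokens: number): boolean {
    const cost = (tokens / 1_000_000) * usdPerMillionTokens;
    const withinTokens = this.tokensUsed + tokens <= this.limits.maxTokens;
    const withinCost = this.costUsd + cost <= this.limits.maxCostUsd;
    if (!withinTokens || !withinCost) return false;
    this.tokensUsed += tokens;
    this.costUsd += cost;
    return true;
  }

  spentUsd(): number {
    return this.costUsd;
  }
}

const guard = new BudgetGuard({ maxTokens: 100_000, maxCostUsd: 0.05 });
const first = guard.tryReserve(40_000, 1.0); // $0.04, fits
const second = guard.tryReserve(20_000, 1.0); // would bring the total to $0.06
console.log(first, second); // true false
```

A guard like this stops a run before it overspends rather than after, at the cost of relying on token estimates.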

Expert Personas

As in a real Delphi study, you can frame panelists as domain experts:

{
  "question": "What are the security implications of storing JWTs in localStorage?",
  "expertise": "security"
}

Available domains:

Domain | Expert Type
security | Security Engineer (15+ years, penetration testing, secure development)
finance | Financial Analyst (investment banking, risk management)
medical | Medical Researcher (clinical medicine, evidence-based medicine)
legal | Legal Expert (corporate law, IP, regulatory compliance)
engineering | Software Engineer (system design, architecture patterns)
data-science | Data Scientist (ML, statistical analysis)
economics | Economist (micro/macro economics, policy analysis)
architecture | Systems Architect (distributed systems, cloud platforms)
devops | DevOps Engineer (CI/CD, infrastructure automation)
product | Product Manager (strategy, user research, go-to-market)
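The request above adds a single expertise field. A hypothetical sketch of turning it into a persona instruction for each panelist follows. The domain keys and descriptions come from the table; the DelphiQueryArgs type (built from fields documented in this README), the personaInstruction helper, and the prompt wording are assumptions.

```typescript
// Hypothetical mapping from the "expertise" argument to a persona instruction.
type Domain =
  | "security" | "finance" | "medical" | "legal" | "engineering"
  | "data-science" | "economics" | "architecture" | "devops" | "product";

// Field names taken from the JSON examples in this README; optionality assumed.
interface DelphiQueryArgs {
  question: string;
  expertise?: Domain;
  diversePersonas?: boolean;
  autoExpertise?: boolean;
}

const EXPERTS: Record<Domain, string> = {
  security: "Security Engineer (15+ years, penetration testing, secure development)",
  finance: "Financial Analyst (investment banking, risk management)",
  medical: "Medical Researcher (clinical medicine, evidence-based medicine)",
  legal: "Legal Expert (corporate law, IP, regulatory compliance)",
  engineering: "Software Engineer (system design, architecture patterns)",
  "data-science": "Data Scientist (ML, statistical analysis)",
  economics: "Economist (micro/macro economics, policy analysis)",
  architecture: "Systems Architect (distributed systems, cloud platforms)",
  devops: "DevOps Engineer (CI/CD, infrastructure automation)",
  product: "Product Manager (strategy, user research, go-to-market)",
};

// Returns an instruction for one panelist, or null when no expertise was
// requested (the panelist then answers as a generalist).
function personaInstruction(args: DelphiQueryArgs): string | null {
  if (args.expertise === undefined) return null;
  return `Panel role: ${EXPERTS[args.expertise]}. Answer the question from this expert perspective.`;
}

const instruction = personaInstruction({
  question: "What are the security implications of storing JWTs in localStorage?",
  expertise: "security",
});
console.log(instruction);
```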

Diverse Panel Mode

Add diversePersonas: true to give each panelist a different complementary role within the domain — just like assembling a real expert panel:

{
  "question": "Should we migrate to microservices?",
  "expertise": "architecture",
  "diversePersonas": true
}

For architecture, this creates a panel of:

  • Cloud architect (AWS/GCP/Azure best practices)
  • Platform architect (internal developer platforms)
  • Data architect (data modeling, warehousing)
  • Integration architect (APIs, messaging)
  • Security architect (zero-trust, identity management)
  • Solutions architect (customer requirements)
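Diverse Panel Mode can be pictured as handing out distinct roles from a domain roster. The sketch below assigns the architecture roles listed above to panelists in order and only repeats a role when there are more panelists than roles. That cycling rule, the assignRoles helper, and the model IDs are assumptions.

```typescript
// Hypothetical role assignment for Diverse Panel Mode.
const ARCHITECTURE_ROLES: string[] = [
  "Cloud architect (AWS/GCP/Azure best practices)",
  "Platform architect (internal developer platforms)",
  "Data architect (data modeling, warehousing)",
  "Integration architect (APIs, messaging)",
  "Security architect (zero-trust, identity management)",
  "Solutions architect (customer requirements)",
];

interface Assignment {
  model: string;
  role: string;
}

// Panelist i gets role i; roles repeat only once the roster is exhausted.
function assignRoles(models: string[], roles: string[]): Assignment[] {
  if (roles.length === 0) throw new Error("role roster is empty");
  return models.map((model, i) => ({ model, role: roles[i % roles.length] }));
}

// Placeholder model IDs.
const panel = assignRoles(["model-a", "model-b", "model-c", "model-d"], ARCHITECTURE_ROLES);
for (const { model, role } of panel) {
  console.log(`${model}: ${role}`);
}
```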

Auto-Expertise Mode

For the most authentic Delphi experience, let the administrator automatically determine what experts are needed based on your question:

{
  "question": "Should we implement rate limiting at the API gateway or application layer?",
  "autoExpertise": true
}

The administrator analyzes your question and dynamically generates an expert panel suited to it. For the question above:

Expert | Focus | Perspective
API Gateway Architect | Rate limiting patterns, edge vs origin | Infrastructure scalability
Security Engineer | DDoS protection, abuse prevention | Defensive, assumes adversarial users
Backend Developer | Application-level implementation | Developer experience, maintainability
SRE/Platform Engineer | Observability, failure modes | Operational reliability

Why auto-expertise?

  • Mimics how real Delphi studies select experts based on the question
  • No need to guess which domain fits best
  • Gets complementary perspectives without manual configuration
  • Shows rationale for why each expert was chosen
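Because the administrator's panel comes from a model, it is reasonable to validate it before use. This sketch assumes the administrator replies with JSON containing an experts array whose entries mirror the table columns plus a rationale; that shape and the parsePanel helper are hypothetical.

```typescript
// Hypothetical validation of an administrator-generated expert panel.
interface ExpertSpec {
  expert: string;
  focus: string;
  perspective: string;
  rationale: string;
}

function requireString(rec: Record<string, unknown>, key: string, index: number): string {
  const value = rec[key];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`expert ${index}: missing or empty "${key}"`);
  }
  return value;
}

// Parses the administrator's raw JSON reply and rejects malformed entries.
function parsePanel(raw: string): ExpertSpec[] {
  const data: unknown = JSON.parse(raw);
  const experts =
    typeof data === "object" && data !== null ? (data as { experts?: unknown }).experts : undefined;
  if (!Array.isArray(experts) || experts.length === 0) {
    throw new Error('expected a non-empty "experts" array');
  }
  return experts.map((entry: unknown, i: number) => {
    if (typeof entry !== "object" || entry === null) {
      throw new Error(`expert ${i} is not an object`);
    }
    const rec = entry as Record<string, unknown>;
    return {
      expert: requireString(rec, "expert", i),
      focus: requireString(rec, "focus", i),
      perspective: requireString(rec, "perspective", i),
      rationale: requireString(rec, "rationale", i),
    };
  });
}

// Illustrative reply based on one row of the table above.
const reply = JSON.stringify({
  experts: [
    {
      expert: "SRE/Platform Engineer",
      focus: "Observability, failure modes",
      perspective: "Operational reliability",
      rationale: "(illustrative) Rate limiting changes how the system fails under load.",
    },
  ],
});
const proposed = parsePanel(reply);
console.log(proposed[0].expert);
```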

API

Tool | Description
delphi_query | Multi-model consensus query
delphi_factcheck | Fact-check a specific claim
delphi_list_models | List available models
delphi_estimate_cost | Estimate before running
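MCP clients such as Claude Desktop invoke these tools for you, but it can help to see the wire format. Per the MCP specification, a tool invocation is a JSON-RPC 2.0 request with method tools/call and params carrying the tool name and arguments. The helper below is hypothetical; its arguments reuse the question and expertise fields from the Expert Personas example.

```typescript
// Sketch of the JSON-RPC 2.0 "tools/call" request an MCP client sends.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

type DelphiTool =
  | "delphi_query"
  | "delphi_factcheck"
  | "delphi_list_models"
  | "delphi_estimate_cost";

let nextRequestId = 1;

// Hypothetical helper: wraps a tool name and arguments in a request envelope.
function toolCallRequest(name: DelphiTool, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const queryRequest = toolCallRequest("delphi_query", {
  question: "What are the security implications of storing JWTs in localStorage?",
  expertise: "security",
});
console.log(JSON.stringify(queryRequest));
```

Over the stdio transport each message travels as a single line of JSON, which JSON.stringify without indentation produces.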

Documentation

For full technical documentation including:

  • All configuration options
  • Test results & insights
  • Architecture internals
  • Cost analysis
  • Safety features

See docs/TECHNICAL.md


License

MIT — Built by Thor Matthiasson

About

Multi-model AI consensus tool using the Delphi method
