Thormatt/Delphi: Multi-model AI consensus tool using the Delphi method

Complex questions deserve multiple perspectives.
Delphi queries diverse AI models, synthesizes their views, and surfaces genuine consensus.

Requirements • Quick Start • How It Works • Example • Full Docs

The Problem

Complex questions rarely have simple answers. A single AI model gives you one perspective shaped by its training data and architecture. For nuanced topics—technical trade-offs, research questions, multi-faceted decisions—one viewpoint isn't enough.

The insight: Different AI models reason differently. When multiple models independently arrive at the same conclusion, your confidence increases. When they disagree, you've found genuine complexity worth exploring.

Delphi is not a truth oracle—it's a structured deliberation workflow that surfaces consensus and fault lines.

Requirements

Node.js 18+
API key for one of the supported gateways (see below)
Claude Desktop or any MCP-compatible client

Supported Gateways

Delphi works with any OpenAI-compatible API:

| Gateway | Get API Key | Notes |
| --- | --- | --- |
| OpenRouter (default) | openrouter.ai/keys | Widest model selection |
| Together.ai | together.ai | Open-source focus |
| Fireworks.ai | fireworks.ai | Fast inference |
| Groq | groq.com | Ultra-fast, limited models |
| Deepinfra | deepinfra.com | Budget alternative |

Quick Start

Add to your MCP client config:

Claude Desktop: ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows)

{
  "mcpServers": {
    "delphi": {
      "command": "npx",
      "args": ["-y", "delphi-mcp"],
      "env": {
        "DELPHI_API_KEY": "your-api-key"
      }
    }
  }
}

Restart your MCP client. Done.

Using a different gateway

Set `DELPHI_BASE_URL` to use Together, Fireworks, Groq, or another gateway from the table below:

{
  "mcpServers": {
    "delphi": {
      "command": "npx",
      "args": ["-y", "delphi-mcp"],
      "env": {
        "DELPHI_API_KEY": "your-api-key",
        "DELPHI_BASE_URL": "https://api.together.xyz/v1"
      }
    }
  }
}

| Gateway | Base URL |
| --- | --- |
| OpenRouter (default) | `https://openrouter.ai/api/v1` |
| Together.ai | `https://api.together.xyz/v1` |
| Fireworks.ai | `https://api.fireworks.ai/inference/v1` |
| Groq | `https://api.groq.com/openai/v1` |
| Deepinfra | `https://api.deepinfra.com/v1/openai` |
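To make the gateway selection concrete, here is a minimal sketch of how an OpenAI-compatible client could resolve its endpoint from the two environment variables above. The helper names (`resolveGateway`, `chatCompletionsUrl`) are hypothetical and not taken from Delphi's source. The `/chat/completions` path is the usual convention for OpenAI-compatible APIs, and it is an assumption that Delphi calls that endpoint.

```typescript
// Base URLs copied from the gateway table above.
const GATEWAY_BASE_URLS = {
  openrouter: "https://openrouter.ai/api/v1",
  together: "https://api.together.xyz/v1",
  fireworks: "https://api.fireworks.ai/inference/v1",
  groq: "https://api.groq.com/openai/v1",
  deepinfra: "https://api.deepinfra.com/v1/openai",
} as const;

interface GatewayConfig {
  apiKey: string;
  baseUrl: string;
}

type Env = Record<string, string | undefined>;

// Hypothetical helper: reads the two variables this README documents and
// falls back to OpenRouter, which the README names as the default gateway.
function resolveGateway(env: Env): GatewayConfig {
  const apiKey = env["DELPHI_API_KEY"]?.trim();
  if (!apiKey) {
    throw new Error("DELPHI_API_KEY is not set");
  }
  const raw = env["DELPHI_BASE_URL"]?.trim();
  // Strip trailing slashes so path joining below never produces "//".
  const baseUrl = (raw || GATEWAY_BASE_URLS.openrouter).replace(/\/+$/, "");
  return { apiKey, baseUrl };
}

// Assumption: chat requests go to the conventional OpenAI-compatible path.
function chatCompletionsUrl(config: GatewayConfig): string {
  return `${config.baseUrl}/chat/completions`;
}
```

In a Node.js process this would be called as `resolveGateway(process.env)`. Passing the environment in as a parameter keeps the function easy to test.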

Global install

npm install -g delphi-mcp

{
  "mcpServers": {
    "delphi": {
      "command": "delphi-mcp",
      "env": { "DELPHI_API_KEY": "your-api-key" }
    }
  }
}

How It Works

Round 1: Independent responses — Each model answers without seeing others (true independence)
Round 2+: Deliberation — Models see the synthesis and can revise, challenge, or hold position
Convergence detection — Stops when 85% agreement is reached
Claim analysis — Single-source claims get flagged for verification
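The round structure above can be sketched as a small loop. This is an illustration under stated assumptions, not Delphi's implementation: `Panelist`, `agreementRatio`, `synthesize`, and `runDelphi` are hypothetical names, and the agreement metric (share of panelists giving the most common normalized answer) is a stand-in for whatever measure the tool actually uses. Only the 85% threshold and the rule that round 1 is independent come from this README.

```typescript
// A panelist answers a question; from round 2 on it also sees the synthesis.
type Panelist = (question: string, synthesis: string | null) => Promise<string>;

interface RoundResult {
  round: number;
  answers: string[];
  agreement: number; // fraction of panelists sharing the most common answer
}

// Placeholder metric (an assumption): share of panelists that gave the most
// common answer after normalizing case and surrounding whitespace.
function agreementRatio(answers: string[]): number {
  if (answers.length === 0) return 0;
  const counts = new Map<string, number>();
  let top = 0;
  for (const answer of answers) {
    const key = answer.trim().toLowerCase();
    const next = (counts.get(key) ?? 0) + 1;
    counts.set(key, next);
    top = Math.max(top, next);
  }
  return top / answers.length;
}

// Hypothetical synthesis step: here just a labeled list of the answers.
function synthesize(answers: string[]): string {
  return answers.map((a, i) => `Panelist ${i + 1}: ${a}`).join("\n");
}

async function runDelphi(
  panel: Panelist[],
  question: string,
  maxRounds: number,
  threshold = 0.85,
): Promise<RoundResult[]> {
  const history: RoundResult[] = [];
  let synthesis: string | null = null; // round 1: nobody sees other answers
  for (let round = 1; round <= maxRounds; round++) {
    const shared = synthesis;
    const answers = await Promise.all(panel.map((ask) => ask(question, shared)));
    const agreement = agreementRatio(answers);
    history.push({ round, answers, agreement });
    if (agreement >= threshold) break; // converged, stop early
    synthesis = synthesize(answers); // later rounds deliberate on this
  }
  return history;
}
```

If the loop ends without reaching the threshold, the last entry in `history` still records the final agreement level, so a caller can report the disagreement instead of a consensus.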

Example Output

Question: Should we use microservices or a monolith for a new e-commerce platform?

Consensus (87% agreement after 3 rounds)

Round 1 — Initial Positions:

| Model | Position |
| --- | --- |
| Claude Opus 4 | Monolith first, extract services later |
| GPT-4o | Microservices for scalability from day one |
| Gemini 2.0 | Depends on team size and experience |
| DeepSeek V3 | Modular monolith as middle ground |

Round 2 — After seeing each other's reasoning:

GPT-4o revised: "Agreed that premature microservices add complexity. Team size matters."
Claude maintained position but acknowledged: "Microservices make sense if team is 50+ engineers"
All models converged on team size as the key factor

Round 3 — Final Synthesis:

| Claim | Strength | Agreement |
| --- | --- | --- |
| Start with monolith for teams < 20 engineers | unanimous | 5/5 |
| Modular boundaries enable future extraction | unanimous | 5/5 |
| Microservices add 3-5x operational overhead | strong | 4/5 |
| Extract services only when team/traffic demands | strong | 4/5 |
| Kubernetes required for microservices | disputed | 2/5 |

Key Disagreement Surfaced:

"Kubernetes required for microservices" — Claude and DeepSeek disagreed, noting alternatives like ECS, Nomad, or even simple VM deployments. This flags an area where the "conventional wisdom" may be overconfident.

Control Drift: 45% — A single model would have given a more opinionated answer without surfacing the team-size nuance or the Kubernetes debate.
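The Strength and Agreement columns in the synthesis table could be derived from per-claim support counts roughly as follows. The labels `unanimous`, `strong`, and `disputed` come from the example output. The 75% cutoff for `strong`, the `needsVerification` flag, and the `analyzeClaim` function are assumptions made for illustration, since this README does not state the actual thresholds.

```typescript
type Strength = "unanimous" | "strong" | "disputed";

interface Claim {
  text: string;
  supporters: string[]; // models that made or endorsed the claim
}

interface AnalyzedClaim extends Claim {
  strength: Strength;
  agreement: string; // formatted like the table, e.g. "4/5"
  needsVerification: boolean; // true for single-source claims
}

// Hypothetical classifier; strongCutoff is an assumed threshold.
function analyzeClaim(claim: Claim, panelSize: number, strongCutoff = 0.75): AnalyzedClaim {
  if (panelSize <= 0) throw new Error("panelSize must be positive");
  // Count each supporting model once, even if it is listed twice.
  const count = new Set(claim.supporters).size;
  const ratio = count / panelSize;
  let strength: Strength = "disputed";
  if (count >= panelSize) {
    strength = "unanimous";
  } else if (ratio >= strongCutoff) {
    strength = "strong";
  }
  return {
    ...claim,
    strength,
    agreement: `${count}/${panelSize}`,
    needsVerification: count === 1,
  };
}
```

With a panel of five, a claim backed by four models is classified as `strong` with agreement `4/5`, and one backed by two models is `disputed`, matching the rows in the example table.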

Presets

| Preset | Tier | Rounds | Grounding | Cost | Use Case |
| --- | --- | --- | --- | --- | --- |
| `quick` | fast | 2 | off | ~$0.04 | Quick checks |
| `balanced` | standard | 4 | off | ~$0.20 | General queries |
| `research` | premium | 6 | on | ~$0.50 | Deep analysis |
| `factcheck` | standard | 3 | on | ~$0.25 | Verify claims |
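As a quick way to reason about the trade-offs in the preset table, the values can be put in a typed lookup. The figures below are copied from the table. The `presetsWithinBudget` helper and the field names are hypothetical, and the costs are the README's approximate figures, not guarantees.

```typescript
type PresetName = "quick" | "balanced" | "research" | "factcheck";

interface Preset {
  tier: "fast" | "standard" | "premium";
  rounds: number;
  grounding: boolean;
  approxCostUsd: number; // approximate figure from the table
}

const PRESETS: Record<PresetName, Preset> = {
  quick: { tier: "fast", rounds: 2, grounding: false, approxCostUsd: 0.04 },
  balanced: { tier: "standard", rounds: 4, grounding: false, approxCostUsd: 0.2 },
  research: { tier: "premium", rounds: 6, grounding: true, approxCostUsd: 0.5 },
  factcheck: { tier: "standard", rounds: 3, grounding: true, approxCostUsd: 0.25 },
};

// Hypothetical helper: presets that fit a cost ceiling, cheapest first,
// optionally restricted to those with web grounding switched on.
function presetsWithinBudget(maxCostUsd: number, needGrounding = false): PresetName[] {
  return (Object.keys(PRESETS) as PresetName[])
    .filter((name) => PRESETS[name].approxCostUsd <= maxCostUsd)
    .filter((name) => !needGrounding || PRESETS[name].grounding)
    .sort((a, b) => PRESETS[a].approxCostUsd - PRESETS[b].approxCostUsd);
}
```

For example, `presetsWithinBudget(0.25)` returns `quick`, `balanced`, and `factcheck` in that order, and adding `needGrounding = true` with a ceiling of 0.3 leaves only `factcheck`.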

When to Use Delphi

Use Delphi for:

Complex technical decisions with trade-offs
Research questions with multiple valid perspectives
High-dimensional problems (many factors to weigh)
Topics where experts genuinely disagree
Validating important conclusions before acting

Skip Delphi for:

Simple factual lookups → single model is fine
Creative writing → diversity unhelpful
Real-time chat → too slow
Well-defined problems with clear answers

Decision rule: If the question has genuine complexity and the answer matters, use Delphi.

Features

Multi-Model Consensus — Claude, GPT-4o, Gemini, DeepSeek working together
Dynamic Convergence — Iterates until 85% agreement or surfaces disagreement
Claim Strength — See which points are unanimous vs genuinely disputed
Revision Rounds — Models can challenge and refine each other's reasoning
Expert Personas — Frame panelists as domain experts for deeper analysis
Diverse Panel Mode — Assign complementary expert roles within a domain
Web Grounding — Optionally verify claims against live sources
Budget Controls — Token and cost limits for predictable spend
Multiple Formats — Markdown, JSON, HTML, plain text

Expert Personas

Like a real Delphi study, you can frame panelists as domain experts:

{
  "question": "What are the security implications of storing JWTs in localStorage?",
  "expertise": "security"
}

Available domains:

| Domain | Expert Type |
| --- | --- |
| `security` | Security Engineer (15+ years, penetration testing, secure development) |
| `finance` | Financial Analyst (investment banking, risk management) |
| `medical` | Medical Researcher (clinical medicine, evidence-based medicine) |
| `legal` | Legal Expert (corporate law, IP, regulatory compliance) |
| `engineering` | Software Engineer (system design, architecture patterns) |
| `data-science` | Data Scientist (ML, statistical analysis) |
| `economics` | Economist (micro/macro economics, policy analysis) |
| `architecture` | Systems Architect (distributed systems, cloud platforms) |
| `devops` | DevOps Engineer (CI/CD, infrastructure automation) |
| `product` | Product Manager (strategy, user research, go-to-market) |
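One plausible way an `expertise` value could be applied is as a system message placed ahead of the user's question. The sketch below assumes that approach. The prompt wording and the `framePanelist` function are invented for illustration and are not taken from Delphi's source. Only the domain keys and expert descriptions (a subset) come from the table above.

```typescript
// Domain keys and descriptions copied from the table above (subset).
const EXPERTS = {
  security: "Security Engineer (15+ years, penetration testing, secure development)",
  finance: "Financial Analyst (investment banking, risk management)",
  devops: "DevOps Engineer (CI/CD, infrastructure automation)",
} as const;

type Domain = keyof typeof EXPERTS;

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Hypothetical prompt builder: frames the model as the chosen expert,
// then passes the question through unchanged.
function framePanelist(question: string, expertise: Domain): ChatMessage[] {
  return [
    {
      role: "system",
      content: `You are a ${EXPERTS[expertise]}. Answer from that professional perspective.`,
    },
    { role: "user", content: question },
  ];
}
```

Typing `expertise` as a union of known keys means an unsupported domain is rejected at compile time instead of silently producing an unframed prompt.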

Diverse Panel Mode

Add diversePersonas: true to give each panelist a different complementary role within the domain — just like assembling a real expert panel:

{
  "question": "Should we migrate to microservices?",
  "expertise": "architecture",
  "diversePersonas": true
}

For architecture, this creates a panel of:

Cloud architect (AWS/GCP/Azure best practices)
Platform architect (internal developer platforms)
Data architect (data modeling, warehousing)
Integration architect (APIs, messaging)
Security architect (zero-trust, identity management)
Solutions architect (customer requirements)
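How roles are matched to models is not described here, so the following is only a sketch of one simple scheme, round-robin assignment. The role names are copied from the list above, and `assignRoles` is a hypothetical helper.

```typescript
// Roles copied from the architecture panel listed above.
const ARCHITECTURE_ROLES = [
  "Cloud architect",
  "Platform architect",
  "Data architect",
  "Integration architect",
  "Security architect",
  "Solutions architect",
];

interface Seat {
  model: string;
  role: string;
}

// Round-robin assignment (an assumption): each model takes the next role,
// and roles repeat only when there are more models than roles.
function assignRoles(models: string[], roles: string[]): Seat[] {
  if (roles.length === 0) throw new Error("at least one role is required");
  return models.map((model, i) => ({ model, role: roles[i % roles.length] }));
}
```

With the four models from the example output, this would seat Claude Opus 4 as cloud architect, GPT-4o as platform architect, Gemini 2.0 as data architect, and DeepSeek V3 as integration architect, leaving two roles unused.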

Auto-Expertise Mode

For the most authentic Delphi experience, let the administrator automatically determine what experts are needed based on your question:

{
  "question": "Should we implement rate limiting at the API gateway or application layer?",
  "autoExpertise": true
}

The administrator analyzes your question and dynamically generates an optimal expert panel:

| Expert | Focus | Perspective |
| --- | --- | --- |
| API Gateway Architect | Rate limiting patterns, edge vs origin | Infrastructure scalability |
| Security Engineer | DDoS protection, abuse prevention | Defensive, assumes adversarial users |
| Backend Developer | Application-level implementation | Developer experience, maintainability |
| SRE/Platform Engineer | Observability, failure modes | Operational reliability |

Why auto-expertise?

Mimics how real Delphi studies select experts based on the question
No need to guess which domain fits best
Gets complementary perspectives without manual configuration
Shows rationale for why each expert was chosen

API

| Tool | Description |
| --- | --- |
| `delphi_query` | Multi-model consensus query |
| `delphi_factcheck` | Fact-check a specific claim |
| `delphi_list_models` | List available models |
| `delphi_estimate_cost` | Estimate before running |
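MCP clients invoke tools through a JSON-RPC `tools/call` request, so a `delphi_query` call could be built as in the sketch below. The `DelphiQueryArgs` type lists only the fields that appear in this README's examples. The tool's full input schema is not shown here, so treat the type as partial, and `buildDelphiQuery` as a hypothetical helper.

```typescript
// Fields seen in this README's examples; the real schema may define more.
interface DelphiQueryArgs {
  question: string;
  expertise?: string;
  diversePersonas?: boolean;
  autoExpertise?: boolean;
}

// Shape of an MCP tool invocation (JSON-RPC 2.0, method "tools/call").
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: DelphiQueryArgs };
}

// Hypothetical helper that wraps the arguments in a tools/call request.
function buildDelphiQuery(id: number, args: DelphiQueryArgs): ToolCallRequest {
  if (args.question.trim() === "") {
    throw new Error("question must not be empty");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "delphi_query", arguments: args },
  };
}
```

In practice an MCP client such as Claude Desktop constructs this request itself, so the sketch is mainly useful for scripting or testing the server directly.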

Documentation

For full technical documentation, see docs/TECHNICAL.md. It covers:

All configuration options
Test results & insights
Architecture internals
Cost analysis
Safety features

License

MIT — Built by Thor Matthiasson
