Complex questions deserve multiple perspectives.
Delphi queries diverse AI models, synthesizes their views, and surfaces genuine consensus.
Requirements • Quick Start • How It Works • Example • Full Docs
Complex questions rarely have simple answers. A single AI model gives you one perspective shaped by its training data and architecture. For nuanced topics—technical trade-offs, research questions, multi-faceted decisions—one viewpoint isn't enough.
The insight: Different AI models reason differently. When multiple models independently arrive at the same conclusion, your confidence increases. When they disagree, you've found genuine complexity worth exploring.
Delphi is not a truth oracle—it's a structured deliberation workflow that surfaces consensus and fault lines.
- Node.js 18+
- API key for one of the supported gateways (see below)
- Claude Desktop or any MCP-compatible client
Delphi works with any OpenAI-compatible API:
| Gateway | Get API Key | Notes |
|---|---|---|
| OpenRouter (default) | openrouter.ai/keys | Widest model selection |
| Together.ai | together.ai | Open-source focus |
| Fireworks.ai | fireworks.ai | Fast inference |
| Groq | groq.com | Ultra-fast, limited models |
| Deepinfra | deepinfra.com | Budget alternative |
Add to your MCP client config:
Claude Desktop: ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows)
{
"mcpServers": {
"delphi": {
"command": "npx",
"args": ["-y", "delphi-mcp"],
"env": {
"DELPHI_API_KEY": "your-api-key"
}
}
}
}
Restart your MCP client. Done.
Using a different gateway
Set DELPHI_BASE_URL to point Delphi at another gateway, such as Together, Fireworks, or Groq:
{
"mcpServers": {
"delphi": {
"command": "npx",
"args": ["-y", "delphi-mcp"],
"env": {
"DELPHI_API_KEY": "your-api-key",
"DELPHI_BASE_URL": "https://api.together.xyz/v1"
}
}
}
}
| Gateway | Base URL |
|---|---|
| OpenRouter (default) | https://openrouter.ai/api/v1 |
| Together.ai | https://api.together.xyz/v1 |
| Fireworks.ai | https://api.fireworks.ai/inference/v1 |
| Groq | https://api.groq.com/openai/v1 |
| Deepinfra | https://api.deepinfra.com/v1/openai |
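As a rough illustration of how the gateway choice maps to configuration, the sketch below builds the `env` block for the MCP client config. The base URLs are copied from the table above; the short gateway identifiers and the `buildDelphiEnv` helper are hypothetical names introduced for this example and are not part of Delphi.

```typescript
// Base URLs copied from the gateway table. The keys ("openrouter", "groq", ...)
// are hypothetical identifiers used only in this sketch.
const GATEWAY_BASE_URLS = {
  openrouter: "https://openrouter.ai/api/v1",
  together: "https://api.together.xyz/v1",
  fireworks: "https://api.fireworks.ai/inference/v1",
  groq: "https://api.groq.com/openai/v1",
  deepinfra: "https://api.deepinfra.com/v1/openai",
} as const;

type GatewayId = keyof typeof GATEWAY_BASE_URLS;

// Builds the "env" object for the MCP client config.
// OpenRouter is the default gateway, so DELPHI_BASE_URL is only set for the others.
function buildDelphiEnv(
  apiKey: string,
  gateway: GatewayId = "openrouter"
): Record<string, string> {
  const env: Record<string, string> = { DELPHI_API_KEY: apiKey };
  if (gateway !== "openrouter") {
    env["DELPHI_BASE_URL"] = GATEWAY_BASE_URLS[gateway];
  }
  return env;
}

const groqEnv = buildDelphiEnv("your-api-key", "groq");
console.log(JSON.stringify({ env: groqEnv }, null, 2));
```

The printed object can be pasted into the `delphi` entry of the client config in place of a hand-written `env` block.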
Global install
npm install -g delphi-mcp
{
"mcpServers": {
"delphi": {
"command": "delphi-mcp",
"env": { "DELPHI_API_KEY": "your-api-key" }
}
}
}
- Round 1: Independent responses — Each model answers without seeing others (true independence)
- Round 2+: Deliberation — Models see the synthesis and can revise, challenge, or hold position
- Convergence detection — Stops when 85% agreement is reached
- Claim analysis — Single-source claims get flagged for verification
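The round structure above can be sketched as a small loop. This is a minimal illustration, not Delphi's implementation: the `Panelist` type, the `deliberate` function, and the agreement metric (share of panelists holding the most common position) are all assumptions made for the example. Only the 85% threshold and the rule that round 1 sees no prior synthesis come from the list above.

```typescript
// Hypothetical panelist: receives the question and the previous synthesis
// (null in round 1, so first answers are independent) and returns a position.
type Panelist = (question: string, priorSynthesis: string | null) => string;

interface DeliberationResult {
  rounds: number;
  agreement: number;
  synthesis: string | null;
}

// Assumed metric: fraction of panelists holding the most common position.
function agreementRatio(positions: string[]): number {
  const counts: Record<string, number> = {};
  let top = 0;
  for (const p of positions) {
    const next = (counts[p] || 0) + 1;
    counts[p] = next;
    top = Math.max(top, next);
  }
  return positions.length === 0 ? 0 : top / positions.length;
}

function deliberate(
  question: string,
  panel: Panelist[],
  maxRounds: number,
  threshold = 0.85
): DeliberationResult {
  let synthesis: string | null = null;
  let agreement = 0;
  let round = 0;
  while (round < maxRounds) {
    round += 1;
    // Every panelist sees the same prior synthesis, never the raw peer answers.
    const positions = panel.map((ask) => ask(question, synthesis));
    agreement = agreementRatio(positions);
    // Toy synthesis: the distinct positions joined together.
    synthesis = positions.filter((p, i) => positions.indexOf(p) === i).join(" | ");
    if (agreement >= threshold) break; // convergence detected
  }
  return { rounds: round, agreement, synthesis };
}

const demoPanel: Panelist[] = [
  () => "monolith first",
  () => "monolith first",
  // This stub revises its position once it has seen a synthesis.
  (_question, prior) => (prior === null ? "microservices" : "monolith first"),
];
const demoRun = deliberate("Monolith or microservices?", demoPanel, 4);
console.log(demoRun.rounds, demoRun.agreement); // 2 1
```

In the demo, round 1 yields 2 of 3 in agreement (below 85%), so a second round runs and the third panelist revises after seeing the synthesis. A real system would compare positions semantically, not by exact string match.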
Question: Should we use microservices or a monolith for a new e-commerce platform?
Round 1 — Initial Positions:
| Model | Position |
|---|---|
| Claude Opus 4 | Monolith first, extract services later |
| GPT-4o | Microservices for scalability from day one |
| Gemini 2.0 | Depends on team size and experience |
| DeepSeek V3 | Modular monolith as middle ground |
Round 2 — After seeing each other's reasoning:
- GPT-4o revised: "Agreed that premature microservices add complexity. Team size matters."
- Claude maintained position but acknowledged: "Microservices make sense if team is 50+ engineers"
- All models converged on team size as the key factor
Round 3 — Final Synthesis:
| Claim | Strength | Agreement |
|---|---|---|
| Start with monolith for teams < 20 engineers | unanimous | 5/5 |
| Modular boundaries enable future extraction | unanimous | 5/5 |
| Microservices add 3-5x operational overhead | strong | 4/5 |
| Extract services only when team/traffic demands | strong | 4/5 |
| Kubernetes required for microservices | disputed | 2/5 |
Key Disagreement Surfaced:
"Kubernetes required for microservices" — Claude and DeepSeek disagreed, noting alternatives like ECS, Nomad, or even simple VM deployments. This flags an area where the "conventional wisdom" may be overconfident.
Control Drift: 45% — A single model would have given a more opinionated answer without surfacing the team-size nuance or the Kubernetes debate.
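To make the Strength column concrete, here is a small sketch that maps an agreement count to a label. The thresholds are assumptions chosen only so that the function reproduces the three rows of the synthesis table (5/5, 4/5, 2/5). Delphi's actual cut-offs are not stated here, and `classifyClaim` and `needsVerification` are hypothetical helpers.

```typescript
type ClaimStrength = "unanimous" | "strong" | "disputed";

// Assumed thresholds: all panelists -> unanimous, at least 75% -> strong,
// anything lower -> disputed.
function classifyClaim(agreeing: number, panelSize: number): ClaimStrength {
  if (panelSize <= 0 || agreeing < 0 || agreeing > panelSize) {
    throw new RangeError("agreeing must be between 0 and panelSize");
  }
  const share = agreeing / panelSize;
  if (share === 1) return "unanimous";
  if (share >= 0.75) return "strong";
  return "disputed";
}

// Single-source claims (asserted by exactly one panelist) get flagged for verification.
function needsVerification(agreeing: number): boolean {
  return agreeing === 1;
}

const exampleLabels = [5, 4, 2].map((n) => classifyClaim(n, 5));
console.log(exampleLabels); // [ 'unanimous', 'strong', 'disputed' ]
```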
| Preset | Tier | Rounds | Grounding | Cost | Use Case |
|---|---|---|---|---|---|
| quick | fast | 2 | off | ~$0.04 | Quick checks |
| balanced | standard | 4 | off | ~$0.20 | General queries |
| research | premium | 6 | on | ~$0.50 | Deep analysis |
| factcheck | standard | 3 | on | ~$0.25 | Verify claims |
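One way to reason about the presets is as a lookup table. The sketch below encodes the rows above and picks the preset with the most rounds that fits a budget. The `pickPreset` helper and its selection rule are hypothetical, and the costs are the approximate per-query figures from the table, not guaranteed prices.

```typescript
type PresetName = "quick" | "balanced" | "research" | "factcheck";

interface PresetSpec {
  tier: "fast" | "standard" | "premium";
  rounds: number;
  grounding: boolean;
  approxCostUsd: number; // approximate per-query cost from the table
}

const DELPHI_PRESETS: Record<PresetName, PresetSpec> = {
  quick: { tier: "fast", rounds: 2, grounding: false, approxCostUsd: 0.04 },
  balanced: { tier: "standard", rounds: 4, grounding: false, approxCostUsd: 0.2 },
  research: { tier: "premium", rounds: 6, grounding: true, approxCostUsd: 0.5 },
  factcheck: { tier: "standard", rounds: 3, grounding: true, approxCostUsd: 0.25 },
};

// Hypothetical rule: among presets within budget (and with grounding, if required),
// prefer the one that runs the most rounds. Returns null if nothing fits.
function pickPreset(needsGrounding: boolean, budgetUsd: number): PresetName | null {
  let best: PresetName | null = null;
  for (const key of Object.keys(DELPHI_PRESETS) as PresetName[]) {
    const spec = DELPHI_PRESETS[key];
    if (spec.approxCostUsd > budgetUsd) continue;
    if (needsGrounding && !spec.grounding) continue;
    if (best === null || spec.rounds > DELPHI_PRESETS[best].rounds) best = key;
  }
  return best;
}

console.log(pickPreset(true, 0.3)); // factcheck
console.log(pickPreset(false, 0.3)); // balanced
```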
Use Delphi for:
- Complex technical decisions with trade-offs
- Research questions with multiple valid perspectives
- High-dimensional problems (many factors to weigh)
- Topics where experts genuinely disagree
- Validating important conclusions before acting
Skip Delphi for:
- Simple factual lookups → single model is fine
- Creative writing → diversity unhelpful
- Real-time chat → too slow
- Well-defined problems with clear answers
Decision rule: If the question has genuine complexity and the answer matters, use Delphi.
- Multi-Model Consensus — Claude, GPT-4o, Gemini, DeepSeek working together
- Dynamic Convergence — Iterates until 85% agreement or surfaces disagreement
- Claim Strength — See which points are unanimous vs genuinely disputed
- Revision Rounds — Models can challenge and refine each other's reasoning
- Expert Personas — Frame panelists as domain experts for deeper analysis
- Diverse Panel Mode — Assign complementary expert roles within a domain
- Web Grounding — Optionally verify claims against live sources
- Budget Controls — Token and cost limits for predictable spend
- Multiple Formats — Markdown, JSON, HTML, plain text
Like a real Delphi study, you can frame panelists as domain experts:
{
"question": "What are the security implications of storing JWTs in localStorage?",
"expertise": "security"
}
Available domains:
| Domain | Expert Type |
|---|---|
| security | Security Engineer (15+ years, penetration testing, secure development) |
| finance | Financial Analyst (investment banking, risk management) |
| medical | Medical Researcher (clinical medicine, evidence-based medicine) |
| legal | Legal Expert (corporate law, IP, regulatory compliance) |
| engineering | Software Engineer (system design, architecture patterns) |
| data-science | Data Scientist (ML, statistical analysis) |
| economics | Economist (micro/macro economics, policy analysis) |
| architecture | Systems Architect (distributed systems, cloud platforms) |
| devops | DevOps Engineer (CI/CD, infrastructure automation) |
| product | Product Manager (strategy, user research, go-to-market) |
Add diversePersonas: true to give each panelist a different complementary role within the domain — just like assembling a real expert panel:
{
"question": "Should we migrate to microservices?",
"expertise": "architecture",
"diversePersonas": true
}
For architecture, this creates a panel of:
- Cloud architect (AWS/GCP/Azure best practices)
- Platform architect (internal developer platforms)
- Data architect (data modeling, warehousing)
- Integration architect (APIs, messaging)
- Security architect (zero-trust, identity management)
- Solutions architect (customer requirements)
For the most authentic Delphi experience, let the administrator automatically determine what experts are needed based on your question:
{
"question": "Should we implement rate limiting at the API gateway or application layer?",
"autoExpertise": true
}
The administrator analyzes your question and dynamically generates an optimal expert panel:
| Expert | Focus | Perspective |
|---|---|---|
| API Gateway Architect | Rate limiting patterns, edge vs origin | Infrastructure scalability |
| Security Engineer | DDoS protection, abuse prevention | Defensive, assumes adversarial users |
| Backend Developer | Application-level implementation | Developer experience, maintainability |
| SRE/Platform Engineer | Observability, failure modes | Operational reliability |
Why auto-expertise?
- Mimics how real Delphi studies select experts based on the question
- No need to guess which domain fits best
- Gets complementary perspectives without manual configuration
- Shows rationale for why each expert was chosen
| Tool | Description |
|---|---|
| delphi_query | Multi-model consensus query |
| delphi_factcheck | Fact-check a specific claim |
| delphi_list_models | List available models |
| delphi_estimate_cost | Estimate before running |
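An MCP client normally issues tool calls for you, but it can help to see the shape of the request. The sketch below builds a JSON-RPC `tools/call` message for `delphi_query`, following the general MCP request format. The argument names are the ones shown in the examples in this README; the `DelphiQueryArgs` interface and `buildDelphiQueryCall` helper are hypothetical, and any options not shown in those examples are left out.

```typescript
// Argument names taken from the README examples; this is not the full option set.
interface DelphiQueryArgs {
  question: string;
  expertise?: string;
  diversePersonas?: boolean;
  autoExpertise?: boolean;
}

// Shape of an MCP tool invocation: JSON-RPC 2.0 with method "tools/call".
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: DelphiQueryArgs };
}

function buildDelphiQueryCall(id: number, args: DelphiQueryArgs): ToolCallRequest {
  if (args.question.trim() === "") {
    throw new Error("question must not be empty");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "delphi_query", arguments: args },
  };
}

const exampleCall = buildDelphiQueryCall(1, {
  question: "What are the security implications of storing JWTs in localStorage?",
  expertise: "security",
});
console.log(JSON.stringify(exampleCall, null, 2));
```

The same helper shape would work for the other tools by changing `name`, though their argument schemas are not described here.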
The full technical documentation covers:
- All configuration options
- Test results & insights
- Architecture internals
- Cost analysis
- Safety features
MIT — Built by Thor Matthiasson