Make Claude argue both sides of any technical decision — then decide. Uses Claude Opus 4.7's extended thinking to steelman every position, weigh trade-offs, and give a verdict with calibrated confidence.
Every engineering team wastes weeks on the same debates:
- PostgreSQL or MongoDB?
- Microservices or monolith?
- REST or GraphQL?
- Rewrite or refactor?
- AWS, GCP, or Cloudflare?
The arguments are circular because nobody steelmans the other side. People defend the option they already prefer, attack a strawman of the alternative, and miss the second-order effects.
Asking an LLM "which is better?" just gives you the popular answer. That's not what you need — you need rigorous analysis of both sides for your specific context.
debate "PostgreSQL vs MongoDB" --context "Multi-tenant B2B SaaS, 50 customers, mostly read-heavy"
`debate` does three things in one Opus 4.7 extended-thinking pass:
- STEELMANS each position — best argument, contexts where it wins, honest weaknesses
- WEIGHS the trade-offs against your specific context
- DECIDES with confidence score + decisive factors + conditions to reconsider
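The three steps above imply a result made of one entry per position plus a single verdict. A minimal sketch of that shape follows, using stdlib dataclasses rather than the project's actual Pydantic models. The class names `Position`, `Verdict`, and `DebateResult` and the fields `winner`, `confidence`, and `decisive_factors` appear elsewhere in this README; every other field name here (`summary`, `strongest_arguments`, `best_fit`, `weaknesses`, `reconsider_when`) is a hypothetical stand-in, not the library's schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Position:
    """One steelmanned option. Field names are illustrative assumptions."""
    name: str
    summary: str
    strongest_arguments: list[str] = field(default_factory=list)
    best_fit: list[str] = field(default_factory=list)
    weaknesses: list[str] = field(default_factory=list)


@dataclass
class Verdict:
    """The decision. `reconsider_when` is a hypothetical field name."""
    winner: str
    confidence: float  # fraction between 0.0 and 1.0, e.g. 0.87
    decisive_factors: list[str] = field(default_factory=list)
    reconsider_when: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject values such as 87 that were meant as a percentage.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


@dataclass
class DebateResult:
    question: str
    positions: list[Position]
    verdict: Verdict


# Populated from the sample output shown in this README.
example = DebateResult(
    question="Microservices or monolith",
    positions=[
        Position(
            name="Monolith",
            summary="A single deployable that wins on simplicity and velocity.",
            weaknesses=["One bad query can take down everything"],
        ),
        Position(
            name="Microservices",
            summary="Independent services that win on team autonomy at scale.",
            weaknesses=["Distributed systems complexity tax"],
        ),
    ],
    verdict=Verdict(
        winner="Monolith",
        confidence=0.87,
        decisive_factors=["Team size << microservices break-even point"],
        reconsider_when=["Team grows past 20 engineers"],
    ),
)

print(f"Winner: {example.verdict.winner} ({example.verdict.confidence:.0%})")
```

Storing confidence as a fraction rather than a percentage matches the `:.0%` formatting used in the Python API example in this README.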
$ debate "Microservices or monolith" \
  --context "5-person team, B2B SaaS, 10k DAU, 20-min deploys are killing us"
╔══════════════════════════════════════════════════════════╗
║ ⚖️ DEBATE ║
║ Microservices or monolith ║
║ Context: 5-person team, B2B SaaS, 20-min deploys... ║
╚══════════════════════════════════════════════════════════╝
┌─ Monolith ────────────────────────────────────────────────┐
│ A single deployable that wins on simplicity and velocity │
│ at small team sizes. │
│ │
│ Strongest arguments: │
│ 1. Cognitive overhead is minimal │
│ One codebase, one debugger, one log stream. │
│ ev: GitLab ran a monolith to $400M ARR. │
│ 2. Refactoring is cheap │
│ Function-level moves don't need API contracts. │
│ 3. Local dev loop is fast │
│ No service mesh, no docker-compose orchestra. │
│ │
│ Best fit: │
│ ✓ Team size under 15 │
│ ✓ Single product / single domain │
│ ✓ Velocity matters more than scale │
│ │
│ Honest weaknesses: │
│ ✗ One bad query can take down everything │
│ ✗ Deploy any change = deploy all changes │
└────────────────────────────────────────────────────────────┘
┌─ Microservices ───────────────────────────────────────────┐
│ Independent services that win on team autonomy at scale. │
│ │
│ Strongest arguments: │
│ 1. Teams can ship independently │
│ No coordination tax between unrelated changes. │
│ 2. Failure isolation │
│ Auth bug doesn't take down the billing service. │
│ 3. Polyglot freedom │
│ ML service in Python, edge service in Rust. │
│ │
│ Best fit: │
│ ✓ Teams of 20+ with distinct bounded contexts │
│ ✓ Independent scaling requirements per service │
│ │
│ Honest weaknesses: │
│ ✗ Distributed systems complexity tax │
│ ✗ Tracing a request across N services │
│ ✗ Network failures become a daily problem │
└────────────────────────────────────────────────────────────┘
╔══════════════════════════════════════════════════════════╗
║ 🏆 VERDICT ║
║ ║
║ Winner: Monolith ║
║ Confidence: 87% ║
║ ║
║ At 5 engineers and 10k DAU, microservices solve a ║
║ problem you don't have (team coordination) by creating ║
║ ones you can't afford (distributed systems complexity). ║
║ The 20-minute deploy is a CI problem, not an ║
║ architecture problem — split your test suite into ║
║ parallel runners and you'll cut it to 4 minutes without ║
║ any architectural change. ║
║ ║
║ Decisive factors: ║
║ • Team size << microservices break-even point ║
║ • Single bounded context (B2B SaaS) ║
║ • Actual pain (slow deploys) has cheaper solution ║
║ ║
║ When to reconsider: ║
║ → Team grows past 20 engineers ║
║ → Distinct domains emerge (e.g. adding a marketplace) ║
║ → Service-level scaling requirements diverge sharply ║
╚══════════════════════════════════════════════════════════╝
🧠 Extended Thinking Excerpt:
"Let me actually weigh this for the specific context. 5 engineers
is well below the team size where microservices typically pay off
(usually 20+). But the real pain mentioned is 20-min deploys —
that's actually a CI problem masquerading as an architecture
problem. Before reaching for the more complex solution, I should
check whether..."
Asking a normal LLM "monolith or microservices?" gives you whatever Hacker News currently believes. That's not analysis.
Extended thinking lets Opus 4.7:
- Hold two positions in mind simultaneously and reason about each one fairly
- Reason about the specific context instead of pattern-matching to the popular answer
- Catch hidden assumptions ("the pain is deploys → maybe the real fix is CI, not architecture")
- Calibrate confidence honestly, reporting something like 0.55 when the call is actually close
A smaller model tends to pick the popular answer. Opus 4.7 with extended thinking actually does the work.
pip install debate-ai
export ANTHROPIC_API_KEY=sk-ant-...
# Basic
debate "REST vs GraphQL"
# With context (drastically improves verdict quality)
debate "Postgres vs Mongo" --context "Multi-tenant SaaS, 100 customers, mostly OLTP"
# Context from file
debate "Rewrite Rails app in Go" --context-file ./project-context.md
# More options (3+)
debate "AWS, GCP, or Cloudflare for a global edge API"
# Output formats
debate "Vue vs React" --output markdown > decision.md
debate "Vue vs React" --output json | jq .verdict.confidence
# More thinking
debate "Are we ready to move off Heroku" --thinking-budget 16000
from anthropic import Anthropic
from debate import run_debate
from debate.report import print_terminal
client = Anthropic()
result = run_debate(
    "Kafka or Redis Streams for our event bus",
    client,
    context="50 events/sec peak, 3-month retention, 2 engineers maintaining it",
    thinking_budget=10000,
)
print_terminal(result)
# Or access structured data
print(f"Winner: {result.verdict.winner} ({result.verdict.confidence:.0%})")
for factor in result.verdict.decisive_factors:
    print(f" → {factor}")
| Where | What |
|---|---|
| Tech RFCs | Pre-fill the "alternatives considered" section with steelmanned versions |
| Architecture reviews | Get an outside view before locking in a decision |
| Eng leadership | Help juniors see why an option wins, not just that it wins |
| Personal decisions | "Should I learn Rust or Zig next?" — actually analyzed |
| Pre-mortem | Run a debate, then deliberately implement the losing side as a thought experiment |
debate/
├── cli.py # Click CLI — `debate "<question>"`
├── engine.py # Opus 4.7 extended thinking orchestrator
├── types.py # Pydantic models (Position, Verdict, DebateResult)
└── report.py # Rich terminal + Markdown output
The engine does everything in one Opus 4.7 call with extended thinking: the model identifies the options, steelmans each one, weighs them in the thinking blocks, and outputs structured JSON. No multi-step orchestration, no chained calls. Extended thinking IS the orchestration.
MIT © bhupendra05
Built because every "AWS vs GCP" thread on Hacker News loses thousands of engineer-hours to circular arguments. Now you can settle it in 30 seconds — with reasoning visible.