debate ⚖️

Make Claude argue both sides of any technical decision — then decide. Uses Claude Opus 4.7's extended thinking to steelman every position, weigh trade-offs, and give a verdict with calibrated confidence.

The problem

Every engineering team wastes weeks on the same debates:

PostgreSQL or MongoDB?
Microservices or monolith?
REST or GraphQL?
Rewrite or refactor?
AWS, GCP, or Cloudflare?

The arguments are circular because nobody steelmans the other side. People defend the option they already prefer, attack a strawman of the alternative, and miss the second-order effects.

Asking an LLM "which is better?" just gives you the popular answer. That's not what you need — you need rigorous analysis of both sides for your specific context.

What it does

debate "PostgreSQL vs MongoDB" --context "Multi-tenant B2B SaaS, 50 customers, mostly read-heavy"

debate does three things in one Opus 4.7 extended-thinking pass:

STEELMANS each position — best argument, contexts where it wins, honest weaknesses
WEIGHS the trade-offs against your specific context
DECIDES with confidence score + decisive factors + conditions to reconsider

Demo

$ debate "Microservices or monolith" \
    --context "5-person team, B2B SaaS, 10k DAU, 20-min deploys are killing us"

╔══════════════════════════════════════════════════════════╗
║  ⚖️  DEBATE                                              ║
║  Microservices or monolith                               ║
║  Context: 5-person team, B2B SaaS, 20-min deploys...     ║
╚══════════════════════════════════════════════════════════╝

┌─ Monolith ────────────────────────────────────────────────┐
│ A single deployable that wins on simplicity and velocity  │
│ at small team sizes.                                      │
│                                                            │
│ Strongest arguments:                                       │
│   1. Cognitive overhead is minimal                        │
│      One codebase, one debugger, one log stream.          │
│      ev: GitLab ran a monolith to $400M ARR.              │
│   2. Refactoring is cheap                                  │
│      Function-level moves don't need API contracts.       │
│   3. Local dev loop is fast                                │
│      No service mesh, no docker-compose orchestra.        │
│                                                            │
│ Best fit:                                                  │
│   ✓ Team size under 15                                     │
│   ✓ Single product / single domain                         │
│   ✓ Velocity matters more than scale                       │
│                                                            │
│ Honest weaknesses:                                         │
│   ✗ One bad query can take down everything                 │
│   ✗ Deploy any change = deploy all changes                 │
└────────────────────────────────────────────────────────────┘

┌─ Microservices ───────────────────────────────────────────┐
│ Independent services that win on team autonomy at scale.  │
│                                                            │
│ Strongest arguments:                                       │
│   1. Teams can ship independently                          │
│      No coordination tax between unrelated changes.       │
│   2. Failure isolation                                     │
│      Auth bug doesn't take down the billing service.      │
│   3. Polyglot freedom                                      │
│      ML service in Python, edge service in Rust.          │
│                                                            │
│ Best fit:                                                  │
│   ✓ Teams of 20+ with distinct bounded contexts            │
│   ✓ Independent scaling requirements per service           │
│                                                            │
│ Honest weaknesses:                                         │
│   ✗ Distributed systems complexity tax                     │
│   ✗ Tracing a request across N services                    │
│   ✗ Network failures become a daily problem                │
└────────────────────────────────────────────────────────────┘

╔══════════════════════════════════════════════════════════╗
║  🏆 VERDICT                                              ║
║                                                           ║
║  Winner: Monolith                                         ║
║  Confidence: 87%                                          ║
║                                                           ║
║  At 5 engineers and 10k DAU, microservices solve a       ║
║  problem you don't have (team coordination) by creating  ║
║  ones you can't afford (distributed systems complexity). ║
║  The 20-minute deploy is a CI problem, not an            ║
║  architecture problem — split your test suite into       ║
║  parallel runners and you'll cut it to 4 minutes without ║
║  any architectural change.                                ║
║                                                           ║
║  Decisive factors:                                        ║
║    • Team size << microservices break-even point          ║
║    • Single bounded context (B2B SaaS)                    ║
║    • Actual pain (slow deploys) has cheaper solution      ║
║                                                           ║
║  When to reconsider:                                      ║
║    → Team grows past 20 engineers                         ║
║    → Distinct domains emerge (e.g. adding a marketplace)  ║
║    → Service-level scaling requirements diverge sharply   ║
╚══════════════════════════════════════════════════════════╝

🧠 Extended Thinking Excerpt:
"Let me actually weigh this for the specific context. 5 engineers
is well below the team size where microservices typically pay off
(usually 20+). But the real pain mentioned is 20-min deploys —
that's actually a CI problem masquerading as an architecture
problem. Before reaching for the more complex solution, I should
check whether..."

Why extended thinking matters here

Asking a normal LLM "monolith or microservices?" gives you whatever Hacker News currently believes. That's not analysis.

Extended thinking lets Opus 4.7:

Hold two positions in mind simultaneously and reason about each one fairly
Reason about the specific context instead of pattern-matching to the popular answer
Catch hidden assumptions ("the pain is deploys → maybe the real fix is CI, not architecture")
Calibrate confidence honestly — says 0.55 when it's actually close

A smaller model picks the popular answer. Opus 4.7 with extended thinking actually does the work.

Install

pip install debate-ai
export ANTHROPIC_API_KEY=sk-ant-...

Usage

CLI

# Basic
debate "REST vs GraphQL"

# With context (drastically improves verdict quality)
debate "Postgres vs Mongo" --context "Multi-tenant SaaS, 100 customers, mostly OLTP"

# Context from file
debate "Rewrite Rails app in Go" --context-file ./project-context.md

# More options (3+)
debate "AWS, GCP, or Cloudflare for a global edge API"

# Output formats
debate "Vue vs React" --output markdown > decision.md
debate "Vue vs React" --output json | jq .verdict.confidence

# More thinking
debate "Are we ready to move off Heroku" --thinking-budget 16000

Python

from anthropic import Anthropic
from debate import run_debate
from debate.report import print_terminal

client = Anthropic()

result = run_debate(
    "Kafka or Redis Streams for our event bus",
    client,
    context="50 events/sec peak, 3-month retention, 2 engineers maintaining it",
    thinking_budget=10000,
)

print_terminal(result)

# Or access structured data
print(f"Winner: {result.verdict.winner} ({result.verdict.confidence:.0%})")
for factor in result.verdict.decisive_factors:
    print(f"  → {factor}")

Use cases

Where	What
Tech RFCs	Pre-fill the "alternatives considered" section with steelmanned versions
Architecture reviews	Get an outside view before locking in a decision
Eng leadership	Help juniors see why an option wins, not just that it wins
Personal decisions	"Should I learn Rust or Zig next?" — actually analyzed
Pre-mortem	Run a debate, then deliberately implement the losing side as a thought experiment

Architecture

debate/
├── cli.py          # Click CLI — `debate "<question>"`
├── engine.py       # Opus 4.7 extended thinking orchestrator
├── types.py        # Pydantic models (Position, Verdict, DebateResult)
└── report.py       # Rich terminal + Markdown output

The engine does it all in one Opus 4.7 call with extended thinking — model identifies options, steelmans each, weighs them in the thinking blocks, and outputs structured JSON. No multi-step orchestration, no chained calls — extended thinking IS the orchestration.

License

Built because every "AWS vs GCP" thread on Hacker News loses thousands of engineer-hours to circular arguments. Now you can settle it in 30 seconds — with reasoning visible.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.github		.github
debate		debate
examples		examples
tests		tests
.gitignore		.gitignore
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

debate ⚖️

The problem

What it does

Demo

Why extended thinking matters here

Install

Usage

CLI

Python

Use cases

Architecture

License

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

debate ⚖️

The problem

What it does

Demo

Why extended thinking matters here

Install

Usage

CLI

Python

Use cases

Architecture

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages