Skip to content

bhupendra05/debate

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

debate ⚖️

Make Claude argue both sides of any technical decision — then decide. Uses Claude Opus 4.7's extended thinking to steelman every position, weigh trade-offs, and give a verdict with calibrated confidence.

Python Built on Claude Opus 4.7 License


The problem

Every engineering team wastes weeks on the same debates:

  • PostgreSQL or MongoDB?
  • Microservices or monolith?
  • REST or GraphQL?
  • Rewrite or refactor?
  • AWS, GCP, or Cloudflare?

The arguments are circular because nobody steelmans the other side. People defend the option they already prefer, attack a strawman of the alternative, and miss the second-order effects.

Asking an LLM "which is better?" just gives you the popular answer. That's not what you need — you need rigorous analysis of both sides for your specific context.

What it does

debate "PostgreSQL vs MongoDB" --context "Multi-tenant B2B SaaS, 50 customers, mostly read-heavy"

debate does three things in one Opus 4.7 extended-thinking pass:

  1. STEELMANS each position — best argument, contexts where it wins, honest weaknesses
  2. WEIGHS the trade-offs against your specific context
  3. DECIDES with confidence score + decisive factors + conditions to reconsider

Demo

$ debate "Microservices or monolith" \
    --context "5-person team, B2B SaaS, 10k DAU, 20-min deploys are killing us"
╔══════════════════════════════════════════════════════════╗
║  ⚖️  DEBATE                                              ║
║  Microservices or monolith                               ║
║  Context: 5-person team, B2B SaaS, 20-min deploys...     ║
╚══════════════════════════════════════════════════════════╝

┌─ Monolith ────────────────────────────────────────────────┐
│ A single deployable that wins on simplicity and velocity  │
│ at small team sizes.                                      │
│                                                            │
│ Strongest arguments:                                       │
│   1. Cognitive overhead is minimal                        │
│      One codebase, one debugger, one log stream.          │
│      ev: GitLab ran a monolith to $400M ARR.              │
│   2. Refactoring is cheap                                  │
│      Function-level moves don't need API contracts.       │
│   3. Local dev loop is fast                                │
│      No service mesh, no docker-compose orchestra.        │
│                                                            │
│ Best fit:                                                  │
│   ✓ Team size under 15                                     │
│   ✓ Single product / single domain                         │
│   ✓ Velocity matters more than scale                       │
│                                                            │
│ Honest weaknesses:                                         │
│   ✗ One bad query can take down everything                 │
│   ✗ Deploy any change = deploy all changes                 │
└────────────────────────────────────────────────────────────┘

┌─ Microservices ───────────────────────────────────────────┐
│ Independent services that win on team autonomy at scale.  │
│                                                            │
│ Strongest arguments:                                       │
│   1. Teams can ship independently                          │
│      No coordination tax between unrelated changes.       │
│   2. Failure isolation                                     │
│      Auth bug doesn't take down the billing service.      │
│   3. Polyglot freedom                                      │
│      ML service in Python, edge service in Rust.          │
│                                                            │
│ Best fit:                                                  │
│   ✓ Teams of 20+ with distinct bounded contexts            │
│   ✓ Independent scaling requirements per service           │
│                                                            │
│ Honest weaknesses:                                         │
│   ✗ Distributed systems complexity tax                     │
│   ✗ Tracing a request across N services                    │
│   ✗ Network failures become a daily problem                │
└────────────────────────────────────────────────────────────┘

╔══════════════════════════════════════════════════════════╗
║  🏆 VERDICT                                              ║
║                                                           ║
║  Winner: Monolith                                         ║
║  Confidence: 87%                                          ║
║                                                           ║
║  At 5 engineers and 10k DAU, microservices solve a       ║
║  problem you don't have (team coordination) by creating  ║
║  ones you can't afford (distributed systems complexity). ║
║  The 20-minute deploy is a CI problem, not an            ║
║  architecture problem — split your test suite into       ║
║  parallel runners and you'll cut it to 4 minutes without ║
║  any architectural change.                                ║
║                                                           ║
║  Decisive factors:                                        ║
║    • Team size << microservices break-even point          ║
║    • Single bounded context (B2B SaaS)                    ║
║    • Actual pain (slow deploys) has cheaper solution      ║
║                                                           ║
║  When to reconsider:                                      ║
║    → Team grows past 20 engineers                         ║
║    → Distinct domains emerge (e.g. adding a marketplace)  ║
║    → Service-level scaling requirements diverge sharply   ║
╚══════════════════════════════════════════════════════════╝

🧠 Extended Thinking Excerpt:
"Let me actually weigh this for the specific context. 5 engineers
is well below the team size where microservices typically pay off
(usually 20+). But the real pain mentioned is 20-min deploys —
that's actually a CI problem masquerading as an architecture
problem. Before reaching for the more complex solution, I should
check whether..."

Why extended thinking matters here

Asking a normal LLM "monolith or microservices?" gives you whatever Hacker News currently believes. That's not analysis.

Extended thinking lets Opus 4.7:

  • Hold two positions in mind simultaneously and reason about each one fairly
  • Reason about the specific context instead of pattern-matching to the popular answer
  • Catch hidden assumptions ("the pain is deploys → maybe the real fix is CI, not architecture")
  • Calibrate confidence honestly — says 0.55 when it's actually close

A smaller model picks the popular answer. Opus 4.7 with extended thinking actually does the work.

Install

pip install debate-ai
export ANTHROPIC_API_KEY=sk-ant-...

Usage

CLI

# Basic
debate "REST vs GraphQL"

# With context (drastically improves verdict quality)
debate "Postgres vs Mongo" --context "Multi-tenant SaaS, 100 customers, mostly OLTP"

# Context from file
debate "Rewrite Rails app in Go" --context-file ./project-context.md

# More options (3+)
debate "AWS, GCP, or Cloudflare for a global edge API"

# Output formats
debate "Vue vs React" --output markdown > decision.md
debate "Vue vs React" --output json | jq .verdict.confidence

# More thinking
debate "Are we ready to move off Heroku" --thinking-budget 16000

Python

from anthropic import Anthropic
from debate import run_debate
from debate.report import print_terminal

client = Anthropic()

result = run_debate(
    "Kafka or Redis Streams for our event bus",
    client,
    context="50 events/sec peak, 3-month retention, 2 engineers maintaining it",
    thinking_budget=10000,
)

print_terminal(result)

# Or access structured data
print(f"Winner: {result.verdict.winner} ({result.verdict.confidence:.0%})")
for factor in result.verdict.decisive_factors:
    print(f"  → {factor}")

Use cases

Where What
Tech RFCs Pre-fill the "alternatives considered" section with steelmanned versions
Architecture reviews Get an outside view before locking in a decision
Eng leadership Help juniors see why an option wins, not just that it wins
Personal decisions "Should I learn Rust or Zig next?" — actually analyzed
Pre-mortem Run a debate, then deliberately implement the losing side as a thought experiment

Architecture

debate/
├── cli.py          # Click CLI — `debate "<question>"`
├── engine.py       # Opus 4.7 extended thinking orchestrator
├── types.py        # Pydantic models (Position, Verdict, DebateResult)
└── report.py       # Rich terminal + Markdown output

The engine does it all in one Opus 4.7 call with extended thinking — model identifies options, steelmans each, weighs them in the thinking blocks, and outputs structured JSON. No multi-step orchestration, no chained calls — extended thinking IS the orchestration.

License

MIT © bhupendra05


Built because every "AWS vs GCP" thread on Hacker News loses thousands of engineer-hours to circular arguments. Now you can settle it in 30 seconds — with reasoning visible.

About

Steelman both sides of any technical decision using Claude Opus 4.7's extended thinking — then decide with confidence

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages