An autonomous prediction market trading agent powered by LLM probability estimation. The agent scans Polymarket for mispriced markets, estimates probabilities using a multi-pass Claude pipeline, sizes positions with Kelly criterion, and executes trades or paper-trades against a local portfolio.
Status: Research / paper trading. No confirmed out-of-sample edge yet — see Research Findings.
┌──────────────────────────────────────────────────────────┐
│ Autonomous Agent │
│ │
│ ┌─────────────┐ ┌──────────────┐ ┌────────────────┐ │
│ │ Research │ │ Strategy │ │ Tactical │ │
│ │ (async) │ │ (daily) │ │ (15 min) │ │
│ │ │ │ │ │ │ │
│ │ Web search │ │ Scan markets │ │ Full analysis │ │
│ │ News/dossier│ │ Haiku screen │ │ Sonnet estimate│ │
│ │ Price history│ │ Candidates │ │ Kelly sizing │ │
│ └─────────────┘ └──────────────┘ │ Execute trade │ │
│ └────────────────┘ │
└──────────────────────────────────────────────────────────┘
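The two timed loops in the diagram can be pictured as interval checks against a clock. The sketch below is illustrative only: `Loop` and `tick` are hypothetical names, not the project's scheduler, and the event-driven research loop is left out because the diagram gives it no fixed cadence.

```python
from dataclasses import dataclass


@dataclass
class Loop:
    """A named job that should run once every `interval_s` seconds."""
    name: str
    interval_s: float
    last_run: float = float("-inf")  # never run yet, so due on the first tick

    def due(self, now: float) -> bool:
        return now - self.last_run >= self.interval_s


def tick(loops: list[Loop], now: float) -> list[str]:
    """Run every loop that is due at time `now` and return the names that ran."""
    ran = []
    for loop in loops:
        if loop.due(now):
            loop.last_run = now
            ran.append(loop.name)
    return ran


# Cadences taken from the diagram: strategy daily, tactical every 15 minutes.
loops = [Loop("strategy", 24 * 3600), Loop("tactical", 15 * 60)]
first = tick(loops, now=0.0)     # both loops run on startup
second = tick(loops, now=900.0)  # only the tactical loop is due
third = tick(loops, now=1200.0)  # nothing is due yet
print(first, second, third)
```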
Each market estimate goes through four passes:
- Base Rate — Historical/domain prior (Claude Sonnet), no market price
- Bayesian Update — Incorporates web-searched research dossier
- Adversarial — Stress-tests the estimate against counterarguments
- Calibration — Optional Brier score correction from historical performance
Haiku handles cheap market screening; Sonnet handles full analysis.
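As a structural sketch, the four passes can be viewed as functions chained over a running estimate. Everything below is an assumption made for illustration: the `Estimate` type, the function names, and the numeric transforms (an odds-form Bayes update, a shrink toward 0.5) stand in for LLM calls and are not the project's prompts or logic.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Estimate:
    """Running probability plus a trace of the passes that touched it."""
    probability: float
    trace: list[str] = field(default_factory=list)


def clamp(p: float) -> float:
    # Keep probabilities away from 0 and 1 so the odds stay finite.
    return min(max(p, 0.01), 0.99)


def base_rate(est: Estimate) -> Estimate:
    # Placeholder for the model call that returns a prior without seeing the price.
    return Estimate(clamp(est.probability), est.trace + ["base_rate"])


def bayesian_update(est: Estimate, likelihood_ratio: float = 2.0) -> Estimate:
    # Odds-form Bayes: posterior odds = prior odds * likelihood ratio.
    # The ratio is a hypothetical stand-in for what the research dossier implies.
    odds = est.probability / (1 - est.probability) * likelihood_ratio
    return Estimate(clamp(odds / (1 + odds)), est.trace + ["bayesian_update"])


def adversarial(est: Estimate, shrink: float = 0.2) -> Estimate:
    # Placeholder: pull the estimate toward 0.5 to reflect counterarguments.
    p = est.probability + shrink * (0.5 - est.probability)
    return Estimate(clamp(p), est.trace + ["adversarial"])


def calibration(est: Estimate, enabled: bool = False) -> Estimate:
    # Optional pass; a real version would apply a correction learned from history.
    if not enabled:
        return est
    return Estimate(est.probability, est.trace + ["calibration"])


PASSES: list[Callable[[Estimate], Estimate]] = [
    base_rate, bayesian_update, adversarial, calibration,
]


def run_pipeline(prior: float) -> Estimate:
    est = Estimate(prior)
    for step in PASSES:
        est = step(est)
    return est


result = run_pipeline(0.20)
print(result.trace, round(result.probability, 3))
```

Keeping each pass as a separate function makes it easy to log intermediate estimates or to switch the optional calibration pass on and off.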
| Database | Purpose |
|---|---|
| polymarket_agent.db | Live portfolio, trades, positions, predictions |
| backtest.db | 90K+ resolved markets, simulation runs, hypothesis tracker |
Both use SQLite with WAL mode.
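For reference, WAL mode is enabled per database file with a single pragma. The helper below is a minimal sketch using Python's standard `sqlite3` module; `connect_wal` and the temporary file name are hypothetical and not taken from the project.

```python
import sqlite3
import tempfile
from pathlib import Path


def connect_wal(db_path: Path) -> sqlite3.Connection:
    """Open a SQLite database file and switch it to write-ahead logging."""
    conn = sqlite3.connect(db_path)
    # The pragma returns the journal mode now in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        conn.close()
        raise RuntimeError(f"could not enable WAL, journal mode is {mode!r}")
    return conn


# Demonstrate on a throwaway file (in-memory databases cannot use WAL).
with tempfile.TemporaryDirectory() as tmp:
    conn = connect_wal(Path(tmp) / "example.db")
    journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    conn.close()
print(journal_mode)
```

WAL lets readers proceed while a writer is active, which suits a setup where a dashboard reads the same file the agent writes to.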
src/polymarket_agent/
analyst/ # LLM probability estimation (4-pass pipeline)
backtest/ # Historical DB, simulator, hypothesis tracker, domain classifier
cli/ # Click CLI entry point and commands
market/ # Gamma API, CLOB client, scanner, filter, storage
trading/ # Live executor, paper trading, Kelly sizing, calibration
risk/ # Kill switch, exposure limits
research/ # Async web search + domain data gathering
storage/ # Main DB schema, price snapshots
config.py # Pydantic settings (all config from .env)
models.py # Shared data models
src/polymarket_dashboard/
app.py # FastAPI app (7-tab UI, port 8050)
db.py # DashboardDB and BacktestDB reader classes
routes/ # portfolio, positions, calibration, operations,
# metrics, evaluation, hypotheses, data
- Python 3.12+
- uv (package manager)
- Anthropic API key (required)
- Tavily API key (required — web search for research)
- Polymarket private key + funder address (only for live trading)
# Clone and install core dependencies
git clone https://github.com/datori/market-edge.git
cd market-edge
uv sync
# Install dashboard dependencies (optional)
uv sync --extra dashboard

Copy .env.example to .env and fill in your keys:

cp .env.example .env
# Required
ANTHROPIC_API_KEY=sk-ant-...
TAVILY_API_KEY=tvly-...
# Live trading only
POLYMARKET_PRIVATE_KEY=0x...
POLYMARKET_FUNDER_ADDRESS=0x...
# Optional: Telegram alerts
TELEGRAM_BOT_TOKEN=
TELEGRAM_CHAT_ID=

Key defaults (configurable via .env):
- Screening model: claude-haiku-4-5-20251001
- Analysis model: claude-sonnet-4-6
- Kelly fraction: 0.5, max position: $50, max portfolio: $500
- Scan interval: 900s, analysis interval: 3600s
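To show how the Kelly fraction and the two caps interact, here is a sketch of fractional Kelly sizing for a binary contract that pays $1. It assumes fees are ignored, that the portfolio cap limits total open exposure, and that `bankroll` and `exposure` are supplied by the caller. `kelly_stake` is a hypothetical function, not the project's sizing code.

```python
def kelly_stake(
    p_yes: float,
    yes_price: float,
    bankroll: float,
    exposure: float = 0.0,
    kelly_fraction: float = 0.5,   # default from the list above
    max_position: float = 50.0,    # default from the list above
    max_portfolio: float = 500.0,  # default from the list above
) -> tuple[str, float]:
    """Return (side, dollar stake) for a binary market priced at `yes_price`."""
    if not 0.0 < yes_price < 1.0:
        raise ValueError("yes_price must be strictly between 0 and 1")
    if not 0.0 <= p_yes <= 1.0:
        raise ValueError("p_yes must be a probability")

    if p_yes > yes_price:
        # Buying YES at price c: full Kelly fraction is (p - c) / (1 - c).
        side, full_kelly = "YES", (p_yes - yes_price) / (1.0 - yes_price)
    elif p_yes < yes_price:
        # Buying NO at price 1 - c: full Kelly fraction is (c - p) / c.
        side, full_kelly = "NO", (yes_price - p_yes) / yes_price
    else:
        return "NONE", 0.0  # no perceived edge, no trade

    stake = kelly_fraction * full_kelly * bankroll
    remaining = max(max_portfolio - exposure, 0.0)  # room left under the portfolio cap
    return side, round(min(stake, max_position, remaining), 2)


print(kelly_stake(0.60, 0.50, bankroll=200.0))                  # half Kelly, under both caps
print(kelly_stake(0.30, 0.50, bankroll=1000.0))                 # hits the per-position cap
print(kelly_stake(0.30, 0.50, bankroll=1000.0, exposure=480.0)) # hits the portfolio cap
```

With a Kelly fraction of 0.5 the stake is half of what full Kelly would suggest, which reduces variance at the cost of slower growth if the probability estimates are accurate.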
# Show config and API key status
uv run polymarket config
# Scan for market opportunities
uv run polymarket scan
# Run the full autonomous agent loop
uv run polymarket run
# Paper trade a specific market
uv run polymarket analyze <market-id>
# Portfolio overview
uv run polymarket portfolio
# Kill switch (stops live trading)
uv run polymarket kill

uv run polymarket-dashboard --port 8050
# Open http://localhost:8050

Seven tabs: Portfolio · Positions · Calibration · Operations · Costs & Ops · Evaluation · Hypotheses
The backtest system requires a separate backtest.db populated from Polymarket's historical data.
# Collect resolved markets (requires Gamma API access)
uv run polymarket backtest collect
# Run a simulation against historical markets
uv run polymarket backtest simulate --regime post-claude4 -n 50
# Multi-model evaluation with budget gating
uv run polymarket backtest evaluate --models haiku sonnet --budget 10 --dry-run
# Domain classification (actuarial vs current_event)
uv run polymarket backtest classify-domains --stats
# Hypothesis tracker
uv run polymarket backtest hypothesis list
uv run polymarket backtest hypothesis propose --name my-hypothesis --description "..."
uv run polymarket backtest hypothesis test <id>

Seven experiments run across 22+ simulation runs (hundreds of trials). Full write-up in ideas/when-llm-leads.md and THESIS.md.
| Signal | Brier score (lower is better) |
|---|---|
| LLM base rate (no market price) | 0.2505 — near random |
| LLM anchored final estimate | 0.0261 |
| LLM blind estimate (no anchor) | 0.0905 — 3.5× worse |
| Market price | 0.0155 |
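The LLM acts as a calibration layer on top of the market price, not as an independent forecaster. Given the current market price, its final estimates score 0.0261; without the price, its blind estimates score 0.0905, about 3.5× worse. Neither beats the market price itself (0.0155), which is consistent with the status above: no confirmed edge.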
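For readers unfamiliar with the metric, the Brier score is the mean squared error between forecast probabilities and binary outcomes. The short sketch below uses made-up outcomes, not data from the backtest, and shows why a score near 0.25 counts as near random: always forecasting 0.5 scores exactly 0.25.

```python
def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and equal length")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


# Illustrative outcomes only (1 = resolved YES, 0 = resolved NO).
outcomes = [1, 0, 0, 1, 0, 1, 0, 0]

coin_flip = brier_score([0.5] * len(outcomes), outcomes)
confident = brier_score([0.9, 0.1, 0.1, 0.9, 0.1, 0.9, 0.1, 0.1], outcomes)

print(coin_flip)            # 0.25 for any set of outcomes
print(round(confident, 4))  # about 0.01 when every confident call is right
```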
Seismic frequency markets showed the strongest apparent signal (+0.014 Brier, n=53 trials): the LLM correctly applies global Poisson base rates (~12–15 M7.0+ earthquakes/year) while the market appeared to underprice intermediate time windows.
However, this signal is primarily a contamination artifact: all 25 seismic markets in the backtest DB resolve in 2025, entirely within the LLM's training window. The edge monotonically decays and reverses past the training cutoff.
What remains genuinely unknown: Whether markets structurally underprice earthquake frequency due to recency bias — independent of LLM training contamination. This requires prospective collection of seismic markets resolving in 2026+.
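The Poisson base rate mentioned above can be reproduced by hand. The sketch below computes the probability of at least `k` events in a time window, assuming a constant global rate and independent events. The 12 to 15 per year range comes from the text, while `prob_at_least` and the 30-day window are illustrative choices.

```python
import math


def prob_at_least(k: int, rate_per_year: float, window_days: float) -> float:
    """P(N >= k) for a Poisson count with the given annual rate over a window."""
    if k < 0 or rate_per_year < 0 or window_days < 0:
        raise ValueError("k, rate_per_year and window_days must be non-negative")
    lam = rate_per_year * window_days / 365.0  # expected events in the window
    # Sum P(N = i) for i < k, then take the complement.
    below_k = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - below_k


# Bracket the estimate with both ends of the quoted rate range.
for rate in (12.0, 15.0):
    for k in (1, 2, 3):
        p = prob_at_least(k, rate, window_days=30)
        print(f"rate={rate:.0f}/yr  P(at least {k} in 30 days) = {p:.3f}")
```

Comparing these numbers with the market price for the same window is the kind of check the seismic hypothesis rests on. It does not address the contamination problem described above.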
The system includes a structured experimentation framework (backtest hypothesis) for tracking alpha hypotheses through a lifecycle: proposed → testing → confirmed/rejected. Confidence decays with a 90-day half-life, triggering retests. Four seed hypotheses are pre-loaded.
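A 90-day half-life means confidence halves every 90 days since the last test. The sketch below shows the arithmetic. The function names and the 0.5 retest threshold are assumptions for illustration, since the source does not say what level triggers a retest.

```python
def decayed_confidence(initial: float, days_since_test: float,
                       half_life_days: float = 90.0) -> float:
    """Exponential decay: confidence halves every `half_life_days`."""
    if days_since_test < 0 or half_life_days <= 0:
        raise ValueError("days_since_test must be >= 0 and half_life_days > 0")
    return initial * 0.5 ** (days_since_test / half_life_days)


def needs_retest(initial: float, days_since_test: float,
                 threshold: float = 0.5) -> bool:
    # Hypothetical rule: retest once decayed confidence falls below the threshold.
    return decayed_confidence(initial, days_since_test) < threshold


print(decayed_confidence(0.8, 0))    # 0.8
print(decayed_confidence(0.8, 90))   # 0.4 after one half-life
print(decayed_confidence(0.8, 180))  # 0.2 after two half-lives
print(needs_retest(0.8, 30), needs_retest(0.8, 90))
```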
uv run pytest tests/ # All 254 tests
uv run pytest tests/test_backtest.py -v # Backtest DB, simulator, hypothesis tracker
uv run pytest tests/test_integration.py -v # CLI, market pipeline, trading logic

All tests use tmp_path fixtures; no external services required (API calls are mocked).
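A test in this style might look like the sketch below, which combines pytest's built-in `tmp_path` fixture with `unittest.mock.Mock` in place of an API client. `record_estimate`, the `predictions` table layout, and the client's `estimate` method are hypothetical and do not come from the project's test suite.

```python
import sqlite3
from pathlib import Path
from unittest.mock import Mock


def record_estimate(db_path: Path, market_id: str, client) -> float:
    """Hypothetical helper: ask a client for a probability and persist it."""
    probability = client.estimate(market_id)
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS predictions (market_id TEXT, probability REAL)"
        )
        conn.execute("INSERT INTO predictions VALUES (?, ?)", (market_id, probability))
        conn.commit()
    finally:
        conn.close()
    return probability


def test_record_estimate(tmp_path: Path) -> None:
    # pytest injects tmp_path as a fresh directory for each test.
    client = Mock()
    client.estimate.return_value = 0.42  # stands in for the real API call
    db_path = tmp_path / "agent.db"

    assert record_estimate(db_path, "market-1", client) == 0.42
    client.estimate.assert_called_once_with("market-1")

    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT market_id, probability FROM predictions").fetchall()
    conn.close()
    assert rows == [("market-1", 0.42)]
```

Because the database lives under `tmp_path` and the client is a mock, the test touches neither the real portfolio database nor the network.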
uv run ruff check src/
uv run ruff check --fix src/

This is an experimental research project. It does not constitute financial advice. Paper trading mode is the default and recommended for experimentation. Live trading requires explicit configuration and carries real financial risk.