An autonomous multi-agent system that discovers research problems, writes academic papers, and iteratively finds optimal trading algorithms — all driven by Claude. Two modes, one CLI. Run it for an hour or a week; it keeps learning.
Point it at a topic. It discovers open problems, decomposes them into sub-questions, dispatches a swarm of Claude agents to research each one, runs experiments, generates figures, writes a LaTeX paper, runs a quality critic, compiles to PDF, and optionally publishes to ArXiv or Zenodo. Every paper also generates a blog post served via a Next.js web UI.
Point it at the crypto markets. It generates trading hypotheses, writes backtesting strategies, tests them against historical data, diagnoses failures, evolves better strategies, and repeats across N independent experiments. After each experiment it reflects on what worked, updates a persistent leaderboard, and extracts structured knowledge into an append-only knowledge graph — so every run makes the next run smarter.
**Research mode**

- Gather problems (Tavily + Claude)
- Evaluate viability
- Decompose into sub-problems
- Research swarm (parallel agents)
- Plan experiments
- Run code experiments + prototype
- Test & benchmark results
- Compare against prior work
- Generate figures (matplotlib)
- Write LaTeX paper
- Critic quality check
- Compile PDF
- Publish (ArXiv / Zenodo)
- Blog post → Next.js web UI

**Trading mode**

- Outer loop: N experiments
  - Discover hypothesis (KG + LB aware)
  - Develop strategy code
  - Inner loop: 5 iterations — Test (backtest) → Diagnose → Evolve
  - Reflect (Claude synthesizes learnings)
  - Update leaderboard + knowledge graph
- Kafka error bus
- Healer consumer (auto-fix broken code)
Every trading experiment produces two persistent artifacts that survive process restarts and inform future runs:
**Leaderboard** (`outputs/trading/leaderboard.json`) — ranks every experiment by composite score (Sharpe + 0.5 × CAGR). Before each new hypothesis is generated, Claude sees the top performers, recent failures, and which symbols have been tried, so it actively diversifies instead of repeating dead ends.
**Knowledge graph** (`outputs/trading/knowledge_graph.json`) — an append-only store of Subject-Predicate-Object triples extracted by Claude after every experiment. Compatible with ai-knowledge-graph for visualization. Example triples:
```
RSI oversold strategy  → fails during      → sustained bear markets
BTC/USDT 1d            → shows             → high mean-reversion in 2021
momentum crossover     → performs well in  → trending markets with low volatility
volume spike signal    → requires          → minimum 14-day warmup window
```
These triples are injected into every discovery prompt so hypothesis quality improves continuously across sessions.
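A minimal sketch of how these two artifacts could be read and extended. The JSON layouts and field names (`sharpe`, `cagr`, `subject`/`predicate`/`object`) are illustrative assumptions, not the project's actual schema; only the scoring formula and the append-only rule come from the description above.

```python
import json
from pathlib import Path


def composite_score(entry: dict) -> float:
    # Ranking metric described above: Sharpe + 0.5 * CAGR
    return entry["sharpe"] + 0.5 * entry["cagr"]


def top_performers(leaderboard_path: str, n: int = 3) -> list[dict]:
    # Best experiments first; this slice is what a discovery prompt would see
    entries = json.loads(Path(leaderboard_path).read_text())
    return sorted(entries, key=composite_score, reverse=True)[:n]


def append_triple(kg_path: str, subject: str, predicate: str, obj: str) -> None:
    # Append-only: existing triples are never mutated or deleted
    p = Path(kg_path)
    triples = json.loads(p.read_text()) if p.exists() else []
    triples.append({"subject": subject, "predicate": predicate, "object": obj})
    p.write_text(json.dumps(triples, indent=2))
```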
Every experiment produces a markdown wiki (`outputs/wiki/wiki_*.md`) logging the hypothesis, each test iteration's metrics, the diagnosis, and what the evolved strategy changed. Readable by humans, referenced by the pipeline.
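A hypothetical sketch of such a wiki writer. The section headings, the `wiki_exp{N}.md` filename, and the per-iteration fields are assumptions based on the description above, not the project's actual output format.

```python
from pathlib import Path


def write_wiki(exp: int, hypothesis: str, iterations: list[dict],
               out_dir: str = "outputs/wiki") -> Path:
    """Write one markdown log: hypothesis, then per-iteration metrics + diagnosis."""
    lines = [f"# Experiment {exp}", "", "## Hypothesis", hypothesis, ""]
    for i, it in enumerate(iterations, 1):
        lines += [
            f"## Iteration {i}",
            f"- Sharpe: {it['sharpe']}",
            f"- Diagnosis: {it['diagnosis']}",
            "",
        ]
    path = Path(out_dir) / f"wiki_exp{exp}.md"   # filename pattern is an assumption
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(lines))
    return path
```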
When a trading strategy crashes at runtime:
- The error is published to a `trading.errors` Kafka topic.
- The healer consumer picks it up, sends the broken file + full traceback to Claude, validates the fix, and writes the corrected code to disk.
- A success notification is published to `trading.fixes`.
- The pipeline resumes from the last checkpoint automatically.

The healer only touches files under `outputs/` — it can never corrupt pipeline source code.
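The publish half of this flow could be sketched as follows with kafka-python (the library named in the dependency table). Only the topic name `trading.errors` comes from the text above; the payload fields and helper names are illustrative assumptions.

```python
import json


def build_error_event(strategy_path: str, traceback_text: str) -> bytes:
    # Serialize a crash report for the trading.errors topic.
    # Field names are assumptions, not the project's actual message schema.
    return json.dumps({"file": strategy_path, "traceback": traceback_text}).encode()


def publish_error(strategy_path: str, traceback_text: str,
                  servers: str = "localhost:9092") -> None:
    # Import inside the function so the sketch loads without a broker installed.
    from kafka import KafkaProducer  # kafka-python

    producer = KafkaProducer(bootstrap_servers=servers)
    producer.send("trading.errors", build_error_event(strategy_path, traceback_text))
    producer.flush()  # block until the broker has the event
```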
A FastAPI server at http://localhost:8585 streams phase start/end/error events via Server-Sent Events while any pipeline is running. The blog web UI connects to it for live status.
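On the client side, consuming that stream mostly means parsing `data:` lines. A minimal sketch, assuming JSON payloads with `phase`/`status` fields — the exact event schema and endpoint path are assumptions, not documented behavior:

```python
import json
from typing import Iterator


def parse_sse(lines: Iterator[str]) -> Iterator[dict]:
    """Yield one decoded JSON payload per `data:` line of an SSE stream."""
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())


# Usage against a running monitor (endpoint path is hypothetical):
#   import urllib.request
#   with urllib.request.urlopen("http://localhost:8585/events") as resp:
#       for event in parse_sse(raw.decode() for raw in resp):
#           print(event.get("phase"), event.get("status"))
```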
Research papers and trading wiki notes are automatically saved to a local database and served as blog posts. Deploy the Next.js frontend and browse all outputs at http://localhost:3000.
```bash
# 1. Clone and create virtual environment
python3 -m venv .venv
source .venv/bin/activate

# 2. Install dependencies
pip install -r requirements.txt

# 3. Configure API keys, agent choice, and publish settings
python -m src.main --setup

# 4. Start Kafka (required for trading mode self-healing)
docker compose up -d kafka

# 5. Authenticate the Claude CLI (used as the agent subprocess)
claude login
```

Requirements: Python 3.11+, Docker (for Kafka), `claude` CLI on PATH.
`.env` file (created by `--setup` or manually):

```bash
ANTHROPIC_API_KEY=your_key
TAVILY_API_KEY=your_key
```

```bash
# Recommended: one command starts Kafka + healer + pipeline
./run.sh

# Custom experiment count (default: 10)
EXPERIMENTS=5 ./run.sh

# Or run directly
python -m src.main --mode trading --agent claude --experiments 10
```

The pipeline runs unattended through the specified number of experiments. Each experiment:
- Generates a new hypothesis (using prior knowledge graph + leaderboard to avoid repeats)
- Writes and backtests a strategy (up to 5 evolve iterations)
- Reflects and updates the knowledge graph + leaderboard
- Starts the next experiment fresh
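Schematically, the steps above nest an evolve loop inside the experiment loop. In this sketch the phase callables are injected placeholders for the real modules in `src/pipeline/trading/`, and the `target_score` stopping rule is an assumption:

```python
from typing import Callable


def run_experiments(
    discover: Callable[[], str],        # hypothesis generation
    develop: Callable[[str], str],      # strategy code generation
    backtest: Callable[[str], float],   # test phase, returns a score
    evolve: Callable[[str, float], str],  # improve strategy given its score
    n: int = 10,
    max_evolve: int = 5,
    target_score: float = 1.0,
) -> list[tuple[str, float]]:
    """Run n independent experiments; each evolves a strategy up to max_evolve times."""
    results = []
    for _ in range(n):
        strategy = develop(discover())      # fresh hypothesis each experiment
        score = backtest(strategy)
        for _ in range(max_evolve - 1):     # inner test -> diagnose -> evolve loop
            if score >= target_score:
                break
            strategy = evolve(strategy, score)
            score = backtest(strategy)
        results.append((strategy, score))   # would feed leaderboard + KG updates
    return results
```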
```bash
python -m src.main --mode research --topic "LLM efficiency and compression" --agent claude

# Limit scope for faster runs
python -m src.main --mode research --topic "attention mechanisms" --max-problems 2 --max-accepted 1
```

```bash
python -m src.main --deploy-blog
# Blog at http://localhost:3000
# API + monitor at http://localhost:8585
```

`python -m src.main [OPTIONS]`
```
Core
  --mode {research,trading}           Pipeline to run (default: research)
  --agent {qwen,claude}               Agent CLI to use (default: from config)
  --setup                             Interactive setup wizard

Research
  --topic TEXT                        Seed topic for problem discovery
  --max-problems N                    Cap problems to evaluate (default: all)
  --max-accepted N                    Cap accepted problems to process (default: all)
  --max-research N                    Cap parallel research sub-tasks (default: all)
  --publish {none,arxiv-pkg,zenodo}   Post-PDF publish action (default: none)

Trading
  --experiments N                     Independent experiments to run (default: 10)
  --auto-resume                       Resume active session without prompting

Blog
  --deploy-blog                       Start blog server only, skip pipeline
```
| Path | Description |
|---|---|
| `outputs/trading/strategy_exp{N}.py` | Best strategy from each experiment |
| `outputs/trading/leaderboard.json` | All experiments ranked by score |
| `outputs/trading/knowledge_graph.json` | Accumulated SPO triples across all runs |
| `outputs/wiki/wiki_*.md` | Per-experiment strategy development log |
| `outputs/{problem_id}/paper_draft.tex` | Generated LaTeX paper |
| `outputs/{problem_id}/*.pdf` | Compiled PDF |
| `outputs/metrics_report.json` | Token usage + timing per phase |
| `localhost:3000` | Blog web UI |
| `localhost:8585` | Real-time monitor / SSE stream |
```
src/
  main.py               Orchestrator + CLI entry point
  leaderboard.py        Trading experiment leaderboard
  knowledge_graph.py    Append-only SPO knowledge graph
  healer_consumer.py    Kafka-driven auto-heal service
  monitor.py            Real-time pipeline monitor (FastAPI + SSE)
  metrics.py            Token + timing metrics collector
  pipeline/
    trading/
      discover.py       Hypothesis generation (KG + leaderboard aware)
      develop.py        Strategy code generation
      test.py           Backtesting (investing_algorithm_framework)
      diagnose.py       Failure root-cause analysis
      evolve.py         LLM-driven strategy improvement
      reflect.py        Post-experiment synthesis
      wiki.py           Markdown log writer
      session.py        Checkpoint / resume state machine
      error_bus.py      Kafka publish + wait-for-fix
    gather.py           Problem discovery (Tavily + Claude)
    evaluate.py         Viability screening
    decompose.py        Sub-problem decomposition
    research.py         Parallel research agent swarm
    plan.py             Experiment planner
    code.py             Experimenter + prototyper
    test.py             Results evaluator
    compare.py          Baseline comparator
    write.py            LaTeX paper writer
    critic.py           Quality reviewer
    pdf.py              pdflatex compiler
blog_web/               Next.js blog frontend
outputs/                All generated artifacts
run.sh                  One-command startup (Kafka + healer + pipeline)
docker-compose.yml      Kafka service definition
```
| Component | Library |
|---|---|
| Agent runtime | claude-agent-sdk, anthropic |
| Web search | tavily-python |
| Academic search | semanticscholar, arxiv |
| Backtesting | investing-algorithm-framework |
| Error bus | Apache Kafka via kafka-python |
| API / monitor | FastAPI + uvicorn |
| Blog frontend | Next.js |
| Figures | matplotlib |
| Data validation | pydantic |
| Knowledge graph format | ai-knowledge-graph compatible |