Self-improving multi-agent development system built on Claude Agent SDK. Give it an idea — it designs, builds, tests, and deploys autonomously, then optimizes its own agent prompts via hill-climbing.
The system runs your project through a phased lifecycle, each handled by specialized AI agents:
idea → ideation → specification → architecture → environment-setup
→ development → testing → review → deployment → A/B testing → monitoring
Key features:
- Phase-based orchestration — each phase produces artifacts consumed by the next
- Dynamic agent factory — creates domain-specific agents on the fly (e.g., a "quant researcher" for fintech projects)
- Stack auto-discovery — detects your tech stack and configures LSP servers, MCP servers, and Claude Code plugins automatically
- Self-improvement engine — benchmarks agent performance and evolves prompts via mutation + hill-climbing optimization
- Git worktree sandbox — evaluates mutations in isolated worktrees so your working directory stays clean
- Checkpoint recovery — state persisted to disk after each phase, resume any time
- Node.js 20+
- Claude Code subscription (Pro, Max, or Team) — the system uses
@anthropic-ai/claude-agent-sdkwhich runs through Claude Code's authentication, no separate API key needed
git clone https://github.com/aerbaser/autonomous-dev-system.git
cd autonomous-dev-system
npm install
npm run buildThe system is designed to run inside Claude Code. Open the project directory in Claude Code and ask it to start development:
# Open the project in Claude Code
cd autonomous-dev-system
claude
# Then in Claude Code, say:
# "Run autonomous-dev with idea: Build a real-time collaborative todo app"Or run directly — Claude Code's SDK handles auth automatically when invoked within its context:
npx autonomous-dev run --idea "Build a real-time collaborative todo app with WebSocket sync"
# Check project status
npx autonomous-dev status
# Resume a previously started project (auto-detects state)
npx autonomous-dev run --idea "..."
# Run a specific phase
npx autonomous-dev phase --name testing
# Run self-improvement optimization
npx autonomous-dev optimize --max-iterations 10# Uses tsx for direct TypeScript execution
npm run dev -- run --idea "Build a CLI tool for managing bookmarks"The system looks for config in .autonomous-dev/config.json, or you can pass --config path/to/config.json:
{
"model": "claude-opus-4-6",
"subagentModel": "claude-sonnet-4-6",
"projectDir": ".",
"stateDir": ".autonomous-dev",
"selfImprove": {
"enabled": true,
"maxIterations": 50
},
"deployTarget": {
"provider": "vercel",
"config": {}
}
}Environment variables (all optional):
GITHUB_TOKEN— for GitHub integrationsSLACK_WEBHOOK_URL— for Slack notificationsPOSTHOG_API_KEY— for analytics
Note: No
ANTHROPIC_API_KEYneeded — the system uses Claude Agent SDK which authenticates through your Claude Code subscription.
src/
├── index.ts # CLI entry point (commander)
├── orchestrator.ts # Main phase loop with retry + checkpoints
├── phases/ # Phase handlers (one per lifecycle stage)
│ ├── types.ts # PhaseResult, PhaseHandler types
│ ├── ideation.ts # Idea → structured spec
│ ├── architecture.ts # Spec → tech stack + architecture design
│ ├── environment-setup.ts # Auto-configure LSP, MCP, plugins (parallel)
│ ├── development.ts # Facade → development-runner
│ ├── development-runner.ts # Task decomposition → parallel dev agents
│ ├── development-types.ts # Development phase types
│ ├── testing.ts # Structured output: TestingResultSchema
│ ├── review.ts # Structured output: ReviewResultSchema
│ ├── deployment.ts # Structured output: DeploymentResultSchema
│ ├── ab-testing.ts # A/B test design + analysis
│ └── monitoring.ts # Structured output: MonitoringResultSchema
├── agents/ # Agent management
│ ├── base-blueprints.ts # 7 base agents + getBaseAgentNames()
│ ├── factory.ts # Dynamic domain-specific agent creation
│ ├── registry.ts # Blueprint storage + performance tracking
│ ├── domain-analyzer.ts # Domain classification via LLM
│ └── stack-researcher.ts # Tech stack analysis via LLM
├── self-improve/ # Self-improvement engine
│ ├── optimizer.ts # Main optimization loop (facade)
│ ├── optimizer-runner.ts # Hill-climbing implementation
│ ├── mutation-engine.ts # Prompt/config mutation strategies
│ ├── benchmarks.ts # Benchmark suite runner
│ ├── convergence.ts # Stagnation/plateau detection
│ ├── sandbox.ts # Process + git worktree isolation
│ ├── verifiers.ts # Deterministic + LLM-judged verification
│ └── versioning.ts # Prompt version history
├── environment/ # Stack discovery + configuration
│ ├── lsp-manager.ts # LSP server install (async)
│ ├── mcp-manager.ts # MCP server configuration
│ ├── plugin-manager.ts # Plugin install (async)
│ ├── oss-scanner.ts # Open-source tool scanner
│ ├── claude-md-generator.ts # CLAUDE.md generation
│ └── validator.ts # Input validation for LSP/MCP/plugins
├── hooks/ # Claude Code hook handlers
│ ├── quality-gate.ts # Lint check on TaskCompleted (async)
│ ├── security.ts # Command/path deny-list enforcement
│ ├── idle-handler.ts # Idle agent management
│ ├── audit-logger.ts # Operation audit trail (JSONL)
│ ├── notifications.ts # Slack/webhook alerts
│ └── improvement-tracker.ts # Tool usage metrics (TTL-evicted)
├── state/ # Persistent state management
│ ├── project-state.ts # Immutable state, ALL_PHASES, phase transitions
│ └── session-store.ts # Session persistence
├── types/
│ └── llm-schemas.ts # Zod schemas for all JSON parsing + structured output
└── utils/ # Shared utilities
├── shared.ts # extractFirstJson, isApiRetry, isRecord, errMsg, wrapUserInput
├── config.ts # Zod-validated config loading
├── retry.ts # Exponential backoff retry
├── sdk-helpers.ts # consumeQuery, getQueryPermissions, getMaxTurns
├── progress.ts # Typed EventEmitter for phase progress
└── templates.ts # Prompt templates
benchmarks/ # External benchmark definitions
├── code-quality/tasks.json
├── test-generation/tasks.json
├── spec-completeness/tasks.json
├── architecture-quality/tasks.json
└── domain-specific/README.md
tests/ # 193 tests across 29 files
| Command | Description |
|---|---|
autonomous-dev run --idea "..." |
Start autonomous development |
autonomous-dev status |
Show project state |
autonomous-dev phase --name <phase> |
Run specific phase |
autonomous-dev optimize |
Run self-improvement loop |
npm run build # Compile TypeScript
npm run dev # Run with tsx (no build)
npm run test # Run all tests
npm run test:watch # Watch mode
npm run typecheck # Type checking
npm run lint # ESLintThe optimizer evolves agent prompts through a benchmark-driven loop:
- Benchmark current agent performance (code quality, test generation, spec completeness, architecture, build success)
- Mutate the worst-performing agent's prompt/config
- Evaluate the mutation (optionally in an isolated git worktree)
- Accept if score improved, rollback if not
- Repeat until convergence (stagnation or plateau detected)
Custom benchmarks can be added to benchmarks/<category>/tasks.json.
The system is production-ready. All phases implemented, 193 tests passing, input sanitization via XML delimiters, full Zod schema validation, cost tracking across all phases, and ESLint enforced in CI. See TODO.md for remaining low-priority items.
MIT