A multi-agent AI system that takes a plain-English idea and produces a working, tested, deployable application.
$ python main.py "Build a URL shortener with click analytics"
Planner -> Designed Flask + SQLite architecture
Builder -> Wrote 6 source files
Critic -> Caught 2 critical bugs before execution
TestWriter -> Generated 8 pytest tests as a spec
Executor -> App started successfully
Tester -> 8/8 tests passed
Deployer -> Dockerfile + README + .env.example created
COMPLETE in 47 seconds
AutoDev AI is not a code generator. It is a self-correcting engineering system — a pipeline of specialised AI agents that collaborate through a structured LangGraph workflow to build, review, run, debug, test, and deploy software autonomously.
| Code Generator | AutoDev AI |
|---|---|
| Writes code once | Writes, reviews, fixes, retries |
| Bugs discovered at runtime | Critic catches issues before execution |
| No memory of past failures | Memory layer learns from every error |
| You write the tests | Tests generated early as a functional spec |
| You handle deployment | Produces Dockerfile and README automatically |
The project ships with a full web interface built on React, React Flow, TailwindCSS, Zustand, and Framer Motion.
Without an API key, the UI runs in demo mode, replaying a realistic simulation of the full pipeline with animated nodes, streaming logs, and reasoning traces.
With your Anthropic API key, click the key icon in the header, enter your sk-ant-... key, and the UI connects directly to the FastAPI backend. The pipeline then runs live and streams real agent events to your browser.
1. Start the backend server
cd autodev
pip install -r requirements.txt
uvicorn server:app --reload --port 8000
2. Start the frontend
cd autodev-ui
npm install
npm run dev
Open http://localhost:5173. Enter your idea, optionally add your Anthropic API key via the key icon, and click Run Live or Demo.
The frontend opens a WebSocket to ws://localhost:8000/ws and sends:
{ "idea": "Build a URL shortener", "api_key": "sk-ant-..." }
The server creates an Anthropic client with your key, runs the LangGraph pipeline in a background thread, and streams events back as JSON:
{ "type": "agent_start", "agent": "Planner" }
{ "type": "log", "agent": "Planner", "message": "Analyzing idea..." }
{ "type": "agent_done", "agent": "Planner", "status": "success", "trace": [...] }
{ "type": "complete" }
The UI maps these events to animated node states, live log entries, and reasoning trace panels in real time.
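The event-to-state mapping described above can be sketched as a small reducer. The real client lives in the React frontend (useStore.js); this Python version only illustrates the idea. The names `new_state` and `apply_event` and the shape of the state dict are assumptions made for the example, not part of the project.

```python
import json


def new_state() -> dict:
    """Hypothetical UI state: one status per agent, a flat log, and traces."""
    return {"nodes": {}, "logs": [], "traces": {}, "finished": False}


def apply_event(state: dict, raw: str) -> dict:
    """Fold one JSON event from the WebSocket stream into the state dict."""
    event = json.loads(raw)
    kind = event["type"]
    agent = event.get("agent")
    if kind == "agent_start":
        state["nodes"][agent] = "running"
    elif kind == "log":
        state["logs"].append(f"[{agent}] {event['message']}")
    elif kind == "agent_done":
        # The status reported by the server becomes the node's final state.
        state["nodes"][agent] = event["status"]
        state["traces"][agent] = event.get("trace", [])
    elif kind == "complete":
        state["finished"] = True
    return state
```

Feeding the four example messages through `apply_event` in order leaves the Planner node in the `success` state with one log entry and `finished` set to true.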
Your API key is never stored anywhere — it stays in browser memory for the duration of the session and is sent only over the WebSocket connection to your local server.
START
  |
Planner        adaptive thinking -> structured JSON plan
  |
Builder        tool calls -> writes all source files
  |
Critic  --(issues found)--> Builder (max 2 cycles)
  |
  +--(approved)-->
TestWriter     writes tests as a spec before execution
  |
Executor       pip install -> run
  |
  +----------+----------+
  |                     |
success              failure
  |                     |
Tester               Debugger    memory + escalation logic
  |                     |
Deployer             +--> Executor (retry loop)
  |
Git (optional: --github)
  |
END
The graph has two real back-edges managed by LangGraph — critic -> builder and debugger -> executor — forming proper retry loops. All agents share a typed StateGraph state so every decision is auditable.
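To make the routing concrete, here is a minimal sketch of the two decision points as plain functions over the shared state. The state keys (`critic_verdict`, `critic_cycles`, `exit_ok`, `debug_iterations`, `max_iter`), the node names, and the behaviour when a budget runs out are all assumptions for illustration; the project's actual state schema in core/state.py may differ.

```python
from typing import TypedDict

MAX_CRITIC_CYCLES = 2  # taken from the diagram: "max 2 cycles"


class PipelineState(TypedDict, total=False):
    """Hypothetical subset of the shared state used for routing."""
    critic_verdict: str      # "APPROVED" or "NEEDS_FIXES"
    critic_cycles: int       # how many Critic -> Builder round trips so far
    exit_ok: bool            # did the Executor report a successful start?
    debug_iterations: int    # how many Debugger passes so far
    max_iter: int            # debug budget (the --max-iter flag)


def route_after_critic(state: PipelineState) -> str:
    """Back-edge to the Builder while fixes are needed and cycles remain."""
    needs_fixes = state.get("critic_verdict") == "NEEDS_FIXES"
    if needs_fixes and state.get("critic_cycles", 0) < MAX_CRITIC_CYCLES:
        return "builder"
    # Assumption: once the cycle budget is spent, the pipeline moves on.
    return "test_writer"


def route_after_executor(state: PipelineState) -> str:
    """Success goes to the Tester; failure goes to the Debugger while budget remains."""
    if state.get("exit_ok"):
        return "tester"
    if state.get("debug_iterations", 0) < state.get("max_iter", 6):
        return "debugger"
    # Assumption: an exhausted debug budget ends the run.
    return "end"
```

In LangGraph, functions of this shape are what `StateGraph.add_conditional_edges` takes as its routing callable. Because they read only from state, each routing decision can be replayed from a saved state snapshot.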
Planner. Uses adaptive thinking on claude-opus-4-6 to reason through architecture trade-offs before producing a strict JSON plan covering tech stack, file list, entry point, dependencies, and setup steps. Every downstream agent builds from this single source of truth.
Builder. Equipped with create_file, read_file, and list_files tools. Claude decides what to write; there are no templates. On subsequent passes it accepts a structured issue list from the Critic and applies targeted fixes before rewriting.
Critic. Reviews all generated code before it runs. Catches missing imports, undefined variables, incorrect framework usage, database tables used before creation, and path assumptions that break at runtime. Returns APPROVED or NEEDS_FIXES with a structured issue list. Critical issues route back to the Builder. Catching these issues before the first run saves debug iterations that would otherwise be spent on them.
TestWriter. Writes tests/test_main.py before the app is executed. These tests act as a functional spec. When the Debugger runs, it receives the test file alongside the runtime error, so it knows what the code was supposed to do, not just that it crashed.
Executor. Deterministic runner. Installs pip packages, runs setup steps, then launches the entry point. Returns a structured result with stdout, stderr, exit code, and duration. A timeout is treated as a successful start, since a server process is expected to keep running rather than exit on its own.
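The timeout rule can be sketched with the standard library's `subprocess` module. This is an illustration under stated assumptions, not the project's code_runner.py: the names `RunResult` and `run_entry_point`, the `started` field, and the default timeout are all hypothetical.

```python
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RunResult:
    stdout: str
    stderr: str
    exit_code: Optional[int]  # None when the process was still running at timeout
    duration: float
    started: bool             # True on exit code 0 or on timeout


def _as_text(value) -> str:
    """TimeoutExpired may carry bytes or None; normalise to str."""
    if value is None:
        return ""
    return value.decode(errors="replace") if isinstance(value, bytes) else value


def run_entry_point(cmd: Sequence[str], timeout: float = 10.0) -> RunResult:
    """Run a command; a timeout counts as a successful start (server still up)."""
    t0 = time.monotonic()
    try:
        proc = subprocess.run(list(cmd), capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired as exc:
        # subprocess.run has already killed the child at this point.
        return RunResult(_as_text(exc.stdout), _as_text(exc.stderr), None,
                         time.monotonic() - t0, True)
    return RunResult(proc.stdout, proc.stderr, proc.returncode,
                     time.monotonic() - t0, proc.returncode == 0)
```

For example, `run_entry_point([sys.executable, "app.py"], timeout=5)` would report `started=True` for a web server that is still listening after five seconds, and `started=False` with the exit code and stderr for a script that crashes on import.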
Debugger. The most sophisticated agent in the system.
- Memory injection: queries MemoryStore for past errors matching the current failure. If the same error was fixed in a previous session, Claude sees what worked.
- Error fingerprinting: hashes the root traceback line to detect repeated failures across iterations.
- Escalation: the same error twice triggers a full rewrite of the affected file. The same error three times triggers a rewrite of all core files.
- Persistence: records every error and fix in memory for future projects.
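Fingerprinting and escalation from the list above can be sketched in a few lines. The details here are assumptions: the "root traceback line" is taken to be the final non-empty line of the traceback, the hash is SHA-256 truncated to six hex characters, and the function names and action labels are hypothetical.

```python
import hashlib


def fingerprint(traceback_text: str) -> str:
    """Hash the last non-empty traceback line (assumed to be the root error)."""
    lines = [ln.strip() for ln in traceback_text.splitlines() if ln.strip()]
    root = lines[-1] if lines else ""
    return hashlib.sha256(root.encode("utf-8")).hexdigest()[:6]


def next_action(seen: dict, fp: str) -> str:
    """Count this occurrence of the fingerprint and pick an escalation level."""
    seen[fp] = seen.get(fp, 0) + 1
    if seen[fp] >= 3:
        return "rewrite_all_core_files"
    if seen[fp] == 2:
        return "rewrite_affected_file"
    return "targeted_fix"
```

Hashing only the final line means two tracebacks that differ in file paths or line numbers but end in the same exception message share a fingerprint, which is what lets repeated failures be detected across iterations.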
Tester. Runs the spec written by TestWriter and reports pass/fail counts, which are surfaced in the final summary.
Deployer. Generates production-ready artifacts: Dockerfile, docker-compose.yml, project README.md, and .env.example.
Git. Runs git init, writes .gitignore, and commits. With --github --github-token, creates a public GitHub repository via the REST API and pushes.
workspaces/.memory/<project>.json
{
"error_patterns": {
"a3f9c1": {
"error_snippet": "ModuleNotFoundError: No module named 'flask'",
"count": 2,
"fixes": [
{
"summary": "Added flask import to app.py",
"files": ["app.py"],
"outcome": "resolved"
}
]
}
}
}
Human-readable JSON. Persists across runs. The Debugger's effectiveness improves with use: past fixes are injected directly into the prompt when a matching error is detected.
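A store with this file layout could be read and updated roughly as follows. This is a sketch that mirrors the JSON structure shown above; the function names `load_memory`, `record_fix`, and `past_fixes` are hypothetical and are not the API of core/memory.py.

```python
import json
from pathlib import Path


def load_memory(path: Path) -> dict:
    """Return the stored memory, or an empty structure on first use."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"error_patterns": {}}


def record_fix(path: Path, fp: str, snippet: str,
               summary: str, files: list, outcome: str) -> dict:
    """Append one fix under the error fingerprint and write the file back."""
    memory = load_memory(path)
    entry = memory["error_patterns"].setdefault(
        fp, {"error_snippet": snippet, "count": 0, "fixes": []}
    )
    entry["count"] += 1
    entry["fixes"].append({"summary": summary, "files": files, "outcome": outcome})
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(memory, indent=2), encoding="utf-8")
    return memory


def past_fixes(path: Path, fp: str) -> list:
    """Fixes previously recorded for this fingerprint (empty if unseen)."""
    return load_memory(path)["error_patterns"].get(fp, {}).get("fixes", [])
```

The list returned by `past_fixes` is the kind of material that would be injected into the Debugger prompt when a fingerprint matches.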
git clone https://github.com/Shrithu10/AutoDev-AI.git
cd AutoDev-AI/autodev
pip install -r requirements.txt
export ANTHROPIC_API_KEY=sk-ant-...
Requirements: Python 3.11+
# Build from an idea
python main.py "Build a URL shortener with click analytics"
# Increase the debug budget
python main.py "Build a REST API for a blog with authentication" --max-iter 8
# Show Claude's thinking blocks as they stream
python main.py "Build a CLI expense tracker" --demo
# Push to GitHub after build
python main.py "Build a weather dashboard" --github --github-token ghp_xxx
# Interactive prompt
python main.py --interactive

| Flag | Default | Description |
|---|---|---|
| idea | (none) | The app to build |
| --max-iter N | 6 | Max build/debug cycles |
| --demo | off | Stream Claude's thinking blocks live |
| --github | off | Auto-push to GitHub after deploy |
| --github-token | $GITHUB_TOKEN | GitHub personal access token |
| --interactive | off | Prompt for idea at runtime |

- "URL shortener with click analytics and a dashboard"
- "REST API for a blog with posts, comments, and authentication"
- "CLI todo app with SQLite and due dates"
- "File organiser that sorts a downloads folder by file type"
- "Simple key-value store with a REST interface"
- "CSV report generator from a SQLite database"
autodev/ # Python backend
├── agents/
│ ├── base_agent.py # Streaming tool-use loop, event emission
│ ├── planner_agent.py # JSON plan via adaptive thinking
│ ├── builder_agent.py # File-writing tools
│ ├── critic_agent.py # Static review before execution
│ ├── test_writer_agent.py # Early test spec
│ ├── executor_agent.py # Subprocess runner
│ ├── debug_agent.py # Memory-augmented debugger
│ ├── tester_agent.py # Runs test spec
│ ├── deployment_agent.py # Dockerfile and README generation
│ └── git_agent.py # Git commit and GitHub push
├── core/
│ ├── orchestrator.py # LangGraph StateGraph + run_pipeline()
│ ├── state.py # Typed shared state
│ ├── workspace.py # Path-safe file I/O
│ ├── memory.py # Persistent error learning
│ └── events.py # Thread-local WebSocket event bus
├── tools/
│ └── code_runner.py # Subprocess execution
├── ui/
│ └── terminal.py # Terminal output
├── server.py # FastAPI WebSocket server
├── main.py # CLI entry point
├── config.py
└── requirements.txt
autodev-ui/ # React frontend
├── src/
│ ├── components/
│ │ ├── AgentGraph.jsx # React Flow pipeline visualisation
│ │ ├── LogPanel.jsx # Live log stream
│ │ ├── TracePanel.jsx # Agent reasoning trace
│ │ └── InputSection.jsx # Idea input + API key field
│ ├── store/
│ │ └── useStore.js # Zustand state + WebSocket client
│ ├── data/
│ │ └── simulation.js # Demo mode data
│ └── App.jsx
├── .env.example
└── package.json
LangGraph for orchestration. The retry loops are real graph back-edges, not Python control flow. Conditional edges read from state to make routing decisions. This makes the system inspectable and checkpointable.
Tool-use for file writes. Agents write files via structured tool calls, not string output parsing. Every write is logged and path-traversal-safe. Claude decides the content; the harness controls where it lands.
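One common way to make such writes path-traversal-safe is to resolve the requested path and check that it still sits under the workspace root. The sketch below shows that technique with `pathlib`; the names `safe_write` and `WorkspaceError` are hypothetical, and the project's core/workspace.py may enforce the rule differently.

```python
from pathlib import Path


class WorkspaceError(ValueError):
    """Raised when a requested path would land outside the workspace."""


def safe_write(workspace: Path, relative: str, content: str) -> Path:
    """Write content under workspace, rejecting any path that escapes it."""
    root = workspace.resolve()
    # resolve() collapses ".." segments; joining an absolute path replaces root.
    target = (root / relative).resolve()
    if root not in target.parents:
        raise WorkspaceError(f"refusing to write outside workspace: {relative}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return target
```

With this check, a tool call asking for `app/main.py` succeeds, while `../escape.py` or an absolute path raises `WorkspaceError` before anything touches the disk.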
Adaptive thinking on every call. thinking: {type: "adaptive"} lets Claude decide how much to reason per request: minimal for simple tool calls, deep for architecture and debugging. In --demo mode the thinking streams to the terminal.
Tests before execution. Writing tests before running the app gives the Debugger a functional spec. The fix target becomes precise rather than speculative.
User-supplied API key. The WebSocket server accepts the Anthropic API key per-request. No key is stored server-side. The thread-local event bus (core/events.py) routes events from agents back to the WebSocket stream without coupling agent code to network concerns.
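A thread-local event bus of this kind can be sketched with `threading.local` and a queue per pipeline run. The names `bind`, `emit`, and `run_in_thread` are assumptions for the example and are not taken from core/events.py.

```python
import queue
import threading

_local = threading.local()  # each pipeline thread sees its own attributes


def bind(q: queue.Queue) -> None:
    """Attach an output queue to the current thread."""
    _local.queue = q


def emit(event: dict) -> None:
    """Called by agent code; a no-op when no queue is bound (e.g. CLI runs)."""
    q = getattr(_local, "queue", None)
    if q is not None:
        q.put(event)


def run_in_thread(target, q: queue.Queue) -> threading.Thread:
    """Run one pipeline in a background thread with its events routed to q."""
    def worker():
        bind(q)
        try:
            target()
        finally:
            q.put({"type": "complete"})
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

Agent code only ever calls `emit`, so it never sees a socket. A WebSocket handler would own the queue, drain it, and forward each event to the client, and two concurrent sessions stay isolated because each thread is bound to its own queue.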
Demo mode without a key. The frontend works fully without a backend. If no valid API key is provided, the UI replays a pre-scripted simulation of the pipeline, making it easy to explore the interface before committing to a live run.
MIT