AutoDev AI

A multi-agent AI system that takes a plain-English idea and produces a working, tested, deployable application.

$ python main.py "Build a URL shortener with click analytics"

Planner     -> Designed Flask + SQLite architecture
Builder     -> Wrote 6 source files
Critic      -> Caught 2 critical bugs before execution
TestWriter  -> Generated 8 pytest tests as a spec
Executor    -> App started successfully
Tester      -> 8/8 tests passed
Deployer    -> Dockerfile + README + .env.example created

COMPLETE in 47 seconds

Overview

AutoDev AI is not a code generator. It is a self-correcting engineering system — a pipeline of specialised AI agents that collaborate through a structured LangGraph workflow to build, review, run, debug, test, and deploy software autonomously.

Code Generator	AutoDev AI
Writes code once	Writes, reviews, fixes, retries
Bugs discovered at runtime	Critic catches issues before execution
No memory of past failures	Memory layer learns from every error
You write the tests	Tests generated early as a functional spec
You handle deployment	Produces Dockerfile and README automatically

Web UI

The project ships with a full web interface built on React, React Flow, TailwindCSS, Zustand, and Framer Motion.

Without an API key: the UI runs in demo mode, replaying a realistic simulation of the full pipeline with animated nodes, streaming logs, and reasoning traces.

With your Anthropic API key: click the key icon in the header, enter your sk-ant-... key, and the UI connects directly to the FastAPI backend. The pipeline then runs live and streams real agent events to your browser.

Running the UI

1. Start the backend server

cd autodev
pip install -r requirements.txt
uvicorn server:app --reload --port 8000

2. Start the frontend

cd autodev-ui
npm install
npm run dev

Open http://localhost:5173. Enter your idea, optionally add your Anthropic API key via the key icon, and click Run Live or Demo.

How the connection works

The frontend opens a WebSocket to ws://localhost:8000/ws and sends:

{ "idea": "Build a URL shortener", "api_key": "sk-ant-..." }

The server creates an Anthropic client with your key, runs the LangGraph pipeline in a background thread, and streams events back as JSON:

{ "type": "agent_start", "agent": "Planner" }
{ "type": "log",         "agent": "Planner", "message": "Analyzing idea..." }
{ "type": "agent_done",  "agent": "Planner", "status": "success", "trace": [...] }
{ "type": "complete" }

The UI maps these events to animated node states, live log entries, and reasoning trace panels in real time.
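Here is a small Python sketch of a client-side reducer that makes the mapping concrete. It is not the frontend's code, which lives in useStore.js. The node-state dictionary and the choice to ignore unknown event types are assumptions made for illustration.

```python
from typing import Any


def apply_event(nodes: dict[str, str], logs: list[str], event: dict[str, Any]) -> bool:
    """Apply one server event to a hypothetical client view.

    nodes maps agent name -> current state; logs collects log lines.
    Returns True once the pipeline reports completion.
    """
    kind = event.get("type")
    agent = event.get("agent")
    if kind == "agent_start" and agent:
        nodes[agent] = "running"
    elif kind == "log" and agent:
        logs.append(f"[{agent}] {event.get('message', '')}")
    elif kind == "agent_done" and agent:
        # Store whatever status string the server sent with agent_done.
        nodes[agent] = event.get("status", "success")
    elif kind == "complete":
        return True
    # Unknown event types are ignored so a newer server doesn't break an older client.
    return False
```

On the client, each WebSocket message would be decoded with `json.loads` and passed through `apply_event` until it returns True.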

Your API key is never persisted. It is held in browser memory for the duration of the session and sent only over the WebSocket connection to your local server.

Architecture

START
  |
Planner       adaptive thinking -> structured JSON plan
  |
Builder       tool calls -> writes all source files
  |
Critic   --(issues found)--> Builder   (max 2 cycles)
  |
  +--(approved)-->
                 TestWriter    writes tests as a spec before execution
                     |
                 Executor      pip install -> run
                     |
          +----------+----------+
          |                     |
       success               failure
          |                     |
       Tester              Debugger    memory + escalation logic
          |                     |
       Deployer                 +--> Executor (retry loop)
          |
       Git (optional: --github)
          |
         END

The graph has two real back-edges managed by LangGraph — critic -> builder and debugger -> executor — forming proper retry loops. All agents share a typed StateGraph state so every decision is auditable.

Agents

Planner

Uses adaptive thinking on claude-opus-4-6 to reason through architecture trade-offs before producing a strict JSON plan covering tech stack, file list, entry point, dependencies, and setup steps. Every downstream agent builds from this single source of truth.
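Here is a sketch of how a downstream step might validate the plan before trusting it. The key names (`tech_stack`, `files`, `entry_point`, `dependencies`, `setup_steps`) are hypothetical stand-ins for the fields the Planner section lists. The rule that the entry point must be one of the planned files is an added assumption.

```python
import json

# Hypothetical key names and expected types for the plan fields.
REQUIRED_KEYS = {
    "tech_stack": list,
    "files": list,
    "entry_point": str,
    "dependencies": list,
    "setup_steps": list,
}


def parse_plan(raw: str) -> dict:
    """Parse the Planner's JSON output and fail fast if a field is missing or mistyped."""
    plan = json.loads(raw)  # raises json.JSONDecodeError on non-JSON output
    if not isinstance(plan, dict):
        raise ValueError("plan must be a JSON object")
    for key, expected in REQUIRED_KEYS.items():
        if key not in plan:
            raise ValueError(f"plan is missing {key!r}")
        if not isinstance(plan[key], expected):
            raise ValueError(f"{key!r} should be a {expected.__name__}")
    # Assumed extra rule: the entry point must be one of the planned files.
    if plan["entry_point"] not in plan["files"]:
        raise ValueError("entry_point must be one of the planned files")
    return plan
```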

Builder

Equipped with create_file, read_file, and list_files tools. Claude decides what to write — there are no templates. On subsequent passes it accepts a structured issue list from the Critic and applies targeted fixes before rewriting.
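The three tools can be described using the name/description/input_schema shape that the Anthropic Messages API uses for tool definitions. This sketch pairs illustrative definitions with a dispatcher over an in-memory dict. The descriptions, argument names, and the `run_tool` helper are assumptions. The real Builder writes to disk through the workspace layer.

```python
from typing import Any

# Illustrative tool definitions in the Messages API shape (name / description / input_schema).
TOOLS = [
    {
        "name": "create_file",
        "description": "Create or overwrite a file in the project workspace.",
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string"}, "content": {"type": "string"}},
            "required": ["path", "content"],
        },
    },
    {
        "name": "read_file",
        "description": "Return the contents of a workspace file.",
        "input_schema": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
    {
        "name": "list_files",
        "description": "List every file currently in the workspace.",
        "input_schema": {"type": "object", "properties": {}},
    },
]


def run_tool(files: dict[str, str], name: str, args: dict[str, Any]) -> str:
    """Execute one tool call against an in-memory workspace and return the text result."""
    if name == "create_file":
        files[args["path"]] = args["content"]
        return f"wrote {args['path']} ({len(args['content'])} chars)"
    if name == "read_file":
        return files.get(args["path"], f"error: {args['path']} does not exist")
    if name == "list_files":
        return "\n".join(sorted(files))
    return f"error: unknown tool {name!r}"
```

Returning errors as text rather than raising lets the model see the failure in its next turn and correct course.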

Critic

Reviews all generated code before it runs. Catches missing imports, undefined variables, incorrect framework usage, database tables used before creation, and path assumptions that break at runtime. Returns APPROVED or NEEDS_FIXES with a structured issue list. Critical issues route back to the Builder. Because these problems are caught before the first run, they never cost a debug cycle.
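Here is a sketch of how the routing check could look. It assumes the Critic replies with JSON containing a `verdict` and a list of `issues` tagged by severity. That format is an assumption: the README specifies only APPROVED/NEEDS_FIXES and a structured issue list. `Review`, `parse_review`, and `needs_rebuild` are hypothetical names.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Review:
    verdict: str                                      # "APPROVED" or "NEEDS_FIXES"
    issues: list[dict] = field(default_factory=list)  # e.g. {"file", "severity", "message"}

    @property
    def critical(self) -> list[dict]:
        return [i for i in self.issues if i.get("severity") == "critical"]


def parse_review(raw: str) -> Review:
    """Parse a Critic response in the assumed JSON format."""
    data = json.loads(raw)
    verdict = data.get("verdict", "")
    if verdict not in ("APPROVED", "NEEDS_FIXES"):
        raise ValueError(f"unexpected verdict {verdict!r}")
    return Review(verdict, data.get("issues", []))


def needs_rebuild(review: Review) -> bool:
    # Only critical issues send the code back to the Builder.
    return review.verdict == "NEEDS_FIXES" and bool(review.critical)
```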

TestWriter

Writes tests/test_main.py before the app is executed. These tests act as a functional spec. When the Debugger runs, it receives the test file alongside the runtime error — it knows what the code was supposed to do, not just that it crashed.

Executor

Deterministic runner. Installs pip packages, runs setup steps, then launches the entry point. Returns a structured result with stdout, stderr, exit code, and duration. A timeout is treated as a successful start, because server processes are expected to keep running rather than exit.

Debugger

The most sophisticated agent in the system.

Memory injection — queries MemoryStore for past errors matching the current failure. If the same error was fixed in a previous session, Claude sees what worked.
Error fingerprinting — hashes the root traceback line to detect repeated failures across iterations.
Escalation — same error twice triggers a full rewrite of the affected file. Same error three times triggers a rewrite of all core files.
Persistence — records every error and fix in memory for future projects.
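The fingerprinting and escalation rules might be sketched as follows. Several details are assumptions: taking the last non-empty traceback line as the "root" line, hashing it with SHA-256, and truncating to six hex characters to match the keys in the memory file. The action names are hypothetical labels.

```python
import hashlib


def fingerprint(traceback_text: str, length: int = 6) -> str:
    """Hash the final exception line of a traceback, e.g. "ModuleNotFoundError: ..."."""
    lines = [ln.strip() for ln in traceback_text.strip().splitlines() if ln.strip()]
    root = lines[-1] if lines else ""
    return hashlib.sha256(root.encode()).hexdigest()[:length]


def escalation(times_seen: int) -> str:
    """Map how often an error has recurred to the Debugger's response."""
    if times_seen >= 3:
        return "rewrite_core_files"     # same error three times: rewrite all core files
    if times_seen == 2:
        return "rewrite_affected_file"  # same error twice: full rewrite of that file
    return "targeted_fix"
```

Hashing only the exception line means that a change in line numbers or file paths higher up the traceback does not hide a repeat of the same failure.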

Tester

Runs the spec written by TestWriter and reports pass/fail counts, which are surfaced in the final summary.
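If the Tester reads pytest's console output rather than relying only on its exit code (an assumption), it can pull the counts from the final summary line. `parse_pytest_summary` is a hypothetical helper.

```python
import re

# Counts in pytest's final summary line, e.g. "==== 7 passed, 1 failed in 0.42s ====".
_COUNT = re.compile(r"(\d+) (passed|failed|errors?|skipped)")


def parse_pytest_summary(output: str) -> dict[str, int]:
    counts = {"passed": 0, "failed": 0, "error": 0, "skipped": 0}
    lines = [ln for ln in output.splitlines() if ln.strip()]
    if not lines:
        return counts
    for number, label in _COUNT.findall(lines[-1]):
        # pytest prints "1 error" but "2 errors"; fold both into one key.
        key = "error" if label.startswith("error") else label
        counts[key] += int(number)
    return counts
```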

Deployer

Generates production-ready artifacts: Dockerfile, docker-compose.yml, project README.md, and .env.example.

GitAgent

Runs git init, writes .gitignore, and commits. With --github and a token (from --github-token or $GITHUB_TOKEN), it also creates a public GitHub repository via the REST API and pushes to it.
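Creating the repository takes a single authenticated POST to GitHub's `/user/repos` endpoint. This sketch only builds the request, using the standard library. `create_repo_request` is a hypothetical helper, and the push step that follows (adding a remote and running `git push`) is not shown.

```python
import json
import urllib.request


def create_repo_request(name: str, token: str, description: str = "") -> urllib.request.Request:
    """Build (but do not send) the GitHub REST call that creates a public repo for the token's user."""
    body = json.dumps({"name": name, "description": description, "private": False}).encode()
    return urllib.request.Request(
        "https://api.github.com/user/repos",
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending the request with `urllib.request.urlopen(req)` returns the new repository as JSON. Its `clone_url` field can then serve as the push remote.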

Memory

workspaces/.memory/<project>.json

{
  "error_patterns": {
    "a3f9c1": {
      "error_snippet": "ModuleNotFoundError: No module named 'flask'",
      "count": 2,
      "fixes": [
        {
          "summary": "Added flask to requirements.txt",
          "files": ["requirements.txt"],
          "outcome": "resolved"
        }
      ]
    }
  }
}

Human-readable JSON. Persists across runs. The Debugger's effectiveness improves with use — past fixes are injected directly into the prompt when a matching error is detected.

CLI Installation

git clone https://github.com/Shrithu10/AutoDev-AI.git
cd AutoDev-AI/autodev

pip install -r requirements.txt

export ANTHROPIC_API_KEY=sk-ant-...

Requirements: Python 3.11+

CLI Usage

# Build from an idea
python main.py "Build a URL shortener with click analytics"

# Increase the debug budget
python main.py "Build a REST API for a blog with authentication" --max-iter 8

# Show Claude's thinking blocks as they stream
python main.py "Build a CLI expense tracker" --demo

# Push to GitHub after build
python main.py "Build a weather dashboard" --github --github-token ghp_xxx

# Interactive prompt
python main.py --interactive

Flags

Flag	Default	Description
`idea`	—	The app to build
`--max-iter N`	`6`	Max build/debug cycles
`--demo`	off	Stream Claude's thinking blocks live
`--github`	off	Auto-push to GitHub after deploy
`--github-token`	`$GITHUB_TOKEN`	GitHub personal access token
`--interactive`	off	Prompt for idea at runtime
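The flag table maps onto `argparse` roughly as follows. This is a sketch reconstructed from the table, not main.py itself. The rule requiring either an idea or --interactive is an assumption.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="main.py", description="Build an app from a plain-English idea.")
    p.add_argument("idea", nargs="?", help="The app to build")
    p.add_argument("--max-iter", type=int, default=6, help="Max build/debug cycles")
    p.add_argument("--demo", action="store_true", help="Stream Claude's thinking blocks live")
    p.add_argument("--github", action="store_true", help="Auto-push to GitHub after deploy")
    p.add_argument("--github-token", default=os.environ.get("GITHUB_TOKEN"),
                   help="GitHub personal access token (defaults to $GITHUB_TOKEN)")
    p.add_argument("--interactive", action="store_true", help="Prompt for idea at runtime")
    return p


def parse(argv: list[str]) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumed rule: an idea is required unless it will be prompted for.
    if not args.idea and not args.interactive:
        parser.error("provide an idea or use --interactive")  # exits with status 2
    return args
```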

Example prompts

"URL shortener with click analytics and a dashboard"
"REST API for a blog with posts, comments, and authentication"
"CLI todo app with SQLite and due dates"
"File organiser that sorts a downloads folder by file type"
"Simple key-value store with a REST interface"
"CSV report generator from a SQLite database"

Project structure

autodev/                        # Python backend
├── agents/
│   ├── base_agent.py           # Streaming tool-use loop, event emission
│   ├── planner_agent.py        # JSON plan via adaptive thinking
│   ├── builder_agent.py        # File-writing tools
│   ├── critic_agent.py         # Static review before execution
│   ├── test_writer_agent.py    # Early test spec
│   ├── executor_agent.py       # Subprocess runner
│   ├── debug_agent.py          # Memory-augmented debugger
│   ├── tester_agent.py         # Runs test spec
│   ├── deployment_agent.py     # Dockerfile and README generation
│   └── git_agent.py            # Git commit and GitHub push
├── core/
│   ├── orchestrator.py         # LangGraph StateGraph + run_pipeline()
│   ├── state.py                # Typed shared state
│   ├── workspace.py            # Path-safe file I/O
│   ├── memory.py               # Persistent error learning
│   └── events.py               # Thread-local WebSocket event bus
├── tools/
│   └── code_runner.py          # Subprocess execution
├── ui/
│   └── terminal.py             # Terminal output
├── server.py                   # FastAPI WebSocket server
├── main.py                     # CLI entry point
├── config.py
└── requirements.txt

autodev-ui/                     # React frontend
├── src/
│   ├── components/
│   │   ├── AgentGraph.jsx      # React Flow pipeline visualisation
│   │   ├── LogPanel.jsx        # Live log stream
│   │   ├── TracePanel.jsx      # Agent reasoning trace
│   │   └── InputSection.jsx    # Idea input + API key field
│   ├── store/
│   │   └── useStore.js         # Zustand state + WebSocket client
│   ├── data/
│   │   └── simulation.js       # Demo mode data
│   └── App.jsx
├── .env.example
└── package.json

Design decisions

LangGraph for orchestration. The retry loops are real graph back-edges, not Python control flow. Conditional edges read from state to make routing decisions. This makes the system inspectable and checkpointable.

Tool-use for file writes. Agents write files via structured tool calls, not string output parsing. Every write is logged and path-traversal-safe. Claude decides the content; the harness controls where it lands.

Adaptive thinking on every call. thinking: {type: "adaptive"} lets Claude decide how much to reason per request — minimal for simple tool calls, deep for architecture and debugging. In --demo mode the thinking streams to the terminal.

Tests before execution. Writing tests before running the app gives the Debugger a functional spec. The fix target becomes precise rather than speculative.

User-supplied API key. The WebSocket server accepts the Anthropic API key per-request. No key is stored server-side. The thread-local event bus (core/events.py) routes events from agents back to the WebSocket stream without coupling agent code to network concerns.

Demo mode without a key. The frontend works fully without a backend. If no valid API key is provided, the UI replays a pre-scripted simulation of the pipeline, making it easy to explore the interface before committing to a live run.

License

MIT
