AutoForgeAI · leonvanzyl · Apr 11, 2026 · Apr 11, 2026
diff --git a/.claude/templates/auto_improve_prompt.template.md b/.claude/templates/auto_improve_prompt.template.md
@@ -0,0 +1,160 @@
+## YOUR ROLE - AUTO-IMPROVE AGENT
+
+You are running in **auto-improve mode**. Your entire job this session is to make the application **meaningfully better** in exactly ONE way. The project is already finished — all existing features pass. You are here to polish, enhance, and evolve it.
+
+This is a FRESH context window. You have no memory of previous sessions. Previous auto-improve sessions may have already added improvements. Your job is to pick ONE new improvement, implement it, and commit it.
+
+### STEP 1: GET YOUR BEARINGS
+
+Start by orienting yourself:
+
+```bash
+# Understand the project
+pwd
+ls -la
+cat app_spec.txt 2>/dev/null || cat .autoforge/prompts/app_spec.txt 2>/dev/null
+
+# See what's been done recently (previous auto-improvements, other commits)
+git log --oneline -20
+
+# See recent progress notes if they exist
+tail -200 claude-progress.txt 2>/dev/null || true
+```
+
+Then use MCP tools to check feature status:
+
+```
+Use the feature_get_stats tool
+Use the feature_get_summary tool
+```
+
+You are looking at an app that someone is running in "autopilot polish" mode. Respect what is already there. Read some of the actual source to get a feel for the codebase.
+
+### STEP 2: CHOOSE ONE MEANINGFUL IMPROVEMENT
+
+Brainstorm silently, then pick exactly ONE improvement. Valid categories:
+
+- **Performance** — cache a hot path, remove an N+1, memoize an expensive component, debounce a noisy handler
+- **UX / UI polish** — empty states, loading states, error states, keyboard shortcuts, micro-interactions, accessibility
+- **Visual design** — spacing, typography, color hierarchy, alignment, iconography
+- **Small new feature** — a natural next step that fits the app's purpose
+- **Security hardening** — input validation, authorization checks, rate limits, secret handling
+- **Refactor for clarity** — extract a confused function, rename a misleading variable, split a file that has outgrown itself
+- **Accessibility** — focus rings, aria-labels, keyboard navigation, color contrast
+- **Dependency / config** — bump a safe dep, tighten a lint rule that would catch a real class of bugs
+
+**Choose deliberately:**
+- The improvement must be genuinely useful to an end user or to future developers.
+- Prefer improvements that complement what's already there over inventing new scope.
+- If the app has obvious rough edges, fix those first before inventing new features.
+- Do NOT touch any feature on the Kanban that is currently `in_progress` — leave it alone.
+- Avoid duplicating past improvements (read `git log` to see what's already been done).
+
+### STEP 3: ADD THE IMPROVEMENT AS A FEATURE
+
+Call the `feature_create` MCP tool with:
+
+- `category`: e.g., `"Performance"`, `"UX Polish"`, `"Security"`, `"Refactor"`, `"Accessibility"`, `"New Feature"`
+- `name`: a short imperative title, e.g., `"Add empty state to project list"`
+- `description`: 1-3 sentences explaining what the change is and why it matters
+- `steps`: 3-5 concrete acceptance steps (what must be true when this is done)
+
+**Record the returned feature ID.** You will use it in later steps. Then mark it in progress:
+
+```
+Use the feature_mark_in_progress tool with feature_id={your_new_id}
+```
+
+### STEP 4: IMPLEMENT THE IMPROVEMENT
+
+Implement the change fully. Keep scope tight:
+
+- Edit only the files you need to change.
+- Don't add speculative abstractions or "while I'm here" refactors.
+- Don't add comments/docstrings to code you didn't touch.
+- Don't rename things that don't need renaming.
+- If you discover a bug that is NOT your chosen improvement, leave it alone (or note it in `claude-progress.txt` for a future session).
+
+If your improvement is a UI change, actually look at the result — take a screenshot with `playwright-cli` if the dev server is running, or at minimum open the relevant component and verify your edit makes sense.
+
+### STEP 5: VERIFY WITH LINT / TYPECHECK / BUILD
+
+**Mandatory.** Before committing, confirm the code still compiles cleanly. Pick the right commands based on the project type (check `package.json`, `pyproject.toml`, `Cargo.toml`, etc.).
+
+Typical command sets:
+
+- **Node / TypeScript / Vite / Next**: `npm run lint && npm run build`
+  (or `npm run typecheck` if it exists as a separate script)
+- **Python**: `ruff check . && mypy .` (or whatever is configured in `pyproject.toml`)
+- **Rust**: `cargo check && cargo clippy`
+- **Go**: `go vet ./... && go build ./...`
+
+**Resolve any issues your change introduced.** If lint/typecheck/build was already failing before your change (unrelated breakage), do NOT "fix" the unrelated failures — that's scope creep. Revert your change and pick a different improvement if the codebase is in a broken baseline state.
+
+### STEP 6: MARK THE FEATURE PASSING
+
+Call the feature MCP tool:
+
+```
+Use the feature_mark_passing tool with feature_id={your_new_id}
+```
+
+### STEP 7: CREATE A COMMIT
+
+Stage your changes and commit with a **short, concise, TLDR-style message**. One line for the subject, optionally one or two more for the "why". No verbose bullet lists, no trailing summaries.
+
+```bash
+git status
+git add <specific files you changed>
+git commit -m "Add empty state to project list when no projects exist"
+```
+
+Good commit message examples:
+- `"Cache project stats query to cut dashboard load time"`
+- `"Add keyboard shortcut (Cmd+K) to open command palette"`
+- `"Harden upload endpoint against oversized files"`
+- `"Extract confused session handling into its own module"`
+
+Bad commit message examples:
+- `"Various improvements"` (too vague)
+- `"Made the app better by implementing several changes to improve UX including..."` (too long)
+
+### STEP 8: EXIT THIS SESSION
+
+When the commit is created successfully, your work for this session is done. Do NOT try to find a second improvement — one per session is the rule. Stop and let the next scheduled tick handle the next improvement.
+
+---
+
+## GUARDRAILS (READ CAREFULLY)
+
+1. **One improvement per session.** If you finish early, don't start another. Exit cleanly.
+2. **Never skip lint / typecheck / build.** If they fail, fix or revert.
+3. **Never commit broken code.** A commit with failing lint/build is worse than no commit.
+4. **Don't touch features other agents are working on** (anything with `in_progress=True`).
+5. **Don't bypass the feature MCP tools.** Create a real Kanban feature for your change so it shows up in the UI.
+6. **Keep commit messages under 72 characters for the subject line.**
+7. **Don't add dependencies you don't need.** If the improvement needs a new package, be sure it's justified.
+8. **Respect the existing architecture.** Don't rewrite patterns the project has already committed to.
+
+---
+
+## BROWSER AUTOMATION (OPTIONAL)
+
+If your improvement is visual and the dev server is running, you may use `playwright-cli` to verify it renders correctly:
+
+- Open: `playwright-cli open http://localhost:PORT`
+- Screenshot: `playwright-cli screenshot`
+- Read the screenshot file to verify visual appearance
+- Close: `playwright-cli close`
+
+Browser verification is **optional** in auto-improve mode. Lint + typecheck + build is mandatory; visual verification is a bonus when relevant.
+
+---
+
+## SUCCESS CRITERIA
+
+A successful auto-improve session ends with:
+1. One new feature on the Kanban, marked passing.
+2. A clean git commit with a short TLDR message.
+3. No lint / typecheck / build errors introduced.
+4. The agent exits cleanly without starting a second improvement.
diff --git a/agent.py b/agent.py
@@ -31,6 +31,7 @@
 )
 from prompts import (
     copy_spec_to_project,
+    get_auto_improve_prompt,
     get_batch_feature_prompt,
     get_coding_prompt,
     get_initializer_prompt,
@@ -163,6 +164,7 @@ async def run_autonomous_agent(
     agent_type: Optional[str] = None,
     testing_feature_id: Optional[int] = None,
     testing_feature_ids: Optional[list[int]] = None,
+    auto_improve: bool = False,
 ) -> None:
     """
     Run the autonomous agent loop.
@@ -177,6 +179,9 @@ async def run_autonomous_agent(
         agent_type: Type of agent: "initializer", "coding", "testing", or None (auto-detect)
         testing_feature_id: For testing agents, the pre-claimed feature ID to test (legacy single mode)
         testing_feature_ids: For testing agents, list of feature IDs to batch test
+        auto_improve: If True, run in auto-improve mode (agent creates one
+            improvement feature, implements it, commits, and exits). Takes
+            precedence over other prompt selection branches.
     """
     print("\n" + "=" * 70)
     print("  AUTONOMOUS CODING AGENT")
@@ -185,6 +190,8 @@ async def run_autonomous_agent(
     print(f"Model: {model}")
     if agent_type:
         print(f"Agent type: {agent_type}")
+    if auto_improve:
+        print("Mode: AUTO-IMPROVE (one improvement + commit per session)")
     if yolo_mode:
         print("Mode: YOLO (testing agents disabled)")
     if feature_ids and len(feature_ids) > 1:
@@ -240,7 +247,8 @@ async def run_autonomous_agent(
 
         # Check if all features are already complete (before starting a new session)
         # Skip this check if running as initializer (needs to create features first)
-        if not is_initializer and iteration == 1:
+        # or auto-improve mode (intentionally runs against finished projects)
+        if not is_initializer and not auto_improve and iteration == 1:
             passing, in_progress, total, _nhi = count_passing_tests(project_dir)
             if total > 0 and passing == total:
                 print("\n" + "=" * 70)
@@ -262,7 +270,11 @@ async def run_autonomous_agent(
         client = create_client(project_dir, model, yolo_mode=yolo_mode, agent_type=agent_type)
 
         # Choose prompt based on agent type
-        if agent_type == "initializer":
+        # auto_improve takes precedence over other branches — it's a distinct
+        # mode where the agent creates its own feature before implementing it.
+        if auto_improve:
+            prompt = get_auto_improve_prompt(project_dir, yolo_mode=yolo_mode)
+        elif agent_type == "initializer":
             prompt = get_initializer_prompt(project_dir)
         elif agent_type == "testing":
             prompt = get_testing_prompt(project_dir, testing_feature_id, testing_feature_ids)

diff --git a/autonomous_agent_demo.py b/autonomous_agent_demo.py
@@ -186,6 +186,17 @@ def parse_args() -> argparse.Namespace:
         help="Max features per coding agent batch (1-15, default: 3)",
     )
 
+    parser.add_argument(
+        "--auto-improve",
+        action="store_true",
+        default=False,
+        help=(
+            "Run in auto-improve mode: a single agent session that analyses "
+            "the codebase, creates one improvement feature, implements it, "
+            "verifies with lint/typecheck/build, commits, and exits."
+        ),
+    )
+
     return parser.parse_args()
 
 
@@ -262,7 +273,22 @@ def main() -> None:
             return
 
     try:
-        if args.agent_type:
+        if args.auto_improve:
+            # Auto-improve mode: single agent session, one improvement per run.
+            # Bypasses the parallel orchestrator entirely — auto-improve is
+            # always single-agent, single-feature, and exits after one commit.
+            print("[AUTO-IMPROVE] Starting single-session improvement run...", flush=True)
+            asyncio.run(
+                run_autonomous_agent(
+                    project_dir=project_dir,
+                    model=args.model,
+                    max_iterations=1,
+                    yolo_mode=args.yolo,
+                    agent_type="coding",
+                    auto_improve=True,
+                )
+            )
+        elif args.agent_type:
             # Subprocess mode - spawned by orchestrator for a specific role
             asyncio.run(
                 run_autonomous_agent(

diff --git a/prompts.py b/prompts.py
@@ -151,6 +151,30 @@ def get_coding_prompt(project_dir: Path | None = None, yolo_mode: bool = False)
     return prompt
 
 
+def get_auto_improve_prompt(project_dir: Path | None = None, yolo_mode: bool = False) -> str:
+    """Load the auto-improve agent prompt (project-specific if available).
+
+    The auto-improve prompt instructs the agent to analyze an already-finished
+    project, pick ONE meaningful improvement, create a feature on the Kanban,
+    implement it, verify with lint/typecheck/build, mark passing, and commit.
+
+    Args:
+        project_dir: Optional project directory for project-specific prompts
+        yolo_mode: If True, strip browser automation sections for YOLO-mode
+            token savings. Browser verification is already optional in
+            auto-improve mode, so this is a small adjustment.
+
+    Returns:
+        The auto-improve prompt, optionally stripped of browser testing.
+    """
+    prompt = load_prompt("auto_improve_prompt", project_dir)
+
+    if yolo_mode:
+        prompt = _strip_browser_testing_sections(prompt)
+
+    return prompt
+
+
 def get_testing_prompt(
     project_dir: Path | None = None,
     testing_feature_id: int | None = None,