Merged
160 changes: 160 additions & 0 deletions .claude/templates/auto_improve_prompt.template.md
@@ -0,0 +1,160 @@
## YOUR ROLE - AUTO-IMPROVE AGENT

You are running in **auto-improve mode**. Your entire job this session is to make the application **meaningfully better** in exactly ONE way. The project is already finished — all existing features pass. You are here to polish, enhance, and evolve it.

This is a FRESH context window. You have no memory of previous sessions. Previous auto-improve sessions may have already added improvements. Your job is to pick ONE new improvement, implement it, and commit it.

### STEP 1: GET YOUR BEARINGS

Start by orienting yourself:

```bash
# Understand the project
pwd
ls -la
cat app_spec.txt 2>/dev/null || cat .autoforge/prompts/app_spec.txt 2>/dev/null

# See what's been done recently (previous auto-improvements, other commits)
git log --oneline -20

# See recent progress notes if they exist
tail -200 claude-progress.txt 2>/dev/null || true
```

Then use MCP tools to check feature status:

```
Use the feature_get_stats tool
Use the feature_get_summary tool
```

You are looking at an app that someone is running in "autopilot polish" mode. Respect what is already there. Read some of the actual source to get a feel for the codebase.

### STEP 2: CHOOSE ONE MEANINGFUL IMPROVEMENT

Brainstorm silently, then pick exactly ONE improvement. Valid categories:

- **Performance** — cache a hot path, remove an N+1, memoize an expensive component, debounce a noisy handler
- **UX / UI polish** — empty states, loading states, error states, keyboard shortcuts, micro-interactions, accessibility
- **Visual design** — spacing, typography, color hierarchy, alignment, iconography
- **Small new feature** — a natural next step that fits the app's purpose
- **Security hardening** — input validation, authorization checks, rate limits, secret handling
- **Refactor for clarity** — extract a confusing function, rename a misleading variable, split a file that has outgrown itself
- **Accessibility** — focus rings, aria-labels, keyboard navigation, color contrast
- **Dependency / config** — bump a safe dep, tighten a lint rule that would catch a real class of bugs

**Choose deliberately:**
- The improvement must be genuinely useful to an end user or to future developers.
- Prefer improvements that complement what's already there over inventing new scope.
- If the app has obvious rough edges, fix those first before inventing new features.
- Do NOT touch any feature on the Kanban that is currently `in_progress` — leave it alone.
- Avoid duplicating past improvements (read `git log` to see what's already been done).

### STEP 3: ADD THE IMPROVEMENT AS A FEATURE

Call the `feature_create` MCP tool with:

- `category`: e.g., `"Performance"`, `"UX Polish"`, `"Security"`, `"Refactor"`, `"Accessibility"`, `"New Feature"`
- `name`: a short imperative title, e.g., `"Add empty state to project list"`
- `description`: 1-3 sentences explaining what the change is and why it matters
- `steps`: 3-5 concrete acceptance steps (what must be true when this is done)

**Record the returned feature ID.** You will use it in later steps. Then mark it in progress:

```
Use the feature_mark_in_progress tool with feature_id={your_new_id}
```

### STEP 4: IMPLEMENT THE IMPROVEMENT

Implement the change fully. Keep scope tight:

- Edit only the files you need to change.
- Don't add speculative abstractions or "while I'm here" refactors.
- Don't add comments/docstrings to code you didn't touch.
- Don't rename things that don't need renaming.
- If you discover a bug that is NOT your chosen improvement, leave it alone (or note it in `claude-progress.txt` for a future session).

If your improvement is a UI change, actually look at the result — take a screenshot with `playwright-cli` if the dev server is running, or at minimum open the relevant component and verify your edit makes sense.

### STEP 5: VERIFY WITH LINT / TYPECHECK / BUILD

**Mandatory.** Before committing, confirm the code still compiles cleanly. Pick the right commands based on the project type (check `package.json`, `pyproject.toml`, `Cargo.toml`, etc.).

Typical command sets:

- **Node / TypeScript / Vite / Next**: `npm run lint && npm run build`
(or `npm run typecheck` if it exists as a separate script)
- **Python**: `ruff check . && mypy .` (or whatever is configured in `pyproject.toml`)
- **Rust**: `cargo check && cargo clippy`
- **Go**: `go vet ./... && go build ./...`

**Resolve any issues your change introduced.** If lint/typecheck/build was already failing before your change (unrelated breakage), do NOT "fix" the unrelated failures — that's scope creep. Revert your change and pick a different improvement if the codebase is in a broken baseline state.

### STEP 6: MARK THE FEATURE PASSING

Call the feature MCP tool:

```
Use the feature_mark_passing tool with feature_id={your_new_id}
```

### STEP 7: CREATE A COMMIT

Stage your changes and commit with a **short, concise, TLDR-style message**. One line for the subject, optionally one or two more for the "why". No verbose bullet lists, no trailing summaries.

```bash
git status
git add <specific files you changed>
git commit -m "Add empty state to project list when no projects exist"
```

Good commit message examples:
- `"Cache project stats query to cut dashboard load time"`
- `"Add keyboard shortcut (Cmd+K) to open command palette"`
- `"Harden upload endpoint against oversized files"`
- `"Extract confusing session handling into its own module"`

Bad commit message examples:
- `"Various improvements"` (too vague)
- `"Made the app better by implementing several changes to improve UX including..."` (too long)

### STEP 8: EXIT THIS SESSION

When the commit is created successfully, your work for this session is done. Do NOT try to find a second improvement — one per session is the rule. Stop and let the next scheduled tick handle the next improvement.

---

## GUARDRAILS (READ CAREFULLY)

1. **One improvement per session.** If you finish early, don't start another. Exit cleanly.
2. **Never skip lint / typecheck / build.** If they fail, fix or revert.
3. **Never commit broken code.** A commit with failing lint/build is worse than no commit.
4. **Don't touch features other agents are working on** (anything with `in_progress=True`).
5. **Don't bypass the feature MCP tools.** Create a real Kanban feature for your change so it shows up in the UI.
6. **Keep the commit subject line under 72 characters.**
7. **Don't add dependencies you don't need.** If the improvement needs a new package, be sure it's justified.
8. **Respect the existing architecture.** Don't rewrite patterns the project has already committed to.

---

## BROWSER AUTOMATION (OPTIONAL)

If your improvement is visual and the dev server is running, you may use `playwright-cli` to verify it renders correctly:

- Open: `playwright-cli open http://localhost:PORT`
- Screenshot: `playwright-cli screenshot`
- Read the screenshot file to verify visual appearance
- Close: `playwright-cli close`

Browser verification is **optional** in auto-improve mode. Lint + typecheck + build is mandatory; visual verification is a bonus when relevant.

---

## SUCCESS CRITERIA

A successful auto-improve session ends with:
1. One new feature on the Kanban, marked passing.
2. A clean git commit with a short TLDR message.
3. No lint / typecheck / build errors introduced.
4. The agent exits cleanly without starting a second improvement.
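STEP 5 of the template asks the agent to pick verification commands from the project's marker files. A minimal sketch of that selection follows, under stated assumptions: the helper name `detect_verify_commands` and the `VERIFY_COMMANDS` table are hypothetical and not part of this PR, and `go.mod` as the Go marker is an assumption (the template only names `package.json`, `pyproject.toml`, and `Cargo.toml`).

```python
from pathlib import Path

# Hypothetical table: marker file -> command set listed in STEP 5 of the template.
# go.mod is an assumed marker for Go projects; the template does not name it.
VERIFY_COMMANDS: dict[str, list[str]] = {
    "package.json": ["npm run lint", "npm run build"],
    "pyproject.toml": ["ruff check .", "mypy ."],
    "Cargo.toml": ["cargo check", "cargo clippy"],
    "go.mod": ["go vet ./...", "go build ./..."],
}


def detect_verify_commands(project_dir: Path) -> list[str]:
    """Return the command set for the first marker file found, or [] if none match."""
    for marker, commands in VERIFY_COMMANDS.items():
        if (project_dir / marker).is_file():
            return list(commands)  # copy so callers cannot mutate the table
    return []
```

An empty result would mean the project type is unknown, in which case the template's instruction to check what the project itself configures still applies.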
16 changes: 14 additions & 2 deletions agent.py
@@ -31,6 +31,7 @@
 )
 from prompts import (
     copy_spec_to_project,
+    get_auto_improve_prompt,
     get_batch_feature_prompt,
     get_coding_prompt,
     get_initializer_prompt,
@@ -163,6 +164,7 @@ async def run_autonomous_agent(
     agent_type: Optional[str] = None,
     testing_feature_id: Optional[int] = None,
     testing_feature_ids: Optional[list[int]] = None,
+    auto_improve: bool = False,
 ) -> None:
     """
     Run the autonomous agent loop.
@@ -177,6 +179,9 @@ async def run_autonomous_agent(
         agent_type: Type of agent: "initializer", "coding", "testing", or None (auto-detect)
         testing_feature_id: For testing agents, the pre-claimed feature ID to test (legacy single mode)
         testing_feature_ids: For testing agents, list of feature IDs to batch test
+        auto_improve: If True, run in auto-improve mode (agent creates one
+            improvement feature, implements it, commits, and exits). Takes
+            precedence over other prompt selection branches.
     """
     print("\n" + "=" * 70)
     print(" AUTONOMOUS CODING AGENT")
@@ -185,6 +190,8 @@ async def run_autonomous_agent(
     print(f"Model: {model}")
     if agent_type:
         print(f"Agent type: {agent_type}")
+    if auto_improve:
+        print("Mode: AUTO-IMPROVE (one improvement + commit per session)")
     if yolo_mode:
         print("Mode: YOLO (testing agents disabled)")
     if feature_ids and len(feature_ids) > 1:
@@ -240,7 +247,8 @@ async def run_autonomous_agent(

         # Check if all features are already complete (before starting a new session)
         # Skip this check if running as initializer (needs to create features first)
-        if not is_initializer and iteration == 1:
+        # or auto-improve mode (intentionally runs against finished projects)
+        if not is_initializer and not auto_improve and iteration == 1:
             passing, in_progress, total, _nhi = count_passing_tests(project_dir)
             if total > 0 and passing == total:
                 print("\n" + "=" * 70)
@@ -262,7 +270,11 @@ async def run_autonomous_agent(
         client = create_client(project_dir, model, yolo_mode=yolo_mode, agent_type=agent_type)

         # Choose prompt based on agent type
-        if agent_type == "initializer":
+        # auto_improve takes precedence over other branches — it's a distinct
+        # mode where the agent creates its own feature before implementing it.
+        if auto_improve:
+            prompt = get_auto_improve_prompt(project_dir, yolo_mode=yolo_mode)
+        elif agent_type == "initializer":
             prompt = get_initializer_prompt(project_dir)
         elif agent_type == "testing":
             prompt = get_testing_prompt(project_dir, testing_feature_id, testing_feature_ids)
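The two agent.py changes above give `auto_improve` priority in prompt selection and exempt it from the "all features already complete" early exit. A minimal sketch of that branch order as pure functions, assuming hypothetical names `select_prompt_kind` and `runs_completion_check` (neither exists in the PR) and lumping the branches not visible in this hunk into a single `"other"` result:

```python
from typing import Optional


def select_prompt_kind(auto_improve: bool, agent_type: Optional[str]) -> str:
    """Mirror the branch order in the diff: auto_improve is checked first."""
    if auto_improve:
        return "auto_improve"
    if agent_type == "initializer":
        return "initializer"
    if agent_type == "testing":
        return "testing"
    # Assumption: the remaining branches are not shown in this hunk.
    return "other"


def runs_completion_check(is_initializer: bool, auto_improve: bool, iteration: int) -> bool:
    """Same condition as the updated guard: only on iteration 1, and never for
    initializer or auto-improve runs."""
    return not is_initializer and not auto_improve and iteration == 1
```

Because the CLI passes `agent_type="coding"` together with `auto_improve=True`, the ordering matters: checking `agent_type` first would select a different prompt.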
28 changes: 27 additions & 1 deletion autonomous_agent_demo.py
@@ -186,6 +186,17 @@ def parse_args() -> argparse.Namespace:
         help="Max features per coding agent batch (1-15, default: 3)",
     )

+    parser.add_argument(
+        "--auto-improve",
+        action="store_true",
+        default=False,
+        help=(
+            "Run in auto-improve mode: a single agent session that analyses "
+            "the codebase, creates one improvement feature, implements it, "
+            "verifies with lint/typecheck/build, commits, and exits."
+        ),
+    )
+
     return parser.parse_args()


@@ -262,7 +273,22 @@ def main() -> None:
         return

     try:
-        if args.agent_type:
+        if args.auto_improve:
+            # Auto-improve mode: single agent session, one improvement per run.
+            # Bypasses the parallel orchestrator entirely — auto-improve is
+            # always single-agent, single-feature, and exits after one commit.
+            print("[AUTO-IMPROVE] Starting single-session improvement run...", flush=True)
+            asyncio.run(
+                run_autonomous_agent(
+                    project_dir=project_dir,
+                    model=args.model,
+                    max_iterations=1,
+                    yolo_mode=args.yolo,
+                    agent_type="coding",
+                    auto_improve=True,
+                )
+            )
+        elif args.agent_type:
             # Subprocess mode - spawned by orchestrator for a specific role
             asyncio.run(
                 run_autonomous_agent(
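The CLI change above adds a `store_true` flag and checks it before `--agent-type`. A small self-contained sketch of that behaviour, under stated assumptions: `build_parser` and `dispatch_mode` are hypothetical names, the `--agent-type` definition is a stand-in because its real definition is outside this diff, and the `"default"` fallback is a placeholder for whatever the final branch does.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Same flag shape as in the diff; argparse exposes it as args.auto_improve.
    parser.add_argument("--auto-improve", action="store_true", default=False)
    # Assumption: stand-in for the real --agent-type option, which this diff does not show.
    parser.add_argument("--agent-type", default=None)
    return parser


def dispatch_mode(args: argparse.Namespace) -> str:
    """Mirror the if/elif order in main(): auto-improve wins over agent type."""
    if args.auto_improve:
        return "auto-improve"
    if args.agent_type:
        return "subprocess"
    return "default"  # assumption: placeholder for the remaining branch
```

For example, `dispatch_mode(build_parser().parse_args(["--auto-improve", "--agent-type", "testing"]))` returns `"auto-improve"`, which matches the precedence the diff establishes.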
24 changes: 24 additions & 0 deletions prompts.py
@@ -151,6 +151,30 @@ def get_coding_prompt(project_dir: Path | None = None, yolo_mode: bool = False)
     return prompt


+def get_auto_improve_prompt(project_dir: Path | None = None, yolo_mode: bool = False) -> str:
+    """Load the auto-improve agent prompt (project-specific if available).
+
+    The auto-improve prompt instructs the agent to analyze an already-finished
+    project, pick ONE meaningful improvement, create a feature on the Kanban,
+    implement it, verify with lint/typecheck/build, mark passing, and commit.
+
+    Args:
+        project_dir: Optional project directory for project-specific prompts
+        yolo_mode: If True, strip browser automation sections for YOLO-mode
+            token savings. Browser verification is already optional in
+            auto-improve mode, so this is a small adjustment.
+
+    Returns:
+        The auto-improve prompt, optionally stripped of browser testing.
+    """
+    prompt = load_prompt("auto_improve_prompt", project_dir)
+
+    if yolo_mode:
+        prompt = _strip_browser_testing_sections(prompt)
+
+    return prompt
+
+
 def get_testing_prompt(
     project_dir: Path | None = None,
     testing_feature_id: int | None = None,
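`get_auto_improve_prompt` delegates YOLO-mode trimming to `_strip_browser_testing_sections`, whose implementation is not part of this diff. As an illustration only, one way such stripping could work is sketched below; `strip_markdown_section` is a hypothetical helper, and the assumption is that a section runs from a `## ` heading up to the next `## ` heading or the end of the text.

```python
import re


def strip_markdown_section(prompt: str, heading: str) -> str:
    """Remove the '## <heading>...' section, up to the next '## ' heading or end of text.

    Hypothetical sketch; not the repository's _strip_browser_testing_sections.
    The heading is matched as a prefix, so "BROWSER AUTOMATION" also matches
    "BROWSER AUTOMATION (OPTIONAL)".
    """
    pattern = re.compile(
        rf"^## {re.escape(heading)}.*?(?=^## |\Z)",
        flags=re.MULTILINE | re.DOTALL,
    )
    return pattern.sub("", prompt)
```

Applied to the template in this PR with `heading="BROWSER AUTOMATION"`, this approach would drop the optional playwright-cli section and leave the guardrails and success criteria in place.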