Skip to content

cloud-007/testforge

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

35 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

TestForge

Mutation-driven test-backfill agent. Writes pytest suites that catch real bugs, not just lines.

Hourglass AI Agent Challenge — Coding Agent track

What it does

You point TestForge at a public Python repo and a poorly-tested module. It:

  1. Clones the repo into a temp workspace.
  2. Reads the module.
  3. Drafts a pytest suite.
  4. Runs pytest until everything is green.
  5. Runs mutmut to mutate the source and see which mutants survive.
  6. Reads the diff of each surviving mutant and writes a targeted assertion that kills it.
  7. Repeats steps 5–6 until the mutation score hits the target (or the budget caps it).
  8. Forks the repo, opens a PR with the suite and a written explanation of what each test catches.

No human in the loop after invocation.

Why mutation score, not coverage

Line coverage rewards weak tests. assert result is not None covers a function but doesn't catch a sign-flip bug. Mutation score is the only objective signal of test quality — and it's the only one an LLM can reason about and self-correct against. TestForge's loop is a closed-loop optimizer where the reward signal is "how many mutants did I kill."

Usage

pip install testforge
export ANTHROPIC_API_KEY=...
export GITHUB_TOKEN=...   # gh CLI must be authenticated

testforge \
  --repo https://github.com/owner/repo \
  --module src/path/module.py \
  --target-mutation-score 0.90 \
  --max-iterations 6 \
  --budget-usd 3

Add --no-pr to write tests to a local clone without forking/PR'ing.

Architecture

PLANNER (Claude Opus 4.7, tool-use, prompt-cached)
    ↓
TOOL EXECUTOR (read_module, write_test_file, lint, run_pytest, run_mutmut, read_mutant_diff, finish)
    ↓
OBSERVER (state machine + 5 stop conditions)
    ↓ loop ↑   |
              finish → fork + PR

Stop conditions: target-met / iter-cap / $-cap / oscillation (2 consecutive iters with zero new mutant kills) / agent self-declared finish.

What's NOT in v1

Python only. Synchronous, mostly-pure-function modules only. No async, no I/O-heavy modules, no Hypothesis. See docs/superpowers/specs/2026-05-02-testforge-design.md for the full scope contract.

Documentation

  • Overview — the problem, the insight, the rubric fit
  • Architecture — module map, the agent loop in detail, design decisions
  • Quickstart — install, test, run, troubleshoot
  • Interview prep — anticipated Q&A about why-this-not-that

Demo

[2-min video link — added on submission day]

About

Mutation-driven test-backfill agent — autonomous Python agent that writes pytest suites which kill mutmut mutants, not just hit coverage. Built for the Hourglass AI Agent Challenge (Coding Agent track).

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages