TestForge

Mutation-driven test-backfill agent. Writes pytest suites that catch real bugs, not just lines.

What it does

You point TestForge at a public Python repo and a poorly-tested module. It:

1. Clones the repo into a temp workspace.
2. Reads the module.
3. Drafts a pytest suite.
4. Runs pytest and revises the suite until everything is green.
5. Runs mutmut to mutate the source and see which mutants survive.
6. Reads the diff of each surviving mutant and writes a targeted assertion that kills it.
7. Repeats steps 5–6 until the mutation score hits the target (or the budget caps it).
8. Forks the repo and opens a PR with the suite and a written explanation of what each test catches.

No human in the loop after invocation.
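The loop above can be sketched in Python roughly as follows. This is a hypothetical skeleton, not TestForge's actual code. In TestForge the steps are driven by an LLM planner calling tools such as run_pytest and run_mutmut. Here, plain callables stand in for those tools: draft_suite, tests_pass, fix_failures, run_mutation, write_killer, and the MutationReport type are all invented for illustration. The sketch also re-checks that the suite is green before every mutation run, which is an assumption not stated above. In the toy demo, a "test" is just a string naming the mutant it kills.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MutationReport:
    killed: int
    survivors: list[str]  # diffs of mutants the suite failed to kill

    @property
    def score(self) -> float:
        total = self.killed + len(self.survivors)
        return self.killed / total if total else 1.0


def backfill(
    draft_suite: Callable[[], list[str]],
    tests_pass: Callable[[list[str]], bool],
    fix_failures: Callable[[list[str]], list[str]],
    run_mutation: Callable[[list[str]], MutationReport],
    write_killer: Callable[[str], str],
    target: float = 0.90,
    max_iterations: int = 6,
    max_fix_attempts: int = 3,
) -> tuple[list[str], float]:
    """Hypothetical draft -> green -> mutate -> kill loop."""
    suite = draft_suite()                         # step 3: first draft
    score = 0.0
    for _ in range(max_iterations):
        attempts = 0
        while not tests_pass(suite):              # step 4: iterate until green
            if attempts == max_fix_attempts:
                raise RuntimeError("suite never went green")
            suite = fix_failures(suite)
            attempts += 1
        report = run_mutation(suite)              # step 5: mutate, collect survivors
        score = report.score
        if score >= target:
            break
        # Step 6: one targeted test per surviving mutant, then loop again.
        suite = suite + [write_killer(diff) for diff in report.survivors]
    return suite, score  # score is the last measured value


# Toy demo: each "test" is a string naming the mutant it kills.
MUTANTS = ["m1", "m2", "m3", "m4"]


def toy_mutation(suite: list[str]) -> MutationReport:
    survivors = [m for m in MUTANTS if f"kill:{m}" not in suite]
    return MutationReport(killed=len(MUTANTS) - len(survivors), survivors=survivors)


suite, score = backfill(
    draft_suite=lambda: ["kill:m1"],
    tests_pass=lambda s: True,
    fix_failures=lambda s: s,
    run_mutation=toy_mutation,
    write_killer=lambda diff: f"kill:{diff}",
)
print(suite, score)
# ['kill:m1', 'kill:m2', 'kill:m3', 'kill:m4'] 1.0
```

The first draft kills one of four mutants (score 0.25). One round of targeted tests then kills the remaining three, and the loop stops at 1.0.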

Why mutation score, not coverage

Line coverage rewards weak tests: assert result is not None covers a function but doesn't catch a sign-flip bug. Mutation score measures whether the tests actually detect injected faults, which makes it a far more objective signal of test quality than coverage. It is also a signal an LLM can reason about and self-correct against, because every surviving mutant is a concrete diff the suite failed to catch. TestForge's loop is a closed-loop optimizer whose reward signal is the number of mutants killed.
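Here is a toy illustration of the point above. It is not TestForge output, and the mutant is hand-written rather than generated by mutmut. A mutant counts as killed when a test that passes on the original code fails on the mutant. Both tests below execute every line of net_price, but only the one that pins the value kills the sign flip. The function and test names are invented for the example.

```python
from typing import Callable


def net_price(price: int, discount: int) -> int:
    """Original code under test."""
    return price - discount


def net_price_mutant(price: int, discount: int) -> int:
    """Hand-written sign-flip mutant: '-' became '+'."""
    return price + discount


def weak_test(fn: Callable[[int, int], int]) -> bool:
    # Runs every line of fn (full line coverage) but asserts almost nothing.
    return fn(100, 20) is not None


def strong_test(fn: Callable[[int, int], int]) -> bool:
    # Pins the exact value, so the sign flip changes the outcome.
    return fn(100, 20) == 80


for check in (weak_test, strong_test):
    kills = check(net_price) and not check(net_price_mutant)
    print(f"{check.__name__}: kills mutant = {kills}")
# weak_test: kills mutant = False
# strong_test: kills mutant = True
```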

Usage

pip install testforge
export ANTHROPIC_API_KEY=...
export GITHUB_TOKEN=...   # gh CLI must be authenticated

testforge \
  --repo https://github.com/owner/repo \
  --module src/path/module.py \
  --target-mutation-score 0.90 \
  --max-iterations 6 \
  --budget-usd 3

Add --no-pr to write the tests to a local clone without forking the repo or opening a PR.
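The following Python sketch shows the arithmetic behind a target such as --target-mutation-score 0.90. It assumes the score is killed / (killed + survived). mutmut also reports other outcomes, such as timeouts, and how TestForge counts those is not specified here. Both helper functions are hypothetical.

```python
import math


def mutation_score(killed: int, survived: int) -> float:
    """Assumed definition: share of scored mutants that the suite killed."""
    total = killed + survived
    return killed / total if total else 1.0  # no mutants: nothing left to kill


def kills_needed(total_mutants: int, target: float) -> int:
    """Fewest kills that reach `target`; round() absorbs float noise in the product."""
    return math.ceil(round(target * total_mutants, 9))


total = 120
need = kills_needed(total, 0.90)
print(f"{need} of {total} mutants must die; at most {total - need} may survive")
# 108 of 120 mutants must die; at most 12 may survive
```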

Architecture

PLANNER (Claude Opus 4.7, tool-use, prompt-cached)
    ↓
TOOL EXECUTOR (read_module, write_test_file, lint, run_pytest, run_mutmut, read_mutant_diff, finish)
    ↓
OBSERVER (state machine + 5 stop conditions)
    ↓ loop ↑   |
              finish → fork + PR

Stop conditions: target-met / iter-cap / $-cap / oscillation (2 consecutive iters with zero new mutant kills) / agent self-declared finish.
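The observer's five stop conditions can be sketched as a small state object. This is a hypothetical illustration. The README lists the conditions, but the Observer and Stop names, the fields, and the order in which conditions are checked are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Stop(Enum):
    TARGET_MET = "target-met"
    ITER_CAP = "iter-cap"
    BUDGET_CAP = "$-cap"
    OSCILLATION = "oscillation"
    AGENT_FINISH = "agent self-declared finish"


@dataclass
class Observer:
    target: float          # e.g. --target-mutation-score 0.90
    max_iterations: int    # e.g. --max-iterations 6
    budget_usd: float      # e.g. --budget-usd 3
    iterations: int = 0
    spent_usd: float = 0.0
    new_kills: list[int] = field(default_factory=list)  # mutants newly killed per iteration

    def record(self, killed_this_iter: int, cost_usd: float) -> None:
        """Log one completed iteration."""
        self.iterations += 1
        self.spent_usd += cost_usd
        self.new_kills.append(killed_this_iter)

    def check(self, score: float, agent_finished: bool = False) -> Optional[Stop]:
        """Return the first stop condition that fires (assumed order), or None."""
        if score >= self.target:
            return Stop.TARGET_MET
        if agent_finished:
            return Stop.AGENT_FINISH
        if self.spent_usd >= self.budget_usd:
            return Stop.BUDGET_CAP
        if self.iterations >= self.max_iterations:
            return Stop.ITER_CAP
        # Oscillation: two consecutive iterations with zero new kills.
        if self.new_kills[-2:] == [0, 0]:
            return Stop.OSCILLATION
        return None
```

Checking the target first means a run that reaches its score on the same iteration it exhausts the budget is reported as a success. That precedence is a design choice of this sketch.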

What's NOT in v1

Python only. Synchronous, mostly-pure-function modules only. No async, no I/O-heavy modules, no Hypothesis. See docs/superpowers/specs/2026-05-02-testforge-design.md for the full scope contract.
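The following sketch shows one way a module could be pre-screened against the v1 scope using Python's standard ast module. It is a hypothetical heuristic, not how TestForge enforces its scope contract. The deny-lists IO_CALLS and IO_MODULES and the function out_of_scope_reasons are invented for illustration and are far from exhaustive.

```python
import ast

# Hypothetical deny-lists; a real scope check would be more thorough.
IO_CALLS = {"open", "input"}
IO_MODULES = {"socket", "subprocess", "requests", "urllib", "http"}


def out_of_scope_reasons(source: str) -> list[str]:
    """Heuristically flag async and I/O constructs, sorted by line number."""
    found: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.AsyncFunctionDef, ast.AsyncFor, ast.AsyncWith, ast.Await)):
            found.append((node.lineno, "async construct"))
        elif (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in IO_CALLS
        ):
            found.append((node.lineno, f"call to {node.func.id}()"))
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in IO_MODULES:
                    found.append((node.lineno, f"imports {alias.name}"))
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports such as "from . import x".
            root = (node.module or "").split(".")[0]
            if root in IO_MODULES:
                found.append((node.lineno, f"imports from {node.module}"))
    return [f"line {line}: {what}" for line, what in sorted(found)]
```

An empty result only means the heuristic found nothing. It does not prove the module is pure.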

Documentation

Overview — the problem, the insight, the rubric fit
Architecture — module map, the agent loop in detail, design decisions
Quickstart — install, test, run, troubleshoot
Interview prep — anticipated Q&A about why-this-not-that

Demo

[2-min video link — added on submission day]
