Small, test-gated cleanup commits by an AI coding agent.
Think of it as a supervised janitor loop: the agent proposes a cleanup, your tests decide if it stays.
Once installed:
continuous-refactoring init
continuous-refactoring run-once \
--with codex --model gpt-5 --effort high \
--extensions .pyThat gives you one pass on the current branch. If validation passes, it leaves you with a local commit to inspect.
continuous-refactoring run \
--with codex --model gpt-5 --effort high \
--extensions .py \
--max-refactors 10 \
--max-attempts 2That keeps sweeping targets until it runs out, hits your caps, or starts failing.
- Resolves a target from
--targets,--globs,--extensions, or--paths, with optional natural-language scoping via--scope-instruction. - Runs the agent with a refactoring prompt + your "taste" guidelines.
- Runs your validation command (default:
uv run pytest). - If green and there's a diff, it commits locally and leaves the branch for you to inspect.
- Repeats until it runs out of targets, hits the retry budget, or stacks too many failures.
- Python 3.10+
- uv
codexorclaudeCLI on yourPATH(whichever backend you pick)- A git repo with a clean worktree
uv sync
uv pip install -e .That gives you the continuous-refactoring command.
# 1. Register the repo (creates a project dir under ~/.local/share/continuous-refactoring)
continuous-refactoring init
# 2. (Optional) Write your refactoring taste — either edit the file, have an agent interview you,
# or refine an existing draft collaboratively
continuous-refactoring taste --interview --with codex --model gpt-5 --effort high
continuous-refactoring taste --refine --with codex --model gpt-5 --effort high
# 3. Do one pass
continuous-refactoring run-once \
--with codex --model gpt-5 --effort high \
--extensions .py
# 4. Or run the loop over a batch of targets
continuous-refactoring run \
--with claude --model claude-opus-4-6 --effort high \
--globs 'src/**/*.py' \
--max-refactors 10 \
--max-attempts 2 \
--sleep 5| Command | What it does |
|---|---|
init |
Registers this directory as a project, creates a default taste.md, and can store --live-migrations-dir. |
taste |
Prints the active taste file path. Add --interview to have an agent author it, --refine to iteratively improve an existing taste doc, --upgrade to refresh stale taste dimensions, --global for the shared file, and --force to let --interview overwrite custom content after writing a .bak. |
run-once |
Single pass on one resolved target. No retry. If there is a diff and validation passes, it commits locally and prints the diffstat. |
run |
The loop. Iterates targets, retries on failure, and commits successful targets locally. |
upgrade |
Checks that the global config manifest is current, rewrites it idempotently, and warns if the global taste file is stale. |
review list |
Lists migrations flagged for human review (awaiting_human_review). |
review perform <migration> |
Starts an interactive agent session to resolve a flagged migration's review. Requires --with, --model, and --effort. |
Target resolution is first-match-wins:
--targets > --globs > --extensions > --paths
These flags are not mutually exclusive, but only the highest-priority populated source is used.
--targets path/to/targets.jsonl— explicit list; one JSON object per line withdescription,files, optionalscoping,model-override,effort-override.--globs 'src/**/*.py:tests/**/*.py'— colon-separated globs; each matched file becomes its own target.--extensions .py,.ts— shorthand that expands to**/*.py,**/*.ts; each matched file becomes its own target.--paths a.py:b.py— literal paths, all treated as one target.--scope-instruction "clean up the auth module"— extra free-text scoping. If file-based targeting resolves nothing, this becomes the useful fallback context.
If you provide none of --targets, --globs, --extensions, or --paths, then run and run-once require --scope-instruction.
init --live-migrations-dir PATH— enables the larger-refactoring workflow for this project. The path is stored repo-relative in the project registry and created if missing.taste --refine— opens a collaborative editing session for the taste file. The agent keeps refining until you tell it to write, then the session ends automatically after the settled write.taste --upgrade— re-interviews for taste dimensions added since your last version. No-op when already current; usetaste --refineif you want to rework the doc anyway.taste --force— only applies to--interview; it allows a customized taste file to be overwritten after backing it up totaste.md.bak.
--with,--model,--effort— required agent backend/model/effort selection.--repo-root PATH— repository root; defaults to the current directory.--validation-command— defaults touv run pytest. Swap it for whatever keeps your repo honest.--timeout— per-agent-call timeout in seconds.--show-agent-logs/--show-command-logs— mirror output to your terminal instead of just logging.--refactoring-prompt— override the default refactoring prompt.--fix-prompt— override the retry amendment prompt. Useful forrun; accepted byrun-oncefor flag symmetry.
--max-attempts N— per-target retry budget.1= no retry,0= unlimited (which means permanently broken targets will never give up).--max-refactors N— cap the number of targets per run. Required unless you use--targets.--max-consecutive-failures N— bail after N targets fail in a row. Default 3.--sleep SECONDS— pause between completed targets. Useful when you want a long batch without hammering the repo or your agent budget.--commit-message-prefix TEXT— prefix for successful refactor or migration-plan commits. Defaultcontinuous refactor.
- Refuses to start with a dirty worktree.
- Runs on the current branch. Commits land there.
runbaselines your validation command before touching anything. If the baseline is already red, it stops.- On a failed attempt, resets back to the pre-attempt HEAD and cleans workspace changes before retrying or moving on.
- Watchdog kills any agent or test process that's been silent for 5 minutes.
Each run writes to $TMPDIR/continuous-refactoring/<run-id>/:
summary.json— rolling status, counts, per-attempt statsevents.jsonl— structured event logrun.log— human-readable logattempt-NNN/[retry-NN/]refactor/— per-attempt agent + test stdout/stderr
The path prints at startup. Grep it when something goes sideways.
The taste file is a short bullet list of your refactoring preferences. It gets injected into every agent prompt.
- Project taste:
~/.local/share/continuous-refactoring/projects/<uuid>/taste.md - Global taste:
~/.local/share/continuous-refactoring/global/taste.md
Project taste wins over global. Use taste --interview to bootstrap one, taste --refine to rework it with an agent, or edit the file directly any time.
When a cleanup is too big for a single commit — needs a plan, touches many files, or should ship in stages — use the migrations model.
continuous-refactoring init --live-migrations-dir migrations/This tells the CLI where to store migration artifacts. The path is repo-relative, stored in the XDG project registry (no project config file is created). When this directory is unset, the one-shot cleanup path remains byte-identical to the default behavior.
Each run / run-once tick now checks for eligible migration work before falling back to single-commit cleanups:
- Classify — a classifier agent reads the target and decides:
cohesive-cleanup(one-shot path) orneeds-plan(migration path). - Plan — for
needs-plantargets, a six-stage planning workflow runs: generate approaches → pick best → expand into phases → review → revise → final review. Artifacts land under<live-migrations-dir>/<migration-name>/. - Execute — each phase is a self-contained unit of work. The tick picks the oldest eligible migration, checks whether its current phase precondition is satisfied, and executes it on the current branch. Phase completion is judged against the phase file's
## Definition of Done; commit message identifies the migration asmigration/<name>/<phase-file>.md.
<live-migrations-dir>/
<migration-name>/
manifest.json # status, phases, wake-up schedule
plan.md # the expanded plan
approaches/ # candidate approaches considered during planning
phase-1-<name>.md # per-phase specification
phase-2-<name>.md
...
__intentional_skips__/ # migrations rejected at final review
Migrations don't run on every tick. The scheduler now separates activity from retry cooldown:
last_touchrecords the latest migration activity.cooldown_untilgates repeated checks only after the migration was deferred or blocked.
A migration is eligible when all of:
- Any
cooldown_untilhas elapsed. - Either
wake_up_onis unset,wake_up_onhas elapsed, or the migration has been stale for ≥7 days.
That means successful phase execution does not make the next phase wait 6
hours. Phases whose preconditions are already satisfied can advance back-to-back
in the same run until the migration is actually blocked. The 6-hour cooldown
still applies after ready: no, future wake-ups, unverifiable phases, or
similar deferrals so the loop does not hammer stuck migrations.
Each migration moves through phases sequentially.
- The manifest stores each phase's precondition — what must already be true before execution may start.
- Each phase markdown file stores its Definition of Done under
## Definition of Done— what must be true for that phase to count as completed.
Before executing a phase, a ready-check agent verifies that the current phase precondition is met. Possible outcomes:
- ready: yes — phase executes; on green tests, the phase is marked done, any prior deferral markers are cleared, and the migration advances immediately to the next phase.
- ready: no — manifest activity is bumped, a retry cooldown is started, and a future
wake_up_onis recorded when needed; the tick moves on. - ready: unverifiable — the migration is flagged
awaiting_human_reviewand put on cooldown. Usereview listto find it andreview perform <migration> --with ... --model ... --effort ...to resolve it interactively.
Human-facing migration references use the relative phase spec path, for example phase-2-failure-report.md. The manifest cursor stores the phase name, not a numeric index.
Rollout mechanics — feature flag names, deploy tooling, metric dashboards, canary analysis — are the coding agent's responsibility, not a CLI-visible concern. The CLI manages the planning and phased execution workflow; the agent writes whatever rollout code the plan calls for.