feat: rune review v0.2 — conflict + dead-rule detection with TUI#16
Conversation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wrap tomllib.load() to re-raise TOMLDecodeError as ValueError with the file path in the message. Add _REQUIRED_FIELDS constant and explicit per-field validation before constructing PatchEntry, so missing fields raise a clear ValueError naming the specific field instead of an obscure TypeError. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Evaluates each fixture file independently (one ChunkRef per rule-line) so cross-file corpus noise does not inflate FP counts against the n01-n05 semantic-miss negatives. Current detector scores P=1.00 R=1.00 on the 20+20 labeled fixture. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Launch ReviewApp when `rune review` is called without --json; deferred import keeps textual out of the L1 fast path. Adds cold-start perf ceiling test (0.95s window, 0.85s sleep budget). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add scripts/build_manifest.py to generate TOML deployment manifests for v0.2. Extend rune patch apply to handle pre_sha256="" (new-file case) by creating parent dirs and writing the payload instead of erroring on MISSING. Verified end-to-end against a temp dir; site-packages untouched. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
codex-devlab
left a comment
There was a problem hiding this comment.
Code Review — feat/review-v0.2
Assessment: NEEDS_CHANGES — substantive correctness issues in the apply pipeline (loader/applier SHA mismatch, backup path flattening, headless heuristic deletes content) plus a TUI promise that isn't actually rendered. Important tier is mostly cleanup but a few items (payload SHA verify, manifest TARGETS drift) should land before this PR goes near a real site-packages target.
26 commits / 81 files / +1553 / −0. Spec at ~/ToolSet/rune/spec-2026-06-11.md, plan at ~/ToolSet/rune/plan-2026-06-11.md (consensus-approved via RALPLAN).
Strengths
-
Clean lazy-loading discipline —
rune/review/cli.py:48,54,79,95defers heavy imports (dead_events,conflict_nli,applier,tui) inside their conditional branches.tests/test_import_hygiene.pyenforces this viasys.modulessnapshots after--jsoninvoke. Right pattern, well tested. -
Fail-closed L2 design —
rune/review/conflict_nli.py:23-27refuses to run without cached model +--allow-model-download, with actionable stderr pointing athuggingface-cli download.tests/test_l2_fail_closed.pyexercises withHF_HUB_OFFLINE=1. -
Per-chunk SHA staleness gate —
rune/review/applier.py:43-47verifies all ops first before any mutation. Combined with reverse-order application this is the right shape for a multi-op editor (modulo C1). -
Manifest hardening —
rune/patches/manifest.py:21-31wrapstomllib.loadin try/except and validates required fields.tests/test_patch_manifest.pycovers malformed-TOML + missing-field. -
Additive-only schema policy —
rune/review/SCHEMA_POLICY.mdplus real JSON Schema atrune/review/schema.jsonwithtests/test_review_json_schema.pyvalidating live output. Real contract.
Critical (must fix before merge)
C1. Loader SHA and applier SHA disagree — every real apply will raise StaleChunkError
rune/review/loader.py:12 vs rune/review/applier.py:44-45
- Loader:
sha = hashlib.sha256(c.text.encode("utf-8")).hexdigest()— hashesChunk.textas inventory produces it (trailing\npreserved). - Applier:
current = _read_chunk(...).rstrip("\n"); current_sha = hashlib.sha256(current.encode("utf-8")).hexdigest()— strips trailing\nbefore hashing.
These hash different byte sequences whenever the chunk ends in \n (common case). Result: in normal use, --apply --yes on a freshly-scanned unmutated repo raises StaleChunkError.
tests/test_apply_staleness.py only passes because it constructs ChunkRef.text = "Always use TS." (no trailing newline) by hand — it does NOT route through load_chunk_refs. No end-to-end test loads + applies. tests/test_restore_roundtrip.py does subprocess rune review --apply --yes but passes only if the inventory adapter happens to strip newlines from chunk text.
Fix: centralize as _sha_chunk(text) in a shared helper; pick one canonical form (recommend text.rstrip("\n")). Add an integration test: load_chunk_refs → apply_operations(kind="keep") on a real CLAUDE.md, assert no StaleChunkError.
C2. NLI label-index assumption is brittle across models
rune/review/conflict_nli.py:32-35
scores = model.predict([(ref_a.text, ref_b.text), (ref_b.text, ref_a.text)])
contradiction_score = max(scores[0][0], scores[1][0])For cross-encoder/nli-deberta-v3-base (default), label order is [contradiction, entailment, neutral], so [0] IS contradiction. Correct today.
For --nli-small → cross-encoder/nli-distilroberta-base, the label order is not enforced across checkpoint revisions. The inline comment literally admits "MNLI label order is typically [contradiction, entailment, neutral]". The code blindly indexes [0] for both.
Fix: read the mapping from the model:
id2label = model.config.id2label
contradiction_idx = next(i for i, lbl in id2label.items() if "contradict" in lbl.lower())
contradiction_score = max(scores[0][contradiction_idx], scores[1][contradiction_idx])Add a unit test (NLI-gated) that asserts the label resolution works for both models.
C3. --restore flattens paths — silent data loss on multi-file repos
rune/review/applier.py:24-29 and rune/review/cli.py:29-40
Backup: _backup_file does rel = file.name; dst = backup_root / rel — flattens to basename. Two CLAUDE.md files at different paths in the same apply session overwrite each other in the snapshot.
Restore: target = path / f.name for f in snap.iterdir() — restores by basename to repo root. A subdir/CLAUDE.md is silently moved to repo_root/CLAUDE.md on restore.
Fix: backup by relative path. Either pass repo_root into apply_operations (store at snapshot_root / file.relative_to(repo_root)) OR iterate operation_log.json on restore and use the recorded op["file"] path.
C4. --apply --yes deletes content via headless heuristic with no spec coverage
rune/review/cli.py:80-86
ops: list = []
for c in conflicts:
ops.append(Operation(kind="delete", ref=c.b))
for d in dead:
ops.append(Operation(kind="delete", ref=d.chunk))The "delete b-side of every conflict + delete every dead-static candidate" heuristic is not in the spec — the spec/plan has a TUI for the user to interactively select. b is just j > i from find_lexical_conflicts (file order — arbitrary). For a = "Always use TS" / b = "Never use TS", this silently deletes "Never". Combined with find_dead_rules_static's weak signal (substring match on every file path), Go-targeted rules in a Python repo would be deleted with no user input.
The test tests/test_restore_roundtrip.py documents this with the comment "Apply (headless heuristic: delete the b-side of every conflict)" — the test author admits the heuristic is hacky.
Fix options: (a) require --select <ids> or --from-report <path.json> for non-interactive apply; (b) rename to --apply-heuristic distinct from --apply and document loudly; (c) at minimum print deletion plan to stderr before applying.
C5. TUI staleness banner is never rendered; r (rescan) raises
rune/review/tui.py:37-49
compose()never yields a banner widget —stale_banner_visibleis read only by the test, never displayed. The "≤1s mid-session staleness banner" promise (commit4871683) is satisfied for tests, not for users.- Interval is not cancelled in
action_quit/on_unmount— race risk on shutdown. - Once
stale_banner_visible = True, never resets. Therbinding (line 19) has noaction_rescanmethod → pressingrraises in Textual.
Fix: render a real banner (reactive Static), implement action_rescan (clear selected, reset _initial_mtimes), cancel interval on exit. Add TUI test asserting the banner widget is visible after mtime change (not just the bool).
C7. No atomic write — partial multi-file apply on crash
rune/review/applier.py:62
path.write_text("".join(lines)) is sequential per file. If process dies after file #1 is written but before file #2, repo is half-applied with no rollback marker.
Fix: write to path.with_suffix(path.suffix + ".rune-tmp") then os.replace. Even better, write ALL temp files first, then atomically rename them all (or rollback if any rename fails). The backup mechanism allows manual recovery but only if the user knows to invoke --restore.
(C6 was a misread on first pass — withdrawn.)
Important (should fix soon)
- I1 L1 P/R test uses per-file evaluation (
tests/test_l1_precision_recall.py:28-31) — degenerate setup that doesn't measure realistic corpus-level precision. Recommend cross-file evaluation OR explicit documentation that the metric is synthetic. - I2
find_dead_rules_staticrglobs the whole repo per chunk (rune/review/dead_static.py:24-28,43) — O(n*m). Walk once, build sets, then look up. - I3 No conflict-pair dedup (
rune/review/conflict_lexical.py:62-71) — duplicate triples in same chunk emit duplicate pairs. - I4
_read_chunkre-reads file per op (rune/review/applier.py:19-21,44,51) — read once per file. - I5 No bounds check on
start_line/end_line(rune/review/applier.py:52-57) —commentbranch raisesIndexErrorifend_line > len(lines);deleteis silent no-op. - I6
--apply+--restore: restore wins silently (rune/review/cli.py:29-40) — should be explicit mutual-exclusion error. - I7
--apply+--json:--jsonsilently ignored (rune/review/cli.py:75-92) — either print apply log as JSON or refuse the combo. - I8
verify_entry"OK if matches pre OR post" collapses two states (rune/patches/manifest.py:46-48) — replace withPENDING/APPLIED/DRIFT. - I9
apply_cmdnew-file branch doesn't verify payload SHA matchespost_sha256(rune/cli/patch.py:34-42) — corrupted payload silently creates wrong file. Apply to existing-file branch too. - I10
--events-window-daysaccepted but silently ignored (rune/review/dead_events.py:13) — at minimum warn when non-default. - I11 Undeclared deps (
scripts/build_manifest.py:15vspyproject.toml) —tomli_w,jsonschema,pytest-asyncioall used but not in dev extras. - I12
Source.triggerdiscarded then re-parsed via regex (rune/review/loader.py:20→rune/review/dead_static.py:18) — pick one source of truth.
Minor (polish)
- M1 L1 detector is O(n²) — bucket by (verb, object) for v0.3.
- M2 "skip if NEG within 10 chars" heuristic (
rune/review/conflict_lexical.py:34) — replace with anchored lookahead. - M3 L2 candidate generation reuses L1 output — narrow recall ceiling. By definition can't catch the 5 L1_LIMITS modes.
- M4
TARGETSinscripts/build_manifest.py:18-28hardcoded — new modules silently skipped. - M5 Generated
rune/patches/manifest.toml+payloads/untracked but not gitignored — add to.gitignore. - M6 TUI
_labelshows"finding N"with no path/kind — users can't tell what they're selecting. - M7
_detailwidget yielded but never updated — pressing j/k navigates but pane stays empty. - M8
tests/test_tui_cold_start.py:14-21measures the test's own sleep, not cold-start time. - M9 Unused
pytestimports (minor flake8 lint). - M10
ReviewReport.to_dict()is one-way; nofrom_dictfor future--replay.
Follow-up issues will be opened separately and cross-referenced from this PR.
Follow-up tracking — review issues + next phasesNew tracking issues from this review (10)Critical (must-fix; some are PR-blocking, some are scoped-out follow-ups)
Important
Minor
PR-blocking review items NOT separately ticketed (must fix in this PR)These should land as additional commits on
Important review items, small enough to bundle into this PR
Minor review items
Next phases (v0.3+)Independent from this PR's must-fixes:
Question on existing open issues (phase-1 tasks)Issues #5–#12 (Task 5–10, 11, 13) are all #13 (E2E Benchmarks), #14 ( Suggested merge sequence:
|
Add regression test proving C1 was real: applier's .rstrip('\n') does not
match producer's .strip(), so a chunk line with trailing spaces causes a
spurious StaleChunkError on an unmodified file. Two existing hand-rolled
tests are updated to use blank-line paragraph boundaries so the tokenizer's
line-counting aligns correctly. The new test FAILS under HEAD and will PASS
after Commit A lands.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nds (C1+C3+I4+I5)
- C1: align applier's chunk normalisation with producer's .strip() (was .rstrip("\n"))
so SHA comparison no longer false-positives on trailing-whitespace chunks
- C3: _backup_file now accepts base_root and preserves the full relative path,
preventing basename collisions for files with the same name in different subdirs
- I4: read each file once into `lines`; pass list to _read_chunk_from_lines instead
of re-reading inside the staleness loop (N reads → 1 read per file)
- I5: bounds-check every op's line range before any mutation; raises ValueError
with a clear message when end_line exceeds file length
- Update all callers (cli.py, patches payload, tests) for new base_root parameter
- Add tests/test_apply_bounds.py covering I5 (out-of-bounds) and C3 (relative backup)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…dentity (C4+I6+I7) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Review fixes landed — 5 thematic commitsPer RALPLAN consensus plan ( Test count: 118 → 128 passed + 4 skipped (NLI-gated; no regressions; +10 new test cases). Commit-to-finding map
Caller audit (per plan §4 Commit C)
Not addressed in this PR (deferred per consensus plan)These remain as separate follow-up issues:
Recommend closing #22 as resolved by Commit A. Next stepCI gate via 🤖 Generated with Claude Code |
Summary
Adds
rune review— a third top-level verb that detects rule conflicts and dead rules across CLAUDE.md/AGENTS.md/etc., presents them in a Textual TUI, and applies fixes with a SHA-256-safe backup pipeline. Plusrune patch apply/verifyfor reproducible site-packages deployment.26 commits across 5 phases. Designed via RALPLAN consensus (Architect SOUND + Critic APPROVE in 2 iterations); see spec at
~/ToolSet/rune/spec-2026-06-11.mdand plan at~/ToolSet/rune/plan-2026-06-11.md.What's new
rune/patches/manifest.py,rune/cli/patch.pyverify/applysubcommands;applyhandles new-file case (emptypre_sha256); ≤500ms verify on 20 entriesrune/review/conflict_lexical.pyrune/review/dead_static.pyrune/review/conflict_nli.py--allow-model-downloadrequired); P≥0.85 ∧ R≥0.75 budgetrune/review/dead_events.py--events-window-daysflag (stub for v0.3 windowing)rune/review/tui.pyrune/review/applier.py.rune/backups/<ISO>/timestamped snapshots;--apply --yes/--restore/--keep-backups 10/--no-prune; reverse-line-order editsscripts/build_manifest.pyrune/review/{schema.json, SCHEMA_POLICY.md, L1_LIMITS.md, BENCHMARKS.md}Lazy import boundary
rune review --json(L1 path) does NOT importtextual,torch,transformers, orsentence_transformers— verified viasys.modulessnapshot tests.analyze/optimizestartup unaffected.Test plan
RUNE_RUN_NLI=1+ cached model — opt-inrune patch applyend-to-end verified against temp target dir: 18 entries, post-applyverifyreports all OKrune review --apply --yesround-trip: apply → restore → byte-identical to pre-applyscripts/build_manifest.py --target-root <dest>+rune patch applyhuggingface-cli download cross-encoder/nli-deberta-v3-base && RUNE_RUN_NLI=1 pytest tests/test_l2_*.py)rune review /path/with/CLAUDE.md— should launch Textual app, q to exit cleanly)Open questions deferred to v0.3+
--applylockfilerune review --watchmodeevents_window_daysactual windowing (currently accepted, not yet applied to event filtering)🤖 Generated with Claude Code