Changed
Citation accuracy: corrected three over-reaching attributions to real sources
(no citations removed, no IDs changed). The "A/B trigger policy / cost-benefit
knob" and "cuts wasted edits" are reframed as this suite's own design choices
rather than PreFlect (arXiv 2602.07187) findings — PreFlect reflects on every
plan unconditionally and reports no edit-efficiency metric. The "repo-native
run-ledger over a vendor eval UI" is attributed to this suite as its answer to
the open challenge posed by Code as Agent Harness (arXiv 2605.18747), not as
that paper's claim.
Fixed
Standalone scripts now resolve the loop package when run by path. The
documented invocations python3 scripts/runtime_monitor.py <loop> and
python3 scripts/inspect_loop.py <loop> put scripts/ on sys.path (not the
repo root), so the sibling loop package was unimportable and the scripts
silently used their degraded fallbacks: runtime_monitor reported RUNLOG.md as
missing on the canonical .loop/RUNLOG.md layout, and inspect_loop could not
read plan_then_execute from .loop/manifest.yaml.
Both scripts now self-bootstrap the repo root onto sys.path before importing loop.*, matching python -m loop behaviour. The bug was invisible to CI
because python -m pytest already places the repo root on sys.path; added
by-path subprocess regression tests that reproduce the real standalone call.
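
As a rough illustration of the fix and its regression test, the sketch below builds a throwaway repo layout, runs a script by path in a subprocess, and checks that the sibling package imports. It is a minimal sketch under stated assumptions, not this suite's actual code: demo_tool.py, MARKER, and run_by_path are hypothetical names, and the real scripts may bootstrap differently.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical script body (assumption, not the suite's real code): prepend
# the repo root (the parent of scripts/) to sys.path before importing loop.
SCRIPT = textwrap.dedent(
    """
    import sys
    from pathlib import Path

    REPO_ROOT = Path(__file__).resolve().parents[1]
    if str(REPO_ROOT) not in sys.path:
        sys.path.insert(0, str(REPO_ROOT))

    import loop  # importable only because of the bootstrap above
    print(loop.MARKER)
    """
)


def run_by_path() -> subprocess.CompletedProcess:
    """Run the script as `python scripts/demo_tool.py`, not via `python -m`."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Stand-in for the sibling loop package at the repo root.
        (root / "loop").mkdir()
        (root / "loop" / "__init__.py").write_text("MARKER = 'loop imported'\n")
        # Stand-in for a standalone script under scripts/.
        (root / "scripts").mkdir()
        script = root / "scripts" / "demo_tool.py"
        script.write_text(SCRIPT)
        # Invoking by path puts scripts/ (not the repo root) on sys.path,
        # which is the situation pytest's own sys.path setup would hide.
        return subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
            cwd=str(root / "scripts"),
        )


if __name__ == "__main__":
    result = run_by_path()
    print(result.returncode, result.stdout.strip())
```

Removing the three bootstrap lines from SCRIPT should make the subprocess fail with ModuleNotFoundError, which is the failure mode an in-process pytest import cannot reproduce.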