fix: tolerate separate shift-log commits in cycle verification#15
fix: tolerate separate shift-log commits in cycle verification#15
Conversation
Codex commits fixes and shift-log updates separately, causing both Phractal validation cycles to be rejected. Changed verify_cycle() from per-commit shift-log check to cycle-level check: at least one commit per cycle must include the shift log. Also: expected_cycle_commits returns (min, max) range, shift-log-only commits don't count toward max_fixes_per_cycle, prompt updated with explicit git add instructions for co-committing. Closes task #9.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7fac4019e4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| base = len(fixes) + (1 if logged_issues else 0) | ||
| return (base, base + 1) |
There was a problem hiding this comment.
Restrict extra commit to shift-log-only changes
expected_cycle_commits() always returns (base, base + 1), and verify_cycle() only checks total commit count against that range, so a cycle can include an unreported code change and still pass by putting code + shift-log in one commit (for example, structured output with fixes=[]/logged_issues=[] now allows 1 commit). This regresses the previous strict consistency check and lets agents bypass per-fix validations (impact/category/files-per-fix) by underreporting fixes; the +1 tolerance should only apply when the additional commit is actually shift-log-only.
Useful? React with 👍 / 👎.
Matches all other git-based test helpers in the file. The flag requires Git >= 2.28 and is not needed for the tests. Addresses code review feedback on PR #15.
…e-reject (#139) Claude cycles returning count-only payloads (fixes_committed/fixes_applied instead of fixes[]) were false-rejected as "structured output implies 0" because the count landed only in prose notes and all three verifier functions read fixes==[]. Adds fixes_count_only: int to CycleResult; _as_cycle_result sets it; expected_fix_commits, allowed_total_cycle_commits, and expected_cycle_commits fall back to it. 18 regression tests reproduce the eval #15 payload shape and verify no violations fire.
…clear Two-cycle test run against Phractal confirms the three merged PRs fixed the scored failures from eval #15. Both cycles accepted (no false rejections), shift log persisted, counters positive. Score 86/100 exceeds the BUILD EVAL GATE threshold of >= 80. - Guard rails: 4 -> 9 (PR #167 count-only payload fix working) - Shift log: 0 -> 9 (durable artifact written and co-committed) - Fix quality: 3 -> 8 (full metadata in accepted cycles) - Discovery: 5 -> 8 (structured output preserved faithfully)
Summary
verify_cycle()from per-commit shift-log check to cycle-level check: at least one commit per cycle must include the shift logexpected_cycle_commits()returns(min, max)range to allow for separate shift-log commitsmax_fixes_per_cycleTest plan
make checkequivalent: ruff + format + pytest all pass