fix: tolerate separate shift-log commits in cycle verification#15

Merged
fazxes merged 2 commits into main from fix/shift-log-verification-tolerance
Apr 4, 2026

@fazxes fazxes commented Apr 4, 2026

Summary

  • Changed verify_cycle() from per-commit shift-log check to cycle-level check: at least one commit per cycle must include the shift log
  • expected_cycle_commits() returns (min, max) range to allow for separate shift-log commits
  • Shift-log-only commits no longer count toward max_fixes_per_cycle
  • Updated agent prompt and SKILL.md with explicit co-commit instructions
  • Merged PR #13 ("Harden Nightshift from live validation runs") into this branch, resolving conflicts in 4 files
  • Closes task #9 ("feat: cycle-to-cycle state injection")
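
As a rough illustration of the cycle-level check described in the summary, the sketch below accepts a cycle when its commit count falls inside the (min, max) range and at least one commit touches the shift log. Only the `base` arithmetic is taken from the diff quoted in the review further down. The `Commit` class, the `shift_log_path` parameter, and the list-of-strings return value are assumptions made for the example and do not reflect the actual code in nightshift/cycle.py.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    """Hypothetical stand-in for a commit: its SHA and the paths it touched."""
    sha: str
    files: list[str] = field(default_factory=list)


def expected_cycle_commits(fixes: list, logged_issues: list) -> tuple[int, int]:
    """Return an inclusive (min, max) commit range for one cycle."""
    base = len(fixes) + (1 if logged_issues else 0)
    # The +1 leaves room for a shift-log update committed on its own.
    return (base, base + 1)


def verify_cycle(commits, fixes, logged_issues, shift_log_path):
    """Return violation messages; an empty list means the cycle passes."""
    violations = []
    low, high = expected_cycle_commits(fixes, logged_issues)
    if not low <= len(commits) <= high:
        violations.append(f"expected {low}-{high} commits, found {len(commits)}")
    # Cycle-level check: a single commit touching the shift log is enough.
    if not any(shift_log_path in c.files for c in commits):
        violations.append("no commit in this cycle includes the shift log")
    return violations
```

With one reported fix, a code commit followed by a separate shift-log commit yields two commits, which sits inside the range (1, 2) and passes under this sketch.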

Test plan

  • 213 tests passing (5 new verification tolerance tests)
  • make check equivalent: ruff + format + pytest all pass
  • Dry-run works for both codex and claude agents
  • New prompt includes updated shift-log instructions

Codex commits fixes and shift-log updates separately, causing both
Phractal validation cycles to be rejected. Changed verify_cycle() from
per-commit shift-log check to cycle-level check: at least one commit
per cycle must include the shift log.

Also: expected_cycle_commits returns (min, max) range, shift-log-only
commits don't count toward max_fixes_per_cycle, prompt updated with
explicit git add instructions for co-committing.

Closes task #9.
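
One way to picture the rule that shift-log-only commits do not count toward max_fixes_per_cycle is to classify each commit by the files it touches before counting. The helper names and the list-of-file-lists input below are hypothetical, chosen only to keep the example self-contained.

```python
def is_shift_log_only(files: list[str], shift_log_path: str) -> bool:
    """True when a commit touches the shift log and nothing else."""
    return bool(files) and all(path == shift_log_path for path in files)


def count_fix_commits(commit_files: list[list[str]], shift_log_path: str) -> int:
    """Count the commits that should be charged against the per-cycle fix cap."""
    return sum(
        1 for files in commit_files
        if not is_shift_log_only(files, shift_log_path)
    )


def exceeds_fix_cap(commit_files, shift_log_path, max_fixes_per_cycle) -> bool:
    """True when the non-shift-log commits outnumber the configured cap."""
    return count_fix_commits(commit_files, shift_log_path) > max_fixes_per_cycle
```

A commit that changes both code and the shift log still counts as a fix here. Only a commit whose entire file list is the shift log is exempt.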

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7fac4019e4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread nightshift/cycle.py
Comment on lines +520 to +521
base = len(fixes) + (1 if logged_issues else 0)
return (base, base + 1)

P1: Restrict extra commit to shift-log-only changes

expected_cycle_commits() always returns (base, base + 1), and verify_cycle() only checks the total commit count against that range. A cycle can therefore include an unreported code change and still pass by putting the code change and the shift-log update in a single commit. For example, structured output with fixes=[] and logged_issues=[] now allows 1 commit. This regresses the previous strict consistency check and lets agents bypass per-fix validations (impact, category, files-per-fix) by underreporting fixes. The +1 tolerance should apply only when the additional commit is actually shift-log-only.
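
A minimal sketch of the restriction this comment asks for, assuming each commit can be reduced to the list of paths it touched: the upper bound widens by one only when some commit in the cycle touches nothing but the shift log. The function name, its parameters, and its return convention are hypothetical. This is one reading of the suggestion, not the change that was eventually merged.

```python
def commit_count_violation(commit_files, fixes, logged_issues, shift_log_path):
    """Return a violation message, or None when the commit count is acceptable."""
    base = len(fixes) + (1 if logged_issues else 0)
    # A commit qualifies for the tolerance only if the shift log is all it touches.
    log_only = [files for files in commit_files if files and set(files) == {shift_log_path}]
    high = base + (1 if log_only else 0)
    if base <= len(commit_files) <= high:
        return None
    return f"expected {base}-{high} commits, found {len(commit_files)}"
```

Under this rule the case from the comment is rejected: with fixes=[] and logged_issues=[], a single commit containing code plus the shift log has no shift-log-only commit to justify the extra slot, so the allowed range stays at (0, 0).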

Matches all other git-based test helpers in the file. The flag
requires Git >= 2.28 and is not needed for the tests.

Addresses code review feedback on PR #15.
@fazxes fazxes merged commit 70d4b65 into main Apr 4, 2026
@fazxes fazxes deleted the fix/shift-log-verification-tolerance branch April 4, 2026 01:26
fazxes added a commit that referenced this pull request Apr 6, 2026
…e-reject (#139)

Claude cycles returning count-only payloads (fixes_committed/fixes_applied instead
of fixes[]) were false-rejected as "structured output implies 0" because the count
landed only in prose notes and all three verifier functions read fixes==[]. Adds
fixes_count_only: int to CycleResult; _as_cycle_result sets it; expected_fix_commits,
allowed_total_cycle_commits, and expected_cycle_commits fall back to it. 18 regression
tests reproduce the eval #15 payload shape and verify no violations fire.
fazxes added a commit that referenced this pull request Apr 8, 2026
…clear

Two-cycle test run against Phractal confirms the three merged PRs fixed
the scored failures from eval #15. Both cycles accepted (no false
rejections), shift log persisted, counters positive. Score 86/100
exceeds the BUILD EVAL GATE threshold of >= 80.

- Guard rails: 4 -> 9 (PR #167 count-only payload fix working)
- Shift log: 0 -> 9 (durable artifact written and co-committed)
- Fix quality: 3 -> 8 (full metadata in accepted cycles)
- Discovery: 5 -> 8 (structured output preserved faithfully)
fazxes added a commit that referenced this pull request Apr 9, 2026
fazxes added a commit that referenced this pull request Apr 9, 2026