auto_gate: max_iterations is the cap, not max_iterations+1#3
Open
gabehamilton wants to merge 1 commit into
Open
auto_gate: max_iterations is the cap, not max_iterations+1#3gabehamilton wants to merge 1 commit into
gabehamilton wants to merge 1 commit into
Conversation
Was `next > max_iterations`. With max_iterations=3 a fresh gate would visit 4 times — 3 [F] Findings cycles plus a 4th force-pass evaluation, which was off-by-one for what users intuitively expect from "max_iterations". Concrete impact: a `findings_empty` policy that never sees a 0 count (no agent stage writes findings_count to context yet) would burn 3 full refine subprocesses before finally force-passing. With each refine taking ~5 minutes in real bootstrap runs, that's a 15-minute floor for every auto-gate the pipeline traverses. Change `>` to `>=` so the Nth visit (when max_iterations=N) force-passes if the policy still hasn't been satisfied. Adds regression test `auto_gate_at_nth_visit_with_findings_force_passes` that asserts visit 3 of a max=3 gate force-passes rather than looping. All previous tests still pass (the boundary they exercised — prior=N, max=N — still produces force-pass under both semantics). Documents the structural limitation in the handler doc comment: agent box stages have no current path to write `findings_count` to the runtime context, so for those gates the policy effectively reduces to "loop until max_iterations". Fixing that requires a separate change (parse agent stdout, or surface output_schema into context). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request updates the AutoGateHandler to fix an off-by-one error in the iteration limit logic, ensuring that the gate force-passes exactly on the Nth evaluation by changing the comparison from > to >=. The documentation has been updated to clarify this behavior and note current limitations with agent-driven stages, and a regression test has been added to verify the fix. I have no feedback to provide.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
`AutoGateHandler::execute` used `force_pass = next > max_iterations` so `max_iterations=N` permitted N+1 gate visits — N findings → refine cycles plus one final force-pass evaluation. Change `>` to `>=` so the Nth visit force-passes when the policy is still unsatisfied; total gate visits is now bounded by `max_iterations`.
Concrete impact
The `findings_empty` policy reads `findings_count` from the runtime context, but no agent-driven box stage currently writes that key (there's no stdout-parse or output-schema bridge yet — see doc comment update). So for agent-driven loops the policy effectively reduces to "loop until max_iterations." With each refine subprocess taking 3–5 minutes in real bootstrap runs, the extra cycle was burning 5+ minutes per gate.
Repro from a real ai-strategy-builder bootstrap run on `pipelines/bootstrap.dot` (spec_review has `max_iterations=3`):
After this patch, visit #3 force-passes, the pipeline reaches `scaffold_evals`, and the user gets through bootstrap inside the timeout budget.
Tests
Documentation
Updates the `AutoGateHandler` doc comment to spell out the new semantic (`max_iterations=N` means "at most N evaluations") and to call out the structural limitation around `findings_count` not being written by agent stages today.
Depends on
#2 (claude-code adapter: --dangerously-skip-permissions opt-in) — this PR is branched off `claude-code-skip-permissions` because that branch was needed to actually reach the `spec_review` loop and observe the bug in a real run. Merge order: #2 first, then this.
🤖 Generated with Claude Code