Skip to content

auto_gate: max_iterations is the cap, not max_iterations+1#3

Open
gabehamilton wants to merge 1 commit into
claude-code-skip-permissionsfrom
auto-gate-max-iterations-off-by-one
Open

auto_gate: max_iterations is the cap, not max_iterations+1#3
gabehamilton wants to merge 1 commit into
claude-code-skip-permissionsfrom
auto-gate-max-iterations-off-by-one

Conversation

@gabehamilton
Copy link
Copy Markdown

Summary

`AutoGateHandler::execute` used `force_pass = next > max_iterations` so `max_iterations=N` permitted N+1 gate visits — N findings → refine cycles plus one final force-pass evaluation. Change `>` to `>=` so the Nth visit force-passes when the policy is still unsatisfied; total gate visits is now bounded by `max_iterations`.

Concrete impact

The `findings_empty` policy reads `findings_count` from the runtime context, but no agent-driven box stage currently writes that key (there's no stdout-parse or output-schema bridge yet — see doc comment update). So for agent-driven loops the policy effectively reduces to "loop until max_iterations." With each refine subprocess taking 3–5 minutes in real bootstrap runs, the extra cycle was burning 5+ minutes per gate.

Repro from a real ai-strategy-builder bootstrap run on `pipelines/bootstrap.dot` (spec_review has `max_iterations=3`):

{"...,"node_id":"spec_review","notes":"auto gate: findings detected (iteration 1/3)"}
{"...,"node_id":"refine_specs",...}
{"...,"node_id":"spec_review","notes":"auto gate: findings detected (iteration 2/3)"}
{"...,"node_id":"refine_specs",...}
{"...,"node_id":"spec_review","notes":"auto gate: findings detected (iteration 3/3)"}
{"...,"node_id":"refine_specs",...}   ← 4th cycle still running; 1-hour timeout fired before the 5th gate evaluation

After this patch, visit #3 force-passes, the pipeline reaches `scaffold_evals`, and the user gets through bootstrap inside the timeout budget.

Tests

  • Adds `auto_gate_at_nth_visit_with_findings_force_passes` covering the `prior=2, max=3` edge that previously looped.
  • All 12 existing `auto_gate` tests still pass — the boundary they exercised (`prior=N, max=N`) produces force-pass under both `>` and `>=` semantics, so no test was load-bearing on the wrong behavior.

Documentation

Updates the `AutoGateHandler` doc comment to spell out the new semantic (`max_iterations=N` means "at most N evaluations") and to call out the structural limitation around `findings_count` not being written by agent stages today.

Depends on

#2 (claude-code adapter: --dangerously-skip-permissions opt-in) — this PR is branched off `claude-code-skip-permissions` because that branch was needed to actually reach the `spec_review` loop and observe the bug in a real run. Merge order: #2 first, then this.

🤖 Generated with Claude Code

Was `next > max_iterations`. With max_iterations=3 a fresh gate would
visit 4 times — 3 [F] Findings cycles plus a 4th force-pass evaluation,
which was off-by-one for what users intuitively expect from
"max_iterations".

Concrete impact: a `findings_empty` policy that never sees a 0 count
(no agent stage writes findings_count to context yet) would burn 3
full refine subprocesses before finally force-passing. With each refine
taking ~5 minutes in real bootstrap runs, that's a 15-minute floor for
every auto-gate the pipeline traverses.

Change `>` to `>=` so the Nth visit (when max_iterations=N) force-passes
if the policy still hasn't been satisfied. Adds regression test
`auto_gate_at_nth_visit_with_findings_force_passes` that asserts visit
3 of a max=3 gate force-passes rather than looping. All previous tests
still pass (the boundary they exercised — prior=N, max=N — still produces
force-pass under both semantics).

Documents the structural limitation in the handler doc comment: agent
box stages have no current path to write `findings_count` to the runtime
context, so for those gates the policy effectively reduces to
"loop until max_iterations". Fixing that requires a separate change
(parse agent stdout, or surface output_schema into context).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the AutoGateHandler to fix an off-by-one error in the iteration limit logic, ensuring that the gate force-passes exactly on the Nth evaluation by changing the comparison from > to >=. The documentation has been updated to clarify this behavior and note current limitations with agent-driven stages, and a regression test has been added to verify the fix. I have no feedback to provide.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant