fix(engine): treat a crashed test/security gate as failed, not green (ISSUE-57) #104
Merged
…(ISSUE-57) A gate's verdict is "did the agent write its failure file?". fl_gate_timeout only synthesized that file when claude -p exited via timeout (124/137); any other non-zero exit (auth/API error, OOM, killed, plain crash) left no failure file, so the test/security gate mis-read as a pass (green). Generalize the guard (fl_gate_timeout -> fl_gate_crash) to synthesize the failure file on ANY non-zero gate exit, keeping the timeout-specific message and adding a "crashed (exit N)" message for other failures. The project gate already handled non-zero exits this way. Add a bats test proving a crashing gate ends the run not-green with a synthesized failure file.
Problem
The Claude-driven `test` and `security` gates decide their verdict purely by "did the agent write its failure file?". The only code that synthesized that file when the agent didn't was `fl_gate_timeout`, which bailed on every non-timeout exit (`fl_timed_out "$rc" || return 0`).
So any non-timeout, non-zero exit of `claude -p` (auth/API error, OOM, killed, a rejected `--settings`, budget truncation, a bad `FL_CLAUDE` binary) wrote no failure file. The gate then read as a pass, and the run shipped `OUTCOME=green` / exit 0 with the test and security gates never actually having run. This is the exact failure mode `gate_pipeline` already guards against (it's fail-closed on `rc -ne 0`).
Fix
Generalize `fl_gate_timeout` → `fl_gate_crash` and change its guard from `fl_timed_out "$rc" || return 0` to `[ "$rc" -ne 0 ] || return 0`, so it synthesizes the failure file on any non-zero gate exit. It is called unconditionally from `_run_gate`.
- `rc == 0` stays a no-op: a clean agent run that wrote nothing still reads as a pass.
- If the failure file already exists, the function returns without touching it (`[ -f "$ff" ] && return 0`).
Testing
New bats test (`a gate whose claude call crashes is marked failed, not read as green`): stubs `claude` to `exit 3` only on the test-gate prompt without writing a verdict, runs one iteration, and asserts the run ends not-green (exit 1) with a synthesized `failures/test.md` naming `crashed (exit 3)`.
- `bats --print-output-on-failure tests/`: green (via feature-loop's `test` gate)
- `bash -n bin/feature-loop` parses clean
Closes #57
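For readers without the diff open, the generalized guard described under Fix might look roughly like the sketch below. This is not the code from `bin/feature-loop`: the argument order of `fl_gate_crash`, the body of `fl_timed_out`, and the exact message wording are assumptions made for illustration. Only the two guard lines and the `crashed (exit N)` phrase come from the description above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only; the real functions live in bin/feature-loop.

# Assumed helper: true for the exit codes the PR associates with a timeout (124/137).
fl_timed_out() {
  [ "$1" -eq 124 ] || [ "$1" -eq 137 ]
}

# fl_gate_crash GATE RC FAILURE_FILE   (argument order is an assumption)
fl_gate_crash() {
  local gate="$1" rc="$2" ff="$3"

  [ "$rc" -ne 0 ] || return 0   # clean exit: no-op, an empty run still reads as a pass
  [ -f "$ff" ] && return 0      # a failure file already exists: leave it untouched

  mkdir -p "$(dirname "$ff")"
  if fl_timed_out "$rc"; then
    # Timeout-specific message is kept separate from the generic crash message.
    printf '%s gate timed out (exit %s)\n' "$gate" "$rc" > "$ff"
  else
    printf '%s gate crashed (exit %s)\n' "$gate" "$rc" > "$ff"
  fi
}
```

Under these assumptions, `fl_gate_crash test 3 failures/test.md` writes `test gate crashed (exit 3)` to `failures/test.md`, while `fl_gate_crash test 0 failures/test.md` creates nothing. The order of the two guards means a verdict that is already on disk takes precedence over the synthesized one.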