
fix(engine): treat a crashed test/security gate as failed, not green (ISSUE-57)#104

Merged: ecukalla merged 1 commit into main from feature/ISSUE-57-gate-crash-reads-green on May 31, 2026


@ecukalla
Owner

Problem

The Claude-driven test and security gates decide their verdict purely by "did the agent write its failure file?". The only code that synthesized that file when the agent didn't was fl_gate_timeout, which bailed on every non-timeout exit (fl_timed_out "$rc" || return 0).

So any non-timeout, non-zero exit of claude -p — auth/API error, OOM, killed, a rejected --settings, budget truncation, a bad FL_CLAUDE binary — wrote no failure file. The gate then read as a pass, and the run shipped OUTCOME=green / exit 0 with the test and security gates never actually having run. This is the exact failure mode gate_pipeline already guards against (it's fail-closed on rc -ne 0).
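A minimal sketch of the pre-fix behavior, to make the failure mode concrete. Only the guard line `fl_timed_out "$rc" || return 0` is taken from this PR; the function bodies, the argument order, the message text, and the names `old_fl_gate_timeout` and `gate_verdict` are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Illustration only: names and signatures below are hypothetical.

# True when RC is one of the timeout exits named in this PR (124/137).
fl_timed_out() {
  [ "$1" -eq 124 ] || [ "$1" -eq 137 ]
}

# Pre-fix helper: synthesizes the failure file only on a timeout exit.
old_fl_gate_timeout() {
  local rc=$1 ff=$2
  fl_timed_out "$rc" || return 0   # bails on every non-timeout exit
  printf 'gate timed out\n' > "$ff"
}

# Verdict as described above: a gate fails only if its failure file exists.
gate_verdict() {
  if [ -f "$1" ]; then echo failed; else echo passed; fi
}
```

Under these assumptions, `old_fl_gate_timeout 3 "$ff"; gate_verdict "$ff"` prints `passed` even though the gate process exited 3, which is the green-on-crash case.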

Fix

Generalize fl_gate_timeout → fl_gate_crash and change its guard from fl_timed_out "$rc" || return 0 to [ "$rc" -ne 0 ] || return 0, so it synthesizes the failure file on any non-zero gate exit. It is called unconditionally from _run_gate.

  • The timeout path (124/137) keeps its distinct "timed out / exceeded FL_PHASE_TIMEOUT" wording.
  • Other non-zero exits get a generic "crashed (exit N)" message.
  • rc == 0 stays a no-op — a clean agent run that wrote nothing still reads as a pass.
  • An existing failure file the agent already wrote is preserved ([ -f "$ff" ] && return 0).
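The four rules above can be sketched roughly as follows. This is not the code from the diff: the two guards are quoted from this PR, but the three-argument signature, the exact message text, and the `mkdir -p` step are assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: signature and message wording are hypothetical.

# True when RC is one of the timeout exits named in this PR (124/137).
fl_timed_out() {
  [ "$1" -eq 124 ] || [ "$1" -eq 137 ]
}

# fl_gate_crash GATE RC FAILURE_FILE (assumed argument order)
fl_gate_crash() {
  local gate=$1 rc=$2 ff=$3
  [ "$rc" -ne 0 ] || return 0   # rc == 0 stays a no-op
  [ -f "$ff" ] && return 0      # keep a failure file the agent already wrote
  mkdir -p "$(dirname "$ff")"
  if fl_timed_out "$rc"; then
    # Timeout path keeps its distinct wording.
    printf '%s gate timed out (exceeded FL_PHASE_TIMEOUT)\n' "$gate" > "$ff"
  else
    # Every other non-zero exit gets the generic message.
    printf '%s gate crashed (exit %s)\n' "$gate" "$rc" > "$ff"
  fi
}
```

With this shape, `fl_gate_crash test 3 failures/test.md` leaves a file containing `crashed (exit 3)`, while a later call with a different exit code leaves that file untouched.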

Testing

New bats test (a gate whose claude call crashes is marked failed, not read as green): stubs claude to exit 3 only on the test-gate prompt without writing a verdict, runs one iteration, and asserts the run ends not-green (exit 1) with a synthesized failures/test.md naming crashed (exit 3).

  • bats --print-output-on-failure tests/ — green (via feature-loop's test gate)
  • bash -n bin/feature-loop parses clean
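For readers who want to reproduce the crash locally, a stub along these lines is one way to fake a claude that fails only on a chosen prompt. The helper name `make_claude_stub`, the marker-matching approach, and how the stub gets picked up (for example via FL_CLAUDE or PATH) are assumptions; the actual bats test may be wired differently.

```shell
#!/usr/bin/env bash
# Sketch only: make_claude_stub is a hypothetical helper, not part of the repo.

# make_claude_stub DIR MARKER
# Writes DIR/claude, a fake that exits 3 when its arguments contain MARKER
# and exits 0 otherwise. It never writes a verdict file.
make_claude_stub() {
  local dir=$1 marker=$2
  mkdir -p "$dir"
  cat > "$dir/claude" <<EOF
#!/usr/bin/env bash
marker=$(printf '%q' "$marker")
case "\$*" in
  *"\$marker"*) exit 3 ;;
esac
exit 0
EOF
  chmod +x "$dir/claude"
}
```

Calling `make_claude_stub "$tmp" "test gate"` and then running `"$tmp/claude" -p "run the test gate"` exits 3, while any prompt without the marker exits 0.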

Closes #57

…(ISSUE-57)

A gate's verdict is "did the agent write its failure file?". fl_gate_timeout
only synthesized that file when claude -p exited via timeout (124/137); any
other non-zero exit (auth/API error, OOM, killed, plain crash) left no failure
file, so the test/security gate mis-read as a pass (green).

Generalize the guard (fl_gate_timeout -> fl_gate_crash) to synthesize the
failure file on ANY non-zero gate exit, keeping the timeout-specific message
and adding a "crashed (exit N)" message for other failures. The project gate
already handled non-zero exits this way.

Add a bats test proving a crashing gate ends the run not-green with a
synthesized failure file.
@ecukalla ecukalla self-assigned this May 31, 2026
@ecukalla ecukalla merged commit 008f116 into main May 31, 2026
5 checks passed
@ecukalla ecukalla deleted the feature/ISSUE-57-gate-crash-reads-green branch May 31, 2026 06:30
@ecukalla ecukalla mentioned this pull request May 31, 2026

Development

Successfully merging this pull request may close these issues.

[C1] Crashed test/security gate is silently counted as a pass (green-on-failure)
