fix(harness): accept zero-failure verifier summaries #1018
Fat-harness verifier output can include explicit zero-failure phrases such as 0 failed, 0 errors, or no errors. Treat those as successful summaries while still rejecting non-zero failures and errors so atomic acceptance avoids avoidable false negatives.

Constraint: Q00#1010 closed current-run file proof hardening, but verifier test-output parsing still needs the remaining zero-failure false-negative fix before the observation window is clean.
Rejected: Blanket allow any output containing passed | would re-accept 1 failed, 3 passed mixed results.
Confidence: high
Scope-risk: narrow
Directive: Keep verifier success parsing conservative for non-zero failures while allowing explicit zero-failure summaries.
Tested: uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py -k 'zero_failure_summaries or fat_harness or observe_only or typed_evidence'; uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py; uv run ruff check src/ouroboros/orchestrator/parallel_executor.py tests/unit/orchestrator/test_parallel_executor.py; uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py
Not-tested: live ooo run execution.
Co-authored-by: OmX <omx@oh-my-codex.dev>
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Reviewing commit 289bdca for PR #1018
Review record: 4164f1d3-896d-4141-bee1-1508970a27f6
Blocking Findings
| # | File:Line | Severity | Finding |
|---|-----------|----------|---------|
| 1 | src/ouroboros/orchestrator/parallel_executor.py:288 | BLOCKING | The new failure regex no longer rejects a standalone failed token. The old guard caught cases like FAILED tests/test_x.py::test_y or tests failed unless they were explicitly 0 failed / no tests failed; after this patch, those strings survive zero_failure_pattern but do not match `\b(failure |

### Recovery Notes
First recoverable review artifact generated from codex analysis log.
Non-blocking Suggestions
None.
Design Notes
The change is appropriately narrow, but the verifier logic is regex-fragile: expanding the zero-failure allowlist also removed an important negative signal. This path needs tests for both summary-style failures and raw pytest FAILED ... lines.
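The test coverage requested above could be organized as a small case matrix. The following is only a sketch: `CASES`, `mismatches`, and `count_only_classifier` are hypothetical names, and `count_only_classifier` is an illustrative stand-in for a parser that lost its standalone `failed` check, not the actual code in `parallel_executor.py`.

```python
from __future__ import annotations

import re
from typing import Callable

# (verifier output, should the verifier treat it as success?)
# The strings are taken from the PR description and the blocking finding.
CASES: list[tuple[str, bool]] = [
    ("0 failed, 3 passed", True),
    ("no tests failed, 3 passed", True),
    ("1 failed, 3 passed", False),
    ("2 errors, 1 passed", False),
    ("FAILED tests/test_x.py::test_y", False),  # raw pytest failure line
    ("tests failed", False),                    # generic failure text
]


def mismatches(classifier: Callable[[str], bool]) -> list[str]:
    """Return every case text the classifier gets wrong, in CASES order."""
    return [text for text, expected in CASES if classifier(text) != expected]


def count_only_classifier(output: str) -> bool:
    """Hypothetical regressed parser: rejects only '<non-zero N> failed/errors'."""
    pattern = r"\b[1-9]\d*\s+(?:failed|errors?)\b"
    return re.search(pattern, output, re.IGNORECASE) is None


if __name__ == "__main__":
    # Lists the outputs that slip through the count-only check.
    print(mismatches(count_only_classifier))
```

Under these assumptions the matrix flags exactly the two inputs named in the finding, the raw `FAILED ...` line and `tests failed`, which is why both styles belong in the test suite alongside the summary-style cases.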
Reviewed by ouroboros-agent[bot] via Codex deep analysis
Restore the negative signal lost in #1018 while preserving explicit zero-failure summary handling.

Constraint: #1018 intentionally accepts zero-failure summaries, but the verifier must still reject raw pytest FAILED lines and generic tests-failed output before #978 P5 readiness.
Rejected: Reverting #1018 | zero-failure summaries are valid success evidence and only the standalone failed token regression needed correction.
Confidence: high
Scope-risk: narrow
Directive: Keep zero-failure allowlisting paired with explicit failed-token rejection tests whenever changing verifier output parsing.
Tested: uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py -k 'zero_failure_summaries or fat_harness or observe_only or typed_evidence'; uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py; uv run ruff check src/ouroboros/orchestrator/parallel_executor.py tests/unit/orchestrator/test_parallel_executor.py; uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py
Not-tested: live runtime transcript containing provider-specific pytest formatting
Summary
Narrow follow-up after #1006/#1010. The fat-harness verifier should not reject common successful test summaries solely because they contain explicit zero-failure text.
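One way to picture the intended rule is a two-step check: mask explicit zero-failure phrases, then reject any failure or error token that remains. This is a minimal sketch under assumptions; the pattern names and `looks_successful` are hypothetical and are not the regexes used in `parallel_executor.py`.

```python
import re

# Hypothetical patterns for illustration only.
# Explicit zero-failure phrases: "0 failed", "0 errors", "no errors", "no tests failed".
ZERO_FAILURE = re.compile(
    r"\b(?:0|no)\s+(?:tests?\s+)?(?:failed|failures?|errors?)\b",
    re.IGNORECASE,
)
# Any remaining failure or error token, including a standalone "FAILED".
FAILURE_SIGNAL = re.compile(r"\b(?:failed|failures?|errors?)\b", re.IGNORECASE)
# Positive evidence: a pass count such as "3 passed".
PASS_SIGNAL = re.compile(r"\b\d+\s+passed\b", re.IGNORECASE)


def looks_successful(output: str) -> bool:
    """Accept only output with a pass count and no unmasked failure token."""
    # Step 1: blank out the allowlisted zero-failure phrases.
    remainder = ZERO_FAILURE.sub(" ", output)
    # Step 2: anything that still mentions a failure or error is rejected.
    if FAILURE_SIGNAL.search(remainder):
        return False
    # Step 3: require explicit pass evidence rather than accepting by default.
    return bool(PASS_SIGNAL.search(remainder))


if __name__ == "__main__":
    for sample in ("0 failed, 3 passed", "1 failed, 3 passed", "tests failed"):
        print(sample, "->", looks_successful(sample))
```

Masking before rejecting keeps the allowlist and the negative signal independent, so widening one does not silently narrow the other. The word boundary before `0` also means `10 failed` is not mistaken for a zero-failure phrase.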
What changed
Now accepted as successful summaries:
- `0 failed, 3 passed`
- `0 failed, 0 errors, 1 passed`
- `no errors, 3 passed`
- `no tests failed, 3 passed`

Still rejected:
- `1 failed, 3 passed`
- `2 errors, 1 passed`

Boundary
ooo run trustworthy with a fat harness execution path #920 / Meta SSOT: AgentOS roadmap sequencing (#920–#960) #961 acceptance invariant.

Validation
- `uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py -k 'zero_failure_summaries or fat_harness or observe_only or typed_evidence'` → 24 passed
- `uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py` → 245 passed, 1 skipped
- `uv run ruff check src/ouroboros/orchestrator/parallel_executor.py tests/unit/orchestrator/test_parallel_executor.py` → passed
- `uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py` → passed

Refs #961, #920, #1006, #1010.