fix(harness): accept zero-failure verifier summaries#1018

Merged
shaun0927 merged 1 commit into
Q00:mainfrom
shaun0927:fix/verifier-zero-error-summaries
May 14, 2026

@shaun0927
Collaborator

Summary

Narrow follow-up after #1006/#1010. The fat-harness verifier should not reject common successful test summaries solely because they contain explicit zero-failure text.

What changed

  • Accepts explicit zero-failure summaries such as:
    • 0 failed, 3 passed
    • 0 failed, 0 errors, 1 passed
    • no errors, 3 passed
    • no tests failed, 3 passed
  • Continues rejecting non-zero failure/error output such as:
    • 1 failed, 3 passed
    • 2 errors, 1 passed
  • Adds direct regression coverage for the success parser.
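
The accept/reject behaviour listed above can be sketched as a strip-then-scan check. This is a minimal illustration only: the pattern names and `summary_indicates_success` are hypothetical and are not taken from `parallel_executor.py`, whose actual regexes are not shown in this thread.

```python
import re

# Hypothetical patterns for illustration; the real ones in
# src/ouroboros/orchestrator/parallel_executor.py may differ.
ZERO_FAILURE = re.compile(
    r"\b(?:0|no)\s+(?:tests?\s+)?(?:failed|failures?|errors?)\b",
    re.IGNORECASE,
)
FAILURE_TOKEN = re.compile(r"\b(?:failed|failures?|errors?)\b", re.IGNORECASE)
PASSED = re.compile(r"\b[1-9]\d*\s+passed\b", re.IGNORECASE)


def summary_indicates_success(summary: str) -> bool:
    """Return True only for summaries with passes and no real failure signal."""
    # Remove explicit zero-failure phrases so they cannot trip the negative check.
    remainder = ZERO_FAILURE.sub("", summary)
    # Any failure or error word that survives is treated as a real failure.
    if FAILURE_TOKEN.search(remainder):
        return False
    # Require at least one positively reported pass.
    return PASSED.search(remainder) is not None


if __name__ == "__main__":
    for text in (
        "0 failed, 3 passed",
        "0 failed, 0 errors, 1 passed",
        "no errors, 3 passed",
        "no tests failed, 3 passed",
        "1 failed, 3 passed",
        "2 errors, 1 passed",
    ):
        print(f"{text!r}: {summary_indicates_success(text)}")
```

Stripping the zero-failure phrases before scanning keeps the negative check broad, so a bare `failed` or `error` word elsewhere in the output still causes rejection.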

Boundary

Validation

  • uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py -k 'zero_failure_summaries or fat_harness or observe_only or typed_evidence' → 24 passed
  • uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py → 245 passed, 1 skipped
  • uv run ruff check src/ouroboros/orchestrator/parallel_executor.py tests/unit/orchestrator/test_parallel_executor.py → passed
  • uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py → passed

Refs #961, #920, #1006, #1010.

Fat-harness verifier output can include explicit zero-failure phrases such as 0 failed, 0 errors, or no errors. Treat those as successful summaries while still rejecting non-zero failures and errors, so that atomic acceptance does not produce avoidable false negatives.

Constraint: Q00#1010 closed current-run file proof hardening, but verifier test-output parsing still needs the remaining zero-failure false-negative fix before the observation window is clean.

Rejected: Blanket allow any output containing passed | would re-accept 1 failed, 3 passed mixed results.

Confidence: high

Scope-risk: narrow

Directive: Keep verifier success parsing conservative for non-zero failures while allowing explicit zero-failure summaries.

Tested: uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py -k 'zero_failure_summaries or fat_harness or observe_only or typed_evidence'; uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py; uv run ruff check src/ouroboros/orchestrator/parallel_executor.py tests/unit/orchestrator/test_parallel_executor.py; uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py

Not-tested: live ooo run execution.

Co-authored-by: OmX <omx@oh-my-codex.dev>
@shaun0927
Collaborator Author

@ouroboros-agent please review this narrow follow-up after #1006/#1010. It only fixes verifier false-negatives for explicit zero-failure test summaries while preserving rejection of non-zero failures/errors.

Contributor

@ouroboros-agent ouroboros-agent Bot left a comment

Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Reviewing commit 289bdca for PR #1018

Review record: 4164f1d3-896d-4141-bee1-1508970a27f6

Blocking Findings

| # | File:Line | Severity | Finding |
|---|-----------|----------|---------|
| 1 | src/ouroboros/orchestrator/parallel_executor.py:288 | BLOCKING | The new failure regex no longer rejects a standalone failed token. The old guard caught cases like FAILED tests/test_x.py::test_y or tests failed unless they were explicitly 0 failed / no tests failed; after this patch, those strings survive zero_failure_pattern but do not match `\b(failure |

### Recovery Notes
First recoverable review artifact generated from codex analysis log.

Non-blocking Suggestions

None.

Design Notes

The change is appropriately narrow, but the verifier logic is regex-fragile: expanding the zero-failure allowlist also removed an important negative signal. This path needs tests for both summary-style failures and raw pytest FAILED ... lines.
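
A table-driven check makes the kind of gap described in this review concrete. The sketch below is an assumption-laden illustration: `ZERO_FAILURE`, `COUNTED_ONLY`, `ANY_TOKEN`, and `has_failure_signal` are hypothetical stand-ins, since the merged regexes are not visible in this thread. It contrasts a failure pattern that only matches counted failures with one that also matches a bare failure word.

```python
import re

# Hypothetical stand-ins; the merged patterns are not shown in this thread.
ZERO_FAILURE = re.compile(
    r"\b(?:0|no)\s+(?:tests?\s+)?(?:failed|failures?|errors?)\b", re.IGNORECASE
)
# Narrow: only a non-zero count followed by a failure word.
COUNTED_ONLY = re.compile(
    r"\b[1-9]\d*\s+(?:failed|failures?|errors?)\b", re.IGNORECASE
)
# Broad: also a bare failure word with no count in front of it.
ANY_TOKEN = re.compile(r"\b(?:failed|failures?|errors?)\b", re.IGNORECASE)


def has_failure_signal(output: str, failure_pattern: re.Pattern[str]) -> bool:
    """Strip zero-failure phrases, then look for a surviving failure match."""
    remainder = ZERO_FAILURE.sub("", output)
    return failure_pattern.search(remainder) is not None


# (verifier output, whether it should be treated as a failure)
CASES = [
    ("1 failed, 3 passed", True),                # summary-style failure
    ("2 errors, 1 passed", True),                # summary-style error
    ("FAILED tests/test_x.py::test_y", True),    # raw pytest FAILED line
    ("tests failed", True),                      # generic failure text
    ("0 failed, 3 passed", False),               # explicit zero-failure summary
    ("no tests failed, 3 passed", False),        # explicit zero-failure summary
]

missed_by_narrow = [
    text for text, expected in CASES
    if has_failure_signal(text, COUNTED_ONLY) != expected
]
missed_by_broad = [
    text for text, expected in CASES
    if has_failure_signal(text, ANY_TOKEN) != expected
]

if __name__ == "__main__":
    print("narrow pattern misclassifies:", missed_by_narrow)
    print("broad pattern misclassifies:", missed_by_broad)
```

In this sketch the counted-only pattern lets the raw `FAILED ...` line and the generic `tests failed` text through, while the broad pattern classifies every case as expected. Keeping both summary-style and raw-line cases in one table is one way to pair allowlist changes with rejection tests.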


Reviewed by ouroboros-agent[bot] via Codex deep analysis

@shaun0927 shaun0927 merged commit a695192 into Q00:main May 14, 2026
8 checks passed
shaun0927 added a commit that referenced this pull request May 14, 2026
Restore the negative signal lost in #1018 while preserving explicit zero-failure summary handling.

Constraint: #1018 intentionally accepts zero-failure summaries, but the verifier must still reject raw pytest FAILED lines and generic tests-failed output before #978 P5 readiness.

Rejected: Reverting #1018 | zero-failure summaries are valid success evidence and only the standalone failed token regression needed correction.

Confidence: high

Scope-risk: narrow

Directive: Keep zero-failure allowlisting paired with explicit failed-token rejection tests whenever changing verifier output parsing.

Tested: uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py -k 'zero_failure_summaries or fat_harness or observe_only or typed_evidence'; uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py; uv run ruff check src/ouroboros/orchestrator/parallel_executor.py tests/unit/orchestrator/test_parallel_executor.py; uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py

Not-tested: live runtime transcript containing provider-specific pytest formatting