fix(harness): require current-run proof for atomic file evidence#1010
Merged
Conversation
Verifier PASS must prove that the current run touched claimed files, not merely that a matching file exists in the active workspace. Tighten files_touched support to require non-final runtime transcript evidence and pin tests for stale/read-only file references plus failed targeted test output.\n\nConstraint: Q00#920 now treats typed evidence plus verifier PASS as the atomic AC acceptance invariant.\nRejected: Accepting existing workspace files as proof | stale files can make fabricated files_touched claims pass.\nConfidence: high\nScope-risk: narrow\nDirective: Keep files_touched proof tied to mutation-shaped runtime evidence; do not fall back to broad path mentions or final-message-only claims.\nTested: PYTHONPATH=src pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py\nTested: uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py\nTested: uv run ruff check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py\nTested: uv run ruff format --check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py\nNot-tested: Live adapter transcript variants outside unit-covered AgentMessage shapes.
The files_touched verifier path already records unsupported evidence and continues, so the immediately repeated files_touched branch is unreachable. Removing it keeps the Q00#1010 hardening focused without changing behavior.\n\nConstraint: Q00#1010 must remain a narrow follow-up to Q00#1006 and preserve the Q00#978 P5 boundary.\nRejected: Leave duplicate branch in place | it creates reviewer noise in a safety-sensitive verifier path.\nConfidence: high\nScope-risk: narrow\nDirective: Keep future files_touched changes tied to explicit runtime-evidence proof and regression tests.\nTested: uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py\nTested: uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py\nTested: uv run ruff check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py\nTested: uv run ruff format --check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py\nNot-tested: Full repository test suite.
Collaborator
Author
|
Merge-readiness review for #1010:
Proceeding with merge as the bounded #1006 hardening follow-up. #978 P5 remains gated by the #961 release-cycle observation rule. |
This was referenced May 14, 2026
shaun0927
added a commit
that referenced
this pull request
May 14, 2026
Fat-harness verifier output can include explicit zero-failure phrases such as 0 failed, 0 errors, or no errors. Treat those as successful summaries while still rejecting non-zero failures and errors so atomic acceptance avoids avoidable false negatives. Constraint: #1010 closed current-run file proof hardening, but verifier test-output parsing still needed the remaining zero-failure false-negative fix before the observation window was clean. Rejected: Blanket allow any output containing passed | would re-accept 1 failed, 3 passed mixed results. Confidence: high Scope-risk: narrow Directive: Keep verifier success parsing conservative for non-zero failures while allowing explicit zero-failure summaries. Tested: uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py -k 'zero_failure_summaries or fat_harness or observe_only or typed_evidence'; uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py; uv run ruff check src/ouroboros/orchestrator/parallel_executor.py tests/unit/orchestrator/test_parallel_executor.py; uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py Not-tested: live ooo run execution. Co-authored-by: OmX <omx@oh-my-codex.dev>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
files_touchedclaims require current-run runtime evidence, not merely an existing file under the active workspace.Why this follow-up exists
While executing the #961/#920 recommended action review after #1006, I found that the merged verifier could still accept a stale workspace file as
files_touchedproof. That weakens the intended "typed evidence + verifier PASS" invariant because a prior-run file could support a current-run claim.This PR is a narrow hardening follow-up. It does not change the #961 sequencing state, does not remove legacy self-report fallback, and does not start #978 P5.
Validation
PYTHONPATH=src pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py→ 237 passed, 1 skippeduv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.pyuv run ruff check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.pyuv run ruff format --check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.pyRefs #920
Refs #961
Follow-up to #1006