fix(harness): require current-run proof for atomic file evidence #1010

Merged: shaun0927 merged 2 commits into Q00:main from shaun0927:fix/atomic-verifier-file-proof on May 14, 2026

@shaun0927
Collaborator

Summary

  • Tightens the atomic verifier boundary introduced by the newly merged #1006 (Require verifier PASS for fat-harness AC acceptance) so that files_touched claims require current-run runtime evidence, not merely an existing file under the active workspace.
  • Keeps active-workspace bounds, but only accepts basename-only file proof when the transcript plausibly reports a write/update/create/generate action.
  • Pins regressions for stale preexisting files, read-only file references, and targeted failed test output.
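
The basename-plus-mutation rule described above can be pictured with a small sketch. This is not the code in parallel_executor.py: `transcript_supports_touch` and the verb list are hypothetical names chosen for illustration, and the active-workspace bounds check is left out.

```python
import re
from pathlib import PurePosixPath

# Hypothetical verb list; the PR text only names write/update/create/generate.
_MUTATION_VERBS = re.compile(
    r"\b(wrote|written|write|updated|update|created|create|generated|generate)\b",
    re.IGNORECASE,
)


def transcript_supports_touch(claimed_path: str, runtime_messages: list[str]) -> bool:
    """Return True if some transcript line names the file's basename and a mutation verb."""
    basename = PurePosixPath(claimed_path).name
    if not basename:
        return False
    # Match the basename as a whole token so "app.py" does not match "myapp.py".
    name_pattern = re.compile(rf"(?<![\w.-]){re.escape(basename)}(?![\w-])")
    for message in runtime_messages:
        for line in message.splitlines():
            if name_pattern.search(line) and _MUTATION_VERBS.search(line):
                return True
    return False
```

Under this sketch a read-only mention such as "Read src/app.py" and an empty transcript are both rejected, which is the shape of the stale-file and read-only regressions listed in the summary. A line-level verb match is deliberately crude (it would accept "failed to write app.py"), so it should be read as an illustration of the rule, not a replacement for the verifier.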

Why this follow-up exists

While executing the #961/#920 recommended action review after #1006, I found that the merged verifier could still accept a stale workspace file as files_touched proof. That weakens the intended "typed evidence + verifier PASS" invariant because a prior-run file could support a current-run claim.

This PR is a narrow hardening follow-up. It does not change the #961 sequencing state, does not remove legacy self-report fallback, and does not start #978 P5.

Validation

  • PYTHONPATH=src pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py → 237 passed, 1 skipped
  • uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py
  • uv run ruff check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py
  • uv run ruff format --check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py

Refs #920
Refs #961
Follow-up to #1006

Verifier PASS must prove that the current run touched claimed files, not merely that a matching file exists in the active workspace. Tighten files_touched support to require non-final runtime transcript evidence and pin tests for stale/read-only file references plus failed targeted test output.

Constraint: Q00#920 now treats typed evidence plus verifier PASS as the atomic AC acceptance invariant.
Rejected: Accepting existing workspace files as proof | stale files can make fabricated files_touched claims pass.
Confidence: high
Scope-risk: narrow
Directive: Keep files_touched proof tied to mutation-shaped runtime evidence; do not fall back to broad path mentions or final-message-only claims.
Tested: PYTHONPATH=src pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py
Tested: uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py
Tested: uv run ruff check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py
Tested: uv run ruff format --check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py
Not-tested: Live adapter transcript variants outside unit-covered AgentMessage shapes.

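
The "non-final runtime transcript evidence" requirement in this commit message can be sketched as a filter that drops the agent's closing message before any proof is searched for. `TranscriptMessage` and its `is_final` flag are hypothetical stand-ins; the real AgentMessage fields are not shown in this conversation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptMessage:
    """Hypothetical stand-in for one runtime transcript entry."""

    content: str
    is_final: bool = False


def runtime_evidence(messages: list[TranscriptMessage]) -> list[str]:
    """Keep only non-final message text so a closing summary cannot prove a claim alone."""
    return [message.content for message in messages if not message.is_final]
```

With this shape, a run whose only mention of a file is in the final self-report yields no evidence at all, which is the "final-message-only claims" case the Directive line rules out.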
@shaun0927
Collaborator Author

@ouroboros-agent please review this narrow follow-up to merged #1006. It closes the stale-workspace-file proof gap while preserving the #961/#920 boundary: typed evidence + verifier PASS, no #978 P5 legacy removal.

The files_touched verifier path already records unsupported evidence and continues, so the immediately repeated files_touched branch is unreachable. Removing it keeps the Q00#1010 hardening focused without changing behavior.

Constraint: Q00#1010 must remain a narrow follow-up to Q00#1006 and preserve the Q00#978 P5 boundary.
Rejected: Leave duplicate branch in place | it creates reviewer noise in a safety-sensitive verifier path.
Confidence: high
Scope-risk: narrow
Directive: Keep future files_touched changes tied to explicit runtime-evidence proof and regression tests.
Tested: uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py
Tested: uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py
Tested: uv run ruff check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py
Tested: uv run ruff format --check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py
Not-tested: Full repository test suite.

@shaun0927
Collaborator Author

Merge-readiness review for #1010:

  • Direction: aligned with #961 (Meta SSOT: AgentOS roadmap sequencing (#920–#960)), #920 (Agent OS roadmap: make ooo run trustworthy with a fat harness execution path), and #1006 (Require verifier PASS for fat-harness AC acceptance). This remains a narrow current-run proof hardening for atomic files_touched; it does not remove the legacy self-report fallback and does not start #978 (Design spine: AgentOS evidence-gated delivery via TraceGuard) P5.
  • Cleanup: added commit b5ce0e706 to remove an unreachable duplicate files_touched branch in the verifier path.
  • AI-slop cleanup gate: passed. Scope stayed limited to parallel_executor.py / test_parallel_executor.py; no new fallback or abstraction introduced.
  • Code review: APPROVE / architectural status CLEAR. No CRITICAL/HIGH/MEDIUM/LOW findings remain.
  • Local verification after cleanup:
    • uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py → 237 passed, 1 skipped
    • uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py → success
    • uv run ruff check ... → passed
    • uv run ruff format --check ... → passed
  • GitHub checks: Ruff, MyPy, Python 3.12/3.13/3.14, Bridge TypeScript, enforce-envelope, enforce-boundary all pass.
  • Merge state: CLEAN / MERGEABLE.

Proceeding with merge as the bounded #1006 hardening follow-up. #978 P5 remains gated by the #961 release-cycle observation rule.

shaun0927 merged commit 1b0af4d into Q00:main on May 14, 2026
8 checks passed
shaun0927 added a commit that referenced this pull request May 14, 2026
Fat-harness verifier output can include explicit zero-failure phrases such as 0 failed, 0 errors, or no errors. Treat those as successful summaries while still rejecting non-zero failures and errors so atomic acceptance avoids avoidable false negatives.

Constraint: #1010 closed current-run file proof hardening, but verifier test-output parsing still needed the remaining zero-failure false-negative fix before the observation window was clean.
Rejected: Blanket allow any output containing passed | would re-accept 1 failed, 3 passed mixed results.
Confidence: high
Scope-risk: narrow
Directive: Keep verifier success parsing conservative for non-zero failures while allowing explicit zero-failure summaries.
Tested: uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py -k 'zero_failure_summaries or fat_harness or observe_only or typed_evidence'; uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py; uv run ruff check src/ouroboros/orchestrator/parallel_executor.py tests/unit/orchestrator/test_parallel_executor.py; uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py
Not-tested: live ooo run execution.
Co-authored-by: OmX <omx@oh-my-codex.dev>
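
The zero-failure parsing described in this referenced commit can be illustrated with a conservative sketch. The function name and regular expressions below are assumptions for illustration, not the parser in parallel_executor.py: any non-zero failure or error count rejects the output, while explicit zero counts, "no errors", or a plain passed count accept it.

```python
import re

# Hypothetical patterns; the commit text only names "0 failed", "0 errors", "no errors".
_FAILURE_COUNT = re.compile(r"\b(\d+)\s+(failed|failures?|errors?)\b", re.IGNORECASE)
_PASSED = re.compile(r"\b\d+\s+passed\b", re.IGNORECASE)
_NO_ERRORS = re.compile(r"\bno\s+(errors?|failures?)\b", re.IGNORECASE)


def looks_successful(output: str) -> bool:
    """Accept zero-failure summaries, reject any non-zero failure or error count."""
    counts = [int(match.group(1)) for match in _FAILURE_COUNT.finditer(output)]
    if any(count > 0 for count in counts):
        return False  # e.g. "1 failed, 3 passed" must stay a failure
    return bool(counts) or bool(_PASSED.search(output)) or bool(_NO_ERRORS.search(output))
```

Checking failure counts before looking for "passed" is what keeps mixed results such as "1 failed, 3 passed" rejected, which is the case the Rejected line calls out.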