Require verifier PASS for fat-harness AC acceptance#1006
Conversation
|
@ouroboros-agent please review this clean replacement for #1004. It preserves the final fixed diff in a single commit and CI is green. |
Completes the remaining Q00#920 atomic acceptance invariant by requiring schema-valid typed evidence plus a separate verifier PASS before fat-harness AC success. The verifier rejects final-message-only self-reporting, keeps observe-only mode isolated, classifies operational verifier failures, and bounds fallback proof to runtime transcript or active-workspace evidence. Constraint: Q00#961 keeps Q00#978 P5 legacy self-report removal time-gated behind release-cycle observation; this PR does not remove that fallback. Rejected: Accept schema-valid final-message JSON as verifier proof | that preserves the self-report gap Q00#920 is closing. Rejected: Merge the noisy iterative Q00#1004 review branch | a clean single-commit PR is easier for ouroboros-agent and warden to review. Confidence: high Scope-risk: narrow Directive: Keep future verifier enhancements behind this PASS boundary; do not weaken tests_passed into a global successful-test boolean or allow path proof outside the active workspace. Tested: PYTHONPATH=src pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py Tested: uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py Tested: uv run ruff check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py Tested: uv run ruff format --check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py Not-tested: Real adapter transcript variants beyond the AgentMessage shapes covered in unit tests. Co-authored-by: OmX <omx@oh-my-codex.dev>
6528682 to
ff445b2
Compare
|
Roadmap gate check: no #961 gate block from warden. Evidence checked:
This is not an approval and not a merge recommendation; it only records that the PR is not blocked by the roadmap sequencing gate. Posted by agentos-roadmap-warden — bot. Reply with |
Verifier PASS must prove the current run, not merely observe that a claimed file exists in the workspace. Tighten file evidence support to require transcript backing and keep test claims tied to successful adjacent test output. Constraint: Q00#920 atomic acceptance now depends on typed evidence plus verifier PASS in the live fat-harness path. Rejected: Accepting existing workspace files as proof | stale files can make fabricated files_touched claims pass. Confidence: high Scope-risk: narrow Directive: Do not weaken atomic verifier support checks without preserving final-message-only and stale-file rejection coverage. Tested: uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py; uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py; uv run ruff check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py; uv run ruff format --check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py Not-tested: Full repository test suite.
|
Merge-readiness check for #920 PR-6 / #1006:
Proceeding with merge as the narrow #920 completion slice. #978 P5 remains blocked on the release-cycle observation window. |
|
Follow-up: immediate merge is blocked by repository branch policy, and repository auto-merge is disabled. I did not use admin bypass. Current state:
Boundary remains unchanged: do not start #978 P5 or remove legacy self-report until #1006 is merged and the release-cycle observation window has evidence. |
|
Follow-up review fix pushed in e3fbfaf.\n\nWhat changed:\n- Tightened verifier support so an existing workspace file is not enough; the current runtime transcript must also back the file claim. Basename fallback now only accepts plausible file-touch messages (Edit/Write/NotebookEdit or modification verbs), preventing stale files from satisfying atomic evidence.\n- Tightened handling so failed/unsupported targeted test claims do not fall through to generic substring support.\n- Added regressions for stale preexisting file claims and targeted failed test commands; updated positive fixtures to include explicit successful test output.\n\nValidation:\n- ........................................................................ [ 30%] |
|
Correction to the previous comment: shell quoting stripped the inline code spans. The intended summary is: Follow-up review fix pushed in What changed:
Validation:
Boundary remains unchanged: this is still the narrow #920 verifier-PASS completion slice; no legacy self-report removal / no #978 P5. |
|
Follow-up review update for #920 PR-6 / #1006: During the local merge-readiness pass, I found and fixed one verifier false-positive/false-negative edge at new head
Fresh local validation on
Merge boundary remains the same: this still only completes #920 typed evidence + verifier PASS and does not start #978 P5 or remove the legacy fallback. Wait for the refreshed GitHub Python matrix and review signal before merge. |
|
@ouroboros-agent Please re-review current head |
Follow-up validation after local reviewI found and fixed one acceptance-proof gap while reviewing the #920 PR-6 invariant:
Pushed commit: Validation:
Roadmap boundary remains unchanged: this still completes the narrow #920 atomic verifier-PASS acceptance invariant, keeps legacy self-report fallback, and does not start #978 P5. |
Update after merge#1006 merged at head The stale-file proof gap described below is therefore tracked and fixed in #1010, not in merged #1006. Follow-up hardening after #961/#920 action reviewI re-ran the recommended #1006 verification and found one important acceptance-boundary gap before merge-readiness:
Follow-up PR #1010:
Validation for #1010 is green locally and in GitHub CI. Remaining merge gate for #1010: submitted code review. |
|
Updated #1006 with follow-up hardening on head |
|
Tightened the #1006 review-risk area on head e3fbfaf: file evidence now requires runtime transcript support and can no longer pass from mere workspace existence. What changed:
Validation on head e3fbfaf:
Ready for re-review. The remaining milestone boundary is unchanged: this closes the #920 atomic typed-evidence + verifier-PASS acceptance invariant, but does not remove legacy self-report fallback or start #978 P5. |
|
@ouroboros-agent please review the updated head |
Fat-harness verifier output can include explicit zero-failure phrases such as 0 failed or no errors. Treat those as successful summaries while still rejecting non-zero failures and errors so atomic acceptance avoids avoidable false negatives. Constraint: Q00#1006 default fat-harness acceptance depends on transcript-backed test proof without broadening failed mixed-output acceptance. Rejected: Blanket allow any output containing passed | would re-accept 1 failed, 3 passed mixed results. Confidence: high Scope-risk: narrow Directive: Keep verifier success parsing conservative for non-zero failures while allowing explicit zero-failure summaries. Tested: uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py -k 'zero_failure_summaries or fat_harness or observe_only or typed_evidence'; uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py; uv run ruff check src/ouroboros/orchestrator/parallel_executor.py tests/unit/orchestrator/test_parallel_executor.py; uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py Not-tested: live ooo run execution. Co-authored-by: OmX <omx@oh-my-codex.dev>
Fat-harness verifier output can include explicit zero-failure phrases such as 0 failed, 0 errors, or no errors. Treat those as successful summaries while still rejecting non-zero failures and errors so atomic acceptance avoids avoidable false negatives. Constraint: Q00#1006 default fat-harness acceptance depends on transcript-backed test proof without broadening failed mixed-output acceptance. Rejected: Blanket allow any output containing passed | would re-accept 1 failed, 3 passed mixed results. Confidence: high Scope-risk: narrow Directive: Keep verifier success parsing conservative for non-zero failures while allowing explicit zero-failure summaries. Tested: uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py -k 'zero_failure_summaries or fat_harness or observe_only or typed_evidence'; uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py; uv run ruff check src/ouroboros/orchestrator/parallel_executor.py tests/unit/orchestrator/test_parallel_executor.py; uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py Not-tested: live ooo run execution. Co-authored-by: OmX <omx@oh-my-codex.dev>
Verifier PASS must be backed by evidence from the current run, not by read-only mentions, stale files, or zero-passed test summaries. Constraint: Q00#1006 merged the live fat-harness atomic verifier gate; this follow-up keeps that gate from accepting weak structural evidence. Rejected: Generic substring fallback for files_touched and tests_passed | read-only file mentions and zero-pass test output can otherwise satisfy typed evidence. Confidence: high Scope-risk: narrow Directive: Keep files_touched tied to plausible write/generate transcript evidence and tests_passed tied to nonzero successful test output. Tested: uv run pytest tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py -q; PYTHONPATH=src pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py; uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py; uv run ruff check src/ouroboros/orchestrator/parallel_executor.py tests/unit/orchestrator/test_parallel_executor.py; uv run ruff format --check src/ouroboros/orchestrator/parallel_executor.py tests/unit/orchestrator/test_parallel_executor.py Not-tested: Full repository test suite.
|
Updated the source branch with the remaining verifier false-negative fix. New branch head: What changed:
Local validation on the updated branch:
Note: GitHub GraphQL currently shows @ouroboros-agent please review the updated source branch head once the PR metadata/checks synchronize. |
The files_touched verifier path already records unsupported evidence and continues, so the immediately repeated files_touched branch is unreachable. Removing it keeps the Q00#1010 hardening focused without changing behavior.\n\nConstraint: Q00#1010 must remain a narrow follow-up to Q00#1006 and preserve the Q00#978 P5 boundary.\nRejected: Leave duplicate branch in place | it creates reviewer noise in a safety-sensitive verifier path.\nConfidence: high\nScope-risk: narrow\nDirective: Keep future files_touched changes tied to explicit runtime-evidence proof and regression tests.\nTested: uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py\nTested: uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py\nTested: uv run ruff check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py\nTested: uv run ruff format --check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py\nNot-tested: Full repository test suite.
The files_touched verifier path already records unsupported evidence and continues, so the immediately repeated files_touched branch is unreachable. Removing it keeps the #1010 hardening focused without changing behavior.\n\nConstraint: #1010 must remain a narrow follow-up to #1006 and preserve the #978 P5 boundary.\nRejected: Leave duplicate branch in place | it creates reviewer noise in a safety-sensitive verifier path.\nConfidence: high\nScope-risk: narrow\nDirective: Keep future files_touched changes tied to explicit runtime-evidence proof and regression tests.\nTested: uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py\nTested: uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py\nTested: uv run ruff check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py\nTested: uv run ruff format --check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py\nNot-tested: Full repository test suite.
Keep the Q00#946/Q00#990 MCP projection surface evidence-shaped by making caller-provided seed IDs explicit and rejecting session-only queries that would splice metadata-less multi-execution payloads. Constraint: Q00#961 keeps AgentOS substrate changes narrow and folded into canonical Q00#946 instead of opening new roadmap surfaces. Rejected: Folding this into Q00#1006 verifier PASS work | projection read-model hardening and atomic AC acceptance are separate review surfaces. Confidence: high Scope-risk: narrow Directive: Keep projection query handlers read-only and fail-closed on ambiguous run boundaries. Tested: uv run pytest tests/unit/mcp/tools/test_definitions.py::TestProjectionQueryHandler -q; uv run pytest tests/unit/persistence/test_event_store.py::TestSessionRelatedEvents -q; uv run mypy src/ouroboros/mcp/tools/projection_handlers.py tests/unit/mcp/tools/test_definitions.py; uv run ruff check src/ouroboros/mcp/tools/projection_handlers.py tests/unit/mcp/tools/test_definitions.py Not-tested: Full repository test suite.
Keep the #946/#990 MCP projection surface evidence-shaped by making caller-provided seed IDs explicit and rejecting session-only queries that would splice metadata-less multi-execution payloads. Constraint: #961 keeps AgentOS substrate changes narrow and folded into canonical #946 instead of opening new roadmap surfaces. Rejected: Folding this into #1006 verifier PASS work | projection read-model hardening and atomic AC acceptance are separate review surfaces. Confidence: high Scope-risk: narrow Directive: Keep projection query handlers read-only and fail-closed on ambiguous run boundaries. Tested: uv run pytest tests/unit/mcp/tools/test_definitions.py::TestProjectionQueryHandler -q; uv run pytest tests/unit/persistence/test_event_store.py::TestSessionRelatedEvents -q; uv run mypy src/ouroboros/mcp/tools/projection_handlers.py tests/unit/mcp/tools/test_definitions.py; uv run ruff check src/ouroboros/mcp/tools/projection_handlers.py tests/unit/mcp/tools/test_definitions.py Not-tested: Full repository test suite.
Summary
Clean replacement for #1004. Completes the remaining #920 atomic acceptance invariant by requiring a separate verifier PASS after profile typed-evidence validation in the live
parallel_executoratomic AC boundary.Scope
ACExecutionResultandexecution.ac.typed_evidence.observed.VerifierVerdict.passed == true.TimeoutError,OSError, subprocess errors) into typed STALL verifier verdicts while still surfacing verifier programming bugs.task_cwdor adapter working directory), rejecting absolute paths and parent traversal.tests_passedper claim against the adjacent transcript chunk of a backed successful test command; failed mixed output such as1 failed, 3 passedremains rejected.Milestone boundaries
ooo runtrustworthy with a fat harness execution path #920 completion slice for the unchecked success criterion: typed evidence and verifier PASS before atomic AC acceptance.Validation
PYTHONPATH=src pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py→ 233 passed, 1 skippeduv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.pyuv run ruff check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.pyuv run ruff format --check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py