Require verifier PASS for fat-harness AC acceptance by shaun0927 · Pull Request #1006 · Q00/ouroboros

shaun0927 · 2026-05-14T02:17:49Z

Summary

Clean replacement for #1004. Completes the remaining #920 atomic acceptance invariant by requiring a separate verifier PASS after profile typed-evidence validation in the live parallel_executor atomic AC boundary.

Scope

Adds atomic verifier verdict telemetry to ACExecutionResult and execution.ac.typed_evidence.observed.
Fat-harness acceptance now requires:
1. runtime success,
2. profile typed evidence present,
3. profile evidence validation passing,
4. a separate VerifierVerdict.passed == true.
Keeps observe-only mode isolated: injected verifiers do not run unless fat-harness enforcement is active.
Converts operational verifier failures (TimeoutError, OSError, subprocess errors) into typed STALL verifier verdicts while still surfacing verifier programming bugs.
Default verifier rejects final-message-only self-reporting and supports evidence only through non-final runtime transcript evidence or active-workspace file evidence.
Bounds file proof to the active workspace (task_cwd or adapter working directory), rejecting absolute paths and parent traversal.
Verifies tests_passed per claim against the adjacent transcript chunk of a backed successful test command; failed mixed output such as 1 failed, 3 passed remains rejected.
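The four-part acceptance rule listed above can be sketched as a single predicate. This is a minimal illustration, not the code in parallel_executor.py: `AtomicOutcome`, `accept_atomic_ac`, and the field names are hypothetical, and the only detail taken from the PR is that a verdict exposes `passed`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VerifierVerdict:
    # Hypothetical shape: the PR only confirms that a `passed` flag exists.
    passed: bool
    reason: str = ""


@dataclass(frozen=True)
class AtomicOutcome:
    # Illustrative container, not the real ACExecutionResult.
    runtime_success: bool
    typed_evidence: Optional[dict]
    evidence_valid: bool
    verdict: Optional[VerifierVerdict]


def accept_atomic_ac(outcome: AtomicOutcome) -> bool:
    """Return True only when all four fat-harness conditions hold."""
    if not outcome.runtime_success:       # 1. runtime success
        return False
    if outcome.typed_evidence is None:    # 2. typed evidence present
        return False
    if not outcome.evidence_valid:        # 3. evidence validation passing
        return False
    # 4. a separate verifier verdict must exist and must have passed
    return outcome.verdict is not None and outcome.verdict.passed
```

Ordering the checks this way means a missing verdict can never be read as success, which is the point of keeping the verifier separate from evidence validation.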

Milestone boundaries

This is a narrow completion slice of #920 (Agent OS roadmap: make ooo run trustworthy with a fat harness execution path) for the unchecked success criterion: typed evidence and verifier PASS before atomic AC acceptance.
It does not remove the legacy self-report fallback.
It does not start #978 P5 (Design spine: AgentOS evidence-gated delivery via TraceGuard); legacy removal remains blocked on the release-cycle observation window set by #961 (Meta SSOT: AgentOS roadmap sequencing (#920–#960)).
#1004 (Require verifier PASS for fat-harness AC acceptance) is superseded only to give bots a clean, single-commit review surface after the iterative review loop became stale.

Validation

PYTHONPATH=src pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py → 233 passed, 1 skipped
uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py
uv run ruff check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py
uv run ruff format --check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py

shaun0927 · 2026-05-14T02:21:33Z

@ouroboros-agent please review this clean replacement for #1004. It preserves the final fixed diff in a single commit and CI is green.

Completes the remaining Q00#920 atomic acceptance invariant by requiring schema-valid typed evidence plus a separate verifier PASS before fat-harness AC success. The verifier rejects final-message-only self-reporting, keeps observe-only mode isolated, classifies operational verifier failures, and bounds fallback proof to runtime transcript or active-workspace evidence.

Constraint: Q00#961 keeps Q00#978 P5 legacy self-report removal time-gated behind release-cycle observation; this PR does not remove that fallback.
Rejected: Accept schema-valid final-message JSON as verifier proof | that preserves the self-report gap Q00#920 is closing.
Rejected: Merge the noisy iterative Q00#1004 review branch | a clean single-commit PR is easier for ouroboros-agent and warden to review.
Confidence: high
Scope-risk: narrow
Directive: Keep future verifier enhancements behind this PASS boundary; do not weaken tests_passed into a global successful-test boolean or allow path proof outside the active workspace.
Tested: PYTHONPATH=src pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py
Tested: uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py
Tested: uv run ruff check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py
Tested: uv run ruff format --check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py
Not-tested: Real adapter transcript variants beyond the AgentMessage shapes covered in unit tests.
Co-authored-by: OmX <omx@oh-my-codex.dev>
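The classification of operational verifier failures mentioned in this commit message might look roughly like the sketch below. It assumes a hypothetical `Verdict` type with a `status` string and a hypothetical `run_verifier` wrapper; the real verifier API in verifier.py may differ. Only the exception families (TimeoutError, OSError, subprocess errors) and the STALL label come from the PR description.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Verdict:
    # Hypothetical verdict type; the status labels are assumptions.
    status: str  # "PASS", "FAIL", or "STALL"
    detail: str = ""

    @property
    def passed(self) -> bool:
        return self.status == "PASS"


# Failures of the environment rather than of the verifier's own logic.
OPERATIONAL_ERRORS = (TimeoutError, OSError, subprocess.SubprocessError)


def run_verifier(verify: Callable[[], Verdict]) -> Verdict:
    """Run a verifier and turn operational failures into a typed STALL verdict."""
    try:
        return verify()
    except OPERATIONAL_ERRORS as exc:
        return Verdict("STALL", f"{type(exc).__name__}: {exc}")
    # Anything else (TypeError, KeyError, ...) is treated as a programming
    # bug and deliberately propagates to the caller.
```

A STALL verdict is not a PASS, so acceptance still fails closed, but a crash caused by a bug stays visible instead of being hidden behind a verdict.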

Q00 · 2026-05-14T03:46:28Z

Roadmap gate check: no #961 gate block from warden.

Evidence checked:

#1006 (Require verifier PASS for fat-harness AC acceptance) is the clean replacement for the closed, superseded #1004 and maps to the recorded C.4 #920 PR-6 atomic verifier-PASS completion slice.
agentos-substrate-wiring is closed and #961 (Meta SSOT: AgentOS roadmap sequencing (#920–#960)) carries baseline-metrics-captured, so Tier 2 wiring-gate blocking no longer applies.
The PR does not close #920 (Agent OS roadmap: make ooo run trustworthy with a fat harness execution path) automatically, does not start #978 P5 (Design spine: AgentOS evidence-gated delivery via TraceGuard), and explicitly preserves the legacy self-report fallback and the release-cycle observation boundary.

This is not an approval and not a merge recommendation; it only records that the PR is not blocked by the roadmap sequencing gate.

Posted by agentos-roadmap-warden — bot. Reply with /warden ignore to suppress further comments on this thread.

Verifier PASS must prove the current run, not merely observe that a claimed file exists in the workspace. Tighten file evidence support to require transcript backing and keep test claims tied to successful adjacent test output.

Constraint: Q00#920 atomic acceptance now depends on typed evidence plus verifier PASS in the live fat-harness path.
Rejected: Accepting existing workspace files as proof | stale files can make fabricated files_touched claims pass.
Confidence: high
Scope-risk: narrow
Directive: Do not weaken atomic verifier support checks without preserving final-message-only and stale-file rejection coverage.
Tested: uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py; uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py; uv run ruff check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py; uv run ruff format --check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py
Not-tested: Full repository test suite.

shaun0927 · 2026-05-14T04:21:08Z

Merge-readiness check for #920 PR-6 / #1006:

Roadmap gate: warden already recorded no #961 (Meta SSOT: AgentOS roadmap sequencing (#920–#960)) sequencing block.
Scope boundary preserved: completes typed evidence + separate verifier PASS at the fat-harness atomic AC boundary; does not remove legacy self-report fallback and does not start #978 P5 (Design spine: AgentOS evidence-gated delivery via TraceGuard).
Local clean-worktree validation on ff445b2e4:
- PYTHONPATH=src pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py → 233 passed, 1 skipped
- uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py → success
- uv run ruff check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py → passed
GitHub checks are green across Ruff, MyPy, Python 3.12/3.13/3.14, Bridge TypeScript, enforce-envelope, and enforce-boundary.

Proceeding with merge as the narrow #920 completion slice. #978 P5 remains blocked on the release-cycle observation window.

shaun0927 · 2026-05-14T04:21:52Z

Follow-up: immediate merge is blocked by repository branch policy, and repository auto-merge is disabled.

I did not use admin bypass. Current state:

PR is mergeable and all CI checks are green.
Clean-worktree local verification passed:
- PYTHONPATH=src pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py → 233 passed, 1 skipped
- uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py → success
- uv run ruff check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py → passed
Warden already recorded no #961 (Meta SSOT: AgentOS roadmap sequencing (#920–#960)) sequencing block.
Remaining external action: obtain the required PR review / branch-policy satisfaction, then merge normally.

Boundary remains unchanged: do not start #978 P5 or remove legacy self-report until #1006 is merged and the release-cycle observation window has evidence.

shaun0927 · 2026-05-14T04:23:35Z

Follow-up review fix pushed in e3fbfaf.

What changed:
- Tightened verifier support so an existing workspace file is not enough; the current runtime transcript must also back the file claim. Basename fallback now only accepts plausible file-touch messages (Edit/Write/NotebookEdit or modification verbs), preventing stale files from satisfying atomic evidence.
- Tightened handling so failed/unsupported targeted test claims do not fall through to generic substring support.
- Added regressions for stale preexisting file claims and targeted failed test commands; updated positive fixtures to include explicit successful test output.

Validation:
- 235 passed, 1 skipped in 42.56s → 235 passed, 1 skipped
- 107 passed in 11.04s → 107 passed
- Success: no issues found in 3 source files → passed
- All checks passed! → passed
- 4 files already formatted → passed

Boundary remains unchanged: this is still the narrow #920 verifier-PASS completion slice; no legacy self-report removal / no #978 P5.

shaun0927 · 2026-05-14T04:23:55Z

Correction to the previous comment: shell quoting stripped the inline code spans. The intended summary is:

Follow-up review fix pushed in e3fbfafbf.

What changed:

Tightened files_touched verifier support so an existing workspace file is not enough; the current runtime transcript must also back the file claim. Basename fallback now only accepts plausible file-touch messages (Edit / Write / NotebookEdit or modification verbs), preventing stale files from satisfying atomic evidence.
Tightened tests_passed handling so failed/unsupported targeted test claims do not fall through to generic substring support.
Added regressions for stale preexisting file claims and targeted failed test commands; updated positive fixtures to include explicit successful test output.

Validation:

PYTHONPATH=src pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py → 235 passed, 1 skipped
uv run pytest tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py -q → 107 passed
uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py → passed
uv run ruff check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py → passed
uv run ruff format --check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py → passed

Boundary remains unchanged: this is still the narrow #920 verifier-PASS completion slice; no legacy self-report removal / no #978 P5.

shaun0927 · 2026-05-14T04:23:58Z

Follow-up review update for #920 PR-6 / #1006:

During the local merge-readiness pass, I found and fixed one verifier false-positive/false-negative edge at new head e3fbfaf:

File evidence now requires current-run transcript support; a stale pre-existing workspace file no longer proves files_touched.
Successful test summaries with 0 failed / 0 errors remain accepted while nonzero failures stay rejected.
Added regression coverage for stale file evidence, targeted failed test output, and the 0 failed, 0 errors, 1 passed success shape.

Fresh local validation on e3fbfaf:

PYTHONPATH=src pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py → 235 passed, 1 skipped
uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py → success
uv run ruff check ... and uv run ruff format --check ... → passed

Merge boundary remains the same: this still only completes #920 typed evidence + verifier PASS and does not start #978 P5 or remove the legacy fallback. Wait for the refreshed GitHub Python matrix and review signal before merge.

shaun0927 · 2026-05-14T04:24:07Z

@ouroboros-agent Please re-review current head e3fbfafbf. The follow-up commit fixes two verifier false-pass gaps (stale files_touched workspace file and failed targeted tests_passed command) with regression coverage; validation is posted above.

shaun0927 · 2026-05-14T04:24:34Z

Follow-up validation after local review

I found and fixed one acceptance-proof gap while reviewing the #920 PR-6 invariant:

files_touched is no longer accepted merely because the claimed file exists under the workspace; the current runtime transcript must support the file-touch claim.
The verifier still rejects final-message-only evidence and now also rejects stale pre-existing workspace files without transcript support.
Test fixtures now include adjacent successful test output for accepted tests_passed claims and explicit rejection coverage for stale file claims / failed targeted tests.
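The per-claim tests_passed rule described in this thread can be illustrated as follows. This is a sketch under stated assumptions: the transcript is modelled as a plain list of non-final message strings, "adjacent" is read as "the chunk immediately after the command", and `claim_is_backed` is a hypothetical name rather than a function from verifier.py.

```python
import re

# A non-zero failure or error count anywhere in the output chunk.
_FAILURE = re.compile(r"\b[1-9]\d*\s+(failed|errors?)\b", re.IGNORECASE)
# At least one passing test reported in the output chunk.
_PASSED = re.compile(r"\b[1-9]\d*\s+passed\b", re.IGNORECASE)


def claim_is_backed(command: str, transcript: list[str]) -> bool:
    """Check one tests_passed claim against the chunk following its command."""
    # The last message has no following chunk, so it cannot back a claim.
    for index, message in enumerate(transcript[:-1]):
        if command not in message:
            continue
        output = transcript[index + 1]
        if _PASSED.search(output) and not _FAILURE.search(output):
            return True
    return False
```

Because each claim is matched to its own command, a green run of one test file cannot vouch for a different command whose output was `1 failed, 3 passed`.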

Pushed commit: e3fbfafbf (Require runtime support for atomic file evidence).

Validation:

Local: 235 passed, 1 skipped for tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py
Local MyPy: passed for verifier.py, parallel_executor.py, parallel_executor_models.py
Local Ruff check/format-check: passed for changed orchestrator files/tests
GitHub checks on e3fbfafbf: green across Ruff, MyPy, Python 3.12/3.13/3.14, Bridge TypeScript, enforce-envelope, and enforce-boundary

Roadmap boundary remains unchanged: this still completes the narrow #920 atomic verifier-PASS acceptance invariant, keeps legacy self-report fallback, and does not start #978 P5.

shaun0927 · 2026-05-14T04:25:28Z

Update after merge

#1006 merged at head e3fbfaf before the additional hardening could be applied to that PR. I split the current-run file-proof hardening into follow-up PR #1010 instead.

The stale-file proof gap described below is therefore tracked and fixed in #1010, not in merged #1006.

Follow-up hardening after #961/#920 action review

I re-ran the recommended #1006 verification and found one important acceptance-boundary gap before merge-readiness:

The atomic verifier could treat an existing in-workspace file as files_touched proof even if the current run transcript never showed that file being touched. That would preserve a stale-file self-report gap.
The valid typed-evidence fixture also needed explicit adjacent test success output to match the intended tests_passed verifier contract.

Follow-up PR #1010:

Requires files_touched support to be backed by non-final runtime transcript evidence, not workspace existence alone.
Keeps active-workspace bounds for file evidence and accepts basename-only support only when the transcript plausibly reports a write/update/create/generate action.
Keeps tests_passed tied to a backed test command plus adjacent successful test output.
Adds regression coverage for stale preexisting file rejection, read-only file-reference rejection, and targeted failed test rejection.
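A rough sketch of the files_touched support check described in the list above. The verb list, the token-boundary rule, and the function names are assumptions made for illustration; the actual heuristics in #1010 are not shown in this thread.

```python
import re
from pathlib import PurePosixPath

# Assumed set of modification verbs, loosely based on the ones named here.
_WRITE_VERBS = re.compile(
    r"\b(wrote|written|updated|created|generated|patched|modified|edited)\b",
    re.IGNORECASE,
)


def _mentions(token: str, message: str) -> bool:
    """True when `token` appears as a standalone path token in `message`."""
    pattern = r"(?<![\w./-])" + re.escape(token) + r"(?![\w/-])"
    return re.search(pattern, message) is not None


def transcript_backs_file_touch(claimed_path: str, runtime_messages: list[str]) -> bool:
    """Require a non-final runtime message that plausibly reports a write."""
    basename = PurePosixPath(claimed_path).name
    for message in runtime_messages:
        if not _WRITE_VERBS.search(message):
            continue  # read-only mentions and test output do not count
        if _mentions(claimed_path, message) or _mentions(basename, message):
            return True
    return False
```

With an empty transcript the function returns False, which is the stale-file case: the file may exist in the workspace, but nothing in the current run shows it being written.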

Validation for #1010 is green locally and in GitHub CI.

Remaining merge gate for #1010: submitted code review.

shaun0927 · 2026-05-14T04:26:14Z

Updated #1006 with follow-up hardening on head e3fbfaf:

- File evidence now requires runtime transcript backing; a stale workspace file is not enough to satisfy files_touched.
- Basename fallback is bounded so src/preexisting.py is not proven by unrelated tests/test_preexisting.py passed output.
- Test success parsing accepts common zero-failure summaries such as 0 failed, 0 errors, 1 passed while still rejecting non-zero failed/error output.

Fresh local validation on this head:
- PYTHONPATH=src pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py → 235 passed, 1 skipped
- uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py → success
- uv run ruff check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py → passed
- uv run ruff format --check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py → passed

GitHub checks are now green and merge state is CLEAN. Remaining gate before merge per our plan: obtain a submitted review/approval signal; #978 P5 remains blocked on release-cycle observation after this lands.

shaun0927 · 2026-05-14T04:26:29Z

Tightened the #1006 review-risk area on head e3fbfaf: file evidence now requires runtime transcript support and can no longer pass from mere workspace existence.

What changed:

files_touched support now prefers exact non-final runtime transcript backing.
Basename fallback is allowed only for existing in-workspace relative paths and transcript text that plausibly reports the file was updated/created/generated/written/patched.
Added regression coverage proving a stale preexisting workspace file does not satisfy files_touched without transcript support.
Kept the Bash-generated-file case passing when transcript output reports the generated file and the path exists in the active workspace.
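The "existing in-workspace relative paths" condition above could be checked along these lines. This is a minimal sketch: `resolve_in_workspace` is a hypothetical helper, and the real code may derive the workspace from task_cwd or the adapter working directory in ways not shown here.

```python
from pathlib import Path, PurePosixPath
from typing import Optional


def resolve_in_workspace(claimed: str, workspace: Path) -> Optional[Path]:
    """Return the resolved file path if it is a real file inside the workspace."""
    pure = PurePosixPath(claimed)
    # Reject absolute paths and any parent traversal outright.
    if pure.is_absolute() or ".." in pure.parts:
        return None
    root = workspace.resolve()
    candidate = root.joinpath(*pure.parts).resolve()
    # Resolving first also catches symlinks that point outside the workspace.
    if not candidate.is_relative_to(root):  # Path.is_relative_to: Python 3.9+
        return None
    return candidate if candidate.is_file() else None
```

Existence alone is only a precondition here; per the changes in this comment, the transcript must still report the write before the claim counts as supported.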

Validation on head e3fbfaf:

uv run pytest tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py -q → 235 passed, 1 skipped
uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py → pass
uv run ruff check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py → pass
uv run ruff format --check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py → pass
GitHub checks are green: Ruff, MyPy, Python 3.12/3.13/3.14, Bridge TypeScript, enforce-envelope, enforce-boundary.

Ready for re-review. The remaining milestone boundary is unchanged: this closes the #920 atomic typed-evidence + verifier-PASS acceptance invariant, but does not remove legacy self-report fallback or start #978 P5.

shaun0927 · 2026-05-14T04:26:32Z

@ouroboros-agent please review the updated head e3fbfaf. The PR is now green/CLEAN after the verifier proof hardening above.

Fat-harness verifier output can include explicit zero-failure phrases such as 0 failed or no errors. Treat those as successful summaries while still rejecting non-zero failures and errors so atomic acceptance avoids avoidable false negatives.

Constraint: Q00#1006 default fat-harness acceptance depends on transcript-backed test proof without broadening failed mixed-output acceptance.
Rejected: Blanket allow any output containing passed | would re-accept 1 failed, 3 passed mixed results.
Confidence: high
Scope-risk: narrow
Directive: Keep verifier success parsing conservative for non-zero failures while allowing explicit zero-failure summaries.
Tested: uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py -k 'zero_failure_summaries or fat_harness or observe_only or typed_evidence'; uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py; uv run ruff check src/ouroboros/orchestrator/parallel_executor.py tests/unit/orchestrator/test_parallel_executor.py; uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py
Not-tested: live ooo run execution.
Co-authored-by: OmX <omx@oh-my-codex.dev>

Fat-harness verifier output can include explicit zero-failure phrases such as 0 failed, 0 errors, or no errors. Treat those as successful summaries while still rejecting non-zero failures and errors so atomic acceptance avoids avoidable false negatives.

Constraint: Q00#1006 default fat-harness acceptance depends on transcript-backed test proof without broadening failed mixed-output acceptance.
Rejected: Blanket allow any output containing passed | would re-accept 1 failed, 3 passed mixed results.
Confidence: high
Scope-risk: narrow
Directive: Keep verifier success parsing conservative for non-zero failures while allowing explicit zero-failure summaries.
Tested: uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py -k 'zero_failure_summaries or fat_harness or observe_only or typed_evidence'; uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py; uv run ruff check src/ouroboros/orchestrator/parallel_executor.py tests/unit/orchestrator/test_parallel_executor.py; uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py
Not-tested: live ooo run execution.
Co-authored-by: OmX <omx@oh-my-codex.dev>

Verifier PASS must be backed by evidence from the current run, not by read-only mentions, stale files, or zero-passed test summaries.

Constraint: Q00#1006 merged the live fat-harness atomic verifier gate; this follow-up keeps that gate from accepting weak structural evidence.
Rejected: Generic substring fallback for files_touched and tests_passed | read-only file mentions and zero-pass test output can otherwise satisfy typed evidence.
Confidence: high
Scope-risk: narrow
Directive: Keep files_touched tied to plausible write/generate transcript evidence and tests_passed tied to nonzero successful test output.
Tested: uv run pytest tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py -q; PYTHONPATH=src pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py; uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py; uv run ruff check src/ouroboros/orchestrator/parallel_executor.py tests/unit/orchestrator/test_parallel_executor.py; uv run ruff format --check src/ouroboros/orchestrator/parallel_executor.py tests/unit/orchestrator/test_parallel_executor.py
Not-tested: Full repository test suite.

shaun0927 · 2026-05-14T04:39:09Z

Updated the source branch with the remaining verifier false-negative fix.

New branch head: 45b600b62b42f4c886572eae74948016d3783fbc

What changed:

_message_contains_test_success() now accepts explicit zero-failure summaries such as 0 failed, 3 passed, 0 errors, and no errors.
Non-zero failure/error summaries such as 1 failed, 3 passed and 2 errors, 1 passed remain rejected.
Added regression coverage for zero-failure summaries and kept the fat-harness integration test on a 0 failed, 0 errors, 1 passed transcript.
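The accept/reject behaviour listed above can be approximated with a count-based parser. This is an illustrative sketch, not the body of `_message_contains_test_success()`: the function name `message_reports_test_success` is hypothetical, and the requirement of at least one passing test is an assumption borrowed from the "nonzero successful test output" directive elsewhere in this thread.

```python
import re

# Matches summary fragments such as "3 passed", "0 failed", "2 errors".
_COUNT = re.compile(r"\b(\d+)\s+(passed|failed|errors?)\b", re.IGNORECASE)


def message_reports_test_success(message: str) -> bool:
    """Accept zero-failure summaries; reject any non-zero failed/error count."""
    counts = {"passed": 0, "failed": 0, "error": 0}
    for number, label in _COUNT.findall(message):
        label = label.lower()
        key = "error" if label.startswith("error") else label
        counts[key] += int(number)
    # Phrases like "no errors" carry no digit, so they are simply ignored.
    return counts["failed"] == 0 and counts["error"] == 0 and counts["passed"] > 0
```

Parsing counts instead of searching for the word "passed" is what keeps `1 failed, 3 passed` rejected while `0 failed, 0 errors, 1 passed` is accepted.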

Local validation on the updated branch:

uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py -k 'zero_failure_summaries or fat_harness or observe_only or typed_evidence' → 22 passed
uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py → 243 passed, 1 skipped
uv run ruff check src/ouroboros/orchestrator/parallel_executor.py tests/unit/orchestrator/test_parallel_executor.py → passed
uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py → passed

Note: GitHub GraphQL currently shows headRef.target.oid at 45b600b62..., while the PR headRefOid / commits list still appears cached at the previous head e3fbfaf.... The source branch ref itself is updated.

@ouroboros-agent please review the updated source branch head once the PR metadata/checks synchronize.

The files_touched verifier path already records unsupported evidence and continues, so the immediately repeated files_touched branch is unreachable. Removing it keeps the Q00#1010 hardening focused without changing behavior.

Constraint: Q00#1010 must remain a narrow follow-up to Q00#1006 and preserve the Q00#978 P5 boundary.
Rejected: Leave duplicate branch in place | it creates reviewer noise in a safety-sensitive verifier path.
Confidence: high
Scope-risk: narrow
Directive: Keep future files_touched changes tied to explicit runtime-evidence proof and regression tests.
Tested: uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py
Tested: uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py
Tested: uv run ruff check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py
Tested: uv run ruff format --check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py
Not-tested: Full repository test suite.

Keep the Q00#946/Q00#990 MCP projection surface evidence-shaped by making caller-provided seed IDs explicit and rejecting session-only queries that would splice metadata-less multi-execution payloads.

Constraint: Q00#961 keeps AgentOS substrate changes narrow and folded into canonical Q00#946 instead of opening new roadmap surfaces.
Rejected: Folding this into Q00#1006 verifier PASS work | projection read-model hardening and atomic AC acceptance are separate review surfaces.
Confidence: high
Scope-risk: narrow
Directive: Keep projection query handlers read-only and fail-closed on ambiguous run boundaries.
Tested: uv run pytest tests/unit/mcp/tools/test_definitions.py::TestProjectionQueryHandler -q; uv run pytest tests/unit/persistence/test_event_store.py::TestSessionRelatedEvents -q; uv run mypy src/ouroboros/mcp/tools/projection_handlers.py tests/unit/mcp/tools/test_definitions.py; uv run ruff check src/ouroboros/mcp/tools/projection_handlers.py tests/unit/mcp/tools/test_definitions.py
Not-tested: Full repository test suite.

shaun0927 mentioned this pull request May 14, 2026

Require verifier PASS for fat-harness AC acceptance #1004

Closed

shaun0927 mentioned this pull request May 14, 2026

Meta SSOT: AgentOS roadmap sequencing (#920–#960) #961

Open

shaun0927 force-pushed the agentos-atomic-verifier-pass-clean branch from 6528682 to ff445b2 on May 14, 2026 02:38

shaun0927 mentioned this pull request May 14, 2026

Agent OS roadmap: make ooo run trustworthy with a fat harness execution path #920

Closed

7 tasks

shaun0927 merged commit bae146e into Q00:main May 14, 2026
8 checks passed

This was referenced May 14, 2026

Design spine: AgentOS evidence-gated delivery via TraceGuard #978

Closed

Agent OS: introduce typed Workflow IR for fat-harness execution planning #956

Closed

Harden atomic verifier runtime evidence support #1009

Closed

shaun0927 mentioned this pull request May 14, 2026

fix(harness): require current-run proof for atomic file evidence #1010

Merged

Q00 mentioned this pull request May 14, 2026

feat(harness): add claim-term semantic guard #1012

Merged

This was referenced May 14, 2026

fix(harness): harden projection query provenance after #990 #1016

Merged

fix(harness): accept zero-failure verifier summaries #1018

Merged

This was referenced May 14, 2026

fix(harness): harden Bash file evidence proof after #1010 #1021

Merged

fix(run): align fat-harness prompts with evidence enforcement #1024

Closed

fix(harness): resolve #978 typed evidence blocker #1025

Merged

shaun0927 mentioned this pull request May 22, 2026

fix(orchestrator): credit transcript test commands for tests_passed claims #1166

Merged
