
Require verifier PASS for fat-harness AC acceptance #1006

Merged
shaun0927 merged 2 commits into Q00:main from shaun0927:agentos-atomic-verifier-pass-clean
May 14, 2026

Conversation

@shaun0927
Collaborator

Summary

Clean replacement for #1004. Completes the remaining #920 atomic acceptance invariant by requiring a separate verifier PASS after profile typed-evidence validation in the live parallel_executor atomic AC boundary.

Scope

  • Adds atomic verifier verdict telemetry to ACExecutionResult and execution.ac.typed_evidence.observed.
  • Fat-harness acceptance now requires:
    1. runtime success,
    2. profile typed evidence present,
    3. profile evidence validation passing,
    4. a separate VerifierVerdict.passed == true.
  • Keeps observe-only mode isolated: injected verifiers do not run unless fat-harness enforcement is active.
  • Converts operational verifier failures (TimeoutError, OSError, subprocess errors) into typed STALL verifier verdicts while still surfacing verifier programming bugs.
  • Default verifier rejects final-message-only self-reporting and supports evidence only through non-final runtime transcript evidence or active-workspace file evidence.
  • Bounds file proof to the active workspace (task_cwd or adapter working directory), rejecting absolute paths and parent traversal.
  • Verifies tests_passed per claim against the adjacent transcript chunk of a backed successful test command; failed mixed output such as 1 failed, 3 passed remains rejected.
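
The four-part acceptance rule in the scope list can be sketched as below. This is a minimal illustration, not the code in parallel_executor.py: `AtomicOutcome`, `accept_fat_harness_ac`, the `reason` field, and the `enforce` flag are hypothetical names, and only `VerifierVerdict.passed` is taken from the description above. What observe-only mode returns is also an assumption; the point shown is only that the injected verifier is not called there.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class VerifierVerdict:
    # Only `passed` is named in the PR text; `reason` is an assumption.
    passed: bool
    reason: str = ""


@dataclass(frozen=True)
class AtomicOutcome:
    # Hypothetical summary of what the executor knows about one atomic AC.
    runtime_success: bool
    typed_evidence: Optional[dict]
    evidence_valid: bool


def accept_fat_harness_ac(
    outcome: AtomicOutcome,
    verifier: Callable[[AtomicOutcome], VerifierVerdict],
    enforce: bool,
) -> bool:
    """Return True only when every gate holds under fat-harness enforcement."""
    if not enforce:
        # Observe-only mode: the injected verifier is never invoked.
        # Falling back to runtime success here is an assumption of this sketch.
        return outcome.runtime_success
    if not outcome.runtime_success:        # 1. runtime success
        return False
    if outcome.typed_evidence is None:     # 2. typed evidence present
        return False
    if not outcome.evidence_valid:         # 3. evidence validation passing
        return False
    return verifier(outcome).passed        # 4. separate verifier PASS
```

Ordering the checks this way means the verifier only runs once the cheaper structural checks have passed, so a verifier verdict is never produced for an AC that already failed on its own evidence.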

Milestone boundaries

Validation

  • PYTHONPATH=src pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py → 233 passed, 1 skipped
  • uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py
  • uv run ruff check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py
  • uv run ruff format --check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py

@shaun0927
Collaborator Author

@ouroboros-agent please review this clean replacement for #1004. It preserves the final fixed diff in a single commit and CI is green.

Completes the remaining Q00#920 atomic acceptance invariant by requiring schema-valid typed evidence plus a separate verifier PASS before fat-harness AC success. The verifier rejects final-message-only self-reporting, keeps observe-only mode isolated, classifies operational verifier failures, and bounds fallback proof to runtime transcript or active-workspace evidence.
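
One way to picture the "classifies operational verifier failures" behaviour is a thin wrapper around the verifier call. The sketch below is an assumption about shape only: `run_verifier_safely`, the `status` and `detail` fields, and the string `"STALL"` as a status value are illustrative names, and the exception tuple mirrors the categories listed in the PR description (TimeoutError, OSError, subprocess errors).

```python
import subprocess
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class VerifierVerdict:
    passed: bool
    status: str       # hypothetical values: "PASS", "FAIL", "STALL"
    detail: str = ""


# Failures treated as operational rather than as programming bugs.
OPERATIONAL_ERRORS = (TimeoutError, OSError, subprocess.SubprocessError)


def run_verifier_safely(verify: Callable[[], VerifierVerdict]) -> VerifierVerdict:
    """Convert operational failures into a typed STALL verdict."""
    try:
        return verify()
    except OPERATIONAL_ERRORS as exc:
        return VerifierVerdict(
            passed=False,
            status="STALL",
            detail=f"{type(exc).__name__}: {exc}",
        )
    # Any other exception (TypeError, KeyError, ...) propagates unchanged,
    # so verifier programming bugs stay visible instead of becoming STALL.
```

A STALL verdict still has `passed=False`, so it cannot satisfy the acceptance gate; it only records that the verifier could not reach a decision.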

Constraint: Q00#961 keeps Q00#978 P5 legacy self-report removal time-gated behind release-cycle observation; this PR does not remove that fallback.

Rejected: Accept schema-valid final-message JSON as verifier proof | that preserves the self-report gap Q00#920 is closing.

Rejected: Merge the noisy iterative Q00#1004 review branch | a clean single-commit PR is easier for ouroboros-agent and warden to review.

Confidence: high

Scope-risk: narrow

Directive: Keep future verifier enhancements behind this PASS boundary; do not weaken tests_passed into a global successful-test boolean or allow path proof outside the active workspace.

Tested: PYTHONPATH=src pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py

Tested: uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py

Tested: uv run ruff check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py

Tested: uv run ruff format --check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py

Not-tested: Real adapter transcript variants beyond the AgentMessage shapes covered in unit tests.

Co-authored-by: OmX <omx@oh-my-codex.dev>
shaun0927 force-pushed the agentos-atomic-verifier-pass-clean branch from 6528682 to ff445b2 on May 14, 2026 02:38
@Q00
Owner

Q00 commented May 14, 2026

Roadmap gate check: no #961 gate block from warden.

Evidence checked:

This is not an approval and not a merge recommendation; it only records that the PR is not blocked by the roadmap sequencing gate.

Posted by agentos-roadmap-warden — bot. Reply with /warden ignore to suppress further comments on this thread.

Verifier PASS must prove the current run, not merely observe that a claimed file exists in the workspace. Tighten file evidence support to require transcript backing and keep test claims tied to successful adjacent test output.

Constraint: Q00#920 atomic acceptance now depends on typed evidence plus verifier PASS in the live fat-harness path.

Rejected: Accepting existing workspace files as proof | stale files can make fabricated files_touched claims pass.

Confidence: high

Scope-risk: narrow

Directive: Do not weaken atomic verifier support checks without preserving final-message-only and stale-file rejection coverage.

Tested: uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py; uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py; uv run ruff check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py; uv run ruff format --check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py

Not-tested: Full repository test suite.
@shaun0927
Collaborator Author

Merge-readiness check for #920 PR-6 / #1006:

  • Roadmap gate: warden already recorded that there is no sequencing block from #961 (Meta SSOT: AgentOS roadmap sequencing (#920–#960)).
  • Scope boundary preserved: completes typed evidence + separate verifier PASS at the fat-harness atomic AC boundary; does not remove the legacy self-report fallback and does not start #978 (Design spine: AgentOS evidence-gated delivery via TraceGuard) P5.
  • Local clean-worktree validation on ff445b2e4:
    • PYTHONPATH=src pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py → 233 passed, 1 skipped
    • uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py → success
    • uv run ruff check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py → passed
  • GitHub checks are green across Ruff, MyPy, Python 3.12/3.13/3.14, Bridge TypeScript, enforce-envelope, and enforce-boundary.

Proceeding with merge as the narrow #920 completion slice. #978 P5 remains blocked on the release-cycle observation window.

@shaun0927
Collaborator Author

Follow-up: immediate merge is blocked by repository branch policy, and repository auto-merge is disabled.

I did not use admin bypass. Current state:

  • PR is mergeable and all CI checks are green.
  • Clean-worktree local verification passed:
    • PYTHONPATH=src pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py → 233 passed, 1 skipped
    • uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py → success
    • uv run ruff check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py → passed
  • Warden already recorded that there is no sequencing block from #961 (Meta SSOT: AgentOS roadmap sequencing (#920–#960)).
  • Remaining external action: obtain the required PR review / branch-policy satisfaction, then merge normally.

Boundary remains unchanged: do not start #978 P5 or remove legacy self-report until #1006 is merged and the release-cycle observation window has evidence.

@shaun0927
Collaborator Author

Follow-up review fix pushed in e3fbfaf.

What changed:

  • Tightened verifier support so an existing workspace file is not enough; the current runtime transcript must also back the file claim. Basename fallback now only accepts plausible file-touch messages (Edit/Write/NotebookEdit or modification verbs), preventing stale files from satisfying atomic evidence.
  • Tightened handling so failed/unsupported targeted test claims do not fall through to generic substring support.
  • Added regressions for stale preexisting file claims and targeted failed test commands; updated positive fixtures to include explicit successful test output.

Validation:

  • 235 passed, 1 skipped in 42.56s → 235 passed, 1 skipped
  • 107 passed in 11.04s → 107 passed
  • Success: no issues found in 3 source files → passed
  • All checks passed! → passed
  • 4 files already formatted → passed

Boundary remains unchanged: this is still the narrow #920 verifier-PASS completion slice; no legacy self-report removal / no #978 P5.

@shaun0927
Collaborator Author

Correction to the previous comment: shell quoting stripped the inline code spans. The intended summary is:

Follow-up review fix pushed in e3fbfafbf.

What changed:

  • Tightened files_touched verifier support so an existing workspace file is not enough; the current runtime transcript must also back the file claim. Basename fallback now only accepts plausible file-touch messages (Edit / Write / NotebookEdit or modification verbs), preventing stale files from satisfying atomic evidence.
  • Tightened tests_passed handling so failed/unsupported targeted test claims do not fall through to generic substring support.
  • Added regressions for stale preexisting file claims and targeted failed test commands; updated positive fixtures to include explicit successful test output.

Validation:

  • PYTHONPATH=src pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py → 235 passed, 1 skipped
  • uv run pytest tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py -q → 107 passed
  • uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py → passed
  • uv run ruff check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py → passed
  • uv run ruff format --check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py → passed

Boundary remains unchanged: this is still the narrow #920 verifier-PASS completion slice; no legacy self-report removal / no #978 P5.

@shaun0927
Collaborator Author

Follow-up review update for #920 PR-6 / #1006:

During the local merge-readiness pass, I found and fixed one verifier false-positive/false-negative edge at new head e3fbfaf:

  • File evidence now requires current-run transcript support; a stale pre-existing workspace file no longer proves files_touched.
  • Successful test summaries with 0 failed / 0 errors remain accepted while nonzero failures stay rejected.
  • Added regression coverage for stale file evidence, targeted failed test output, and the 0 failed, 0 errors, 1 passed success shape.

Fresh local validation on e3fbfaf:

  • PYTHONPATH=src pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py → 235 passed, 1 skipped
  • uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py → success
  • uv run ruff check ... and uv run ruff format --check ... → passed

Merge boundary remains the same: this still only completes #920 typed evidence + verifier PASS and does not start #978 P5 or remove the legacy fallback. Wait for the refreshed GitHub Python matrix and review signal before merge.

@shaun0927
Collaborator Author

@ouroboros-agent Please re-review current head e3fbfafbf. The follow-up commit fixes two verifier false-pass gaps (stale files_touched workspace file and failed targeted tests_passed command) with regression coverage; validation is posted above.

@shaun0927
Collaborator Author

shaun0927 commented May 14, 2026

Follow-up validation after local review

I found and fixed one acceptance-proof gap while reviewing the #920 PR-6 invariant:

  • files_touched is no longer accepted merely because the claimed file exists under the workspace; the current runtime transcript must support the file-touch claim.
  • The verifier still rejects final-message-only evidence and now also rejects stale pre-existing workspace files without transcript support.
  • Test fixtures now include adjacent successful test output for accepted tests_passed claims and explicit rejection coverage for stale file claims / failed targeted tests.

Pushed commit: e3fbfafbf (Require runtime support for atomic file evidence).

Validation:

  • Local: 235 passed, 1 skipped for tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py
  • Local MyPy: passed for verifier.py, parallel_executor.py, parallel_executor_models.py
  • Local Ruff check/format-check: passed for changed orchestrator files/tests
  • GitHub checks on e3fbfafbf: green across Ruff, MyPy, Python 3.12/3.13/3.14, Bridge TypeScript, enforce-envelope, and enforce-boundary

Roadmap boundary remains unchanged: this still completes the narrow #920 atomic verifier-PASS acceptance invariant, keeps legacy self-report fallback, and does not start #978 P5.

@shaun0927
Collaborator Author

shaun0927 commented May 14, 2026

Update after merge

#1006 merged at head e3fbfaf before the additional hardening could be applied to that PR. I split the current-run file-proof hardening into follow-up PR #1010 instead.

The stale-file proof gap described below is therefore tracked and fixed in #1010, not in merged #1006.


Follow-up hardening after #961/#920 action review

I re-ran the recommended #1006 verification and found one important acceptance-boundary gap before merge-readiness:

  • The atomic verifier could treat an existing in-workspace file as files_touched proof even if the current run transcript never showed that file being touched. That would preserve a stale-file self-report gap.
  • The valid typed-evidence fixture also needed explicit adjacent test success output to match the intended tests_passed verifier contract.

Follow-up PR #1010:

  • Requires files_touched support to be backed by non-final runtime transcript evidence, not workspace existence alone.
  • Keeps active-workspace bounds for file evidence and accepts basename-only support only when the transcript plausibly reports a write/update/create/generate action.
  • Keeps tests_passed tied to a backed test command plus adjacent successful test output.
  • Adds regression coverage for stale preexisting file rejection, read-only file-reference rejection, and targeted failed test rejection.
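
The active-workspace bound on file evidence (reject absolute paths and parent traversal) can be sketched with pathlib. `resolve_in_workspace` is a hypothetical helper name, and the extra post-resolution containment check is an assumption added here for symlink safety; the PR text only states the absolute-path and traversal rules.

```python
from pathlib import Path
from typing import Optional


def resolve_in_workspace(claimed: str, workspace: Path) -> Optional[Path]:
    """Return the resolved path if it stays inside the workspace, else None."""
    if not claimed.strip():
        return None
    path = Path(claimed)
    # Reject absolute paths and any parent-directory traversal outright.
    if path.is_absolute() or ".." in path.parts:
        return None
    root = workspace.resolve()
    candidate = (root / path).resolve()
    # Assumption of this sketch: also reject paths that escape via symlinks.
    if not candidate.is_relative_to(root):
        return None
    return candidate
```

Passing this check only establishes that a path is eligible as file evidence; under the #1010 rules listed above, the claim still needs non-final runtime transcript support.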

Validation for #1010 is green locally and in GitHub CI.

Remaining merge gate for #1010: submitted code review.

@shaun0927
Collaborator Author

Updated #1006 with follow-up hardening on head e3fbfaf:

  • File evidence now requires runtime transcript backing; a stale workspace file is not enough to satisfy files_touched.
  • Basename fallback is bounded so src/preexisting.py is not proven by unrelated tests/test_preexisting.py passed output.
  • Test success parsing accepts common zero-failure summaries such as 0 failed, 0 errors, 1 passed while still rejecting non-zero failed/error output.

Fresh local validation on this head:

  • PYTHONPATH=src pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py → 235 passed, 1 skipped
  • uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py → success
  • uv run ruff check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py → passed
  • uv run ruff format --check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py → passed

GitHub checks are now green and merge state is CLEAN. Remaining gate before merge per our plan: obtain a submitted review/approval signal; #978 P5 remains blocked on release-cycle observation after this lands.

@shaun0927
Collaborator Author

Tightened the #1006 review-risk area on head e3fbfaf: file evidence now requires runtime transcript support and can no longer pass from mere workspace existence.

What changed:

  • files_touched support now prefers exact non-final runtime transcript backing.
  • Basename fallback is allowed only for existing in-workspace relative paths and transcript text that plausibly reports the file was updated/created/generated/written/patched.
  • Added regression coverage proving a stale preexisting workspace file does not satisfy files_touched without transcript support.
  • Kept the Bash-generated-file case passing when transcript output reports the generated file and the path exists in the active workspace.
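
The exact-path preference and the bounded basename fallback can be sketched as follows. Everything here is illustrative: the `Message` shape, `files_touched_supported`, and the verb list are assumptions drawn from the wording of the bullets above, not the real AgentMessage type or the code in parallel_executor.py. The transcript is a list whose last element is the final message.

```python
import re
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional

WRITE_TOOLS = {"Edit", "Write", "NotebookEdit"}
WRITE_VERBS = re.compile(
    r"\b(?:updated|created|generated|written|wrote|patched)\b", re.IGNORECASE
)


@dataclass(frozen=True)
class Message:
    # Hypothetical transcript entry: free text plus an optional tool name.
    text: str
    tool: Optional[str] = None


def _plausible_write(msg: Message) -> bool:
    return msg.tool in WRITE_TOOLS or bool(WRITE_VERBS.search(msg.text))


def files_touched_supported(
    claimed: str, exists_in_workspace: bool, transcript: list[Message]
) -> bool:
    runtime = transcript[:-1]  # final message is self-report, never proof
    # Preferred: the exact claimed path appears in a non-final message.
    if any(claimed in m.text for m in runtime):
        return True
    # Fallback applies only to files that exist in the active workspace.
    if not exists_in_workspace:
        return False
    name = PurePosixPath(claimed).name
    # Match the basename as a whole token so that "test_preexisting.py"
    # does not count as a mention of "preexisting.py".
    token = re.compile(rf"(?<![\w./-]){re.escape(name)}(?![\w-])")
    return any(token.search(m.text) and _plausible_write(m) for m in runtime)
```

With this shape, workspace existence alone never returns True: a stale file needs either an exact-path mention or a write-like basename mention in the current run's non-final transcript.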

Validation on head e3fbfaf:

  • uv run pytest tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py -q → 235 passed, 1 skipped
  • uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py → pass
  • uv run ruff check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py → pass
  • uv run ruff format --check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py → pass
  • GitHub checks are green: Ruff, MyPy, Python 3.12/3.13/3.14, Bridge TypeScript, enforce-envelope, enforce-boundary.

Ready for re-review. The remaining milestone boundary is unchanged: this closes the #920 atomic typed-evidence + verifier-PASS acceptance invariant, but does not remove legacy self-report fallback or start #978 P5.

@shaun0927
Collaborator Author

@ouroboros-agent please review the updated head e3fbfaf. The PR is now green/CLEAN after the verifier proof hardening above.

shaun0927 merged commit bae146e into Q00:main on May 14, 2026 (8 checks passed)
shaun0927 added a commit to shaun0927/ouroboros that referenced this pull request May 14, 2026
Fat-harness verifier output can include explicit zero-failure phrases such as 0 failed or no errors. Treat those as successful summaries while still rejecting non-zero failures and errors so atomic acceptance avoids avoidable false negatives.

Constraint: Q00#1006 default fat-harness acceptance depends on transcript-backed test proof without broadening failed mixed-output acceptance.

Rejected: Blanket allow any output containing passed | would re-accept 1 failed, 3 passed mixed results.

Confidence: high

Scope-risk: narrow

Directive: Keep verifier success parsing conservative for non-zero failures while allowing explicit zero-failure summaries.

Tested: uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py -k 'zero_failure_summaries or fat_harness or observe_only or typed_evidence'; uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py; uv run ruff check src/ouroboros/orchestrator/parallel_executor.py tests/unit/orchestrator/test_parallel_executor.py; uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py

Not-tested: live ooo run execution.

Co-authored-by: OmX <omx@oh-my-codex.dev>
shaun0927 added a commit to shaun0927/ouroboros that referenced this pull request May 14, 2026
Fat-harness verifier output can include explicit zero-failure phrases such as 0 failed, 0 errors, or no errors. Treat those as successful summaries while still rejecting non-zero failures and errors so atomic acceptance avoids avoidable false negatives.

Constraint: Q00#1006 default fat-harness acceptance depends on transcript-backed test proof without broadening failed mixed-output acceptance.

Rejected: Blanket allow any output containing passed | would re-accept 1 failed, 3 passed mixed results.

Confidence: high

Scope-risk: narrow

Directive: Keep verifier success parsing conservative for non-zero failures while allowing explicit zero-failure summaries.

Tested: uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py -k 'zero_failure_summaries or fat_harness or observe_only or typed_evidence'; uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py; uv run ruff check src/ouroboros/orchestrator/parallel_executor.py tests/unit/orchestrator/test_parallel_executor.py; uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py

Not-tested: live ooo run execution.

Co-authored-by: OmX <omx@oh-my-codex.dev>
shaun0927 added a commit to shaun0927/ouroboros that referenced this pull request May 14, 2026
Verifier PASS must be backed by evidence from the current run, not by read-only mentions, stale files, or zero-passed test summaries.

Constraint: Q00#1006 merged the live fat-harness atomic verifier gate; this follow-up keeps that gate from accepting weak structural evidence.

Rejected: Generic substring fallback for files_touched and tests_passed | read-only file mentions and zero-pass test output can otherwise satisfy typed evidence.

Confidence: high

Scope-risk: narrow

Directive: Keep files_touched tied to plausible write/generate transcript evidence and tests_passed tied to nonzero successful test output.

Tested: uv run pytest tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py -q; PYTHONPATH=src pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py; uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py; uv run ruff check src/ouroboros/orchestrator/parallel_executor.py tests/unit/orchestrator/test_parallel_executor.py; uv run ruff format --check src/ouroboros/orchestrator/parallel_executor.py tests/unit/orchestrator/test_parallel_executor.py

Not-tested: Full repository test suite.
@shaun0927
Collaborator Author

Updated the source branch with the remaining verifier false-negative fix.

New branch head: 45b600b62b42f4c886572eae74948016d3783fbc

What changed:

  • _message_contains_test_success() now accepts explicit zero-failure summaries such as 0 failed, 3 passed, 0 errors, and no errors.
  • Non-zero failure/error summaries such as 1 failed, 3 passed and 2 errors, 1 passed remain rejected.
  • Added regression coverage for zero-failure summaries and kept the fat-harness integration test on a 0 failed, 0 errors, 1 passed transcript.
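
The zero-failure rule can be illustrated with a small stand-in parser. This is not the body of `_message_contains_test_success()`; `looks_like_test_success` and its regular expressions are assumptions that reproduce only the accept/reject examples quoted in the bullets above.

```python
import re

# Non-zero counts of failures or errors, e.g. "1 failed" or "2 errors".
_NONZERO_BAD = re.compile(
    r"\b[1-9]\d*\s+(?:failed|failures?|errors?)\b", re.IGNORECASE
)
# A non-zero pass count, e.g. "3 passed".
_PASSED = re.compile(r"\b[1-9]\d*\s+passed\b", re.IGNORECASE)
# Explicit zero-failure phrases, e.g. "0 failed", "0 errors", "no errors".
_ZERO_BAD = re.compile(
    r"\b(?:0\s+(?:failed|failures|errors)|no\s+(?:failures|errors))\b",
    re.IGNORECASE,
)


def looks_like_test_success(text: str) -> bool:
    """Accept zero-failure summaries; reject any non-zero failure or error."""
    if _NONZERO_BAD.search(text):
        return False
    return bool(_PASSED.search(text) or _ZERO_BAD.search(text))


for sample in ("0 failed, 3 passed", "no errors", "1 failed, 3 passed"):
    print(f"{sample!r}: {looks_like_test_success(sample)}")
```

Checking for non-zero failures first keeps the parser conservative: a mixed summary is rejected no matter how many passes or zero-count phrases it also contains.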

Local validation on the updated branch:

  • uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py -k 'zero_failure_summaries or fat_harness or observe_only or typed_evidence' → 22 passed
  • uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py → 243 passed, 1 skipped
  • uv run ruff check src/ouroboros/orchestrator/parallel_executor.py tests/unit/orchestrator/test_parallel_executor.py → passed
  • uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py → passed

Note: GitHub GraphQL currently shows headRef.target.oid at 45b600b62..., while the PR headRefOid / commits list still appears cached at the previous head e3fbfaf.... The source branch ref itself is updated.

@ouroboros-agent please review the updated source branch head once the PR metadata/checks synchronize.

shaun0927 added a commit to shaun0927/ouroboros that referenced this pull request May 14, 2026
The files_touched verifier path already records unsupported evidence and continues, so the immediately repeated files_touched branch is unreachable. Removing it keeps the Q00#1010 hardening focused without changing behavior.

Constraint: Q00#1010 must remain a narrow follow-up to Q00#1006 and preserve the Q00#978 P5 boundary.

Rejected: Leave duplicate branch in place | it creates reviewer noise in a safety-sensitive verifier path.

Confidence: high

Scope-risk: narrow

Directive: Keep future files_touched changes tied to explicit runtime-evidence proof and regression tests.

Tested: uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py

Tested: uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py

Tested: uv run ruff check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py

Tested: uv run ruff format --check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py

Not-tested: Full repository test suite.
shaun0927 added a commit that referenced this pull request May 14, 2026
The files_touched verifier path already records unsupported evidence and continues, so the immediately repeated files_touched branch is unreachable. Removing it keeps the #1010 hardening focused without changing behavior.

Constraint: #1010 must remain a narrow follow-up to #1006 and preserve the #978 P5 boundary.

Rejected: Leave duplicate branch in place | it creates reviewer noise in a safety-sensitive verifier path.

Confidence: high

Scope-risk: narrow

Directive: Keep future files_touched changes tied to explicit runtime-evidence proof and regression tests.

Tested: uv run pytest -q tests/unit/orchestrator/test_parallel_executor.py tests/unit/orchestrator/test_verifier.py tests/unit/orchestrator/test_runner.py tests/unit/cli/test_run_qa.py

Tested: uv run mypy src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py

Tested: uv run ruff check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py

Tested: uv run ruff format --check src/ouroboros/orchestrator/verifier.py src/ouroboros/orchestrator/parallel_executor.py src/ouroboros/orchestrator/parallel_executor_models.py tests/unit/orchestrator/test_parallel_executor.py

Not-tested: Full repository test suite.
shaun0927 added a commit to shaun0927/ouroboros that referenced this pull request May 14, 2026
Keep the Q00#946/Q00#990 MCP projection surface evidence-shaped by making caller-provided seed IDs explicit and rejecting session-only queries that would splice metadata-less multi-execution payloads.

Constraint: Q00#961 keeps AgentOS substrate changes narrow and folded into canonical Q00#946 instead of opening new roadmap surfaces.

Rejected: Folding this into Q00#1006 verifier PASS work | projection read-model hardening and atomic AC acceptance are separate review surfaces.

Confidence: high

Scope-risk: narrow

Directive: Keep projection query handlers read-only and fail-closed on ambiguous run boundaries.

Tested: uv run pytest tests/unit/mcp/tools/test_definitions.py::TestProjectionQueryHandler -q; uv run pytest tests/unit/persistence/test_event_store.py::TestSessionRelatedEvents -q; uv run mypy src/ouroboros/mcp/tools/projection_handlers.py tests/unit/mcp/tools/test_definitions.py; uv run ruff check src/ouroboros/mcp/tools/projection_handlers.py tests/unit/mcp/tools/test_definitions.py

Not-tested: Full repository test suite.
shaun0927 added a commit that referenced this pull request May 14, 2026
Keep the #946/#990 MCP projection surface evidence-shaped by making caller-provided seed IDs explicit and rejecting session-only queries that would splice metadata-less multi-execution payloads.

Constraint: #961 keeps AgentOS substrate changes narrow and folded into canonical #946 instead of opening new roadmap surfaces.

Rejected: Folding this into #1006 verifier PASS work | projection read-model hardening and atomic AC acceptance are separate review surfaces.

Confidence: high

Scope-risk: narrow

Directive: Keep projection query handlers read-only and fail-closed on ambiguous run boundaries.

Tested: uv run pytest tests/unit/mcp/tools/test_definitions.py::TestProjectionQueryHandler -q; uv run pytest tests/unit/persistence/test_event_store.py::TestSessionRelatedEvents -q; uv run mypy src/ouroboros/mcp/tools/projection_handlers.py tests/unit/mcp/tools/test_definitions.py; uv run ruff check src/ouroboros/mcp/tools/projection_handlers.py tests/unit/mcp/tools/test_definitions.py

Not-tested: Full repository test suite.