fix(pipeline): honest failure classification (infra vs validation) + CLI validate no longer false-PASS #110

Merged
telivity-otaip merged 1 commit into main from pr1-honest-failures on May 30, 2026
@telivity-otaip
Collaborator

Why

Field feedback flagged the worst property of using OTAIP as an oracle: infra errors look like model failures. Two real, OTAIP-owned weaknesses caused it.

What

  • Orchestrator: add FailureClass = 'infra' | 'execution' | 'validation' + classifyFailure(), and a failureClass field on every RunAgentResult failure. contract_missing/agent_missing → infra; agent_error → execution; gate rejections → validation. Re-exported from @otaip/core. Additive: no change to the success path.
  • CLI validate: previously hardcoded every gate to passed: true and always printed Overall: PASS. Now reports unrun gates as SKIPPED, verdict UNVALIDATED (infra), and exits non-zero. A non-validation can no longer read as a pass.
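
A minimal sketch of the classification described in the first bullet, not the merged code. The three FailureClass values and the reason names contract_missing, agent_missing and agent_error come from this description; the FailureReason union and the 'gate_rejected' member are assumptions standing in for whatever reasons the real gates emit.

```typescript
// Sketch only. FailureClass matches the PR description; FailureReason and
// 'gate_rejected' are hypothetical names used for illustration.
type FailureClass = 'infra' | 'execution' | 'validation';

type FailureReason =
  | 'contract_missing' // no contract registered for the agent
  | 'agent_missing'    // agent not registered with the orchestrator
  | 'agent_error'      // agent threw while running
  | 'gate_rejected';   // hypothetical: a validation gate rejected the input/output

function classifyFailure(reason: FailureReason): FailureClass {
  switch (reason) {
    case 'contract_missing':
    case 'agent_missing':
      return 'infra';
    case 'agent_error':
      return 'execution';
    case 'gate_rejected':
      return 'validation';
    default: {
      // Compile-time exhaustiveness check: adding a new reason without
      // classifying it makes this assignment a type error.
      const unhandled: never = reason;
      throw new Error(`Unclassified failure reason: ${String(unhandled)}`);
    }
  }
}
```

With a mapping like this, a consumer can count only 'validation' failures against the model and treat 'infra' failures as setup problems to fix before trusting any verdict.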

Tests

Exhaustive classifyFailure table; orchestrator infra/execution/validation cases; CLI test asserting validate never reports PASS for an uncontracted agent. Full suite green.
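
The CLI behavior that the last test guards could look roughly like the sketch below. The SKIPPED status, the UNVALIDATED (infra) verdict and the non-zero exit come from this description; the GateReport shape, the overallVerdict name, the specific exit codes, and the rule that a failed gate outranks a skipped one are all assumptions.

```typescript
// Sketch only: names, shapes and exit-code values are hypothetical.
type GateStatus = 'PASSED' | 'FAILED' | 'SKIPPED';
type Verdict = 'PASS' | 'FAIL' | 'UNVALIDATED (infra)';

interface GateReport {
  name: string;
  status: GateStatus;
}

interface ValidateOutcome {
  verdict: Verdict;
  exitCode: number;
}

function overallVerdict(gates: GateReport[]): ValidateOutcome {
  // A gate that ran and failed is a genuine validation failure.
  if (gates.some((g) => g.status === 'FAILED')) {
    return { verdict: 'FAIL', exitCode: 1 };
  }
  // Nothing ran, or some gate never ran: refuse to claim a pass.
  if (gates.length === 0 || gates.some((g) => g.status === 'SKIPPED')) {
    return { verdict: 'UNVALIDATED (infra)', exitCode: 2 };
  }
  // Only reachable when every gate actually ran and passed.
  return { verdict: 'PASS', exitCode: 0 };
}
```

The point of the design is that PASS is the last branch rather than the default, so an uncontracted agent whose gates were all skipped cannot fall through to it.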

🤖 Generated with Claude Code

…top CLI validate false-PASS

The orchestrator's RunAgentResult folded registration failures
(contract_missing, agent_missing) into the same shape as genuine gate
rejections, so a consumer could not tell "OTAIP wasn't wired to validate
this" from "the input was rejected" — bad setup could read as a model
regression.

- Add FailureClass ('infra' | 'execution' | 'validation') and
  classifyFailure(); attach failureClass to every RunAgentResult failure.
  Re-export from the pipeline-validator barrel and @otaip/core.
- CLI `validate` previously always printed Overall: PASS (every gate
  hardcoded passed:true). It now reports gates it never ran as SKIPPED,
  marks the verdict UNVALIDATED (infra), and exits non-zero — a
  non-validation can no longer read as a pass.
- Tests: exhaustive classifyFailure table; orchestrator infra/execution
  cases; CLI test asserting validate never reports PASS.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@telivity-otaip telivity-otaip merged commit 591ffec into main May 30, 2026
1 check passed
@telivity-otaip telivity-otaip deleted the pr1-honest-failures branch May 30, 2026 09:09