
feat(core): report inconclusive status when all tests have execution errors #894

@christso

Description

Problem

When every eval test fails with an execution error (e.g., Not Found from a misconfigured model, network failures, or authentication errors), the run still reports PASS or FAIL based on the score threshold. This is misleading: no evaluation actually took place, so the result should not be treated as a grading outcome.

Example:

1/3   ❌ violates-lightweight-core | default | ERROR: Not Found
2/3   ❌ violates-ai-first-design | default | ERROR: Not Found
3/3   ❌ follows-principles | default | ERROR: Not Found

This run is reported as FAIL (score below threshold), but the failure is not caused by low eval scores. Every request to the provider failed, so no test was ever graded.

Proposed Behavior

  • When every test in a run has execution_error status, report an inconclusive (or error) status and exit code that is distinct from a threshold failure
  • The JUnit XML output should also reflect this (e.g., tests reported as <error> rather than <failure>)
  • CLI should print a clear message: "All tests had execution errors — no evaluation was performed"
  • Consider a distinct exit code (e.g., exit 2 for execution errors vs. exit 1 for threshold failure) so CI workflows can tell the two cases apart
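The exit-code rule proposed above can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not the project's implementation: the `TestResult` shape, the names `classifyRun` and `EXIT_CODES`, the use of a mean score, and the 0.8 threshold are all hypothetical. The key point is ordering: the all-errors check runs before the threshold comparison, so a run in which nothing was graded maps to exit 2 instead of exit 1.

```typescript
// Minimal sketch. TestResult, classifyRun and EXIT_CODES are hypothetical
// names for illustration, not the project's actual API.
type TestStatus = "passed" | "failed" | "execution_error";

interface TestResult {
  id: string;
  status: TestStatus;
  score: number; // assumed to be in [0, 1]
}

type RunOutcome = "pass" | "fail" | "inconclusive";

// Exit codes as suggested in the issue: 1 for threshold failure, 2 for execution errors.
const EXIT_CODES: Record<RunOutcome, number> = {
  pass: 0,
  fail: 1,
  inconclusive: 2,
};

function classifyRun(results: TestResult[], threshold: number): RunOutcome {
  // If no test produced a grade, the score threshold is meaningless.
  const allErrored =
    results.length > 0 && results.every((r) => r.status === "execution_error");
  if (allErrored) {
    return "inconclusive";
  }
  // Assumption: otherwise keep threshold behavior, using the mean score of all tests.
  const mean =
    results.length === 0
      ? 0
      : results.reduce((sum, r) => sum + r.score, 0) / results.length;
  return mean >= threshold ? "pass" : "fail";
}

// The three-test run from the issue, where every provider call failed.
const run: TestResult[] = [
  { id: "violates-lightweight-core", status: "execution_error", score: 0 },
  { id: "violates-ai-first-design", status: "execution_error", score: 0 },
  { id: "follows-principles", status: "execution_error", score: 0 },
];
const outcome = classifyRun(run, 0.8);
console.log(outcome, EXIT_CODES[outcome]);
```

In this sketch, runs with a mix of graded tests and execution errors keep the existing threshold behavior; only the all-errors case is reclassified, which matches the scope of this issue.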

Acceptance Criteria

  • Distinct exit code when all tests end in execution errors
  • Clear CLI messaging distinguishing execution errors from grading failures
  • JUnit XML uses <error> elements (not <failure>) for execution errors
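A hypothetical TypeScript sketch of the JUnit mapping in the last criterion: a graded test below threshold becomes a <failure> child, a test with an execution error becomes an <error> child, and the suite's failures and errors counts are kept separate. The names `JUnitCase`, `renderTestCase`, and `renderSuite` are illustrative assumptions; the project's actual reporter may build its XML differently.

```typescript
// Hypothetical sketch; JUnitCase, renderTestCase and renderSuite are illustrative names.
type CaseStatus = "passed" | "failed" | "execution_error";

interface JUnitCase {
  name: string;
  status: CaseStatus;
  message?: string;
}

// Escape characters that must not appear unescaped in XML attribute values.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function renderTestCase(c: JUnitCase): string {
  const name = escapeXml(c.name);
  const msg = escapeXml(c.message ?? "");
  switch (c.status) {
    case "passed":
      return `<testcase name="${name}"/>`;
    case "failed":
      // The test ran and was graded, but scored below the threshold.
      return `<testcase name="${name}"><failure message="${msg}"/></testcase>`;
    case "execution_error":
      // The test never produced a grade (e.g. the provider returned Not Found).
      return `<testcase name="${name}"><error message="${msg}"/></testcase>`;
  }
}

function renderSuite(name: string, cases: JUnitCase[]): string {
  // Count graded failures and execution errors separately, as JUnit consumers expect.
  const failures = cases.filter((c) => c.status === "failed").length;
  const errors = cases.filter((c) => c.status === "execution_error").length;
  const body = cases.map(renderTestCase).join("\n  ");
  return (
    `<testsuite name="${escapeXml(name)}" tests="${cases.length}" ` +
    `failures="${failures}" errors="${errors}">\n  ${body}\n</testsuite>`
  );
}

console.log(
  renderSuite("default", [
    { name: "follows-principles", status: "execution_error", message: "Not Found" },
  ]),
);
```

Keeping the two counts separate lets CI dashboards that parse JUnit XML show "errored" rather than "failed" for runs like the example above.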
