fix(dry-run): mock provider should return schema-valid responses to prevent execution errors #1088

@christso

Problem

--dry-run uses mock provider responses that don't match evaluator output schemas, causing execution errors when graders try to process them. From AGENTS.md:

Note: --dry-run returns mock responses that don't match evaluator output schemas. Use it only for testing harness flow, not evaluator logic.

This means --dry-run cannot safely be used in CI pipelines or by agents that consume the output — the mock response causes the eval run to fail rather than returning a graceful dry-run result.

Expected behavior

Following the kubectl --dry-run=server and rsync --dry-run pattern: --dry-run should return schema-valid responses that all built-in graders can process without throwing. Callers should be able to pipe --dry-run output into downstream tooling without errors.

Specific cases to fix

  • is-json grader: mock response should be valid JSON (e.g. "{}")
  • contains / equals / regex graders: mock response should be a non-empty string
  • llm-grader / rubric: should either skip evaluation in dry-run mode or return a mock passing score
  • execution-metrics: should return mock metrics (zeroed or placeholder values)
  • Agent provider outputs: mock should match the expected output[] array shape (not just a bare string)
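A minimal sketch of a mock response that would satisfy the cases above. Every name here (`MockResponse`, `OutputMessage`, `buildDryRunResponse`, `dryRunLlmScore`, and the metric field names) is hypothetical and chosen for illustration; the real AgentV types and evaluator schemas may differ. The sketch uses the string "{}" as content because it is both valid JSON and a non-empty string, so one value covers the is-json and string-matching graders.

```typescript
// Hypothetical shapes for illustration only; not AgentV's actual types.
interface OutputMessage {
  role: "assistant";
  content: string;
}

interface MockMetrics {
  durationMs: number;
  inputTokens: number;
  outputTokens: number;
  toolCalls: number;
}

interface MockResponse {
  output: OutputMessage[]; // array shape, not a bare string
  metrics: MockMetrics;    // zeroed placeholder values
  dryRun: true;            // lets graders detect dry-run mode
}

// Build a response that is structurally valid but not semantically meaningful.
function buildDryRunResponse(): MockResponse {
  return {
    // "{}" parses as JSON and is also a non-empty string.
    output: [{ role: "assistant", content: "{}" }],
    metrics: { durationMs: 0, inputTokens: 0, outputTokens: 0, toolCalls: 0 },
    dryRun: true,
  };
}

// One possible handling for llm-grader / rubric: return a mock passing
// score and mark it as skipped instead of calling a model.
function dryRunLlmScore(): { score: number; skipped: boolean } {
  return { score: 1, skipped: true };
}
```

Marking the score as skipped keeps a mock pass distinguishable from a real one in downstream tooling.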

Design note

The fix belongs in the mock provider implementation (wherever dryRun: true is handled in the provider layer), not in validate. The mock response should be realistic enough that graders don't crash — it does not need to be semantically meaningful.

Industry precedent: terraform plan reports an accurate diff against real state; kubectl --dry-run=server returns the full, valid response object. AgentV --dry-run should at minimum return a response that all built-in graders accept without errors.
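The acceptance criterion (no built-in grader throws on the mock response) could be checked with a small smoke test along these lines. The graders below are simplified stand-ins written for this sketch, and `findCrashingGraders` is a hypothetical helper; none of this is AgentV's actual grader code.

```typescript
// Simplified stand-in graders (assumptions for this sketch). Each one
// throws on input it cannot process, mimicking the failure described above.
type Grader = (text: string) => boolean;

const standInGraders: Record<string, Grader> = {
  "is-json": (text) => {
    JSON.parse(text); // throws SyntaxError on invalid JSON
    return true;
  },
  "contains": (text) => {
    if (text.length === 0) throw new Error("empty response");
    return text.includes("{");
  },
  "regex": (text) => {
    if (text.length === 0) throw new Error("empty response");
    return /\S/.test(text);
  },
};

// Return the names of graders that throw when given the mock text.
// An empty result means the mock is safe for every grader checked.
function findCrashingGraders(text: string): string[] {
  const crashed: string[] = [];
  for (const [name, grade] of Object.entries(standInGraders)) {
    try {
      grade(text);
    } catch {
      crashed.push(name);
    }
  }
  return crashed;
}
```

Running such a check against the mock provider's output in the test suite would catch a regression whenever a new grader is added that the mock does not satisfy.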

Labels: in-progress (claimed by an agent; do not duplicate work)
Status: Done