feat: eval portability validator — lint evals for self-contained conventions #1140

@christso

Objective

Add an agentv eval validate command that checks whether an eval follows self-contained conventions, and a --fix flag that automatically inlines missing dependencies to make the eval portable in place.

Rationale

If evals follow conventions (relative paths, committed workspace templates, local scripts, local targets), the eval directory is already a portable artifact. Rather than building a separate bundle compiler, validate conventions and autofix violations in place.

This is analogous to how agent-skills validates skill structure and conventions.

Current problem: AgentV's examples/ use a shared parent targets.yaml (e.g., examples/features/.agentv/targets.yaml). This means no single example directory is self-contained — they all depend on a parent directory for target resolution. The --fix option resolves this by inlining the resolved targets locally.

Design

Validate mode

agentv eval validate my-eval.yaml

Checks:

  1. Relative paths — all tests:, workspace:, command: references resolve relative to the eval file (no absolute paths)
  2. Files exist — workspace template directory, code-grader scripts, hook scripts all exist at referenced paths
  3. Local targets — targets used by the eval are defined locally (not inherited from parent .agentv/targets.yaml)
  4. No dangling env vars — warn on ${{ VAR }} references that aren't documented or have no default
  5. Target resolution — use_target chains resolve without circular references
  6. Provenance — warn if eval is not in a git repo (no commit hash for reproducibility)
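
As a rough illustration of checks 1 and 4, the sketch below flags absolute path references and collects ${{ VAR }} references that are not in a documented set. It is not AgentV's implementation: the function names, the regex, and the idea of passing a "documented" set are assumptions made for this example.

```python
import re
from pathlib import PurePosixPath

# Hypothetical pattern for ${{ VAR }} references; AgentV's real syntax rules may differ.
ENV_REF = re.compile(r"\$\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def find_absolute_paths(refs):
    """Return the path references that are absolute (violates check 1)."""
    return [r for r in refs if PurePosixPath(r).is_absolute()]


def find_undocumented_env_vars(eval_text, documented):
    """Return env var names referenced in the eval text but not documented (check 4)."""
    referenced = ENV_REF.findall(eval_text)
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return [name for name in dict.fromkeys(referenced) if name not in documented]


# Example inputs (made up for illustration).
refs = ["./cases.jsonl", "./template/", "/opt/shared/verify.sh"]
text = "api_key: ${{ API_KEY }}\nmodel: ${{MODEL}}\nretry_key: ${{ API_KEY }}"

absolute = find_absolute_paths(refs)
undocumented = find_undocumented_env_vars(text, documented={"MODEL"})
```

Here absolute would contain only the /opt/... entry, and undocumented would contain API_KEY once. Checking that files exist (check 2) would additionally need to join each relative reference with the eval file's directory.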

Output:

✓ my-eval.yaml
  ✓ All test data paths resolve (tests: ./cases.jsonl)
  ✓ Workspace template exists (workspace: ./template/)
  ✓ Code-grader scripts exist (./scripts/verify.sh)
  ✗ Target "claude" resolved from parent .agentv/targets.yaml — not self-contained
  ⚠ ENV_VAR referenced but not documented
  ✓ Git repo detected (commit: abc123)

1 error, 1 warning. Run with --fix to make this eval self-contained.

Fix mode

agentv eval validate my-eval.yaml --fix

What --fix does:

  1. Inlines targets — resolves use_target chains from parent .agentv/targets.yaml files and writes a local targets.yaml (or inlines into EVAL.yaml) with the fully resolved definitions
  2. Resolves relative paths — rewrites any paths that depend on parent directory structure to be relative to the eval file
  3. Reports what it fixed — prints each change made
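
One way step 1 could work is sketched below: follow each use_target link until a concrete definition is reached, fail on a cycle, and return the resolved mapping that would be written locally. This is a sketch under assumptions, not AgentV's code. It treats a target with use_target as a pure alias (no field merging), and the provider and model field names are hypothetical.

```python
def resolve_target(name, targets):
    """Follow use_target links from `name` to a concrete definition.

    Raises ValueError on a circular chain and KeyError on a missing target.
    """
    seen = []
    current = name
    while True:
        if current in seen:
            cycle = " -> ".join(seen + [current])
            raise ValueError(f"circular use_target chain: {cycle}")
        seen.append(current)
        definition = targets[current]
        if "use_target" not in definition:
            return dict(definition)  # copy so callers cannot mutate the parent data
        current = definition["use_target"]


def inline_targets(used, parent_targets):
    """Build the local targets mapping with fully resolved definitions."""
    return {name: resolve_target(name, parent_targets) for name in used}


# Example parent targets (field names are illustrative only).
parent = {
    "default": {"use_target": "claude"},
    "claude": {"provider": "anthropic", "model": "example-model"},
    "a": {"use_target": "b"},
    "b": {"use_target": "a"},
}

local = inline_targets(["default"], parent)
```

Because the resolved definitions no longer contain use_target, running inline_targets again on its own output returns the same mapping, which fits the idempotence property. Resolving "a" in the example raises ValueError because a and b point at each other.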

Properties:

  • Modifies the eval in place — no separate output directory
  • Idempotent — running twice changes nothing
  • Composes with git — commit the fixed eval and it's portable
  • Does NOT copy workspace templates or scripts — those must already be alongside the eval

Exit codes

  • 0 — all checks pass (or all fixable issues were fixed with --fix)
  • 1 — validation errors remain (the eval is non-portable; with --fix, only errors that could not be auto-fixed count)
  • Warnings don't fail validation
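
The exit-code rules above could be expressed roughly as follows. The Finding type and its fields are hypothetical. The sketch also assumes that without --fix any error yields exit code 1, and that with --fix every fixable error was actually repaired.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    severity: str          # "error" or "warning"
    fixable: bool = False  # True if --fix is able to repair this finding


def exit_code(findings, fix=False):
    """Return 0 when no blocking errors remain, otherwise 1."""
    for finding in findings:
        if finding.severity != "error":
            continue  # warnings never fail validation
        if fix and finding.fixable:
            continue  # assumed to have been repaired by --fix
        return 1
    return 0


# Example: one inherited-target error (fixable) and one env var warning.
findings = [Finding("error", fixable=True), Finding("warning")]
without_fix = exit_code(findings)
with_fix = exit_code(findings, fix=True)
```

With these inputs, without_fix is 1 and with_fix is 0, so a CI job can gate on the process exit status alone.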

Non-goals

  • Not a schema validator (that already exists via eval schema validation)
  • Not a bundle compiler — there is no separate output artifact. If validation passes, the eval directory IS the bundle.
  • Not enforcing conventions on all evals — this is opt-in for benchmark-grade portability
  • Not auto-copying workspace templates or scripts — those must be local already. --fix only handles target resolution and path rewriting.

Related

Acceptance signals

  • agentv eval validate checks all listed conventions
  • agentv eval validate --fix inlines resolved targets and rewrites paths in place
  • --fix is idempotent
  • Clear error messages pointing to the specific non-portable reference
  • Exit code 0 when eval is fully self-contained
  • Integrable into CI (exit code gating)
