feat: add --assertion-type filter to agentv eval run #616

@christso

Objective

Add an --assertion-type filter to agentv eval run so users can run only the assertion types they choose during an evaluation. This makes it possible to run cheap deterministic judges without invoking expensive LLM judges.

Motivation

Currently, agentv eval run executes every assertion in a test's assertions: array. The agentv-bench run_eval.py script works around this by implementing its own trigger detection outside agentv's eval pipeline. With --assertion-type, users can get the same selective execution natively:

# Only run code-judge assertions (deterministic, zero cost)
agentv eval run EVAL.yaml --assertion-type code-judge

# Only run skill-trigger assertions
agentv eval run EVAL.yaml --assertion-type skill-trigger

# Run everything except LLM judges
agentv eval run EVAL.yaml --exclude-assertion-type llm-judge

Design latitude

  • Flag naming: --assertion-type vs --judge-type vs --filter-assertion
  • Whether to support include-only, exclude-only, or both
  • Whether filtering applies per-test or globally
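
One way the open questions above could be resolved is sketched below: both an include flag and an exclude flag, each repeatable, applied globally to the run. This is only an illustration using Python's argparse, not agentv's actual CLI code; the parser name, the dest names include_types and exclude_types, and the choice to make the flags repeatable are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags proposed in this issue."""
    parser = argparse.ArgumentParser(prog="agentv-eval-run-sketch")
    parser.add_argument("eval_file")
    # Repeatable include filter: --assertion-type code-judge --assertion-type contains
    parser.add_argument(
        "--assertion-type",
        action="append",
        dest="include_types",
        metavar="TYPE",
    )
    # Repeatable exclude filter: --exclude-assertion-type llm-judge
    parser.add_argument(
        "--exclude-assertion-type",
        action="append",
        dest="exclude_types",
        metavar="TYPE",
    )
    return parser


def parse_filters(argv: list[str]) -> tuple[list[str], list[str]]:
    """Return (include, exclude); empty lists mean no filter was given."""
    args = build_parser().parse_args(argv)
    return (args.include_types or [], args.exclude_types or [])
```

With this shape, omitting both flags yields two empty lists, which a caller can treat as "run everything".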

Acceptance signals

  • agentv eval run EVAL.yaml --assertion-type code-judge executes only code-judge assertions and skips all other types (llm-judge, contains, etc.)
  • Tests with no matching assertions are skipped (or report N/A)
  • Existing behavior unchanged when no filter is specified
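
The acceptance signals above can be read as a small filtering rule. The sketch below is a minimal illustration, assuming each assertion is a dict with a "type" key and each test is a dict with "id" and "assertions" keys; these shapes, and the function names select_assertions and plan_test, are hypothetical and not taken from agentv's code.

```python
from typing import Optional


def select_assertions(
    assertions: list[dict],
    include: Optional[list[str]] = None,
    exclude: Optional[list[str]] = None,
) -> list[dict]:
    """Keep assertions whose type passes the include/exclude filters."""
    include_set = set(include or [])
    exclude_set = set(exclude or [])
    selected = []
    for assertion in assertions:
        kind = assertion["type"]  # assumed field name
        if include_set and kind not in include_set:
            continue  # an include filter is active and this type is not listed
        if kind in exclude_set:
            continue  # explicitly excluded
        selected.append(assertion)
    return selected


def plan_test(
    test: dict,
    include: Optional[list[str]] = None,
    exclude: Optional[list[str]] = None,
) -> dict:
    """Decide whether a test runs and with which assertions."""
    selected = select_assertions(test["assertions"], include, exclude)
    filtering = bool(include) or bool(exclude)
    # A test is skipped only when a filter is active and nothing matched;
    # with no filter, behavior is unchanged and every assertion runs.
    status = "skipped" if filtering and not selected else "run"
    return {"id": test["id"], "status": status, "assertions": selected}
```

Reporting N/A instead of skipping would only change the status string in plan_test; the selection logic stays the same.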

Non-goals

  • Changing the orchestrator's assertion execution model beyond filtering
  • Supporting regex or glob patterns for assertion types in v1
