Problem
Running evals authored in TypeScript today requires writing a run.ts that imports case modules and calls evaluate(...) explicitly. YAML evals are auto-discovered by the CLI; TS evals aren't.
Evidence: apps/cli/src/commands/eval/commands/run.ts:23 describes the positional args as "Path(s) or glob(s) to evaluation .yaml file(s)". Globbing works for YAML / JSONL / JSON only. The sdk-config-file example covers agentv.config.ts discovery but that's the config, not eval case files.
That extra boilerplate is the main reason the TS authoring path feels second-class next to YAML.
Proposal
A discovery convention:
- CLI discovers **/*.eval.ts (and **/*.eval.js after a build step, if relevant) the same way it discovers EVAL.yaml / *.eval.yaml.
- Each discovered module default-exports (or named-exports) an EvalConfig value, and the CLI runs it with the same reporter, inspect, and compare tooling as YAML evals.
- Config discovery precedence and --filter / --tag / --only flags apply uniformly across YAML and TS evals.
- Runtime: use whatever loader the runtime supports for TS modules (Bun direct import, or tsx / jiti for Node). Document the expectation.
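A rough sketch of how the convention above could be applied, assuming nothing about agentv's internals. The helper names (isEvalModulePath, resolveEvalConfig, loadEvalConfig) and the EvalConfigLike stand-in type are hypothetical, and the rule for choosing a named export (first plain-object export) is only one possible reading of "default-exports (or named-exports)".

```typescript
// Hypothetical stand-in: the real EvalConfig shape is defined by agentv, not here.
type EvalConfigLike = Record<string, unknown>;

// Matches foo.eval.ts and built foo.eval.js, but not foo.eval.yaml or run.ts.
const EVAL_MODULE_PATTERN = /\.eval\.(ts|js)$/;

function isEvalModulePath(filePath: string): boolean {
  return EVAL_MODULE_PATTERN.test(filePath);
}

function isPlainObject(value: unknown): value is EvalConfigLike {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Prefer the default export; otherwise fall back to the first named export
// that is a plain object (an assumption, not a documented agentv rule).
function resolveEvalConfig(mod: Record<string, unknown>): EvalConfigLike | undefined {
  const defaultExport = mod["default"];
  if (isPlainObject(defaultExport)) return defaultExport;
  for (const [name, value] of Object.entries(mod)) {
    if (name !== "default" && isPlainObject(value)) return value;
  }
  return undefined;
}

// Dynamic import resolves .ts files directly under Bun; under Node, a loader
// such as tsx or jiti has to be active before this call can succeed.
async function loadEvalConfig(specifier: string): Promise<EvalConfigLike | undefined> {
  const mod = (await import(specifier)) as Record<string, unknown>;
  return resolveEvalConfig(mod);
}
```

Keeping path matching and export resolution as pure functions means the discovery rules can be unit-tested without touching the filesystem or a TS loader.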
Acceptance criteria
- agentv run picks up *.eval.ts files with no extra flags.
- A TS eval and a YAML eval in the same workspace produce identical trace / inspect output.
- Example under examples/features/ demonstrates a mixed YAML + TS suite.
- Docs updated with discovery rules and the runtime expectation for executing .ts files.
- --workers, --threshold, --tag, --exclude-tag, --filter, --retry-errors, --output, and the cache/output-dir behaviour all work identically for TS-authored suites.
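One way to make the "work identically" criterion hold by construction is to normalise YAML and TS evals into a single descriptor before any flag is applied. The sketch below illustrates this for --tag and --exclude-tag only. The DiscoveredEval and TagFilter shapes, the selectEvals helper, and the any-match semantics for --tag are assumptions for illustration, not agentv's actual behaviour.

```typescript
// Hypothetical common descriptor produced by both the YAML and the TS loaders.
interface DiscoveredEval {
  source: "yaml" | "ts";
  path: string;
  tags: string[];
}

// Hypothetical parsed form of the --tag / --exclude-tag flags.
interface TagFilter {
  include: string[]; // values passed via --tag
  exclude: string[]; // values passed via --exclude-tag
}

// The filter never inspects `source`, so YAML and TS suites are treated the same.
// Assumed semantics: an eval is kept if it has any included tag (or no --tag
// was given) and none of the excluded tags.
function selectEvals(evals: DiscoveredEval[], filter: TagFilter): DiscoveredEval[] {
  return evals.filter((e) => {
    const included =
      filter.include.length === 0 || e.tags.some((t) => filter.include.includes(t));
    const excluded = e.tags.some((t) => filter.exclude.includes(t));
    return included && !excluded;
  });
}
```

The same pattern would extend to the other flags: once both authoring paths yield the same descriptor, the rest of the run pipeline has no reason to branch on where an eval came from.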
Non-goals
- EvalConfig fields: depends on "Close programmatic TS API gap: add beforeAll, budgetUsd, and multi-turn fields to EvalConfig / EvalTestInput" (#1115) for beforeAll / budgetUsd / multi-turn parity first.
- The evaluate() function; .eval.ts files use it under the hood.
Depends on
- #1115: beforeAll / budgetUsd / multi-turn support.
Motivation
Closes the DX gap with hand-rolled TS harnesses while keeping agentv's framework advantages: cost tracking, inspect, compare, grader variety. The current cliff is: "use YAML, or write your own runner script." Neither is what a TS-first user wants.