Problem
The programmatic SDK entry point `evaluate({ tests, target, ... })` is the path TypeScript-first eval authors use instead of YAML. It works today (see `examples/features/sdk-programmatic-api/evaluate.ts`) and gives real type-safety wins that YAML can't match: autocomplete, refactor rename, and compile errors on typos.
But several first-class YAML features are not exposed on the public programmatic types, which forces authors to fall back to YAML the moment they need any of them.
Verified against current `main`:
- `EvalConfig` (`packages/core/src/evaluation/evaluate.ts:138`) does not expose `beforeAll` or `budgetUsd`. Both are supported in YAML (`execution.total_budget_usd`; suite-level `beforeAll` command).
- `EvalTestInput` (`packages/core/src/evaluation/evaluate.ts:83`) does not expose `turns` or `aggregation`. Both are typed on the internal `EvalTest` and usable from YAML, but not from the programmatic API.
Proposal
Add the following to the public TS types, wiring through to the same orchestrator paths used by the YAML loader:
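`EvalConfig`
- `beforeAll?: string | readonly string[]`: command(s) to run before the suite. Same semantics as YAML `beforeAll`.
- `budgetUsd?: number`: suite-level cost cap. Same semantics as YAML `execution.total_budget_usd` (renamed to `execution.budget_usd` if #1114, "Rename execution.total_budget_usd → execution.budget_usd (and suite.totalBudgetUsd → suite.budgetUsd)", lands first).

`EvalTestInput`
- `turns?: readonly ...[]`: multi-turn conversation definition (match the existing internal `EvalTest` shape).
- `aggregation?: ...`: aggregation strategy across turns (match existing internal shape).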
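As a rough sketch of the additions listed above, the four fields could look like this on the public types. The interface names `EvalConfigAdditions` and `EvalTestInputAdditions` are hypothetical and exist only for illustration; the turn and aggregation shapes are elided in this issue, so they are left as opaque placeholders rather than guessed. The command string and the budget value are made up.

```typescript
// Placeholders only: the issue elides the turn and aggregation shapes ("..."),
// so they stay opaque here instead of being guessed.
type TurnPlaceholder = unknown;
type AggregationPlaceholder = unknown;

// Proposed additions to EvalTestInput (existing fields omitted; name is hypothetical).
interface EvalTestInputAdditions {
  readonly turns?: readonly TurnPlaceholder[];
  readonly aggregation?: AggregationPlaceholder;
}

// Proposed additions to EvalConfig (existing fields omitted; name is hypothetical).
interface EvalConfigAdditions {
  readonly beforeAll?: string | readonly string[];
  readonly budgetUsd?: number;
}

// Illustrative values: the command and the cap are invented for this sketch.
const exampleConfig: EvalConfigAdditions = {
  beforeAll: ["echo preparing workspace"],
  budgetUsd: 5,
};

// All four fields are optional, so existing callers would keep compiling unchanged.
const untouchedConfig: EvalConfigAdditions = {};
const untouchedTest: EvalTestInputAdditions = {};
```

Keeping every new field optional is what would make the change additive: suites that never set them type-check exactly as before.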
Acceptance criteria
- `EvalConfig` accepts `beforeAll` and `budgetUsd`, and they route to the same code paths as YAML (including `budget_exceeded` emission on breach).
- `EvalTestInput` accepts `turns` and `aggregation`; a multi-turn test can be authored purely in TS with no YAML.
- Existing YAML loader behaviour unchanged; programmatic and YAML paths share the same underlying `EvalTest` / suite shape.
- One example added under `examples/features/` exercising all four fields in TS. (The `multi-turn-conversation` example can be cloned to a TS-authored variant.)
- Docs for the programmatic API list these fields as supported.
- No existing tests regress; new tests cover each field end-to-end via `evaluate()`.
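One way to read the "same code paths" criterion is that both entry points normalize into a single suite shape before the orchestrator sees it. The sketch below illustrates that idea only; it is not the actual loader. `SuiteOptions`, `YamlSuiteDoc`, `ProgrammaticConfig`, and every function name are hypothetical. Only the key names `beforeAll`, `budgetUsd`, `execution.total_budget_usd`, and `totalBudgetUsd` come from this issue, and treating a breach as "spend strictly above the cap" is an assumption.

```typescript
// Hypothetical shared suite shape that both paths would produce.
interface SuiteOptions {
  readonly beforeAll: readonly string[];
  readonly totalBudgetUsd?: number;
}

// Hypothetical view of the relevant YAML keys after parsing.
interface YamlSuiteDoc {
  readonly beforeAll?: string | readonly string[];
  readonly execution?: { readonly total_budget_usd?: number };
}

// The proposed programmatic fields.
interface ProgrammaticConfig {
  readonly beforeAll?: string | readonly string[];
  readonly budgetUsd?: number;
}

// Accept a single command or a list and always return a list.
function toCommandList(value: string | readonly string[] | undefined): readonly string[] {
  if (value === undefined) return [];
  return typeof value === "string" ? [value] : value;
}

// Single builder used by both entry points, so the resulting shape cannot drift.
function buildSuiteOptions(
  beforeAll: string | readonly string[] | undefined,
  budgetUsd: number | undefined,
): SuiteOptions {
  const commands = toCommandList(beforeAll);
  return budgetUsd === undefined
    ? { beforeAll: commands }
    : { beforeAll: commands, totalBudgetUsd: budgetUsd };
}

function fromYaml(doc: YamlSuiteDoc): SuiteOptions {
  return buildSuiteOptions(doc.beforeAll, doc.execution?.total_budget_usd);
}

function fromProgrammatic(config: ProgrammaticConfig): SuiteOptions {
  return buildSuiteOptions(config.beforeAll, config.budgetUsd);
}

// Assumed breach rule: exceeded only when spend is strictly above the cap.
function isBudgetExceeded(spentUsd: number, options: SuiteOptions): boolean {
  return options.totalBudgetUsd !== undefined && spentUsd > options.totalBudgetUsd;
}
```

A parity test for the new fields could then compare `fromYaml(...)` and `fromProgrammatic(...)` for equivalent inputs and assert that the two results are structurally equal.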
Non-goals
- Auto-discovery of `*.eval.ts` files (tracked separately).
- Renaming existing fields.
- Adding new grader types.
Motivation
Closing this gap makes the TS authoring path a real peer to YAML rather than a subset. Today a user who starts programmatically and hits any of these features has to rewrite their suite in YAML — a needless migration cliff.