Close programmatic TS API gap: add beforeAll, budgetUsd, and multi-turn fields to EvalConfig / EvalTestInput #1115

@christso

Problem

The programmatic SDK entry point evaluate({ tests, target, ... }) is the path TypeScript-first eval authors use instead of YAML. It works today (see examples/features/sdk-programmatic-api/evaluate.ts) and gives real type-safety wins — autocomplete, refactor rename, compile errors on typos — that YAML can't match.

But several first-class YAML features are not exposed on the public programmatic types, which forces authors to fall back to YAML the moment they need any of them.

Verified against current main:

  • EvalConfig (packages/core/src/evaluation/evaluate.ts:138) does not expose beforeAll or budgetUsd — both are supported in YAML (execution.total_budget_usd; suite-level beforeAll command).
  • EvalTestInput (packages/core/src/evaluation/evaluate.ts:83) does not expose turns or aggregation — both are typed on the internal EvalTest and usable from YAML, but not from the programmatic API.

Proposal

Add the following to the public TS types, wiring through to the same orchestrator paths used by the YAML loader:

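EvalConfig

  • beforeAll?: ... (suite-level command run before the tests; match the suite-level beforeAll command in YAML).
  • budgetUsd?: ... (total budget cap in USD; match execution.total_budget_usd in YAML).

EvalTestInput

  • turns?: readonly ...[] (multi-turn conversation definition; match the existing internal EvalTest shape).
  • aggregation?: ... (aggregation strategy across turns; match the existing internal shape).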
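As a rough sketch of what these additions could look like on the public types: every shape below is a stand-in invented for illustration. `TurnSketch`, `AggregationSketch`, the `id` field, the `string` type for `beforeAll`, and the `toYamlShape` helper are assumptions, not the definitions in packages/core/src/evaluation/evaluate.ts. The real fields should mirror the internal EvalTest shape.

```typescript
// Hypothetical stand-ins: the real turn and aggregation shapes are whatever
// the internal EvalTest already declares.
interface TurnSketch {
  input: string;
  expectedOutput?: string;
}
type AggregationSketch = "mean" | "min" | "max";

// Only the proposed fields are shown; existing EvalConfig fields are omitted.
interface EvalConfigSketch {
  beforeAll?: string; // assumed here to be a single shell command
  budgetUsd?: number; // total budget cap in USD
}

// Only the proposed fields plus a hypothetical id are shown.
interface EvalTestInputSketch {
  id: string;
  turns?: readonly TurnSketch[];
  aggregation?: AggregationSketch;
}

// Assumed mapping from the camelCase TS fields onto the YAML-side names
// mentioned in this issue (execution.total_budget_usd, suite-level beforeAll).
function toYamlShape(config: EvalConfigSketch): {
  beforeAll?: string;
  execution: { total_budget_usd?: number };
} {
  return {
    beforeAll: config.beforeAll,
    execution: { total_budget_usd: config.budgetUsd },
  };
}

// A multi-turn test authored purely in TS, type-checked against the sketch.
const multiTurn: EvalTestInputSketch = {
  id: "greeting-then-followup",
  turns: [{ input: "Hello" }, { input: "What did I just say?", expectedOutput: "Hello" }],
  aggregation: "min",
};
```

Routing both entry points through one normalized shape, as `toYamlShape` does here, is one way to keep the programmatic and YAML paths from drifting apart.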

Acceptance criteria

  • EvalConfig accepts beforeAll and budgetUsd and they route to the same code paths as YAML (including budget_exceeded emission on breach).
  • EvalTestInput accepts turns and aggregation; a multi-turn test can be authored purely in TS with no YAML.
  • Existing YAML loader behaviour unchanged; programmatic and YAML paths share the same underlying EvalTest / suite shape.
  • One example added under examples/features/ exercising all four fields in TS. (The multi-turn-conversation example can be cloned to a TS-authored variant.)
  • Docs for the programmatic API list these fields as supported.
  • No existing tests regress; new tests cover each field end-to-end via evaluate().
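To make the criteria above concrete, here is a toy, self-contained runner showing the behaviour an end-to-end test might assert. It is not the real evaluate(): `evaluateSketch`, the callback form of `beforeAll`, the per-call `costUsd`, the exact-match scoring, and the point at which the budget is checked are all assumptions made so the example can run on its own.

```typescript
// Stand-in runner, NOT the real evaluate(): every name here is hypothetical.
interface Turn { input: string; expected: string }
interface TestInput { id: string; turns: readonly Turn[]; aggregation?: "mean" | "min" }
interface Config {
  tests: readonly TestInput[];
  target: (input: string) => { output: string; costUsd: number };
  beforeAll?: () => void; // the real field is described as a command, not a callback
  budgetUsd?: number;
}
interface Result { id: string; score?: number; status: "ok" | "budget_exceeded" }

function evaluateSketch(config: Config): Result[] {
  // Suite-level setup runs once, before any test.
  config.beforeAll?.();
  let spentUsd = 0;
  const results: Result[] = [];
  for (const test of config.tests) {
    // Assumed policy: once the budget is used up, remaining tests are
    // reported as budget_exceeded instead of being run.
    if (config.budgetUsd !== undefined && spentUsd >= config.budgetUsd) {
      results.push({ id: test.id, status: "budget_exceeded" });
      continue;
    }
    // Score each turn by exact match against the expected output.
    const scores: number[] = [];
    for (const turn of test.turns) {
      const { output, costUsd } = config.target(turn.input);
      spentUsd += costUsd;
      scores.push(output === turn.expected ? 1 : 0);
    }
    // Aggregate per-turn scores; mean is the assumed default.
    let score = 0;
    if (scores.length > 0) {
      score = test.aggregation === "min"
        ? Math.min(...scores)
        : scores.reduce((a, b) => a + b, 0) / scores.length;
    }
    results.push({ id: test.id, score, status: "ok" });
  }
  return results;
}
```

A test against the real API would follow the same outline: author a multi-turn suite in TS, set a small budget, and assert both the aggregated score and the budget_exceeded signal.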

Non-goals

  • Auto-discovery of *.eval.ts files (tracked separately).
  • Renaming existing fields.
  • Adding new grader types.

Motivation

Closing this gap makes the TS authoring path a real peer to YAML rather than a subset. Today a user who starts programmatically and hits any of these features has to rewrite their suite in YAML — a needless migration cliff.

Metadata

  • Labels: in-progress (claimed by an agent; do not duplicate work), sdk (relates to the TypeScript SDK: programmatic API, CLI flags, schema)
  • Project status: Done
