feat(core): add beforeAll, budgetUsd, turns, aggregation to programmatic API#1119

Merged
christso merged 1 commit into main from feat/1115-programmatic-api
Apr 16, 2026

Conversation

@christso
Collaborator

Summary

Closes the programmatic TS API gap by exposing four YAML-only features on the public SDK types.

Closes #1115

Changes

Types (packages/core/src/evaluation/evaluate.ts)

  • EvalConfig.beforeAll: string | string[] — command(s) run before suite, converted to WorkspaceHookConfig via sh -c
  • EvalConfig.budgetUsd: number — suite cost cap, passed to orchestrator
  • EvalTestInput.turns: ConversationTurnInput[] — multi-turn conversation definition
  • EvalTestInput.aggregation: 'mean' | 'min' | 'max' — score aggregation across turns
  • EvalTestInput.mode: 'conversation' — auto-inferred when turns[] present
  • EvalTestInput.input: Made optional (not needed in conversation mode)
  • New ConversationTurnInput interface exported from @agentv/core
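
To make the new fields concrete, here is a minimal sketch of how a multi-turn test might be declared and how the three aggregation strategies could combine per-turn scores. The interfaces are a hypothetical local mirror of the fields listed above, not the definitions in packages/core/src/evaluation/evaluate.ts. The `id` and `tests` fields, the string-only `input` on turns, and the `aggregateScores` helper are assumptions made for illustration.

```typescript
// Hypothetical local mirror of the shapes described in this PR.
// The real types are exported from @agentv/core and may differ.
type ConversationAggregation = 'mean' | 'min' | 'max';

interface ConversationTurnInput {
  input: string;           // assumed: the real type also accepts message arrays
  expectedOutput?: string; // the test list mentions expectedOutput on turns
}

interface EvalTestInput {
  id: string;              // assumed identifier field
  input?: string;          // optional: omitted when turns[] is used
  mode?: 'conversation';   // inferred when turns[] is present
  turns?: ConversationTurnInput[];
  aggregation?: ConversationAggregation;
}

interface EvalConfig {
  beforeAll?: string | string[]; // command(s) run before the suite
  budgetUsd?: number;            // suite cost cap
  tests: EvalTestInput[];        // assumed field name
}

// Illustrative helper (not part of the SDK): combine per-turn scores.
function aggregateScores(
  scores: number[],
  strategy: ConversationAggregation,
): number {
  if (scores.length === 0) {
    throw new Error('no turn scores to aggregate');
  }
  if (strategy === 'min') return Math.min(...scores);
  if (strategy === 'max') return Math.max(...scores);
  return scores.reduce((sum, s) => sum + s, 0) / scores.length; // 'mean'
}

// A suite that uses all four new fields; no top-level `input` is needed.
const config: EvalConfig = {
  beforeAll: ['npm ci', 'npm run build'],
  budgetUsd: 2.5,
  tests: [
    {
      id: 'refund-flow',
      aggregation: 'min',
      turns: [
        { input: 'I want a refund.' },
        { input: 'Order 1234.', expectedOutput: 'Refund issued.' },
      ],
    },
  ],
};
```

With `aggregation: 'min'`, a conversation scores only as well as its weakest turn, which suits flows where any single bad reply is a failure. `'mean'` is the more forgiving choice.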

Conversion Logic

  • toBeforeAllHook(): Converts string/array to WorkspaceHookConfig
  • toMessageArray()/extractQuestion(): Extracted helpers for input normalization
  • convertAssertions(): Extracted from inline code for reuse in turn conversion
  • Turn inputs kept as TestMessageContent (matching YAML parser behavior)
  • Validation: throws if input missing on non-conversation test
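
The conversion and validation steps above could look roughly like the following sketch. It is not the code merged in this PR. The `WorkspaceHookConfig` shape (a `command` argv array), the use of `' && '` to join an array of commands, the `'single'` mode label, and the `resolveMode` helper name are all assumptions. Only the `sh -c` wrapping, the mode inference from `turns[]`, and the missing-input error come from the description.

```typescript
// Assumed shape: the real WorkspaceHookConfig in @agentv/core may
// carry additional or differently named fields.
interface WorkspaceHookConfig {
  command: string[];
}

// Wrap the beforeAll command(s) so they run through `sh -c`.
// Joining array entries with ' && ' is an assumption of this sketch.
function toBeforeAllHook(beforeAll: string | string[]): WorkspaceHookConfig {
  const script = Array.isArray(beforeAll) ? beforeAll.join(' && ') : beforeAll;
  return { command: ['sh', '-c', script] };
}

// Minimal stand-in for the fields of EvalTestInput that matter here.
interface TestLike {
  id: string;
  input?: string;
  mode?: 'conversation';
  turns?: unknown[];
}

// Infer conversation mode from turns[] and reject a non-conversation
// test that has no input. 'single' is a hypothetical label.
function resolveMode(test: TestLike): 'conversation' | 'single' {
  const hasTurns = (test.turns?.length ?? 0) > 0;
  const isConversation = test.mode === 'conversation' || hasTurns;
  if (!isConversation && test.input === undefined) {
    throw new Error(`Test "${test.id}" requires input unless turns are provided`);
  }
  return isConversation ? 'conversation' : 'single';
}
```

Running the commands through a shell keeps the SDK surface to a plain string, but it also means each string is interpreted by `sh`, so quoting and environment expansion follow shell rules.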

Example

  • examples/features/sdk-programmatic-api-advanced/ — exercises all four new fields

Tests (packages/core/test/evaluation/evaluate-programmatic-api.test.ts)

11 tests covering: budgetUsd, turns with explicit/inferred mode, expectedOutput on turns, message array turns, aggregation, beforeAll (string and array), combined usage, standard single-turn, and missing-input validation.

E2E Verification

Green (all new tests pass):

$ bun test packages/core/test/evaluation/evaluate-programmatic-api.test.ts
 11 pass, 0 fail

Test Results

472/472 pass, 0 failures (workspace total)

…tic API (#1115)

Close the programmatic TS API gap by adding four YAML-first features to
the public SDK types:

- EvalConfig.beforeAll: string | string[] — suite-level setup command
- EvalConfig.budgetUsd: number — cost cap passed to orchestrator
- EvalTestInput.turns: ConversationTurnInput[] — multi-turn conversations
- EvalTestInput.aggregation: ConversationAggregation — score strategy
- EvalTestInput.mode: "conversation" — inferred automatically from turns[]

New ConversationTurnInput type mirrors YAML turn structure with camelCase.
Input field on EvalTestInput is now optional (omit when using turns[]).

Includes 10 new tests, an advanced SDK example, and full lint/build pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
christso merged commit 0ee2e93 into main on Apr 16, 2026
4 checks passed
christso deleted the feat/1115-programmatic-api branch on April 16, 2026 at 03:40
Development

Successfully merging this pull request may close these issues.

Close programmatic TS API gap: add beforeAll, budgetUsd, and multi-turn fields to EvalConfig / EvalTestInput