feat(core): add beforeAll, budgetUsd, turns, aggregation to programmatic API #1119
Merged
…tic API (#1115)

Close the programmatic TS API gap by adding four YAML-first features to the public SDK types:

- EvalConfig.beforeAll (string | string[]): suite-level setup command
- EvalConfig.budgetUsd (number): cost cap passed to orchestrator
- EvalTestInput.turns (ConversationTurnInput[]): multi-turn conversations
- EvalTestInput.aggregation (ConversationAggregation): score strategy
- EvalTestInput.mode ("conversation"): inferred automatically from turns[]

New ConversationTurnInput type mirrors YAML turn structure with camelCase. Input field on EvalTestInput is now optional (omit when using turns[]).

Includes 10 new tests, an advanced SDK example, and full lint/build pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Closes the programmatic TS API gap by exposing four YAML-only features on the public SDK types.
Closes #1115
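For orientation, here is a minimal sketch of what a suite using the four fields might look like. The interfaces are local stand-ins reconstructed from this description, not the actual exports of `@agentv/core`. The `id` and `tests` field names, the exact shape of a turn, and all sample values are assumptions made for illustration.

```typescript
// Local stand-in types reconstructed from the PR description.
// They are NOT imported from @agentv/core; the real definitions may differ.
type ConversationAggregation = "mean" | "min" | "max";

interface ConversationTurnInput {
  input: string;           // assumed: one user message per turn
  expectedOutput?: string; // assumed: optional reference answer for the turn
}

interface EvalTestInput {
  id: string;                            // hypothetical identifier field
  input?: string;                        // optional when turns[] is used
  turns?: ConversationTurnInput[];       // multi-turn conversation definition
  aggregation?: ConversationAggregation; // how per-turn scores are combined
  mode?: "conversation";                 // may be omitted when turns[] is present
}

interface EvalConfig {
  beforeAll?: string | string[]; // suite-level setup command(s)
  budgetUsd?: number;            // suite cost cap
  tests: EvalTestInput[];        // hypothetical field name
}

const config: EvalConfig = {
  beforeAll: ["npm ci", "npm run build"],
  budgetUsd: 2.5,
  tests: [
    // Standard single-turn test: input is given, no turns.
    { id: "single-turn", input: "What is 2 + 2?" },
    // Conversation test: input is omitted and turns[] carries the dialogue.
    {
      id: "multi-turn",
      turns: [
        { input: "My name is Ada.", expectedOutput: "Nice to meet you, Ada." },
        { input: "What is my name?", expectedOutput: "Ada" },
      ],
      aggregation: "min",
    },
  ],
};

console.log(JSON.stringify(config, null, 2));
```

The second test omits both `input` and `mode`, which is the case the description says is handled by inferring conversation mode from `turns[]`.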
Changes
Types (packages/core/src/evaluation/evaluate.ts)
- `EvalConfig.beforeAll` (`string | string[]`): command(s) run before the suite, converted to `WorkspaceHookConfig` via `sh -c`
- `EvalConfig.budgetUsd` (`number`): suite cost cap, passed to the orchestrator
- `EvalTestInput.turns` (`ConversationTurnInput[]`): multi-turn conversation definition
- `EvalTestInput.aggregation` (`'mean' | 'min' | 'max'`): score aggregation across turns
- `EvalTestInput.mode` (`'conversation'`): auto-inferred when `turns[]` is present
- `EvalTestInput.input`: made optional (not needed in conversation mode)
- `ConversationTurnInput` interface exported from `@agentv/core`
Conversion Logic
- `toBeforeAllHook()`: converts string/array to `WorkspaceHookConfig`
- `toMessageArray()` / `extractQuestion()`: extracted helpers for input normalization
- `convertAssertions()`: extracted from inline code for reuse in turn conversion
- `TestMessageContent` (matching YAML parser behavior)
- `input` missing on non-conversation test
Example
- `examples/features/sdk-programmatic-api-advanced/`: exercises all four new fields
Tests (packages/core/test/evaluation/evaluate-programmatic-api.test.ts)
11 tests covering: budgetUsd, turns with explicit/inferred mode, expectedOutput on turns, message array turns, aggregation, beforeAll (string and array), combined usage, standard single-turn, and missing-input validation.
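The conversion and validation behavior listed under Changes can be sketched roughly as follows. This is not the code from `evaluate.ts`: every name ending in `Sketch`, the `command` field of the hook, the `"single"` mode label, joining array commands with `&&`, and defaulting to `mean` are all assumptions, since the PR text does not show those details.

```typescript
// Hypothetical stand-in: the real WorkspaceHookConfig shape is not shown in this PR text.
interface HookSketch {
  command: string[]; // argv form: ["sh", "-c", "<script>"]
}

type AggregationSketch = "mean" | "min" | "max";

interface TestSketch {
  input?: string;
  turns?: { input: string }[];
  mode?: "conversation";
}

// Wrap one or more shell commands in `sh -c`.
// Assumption: array entries are chained with "&&" so a failing step stops the rest.
function toBeforeAllHookSketch(beforeAll: string | string[]): HookSketch {
  const script = Array.isArray(beforeAll) ? beforeAll.join(" && ") : beforeAll;
  return { command: ["sh", "-c", script] };
}

// Infer conversation mode from turns[] and reject single-turn tests without input.
function resolveModeSketch(test: TestSketch): "conversation" | "single" {
  const hasTurns = test.turns !== undefined && test.turns.length > 0;
  if (test.mode === "conversation" || hasTurns) {
    return "conversation";
  }
  if (test.input === undefined) {
    throw new Error("input is required for non-conversation tests");
  }
  return "single";
}

// Combine per-turn scores with the chosen strategy.
// Assumption: "mean" is the default when no strategy is given.
function aggregateScoresSketch(
  scores: number[],
  strategy: AggregationSketch = "mean",
): number {
  if (scores.length === 0) {
    throw new Error("no turn scores to aggregate");
  }
  switch (strategy) {
    case "min":
      return Math.min(...scores);
    case "max":
      return Math.max(...scores);
    case "mean":
      return scores.reduce((sum, s) => sum + s, 0) / scores.length;
  }
}

console.log(toBeforeAllHookSketch(["npm ci", "npm run build"]).command);
console.log(resolveModeSketch({ turns: [{ input: "hi" }] }));
console.log(aggregateScoresSketch([0.5, 1], "min"));
```

Choosing `min` makes a conversation only as good as its weakest turn, while `mean` lets a strong turn offset a weak one; which is appropriate depends on the suite.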
E2E Verification
Green (all new tests pass):
Test Results
472/472 pass, 0 failures (workspace total)