
BREAKING: consolidate 'metrics' and 'custom_evaluators' into evaluators#149

Merged
krisztianfekete merged 3 commits into
mainfrom
peterj/consolidatemetricsandeval
May 15, 2026


@peterj peterj commented May 14, 2026

Warning

Breaking change. The eval config shape has changed. Existing
YAML config files, JSON API payloads, and MCP evaluate-sessions
requests need migration. See the migration guide below.

Previously, the config had a list of strings (metrics) and a separate custom_evaluators field. Built-in metrics were referenced only by name through metrics, and only run-level configuration applied to them, i.e. there was no way to configure an individual built-in evaluator.

agentevals now accepts a single canonical evaluation config field, evaluators, and the evaluation config requires it.

The legacy metrics field has been removed, and legacy top-level config shapes are no longer accepted. Requests and config files must now define built-in and custom evaluators uniformly under evaluators[].

Before:

{
  "metrics": ["tool_trajectory_avg_score"]
}

After:

{
  "evaluators": [
    { "name": "tool_trajectory_avg_score", "type": "builtin" }
  ]
}

This also means that legacy config fields such as customEvaluators, and top-level builtin overrides like judgeModel, threshold, and trajectoryMatchType, must be migrated into the corresponding evaluator entries inside evaluators[].

Migration

YAML eval config files

Before:

metrics:
  - tool_trajectory_avg_score
judge_model: gemini-2.5-flash
threshold: 0.8
trajectory_match_type: EXACT
custom_evaluators:
  - { name: my_eval, type: code, path: ./my_eval.py }

After:

evaluators:
  - name: tool_trajectory_avg_score
    type: builtin
    judge_model: gemini-2.5-flash
    threshold: 0.8
    trajectory_match_type: EXACT
  - name: my_eval
    type: code
    path: ./my_eval.py
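
For configs that are already loaded into a dict, the YAML migration above could be scripted roughly as follows. This is a minimal sketch, not part of agentevals: the helper name migrate_legacy_config is hypothetical, and it assumes that the old run-level overrides should be copied onto every built-in evaluator and that any other top-level keys pass through unchanged.

```python
from copy import deepcopy

# Top-level overrides named in the migration guide (snake_case YAML form).
LEGACY_BUILTIN_OVERRIDES = ("judge_model", "threshold", "trajectory_match_type")
LEGACY_LIST_FIELDS = ("metrics", "custom_evaluators")


def migrate_legacy_config(legacy: dict) -> dict:
    """Convert a legacy eval config dict into the evaluators-only shape.

    Hypothetical helper for illustration; agentevals does not ship it.
    """
    overrides = {k: legacy[k] for k in LEGACY_BUILTIN_OVERRIDES if k in legacy}

    evaluators = []
    for name in legacy.get("metrics", []):
        # Assumption: each built-in metric inherits the old run-level overrides.
        evaluators.append({"name": name, "type": "builtin", **overrides})

    # Custom evaluators already carry name/type/path, so they move over as-is.
    evaluators.extend(deepcopy(legacy.get("custom_evaluators", [])))

    # Assumption: keys unrelated to the migration are kept untouched.
    dropped = set(LEGACY_BUILTIN_OVERRIDES) | set(LEGACY_LIST_FIELDS)
    migrated = {k: deepcopy(v) for k, v in legacy.items() if k not in dropped}
    migrated["evaluators"] = evaluators
    return migrated


legacy = {
    "metrics": ["tool_trajectory_avg_score"],
    "judge_model": "gemini-2.5-flash",
    "threshold": 0.8,
    "trajectory_match_type": "EXACT",
    "custom_evaluators": [
        {"name": "my_eval", "type": "code", "path": "./my_eval.py"},
    ],
}
migrated = migrate_legacy_config(legacy)
```

With the "Before" config from this section as input, migrated has the same content as the "After" YAML. If a config lists several metrics that need different thresholds, the per-evaluator values would have to be edited by hand afterwards.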

/api/evaluate, /api/evaluate/stream, /api/evaluate/json, /api/runs

Before:

{ "metrics": ["tool_trajectory_avg_score"], "judgeModel": "gemini-2.5-flash" }

After:

{ "evaluators": [
    { "name": "tool_trajectory_avg_score", "type": "builtin",
      "judgeModel": "gemini-2.5-flash" }
  ]
}

Requests that still contain legacy keys are now rejected with a 4xx response and the error "Extra inputs are not permitted".
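
Since legacy keys are rejected by the server, a client might want to catch them before sending a request. The following is a rough sketch of such a pre-flight check. The function names and the exact key list are assumptions made for illustration (the list is assembled from the fields named in this PR, in both camelCase and snake_case); the server's own validation remains authoritative.

```python
# Legacy top-level keys mentioned in this PR, in both naming styles.
# Assumption: this list is illustrative and may not be exhaustive.
LEGACY_TOP_LEVEL_KEYS = frozenset({
    "metrics",
    "customEvaluators", "custom_evaluators",
    "judgeModel", "judge_model",
    "threshold",
    "trajectoryMatchType", "trajectory_match_type",
})


def find_legacy_keys(payload: dict) -> list[str]:
    """Return the legacy keys present at the top level of a payload."""
    # Only the top level is checked: inside evaluators[] entries, keys such
    # as judgeModel are the new, valid location for these settings.
    return sorted(k for k in payload if k in LEGACY_TOP_LEVEL_KEYS)


def check_payload(payload: dict) -> None:
    """Raise ValueError if the payload still uses the legacy shape."""
    legacy_keys = find_legacy_keys(payload)
    if legacy_keys:
        raise ValueError(
            f"legacy top-level keys {legacy_keys}; move them into evaluators[]"
        )
    if "evaluators" not in payload:
        raise ValueError("payload must define 'evaluators'")


old_payload = {
    "metrics": ["tool_trajectory_avg_score"],
    "judgeModel": "gemini-2.5-flash",
}
new_payload = {
    "evaluators": [
        {
            "name": "tool_trajectory_avg_score",
            "type": "builtin",
            "judgeModel": "gemini-2.5-flash",
        }
    ]
}
```

Calling check_payload(new_payload) returns without error, while check_payload(old_payload) raises a ValueError naming the keys that need to move.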

MCP /api/streaming/evaluate-sessions

Same shape as /api/evaluate above. The old metrics / judge_model
/ trajectory_match_type top-level fields are rejected.

CLI

agentevals list-metrics has been removed. Use
agentevals evaluator list --source builtin instead.

…s field

Signed-off-by: Peter Jausovec <peter.jausovec@solo.io>
@peterj peterj changed the title consolidate 'metrics' and 'custom_evaluators' into a single field consolidate 'metrics' and 'custom_evaluators' into evaluators May 14, 2026
@peterj peterj requested a review from krisztianfekete May 14, 2026 18:45
peterj and others added 2 commits May 14, 2026 11:46
Signed-off-by: Peter Jausovec <peter.jausovec@solo.io>
@krisztianfekete krisztianfekete changed the title consolidate 'metrics' and 'custom_evaluators' into evaluators BREAKING: consolidate 'metrics' and 'custom_evaluators' into evaluators May 15, 2026
@krisztianfekete krisztianfekete merged commit b93ab07 into main May 15, 2026
10 checks passed
@krisztianfekete krisztianfekete deleted the peterj/consolidatemetricsandeval branch May 15, 2026 08:19