BREAKING: consolidate 'metrics' and 'custom_evaluators' into `evaluators` by peterj · Pull Request #149 · agentevals-dev/agentevals

peterj · 2026-05-14T18:44:55Z

Warning

Breaking change. The eval config shape has changed. Existing
YAML config files, JSON API payloads, and MCP evaluate-sessions
requests need migration. See the migration guide below.

Previously the config had a list of strings (metrics) and a separate custom_evaluators field. Built-in metrics were referenced only by name through metrics, and only run-level configuration applied to them, i.e. there was no way to configure an individual built-in evaluator.

The evaluation config now requires evaluators, which is the single canonical evaluation config field that agentevals accepts.

The legacy metrics field has been removed, and legacy top-level config shapes are no longer accepted. Requests and config files must now define built-in and custom evaluators uniformly under evaluators[].

Before:

{
  "metrics": ["tool_trajectory_avg_score"]
}

After:

{
  "evaluators": [
    { "name": "tool_trajectory_avg_score", "type": "builtin" }
  ]
}

This also means older legacy config fields such as customEvaluators and top-level builtin overrides like judgeModel, threshold, and trajectoryMatchType must be migrated into the corresponding evaluator entries inside evaluators[].

Migration

YAML eval config files

Before:

metrics:
  - tool_trajectory_avg_score
judge_model: gemini-2.5-flash
threshold: 0.8
trajectory_match_type: EXACT
custom_evaluators:
  - { name: my_eval, type: code, path: ./my_eval.py }

After:

evaluators:
  - name: tool_trajectory_avg_score
    type: builtin
    judge_model: gemini-2.5-flash
    threshold: 0.8
    trajectory_match_type: EXACT
  - name: my_eval
    type: code
    path: ./my_eval.py
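
The YAML migration above can be sketched as a small conversion over the parsed config. This is an illustration only, not code from this PR: the helper name `migrate_config` is hypothetical, and it assumes that each legacy top-level override should be copied onto every built-in entry, which the PR does not state for configs with more than one metric.

```python
from copy import deepcopy

# Legacy top-level override keys named in this PR (snake_case YAML form).
LEGACY_OVERRIDES = ("judge_model", "threshold", "trajectory_match_type")
LEGACY_LISTS = ("metrics", "custom_evaluators")


def migrate_config(old: dict) -> dict:
    """Convert a parsed legacy eval config into the evaluators-only shape.

    Assumption: every top-level override applies to every builtin entry.
    """
    # Keep unrelated keys untouched; drop the legacy ones.
    new = {
        key: deepcopy(value)
        for key, value in old.items()
        if key not in LEGACY_OVERRIDES and key not in LEGACY_LISTS
    }
    overrides = {key: old[key] for key in LEGACY_OVERRIDES if key in old}

    # Preserve any evaluators that are already in the new shape.
    evaluators = list(new.get("evaluators", []))
    for name in old.get("metrics", []):
        evaluators.append({"name": name, "type": "builtin", **overrides})
    evaluators.extend(deepcopy(old.get("custom_evaluators", [])))

    new["evaluators"] = evaluators
    return new


legacy = {
    "metrics": ["tool_trajectory_avg_score"],
    "judge_model": "gemini-2.5-flash",
    "threshold": 0.8,
    "trajectory_match_type": "EXACT",
    "custom_evaluators": [
        {"name": "my_eval", "type": "code", "path": "./my_eval.py"}
    ],
}
migrated = migrate_config(legacy)
```

For the legacy config shown above, `migrated` holds a single `evaluators` list with the built-in entry first and `my_eval` second. Review the result by hand if different built-in evaluators need different thresholds or judge models.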

`/api/evaluate`, `/api/evaluate/stream`, `/api/evaluate/json`, `/api/runs`

Before:

{ "metrics": ["tool_trajectory_avg_score"], "judgeModel": "gemini-2.5-flash" }

After:

{ "evaluators": [
    { "name": "tool_trajectory_avg_score", "type": "builtin",
      "judgeModel": "gemini-2.5-flash" }
  ]
}

Legacy keys are now rejected with a 4xx response and the message `Extra inputs are not permitted`.
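
Because legacy keys are rejected server-side, a client may want to catch them before sending a request. The following is a minimal sketch of such a pre-flight check. The function name `legacy_keys_in` and the key set are assumptions assembled from the field names mentioned in this PR description; they are not part of agentevals.

```python
import json

# Legacy top-level keys named in the PR description, in both casings.
LEGACY_KEYS = {
    "metrics",
    "custom_evaluators", "customEvaluators",
    "judge_model", "judgeModel",
    "threshold",
    "trajectory_match_type", "trajectoryMatchType",
}


def legacy_keys_in(payload: dict) -> list[str]:
    """Return the legacy keys found at the top level of a request body.

    Only the top level is inspected: the same names are valid inside
    individual evaluators[] entries.
    """
    return sorted(key for key in payload if key in LEGACY_KEYS)


old_body = json.loads(
    '{ "metrics": ["tool_trajectory_avg_score"], "judgeModel": "gemini-2.5-flash" }'
)
new_body = json.loads(
    '{ "evaluators": [ { "name": "tool_trajectory_avg_score", '
    '"type": "builtin", "judgeModel": "gemini-2.5-flash" } ] }'
)

old_hits = legacy_keys_in(old_body)  # top-level legacy keys are flagged
new_hits = legacy_keys_in(new_body)  # nested judgeModel is not flagged
```

Here `old_hits` lists the two offending keys and `new_hits` is empty, so the second body is the one to send.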

MCP `/api/streaming/evaluate-sessions`

Same shape as /api/evaluate above. The old metrics / judge_model
/ trajectory_match_type top-level fields are rejected.

CLI

`agentevals list-metrics` has been removed. Use
`agentevals evaluator list --source builtin` instead.

consolidate 'metrics' and 'custom_evaluators' into a single evaluator…

05b480a

…s field Signed-off-by: Peter Jausovec <peter.jausovec@solo.io>

peterj changed the title ~~consolidate 'metrics' and 'custom_evaluators' into a single field~~ consolidate 'metrics' and 'custom_evaluators' into evaluators May 14, 2026

peterj requested a review from krisztianfekete May 14, 2026 18:45

peterj and others added 2 commits May 14, 2026 11:46

lint

0974346

Signed-off-by: Peter Jausovec <peter.jausovec@solo.io>

cleanup

9d9736c

krisztianfekete changed the title ~~consolidate 'metrics' and 'custom_evaluators' into evaluators~~ BREAKING: consolidate 'metrics' and 'custom_evaluators' into evaluators May 15, 2026

krisztianfekete merged commit b93ab07 into main May 15, 2026
10 checks passed

krisztianfekete deleted the peterj/consolidatemetricsandeval branch May 15, 2026 08:19

krisztianfekete merged 3 commits into main from peterj/consolidatemetricsandeval
