Keep duplicate devserver evaluators addressable#495

Draft
Ritwij Aryan Parmar (RitwijParmar) wants to merge 1 commit into braintrustdata:main from RitwijParmar:codex/braintrust-devserver-eval-ids

@RitwijParmar
Summary

Fixes #366 by making duplicate eval_name registrations in the dev server addressable instead of silently collapsing to the last evaluator.

What changed:

  • key the dev-server evaluator registry by a stable evaluator id rather than by raw eval_name alone
  • keep existing behavior for unique names: /list keys and /eval { name } continue to work
  • when duplicate names exist, /list exposes disambiguated keys like shared-project#1 / shared-project#2 plus id and original name metadata
  • /eval accepts the new id field, also accepts the disambiguated /list key as name, and returns a 409 with candidates when the original name is ambiguous

This avoids the startup-time raise/warn stopgap and keeps both evaluators reachable.
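
For reviewers, here is a minimal sketch of the lookup rules described above. It is not the code in server.py. The names `Entry`, `build_registry`, and `resolve` are hypothetical, as is the shape of the id. Numbering duplicates in registration order and returning 404 for an unknown name or id are assumptions. The sketch also ignores the edge case where a unique evaluator is literally named like a disambiguated key.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    id: str    # hypothetical stable evaluator id
    name: str  # original eval_name as registered
    key: str   # key exposed by /list


def build_registry(evaluators):
    """Build /list entries from (id, eval_name) pairs in registration order."""
    evaluators = list(evaluators)
    totals = Counter(name for _, name in evaluators)
    seen = Counter()
    entries = []
    for eval_id, name in evaluators:
        if totals[name] == 1:
            key = name  # unique names keep their existing key
        else:
            seen[name] += 1
            key = f"{name}#{seen[name]}"  # e.g. shared-project#1
        entries.append(Entry(eval_id, name, key))
    return entries


def resolve(entries, *, id=None, name=None):
    """Return (status, payload) for an /eval-style lookup."""
    if id is not None:
        for entry in entries:
            if entry.id == id:
                return 200, entry
        return 404, {"error": f"unknown evaluator id: {id}"}  # assumed status
    # A unique name or a disambiguated /list key matches exactly one entry.
    for entry in entries:
        if entry.key == name:
            return 200, entry
    # The raw name of a duplicated evaluator is ambiguous: 409 with candidates.
    candidates = [e for e in entries if e.name == name]
    if len(candidates) > 1:
        return 409, {
            "error": f"ambiguous evaluator name: {name}",
            "candidates": [{"id": e.id, "name": e.key} for e in candidates],
        }
    return 404, {"error": f"unknown evaluator: {name}"}  # assumed status
```

With two evaluators registered as shared-project and one as solo, `resolve(entries, name="solo")` and `resolve(entries, name="shared-project#2")` each return a single entry, while `resolve(entries, name="shared-project")` returns the 409 payload listing both candidates.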

Tests

  • PYTHONPATH=py/src .venv/bin/python -m pytest py/src/braintrust/devserver/test_server_integration.py::test_devserver_keeps_duplicate_eval_names_addressable py/src/braintrust/devserver/test_server_integration.py::test_eval_falls_back_to_evaluator_project_id_when_request_omits_or_empty_it py/src/braintrust/devserver/test_server_integration.py::test_eval_request_project_id_overrides_evaluator -q
  • PYTHONPATH=py/src .venv/bin/python -m compileall -q py/src/braintrust/devserver/server.py py/src/braintrust/devserver/schemas.py py/src/braintrust/devserver/test_server_integration.py
  • git diff --check

Development

Successfully merging this pull request may close these issues.

braintrust eval --dev silently drops evaluators with duplicate eval_name