`braintrust eval --dev` silently drops evaluators with duplicate `eval_name` #366

### Summary

When `braintrust eval --dev` loads multiple `eval_*.py` files whose `Eval(...)` calls share a `name` argument, only one evaluator per name (the last one loaded) is registered. The startup log reports the full number of evaluators loaded, but `GET /list` returns just one entry. No warning or error is emitted.

### Repro

Two minimal entrypoints, both targeting the same Braintrust project (the natural pattern when you want one project containing multiple evaluators):

`eval_a.py`:
```python
from braintrust import Eval, EvalCase

Eval(
    "MyProject",
    data=lambda: [EvalCase(input="a", expected="a")],
    task=lambda i, h: i,
    scores=[lambda input, output, expected, **_: 1.0 if output == expected else 0.0],
)
```

`eval_b.py`:
```python
from braintrust import Eval, EvalCase

Eval(
    "MyProject",
    data=lambda: [EvalCase(input="b", expected="b")],
    task=lambda i, h: i,
    scores=[lambda input, output, expected, **_: 1.0 if output == expected else 0.0],
)
```

Run:
```sh
braintrust eval eval_a.py eval_b.py --dev --dev-host 127.0.0.1 --dev-port 8300 --dev-org-name <my-org>
```

Startup log:
```
Loaded 2 evaluator(s): ['MyProject', 'MyProject']
```

`GET /list`:
```sh
$ curl -s http://127.0.0.1:8300/list \
    -H "Authorization: Bearer $BRAINTRUST_API_KEY" \
    -H "x-bt-org-name: <my-org>" | jq 'keys, length'
[ "MyProject" ]
1
```

Two evaluators loaded, only one reachable.
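The same comparison can be scripted. The following is a minimal sketch that assumes only what the `jq` output above shows, namely that `/list` returns a JSON object keyed by evaluator name. `dropped_evaluators` is a hypothetical helper, not part of `braintrust`, and the `{}` value in the sample body is a placeholder because the real per-evaluator payload is not shown here.

```python
import json
from collections import Counter


def dropped_evaluators(loaded_names, list_response_body):
    """Return {name: dropped_count} for names loaded more often than /list serves them."""
    # /list is keyed by evaluator name, so each name is reachable at most once.
    served = set(json.loads(list_response_body).keys())
    loaded = Counter(loaded_names)
    dropped = {}
    for name, count in loaded.items():
        reachable = 1 if name in served else 0
        if count > reachable:
            dropped[name] = count - reachable
    return dropped


# Names copied from the startup log, plus a body shaped like the /list response.
print(dropped_evaluators(["MyProject", "MyProject"], '{"MyProject": {}}'))
# prints {'MyProject': 1}
```

Feeding it the names from the startup log and the body returned by the `curl` call flags every name that was loaded more than once.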

### Cause

`braintrust/devserver/server.py:308`:

```python
_all_evaluators = {evaluator.eval_name: evaluator for evaluator in evaluators}
```

The dict comprehension silently keeps the last entry per duplicate key. The startup log on the previous line prints the full input list *before* this collapse, which is why the loaded count looks correct.
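The collapse can be reproduced without the dev server. This is a minimal sketch using a stand-in `FakeEvaluator` dataclass (hypothetical, not a `braintrust` class) that carries only an `eval_name` and the file it came from:

```python
from dataclasses import dataclass


@dataclass
class FakeEvaluator:
    """Hypothetical stand-in for a loaded evaluator; not a braintrust class."""
    eval_name: str
    source_file: str


evaluators = [
    FakeEvaluator("MyProject", "eval_a.py"),
    FakeEvaluator("MyProject", "eval_b.py"),
]

# The log line is built from the list, so it still shows both entries.
loaded_names = [e.eval_name for e in evaluators]

# Same shape as the comprehension in server.py: a later duplicate key
# overwrites the earlier value, with no error.
registry = {e.eval_name: e for e in evaluators}

print(f"Loaded {len(loaded_names)} evaluator(s): {loaded_names}")
print(f"Registered {len(registry)}: {registry['MyProject'].source_file}")
# prints:
# Loaded 2 evaluator(s): ['MyProject', 'MyProject']
# Registered 1: eval_b.py
```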

Compounding factor: the `name` argument to `Eval(...)` serves as both the evaluator name and the project-name fallback (per the docstring at `framework.py:985`: "this corresponds to a project name in Braintrust"). The natural way to put multiple evaluators under one Braintrust project is therefore to pass `Eval("SHARED_PROJECT_NAME", ...)` in each file, which produces exactly this collision under `--dev`.

### Expected behavior

One of:
1. **Raise** at server startup naming the conflicting `eval_name` and the entrypoints that produced it.
2. **Warn** (logger or `warnings.warn`) on the collapse.
3. **Key** `_all_evaluators` by something more discriminating (e.g., file path + eval name).

Option 1 is the most defensible: silent data loss between the loader and the registered-evaluator dict shouldn't be possible. Option 3 is the most permissive, but it requires deciding how to disambiguate evaluators at the `/list` and `/eval` endpoints.
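For comparison, option 2 could look roughly like the sketch below. It keeps today's last-wins behavior but warns once per duplicated name. `FakeEvaluator`, its `source_file` field, and `build_registry` are hypothetical names for illustration; whether the real evaluator objects record their source file is an assumption.

```python
import warnings
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FakeEvaluator:
    """Hypothetical stand-in; assumes the loader can record the source file."""
    eval_name: str
    source_file: str


def build_registry(evaluators):
    """Key evaluators by name, warning once per name that appears more than once."""
    by_name = defaultdict(list)
    for evaluator in evaluators:
        by_name[evaluator.eval_name].append(evaluator)

    registry = {}
    for name, group in by_name.items():
        if len(group) > 1:
            files = ", ".join(e.source_file for e in group)
            warnings.warn(
                f"eval_name {name!r} is defined {len(group)} times ({files}); "
                f"only the last definition will be served.",
                stacklevel=2,
            )
        registry[name] = group[-1]  # same last-wins behavior as the current code
    return registry
```

Grouping first means the warning can list every file involved rather than only the one that triggered the overwrite.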

### Workaround (verified)

Pass a distinct `name` and an explicit `project_id` in each entrypoint. This keeps multiple evaluators under one Braintrust project:

```python
from braintrust import Eval, EvalCase, init_dataset

_DATASET = init_dataset(project="MyProject", name="SomeDataset")

Eval(
    name="MyProject — variant_a",
    project_id=_DATASET.project_id,
    data=lambda: [EvalCase(input="a", expected="a")],
    task=lambda i, h: i,
    scores=[lambda input, output, expected, **_: 1.0 if output == expected else 0.0],
)
```

Verified empirically:
- `GET /list` returns two distinct entries (one per entrypoint with distinct `name`).
- Experiments from both evaluators land in the existing `MyProject` Braintrust project (URL pattern `…/p/MyProject/experiments/…`), confirming `project_id` overrides the `name`-based project resolution at write time (per `framework.py:767, 1792-1793`).

This is the supported pattern, but it's not discoverable: the API design (`name` doubles as the project-name fallback, no warning on dev-server collapse) leads users into the broken default.

### Fix sketch

In `braintrust/devserver/server.py`, replace the dict comprehension with a guarded loop:

```python
_all_evaluators: dict[str, Evaluator[Any, Any, Any]] = {}
for evaluator in evaluators:
    if evaluator.eval_name in _all_evaluators:
        raise ValueError(
            f"Multiple evaluators registered with name {evaluator.eval_name!r}. "
            f"Each --dev evaluator must have a distinct name. "
            f"Pass a unique `name=` to Eval(...) and `project_id=` to keep them in the same project."
        )
    _all_evaluators[evaluator.eval_name] = evaluator
```
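A regression test for the guard might look like the following sketch. It wraps the loop in a hypothetical `register_evaluators` helper and uses a stand-in `FakeEvaluator` so it runs without the dev server. Neither name exists in `braintrust`, and the error message is shortened for the example.

```python
from dataclasses import dataclass


@dataclass
class FakeEvaluator:
    """Hypothetical stand-in for a loaded evaluator; not a braintrust class."""
    eval_name: str


def register_evaluators(evaluators):
    """Hypothetical helper wrapping the guarded loop from the fix sketch."""
    registry = {}
    for evaluator in evaluators:
        if evaluator.eval_name in registry:
            raise ValueError(
                f"Multiple evaluators registered with name {evaluator.eval_name!r}."
            )
        registry[evaluator.eval_name] = evaluator
    return registry


def test_distinct_names_are_all_registered():
    registry = register_evaluators([FakeEvaluator("a"), FakeEvaluator("b")])
    assert sorted(registry) == ["a", "b"]


def test_duplicate_name_raises():
    try:
        register_evaluators([FakeEvaluator("MyProject"), FakeEvaluator("MyProject")])
    except ValueError as exc:
        # The message should name the conflicting eval_name.
        assert "MyProject" in str(exc)
    else:
        raise AssertionError("expected ValueError for duplicate eval_name")
```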

Optionally, `Eval` in `framework.py` could emit a deprecation note when `project_id` is omitted and `name` matches an existing Braintrust project, but that's a separate API discussion.

### Environment

- `braintrust` 0.17.0
- Python 3.14
- macOS

