_call_user_fn_args silently reassigns kwargs positionally when a scorer declares an unrelated parameter #230

@willfrey

Summary

braintrust.framework._call_user_fn_args (framework.py:488) matches scorer parameters to Braintrust-provided kwargs by name. When a declared parameter name is not among the provided kwargs, however, it pops the next available kwarg positionally and assigns it anyway:

```python
for name, param in signature.parameters.items():
    if param.kind in (VAR_POSITIONAL, VAR_KEYWORD):
        continue
    if name in kwargs:
        final_kwargs[name] = kwargs.pop(name)
    else:
        next_arg = list(kwargs.keys())[0]
        final_kwargs[name] = kwargs.pop(next_arg)   # <-- surprising
```

This means any scorer that declares a keyword-only parameter whose name Braintrust doesn't inject (e.g. for dependency injection, configuration, a client, or a cached resource) will silently receive one of Braintrust's own kwargs, typically metadata or trace, under the wrong name. The declared default is discarded.
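To make the mechanism concrete, here is a minimal, self-contained sketch that mirrors the quoted loop outside Braintrust. `bind_with_fallback` is a hypothetical stand-in, not Braintrust's API, and it assumes (as the issue describes) that leftover kwargs are forwarded to the scorer's `**kwargs`. The scorer is synchronous for brevity; the binding logic does not depend on `async`. Running it with the same kwargs in two insertion orders also shows that which value gets stolen depends on dict order.

```python
import inspect
from dataclasses import dataclass


@dataclass
class Config:
    threshold: float = 0.5


DEFAULT_CONFIG = Config()


def bind_with_fallback(fn, kwargs):
    """Hypothetical re-creation of the quoted loop; not Braintrust's actual code."""
    kwargs = dict(kwargs)  # copy so the caller's dict is left untouched
    final_kwargs = {}
    accepts_var_kw = False
    for name, param in inspect.signature(fn).parameters.items():
        if param.kind is inspect.Parameter.VAR_KEYWORD:
            accepts_var_kw = True
            continue
        if param.kind is inspect.Parameter.VAR_POSITIONAL:
            continue
        if name in kwargs:
            final_kwargs[name] = kwargs.pop(name)
        else:
            # The surprising branch: grab whichever kwarg is first in insertion order.
            next_arg = list(kwargs.keys())[0]
            final_kwargs[name] = kwargs.pop(next_arg)
    if accepts_var_kw:
        final_kwargs.update(kwargs)  # assumed: leftovers flow into **kwargs
    return final_kwargs


def my_scorer(input, output, expected, *, config=DEFAULT_CONFIG, **_):
    return config


provided = {"input": "q", "expected": "a", "metadata": {"k": 1}, "output": "a", "trace": "span"}
print(my_scorer(**bind_with_fallback(my_scorer, provided)))  # {'k': 1}, not DEFAULT_CONFIG

# Same kwargs, different insertion order: now `trace` is the one stolen.
reordered = {"input": "q", "expected": "a", "trace": "span", "output": "a", "metadata": {"k": 1}}
print(my_scorer(**bind_with_fallback(my_scorer, reordered)))  # span
```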

Repro

```python
from dataclasses import dataclass

@dataclass
class Config:
    threshold: float = 0.5

DEFAULT_CONFIG = Config()

async def my_scorer(
    input: str,
    output,
    expected,
    *,
    config: Config = DEFAULT_CONFIG,  # default silently ignored
    **_,
):
    # At runtime, `config` is actually the `metadata` dict that
    # Braintrust passed, not DEFAULT_CONFIG.
    return config.threshold  # AttributeError: 'dict' object has no attribute 'threshold'
```

Braintrust calls scorers with input, expected, metadata, output, and trace (framework.py:1571). Because config isn't in that set, _call_user_fn_args pops metadata (the first kwarg left once input, output, and expected have been matched by name) and assigns it to config. **_ receives only whatever remains after the walk, which here is just trace.
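Until the behavior changes, one way to catch affected scorers early is to check their signatures against the provided names. The sketch below is a hypothetical lint helper, not part of Braintrust; it assumes the provided set is exactly the five names listed above, which may differ between Braintrust versions.

```python
import inspect

# Assumption: the kwargs Braintrust passes to scorers, as listed in this issue.
PROVIDED_NAMES = frozenset({"input", "expected", "metadata", "output", "trace"})


def at_risk_parameters(fn, provided=PROVIDED_NAMES):
    """Hypothetical lint: parameters the quoted loop would fill positionally."""
    variadic = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    return [
        name
        for name, param in inspect.signature(fn).parameters.items()
        if param.kind not in variadic and name not in provided
    ]


async def my_scorer(input, output, expected, *, config=None, **_):
    return config


async def safe_scorer(input, output, expected, **_):
    return 1.0


print(at_risk_parameters(my_scorer))    # ['config']
print(at_risk_parameters(safe_scorer))  # []
```

Asserting that `at_risk_parameters` returns an empty list for every registered scorer in a unit test would turn the silent mis-binding into a test failure.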

Why this is a problem

  1. Silent type-punning. The scorer parameter has a declared default and a declared type; both are ignored with no warning.
  2. Order-dependent. Which kwarg gets reassigned depends on the insertion order of the kwargs dict, which is subtle and fragile.
  3. Breaks DI patterns. The natural way to make a scorer testable is to accept an injectable dependency with a default; this behavior makes that unsafe.
  4. No warning or error. The user only notices when the wrong-type value explodes deep in the call stack.

Expected behavior

One of:

  • Only bind parameters that are present in Braintrust's provided kwargs by name; leave declared-but-absent parameters to their defaults.
  • If reassignment is intentional for some backward-compat reason, raise or warn when a declared parameter name has no matching kwarg and is filled positionally.
  • At minimum, document this behavior prominently in the scorer authoring docs.
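The first option could look roughly like the sketch below. `bind_by_name` is a hypothetical helper, not a proposed patch to Braintrust's code: it passes provided kwargs only by name and lets Python apply declared defaults for anything absent. Positional-only parameters are not handled.

```python
import inspect
from dataclasses import dataclass


@dataclass
class Config:
    threshold: float = 0.5


DEFAULT_CONFIG = Config()


def bind_by_name(fn, kwargs):
    """Hypothetical name-only binder: absent parameters keep their declared defaults.

    Positional-only parameters are not handled in this sketch.
    """
    params = inspect.signature(fn).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        # Declared names bind by name; everything else flows into **kwargs.
        return dict(kwargs)
    # No **kwargs: pass only the provided names the function actually declares.
    return {name: value for name, value in kwargs.items() if name in params}


def my_scorer(input, output, expected, *, config=DEFAULT_CONFIG, **_):
    return config.threshold


def exact_match(output, expected):
    return 1.0 if output == expected else 0.0


provided = {"input": "q", "expected": "a", "metadata": {"k": 1}, "output": "a", "trace": "span"}
print(my_scorer(**bind_by_name(my_scorer, provided)))      # 0.5: default preserved
print(exact_match(**bind_by_name(exact_match, provided)))  # 1.0
```

If the positional fallback exists to support scorers whose parameters use non-standard names (an assumption; the issue does not say why it exists), name-only binding would stop feeding those scorers, which is the case the second option's warning is meant to cover.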

Workaround

Drop the extra keyword parameter from the scorer's public signature and reference the dependency from a module-level variable (or closure). Monkeypatch the module-level variable for tests.

Environment

  • braintrust (latest on PyPI as of 2026-04-08)
  • Python 3.14
