[OAI] Support braintrust >=0.13 wrapping (fix Python CI)#193

Draft
Kenny Wong (wong-codaio) wants to merge 1 commit into
braintrustdata:mainfrom
wong-codaio:wong/deps/support-braintrust-0.13

Conversation

@wong-codaio Kenny Wong (wong-codaio) commented Jun 2, 2026

Summary

[OAI] Support braintrust >=0.13 wrapping (fix Python CI)

  • braintrust 0.13 changed wrap_openai to patch methods in place (via wrapt) instead of returning a NamedWrapper, and dropped the *Wrapper classes. This broke is_wrapped detection and the imports in test_oai.py, so pytest collection aborted and Python CI went red on every PR.
  • fix-forward (no pin): detect wrapping version-agnostically, as either a NamedWrapper instance (<0.13) or a wrapt wrapper on create() (>=0.13).
  • behavior change: >=0.13 only instruments the v1 SDK, so v0 clients are no longer traced. Tests updated to match.

PTAL:
FYI:

Test plan

  • pytest py/autoevals/ → 68 passed on braintrust 0.23 (was: collection ImportError)
  • manual sanity check below — verifies detection + a real classifier completing through the wrapped client (so the injected braintrust span metadata doesn't break the call):
braintrust 0.23.0
OK: wrapping detection works (structural)
OK: live classifier through wrapped client -> score=1
check_braintrust_wrapping.py (run it yourself)
"""Sanity check: autoevals still works with the installed braintrust.

braintrust periodically changes how wrap_openai instruments OpenAI clients
(e.g. 0.13 moved from a NamedWrapper proxy to in-place wrapt patching). If
detection regresses, scorer spans silently lose their `purpose` -- and the
injected span metadata could even break the wrapped call. Run this after a
braintrust upgrade to confirm nothing regressed.

  # structural checks only (no network/key):
  python check_braintrust_wrapping.py

  # also run a real classifier through the wrapped client
  # (the live step runs if BRAINTRUST_API_KEY or OPENAI_API_KEY is set):
  BRAINTRUST_API_KEY=... python check_braintrust_wrapping.py
"""

import importlib.metadata as md
import os

import openai
from braintrust.oai import wrap_openai

from autoevals import Factuality, init
from autoevals.oai import LLMClient, get_openai_wrappers, openai_client_is_wrapped, prepare_openai

print("braintrust", md.version("braintrust"))
NamedWrapper, _ = get_openai_wrappers()

# 1. helper detects a wrapped client and does not false-positive on a plain one.
wrapped = wrap_openai(openai.OpenAI(api_key="x"))
plain = openai.OpenAI(api_key="x")
assert openai_client_is_wrapped(wrapped, NamedWrapper), "wrapped client not detected"
assert not openai_client_is_wrapped(plain, NamedWrapper), "plain client falsely detected"

# 2. init() -> prepare_openai() wraps the default client and reports is_wrapped.
init(openai.OpenAI(api_key="x"))
assert prepare_openai().is_wrapped, "prepare_openai() did not wrap"

# 3. a client with custom methods opts out of wrapping.
init(None)
custom = openai.OpenAI(api_key="x")
assert not LLMClient(openai=custom, complete=custom.chat.completions.create).is_wrapped
print("OK: wrapping detection works (structural)")

# 4. live: run a real classifier through the wrapped client. The wrapped call
#    must accept the injected braintrust span metadata and return a score.
if os.environ.get("OPENAI_API_KEY") or os.environ.get("BRAINTRUST_API_KEY"):
    init(None)  # default client -> braintrust proxy + BRAINTRUST_API_KEY
    assert prepare_openai().is_wrapped
    r = Factuality(model="gpt-4o-mini").eval(
        output="Paris", expected="Paris", input="What is the capital of France?"
    )
    assert r.error is None, r.error
    assert r.score == 1, r.score
    print(f"OK: live classifier through wrapped client -> score={r.score}")
else:
    print("SKIP live classifier (set BRAINTRUST_API_KEY or OPENAI_API_KEY to run it)")

Other Notes

braintrust 0.13 removed the wrapper classes and changed wrap_openai to patch
resource methods in place (wrapt FunctionWrapper) instead of returning a
NamedWrapper proxy. As a result, the isinstance(client, NamedWrapper) check
no longer detected wrapping (is_wrapped was wrongly False, so scorer spans
lost their purpose), and test_oai.py imported now-removed classes, which
aborted test collection.

Detect wrapping in a version-agnostic way: NamedWrapper for <0.13, wrapt
wrapper type on create() for >=0.13. Note that >=0.13 only instruments the
v1 SDK, so v0 clients are no longer traced; tests updated to match.
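
For readers who want the shape of the two-path check without reading the diff, here is a minimal, self-contained sketch. It is not the code in this PR: LegacyProxy, PatchedCallable, make_client and client_is_wrapped are hypothetical stand-ins invented for illustration, so that the example runs without braintrust, wrapt or openai installed. A real check would presumably pass the actual NamedWrapper class (when it exists) and the wrapt wrapper types in place of the stand-ins.

```python
from types import SimpleNamespace


class LegacyProxy:
    """Hypothetical stand-in for the pre-0.13 proxy object (NamedWrapper style)."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        # Only called for attributes not found on the proxy itself.
        return getattr(self._wrapped, name)


class PatchedCallable:
    """Hypothetical stand-in for a wrapt-style wrapper installed on a method."""

    def __init__(self, fn):
        self.__wrapped__ = fn

    def __call__(self, *args, **kwargs):
        return self.__wrapped__(*args, **kwargs)


def make_client():
    """Build a fake client exposing chat.completions.create (assumed layout)."""

    def create(**kwargs):
        return {"ok": True, **kwargs}

    completions = SimpleNamespace(create=create)
    return SimpleNamespace(chat=SimpleNamespace(completions=completions))


def client_is_wrapped(client, proxy_cls, patched_types):
    """Return True if the client looks wrapped under either scheme."""
    # Old scheme: the whole client is an instance of the proxy class.
    # proxy_cls may be None when that class no longer exists.
    if proxy_cls is not None and isinstance(client, proxy_cls):
        return True
    # New scheme: the client is unchanged, but create() has been replaced
    # by a wrapper object of a known type.
    try:
        create = client.chat.completions.create
    except AttributeError:
        return False
    return isinstance(create, patched_types)


# Three clients: untouched, proxied (old scheme), patched in place (new scheme).
plain = make_client()
proxied = LegacyProxy(make_client())
patched = make_client()
patched.chat.completions.create = PatchedCallable(patched.chat.completions.create)

assert not client_is_wrapped(plain, LegacyProxy, (PatchedCallable,))
assert client_is_wrapped(proxied, LegacyProxy, (PatchedCallable,))
assert client_is_wrapped(patched, None, (PatchedCallable,))
```

Passing proxy_cls=None models the >=0.13 case where the proxy class cannot be imported at all; the check then relies only on the type of create().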