Summary
OpenAI Agents SDK v0.14.0 introduces two new tracing span data types that our BraintrustTracingProcessor does not handle yet. Existing tests pass because cassettes were recorded pre-v0.14.0 and don't exercise the new span hierarchy.
New Span Types
v0.14.0 adds a new layer of spans around the runner and each agent loop turn:
Trace ("Agent workflow")
└── TaskSpan ("Agent workflow") ← NEW
    └── AgentSpan ("test-agent")
        └── TurnSpan (turn=1, ...) ← NEW
            └── ResponseSpan
| New Type | Type String | Purpose |
| --- | --- | --- |
| TaskSpanData | "task" | Wraps one top-level Runner.run() invocation |
| TurnSpanData | "turn" | Wraps one agent loop iteration (model call + tools + handoff) |
Both types carry aggregate usage (token counts) and metadata dicts.
Gaps in py/src/braintrust/integrations/openai_agents/tracing.py
Must fix
- _span_name(): No isinstance branches for TaskSpanData or TurnSpanData → both return "Unknown".
- _log_data(): No handlers for the new types → returns {}, losing usage metrics and metadata.
- _span_type(): Both fall through to the default else → TASK (correct by accident but should be explicit for "task" and "turn").
- Imports: Need to import TaskSpanData and TurnSpanData from agents.tracing.
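The first two items could be addressed along the lines of the sketch below. It is not the processor's actual code: the two dataclasses are hypothetical stand-ins for the SDK types so the example runs without the SDK installed, and their field names (`name`, `turn`, `usage`, `metadata`), the dict-shaped usage with `input_tokens` / `output_tokens` / `total_tokens` keys, the "Turn N" naming, and the Braintrust metric names are all assumptions to check against the real classes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


# Hypothetical stand-ins for agents.tracing.TaskSpanData / TurnSpanData.
# Field names and the dict-shaped usage are assumptions, not the SDK's API.
@dataclass
class TaskSpanData:
    name: str = "Agent workflow"
    usage: Optional[Dict[str, int]] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class TurnSpanData:
    turn: int = 1
    usage: Optional[Dict[str, int]] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


# Assumed mapping from SDK usage keys to Braintrust metric names.
USAGE_TO_METRIC = {
    "input_tokens": "prompt_tokens",
    "output_tokens": "completion_tokens",
    "total_tokens": "tokens",
}


def span_name(data: Any) -> str:
    """Return a display name, with explicit branches for the new types."""
    if isinstance(data, TaskSpanData):
        return data.name or "Task"
    if isinstance(data, TurnSpanData):
        return f"Turn {data.turn}"
    return "Unknown"


def log_data(data: Any) -> Dict[str, Any]:
    """Extract usage metrics and metadata from task and turn spans."""
    if not isinstance(data, (TaskSpanData, TurnSpanData)):
        return {}
    out: Dict[str, Any] = {}
    if data.usage:
        # Keep only the usage keys we know how to translate.
        out["metrics"] = {
            metric: data.usage[key]
            for key, metric in USAGE_TO_METRIC.items()
            if key in data.usage
        }
    if data.metadata:
        out["metadata"] = dict(data.metadata)
    return out
```

Both helpers return the current defaults ("Unknown" and {}) for anything that is not a task or turn span, so existing behaviour for other span types would be unchanged.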
Should fix
- _agent_log_data(): AgentSpanData now has a metadata slot that we don't capture.
- ResponseSpanData: Now has a top-level usage slot (our code reads response.usage, which still works, but the new slot could serve as a fallback).
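A minimal sketch of both items, assuming the span data exposes `response`, `usage`, and `metadata` as plain attributes. `resolve_usage` and `agent_metadata` are hypothetical helper names, and `SimpleNamespace` objects stand in for the real span data so the snippet runs on its own.

```python
from types import SimpleNamespace
from typing import Any, Dict, Optional


def resolve_usage(span_data: Any) -> Optional[Any]:
    """Prefer response.usage; fall back to the span's top-level usage slot."""
    response = getattr(span_data, "response", None)
    usage = getattr(response, "usage", None)
    if usage is not None:
        return usage
    return getattr(span_data, "usage", None)


def agent_metadata(span_data: Any) -> Dict[str, Any]:
    """Copy the metadata slot if present and non-empty, else return {}."""
    metadata = getattr(span_data, "metadata", None)
    return dict(metadata) if metadata else {}


# Stand-in for a span where response.usage is populated (current code path).
with_response = SimpleNamespace(response=SimpleNamespace(usage={"total_tokens": 8}))
# Stand-in for a span where only the new top-level usage slot is populated.
slot_only = SimpleNamespace(response=None, usage={"total_tokens": 13})

print(resolve_usage(with_response))  # usage taken from response.usage
print(resolve_usage(slot_only))      # usage taken from the top-level slot
```

Using `getattr` with a default keeps the helpers safe on older SDK versions where the new slots do not exist.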
Version matrix / cassettes
- py/noxfile.py: OPENAI_AGENTS_VERSIONS = (LATEST, "0.0.19"). Consider pinning "0.14.0" now that the span structure has changed significantly.
- Cassettes: All four existing cassettes were recorded pre-v0.14.0 and don't produce the new span types. They need to be re-recorded to give real coverage of the new hierarchy.
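Because the matrix still includes "0.0.19", which predates the new types, the new import would probably need a guard so that older SDK versions keep working. The sketch below shows one way to do that. The import path comes from the Imports item above, while `HAS_TASK_TURN_SPANS` and `is_task_or_turn` are hypothetical names introduced for illustration.

```python
from typing import Any

try:
    # Available from openai-agents v0.14.0 onward, per the Imports item above.
    from agents.tracing import TaskSpanData, TurnSpanData

    HAS_TASK_TURN_SPANS = True
except ImportError:
    # Older SDK (or SDK not installed): the new span types do not exist.
    TaskSpanData = TurnSpanData = None
    HAS_TASK_TURN_SPANS = False


def is_task_or_turn(data: Any) -> bool:
    """True only when the SDK provides the new types and data is one of them."""
    if not HAS_TASK_TURN_SPANS:
        return False
    return isinstance(data, (TaskSpanData, TurnSpanData))
```

Tests that assert on the new hierarchy could use the same flag to skip themselves when running against the older pinned version.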
References
- py/src/braintrust/integrations/openai_agents/tracing.py
- py/src/braintrust/integrations/openai_agents/test_openai_agents.py
- test_openai_agents