Summary
When using GPT-4o's audio output modality (modalities=["text", "audio"]), the streaming aggregation in ChatCompletionWrapper._postprocess_streaming_results silently drops all audio delta data from the span output. Non-streaming calls capture the full response including audio, creating an inconsistency.
What is missing
The _postprocess_streaming_results method (py/src/braintrust/oai.py, lines 288–357) only processes these delta fields:
| Delta field |
Captured? |
delta.role |
Yes |
delta.content |
Yes |
delta.tool_calls |
Yes |
delta.finish_reason |
Yes |
delta.audio |
No |
When OpenAI streams a chat completion with audio output, chunks include delta.audio with:
delta.audio.id — audio clip identifier
delta.audio.transcript — text transcript of the generated speech
delta.audio.data — base64-encoded audio bytes
delta.audio.expires_at — expiration timestamp
None of these fields are aggregated. The final span output contains no trace of the audio response.
Non-streaming is fine: the non-streaming path (lines 199–205) logs output=log_response["choices"], which includes the full audio field from the response. Only streaming is affected.
Relationship to existing issues
The audio field is distinct from the other missing fields because it carries substantial binary data (audio bytes) and a text transcript that users would want captured for observability.
Braintrust docs status
not_found — The OpenAI integration page does not mention audio modality output in chat completions.
Upstream sources
- OpenAI audio output guide: https://platform.openai.com/docs/guides/audio
- OpenAI chat completions streaming format:
choices[0].delta.audio object
- GPT-4o audio is GA — supports
gpt-4o-audio-preview and gpt-4o-mini-audio-preview models
- OpenAI Python SDK
ChatCompletionChunk.Choice.Delta.audio field
Local files inspected
py/src/braintrust/oai.py:
ChatCompletionWrapper._postprocess_streaming_results (lines 288–357) — only handles role, content, tool_calls, finish_reason; line 353 hardcodes "logprobs": None but doesn't even mention audio
- Non-streaming path (lines 199–205, 255–261) — logs full
choices including audio field
py/src/braintrust/wrappers/test_openai.py — no test cases for audio modality streaming
Summary
When using GPT-4o's audio output modality (
modalities=["text", "audio"]), the streaming aggregation inChatCompletionWrapper._postprocess_streaming_resultssilently drops allaudiodelta data from the span output. Non-streaming calls capture the full response including audio, creating an inconsistency.What is missing
The
_postprocess_streaming_resultsmethod (py/src/braintrust/oai.py, lines 288–357) only processes these delta fields:delta.roledelta.contentdelta.tool_callsdelta.finish_reasondelta.audioWhen OpenAI streams a chat completion with audio output, chunks include
delta.audiowith:delta.audio.id— audio clip identifierdelta.audio.transcript— text transcript of the generated speechdelta.audio.data— base64-encoded audio bytesdelta.audio.expires_at— expiration timestampNone of these fields are aggregated. The final span output contains no trace of the audio response.
Non-streaming is fine: the non-streaming path (lines 199–205) logs
output=log_response["choices"], which includes the fullaudiofield from the response. Only streaming is affected.Relationship to existing issues
logprobsfrom chunks #180 coverslogprobsbeing dropped in streaming (same method, same root cause pattern)refusaldelta text from span output #181 coversrefusalbeing dropped in streaming (same method, same root cause pattern)client.audio.speech,transcriptions,translations) not instrumented #174 covers the directclient.audio.*APIs (separate gap — this issue is about audio output in chat completions)The
audiofield is distinct from the other missing fields because it carries substantial binary data (audio bytes) and a text transcript that users would want captured for observability.Braintrust docs status
not_found — The OpenAI integration page does not mention audio modality output in chat completions.
Upstream sources
choices[0].delta.audioobjectgpt-4o-audio-previewandgpt-4o-mini-audio-previewmodelsChatCompletionChunk.Choice.Delta.audiofieldLocal files inspected
py/src/braintrust/oai.py:ChatCompletionWrapper._postprocess_streaming_results(lines 288–357) — only handlesrole,content,tool_calls,finish_reason; line 353 hardcodes"logprobs": Nonebut doesn't even mentionaudiochoicesincludingaudiofieldpy/src/braintrust/wrappers/test_openai.py— no test cases for audio modality streaming