Skip to content

[BOT ISSUE] OpenAI: chat completions streaming drops audio delta chunks (GPT-4o audio modality) #201

@braintrust-bot

Description

@braintrust-bot

Summary

When using GPT-4o's audio output modality (modalities=["text", "audio"]), the streaming aggregation in ChatCompletionWrapper._postprocess_streaming_results silently drops all audio delta data from the span output. Non-streaming calls capture the full response including audio, creating an inconsistency.

What is missing

The _postprocess_streaming_results method (py/src/braintrust/oai.py, lines 288–357) only processes these delta fields:

Delta field Captured?
delta.role Yes
delta.content Yes
delta.tool_calls Yes
delta.finish_reason Yes
delta.audio No

When OpenAI streams a chat completion with audio output, chunks include delta.audio with:

  • delta.audio.id — audio clip identifier
  • delta.audio.transcript — text transcript of the generated speech
  • delta.audio.data — base64-encoded audio bytes
  • delta.audio.expires_at — expiration timestamp

None of these fields are aggregated. The final span output contains no trace of the audio response.

Non-streaming is fine: the non-streaming path (lines 199–205) logs output=log_response["choices"], which includes the full audio field from the response. Only streaming is affected.

Relationship to existing issues

The audio field is distinct from the other missing fields because it carries substantial binary data (audio bytes) and a text transcript that users would want captured for observability.

Braintrust docs status

not_found — The OpenAI integration page does not mention audio modality output in chat completions.

Upstream sources

  • OpenAI audio output guide: https://platform.openai.com/docs/guides/audio
  • OpenAI chat completions streaming format: choices[0].delta.audio object
  • GPT-4o audio is GA — supports gpt-4o-audio-preview and gpt-4o-mini-audio-preview models
  • OpenAI Python SDK ChatCompletionChunk.Choice.Delta.audio field

Local files inspected

  • py/src/braintrust/oai.py:
    • ChatCompletionWrapper._postprocess_streaming_results (lines 288–357) — only handles role, content, tool_calls, finish_reason; line 353 hardcodes "logprobs": None but doesn't even mention audio
    • Non-streaming path (lines 199–205, 255–261) — logs full choices including audio field
  • py/src/braintrust/wrappers/test_openai.py — no test cases for audio modality streaming

Metadata

Metadata

Assignees

No one assigned

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions