Description
Original Issue
Title
Add a Flag to Start the Module Without Sending User Audio
Background
When the module starts after establishing a WebSocket connection to the OpenAI Realtime API, the developer typically sends a session.update to configure instructions and capabilities.
However, in many real-world cases, this update may arrive late — due to network delays, initialization order, or application-specific logic.
During this gap, the module begins streaming user audio from the media bug. If the Voice Activity Detection (VAD) triggers before the session update completes, the model might start responding prematurely.
This can lead to several problems:
- The voice configuration can’t be changed once a response is active.
- The agent may interrupt itself or get interrupted immediately at the start of a call.
Expected Behavior
It should be possible to start the module without sending user audio to the Realtime API until explicitly allowed.
This would allow:
- Delayed initialization while the session update completes.
- Intro messages or greetings from the AI agent before the user can interrupt (VAD trigger).
Proposed Solution
Add an optional flag to the module start command to suppress user audio:
uuid_openai_audio_stream <uuid> start [wss-url | path] [mono | mixed | stereo] [8000 | 16000 | 24000] [mute_user]
Additionally, provide runtime API commands to control this behavior:
uuid_openai_audio_stream <uuid> mute
uuid_openai_audio_stream <uuid> unmute
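The intended startup sequence (start with user audio suppressed, apply session.update, then unmute) could be scripted from any ESL client. A minimal sketch that only builds the proposed API command strings; the helper names, example UUID, and URL are illustrative, not part of the module:

```python
# Builds the proposed API command strings. Sending them to FreeSWITCH
# (e.g. over ESL) is out of scope for this sketch.

def start_cmd(uuid, wss_url, mix="mono", rate=24000, mute_user=False):
    """Build `uuid_openai_audio_stream <uuid> start ...`, optionally
    appending the proposed mute_user flag to suppress user audio."""
    parts = ["uuid_openai_audio_stream", uuid, "start", wss_url, mix, str(rate)]
    if mute_user:
        parts.append("mute_user")  # proposed flag from this issue
    return " ".join(parts)

def unmute_cmd(uuid):
    """Build the runtime command that re-enables user audio."""
    return f"uuid_openai_audio_stream {uuid} unmute"

# Sequence: 1) start muted, 2) send session.update over the WebSocket
# and wait for it to apply, 3) unmute so user audio starts streaming.
print(start_cmd("3c5d-example", "wss://example/realtime", mute_user=True))
print(unmute_cmd("3c5d-example"))
```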
Notes
This feature would give developers more control over session startup timing and improve integration flexibility for use cases where early user audio should be ignored or delayed.
Update
Integration with Proposed Solution
Add a new parameter to mute and unmute, which can take the following values:
- `user`: behaves as defined earlier in the issue
- `openai`: mute the OpenAI realtime audio
- `all`: mute both user and OpenAI audio
Usage:
uuid_openai_audio_stream <uuid> mute [user | openai | all]
uuid_openai_audio_stream <uuid> unmute [user | openai | all]
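The extended grammar is small enough to validate up front. A hypothetical helper (the scope and verb names come from this issue; the function itself is not part of the module):

```python
# Validates and builds the extended mute/unmute commands proposed above.
VALID_VERBS = ("mute", "unmute")
VALID_SCOPES = ("user", "openai", "all")

def scoped_cmd(uuid, verb, scope):
    """Build `uuid_openai_audio_stream <uuid> mute|unmute <scope>`,
    rejecting values outside the proposed grammar."""
    if verb not in VALID_VERBS:
        raise ValueError(f"unknown verb: {verb!r}")
    if scope not in VALID_SCOPES:
        raise ValueError(f"unknown scope: {scope!r}")
    return f"uuid_openai_audio_stream {uuid} {verb} {scope}"

print(scoped_cmd("3c5d-example", "mute", "openai"))
```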
Notes
Important: `all` will behave differently from `pause`:
- `pause` pauses the streaming of frames containing user audio (same as `mute user`) and also pauses writing the frames that play back OpenAI responses, but it does not flush any incoming audio from OpenAI: the audio accumulates, and upon `resume` playback restarts from where it stopped, playing the remaining audio.
- `mute` behaves the same as `pause` for the user part, but differently for the OpenAI part: the audio is consumed but not played in the channel (a mute should always work).
- WebSocket events from the OpenAI Realtime API will still be received, and events can still be sent, during `pause` and `mute`.
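The pause/mute distinction above can be captured in a toy model of the playback side. This is a sketch of the described semantics only, not the module's implementation; class and frame names are invented:

```python
from collections import deque

class PlaybackChannel:
    """Toy model of the pause vs. mute semantics described above.

    OpenAI audio frames keep arriving over the WebSocket either way;
    the difference is what happens to them:
      - paused: frames are buffered and played later, resuming from
        where playback stopped
      - muted:  frames are consumed (dropped) and never played
    """

    def __init__(self):
        self.paused = False
        self.muted = False
        self.buffer = deque()  # frames held back while paused
        self.played = []       # frames written to the channel

    def on_openai_frame(self, frame):
        if self.paused:
            self.buffer.append(frame)  # accumulate; no flush on pause
        elif self.muted:
            pass                       # consume but never play
        else:
            self.played.append(frame)

    def resume(self):
        self.paused = False
        while self.buffer:             # restart from where it stopped
            self.played.append(self.buffer.popleft())

ch = PlaybackChannel()
ch.on_openai_frame("f1")   # normal playback
ch.paused = True
ch.on_openai_frame("f2")   # buffered while paused
ch.resume()                # f2 now plays
ch.muted = True
ch.on_openai_frame("f3")   # consumed, never played
print(ch.played)           # → ['f1', 'f2']
```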