Add a flag to start the mod without sending user audio #22

@dariopellegrino00

Original Issue

Title

Add a Flag to Start the Module Without Sending User Audio


Background

When the module starts after establishing a WebSocket connection to the OpenAI Realtime API, the developer typically sends a session.update to configure instructions and capabilities.
However, in many real-world cases this update arrives late because of network delays, initialization order, or application-specific logic.

During this gap, the module begins streaming user audio from the media bug. If the Voice Activity Detection (VAD) triggers before the session update completes, the model might start responding prematurely.
This can lead to several problems:

  • The voice configuration can’t be changed once a response is active.
  • The agent may interrupt itself or get interrupted immediately at the start of a call.

Expected Behavior

It should be possible to start the module without sending user audio to the Realtime API until explicitly allowed.
This would allow:

  • Delayed initialization while the session update completes.
  • Intro messages or greetings from the AI agent before the user can interrupt (VAD trigger).

Proposed Solution

Add an optional flag to the module start command to suppress user audio:

uuid_openai_audio_stream <uuid> start [wss-url | path] [mono | mixed | stereo] [8000 | 16000 | 24000] [mute_user]

Additionally, provide runtime API commands to control this behavior:

uuid_openai_audio_stream <uuid> mute
uuid_openai_audio_stream <uuid> unmute
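
The intended behavior can be pictured as a gate in front of the WebSocket sender. The following is only a minimal Python model of that idea, not the module's code: `UserAudioGate`, `on_frame`, and the `send` callback are hypothetical names chosen for illustration.

```python
from typing import Callable, List


class UserAudioGate:
    """Hypothetical model of the proposed mute_user flag and mute/unmute commands."""

    def __init__(self, send: Callable[[bytes], None], mute_user: bool = False) -> None:
        self._send = send
        # True when the start command was given with mute_user.
        self.muted = mute_user

    def mute(self) -> None:
        self.muted = True

    def unmute(self) -> None:
        self.muted = False

    def on_frame(self, frame: bytes) -> bool:
        """Forward a media bug frame unless muted; return True if it was sent."""
        if self.muted:
            # The frame is dropped, so server-side VAD never sees it.
            return False
        self._send(frame)
        return True


sent: List[bytes] = []
gate = UserAudioGate(sent.append, mute_user=True)
gate.on_frame(b"early")  # dropped: the session update has not completed yet
gate.unmute()            # the application decides the session is ready
gate.on_frame(b"hello")  # forwarded
```

In this model the application would call `unmute` once it considers the session configured, for example after its session.update has been acknowledged or after a greeting has finished playing.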

Notes

This feature would give developers more control over session startup timing and improve integration flexibility for use cases where early user audio should be ignored or delayed.

Update

Integration with Proposed Solution

Extend mute and unmute with an optional target parameter, which can take the following values:

  • user: behaves as defined earlier in the issue
  • openai: mute OpenAI realtime audio
  • all: mute both user and OpenAI audio

Usage:

uuid_openai_audio_stream <uuid> mute [user | openai | all]
uuid_openai_audio_stream <uuid> unmute [user | openai | all]
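
A rough sketch of how the target argument could map onto two independent mute flags is shown below. It is a Python model rather than the module's implementation: `MuteState` and `apply_mute` are hypothetical names, and defaulting the target to `user` when it is omitted is an assumption based on the original mute/unmute commands, not something this issue specifies.

```python
from dataclasses import dataclass

VALID_TARGETS = ("user", "openai", "all")


@dataclass
class MuteState:
    user: bool = False    # user audio is not sent to the Realtime API
    openai: bool = False  # OpenAI audio is not played in the channel


def apply_mute(state: MuteState, action: str, target: str = "user") -> MuteState:
    """Apply 'mute' or 'unmute' to the given target (default target is an assumption)."""
    if action not in ("mute", "unmute"):
        raise ValueError(f"unknown action: {action}")
    if target not in VALID_TARGETS:
        raise ValueError(f"unknown target: {target}")
    muted = action == "mute"
    if target in ("user", "all"):
        state.user = muted
    if target in ("openai", "all"):
        state.openai = muted
    return state


state = MuteState()
apply_mute(state, "mute", "all")     # both directions muted
apply_mute(state, "unmute", "user")  # user audio flows again, OpenAI stays muted
```

Keeping the two flags separate means `all` is just shorthand for setting both, so unmuting one side after `mute all` leaves the other side muted.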

Notes

Important: mute all behaves differently from pause:

  • pause stops streaming frames that contain user audio (same as mute user) and also stops writing frames that play back OpenAI responses. It does not flush incoming audio from OpenAI: the audio accumulates and, on resume, playback continues from where it stopped.
  • mute treats user audio the same way as pause but handles OpenAI audio differently: the audio is consumed but not played in the channel, as a mute is expected to work.
  • WebSocket events from the OpenAI Realtime API are still received, and events can still be sent, during both pause and mute.
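
The difference on the OpenAI playback side can be illustrated with a small Python model. This is a sketch under assumptions, not the module's playback code: `PlaybackModel`, `on_openai_audio`, and `tick` are hypothetical names, and one `tick` stands in for one playback interval.

```python
from collections import deque
from typing import Deque, List


class PlaybackModel:
    """Hypothetical model of how pause and mute treat incoming OpenAI audio."""

    def __init__(self) -> None:
        self.buffer: Deque[bytes] = deque()
        self.paused = False
        self.openai_muted = False
        self.played: List[bytes] = []

    def on_openai_audio(self, chunk: bytes) -> None:
        # Audio is always accepted from the WebSocket, in either state.
        self.buffer.append(chunk)

    def tick(self) -> None:
        """One playback interval."""
        if self.paused or not self.buffer:
            return  # pause: nothing is consumed, the buffer keeps growing
        chunk = self.buffer.popleft()
        if not self.openai_muted:
            self.played.append(chunk)  # mute: chunk is consumed but discarded


# pause: audio accumulates and is played after resume
paused = PlaybackModel()
paused.paused = True
paused.on_openai_audio(b"a")
paused.on_openai_audio(b"b")
paused.tick()
paused.paused = False
paused.tick()
paused.tick()

# mute: audio received while muted is consumed and never played
muted = PlaybackModel()
muted.openai_muted = True
muted.on_openai_audio(b"a")
muted.on_openai_audio(b"b")
muted.tick()
muted.tick()
muted.openai_muted = False
muted.on_openai_audio(b"c")
muted.tick()
```

After resume, the paused model plays everything it buffered, while the muted model only plays audio that arrives after unmute.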
