**Initialize Audio Interface**
- Use `pyaudio.PyAudio()` to create the `p` audio interface, managing both microphone and speaker audio streams.
**Configure Microphone and Speaker Streams**
- Configure `mic_stream` for microphone input and `speaker_stream` for speaker output:
  - `mic_stream` uses `mic_callback` to capture audio frames, placing each frame in the `mic_queue` for later transmission over WebSocket.
  - `speaker_stream` uses `speaker_callback` to retrieve audio data from `audio_buffer` for playback, handling timing and buffering of audio frames.
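The two callbacks above can be sketched with stdlib types only. `paContinue` is a stand-in constant for `pyaudio.paContinue` so the sketch runs without audio hardware, and the commented-out `p.open(...)` calls show roughly how the streams would be created; the 24 kHz / 16-bit mono format is an assumption, not taken from the source:

```python
import queue
import threading

paContinue = 0              # stand-in for pyaudio.paContinue (lets this run without PyAudio)

mic_queue = queue.Queue()   # frames captured from the mic, awaiting WebSocket send
audio_buffer = bytearray()  # decoded audio from the API, awaiting playback
buffer_lock = threading.Lock()

def mic_callback(in_data, frame_count, time_info, status):
    """Capture callback: queue each incoming frame for later transmission."""
    mic_queue.put(in_data)
    return (None, paContinue)

def speaker_callback(in_data, frame_count, time_info, status):
    """Playback callback: drain bytes from audio_buffer, padding with
    silence when the buffer runs dry so playback timing is preserved."""
    bytes_needed = frame_count * 2  # 16-bit mono -> 2 bytes per sample
    with buffer_lock:
        chunk = bytes(audio_buffer[:bytes_needed])
        del audio_buffer[:bytes_needed]
    chunk += b"\x00" * (bytes_needed - len(chunk))  # silence padding
    return (chunk, paContinue)

# With PyAudio installed, the streams would be opened roughly like this:
# p = pyaudio.PyAudio()
# mic_stream = p.open(format=pyaudio.paInt16, channels=1, rate=24000,
#                     input=True, frames_per_buffer=1024,
#                     stream_callback=mic_callback)
# speaker_stream = p.open(format=pyaudio.paInt16, channels=1, rate=24000,
#                         output=True, frames_per_buffer=1024,
#                         stream_callback=speaker_callback)
```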
**Start Audio Streams**
- Start audio capture and playback by calling `mic_stream.start_stream()` and `speaker_stream.start_stream()`.
**Establish WebSocket Connection**
- The `connect_to_openai()` function initiates a WebSocket connection with OpenAI’s API for real-time data exchange. This function spawns two threads:
  - Send Microphone Audio: `send_mic_audio_to_websocket` pulls audio data from `mic_queue`, encodes it in base64, and sends it via WebSocket as an `input_audio_buffer.append` message.
  - Receive Audio and Responses: `receive_audio_from_websocket` listens for WebSocket messages from OpenAI, handling responses based on the message `type`.
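A minimal sketch of the send thread, assuming the standard `input_audio_buffer.append` event shape (`{"type": ..., "audio": <base64>}`); `ws` stands in for any client object with a `send(str)` method:

```python
import base64
import json
import queue

mic_queue = queue.Queue()

def audio_append_event(frame: bytes) -> str:
    """Wrap one raw PCM frame as an input_audio_buffer.append event."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(frame).decode("ascii"),
    })

def send_mic_audio_to_websocket(ws, stop_event):
    """Send loop: pull frames from mic_queue and push them over the socket
    until stop_event is set."""
    while not stop_event.is_set():
        try:
            frame = mic_queue.get(timeout=0.1)
        except queue.Empty:
            continue
        ws.send(audio_append_event(frame))
```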
**Handle WebSocket Events and Message Types**
- In `receive_audio_from_websocket`, the following message types are processed:
  - `session.created`: indicates a successful session creation. This triggers `send_fc_session_update()` to configure session parameters, such as transcription and voice activity detection (VAD) settings.
  - `input_audio_buffer.speech_started`: indicates that the AI has detected speech in the audio input. This triggers:
    - `clear_audio_buffer()` to remove any existing data in `audio_buffer`.
    - `stop_audio_playback()` to halt the speaker stream, ensuring only new audio data is played.
  - `response.audio.delta`: contains audio data in the `delta` field, encoded in base64. This data is decoded and appended to `audio_buffer`, where `speaker_callback` can access it for real-time playback.
  - `response.audio.done`: marks the end of the AI's audio response, signaling that no further audio data for the current response will be received.
  - `response.function_call_arguments.done`: indicates the AI’s request for a function call. The `handle_function_call` function decodes the request arguments, performs the specified action (such as `get_weather`), and returns the result to the AI using `send_function_call_result()`.
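The dispatch over message types might look like the following sketch; the `handled` list is an illustrative trace standing in for the real side effects (`send_fc_session_update()`, `handle_function_call()`), not part of the original code:

```python
import base64
import json

audio_buffer = bytearray()
handled = []  # trace of actions taken, for illustration only

def clear_audio_buffer():
    audio_buffer.clear()
    handled.append("cleared")

def stop_audio_playback():
    handled.append("playback_stopped")

def handle_server_event(raw: str):
    """Dispatch one WebSocket message by its `type` field."""
    event = json.loads(raw)
    etype = event.get("type")
    if etype == "session.created":
        handled.append("session_update_sent")      # would call send_fc_session_update()
    elif etype == "input_audio_buffer.speech_started":
        clear_audio_buffer()                       # drop stale playback data
        stop_audio_playback()                      # halt the speaker stream
    elif etype == "response.audio.delta":
        audio_buffer.extend(base64.b64decode(event["delta"]))
    elif etype == "response.audio.done":
        handled.append("audio_done")
    elif etype == "response.function_call_arguments.done":
        handled.append("function_call")            # would call handle_function_call(event)
```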
**Send Session Configuration Updates**
- `send_fc_session_update()` configures specific session parameters, such as the AI's tone, speaking speed, language, and audio format. This ensures the WebSocket session maintains the desired interaction behavior and settings.
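A sketch of the kind of payload `send_fc_session_update()` might build. The field names follow the Realtime API's `session.update` event, but the specific values (instructions text, voice name, transcription model) are illustrative assumptions, not taken from the source:

```python
import json

def build_session_update() -> str:
    """Build a session.update event configuring voice, audio format,
    transcription, and server-side VAD (illustrative values)."""
    return json.dumps({
        "type": "session.update",
        "session": {
            "instructions": "Respond concisely and in a friendly tone.",
            "voice": "alloy",
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
            "turn_detection": {"type": "server_vad"},
            "input_audio_transcription": {"model": "whisper-1"},
        },
    })

# The result would be sent once, right after the session.created event:
# ws.send(build_session_update())
```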
Keep Audio Streams Active
- The main loop continuously checks the activity status of both audio streams (
mic_streamandspeaker_stream). It usesis_active()to confirm both are running, with a 0.1-second pause in each loop cycle to conserve resources.
**Exception Handling and Interrupt Monitoring**
- If a `KeyboardInterrupt` is detected, `stop_event` is set, safely signaling all threads to terminate audio streaming and WebSocket communication.
**Shutdown and Release Resources**
- The `finally` block handles stream closure and WebSocket disconnection. `mic_stream` and `speaker_stream` are stopped and closed via `stop_stream()` and `close()`, while `p.terminate()` releases the PyAudio resources.
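The main loop, interrupt handling, and cleanup from the last three steps can be sketched together. Passing the stream and `p` objects in as parameters (an arrangement assumed here, not stated in the source) lets the shutdown path be exercised without audio hardware:

```python
import threading
import time

stop_event = threading.Event()

def run_until_interrupted(mic_stream, speaker_stream, p, ws=None):
    """Main loop: run while both streams are active; on Ctrl+C, set
    stop_event so worker threads exit, then release everything."""
    try:
        while mic_stream.is_active() and speaker_stream.is_active():
            time.sleep(0.1)  # short pause each cycle to conserve CPU
    except KeyboardInterrupt:
        stop_event.set()     # signals the sender/receiver threads to stop
    finally:
        for stream in (mic_stream, speaker_stream):
            stream.stop_stream()
            stream.close()
        if ws is not None:
            ws.close()
        p.terminate()        # release PyAudio resources
```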
Forked from `fuwei007/OpenAI-RealtimeAPIDemoPythonPC`.