fix: stop sending audio to Omi backend when custom STT is active (prevents listening-minute consumption) #6634
When a custom STT provider (Deepgram, Local Whisper, custom endpoint) is configured, CompositeTranscriptionSocket opens two WebSocket connections simultaneously and sends every raw audio chunk to both:

- primarySocket → custom STT provider (intended)
- secondarySocket → api.omi.me/v4/listen (unintended side-effect)

The secondary connection to /v4/listen causes the Omi backend to transcribe the audio stream in parallel, consuming listening minutes from the user's quota even though the custom provider is handling all transcription. This makes the "bring your own STT" feature misleading: minutes are counted regardless of which provider is used.

Fix: add a skipAudioToSecondary flag to CompositeTranscriptionSocket. When true, raw audio bytes go only to the primary socket. The secondary socket still connects and receives forwarded transcript JSON (_forwardAsSuggestedTranscript), so conversation saving, AI processing, and memory extraction on the Omi backend continue to work normally. Only the audio transcription, and with it the minute counting, is skipped.

Set skipAudioToSecondary: true unconditionally in _createCompositeService since the composite is only constructed when a custom STT config is active.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
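The routing described in this commit message can be sketched roughly as follows. This is a minimal illustration and not the code from composite_transcription_socket.dart: TranscriptSocket, RecordingSocket, and CompositeSocketSketch are hypothetical stand-ins, and the real class has connection handling and listeners that are omitted here.

```dart
/// Hypothetical minimal socket interface (an assumption for this sketch;
/// the app's real socket classes expose more than this).
abstract class TranscriptSocket {
  void send(Object message);
}

/// Test double that records every message it is asked to send.
class RecordingSocket implements TranscriptSocket {
  final List<Object> sent = [];

  @override
  void send(Object message) => sent.add(message);
}

/// Simplified stand-in for CompositeTranscriptionSocket.
class CompositeSocketSketch {
  CompositeSocketSketch({
    required this.primarySocket,
    required this.secondarySocket,
    this.skipAudioToSecondary = false,
  });

  final TranscriptSocket primarySocket; // custom STT provider
  final TranscriptSocket secondarySocket; // Omi backend
  final bool skipAudioToSecondary;

  /// Raw audio always reaches the primary socket. It reaches the
  /// secondary socket only when the flag is false.
  void send(Object message) {
    primarySocket.send(message);
    if (!skipAudioToSecondary) {
      secondarySocket.send(message);
    }
  }

  /// Transcript forwarding writes to the secondary socket directly,
  /// so it is not affected by the flag.
  void forwardAsSuggestedTranscript(String transcriptJson) {
    secondarySocket.send(transcriptJson);
  }
}
```

With the flag set to true, an audio chunk passed to send() shows up only in the primary socket's record, while a forwarded transcript still shows up in the secondary socket's record.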
Greptile Summary
This PR fixes a bug where
Confidence Score: 5/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant App as Flutter App
participant CS as CompositeTranscriptionSocket
participant PS as primarySocket<br/>(Custom STT)
participant SS as secondarySocket<br/>(Omi /v4/listen)
Note over App,SS: Before fix — audio sent to BOTH sockets
App->>CS: send(audioChunk)
CS->>PS: send(audioChunk)
PS-->>CS: transcript JSON
CS->>SS: _forwardAsSuggestedTranscript(transcript)
CS->>SS: send(audioChunk) ❌ metered
Note over App,SS: After fix — skipAudioToSecondary: true
App->>CS: send(audioChunk)
CS->>PS: send(audioChunk)
PS-->>CS: transcript JSON
CS->>SS: _forwardAsSuggestedTranscript(transcript) ✅
Note over CS,SS: secondarySocket.send(audioChunk) skipped ✅
Note over SS: Omi backend processes forwarded<br/>transcripts only — no listening minutes consumed
…alive

Two Greptile P2 review fixes:

1. Rename flag: skipAudioToSecondary → skipSendToSecondary
   The flag guards the send() path, not audio specifically. Any message passed to send() is skipped for the secondary socket. The new name reflects what is actually skipped (the send() call) rather than implying audio-type inspection. _forwardAsSuggestedTranscript calls secondarySocket.send() directly and is unaffected by this flag.

2. Document keepalive: PureSocket sets pingInterval=20s on the underlying IOWebSocketChannel, so WebSocket protocol-level pings fire every 20 seconds regardless of application data. The secondary socket stays alive during silence without any additional keep-alive logic.

Addresses Greptile P2 comments on PR BasedHardware#6634.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
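The keepalive point in item 2 can be illustrated with dart:io's WebSocket, which exposes a pingInterval property. This is only a sketch: the commit refers to PureSocket and IOWebSocketChannel, whereas connectWithKeepalive and keepalivePingInterval below are hypothetical names, and dart:io is used directly so the example needs no packages.

```dart
import 'dart:io';

/// The 20-second interval named in the commit message.
const Duration keepalivePingInterval = Duration(seconds: 20);

/// Connects to [url] and enables protocol-level pings.
/// Hypothetical helper, not a function from the Omi codebase.
Future<WebSocket> connectWithKeepalive(String url) async {
  final socket = await WebSocket.connect(url);
  // dart:io sends a ping at this interval and closes the connection
  // if the peer does not answer with a pong.
  socket.pingInterval = keepalivePingInterval;
  return socket;
}
```

Because the pings are sent at the protocol level, they do not depend on any application data passing through send(), which is why skipping send() on the secondary socket should not by itself cause a network idle timeout.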
Addressing Greptile P2 comments: both fixed in commit 6a04ebf

P2-1: Flag naming

P2-2: Secondary socket keepalive during silence

If the Omi backend enforces an application-level audio-data idle timeout (distinct from a network/WebSocket idle timeout), that would be a separate backend-side concern worth raising, but standard WebSocket keepalives should handle the common case.
Summary for human reviewers

This PR fixes a silent billing issue with the custom STT feature.

What was happening

When a user configures a custom transcription provider (Deepgram, Local Whisper, or a custom endpoint) in Settings → Developer Options → Transcription, the app was still streaming raw audio to Omi's backend. Omi's own support confirmed this behaviour:

What this PR does

Adds a

What is not affected
The Greptile bot reviewed this at 5/5 confidence and marked it safe to merge. The two comments it raised (flag naming and keepalive) were addressed in commit

Fixes #6637
Bug
When a custom STT provider (Deepgram, Local Whisper, or a custom endpoint) is configured, CompositeTranscriptionSocket opens two WebSocket connections simultaneously and sends every raw audio chunk to both:

- primarySocket → custom STT provider (intended)
- secondarySocket → api.omi.me/v4/listen (unintended side-effect)

The secondary connection causes the Omi backend to transcribe the audio in parallel, consuming listening minutes from the user's quota even though the custom provider is handling all transcription.

This makes the "bring your own STT" feature misleading. Per Omi's own support:

The custom_stt=enabled query flag tells the backend to use forwarded transcripts instead of its own transcription output, but it does not stop the audio stream from being received and metered.

Root Cause

composite_transcription_socket.dart send() (lines 141-147 before this fix):

Fix

Add a skipAudioToSecondary flag to CompositeTranscriptionSocket. When true:

- raw audio bytes go only to the primary socket
- the secondary socket still connects and receives forwarded transcript JSON via _forwardAsSuggestedTranscript

Set skipAudioToSecondary: true unconditionally in _createCompositeService since the composite path is only reached when a custom STT config is active.

Behaviour After Fix

Changes

2 files, 19 lines added, 1 line changed.

- composite_transcription_socket.dart: skipAudioToSecondary field + conditional in send()
- transcription_service.dart: skipAudioToSecondary: true in _createCompositeService