Add experimental floating bar voice answers #6244
Greptile Summary
This PR adds an experimental, dev-build-only "Voice Answers" feature to the floating bar: AI responses are spoken aloud using ElevenLabs TTS (with a macOS
Key findings:
Confidence Score: 3/5
Not safe to merge as-is: full-snapshot sync can silently erase ElevenLabs API keys stored from other devices, and a malformed voice ID will crash the app. Two P1 issues: (1) the
Important Files Changed
Sequence Diagram
sequenceDiagram
participant User
participant SettingsPage
participant SettingsSyncManager
participant BackendRust as Backend (Rust)
participant Firestore
participant FloatingBar as FloatingControlBarManager
participant VoiceService as FloatingBarVoicePlaybackService
participant ElevenLabs
User->>SettingsPage: Toggles Voice Answers / enters API key
SettingsPage->>SettingsSyncManager: pushPartialUpdate(floatingBar)
SettingsSyncManager->>BackendRust: PATCH /assistant-settings
BackendRust->>Firestore: Merge floating_bar sub-map
Note over SettingsSyncManager,Firestore: On app launch / reconnect
BackendRust->>SettingsSyncManager: GET /assistant-settings
SettingsSyncManager->>ShortcutSettings: floatingBarVoiceAnswersEnabled = v
SettingsSyncManager->>UserDefaults: set elevenlabs_api_key, voice_id
User->>FloatingBar: Ask Omi shortcut
FloatingBar->>VoiceService: stop() (interrupt previous)
FloatingBar->>BackendRust: AI query
BackendRust-->>FloatingBar: AI response
FloatingBar->>VoiceService: playResponseIfEnabled(message)
alt ElevenLabs key present
VoiceService->>ElevenLabs: POST /v1/text-to-speech/{voiceID}
ElevenLabs-->>VoiceService: MP3 audio data
VoiceService->>VoiceService: AVAudioPlayer.play()
else No key / API error
VoiceService->>VoiceService: AVSpeechSynthesizer fallback
end
User->>FloatingBar: New PTT / Ask Omi
FloatingBar->>VoiceService: stop() (interrupt)
    elevenLabsApiKey: UserDefaults.standard.string(forKey: FloatingBarVoicePlaybackService.devAPIKeyDefaultsKey) ?? "",
    elevenLabsVoiceID: UserDefaults.standard.string(forKey: FloatingBarVoicePlaybackService.devVoiceIDDefaultsKey) ?? ""
?? "" clobbers backend API key on fresh installs
UserDefaults.standard.string(forKey:) returns nil when the key has never been set on this device. The ?? "" fallback converts that nil to an empty string "", which means a full-snapshot push from a freshly installed build sends Some("") for both elevenLabsApiKey and elevenLabsVoiceID.
On the backend in firestore.rs, Option::or() selects the new value when it is Some(...), so Some("") takes precedence over the existing Some("valid-key") already stored in Firestore. The result is that keys stored by another device are silently erased the first time a fresh install does a full push.
Since FloatingBarSettingsResponse.elevenLabsApiKey is already Optional<String>, passing nil is the correct "no value here" signal — the backend merge logic will then leave the Firestore value intact. Remove the ?? "" fallback from both lines.
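The difference between sending `""` and sending `nil` can be illustrated with a minimal Swift sketch. The `merged(incoming:stored:)` function is a hypothetical stand-in for the backend merge described above (the real logic lives in Rust and is not shown in this thread); it only assumes that a non-nil incoming value wins over the stored one.

```swift
/// Hypothetical Swift analogue of the merge described for firestore.rs:
/// a non-nil incoming value replaces the stored one, nil keeps it.
func merged(incoming: String?, stored: String?) -> String? {
    incoming ?? stored
}

let storedKey: String? = "valid-key"
// What UserDefaults.standard.string(forKey:) returns when the key was never set.
let freshInstallValue: String? = nil

// With the `?? ""` fallback, the empty string is a real value and replaces the stored key.
let clobbered = merged(incoming: freshInstallValue ?? "", stored: storedKey)

// Without the fallback, nil means "no value here" and the stored key survives.
let preserved = merged(incoming: freshInstallValue, stored: storedKey)

print(clobbered == "", preserved == "valid-key")
```

Under that assumption, dropping the fallback is enough for a fresh install to leave the stored key alone.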
    }

    private nonisolated static func synthesizeSpeech(text: String, apiKey: String, voiceID: String) async throws -> Data {
        var request = URLRequest(url: URL(string: "https://api.elevenlabs.io/v1/text-to-speech/\(voiceID)")!)
Force-unwrap crash if voiceID makes URL invalid
voiceID is interpolated directly into the URL string without percent-encoding, then force-unwrapped with !. If the stored voice ID contains a space, newline, or any character that makes URL(string:) return nil (e.g. a value pasted with trailing whitespace that survived the trim, or a garbage value synced from the backend), the process will crash on this line.
The backend validates length (≤ 128 chars) but does not validate that the value is a well-formed path segment.
Suggested change:
    - var request = URLRequest(url: URL(string: "https://api.elevenlabs.io/v1/text-to-speech/\(voiceID)")!)
    + guard let url = URL(string: "https://api.elevenlabs.io/v1/text-to-speech/\(voiceID.addingPercentEncoding(withAllowedCharacters: .urlPathAllowed) ?? voiceID)") else {
    +     throw FloatingBarVoicePlaybackError.invalidResponse
    + }
    + var request = URLRequest(url: url)
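A slightly stricter variant of the same idea, as a hedged sketch: it trims the ID, rejects empty input, and also percent-encodes `/`, which `.urlPathAllowed` would otherwise let through and which could change the request path. `makeSpeechURL` and `VoiceURLError` are hypothetical names, not part of the PR.

```swift
import Foundation

// Hypothetical error type for this sketch.
enum VoiceURLError: Error { case invalidVoiceID }

/// Builds the text-to-speech URL without force-unwrapping.
func makeSpeechURL(voiceID: String) throws -> URL {
    let trimmed = voiceID.trimmingCharacters(in: .whitespacesAndNewlines)

    // Treat the ID as a single path segment: "/" must be escaped too.
    var allowed = CharacterSet.urlPathAllowed
    allowed.remove(charactersIn: "/")

    guard !trimmed.isEmpty,
          let encoded = trimmed.addingPercentEncoding(withAllowedCharacters: allowed),
          let url = URL(string: "https://api.elevenlabs.io/v1/text-to-speech/\(encoded)")
    else {
        throw VoiceURLError.invalidVoiceID
    }
    return url
}
```

A caller would then write `let url = try makeSpeechURL(voiceID: voiceID)` and fall back to the system synthesizer if it throws.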
    let collapsedWhitespace = baseText.replacingOccurrences(of: "\\s+", with: " ", options: .regularExpression)
    return collapsedWhitespace.trimmingCharacters(in: .whitespacesAndNewlines)
Markdown formatting characters are read aloud by TTS
cleanedPlaybackText collapses whitespace but does not strip markdown syntax characters. AI responses commonly contain **bold**, *italic*, `code`, ### headers, - bullets, and [links](url). When passed to either ElevenLabs or AVSpeechSynthesizer, these will be spoken literally (e.g. "asterisk asterisk important asterisk asterisk"), which degrades TTS quality significantly.
Consider a basic markdown-stripping pass before returning the text.
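One possible shape for such a pass, as a rough sketch rather than a full markdown parser. `strippedMarkdown` is a hypothetical helper. It would need to run before the whitespace collapse, because the header and bullet rules rely on line starts. The italic rule is deliberately naive and would also eat the asterisks in plain text like `2 * 3 * 4`.

```swift
import Foundation

/// Removes common markdown syntax so it is not read aloud.
/// Rules are applied in order; each is (regex pattern, replacement template).
func strippedMarkdown(_ text: String) -> String {
    let rules: [(String, String)] = [
        ("\\[([^\\]]+)\\]\\([^)]*\\)", "$1"),   // [label](url) -> label
        ("(?m)^[ \\t]{0,3}#{1,6}[ \\t]+", ""),  // leading "### " headers
        ("(?m)^[ \\t]*[-*+][ \\t]+", ""),       // leading "- ", "* ", "+ " bullets
        ("(\\*\\*|__)(.+?)\\1", "$2"),          // **bold** or __bold__
        ("\\*(.+?)\\*", "$1"),                  // *italic*
        ("`+", "")                              // inline code and fence backticks
    ]
    return rules.reduce(text) { result, rule in
        result.replacingOccurrences(of: rule.0, with: rule.1, options: .regularExpression)
    }
}

let sample = "### Title\n- **bold** and *it* with `code` and [link](http://x.y)"
print(strippedMarkdown(sample))
```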
    private func startPlayback(_ data: Data) {
        do {
            let player = try AVAudioPlayer(data: data)
            player.prepareToPlay()
            player.play()
            audioPlayer = player
        } catch {
            log("FloatingBarVoicePlaybackService: could not start audio playback: \(error.localizedDescription)")
        }
    }
AVAudioPlayer not released after natural playback completion
audioPlayer is set in startPlayback and only cleared in stop() (called on new query, toggle-off, or cancel). When audio finishes playing on its own, no cleanup fires: audioPlayer and the decoded MP3 buffer stay in memory until the next interaction.
Set self as the player's delegate and nil the reference in audioPlayerDidFinishPlaying(_:successfully:) to release the buffer promptly.
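A minimal sketch of that delegate wiring, assuming the service can inherit from `NSObject`. `PlaybackHolder` is a hypothetical stand-in for the playback part of `FloatingBarVoicePlaybackService`, and actor isolation is left out for brevity.

```swift
import AVFoundation

/// Hypothetical stand-in for the service's playback state.
final class PlaybackHolder: NSObject, AVAudioPlayerDelegate {
    private(set) var audioPlayer: AVAudioPlayer?

    func startPlayback(_ data: Data) {
        do {
            let player = try AVAudioPlayer(data: data)
            player.delegate = self   // the delegate reference is weak, so no retain cycle
            player.prepareToPlay()
            player.play()
            audioPlayer = player
        } catch {
            print("could not start audio playback: \(error.localizedDescription)")
        }
    }

    func stop() {
        audioPlayer?.stop()
        audioPlayer = nil
    }

    // Called when playback ends on its own (not when stop() is called).
    func audioPlayerDidFinishPlaying(_ player: AVAudioPlayer, successfully flag: Bool) {
        // Only clear the reference if a newer player has not replaced this one.
        if audioPlayer === player { audioPlayer = nil }
    }

    // Release the player on decode errors as well.
    func audioPlayerDecodeErrorDidOccur(_ player: AVAudioPlayer, error: Error?) {
        if audioPlayer === player { audioPlayer = nil }
    }
}
```

The identity check guards against a late callback from an old player clearing a newer one that started in the meantime.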
Summary
Verification