Skip to content

Add experimental floating bar voice answers#6244

Merged
kodjima33 merged 1 commit into
mainfrom
nik/floating-bar-voice-test
Apr 1, 2026
Merged

Add experimental floating bar voice answers#6244
kodjima33 merged 1 commit into
mainfrom
nik/floating-bar-voice-test

Conversation

@kodjima33
Copy link
Copy Markdown
Collaborator

Summary

  • add experimental spoken floating-bar replies with ElevenLabs and macOS fallback
  • move the Voice Answers toggle to Advanced > Developer API Keys
  • sync floating-bar voice settings through assistant settings so dev builds can reconnect automatically

Verification

  • swift build
  • cargo check
  • launched Omi Dev locally
  • copied build to Mac mini and launched there

@kodjima33 kodjima33 merged commit 097b4c9 into main Apr 1, 2026
3 checks passed
@kodjima33 kodjima33 deleted the nik/floating-bar-voice-test branch April 1, 2026 06:17
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 1, 2026

Greptile Summary

This PR adds an experimental, dev-build-only "Voice Answers" feature to the floating bar: AI responses are spoken aloud using ElevenLabs TTS (with a macOS AVSpeechSynthesizer fallback), and the ElevenLabs API key / voice ID / toggle state are synced through the backend so dev builds reconnect automatically. The Rust backend gains a new floating_bar sub-map in assistant_settings with length-validated key storage, and the Swift client gains a new FloatingBarVoicePlaybackService singleton and Settings UI wired into SettingsSyncManager.

Key findings:

  • P1 — API key erasure on fresh install (SettingsSyncManager.swift:165-166): the full-snapshot builder uses ?? \"\" when no ElevenLabs key exists locally, sending an explicit empty string to Firestore. The backend's Option::or() merge prefers new values over existing ones, so a first-launch sync from a fresh device silently overwrites keys saved by other devices. Removing the ?? \"\" (leaving the field nil) is the fix.
  • P1 — Force-unwrap crash on malformed voice ID (FloatingBarVoicePlaybackService.swift:96): the voice ID is interpolated into a URL path and then force-unwrapped with !; a value with a space or other invalid URL character causes an immediate crash. Percent-encoding the path component and using a guard let would resolve this.
  • P2 — Markdown characters read aloud (FloatingBarVoicePlaybackService.swift:146): cleanedPlaybackText collapses whitespace but leaves **bold**, *italic*, headers, and inline code intact, causing TTS to read asterisks and hash characters literally.
  • P2 — No AVAudioPlayerDelegate for natural completion (FloatingBarVoicePlaybackService.swift:64): audio data and the player object are retained in memory until the next explicit stop() call because no delegate is set to nil the reference when playback finishes naturally.
  • P2 — Network write on every keystroke (SettingsPage.swift:573): the ElevenLabs key and voice ID bindings call pushPartialUpdate synchronously on each character, generating one backend write per keypress.

Confidence Score: 3/5

Not safe to merge as-is — full-snapshot sync can silently erase ElevenLabs API keys stored from other devices, and a malformed voice ID will crash the app.

Two P1 issues: (1) the ?? "" fallback in buildCurrentSnapshot causes empty-string overwrites of valid backend API keys on any fresh install, and (2) the unguarded URL force-unwrap with a non-percent-encoded voice ID path is a crash vector. Both are straightforward one-line fixes but need to land before this ships, even in dev builds.

SettingsSyncManager.swift (lines 165-166) and FloatingBarVoicePlaybackService.swift (line 96) need fixes before merge.

Important Files Changed

Filename Overview
desktop/Desktop/Sources/ProactiveAssistants/Services/SettingsSyncManager.swift Adds floating bar voice settings to full-snapshot sync; uses ?? "" fallback that clobbers valid backend API keys from other devices on fresh installs (P1)
desktop/Desktop/Sources/FloatingControlBar/FloatingBarVoicePlaybackService.swift New TTS service integrating ElevenLabs and macOS fallback; has a force-unwrap crash risk from un-encoded voiceID in URL, unstripped markdown in TTS text, and no AVAudioPlayerDelegate for cleanup
desktop/Backend-Rust/src/services/firestore.rs Adds Firestore read/write for floating_bar sub-map; merge logic using Option::or() is correct for partial updates
desktop/Backend-Rust/src/routes/users.rs Adds length validation for ElevenLabs API key (≤512) and voice ID (≤128) before persistence; does not validate path-segment character safety
desktop/Desktop/Sources/MainWindow/Pages/SettingsPage.swift Adds ElevenLabs key/voice ID fields and voice-answers toggle to Advanced > Developer Keys; synced bindings fire a backend write on every keystroke without debounce
desktop/Desktop/Sources/FloatingControlBar/FloatingControlBarWindow.swift Integrates voice playback service: stops on new query start and triggers playback after AI response is complete
desktop/Desktop/Sources/FloatingControlBar/ShortcutSettings.swift Adds floatingBarVoiceAnswersEnabled persisted toggle; stops playback when toggled off
desktop/Desktop/Sources/FloatingControlBar/PushToTalkManager.swift Stops voice playback when PTT listening starts or enters locked-listening state — correct interrupt behaviour
desktop/Desktop/Sources/APIClient.swift Adds FloatingBarSettingsResponse Codable struct and wires it into AssistantSettingsResponse; coding keys correctly map snake_case fields

Sequence Diagram

sequenceDiagram
    participant User
    participant SettingsPage
    participant SettingsSyncManager
    participant BackendRust as Backend (Rust)
    participant Firestore
    participant FloatingBar as FloatingControlBarManager
    participant VoiceService as FloatingBarVoicePlaybackService
    participant ElevenLabs

    User->>SettingsPage: Toggles Voice Answers / enters API key
    SettingsPage->>SettingsSyncManager: pushPartialUpdate(floatingBar)
    SettingsSyncManager->>BackendRust: PATCH /assistant-settings
    BackendRust->>Firestore: Merge floating_bar sub-map

    Note over SettingsSyncManager,Firestore: On app launch / reconnect
    BackendRust->>SettingsSyncManager: GET /assistant-settings
    SettingsSyncManager->>ShortcutSettings: floatingBarVoiceAnswersEnabled = v
    SettingsSyncManager->>UserDefaults: set elevenlabs_api_key, voice_id

    User->>FloatingBar: Ask Omi shortcut
    FloatingBar->>VoiceService: stop() (interrupt previous)
    FloatingBar->>BackendRust: AI query
    BackendRust-->>FloatingBar: AI response
    FloatingBar->>VoiceService: playResponseIfEnabled(message)
    alt ElevenLabs key present
        VoiceService->>ElevenLabs: POST /v1/text-to-speech/{voiceID}
        ElevenLabs-->>VoiceService: MP3 audio data
        VoiceService->>VoiceService: AVAudioPlayer.play()
    else No key / API error
        VoiceService->>VoiceService: AVSpeechSynthesizer fallback
    end

    User->>FloatingBar: New PTT / Ask Omi
    FloatingBar->>VoiceService: stop() (interrupt)
Loading

Comments Outside Diff (1)

  1. desktop/Desktop/Sources/MainWindow/Pages/SettingsPage.swift, line 573-584 (link)

    P2 pushPartialUpdate fires on every keystroke (no debounce)

    syncedElevenLabsKeyBinding and syncedElevenLabsVoiceIDBinding call SettingsSyncManager.shared.pushPartialUpdate(...) synchronously inside the binding's set closure, which fires on each character typed. This sends a network request (and Firestore write) for every single keypress.

    Consider adding a debounce mechanism (e.g. a DispatchWorkItem with a ~0.5 s delay, cancelling the previous one on each keystroke) to batch rapid edits into a single backend write.

Reviews (1): Last reviewed commit: "Add experimental floating bar voice answ..." | Re-trigger Greptile

Comment on lines +165 to +166
elevenLabsApiKey: UserDefaults.standard.string(forKey: FloatingBarVoicePlaybackService.devAPIKeyDefaultsKey) ?? "",
elevenLabsVoiceID: UserDefaults.standard.string(forKey: FloatingBarVoicePlaybackService.devVoiceIDDefaultsKey) ?? ""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 ?? "" clobbers backend API key on fresh installs

UserDefaults.standard.string(forKey:) returns nil when the key has never been set on this device. The ?? "" fallback converts that nil to an empty string "", which means a full-snapshot push from a freshly installed build sends Some("") for both elevenLabsApiKey and elevenLabsVoiceID.

On the backend in firestore.rs, Option::or() selects the new value when it is Some(...), so Some("") takes precedence over the existing Some("valid-key") already stored in Firestore. The result is that keys stored by another device are silently erased the first time a fresh install does a full push.

Since FloatingBarSettingsResponse.elevenLabsApiKey is already Optional<String>, passing nil is the correct "no value here" signal — the backend merge logic will then leave the Firestore value intact. Remove the ?? "" fallback from both lines.

}

private nonisolated static func synthesizeSpeech(text: String, apiKey: String, voiceID: String) async throws -> Data {
var request = URLRequest(url: URL(string: "https://api.elevenlabs.io/v1/text-to-speech/\(voiceID)")!)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Force-unwrap crash if voiceID makes URL invalid

voiceID is interpolated directly into the URL string without percent-encoding, then force-unwrapped with !. If the stored voice ID contains a space, newline, or any character that makes URL(string:) return nil (e.g. a value pasted with trailing whitespace that survived the trim, or a garbage value synced from the backend), the process will crash on this line.

The backend validates length (≤ 128 chars) but does not validate that the value is a well-formed path segment.

Suggested change
var request = URLRequest(url: URL(string: "https://api.elevenlabs.io/v1/text-to-speech/\(voiceID)")!)
guard let url = URL(string: "https://api.elevenlabs.io/v1/text-to-speech/\(voiceID.addingPercentEncoding(withAllowedCharacters: .urlPathAllowed) ?? voiceID)") else {
throw FloatingBarVoicePlaybackError.invalidResponse
}
var request = URLRequest(url: url)

Comment on lines +146 to +147
let collapsedWhitespace = baseText.replacingOccurrences(of: "\\s+", with: " ", options: .regularExpression)
return collapsedWhitespace.trimmingCharacters(in: .whitespacesAndNewlines)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Markdown formatting characters are read aloud by TTS

cleanedPlaybackText collapses whitespace but does not strip markdown syntax characters. AI responses commonly contain **bold**, *italic*, `code`, ### headers, - bullets, and [links](url). When passed to either ElevenLabs or AVSpeechSynthesizer, these will be spoken literally (e.g. "asterisk asterisk important asterisk asterisk"), which degrades TTS quality significantly.

Consider a basic markdown-stripping pass before returning the text.

Comment on lines +64 to +73
private func startPlayback(_ data: Data) {
do {
let player = try AVAudioPlayer(data: data)
player.prepareToPlay()
player.play()
audioPlayer = player
} catch {
log("FloatingBarVoicePlaybackService: could not start audio playback: \(error.localizedDescription)")
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 AVAudioPlayer not released after natural playback completion

audioPlayer is set in startPlayback and only cleared in stop() (called on new query, toggle-off, or cancel). When audio finishes playing on its own, no cleanup fires: audioPlayer and the decoded MP3 buffer stay in memory until the next interaction.

Set self as the player's delegate and nil the reference in audioPlayerDidFinishPlaying(_:successfully:) to release the buffer promptly.

Glucksberg pushed a commit to Glucksberg/omi-local that referenced this pull request Apr 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant