feat(desktop): listening-state toggle with persistence and global hotkey (#6649) #7199

Open
mvanhorn wants to merge 3 commits into BasedHardware:main from mvanhorn:feat/6649-desktop-listening-toggle

mvanhorn commented May 6, 2026

Summary

Adds a one-tap pause/resume toggle for conversation listening on the desktop app, persisted across restarts and bound to a rebindable global hotkey (default ⌘⇧L).

Why this matters

Issue #6649 asked for granular control over when the app is actively listening, with an obvious indicator and a hotkey. Today users have to power off the device or quit the app to stop conversation capture. This adds three discoverable surfaces (sidebar pill, floating control bar, hotkey) so a user can pause sensitive conversations without losing the device session.

Demo

Simulated demo (Remotion) of the toggle states. I could not build the actual desktop app in my workspace: the build is gated on an Apple Developer ID signing identity, and the SwiftPM dependency clones failed. This is therefore a programmatic mock that uses the real Omi color palette and layout, not a live capture.

demo

Changes

  • AppState.isConversationListening (@Published + @AppStorage("omi.listening.enabled")), toggleListening(source:), setListening(_:source:). Default on first launch is true (preserves existing behavior).
  • Pause gracefully stops the in-flight transcription via the existing stopTranscription() path. Microphone and BLE remain open. A guard in startTranscription() declines to start while paused, so auto-start triggers (BLE reconnect, user action) are no-ops until the user resumes.
  • New global shortcut Toggle Listening (default ⌘⇧L), rebindable via Shortcuts settings. GlobalShortcutManager registers the hotkey and posts Notification.Name.toggleListeningShortcutPressed; AppState listens and calls toggleListening(source: "hotkey").
  • Sidebar listening pill (ear.fill / ear.slash.fill, success/warning colors), floating control bar status pill, and tap-to-toggle dot on the compact bar.
  • AnalyticsManager.listeningToggled emits a Sentry breadcrumb (category: "listening") and a PostHog listening_toggled event with state and source properties.
  • Push-to-talk (PushToTalkManager) is independent and continues to work while listening is paused.
  • Added desktop/Desktop/Tests/AppStateListeningTests.swift covering toggle + persistence across AppState reloads.
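The hotkey path in the list above (shortcut manager posts a notification, the state object observes it and toggles) can be sketched as follows. This is a minimal illustration, not the PR's code: `ListeningModel` is a hypothetical stand-in for AppState, the raw notification string is an assumption, and the Carbon hotkey registration is left out entirely.

```swift
import Foundation

extension Notification.Name {
    // Name taken from the PR description; the raw string value is an assumption.
    static let toggleListeningShortcutPressed =
        Notification.Name("toggleListeningShortcutPressed")
}

// Hypothetical stand-in for AppState, reduced to the toggle itself.
// Marked @unchecked Sendable only because this sketch is single-threaded.
final class ListeningModel: @unchecked Sendable {
    private(set) var isListening = true
    private(set) var lastSource: String?
    private let center: NotificationCenter
    private var token: NSObjectProtocol?

    init(center: NotificationCenter = .default) {
        self.center = center
        // queue: nil runs the block synchronously on the posting thread.
        token = center.addObserver(
            forName: .toggleListeningShortcutPressed, object: nil, queue: nil
        ) { [weak self] _ in
            self?.toggleListening(source: "hotkey")
        }
    }

    deinit {
        if let token = token { center.removeObserver(token) }
    }

    func toggleListening(source: String) {
        isListening.toggle()
        lastSource = source
    }
}
```

Posting `.toggleListeningShortcutPressed` on the same `NotificationCenter` flips `isListening` and records `"hotkey"` as the source, which is the same shape of event the analytics call would receive.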

Note

Touches FloatingControlBarWindow.swift, which my open #6770 also modifies. The change here is additive (new init parameter); rebase from whichever lands first.

Fixes #6649

mvanhorn added 2 commits May 6, 2026 11:43
…key (BasedHardware#6649)

Adds a one-tap toggle to pause/resume conversation listening on the
desktop app, with an obvious state indicator, persistence across
restarts, and a user-rebindable global hotkey.

- AppState publishes isConversationListening, persisted via @AppStorage
- Audio forwarding gates on the flag at the AppState callback for both
  the BLE conversation handler (BleAudioService) and the local mic
  capture (AudioCaptureService) - mic device and BLE socket stay open
- Push-to-talk continues to work while listening is paused
- Sidebar pill button toggles state (ear.fill / ear.slash.fill, green/orange)
- Floating control bar gains a status button + tap-to-toggle compact dot
- New global shortcut Toggle Listening (default Cmd-Shift-L), rebindable
  via Shortcuts settings, posts toggleListeningShortcutPressed
- AnalyticsManager.listeningToggled emits Sentry breadcrumb +
  PostHog listening_toggled event with state and source

Default on first launch: on (preserves existing behavior).

Closes BasedHardware#6649

When listening is paused, gracefully end the in-flight transcription
session via stopTranscription() instead of silently dropping frames.
The /v4/listen backend times out sessions after about 90 seconds
without client activity, so a long pause would otherwise close the
WebSocket while AppState still treated the same recording as active.

Adds a guard at startTranscription() so auto-start triggers (BLE
reconnect, user action) decline to start while paused. Frame-drop
guards at the audio callbacks stay in place as defensive belts.

Addresses self-review P1.

greptile-apps Bot commented May 6, 2026

Greptile Summary

This PR adds a listening-state toggle for the desktop app: a @Published / @AppStorage-backed isConversationListening flag on AppState, a rebindable global hotkey (⌘⇧L), and three UI surfaces (sidebar pill, floating bar status pill, compact bar dot) that let users pause/resume conversation capture without disconnecting the device or quitting.

  • Pause path correctly calls stopTranscription() and guards both the audioMixer callback and the new BLE conversationAudioHandler closure so no audio reaches the transcription service while paused.
  • Resume path only sets the isConversationListening flag — it does not call startTranscription(). After toggling back to "Listening," the UI shows the active state but recording has not restarted; the user must press play manually.
  • Both audio-delivery closures read isConversationListening from background threads on a @MainActor-isolated class, which is an unsound cross-actor access.

Confidence Score: 3/5

The pause direction works correctly, but resuming listening leaves the transcription pipeline stopped while the UI reports Listening, so users silently lose recordings after every pause/resume cycle.

The resume path in setListening sets the flag but never calls startTranscription(), meaning the device appears active to the user but captures nothing. Both audio-delivery closures also read a @MainActor-isolated property from background audio threads without actor hopping. The rest of the change — shortcut wiring, persistence, analytics, UI surfaces — is well-structured and follows existing patterns.

AppState.swift — specifically setListening (resume branch) and the two audio-callback closures that bypass main-actor isolation.

Important Files Changed

| Filename | Overview |
| --- | --- |
| desktop/Desktop/Sources/AppState.swift | Adds listening toggle logic and persistence; resume path does not restart transcription (UI shows Listening but recording is stopped); audio-thread closures read @MainActor-isolated property without isolation. |
| desktop/Desktop/Sources/Audio/BleAudioService.swift | Adds conversationAudioHandler parameter to pipe BLE PCM audio through a guarded closure instead of directly to transcriptionService; cleanup on stopProcessing is correct. |
| desktop/Desktop/Sources/FloatingControlBar/GlobalShortcutManager.swift | Adds toggleListening hot-key registration (ID 3) and re-registration observer; follows the existing Ask Omi pattern correctly. |
| desktop/Desktop/Sources/FloatingControlBar/ShortcutSettings.swift | Adds toggleListeningShortcut and toggleListeningEnabled stored properties with correct UserDefaults persistence and change-notification plumbing. |
| desktop/Desktop/Sources/FloatingControlBar/FloatingControlBarView.swift | Adds listening status pill and toggleable compact dot using appState injection; visually communicates state, no logic issues found. |
| desktop/Desktop/Sources/FloatingControlBar/FloatingControlBarWindow.swift | Adds appState parameter to init and threads it through to FloatingControlBarView; straightforward plumbing change. |
| desktop/Desktop/Sources/MainWindow/SidebarView.swift | Adds listening toggle pill to the expanded sidebar using OmiColors.success/warning; reads correctly from appState.isConversationListening. |
| desktop/Desktop/Sources/MainWindow/Pages/ShortcutsSettingsSection.swift | Adds toggleListeningKeyCard to the shortcuts settings page, following the exact same preset/custom/disable pattern used by Ask Omi. |
| desktop/Desktop/Sources/AnalyticsManager.swift | Adds listeningToggled() that emits a Sentry breadcrumb and a PostHog event; correctly mirrors the pattern used by other analytics methods. |
| desktop/Desktop/Tests/AppStateListeningTests.swift | Covers toggle + persistence across AppState reloads; does not test the resume-restart behavior or the data-race path. |
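For readers who want a concrete picture of the "toggle + persistence across reloads" behavior that the test file is said to cover, here is a rough sketch. It assumes a hypothetical `ListeningPreference` wrapper over plain `UserDefaults` rather than the PR's `@AppStorage` property; only the key string comes from the PR description.

```swift
import Foundation

// Hypothetical wrapper; the PR itself persists through @AppStorage on AppState.
struct ListeningPreference {
    static let key = "omi.listening.enabled"
    let defaults: UserDefaults

    var isListening: Bool {
        get {
            // No stored value means first launch: default to listening on.
            guard defaults.object(forKey: Self.key) != nil else { return true }
            return defaults.bool(forKey: Self.key)
        }
        nonmutating set { defaults.set(newValue, forKey: Self.key) }
    }
}
```

Creating a second `ListeningPreference` over the same `UserDefaults` instance stands in for an AppState reload: it should report whatever the first instance last wrote, and `true` when nothing has been written yet.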

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    U([User action: UI / hotkey / sidebar]) --> TL[toggleListening]
    TL --> SL[setListening on/off]

    SL -->|on = false| ST[stopTranscription]
    ST --> SAC[stopAudioCapture + clearTranscriptionState]
    SAC --> FLAG_OFF[isConversationListening = false\npersisted to UserDefaults]

    SL -->|on = true| FLAG_ON[isConversationListening = true\npersisted to UserDefaults]
    FLAG_ON -. missing call .-> NOSTART[startTranscription NOT called - recording stays stopped]

    FLAG_OFF --> MIC_GUARD[audioMixer callback - guard isConversationListening from background thread]
    FLAG_OFF --> BLE_GUARD[conversationAudioHandler - guard isConversationListening from background thread]

    MIC_GUARD -->|blocked| DROP1[audio dropped]
    BLE_GUARD -->|blocked| DROP2[audio dropped]

    GSM[GlobalShortcutManager Carbon hotkey] --> NC[NotificationCenter toggleListeningShortcutPressed]
    NC --> TL

Reviews (1): Last reviewed commit: "fix(desktop): stop transcription on paus..."

Comment on lines +1244 to +1258
func setListening(_ on: Bool, source: String = "ui") {
    let previous = isConversationListening
    guard previous != on else { return }

    if !on && isTranscribing {
        stopTranscription()
    }

    isConversationListening = on
    persistedConversationListening = on
    UserDefaults.standard.set(on, forKey: Self.conversationListeningDefaultsKey)

    AnalyticsManager.shared.listeningToggled(isListening: on, source: source)
    log("listening: \(previous ? "on" : "off") -> \(on ? "on" : "off") (source=\(source))")
}

P1 Resume doesn't restart transcription

setListening(false) calls stopTranscription(), which calls stopAudioCapture() and clearTranscriptionState() — fully tearing down the pipeline and setting isTranscribing = false. But setListening(true) only flips the flag; it never calls startTranscription(). After the user resumes, isConversationListening is true and the sidebar shows "Listening," yet no recording is happening. The user must manually press play to restart, which silently contradicts the visual affordance of the toggle.
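One way the symmetric behavior could look is sketched below. `ListeningController` is a hypothetical, heavily simplified model (no audio, no persistence, no analytics), and `hasAudioSource` is an invented stand-in for whatever device and microphone checks the real `startTranscription()` performs.

```swift
// Hypothetical, simplified model of the pause/resume flow; not the PR's AppState.
final class ListeningController {
    private(set) var isConversationListening = true
    private(set) var isTranscribing = false
    var hasAudioSource = true  // assumed stand-in for device/mic checks

    func startTranscription() {
        // Decline to start while paused, without a source, or if already running.
        guard isConversationListening, hasAudioSource, !isTranscribing else { return }
        isTranscribing = true
    }

    func stopTranscription() {
        isTranscribing = false
    }

    func setListening(_ on: Bool) {
        guard isConversationListening != on else { return }
        if !on && isTranscribing {
            stopTranscription()
        }
        isConversationListening = on
        // Resume branch: the flag is set first so the guard in
        // startTranscription() lets the restart through.
        if on && !isTranscribing {
            startTranscription()
        }
    }

    func toggleListening() {
        setListening(!isConversationListening)
    }
}
```

The ordering matters in this sketch: if the restart were attempted before the flag flipped, the paused guard would turn it into a no-op and the UI would again report "Listening" with nothing recording.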

Comment on lines 1465 to 1470
// Start the mixer — it sums mic + system into a mono stream and forwards it to
// the transcription WebSocket.
audioMixer?.start { [weak self] monoMixed in
    guard self?.isConversationListening == true else { return }
    self?.transcriptionService?.sendAudio(monoMixed)
}

P1 Cross-actor read of @MainActor-isolated property from audio thread

AppState is @MainActor-isolated, so isConversationListening must only be accessed from the main thread. The closure passed to audioMixer?.start runs on whichever background thread the audio subsystem uses, making self?.isConversationListening a cross-actor access. A matching problem exists in the conversationAudioHandler closure at the BLE path (line 1566). Under Swift 6 strict concurrency this is a compile error; under Swift 5.x it is a runtime data race that can yield stale reads. Dispatch the check to the main actor, or promote both closures to @MainActor to keep the isolation correct.

Comment thread desktop/Desktop/Sources/AppState.swift Outdated
Comment on lines +1252 to +1254
isConversationListening = on
persistedConversationListening = on
UserDefaults.standard.set(on, forKey: Self.conversationListeningDefaultsKey)

P2 Triple-write to the same UserDefaults key

@AppStorage already synchronises persistedConversationListening to UserDefaults automatically. The explicit UserDefaults.standard.set(on, forKey:) call that follows it is therefore redundant and could mislead future readers into thinking @AppStorage doesn't update the store itself.

Suggested change
- isConversationListening = on
- persistedConversationListening = on
- UserDefaults.standard.set(on, forKey: Self.conversationListeningDefaultsKey)
+ isConversationListening = on
+ persistedConversationListening = on

…or isolation, UserDefaults)

- P1 (resume doesn't restart): setListening(true) now calls startTranscription()
  when not already transcribing. The existing startTranscription guards
  (isConversationListening + device/mic) keep the no-op cases safe.
- P1 (cross-actor read on audio thread): added a nonisolated NSLock-guarded
  conversationListeningSnapshot, updated from setListening on the MainActor.
  Both the audio mixer closure (~line 1483) and the BLE conversationAudioHandler
  (~line 1582) now read snapshotIsConversationListening() instead of the
  @MainActor-isolated isConversationListening, removing the runtime data race
  under Swift 5 and the strict-concurrency error under Swift 6 without paying a
  main-actor hop per audio chunk.
- P2 (triple-write to UserDefaults): dropped the explicit
  UserDefaults.standard.set call; @AppStorage already syncs
  persistedConversationListening, and the new helper updates the in-memory
  snapshot only.

mvanhorn commented May 7, 2026

Addressed in 7609650:

  • P1 (resume doesn't restart): setListening(true) now calls startTranscription() when not already transcribing; startTranscription's existing guards keep no-permission / no-device cases safe.
  • P1 (cross-actor read): added a nonisolated NSLock-guarded conversationListeningSnapshot mirror, updated from MainActor in setListening. Both audio closures (mixer at ~1483, BLE conversationAudioHandler at ~1582) now read snapshotIsConversationListening() instead of the @MainActor-isolated property. Removes the runtime race under Swift 5 and the strict-concurrency error under Swift 6 without paying a MainActor hop per audio chunk.
  • P2 (triple-write to UserDefaults): dropped the explicit UserDefaults.standard.set call; @AppStorage is the sole writer of persistedConversationListening. The new setConversationListeningSnapshot only touches the in-memory mirror.
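The lock-guarded mirror described in the second bullet can be illustrated roughly like this. It is a standalone sketch under assumptions: `ListeningSnapshot` and `forward(_:gate:sink:)` are hypothetical names, and the commit keeps its snapshot inside AppState rather than in a separate class.

```swift
import Foundation

// Hypothetical standalone mirror of the listening flag.
// @unchecked Sendable is justified here because every access takes the lock.
final class ListeningSnapshot: @unchecked Sendable {
    private let lock = NSLock()
    private var value: Bool

    init(initial: Bool) {
        value = initial
    }

    // Called by the owner of the real flag whenever it changes.
    func set(_ newValue: Bool) {
        lock.lock()
        defer { lock.unlock() }
        value = newValue
    }

    // Safe to call from any thread, such as an audio callback.
    func get() -> Bool {
        lock.lock()
        defer { lock.unlock() }
        return value
    }
}

// Stand-in for an audio callback: forwards a chunk only while listening.
func forward(_ chunk: [Int16], gate: ListeningSnapshot, sink: ([Int16]) -> Void) {
    guard gate.get() else { return }
    sink(chunk)
}
```

The trade-off is that each audio chunk pays for one uncontended lock acquisition instead of a hop to the main actor, and the reader never touches actor-isolated state.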

Verified locally: swiftc -parse passes on AppState.swift. CI should run the full Lint & Format Check on push; full Xcode build wasn't reproducible in my local environment (no Xcode toolchain matching the workflow's Flutter image) so I'm relying on CI for the link/build verdict.

Development

Successfully merging this pull request may close these issues.

Feature request: quick toggle + granular control for “listening” state