
fix(coordinator): marshal capsule show/hide to main thread [high] #389

Merged
appergb merged 3 commits into beta from fix/audit-emit-capsule-mainthread on May 10, 2026

Conversation

@appergb
Collaborator

@appergb appergb commented May 9, 2026

User description

Summary

`emit_capsule` runs window state changes (`show()` / `hide()` / position helpers) inline on whichever thread invoked it. The level handler in `coordinator/dictation.rs` invokes `emit_capsule` at ~30 Hz from inside cpal's `process_callback` — i.e. the audio callback thread. NSWindow / HWND APIs expect the main thread; calling them from the audio thread risks:

  • macOS: `dispatch_assert_queue_fail` SIGTRAP if the AppKit runloop assertion fires
  • Windows: `SendMessage` deadlock against the GUI thread
  • Either platform: the audio callback misses its deadline and triggers `kAudioUnitErr_TooManyFramesToProcess` (macOS) / dropped frames (Windows)

The 30 Hz throttle bounds frequency but doesn't change per-call risk.

Change

Wrap the window-touching half of `emit_capsule` in `app.run_on_main_thread`:

  • `window.show` / `window.hide`
  • `maybe_position_capsule_bottom_center`
  • `show_capsule_window_no_activate`
  • `restore_main_window_key_if_active` (macOS-only)

`app.emit_to` stays inline since Tauri's event bus is thread-safe. Closure captures `Arc` clone + `AppHandle` clone + Copy primitives (`translation`, `show_capsule`, `visible`) — no `CapsulePayload` clone needed.

The pattern is already established in this file — see `stop_qa_hotkey_listener` (line 326) for the same reasoning around Carbon hotkey drop running off the AppKit thread.
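
A minimal sketch of the resulting shape — the window label, preference field, visibility rule, and helper signatures are assumptions from this description rather than the actual coordinator.rs code, and the window-lookup API varies between Tauri versions:

    // Sketch only — `Inner`, the "capsule" window label, the prefs field, and the
    // helper names are assumptions from the PR description, not the real source.
    fn emit_capsule(&self, app: &AppHandle, state: CapsuleState, translation: bool) {
        let visible = matches!(state, CapsuleState::Active); // assumed visibility rule

        // (a) Event to the JS frontend — Tauri's event bus is thread-safe,
        // so this stays inline on the calling (audio) thread.
        let _ = app.emit_to(
            "capsule",
            "capsule:state",
            CapsulePayload { state, translation, visible },
        );

        // (b) OS window work is marshalled to the main thread.
        let inner = Arc::clone(&self.inner);
        let app_handle = app.clone();
        let _ = app.run_on_main_thread(move || {
            // Read the preference here so the latest user setting wins.
            let show_capsule = inner.prefs.lock().show_capsule;
            if let Some(window) = app_handle.get_webview_window("capsule") {
                if visible && show_capsule {
                    maybe_position_capsule_bottom_center(&window);
                    show_capsule_window_no_activate(&window);
                } else {
                    let _ = window.hide();
                    #[cfg(target_os = "macos")]
                    restore_main_window_key_if_active(&app_handle);
                }
            }
        });
    }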

Why this is safe

  • emit_to and run_on_main_thread are independent — the JS frontend processes the event asynchronously regardless, so the existing happens-before between "frontend hears event" and "OS window visible" is preserved (both were already async from JS's perspective).
  • run_on_main_thread is fire-and-forget; emit_capsule's existing semantics (returns immediately, no result) are unchanged.
  • Window state changes for begin/end/cancel paths still run — just slightly later (one main-runloop turn). At human time scales (<16 ms) this is invisible.

Audit linkage

Audit ID 3.2.2 (CONFIRMED high). See `docs/audit-2026-05-10-validated.md` (local).

Test plan

  • `cargo test --lib` — 31/31 pass.
  • `cargo check --lib` clean.
  • CI build — to be verified.
  • Manual functional verification: dictate for ≥30 s on macOS while watching for SIGTRAP / Console.app crash logs; verify capsule still appears at session start. To be done after merge.

PR Type

Bug fix, Enhancement


Description

  • Marshal window ops to main thread to avoid audio-thread crash
  • Read show_capsule pref inside main-thread closure
  • Capture visible/translation at call site for frame consistency
  • Keep emit_to synchronous on calling thread


Diagram Walkthrough

flowchart LR
  A["emit_capsule (audio thread)"] --> B["compute visible, capture translation"]
  B --> C["run_on_main_thread closure"]
  C --> D["read latest show_capsule pref"]
  D --> E["maybe_position, show/hide capsule window"]
  B --> F["app.emit_to (inline, thread-safe)"]

File Walkthrough

Relevant files
Bug fix
coordinator.rs
Thread-safe capsule window ops and fresh preference reading

openless-all/app/src-tauri/src/coordinator.rs

  • Wraps capsule window show/hide/positioning calls in
    app.run_on_main_thread to prevent main-thread-only API violations from
    the audio callback thread.
  • Moves the show_capsule preference read inside the main-thread closure
    so the latest user setting is always used, eliminating stale-pref
    flashes.
  • Computes visible and loads translation before spawning the closure,
    aligning window state with the actual capsule payload for the current
    frame.
  • Preserves the synchronous app.emit_to call on the calling thread
    (Tauri event bus is thread-safe), maintaining the existing
    happens‑before order between event delivery and window visibility.
+26/-8   

emit_capsule does two things — (a) push a CapsulePayload event to the
JS frontend via app.emit_to, and (b) flip the OS-level window visibility
via window.show/hide and reposition it via maybe_position_capsule_*.
Path (a) is safe from any thread (Tauri event bus marshals internally).
Path (b) calls NSWindow / HWND APIs that expect the main thread —
but emit_capsule is invoked at ~30 Hz from the cpal process_callback
(audio thread) via the level_handler in coordinator/dictation.rs.

On macOS this risks dispatch_assert_queue_fail SIGTRAP if the AppKit
runloop assertion fires; on Windows SendMessage can deadlock against
the GUI thread; in either case the audio callback can miss its deadline
and trigger kAudioUnitErr_TooManyFramesToProcess on macOS. The 30 Hz
throttle limits frequency but does not change per-call risk.

Fix: wrap window.show / window.hide / maybe_position_capsule_bottom_center
/ show_capsule_window_no_activate / restore_main_window_key_if_active
in app.run_on_main_thread. The closure captures an Arc<Inner> clone +
AppHandle clone + Copy primitives (state, translation, show_capsule,
visible) — no payload clone needed. emit_to stays on the calling thread
since it is internally thread-safe, preserving the existing happens-
before relationship between event payload and window state on the JS
frontend (frontend processes events asynchronously regardless).

This pattern is already established in the codebase — see
coordinator.rs::stop_qa_hotkey_listener (line 326) for the same
reasoning around Carbon hotkey drop on the wrong thread (issue #169).

Audit ID 3.2.2 (CONFIRMED high).

Test: 31 lib tests pass. Functional verification (no SIGTRAP on long
recordings, capsule still appears at start of dictation) requires
running the app — to be done after PR ships.
@github-actions

github-actions Bot commented May 9, 2026

PR Reviewer Guide 🔍

(Review updated until commit e458605)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ No major issues detected

H-Chris233 previously approved these changes May 9, 2026
@H-Chris233 H-Chris233 dismissed their stale review May 9, 2026 23:35

Is there a problem...?

Addresses pr_agent "Stale State" review on PR #389. show_capsule was
read on the audio callback thread and snapshotted into the
run_on_main_thread closure. If the user toggled the preference between
the audio-thread emit and the main-thread closure firing (one or two
frames later), the queued window show/hide could apply the old
preference and briefly flash the wrong UI state.

Move that read inside the closure so the latest user pref always wins.

Why visible / translation stay snapshotted: they're the payload of
THIS specific capsule:state event — the whole point is to tell the JS
frontend "this is the state at the moment emit_capsule was called".
Re-reading them inside the closure would mean the OS-level window state
and the JS-side payload disagree (the closure lands one frame later
than the emit_to that already shipped).

No behavior change in the steady state; eliminates a 16–33 ms
stale-pref window when the user changes show_capsule mid-dictation.
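
A condensed before/after of what gets captured, with assumed field paths and a hypothetical apply helper (the real code inlines the window calls described in the PR):

    // Before (stale-pref risk): the preference was snapshotted on the audio
    // thread, so it could be one or two frames old when the closure ran:
    //     let show_capsule = inner.prefs.lock().show_capsule;
    //     let _ = app.run_on_main_thread(move || apply(visible, translation, show_capsule));
    //
    // After: snapshot only this frame's payload (visible, translation) at the
    // call site and read the preference when the closure actually runs.
    let inner = Arc::clone(&self.inner);
    let _ = app.run_on_main_thread(move || {
        let show_capsule = inner.prefs.lock().show_capsule; // freshest user pref
        apply_capsule_window_state(visible, translation, show_capsule); // hypothetical helper
    });
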
@appergb
Collaborator Author

appergb commented May 10, 2026

Handling pr_agent feedback

"Stale State" — fixed

84ee3d9: moved the show_capsule prefs read inside the run_on_main_thread closure, eliminating the 16–33 ms window where a change to the show_capsule preference only took effect on the next frame. visible / translation are still computed at the call site — they are the payload of this frame's capsule:state event and must stay in sync with the payload delivered to JS, so they cannot be re-read when the closure runs.

"Ticket compliance ❌ #169" — false positive

issue #169 was already fixed in commit 7ac0421a (closes #169, 2026-05-02). Current state of beta:

  • coordinator.rs:326-340 — stop_qa_hotkey_listener already wraps its teardown in app.run_on_main_thread:
    pub fn stop_qa_hotkey_listener(&self) {
        let app = self.inner.app.lock().clone();
        if let Some(app) = app {
            let inner = Arc::clone(&self.inner);
            let _ = app.run_on_main_thread(move || {
                inner.qa_hotkey.lock().take();
            });
        } else {
            self.inner.qa_hotkey.lock().take();
        }
    }
  • lib.rs:347-355 — the RunEvent::Exit call chain already strings all six stop_*_hotkey_listener calls together: stop_hotkey_listener / stop_qa_hotkey_listener / stop_combo_hotkey_listener / stop_translation_hotkey_listener / stop_switch_style_hotkey_listener / stop_open_app_hotkey_listener (see the sketch below)
  • The last four sibling functions go through take_*_on_main_thread helpers, each of which uses app.run_on_main_thread to marshal the Carbon teardown.

PR #389 addresses audit 3.2.2 (emit_capsule touching NSWindow on the audio callback thread); issue #169 (Carbon RemoveEventHotKey dropped off the main thread during RunEvent::Exit) is a separate matter. The PR diff leaving the hotkey teardown path untouched is by design, not a compliance gap.
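
For reference, the RunEvent::Exit chain mentioned in the second bullet has roughly this shape — a sketch, not the exact lib.rs:347-355 code:

    // Sketch of the exit-time teardown chain; the six calls are the ones
    // listed above, the surrounding structure is assumed.
    app.run(move |_app_handle, event| {
        if let tauri::RunEvent::Exit = event {
            coordinator.stop_hotkey_listener();
            coordinator.stop_qa_hotkey_listener();
            coordinator.stop_combo_hotkey_listener();
            coordinator.stop_translation_hotkey_listener();
            coordinator.stop_switch_style_hotkey_listener();
            coordinator.stop_open_app_hotkey_listener();
        }
    });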

@github-actions

Persistent review updated to latest commit 84ee3d9

@github-actions

Persistent review updated to latest commit e458605

@appergb appergb merged commit 892f3c4 into beta May 10, 2026
4 checks passed
@appergb appergb deleted the fix/audit-emit-capsule-mainthread branch May 10, 2026 00:41
pull Bot pushed a commit to yimmy23/openless that referenced this pull request May 10, 2026
10 PRs landed on beta this cycle:
- Open-Less#377 paste shortcut configurable (issue Open-Less#360)
- Open-Less#386 TS UserPreferences updateChannel alignment
- Open-Less#387 focus_target leak on Processing-phase cancel
- Open-Less#388 [critical] MacHotkeyAdapter::shutdown stops CFRunLoop + tap
- Open-Less#389 emit_capsule window.show/hide off audio thread
- Open-Less#390 QA / dictation hotkey routing race
- Open-Less#391 audio-mute spawn_blocking (async hygiene)
- Open-Less#392 hotkey supervisor + global dispatcher exit signal
- Open-Less#393 post-audit logic-review hotfixes (QA mute .await + focus_target Processing branch)
- Open-Less#394 in-process credentials cache (kills repeated Keychain prompts)

Bump 4 files: package.json, tauri.conf.json, Cargo.toml, Cargo.lock.
appergb pushed a commit that referenced this pull request May 10, 2026
Snapshot two audit artifacts produced 2026-05-10:

- docs/audit-2026-05-10-validated.md — 21 audit items run against
  the live source tree at openless-all/app/src-tauri/src/ with a
  CONFIRMED / FALSE_POSITIVE / PARTIAL verdict per item plus the
  exact file:line range that supports each verdict.

- docs/logic-review-2026-05-10.md — 5-path logic review of
  origin/beta @ 400097a covering startup, press/release, end-of-
  session, capsule emit, and shutdown. Notable: at 400097a the
  emit_capsule fix from PR #389 was NOT yet an ancestor, so the
  cpal audio-callback → AppKit/Win32 SIGTRAP risk was still live
  on that snapshot (subsequent merges have addressed it; this doc
  remains the dated record).

Both files are pure snapshots — no code changes here.
appergb added a commit that referenced this pull request May 10, 2026
* chore: ignore video-materials/ promo recordings

video-materials/ holds ~2.2 GB of promo recordings (promo /
promo-materials / promo-openless / promo-openless-v2 / SC). These
are local working assets that don't belong in the repo. Add to
.gitignore so future stash / commit operations can't accidentally
pull the binary blob in.

* feat(llm): Google Gemini provider with forced thinking-off + fix provider-switch credentials

Add Google Gemini as a native LLM provider hitting v1beta
:generateContent (single-shot) and :streamGenerateContent?alt=sse
(QA chat) with x-goog-api-key auth. Per-model thinkingConfig is
injected to keep thinking off for every supported ID — exact
matrix verified against ai.google.dev/gemini-api/docs/thinking:

  gemini-2.5-flash, gemini-2.5-flash-lite     thinkingBudget: 0
  gemini-2.5-pro                              field omitted (Google: cannot disable)
  gemini-3*-pro*                              thinkingLevel: "low"
  gemini-3*-flash*, unknown future IDs        thinkingLevel: "minimal"
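
As an illustration of how that matrix might map onto the request body — the function name and shape here are illustrative, not the actual llm_gemini.rs API:

    // Illustrative sketch of the per-model thinking-off table above.
    fn thinking_config_for(model_id: &str) -> Option<serde_json::Value> {
        use serde_json::json;
        if model_id == "gemini-2.5-pro" {
            None // cannot be disabled per Google; omit the field entirely
        } else if model_id.starts_with("gemini-2.5-flash") {
            Some(json!({ "thinkingBudget": 0 })) // covers -flash and -flash-lite
        } else if model_id.starts_with("gemini-3") && model_id.contains("pro") {
            Some(json!({ "thinkingLevel": "low" }))
        } else {
            Some(json!({ "thinkingLevel": "minimal" })) // 3.x flash and unknown future IDs
        }
    }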

Why native, not the OpenAI-compat shim: shim is beta and for 3.x
needs an extra_body workaround for thinking control; auth header
and contents shape differ enough that reusing
OpenAICompatibleLLMProvider would have been the worse compromise.

fix(settings): switching LLM preset previously only auto-filled
endpoint/model when those slots were empty. All providers share
ark.endpoint / ark.model_id (per-provider entries were never
introduced — see persistence.rs lookup_account), so for returning
users with non-empty slots, switching the dropdown updated active
provider but kept the OLD endpoint — polish silently kept calling
the previous provider. Now switching to any non-custom preset
overwrites endpoint/model to that preset's defaults. Trade-off:
user-customized endpoints within a preset are dropped on switch;
silent wrong-routing was a worse failure mode.

Backend:
- New module llm_gemini.rs (15 unit tests for thinking-config table,
  URL builder, response parser, SSE shape, history role mapping)
- polish.rs extracts compose_polish_prompts / compose_translate_prompts
  / compose_qa_system_prompt as pub(crate) helpers shared by both
  LLM paths so prompt assembly can't drift
- coordinator.rs's polish_text / translate_text / answer_chat_dispatch
  open with `if get_active_llm() == "gemini"` early-return; existing
  OpenAI-compat callsite stays untouched

Frontend:
- LLM_PRESETS gains gemini entry (baseUrl v1beta, model placeholder
  gemini-2.5-flash)
- ProviderTools hidden when committedLlmProvider === 'gemini'
  (its "Fetch models" button assumes OpenAI /models shape; native
  /v1beta/models response shape doesn't match)
- Static geminiModelHint lists the 7 available IDs so users pick
  correctly without auto-fetch
- 5 locales (zh-CN, zh-TW, en, ja, ko) gain presets.gemini and
  geminiModelHint

Test plan:
- cargo test --lib                          200/200 pass
- npm run build (tsc + vite)                green
- Manual: switch to Gemini in Settings,
  paste Google AI Studio key, dictate;
  verify polish reaches v1beta and respects
  the per-model thinking-off contract.

* docs: 2026-05-10 audit + beta logic review snapshots

Snapshot two audit artifacts produced 2026-05-10:

- docs/audit-2026-05-10-validated.md — 21 audit items run against
  the live source tree at openless-all/app/src-tauri/src/ with a
  CONFIRMED / FALSE_POSITIVE / PARTIAL verdict per item plus the
  exact file:line range that supports each verdict.

- docs/logic-review-2026-05-10.md — 5-path logic review of
  origin/beta @ 400097a covering startup, press/release, end-of-
  session, capsule emit, and shutdown. Notable: at 400097a the
  emit_capsule fix from PR #389 was NOT yet an ancestor, so the
  cpal audio-callback → AppKit/Win32 SIGTRAP risk was still live
  on that snapshot (subsequent merges have addressed it; this doc
  remains the dated record).

Both files are pure snapshots — no code changes here.

* fix(llm-gemini): buffer SSE bytes across chunks to handle multibyte UTF-8 split

PR #398 pr_agent flagged: reqwest::chunk() can split a multi-byte
UTF-8 codepoint (CJK / emoji) across two network chunks, but the
streaming code decoded each chunk independently with from_utf8 →
half a 你 in the tail of one chunk would surface as
"non-utf8 SSE chunk" and abort the whole stream. This breaks
Gemini QA streaming for any Chinese-language answer that doesn't
happen to align with packet boundaries.

Fix: buffer raw bytes (Vec<u8>) until we see the SSE delimiter
b"\n\n", and only then decode the complete event content as UTF-8.
The delimiter \n\n is two ASCII bytes (0x0A 0x0A) which can never
appear inside a multi-byte UTF-8 sequence, so byte-level splitting
at that boundary is always safe.

Extracted the parsing as drain_complete_sse_events(&mut Vec<u8>) so
the invariant is testable without a TcpListener. Three new unit
tests cover:
- normal split-on-delimiter + remainder buffering
- multibyte UTF-8 split across chunks (the actual regression)
- invalid-UTF-8 event content gracefully skipped (warn, don't crash)

The same byte-level pattern fix applies to polish.rs's
chat_completion_history_streaming — left untouched here to keep
this PR scoped to the Gemini diff pr_agent reviewed. polish.rs
should get the same fix in a follow-up.
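
A minimal sketch of the buffering invariant described above — the real drain_complete_sse_events may differ in signature and edge-case handling, and a later commit extends it to CRLF delimiters:

    // Sketch: drain complete SSE events (terminated by b"\n\n") from a byte
    // buffer, leaving any incomplete tail — including a half-received
    // multi-byte UTF-8 codepoint — in place for the next network chunk.
    fn drain_complete_sse_events(buf: &mut Vec<u8>) -> Vec<String> {
        let mut events = Vec::new();
        // b"\n\n" is two ASCII bytes, so it can never fall inside a
        // multi-byte UTF-8 sequence; splitting on it is always safe.
        while let Some(pos) = buf.windows(2).position(|w| w == b"\n\n") {
            let event: Vec<u8> = buf.drain(..pos + 2).collect();
            match String::from_utf8(event[..pos].to_vec()) {
                Ok(text) => events.push(text),
                Err(_) => eprintln!("skipping non-utf8 SSE event"), // warn, don't abort
            }
        }
        events
    }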

* feat(llm-gemini): teach ProviderTools to fetch native Gemini /v1beta/models

Drop the verbose static geminiModelHint paragraph and restore the
normal ProviderTools "Fetch models" button when the active provider
is Gemini. Users now read live model IDs from Google instead of
staring at a wall of i18n text below their model field.

Backend fetch_provider_models now branches on the base URL:
- Google's generativelanguage.googleapis.com → x-goog-api-key auth +
  parse_gemini_model_ids() that walks `models[].name` and strips the
  `models/` prefix so the result is a plain `gemini-2.5-flash` style
  ID directly writable to ark.model_id.
- Anything else → unchanged Bearer auth + OpenAI `data[].id` shape.

Tests: parse_gemini_model_ids_strips_models_prefix_and_dedups +
is_gemini_base_url_matches_official_domain. cargo test --lib green
(32/32 in commands::tests subset).

i18n: settings.providers.geminiModelHint removed from all 5 locales.
The TS `typeof zhCN` constraint enforces the consistent removal.

* fix(llm-gemini): pr_agent #398 follow-ups — filter non-chat models + CRLF SSE delim

Two follow-up fixes for pr_agent advisories on PR #398:

1. parse_gemini_model_ids now filters by `supportedGenerationMethods`.
   Google's /v1beta/models lists every family — embeddings,
   TTS-image, predict-only — alongside generateContent-capable
   chat models. Without filtering, a "Fetch models" click would
   surface ids like `gemini-embedding-2`, and a user picking one
   would write it into ark.model_id and have polish silently fail
   on every dictation. Field is conservatively included when
   missing (some preview models don't expose it yet).

2. drain_complete_sse_events now accepts both `\n\n` and `\r\n\r\n`
   as event delimiters. Some HTTP/2 middleware and CDNs normalize
   line endings to CRLF; the previous LF-only implementation would
   never drain a complete event from those servers and report the
   stream as empty even when data was received. Tied positions
   can't happen by construction (\r\n\r\n contains no \n\n
   substring), so "earliest match wins" is unambiguous.

Tests:
- parse_gemini_model_ids_filters_out_non_generate_content_families
- drain_complete_sse_events_handles_crlf_delimiter
- drain_complete_sse_events_picks_earliest_delimiter_when_mixed
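
A rough sketch of the supportedGenerationMethods filter from point 1 — the JSON field names follow Google's documented /v1beta/models response, but the function shape and prefix handling here are assumptions:

    // Sketch: keep only generateContent-capable models, strip the "models/"
    // prefix, and dedup. Models that omit supportedGenerationMethods are
    // conservatively kept, matching the behavior described above.
    fn parse_gemini_model_ids(body: &serde_json::Value) -> Vec<String> {
        let mut ids = Vec::new();
        for model in body["models"].as_array().into_iter().flatten() {
            if let Some(methods) = model["supportedGenerationMethods"].as_array() {
                if !methods.iter().any(|m| m.as_str() == Some("generateContent")) {
                    continue; // embeddings, TTS, predict-only families
                }
            }
            if let Some(name) = model["name"].as_str() {
                let id = name.strip_prefix("models/").unwrap_or(name).to_string();
                if !ids.contains(&id) {
                    ids.push(id);
                }
            }
        }
        ids
    }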

---------

Co-authored-by: baiqing <lbx12309@icloud.com>