
fix(coordinator): marshal capsule show/hide to main thread [high] #389

Merged
appergb merged 3 commits into beta from fix/audit-emit-capsule-mainthread on May 10, 2026

Conversation

@appergb
Collaborator

@appergb appergb commented May 9, 2026

User description

Summary

`emit_capsule` runs window state changes (`show()` / `hide()` / position helpers) inline on whichever thread invoked it. The level handler in `coordinator/dictation.rs` invokes `emit_capsule` at ~30 Hz from inside cpal's `process_callback` — i.e. the audio callback thread. NSWindow / HWND APIs expect the main thread; calling them from the audio thread risks:

  • macOS: `dispatch_assert_queue_fail` SIGTRAP if the AppKit runloop assertion fires
  • Windows: `SendMessage` deadlock against the GUI thread
  • Either platform: the audio callback misses its deadline and triggers `kAudioUnitErr_TooManyFramesToProcess` (macOS) / dropped frames (Windows)

The 30 Hz throttle bounds frequency but doesn't change per-call risk.

Change

Wrap the window-touching half of `emit_capsule` in `app.run_on_main_thread`:

  • `window.show` / `window.hide`
  • `maybe_position_capsule_bottom_center`
  • `show_capsule_window_no_activate`
  • `restore_main_window_key_if_active` (macOS-only)

`app.emit_to` stays inline since Tauri's event bus is thread-safe. Closure captures `Arc` clone + `AppHandle` clone + Copy primitives (`translation`, `show_capsule`, `visible`) — no `CapsulePayload` clone needed.

The pattern is already established in this file — see `stop_qa_hotkey_listener` (line 326) for the same reasoning around Carbon hotkey drop running off the AppKit thread.
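
A minimal sketch of the resulting shape — the window label, preference field, visibility rule, and helper signatures are assumptions from this description rather than the actual coordinator.rs code, and the window-lookup API varies between Tauri versions:

    // Sketch only — `Inner`, the "capsule" window label, the prefs field, and the
    // helper names are assumptions from the PR description, not the real source.
    fn emit_capsule(&self, app: &AppHandle, state: CapsuleState, translation: bool) {
        let visible = matches!(state, CapsuleState::Active); // assumed visibility rule

        // (a) Event to the JS frontend — Tauri's event bus is thread-safe,
        // so this stays inline on the calling (audio) thread.
        let _ = app.emit_to(
            "capsule",
            "capsule:state",
            CapsulePayload { state, translation, visible },
        );

        // (b) OS window work is marshalled to the main thread.
        let inner = Arc::clone(&self.inner);
        let app_handle = app.clone();
        let _ = app.run_on_main_thread(move || {
            // Read the preference here so the latest user setting wins.
            let show_capsule = inner.prefs.lock().show_capsule;
            if let Some(window) = app_handle.get_webview_window("capsule") {
                if visible && show_capsule {
                    maybe_position_capsule_bottom_center(&window);
                    show_capsule_window_no_activate(&window);
                } else {
                    let _ = window.hide();
                    #[cfg(target_os = "macos")]
                    restore_main_window_key_if_active(&app_handle);
                }
            }
        });
    }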

Why this is safe

  • emit_to and run_on_main_thread are independent — the JS frontend processes the event asynchronously regardless, so the existing happens-before between "frontend hears event" and "OS window visible" is preserved (both were already async from JS's perspective).
  • run_on_main_thread is fire-and-forget; emit_capsule's existing semantics (returns immediately, no result) are unchanged.
  • Window state changes for begin/end/cancel paths still run — just slightly later (one main-runloop turn). At human time scales (<16 ms) this is invisible.

Audit linkage

Audit ID 3.2.2 (CONFIRMED high). See `docs/audit-2026-05-10-validated.md` (local).

Test plan

  • `cargo test --lib` — 31/31 pass.
  • `cargo check --lib` clean.
  • CI build — to be verified.
  • Manual functional verification: dictate for ≥30 s on macOS while watching for SIGTRAP / Console.app crash logs; verify capsule still appears at session start. To be done after merge.

PR Type

Bug fix, Enhancement


Description

  • Marshal window ops to main thread to avoid audio-thread crash
  • Read show_capsule pref inside main-thread closure
  • Capture visible/translation at call site for frame consistency
  • Keep emit_to synchronous on calling thread


Diagram Walkthrough

flowchart LR
  A["emit_capsule (audio thread)"] --> B["compute visible, capture translation"]
  B --> C["run_on_main_thread closure"]
  C --> D["read latest show_capsule pref"]
  D --> E["maybe_position, show/hide capsule window"]
  B --> F["app.emit_to (inline, thread-safe)"]

File Walkthrough

Relevant files
Bug fix
coordinator.rs
Thread-safe capsule window ops and fresh preference reading

openless-all/app/src-tauri/src/coordinator.rs

  • Wraps capsule window show/hide/positioning calls in
    app.run_on_main_thread to prevent main-thread-only API violations from
    the audio callback thread.
  • Moves the show_capsule preference read inside the main-thread closure
    so the latest user setting is always used, eliminating stale-pref
    flashes.
  • Computes visible and loads translation before spawning the closure,
    aligning window state with the actual capsule payload for the current
    frame.
  • Preserves the synchronous app.emit_to call on the calling thread
    (Tauri event bus is thread-safe), maintaining the existing
    happens‑before order between event delivery and window visibility.
+26/-8   

emit_capsule does two things — (a) push a CapsulePayload event to the
JS frontend via app.emit_to, and (b) flip the OS-level window visibility
via window.show/hide and reposition it via maybe_position_capsule_*.
Path (a) is safe from any thread (Tauri event bus marshals internally).
Path (b) calls NSWindow / HWND APIs that expect the main thread —
but emit_capsule is invoked at ~30 Hz from the cpal process_callback
(audio thread) via the level_handler in coordinator/dictation.rs.

On macOS this risks dispatch_assert_queue_fail SIGTRAP if the AppKit
runloop assertion fires; on Windows SendMessage can deadlock against
the GUI thread; in either case the audio callback can miss its deadline
and trigger kAudioUnitErr_TooManyFramesToProcess on macOS. The 30 Hz
throttle limits frequency but does not change per-call risk.

Fix: wrap window.show / window.hide / maybe_position_capsule_bottom_center
/ show_capsule_window_no_activate / restore_main_window_key_if_active
in app.run_on_main_thread. The closure captures an Arc<Inner> clone +
AppHandle clone + Copy primitives (state, translation, show_capsule,
visible) — no payload clone needed. emit_to stays on the calling thread
since it is internally thread-safe, preserving the existing happens-
before relationship between event payload and window state on the JS
frontend (frontend processes events asynchronously regardless).

This pattern is already established in the codebase — see
coordinator.rs::stop_qa_hotkey_listener (line 326) for the same
reasoning around Carbon hotkey drop on the wrong thread (issue #169).

Audit ID 3.2.2 (CONFIRMED high).

Test: 31 lib tests pass. Functional verification (no SIGTRAP on long
recordings, capsule still appears at start of dictation) requires
running the app — to be done after PR ships.
@github-actions

github-actions Bot commented May 9, 2026

PR Reviewer Guide 🔍

(Review updated until commit e458605)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ No major issues detected

H-Chris233 previously approved these changes May 9, 2026
@H-Chris233 H-Chris233 dismissed their stale review May 9, 2026 23:35

Is there a problem...?

Addresses pr_agent "Stale State" review on PR #389. show_capsule was
read on the audio callback thread and snapshotted into the
run_on_main_thread closure. If the user toggled the preference between
the audio-thread emit and the main-thread closure firing (one or two
frames later), the queued window show/hide could apply the old
preference and briefly flash the wrong UI state.

Move that read inside the closure so the latest user pref always wins.

Why visible / translation stay snapshotted: they're the payload of
THIS specific capsule:state event — the whole point is to tell the JS
frontend "this is the state at the moment emit_capsule was called".
Re-reading them inside the closure would mean the OS-level window state
and the JS-side payload disagree (the closure lands one frame later
than the emit_to that already shipped).

No behavior change in the steady state; eliminates a 16–33 ms
stale-pref window when the user changes show_capsule mid-dictation.
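
A condensed before/after of what gets captured, with assumed field paths and a hypothetical apply helper (the real code inlines the window calls described in the PR):

    // Before (stale-pref risk): the preference was snapshotted on the audio
    // thread, so it could be one or two frames old when the closure ran:
    //     let show_capsule = inner.prefs.lock().show_capsule;
    //     let _ = app.run_on_main_thread(move || apply(visible, translation, show_capsule));
    //
    // After: snapshot only this frame's payload (visible, translation) at the
    // call site and read the preference when the closure actually runs.
    let inner = Arc::clone(&self.inner);
    let _ = app.run_on_main_thread(move || {
        let show_capsule = inner.prefs.lock().show_capsule; // freshest user pref
        apply_capsule_window_state(visible, translation, show_capsule); // hypothetical helper
    });
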
@appergb
Collaborator Author

appergb commented May 10, 2026

Handling pr_agent feedback

"Stale State" — fixed

84ee3d9: moved the show_capsule prefs read inside the run_on_main_thread closure, eliminating the 16–33 ms window where a change to the show_capsule preference only took effect on the next frame. visible / translation are still computed at the call site — they are the payload of this frame's capsule:state event and must stay in sync with the payload delivered to JS, so they cannot be re-read when the closure runs.

"Ticket compliance ❌ #169" — false positive

issue #169 was already fixed in commit 7ac0421a (closes #169, 2026-05-02). Current state of beta:

  • coordinator.rs:326-340 — stop_qa_hotkey_listener already wraps its teardown in app.run_on_main_thread:
    pub fn stop_qa_hotkey_listener(&self) {
        let app = self.inner.app.lock().clone();
        if let Some(app) = app {
            let inner = Arc::clone(&self.inner);
            let _ = app.run_on_main_thread(move || {
                inner.qa_hotkey.lock().take();
            });
        } else {
            self.inner.qa_hotkey.lock().take();
        }
    }
  • lib.rs:347-355 — the RunEvent::Exit call chain already strings all six stop_*_hotkey_listener calls together: stop_hotkey_listener / stop_qa_hotkey_listener / stop_combo_hotkey_listener / stop_translation_hotkey_listener / stop_switch_style_hotkey_listener / stop_open_app_hotkey_listener (see the sketch below)
  • The last four sibling functions go through take_*_on_main_thread helpers, each of which uses app.run_on_main_thread to marshal the Carbon teardown.

PR #389 addresses audit 3.2.2 (emit_capsule touching NSWindow on the audio callback thread); issue #169 (Carbon RemoveEventHotKey dropped off the main thread during RunEvent::Exit) is a separate matter. The PR diff leaving the hotkey teardown path untouched is by design, not a compliance gap.
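
For reference, the RunEvent::Exit chain mentioned in the second bullet has roughly this shape — a sketch, not the exact lib.rs:347-355 code:

    // Sketch of the exit-time teardown chain; the six calls are the ones
    // listed above, the surrounding structure is assumed.
    app.run(move |_app_handle, event| {
        if let tauri::RunEvent::Exit = event {
            coordinator.stop_hotkey_listener();
            coordinator.stop_qa_hotkey_listener();
            coordinator.stop_combo_hotkey_listener();
            coordinator.stop_translation_hotkey_listener();
            coordinator.stop_switch_style_hotkey_listener();
            coordinator.stop_open_app_hotkey_listener();
        }
    });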

@github-actions

Persistent review updated to latest commit 84ee3d9

@github-actions

Persistent review updated to latest commit e458605

@appergb appergb merged commit 892f3c4 into beta May 10, 2026
4 checks passed
@appergb appergb deleted the fix/audit-emit-capsule-mainthread branch May 10, 2026 00:41
pull Bot pushed a commit to yimmy23/openless that referenced this pull request May 10, 2026
10 PRs landed on beta this cycle:
- Open-Less#377 paste shortcut configurable (issue Open-Less#360)
- Open-Less#386 TS UserPreferences updateChannel alignment
- Open-Less#387 focus_target leak on Processing-phase cancel
- Open-Less#388 [critical] MacHotkeyAdapter::shutdown stops CFRunLoop + tap
- Open-Less#389 emit_capsule window.show/hide off audio thread
- Open-Less#390 QA / dictation hotkey routing race
- Open-Less#391 audio-mute spawn_blocking (async hygiene)
- Open-Less#392 hotkey supervisor + global dispatcher exit signal
- Open-Less#393 post-audit logic-review hotfixes (QA mute .await + focus_target Processing branch)
- Open-Less#394 in-process credentials cache (kills repeated Keychain prompts)

Bump 4 files: package.json, tauri.conf.json, Cargo.toml, Cargo.lock.
appergb pushed a commit that referenced this pull request May 10, 2026
Snapshot two audit artifacts produced 2026-05-10:

- docs/audit-2026-05-10-validated.md — 21 audit items run against
  the live source tree at openless-all/app/src-tauri/src/ with a
  CONFIRMED / FALSE_POSITIVE / PARTIAL verdict per item plus the
  exact file:line range that supports each verdict.

- docs/logic-review-2026-05-10.md — 5-path logic review of
  origin/beta @ 400097a covering startup, press/release, end-of-
  session, capsule emit, and shutdown. Notable: at 400097a the
  emit_capsule fix from PR #389 was NOT yet an ancestor, so the
  cpal audio-callback → AppKit/Win32 SIGTRAP risk was still live
  on that snapshot (subsequent merges have addressed it; this doc
  remains the dated record).

Both files are pure snapshots — no code changes here.
appergb added a commit that referenced this pull request May 10, 2026
* chore: ignore video-materials/ promo recordings

video-materials/ holds ~2.2 GB of promo recordings (promo /
promo-materials / promo-openless / promo-openless-v2 / SC). These
are local working assets that don't belong in the repo. Add to
.gitignore so future stash / commit operations can't accidentally
pull the binary blob in.

* feat(llm): Google Gemini provider with forced thinking-off + fix provider-switch credentials

Add Google Gemini as a native LLM provider hitting v1beta
:generateContent (single-shot) and :streamGenerateContent?alt=sse
(QA chat) with x-goog-api-key auth. Per-model thinkingConfig is
injected to keep thinking off for every supported ID — exact
matrix verified against ai.google.dev/gemini-api/docs/thinking:

  gemini-2.5-flash, gemini-2.5-flash-lite     thinkingBudget: 0
  gemini-2.5-pro                              field omitted (Google: cannot disable)
  gemini-3*-pro*                              thinkingLevel: "low"
  gemini-3*-flash*, unknown future IDs        thinkingLevel: "minimal"
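
As an illustration of how that matrix might map onto the request body — the function name and shape here are illustrative, not the actual llm_gemini.rs API:

    // Illustrative sketch of the per-model thinking-off table above.
    fn thinking_config_for(model_id: &str) -> Option<serde_json::Value> {
        use serde_json::json;
        if model_id == "gemini-2.5-pro" {
            None // cannot be disabled per Google; omit the field entirely
        } else if model_id.starts_with("gemini-2.5-flash") {
            Some(json!({ "thinkingBudget": 0 })) // covers -flash and -flash-lite
        } else if model_id.starts_with("gemini-3") && model_id.contains("pro") {
            Some(json!({ "thinkingLevel": "low" }))
        } else {
            Some(json!({ "thinkingLevel": "minimal" })) // 3.x flash and unknown future IDs
        }
    }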

Why native, not the OpenAI-compat shim: shim is beta and for 3.x
needs an extra_body workaround for thinking control; auth header
and contents shape differ enough that reusing
OpenAICompatibleLLMProvider would have been the worse compromise.

fix(settings): switching LLM preset previously only auto-filled
endpoint/model when those slots were empty. All providers share
ark.endpoint / ark.model_id (per-provider entries were never
introduced — see persistence.rs lookup_account), so for returning
users with non-empty slots, switching the dropdown updated active
provider but kept the OLD endpoint — polish silently kept calling
the previous provider. Now switching to any non-custom preset
overwrites endpoint/model to that preset's defaults. Trade-off:
user-customized endpoints within a preset are dropped on switch;
silent wrong-routing was a worse failure mode.

Backend:
- New module llm_gemini.rs (15 unit tests for thinking-config table,
  URL builder, response parser, SSE shape, history role mapping)
- polish.rs extracts compose_polish_prompts / compose_translate_prompts
  / compose_qa_system_prompt as pub(crate) helpers shared by both
  LLM paths so prompt assembly can't drift
- coordinator.rs's polish_text / translate_text / answer_chat_dispatch
  open with `if get_active_llm() == "gemini"` early-return; existing
  OpenAI-compat callsite stays untouched

Frontend:
- LLM_PRESETS gains gemini entry (baseUrl v1beta, model placeholder
  gemini-2.5-flash)
- ProviderTools hidden when committedLlmProvider === 'gemini'
  (its "Fetch models" button assumes OpenAI /models shape; native
  /v1beta/models response shape doesn't match)
- Static geminiModelHint lists the 7 available IDs so users pick
  correctly without auto-fetch
- 5 locales (zh-CN, zh-TW, en, ja, ko) gain presets.gemini and
  geminiModelHint

Test plan:
- cargo test --lib                          200/200 pass
- npm run build (tsc + vite)                green
- Manual: switch to Gemini in Settings,
  paste Google AI Studio key, dictate;
  verify polish reaches v1beta and respects
  the per-model thinking-off contract.

* docs: 2026-05-10 audit + beta logic review snapshots

Snapshot two audit artifacts produced 2026-05-10:

- docs/audit-2026-05-10-validated.md — 21 audit items run against
  the live source tree at openless-all/app/src-tauri/src/ with a
  CONFIRMED / FALSE_POSITIVE / PARTIAL verdict per item plus the
  exact file:line range that supports each verdict.

- docs/logic-review-2026-05-10.md — 5-path logic review of
  origin/beta @ 400097a covering startup, press/release, end-of-
  session, capsule emit, and shutdown. Notable: at 400097a the
  emit_capsule fix from PR #389 was NOT yet an ancestor, so the
  cpal audio-callback → AppKit/Win32 SIGTRAP risk was still live
  on that snapshot (subsequent merges have addressed it; this doc
  remains the dated record).

Both files are pure snapshots — no code changes here.

* fix(llm-gemini): buffer SSE bytes across chunks to handle multibyte UTF-8 split

PR #398 pr_agent flagged: reqwest::chunk() can split a multi-byte
UTF-8 codepoint (CJK / emoji) across two network chunks, but the
streaming code decoded each chunk independently with from_utf8 →
half a 你 in the tail of one chunk would surface as
"non-utf8 SSE chunk" and abort the whole stream. This breaks
Gemini QA streaming for any Chinese-language answer that doesn't
happen to align with packet boundaries.

Fix: buffer raw bytes (Vec<u8>) until we see the SSE delimiter
b"\n\n", and only then decode the complete event content as UTF-8.
The delimiter \n\n is two ASCII bytes (0x0A 0x0A) which can never
appear inside a multi-byte UTF-8 sequence, so byte-level splitting
at that boundary is always safe.

Extracted the parsing as drain_complete_sse_events(&mut Vec<u8>) so
the invariant is testable without a TcpListener. Three new unit
tests cover:
- normal split-on-delimiter + remainder buffering
- multibyte UTF-8 split across chunks (the actual regression)
- invalid-UTF-8 event content gracefully skipped (warn, don't crash)

The same byte-level pattern fix applies to polish.rs's
chat_completion_history_streaming — left untouched here to keep
this PR scoped to the Gemini diff pr_agent reviewed. polish.rs
should get the same fix in a follow-up.
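
A minimal sketch of the buffering invariant described above — the real drain_complete_sse_events may differ in signature and edge-case handling, and a later commit extends it to CRLF delimiters:

    // Sketch: drain complete SSE events (terminated by b"\n\n") from a byte
    // buffer, leaving any incomplete tail — including a half-received
    // multi-byte UTF-8 codepoint — in place for the next network chunk.
    fn drain_complete_sse_events(buf: &mut Vec<u8>) -> Vec<String> {
        let mut events = Vec::new();
        // b"\n\n" is two ASCII bytes, so it can never fall inside a
        // multi-byte UTF-8 sequence; splitting on it is always safe.
        while let Some(pos) = buf.windows(2).position(|w| w == b"\n\n") {
            let event: Vec<u8> = buf.drain(..pos + 2).collect();
            match String::from_utf8(event[..pos].to_vec()) {
                Ok(text) => events.push(text),
                Err(_) => eprintln!("skipping non-utf8 SSE event"), // warn, don't abort
            }
        }
        events
    }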

* feat(llm-gemini): teach ProviderTools to fetch native Gemini /v1beta/models

Drop the verbose static geminiModelHint paragraph and restore the
normal ProviderTools "Fetch models" button when the active provider
is Gemini. Users now read live model IDs from Google instead of
staring at a wall of i18n text below their model field.

Backend fetch_provider_models now branches on the base URL:
- Google's generativelanguage.googleapis.com → x-goog-api-key auth +
  parse_gemini_model_ids() that walks `models[].name` and strips the
  `models/` prefix so the result is a plain `gemini-2.5-flash` style
  ID directly writable to ark.model_id.
- Anything else → unchanged Bearer auth + OpenAI `data[].id` shape.

Tests: parse_gemini_model_ids_strips_models_prefix_and_dedups +
is_gemini_base_url_matches_official_domain. cargo test --lib green
(32/32 in commands::tests subset).

i18n: settings.providers.geminiModelHint removed from all 5 locales.
The TS `typeof zhCN` constraint enforces the consistent removal.

* fix(llm-gemini): pr_agent #398 follow-ups — filter non-chat models + CRLF SSE delim

Two follow-up fixes for pr_agent advisories on PR #398:

1. parse_gemini_model_ids now filters by `supportedGenerationMethods`.
   Google's /v1beta/models lists every family — embeddings,
   TTS-image, predict-only — alongside generateContent-capable
   chat models. Without filtering, a "Fetch models" click would
   surface ids like `gemini-embedding-2`, and a user picking one
   would write it into ark.model_id and have polish silently fail
   on every dictation. Field is conservatively included when
   missing (some preview models don't expose it yet).

2. drain_complete_sse_events now accepts both `\n\n` and `\r\n\r\n`
   as event delimiters. Some HTTP/2 middleware and CDNs normalize
   line endings to CRLF; the previous LF-only implementation would
   never drain a complete event from those servers and report the
   stream as empty even when data was received. Tied positions
   can't happen by construction (\r\n\r\n contains no \n\n
   substring), so "earliest match wins" is unambiguous.

Tests:
- parse_gemini_model_ids_filters_out_non_generate_content_families
- drain_complete_sse_events_handles_crlf_delimiter
- drain_complete_sse_events_picks_earliest_delimiter_when_mixed
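
A rough sketch of the supportedGenerationMethods filter from point 1 — the JSON field names follow Google's documented /v1beta/models response, but the function shape and prefix handling here are assumptions:

    // Sketch: keep only generateContent-capable models, strip the "models/"
    // prefix, and dedup. Models that omit supportedGenerationMethods are
    // conservatively kept, matching the behavior described above.
    fn parse_gemini_model_ids(body: &serde_json::Value) -> Vec<String> {
        let mut ids = Vec::new();
        for model in body["models"].as_array().into_iter().flatten() {
            if let Some(methods) = model["supportedGenerationMethods"].as_array() {
                if !methods.iter().any(|m| m.as_str() == Some("generateContent")) {
                    continue; // embeddings, TTS, predict-only families
                }
            }
            if let Some(name) = model["name"].as_str() {
                let id = name.strip_prefix("models/").unwrap_or(name).to_string();
                if !ids.contains(&id) {
                    ids.push(id);
                }
            }
        }
        ids
    }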

---------

Co-authored-by: baiqing <lbx12309@icloud.com>