fix(coordinator): marshal capsule show/hide to main thread [high] #389
Merged
Conversation
emit_capsule does two things — (a) push a CapsulePayload event to the JS frontend via app.emit_to, and (b) flip the OS-level window visibility via window.show/hide and reposition it via maybe_position_capsule_*. Path (a) is safe from any thread (the Tauri event bus marshals internally). Path (b) calls NSWindow / HWND APIs that expect the main thread — but emit_capsule is invoked at ~30 Hz from the cpal process_callback (audio thread) via the level_handler in coordinator/dictation.rs.

On macOS this risks a dispatch_assert_queue_fail SIGTRAP if the AppKit runloop assertion fires; on Windows, SendMessage can deadlock against the GUI thread; in either case the audio callback can miss its deadline and trigger kAudioUnitErr_TooManyFramesToProcess on macOS. The 30 Hz throttle limits frequency but does not change per-call risk.

Fix: wrap window.show / window.hide / maybe_position_capsule_bottom_center / show_capsule_window_no_activate / restore_main_window_key_if_active in app.run_on_main_thread. The closure captures an Arc<Inner> clone, an AppHandle clone, and Copy primitives (state, translation, show_capsule, visible) — no payload clone needed. emit_to stays on the calling thread since it is internally thread-safe, preserving the existing happens-before relationship between the event payload and window state on the JS frontend (the frontend processes events asynchronously regardless).

This pattern is already established in the codebase — see coordinator.rs::stop_qa_hotkey_listener (line 326) for the same reasoning around a Carbon hotkey drop on the wrong thread (issue #169). Audit ID 3.2.2 (CONFIRMED high).

Test: 31 lib tests pass. Functional verification (no SIGTRAP on long recordings, capsule still appears at start of dictation) requires running the app — to be done after the PR ships.
PR Reviewer Guide 🔍 (review updated until commit e458605). Here are some key observations to aid the review process:
H-Chris233
previously approved these changes
May 9, 2026
Addresses the pr_agent "Stale State" review on PR #389. show_capsule was read on the audio callback thread and snapshotted into the run_on_main_thread closure. If the user toggled the preference between the audio-thread emit and the main-thread closure firing (one or two frames later), the queued window show/hide could apply the old preference and briefly flash the wrong UI state. Move that read inside the closure so the latest user pref always wins.

Why visible / translation stay snapshotted: they're the payload of THIS specific capsule:state event — the whole point is to tell the JS frontend "this is the state at the moment emit_capsule was called". Re-reading them inside the closure would mean the OS-level window state and the JS-side payload disagree (the closure lands one frame later than the emit_to that already shipped).

No behavior change in the steady state; eliminates a 16–33 ms stale-pref window when the user changes show_capsule mid-dictation.
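The snapshot-vs-re-read distinction above can be shown with a minimal stand-alone sketch. Plain atomics stand in for the real preference store, and the names are illustrative, not the actual coordinator.rs code:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Returns what the snapshotting closure sees vs. what the re-reading
// closure sees after the preference is toggled in flight.
fn demo_stale_vs_fresh() -> (bool, bool) {
    let pref = Arc::new(AtomicBool::new(true)); // show_capsule at emit time

    // BEFORE the fix: value snapshotted on the audio thread, baked into
    // the queued closure.
    let snapshot = pref.load(Ordering::SeqCst);
    let stale_closure = move || snapshot;

    // AFTER the fix: the closure re-reads the preference when it runs.
    let p = Arc::clone(&pref);
    let fresh_closure = move || p.load(Ordering::SeqCst);

    // User toggles the pref in the 16-33 ms before the queued closure fires.
    pref.store(false, Ordering::SeqCst);

    (stale_closure(), fresh_closure())
}

fn main() {
    let (stale, fresh) = demo_stale_vs_fresh();
    assert_eq!((stale, fresh), (true, false)); // old pref vs. latest pref
    println!("stale sees {}, fresh sees {}", stale, fresh);
}
```

The same reasoning explains why `visible` and `translation` stay as snapshots: they belong to the event that already shipped, while the preference should always be read as late as possible.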
Collaborator
Author
pr_agent feedback handling: the "Stale State" finding is fixed in 84ee3d9 (the show_capsule read moved inside the closure). The "Ticket compliance ❌ #169" finding is a false positive.

PR #389 is audit 3.2.2 (emit_capsule touching NSWindow on the audio callback thread); issue #169 (Carbon RemoveEventHotKey dropped on a non-main thread during RunEvent::Exit) is a separate matter. The PR diff leaving the hotkey teardown path untouched is by design, not a compliance gap.
Persistent review updated to latest commit 84ee3d9
Persistent review updated to latest commit e458605 |
pull Bot
pushed a commit
to yimmy23/openless
that referenced
this pull request
May 10, 2026
10 PRs landed on beta this cycle:

- Open-Less#377 paste shortcut configurable (issue Open-Less#360)
- Open-Less#386 TS UserPreferences updateChannel alignment
- Open-Less#387 focus_target leak on Processing-phase cancel
- Open-Less#388 [critical] MacHotkeyAdapter::shutdown stops CFRunLoop + tap
- Open-Less#389 emit_capsule window.show/hide off audio thread
- Open-Less#390 QA / dictation hotkey routing race
- Open-Less#391 audio-mute spawn_blocking (async hygiene)
- Open-Less#392 hotkey supervisor + global dispatcher exit signal
- Open-Less#393 post-audit logic-review hotfixes (QA mute .await + focus_target Processing branch)
- Open-Less#394 in-process credentials cache (kills repeated Keychain prompts)

Bump 4 files: package.json, tauri.conf.json, Cargo.toml, Cargo.lock.
appergb
pushed a commit
that referenced
this pull request
May 10, 2026
Snapshot two audit artifacts produced 2026-05-10:

- docs/audit-2026-05-10-validated.md — 21 audit items run against the live source tree at openless-all/app/src-tauri/src/ with a CONFIRMED / FALSE_POSITIVE / PARTIAL verdict per item, plus the exact file:line range that supports each verdict.
- docs/logic-review-2026-05-10.md — 5-path logic review of origin/beta @ 400097a covering startup, press/release, end-of-session, capsule emit, and shutdown.

Notable: at 400097a the emit_capsule fix from PR #389 was NOT yet an ancestor, so the cpal audio-callback → AppKit/Win32 SIGTRAP risk was still live on that snapshot (subsequent merges have addressed it; this doc remains the dated record). Both files are pure snapshots — no code changes here.
appergb
added a commit
that referenced
this pull request
May 10, 2026
* chore: ignore video-materials/ promo recordings

video-materials/ holds ~2.2 GB of promo recordings (promo / promo-materials / promo-openless / promo-openless-v2 / SC). These are local working assets that don't belong in the repo. Add to .gitignore so future stash / commit operations can't accidentally pull the binary blob in.

* feat(llm): Google Gemini provider with forced thinking-off + fix provider-switch credentials

Add Google Gemini as a native LLM provider hitting v1beta :generateContent (single-shot) and :streamGenerateContent?alt=sse (QA chat) with x-goog-api-key auth. Per-model thinkingConfig is injected to keep thinking off for every supported ID — exact matrix verified against ai.google.dev/gemini-api/docs/thinking:

- gemini-2.5-flash, gemini-2.5-flash-lite → thinkingBudget: 0
- gemini-2.5-pro → field omitted (Google: cannot disable)
- gemini-3*-pro* → thinkingLevel: "low"
- gemini-3*-flash*, unknown future IDs → thinkingLevel: "minimal"

Why native, not the OpenAI-compat shim: the shim is beta and for 3.x needs an extra_body workaround for thinking control; the auth header and contents shape differ enough that reusing OpenAICompatibleLLMProvider would have been the worse compromise.

fix(settings): switching LLM preset previously only auto-filled endpoint/model when those slots were empty. All providers share ark.endpoint / ark.model_id (per-provider entries were never introduced — see persistence.rs lookup_account), so for returning users with non-empty slots, switching the dropdown updated the active provider but kept the OLD endpoint — polish silently kept calling the previous provider. Now switching to any non-custom preset overwrites endpoint/model with that preset's defaults. Trade-off: user-customized endpoints within a preset are dropped on switch; silent wrong-routing was a worse failure mode.

Backend:
- New module llm_gemini.rs (15 unit tests for the thinking-config table, URL builder, response parser, SSE shape, history role mapping)
- polish.rs extracts compose_polish_prompts / compose_translate_prompts / compose_qa_system_prompt as pub(crate) helpers shared by both LLM paths so prompt assembly can't drift
- coordinator.rs's polish_text / translate_text / answer_chat_dispatch open with `if get_active_llm() == "gemini"` early-return; the existing OpenAI-compat callsite stays untouched

Frontend:
- LLM_PRESETS gains a gemini entry (baseUrl v1beta, model placeholder gemini-2.5-flash)
- ProviderTools hidden when committedLlmProvider === 'gemini' (its "Fetch models" button assumes the OpenAI /models shape; the native /v1beta/models response shape doesn't match)
- Static geminiModelHint lists the 7 available IDs so users pick correctly without auto-fetch
- 5 locales (zh-CN, zh-TW, en, ja, ko) gain presets.gemini and geminiModelHint

Test plan:
- cargo test --lib 200/200 pass
- npm run build (tsc + vite) green
- Manual: switch to Gemini in Settings, paste a Google AI Studio key, dictate; verify polish reaches v1beta and respects the per-model thinking-off contract.

* docs: 2026-05-10 audit + beta logic review snapshots

Snapshot two audit artifacts produced 2026-05-10:

- docs/audit-2026-05-10-validated.md — 21 audit items run against the live source tree at openless-all/app/src-tauri/src/ with a CONFIRMED / FALSE_POSITIVE / PARTIAL verdict per item, plus the exact file:line range that supports each verdict.
- docs/logic-review-2026-05-10.md — 5-path logic review of origin/beta @ 400097a covering startup, press/release, end-of-session, capsule emit, and shutdown. Notable: at 400097a the emit_capsule fix from PR #389 was NOT yet an ancestor, so the cpal audio-callback → AppKit/Win32 SIGTRAP risk was still live on that snapshot (subsequent merges have addressed it; this doc remains the dated record).

Both files are pure snapshots — no code changes here.
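The thinking-off matrix above can be sketched as a plain model-ID mapping. This is a hypothetical helper, not the code in llm_gemini.rs; the JSON fragments simply mirror the fields named in the commit message:

```rust
// Hypothetical helper mirroring the per-model thinkingConfig matrix:
// map a Gemini model ID to the JSON fragment that keeps thinking off.
fn thinking_config(model: &str) -> Option<&'static str> {
    if model == "gemini-2.5-flash" || model == "gemini-2.5-flash-lite" {
        Some(r#"{"thinkingBudget": 0}"#)
    } else if model == "gemini-2.5-pro" {
        None // Google: thinking cannot be disabled; omit the field entirely
    } else if model.starts_with("gemini-3") && model.contains("pro") {
        Some(r#"{"thinkingLevel": "low"}"#)
    } else {
        // gemini-3*-flash* and unknown future IDs
        Some(r#"{"thinkingLevel": "minimal"}"#)
    }
}

fn main() {
    assert_eq!(thinking_config("gemini-2.5-flash"), Some(r#"{"thinkingBudget": 0}"#));
    assert_eq!(thinking_config("gemini-2.5-pro"), None);
    assert_eq!(thinking_config("gemini-3-pro-preview"), Some(r#"{"thinkingLevel": "low"}"#));
    assert_eq!(thinking_config("gemini-3-flash"), Some(r#"{"thinkingLevel": "minimal"}"#));
    println!("ok");
}
```

Centralizing the mapping in one function is what makes the "15 unit tests for the thinking-config table" approach practical: each row of the matrix becomes one assertion.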
* fix(llm-gemini): buffer SSE bytes across chunks to handle multibyte UTF-8 split

PR #398 pr_agent flagged: reqwest::chunk() can split a multi-byte UTF-8 codepoint (CJK / emoji) across two network chunks, but the streaming code decoded each chunk independently with from_utf8 — half a 你 in the tail of one chunk would surface as "non-utf8 SSE chunk" and abort the whole stream. This breaks Gemini QA streaming for any Chinese-language answer that doesn't happen to align with packet boundaries.

Fix: buffer raw bytes (Vec<u8>) until we see the SSE delimiter b"\n\n", and only then decode the complete event content as UTF-8. The delimiter \n\n is two ASCII bytes (0x0A 0x0A), which can never appear inside a multi-byte UTF-8 sequence, so byte-level splitting at that boundary is always safe. Extracted the parsing as drain_complete_sse_events(&mut Vec<u8>) so the invariant is testable without a TcpListener.

Three new unit tests cover:
- normal split-on-delimiter + remainder buffering
- multibyte UTF-8 split across chunks (the actual regression)
- invalid-UTF-8 event content gracefully skipped (warn, don't crash)

The same byte-level fix applies to polish.rs's chat_completion_history_streaming — left untouched here to keep this PR scoped to the Gemini diff pr_agent reviewed. polish.rs should get the same fix in a follow-up.

* feat(llm-gemini): teach ProviderTools to fetch native Gemini /v1beta/models

Drop the verbose static geminiModelHint paragraph and restore the normal ProviderTools "Fetch models" button when the active provider is Gemini. Users now read live model IDs from Google instead of staring at a wall of i18n text below their model field.

Backend fetch_provider_models now branches on the base URL:
- Google's generativelanguage.googleapis.com → x-goog-api-key auth + parse_gemini_model_ids(), which walks `models[].name` and strips the `models/` prefix so the result is a plain `gemini-2.5-flash` style ID directly writable to ark.model_id.
- Anything else → unchanged Bearer auth + OpenAI `data[].id` shape.

Tests: parse_gemini_model_ids_strips_models_prefix_and_dedups + is_gemini_base_url_matches_official_domain. cargo test --lib green (32/32 in the commands::tests subset). i18n: settings.providers.geminiModelHint removed from all 5 locales; the TS `typeof zhCN` constraint enforces the consistent removal.

* fix(llm-gemini): pr_agent #398 follow-ups — filter non-chat models + CRLF SSE delim

Two follow-up fixes for pr_agent advisories on PR #398:

1. parse_gemini_model_ids now filters by `supportedGenerationMethods`. Google's /v1beta/models lists every family — embeddings, TTS-image, predict-only — alongside generateContent-capable chat models. Without filtering, a "Fetch models" click would surface IDs like `gemini-embedding-2`, and a user picking one would write it into ark.model_id and have polish silently fail on every dictation. The field is conservatively included when missing (some preview models don't expose it yet).
2. drain_complete_sse_events now accepts both `\n\n` and `\r\n\r\n` as event delimiters. Some HTTP/2 middleware and CDNs normalize line endings to CRLF; the previous LF-only implementation would never drain a complete event from those servers and would report the stream as empty even when data was received. Tied positions can't happen by construction (\r\n\r\n contains no \n\n substring), so "earliest match wins" is unambiguous.

Tests:
- parse_gemini_model_ids_filters_out_non_generate_content_families
- drain_complete_sse_events_handles_crlf_delimiter
- drain_complete_sse_events_picks_earliest_delimiter_when_mixed

---------

Co-authored-by: baiqing <lbx12309@icloud.com>
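The delimiter-buffering idea behind drain_complete_sse_events can be sketched in isolation. This is an illustrative reimplementation from the commit's description, not the actual code in llm_gemini.rs:

```rust
// Locate the earliest occurrence of `needle` in `haystack`, if any.
fn find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    haystack.windows(needle.len()).position(|w| w == needle)
}

// Remove and decode every complete SSE event in `buf`; incomplete bytes
// (including a half-received multi-byte codepoint) stay buffered.
fn drain_complete_sse_events(buf: &mut Vec<u8>) -> Vec<String> {
    let mut events = Vec::new();
    loop {
        // Earliest delimiter wins; \r\n\r\n contains no \n\n substring, so
        // the two patterns can never match at the same position.
        let (pos, len) = match (find(buf, b"\n\n"), find(buf, b"\r\n\r\n")) {
            (Some(a), Some(b)) => if a < b { (a, 2) } else { (b, 4) },
            (Some(a), None) => (a, 2),
            (None, Some(b)) => (b, 4),
            (None, None) => return events, // nothing complete yet
        };
        // Drop event bytes + delimiter from the buffer; keep only the event.
        let event: Vec<u8> = buf.drain(..pos + len).take(pos).collect();
        match String::from_utf8(event) {
            Ok(s) => events.push(s),
            Err(_) => eprintln!("skipping non-utf8 SSE event"), // warn, don't abort
        }
    }
}

fn main() {
    // Multibyte split: the 3-byte 你 is cut in half across two "chunks".
    let full = "data: 你好\n\n".as_bytes();
    let mut buf: Vec<u8> = full[..8].to_vec();
    assert!(drain_complete_sse_events(&mut buf).is_empty()); // still buffered
    buf.extend_from_slice(&full[8..]);
    assert_eq!(drain_complete_sse_events(&mut buf), vec!["data: 你好".to_string()]);

    // CRLF-normalized streams drain too, and the remainder stays buffered.
    let mut buf2 = b"data: a\r\n\r\ndata: b".to_vec();
    assert_eq!(drain_complete_sse_events(&mut buf2), vec!["data: a".to_string()]);
    assert_eq!(buf2, b"data: b".to_vec());
    println!("ok");
}
```

Because decoding only ever happens on a complete event, the UTF-8 boundary problem disappears: a chunk boundary can land inside a codepoint, but a delimiter cannot.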
User description
Summary
`emit_capsule` runs window state changes (`show()` / `hide()` / position helpers) inline on whichever thread invoked it. The level handler in `coordinator/dictation.rs` invokes `emit_capsule` at ~30 Hz from inside cpal's `process_callback` — i.e. the audio callback thread. NSWindow / HWND APIs expect the main thread; calling them from the audio thread risks:

- a `dispatch_assert_queue_fail` SIGTRAP on macOS if the AppKit runloop assertion fires
- a `SendMessage` deadlock against the GUI thread on Windows
- a missed audio-callback deadline, triggering `kAudioUnitErr_TooManyFramesToProcess` on macOS

The 30 Hz throttle bounds frequency but doesn't change per-call risk.
Change
Wrap the window-touching half of `emit_capsule` in `app.run_on_main_thread`:
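A minimal sketch of the shape of the change, with a channel-backed stand-in for `run_on_main_thread` (the real Tauri `AppHandle` and window types are replaced by toy structs; all names here are illustrative):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;

// Stand-in for Tauri's AppHandle::run_on_main_thread: closures are queued
// and executed later by the "main" thread instead of the calling thread.
type MainTask = Box<dyn FnOnce() + Send>;

struct App {
    main_tx: mpsc::Sender<MainTask>,
}

impl App {
    fn run_on_main_thread(&self, f: impl FnOnce() + Send + 'static) {
        self.main_tx.send(Box::new(f)).expect("main loop gone");
    }
}

struct Inner {
    show_capsule: AtomicBool,   // user preference
    window_visible: AtomicBool, // stand-in for NSWindow/HWND visibility
}

// Shape of the fix: the event emit (thread-safe) stays inline, while the
// window-touching half is queued to the main thread.
fn emit_capsule(app: &App, inner: &Arc<Inner>, visible: bool) {
    // (a) app.emit_to("capsule:state", payload) would run right here.
    let inner = Arc::clone(inner); // Arc clone + Copy primitives, no payload clone
    app.run_on_main_thread(move || {
        // (b) window.show()/hide() + positioning, now on the main thread;
        // the preference is read late so the latest user setting wins.
        let show = inner.show_capsule.load(Ordering::SeqCst);
        inner.window_visible.store(show && visible, Ordering::SeqCst);
    });
}

// Emit from a worker ("audio") thread, then drain the main-thread queue;
// returns the final window visibility.
fn run_demo() -> bool {
    let (main_tx, main_rx) = mpsc::channel::<MainTask>();
    let app = App { main_tx };
    let inner = Arc::new(Inner {
        show_capsule: AtomicBool::new(true),
        window_visible: AtomicBool::new(false),
    });

    let i = Arc::clone(&inner);
    // `app` moves into the thread; when it drops, the queue sender closes.
    thread::spawn(move || emit_capsule(&app, &i, true)).join().unwrap();

    for task in main_rx {
        task(); // "main thread" runs the queued window ops
    }
    inner.window_visible.load(Ordering::SeqCst)
}

fn main() {
    assert!(run_demo()); // capsule shown, but only once the main thread ran
    println!("window visible: {}", run_demo());
}
```

The worker thread never touches window state directly; it only enqueues work, which is exactly the property that removes the audio-thread AppKit/Win32 hazard.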
`app.emit_to` stays inline since Tauri's event bus is thread-safe. Closure captures `Arc` clone + `AppHandle` clone + Copy primitives (`translation`, `show_capsule`, `visible`) — no `CapsulePayload` clone needed.
The pattern is already established in this file — see `stop_qa_hotkey_listener` (line 326) for the same reasoning around Carbon hotkey drop running off the AppKit thread.
Why this is safe
Audit linkage
Audit ID 3.2.2 (CONFIRMED 高). See `docs/audit-2026-05-10-validated.md` (local).
Test plan
PR Type
Bug fix, Enhancement
Description
- Marshal window ops to the main thread to avoid an audio-thread crash
- Read the `show_capsule` pref inside the main-thread closure
- Capture `visible` / `translation` at the call site for frame consistency
- Keep `emit_to` synchronous on the calling thread

Diagram Walkthrough
File Walkthrough
coordinator.rs — Thread-safe capsule window ops and fresh preference reading

openless-all/app/src-tauri/src/coordinator.rs

- Wrapped the window show/hide and positioning calls in `app.run_on_main_thread` to prevent main-thread-only API violations from the audio callback thread.
- Moved the `show_capsule` preference read inside the main-thread closure so the latest user setting is always used, eliminating stale-pref flashes.
- Snapshots `visible` and loads `translation` before spawning the closure, aligning window state with the actual capsule payload for the current frame.
- Kept the `app.emit_to` call on the calling thread (the Tauri event bus is thread-safe), maintaining the existing happens‑before order between event delivery and window visibility.