
feat: improve Foundry local ASR setup #292

Merged
H-Chris233 merged 47 commits into Open-Less:main from millionart:codex/windows-local-asr
May 7, 2026

Conversation

Contributor

@millionart millionart commented May 6, 2026

User description

Summary

  • Wire Foundry Local Whisper language hints through the Windows local ASR pipeline into the Foundry SDK audio client.
  • Expose Foundry model catalog size/cache status and staged prepare progress, then surface model size, language selection, progress, and best-effort cancel in LocalAsr.
  • Keep the existing OpenLess path intact: hotkey -> Capsule recording -> Foundry transcript -> existing polish/translate -> TSF/fallback insertion -> history.
  • Merge upstream/main and fill the new ja/ko i18n resources required by the merged resource shape.

Verification

  • cargo check --manifest-path openless-all/app/src-tauri/Cargo.toml
  • npm.cmd --prefix openless-all/app run build
  • cargo test --manifest-path openless-all/app/src-tauri/Cargo.toml foundry --lib --no-run
  • git diff --check upstream/main...HEAD
  • Source grep confirmed no Win+H / Voice Typing / native dictation path was added; existing SendInput references are TSF/fallback insertion paths only.

Not Run

  • Windows manual dictation smoke for Foundry model download, Chinese raw transcript, LLM/no-LLM insertion, and prepare cancel retry behavior.

PR Type

Enhancement


Description

  • Introduce Foundry Local Whisper ASR provider for Windows with model catalog, download, and transcription via Foundry SDK

  • Add Tauri commands and coordinator integration for local ASR sessions including prepare, cancel, release, and progress events

  • Extend hotkey binding system to support multi-key combos, double‑click mode, and a hotkey recorder UI

  • Surface Foundry model status, language hint selection, and keep‑loaded preferences in settings and local ASR management

  • Add shared WAV encoding module and adapt Whisper batch ASR to use it


Diagram Walkthrough

flowchart LR
  A["HotkeyRecorder UI"] --> B["Settings.tsx"]
  B --> C["commands.rs"]
  C --> D["Coordinator"]
  D --> E["FoundryLocalWhisperAsr"]
  E --> F["FoundryLocalRuntime"]
  F --> G["Foundry SDK / Model"]
  C --> H["foundry_prefs"]
  D --> I["Release scheduling"]
  I --> F

File Walkthrough

Relevant files
Enhancement
17 files
commands.rs
Foundry local ASR commands, provider switch release plan, tests
+328/-15
coordinator.rs
Integrate Foundry ASR into session life cycle with custom timeout
+282/-3 
foundry_runtime.rs
New Foundry runtime managing model lifecycle, prepare, cancel, catalog
+586/-0 
foundry_provider.rs
New Foundry Whisper ASR provider with PCM buffering and WAV
transcription
+353/-0 
foundry.rs
Foundry model definitions, status payloads, constants       
+263/-0 
wav.rs
Shared WAV encoding utility                                                           
+61/-0   
types.rs
Expand hotkey binding with multi‑key combos, double‑click mode,
Foundry prefs
+227/-3 
Settings.tsx
Hotkey recorder UI, Foundry provider visibility, local ASR hints
+353/-16
types.ts
TypeScript types for new hotkey binding and Foundry preferences
+12/-1   
mod.rs
Export Foundry modules                                                                     
+7/-0     
mod.rs
Add WAV module to ASR module tree                                               
+1/-0     
zh-CN.ts
Chinese i18n additions for Foundry and hotkeys                     
+47/-6   
en.ts
English i18n additions for Foundry and hotkeys                     
+47/-6   
ja.ts
Japanese i18n additions for Foundry and hotkeys                   
+44/-3   
ko.ts
Korean i18n additions for Foundry and hotkeys                       
+44/-3   
zh-TW.ts
Traditional Chinese i18n additions for Foundry and hotkeys
+47/-6   
hotkeyRecorder.ts
New hotkey recorder state machine logic                                   
+70/-0   
Configuration changes
2 files
persistence.rs
Default ASR provider set to Foundry on Windows                     
+8/-1     
lib.rs
Register new commands and Foundry runtime state                   
+14/-0   
Refactoring
1 files
whisper.rs
Refactor Whisper batch ASR to use shared WAV encoding       
+6/-28   
Additional files
17 files
2026-05-06-windows-local-asr.md +1396/-0
2026-05-06-windows-local-asr-design.md +247/-0 
check-hotkey-recorder.mjs +22/-0   
check-window-hotkey-fallback.mjs +22/-0   
windows-real-asr-insertion-smoke.ps1 +378/-40
Cargo.toml +1/-0     
App.tsx +19/-11 
hotkey.ts +104/-1 
hotkeyRecorder.test.ts +85/-0   
ipc.ts +7/-4     
localAsr.ts +119/-1 
windowHotkeyFallback.test.ts +42/-0   
windowHotkeyFallback.ts +27/-0   
LocalAsr.tsx +453/-1 
Overview.tsx +1/-0     
QaPanel.tsx +3/-3     
SelectionAsk.tsx +0/-2     

millionart added 26 commits May 4, 2026 17:35
# Conflicts:
#	openless-all/app/src/pages/QaPanel.tsx
…-asr

# Conflicts:
#	openless-all/app/src-tauri/Cargo.lock
#	openless-all/app/src-tauri/src/commands.rs
#	openless-all/app/src-tauri/src/coordinator.rs
#	openless-all/app/src-tauri/src/lib.rs
@millionart millionart marked this pull request as ready for review May 6, 2026 12:22

github-actions Bot commented May 6, 2026

PR Reviewer Guide 🔍

(Review updated until commit e0f7ca2)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Audio Loss

The transcription buffer is cleared after every transcribe() call, even when the Foundry SDK returns an error. If the first transcribe attempt times out or fails transiently, the user loses the recorded audio and has to dictate again instead of retrying the same session.

fn foundry_transcribe_attempt_consumes_buffer<T>(result: &Result<T>) -> bool {
    let _ = result;
    true
}
Blocking Release

release_now() waits on the same lifecycle lock used by model prepare and transcription, but it does not request prepare cancellation first. If the user clicks release while Foundry is still downloading or loading a model, the command can block until that work finishes instead of freeing memory promptly.

pub async fn release_now(&self) -> Result<()> {
    let _lifecycle = self.lifecycle.lock().await;
    self.release_now_locked().await
}

github-actions Bot commented May 7, 2026

Persistent review updated to latest commit 5c92e87

github-actions Bot commented May 7, 2026

Persistent review updated to latest commit 593cf76

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 593cf766a2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread openless-all/app/src/pages/LocalAsr.tsx Outdated
Comment on lines +91 to +93
if (isFoundryAlias(status.activeModel)) {
  setSelectedFoundryAlias(status.activeModel);
}

P2 Badge Preserve pending Foundry model selection across refreshes

Avoid forcing selectedFoundryAlias to status.activeModel on every status refresh. When a user picks a different model in the dropdown but hasn't clicked “Prepare/Set default” yet, any subsequent refreshFoundryStatus() call resets their choice back to the currently active model, so the next action can run against the wrong model unexpectedly. This is user-visible and can invalidate local-ASR setup/testing flows.
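One way to picture the suggested behavior is a small "dirty" flag that blocks refreshes from overwriting a pending choice. The sketch below is a minimal illustration under assumptions: `FoundryModelSelection`, `pick`, `onStatusRefresh`, and `markApplied` are hypothetical names and are not taken from LocalAsr.tsx.

```typescript
// Hypothetical sketch: none of these names come from the PR's source.
type FoundryStatus = { activeModel: string };

class FoundryModelSelection {
  private selected: string;
  // True once the user has picked a model that has not been applied yet.
  private dirty = false;

  constructor(initial: string) {
    this.selected = initial;
  }

  // Called from the dropdown's change handler.
  pick(alias: string): void {
    this.selected = alias;
    this.dirty = true;
  }

  // Called after every status refresh; follows the backend only while nothing is pending.
  onStatusRefresh(status: FoundryStatus): void {
    if (!this.dirty) {
      this.selected = status.activeModel;
    }
  }

  // Called once "Prepare" or "Set default" has been applied for the selected model.
  markApplied(): void {
    this.dirty = false;
  }

  current(): string {
    return this.selected;
  }
}
```

With this shape, a refresh that arrives between picking a model and pressing the action button leaves the dropdown untouched, and the selection tracks the backend again once the action has been applied.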

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0bb8d3ebcd

Comment thread openless-all/app/src/pages/Settings.tsx Outdated
const choices: Array<[HotkeyMode, string]> = [
['toggle', t('settings.recording.modeToggle')],
['hold', t('settings.recording.modeHold')],
['doubleClick', t('settings.recording.modeDoubleClick')],

P1 Badge Remove unusable double-click hotkey mode option

Adding 'doubleClick' to the recording mode picker makes users able to persist prefs.hotkey.mode = "doubleClick", but the backend hotkey state machine still only handles Toggle/Hold in coordinator::handle_pressed and only Hold in handle_released (src-tauri/src/coordinator.rs), so selecting this mode causes hotkey edges to do nothing and dictation cannot start/stop. This is a user-facing break of the core workflow whenever the new mode is chosen.
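A hedged sketch of the remedy described here: offer only the modes the backend can act on, and normalize any already-persisted unsupported value. The list of supported modes is an assumption taken from this comment, and `selectableModes`, `normalizeMode`, and the "toggle" fallback are hypothetical.

```typescript
type HotkeyMode = "toggle" | "hold" | "doubleClick";

// Assumption from the review comment: the backend state machine handles only these modes.
const BACKEND_SUPPORTED_MODES: ReadonlyArray<HotkeyMode> = ["toggle", "hold"];

function isSupportedMode(mode: HotkeyMode): boolean {
  return BACKEND_SUPPORTED_MODES.indexOf(mode) !== -1;
}

// Choices offered by the picker: drop anything the backend cannot act on.
function selectableModes(all: ReadonlyArray<HotkeyMode>): HotkeyMode[] {
  return all.filter(isSupportedMode);
}

// Persisted prefs may already hold an unsupported mode; fall back to "toggle" (hypothetical default).
function normalizeMode(mode: HotkeyMode): HotkeyMode {
  return isSupportedMode(mode) ? mode : "toggle";
}
```

Keeping the supported list in one place means the picker and the preference loader cannot drift apart when a new mode is eventually implemented in the backend.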

Comment thread openless-all/app/src/pages/Settings.tsx Outdated
  if (message.includes('API Key')) return t('settings.providers.apiKeyMissing');
  if (message.includes('Endpoint')) return t('settings.providers.endpointMissing');
- if (message.includes('timeout') || message.includes('超时')) return t('settings.providers.requestTimeout');
+ if (message.includes('timeout') || message.includes('瓒呮椂')) return t('settings.providers.requestTimeout');

P2 Badge Match timeout errors with the correct Chinese literal

The timeout fallback check now matches '瓒呮椂' instead of '超时', so Chinese timeout messages no longer map to settings.providers.requestTimeout and will fall through to a generic failure message. This regresses provider error UX specifically for Chinese-language timeout responses.
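The mis-encoded literal looks like the UTF-8 bytes of 超时 read back as GBK, so one possible safeguard is to write the marker with Unicode escapes, which survive a file re-encoding. The sketch below is illustrative only: `providerErrorKey`, `TIMEOUT_MARKERS`, and the generic `settings.providers.unknownError` key are hypothetical; the other keys are the ones quoted in the snippet above.

```typescript
// "\u8d85\u65f6" is the escaped form of 超时 ("timeout"); escapes are immune to file re-encoding.
const TIMEOUT_MARKERS: ReadonlyArray<string> = ["timeout", "\u8d85\u65f6"];

// Hypothetical helper that maps a raw provider error message to an i18n key.
function providerErrorKey(message: string): string {
  if (message.indexOf("API Key") !== -1) return "settings.providers.apiKeyMissing";
  if (message.indexOf("Endpoint") !== -1) return "settings.providers.endpointMissing";
  for (const marker of TIMEOUT_MARKERS) {
    if (message.indexOf(marker) !== -1) return "settings.providers.requestTimeout";
  }
  // Hypothetical generic key for anything unrecognized.
  return "settings.providers.unknownError";
}
```

A unit test that feeds a real Chinese timeout message through the helper would catch this class of regression the next time a file is saved with the wrong encoding.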

github-actions Bot commented May 7, 2026

Persistent review updated to latest commit 0bb8d3e

millionart added 2 commits May 7, 2026 12:19
…-asr

# Conflicts:
#	openless-all/app/src-tauri/src/lib.rs
#	openless-all/app/src-tauri/src/types.rs
#	openless-all/app/src/lib/ipc.ts
#	openless-all/app/src/pages/Settings.tsx

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

https://github.com/appergb/openless/blob/0bb8d3ebcd737c5887eb0b52521c3ba034d16460/openless-all/app/src-tauri/src/commands.rs#L1028-L1030
P2 Badge Keep hotkey keys synchronized with dictation binding

After this change, the UI reads the recording-key label from hotkey.keys (getHotkeyBindingLabel), but sync_dictation_hotkey_legacy_fields only updates hotkey.trigger and leaves hotkey.keys untouched. When a user changes the dictation shortcut (for example from Right Ctrl to Right Alt or a custom combo), persisted keys can remain stale (ControlRight), so QA/Settings hotkey labels become incorrect even though backend behavior changed; this is a user-visible regression that can mislead users about which key starts/stops dictation.
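The stale-label problem can be modeled in a few lines: if the label reader prefers `keys` over `trigger`, then any sync routine has to write both fields. The following is a TypeScript model of that idea, not the Rust code in commands.rs; the `TRIGGER_CODES` mapping, the `RightControl` trigger name, and both function names are assumptions made for illustration.

```typescript
// Hypothetical stand-in for the binding discussed above.
type HotkeyBinding = { trigger: string; keys: string[] | null };

// Assumed mapping from legacy modifier triggers to DOM key codes.
const TRIGGER_CODES: Record<string, string[] | undefined> = {
  RightControl: ["ControlRight"],
  RightOption: ["AltRight"],
};

// Label source: prefers explicit keys, then falls back to the trigger mapping.
function effectiveCodes(binding: HotkeyBinding): string[] {
  if (binding.keys !== null) return binding.keys;
  return TRIGGER_CODES[binding.trigger] || [];
}

// Sync both fields so a previous keys value cannot outlive a trigger change.
function syncDictationHotkey(trigger: string): HotkeyBinding {
  const codes = TRIGGER_CODES[trigger];
  // Unknown (custom) triggers clear keys instead of keeping the old value.
  return { trigger, keys: codes ? codes.slice() : null };
}
```

For example, syncing to `RightOption` yields `keys: ["AltRight"]`, so a label built from `effectiveCodes` no longer reports the old Right Ctrl binding.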

Comment on lines +5 to +8
if (/^Key[A-Z]$/.test(code)) return true;
if (/^Digit[0-9]$/.test(code)) return true;
if (/^F([1-9]|1[0-9]|2[0-4])$/.test(code)) return true;
if (/^Numpad[0-9]$/.test(code)) return true;

P2 Badge Narrow Windows fallback key forwarding to real trigger keys

This candidate filter now treats nearly all keyboard codes as fallback candidates (Key*, Digit*, F*, numpad, and many symbols), so the app forwards ordinary typing events to handle_window_hotkey_event on Windows. In the current backend path, fallback is disabled (explicit_fallback_available is false), so these extra events are immediately ignored but still incur IPC and "ignored" logging for normal keypresses, causing avoidable log noise and unnecessary runtime overhead while users type in app windows.
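A narrower filter could check the event code against the codes actually bound to a hotkey, and skip forwarding entirely while the backend fallback is disabled. This is only a sketch: `shouldForwardToBackend` and its parameters are hypothetical and do not mirror the real windowHotkeyFallback.ts API.

```typescript
// Hypothetical helper: decide whether a window key event is worth an IPC round trip.
function shouldForwardToBackend(
  code: string,
  boundCodes: ReadonlySet<string>,
  fallbackAvailable: boolean,
): boolean {
  // The backend ignores every event while fallback is disabled, so skip the IPC call.
  if (!fallbackAvailable) return false;
  // Forward only keys that belong to a configured binding, not ordinary typing.
  return boundCodes.has(code);
}
```

Under this scheme ordinary typing such as `KeyA` never reaches the backend unless the user has bound that key, which removes both the IPC overhead and the "ignored" log lines.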

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0bb8d3ebcd

Comment on lines +628 to 630
const knownAsr = visibleAsrPresets.find(x => x.id === prefs.activeAsrProvider);
const asrId = knownAsr ? knownAsr.id : 'volcengine';
setAsrProvider(asrId);

P2 Badge Preserve hidden active ASR provider instead of forcing Volcengine

When running on non-Windows, this effect hides foundry-local-whisper and then unconditionally falls back the selected provider to volcengine if the active provider is not in visibleAsrPresets. If a user profile/vault still has Foundry active (for example after moving settings from Windows), the UI shows Volcengine while the backend remains on Foundry, so dictation continues to fail with the Windows-only provider error until the user manually re-selects another provider. Keep the unsupported active provider visible/disabled or explicitly switch the backend provider when applying this fallback.
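The first option mentioned, keeping the unsupported provider visible but disabled, might look roughly like this. `PickerItem` and `buildAsrPickerItems` are hypothetical names; only the provider ids come from the discussion above.

```typescript
// Hypothetical picker entry: disabled entries are shown but cannot be chosen.
type PickerItem = { id: string; disabled: boolean };

function buildAsrPickerItems(visibleIds: ReadonlyArray<string>, activeId: string): PickerItem[] {
  const items: PickerItem[] = visibleIds.map((id) => ({ id, disabled: false }));
  // Keep a hidden-but-active provider in the list so the UI reflects the real backend state.
  if (visibleIds.indexOf(activeId) === -1) {
    items.push({ id: activeId, disabled: true });
  }
  return items;
}
```

The selected id then stays equal to the backend's active provider, and the disabled entry makes it clear to the user that another provider has to be chosen on this platform.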

github-actions Bot commented May 7, 2026

Persistent review updated to latest commit 70455aa

github-actions Bot commented May 7, 2026

Persistent review updated to latest commit e0f7ca2

@H-Chris233 H-Chris233 merged commit 025c23f into Open-Less:main May 7, 2026
3 checks passed
Collaborator

appergb commented May 7, 2026

@claude Review his code, and review it strictly: sort out the logic in it, check what problems remain, and then fix them.

@millionart millionart deleted the codex/windows-local-asr branch May 7, 2026 05:02

claude Bot commented May 7, 2026

Claude finished @appergb's task in 10m 20s (branch: claude/pr-292-20260507-0620)


Code review and fixes complete

I completed a strict review, organized all of the logic issues, and fixed two confirmed bugs.


Fixed issues

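Bug 1: foundry_runtime.rs:161, release_now() blocks behind prepare (P1)

Problem: release_now() waits directly on the lifecycle lock. If ensure_loaded_with_progress (model download/load) is holding that lock, clicking the "Release" button blocks until the download finishes, and only then can memory be freed.

Fix: set cancel_prepare = true before waiting for the lock. This trips the checkpoint in the prepare loop so that it aborts, the lock is released promptly, and the release can proceed normally.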

pub async fn release_now(&self) -> Result<()> {
    // Cancel prepare first so release does not sit on the lock while a download/load is in progress.
    self.cancel_prepare.store(true, Ordering::SeqCst);
    let _lifecycle = self.lifecycle.lock().await;
    self.release_now_locked().await
}

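Bug 2: commands.rs:1101, hotkey.keys label stuck on the old value (P2)

Problem: sync_dictation_hotkey_legacy_fields only updates hotkey.trigger and does not update hotkey.keys. Because HotkeyBinding::default() initializes keys: Some([ControlRight]), after the user changes the shortcut to Right Alt the trigger becomes RightOption but keys is still [ControlRight]. The frontend getHotkeyBindingCodes reads keys first, so the label in QaPanel and Settings keeps showing "Right Ctrl" instead of "Right Alt".

Fix:

  • Modifier-key path: call effective_codes() on a temporary HotkeyBinding{keys: None} to get the correct DOM codes, then write them into hotkey.keys
  • Custom-combo path: set hotkey.keys to None to clear the stale value, so effective_codes() falls back to parsing trigger

Added two tests covering both scenarios.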


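Confirmed not an issue (suspected during review, fine after verification)

| Suspected issue | Finding |
| --- | --- |
| schedule_foundry_local_asr_release reads the wrong keep_loaded field | Line 3219 already reads foundry_local_asr_keep_loaded_secs correctly; line 4103 has a dedicated test |
| No release is scheduled when transcription fails | The failure path at line 2565 already calls schedule_foundry_local_asr_release; covered |
| cancel() does not propagate to the runtime prepare cancel | foundry_provider.rs:134 already has self.runtime.request_cancel_prepare() |
| LocalAsr.tsx resets the user's selected model on refresh | The foundrySelectionDirty.current ref already guards this correctly; a refresh does not overwrite the user's selection |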

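Remaining known issues (and why they were not fixed)

| Level | Location | Issue | Assessment |
| --- | --- | --- | --- |
| P1 | foundry_provider.rs:241 | The PCM buffer is still cleared when transcription fails | A new provider is created for each session and the buffer does not cross sessions, so no data is actually lost |
| P1 | coordinator.rs:2301 | On Windows, foundry-local-whisper is unconditionally treated as credential-ready | Foundry is a local provider with no notion of credentials, so this is acceptable; errors are caught and shown at transcription time |
| P2 | Settings.tsx:1094 | On non-Windows, the UI falls back to showing volcengine while the backend is still on foundry | Display-only issue; setActiveAsrProvider fires only when the user actively clicks the volcengine option, which generally does not happen |
| P2 | windowHotkeyFallback.ts:5 | Candidate keys are too broad (all letter/digit keys), producing useless IPC calls | With explicit_fallback_available=false the backend drops them immediately; functionality is unaffected, only extra IPC noise |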
