
feat: Selection + Voice Q&A (划词语音问答), closes #118 (#119)

Merged: appergb merged 6 commits into main from feat/selection-voice-qa on May 1, 2026



@appergb appergb commented May 1, 2026

Closes #118

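Feature

The user selects text in any app → presses Cmd+Shift+; → the QA floating panel appears (8px directly above the capsule) and voice recording starts at the same time → the user asks a question → presses the hotkey again to stop → the ASR transcript of the question and the selection are sent to the LLM together → the Markdown answer is shown in the panel. The panel closes on Esc, on a click outside, or after 30s; Pin keeps it open.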

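Implementation (parallel agent teams)

The design was settled after surveying 8 comparable tools (Wispr Flow Command Mode / Superwhisper Super Mode / Raycast AI / Apple Writing Tools / DeepL / ChatGPT macOS / Cursor Cmd+K / Bob). Two agents, one frontend and one backend, then implemented it in parallel in isolated worktrees.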

Backend (Rust, agent commit e574029)

  • New file qa_hotkey.rs: registers Cmd+Shift+; (macOS) / Ctrl+Shift+; (Windows) with the existing global-hotkey crate; toggle mode
  • New file selection.rs: three-tier fallback, AX AXSelectedText → simulated Cmd+C (snapshot/restore pasteboard) → Linux returns None; selections over 4000 characters are truncated to the first and last 2000
  • coordinator.rs: QaSessionState + begin_qa_session / end_qa_session / cancel_qa_session, with an independent phase so it does not contend with dictation state
  • polish.rs::answer_with_selection + qa_system_prompt, reusing chat_completion + context_premise (with working_languages + front_app)
  • commands.rs: get_qa_hotkey_label / set_qa_hotkey / qa_window_dismiss / qa_window_pin
  • lib.rs: show_qa_window / hide_qa_window / position_qa_window, none of which take focus from the frontmost app (otherwise the Cmd+C fallback cannot read the selection)
  • UserPreferences: qa_hotkey: Option<QaHotkeyBinding> + qa_save_history: bool, with serde default for compatibility with old prefs
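
The truncation rule for long selections can be sketched as follows. This is a minimal illustration, not the code in selection.rs: the constant names and the function name `truncate_selection` are hypothetical, and counting by Unicode scalar values (rather than bytes) is an assumption chosen so that Chinese text is never split mid-character.

```rust
/// Hypothetical constants mirroring the limits described in the PR text.
const MAX_SELECTION_CHARS: usize = 4000;
const KEEP_EACH_SIDE: usize = 2000;
const TRUNCATION_MARKER: &str = "[…truncated…]";

/// Keep short selections as-is; for long ones keep the first and last
/// `KEEP_EACH_SIDE` characters with a marker in between.
/// Lengths are measured in `char`s, so multi-byte text is never cut
/// in the middle of a character.
fn truncate_selection(text: &str) -> String {
    let total = text.chars().count();
    if total <= MAX_SELECTION_CHARS {
        return text.to_string();
    }
    let head: String = text.chars().take(KEEP_EACH_SIDE).collect();
    let tail: String = text.chars().skip(total - KEEP_EACH_SIDE).collect();
    format!("{}{}{}", head, TRUNCATION_MARKER, tail)
}
```

Keeping both ends rather than only the head is useful when the user's question refers to the conclusion of a long passage.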

Frontend (React + TS, agent commit 63fd917)

  • New file pages/QaPanel.tsx: three views (loading skeleton / answer markdown / error) + Pin/Close toolbar + auto-close on Esc / window blur / 30s
  • tauri.conf.json: new windows entry label="qa" (380×280, transparent, alwaysOnTop, focus:false, acceptFirstMouse:true, initially visible:false)
  • App.tsx + main.tsx: ?window=qa route dispatch
  • Settings.tsx: the recording section gains a "Q&A mode hotkey" option (4 presets + disabled) and a "Save Q&A history" toggle
  • lib/types.ts: QaHotkeyBinding / QaStatePayload; UserPreferences gains the new fields
  • lib/ipc.ts: 4 wrappers
  • Adds marked@^11.2.0 for markdown rendering
  • i18n: full qa.* + settings.recording.qa* key sets, zh/en in sync

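Integration layer (commit e68b29d)

Integration testing turned up 4 contract mismatches, fixed on the main agent side:

  • selectionPreview / text / message → snake_case selection_preview / answer_md / error
  • The recording / transcribing / thinking progress states are merged into loading (the frontend does not recognize these sub-states)
  • The frontend ignores idle events (so the answer a pinned user is reading is not overwritten)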
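
The state merge and the idle rule can be sketched as a single mapping. This is an illustration under assumptions: the enum `QaStage`, the functions `wire_kind` and `frontend_should_render`, and the exact `kind` strings are hypothetical stand-ins, not the names used in coordinator.rs or QaPanel.tsx.

```rust
/// Hypothetical internal stages, named after the states listed in the PR text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum QaStage {
    Idle,
    Recording,
    Transcribing,
    Thinking,
    Answer,
    Error,
}

/// Map an internal stage to the `kind` string sent to the frontend.
/// The three progress stages collapse into one value the frontend knows.
fn wire_kind(stage: QaStage) -> &'static str {
    match stage {
        QaStage::Idle => "idle",
        QaStage::Recording | QaStage::Transcribing | QaStage::Thinking => "loading",
        QaStage::Answer => "answer",
        QaStage::Error => "error",
    }
}

/// Frontend-side rule: an `idle` event never replaces the current view,
/// so a pinned answer stays on screen.
fn frontend_should_render(kind: &str) -> bool {
    kind != "idle"
}
```

Collapsing the sub-states on the sending side keeps the frontend's view switch exhaustive over a small, fixed set of kinds.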

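All CLAUDE.md red lines pass

  • The macOS hotkey is native (global-hotkey uses the Carbon API internally, not rdev)
  • No NSApp.activate (the QA window has focus: false, so the Cmd+C fallback can still read the selection from the original app)
  • bundle id / dictionary.json and similar fields are untouched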

Test plan

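  • macOS: select text in Mail / Safari / Xcode, press the hotkey, and ask "summarize this in plain language"; the answer should display correctly
  • VS Code / Slack (Electron, where AX fails) → the Cmd+C fallback works
  • No selection → degrades automatically to voice-only Q&A
  • Silent recording → cancel, the panel closes, no LLM call
  • Esc / click outside / 30s auto-close; stays open when pinned
  • Switching the hotkey preset in Settings → the hotkey is re-registered
  • Selection over 4000 characters → truncated to the first and last 2000
  • Privacy: the selection does not appear in history.json (prefs.qa_save_history=false by default)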

baiqing added 3 commits May 1, 2026 13:14
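Backend implementation for issue #118: the user presses Cmd+Shift+; (macOS) /
Ctrl+Shift+; (Windows) to capture the current selection in the frontmost app
and start voice recording at the same time; pressing it again stops the
recording, sends the ASR transcript + selected text to the LLM, and shows the
Markdown answer in the QA floating panel.

Main changes:
- types.rs: adds the QaHotkeyBinding type + the UserPreferences.qa_hotkey /
  qa_save_history fields; serde(default) keeps old preferences.json compatible.
- qa_hotkey.rs: new module; registers the key combination with the
  global-hotkey crate and sends QaHotkeyEvent::Pressed edge events over mpsc;
  unregisters on drop.
- selection.rs: new module; three-tier fallback for capturing the selection
  (macOS AX kAXSelectedText → simulated Cmd+C / Ctrl+C copy → Linux returns None);
  text over 4000 characters is truncated to the first and last 2000 +
  […truncated…]; the original clipboard is restored after the simulated copy.
- polish.rs: adds the OpenAICompatibleLLMProvider::answer_with_selection method
  + qa_system_prompt / qa_user_prompt, reusing the existing chat_completion +
  context_premise.
- coordinator.rs: adds QaSessionState (an independent phase enum that does not
  contend with dictation state) + the qa_hotkey supervisor / bridge threads +
  the full begin_qa_session / end_qa_session / cancel_qa_session flow; a silent
  recording does not call the LLM, and nothing is written to history.json when
  prefs.qa_save_history=false.
- commands.rs / lib.rs: expose the get_qa_hotkey_label / set_qa_hotkey /
  qa_window_dismiss / qa_window_pin IPC commands and add the show_qa_window /
  hide_qa_window helpers (label="qa", 380×280, 8pt directly above the capsule).

CLAUDE.md red lines:
- The macOS hotkey goes through the global-hotkey crate (Carbon
  RegisterEventHotKey internally); rdev is not introduced and the existing
  CGEventTap dictation path is not broken.
- show_qa_window does not call NSApp.activate, so it does not steal focus from
  the frontmost app and break Cmd+C selection capture.
- bundle id / dictionary.json are untouched; new modules depend only on types.rs.
- ASR / LLM failures keep the "the user's words are never lost" semantics; the
  frontend panel receives the error state and handles it itself.

cargo check + the existing coordinator tests + 6 new qa_hotkey/selection unit
tests all pass; the 15 pre-existing warnings are unaffected (there is only 1
extra dead-code warning for source_app, which is not read yet and is a field
reserved for the frontend agent).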

- Adds the qa floating window (tauri.conf.json windows[2]): 380x280, undecorated, frosted glass
- Adds QaPanel.tsx: three states, loading skeleton / answer markdown / error
  - marked@^11 renders the markdown, with lightweight typography CSS injected
  - Auto-dismiss on Esc / blur / 30s timeout (skipped when pinned)
  - Listens for the qa:state + qa:dismiss backend events
- App.tsx routing: ?window=qa → QaPanel
- types.ts: QaHotkeyBinding + UserPreferences.qaHotkey/qaSaveHistory,
  plus QaStatePayload (snake_case fields aligned with Rust)
- ipc.ts: getQaHotkeyLabel / setQaHotkey / qaWindowDismiss / qaWindowPin,
  plus a mockSettings default of cmd+shift+;
- Settings recording section: adds a "Q&A mode hotkey" dropdown (4 presets + disabled)
  and a "Save Q&A history" switch
- i18n: zh-CN + en gain the qa.* and settings.recording.qa* strings in sync
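
After the frontend and backend agents each implemented their full paths,
integration testing turned up 4 contract mismatches:

1. selectionPreview → selection_preview (snake_case, matching the issue #118 spec)
2. answer payload `text` → `answer_md`; the question field, unused by the frontend, is dropped
3. error payload `message` → `error`
4. The three progress states recording / transcribing / thinking are merged into
   `loading`, which the frontend recognizes, so it no longer falls through and
   renders an empty AnswerView

In addition, QaPanel returns immediately when it receives idle, so it does not
replace the answer a pinned user is reading (idle means "the session state
machine is back to Idle"; when not pinned, the backend hides the window itself).

cargo check + npm run build are all green; all 16 warnings are pre-existing.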

@sourcery-ai sourcery-ai Bot left a comment


Sorry @appergb, you have reached your weekly rate limit of 500000 diff characters.

Please try again later or upgrade to continue using Sourcery


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e68b29d5a3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +204 to +206
std::thread::Builder::new()
    .name("openless-qa-hotkey-supervisor".into())
    .spawn(move || qa_hotkey_supervisor_loop(inner))

P1: Initialize QA hotkey monitor on the main thread

start_qa_hotkey_listener always spawns a background supervisor thread, and that thread later calls QaHotkeyMonitor::start. In qa_hotkey.rs, QaHotkeyMonitor::start documents that GlobalHotKeyManager must be created on macOS main thread, so this call path violates the requirement and can make QA hotkey registration fail repeatedly on macOS (feature appears enabled but never triggers). Please route monitor creation through AppHandle::run_on_main_thread as intended.


Comment on lines +146 to +149
} else {
    // The user's clipboard was empty to begin with → clear the sentinel / selection so nothing is left behind.
    if let Err(e) = clipboard.set_text("") {
        log::warn!("[selection] clipboard clear failed: {e}");

P1: Preserve non-text clipboard contents in selection fallback

When clipboard.get_text() fails (common if the user clipboard currently holds an image or rich content), original is treated as None and this branch unconditionally writes an empty string back to the clipboard. Triggering QA in that state will silently destroy the user’s existing clipboard data. The fallback should avoid clearing clipboard when snapshot failed, or restore full clipboard content type-safely.
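
One way to express the first option (do not clear when the snapshot failed) is to make the restore decision depend on whether the snapshot succeeded. This is a sketch under assumptions: `RestoreAction` and `plan_restore` are hypothetical names, and the snapshot is modeled as a plain `Result` rather than any particular clipboard crate's return type.

```rust
/// What to do with the clipboard after the simulated copy.
#[derive(Debug, PartialEq)]
enum RestoreAction {
    /// Put the user's original text back.
    WriteBack(String),
    /// The snapshot failed (the clipboard may hold an image or rich content),
    /// so write nothing at all, in particular not an empty string.
    LeaveUntouched,
}

/// Decide how to restore the clipboard from the result of the text snapshot
/// taken before the simulated copy.
fn plan_restore<E>(snapshot: Result<String, E>) -> RestoreAction {
    match snapshot {
        Ok(text) => RestoreAction::WriteBack(text),
        Err(_) => RestoreAction::LeaveUntouched,
    }
}
```

Note that this only avoids the explicit clear. The simulated copy itself still replaces the clipboard contents, so a stricter variant would skip the Cmd+C fallback entirely when the snapshot fails, or snapshot every content type.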


Comment on lines +276 to +277
// eslint-disable-next-line react/no-danger
dangerouslySetInnerHTML={{ __html: html }}

P1: Sanitize QA markdown before using dangerouslySetInnerHTML

This renders model output HTML directly via dangerouslySetInnerHTML after marked.parse without sanitization. Because selected text and spoken question can steer LLM output, an attacker-controlled payload can inject active HTML (e.g., event-handler attributes) into the QA webview, creating an XSS path in a desktop context. Sanitize or escape rendered output before injecting it.


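UI split (user-facing)
  - New top-level tab "划词追问" (selection Q&A; placed after "翻译" / Translate in the NAV);
    the icon is stroked text + a question mark
  - New page src/pages/SelectionAsk.tsx with 3 Cards: trigger hotkey presets (a Cmd+Option
    chord option + 4 three-key presets + disabled), a save-history toggle, and a 5-step
    how-to + a panel position/lifecycle block + a privacy contract block
  - Settings → Recording drops the qaHotkey + qaSaveHistory rows (moved to the new tab)
  - i18n adds nav.selectionAsk + the full selectionAsk.* namespace (zh+en) and
    removes the old settings.recording.qa{Hotkey,SaveHistory}* keys
  - useAppState.ts: AppTab gains 'selectionAsk'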

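Windows CI fix (the reason PR #119 failed)
  - qa_hotkey.rs::Inner contains a GlobalHotKeyManager, which on Windows holds an HHOOK,
    a *mut c_void that implements neither Send nor Sync, so compilation fails on Windows
    when the async_runtime::spawn at coordinator.rs:1998 captures Arc<Inner>
    (macOS compiles fine)
  - Adds unsafe impl Send + Sync for Inner, the same approach as the existing
    hotkey.rs::CallbackContext (the OS handle is in practice safe across threads;
    the crate just does not mark it)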
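
The compile failure and the workaround can be reproduced in isolation. The sketch below is illustrative only: this `Inner` is a hypothetical stand-in with a single raw-pointer field, not the struct in qa_hotkey.rs, and the safety argument in the comment is an assumption that has to be verified for the real handle.

```rust
use std::ffi::c_void;
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a struct that owns an OS hook handle.
struct Inner {
    hook: *mut c_void,
}

// SAFETY (assumption for this sketch): the handle is only ever passed back to
// OS calls that tolerate cross-thread use and is never dereferenced in Rust.
unsafe impl Send for Inner {}
unsafe impl Sync for Inner {}

/// Moves a shared `Inner` into another thread. This only compiles because of
/// the two `unsafe impl`s above: a raw pointer field makes a type neither
/// `Send` nor `Sync` by default, and `Arc<T>` is `Send` only if `T` is both.
fn hook_is_null_on_worker(inner: Arc<Inner>) -> bool {
    thread::spawn(move || inner.hook.is_null())
        .join()
        .expect("worker thread panicked")
}
```

Removing either `unsafe impl` makes the `thread::spawn` call fail to compile, which is the same class of error the Windows build reported for the spawned task.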

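Backend hotkey implementation untouched
  The Cmd+Option chord option is exposed in the UI, but the underlying global-hotkey
  crate does not support modifier-only chords (that needs a hand-written state machine
  on CGEventTap), so the UI shows a ⚠️ warning saying "requires v1.2.9+". The current
  default is still Cmd+Shift+;.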

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 78ff8113eb


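const QA_HOTKEY_PRESETS: readonly QaHotkeyPreset[] = [
  // The two-modifier chord (Cmd+Option / Ctrl+Alt) comes first by default: the pure modifier combination users prefer.
  // The backend needs a CGEventTap mode for "both modifiers pressed, then released with no other key in between" (backend work pending).
  { id: 'cmd+option', label: 'Cmd+Option', binding: { primary: '', modifiers: ['cmd', 'option'] } },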

P1: Drop unsupported modifier-only QA preset

This preset defines primary: '', but the backend parser rejects empty primaries (parse_primary returns UnsupportedKey for empty input in src-tauri/src/qa_hotkey.rs). Because bindingToPresetId also falls back to the first preset when a binding is not recognized (including the non-macOS default), this unsupported option is easy to surface and can be persisted, leaving QA hotkey registration failing on restart.


Comment on lines +71 to +75
await setQaHotkey(preset.binding);
} catch (error) {
console.error('[selectionAsk] failed to set qa hotkey', error);
}
await savePrefs({ ...prefs, qaHotkey: preset.binding });

P1: Abort prefs write when QA hotkey update fails

If setQaHotkey throws (invalid binding, registration conflict, platform rejection, etc.), this code still writes the same binding through savePrefs, so settings claim success even though the runtime monitor did not update. That can persist a broken hotkey configuration and cause repeated registration failures in later sessions. Return early on error instead of always saving the failed binding.
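
The ordering the review asks for (register first, persist only on success) is language-independent. The code under review is TypeScript; the sketch below restates the control flow in Rust for illustration, with `change_hotkey` and its two closure parameters as hypothetical stand-ins for setQaHotkey and savePrefs.

```rust
/// Apply a new hotkey binding, then persist it, stopping at the first failure.
/// `register` stands in for the runtime registration call and `save` for the
/// preferences write; both are hypothetical parameters for this sketch.
fn change_hotkey<R, S>(binding: &str, register: R, save: S) -> Result<(), String>
where
    R: FnOnce(&str) -> Result<(), String>,
    S: FnOnce(&str) -> Result<(), String>,
{
    // `?` returns early here, so a rejected binding is never written to prefs.
    register(binding)?;
    save(binding)
}
```

With this shape, the settings UI can surface the returned error to the user instead of showing a binding that the runtime never accepted.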


Closes the loop on issue #118 v2.

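## Multi-turn conversation
- QaSessionState gains messages: Vec<QaChatMessage> + a panel_visible flag.
- Cmd+Shift+/ now toggles panel visibility (it no longer starts recording); while the
  panel is visible, rightOption is routed to QA recording, and while it is hidden it
  remains main dictation. open_qa_panel / close_qa_panel are separated.
- The first-round user message embeds the selected text (# 选区原文 / # 我的问题,
  i.e. "selected text" / "my question"); later rounds send only the question.
- The frontend QaPanel renders a bubble list (user: blue, right-aligned / assistant: gray markdown, left-aligned).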
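
The first-round rule can be sketched as a small constructor. QaChatMessage is named in the commit message, but its fields here (`role`, `content`), the function `user_message`, and the exact layout of the two headings are assumptions made for illustration.

```rust
/// Hypothetical shape of one chat message; only the type name comes from the PR.
#[derive(Debug, Clone, PartialEq)]
struct QaChatMessage {
    role: &'static str,
    content: String,
}

/// Build the next user message. Only the first round (empty history) embeds
/// the selection; later rounds, or sessions without a selection, send just
/// the spoken question.
fn user_message(
    history: &[QaChatMessage],
    selection: Option<&str>,
    question: &str,
) -> QaChatMessage {
    let content = match selection {
        Some(sel) if history.is_empty() => {
            // Headings quoted from the commit message: "selected text" / "my question".
            format!("# 选区原文\n{}\n\n# 我的问题\n{}", sel, question)
        }
        _ => question.to_string(),
    };
    QaChatMessage { role: "user", content }
}
```

Sending the selection once keeps later turns short, since the model already has the passage in its conversation history.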

## Streaming output
- chat_completion_history → chat_completion_history_streaming: sets stream:true
  and parses SSE chunks; the on_delta callback emits each chunk as qa:state{kind:"answer_delta"}.
- Frontend: streamingAnswer state + StreamingAssistantBubble (with a blinking blue caret).
- Once the answer event lands, the buffer is cleared and the final messages take over.
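
The SSE framing step can be sketched with the standard library alone. This is not the parser in polish.rs: `SseBuffer` and `feed` are hypothetical names, the sketch assumes the common `data: ...` line format with a `[DONE]` terminator used by OpenAI-compatible endpoints, and it stops at extracting the raw payload text (decoding the JSON delta would need a JSON library).

```rust
/// Accumulates response text and yields the payload of each complete
/// `data:` line, tolerating network chunks that split a line in two.
struct SseBuffer {
    pending: String,
}

impl SseBuffer {
    fn new() -> Self {
        Self { pending: String::new() }
    }

    /// Feed one network chunk; returns the payloads that became complete.
    fn feed(&mut self, chunk: &str) -> Vec<String> {
        self.pending.push_str(chunk);
        let mut payloads = Vec::new();
        // Consume every full line; a trailing partial line stays buffered.
        while let Some(pos) = self.pending.find('\n') {
            let raw: String = self.pending.drain(..=pos).collect();
            let line = raw.trim_end_matches(&['\r', '\n'][..]);
            if let Some(rest) = line.strip_prefix("data:") {
                let payload = rest.trim_start();
                // The terminator carries no content, so it is skipped.
                if payload != "[DONE]" {
                    payloads.push(payload.to_string());
                }
            }
        }
        payloads
    }
}
```

Each returned payload would then be decoded and passed to the delta callback; buffering partial lines matters because chunk boundaries from the network do not align with event boundaries.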

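## macOS floating-panel fixes (a landmine present since v1)
- capabilities/default.json adds "qa" to the windows array. Without it, the frontend's
  listen("qa:state") is blocked by Tauri permissions and the panel never receives events.
- show_qa_window uses NSWindow.orderFrontRegardless instead of window.show()
  (the latter goes through makeKeyAndOrderFront on macOS and steals frontmost). This lets
  capture_selection's AX read / Cmd+C fallback reliably read the selection from the original app.
- Every ObjC msg_send must be wrapped in app.run_on_main_thread: macOS 26 enforces the
  NSWindow main-thread requirement as a hard assertion (a violation is an immediate SIGTRAP).
- NSWindow.movableByWindowBackground=YES: the panel can be dragged by any empty area.
- The QA hotkey is registered on the main thread: the global-hotkey crate uses Carbon
  RegisterEventHotKey on macOS, and event callbacks are dispatched by the main run loop.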

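## Capsule state sync
- QA recording / transcribing / thinking also push capsule:state, so the user sees the state in the bottom capsule too.
- schedule_capsule_idle hides the capsule only when both dictation and QA are Idle, so a
  stale dictation Done timer cannot close the QA capsule by mistake.
- On entry, open_qa_panel first emits Idle to clear leftover dictation message + insertedChars.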

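## Settings page
- SelectionAsk becomes its own top-level tab; presets are reduced to 4 in the cmd+shift+
  family (Option is shared with main dictation and is not offered as a QA hotkey).
- howto / hotkey / privacy copy rewritten to reflect the v2 "open the panel first, then record with Option" flow.
- prefs default: qaHotkey = cmd+shift+;.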

## Roadmap
- docs/qa-reasoning-roadmap.md: design draft for reasoning support in v2.2 (route C:
  switching to a reasoner model + reasoning_content deltas + a collapsible thinking block).
@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

# Conflicts:
#	openless-all/app/src-tauri/src/lib.rs

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fab31d23e1


state.selection = None;
state.front_app = None;
state.phase = QaPhase::Idle;
state.cancelled = false;

P1: Preserve QA cancel state while closing panel

When close_qa_panel is called during QaPhase::Processing, cancel_qa_session correctly marks cancelled=true, but this block immediately resets it to false. That defeats the cancellation checks later in end_qa_session (after ASR/LLM awaits), so a user dismissing the panel can still get a late answer/error emitted and side effects applied after the window is closed. Keep the cancel flag set until the in-flight async pipeline observes it (or explicitly abort the task) to make dismiss deterministic.
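
A minimal sketch of the suggested behavior, assuming a simplified state struct: the reset keeps `cancelled` set when a pipeline is still in flight, and the pipeline clears the flag when it observes it. The field set, the `Recording` variant, and the function names `reset_on_close` and `pipeline_may_emit` are hypothetical; only the `Idle` and `Processing` phases and the fields in the quoted snippet come from the PR.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum QaPhase {
    Idle,
    Recording,
    Processing,
}

#[derive(Debug)]
struct QaSessionState {
    phase: QaPhase,
    cancelled: bool,
    selection: Option<String>,
    front_app: Option<String>,
}

/// Reset session state when the panel closes. If ASR/LLM work is still in
/// flight, the cancel flag is left as-is so the pipeline can observe it.
fn reset_on_close(state: &mut QaSessionState) {
    let in_flight = state.phase == QaPhase::Processing;
    state.selection = None;
    state.front_app = None;
    state.phase = QaPhase::Idle;
    if !in_flight {
        state.cancelled = false;
    }
}

/// Called by the async pipeline after each await point. Returns false, and
/// consumes the flag, if the session was cancelled in the meantime.
fn pipeline_may_emit(state: &mut QaSessionState) -> bool {
    if state.cancelled {
        state.cancelled = false;
        false
    } else {
        true
    }
}
```

A single boolean can still be confused by a new session that starts before the old pipeline checks the flag; a per-session generation counter, or aborting the task explicitly as the review also suggests, would avoid that.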


Comment on lines +645 to +647
let panel_visible = inner.qa_state.lock().panel_visible;
if panel_visible {
handle_qa_option_edge(inner).await;

P2: Block QA key routing while dictation is active

This routing sends every main hotkey press to QA whenever panel_visible is true, regardless of dictation phase. If users open the QA panel while a dictation session is already Starting/Listening, the next trigger no longer reaches the dictation state machine, so stop/cancel behavior is hijacked and QA may try to start a second recorder path concurrently. Gate this branch on dictation being idle (or close/stop dictation before enabling QA routing) to avoid cross-session interference.
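
The suggested gate amounts to a two-input routing decision. In the sketch below, `DictationPhase`, `Route`, and `route_main_hotkey` are hypothetical names; the `Starting` and `Listening` phases are the ones the review mentions, and any other non-idle dictation phase would be handled the same way.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DictationPhase {
    Idle,
    Starting,
    Listening,
}

#[derive(Debug, PartialEq, Eq)]
enum Route {
    Qa,
    Dictation,
}

/// Route a main-hotkey press. QA only receives the press when its panel is
/// visible AND dictation is idle, so an active dictation session always
/// keeps control of its own stop/cancel key.
fn route_main_hotkey(panel_visible: bool, dictation: DictationPhase) -> Route {
    if panel_visible && dictation == DictationPhase::Idle {
        Route::Qa
    } else {
        Route::Dictation
    }
}
```

Keeping the decision in a pure function makes the cross-session cases easy to cover with unit tests, separate from the async handlers.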


@appergb appergb merged commit b79a1e7 into main May 1, 2026
2 checks passed
@appergb appergb deleted the feat/selection-voice-qa branch May 6, 2026 11:52

Development

Successfully merging this pull request may close these issues.

[feature] Selection + Voice Q&A (划词语音问答)
