feat(gui): Wire Audio/STT Pipeline to Tauri Shell #398
Coldaine wants to merge 13 commits into feat/always-on-push-to-transcribe from
Implements the Portable Agentic Evidence Standard for ColdVox:
- docs/reviews/portable_standard_critique.md: Philosophy document explaining why tautological unit tests are insufficient and the case for empirical evidence-based PR review.
- docs/reviews/reviewer_driven_evidence.md: Workflow strategy describing how the reviewer-driven evidence process works, evidence tiers (1-5), and semantic drift detection patterns.
- docs/plans/agentic-evidence-preview.md: System architecture spec for the shadow mode assessor: permissions, git diff strategy, token budget, failure modes, and Phase 2 considerations.
- .github/prompts/evidence-assessor.md: The hardened CoT prompt that Gemini executes in CI. Includes explicit anti-hallucination constraints, structured output format, and ColdVox-specific ground truths (Moonshine fragile, Parakeet not ready, stubs dead).
- .github/workflows/agentic-evidence-preview.yml: GitHub Actions workflow triggering on PR events. Uses fetch-depth: 0 for correct git diff, truncates diffs at 2000 lines, composes the full prompt with pre-gathered context, runs gemini-cli in non-interactive mode, and pipes the report to GITHUB_STEP_SUMMARY.

Shadow mode: never blocks merges. GEMINI_API_KEY secret must be configured in repo settings.
…nces

Root cleanup:
- Delete CLAUDE.md, GEMINI.md (byte-identical copies of AGENTS.md)
- Delete root junk: plugins.json, pr_365_details.json, test_enigo_live.rs
- Archive 6 root reports to docs/archive/root/

Dead backend code:
- Replace WHISPER_MODEL_PATH with STT_MODEL_PATH in types.rs and tests
- Fix integration tests: whisper -> moonshine as preferred plugin
- Update doc comments in plugin.rs, plugin_types.rs
- Delete crates/app/plugins.json (had preferred_plugin: whisper)
- Remove stale faster-whisper comment from Cargo.toml

Dead reference fixes:
- Replace ALL 20+ windows-multi-agent-recovery.md refs with current-status.md
- Remove 'absolute truth' language from agent rules
- Fix AGENTS.md pointer to nonexistent CI/policy.md -> CI/architecture.md
- Fix README.md: remove CLAUDE.md reference
- Update drive-project.prompt.md, gui-design-overview, todo.md

Doc pruning:
- Delete 15 empty/expired docs (stubs, chat transcripts, past-retention)
- Archive 8 stale docs (Linux-only, org-wide, superseded)
- Fix stt-overview.md: remove Whisper from Supported Backends
- Fix aud-user-config-design.md: Moonshine is PyO3 not pure Rust
- Fix fdn-testing-guide.md: add Parakeet validation warning

Agent instruction restructure:
- Sync AGENTS.md from .github/copilot-instructions.md (full content)
- Update ensure_agent_hardlinks.sh: source is now copilot-instructions.md
- Update check_markdown_placement.py: CLAUDE.md -> AGENTS.md
- Update standards.md: remove CLAUDE.md/GEMINI.md references

Dead vendor/scripts:
- Delete vendor/vosk/ (stubs to dead Linux runner cache)
- Delete scripts: setup-vosk-cache.sh, verify_vosk_model.sh, ensure_venv.sh, start-headless.sh
- SttRemoteAuthSettings: use #[derive(Default)] instead of manual impl (clippy::derivable_impls error in CI)
- deny.toml: add RUSTSEC ignores for unmaintained transitive deps from Tauri (gtk3-rs, fxhash, unic-*, proc-macro-error; all from the wry/tauri GUI layer, no safe upgrade available, no security impact)
- docs/index.md: regenerate after doc cleanup changed file count/structure
Previous implementation failed because:
1. The 'Gather PR context' step failed with bash string substitution bugs
2. \{PLACEHOLDER\} patterns in bash expansion don't match {PLACEHOLDER} tokens
3. Large PRs caused the diff to be unavailable or truncated incorrectly
New approach:
- Use gemini-cli --approval-mode=yolo to give the agent autonomous tools
- Agent reads its instructions from the prompt file directly
- Agent runs git diff, reads files, and explores the repo itself
- No more brittle bash string replacement for prompt composition
- Combines two steps into one to avoid compose/run split failures
- Still uses gemini-2.0-flash (fixes model name/docs mismatch)
- Agent writes report to /tmp/report.md which is always checked
Addresses Copilot reviewer comments on bash substitution bugs and
model name mismatch between workflow comments and actual --model flag.
Delete .github/agents/ (project-driver, researcher, implementer, tester) and .github/prompts/drive-project.prompt.md. These were prompt-only specs with no automation hooks. They added complexity without value.

Add docs/visuals/ with two interactive HTML dashboards:
- agentic-workflow-dashboard.html: provenance, wiring, prompt anatomy
- ci-reviewer-dashboard.html: activation cadence, prompt, implications
…rthstar (complex only)

- Add complexity scorer (pure bash, no API) that counts Rust file changes
  - complex: >10 crates/ files → triggers Northstar Reviewer
  - moderate: 1-10 crates/ OR any workflow change
  - simple: docs/config only
- Evidence Assessor now always runs using gemini-2.5-flash (was gemini-2.0-flash which is invalid)
- Northstar Reviewer added, runs only on complex PRs, uses gemini-2.5-pro
- Fix file access: instructions written into workspace dir (not /tmp/) so Gemini CLI can read them
- Add northstar-alignment-reviewer.md prompt
- Add _ci_*.md and _tmp_*.md patterns to .gitignore (agent working files)
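
The scorer itself is described as pure bash; the tiering rules it applies can be sketched in Rust as below. This is only an illustration of the thresholds listed in the commit message. The `Complexity` enum and `classify` function are hypothetical names, and two details are assumptions: every path under `crates/` is counted (the message does not say whether only `.rs` files count), and a "workflow change" is taken to mean any path under `.github/workflows/`.

```rust
/// Review tier chosen from the list of changed file paths (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum Complexity {
    Simple,
    Moderate,
    Complex,
}

/// Classify a PR by its changed paths using the thresholds from the commit message.
/// Assumption: all files under `crates/` count, and workflow files live under
/// `.github/workflows/`.
fn classify(changed: &[&str]) -> Complexity {
    let crate_files = changed.iter().filter(|p| p.starts_with("crates/")).count();
    let touches_workflow = changed.iter().any(|p| p.starts_with(".github/workflows/"));
    if crate_files > 10 {
        Complexity::Complex // would trigger the Northstar Reviewer
    } else if crate_files >= 1 || touches_workflow {
        Complexity::Moderate
    } else {
        Complexity::Simple // docs/config only
    }
}

fn main() {
    let changed = ["docs/index.md", "crates/app/src/main.rs"];
    println!("{:?}", classify(&changed));
}
```

Keeping the rule a pure function of the path list matches the "no API" constraint: the tier depends only on what `git diff --name-only` would report.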
…ort paths

- Remove _ci_evidence_*.md and _ci_northstar_*.md from .gitignore. Gemini CLI uses .gitignore as a security boundary and refuses to read or write any gitignored file. CI temp files must be ungitignored.
- Fix evidence-assessor.md: write to _ci_evidence_report.md (not /tmp/report.md)
- Fix northstar-alignment-reviewer.md: write to _ci_northstar_report.md

The _tmp_*.md wildcard pattern is retained for other temp files.
Switch models to validated stable versions:
- Evidence Assessor: gemini-3-flash-preview -> gemini-2.5-flash
- Northstar Reviewer: gemini-3.1-pro-preview-customtools -> gemini-2.5-pro

Add settings.json auth step before each gemini-cli invocation to prevent OAuth browser prompt in headless CI runners. GEMINI_API_KEY env var alone is not sufficient to skip interactive auth selection.

Update comments and Step Summary labels to match actual models.
…i CLI" This reverts commit be490c4.
Adds AlwaysOnPushToTranscribe mode and maintains a 2-second rolling audio buffer in the STT processor. This prevents the start of a transcription from being clipped by the mechanical delay of pressing the hotkey.
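
A rolling pre-roll buffer of this kind can be sketched as follows. This is not the ColdVox implementation: `RollingBuffer`, its methods, the `i16` sample type, and the 16 kHz rate used in `main` are all assumptions chosen for illustration. Only the 2-second window comes from the commit message.

```rust
use std::collections::VecDeque;

/// Fixed-capacity rolling buffer that keeps only the most recent samples
/// (hypothetical type, not taken from the ColdVox source).
struct RollingBuffer {
    samples: VecDeque<i16>,
    capacity: usize,
}

impl RollingBuffer {
    /// `sample_rate_hz * seconds` samples are retained.
    fn new(sample_rate_hz: usize, seconds: usize) -> Self {
        let capacity = sample_rate_hz * seconds;
        Self { samples: VecDeque::with_capacity(capacity), capacity }
    }

    /// Append a frame, evicting the oldest samples once the buffer is full.
    fn push_frame(&mut self, frame: &[i16]) {
        if self.capacity == 0 {
            return; // nothing can be retained
        }
        for &sample in frame {
            if self.samples.len() == self.capacity {
                self.samples.pop_front();
            }
            self.samples.push_back(sample);
        }
    }

    /// Copy out the buffered pre-roll, oldest sample first.
    fn snapshot(&self) -> Vec<i16> {
        self.samples.iter().copied().collect()
    }
}

fn main() {
    // Assumed 16 kHz mono input; 2 seconds matches the commit message.
    let mut buffer = RollingBuffer::new(16_000, 2);
    buffer.push_frame(&vec![0i16; 48_000]); // feed 3 seconds of silence
    println!("retained samples: {}", buffer.snapshot().len());
}
```

When the hotkey fires, the processor could prepend `snapshot()` to the live audio so that speech that began just before the key registered is still transcribed.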
Replaces demo driver with coldvox-app runtime. Connects Always-On Push-to-Transcribe mode and wires STT events to React frontend.
CodeRabbit: Review skipped. Auto reviews are disabled on base/target branches other than the default branch.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 50484b4f1f
      pub struct OverlayModel {
    -     demo_token: u64,
          snapshot: OverlaySnapshot,
Reintroduce overlay state fields still used by methods
This change removes demo_token from OverlayModel, but methods in the same file (stop, clear, and reject_command) still write to self.demo_token, and the state module still carries demo-era references. In this commit state, coldvox-gui no longer compiles, so the new Tauri pipeline wiring cannot be built or run.
    resampler_quality: coldvox_audio::ResamplerQuality::Balanced,
    stt_selection: Some(coldvox_stt::plugin::PluginSelectionConfig::default()),
Declare crates referenced in start_pipeline
start_pipeline uses coldvox_audio::ResamplerQuality and coldvox_stt::plugin::PluginSelectionConfig directly, but crates/coldvox-gui/src-tauri/Cargo.toml in this commit does not declare coldvox-audio or coldvox-stt as direct dependencies. Because Rust does not expose transitive deps by crate name, these paths are unresolved and the crate fails to compile.
Pull request overview
This PR replaces the GUI’s demo-driver wiring with a real integration between the Tauri v2 shell and the coldvox-app runtime, translating runtime STT events into overlay snapshots for the frontend.
Changes:
- Renames frontend bridge/hook commands from demo-driver to pipeline start/stop.
- Adds Tauri commands to start/stop the coldvox-app runtime and forward TranscriptionEvents to the UI.
- Removes the Rust-side demo script and introduces a simplified overlay model update API (partial/final/status).
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| crates/coldvox-gui/src/lib/overlayBridge.ts | Updates invoked Tauri command names to start_pipeline / stop_pipeline. |
| crates/coldvox-gui/src/hooks/useOverlayShell.ts | Renames hook actions to startPipeline / stopPipeline and wires them through the command runner. |
| crates/coldvox-gui/src/App.tsx | Connects useOverlayShell pipeline actions into the OverlayShell component. |
| crates/coldvox-gui/src-tauri/src/state.rs | Refactors overlay model away from demo-token flow toward status/partial/final update helpers. |
| crates/coldvox-gui/src-tauri/src/lib.rs | Implements async Tauri commands to start/stop ColdVox runtime and emit overlay updates from STT events. |
| crates/coldvox-gui/src-tauri/src/demo.rs | Removes the demo script driver. |
| crates/coldvox-gui/src-tauri/Cargo.toml | Adds coldvox-app, tokio, and parking_lot dependencies/features for runtime integration. |
| Cargo.lock | Lockfile updates for the new dependencies. |
      use crate::contract::{OverlaySnapshot, OverlayStatus};
      use crate::demo::DemoStep;

      #[derive(Debug, Default)]
      pub struct OverlayModel {
    -     demo_token: u64,
          snapshot: OverlaySnapshot,
      }
OverlayModel no longer has demo_token and the demo module has been removed, but this file still imports crate::demo::DemoStep and later methods/tests still reference self.demo_token, start_demo, and apply_demo_step. As-is this won’t compile; either fully remove the remaining demo-related code/tests or reintroduce the missing pieces consistently.
    let mut handle_guard = runtime.app_handle.lock().await;
    if handle_guard.is_some() {
        return Err("Pipeline already running".to_string());
    }

    let opts = AppRuntimeOptions {
        activation_mode: ActivationMode::AlwaysOnPushToTranscribe,
        resampler_quality: coldvox_audio::ResamplerQuality::Balanced,
        stt_selection: Some(coldvox_stt::plugin::PluginSelectionConfig::default()),
        enable_device_monitor: true,
        ..Default::default()
    };

    let mut coldvox_app = app_runtime::start(opts).await
        .map_err(|e| format!("Failed to start ColdVox runner: {}", e))?;
runtime.app_handle.lock().await is held across the potentially slow app_runtime::start(opts).await. Holding an async mutex guard across an .await can block other commands and is a deadlock risk if future code needs this lock during startup; consider taking/setting a temporary state then dropping the guard before awaiting startup.
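
The "temporary state" idea from this comment can be sketched with the standard library alone. The example below is a synchronous analogue, not the Tauri command: `PipelineState`, `slow_start`, and this `start_pipeline` signature are hypothetical, a `String` stands in for the runtime handle, and a blocking call stands in for `app_runtime::start(opts).await`. The shape carries over to an async mutex: claim the slot in a short critical section, release the lock, do the slow work, then re-lock briefly to publish the result.

```rust
use std::sync::Mutex;

/// Hypothetical lifecycle state; `Starting` is the temporary marker.
#[derive(Debug, PartialEq)]
enum PipelineState {
    Idle,
    Starting,
    Running(String), // stand-in for the real runtime handle
}

/// Stand-in for the slow `app_runtime::start(opts).await` call.
fn slow_start() -> Result<String, String> {
    Ok("handle".to_string())
}

fn start_pipeline(state: &Mutex<PipelineState>) -> Result<(), String> {
    {
        // Short critical section: claim the slot, then release the lock.
        let mut guard = state.lock().unwrap();
        if *guard != PipelineState::Idle {
            return Err("Pipeline already running".to_string());
        }
        *guard = PipelineState::Starting;
    } // guard dropped here, before the slow call

    let result = slow_start();

    // Re-acquire briefly to publish the outcome.
    let mut guard = state.lock().unwrap();
    match result {
        Ok(handle) => {
            *guard = PipelineState::Running(handle);
            Ok(())
        }
        Err(e) => {
            *guard = PipelineState::Idle; // roll back so a retry is possible
            Err(format!("Failed to start ColdVox runner: {}", e))
        }
    }
}

fn main() {
    let state = Mutex::new(PipelineState::Idle);
    println!("{:?}", start_pipeline(&state)); // first call claims the slot
    println!("{:?}", start_pipeline(&state)); // second call is rejected
}
```

Because a concurrent caller sees `Starting` rather than waiting on the lock, it fails fast with the same "already running" error instead of blocking for the whole startup.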
    // Spawn STT event listener
    tokio::spawn(async move {
        while let Some(event) = stt_rx.recv().await {
            let snapshot = {
                let mut model = model_clone.lock();
                match event {
                    TranscriptionEvent::Partial { text, .. } => model.update_partial(text),
                    TranscriptionEvent::Final { text, .. } => model.update_final(text),
                    TranscriptionEvent::Error { message, .. } => {
                        model.set_status(OverlayStatus::Error, message)
                    }
                }
            };
            let _ = emit_and_resize(&app_clone, &window_clone, &snapshot, "stt-update");
        }
    });
The spawned STT listener task isn’t tracked or cancelled on stop_pipeline. That means it may still emit updates after a stop (draining buffered events), and a quick stop/start cycle can result in multiple listener tasks concurrently mutating the same model/window. Consider storing the JoinHandle (or a cancellation token / generation counter) in OverlayRuntime and aborting/invalidating it on stop/restart.
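
One of the options named here, a generation counter, can be sketched with std threads and channels. This is an analogue of the tokio task rather than the PR's code: `spawn_listener`, the `String` events, and the `Vec` sink are hypothetical stand-ins for the STT receiver and the overlay model.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{channel, Receiver};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Spawn a listener that applies events only while its generation is current.
/// All names here are hypothetical; `sink` stands in for the overlay model.
fn spawn_listener(
    rx: Receiver<String>,
    current: Arc<AtomicU64>,
    mine: u64,
    sink: Arc<Mutex<Vec<String>>>,
) -> JoinHandle<()> {
    thread::spawn(move || {
        for event in rx {
            if current.load(Ordering::SeqCst) != mine {
                break; // stale listener: stop instead of emitting
            }
            sink.lock().unwrap().push(event);
        }
    })
}

fn main() {
    let current = Arc::new(AtomicU64::new(1));
    let sink = Arc::new(Mutex::new(Vec::new()));
    let (tx, rx) = channel();
    let listener = spawn_listener(rx, Arc::clone(&current), 1, Arc::clone(&sink));
    tx.send("partial text".to_string()).unwrap();
    drop(tx); // closing the channel lets the listener exit
    listener.join().unwrap();
    // A stop or restart would bump the counter, invalidating generation 1.
    current.fetch_add(1, Ordering::SeqCst);
    println!("{:?}", sink.lock().unwrap());
}
```

A stale listener in this sketch only notices the bump when its next event arrives, so buffered events are dropped rather than emitted but the thread may linger until then. Storing the `JoinHandle` and aborting it on stop, as the comment also suggests, ends the task immediately.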
    let opts = AppRuntimeOptions {
        activation_mode: ActivationMode::AlwaysOnPushToTranscribe,
        resampler_quality: coldvox_audio::ResamplerQuality::Balanced,
        stt_selection: Some(coldvox_stt::plugin::PluginSelectionConfig::default()),
AppRuntimeOptions.stt_selection is set to Some(PluginSelectionConfig::default()), which causes SttPluginManager::set_selection_config to persist that default config to disk and enable default GC/metrics. If the intent is to respect an existing user config, consider loading/passing the on-disk config (or adding a way to apply selection without saving) rather than always overwriting with defaults.
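
The "respect an existing user config" alternative could look roughly like the sketch below. Everything in it is hypothetical: `SelectionConfig` is a stand-in with a single invented field, not the real `PluginSelectionConfig`, and `resolve_selection` is not a ColdVox function. It only illustrates the decision the comment asks for, which is to fall back to defaults, and persist them, only when nothing is on disk.

```rust
/// Hypothetical stand-in for `PluginSelectionConfig`; the real type differs.
#[derive(Debug, Clone, Default, PartialEq)]
struct SelectionConfig {
    preferred_plugin: Option<String>,
}

/// Returns the config to apply and whether it should be written back to disk.
/// An existing on-disk config is used unchanged and never overwritten.
fn resolve_selection(on_disk: Option<SelectionConfig>) -> (SelectionConfig, bool) {
    match on_disk {
        Some(existing) => (existing, false),
        None => (SelectionConfig::default(), true),
    }
}

fn main() {
    let saved = SelectionConfig { preferred_plugin: Some("moonshine".to_string()) };
    println!("{:?}", resolve_selection(Some(saved))); // keeps the user's choice
    println!("{:?}", resolve_selection(None)); // first run: defaults, persisted once
}
```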
    let mut handle_guard = runtime.app_handle.lock().await;
    if let Some(handle) = handle_guard.take() {
        Arc::new(handle).shutdown().await;
        let snapshot = runtime.with_model(|model| {
            model.reset_to_idle("Pipeline stopped.".to_string())
        });
        emit_and_resize(&app, &window, &snapshot, "pipeline-stopped")
stop_pipeline holds the app_handle async-mutex guard while awaiting shutdown(). It’s safer to take the handle (or clone an Arc) and drop the guard before awaiting, so other commands aren’t blocked and to avoid lock-across-await patterns.
      use std::{
    -     sync::{Arc, Mutex},
    -     thread,
    +     sync::Arc,
          time::Duration,
      };

    - use contract::{OverlayEvent, OverlaySnapshot, OVERLAY_EVENT_NAME};
    - use demo::demo_script;
    + use contract::{OverlayEvent, OverlaySnapshot, OVERLAY_EVENT_NAME, OverlayStatus};
      use state::OverlayModel;
      use tauri::{AppHandle, Emitter, Manager, State, WebviewWindow};
    + use coldvox_app::runtime::{self as app_runtime, AppHandle as ColdVoxHandle, AppRuntimeOptions, ActivationMode};
    + use coldvox_app::stt::TranscriptionEvent;
    + use coldvox_audio::ResamplerQuality;
    + use coldvox_stt::plugin::PluginSelectionConfig;
    + use tokio::sync::Mutex as AsyncMutex;
There are several now-unused imports (Duration, ResamplerQuality, PluginSelectionConfig) after removing the demo driver and switching to fully-qualified paths. Please remove or use them to keep the Tauri shell crate warning-free.
      return {
        snapshot,
        setExpanded: (expanded: boolean) => runCommand(() => setOverlayExpanded(expanded)),
    -   startDemo: () => runCommand(startDemoDriver),
    +   startPipeline: () => runCommand(startPipeline),
        togglePause: () => runCommand(togglePauseState),
    -   stopDemo: () => runCommand(stopDemoDriver),
    +   stopPipeline: () => runCommand(stopPipeline),
        clearTranscript: () => runCommand(clearOverlayTranscript),
        openSettings: () => runCommand(openSettingsPlaceholder),
      };
This hook API rename (startDemo/stopDemo → startPipeline/stopPipeline) requires updating remaining call sites/mocks. For example, crates/coldvox-gui/src/hooks/useOverlayShell.test.tsx still imports/mocks startDemoDriver and calls startDemo(), which will break tests/build until updated.
      <OverlayShell
        snapshot={snapshot}
        onSetExpanded={setExpanded}
    -   onStartDemo={startDemo}
    +   onStartDemo={startPipeline}
        onTogglePause={togglePause}
    -   onStop={stopDemo}
    +   onStop={stopPipeline}
        onClear={clearTranscript}
OverlayShell still exposes onStartDemo / renders a "Run demo" control, but App now wires that to startPipeline. This naming/label mismatch makes the UI and component contract misleading; consider renaming the prop/control to reflect pipeline start (or keep the demo terminology end-to-end if it’s still intended to be a demo).
254707d to 6a77c38
Superseded by retargeted #400 which has been merged to tauri-base.
Stacked on #397. This PR removes the demo driver logic and connects the Tauri v2 GUI shell directly to the coldvox-app runtime. It wires the Always-On Push-to-Transcribe mode and translates STT events into frontend snapshots.