Feat/realtime voice vad p2 by ailuckly · Pull Request #19 · ailuckly/VocaTa

ailuckly · 2026-04-18T16:00:55Z

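📌 Changes

e.g. add a user login API
e.g. fix the error message shown when login fails

✅ Test verification

Passes when run locally
Self-tested
CI pipeline passes

PR submission checklist:

Make sure the commit subject follows the Conventional Commits convention (feat/fix/docs/style/refactor/test/chore)
Make sure the code has passed local tests
Make sure no sensitive information (passwords, keys, etc.) has been committed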

Root cause: after VAD silence, stopRecording() disconnected the ScriptProcessorNode entirely, so onaudioprocess stopped firing during AI speech. Barge-in detection never triggered because there was no audio being captured to analyze.

Fix: separate "stop sending audio" from "stop hardware":
- Add monitoringOnly flag: mic stays open but frames are not sent
- pauseRecording(): sets monitoringOnly=true (called by VAD silence callback instead of full stopRecording)
- resumeRecording(): sets monitoringOnly=false + resets VAD state (called after TTS ends or barge-in detected)
- onaudioprocess: VAD + barge-in detection runs in BOTH recording and monitoring modes; audio is only sent when !monitoringOnly

Now the mic is always on during a call. State transitions:
  RECORDING → VAD silence → MONITORING (mic on, not sending)
  MONITORING + AI speaking + user speaks → barge-in → RECORDING
  MONITORING + TTS ends → RECORDING
  stopAudioCall() → hardware stop
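
The state transitions listed in this commit message can be sketched as a small pure function. This is an illustration only, not the code from the PR: the names `CallState`, `CallEvent`, `nextCallState`, `shouldAnalyzeFrame` and `shouldSendFrame` are hypothetical, and only the states and triggers are taken from the message above.

```typescript
// Hypothetical names for illustration. Only the states and their triggers
// (VAD silence, barge-in, TTS end, stopAudioCall) come from the commit message.
type CallState = "RECORDING" | "MONITORING" | "STOPPED";
type CallEvent = "VAD_SILENCE" | "BARGE_IN" | "TTS_ENDED" | "STOP_CALL";

function nextCallState(state: CallState, event: CallEvent): CallState {
  if (state === "STOPPED") return "STOPPED";   // hardware released, nothing to resume
  if (event === "STOP_CALL") return "STOPPED"; // stopAudioCall() → hardware stop
  if (state === "RECORDING" && event === "VAD_SILENCE") return "MONITORING";
  if (state === "MONITORING" && (event === "BARGE_IN" || event === "TTS_ENDED")) {
    return "RECORDING";
  }
  return state; // every other combination leaves the state unchanged
}

// VAD and barge-in analysis run whenever the mic is open, in both modes.
function shouldAnalyzeFrame(state: CallState): boolean {
  return state !== "STOPPED";
}

// Frames are forwarded to the server only while actually recording.
function shouldSendFrame(state: CallState): boolean {
  return state === "RECORDING";
}
```

Keeping "analyze" and "send" as two separate predicates mirrors the fix: the mic pipeline keeps running in MONITORING so barge-in can still be detected, while nothing is transmitted.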

Three fixes for smoother barge-in experience:
1. clearQueue() now immediately stops the currently-playing AudioBufferSourceNode via source.stop(), instead of just emptying the queue while the current chunk plays to completion. This eliminates the 200-500ms audio tail after barge-in.
2. Barge-in callback explicitly calls setAISpeaking(false) so the VAD state resets immediately, rather than waiting for the playback onended callback chain.
3. Added barge_in_ack message handler so the frontend logs server confirmation of successful barge-in (was silently dropped in the default case before).
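
The first fix can be sketched roughly as follows. This is a simplified model under stated assumptions, not the PR's player class: `PlaybackQueueSketch`, `StoppableSource` and `setCurrentSource` are hypothetical names, and `StoppableSource` is a structural stand-in for `AudioBufferSourceNode` so the sketch runs outside a browser.

```typescript
// Structural stand-in for AudioBufferSourceNode (only stop() is needed here).
interface StoppableSource {
  stop(): void;
}

class PlaybackQueueSketch {
  private pending: ArrayBuffer[] = [];
  private currentSource: StoppableSource | null = null;

  enqueue(chunk: ArrayBuffer): void {
    this.pending.push(chunk);
  }

  // Called right after source.start() for the chunk that is now playing.
  setCurrentSource(source: StoppableSource): void {
    this.currentSource = source;
  }

  pendingCount(): number {
    return this.pending.length;
  }

  isPlaying(): boolean {
    return this.currentSource !== null;
  }

  clearQueue(): void {
    this.pending.length = 0; // drop chunks that have not started yet
    const source = this.currentSource;
    this.currentSource = null;
    if (source) {
      try {
        source.stop(); // cut off the chunk that is playing right now
      } catch {
        // Assumption: a failing stop() can be ignored in this sketch.
      }
    }
  }
}
```

Clearing `currentSource` before calling `stop()` makes a second `clearQueue()` call harmless, which matters when several handlers react to the same barge-in.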

After barge-in, resumeRecording() was called 3 times:
1. By the barge-in callback (correct)
2. By the playback state listener (300ms later, duplicate)
3. By handleProcessComplete (old pipeline's complete message)

Each call reset hasSpeechStarted=false, causing the user's ongoing speech to be forgotten and VAD to immediately re-trigger silence.

Fix: add monitoringOnly guard to resumeRecording() - only runs when actually in monitoring mode. Second/third calls are no-ops since monitoringOnly is already false after the first resume.
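
A minimal sketch of the guard described in this commit, assuming the two flags named in the message are plain booleans. The class name `RecorderFlagsSketch` and the boolean return value are hypothetical additions for illustration.

```typescript
class RecorderFlagsSketch {
  monitoringOnly = false;
  hasSpeechStarted = false;

  pauseRecording(): void {
    this.monitoringOnly = true;
  }

  // Returns true only when a resume actually happened (return value is
  // an illustration aid, not part of the PR).
  resumeRecording(): boolean {
    if (!this.monitoringOnly) return false; // guard: duplicate calls are no-ops
    this.monitoringOnly = false;
    this.hasSpeechStarted = false; // VAD state reset happens once per resume
    return true;
  }
}
```

With the guard in place, the later calls from the playback listener and from handleProcessComplete no longer wipe `hasSpeechStarted` while the user is mid-sentence.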

…e timeout

audio_start was sent before getUserMedia and ScriptProcessorNode setup, causing the server pipeline to wait for audio that wouldn't arrive for ~500ms. With vad_eos=1000ms, the server-side VAD timed out and the pipeline completed before any audio was sent.

Fix: move wsClient.startAudioRecording() to AFTER audioManager.startRecording() returns successfully, ensuring the mic is capturing and the ScriptProcessorNode is connected before the server starts its STT pipeline.
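
The ordering fix can be sketched as below. The interfaces `MicRecorder` and `AudioSignalling` and the function `beginUserTurn` are hypothetical; the only names taken from the commit message are `startRecording()` and `startAudioRecording()`, and it is an assumption that the former returns a promise.

```typescript
// Hypothetical minimal shapes of the two collaborators named in the commit.
interface MicRecorder {
  startRecording(): Promise<void>; // assumed to resolve once the mic is capturing
}

interface AudioSignalling {
  startAudioRecording(): void; // sends audio_start to the server
}

async function beginUserTurn(recorder: MicRecorder, ws: AudioSignalling): Promise<void> {
  // Wait for local capture first. If this rejects, audio_start is never sent,
  // so the server does not start a pipeline that will receive no audio.
  await recorder.startRecording();
  ws.startAudioRecording();
}
```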

…ce triggers

Root cause: ambient noise / mic self-noise exceeded SPEECH_THRESHOLD (0.015), causing hasSpeechStarted=true before the user spoke. Then silence was detected immediately → audio_end sent with no real speech.

Fixes:
- SPEECH_THRESHOLD: 0.015 → 0.02 (less sensitive to ambient noise)
- MIN_SPEECH_FRAMES: 2 → 5 (~640ms of genuine speech required)
- New VAD_GRACE_FRAMES=8: after recording starts or resumes, skip the first ~1s of VAD detection so the user has time to begin speaking
- Grace period also applied in resumeRecording() after TTS ends
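
How these parameters interact can be sketched with a small RMS-based detector. This is a simplified model, not the PR's implementation: the class `SimpleVadSketch` and the function `rmsLevel` are hypothetical, the `SILENCE_THRESHOLD` and `SILENCE_FRAMES_REQUIRED` values are taken from the diff elsewhere in this PR, and resetting the speech counter on silence before speech is confirmed is an assumption.

```typescript
const SPEECH_THRESHOLD = 0.02;      // RMS above this counts as speech
const SILENCE_THRESHOLD = 0.01;     // RMS below this counts as silence
const MIN_SPEECH_FRAMES = 5;        // speech frames needed before VAD may fire
const SILENCE_FRAMES_REQUIRED = 10; // silent frames that end the utterance
const VAD_GRACE_FRAMES = 8;         // frames skipped right after start/resume

// Root-mean-square level of one audio frame.
function rmsLevel(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  const sumOfSquares = frame.reduce((acc, sample) => acc + sample * sample, 0);
  return Math.sqrt(sumOfSquares / frame.length);
}

class SimpleVadSketch {
  private graceRemaining = VAD_GRACE_FRAMES;
  private speechFrames = 0;
  private silenceFrames = 0;
  private hasSpeechStarted = false;

  // Returns true once, when enough silence follows confirmed speech.
  processLevel(level: number): boolean {
    if (this.graceRemaining > 0) {
      this.graceRemaining--; // grace period: ignore everything
      return false;
    }
    if (level > SPEECH_THRESHOLD) {
      this.speechFrames++;
      this.silenceFrames = 0;
      if (this.speechFrames >= MIN_SPEECH_FRAMES) this.hasSpeechStarted = true;
    } else if (level < SILENCE_THRESHOLD) {
      if (!this.hasSpeechStarted) {
        this.speechFrames = 0; // assumption: short noise bursts do not accumulate
        return false;
      }
      this.silenceFrames++;
      if (this.silenceFrames >= SILENCE_FRAMES_REQUIRED) {
        this.speechFrames = 0;
        this.silenceFrames = 0;
        this.hasSpeechStarted = false;
        return true;
      }
    }
    return false;
  }
}
```

Levels between the two thresholds change neither counter, which gives the detector some hysteresis around the noise floor.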

…auses

768ms silence window was cutting users off mid-sentence during normal word/phrase pauses (200-500ms). Increased SILENCE_FRAMES_REQUIRED from 6 to 10 frames (~1.3s), which tolerates natural conversational pauses while still auto-submitting reasonably quickly after the user finishes speaking.
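
The durations quoted in these commits follow from the frame length. A quick check, assuming each VAD frame lasts 128 ms (the figure implied by the "10 × 128ms" comment in the diff); `FRAME_DURATION_MS` and `framesToMs` are names made up for this sketch.

```typescript
// Assumption: one VAD frame lasts 128 ms, as implied by "10 × 128ms" in the diff.
const FRAME_DURATION_MS = 128;

function framesToMs(frames: number): number {
  return frames * FRAME_DURATION_MS;
}

const oldSilenceWindowMs = framesToMs(6);  // 768 ms, the window that cut users off
const newSilenceWindowMs = framesToMs(10); // 1280 ms, quoted as ~1.3s
const minSpeechMs = framesToMs(5);         // 640 ms of speech before VAD may fire
const graceWindowMs = framesToMs(8);       // 1024 ms, quoted as ~1s
```

The new silence window sits well above the 200-500ms pauses mentioned in the message, whereas the old 768 ms window left a much smaller margin.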

Copilot

Pull request overview

This PR iterates on realtime voice/VAD behavior to improve noise robustness and support “monitoring-only” mode for barge-in (interrupting TTS) without fully stopping the microphone pipeline.

Changes:

Tuned VAD thresholds/frames and added a VAD grace window after recording starts.
Introduced “monitoring-only” mode with pauseRecording() / resumeRecording() to stop sending audio while keeping the mic open for barge-in detection.
Improved immediate TTS stop by tracking and stopping the current AudioBufferSourceNode in clearQueue().


Copilot · 2026-04-18T16:05:09Z

+      // Continuous mode: resume recording after TTS finishes (the mic stays open, so a resume is enough)
      if (!isPlaying && this.isAudioCallActive && this.isContinuousModeActive) {
        setTimeout(() => {
-          if (this.isAudioCallActive && this.voiceState === 'idle') {
-            this.startRecording().catch(err => console.error('❌ Automatic recording restart failed:', err))
+          if (this.isAudioCallActive) {
+            this.audioManager.resumeRecording()  // switch from monitoring mode back to sending mode
          }


In continuous mode you now call resumeRecording() after TTS ends, but the VAD silence handler sends audio_end to the server. Resuming local sending without also sending a new audio_start will likely leave the server with no active audio session for subsequent turns. Consider sending wsClient.startAudioRecording() immediately before/when resuming from monitoring mode (or making resumeRecording() coordinate this via a callback).
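
One way to centralize the coordination this comment asks for is sketched below. It is only an illustration of the reviewer's suggestion: `TurnCoordinatorSketch`, `TurnSignalling`, `onVadSilence` and `stopAudioRecording` are hypothetical names, and whether the server really needs a fresh audio_start per turn is the reviewer's assumption, not something this sketch verifies.

```typescript
// Hypothetical signalling surface: one call per protocol message.
interface TurnSignalling {
  startAudioRecording(): void; // sends audio_start
  stopAudioRecording(): void;  // sends audio_end (name assumed for this sketch)
}

class TurnCoordinatorSketch {
  private monitoringOnly = false;
  private ws: TurnSignalling;

  constructor(ws: TurnSignalling) {
    this.ws = ws;
  }

  // VAD silence: close the server-side turn and stop sending frames.
  onVadSilence(): void {
    if (this.monitoringOnly) return;
    this.monitoringOnly = true;
    this.ws.stopAudioRecording();
  }

  // TTS end or barge-in: open a new server-side turn, then resume sending.
  resume(): void {
    if (!this.monitoringOnly) return; // duplicate resumes send nothing
    this.ws.startAudioRecording();
    this.monitoringOnly = false;
  }

  canSendAudio(): boolean {
    return !this.monitoringOnly;
  }
}
```

Because audio_start is sent inside the guarded `resume()`, every audio_end is paired with at most one audio_start, regardless of how many handlers call `resume()`.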

Copilot · 2026-04-18T16:05:10Z

      setTimeout(() => {
-        if (this.isAudioCallActive && this.voiceState === 'idle') {
-          this.startRecording().catch(err => console.error('❌ Automatic restart after complete failed:', err))
+        if (this.isAudioCallActive) {


Same issue as the playback handler: handleProcessComplete() resumes local sending via audioManager.resumeRecording(), but after an audio_end the server likely requires a new audio_start before accepting binary audio again. Send startAudioRecording() before resuming (or centralize this logic in one place).

Suggested change

-        if (this.isAudioCallActive) {
+        if (this.isAudioCallActive) {
+          this.startAudioRecording()

Copilot · 2026-04-18T16:05:10Z

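+const SPEECH_THRESHOLD = 0.02         // RMS above this value → treated as speech (raised to reduce false triggers from noise)
+const SILENCE_THRESHOLD = 0.01        // RMS below this value → treated as silence
+const MIN_SPEECH_FRAMES = 5           // at least 5 frames of real speech before VAD may trigger (~640ms, guards against ambient-noise false triggers)
+const SILENCE_FRAMES_REQUIRED = 10    // auto-stop after 10 × 128ms ≈ 1.3s of silence (tolerates pauses between sentences)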


The comment on SILENCE_FRAMES_REQUIRED says "auto-stop after silence", but the implementation now switches into monitoring-only mode and relies on callbacks to send audio_end / pause sending. Please update the constant comment to match the new behavior to avoid confusion when tuning VAD parameters.

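Suggested change

-const SILENCE_FRAMES_REQUIRED = 10 // auto-stop after 10 × 128ms ≈ 1.3s of silence (tolerates pauses between sentences)
+const SILENCE_FRAMES_REQUIRED = 10 // after 10 × 128ms ≈ 1.3s of continuous silence, switch to monitoring/silence handling; audio_end and pausing of sending are driven by callbacks (tolerates pauses between sentences)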

Copilot · 2026-04-18T16:05:10Z

    this.onAudioPlayCallback?.(false)

-    // Register the VAD silence callback: auto-submit after ~0.8s of silence
+    // Register the VAD silence callback: pause sending after ~0.8s of silence (mic stays open)


This comment says "after ~0.8s of silence…", but SILENCE_FRAMES_REQUIRED is now 10 frames at 128ms/frame (~1.3s). Update the comment to reflect the current timing so behavior matches expectations during QA/tuning.

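Suggested change

-    // Register the VAD silence callback: pause sending after ~0.8s of silence (mic stays open)
+    // Register the VAD silence callback: pause sending after ~1.3s of silence (mic stays open)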

Copilot · 2026-04-18T16:05:10Z

      source.start()
+      this.currentSource = source


playAudio() checks if (!this.audioContext) await this.initialize(), but initialize() no longer creates an AudioContext, so direct calls to playAudio() can still hit this.audioContext!.decodeAudioData(...) with a null context. Call ensureAudioContext() here (or otherwise guarantee audioContext exists) before starting playback.
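
The lazy-creation pattern the comment refers to might look like the sketch below. The review names `ensureAudioContext()` but does not show its body, so this is an assumed shape: `LazyAudioContextSketch` is hypothetical, and the context type is left generic so the example runs without browser APIs.

```typescript
// Generic holder: C would be AudioContext in the browser.
class LazyAudioContextSketch<C> {
  private context: C | null = null;
  private create: () => C;

  constructor(create: () => C) {
    this.create = create;
  }

  // Creates the context on first use and returns the same instance afterwards,
  // so callers never dereference a null context.
  ensureAudioContext(): C {
    if (this.context === null) {
      this.context = this.create();
    }
    return this.context;
  }
}
```

In a browser the factory would presumably be `() => new AudioContext()`, and playAudio() would call `ensureAudioContext()` before `decodeAudioData(...)` instead of relying on initialize().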

Copilot · 2026-04-18T16:05:11Z

+          // Grace period: skip VAD detection right after recording starts (prevents ambient-noise false triggers)
+          if (this.vadGraceRemaining > 0) {
+            this.vadGraceRemaining--
+          } else {


vadGraceRemaining is never initialized on a normal startRecording() path (it’s only set in resetRecordingState() / resumeRecording()), so the first recording session (and any session after a clean stopRecording()) will skip the intended grace period. Set this.vadGraceRemaining = VAD_GRACE_FRAMES when starting a new recording session (before audio processing begins).

Copilot · 2026-04-18T16:05:11Z

+          // Monitoring mode: mic is open but audio is not sent
+          if (this.monitoringOnly) return
+


monitoringOnly can remain true across sessions because stopRecording() doesn’t reset VAD/monitoring flags (unlike resetRecordingState()). If a user ends a call while in listening mode, the next startRecording() will immediately return at if (this.monitoringOnly) return, sending no audio. Ensure monitoringOnly (and related VAD counters) are reset when stopping a recording session and/or at the beginning of startRecording().
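
Both this comment and the preceding one about vadGraceRemaining point at the same remedy: reset the per-session flags at session boundaries. A minimal sketch, assuming the flags are plain fields; the class `RecordingSessionSketch`, the `active` flag and `GRACE_FRAMES` (standing in for VAD_GRACE_FRAMES=8) are hypothetical.

```typescript
const GRACE_FRAMES = 8; // stands in for VAD_GRACE_FRAMES from the commit message

class RecordingSessionSketch {
  monitoringOnly = false;
  vadGraceRemaining = 0;
  hasSpeechStarted = false;
  private active = false;

  // Single place that puts every per-session flag back to its starting value.
  private resetSessionState(): void {
    this.monitoringOnly = false;
    this.hasSpeechStarted = false;
    this.vadGraceRemaining = GRACE_FRAMES;
  }

  startRecording(): void {
    this.resetSessionState(); // first session also gets its grace period
    this.active = true;
  }

  pauseRecording(): void {
    if (this.active) this.monitoringOnly = true;
  }

  stopRecording(): void {
    this.active = false;
    this.resetSessionState(); // monitoring mode must not leak into the next call
  }
}
```

Resetting in both `startRecording()` and `stopRecording()` is redundant by design: either one alone would cover the two cases the reviewer describes.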

ailuckly added 6 commits April 16, 2026 13:27

Copilot AI review requested due to automatic review settings April 18, 2026 16:00

ailuckly merged commit cfe3a00 into develop Apr 18, 2026
2 checks passed

Copilot started reviewing on behalf of ailuckly April 18, 2026 16:01

Copilot AI reviewed Apr 18, 2026

ailuckly merged 6 commits into develop from feat/realtime-voice-vad-p2
