fix: cancel TTS on remote human speech in multi-participant huddles by tlongwell-block · Pull Request #332 · block/sprout

tlongwell-block · 2026-04-15T22:11:02Z

Problem

In multi-participant huddles, when a remote human speaks while the local agent's TTS is playing, the TTS continues uninterrupted. The existing barge-in mechanism only handles local speech (via STT), so remote human speech from the audio relay is ignored. This creates an awkward experience where the agent talks over humans.

Approach

Client-side per-peer frame counting in the relay recv task. Each remote peer gets an independent frame counter that increments on each successfully decoded Opus audio frame. Counting is gated on tts_active: frames only accumulate while TTS is playing. When any peer's counter crosses REMOTE_SPEECH_THRESHOLD (15 frames in a 500ms window), the shared tts_cancel atomic flag is set; the TTS worker observes the flag and cancels playback.

Key design decisions:

Per-peer isolation: DTX comfort noise (~2-3 frames/window) from silent peers doesn't accumulate with real speech from active peers
Gated accumulation: Counters only increment while tts_active is true, preventing false triggers
Session reset: On TTS session start (false→true transition), all counters clear — prevents stale pre-playback speech from tripping a cancel
Instant-based window: 500ms reset window uses std::time::Instant (starvation-proof, no async dependency)
Saturating arithmetic: saturating_add prevents overflow on the u16 counters
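
The design decisions above can be sketched roughly as follows. This is a minimal illustration, not the code in relay_api.rs: the `RemoteSpeechCounter` type, its method name, and the `>=` comparison for "crossing" the threshold are assumptions, and the constant uses the 15-frame value quoted in this description, which should be treated as tunable. The sketch also detects session transitions per decoded frame, which is a simplification.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::{Duration, Instant};

/// Value quoted in the description above; treat it as a tunable.
const REMOTE_SPEECH_THRESHOLD: u16 = 15;
/// Length of the counting window from the description above.
const WINDOW: Duration = Duration::from_millis(500);

/// Hypothetical helper owning the per-peer counters (not the PR's actual type).
struct RemoteSpeechCounter {
    frame_counts: HashMap<u8, u16>,
    window_start: Instant,
    tts_was_active: bool,
}

impl RemoteSpeechCounter {
    fn new(now: Instant) -> Self {
        Self { frame_counts: HashMap::new(), window_start: now, tts_was_active: false }
    }

    /// Call once per successfully decoded, non-empty frame from `peer`.
    /// Returns true when this frame caused `tts_cancel` to be set.
    fn on_decoded_frame(
        &mut self,
        peer: u8,
        now: Instant,
        tts_active: &AtomicBool,
        tts_cancel: &AtomicBool,
    ) -> bool {
        let active = tts_active.load(Ordering::Acquire);

        // Session reset: on a false -> true transition, drop stale counts.
        if active && !self.tts_was_active {
            self.frame_counts.clear();
            self.window_start = now;
        }
        self.tts_was_active = active;

        // Gated accumulation: nothing is counted while TTS is idle.
        if !active {
            return false;
        }

        // Instant-based window: clear all counters once the window has elapsed.
        if now.duration_since(self.window_start) >= WINDOW {
            self.frame_counts.clear();
            self.window_start = now;
        }

        // Per-peer isolation: each peer index has its own saturating counter.
        let count = self.frame_counts.entry(peer).or_insert(0);
        *count = count.saturating_add(1);

        if *count >= REMOTE_SPEECH_THRESHOLD {
            tts_cancel.store(true, Ordering::Release);
            return true;
        }
        false
    }
}
```

Because each peer index has its own entry, a few comfort-noise frames from one silent peer never add to the count of a peer who is actually speaking.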

What Changed

relay_api.rs (production): Per-peer HashMap<u8, u16> frame counting in the recv task, REMOTE_SPEECH_THRESHOLD constant promoted to pub(crate) module level, selective Joined handler, Instant-based window reset
tts.rs (tests): 18 new tests covering threshold behavior, per-peer isolation, DTX filtering, TTS session transitions, cancel consumption lifecycle, concurrent local+remote cancel safety, and apply_fades edge cases
check-file-sizes.mjs (config): File size overrides for relay_api.rs (502 lines, +2 over limit) and tts.rs (1005 lines with test suite)

Reviews

Three independent approvals:

Hana: Architecture review — approved the per-peer frame counting approach
Clove: Code quality review — 9/10
Lep: Security review — 9/10

Prior art research (Levin) found no existing multi-participant barge-in implementations in the wild.

Thread tts_cancel and tts_active flags into the audio relay pipeline by cloning them from HuddleState inside connect_audio_relay. Use per-peer frame counting to distinguish real speech (~25 frames/500ms) from DTX comfort noise (~1 frame/500ms). When any remote peer crosses REMOTE_SPEECH_THRESHOLD (5 decoded audio frames) while TTS is active, immediately set tts_cancel.

Key design choices:

- REMOTE_SPEECH_THRESHOLD promoted to pub(crate) module level so tests can import it directly instead of duplicating the value.
- Frame counting is gated on tts_active and happens after successful Opus decode (Ok(n) if n > 0). Corrupt frames and DTX silence are excluded.
- TTS state transitions tracked at binary frame level (tts_was_active) so session boundaries are detected even during DTX silence. Counters reset on new TTS session (false→true) and on Instant-based window expiry (starvation-proof). Uses saturating_add.
- joined selectively resets decoder/counter/active_indices state only for new or reassigned peer indices. Existing peers keep their state.
- left cleans up index_to_pubkey, frame_counts, and decoders.
- Acquire/Release ordering matches the existing tts.rs/stt.rs convention.

Every remote peer on the audio WebSocket is a human (agents never connect to the audio relay), so no peer filtering is needed.
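
The selective joined/left handling described in this commit message might look something like the sketch below. The `RelayPeers` struct, the `DecoderStub` stand-in, and the assumption that each event carries a single (index, pubkey) pair are all hypothetical; the map names follow the commit message, and active_indices is omitted for brevity.

```rust
use std::collections::HashMap;

/// Stand-in for a per-peer Opus decoder; the real decoder type is not shown here.
#[derive(Default)]
struct DecoderStub;

/// Hypothetical container for per-peer relay state.
#[derive(Default)]
struct RelayPeers {
    index_to_pubkey: HashMap<u8, String>,
    frame_counts: HashMap<u8, u16>,
    decoders: HashMap<u8, DecoderStub>,
}

impl RelayPeers {
    /// Handle a joined event. Returns true if per-peer state was reset,
    /// which happens only for a new or reassigned index.
    fn on_joined(&mut self, index: u8, pubkey: &str) -> bool {
        // Same pubkey on the same index: an existing peer, keep its state.
        if self.index_to_pubkey.get(&index).map(String::as_str) == Some(pubkey) {
            return false;
        }
        // New or reassigned index: start with a fresh decoder and no frame count.
        self.index_to_pubkey.insert(index, pubkey.to_owned());
        self.frame_counts.remove(&index);
        self.decoders.insert(index, DecoderStub::default());
        true
    }

    /// Handle a left event by dropping everything keyed on the index.
    fn on_left(&mut self, index: u8) {
        self.index_to_pubkey.remove(&index);
        self.frame_counts.remove(&index);
        self.decoders.remove(&index);
    }
}
```

Keeping an existing peer's decoder and counter intact avoids discarding decoder state and in-progress speech counts each time an unrelated participant joins.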

Comprehensive test suite for the per-peer frame counting and cancel mechanism. Covers threshold behavior, per-peer isolation, DTX comfort noise filtering, TTS session transitions, cancel consumption lifecycle, concurrent local+remote cancel safety, and apply_fades edge cases.
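
As an illustration of the "cancel consumption lifecycle" and "concurrent local+remote cancel safety" cases, here is a small sketch of how a shared cancel flag could be set from two sources and consumed once. The function names are hypothetical, and consuming the flag with `swap` is an assumption; the actual TTS worker in tts.rs may read and clear the flag differently.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Either the local STT path or the remote relay path requests a cancel.
/// Storing `true` is idempotent, so concurrent requests are harmless.
fn request_cancel(tts_cancel: &AtomicBool) {
    tts_cancel.store(true, Ordering::Release);
}

/// The worker consumes the request: returns true at most once per request
/// burst and leaves the flag cleared for the next TTS session.
fn take_cancel(tts_cancel: &AtomicBool) -> bool {
    tts_cancel.swap(false, Ordering::AcqRel)
}

/// Two concurrent requesters followed by two consumption attempts.
fn concurrent_cancel_demo() -> (bool, bool) {
    let flag = AtomicBool::new(false);
    // Scoped threads are joined before `scope` returns.
    std::thread::scope(|s| {
        s.spawn(|| request_cancel(&flag));
        s.spawn(|| request_cancel(&flag));
    });
    (take_cancel(&flag), take_cancel(&flag))
}
```

In this sketch the first `take_cancel` observes the request and the second sees a cleared flag, so a local and a remote interrupt arriving together result in a single cancellation.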

relay_api.rs is 2 lines over (502 vs 500) from the per-peer frame counting logic. tts.rs grew to ~1005 lines with the comprehensive remote interrupt test suite (18 tests, 564 lines).

* origin/main:
  [codex] Fix authz, scope propagation, and shell-injection bugs (#320)
  feat(mobile): implement Activity tab with personalized feed (#337)
  feat(mobile): upgrade mobile_scanner to v7 (Apple Vision) (#336)
  feat(mobile): app branding — icon, name, launch screen (#335)
  fix: cancel TTS on remote human speech in multi-participant huddles (#332)
  feat(mobile): design refresh — navigation, search, reactions (#334)
  feat(desktop+acp+mcp): deterministic nested thread replies via persisted reply context (#322)
  feat(mobile): channel management — create, browse, join/leave, DMs, canvas (#331)
  fix: derive staging ports from worktree to avoid collisions (#329)
  fix: mentions survive copy/paste from chat into composer (#328)
  feat(home): add activity and agent feed sections with deep-linking (#330)

tlongwell-block added 3 commits April 15, 2026 17:58

chore: add file size overrides for relay_api.rs and tts.rs

c7544df


tlongwell-block requested a review from wesbillman as a code owner April 15, 2026 22:11

tlongwell-block merged commit d99332e into main Apr 16, 2026
10 checks passed

tlongwell-block deleted the fix/remote-human-tts-interrupt branch April 16, 2026 01:23
