
feat(voice): batch STT via Yapper (Phase 1) #105

Merged: dimakis merged 6 commits into main from feat/voice-stt-batch on Apr 5, 2026


@dimakis commented Apr 5, 2026

Summary

  • Audio capture (audio.ts): MediaRecorder wrapper with runtime format negotiation (WebM/Opus preferred, MP4 fallback for Safari), an auto-stop timer, cancel support, and a FormData helper for Yapper's /v1/transcribe endpoint
  • Voice hook (useVoice.ts): Yapper health polling (every 30s), a browser mic permission flow with a micBlocked state, a recording state machine, and batch transcription via POST /v1/transcribe
  • MicButton component: push-to-talk via pointer events (hold to record, release to transcribe, drag away to cancel), with recording (pulse), transcribing (spinner), and blocked states
  • ChatInput/ChatView wiring: an optional voice prop on ChatInput; the transcript is inserted into the textarea so the user can review it before sending. Voice is purely opt-in: Mitzo works identically without Yapper
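One way to implement the runtime format negotiation described in the audio.ts bullet is to walk an ordered list of candidate mimeTypes and keep the first one the recorder supports. The sketch below assumes that approach; the candidate strings and the names pickMimeType and MIME_CANDIDATES are illustrative, not taken from audio.ts. The support check is injected so the logic can be exercised outside a browser.

```typescript
// Hypothetical candidate list, most preferred first: WebM/Opus, then plain
// WebM, then MP4 as the fallback for Safari. The real audio.ts may differ.
export const MIME_CANDIDATES: readonly string[] = [
  "audio/webm;codecs=opus",
  "audio/webm",
  "audio/mp4",
];

// Returns the first candidate that `isSupported` accepts, or undefined when
// none is supported. In a browser, pass
// (t) => MediaRecorder.isTypeSupported(t) as the predicate.
export function pickMimeType(
  isSupported: (mimeType: string) => boolean,
  candidates: readonly string[] = MIME_CANDIDATES,
): string | undefined {
  return candidates.find((candidate) => isSupported(candidate));
}
```

If pickMimeType returns undefined, one option is to construct the MediaRecorder without a mimeType option and read recorder.mimeType once recording has started to learn which format the browser chose.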

Frontend-only: the server and the v2 protocol are untouched. Follows the voice integration design doc.

Test plan

  • 34 new tests across 4 test files (audio, useVoice, MicButton, ChatInput integration)
  • Full suite passes: 545/545
  • Manual: verify the mic button appears when Yapper is running on the LAN
  • Manual: hold-to-record → release → transcript appears in textarea
  • Manual: mic button hidden when Yapper is offline
  • Manual: Safari format fallback (MP4)

🤖 Generated with Claude Code

dimakis and others added 5 commits April 5, 2026 20:01
MediaRecorder wrapper with runtime mimeType negotiation (WebM/Opus
preferred, MP4 fallback for Safari), auto-stop timer, cancel support,
and FormData helper for Yapper transcription endpoint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
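A FormData helper for the batch endpoint mainly has to attach the recorded Blob under a filename whose extension matches the negotiated format. The sketch assumes a multipart field named "file" and the hypothetical helpers extensionFor and buildTranscribeForm; the field name Yapper's /v1/transcribe actually expects is not stated in this PR and should be checked against its API.

```typescript
// Maps a recorder mimeType such as "audio/webm;codecs=opus" to a file
// extension by looking only at the part before any ";" parameters.
export function extensionFor(mimeType: string): string {
  const base = mimeType.split(";")[0].trim().toLowerCase();
  switch (base) {
    case "audio/webm":
      return "webm";
    case "audio/mp4":
      return "m4a";
    case "audio/ogg":
      return "ogg";
    default:
      return "bin"; // unknown container: generic extension
  }
}

// Builds the multipart body for a batch transcription request.
// ASSUMPTION: the "file" field name and "recording.*" filename are
// illustrative, not Yapper's documented contract.
export function buildTranscribeForm(audio: Blob, mimeType: string): FormData {
  const form = new FormData();
  form.append("file", audio, `recording.${extensionFor(mimeType)}`);
  return form;
}
```

When posting the result with fetch(url, { method: "POST", body: form }), leaving the Content-Type header unset lets the browser add the multipart boundary itself.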
Yapper health polling (30s interval), browser mic permission flow,
MediaRecorder capture, and POST /v1/transcribe for batch transcription.
Graceful degradation when Yapper is unavailable or mic is blocked.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
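The states named across the useVoice and MicButton commits (hidden when unavailable, idle, recording, transcribing, mic-blocked) fit naturally into a small reducer. This is a sketch of one possible transition table; the VoiceState and VoiceEvent names and the exact transitions are assumptions, not the hook's actual code.

```typescript
export type VoiceState =
  | { status: "unavailable" } // Yapper health check failing: hide the mic
  | { status: "idle" }
  | { status: "blocked" } // browser mic permission denied
  | { status: "recording" }
  | { status: "transcribing" };

export type VoiceEvent =
  | { type: "health"; ok: boolean }
  | { type: "start" }
  | { type: "permissionDenied" }
  | { type: "stop" }
  | { type: "cancel" }
  | { type: "transcribed" }
  | { type: "failed" };

// Pure transition function; events that do not apply in the current state
// leave it unchanged. A failed health check hides the mic from any state;
// a real hook would also need to stop an active recorder in that case.
export function voiceReducer(state: VoiceState, event: VoiceEvent): VoiceState {
  switch (event.type) {
    case "health":
      if (!event.ok) return { status: "unavailable" };
      return state.status === "unavailable" ? { status: "idle" } : state;
    case "start":
      return state.status === "idle" ? { status: "recording" } : state;
    case "permissionDenied":
      return { status: "blocked" };
    case "stop":
      return state.status === "recording" ? { status: "transcribing" } : state;
    case "cancel":
      return state.status === "recording" ? { status: "idle" } : state;
    case "transcribed":
    case "failed":
      return state.status === "transcribing" ? { status: "idle" } : state;
  }
}
```

Keeping the transitions in a pure function lets the hook's effects (polling, getUserMedia, fetch) dispatch events while the state logic stays unit-testable.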
Hold-to-record button with pointer events. States: hidden (unavailable),
idle, recording (red pulse), transcribing (spinner), mic-blocked.
Cancel on pointer leave.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
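The push-to-talk gesture (hold to record, release to transcribe, drag away to cancel) reduces to mapping pointer events onto three actions. The sketch keeps that mapping pure so it can be unit-tested without a DOM; gestureAction and its `pressed` flag are hypothetical names, and treating pointercancel like a drag-away is an assumption. A component would call it from its onPointerDown, onPointerUp, onPointerLeave, and onPointerCancel handlers.

```typescript
export type PointerKind = "pointerdown" | "pointerup" | "pointerleave" | "pointercancel";
export type GestureAction = "start" | "stop" | "cancel" | "none";

// Maps a pointer event on the mic button to a recording action.
// `pressed` says whether a press is in progress; the caller sets it on
// "start" and clears it on "stop" or "cancel".
export function gestureAction(kind: PointerKind, pressed: boolean): GestureAction {
  if (kind === "pointerdown") return pressed ? "none" : "start";
  if (!pressed) return "none"; // hovering in or out without a press does nothing
  if (kind === "pointerup") return "stop"; // release: stop and transcribe
  return "cancel"; // pointerleave (drag away) or pointercancel: discard audio
}
```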
ChatInput accepts optional voice prop, renders MicButton in the input
row. Transcript is inserted into the textarea on recording stop.
ChatView owns the useVoice hook and passes it down. Mic button CSS for the
recording pulse, transcribing spin, and blocked states.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
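"Inserted into the textarea" leaves open where the transcript goes when the box already contains text. One reasonable policy, sketched here as an assumption rather than what ChatInput does, is to replace the current selection (or insert at the caret) and add a space wherever the transcript would otherwise touch an adjacent word. insertTranscript and TextEdit are hypothetical names.

```typescript
export interface TextEdit {
  value: string; // new textarea value
  cursor: number; // caret position just after the inserted transcript
}

// Replaces value[selectionStart, selectionEnd) with the trimmed transcript,
// padding with single spaces so it does not fuse with neighbouring words.
export function insertTranscript(
  value: string,
  selectionStart: number,
  selectionEnd: number,
  transcript: string,
): TextEdit {
  const text = transcript.trim();
  if (text === "") return { value, cursor: selectionEnd }; // nothing recognised
  const before = value.slice(0, selectionStart);
  const after = value.slice(selectionEnd);
  const lead = before.length > 0 && !/\s$/.test(before) ? " " : "";
  const trail = after.length > 0 && !/^\s/.test(after) ? " " : "";
  return {
    value: before + lead + text + trail + after,
    cursor: before.length + lead.length + text.length,
  };
}
```

Returning the caret position lets the component restore focus with setSelectionRange(cursor, cursor) so the user can keep editing before sending.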
Verifies mic button visibility, recording state, transcript insertion,
mic-blocked display, and graceful absence when voice prop is omitted.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs(design): tts playback design doc (phase 2)

Covers: useVoice TTS extension, text chunking with pipelining,
AudioContext playback, VoiceSettings component, ChatView auto-speak
on message_end, interruption rules, voice selection, and error handling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
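The TTS design doc covers text chunking before synthesis. A minimal chunker suited to sequential playback, assuming a character budget per synthesis request, packs whole sentences into chunks and only splits a sentence on whitespace (or, as a last resort, mid-word) when it alone exceeds the budget. The 200-character default, the sentence regex, and the names chunkText and splitLong are illustrative assumptions; the regex will also break on abbreviations such as "e.g." and on decimals.

```typescript
// Splits text into chunks of at most maxChars characters, preferring
// sentence boundaries. A "sentence" is a run of text ending in . ! or ?
export function chunkText(text: string, maxChars = 200): string[] {
  if (maxChars < 1) throw new RangeError("maxChars must be at least 1");
  const sentences = (text.match(/[^.!?]+[.!?]*/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    for (const piece of splitLong(sentence, maxChars)) {
      const candidate = current ? `${current} ${piece}` : piece;
      if (candidate.length <= maxChars) {
        current = candidate; // piece still fits in the open chunk
      } else {
        chunks.push(current); // close the full chunk and start a new one
        current = piece;
      }
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Breaks one over-long sentence on whitespace, hard-splitting any single
// word that is itself longer than maxChars.
function splitLong(sentence: string, maxChars: number): string[] {
  if (sentence.length <= maxChars) return [sentence];
  const parts: string[] = [];
  let current = "";
  for (const word of sentence.split(/\s+/)) {
    for (let i = 0; i < word.length; i += maxChars) {
      const slice = word.slice(i, i + maxChars);
      const candidate = current ? `${current} ${slice}` : slice;
      if (candidate.length <= maxChars) {
        current = candidate;
      } else {
        parts.push(current);
        current = slice;
      }
    }
  }
  if (current) parts.push(current);
  return parts;
}
```

With sequential playback, each chunk can be synthesized and played in order; smaller chunks shorten the wait before the first audio starts.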

* docs(design): address review feedback on tts design

- AudioContext reuse: singleton with lazy creation and cleanup
- AbortController on synthesize() for cancellable fetches
- Track messageId instead of messages.length for TTS trigger
- Simplify to sequential playback for MVP (no pipelining)
- Lazy voice list fetch (on first TTS enable, not mount)
- Dynamic default voice from /v1/voices response

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
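Two of the review points translate into small, testable pieces: reusing one AudioContext (created lazily, closed on cleanup) and triggering TTS by messageId rather than messages.length, presumably so that re-renders or list changes that do not add a message neither re-speak nor skip one. The sketch is generic over anything with a close() method so it runs without Web Audio; createLazySingleton and createSpeakTracker are hypothetical names, not the design doc's API.

```typescript
// Anything with an async close(); a browser AudioContext fits this shape.
export interface Closable {
  close(): Promise<void>;
}

// Creates the instance on the first get() and reuses it afterwards.
// dispose() closes it and clears the slot so a later get() builds a new one.
export function createLazySingleton<T extends Closable>(factory: () => T) {
  let instance: T | null = null;
  return {
    get(): T {
      if (instance === null) instance = factory();
      return instance;
    },
    async dispose(): Promise<void> {
      const current = instance;
      instance = null; // cleared synchronously, before awaiting close()
      if (current !== null) await current.close();
    },
  };
}

// Remembers the last message id handed to TTS; claim() returns false when
// the same id arrives again, so repeated message_end handling for one
// message does not speak it twice.
export function createSpeakTracker() {
  let lastSpokenId: string | null = null;
  return {
    claim(messageId: string): boolean {
      if (messageId === lastSpokenId) return false;
      lastSpokenId = messageId;
      return true;
    },
  };
}
```

In the browser the factory would be () => new AudioContext(), with dispose() called from the owning component's cleanup.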

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
@dimakis merged commit fd283be into main on Apr 5, 2026
1 check passed