Naturalize floating bar voice streaming #6259

Merged: kodjima33 merged 1 commit into main from nik/naturalize-streaming-voice on Apr 1, 2026

Conversation

@kodjima33
Collaborator

Summary

  • switch the default ElevenLabs voice from Rachel to Sloane for a less generic release voice
  • make streaming wait for sentence-sized chunks before speaking, with a larger emergency cutoff to avoid stitched robot prosody
  • slightly retune ElevenLabs voice settings and align the settings copy with the shipped default voice

Verification

  • intended verification target was the Mac mini only
  • the Mac mini became unreachable over SSH during this pass, so I could not complete the remote compile/run loop before merging
  • root cause for the reported bad voice was verified from source and release tags: v0.11.214 already contains streaming playback, still defaults to generic voices without a custom voice id, and chunks aggressively enough to sound robotic

kodjima33 merged commit fdbff29 into main on Apr 1, 2026 (2 checks passed)
kodjima33 deleted the nik/naturalize-streaming-voice branch on April 1, 2026 19:29
@greptile-apps
Contributor

greptile-apps Bot commented Apr 1, 2026

Greptile Summary

This PR improves voice playback quality in the floating control bar by switching the default ElevenLabs voice from Rachel to Sloane, retuning voice settings for more natural delivery, and significantly reworking the text-chunking heuristic to wait for sentence-level boundaries before handing text off to the TTS API — reducing the "stitched robot prosody" caused by over-eager chunking.

Key changes:

  • Default voice ID changed to Sloane (BAMYoBHLZM7lJgJAmFz0) in all three places it appears (service, settings UI, placeholder text).
  • Chunk thresholds raised: minimum 48 → 85 chars, preferred 140 → 220 chars, and a new 360-char emergency ceiling added.
  • nextChunkBoundary now has three tiers: (1) sentence-ending punctuation within the preferred window, (2) sentence-ending punctuation extended to the emergency window, then (3) clause separators / whitespace / hard cut only when the emergency ceiling is reached.
  • ElevenLabs voice settings retuned (stability 0.42 → 0.34, similarity boost 0.82 → 0.88, style 0.22 → 0.12).
  • System-voice fallback preference list expanded with "Ava" and "Allison" ahead of "Samantha".
  • floatingBarVoiceAnswersEnabled doc-comment de-scoped from "development builds" to all builds.
  • Note: Per the PR description, the intended hardware verification target (Mac mini) became unreachable before merging. The logic change was reviewed from source, but a live compile/run cycle was not completed.
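
To make the retuned numbers concrete, here is a minimal sketch of how the old and new settings might be modeled and serialized. The `VoiceSettings` type is hypothetical and not taken from the PR; the snake_case key `similarity_boost` is an assumption based on ElevenLabs' `voice_settings` request object, and the real request in `FloatingBarVoicePlaybackService.swift` may carry additional fields.

```swift
import Foundation

// Hypothetical payload type (not from the PR). Only the three values
// quoted in the summary above are modeled here.
struct VoiceSettings: Codable, Equatable {
    var stability: Double
    var similarityBoost: Double
    var style: Double

    // Assumed JSON key names for the ElevenLabs voice_settings object.
    enum CodingKeys: String, CodingKey {
        case stability
        case similarityBoost = "similarity_boost"
        case style
    }
}

// Values before and after this PR, as listed in the key changes.
let previous = VoiceSettings(stability: 0.42, similarityBoost: 0.82, style: 0.22)
let retuned = VoiceSettings(stability: 0.34, similarityBoost: 0.88, style: 0.12)

// Encode the retuned settings as they might appear inside a request body.
let encoder = JSONEncoder()
encoder.outputFormatting = [.sortedKeys]
let body = try! encoder.encode(retuned)
```

Keeping the values in one typed struct, rather than scattered literals, would make a future retune a single-line change, though whether the service is organized this way is not stated in the PR.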

Confidence Score: 5/5

  • Safe to merge — all findings are minor style suggestions with no functional impact.
  • The chunking logic is sound: all index arithmetic is bounded by min(text.count, limit), no infinite-loop risk exists in drainBufferedText, and the ElevenLabs error path already falls back to the system voice. The only finding is a P2 redundant character in the clause-separator set. The unverified hardware run is noted in the PR description as a known gap, not a regression introduced by this change.
  • No files require special attention beyond the one P2 style note in FloatingBarVoicePlaybackService.swift.
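
The "no infinite-loop risk" claim rests on each pass consuming at least one character. The following is a rough sketch of that property, not the actual `drainBufferedText` from the PR: the signature is invented for illustration, and `boundary` is a stand-in for `nextChunkBoundary`.

```swift
// Hypothetical drain loop. It terminates because every accepted boundary
// lies strictly after startIndex, so the buffer shrinks on each pass.
func drainBufferedText(_ buffer: inout String,
                       boundary: (String) -> String.Index?) -> [String] {
    var chunks: [String] = []
    // Stop when the boundary function asks for more text (nil) or would
    // produce an empty chunk (boundary == startIndex).
    while let end = boundary(buffer), end > buffer.startIndex {
        chunks.append(String(buffer[..<end]))
        buffer.removeSubrange(..<end)
    }
    return chunks
}

// Usage with a toy boundary rule: split just after the first period.
let splitAfterPeriod: (String) -> String.Index? = { text in
    text.firstIndex(of: ".").map { text.index(after: $0) }
}
var buffer = "One. Two. Thr"
let chunks = drainBufferedText(&buffer, boundary: splitAfterPeriod)
// chunks is ["One.", " Two."], and buffer keeps the unfinished " Thr".
```

The explicit `end > buffer.startIndex` guard is a defensive choice in this sketch; the review above concludes the real code is already safe without stating how it guarantees progress.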

Important Files Changed

Filename | Overview
--- | ---
desktop/Desktop/Sources/FloatingControlBar/FloatingBarVoicePlaybackService.swift | Default ElevenLabs voice changed from Rachel to Sloane; chunk thresholds raised (min 48→85, preferred 140→220, new emergency 360); voice settings retuned; system voice preference list expanded. One minor redundancy in the clause-separator character set.
desktop/Desktop/Sources/FloatingControlBar/ShortcutSettings.swift | Doc-comment updated to remove the "development builds" qualifier, accurately reflecting that voice answers are now a general feature.
desktop/Desktop/Sources/MainWindow/Pages/SettingsPage.swift | UI strings and placeholder voice ID updated from Rachel/21m00Tcm4TlvDq8ikWAM to Sloane/BAMYoBHLZM7lJgJAmFz0 to match the new default.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[New streamed text arrives] --> B{text.count >= minimumChunkLength\n85 chars?}
    B -- No --> WAIT[Return nil — buffer more text]
    B -- Yes --> C[Search preferredSlice\n0..220 chars for '.!?\\n']
    C -- Found --> SPLIT1[Split after last sentence-ending punctuation]
    C -- Not found --> D{text.count >= preferredChunkLength\n220 chars?}
    D -- No --> WAIT
    D -- Yes --> E[Search emergencySlice\n0..360 chars for '.!?\\n']
    E -- Found --> SPLIT2[Split after punctuation in emergency window]
    E -- Not found --> F{text.count >= emergencyChunkLength\n360 chars?}
    F -- No --> WAIT
    F -- Yes --> G[Search emergencySlice for ',;:']
    G -- Found --> SPLIT3[Split after clause separator]
    G -- Not found --> H[Search emergencySlice for whitespace]
    H -- Found --> SPLIT4[Split at last whitespace]
    H -- Not found --> SPLIT5[Hard cut at emergencyLimit]
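
The flowchart above can be sketched in Swift roughly as follows. This is a reconstruction from the flowchart and the summary, not the code merged in this PR: the constant names echo the labels in the chart, the function signature is assumed, and the whitespace tier here splits just after the whitespace character so that the returned chunk is never empty.

```swift
// Thresholds quoted in this PR; the names are assumed from the flowchart labels.
let minimumChunkLength = 85
let preferredChunkLength = 220
let emergencyChunkLength = 360

/// Returns the index just past the text to speak next, or nil to keep buffering.
func nextChunkBoundary(in text: String) -> String.Index? {
    let count = text.count
    guard count >= minimumChunkLength else { return nil }

    // Tier 1: last sentence-ending character inside the preferred window.
    let preferredEnd = text.index(text.startIndex,
                                  offsetBy: min(count, preferredChunkLength))
    if let i = text[..<preferredEnd].lastIndex(where: { ".!?\n".contains($0) }) {
        return text.index(after: i)
    }
    guard count >= preferredChunkLength else { return nil }

    // Tier 2: same search, widened to the emergency window.
    let emergencyEnd = text.index(text.startIndex,
                                  offsetBy: min(count, emergencyChunkLength))
    let emergencySlice = text[..<emergencyEnd]
    if let i = emergencySlice.lastIndex(where: { ".!?\n".contains($0) }) {
        return text.index(after: i)
    }
    guard count >= emergencyChunkLength else { return nil }

    // Tier 3: only once the emergency ceiling is reached, fall back to
    // clause separators, then whitespace, then a hard cut.
    if let i = emergencySlice.lastIndex(where: { ",;:".contains($0) }) {
        return text.index(after: i)
    }
    if let i = emergencySlice.lastIndex(where: { $0.isWhitespace }) {
        return text.index(after: i)
    }
    return emergencyEnd
}
```

Because every offset is clamped with `min(count, limit)`, the index arithmetic cannot run past the end of the string, which matches the bounds argument in the confidence notes.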

Reviews (1): Last reviewed commit: "Naturalize floating bar voice streaming"


guard text.count >= emergencyChunkLength else { return nil }

if let clauseIndex = emergencySlice.lastIndex(where: { ",;:\n".contains($0) }) {
Contributor

P2 Redundant \n in clause separator set

At this point in the control flow, the preceding emergencySlice.lastIndex(where: { ".!?\n".contains($0) }) check on the same slice has already returned nil, which guarantees there is no \n character anywhere within emergencySlice. Including \n in ",;:\n" is therefore unreachable dead code on this path.

Suggested change
- if let clauseIndex = emergencySlice.lastIndex(where: { ",;:\n".contains($0) }) {
+ if let clauseIndex = emergencySlice.lastIndex(where: { ",;:".contains($0) }) {
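
As a quick sanity check of the reasoning in this comment, the sketch below compares the two separator sets on a slice that has already failed the sentence-ending search. The helper `separatorSetsAgree` is hypothetical and exists only for illustration; it is not part of the PR.

```swift
/// Returns whether ",;:\n" and ",;:" pick the same split point.
/// Returns nil when the slice contains a sentence-ending character,
/// because the clause search would never run on that path.
func separatorSetsAgree(on slice: Substring) -> Bool? {
    guard slice.lastIndex(where: { ".!?\n".contains($0) }) == nil else {
        return nil
    }
    let withNewline = slice.lastIndex(where: { ",;:\n".contains($0) })
    let withoutNewline = slice.lastIndex(where: { ",;:".contains($0) })
    return withNewline == withoutNewline
}
```

Any slice that reaches the clause search contains no `\n`, so the two sets always agree there and the suggested change does not alter behavior.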

Glucksberg pushed a commit to Glucksberg/omi-local that referenced this pull request Apr 28, 2026