enhancement: better tts #2256

LebToki · 2024-05-14T08:50:58Z

Problem:
The current implementation of Whisper in Open-WebUI uses a limited, robotic voice for all interactions.
While this is functional, it can be jarring and unnatural, making it difficult for users to engage with the interface.

Solution:
I would like to see the addition of more voices to Open-WebUI, specifically ones that are less robotic and more natural-sounding. This would improve the overall user experience and make the interface more enjoyable to interact with.

Alternatives considered:
I've considered using third-party voice libraries or integrating existing voice assistants, but these would require significant modifications to the Open-WebUI codebase. I've also considered using text-to-speech software, but such tools often lack the emotional expression and nuance of human speech.

Additional context:
The Whisper implementation in Open-WebUI is a great step forward in providing a more natural-sounding interface, but adding more voices would take it to the next level. This would be especially beneficial for users who rely heavily on voice interfaces for daily tasks, such as seniors or individuals with disabilities.

Some potential voices to consider adding include:

A softer, more gentle voice for more intimate interactions
A more assertive, commanding voice for tasks that require attention
A soothing, calming voice for relaxation and stress relief
A playful, whimsical voice for entertainment and leisure activities

By adding more voices to Open-WebUI, we can create a more engaging and immersive experience for users, making it easier for them to interact with the interface and achieve their goals.

zhewang1-intc · 2024-05-14T14:41:53Z

Echo, I think this is a valuable feature.

cheahjs · 2024-05-14T14:56:12Z

Whisper isn't a text-to-speech model; it handles speech-to-text.

Currently there are two options for text-to-speech in Open WebUI: one generates speech locally in your browser using the Web Speech API, and the other calls out to an OpenAI-compatible text-to-speech API. The local option uses whatever voices the browser provides, which are typically the OS's text-to-speech voices. On the OpenAI side, some in the community have deployed LocalAI or OpenedAI Speech to provide self-hosted TTS models.
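For the second option, "OpenAI-compatible" roughly means the server accepts the same request shape as OpenAI's speech endpoint: a POST to `<API Base URL>/audio/speech` with a bearer token and a JSON body naming the model, the input text, and the voice. The sketch below is a minimal illustration under that assumption, using only the Python standard library; the helper names `build_speech_request` and `synthesize` are hypothetical, and the `tts-1` model name and MP3 output follow OpenAI's API conventions rather than anything stated in this thread.

```python
import json
import urllib.request


def build_speech_request(base_url: str, api_key: str, text: str,
                         voice: str = "alloy", model: str = "tts-1") -> urllib.request.Request:
    """Build (but do not send) a POST to an OpenAI-style /audio/speech endpoint.

    Hypothetical helper; assumes the server follows OpenAI's speech API schema.
    """
    # The API base URL is expected to already end in /v1, so only the
    # resource path is appended (a trailing slash is tolerated).
    url = base_url.rstrip("/") + "/audio/speech"
    body = json.dumps({"model": model, "input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def synthesize(base_url: str, api_key: str, text: str, voice: str = "alloy") -> bytes:
    """Send the request and return the raw audio bytes (MP3 is assumed as the default format)."""
    req = build_speech_request(base_url, api_key, text, voice)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read()


# Usage (requires a running OpenAI-compatible TTS server; URL and key are placeholders):
# audio = synthesize("http://localhost:8000/v1", "sk-...", "Hello!", voice="nova")
# open("hello.mp3", "wb").write(audio)
```

Open WebUI itself makes the equivalent call for you once the base URL and key are configured; the sketch is only meant to show what the server on the other end has to understand.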

MichaelFomenko · 2024-05-14T15:12:02Z

Local Speech Models:

silentoplayz · 2024-05-19T02:21:33Z

Semi-related #1456

boshk0 · 2024-05-20T20:19:32Z

OpenVoice would be a great addition to Open WebUI!
https://github.com/myshell-ai/OpenVoice

kevin070982 · 2024-06-03T14:35:25Z

I can only recommend using the solution provided by UXVirtual.
If you run Open WebUI in Docker, take a look here: #126 (comment).

Easy steps:
Clone https://github.com/matatonic/openedai-speech
Run docker compose -f docker-compose.min.yml up for a minimal Docker image (<1GB, CPU only) with only Piper TTS support
or
Run docker compose up for the HD version, which requires around 4GB of GPU VRAM
Then enter the values below in the TTS settings under Audio in the Open WebUI settings.

API Base URL: http://host.docker.internal:8000/v1
API Key: sk-111111111

The following voices can be selected under Set Voice:
alloy
echo
echo-alt
fable
onyx
nova
shimmer
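Before (or after) entering these settings, it can help to hear every voice once. The following is a minimal smoke-test sketch, not part of openedai-speech or Open WebUI: it assumes it runs on the Docker host, where the container's port 8000 is reachable as `localhost:8000` (inside the Open WebUI container the same server is `host.docker.internal:8000`), that the server follows OpenAI's `/audio/speech` request schema, and that it returns MP3 by default. The helper names `speech_request` and `sample_all_voices` are hypothetical.

```python
import json
import pathlib
import urllib.request

# Voices listed in the comment above for openedai-speech.
VOICES = ["alloy", "echo", "echo-alt", "fable", "onyx", "nova", "shimmer"]

# Assumption: run on the Docker host with port 8000 published. Inside the
# Open WebUI container, use http://host.docker.internal:8000/v1 instead.
BASE_URL = "http://localhost:8000/v1"
API_KEY = "sk-111111111"  # placeholder key, as in the settings above


def speech_request(text: str, voice: str, model: str = "tts-1") -> urllib.request.Request:
    """Build an OpenAI-style speech request for one of the listed voices (hypothetical helper)."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice {voice!r}; expected one of {VOICES}")
    body = json.dumps({"model": model, "input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/audio/speech",
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    )


def sample_all_voices(text: str, out_dir: str = "voice_samples") -> list[pathlib.Path]:
    """Request one clip per voice and save each as <voice>.mp3 (MP3 output is an assumption)."""
    out = pathlib.Path(out_dir)
    out.mkdir(exist_ok=True)
    written = []
    for voice in VOICES:
        with urllib.request.urlopen(speech_request(text, voice), timeout=120) as resp:
            path = out / f"{voice}.mp3"
            path.write_bytes(resp.read())
            written.append(path)
    return written


# Usage (with the openedai-speech container running):
# for p in sample_all_voices("Hello from Open WebUI."):
#     print("wrote", p)
```

Validating the voice name locally gives a clear error for a typo before any request is sent; the server would otherwise decide how to handle an unknown voice.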

Enjoy a more natural-sounding voice.

tjbck changed the title ~~adding better voices (That are tuned to act Less Robotic)~~ enhancement: better tts May 14, 2024
