enhancement: better tts #2256

Open
LebToki opened this issue May 14, 2024 · 6 comments

@LebToki

LebToki commented May 14, 2024

Problem:
The current implementation of Whisper in Open-WebUI uses a limited, robotic voice for all interactions.
While this is functional, it can be jarring and unnatural, making it difficult for users to engage with the interface.

Solution:
I would like to see the addition of more voices to Open-WebUI, specifically ones that are less robotic and more natural-sounding. This would improve the overall user experience and make the interface more enjoyable to interact with.

Alternatives considered:
I've considered using third-party voice libraries or integrating existing voice assistants, but these would require significant modifications to the Open-WebUI codebase. I've also considered using text-to-speech software, but these often lack the emotional expression and nuance of human speech.

Additional context:
The Whisper implementation in Open-WebUI is a great step forward in providing a more natural-sounding interface, but adding more voices would take it to the next level. This would be especially beneficial for users who rely heavily on voice interfaces for daily tasks, such as seniors or individuals with disabilities.

Some potential voices to consider adding include:

  • A softer, more gentle voice for more intimate interactions
  • A more assertive, commanding voice for tasks that require attention
  • A soothing, calming voice for relaxation and stress relief
  • A playful, whimsical voice for entertainment and leisure activities

By adding more voices to Open-WebUI, we can create a more engaging and immersive experience for users, making it easier for them to interact with the interface and achieve their goals.

@zhewang1-intc

Echo, I think this is a valuable feature.

@cheahjs
Contributor

cheahjs commented May 14, 2024

Whisper isn't a text-to-speech model; it handles speech-to-text.

There are currently two options for text-to-speech in Open WebUI: one generates speech locally in your browser using the Web Speech API, and the other calls out to an OpenAI-compatible text-to-speech API. The local option uses whatever the browser provides, which is typically the OS's text-to-speech models. On the OpenAI-compatible side, some in the community have deployed LocalAI or OpenedAI Speech to provide self-hosted TTS models.
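
For the second option, the request can be sketched roughly as below. This is a minimal illustration, not Open WebUI's own code: it assumes the server implements OpenAI's `POST /v1/audio/speech` route, and the function names, the `tts-1` model name, and the `speech.mp3` output path are placeholders chosen for the example.

```python
import json
import urllib.request


def build_speech_request(base_url, api_key, text, voice="alloy", model="tts-1"):
    """Build (but do not send) a POST request for an OpenAI-style speech route."""
    # Assumption: the server accepts the same JSON fields as OpenAI's API.
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(base_url, api_key, text, out_path="speech.mp3", **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(base_url, api_key, text, **kwargs)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path


# Example call (hypothetical URL and key; requires a running server):
# synthesize("http://localhost:8000/v1", "sk-placeholder", "Hello there")
```

Splitting request construction from sending keeps the payload easy to inspect before anything goes over the network.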

@MichaelFomenko
Copy link

MichaelFomenko commented May 14, 2024

Local Speech Models:

@tjbck tjbck changed the title adding better voices (That are tuned to act Less Robotic) enhancement: better tts May 14, 2024
@silentoplayz
Collaborator

Semi-related #1456

@boshk0

boshk0 commented May 20, 2024

OpenVoice would be a great addition to Open WebUI!
https://github.com/myshell-ai/OpenVoice

@kevin070982

I can only recommend using the solution provided by UXVirtual.
If you run Open WebUI in Docker, take a look here: #126 (comment).

Easy steps:
Clone https://github.com/matatonic/openedai-speech
Run docker compose -f docker-compose.min.yml up for a minimal Docker image (<1 GB, CPU only) with only Piper TTS support
or
Run docker compose up for the HD version, which requires around 4 GB of GPU VRAM
Then add the lines below to the TTS settings under Audio in the Open WebUI Settings.

API Base URL: http://host.docker.internal:8000/v1
API Key: sk-111111111

The following voices can be selected under Set Voice:
alloy
echo
echo-alt
fable
onyx
nova
shimmer

Enjoy a more natural-sounding voice.
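
To compare the voices before choosing one under Set Voice, a small script along these lines could render the same sentence once per voice. It is only a sketch under assumptions: that the speech server answers OpenAI-style `POST /audio/speech` requests, that it is reachable from the host at `http://localhost:8000/v1` (`host.docker.internal` normally resolves only from inside a container), and that `tts-1` is an accepted model name. The function names and the `sample-<voice>.mp3` file naming are made up for illustration.

```python
import json
import urllib.request

# Voice names as listed in the comment above.
VOICES = ["alloy", "echo", "echo-alt", "fable", "onyx", "nova", "shimmer"]


def sample_jobs(text, model="tts-1"):
    """Return one (filename, payload) pair per voice; nothing is sent yet."""
    return [
        (f"sample-{voice}.mp3", {"model": model, "input": text, "voice": voice})
        for voice in VOICES
    ]


def render_samples(base_url, api_key, text):
    """POST each payload to <base_url>/audio/speech and save the audio."""
    for filename, payload in sample_jobs(text):
        request = urllib.request.Request(
            base_url.rstrip("/") + "/audio/speech",
            data=json.dumps(payload).encode("utf-8"),
            headers={
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/json",
            },
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            audio = response.read()
        with open(filename, "wb") as out:
            out.write(audio)


# Example call (assumed host-side URL; the key is the placeholder from above):
# render_samples("http://localhost:8000/v1", "sk-111111111", "Hello from Open WebUI.")
```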


7 participants