Basic use appears to generate unusable output #982

danielquinn · 2021-11-29T21:59:28Z

I wasn't sure how to title this, but following the basic instructions, I did the following:

$ pip install tts
$ tts --text hello
$ mpv tts_output.wav

The result is attached and... it's pretty bad. I tried variations using different models:

$ tts --text hello --model_name=tts_models/en/ljspeech/tacotron2-DCA
$ tts --text hello --model_name=tts_models/en/ek1/tacotron2
$ tts --text hello --model_name=tts_models/en/ljspeech/glow-tts
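Because each run above writes to the same default `tts_output.wav`, comparing models means re-running and re-listening one at a time. The following is a small Python sketch that writes one file per model instead. It assumes the `tts` executable is on `PATH` and accepts an `--out_path` option alongside `--text` and `--model_name`. The helper names (`build_command`, `output_path_for`, `synthesize_all`) are hypothetical and not part of the TTS package.

```python
import subprocess
from pathlib import Path
from typing import List

# Model names taken from the commands above.
MODELS = [
    "tts_models/en/ljspeech/tacotron2-DCA",
    "tts_models/en/ek1/tacotron2",
    "tts_models/en/ljspeech/glow-tts",
]


def build_command(text: str, model_name: str, out_path: Path) -> List[str]:
    """Build the argv list for a single `tts` CLI run (assumes `--out_path` exists)."""
    return [
        "tts",
        "--text", text,
        "--model_name", model_name,
        "--out_path", str(out_path),
    ]


def output_path_for(model_name: str, out_dir: Path) -> Path:
    """Map 'tts_models/en/ljspeech/glow-tts' to '<out_dir>/en_ljspeech_glow-tts.wav'."""
    # Drop the leading 'tts_models' component and join the rest with underscores.
    parts = model_name.split("/")[1:]
    return out_dir / ("_".join(parts) + ".wav")


def synthesize_all(text: str, models: List[str], out_dir: Path) -> List[Path]:
    """Run the CLI once per model so each model's audio lands in its own file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for model_name in models:
        out_path = output_path_for(model_name, out_dir)
        # check=True raises CalledProcessError if the CLI exits with an error.
        subprocess.run(build_command(text, model_name, out_path), check=True)
        outputs.append(out_path)
    return outputs
```

Calling `synthesize_all("Hello.", MODELS, Path("samples"))` would then leave files such as `samples/en_ljspeech_glow-tts.wav` that can be played back with `mpv` side by side.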

The results for all of these are unusable. Have I missed something? The results seem really all over the place: "hello" generates some creepy OOoooOOoOoOo noises, but "hello I am a sample text" sounds fabulous. Unfortunately, my application needs some semblance of predictability, and it's conceivable that I'd want my program to "say hello", so this is causing me some pain :-(

I'm pretty experienced in software development but have next to no experience with ML, so "just train the model to do X" doesn't really mean anything to me. If I shouldn't consider this library to be off-the-shelf text-to-speech, just let me know and I'll explore other options, but so far it's seemed remarkably user-friendly for an ML project, so I thought I'd give it a go as the voice for my current project.

Samples

I had to zip these 'cause GitHub doesn't allow .wav files.

tts --text "hello" (Bad)
tts --text "hello I am a sample text" (Great!)

Answered by DelanoLeslie

Nov 30, 2021

DelanoLeslie · 2021-11-30T02:55:01Z

Try giving the input punctuation like "Hello." or "Hello, this is a sample."

I think this should help. Most of the datasets these models are trained on use punctuation, so a lack of punctuation may confuse them.

Upon further testing, it looks like punctuation does fix the issue.

tts --text "hello"
tts --text "Hello."

1 reply

danielquinn Dec 4, 2021
Author

Ooh, that's good to know. It might be a good idea to include that sort of thing in the README, though, as people likely wouldn't think of it by default.

erogol · 2021-11-30T08:58:46Z · Maintainer

You need to add punctuation, since all the models are trained on punctuated sentences.

I'm closing this, as it is not a bug.


saona-raimundo · 2021-12-23T16:28:25Z

Would it not be a better solution to "pre-process" input text?

In the pre-processing, one could add "missing punctuation" (and potentially other things) to make any input closer to the trained sentences and ultimately give better results(?)

For this case, it was as simple as adding a "." at the end of the sentence, a procedure that could be applied generically to any input!
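A minimal sketch of the generic pre-processing step proposed here, assuming that ".", "!" and "?" are the only characters a model treats as sentence-final. The function name `add_terminal_punctuation` is hypothetical and is not part of the TTS library; whether a given model benefits from it has to be checked by listening.

```python
import re

# Characters treated as sentence-final punctuation (an assumption; adjust per model).
TERMINAL_PUNCTUATION = ".!?"


def add_terminal_punctuation(text: str, default: str = ".") -> str:
    """Collapse whitespace and append `default` if the text lacks sentence-final punctuation."""
    # Normalise runs of whitespace (including a trailing space) to single spaces.
    cleaned = re.sub(r"\s+", " ", text).strip()
    # Only append to non-empty text that doesn't already end a sentence.
    if cleaned and cleaned[-1] not in TERMINAL_PUNCTUATION:
        cleaned += default
    return cleaned
```

With this, `add_terminal_punctuation("hello")` returns `"hello."`, while `"Hello, this is mike. "` becomes `"Hello, this is mike."` (the trailing space is dropped and no second period is added). The result can then be passed to `tts --text`.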

2 replies

erogol Dec 28, 2021
Maintainer

We don't always know what every model expects to see, so such hard-coded pre-processing might cause obscure bugs.

So, again, our tools are mostly for experimenting with the models and are not intended for deployment. If you deploy your own model, it is up to the developer to deal with these details.
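Given that pre-processing depends on what each model expects, one way a deployment could handle it is an explicit, opt-in mapping from model name to preprocessor, so that text for any model not listed passes through unchanged. This is a hypothetical sketch, not part of the TTS library; the registry entries below are illustrative assumptions, not verified requirements of those models.

```python
from typing import Callable, Dict

Preprocessor = Callable[[str], str]


def identity(text: str) -> str:
    """Default: pass text through unchanged, making no assumptions about the model."""
    return text


def ensure_period(text: str) -> str:
    """Strip surrounding whitespace and append '.' if there is no sentence-final punctuation."""
    stripped = text.strip()
    if stripped and stripped[-1] not in ".!?":
        return stripped + "."
    return stripped


# Hypothetical registry: a deployment lists only the models it has actually checked.
# Unlisted models fall back to `identity`, so nothing is rewritten silently.
PREPROCESSORS: Dict[str, Preprocessor] = {
    "tts_models/en/ljspeech/tacotron2-DCA": ensure_period,
    "tts_models/en/ljspeech/glow-tts": ensure_period,
}


def preprocess_for(model_name: str, text: str) -> str:
    """Apply the preprocessor registered for `model_name`, or leave the text untouched."""
    return PREPROCESSORS.get(model_name, identity)(text)
```

Keeping the mapping explicit means that adding a new model changes nothing until someone decides what that model needs, which avoids the hidden, hard-coded behaviour described above.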

saona-raimundo Jan 3, 2022

Thank you for the answer! I assumed this could be model-related, and I agree that such pre-processing depends on both the model and the data, so it is better suited to a finished product than to experiments!

jeffrafter · 2022-01-05T14:12:17Z

I found this post while trying to find the answer. I've tried a number of ways to get TTS running on an Apple M1 Mac, including running it in Docker with https://github.com/synesthesiam/coqui-docker. Using version 0.5.0 and the latest models, I am also getting unusable output, even with punctuation. I am sure I am doing something wrong:

./tts --text "Hello, this is Mike."

tts_output_hello_this_is_mike.mp4

./tts --text "Hello, this is mike. " --model_name tts_models/en/ljspeech/speedy-speech-wn

tts_output_hello_this_is_mike_speedy.mp4

Are there any machine or memory requirements?

1 reply

saona-raimundo Jan 5, 2022

Interesting! This might imply that it is not only punctuation but also capitalization that has an impact on the output quality?

In my fresh install of tts (on Windows), I do not see your problems. The quality is much better.
Details:

Using model: Tacotron2
Model's reduction rate r is set to: 1
Vocoder Model: hifigan
Generator Model: hifigan_generator
Discriminator Model: hifigan_discriminator

The difference between capitalization is heard in the rhythm of the voice, which is nice :)
