Basic use appears to generate unusable output #982
-
I wasn't sure how to title this, but following the basic instructions, I did the following:
The result is attached and... it's pretty bad. I tried variations using different models:
The results for all of these are unusable. Have I missed something? The results seem really all over the place. I'm pretty experienced in software development, but have next to no experience with ML, so "just train the model to do X" doesn't really mean anything to me. If I shouldn't consider this library to be off-the-shelf text-to-speech, just let me know and I'll explore other options. So far it has seemed remarkably user-friendly for an ML project, so I thought I'd give it a go as the voice for my current project.
Samples: I had to zip these because GitHub doesn't allow .wav files.
-
Try giving the input punctuation, like "Hello." or "Hello, this is a sample." I think this should help: most of the datasets these models are trained on use punctuation, so the lack of it may confuse the model. Upon further testing, it looks like punctuation does fix the issue.
tts --text "hello"
tts --text "Hello."
-
You need to add punctuation, since all the models are trained with punctuated sentences. I'm closing this, as it is not a bug.
-
Would it not be a better solution to "pre-process" the input text? In the pre-processing step, one could add missing punctuation (and potentially other things) to bring any input closer to the training sentences and, hopefully, give better results. For this case, it was as simple as adding
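The pre-processing idea above can be sketched as a small shell function, assuming a bash shell. `normalize_text` is a hypothetical helper, not part of the `tts` CLI: it only collapses whitespace and appends a period when the input does not already end in `.`, `!` or `?`. It does not try to restore punctuation inside a sentence.

```shell
#!/usr/bin/env bash
# Hypothetical pre-processing helper (not part of the tts CLI).
# Makes bare input look more like the punctuated sentences the models were trained on.
normalize_text() {
  local text="$1"
  # Turn every whitespace character into a space, squeeze repeats,
  # then trim one leading and one trailing space.
  text="$(printf '%s' "$text" | tr -s '[:space:]' ' ' | sed -e 's/^ //' -e 's/ $//')"
  # Append a period unless the text is empty or already ends in . ! or ?
  case "$text" in
    "") ;;
    *[.!?]) ;;
    *) text="$text." ;;
  esac
  printf '%s\n' "$text"
}
```

For example, `tts --text "$(normalize_text "hello")" --out_path hello.wav` would synthesize "hello." instead of the bare "hello". Longer unpunctuated input (run-on sentences with no commas) would need something smarter than this.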
-
I found this post while trying to find the answer. I've tried a number of ways to get
./tts --text "Hello, this is Mike."
tts_output_hello_this_is_mike.mp4
./tts --text "Hello, this is mike. " --model_name tts_models/en/ljspeech/speedy-speech-wn
tts_output_hello_this_is_mike_speedy.mp4
Is there a requirement on the machine or memory?