[Feature request] prosody rate, style emotions, expressiveness, aggressiveness, pace, etc. #437

Closed
andrewarrow opened this issue Apr 18, 2021 · 4 comments
Labels
feature request feature requests for making TTS better.

Comments

@andrewarrow

andrewarrow commented Apr 18, 2021

The resemble.ai system has markup like:

<prosody rate="45%"><style emotions="expressiveness:0.9 aggressiveness:0.5 pace:0.2">
<say-as interpret-as="characters">Zeuxis</say-as></style></prosody>

Is this open sourced in coqui?

@andrewarrow andrewarrow added the feature request feature requests for making TTS better. label Apr 18, 2021
@AndrewBarfield

I've been thinking about the same. Especially speech rate.

I've also come across some text that isn't read correctly, such as number ranges (e.g., 400-750) and acronyms (e.g., MPH). Markup could tell the synthesizer how to interpret these.

@erogol
Member

erogol commented Apr 19, 2021

This level of control is not possible with Coqui TTS yet because of the limits of the open datasets.

Depending on which model you use, it might struggle with the acronyms and numbers too.

These are limitations due to the use of a publicly available dataset. Most commercial systems use specially created TTS datasets.

@erogol erogol closed this as completed Apr 19, 2021
@AndrewBarfield

For numerics and acronyms, we can simply preprocess the string before synthesizing, using search and replace or a regex.

This is not a showstopper.
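
As a rough sketch of that preprocessing idea (not part of Coqui TTS; the `normalize` function and the `ACRONYMS` table are hypothetical names made up for illustration), something like the following could rewrite number ranges and known acronyms before the text reaches the synthesizer:

```python
import re

# Hypothetical acronym table (an assumption for this sketch); extend as needed.
ACRONYMS = {"MPH": "miles per hour"}

# Two integers joined by a hyphen, e.g. "400-750".
# Caveat: this also matches things like phone-number fragments ("555-1234").
RANGE_RE = re.compile(r"\b(\d+)\s*-\s*(\d+)\b")

# Whole words made of two or more capital letters, e.g. "MPH".
ACRONYM_RE = re.compile(r"\b[A-Z]{2,}\b")


def normalize(text: str) -> str:
    """Rewrite number ranges and known acronyms into a more speakable form."""
    # "400-750" becomes "400 to 750".
    text = RANGE_RE.sub(r"\1 to \2", text)
    # Expand acronyms found in the table; leave unknown ones unchanged.
    return ACRONYM_RE.sub(lambda m: ACRONYMS.get(m.group(0), m.group(0)), text)


if __name__ == "__main__":
    print(normalize("Top speeds of 400-750 MPH were recorded."))
```

The digits themselves are left as they are here; spelling them out as words would need an extra step or a number-to-words library.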

@erogol
Member

erogol commented Apr 20, 2021

That's true. Some of the models we release use phonemes and a text front end to do this work. You might like to try them.

The only model that uses characters alone is tts_models/en/ljspeech/tacotron2-DDC; the rest are more robust to such variations.

Hopefully we'll update this model soon to use a more advanced front end.
