[Feature request] add stopnet delay argument to synthesis function (tacotron) #440

Closed
WeberJulian opened this issue Apr 21, 2021 · 5 comments
Labels
feature request feature requests for making TTS better. wontfix This will not be worked on but feel free to help.

Comments

@WeberJulian
Contributor

Sometimes synthesis for some sentences is cut short at the last word. I know (or at least think) that this indicates something is amiss in the model or the dataset: the model was not trained long enough, the audio parameters could be tuned further (trim_db?), or the dataset quality is simply poor. But taking the time to fix that issue, debugging and training many models, is a luxury that some people can't afford (perhaps even more so for a low-resource language).

I would gladly open a PR to propose the feature, but I'm not sure how to go about the implementation.
Would adding a stopnet delay (delaying the stop signal by n steps) solve this issue?

@WeberJulian WeberJulian added the feature request feature requests for making TTS better. label Apr 21, 2021
@erogol
Member

erogol commented Apr 21, 2021

Do you think it would solve the problem for all occurrences?

Then you might need to tune that delay per sample.

In general, there are two tricks I also use:

  1. Stopnet delay. Maybe delay longer than needed and trim the silence.
  2. Don't use stopnet but look at the attention map and signal stop when the attention reaches the last token.
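
The first trick can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the library's synthesis code: `step_fn`, `stop_delay`, `stop_threshold`, and `trim_trailing_silence` are hypothetical names, and frames are plain lists of floats standing in for mel frames.

```python
from typing import Callable, List, Optional, Tuple

Frame = List[float]


def decode_with_stop_delay(
    step_fn: Callable[[int], Tuple[Frame, float]],
    max_steps: int,
    stop_threshold: float = 0.5,
    stop_delay: int = 0,
) -> List[Frame]:
    """Keep decoding for `stop_delay` extra steps after the stop token first fires.

    `step_fn(t)` is a hypothetical stand-in for one decoder step and
    returns (frame, stop_probability).
    """
    frames: List[Frame] = []
    remaining: Optional[int] = None  # stays None until the stop token fires
    for t in range(max_steps):
        frame, stop_prob = step_fn(t)
        frames.append(frame)
        if remaining is None and stop_prob > stop_threshold:
            remaining = stop_delay
        if remaining is not None:
            if remaining == 0:
                break
            remaining -= 1
    return frames


def trim_trailing_silence(
    frames: List[Frame], silence_threshold: float, keep: int = 1
) -> List[Frame]:
    """Drop trailing frames whose peak value is at or below `silence_threshold`,
    keeping `keep` silent frames as a short tail."""
    end = len(frames)
    while end > 0 and max(frames[end - 1]) <= silence_threshold:
        end -= 1
    return frames[: min(len(frames), end + keep)]
```

With `stop_delay=0` the loop behaves like a plain stopnet cutoff. A larger value lets the decoder run past a premature stop signal, and the trimming step then removes whatever extra silence was generated. The right `silence_threshold` depends on how the spectrogram is normalized, so it is left as a required argument here.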

@WeberJulian
Contributor Author

> Then you might need to tune that delay per sample.

I was thinking that in the worst case it would add half a second of silence at the end of the sample (but I never actually tried it).
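
To get a feel for what half a second means in decoder steps, here is a small back-of-the-envelope sketch. The sample rate, hop length, and reduction factor `r` are assumed example values, not settings taken from any particular config, and `seconds_to_decoder_steps` is a hypothetical helper.

```python
import math


def seconds_to_decoder_steps(
    seconds: float, sample_rate: int, hop_length: int, r: int = 1
) -> int:
    """Convert a duration into decoder steps, rounding up.

    One spectrogram frame covers `hop_length` samples, and one decoder
    step emits `r` frames (the reduction factor).
    """
    frames = seconds * sample_rate / hop_length
    return math.ceil(frames / r)


# Assumed example values: 22050 Hz audio, hop length of 256 samples.
print(seconds_to_decoder_steps(0.5, 22050, 256))       # r = 1
print(seconds_to_decoder_steps(0.5, 22050, 256, r=2))  # r = 2
```

Under those assumptions, half a second is about 43 frames, so a delay in the range of a few dozen steps would be the upper bound being discussed.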

> Don't use stopnet but look at the attention map and signal stop when the attention reaches the last token.

That approach sounds interesting, is it implemented yet?

@erogol
Member

erogol commented Apr 21, 2021

> That approach sounds interesting, is it implemented yet?

I implemented it once a long time ago, but I don't know where it is now :)
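
For reference, the attention-based stop check might look something like the sketch below. This is not the earlier implementation mentioned here, only a guess at the idea: `step_fn`, `patience`, and both function names are hypothetical, and the attention row is a plain list of weights over the input tokens.

```python
from typing import Callable, List, Tuple

Frame = List[float]


def attention_peak(attention_row: List[float]) -> int:
    """Index of the input token that receives the most attention."""
    return max(range(len(attention_row)), key=attention_row.__getitem__)


def decode_until_attention_ends(
    step_fn: Callable[[int], Tuple[Frame, List[float]]],
    max_steps: int,
    patience: int = 3,
) -> List[Frame]:
    """Stop once the attention peak has sat on the last input token
    for `patience` consecutive decoder steps.

    `step_fn(t)` is a hypothetical stand-in for one decoder step and
    returns (frame, attention_weights_over_input_tokens).
    """
    frames: List[Frame] = []
    hits = 0
    for t in range(max_steps):
        frame, attention_row = step_fn(t)
        frames.append(frame)
        if attention_peak(attention_row) == len(attention_row) - 1:
            hits += 1
        else:
            hits = 0  # the peak moved away, so start counting again
        if hits >= patience:
            break
    return frames
```

The `patience` counter plays a role similar to the stopnet delay: it gives the model a few steps to finish pronouncing the last token before decoding ends. If the input ends with padding or an end-of-sequence symbol, the index of the "last token" would need to be adjusted accordingly.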

@WeberJulian
Contributor Author

> I implemented it once a long time ago, but I don't know where it is now :)

Haha ok, I'm gonna look for it and try both approaches

@stale

stale bot commented May 21, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.

@stale stale bot added the wontfix This will not be worked on but feel free to help. label May 21, 2021
@stale stale bot closed this as completed May 28, 2021