[Bug] Phoneme extraction with punctuations is wrongly delimited #771

Closed
erogol opened this issue Aug 29, 2021 · 9 comments
Labels
bug (Something isn't working), wontfix (This will not be worked on but feel free to help.)

Comments

@erogol
Member

erogol commented Aug 29, 2021

Describe the bug
Punctuation marks in extracted phonemes are delimited incorrectly.

For instance, the sentence tuː foːɹ paʊndz , bʌt hɛviɚ aɪɚnz , should be tuː foːɹ paʊndz, bʌt hɛviɚ aɪɚnz,

So punctuation marks should not have a space preceding them.

I think the current implementation causes unnatural silences in the trained models.
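
As a stopgap until the phonemizer itself is fixed, the stray spaces could be removed in post-processing. The snippet below is only a minimal sketch: `strip_space_before_punct` and the `_PUNCT` set are hypothetical names made up for illustration, not part of 🐸 TTS or gruut, and the punctuation set is an assumption that would need to match the symbols the model actually uses.

```python
import re

# Hypothetical helper for illustration; not part of 🐸 TTS or gruut.
# Assumed punctuation set; adjust it to the symbols your model uses.
_PUNCT = ",.;:!?"
_SPACE_BEFORE_PUNCT = re.compile(r"\s+([" + re.escape(_PUNCT) + r"])")


def strip_space_before_punct(phonemes: str) -> str:
    """Remove whitespace that directly precedes a punctuation mark."""
    return _SPACE_BEFORE_PUNCT.sub(r"\1", phonemes).strip()


fixed = strip_space_before_punct("tuː foːɹ paʊndz , bʌt hɛviɚ aɪɚnz ,")
print(fixed)  # tuː foːɹ paʊndz, bʌt hɛviɚ aɪɚnz,
```

This only rewrites the phoneme string after extraction. It does not recover whitespace that the tokenizer dropped elsewhere, so it is not a substitute for proper whitespace preservation in the tokenizer.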

erogol added the bug label Aug 29, 2021
@synesthesiam
Contributor

I'm reworking parts of gruut's tokenization pipeline to preserve whitespace. I'll delay updating the current pull request until these changes are in.

@skol101

skol101 commented Sep 19, 2021

@erogol would love to see it in the next minor version update, please.

@erogol
Member Author

erogol commented Sep 19, 2021

It is pretty much in the hands of @synesthesiam

@synesthesiam
Contributor

I'm working on it! I have this pesky job that keeps taking my time 😉

This is taking longer than expected since I'm adding it and preliminary SSML support at the same time. It didn't seem worth it to me to redo the existing gruut tokenizer (to add proper whitespace preservation) only to scrap it later for SSML.

I will be completing the changes to gruut this week, and my goal is to have it integrated and tested with 🐸 TTS by 1 Oct 👍

@stale

stale bot commented Oct 20, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.

stale bot added the wontfix label Oct 20, 2021
@erogol
Member Author

erogol commented Oct 22, 2021

I think Gruut 2 addresses this, right @synesthesiam?

stale bot removed the wontfix label Oct 22, 2021
@synesthesiam
Contributor

Yes, the punctuation doesn't have whitespace artificially added now.

@stale

stale bot commented Nov 21, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.

stale bot added the wontfix label Nov 21, 2021
@erogol
Member Author

erogol commented Nov 23, 2021

I'm closing this as it's been fixed by the latest Gruut.

erogol closed this as completed Nov 23, 2021