[TTS] Add VietnameseCharsTokenizer #9665
Signed-off-by: huutuongtu <huutuongtu@users.noreply.github.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: XuesongYang <XuesongYang@users.noreply.github.com>
nemo/collections/common/tokenizers/text_to_speech/tts_tokenizers.py
Please refactor the code accordingly.
@@ -184,6 +208,32 @@ def get_ipa_punctuation_list(locale):
                '—',  # em dash, U+2014, decimal 8212
            ]
        )
    if locale == "vi-VN":
        punct_set.update(
Could you please add the source for the Vietnamese punctuation marks?
Hmm, there doesn't seem to be any 'official' source for Vietnamese punctuation marks. The closest I could find is this list: https://languagedrops.com/word/en/english/vietnamese/topics/punctuation/.
Maybe we should just use DEFAULT_PUNCTUATION.
Thanks. LGTM. Please add the source for the Vietnamese punctuation marks, if there is one.
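For readers following the thread, the fallback being discussed (no Vietnamese-specific marks, just the default set) can be sketched roughly as below. This is an illustration only, not the code in `ipa_lexicon.py`: the `DEFAULT_PUNCTUATION` value here is a stand-in built from Python's `string.punctuation`, and `EXTRA_PUNCTUATION` and `punctuation_for` are hypothetical names.

```python
import string
from typing import Dict, List, Set

# Assumption: a stand-in for the library's default set, not its actual value.
DEFAULT_PUNCTUATION: Set[str] = set(string.punctuation)

# Hypothetical per-locale additions, for illustration only.
EXTRA_PUNCTUATION: Dict[str, Set[str]] = {
    "es-ES": {"¿", "¡"},  # inverted marks used in Spanish orthography
    "vi-VN": set(),       # no extras: fall back to the default set
}


def punctuation_for(locale: str) -> List[str]:
    """Return the default punctuation plus any locale-specific extras, sorted."""
    punct = set(DEFAULT_PUNCTUATION)
    punct.update(EXTRA_PUNCTUATION.get(locale, set()))
    return sorted(punct)
```

With this shape, a locale that has no documented extras simply maps to an empty set, so adding a cited source later only means filling in that entry.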
…ilto:huutu12312vn@gmail.com)
* Update tts_tokenizers.py
* Update tokenizer_utils.py
* Update test_tts_tokenizers.py
* Apply isort and black reformatting
  Signed-off-by: huutuongtu <huutuongtu@users.noreply.github.com>
* Signed-off-by: Tu <huutu12312vn@gmail.com>
* Update ipa_lexicon.py
  - Signed-off-by: Tu <huutu12312vn@gmail.com>
  Signed-off-by: XuesongYang <XuesongYang@users.noreply.github.com>
---------
Signed-off-by: huutuongtu <huutuongtu@users.noreply.github.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: XuesongYang <XuesongYang@users.noreply.github.com>
Co-authored-by: huutuongtu <huutuongtu@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: XuesongYang <XuesongYang@users.noreply.github.com>
Signed-off-by: Boxiang Wang <boxiangw@nvidia.com>

* Update tts_tokenizers.py
* Update tokenizer_utils.py
* Update test_tts_tokenizers.py
* Apply isort and black reformatting
  Signed-off-by: huutuongtu <huutuongtu@users.noreply.github.com>
* Signed-off-by: Tu <huutu12312vn@gmail.com>
* Update ipa_lexicon.py
  - Signed-off-by: Tu <huutu12312vn@gmail.com>
  Signed-off-by: XuesongYang <XuesongYang@users.noreply.github.com>
---------
Signed-off-by: huutuongtu <huutuongtu@users.noreply.github.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: XuesongYang <XuesongYang@users.noreply.github.com>
Co-authored-by: huutuongtu <huutuongtu@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: XuesongYang <XuesongYang@users.noreply.github.com>
Signed-off-by: Vivian Chen <xuanzic@example.com>
Signed-off-by: Tu <huutu12312vn@gmail.com>
What does this PR do?
Adds a Vietnamese character tokenizer (VietnameseCharsTokenizer) for TTS training.
Collection: [TTS]
Changelog
Usage
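The PR leaves this section empty. As a rough idea of what a character-level Vietnamese tokenizer does, here is a minimal standalone sketch. It is not the `VietnameseCharsTokenizer` class added in this PR and does not use NeMo: `SimpleCharsTokenizer`, `vietnamese_chars`, the `<pad>` symbol, and the default punctuation string are all hypothetical. It assumes lowercase input, NFC normalization, and silently drops unknown characters.

```python
import unicodedata
from typing import Dict, List

# The 12 Vietnamese vowel letters and 17 consonant letters (29-letter alphabet).
BASE_VOWELS = unicodedata.normalize("NFC", "aăâeêioôơuưy")
CONSONANTS = "bcdđghklmnpqrstvx"
# No tone, then combining grave, acute, hook above, tilde, dot below.
TONE_MARKS = ["", "\u0300", "\u0301", "\u0309", "\u0303", "\u0323"]


def vietnamese_chars() -> List[str]:
    """Build lowercase Vietnamese characters by composing vowels with tone marks."""
    vowels = [unicodedata.normalize("NFC", v + t) for v in BASE_VOWELS for t in TONE_MARKS]
    return list(CONSONANTS) + vowels


class SimpleCharsTokenizer:
    """Hypothetical minimal character tokenizer, for illustration only."""

    def __init__(self, chars: List[str], punct: str = ",.!?"):
        # Order: pad, space, letters, punctuation; dict.fromkeys drops duplicates.
        self.id2token: List[str] = list(dict.fromkeys(["<pad>", " ", *chars, *punct]))
        self.token2id: Dict[str, int] = {s: i for i, s in enumerate(self.id2token)}

    def encode(self, text: str) -> List[int]:
        # Lowercase and NFC-normalize so each accented letter is one code point.
        text = unicodedata.normalize("NFC", text.lower())
        return [self.token2id[ch] for ch in text if ch in self.token2id]

    def decode(self, ids: List[int]) -> str:
        return "".join(self.id2token[i] for i in ids)
```

A round trip such as `tok = SimpleCharsTokenizer(vietnamese_chars())` followed by `tok.decode(tok.encode("Xin chào, Việt Nam!"))` returns the lowercased input. Generating the tone-marked vowels through NFC composition, rather than typing them out, avoids missing or mistyped characters in the symbol table.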
Before your PR is "Ready for review"
Pre checks:
PR Type: