Add Kannada (kn-IN) G2P support for TTS #15582
Open
jasro23 wants to merge 3 commits into
@jasro23 Can you also add support for the Telugu language?
@annagirimokshith I am not too familiar with Telugu.
Pull request overview
Adds Kannada (kn-IN) grapheme-to-phoneme (G2P) support for NeMo TTS, including a new Kannada IPA G2P implementation, locale character sets/punctuation, a pronunciation dictionary, and unit tests.
Changes:
- Introduce `KannadaG2p` with hybrid dictionary + rule-based IPA conversion.
- Add `kn-IN` grapheme and IPA character sets plus locale punctuation handling.
- Add a Kannada pronunciation lexicon and basic unit tests validating G2P outputs.
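The "hybrid dictionary + rule-based" flow in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code from `kn_in_ipa.py`: the function names `hybrid_g2p` and `rule_based_g2p`, the tiny character tables, and the choice to skip unknown characters are all assumptions made for the example.

```python
from typing import Callable, Dict, List

# Toy tables covering only a handful of characters (hypothetical, not the PR's data).
CONSONANTS: Dict[str, str] = {
    "\u0c95": "k",  # ಕ
    "\u0ca8": "n",  # ನ
    "\u0ca1": "ɖ",  # ಡ
    "\u0cae": "m",  # ಮ
    "\u0cb0": "r",  # ರ
}
MATRAS: Dict[str, str] = {
    "\u0cbe": "aː",  # ಾ
    "\u0cbf": "i",   # ಿ
    "\u0cc1": "u",   # ು
}
VIRAMA = "\u0ccd"  # ್ suppresses the inherent vowel


def rule_based_g2p(word: str) -> List[str]:
    """Convert a word to phonemes using the toy tables above."""
    phones: List[str] = []
    i = 0
    while i < len(word):
        ch = word[i]
        if ch in CONSONANTS:
            phones.append(CONSONANTS[ch])
            nxt = word[i + 1] if i + 1 < len(word) else ""
            if nxt == VIRAMA:
                i += 2  # consonant cluster: no vowel after this consonant
                continue
            if nxt in MATRAS:
                phones.append(MATRAS[nxt])  # dependent vowel replaces inherent "a"
                i += 2
                continue
            phones.append("a")  # inherent vowel
        # Characters outside the toy tables are skipped in this sketch.
        i += 1
    return phones


def hybrid_g2p(
    word: str,
    lexicon: Dict[str, List[str]],
    rule_fn: Callable[[str], List[str]],
) -> List[str]:
    """Prefer the lexicon entry; fall back to rules for out-of-vocabulary words."""
    if word in lexicon:
        return lexicon[word]
    return rule_fn(word)


kannada = "\u0c95\u0ca8\u0ccd\u0ca8\u0ca1"  # ಕನ್ನಡ
print(hybrid_g2p(kannada, {}, rule_based_g2p))  # ['k', 'a', 'n', 'n', 'a', 'ɖ', 'a']
```

Checking the lexicon first lets curated pronunciations override the rules, while the rule path keeps coverage for words missing from the dictionary.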
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `nemo/collections/tts/g2p/models/kn_in_ipa.py` | New Kannada G2P implementation (dictionary + rule-based). |
| `nemo/collections/common/tokenizers/text_to_speech/ipa_lexicon.py` | Adds kn-IN locale support, including grapheme/IPA sets and punctuation. |
| `scripts/tts_dataset_files/kn_IN/kn_IN_nv260318.dict` | New Kannada pronunciation dictionary (~4.3K entries). |
| `tests/collections/common/tokenizers/text_to_speech/test_tts_tokenizers.py` | Adds unit tests for Kannada G2P behavior. |
- Add KannadaG2p class with hybrid dictionary + rule-based IPA conversion
- Add Kannada grapheme and IPA character sets to ipa_lexicon.py
- Add kn-IN locale support with punctuation handling
- Include lexicon with 4264 Kannada words
- Add test script with assertions for validation

The G2P module handles:
- All Kannada vowels, consonants, matras (dependent vowels)
- Virama (halant), anusvara, visarga
- Anusvara place assimilation based on following consonant

Signed-off-by: Jason Roche <jas.tech23@gmail.com>
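For readers unfamiliar with anusvara place assimilation, here is a minimal sketch of the idea: the anusvara (ಂ) is realized as the nasal that shares its place of articulation with the following stop. The mapping below follows the conventional consonant classes (velar, palatal, retroflex, dental, labial); the function names and the default of `m` when no stop follows are assumptions for illustration and may differ from what the PR implements.

```python
from typing import Dict, List

ANUSVARA = "\u0c82"  # ಂ


def _stops(start: int, nasal: str) -> Dict[str, str]:
    """Map the four stops of one consonant class (starting at `start`) to its nasal."""
    return {chr(cp): nasal for cp in range(start, start + 4)}


# Following stop -> homorganic nasal (conventional classes; an assumption here).
NASAL_BEFORE: Dict[str, str] = {}
NASAL_BEFORE.update(_stops(0x0C95, "ŋ"))  # ಕ ಖ ಗ ಘ (velar)
NASAL_BEFORE.update(_stops(0x0C9A, "ɲ"))  # ಚ ಛ ಜ ಝ (palatal)
NASAL_BEFORE.update(_stops(0x0C9F, "ɳ"))  # ಟ ಠ ಡ ಢ (retroflex)
NASAL_BEFORE.update(_stops(0x0CA4, "n"))  # ತ ಥ ದ ಧ (dental)
NASAL_BEFORE.update(_stops(0x0CAA, "m"))  # ಪ ಫ ಬ ಭ (labial)


def anusvara_nasals(word: str, default: str = "m") -> List[str]:
    """Return the nasal chosen for each anusvara in `word`, in order."""
    nasals: List[str] = []
    for i, ch in enumerate(word):
        if ch == ANUSVARA:
            nxt = word[i + 1] if i + 1 < len(word) else ""
            # Fall back to `default` word-finally or before a non-stop.
            nasals.append(NASAL_BEFORE.get(nxt, default))
    return nasals


print(anusvara_nasals("\u0c85\u0c82\u0c97"))  # ಅಂಗ -> ['ŋ']
print(anusvara_nasals("\u0c95\u0c82\u0ca1"))  # ಕಂಡ -> ['ɳ']
```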
Signed-off-by: Jason Roche <jas.tech23@gmail.com>
Important
The Update branch button must only be pressed on very rare occasions. An outdated branch never blocks the merge of a PR.
Please reach out to the automation team before pressing that button.
What does this PR do ?
Add Kannada (kn-IN) G2P support for TTS
Collection: [TTS]
Changelog
Usage
# Add a code snippet demonstrating how to use this
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items you can still open "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.
Additional Information