Update text processing in Magpietts eval#15608
Conversation
rfejgin
left a comment
There was a problem hiding this comment.
Overall looks good to me. Left some minor comments.
rfejgin
left a comment
There was a problem hiding this comment.
All looks good. One question: have we tested the updated code at all on non-English? At least to the extent that we'd expect WERs to not be higher than before the change, or even to be lower. One possible way to test that while removing variability due to the TTS model would be to run evaluation on a pre-populated directory of Magpie outputs (if the code allows doing that easily), once before the changes and once after.
| return text | ||
|
|
||
|
|
||
| def get_text_processor(language: str) -> TextProcessor: |
There was a problem hiding this comment.
This function and https://github.com/NVIDIA-NeMo/NeMo/blob/main/nemo/collections/tts/parts/utils/helpers.py#L810 needs to be de-duplicated before we merge. The functionalities of process_text_for_cer() should be merged in and GRPO should use this text processor.
There was a problem hiding this comment.
That method is used by 3 different recipes: Magpie GPRO, Easy Magpie, and Magpie GRPO, and the refactoring needed for each of those is complicated both to implement and test. It is too large and dangerous of a change to couple with this PR, which is relatively small and required for us to proceed with updating our evaluation datasets.
I could implement something and rely on CI tests to validate each recipe still runs, but that is not safe because it is changing the GRPO criteria itself.
Using the same Magpie outputs I ran evaluation on CML test (french, german, italian, spanish) and all metrics were identical, except CER on french dropped from 0.022 to 0.019 because the updated logic with |
Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Ryan Langman <rlangman@nvidia.com>
|
/ok to test 418f0d3 |
Signed-off-by: Ryan Langman <rlangman@nvidia.com>
|
[🤖]: Hi @rlangman 👋, We wanted to let you know that a CICD pipeline for this PR just finished successfully. So it might be time to merge this PR or get some approvals. |
|
/ok to test d85344d |
|
[🤖]: Hi @rlangman 👋, We wanted to let you know that a CICD pipeline for this PR just finished successfully. So it might be time to merge this PR or get some approvals. |
What does this PR do ?
Adds an interface for doing language-specific text processing for TTS.
Collection: [TTS]
Changelog
isalnum(). Previous logic usingstring.punctuationonly removes a limited set of English punctuation.datasets_base_pathto allow doing inference and evaluation using a config file with relative paths.Pre checks:
PR Type: