Staging vi tn to main #338
* Add Vietnamese text normalization for cardinal semiotic class
* Add missing init file
* Fix Cardinal and optimize logic
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* Add Vietnamese text normalization for ordinal and decimal semiotic classes
* update sparrowhawk
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* refactor decimal code and docstring
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* Fraction class for Vietnamese TN
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Remove irrelevant test case
* Remove irrelevant test case

---------

Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* Date for Vietnamese TN
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Add Roman numeral support and correct copyright header
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* change header to current year
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* change header time

---------

Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* Time - semiotic class for Vietnamese TN
* remove irrelevant import and comment
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* add comment and refactor pattern
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Change the spaces to NEMO_SPACE for maintenance.
* Change the spaces to NEMO_SPACE for maintenance.
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Change the spaces to NEMO_SPACE for maintenance.
  - remove quote
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* Add Vietnamese TN support for Money and Range semiotic classes
  - Add money.py tagger and verbalizer for Vietnamese currency handling
  - Add range.py tagger for numerical range processing
  - Add supporting data files for money (currency, currency_minor, per_unit)
  - Add quantity abbreviations and time units data
  - Update existing taggers and verbalizers for integration
  - Add comprehensive test cases for money and range functionality
  - Update tokenize_and_classify to include new semiotic classes
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* modify illogical test cases
* refactor and simplify word and punctuation to avoid hardcoding
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* refactor money and range code
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* Add Vietnamese measure text normalization support
  - Added measure tagger and verbalizer for Vietnamese TN
  - Updated money tagger and verbalizer to handle per-unit measurements
  - Added test cases for measure normalization
  - Updated fraction handling for better integration
  - Added data files for measurements, prefixes, and per-unit bases
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* add test case for range measure
* add further cardinal support and remove duplicate test case
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* refactor cardinal and add test cases
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* remove duplicate lines in run_eval file
* minor code refactoring
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* add measure support for unit-per-unit cases
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* fix and add cases
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* Fix Jenkinsfile for CI
* Fix requirements for test
* Update paths and docker
* Fix docker name
* Fix click version
* Change path of grammars for sparrowhawk tests
* Update paths in sh_test.sh
* Update paths
* Revert paths

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* fix range and quote
* fix quote in post-processing
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* fix quote and range

---------

Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* improve numeric semiotic classes
* Fix Jenkinsfile for CI (#325)
  * Fix Jenkinsfile for CI
  * Fix requirements for test
  * Update paths and docker
  * Fix docker name
  * Fix click version
  * Change path of grammars for sparrowhawk tests
  * Update paths in sh_test.sh
  * Update paths
  * Revert paths

  ---------

  Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
  Signed-off-by: folivoramanh <palasek182@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* revert old code
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* revert not inherit
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* improve date and time
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* use pynini union instead of the union operator
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* improve measure, telephone, and electronic
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* change union operator to pynini union
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <palasek182@gmail.com>
Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* Fix Jenkinsfile for CI (#325)
* PR: Add Vietnamese text normalization for cardinal semiotic class (#289)
* Ordinal and Decimal for Vietnamese TN (#290)
* Vietnamese TN - Fraction (#296)
* Date Semiotic Class for Vietnamese TN (#298)
* Time - semiotic class for Vietnamese TN (#302)
* Add Vietnamese TN support for Money and Range semiotic classes (#304)
* Add Vietnamese measure text normalization support (#307)
* Vietnamese MRC 1.0 fix case (#312)
* Fix word range (#334)
* Date time itn (#333)

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* fix bug with commas and electronics
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* update Jenkins

---------

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: folivoramanh <palasek182@gmail.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Only mount TestData from path

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
updates:
- [github.com/pre-commit/pre-commit-hooks: v5.0.0 → v6.0.0](pre-commit/pre-commit-hooks@v5.0.0...v6.0.0)
- [github.com/PyCQA/flake8: 7.2.0 → 7.3.0](PyCQA/flake8@7.2.0...7.3.0)
- [github.com/PyCQA/isort: 6.0.1 → 6.1.0](PyCQA/isort@6.0.1...6.1.0)
- https://github.com/psf/black → https://github.com/psf/black-pre-commit-mirror
- [github.com/psf/black-pre-commit-mirror: 25.1.0 → 25.9.0](psf/black-pre-commit-mirror@25.1.0...25.9.0)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: folivoramanh <palasek182@gmail.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* PR: Add Vietnamese text normalization for cardinal semiotic class (#289)
  Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
  Signed-off-by: Mai Anh <palasek182@gmail.com>
* Ordinal and Decimal for Vietnamese TN (#290)
  Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
  Signed-off-by: Mai Anh <palasek182@gmail.com>
* Vietnamese TN - Fraction (#296)
  Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
  Signed-off-by: Mai Anh <palasek182@gmail.com>
* Date Semiotic Class for Vietnamese TN (#298)
  Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
  Signed-off-by: Mai Anh <palasek182@gmail.com>
* Time - semiotic class for Vietnamese TN (#302)
  Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
  Signed-off-by: Mai Anh <palasek182@gmail.com>
* Add Vietnamese TN support for Money and Range semiotic classes (#304)
  Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
  Signed-off-by: Mai Anh <palasek182@gmail.com>
* Add Vietnamese measure text normalization support (#307)
  Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
  Signed-off-by: Mai Anh <palasek182@gmail.com>
* Vietnamese MRC 1.0 fix case (#312)
  Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
  Signed-off-by: Mai Anh <palasek182@gmail.com>
* Fix Jenkinsfile for CI (#325) (#327)
  Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
  Signed-off-by: Mai Anh <palasek182@gmail.com>
* Fix word range (#334)
  Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
  Signed-off-by: Mai Anh <palasek182@gmail.com>
* Date time itn (#333)
  Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
  Signed-off-by: Mai Anh <palasek182@gmail.com>
* Staging vi tn signed off (#339)
  * Fix Jenkinsfile for CI (#325)
    * Fix Jenkinsfile for CI
      Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
    * Fix
requirements for test Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update paths and docker Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix docker name Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix click version Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Change path of grammars for sparrowhawk tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update paths in sh_test.sh Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update paths Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Revert paths Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: Mai Anh <palasek182@gmail.com> * PR: Add Vietnamese text normalization for cardinal semiotic class (#289) * Add Vietnamese text normalization for cardinal semiotic class Signed-off-by: Mai Anh <palasek182@gmail.com> * Add missing init file Signed-off-by: Mai Anh <palasek182@gmail.com> * Fix Cardinal and optimize logic Signed-off-by: Mai Anh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mai Anh <palasek182@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mai Anh <palasek182@gmail.com> * Ordinal and Decimal for Vietnamese TN (#290) * Add Vietnamese text normalization for ordinal and decimal semiotic classes Signed-off-by: Mai Anh <palasek182@gmail.com> * update sparrowhawk Signed-off-by: Mai Anh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refractor decimal code and docstring Signed-off-by: Mai Anh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mai Anh <palasek182@gmail.com> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mai Anh <palasek182@gmail.com> * Vietnamese TN - Fraction (#296) * Fraction class for Vietnamese TN Signed-off-by: Mai Anh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove irrelavant test case Signed-off-by: Mai Anh <palasek182@gmail.com> * Remove irrelavant test case Signed-off-by: Mai Anh <palasek182@gmail.com> --------- Signed-off-by: Mai Anh <palasek182@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mai Anh <palasek182@gmail.com> * Date Semiotic Class for Vietnamese TN (#298) * Date for vietnamese TN Signed-off-by: Mai Anh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add roman support and correct copyright header Signed-off-by: Mai Anh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change header to current year Signed-off-by: Mai Anh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change header time Signed-off-by: Mai Anh <palasek182@gmail.com> --------- Signed-off-by: Mai Anh <palasek182@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mai Anh <palasek182@gmail.com> * Time - semiotic class for Vietnamese TN (#302) * Time - semiotic class for Vietnamese TN Signed-off-by: Mai Anh <palasek182@gmail.com> * remove irrelevant import and comment Signed-off-by: Mai Anh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add comment and refractor pattern Signed-off-by: Mai Anh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * Change the spaces to NEMO_SPACE for maintenance. Signed-off-by: Mai Anh <palasek182@gmail.com> * Change the spaces to NEMO_SPACE for maintenance. Signed-off-by: Mai Anh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Change the spaces to NEMO_SPACE for maintenance. - remove quote Signed-off-by: Mai Anh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mai Anh <palasek182@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mai Anh <palasek182@gmail.com> * Add Vietnamese TN support for Money and Range semiotic classes (#304) * Add Vietnamese TN support for Money and Range semiotic classes - Add money.py tagger and verbalizer for Vietnamese currency handling - Add range.py tagger for numerical range processing - Add supporting data files for money (currency, currency_minor, per_unit) - Add quantity abbreviations and time units data - Update existing taggers and verbalizers for integration - Add comprehensive test cases for money and range functionality - Update tokenize_and_classify to include new semiotic classes Signed-off-by: Mai Anh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * modify illogical test cases Signed-off-by: Mai Anh <palasek182@gmail.com> * refractor and simplify word and punctuation to avoid hardcoding Signed-off-by: Mai Anh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refractor code money range Signed-off-by: Mai Anh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mai Anh <palasek182@gmail.com> 
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mai Anh <palasek182@gmail.com> * Add Vietnamese measure text normalization support (#307) * Add Vietnamese measure text normalization support - Added measure tagger and verbalizer for Vietnamese TN - Updated money tagger and verbalizer to handle per-unit measurements - Added test cases for measure normalization - Updated fraction handling for better integration - Added data files for measurements, prefixes, and per-unit bases Signed-off-by: Mai Anh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Mai Anh <palasek182@gmail.com> * add test case for range measure Signed-off-by: Mai Anh <palasek182@gmail.com> * additional support for cardinal and remove duplicate test case Signed-off-by: Mai Anh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refractor cardinal and add test cases Signed-off-by: Mai Anh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove duplicate lines in run_eval file Signed-off-by: Mai Anh <palasek182@gmail.com> * refractor minor code Signed-off-by: Mai Anh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add measure support for unit per unit cases Signed-off-by: Mai Anh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mai Anh <palasek182@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mai Anh <palasek182@gmail.com> * Vietnamese MRC 1.0 fix case (#312) * fix and add cases Signed-off-by: Mai Anh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mai Anh <palasek182@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mai Anh <palasek182@gmail.com> * Fix word range (#334) * fix range and quote Signed-off-by: Mai Anh <palasek182@gmail.com> * fix quote in post process Signed-off-by: Mai Anh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix quote and range Signed-off-by: Mai Anh <palasek182@gmail.com> --------- Signed-off-by: Mai Anh <palasek182@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mai Anh <palasek182@gmail.com> * Date time itn (#333) * improve numeric semiotic classes Signed-off-by: Mai Anh <palasek182@gmail.com> * Fix Jenkinsfile for CI (#325) * Fix Jenkinsfile for CI Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix requirements for test Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update paths and docker Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix docker name Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix click version Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Change path of grammars for sparrowhawk tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update paths in sh_test.sh Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update paths Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Revert paths Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: Mai Anh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Mai Anh <palasek182@gmail.com> * revert old codes Signed-off-by: Mai Anh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * revert not inherit Signed-off-by: Mai Anh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * improve date time Signed-off-by: Mai Anh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix pynini union instead of union operator Signed-off-by: Mai Anh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * improve measure, telephone, electronic Signed-off-by: Mai Anh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change union operator to pynini union Signed-off-by: Mai Anh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mai Anh <palasek182@gmail.com> Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mai Anh <palasek182@gmail.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: Mai Anh <palasek182@gmail.com> Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> Signed-off-by: Mai Anh <palasek182@gmail.com> * Comma bugfix for En electronics (#332) * fix bug with commas and electronics Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> * update jenkins Signed-off-by: Mariana 
Graterol Fuenmayor <marianag@nvidia.com> --------- Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> Signed-off-by: Mai Anh <palasek182@gmail.com> * remove unuse import (#340) Signed-off-by: Mai Anh <palasek182@gmail.com> Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> Signed-off-by: Mai Anh <palasek182@gmail.com> * Update Jenkinsfile (#341) Only mount TestData from path Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Signed-off-by: Mai Anh <palasek182@gmail.com> * [pre-commit.ci] pre-commit suggestions (#335) updates: - [github.com/pre-commit/pre-commit-hooks: v5.0.0 → v6.0.0](pre-commit/pre-commit-hooks@v5.0.0...v6.0.0) - [github.com/PyCQA/flake8: 7.2.0 → 7.3.0](PyCQA/flake8@7.2.0...7.3.0) - [github.com/PyCQA/isort: 6.0.1 → 6.1.0](PyCQA/isort@6.0.1...6.1.0) - https://github.com/psf/black → https://github.com/psf/black-pre-commit-mirror - [github.com/psf/black-pre-commit-mirror: 25.1.0 → 25.9.0](psf/black-pre-commit-mirror@25.1.0...25.9.0) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mai Anh <palasek182@gmail.com> * update jenkins cache Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> Signed-off-by: Mai Anh <palasek182@gmail.com> * fill missing lang in arg run (#347) Signed-off-by: Mai Anh <palasek182@gmail.com> Signed-off-by: Mai Anh <palasek182@gmail.com> --------- Signed-off-by: folivoramanh <palasek182@gmail.com> Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> Signed-off-by: Mai Anh <palasek182@gmail.com> Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: anand-nv 
<105917641+anand-nv@users.noreply.github.com> Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com> Co-authored-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
tbartley94 left a comment
Needs the loggers removed. Please refactor LOCS so there's less nesting, and factor out duplicated code into reusable pieces to keep the file size down.
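Several excerpts in this review repeat the same token-wrapping pattern: `pynutil.insert('month: "') + graph + pynutil.insert('"')` on the tagger side, and `pynutil.delete("integer: \"") + pynini.closure(NEMO_NOT_QUOTE) + pynutil.delete("\"")` on the verbalizer side. One way to cut that duplication is a pair of helpers that build and strip a `field: "value"` token. The sketch below models the pattern on plain strings so it runs without pynini; in the grammar, the same helpers would take and return pynini FSTs. The names `wrap_field` and `unwrap_field` are hypothetical, not existing NeMo functions.

```python
import re
from typing import Optional


def wrap_field(name: str, value: str) -> str:
    """Tagger side: build a `name: "value"` token.

    String analog of pynutil.insert(f'{name}: "') + graph + pynutil.insert('"').
    """
    return f'{name}: "{value}"'


def unwrap_field(name: str, token: str) -> Optional[str]:
    """Verbalizer side: strip `name: "` and the closing quote.

    The captured group [^"]* plays the role of pynini.closure(NEMO_NOT_QUOTE).
    Returns None when the token is not a `name: "..."` field.
    """
    match = re.fullmatch(re.escape(name) + r': "([^"]*)"', token)
    return match.group(1) if match else None


# Each call site now states only the field name and its payload.
month_token = wrap_field("month", "5")
protocol_token = wrap_field("protocol", "https")
print(month_token, unwrap_field("month", month_token))
```

The design point is that the quoting convention lives in one place, so a change to it (for example, switching a literal space to `NEMO_SPACE`) is made once rather than at every call site.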
graph = (
# Thousands pattern (e.g., "hai nghìn không ba" -> "2003")
graph_hundred_component = pynini.union(
    pynini.union(graph_digit, graph_zero) + delete_space + pynutil.delete("trăm"), pynutil.insert("0")
Without weighting, you're going to get non-deterministic behavior where a 0 is just inserted here.
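A minimal pure-Python model of this concern (not pynini, and not the PR's actual grammar): a union of a branch that consumes `<digit> trăm` and a bare `insert("0")` branch that consumes nothing, followed by a permissive fallback that copies leftover words, standing in for something like a word class. With every weight at zero, both paths accept the same input at equal cost, so the choice between them is left to tie-breaking rather than to the grammar. Penalizing the insert branch makes the consuming parse win. The digit table, the fallback, and the weight 0.1 are assumptions for illustration; in pynini the corresponding change would be wrapping the branch as `pynutil.add_weight(pynutil.insert("0"), w)`.

```python
from typing import List, Tuple

# Toy lexicon of Vietnamese digit words (illustrative, not the PR's data files).
DIGITS = {"không": "0", "một": "1", "hai": "2", "ba": "3", "bốn": "4",
          "năm": "5", "sáu": "6", "bảy": "7", "tám": "8", "chín": "9"}


def hundred_branches(tokens: List[str], insert_weight: float) -> List[Tuple[str, List[str], float]]:
    """Alternatives of the toy hundreds component as (output, remaining tokens, weight)."""
    branches = []
    # Branch 1: "<digit> trăm" -> "<digit>", consuming two tokens at weight 0.
    if len(tokens) >= 2 and tokens[0] in DIGITS and tokens[1] == "trăm":
        branches.append((DIGITS[tokens[0]], tokens[2:], 0.0))
    # Branch 2: emit "0" without consuming input, like pynutil.insert("0").
    branches.append(("0", tokens, insert_weight))
    return branches


def all_paths(tokens: List[str], insert_weight: float) -> List[Tuple[str, float]]:
    """Hundreds component followed by a permissive fallback that copies leftover words."""
    paths = []
    for output, rest, weight in hundred_branches(tokens, insert_weight):
        leftover = " ".join(rest)
        paths.append((output + (" " + leftover if leftover else ""), weight))
    return paths


def best_outputs(paths: List[Tuple[str, float]]) -> List[str]:
    """Tropical-semiring view: every output tied at the minimum weight survives."""
    lowest = min(weight for _, weight in paths)
    return sorted(output for output, weight in paths if weight == lowest)


tokens = ["ba", "trăm"]
print(best_outputs(all_paths(tokens, insert_weight=0.0)))  # ['0 ba trăm', '3'] (a tie)
print(best_outputs(all_paths(tokens, insert_weight=0.1)))  # ['3']
```

When there is no `trăm` to consume, the weighted insert branch is the only path, so the penalty changes which path wins only when a real alternative exists.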
month_graph = _get_month_graph()

month_graph = pynutil.insert('month: "') + month_graph + pynutil.insert('"')
# Complete year graph with all supported patterns
Can you modularize this? It's hard to track all the different graphs with the nesting.
    ),
).optimize()

year_graph = pynutil.add_weight(year_graph_raw, YEAR_WEIGHT)
Do you need a dedicated weight, or can you just reuse a general one (the "min weight", for instance)?
def __init__(self):
    super().__init__(name="electronic", kind="classify")

    delete_extra_space = pynutil.delete(" ")
This should live in graph_utils.
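The name matters because graph_utils already exports space helpers with different semantics. In NeMo's English `graph_utils.py`, `delete_space` deletes zero or more whitespace characters and `delete_extra_space` rewrites one or more of them to a single space, whereas the local `delete_extra_space = pynutil.delete(" ")` in this excerpt deletes exactly one space (the Vietnamese graph_utils may differ, so this is worth double-checking). The PR's follow-up commit renamed the local helper to `delete_single_space` and moved it into graph_utils. The sketch below models the three operations as acceptors over the separator between two tokens, using Python's `\s` as an approximation of `NEMO_WHITE_SPACE` (`\s` matches a few more characters).

```python
import re
from typing import Optional

# Each function models one helper applied to the separator between two tokens
# (assumed semantics, see the lead-in). It returns the rewritten separator,
# or None when the helper would reject that separator.


def delete_single_space(sep: str) -> Optional[str]:
    """pynutil.delete(" "): accepts exactly one ASCII space and deletes it."""
    return "" if sep == " " else None


def delete_space(sep: str) -> Optional[str]:
    """Zero or more whitespace characters, all deleted."""
    return "" if re.fullmatch(r"\s*", sep) else None


def delete_extra_space(sep: str) -> Optional[str]:
    """One or more whitespace characters, rewritten to a single space."""
    return " " if re.fullmatch(r"\s+", sep) else None


for sep in ["", " ", "   "]:
    results = [f(sep) for f in (delete_single_space, delete_space, delete_extra_space)]
    print(repr(sep), results)
# '' [None, '', None]
# ' ' ['', '', ' ']
# '   ' [None, '', ' ']
```

Reusing the name `delete_extra_space` for a single-space deletion would therefore silently mean something different from the shared helper of the same name.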
protocol = pynutil.insert('protocol: "') + protocol + pynutil.insert('"')
graph |= protocol
graph = pynini.union(graph, protocol)
########
stray comment
start_time = time.time()
cardinal = CardinalFst(deterministic=deterministic)
cardinal_graph = cardinal.fst
logger.debug(f"cardinal: {time.time() - start_time: .2f}s -- {cardinal_graph.num_states()} nodes")
remove the loggers
key_cardinal = pynutil.delete("key_cardinal: \"") + pynini.closure(NEMO_NOT_QUOTE) + pynutil.delete("\"")
integer = pynutil.delete("integer: \"") + pynini.closure(NEMO_NOT_QUOTE) + pynutil.delete("\"")

graph_with_key = key_cardinal + delete_space + pynutil.insert(" ") + integer
Use NEMO_SPACE here instead of the literal " ".
    + pynini.closure(NEMO_NOT_QUOTE, 1)
    + pynutil.delete("\"")
)
graph = graph @ pynini.cdrewrite(pynini.cross(u"\u00a0", " "), "", "", NEMO_SIGMA)
Use NEMO_SPACE here instead of the literal " ".
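The `cdrewrite` line in this excerpt rewrites every U+00A0 (no-break space) to an ordinary space anywhere in the string, because both contexts are empty and the rewrite ranges over `NEMO_SIGMA`. The reviewer's request is to spell the target as `NEMO_SPACE` rather than a literal `" "`, which in pynini would read `pynini.cross("\u00a0", NEMO_SPACE)`. As a plain-Python analog, an unconditional single-character rewrite with empty contexts behaves like `str.replace`. `NEMO_SPACE` is redefined locally here, assumed to be a plain ASCII space as in graph_utils, only so the sketch runs without NeMo.

```python
# Assumption: mirrors the NEMO_SPACE constant from graph_utils (a plain ASCII space).
NEMO_SPACE = " "
NBSP = "\u00a0"


def normalize_nbsp(text: str) -> str:
    """String analog of
    graph @ pynini.cdrewrite(pynini.cross(NBSP, NEMO_SPACE), "", "", NEMO_SIGMA):
    an obligatory rewrite with empty left/right contexts replaces every occurrence.
    """
    return text.replace(NBSP, NEMO_SPACE)


print(normalize_nbsp("hai\u00a0nghìn\u00a0không\u00a0ba"))  # hai nghìn không ba
```

Naming the constant keeps the intent ("the canonical output space") visible and gives one place to change it.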
tests/nemo_text_processing/vi/test_sparrowhawk_inverse_text_normalization.sh (outdated, resolved)
* Refactor Vietnamese ITN taggers: modularize date, add data files, improve naming
- Modularize date.py year components for better readability
- Add weights to prevent non-deterministic behavior in insert operations
- Remove redundant YEAR_WEIGHT constant (use inline weights)
- Create zero_prefix.tsv and digit_special.tsv data files
- Rename delete_extra_space to delete_single_space in electronic.py for clarity
- Add delete_single_space to graph_utils for reuse Signed-off-by: Mai Anh <palasek182@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Refactor Vietnamese: PSA follow Signed-off-by: Mai Anh <palasek182@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
--------- Signed-off-by: Mai Anh <palasek182@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
nemo_text_processing/text_normalization/vi/taggers/tokenize_and_classify.py (10 flagged items, all marked Fixed)
Signed-off-by: Mai Anh <palasek182@gmail.com>
tbartley94 left a comment
LGTM
What does this PR do?
Vietnamese TN v1 merged to main
Before your PR is "Ready for review"
Pre checks:
- `git commit -s` to sign.
- `pytest` or (if your machine does not have GPU) `pytest --cpu` from the root folder (given you marked your test cases accordingly `@pytest.mark.run_only_on('CPU')`).
- `bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...`
- `pytest` and Sparrowhawk here.
- `__init__.py` for every folder and subfolder, including `data` folder which has .TSV files?
- `Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.` to all newly added Python files?
- `Copyright 2015 and onwards Google, Inc.`. See an example here.
- `try import: ... except: ...`) if not already done.

PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.