-
Notifications
You must be signed in to change notification settings - Fork 133
Date Semiotic Class for Vietnamese TN #298
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: folivoramanh <palasek182@gmail.com>
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a very solid start. Please see my comments on some of the things that are missing.
nemo_text_processing/text_normalization/vi/taggers/tokenize_and_classify.py
Show resolved
Hide resolved
tests/nemo_text_processing/vi/data_text_normalization/test_cases_date.txt
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fold the date files into each other. Request to clean up the date verbalizer function. Please revisit the weights to see if they can be less overtuned. If avoidable we'll need to ship learned weights via HMM optimization.
nemo_text_processing/text_normalization/vi/data/date/__init__.py
Outdated
Show resolved
Hide resolved
nemo_text_processing/text_normalization/vi/taggers/tokenize_and_classify.py
Outdated
Show resolved
Hide resolved
tests/nemo_text_processing/vi/data_text_normalization/test_cases_date.txt
Show resolved
Hide resolved
Signed-off-by: folivoramanh <palasek182@gmail.com>
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copyright header for new files should be current year. You only need the affiliates when Google is also listed in the copy or it stems from outside contributor. Revert those headers and replace with current year.
Stray comments left in pytest files. Please remove.
Else LGTM
nemo_text_processing/text_normalization/vi/data/date/__init__.py
Outdated
Show resolved
Hide resolved
nemo_text_processing/text_normalization/vi/data/roman/__init__.py
Outdated
Show resolved
Hide resolved
nemo_text_processing/text_normalization/vi/verbalizers/whitelist.py
Outdated
Show resolved
Hide resolved
Signed-off-by: folivoramanh <palasek182@gmail.com>
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last change on roman. Else LGTM. Please confirm unit and sparrowhawk tests are passing.
Signed-off-by: folivoramanh <palasek182@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* Date for vietnamese TN Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add roman support and correct copyright header Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change header to current year Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change header time Signed-off-by: folivoramanh <palasek182@gmail.com> --------- Signed-off-by: folivoramanh <palasek182@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Date for vietnamese TN Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add roman support and correct copyright header Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change header to current year Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change header time Signed-off-by: folivoramanh <palasek182@gmail.com> --------- Signed-off-by: folivoramanh <palasek182@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <palasek182@gmail.com>
* Date for vietnamese TN Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add roman support and correct copyright header Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change header to current year Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change header time Signed-off-by: folivoramanh <palasek182@gmail.com> --------- Signed-off-by: folivoramanh <palasek182@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <palasek182@gmail.com>
* Date for vietnamese TN Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add roman support and correct copyright header Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change header to current year Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change header time Signed-off-by: folivoramanh <palasek182@gmail.com> --------- Signed-off-by: folivoramanh <palasek182@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <palasek182@gmail.com>
* Fix Jenkinsfile for CI (#325) * Fix Jenkinsfile for CI Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix requirements for test Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update paths and docker Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix docker name Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix click version Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Change path of grammars for sparrowhawk tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update paths in sh_test.sh Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update paths Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Revert paths Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: folivoramanh <palasek182@gmail.com> * PR: Add Vietnamese text normalization for cardinal semiotic class (#289) * Add Vietnamese text normalization for cardinal semiotic class Signed-off-by: folivoramanh <palasek182@gmail.com> * Add missing init file Signed-off-by: folivoramanh <palasek182@gmail.com> * Fix Cardinal and optimize logic Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: folivoramanh <palasek182@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <palasek182@gmail.com> * Ordinal and Decimal for Vietnamese TN (#290) * Add Vietnamese text normalization for ordinal and decimal semiotic classes Signed-off-by: folivoramanh <palasek182@gmail.com> * update sparrowhawk Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refractor decimal code and docstring Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: folivoramanh <palasek182@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <palasek182@gmail.com> * Vietnamese TN - Fraction (#296) * Fraction class for Vietnamese TN Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove irrelavant test case Signed-off-by: folivoramanh <palasek182@gmail.com> * Remove irrelavant test case Signed-off-by: folivoramanh <palasek182@gmail.com> --------- Signed-off-by: folivoramanh <palasek182@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <palasek182@gmail.com> * Date Semiotic Class for Vietnamese TN (#298) * Date for vietnamese TN Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add roman support and correct copyright header Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change header to current year Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change header time Signed-off-by: folivoramanh <palasek182@gmail.com> --------- Signed-off-by: folivoramanh <palasek182@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <palasek182@gmail.com> * Time - semiotic class for Vietnamese TN (#302) * Time - semiotic class for Vietnamese TN Signed-off-by: folivoramanh <palasek182@gmail.com> * remove irrelevant import and comment Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add comment and refractor pattern Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Change the spaces to NEMO_SPACE for maintenance. Signed-off-by: folivoramanh <palasek182@gmail.com> * Change the spaces to NEMO_SPACE for maintenance. Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Change the spaces to NEMO_SPACE for maintenance. - remove quote Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: folivoramanh <palasek182@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <palasek182@gmail.com> * Add Vietnamese TN support for Money and Range semiotic classes (#304) * Add Vietnamese TN support for Money and Range semiotic classes - Add money.py tagger and verbalizer for Vietnamese currency handling - Add range.py tagger for numerical range processing - Add supporting data files for money (currency, currency_minor, per_unit) - Add quantity abbreviations and time units data - Update existing taggers and verbalizers for integration - Add comprehensive test cases for money and range functionality - Update tokenize_and_classify to include new semiotic classes Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * modify illogical test cases Signed-off-by: folivoramanh <palasek182@gmail.com> * refractor and simplify word and punctuation to avoid hardcoding Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refractor code money range Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: folivoramanh <palasek182@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <palasek182@gmail.com> * Add Vietnamese measure text normalization support (#307) * Add Vietnamese measure text normalization support - Added measure tagger and verbalizer for Vietnamese TN - Updated money tagger and verbalizer to handle per-unit measurements - Added test cases for measure normalization - Updated fraction handling for better integration - Added data files for measurements, prefixes, and per-unit bases Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: folivoramanh <palasek182@gmail.com> * add test case for range measure Signed-off-by: folivoramanh <palasek182@gmail.com> * additional support for cardinal and remove duplicate test case Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refractor cardinal and add test cases Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove duplicate lines in run_eval file Signed-off-by: folivoramanh <palasek182@gmail.com> * refractor minor code Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add measure support for unit per unit cases Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: folivoramanh <palasek182@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <palasek182@gmail.com> * Vietnamese MRC 1.0 fix case (#312) * fix and add cases Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: folivoramanh <palasek182@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <palasek182@gmail.com> * Fix word range (#334) * fix range and quote Signed-off-by: folivoramanh <palasek182@gmail.com> * fix quote in post process Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix quote and range Signed-off-by: folivoramanh <palasek182@gmail.com> --------- Signed-off-by: folivoramanh <palasek182@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <palasek182@gmail.com> * Date time itn (#333) * improve numeric semiotic classes Signed-off-by: folivoramanh <palasek182@gmail.com> * Fix Jenkinsfile for CI (#325) * Fix Jenkinsfile for CI Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix requirements for test Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update paths and docker Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix docker name Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix click version Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Change path of grammars for sparrowhawk tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update paths in sh_test.sh Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update paths Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Revert paths Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: folivoramanh <palasek182@gmail.com> * revert old codes Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert not inherit Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * improve date time Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix pynini union instead of union operator Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * improve measure, telephone, electronic Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change union operator to pynini union Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: folivoramanh <palasek182@gmail.com> Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <palasek182@gmail.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: folivoramanh <palasek182@gmail.com> Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Date for vietnamese TN Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add roman support and correct copyright header Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change header to current year Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change header time Signed-off-by: folivoramanh <palasek182@gmail.com> --------- Signed-off-by: folivoramanh <palasek182@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* Fix Jenkinsfile for CI (#325) * Fix Jenkinsfile for CI Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix requirements for test Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update paths and docker Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix docker name Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix click version Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Change path of grammars for sparrowhawk tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update paths in sh_test.sh Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update paths Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Revert paths Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: folivoramanh <palasek182@gmail.com> * PR: Add Vietnamese text normalization for cardinal semiotic class (#289) * Add Vietnamese text normalization for cardinal semiotic class Signed-off-by: folivoramanh <palasek182@gmail.com> * Add missing init file Signed-off-by: folivoramanh <palasek182@gmail.com> * Fix Cardinal and optimize logic Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: folivoramanh <palasek182@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <palasek182@gmail.com> * Ordinal and Decimal for Vietnamese TN (#290) * Add Vietnamese text normalization for ordinal and decimal semiotic classes Signed-off-by: folivoramanh <palasek182@gmail.com> * update sparrowhawk Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refractor decimal code and docstring Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: folivoramanh <palasek182@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <palasek182@gmail.com> * Vietnamese TN - Fraction (#296) * Fraction class for Vietnamese TN Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove irrelavant test case Signed-off-by: folivoramanh <palasek182@gmail.com> * Remove irrelavant test case Signed-off-by: folivoramanh <palasek182@gmail.com> --------- Signed-off-by: folivoramanh <palasek182@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <palasek182@gmail.com> * Date Semiotic Class for Vietnamese TN (#298) * Date for vietnamese TN Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add roman support and correct copyright header Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change header to current year Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change header time Signed-off-by: folivoramanh <palasek182@gmail.com> --------- Signed-off-by: folivoramanh <palasek182@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <palasek182@gmail.com> * Time - semiotic class for Vietnamese TN (#302) * Time - semiotic class for Vietnamese TN Signed-off-by: folivoramanh <palasek182@gmail.com> * remove irrelevant import and comment Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add comment and refractor pattern Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Change the spaces to NEMO_SPACE for maintenance. Signed-off-by: folivoramanh <palasek182@gmail.com> * Change the spaces to NEMO_SPACE for maintenance. Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Change the spaces to NEMO_SPACE for maintenance. - remove quote Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: folivoramanh <palasek182@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <palasek182@gmail.com> * Add Vietnamese TN support for Money and Range semiotic classes (#304) * Add Vietnamese TN support for Money and Range semiotic classes - Add money.py tagger and verbalizer for Vietnamese currency handling - Add range.py tagger for numerical range processing - Add supporting data files for money (currency, currency_minor, per_unit) - Add quantity abbreviations and time units data - Update existing taggers and verbalizers for integration - Add comprehensive test cases for money and range functionality - Update tokenize_and_classify to include new semiotic classes Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * modify illogical test cases Signed-off-by: folivoramanh <palasek182@gmail.com> * refractor and simplify word and punctuation to avoid hardcoding Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refractor code money range Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: folivoramanh <palasek182@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <palasek182@gmail.com> * Add Vietnamese measure text normalization support (#307) * Add Vietnamese measure text normalization support - Added measure tagger and verbalizer for Vietnamese TN - Updated money tagger and verbalizer to handle per-unit measurements - Added test cases for measure normalization - Updated fraction handling for better integration - Added data files for measurements, prefixes, and per-unit bases Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: folivoramanh <palasek182@gmail.com> * add test case for range measure Signed-off-by: folivoramanh <palasek182@gmail.com> * additional support for cardinal and remove duplicate test case Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * refractor cardinal and add test cases Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove duplicate lines in run_eval file Signed-off-by: folivoramanh <palasek182@gmail.com> * refractor minor code Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add measure support for unit per unit cases Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: folivoramanh <palasek182@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <palasek182@gmail.com> * Vietnamese MRC 1.0 fix case (#312) * fix and add cases Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: folivoramanh <palasek182@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <palasek182@gmail.com> * Fix word range (#334) * fix range and quote Signed-off-by: folivoramanh <palasek182@gmail.com> * fix quote in post process Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix quote and range Signed-off-by: folivoramanh <palasek182@gmail.com> --------- Signed-off-by: folivoramanh <palasek182@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <palasek182@gmail.com> * Date time itn (#333) * improve numeric semiotic classes Signed-off-by: folivoramanh <palasek182@gmail.com> * Fix Jenkinsfile for CI (#325) * Fix Jenkinsfile for CI Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix requirements for test Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update paths and docker Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix docker name Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix click version Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Change path of grammars for sparrowhawk tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update paths in sh_test.sh Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update paths Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Revert paths Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: folivoramanh <palasek182@gmail.com> * revert old codes Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert not inherit Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * improve date time Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix pynini union instead of union operator Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * improve measure, telephone, electronic Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * change union operator to pynini union Signed-off-by: folivoramanh <palasek182@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: folivoramanh <palasek182@gmail.com> Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: folivoramanh <palasek182@gmail.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: folivoramanh <palasek182@gmail.com> Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
What does this PR do ?
Add a one line overview of what this PR aims to accomplish.
Date Semiotic Class for Vietnamese TN
Before your PR is "Ready for review"
Pre checks:
git commit -s
to sign.pytest
or (if your machine does not have GPU)pytest --cpu
from the root folder (given you marked your test cases accordingly@pytest.mark.run_only_on('CPU')
).bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...
pytest
and Sparrowhawk here.__init__.py
for every folder and subfolder, includingdata
folder which has .TSV files?Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
to all newly added Python files?Copyright 2015 and onwards Google, Inc.
. See an example here.try import: ... except: ...
) if not already done.PR Type:
If you haven't finished some of the above items you can still open "Draft" PR.