Collaborator

@mgrafu mgrafu commented Oct 13, 2025

What does this PR do?

Merges Vietnamese text normalization (TN) v1 into main.

Before your PR is "Ready for review"

Pre checks:

  • Have you signed your commits? Use git commit -s to sign.
  • Do all unittests finish successfully before sending PR?
    1. Run pytest or, if your machine does not have a GPU, pytest --cpu from the root folder (provided you marked your test cases accordingly with @pytest.mark.run_only_on('CPU')).
    2. Run the Sparrowhawk tests: bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...
  • If you are adding a new feature: have you added test cases for both pytest and Sparrowhawk here?
  • Have you added __init__.py to every folder and subfolder, including the data folder that holds the .TSV files?
  • Have you followed the CodeQL results and removed unused variables and imports (the report is at the bottom of the PR in the GitHub review box)?
  • Have you added the correct license header Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. to all newly added Python files?
  • If you copied nemo_text_processing/text_normalization/en/graph_utils.py, the second line of your header should be Copyright 2015 and onwards Google, Inc. See an example here.
  • Remove import guards (try import: ... except: ...) if not already done.
  • If you added a new language or a new feature, please update the NeMo documentation (which lives in a different repo).
  • Have you added your language support to tools/text_processing_deployment/pynini_export.py?
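
The __init__.py item in the checklist above is easy to miss for data folders, so a small local check may help before opening the PR. The following is only a sketch and not part of the repository: the function name find_dirs_missing_init is hypothetical, and it assumes that only folders directly containing .py or .tsv files need an __init__.py.

```python
from pathlib import Path
from typing import List


def find_dirs_missing_init(root: str) -> List[str]:
    """List folders under `root` that hold .py/.tsv files but no __init__.py.

    Hypothetical helper: the ".py or .tsv" rule is an assumption made for
    this sketch, not a rule taken from the repository's tooling.
    """
    root_path = Path(root)
    # Check the root folder itself, then every nested folder in sorted order.
    candidates = [root_path] + sorted(p for p in root_path.rglob("*") if p.is_dir())
    missing = []
    for directory in candidates:
        if directory.name == "__pycache__":
            continue  # bytecode caches are not packages
        has_payload = any(
            f.is_file() and f.suffix.lower() in {".py", ".tsv"}
            for f in directory.iterdir()
        )
        if has_payload and not (directory / "__init__.py").exists():
            # Report paths relative to the root ("." means the root itself).
            missing.append(directory.relative_to(root_path).as_posix())
    return missing
```

Calling find_dirs_missing_init on a language folder returns the relative paths that still need an __init__.py; an empty list means this checklist item is likely satisfied under the assumption stated above.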

PR Type:

  • New Feature
  • Bugfix
  • Documentation
  • Test

If you haven't finished some of the above items, you can still open a "Draft" PR.

folivoramanh and others added 19 commits October 29, 2025 11:44
* Add Vietnamese text normalization for cardinal semiotic class

Signed-off-by: folivoramanh <palasek182@gmail.com>

* Add missing init file

Signed-off-by: folivoramanh <palasek182@gmail.com>

* Fix Cardinal and optimize logic

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* Add Vietnamese text normalization for ordinal and decimal semiotic classes

Signed-off-by: folivoramanh <palasek182@gmail.com>

* update sparrowhawk

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor decimal code and docstring

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* Fraction class for Vietnamese TN

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove irrelevant test case

Signed-off-by: folivoramanh <palasek182@gmail.com>

* Remove irrelevant test case

Signed-off-by: folivoramanh <palasek182@gmail.com>

---------

Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* Date for Vietnamese TN

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add roman support and correct copyright header

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change header to current year

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change header time

Signed-off-by: folivoramanh <palasek182@gmail.com>

---------

Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* Time - semiotic class for Vietnamese TN

Signed-off-by: folivoramanh <palasek182@gmail.com>

* remove irrelevant import and comment

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add comment and refactor pattern

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Change the spaces to NEMO_SPACE for maintenance.

Signed-off-by: folivoramanh <palasek182@gmail.com>

* Change the spaces to NEMO_SPACE for maintenance.

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Change the spaces to NEMO_SPACE for maintenance. - remove quote

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* Add Vietnamese TN support for Money and Range semiotic classes

- Add money.py tagger and verbalizer for Vietnamese currency handling
- Add range.py tagger for numerical range processing
- Add supporting data files for money (currency, currency_minor, per_unit)
- Add quantity abbreviations and time units data
- Update existing taggers and verbalizers for integration
- Add comprehensive test cases for money and range functionality
- Update tokenize_and_classify to include new semiotic classes

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* modify illogical test cases

Signed-off-by: folivoramanh <palasek182@gmail.com>

* refactor and simplify word and punctuation to avoid hardcoding

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor code money range

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* Add Vietnamese measure text normalization support

- Added measure tagger and verbalizer for Vietnamese TN
- Updated money tagger and verbalizer to handle per-unit measurements
- Added test cases for measure normalization
- Updated fraction handling for better integration
- Added data files for measurements, prefixes, and per-unit bases

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: folivoramanh <palasek182@gmail.com>

* add test case for range measure

Signed-off-by: folivoramanh <palasek182@gmail.com>

* additional support for cardinal and remove duplicate test case

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor cardinal and add test cases

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove duplicate lines in run_eval file

Signed-off-by: folivoramanh <palasek182@gmail.com>

* refactor minor code

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add measure support for unit per unit cases

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* fix and add cases

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* Fix Jenkinsfile for CI

* Fix requirements for test

* Update paths and docker

* Fix docker name

* Fix click version

* Change path of grammars for sparrowhawk tests

* Update paths in sh_test.sh

* Update paths

* Revert paths

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* fix range and quote

Signed-off-by: folivoramanh <palasek182@gmail.com>

* fix quote in post process

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix quote and range

Signed-off-by: folivoramanh <palasek182@gmail.com>

---------

Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* improve numeric semiotic classes

Signed-off-by: folivoramanh <palasek182@gmail.com>

* Fix Jenkinsfile for CI (#325)

* Fix Jenkinsfile for CI

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix requirements for test

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update paths and docker

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix docker name

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix click version

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Change path of grammars for sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update paths in sh_test.sh

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Revert paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: folivoramanh <palasek182@gmail.com>

* revert old codes

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert not inherit

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* improve date time

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix pynini union instead of union operator

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* improve measure, telephone, electronic

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change union operator to pynini union

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <palasek182@gmail.com>
Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* Fix Jenkinsfile for CI (#325)

* Fix Jenkinsfile for CI

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix requirements for test

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update paths and docker

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix docker name

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix click version

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Change path of grammars for sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update paths in sh_test.sh

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Revert paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: folivoramanh <palasek182@gmail.com>

* PR: Add Vietnamese text normalization for cardinal semiotic class (#289)

* Add Vietnamese text normalization for cardinal semiotic class

Signed-off-by: folivoramanh <palasek182@gmail.com>

* Add missing init file

Signed-off-by: folivoramanh <palasek182@gmail.com>

* Fix Cardinal and optimize logic

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: folivoramanh <palasek182@gmail.com>

* Ordinal and Decimal for Vietnamese TN (#290)

* Add Vietnamese text normalization for ordinal and decimal semiotic classes

Signed-off-by: folivoramanh <palasek182@gmail.com>

* update sparrowhawk

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor decimal code and docstring

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: folivoramanh <palasek182@gmail.com>

* Vietnamese TN - Fraction (#296)

* Fraction class for Vietnamese TN

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove irrelevant test case

Signed-off-by: folivoramanh <palasek182@gmail.com>

* Remove irrelevant test case

Signed-off-by: folivoramanh <palasek182@gmail.com>

---------

Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: folivoramanh <palasek182@gmail.com>

* Date Semiotic Class for Vietnamese TN (#298)

* Date for Vietnamese TN

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add roman support and correct copyright header

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change header to current year

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change header time

Signed-off-by: folivoramanh <palasek182@gmail.com>

---------

Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: folivoramanh <palasek182@gmail.com>

* Time - semiotic class for Vietnamese TN  (#302)

* Time - semiotic class for Vietnamese TN

Signed-off-by: folivoramanh <palasek182@gmail.com>

* remove irrelevant import and comment

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add comment and refactor pattern

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Change the spaces to NEMO_SPACE for maintenance.

Signed-off-by: folivoramanh <palasek182@gmail.com>

* Change the spaces to NEMO_SPACE for maintenance.

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Change the spaces to NEMO_SPACE for maintenance. - remove quote

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: folivoramanh <palasek182@gmail.com>

* Add Vietnamese TN support for Money and Range semiotic classes (#304)

* Add Vietnamese TN support for Money and Range semiotic classes

- Add money.py tagger and verbalizer for Vietnamese currency handling
- Add range.py tagger for numerical range processing
- Add supporting data files for money (currency, currency_minor, per_unit)
- Add quantity abbreviations and time units data
- Update existing taggers and verbalizers for integration
- Add comprehensive test cases for money and range functionality
- Update tokenize_and_classify to include new semiotic classes

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* modify illogical test cases

Signed-off-by: folivoramanh <palasek182@gmail.com>

* refactor and simplify word and punctuation to avoid hardcoding

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor code money range

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: folivoramanh <palasek182@gmail.com>

* Add Vietnamese measure text normalization support (#307)

* Add Vietnamese measure text normalization support

- Added measure tagger and verbalizer for Vietnamese TN
- Updated money tagger and verbalizer to handle per-unit measurements
- Added test cases for measure normalization
- Updated fraction handling for better integration
- Added data files for measurements, prefixes, and per-unit bases

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: folivoramanh <palasek182@gmail.com>

* add test case for range measure

Signed-off-by: folivoramanh <palasek182@gmail.com>

* additional support for cardinal and remove duplicate test case

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor cardinal and add test cases

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove duplicate lines in run_eval file

Signed-off-by: folivoramanh <palasek182@gmail.com>

* refactor minor code

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add measure support for unit per unit cases

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: folivoramanh <palasek182@gmail.com>

* Vietnamese MRC 1.0 fix case (#312)

* fix and add cases

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: folivoramanh <palasek182@gmail.com>

* Fix word range (#334)

* fix range and quote

Signed-off-by: folivoramanh <palasek182@gmail.com>

* fix quote in post process

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix quote and range

Signed-off-by: folivoramanh <palasek182@gmail.com>

---------

Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: folivoramanh <palasek182@gmail.com>

* Date time itn (#333)

* improve numeric semiotic classes

Signed-off-by: folivoramanh <palasek182@gmail.com>

* Fix Jenkinsfile for CI (#325)

* Fix Jenkinsfile for CI

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix requirements for test

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update paths and docker

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix docker name

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix click version

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Change path of grammars for sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update paths in sh_test.sh

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Revert paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: folivoramanh <palasek182@gmail.com>

* revert old codes

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert not inherit

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* improve date time

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix pynini union instead of union operator

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* improve measure, telephone, electronic

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change union operator to pynini union

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <palasek182@gmail.com>
Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: folivoramanh <palasek182@gmail.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* fix bug with commas and electronics

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* update jenkins

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: folivoramanh <palasek182@gmail.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Only mount TestData from path

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
updates:
- [github.com/pre-commit/pre-commit-hooks: v5.0.0 → v6.0.0](pre-commit/pre-commit-hooks@v5.0.0...v6.0.0)
- [github.com/PyCQA/flake8: 7.2.0 → 7.3.0](PyCQA/flake8@7.2.0...7.3.0)
- [github.com/PyCQA/isort: 6.0.1 → 6.1.0](PyCQA/isort@6.0.1...6.1.0)
- https://github.com/psf/black → https://github.com/psf/black-pre-commit-mirror
- [github.com/psf/black-pre-commit-mirror: 25.1.0 → 25.9.0](psf/black-pre-commit-mirror@25.1.0...25.9.0)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: folivoramanh <palasek182@gmail.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
* PR: Add Vietnamese text normalization for cardinal semiotic class (#289)

* Add Vietnamese text normalization for cardinal semiotic class

Signed-off-by: folivoramanh <palasek182@gmail.com>

* Add missing init file

Signed-off-by: folivoramanh <palasek182@gmail.com>

* Fix Cardinal and optimize logic

Signed-off-by: folivoramanh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>

* Ordinal and Decimal for Vietnamese TN (#290)

* Add Vietnamese text normalization for ordinal and decimal semiotic classes

Signed-off-by: Mai Anh <palasek182@gmail.com>

* update sparrowhawk

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor decimal code and docstring

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mai Anh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>

* Vietnamese TN - Fraction (#296)

* Fraction class for Vietnamese TN

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove irrelevant test case

Signed-off-by: Mai Anh <palasek182@gmail.com>

* Remove irrelevant test case

Signed-off-by: Mai Anh <palasek182@gmail.com>

---------

Signed-off-by: Mai Anh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>

* Date Semiotic Class for Vietnamese TN (#298)

* Date for Vietnamese TN

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add roman support and correct copyright header

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change header to current year

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change header time

Signed-off-by: Mai Anh <palasek182@gmail.com>

---------

Signed-off-by: Mai Anh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>

* Time - semiotic class for Vietnamese TN  (#302)

* Time - semiotic class for Vietnamese TN

Signed-off-by: Mai Anh <palasek182@gmail.com>

* remove irrelevant import and comment

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add comment and refactor pattern

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Change the spaces to NEMO_SPACE for maintenance.

Signed-off-by: Mai Anh <palasek182@gmail.com>

* Change the spaces to NEMO_SPACE for maintenance.

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Change the spaces to NEMO_SPACE for maintenance. - remove quote

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mai Anh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>

* Add Vietnamese TN support for Money and Range semiotic classes (#304)

* Add Vietnamese TN support for Money and Range semiotic classes

- Add money.py tagger and verbalizer for Vietnamese currency handling
- Add range.py tagger for numerical range processing
- Add supporting data files for money (currency, currency_minor, per_unit)
- Add quantity abbreviations and time units data
- Update existing taggers and verbalizers for integration
- Add comprehensive test cases for money and range functionality
- Update tokenize_and_classify to include new semiotic classes

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* modify illogical test cases

Signed-off-by: Mai Anh <palasek182@gmail.com>

* refactor and simplify word and punctuation to avoid hardcoding

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor code money range

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mai Anh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>

* Add Vietnamese measure text normalization support (#307)

* Add Vietnamese measure text normalization support

- Added measure tagger and verbalizer for Vietnamese TN
- Updated money tagger and verbalizer to handle per-unit measurements
- Added test cases for measure normalization
- Updated fraction handling for better integration
- Added data files for measurements, prefixes, and per-unit bases

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Mai Anh <palasek182@gmail.com>

* add test case for range measure

Signed-off-by: Mai Anh <palasek182@gmail.com>

* additional support for cardinal and remove duplicate test case

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor cardinal and add test cases

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove duplicate lines in run_eval file

Signed-off-by: Mai Anh <palasek182@gmail.com>

* refactor minor code

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add measure support for unit per unit cases

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mai Anh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>

* Vietnamese MRC 1.0 fix case (#312)

* fix and add cases

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mai Anh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>

* Fix Jenkinsfile for CI (#325) (#327)

* Fix Jenkinsfile for CI

* Fix requirements for test

* Update paths and docker

* Fix docker name

* Fix click version

* Change path of grammars for sparrowhawk tests

* Update paths in sh_test.sh

* Update paths

* Revert paths

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>

* Fix word range (#334)

* fix range and quote

Signed-off-by: Mai Anh <palasek182@gmail.com>

* fix quote in post process

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix quote and range

Signed-off-by: Mai Anh <palasek182@gmail.com>

---------

Signed-off-by: Mai Anh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>

* Date time itn (#333)

* improve numeric semiotic classes

Signed-off-by: Mai Anh <palasek182@gmail.com>

* Fix Jenkinsfile for CI (#325)

* Fix Jenkinsfile for CI

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix requirements for test

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update paths and docker

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix docker name

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix click version

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Change path of grammars for sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update paths in sh_test.sh

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Revert paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Mai Anh <palasek182@gmail.com>

* revert old codes

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert not inherit

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* improve date time

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix pynini union instead of union operator

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* improve measure, telephone, electronic

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change union operator to pynini union

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mai Anh <palasek182@gmail.com>
Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>

* Staging vi tn signed off (#339)

* Fix Jenkinsfile for CI (#325)

* Fix Jenkinsfile for CI

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix requirements for test

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update paths and docker

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix docker name

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix click version

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Change path of grammars for sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update paths in sh_test.sh

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Revert paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>

* PR: Add Vietnamese text normalization for cardinal semiotic class (#289)

* Add Vietnamese text normalization for cardinal semiotic class

Signed-off-by: Mai Anh <palasek182@gmail.com>

* Add missing init file

Signed-off-by: Mai Anh <palasek182@gmail.com>

* Fix Cardinal and optimize logic

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mai Anh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>

* Ordinal and Decimal for Vietnamese TN (#290)

* Add Vietnamese text normalization for ordinal and decimal semiotic classes

Signed-off-by: Mai Anh <palasek182@gmail.com>

* update sparrowhawk

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor decimal code and docstring

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mai Anh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>

* Vietnamese TN - Fraction (#296)

* Fraction class for Vietnamese TN

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove irrelevant test case

Signed-off-by: Mai Anh <palasek182@gmail.com>

* Remove irrelevant test case

Signed-off-by: Mai Anh <palasek182@gmail.com>

---------

Signed-off-by: Mai Anh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>

* Date Semiotic Class for Vietnamese TN (#298)

* Date for Vietnamese TN

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add roman support and correct copyright header

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change header to current year

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change header time

Signed-off-by: Mai Anh <palasek182@gmail.com>

---------

Signed-off-by: Mai Anh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>

* Time - semiotic class for Vietnamese TN  (#302)

* Time - semiotic class for Vietnamese TN

Signed-off-by: Mai Anh <palasek182@gmail.com>

* remove irrelevant import and comment

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add comment and refactor pattern

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Change the spaces to NEMO_SPACE for maintenance.

Signed-off-by: Mai Anh <palasek182@gmail.com>

* Change the spaces to NEMO_SPACE for maintenance.

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Change the spaces to NEMO_SPACE for maintenance. - remove quote

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mai Anh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>

* Add Vietnamese TN support for Money and Range semiotic classes (#304)

* Add Vietnamese TN support for Money and Range semiotic classes

- Add money.py tagger and verbalizer for Vietnamese currency handling
- Add range.py tagger for numerical range processing
- Add supporting data files for money (currency, currency_minor, per_unit)
- Add quantity abbreviations and time units data
- Update existing taggers and verbalizers for integration
- Add comprehensive test cases for money and range functionality
- Update tokenize_and_classify to include new semiotic classes

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* modify illogical test cases

Signed-off-by: Mai Anh <palasek182@gmail.com>

* refactor and simplify word and punctuation to avoid hardcoding

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor code money range

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mai Anh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>

* Add Vietnamese measure text normalization support (#307)

* Add Vietnamese measure text normalization support

- Added measure tagger and verbalizer for Vietnamese TN
- Updated money tagger and verbalizer to handle per-unit measurements
- Added test cases for measure normalization
- Updated fraction handling for better integration
- Added data files for measurements, prefixes, and per-unit bases

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Mai Anh <palasek182@gmail.com>

* add test case for range measure

Signed-off-by: Mai Anh <palasek182@gmail.com>

* additional support for cardinal and remove duplicate test case

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* refactor cardinal and add test cases

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* remove duplicate lines in run_eval file

Signed-off-by: Mai Anh <palasek182@gmail.com>

* refactor minor code

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add measure support for unit per unit cases

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mai Anh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>

* Vietnamese MRC 1.0 fix case (#312)

* fix and add cases

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mai Anh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>

* Fix word range (#334)

* fix range and quote

Signed-off-by: Mai Anh <palasek182@gmail.com>

* fix quote in post process

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix quote and range

Signed-off-by: Mai Anh <palasek182@gmail.com>

---------

Signed-off-by: Mai Anh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>

* Date time itn (#333)

* improve numeric semiotic classes

Signed-off-by: Mai Anh <palasek182@gmail.com>

* Fix Jenkinsfile for CI (#325)

* Fix Jenkinsfile for CI

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix requirements for test

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update paths and docker

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix docker name

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Fix click version

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Change path of grammars for sparrowhawk tests

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update paths in sh_test.sh

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Update paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

* Revert paths

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Mai Anh <palasek182@gmail.com>

* revert old codes

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert not inherit

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* improve date time

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix pynini union instead of union operator

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* improve measure, telephone, electronic

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* change union operator to pynini union

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mai Anh <palasek182@gmail.com>
Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>

* Comma bugfix for En electronics (#332)

* fix bug with commas and electronics

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* update jenkins

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

---------

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>

* remove unused import (#340)

Signed-off-by: Mai Anh <palasek182@gmail.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>

* Update Jenkinsfile (#341)

Only mount TestData from path

Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] pre-commit suggestions (#335)

updates:
- [github.com/pre-commit/pre-commit-hooks: v5.0.0 → v6.0.0](pre-commit/pre-commit-hooks@v5.0.0...v6.0.0)
- [github.com/PyCQA/flake8: 7.2.0 → 7.3.0](PyCQA/flake8@7.2.0...7.3.0)
- [github.com/PyCQA/isort: 6.0.1 → 6.1.0](PyCQA/isort@6.0.1...6.1.0)
- https://github.com/psf/black → https://github.com/psf/black-pre-commit-mirror
- [github.com/psf/black-pre-commit-mirror: 25.1.0 → 25.9.0](psf/black-pre-commit-mirror@25.1.0...25.9.0)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>

* update jenkins cache

Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>

* fill missing lang in arg run (#347)

Signed-off-by: Mai Anh <palasek182@gmail.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>

---------

Signed-off-by: folivoramanh <palasek182@gmail.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>
Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com>
Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
mgrafu and others added 2 commits October 29, 2025 14:46
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
@tbartley94 tbartley94 left a comment

Needs loggers removed. Please refactor LOCS so there's less nesting. Please reuse redundant code to keep file size down.

graph = (
# Thousands pattern (e.g., "hai nghìn không ba" -> "2003")
graph_hundred_component = pynini.union(
pynini.union(graph_digit, graph_zero) + delete_space + pynutil.delete("trăm"), pynutil.insert("0")

Without weighting, you're going to get non-deterministic behavior where a 0 is just inserted here.
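
The concern can be illustrated without pynini. A minimal sketch in plain Python, assuming each candidate path through the grammar is modeled as an (output, weight) pair and the lowest total weight wins, as in shortest-path decoding. The path labels and the 0.1 penalty are hypothetical, not values from this PR:

```python
from typing import List, Tuple

# Toy stand-in for paths through a weighted grammar: (output label, total weight).
Path = Tuple[str, float]


def best_outputs(paths: List[Path]) -> List[str]:
    """Return every output tied for the lowest total weight."""
    lowest = min(weight for _, weight in paths)
    return sorted(out for out, weight in paths if weight == lowest)


# Both branches cost the same, so the winner is ambiguous.
unweighted = [("read digit", 0.0), ("insert 0", 0.0)]
# A small hypothetical penalty on the insertion branch breaks the tie.
weighted = [("read digit", 0.0), ("insert 0", 0.1)]

print(best_outputs(unweighted))  # ['insert 0', 'read digit']
print(best_outputs(weighted))    # ['read digit']
```

In the grammar itself, the analogous change would be to wrap the insertion branch in `pynutil.add_weight`, the same call this PR applies to `year_graph`.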

month_graph = _get_month_graph()

month_graph = pynutil.insert('month: "') + month_graph + pynutil.insert('"')
# Complete year graph with all supported patterns

Can you modularize this? It's hard to track all the different graphs with the nesting.

),
).optimize()

year_graph = pynutil.add_weight(year_graph_raw, YEAR_WEIGHT)

Do you need a dedicated weight, or can you just reuse a general one (a "min weight", for instance)?

def __init__(self):
    super().__init__(name="electronic", kind="classify")

    delete_extra_space = pynutil.delete(" ")

This should exist in the graph_utils module.

protocol = pynutil.insert('protocol: "') + protocol + pynutil.insert('"')
graph |= protocol
graph = pynini.union(graph, protocol)
########

Stray comment; please remove.

start_time = time.time()
cardinal = CardinalFst(deterministic=deterministic)
cardinal_graph = cardinal.fst
logger.debug(f"cardinal: {time.time() - start_time: .2f}s -- {cardinal_graph.num_states()} nodes")

Remove the loggers.

key_cardinal = pynutil.delete("key_cardinal: \"") + pynini.closure(NEMO_NOT_QUOTE) + pynutil.delete("\"")
integer = pynutil.delete("integer: \"") + pynini.closure(NEMO_NOT_QUOTE) + pynutil.delete("\"")

graph_with_key = key_cardinal + delete_space + pynutil.insert(" ") + integer

Use NEMO_SPACE here instead of the literal " ".

+ pynini.closure(NEMO_NOT_QUOTE, 1)
+ pynutil.delete("\"")
)
graph = graph @ pynini.cdrewrite(pynini.cross(u"\u00a0", " "), "", "", NEMO_SIGMA)

Use NEMO_SPACE here instead of the literal " ".
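
For reference, the effect of the `cdrewrite` line on a plain string can be sketched in Python. This assumes NEMO_SPACE is a single regular space; it is defined locally here as a stand-in for the graph_utils constant, and `normalize_spaces` is a hypothetical helper name, not part of the PR:

```python
# Local stand-in for the graph_utils constant (assumed to be one regular space).
NEMO_SPACE = " "
NO_BREAK_SPACE = "\u00a0"


def normalize_spaces(text: str) -> str:
    """Replace every non-breaking space with NEMO_SPACE."""
    return text.replace(NO_BREAK_SPACE, NEMO_SPACE)


print(normalize_spaces("5\u00a0km"))  # 5 km
```

Naming the constant keeps the replacement target in one place, which is the maintenance benefit the earlier "Change the spaces to NEMO_SPACE" commits were after.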

* Refactor Vietnamese ITN taggers: modularize date, add data files, improve naming

- Modularize date.py year components for better readability
- Add weights to prevent non-deterministic behavior in insert operations
- Remove redundant YEAR_WEIGHT constant (use inline weights)
- Create zero_prefix.tsv and digit_special.tsv data files
- Rename delete_extra_space to delete_single_space in electronic.py for clarity
- Add delete_single_space to graph_utils for reuse

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refactor Vietnamese: PSA follow

Signed-off-by: Mai Anh <palasek182@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Mai Anh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mai Anh <palasek182@gmail.com>
@tbartley94 tbartley94 left a comment

LGTM

@tbartley94 tbartley94 merged commit edd2288 into main Nov 5, 2025
6 checks passed

5 participants