
Transformer-based Text Normalization Models #2415

Merged
merged 56 commits into from
Jul 8, 2021

Commits on Jun 29, 2021

  1. Add notebook with recommendations for 8 kHz speech (#2326)

    * Added a notebook with best practices for telephony speech
    
    * Added dataset details
    
    * Added training recommendations
    
    * Emptied out cells with results
    
    * Added tutorial to docs
    
    Signed-off-by: jbalam <jbalam@nvidia.com>
    
    * Addressed review comments
    
    Signed-off-by: jbalam <jbalam@nvidia.com>
    
    * Added a line to note original sampling rate of an4
    
    Signed-off-by: jbalam <jbalam@nvidia.com>
    
    * Made changes suggested in review
    
    Signed-off-by: jbalam <jbalam@nvidia.com>
    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    jbalam-nv authored and Tuan Lai committed Jun 29, 2021
    Commit e60f018
  2. Add FastEmit support for RNNT Losses (#2374)

    * Temp commit
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Initial code for fastemit forward pass
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Correct return reg value
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Initial cpu impl
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Try gpu impl
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Try gpu impl
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Correct few impl
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Update fastemit scaling
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Cleanup fastemit
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Finalize FastEmit regularization PR
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Refactor code to support fastemit regularization
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    2 people authored and Tuan Lai committed Jun 29, 2021
    Commit b3c1b01
  3. Implement inference functions of TN models

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jun 29, 2021
    Commit 7461bc4
  4. Minor Fix

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jun 29, 2021
    Commit d11bfc8
  5. fix bugs in hifigan code (#2392)

    Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>
    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Oktai15 authored and Tuan Lai committed Jun 29, 2021
    Commit 21169a3
  6. Update setup.py (#2394)

    Signed-off-by: Jason <jasoli@nvidia.com>
    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    blisc authored and Tuan Lai committed Jun 29, 2021
    Commit f81dfc2
  7. update checkpointing (#2396)

    Signed-off-by: Jason <jasoli@nvidia.com>
    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    blisc authored and Tuan Lai committed Jun 29, 2021
    Commit 2db9c4b
  8. byt5 unicode implementation (#2365)

    * Audio Norm (#2285)
    
    * add jenkins test, refactoring
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * update test
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * fix new test
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * add serial to the default normalizer, add tests
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * manifest test added
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * expose more params, new test cases
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * fix jenkins, serial clean, exclude range from cardinal
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * jenkins
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * jenkins dollar sign format
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * jenkins
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * jenkins dollar sign format
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * addressed review comments
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * fix decimal in measure
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * move serial in cardinal
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * clean up
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * update for SH zero -> oh
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * change n_tagger default
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * bumping version to 1.0.1
    
    Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * Add check for numba regardless of device
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * upper bound for webdataset
    
    Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * Correct Dockerfile
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * update readmes
    
    Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * update README (#2332)
    
    Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * ddp translate GPU allocation fix (#2312)
    
    * fixed branch in IR tutorial
    
    Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>
    
    * ddp translate GPU allocation fix
    
    Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>
    
    * map_location instead of set_device
    
    Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>
    
    Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
    Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * Shallow fusion (#2315)
    
    * fixed branch in IR tutorial
    
    Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>
    
    * shallow fusion init commit
    
    Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>
    
    * debug info removed
    
    Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>
    
    Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
    Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * [BUGFIX] Add upper bound to hydra for 1.0.x (#2337)
    
    * upper bound hydra
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * upper bound hydra
    
    Signed-off-by: ericharper <complex451@gmail.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * update version number
    
    Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * update package version
    
    Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * sparrowhawk tests + punctuation post processing for pynini TN (#2320)
    
    * add jenkins test, refactoring
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * update test
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * fix new test
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * add serial to the default normalizer, add tests
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * manifest test added
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * expose more params, new test cases
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * fix jenkins, serial clean, exclude range from cardinal
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * jenkins
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * jenkins dollar sign format
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * jenkins
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * jenkins dollar sign format
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * addressed review comments
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * fix decimal in measure
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * move serial in cardinal
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * sh tests init
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * sparrowhawk container tests support added
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * add post process to normalize.py, update tests
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * remove duplication
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * Update notebooks to 1.0.2 release (#2338)
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * Update ranges for omegaconf and hydra (#2336)
    
    * Update ranges
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Updates for Hydra and OmegaConf updates
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Style fixes
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Correct tests and revert patch for model utils
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Correct docstring
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Revert unnecessary change
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Revert unnecessary change
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Guard scheduler for None
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * default to 0.0 if bpe_dropout is None
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * Correctly log class that was restored
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Root patch *bpe_dropout
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    Co-authored-by: ericharper <complex451@gmail.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * Update FastPitch Export (#2355)
    
    Signed-off-by: Jason <jasoli@nvidia.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * byt5 unicode implementation, first cut
    
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * add bytelevel tokenizer
    
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * update out_dir to not collide (#2358)
    
    Signed-off-by: ericharper <complex451@gmail.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * Update container version to 21.05 (#2309)
    
    * Update container version
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Temporarily change export format of waveglow
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Add conda update for numba
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Update numba compat via global flag for strictness level `--relax_numba_compat`, remove pytorchlightning.metrics, refactor out numba utils to core, update tests
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Correct order of numba minimum version, remove wrong flag from test
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Double test of cuda numba
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Double test of cuda numba
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Enable RNNT tests
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * Text Normalization Update (#2356)
    
    * upper cased date support
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * update whitelist, change roman weights
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * docstrings, space fix, init file
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * lgtm
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    
    * fraction with measure class
    
    Signed-off-by: ekmb <ebakhturina@nvidia.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * address comment
    
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * Add ASR CTC tutorial on fine-tuning on another language (#2346)
    
    * Add ASR CTC Language finetuning notebook
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Add to documentation
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Improve documentation
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Correct name of the dataset
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * Correct colab link to notebook (#2366)
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * sgdqa update data directories for testing (#2323)
    
    * sgdqa update data directories for testing
    
    Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
    
    * fix syntax
    
    Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
    
    * check if data dir exists
    
    Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
    
    * fix
    
    Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
    
    * adding pretrained model
    
    Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * Added documentation for export() (#2330)
    
    * Added export document
    
    Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
    
    * Addressed review comments
    
    Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
    
    Co-authored-by: Eric Harper <complex451@gmail.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * Update Citrinet model card info (#2369)
    
    * Update model card info
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Cleanup Docs
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * [NMT] Model Parallel Megatron Encoders (#2238)
    
    * add megatron encoder
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * added megatron to get_nmt_tokenizer
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * add vocab_size and hidden_size to megatron bert
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * add megatron encoder module
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * fixed horrible typo
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * fix typo and add default
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * updating nlp overrides for mp nmt
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * move some logic back to nlpmodel from overrides
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * add checkpoint_file property
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * fix property
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * num_tokentypes=0
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * typo
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * typo
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * find_unused_parameters=True
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * typo
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * style
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * get instead of pop
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * remove token type ids from megatron input example
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * pop vocab_size
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * fix checkpointing for model parallel
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * fix bug in non model parallel
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * convert cfg.trainer to dict
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * make num_tokentypes configurable for nmt
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * update checkpoint_file when using named megatron model in nemo
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * make vocab_file configurable
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * dataclass can't have mutable default
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * style
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * unused imports
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * revert input example
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * check that checkpoint version is not None
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * add mp jenkins test
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * update docstring
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * add docs for pretrained encoders with nemo nmt
    
    Signed-off-by: ericharper <complex451@gmail.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * Add notebook with recommendations for 8 kHz speech (#2326)
    
    * Added a notebook with best practices for telephony speech
    
    * Added dataset details
    
    * Added training recommendations
    
    * Emptied out cells with results
    
    * Added tutorial to docs
    
    Signed-off-by: jbalam <jbalam@nvidia.com>
    
    * Addressed review comments
    
    Signed-off-by: jbalam <jbalam@nvidia.com>
    
    * Added a line to note original sampling rate of an4
    
    Signed-off-by: jbalam <jbalam@nvidia.com>
    
    * Made changes suggested in review
    
    Signed-off-by: jbalam <jbalam@nvidia.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * Add FastEmit support for RNNT Losses (#2374)
    
    * Temp commit
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Initial code for fastemit forward pass
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Correct return reg value
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Initial cpu impl
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Try gpu impl
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Try gpu impl
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Correct few impl
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Update fastemit scaling
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Cleanup fastemit
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Finalize FastEmit regularization PR
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Refactor code to support fastemit regularization
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * byt5 unicode implementation, first cut
    
    Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * add bytelevel tokenizer
    
    Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * update styling
    
    Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * avoid circular import
    
    Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * fix bugs in hifigan code (#2392)
    
    Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * Update setup.py (#2394)
    
    Signed-off-by: Jason <jasoli@nvidia.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * Update bytelevel_tokenizer.py
    
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * Update bytelevel_tokenizer.py
    
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * typo
    
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * missed one
    
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * bug fixes
    
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * style fix
    
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * bytelevelprocessor is now generic.
    
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * style fix
    
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * update checkpointing (#2396)
    
    Signed-off-by: Jason <jasoli@nvidia.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * style
    
    Signed-off-by: ericharper <complex451@gmail.com>
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * Whoops, didn't merge Jenkinsfile the right way
    
    * add newline
    
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * undo changes to enja processor
    
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * processor selection decision fix
    
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    * newline fix
    
    Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
    
    Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
    Co-authored-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
    Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
    Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
    Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <grinchuk.alexey@gmail.com>
    Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
    Co-authored-by: Eric Harper <complex451@gmail.com>
    Co-authored-by: Jason <jasoli@nvidia.com>
    Co-authored-by: mchrzanowski <mchrzanowski@nvidia.com>
    Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
    Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
    Co-authored-by: Jagadeesh Balam <4916480+jbalam-nv@users.noreply.github.com>
    Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
    Co-authored-by: Oktai Tatanov <oktai.tatanov@gmail.com>
    Co-authored-by: root <root@dgx0026.nsv.rno1.nvmetal.net>
    Co-authored-by: root <root@dgx0079.nsv.rno1.nvmetal.net>
    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    17 people authored and Tuan Lai committed Jun 29, 2021
    Commit 3e76624
  9. Minor Fix

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jun 29, 2021
    Commit af49749
  10. Minor Fixes

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jun 29, 2021
    Commit 6fdc83f
  11. Add TextNormalizationTestDataset and testing/evaluation code

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jun 29, 2021
    Commit 02a1943
  12. Add TextNormalizationTaggerDataset and training code for tagger

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jun 29, 2021
    Commit 690dca3
  13. Restore from local nemo ckpts

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jun 29, 2021
    Commit 4eeb6be
  14. Add TextNormalizationDecoderDataset

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jun 29, 2021
    Commit d642381
  15. Add interactive mode for neural_text_normalization_test.py

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jun 29, 2021
    Commit e7f1a3f
  16. Add options to do training or not for tagger/decoder

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jun 29, 2021
    Commit 2775d62
  17. Renamed

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jun 29, 2021
    Commit cf13111
  18. Implemented setup dataloader for decoder

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jun 29, 2021
    Commit 85c1417
  19. Implemented training and validation for decoder

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jun 29, 2021
    Commit 7aeba27
  20. Data augmentation for decoder training

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jun 29, 2021
    Commit 7bfa8de
  21. Config change

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jun 29, 2021
    Commit 8e370f0
  22. add blossom-ci.yml (#2401)

    Signed-off-by: ericharper <complex451@gmail.com>
    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    ericharper authored and Tuan Lai committed Jun 29, 2021
    Commit 85f1e3c
  23. Merge r1.1 bugfixes into main (#2407)

    * Update notebook branch and Jenkinsfile for 1.1.0 testing (#2378)
    
    * update branch
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * update jenkinsfile
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (#2380)
    
    * fix property when not using model parallel
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * fix property when not using model parallel
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * add debug statement
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * add debug statement
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * instantiate with NLPDDPPlugin with num_nodes from trainer config
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * Update ASR scripts for tokenizer building and tarred dataset building (#2381)
    
    * Update ASR scripts for tokenizer building and tarred dataset building
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Update container
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Add STT Zh Citrinet 1024 Gamma 0.25 model
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Update notebook (#2391)
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * ASR Notebooks fix for 1.1.0 (#2395)
    
    * nb fix for spring clean
    
    Signed-off-by: fayejf <fayejf07@gmail.com>
    
    * remove outdated instruction
    
    Signed-off-by: fayejf <fayejf07@gmail.com>
    
    * Mean normalization (#2397)
    
    * norm embeddings
    
    Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
    
    * move to utils
    
    Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
    
    * Bugfix adaptive spec augment time masking (#2398)
    
    * bugfix adaptive spec augment
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Revert freq mask guard
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Revert freq mask guard
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Remove static time width clamping
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Correct typos and issues with notebooks (#2402)
    
    * Fix Primer notebook
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * Typo
    
    Signed-off-by: smajumdar <titu1994@gmail.com>
    
    * remove accelerator=DDP in tutorial notebooks to avoid errors. (#2403)
    
    Signed-off-by: Hoo Chang Shin <hshin@nvidia.com>
    
    Co-authored-by: Hoo Chang Shin <hshin@nvidia.com>
    
    * style
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * update jenkins branch
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    * update notebook branch to main
    
    Signed-off-by: ericharper <complex451@gmail.com>
    
    Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
    Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
    Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
    Co-authored-by: khcs <khcs@users.noreply.github.com>
    Co-authored-by: Hoo Chang Shin <hshin@nvidia.com>
    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    6 people authored and Tuan Lai committed Jun 29, 2021
    Commit 5d93237
  24. Remove unused imports

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jun 29, 2021
    Commit 42ad961
  25. Add initial doc for text_normalization

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jun 29, 2021
    b4ffb28
  26. Fixed imports warnings

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jun 29, 2021
    d83ccf7
  27. Minor Fix

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jun 29, 2021
    cff1dee
  28. Renamed

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jun 29, 2021
    1ab0583
  29. Allowed duplex modes

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jun 29, 2021
    617fa9f
  30. Minor Fix

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jun 29, 2021
    96e58b9
  31. Add docs for duplex_text_normalization_train and duplex_text_normalization_test
    
    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jun 29, 2021
    ec26b7e
  32. docstrings for model codes + minor fix

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jun 29, 2021
    9e3e825
  33. Add more comments and doc strings

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jun 29, 2021
    f5b566c
  34. 8859f1d

Commits on Jun 30, 2021

  1. Add doc for datasets + Use time.perf_counter()

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jun 30, 2021
    e0b036b
  2. Add code for preprocessing Google TN data

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jun 30, 2021
    780ed53
  3. Add more docs and comments + Minor Fixes

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jun 30, 2021
    8580b9c
  4. Add more licenses + Fixed comments + Minors

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jun 30, 2021
    8f38766

Commits on Jul 1, 2021

  1. Moved evaluation logic to DuplexTextNormalizationModel

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jul 1, 2021
    c68e1a3
  2. Add logging errors

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jul 1, 2021
    b106c02
  3. Updated validation code of tagger + Minors

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jul 1, 2021
    2cd3b43
  4. Also write tag preds to log file

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jul 1, 2021
    ab5b1b9
  5. Add data augmentation for tagger dataset

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jul 1, 2021
    fed32a8
  6. Added experimental decorators

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jul 1, 2021
    0eaee3a

Commits on Jul 2, 2021

  1. Updated docs

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jul 2, 2021
    c2def6c
  2. Updated duplex_tn_config.yaml

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jul 2, 2021
    6f47f2b
  3. Compute token precision of tagger using NeMo metrics

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jul 2, 2021
    5360a48

Commits on Jul 3, 2021

  1. Fixed saving issue when using ddp accelerator

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jul 3, 2021
    aafef49
  2. Refactoring

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jul 3, 2021
    fb2d7e7
  3. fcf921e

Commits on Jul 4, 2021

  1. Add option to keep punctuations in TextNormalizationTestDataset

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jul 4, 2021
    55f9db0
  2. 381318c

Commits on Jul 6, 2021

  1. Changes to input preprocessing + decoder's postprocessing

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jul 6, 2021
    722bb9e

Commits on Jul 7, 2021

  1. Fixed styles + Add references

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    root committed Jul 7, 2021
    f15dad2

Commits on Jul 8, 2021

  1. Renamed examples/nlp/duplex_text_normalization/utils.py to helpers.py

    Signed-off-by: Tuan Lai <tuanl@nvidia.com>
    Tuan Lai committed Jul 8, 2021
    a6e8cdc
  2. f7d390b