Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

byt5 unicode implementation #2365

Merged
merged 53 commits into from
Jun 24, 2021
Merged

byt5 unicode implementation #2365

merged 53 commits into from
Jun 24, 2021

Conversation

mchrzanowski
Copy link
Contributor

@mchrzanowski mchrzanowski commented Jun 15, 2021

This PR adds:

  • a straightforward implementation of unicode encoding & decoding from ByT5
  • a very bare-bones text processor for use with this tokenizer that can support any language.

Usage:

To enable this, add the following lines to your nemo launch script:

model.encoder_tokenizer.library='byte-level'
model.decoder_tokenizer.library='byte-level'

Things work as-is with tarred datasets, but you will need to re-create your datasets using the byte-level tokenizer class.

Currently, the following model, which is 260M params, gets within 1 BLEU (18.9) of my transformer-large run on wmt20 ja-en (19.8):

    model.encoder_tokenizer.library='byte-level' \
    model.decoder_tokenizer.library='byte-level' \
    model.encoder.max_sequence_length=1024 \
    model.encoder.hidden_size=1536 \
    model.encoder.inner_size=4096 \
    model.encoder.num_attention_heads=16 \
    model.encoder.attn_layer_dropout=0.1 \
    model.encoder.ffn_dropout=0.1 \
    model.encoder.num_layers=10 \
    model.encoder.pre_ln=true \
    model.decoder.max_sequence_length=1024 \
    model.decoder.hidden_size=1536 \
    model.decoder.inner_size=4096 \
    model.decoder.num_attention_heads=16 \
    model.decoder.attn_layer_dropout=0.1 \
    model.decoder.ffn_dropout=0.1 \
    model.decoder.num_layers=2 \
    model.decoder.pre_ln=true \
    model.src_language=ja \
    model.tgt_language=en \
    model.beam_size=3 \
    model.max_generation_delta=3 \
    model.label_smoothing=0.1 \
    model.optim.lr=4e-4 \
    +model.optim.sched.warmup_steps=35000 \
    ~model.optim.sched.warmup_ratio \

@mchrzanowski mchrzanowski marked this pull request as draft June 15, 2021 18:59
@lgtm-com
Copy link

lgtm-com bot commented Jun 15, 2021

This pull request introduces 4 alerts when merging 885fba5 into 08f3c65 - view on LGTM.com

new alerts:

  • 4 for Unused import

@ericharper ericharper self-requested a review June 18, 2021 17:07
@mchrzanowski mchrzanowski marked this pull request as ready for review June 23, 2021 01:09
@mchrzanowski mchrzanowski changed the title byt5 unicode implementation, first cut byt5 unicode implementation Jun 23, 2021
@lgtm-com
Copy link

lgtm-com bot commented Jun 23, 2021

This pull request introduces 4 alerts when merging 3c451c2 into b9944d3 - view on LGTM.com

new alerts:

  • 4 for Unused import

@lgtm-com
Copy link

lgtm-com bot commented Jun 23, 2021

This pull request introduces 4 alerts when merging a2c5067 into b9944d3 - view on LGTM.com

new alerts:

  • 4 for Unused import

@lgtm-com
Copy link

lgtm-com bot commented Jun 23, 2021

This pull request introduces 4 alerts when merging 2bd22ac into b9944d3 - view on LGTM.com

new alerts:

  • 4 for Unused import

@ericharper
Copy link
Collaborator

Could you add sample usage to the PR readme?

Do you have any results from experiments comparing this tokenizer to yttm?

Have you used it with tarred datasets?

@lgtm-com
Copy link

lgtm-com bot commented Jun 23, 2021

This pull request introduces 4 alerts when merging f208c69 into 8c64f47 - view on LGTM.com

new alerts:

  • 4 for Unused import

@lgtm-com
Copy link

lgtm-com bot commented Jun 23, 2021

This pull request introduces 4 alerts when merging e015a8c into 39f76f8 - view on LGTM.com

new alerts:

  • 4 for Unused import

Copy link
Member

@okuchaiev okuchaiev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me. One question on special tokens, please see below.

nemo/collections/common/tokenizers/bytelevel_tokenizer.py Outdated Show resolved Hide resolved
nemo/collections/common/tokenizers/bytelevel_tokenizer.py Outdated Show resolved Hide resolved
@lgtm-com
Copy link

lgtm-com bot commented Jun 23, 2021

This pull request introduces 4 alerts when merging 290292d into 39f76f8 - view on LGTM.com

new alerts:

  • 4 for Unused import

@lgtm-com
Copy link

lgtm-com bot commented Jun 23, 2021

This pull request introduces 4 alerts when merging 6309963 into 39f76f8 - view on LGTM.com

new alerts:

  • 4 for Unused import

@lgtm-com
Copy link

lgtm-com bot commented Jun 23, 2021

This pull request introduces 4 alerts when merging d75e8dd into 39f76f8 - view on LGTM.com

new alerts:

  • 4 for Unused import

@lgtm-com
Copy link

lgtm-com bot commented Jun 23, 2021

This pull request introduces 4 alerts when merging f3b44f9 into 39f76f8 - view on LGTM.com

new alerts:

  • 4 for Unused import

@mchrzanowski
Copy link
Contributor Author

mchrzanowski commented Jun 23, 2021

Could you add sample usage to the PR readme?

can do!

Do you have any results from experiments comparing this tokenizer to yttm?

we are within 1 bleu of transformer-large on wmt20 ja-en. the relevant part of my launch config has been included in the readme

Have you used it with tarred datasets?

yep!

@mchrzanowski
Copy link
Contributor Author

i think i've addressed all comments? everyone happy?

@lgtm-com
Copy link

lgtm-com bot commented Jun 23, 2021

This pull request introduces 4 alerts when merging a4b4207 into 39f76f8 - view on LGTM.com

new alerts:

  • 4 for Unused import

Copy link
Collaborator

@ericharper ericharper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for the updates!

Copy link
Collaborator

@ericharper ericharper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, found a couple things to change. Could you see the new comments?

root and others added 4 commits June 23, 2021 15:59
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
mchrzanowski and others added 5 commits June 23, 2021 16:03
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
@lgtm-com
Copy link

lgtm-com bot commented Jun 23, 2021

This pull request introduces 2 alerts when merging 0ce62c0 into 4fc7444 - view on LGTM.com

new alerts:

  • 2 for Unused import

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>
Copy link
Contributor

@MaximumEntropy MaximumEntropy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making all of the changes :)

@lgtm-com
Copy link

lgtm-com bot commented Jun 23, 2021

This pull request introduces 2 alerts when merging 7d005a2 into 4fc7444 - view on LGTM.com

new alerts:

  • 2 for Unused import

"""

def detokenize(self, tokens: List[str]) -> str:
return ' '.join(tokens)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small question, why are we adding whitespace while detokenizing while we are simply returning the text at the time of tokenization?

@mchrzanowski
Copy link
Contributor Author

mchrzanowski commented Jun 24, 2021 via email

@ericharper ericharper merged commit 70987d1 into main Jun 24, 2021
@ericharper ericharper deleted the byt5 branch June 24, 2021 02:05
@mchrzanowski
Copy link
Contributor Author

mchrzanowski commented Jun 24, 2021 via email

@mchrzanowski mchrzanowski restored the byt5 branch June 25, 2021 13:57
yzhang123 added a commit that referenced this pull request Jul 8, 2021
* Add notebook with recommendations for 8 kHz speech (#2326)

* Added a notebook with best practices for telephony speech

* Added datasets detaiils

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <jbalam@nvidia.com>

* Addressed review comments

Signed-off-by: jbalam <jbalam@nvidia.com>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <jbalam@nvidia.com>

* Made changes suggested in review

Signed-off-by: jbalam <jbalam@nvidia.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add FastEmit support for RNNT Losses (#2374)

* Temp commit

Signed-off-by: smajumdar <titu1994@gmail.com>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct return reg value

Signed-off-by: smajumdar <titu1994@gmail.com>

* Initial cpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Try gpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Try gpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct few impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update fastemit scaling

Signed-off-by: smajumdar <titu1994@gmail.com>

* Cleanup fastemit

Signed-off-by: smajumdar <titu1994@gmail.com>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <titu1994@gmail.com>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <titu1994@gmail.com>

Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Implement inference functions of TN models

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Minor Fix

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* fix bugs in hifigan code (#2392)

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Update setup.py (#2394)

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* update checkpointing (#2396)

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* byt5 unicode implementation (#2365)

* Audio Norm (#2285)

* add jenkins test, refactoring

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix new test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* manifest test added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* expose more params, new test cases

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dollar sign format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dollar sign format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* addressed review comments

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix decimal in measure

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* move serial in cardinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update for SH zero -> oh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* change n_tagger default

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* bumping version to 1.0.1

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Add check for numba regardless of device

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* upper bound for webdataset

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Correct Dockerfile

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update readmes

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update README (#2332)

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* ddp translate GPU allocation fix (#2312)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* ddp translate GPU allocation fix

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* map_location instead of set_device

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Shallow fusion (#2315)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* shallow fusion init commit

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* debug info removed

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* [BUGFIX] Add upper bound to hydra for 1.0.x (#2337)

* upper bound hydra

Signed-off-by: ericharper <complex451@gmail.com>

* upper bound hydra

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update version number

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update package version

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* sparrowhawk tests + punctuation post processing for pynini TN (#2320)

* add jenkins test, refactoring

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix new test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* manifest test added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* expose more params, new test cases

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dollar sign format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dollar sign format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* addressed review comments

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix decimal in measure

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* move serial in cardinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* sh tests init

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* sparrowhawk container tests support added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add post process to normalize.py, update tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove duplication

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update notebooks to 1.0.2 release (#2338)

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update ranges for omegaconf and hydra (#2336)

* Update ranges

Signed-off-by: smajumdar <titu1994@gmail.com>

* Updates for Hydra and OmegaConf updates

Signed-off-by: smajumdar <titu1994@gmail.com>

* Style fixes

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct tests and revert patch for model utils

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct docstring

Signed-off-by: smajumdar <titu1994@gmail.com>

* Revert unnecessary change

Signed-off-by: smajumdar <titu1994@gmail.com>

* Revert unnecessary change

Signed-off-by: smajumdar <titu1994@gmail.com>

* Guard scheduler for None

Signed-off-by: smajumdar <titu1994@gmail.com>

* default to 0.0 if bpe_dropout is None

Signed-off-by: ericharper <complex451@gmail.com>

* Correctly log class that was restored

Signed-off-by: smajumdar <titu1994@gmail.com>

* Root patch *bpe_dropout

Signed-off-by: smajumdar <titu1994@gmail.com>

Co-authored-by: ericharper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update FastPitch Export (#2355)

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* byt5 unicode implementation, first cut

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* add bytelevel tokenizer

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update out_dir to not collide (#2358)

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update container version to 21.05 (#2309)

* Update container version

Signed-off-by: smajumdar <titu1994@gmail.com>

* Temporarily change export format of waveglow

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add conda update for numba

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update numba compat via global flag for strictness level `--relax_numba_compat`, remove pytorchlightning.metrics, refactor out numba utils to core, update tests

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct order of numba minimum verion, remove wrong flag from test

Signed-off-by: smajumdar <titu1994@gmail.com>

* Double test of cuda numba

Signed-off-by: smajumdar <titu1994@gmail.com>

* Double test of cuda numba

Signed-off-by: smajumdar <titu1994@gmail.com>

* Enable RNNT tests

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Text Normalization Update (#2356)

* upper cased date support

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update whitelist, change roman weights

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* docstrings, space fix, init file

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* lgtm

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fraction with measure class

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* address comment

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Add ASR CTC tutorial on fine-tuning on another language (#2346)

* Add ASR CTC Language finetuning notebook

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add to documentation

Signed-off-by: smajumdar <titu1994@gmail.com>

* Improve documentation

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct name of the dataset

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Correct colab link to notebook (#2366)

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* sgdqa update data directories for testing (#2323)

* sgdqa update data directories for testing

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix syntax

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* check if data dir exists

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding pretrained model

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Added documentation for export() (#2330)

* Added export document

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Addressed review comments

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update Citrinet model card info (#2369)

* Update model card info

Signed-off-by: smajumdar <titu1994@gmail.com>

* Cleanup Docs

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* [NMT] Model Parallel Megatron Encoders (#2238)

* add megatron encoder

Signed-off-by: ericharper <complex451@gmail.com>

* added megatron to get_nmt_tokenizer

Signed-off-by: ericharper <complex451@gmail.com>

* add vocab_size and hidden_size to megatron bert

Signed-off-by: ericharper <complex451@gmail.com>

* add megatron encoder module

Signed-off-by: ericharper <complex451@gmail.com>

* fixed horrible typo

Signed-off-by: ericharper <complex451@gmail.com>

* fix typo and add default

Signed-off-by: ericharper <complex451@gmail.com>

* updating nlp overrides for mp nmt

Signed-off-by: ericharper <complex451@gmail.com>

* move some logic back to nlpmodel from overrides

Signed-off-by: ericharper <complex451@gmail.com>

* add checkpoint_file property

Signed-off-by: ericharper <complex451@gmail.com>

* fix property

Signed-off-by: ericharper <complex451@gmail.com>

* num_tokentypes=0

Signed-off-by: ericharper <complex451@gmail.com>

* typo

Signed-off-by: ericharper <complex451@gmail.com>

* typo

Signed-off-by: ericharper <complex451@gmail.com>

* find_unused_parameters=True

Signed-off-by: ericharper <complex451@gmail.com>

* typo

Signed-off-by: ericharper <complex451@gmail.com>

* style

Signed-off-by: ericharper <complex451@gmail.com>

* get instead of pop

Signed-off-by: ericharper <complex451@gmail.com>

* remove token type ids from megatron input example

Signed-off-by: ericharper <complex451@gmail.com>

* pop vocab_size

Signed-off-by: ericharper <complex451@gmail.com>

* fix checkpointing for model parallel

Signed-off-by: ericharper <complex451@gmail.com>

* fix bug in non model parallel

Signed-off-by: ericharper <complex451@gmail.com>

* convert cfg.trainer to dict

Signed-off-by: ericharper <complex451@gmail.com>

* make num_tokentypes configurable for nmt

Signed-off-by: ericharper <complex451@gmail.com>

* update checkpoint_file when using named megatron model in nemo

Signed-off-by: ericharper <complex451@gmail.com>

* make vocab_file configurable

Signed-off-by: ericharper <complex451@gmail.com>

* dataclass can't have mutable default

Signed-off-by: ericharper <complex451@gmail.com>

* style

Signed-off-by: ericharper <complex451@gmail.com>

* unused imports

Signed-off-by: ericharper <complex451@gmail.com>

* revert input example

Signed-off-by: ericharper <complex451@gmail.com>

* check that checkpoint version is not None

Signed-off-by: ericharper <complex451@gmail.com>

* add mp jenkins test

Signed-off-by: ericharper <complex451@gmail.com>

* update docstring

Signed-off-by: ericharper <complex451@gmail.com>

* add docs for pretrained encoders with nemo nmt

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Add notebook with recommendations for 8 kHz speech (#2326)

* Added a notebook with best practices for telephony speech

* Added datasets detaiils

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <jbalam@nvidia.com>

* Addressed review comments

Signed-off-by: jbalam <jbalam@nvidia.com>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <jbalam@nvidia.com>

* Made changes suggested in review

Signed-off-by: jbalam <jbalam@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Add FastEmit support for RNNT Losses (#2374)

* Temp commit

Signed-off-by: smajumdar <titu1994@gmail.com>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct return reg value

Signed-off-by: smajumdar <titu1994@gmail.com>

* Initial cpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Try gpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Try gpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct few impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update fastemit scaling

Signed-off-by: smajumdar <titu1994@gmail.com>

* Cleanup fastemit

Signed-off-by: smajumdar <titu1994@gmail.com>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <titu1994@gmail.com>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <titu1994@gmail.com>

Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* byt5 unicode implementation, first cut

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* add bytelevel tokenizer

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update styling

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* avoid circular import

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* fix bugs in hifigan code (#2392)

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update setup.py (#2394)

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* typo

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* missed one

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* bug fixes

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* style fix

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* bytelevelprocessor is now generic.

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* style fix

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update checkpointing (#2396)

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* style

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* woops, didnt merge jenkinsfile the right way

* add newline

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* undo changes to enja processor

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* processor selection decision fix

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* newline fix

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <grinchuk.alexey@gmail.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Jason <jasoli@nvidia.com>
Co-authored-by: mchrzanowski <mchrzanowski@nvidia.com>
Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
Co-authored-by: Jagadeesh Balam <4916480+jbalam-nv@users.noreply.github.com>
Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
Co-authored-by: Oktai Tatanov <oktai.tatanov@gmail.com>
Co-authored-by: root <root@dgx0026.nsv.rno1.nvmetal.net>
Co-authored-by: root <root@dgx0079.nsv.rno1.nvmetal.net>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Minor Fix

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Minor Fixes

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add TextNormalizationTestDataset and testing/evaluation code

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add TextNormalizationTaggerDataset and training code for tagger

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Restore from local nemo ckpts

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add TextNormalizationDecoderDataset

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add interactive mode for neural_text_normalization_test.py

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add options to do training or not for tagger/decoder

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Renamed

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Implemented setup dataloader for decoder

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Implemented training and validation for decoder

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Data augmentation for decoder training

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Config change

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* add blossom-ci.yml (#2401)

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Merge r1.1 bugfixes into main (#2407)

* Update notebook branch and Jenkinsfile for 1.1.0 testing (#2378)

* update branch

Signed-off-by: ericharper <complex451@gmail.com>

* update jenkinsfile

Signed-off-by: ericharper <complex451@gmail.com>

* [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (#2380)

* fix property when not using model parallel

Signed-off-by: ericharper <complex451@gmail.com>

* fix property when not using model parallel

Signed-off-by: ericharper <complex451@gmail.com>

* add debug statement

Signed-off-by: ericharper <complex451@gmail.com>

* add debug statement

Signed-off-by: ericharper <complex451@gmail.com>

* instantiate with NLPDDPPlugin with num_nodes from trainer config

Signed-off-by: ericharper <complex451@gmail.com>

* Update ASR scripts for tokenizer building and tarred dataset building (#2381)

* Update ASR scripts for tokenizer building and tarred dataset building

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update container

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add STT Zh Citrinet 1024 Gamma 0.25 model

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update notebook (#2391)

Signed-off-by: smajumdar <titu1994@gmail.com>

* ASR Notebooks fix for 1.1.0 (#2395)

* nb fix for spring clean

Signed-off-by: fayejf <fayejf07@gmail.com>

* remove outdated instruction

Signed-off-by: fayejf <fayejf07@gmail.com>

* Mean normalization (#2397)

* norm embeddings

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* move to utils

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Bugfix adaptive spec augment time masking (#2398)

* bugfix adaptive spec augment

Signed-off-by: smajumdar <titu1994@gmail.com>

* Revert freq mask guard

Signed-off-by: smajumdar <titu1994@gmail.com>

* Revert freq mask guard

Signed-off-by: smajumdar <titu1994@gmail.com>

* Remove static time width clamping

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct typos and issues with notebooks (#2402)

* Fix Primer notebook

Signed-off-by: smajumdar <titu1994@gmail.com>

* Typo

Signed-off-by: smajumdar <titu1994@gmail.com>

* remove accelerator=DDP in tutorial notebooks to avoid errors. (#2403)

Signed-off-by: Hoo Chang Shin <hshin@nvidia.com>

Co-authored-by: Hoo Chang Shin <hshin@nvidia.com>

* style

Signed-off-by: ericharper <complex451@gmail.com>

* update jenkins branch

Signed-off-by: ericharper <complex451@gmail.com>

* update notebook branch to main

Signed-off-by: ericharper <complex451@gmail.com>

Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: khcs <khcs@users.noreply.github.com>
Co-authored-by: Hoo Chang Shin <hshin@nvidia.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Remove unused imports

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add initial doc for text_normalization

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Fixed imports warnings

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Minor Fix

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Renamed

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Allowed duplex modes

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Minor Fix

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add docs for duplex_text_normalization_train and duplex_text_normalization_test

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* docstrings for model codes + minor fix

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add more comments and doc strings

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add doc for datasets + Use time.perf_counter()
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add code for preprocessing Google TN data
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add more docs and comments + Minor Fixes
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add more licenses + Fixed comments + Minors
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Moved evaluation logic to DuplexTextNormalizationModel
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add logging errors
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Updated validation code of tagger + Minors
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Also write tag preds to log file
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add data augmentation for tagger dataset
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Added experimental decorators
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Updated docs
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Updated duplex_tn_config.yaml
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Compute token precision of tagger using NeMo metrics
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Fixed saving issue when using ddp accelerator
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Refactoring
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add option to keep punctuations in TextNormalizationTestDataset
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Changes to input preprocessing + decoder's postprocessing
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Fixed styles + Add references
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Renamed examples/nlp/duplex_text_normalization/utils.py to helpers.py
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

Co-authored-by: Jagadeesh Balam <4916480+jbalam-nv@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
Co-authored-by: Oktai Tatanov <oktai.tatanov@gmail.com>
Co-authored-by: Jason <jasoli@nvidia.com>
Co-authored-by: Mike Chrzanowski <mike.chrzanowski0@gmail.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <grinchuk.alexey@gmail.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: mchrzanowski <mchrzanowski@nvidia.com>
Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
Co-authored-by: root <root@dgx0026.nsv.rno1.nvmetal.net>
Co-authored-by: root <root@dgx0079.nsv.rno1.nvmetal.net>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: khcs <khcs@users.noreply.github.com>
Co-authored-by: Hoo Chang Shin <hshin@nvidia.com>
mousebaiker pushed a commit to mousebaiker/NeMo that referenced this pull request Jul 8, 2021
* Audio Norm (NVIDIA#2285)

* add jenkins test, refactoring

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix new test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* manifest test added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* expose more params, new test cases

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dollar sign format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dollar sign format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* addressed review comments

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix decimal in measure

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* move serial in cardinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update for SH zero -> oh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* change n_tagger default

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* bumping version to 1.0.1

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Add check for numba regardless of device

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* upper bound for webdataset

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Correct Dockerfile

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update readmes

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update README (NVIDIA#2332)

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* ddp translate GPU allocation fix (NVIDIA#2312)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* ddp translate GPU allocation fix

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* map_location instead of set_device

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Shallow fusion (NVIDIA#2315)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* shallow fusion init commit

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* debug info removed

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* [BUGFIX] Add upper bound to hydra for 1.0.x (NVIDIA#2337)

* upper bound hydra

Signed-off-by: ericharper <complex451@gmail.com>

* upper bound hydra

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update version number

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update package version

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* sparrowhawk tests + punctuation post processing for pynini TN (NVIDIA#2320)

* add jenkins test, refactoring

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix new test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* manifest test added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* expose more params, new test cases

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dollar sign format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dollar sign format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* addressed review comments

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix decimal in measure

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* move serial in cardinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* sh tests init

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* sparrowhawk container tests support added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add post process to normalize.py, update tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove duplication

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update notebooks to 1.0.2 release (NVIDIA#2338)

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update ranges for omegaconf and hydra (NVIDIA#2336)

* Update ranges

Signed-off-by: smajumdar <titu1994@gmail.com>

* Updates for Hydra and OmegaConf updates

Signed-off-by: smajumdar <titu1994@gmail.com>

* Style fixes

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct tests and revert patch for model utils

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct docstring

Signed-off-by: smajumdar <titu1994@gmail.com>

* Revert unnecessary change

Signed-off-by: smajumdar <titu1994@gmail.com>

* Revert unnecessary change

Signed-off-by: smajumdar <titu1994@gmail.com>

* Guard scheduler for None

Signed-off-by: smajumdar <titu1994@gmail.com>

* default to 0.0 if bpe_dropout is None

Signed-off-by: ericharper <complex451@gmail.com>

* Correctly log class that was restored

Signed-off-by: smajumdar <titu1994@gmail.com>

* Root patch *bpe_dropout

Signed-off-by: smajumdar <titu1994@gmail.com>

Co-authored-by: ericharper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update FastPitch Export (NVIDIA#2355)

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* byt5 unicode implementation, first cut

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* add bytelevel tokenizer

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update out_dir to not collide (NVIDIA#2358)

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update container version to 21.05 (NVIDIA#2309)

* Update container version

Signed-off-by: smajumdar <titu1994@gmail.com>

* Temporarily change export format of waveglow

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add conda update for numba

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update numba compat via global flag for strictness level `--relax_numba_compat`, remove pytorchlightning.metrics, refactor out numba utils to core, update tests

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct order of numba minimum verion, remove wrong flag from test

Signed-off-by: smajumdar <titu1994@gmail.com>

* Double test of cuda numba

Signed-off-by: smajumdar <titu1994@gmail.com>

* Double test of cuda numba

Signed-off-by: smajumdar <titu1994@gmail.com>

* Enable RNNT tests

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Text Normalization Update (NVIDIA#2356)

* upper cased date support

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update whitelist, change roman weights

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* docstrings, space fix, init file

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* lgtm

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fraction with measure class

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* address comment

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Add ASR CTC tutorial on fine-tuning on another language (NVIDIA#2346)

* Add ASR CTC Language finetuning notebook

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add to documentation

Signed-off-by: smajumdar <titu1994@gmail.com>

* Improve documentation

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct name of the dataset

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Correct colab link to notebook (NVIDIA#2366)

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* sgdqa update data directories for testing (NVIDIA#2323)

* sgdqa update data directories for testing

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix syntax

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* check if data dir exists

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding pretrained model

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Added documentation for export() (NVIDIA#2330)

* Added export document

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Addressed review comments

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update Citrinet model card info (NVIDIA#2369)

* Update model card info

Signed-off-by: smajumdar <titu1994@gmail.com>

* Cleanup Docs

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* [NMT] Model Parallel Megatron Encoders (NVIDIA#2238)

* add megatron encoder

Signed-off-by: ericharper <complex451@gmail.com>

* added megatron to get_nmt_tokenizer

Signed-off-by: ericharper <complex451@gmail.com>

* add vocab_size and hidden_size to megatron bert

Signed-off-by: ericharper <complex451@gmail.com>

* add megatron encoder module

Signed-off-by: ericharper <complex451@gmail.com>

* fixed horrible typo

Signed-off-by: ericharper <complex451@gmail.com>

* fix typo and add default

Signed-off-by: ericharper <complex451@gmail.com>

* updating nlp overrides for mp nmt

Signed-off-by: ericharper <complex451@gmail.com>

* move some logic back to nlpmodel from overrides

Signed-off-by: ericharper <complex451@gmail.com>

* add checkpoint_file property

Signed-off-by: ericharper <complex451@gmail.com>

* fix property

Signed-off-by: ericharper <complex451@gmail.com>

* num_tokentypes=0

Signed-off-by: ericharper <complex451@gmail.com>

* typo

Signed-off-by: ericharper <complex451@gmail.com>

* typo

Signed-off-by: ericharper <complex451@gmail.com>

* find_unused_parameters=True

Signed-off-by: ericharper <complex451@gmail.com>

* typo

Signed-off-by: ericharper <complex451@gmail.com>

* style

Signed-off-by: ericharper <complex451@gmail.com>

* get instead of pop

Signed-off-by: ericharper <complex451@gmail.com>

* remove token type ids from megatron input example

Signed-off-by: ericharper <complex451@gmail.com>

* pop vocab_size

Signed-off-by: ericharper <complex451@gmail.com>

* fix checkpointing for model parallel

Signed-off-by: ericharper <complex451@gmail.com>

* fix bug in non model parallel

Signed-off-by: ericharper <complex451@gmail.com>

* convert cfg.trainer to dict

Signed-off-by: ericharper <complex451@gmail.com>

* make num_tokentypes configurable for nmt

Signed-off-by: ericharper <complex451@gmail.com>

* update checkpoint_file when using named megatron model in nemo

Signed-off-by: ericharper <complex451@gmail.com>

* make vocab_file configurable

Signed-off-by: ericharper <complex451@gmail.com>

* dataclass can't have mutable default

Signed-off-by: ericharper <complex451@gmail.com>

* style

Signed-off-by: ericharper <complex451@gmail.com>

* unused imports

Signed-off-by: ericharper <complex451@gmail.com>

* revert input example

Signed-off-by: ericharper <complex451@gmail.com>

* check that checkpoint version is not None

Signed-off-by: ericharper <complex451@gmail.com>

* add mp jenkins test

Signed-off-by: ericharper <complex451@gmail.com>

* update docstring

Signed-off-by: ericharper <complex451@gmail.com>

* add docs for pretrained encoders with nemo nmt

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Add notebook with recommendations for 8 kHz speech (NVIDIA#2326)

* Added a notebook with best practices for telephony speech

* Added datasets detaiils

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <jbalam@nvidia.com>

* Addressed review comments

Signed-off-by: jbalam <jbalam@nvidia.com>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <jbalam@nvidia.com>

* Made changes suggested in review

Signed-off-by: jbalam <jbalam@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Add FastEmit support for RNNT Losses (NVIDIA#2374)

* Temp commit

Signed-off-by: smajumdar <titu1994@gmail.com>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct return reg value

Signed-off-by: smajumdar <titu1994@gmail.com>

* Initial cpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Try gpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Try gpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct few impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update fastemit scaling

Signed-off-by: smajumdar <titu1994@gmail.com>

* Cleanup fastemit

Signed-off-by: smajumdar <titu1994@gmail.com>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <titu1994@gmail.com>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <titu1994@gmail.com>

Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* byt5 unicode implementation, first cut

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* add bytelevel tokenizer

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update styling

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* avoid circular import

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* fix bugs in hifigan code (NVIDIA#2392)

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update setup.py (NVIDIA#2394)

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* typo

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* missed one

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* bug fixes

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* style fix

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* bytelevelprocessor is now generic.

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* style fix

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update checkpointing (NVIDIA#2396)

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* style

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* woops, didnt merge jenkinsfile the right way

* add newline

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* undo changes to enja processor

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* processor selection decision fix

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* newline fix

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <grinchuk.alexey@gmail.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Jason <jasoli@nvidia.com>
Co-authored-by: mchrzanowski <mchrzanowski@nvidia.com>
Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
Co-authored-by: Jagadeesh Balam <4916480+jbalam-nv@users.noreply.github.com>
Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
Co-authored-by: Oktai Tatanov <oktai.tatanov@gmail.com>
Co-authored-by: root <root@dgx0026.nsv.rno1.nvmetal.net>
Co-authored-by: root <root@dgx0079.nsv.rno1.nvmetal.net>
pasandi20 pushed a commit to pasandi20/NeMo that referenced this pull request Jul 13, 2021
* Add notebook with recommendations for 8 kHz speech (NVIDIA#2326)

* Added a notebook with best practices for telephony speech

* Added datasets detaiils

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <jbalam@nvidia.com>

* Addressed review comments

Signed-off-by: jbalam <jbalam@nvidia.com>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <jbalam@nvidia.com>

* Made changes suggested in review

Signed-off-by: jbalam <jbalam@nvidia.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add FastEmit support for RNNT Losses (NVIDIA#2374)

* Temp commit

Signed-off-by: smajumdar <titu1994@gmail.com>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct return reg value

Signed-off-by: smajumdar <titu1994@gmail.com>

* Initial cpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Try gpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Try gpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct few impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update fastemit scaling

Signed-off-by: smajumdar <titu1994@gmail.com>

* Cleanup fastemit

Signed-off-by: smajumdar <titu1994@gmail.com>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <titu1994@gmail.com>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <titu1994@gmail.com>

Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Implement inference functions of TN models

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Minor Fix

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* fix bugs in hifigan code (NVIDIA#2392)

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Update setup.py (NVIDIA#2394)

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* update checkpointing (NVIDIA#2396)

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* byt5 unicode implementation (NVIDIA#2365)

* Audio Norm (NVIDIA#2285)

* add jenkins test, refactoring

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix new test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* manifest test added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* expose more params, new test cases

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dollar sign format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dollar sign format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* addressed review comments

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix decimal in measure

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* move serial in cardinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update for SH zero -> oh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* change n_tagger default

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* bumping version to 1.0.1

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Add check for numba regardless of device

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* upper bound for webdataset

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Correct Dockerfile

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update readmes

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update README (NVIDIA#2332)

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* ddp translate GPU allocation fix (NVIDIA#2312)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* ddp translate GPU allocation fix

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* map_location instead of set_device

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Shallow fusion (NVIDIA#2315)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* shallow fusion init commit

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* debug info removed

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* [BUGFIX] Add upper bound to hydra for 1.0.x (NVIDIA#2337)

* upper bound hydra

Signed-off-by: ericharper <complex451@gmail.com>

* upper bound hydra

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update version number

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update package version

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* sparrowhawk tests + punctuation post processing for pynini TN (NVIDIA#2320)

* add jenkins test, refactoring

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix new test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* manifest test added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* expose more params, new test cases

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dollar sign format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dollar sign format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* addressed review comments

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix decimal in measure

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* move serial in cardinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* sh tests init

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* sparrowhawk container tests support added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add post process to normalize.py, update tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove duplication

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update notebooks to 1.0.2 release (NVIDIA#2338)

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update ranges for omegaconf and hydra (NVIDIA#2336)

* Update ranges

Signed-off-by: smajumdar <titu1994@gmail.com>

* Updates for Hydra and OmegaConf updates

Signed-off-by: smajumdar <titu1994@gmail.com>

* Style fixes

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct tests and revert patch for model utils

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct docstring

Signed-off-by: smajumdar <titu1994@gmail.com>

* Revert unnecessary change

Signed-off-by: smajumdar <titu1994@gmail.com>

* Revert unnecessary change

Signed-off-by: smajumdar <titu1994@gmail.com>

* Guard scheduler for None

Signed-off-by: smajumdar <titu1994@gmail.com>

* default to 0.0 if bpe_dropout is None

Signed-off-by: ericharper <complex451@gmail.com>

* Correctly log class that was restored

Signed-off-by: smajumdar <titu1994@gmail.com>

* Root patch *bpe_dropout

Signed-off-by: smajumdar <titu1994@gmail.com>

Co-authored-by: ericharper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update FastPitch Export (NVIDIA#2355)

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* byt5 unicode implementation, first cut

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* add bytelevel tokenizer

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update out_dir to not collide (NVIDIA#2358)

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update container version to 21.05 (NVIDIA#2309)

* Update container version

Signed-off-by: smajumdar <titu1994@gmail.com>

* Temporarily change export format of waveglow

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add conda update for numba

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update numba compat via global flag for strictness level `--relax_numba_compat`, remove pytorchlightning.metrics, refactor out numba utils to core, update tests

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct order of numba minimum verion, remove wrong flag from test

Signed-off-by: smajumdar <titu1994@gmail.com>

* Double test of cuda numba

Signed-off-by: smajumdar <titu1994@gmail.com>

* Double test of cuda numba

Signed-off-by: smajumdar <titu1994@gmail.com>

* Enable RNNT tests

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Text Normalization Update (NVIDIA#2356)

* upper cased date support

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update whitelist, change roman weights

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* docstrings, space fix, init file

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* lgtm

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fraction with measure class

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* address comment

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Add ASR CTC tutorial on fine-tuning on another language (NVIDIA#2346)

* Add ASR CTC Language finetuning notebook

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add to documentation

Signed-off-by: smajumdar <titu1994@gmail.com>

* Improve documentation

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct name of the dataset

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Correct colab link to notebook (NVIDIA#2366)

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* sgdqa update data directories for testing (NVIDIA#2323)

* sgdqa update data directories for testing

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix syntax

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* check if data dir exists

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding pretrained model

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Added documentation for export() (NVIDIA#2330)

* Added export document

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Addressed review comments

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update Citrinet model card info (NVIDIA#2369)

* Update model card info

Signed-off-by: smajumdar <titu1994@gmail.com>

* Cleanup Docs

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* [NMT] Model Parallel Megatron Encoders (NVIDIA#2238)

* add megatron encoder

Signed-off-by: ericharper <complex451@gmail.com>

* added megatron to get_nmt_tokenizer

Signed-off-by: ericharper <complex451@gmail.com>

* add vocab_size and hidden_size to megatron bert

Signed-off-by: ericharper <complex451@gmail.com>

* add megatron encoder module

Signed-off-by: ericharper <complex451@gmail.com>

* fixed horrible typo

Signed-off-by: ericharper <complex451@gmail.com>

* fix typo and add default

Signed-off-by: ericharper <complex451@gmail.com>

* updating nlp overrides for mp nmt

Signed-off-by: ericharper <complex451@gmail.com>

* move some logic back to nlpmodel from overrides

Signed-off-by: ericharper <complex451@gmail.com>

* add checkpoint_file property

Signed-off-by: ericharper <complex451@gmail.com>

* fix property

Signed-off-by: ericharper <complex451@gmail.com>

* num_tokentypes=0

Signed-off-by: ericharper <complex451@gmail.com>

* typo

Signed-off-by: ericharper <complex451@gmail.com>

* typo

Signed-off-by: ericharper <complex451@gmail.com>

* find_unused_parameters=True

Signed-off-by: ericharper <complex451@gmail.com>

* typo

Signed-off-by: ericharper <complex451@gmail.com>

* style

Signed-off-by: ericharper <complex451@gmail.com>

* get instead of pop

Signed-off-by: ericharper <complex451@gmail.com>

* remove token type ids from megatron input example

Signed-off-by: ericharper <complex451@gmail.com>

* pop vocab_size

Signed-off-by: ericharper <complex451@gmail.com>

* fix checkpointing for model parallel

Signed-off-by: ericharper <complex451@gmail.com>

* fix bug in non model parallel

Signed-off-by: ericharper <complex451@gmail.com>

* convert cfg.trainer to dict

Signed-off-by: ericharper <complex451@gmail.com>

* make num_tokentypes configurable for nmt

Signed-off-by: ericharper <complex451@gmail.com>

* update checkpoint_file when using named megatron model in nemo

Signed-off-by: ericharper <complex451@gmail.com>

* make vocab_file configurable

Signed-off-by: ericharper <complex451@gmail.com>

* dataclass can't have mutable default

Signed-off-by: ericharper <complex451@gmail.com>

* style

Signed-off-by: ericharper <complex451@gmail.com>

* unused imports

Signed-off-by: ericharper <complex451@gmail.com>

* revert input example

Signed-off-by: ericharper <complex451@gmail.com>

* check that checkpoint version is not None

Signed-off-by: ericharper <complex451@gmail.com>

* add mp jenkins test

Signed-off-by: ericharper <complex451@gmail.com>

* update docstring

Signed-off-by: ericharper <complex451@gmail.com>

* add docs for pretrained encoders with nemo nmt

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Add notebook with recommendations for 8 kHz speech (NVIDIA#2326)

* Added a notebook with best practices for telephony speech

* Added datasets detaiils

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <jbalam@nvidia.com>

* Addressed review comments

Signed-off-by: jbalam <jbalam@nvidia.com>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <jbalam@nvidia.com>

* Made changes suggested in review

Signed-off-by: jbalam <jbalam@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Add FastEmit support for RNNT Losses (NVIDIA#2374)

* Temp commit

Signed-off-by: smajumdar <titu1994@gmail.com>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct return reg value

Signed-off-by: smajumdar <titu1994@gmail.com>

* Initial cpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Try gpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Try gpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct few impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update fastemit scaling

Signed-off-by: smajumdar <titu1994@gmail.com>

* Cleanup fastemit

Signed-off-by: smajumdar <titu1994@gmail.com>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <titu1994@gmail.com>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <titu1994@gmail.com>

Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* byt5 unicode implementation, first cut

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* add bytelevel tokenizer

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update styling

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* avoid circular import

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* fix bugs in hifigan code (NVIDIA#2392)

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update setup.py (NVIDIA#2394)

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* typo

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* missed one

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* bug fixes

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* style fix

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* bytelevelprocessor is now generic.

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* style fix

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update checkpointing (NVIDIA#2396)

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* style

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* woops, didnt merge jenkinsfile the right way

* add newline

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* undo changes to enja processor

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* processor selection decision fix

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* newline fix

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <grinchuk.alexey@gmail.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Jason <jasoli@nvidia.com>
Co-authored-by: mchrzanowski <mchrzanowski@nvidia.com>
Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
Co-authored-by: Jagadeesh Balam <4916480+jbalam-nv@users.noreply.github.com>
Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
Co-authored-by: Oktai Tatanov <oktai.tatanov@gmail.com>
Co-authored-by: root <root@dgx0026.nsv.rno1.nvmetal.net>
Co-authored-by: root <root@dgx0079.nsv.rno1.nvmetal.net>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Minor Fix

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Minor Fixes

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add TextNormalizationTestDataset and testing/evaluation code

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add TextNormalizationTaggerDataset and training code for tagger

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Restore from local nemo ckpts

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add TextNormalizationDecoderDataset

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add interactive mode for neural_text_normalization_test.py

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add options to do training or not for tagger/decoder

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Renamed

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Implemented setup dataloader for decoder

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Implemented training and validation for decoder

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Data augmentation for decoder training

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Config change

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* add blossom-ci.yml (NVIDIA#2401)

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Merge r1.1 bugfixes into main (NVIDIA#2407)

* Update notebook branch and Jenkinsfile for 1.1.0 testing (NVIDIA#2378)

* update branch

Signed-off-by: ericharper <complex451@gmail.com>

* update jenkinsfile

Signed-off-by: ericharper <complex451@gmail.com>

* [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (NVIDIA#2380)

* fix property when not using model parallel

Signed-off-by: ericharper <complex451@gmail.com>

* fix property when not using model parallel

Signed-off-by: ericharper <complex451@gmail.com>

* add debug statement

Signed-off-by: ericharper <complex451@gmail.com>

* add debug statement

Signed-off-by: ericharper <complex451@gmail.com>

* instantiate with NLPDDPPlugin with num_nodes from trainer config

Signed-off-by: ericharper <complex451@gmail.com>

* Update ASR scripts for tokenizer building and tarred dataset building (NVIDIA#2381)

* Update ASR scripts for tokenizer building and tarred dataset building

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update container

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add STT Zh Citrinet 1024 Gamma 0.25 model

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update notebook (NVIDIA#2391)

Signed-off-by: smajumdar <titu1994@gmail.com>

* ASR Notebooks fix for 1.1.0 (NVIDIA#2395)

* nb fix for spring clean

Signed-off-by: fayejf <fayejf07@gmail.com>

* remove outdated instruction

Signed-off-by: fayejf <fayejf07@gmail.com>

* Mean normalization (NVIDIA#2397)

* norm embeddings

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* move to utils

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Bugfix adaptive spec augment time masking (NVIDIA#2398)

* bugfix adaptive spec augment

Signed-off-by: smajumdar <titu1994@gmail.com>

* Revert freq mask guard

Signed-off-by: smajumdar <titu1994@gmail.com>

* Revert freq mask guard

Signed-off-by: smajumdar <titu1994@gmail.com>

* Remove static time width clamping

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct typos and issues with notebooks (NVIDIA#2402)

* Fix Primer notebook

Signed-off-by: smajumdar <titu1994@gmail.com>

* Typo

Signed-off-by: smajumdar <titu1994@gmail.com>

* remove accelerator=DDP in tutorial notebooks to avoid errors. (NVIDIA#2403)

Signed-off-by: Hoo Chang Shin <hshin@nvidia.com>

Co-authored-by: Hoo Chang Shin <hshin@nvidia.com>

* style

Signed-off-by: ericharper <complex451@gmail.com>

* update jenkins branch

Signed-off-by: ericharper <complex451@gmail.com>

* update notebook branch to main

Signed-off-by: ericharper <complex451@gmail.com>

Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: khcs <khcs@users.noreply.github.com>
Co-authored-by: Hoo Chang Shin <hshin@nvidia.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Remove unused imports

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add initial doc for text_normalization

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Fixed imports warnings

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Minor Fix

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Renamed

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Allowed duplex modes

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Minor Fix

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add docs for duplex_text_normalization_train and duplex_text_normalization_test

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* docstrings for model codes + minor fix

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add more comments and doc strings

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add doc for datasets + Use time.perf_counter()
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add code for preprocessing Google TN data
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add more docs and comments + Minor Fixes
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add more licenses + Fixed comments + Minors
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Moved evaluation logic to DuplexTextNormalizationModel
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add logging errors
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Updated validation code of tagger + Minors
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Also write tag preds to log file
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add data augmentation for tagger dataset
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Added experimental decorators
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Updated docs
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Updated duplex_tn_config.yaml
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Compute token precision of tagger using NeMo metrics
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Fixed saving issue when using ddp accelerator
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Refactoring
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add option to keep punctuations in TextNormalizationTestDataset
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Changes to input preprocessing + decoder's postprocessing
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Fixed styles + Add references
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Renamed examples/nlp/duplex_text_normalization/utils.py to helpers.py
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

Co-authored-by: Jagadeesh Balam <4916480+jbalam-nv@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
Co-authored-by: Oktai Tatanov <oktai.tatanov@gmail.com>
Co-authored-by: Jason <jasoli@nvidia.com>
Co-authored-by: Mike Chrzanowski <mike.chrzanowski0@gmail.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <grinchuk.alexey@gmail.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: mchrzanowski <mchrzanowski@nvidia.com>
Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
Co-authored-by: root <root@dgx0026.nsv.rno1.nvmetal.net>
Co-authored-by: root <root@dgx0079.nsv.rno1.nvmetal.net>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: khcs <khcs@users.noreply.github.com>
Co-authored-by: Hoo Chang Shin <hshin@nvidia.com>
Signed-off-by: Ghasem Pasandi <gpasandi@nvidia.com>
fayejf added a commit that referenced this pull request Jul 16, 2021
* Add notebook with recommendations for 8 kHz speech (#2326)

* Added a notebook with best practices for telephony speech

* Added datasets detaiils

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <jbalam@nvidia.com>

* Addressed review comments

Signed-off-by: jbalam <jbalam@nvidia.com>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <jbalam@nvidia.com>

* Made changes suggested in review

Signed-off-by: jbalam <jbalam@nvidia.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add FastEmit support for RNNT Losses (#2374)

* Temp commit

Signed-off-by: smajumdar <titu1994@gmail.com>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct return reg value

Signed-off-by: smajumdar <titu1994@gmail.com>

* Initial cpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Try gpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Try gpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct few impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update fastemit scaling

Signed-off-by: smajumdar <titu1994@gmail.com>

* Cleanup fastemit

Signed-off-by: smajumdar <titu1994@gmail.com>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <titu1994@gmail.com>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <titu1994@gmail.com>

Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Implement inference functions of TN models

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Minor Fix

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* fix bugs in hifigan code (#2392)

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Update setup.py (#2394)

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* update checkpointing (#2396)

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* byt5 unicode implementation (#2365)

* Audio Norm (#2285)

* add jenkins test, refactoring

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix new test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* manifest test added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* expose more params, new test cases

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dollar sign format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dollar sign format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* addressed review comments

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix decimal in measure

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* move serial in cardinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update for SH zero -> oh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* change n_tagger default

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* bumping version to 1.0.1

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Add check for numba regardless of device

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* upper bound for webdataset

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Correct Dockerfile

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update readmes

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update README (#2332)

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* ddp translate GPU allocation fix (#2312)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* ddp translate GPU allocation fix

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* map_location instead of set_device

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Shallow fusion (#2315)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* shallow fusion init commit

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* debug info removed

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* [BUGFIX] Add upper bound to hydra for 1.0.x (#2337)

* upper bound hydra

Signed-off-by: ericharper <complex451@gmail.com>

* upper bound hydra

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update version number

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update package version

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* sparrowhawk tests + punctuation post processing for pynini TN (#2320)

* add jenkins test, refactoring

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix new test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* manifest test added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* expose more params, new test cases

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dollar sign format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dollar sign format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* addressed review comments

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix decimal in measure

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* move serial in cardinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* sh tests init

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* sparrowhawk container tests support added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add post process to normalize.py, update tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove duplication

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update notebooks to 1.0.2 release (#2338)

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update ranges for omegaconf and hydra (#2336)

* Update ranges

Signed-off-by: smajumdar <titu1994@gmail.com>

* Updates for Hydra and OmegaConf updates

Signed-off-by: smajumdar <titu1994@gmail.com>

* Style fixes

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct tests and revert patch for model utils

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct docstring

Signed-off-by: smajumdar <titu1994@gmail.com>

* Revert unnecessary change

Signed-off-by: smajumdar <titu1994@gmail.com>

* Revert unnecessary change

Signed-off-by: smajumdar <titu1994@gmail.com>

* Guard scheduler for None

Signed-off-by: smajumdar <titu1994@gmail.com>

* default to 0.0 if bpe_dropout is None

Signed-off-by: ericharper <complex451@gmail.com>

* Correctly log class that was restored

Signed-off-by: smajumdar <titu1994@gmail.com>

* Root patch *bpe_dropout

Signed-off-by: smajumdar <titu1994@gmail.com>

Co-authored-by: ericharper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update FastPitch Export (#2355)

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* byt5 unicode implementation, first cut

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* add bytelevel tokenizer

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update out_dir to not collide (#2358)

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update container version to 21.05 (#2309)

* Update container version

Signed-off-by: smajumdar <titu1994@gmail.com>

* Temporarily change export format of waveglow

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add conda update for numba

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update numba compat via global flag for strictness level `--relax_numba_compat`, remove pytorchlightning.metrics, refactor out numba utils to core, update tests

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct order of numba minimum verion, remove wrong flag from test

Signed-off-by: smajumdar <titu1994@gmail.com>

* Double test of cuda numba

Signed-off-by: smajumdar <titu1994@gmail.com>

* Double test of cuda numba

Signed-off-by: smajumdar <titu1994@gmail.com>

* Enable RNNT tests

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Text Normalization Update (#2356)

* upper cased date support

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update whitelist, change roman weights

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* docstrings, space fix, init file

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* lgtm

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fraction with measure class

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* address comment

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Add ASR CTC tutorial on fine-tuning on another language (#2346)

* Add ASR CTC Language finetuning notebook

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add to documentation

Signed-off-by: smajumdar <titu1994@gmail.com>

* Improve documentation

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct name of the dataset

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Correct colab link to notebook (#2366)

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* sgdqa update data directories for testing (#2323)

* sgdqa update data directories for testing

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix syntax

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* check if data dir exists

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding pretrained model

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Added documentation for export() (#2330)

* Added export document

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Addressed review comments

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update Citrinet model card info (#2369)

* Update model card info

Signed-off-by: smajumdar <titu1994@gmail.com>

* Cleanup Docs

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* [NMT] Model Parallel Megatron Encoders (#2238)

* add megatron encoder

Signed-off-by: ericharper <complex451@gmail.com>

* added megatron to get_nmt_tokenizer

Signed-off-by: ericharper <complex451@gmail.com>

* add vocab_size and hidden_size to megatron bert

Signed-off-by: ericharper <complex451@gmail.com>

* add megatron encoder module

Signed-off-by: ericharper <complex451@gmail.com>

* fixed horrible typo

Signed-off-by: ericharper <complex451@gmail.com>

* fix typo and add default

Signed-off-by: ericharper <complex451@gmail.com>

* updating nlp overrides for mp nmt

Signed-off-by: ericharper <complex451@gmail.com>

* move some logic back to nlpmodel from overrides

Signed-off-by: ericharper <complex451@gmail.com>

* add checkpoint_file property

Signed-off-by: ericharper <complex451@gmail.com>

* fix property

Signed-off-by: ericharper <complex451@gmail.com>

* num_tokentypes=0

Signed-off-by: ericharper <complex451@gmail.com>

* typo

Signed-off-by: ericharper <complex451@gmail.com>

* typo

Signed-off-by: ericharper <complex451@gmail.com>

* find_unused_parameters=True

Signed-off-by: ericharper <complex451@gmail.com>

* typo

Signed-off-by: ericharper <complex451@gmail.com>

* style

Signed-off-by: ericharper <complex451@gmail.com>

* get instead of pop

Signed-off-by: ericharper <complex451@gmail.com>

* remove token type ids from megatron input example

Signed-off-by: ericharper <complex451@gmail.com>

* pop vocab_size

Signed-off-by: ericharper <complex451@gmail.com>

* fix checkpointing for model parallel

Signed-off-by: ericharper <complex451@gmail.com>

* fix bug in non model parallel

Signed-off-by: ericharper <complex451@gmail.com>

* convert cfg.trainer to dict

Signed-off-by: ericharper <complex451@gmail.com>

* make num_tokentypes configurable for nmt

Signed-off-by: ericharper <complex451@gmail.com>

* update checkpoint_file when using named megatron model in nemo

Signed-off-by: ericharper <complex451@gmail.com>

* make vocab_file configurable

Signed-off-by: ericharper <complex451@gmail.com>

* dataclass can't have mutable default

Signed-off-by: ericharper <complex451@gmail.com>

* style

Signed-off-by: ericharper <complex451@gmail.com>

* unused imports

Signed-off-by: ericharper <complex451@gmail.com>

* revert input example

Signed-off-by: ericharper <complex451@gmail.com>

* check that checkpoint version is not None

Signed-off-by: ericharper <complex451@gmail.com>

* add mp jenkins test

Signed-off-by: ericharper <complex451@gmail.com>

* update docstring

Signed-off-by: ericharper <complex451@gmail.com>

* add docs for pretrained encoders with nemo nmt

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Add notebook with recommendations for 8 kHz speech (#2326)

* Added a notebook with best practices for telephony speech

* Added datasets detaiils

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <jbalam@nvidia.com>

* Addressed review comments

Signed-off-by: jbalam <jbalam@nvidia.com>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <jbalam@nvidia.com>

* Made changes suggested in review

Signed-off-by: jbalam <jbalam@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Add FastEmit support for RNNT Losses (#2374)

* Temp commit

Signed-off-by: smajumdar <titu1994@gmail.com>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct return reg value

Signed-off-by: smajumdar <titu1994@gmail.com>

* Initial cpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Try gpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Try gpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct few impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update fastemit scaling

Signed-off-by: smajumdar <titu1994@gmail.com>

* Cleanup fastemit

Signed-off-by: smajumdar <titu1994@gmail.com>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <titu1994@gmail.com>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <titu1994@gmail.com>

Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* byt5 unicode implementation, first cut

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* add bytelevel tokenizer

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update styling

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* avoid circular import

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* fix bugs in hifigan code (#2392)

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update setup.py (#2394)

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* typo

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* missed one

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* bug fixes

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* style fix

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* bytelevelprocessor is now generic.

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* style fix

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update checkpointing (#2396)

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* style

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* woops, didnt merge jenkinsfile the right way

* add newline

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* undo changes to enja processor

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* processor selection decision fix

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* newline fix

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <grinchuk.alexey@gmail.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Jason <jasoli@nvidia.com>
Co-authored-by: mchrzanowski <mchrzanowski@nvidia.com>
Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
Co-authored-by: Jagadeesh Balam <4916480+jbalam-nv@users.noreply.github.com>
Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
Co-authored-by: Oktai Tatanov <oktai.tatanov@gmail.com>
Co-authored-by: root <root@dgx0026.nsv.rno1.nvmetal.net>
Co-authored-by: root <root@dgx0079.nsv.rno1.nvmetal.net>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Minor Fix

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Minor Fixes

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add TextNormalizationTestDataset and testing/evaluation code

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add TextNormalizationTaggerDataset and training code for tagger

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Restore from local nemo ckpts

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add TextNormalizationDecoderDataset

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add interactive mode for neural_text_normalization_test.py

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add options to do training or not for tagger/decoder

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Renamed

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Implemented setup dataloader for decoder

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Implemented training and validation for decoder

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Data augmentation for decoder training

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Config change

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* add blossom-ci.yml (#2401)

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Merge r1.1 bugfixes into main (#2407)

* Update notebook branch and Jenkinsfile for 1.1.0 testing (#2378)

* update branch

Signed-off-by: ericharper <complex451@gmail.com>

* update jenkinsfile

Signed-off-by: ericharper <complex451@gmail.com>

* [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (#2380)

* fix property when not using model parallel

Signed-off-by: ericharper <complex451@gmail.com>

* fix property when not using model parallel

Signed-off-by: ericharper <complex451@gmail.com>

* add debug statement

Signed-off-by: ericharper <complex451@gmail.com>

* add debug statement

Signed-off-by: ericharper <complex451@gmail.com>

* instantiate with NLPDDPPlugin with num_nodes from trainer config

Signed-off-by: ericharper <complex451@gmail.com>

* Update ASR scripts for tokenizer building and tarred dataset building (#2381)

* Update ASR scripts for tokenizer building and tarred dataset building

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update container

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add STT Zh Citrinet 1024 Gamma 0.25 model

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update notebook (#2391)

Signed-off-by: smajumdar <titu1994@gmail.com>

* ASR Notebooks fix for 1.1.0 (#2395)

* nb fix for spring clean

Signed-off-by: fayejf <fayejf07@gmail.com>

* remove outdated instruction

Signed-off-by: fayejf <fayejf07@gmail.com>

* Mean normalization (#2397)

* norm embeddings

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* move to utils

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Bugfix adaptive spec augment time masking (#2398)

* bugfix adaptive spec augment

Signed-off-by: smajumdar <titu1994@gmail.com>

* Revert freq mask guard

Signed-off-by: smajumdar <titu1994@gmail.com>

* Revert freq mask guard

Signed-off-by: smajumdar <titu1994@gmail.com>

* Remove static time width clamping

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct typos and issues with notebooks (#2402)

* Fix Primer notebook

Signed-off-by: smajumdar <titu1994@gmail.com>

* Typo

Signed-off-by: smajumdar <titu1994@gmail.com>

* remove accelerator=DDP in tutorial notebooks to avoid errors. (#2403)

Signed-off-by: Hoo Chang Shin <hshin@nvidia.com>

Co-authored-by: Hoo Chang Shin <hshin@nvidia.com>

* style

Signed-off-by: ericharper <complex451@gmail.com>

* update jenkins branch

Signed-off-by: ericharper <complex451@gmail.com>

* update notebook branch to main

Signed-off-by: ericharper <complex451@gmail.com>

Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: khcs <khcs@users.noreply.github.com>
Co-authored-by: Hoo Chang Shin <hshin@nvidia.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Remove unused imports

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add initial doc for text_normalization

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Fixed imports warnings

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Minor Fix

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Renamed

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Allowed duplex modes

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Minor Fix

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add docs for duplex_text_normalization_train and duplex_text_normalization_test

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* docstrings for model codes + minor fix

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add more comments and doc strings

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add doc for datasets + Use time.perf_counter()
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add code for preprocessing Google TN data
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add more docs and comments + Minor Fixes
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add more licenses + Fixed comments + Minors
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Moved evaluation logic to DuplexTextNormalizationModel
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add logging errors
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Updated validation code of tagger + Minors
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Also write tag preds to log file
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add data augmentation for tagger dataset
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Added experimental decorators
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Updated docs
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Updated duplex_tn_config.yaml
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Compute token precision of tagger using NeMo metrics
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Fixed saving issue when using ddp accelerator
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Refactoring
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add option to keep punctuations in TextNormalizationTestDataset
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Changes to input preprocessing + decoder's postprocessing
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Fixed styles + Add references
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Renamed examples/nlp/duplex_text_normalization/utils.py to helpers.py
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

Co-authored-by: Jagadeesh Balam <4916480+jbalam-nv@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
Co-authored-by: Oktai Tatanov <oktai.tatanov@gmail.com>
Co-authored-by: Jason <jasoli@nvidia.com>
Co-authored-by: Mike Chrzanowski <mike.chrzanowski0@gmail.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <grinchuk.alexey@gmail.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: mchrzanowski <mchrzanowski@nvidia.com>
Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
Co-authored-by: root <root@dgx0026.nsv.rno1.nvmetal.net>
Co-authored-by: root <root@dgx0079.nsv.rno1.nvmetal.net>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: khcs <khcs@users.noreply.github.com>
Co-authored-by: Hoo Chang Shin <hshin@nvidia.com>
titu1994 added a commit to titu1994/NeMo that referenced this pull request Jul 20, 2021
* Add notebook with recommendations for 8 kHz speech (NVIDIA#2326)

* Added a notebook with best practices for telephony speech

* Added datasets detaiils

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <jbalam@nvidia.com>

* Addressed review comments

Signed-off-by: jbalam <jbalam@nvidia.com>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <jbalam@nvidia.com>

* Made changes suggested in review

Signed-off-by: jbalam <jbalam@nvidia.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add FastEmit support for RNNT Losses (NVIDIA#2374)

* Temp commit

Signed-off-by: smajumdar <titu1994@gmail.com>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct return reg value

Signed-off-by: smajumdar <titu1994@gmail.com>

* Initial cpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Try gpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Try gpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct few impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update fastemit scaling

Signed-off-by: smajumdar <titu1994@gmail.com>

* Cleanup fastemit

Signed-off-by: smajumdar <titu1994@gmail.com>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <titu1994@gmail.com>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <titu1994@gmail.com>

Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Implement inference functions of TN models

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Minor Fix

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* fix bugs in hifigan code (NVIDIA#2392)

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Update setup.py (NVIDIA#2394)

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* update checkpointing (NVIDIA#2396)

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* byt5 unicode implementation (NVIDIA#2365)

* Audio Norm (NVIDIA#2285)

* add jenkins test, refactoring

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix new test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* manifest test added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* expose more params, new test cases

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dollar sign format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dollar sign format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* addressed review comments

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix decimal in measure

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* move serial in cardinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update for SH zero -> oh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* change n_tagger default

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* bumping version to 1.0.1

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Add check for numba regardless of device

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* upper bound for webdataset

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Correct Dockerfile

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update readmes

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update README (NVIDIA#2332)

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* ddp translate GPU allocation fix (NVIDIA#2312)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* ddp translate GPU allocation fix

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* map_location instead of set_device

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Shallow fusion (NVIDIA#2315)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* shallow fusion init commit

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* debug info removed

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* [BUGFIX] Add upper bound to hydra for 1.0.x (NVIDIA#2337)

* upper bound hydra

Signed-off-by: ericharper <complex451@gmail.com>

* upper bound hydra

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update version number

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update package version

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* sparrowhawk tests + punctuation post processing for pynini TN (NVIDIA#2320)

* add jenkins test, refactoring

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix new test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* manifest test added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* expose more params, new test cases

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dollar sign format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dollar sign format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* addressed review comments

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix decimal in measure

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* move serial in cardinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* sh tests init

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* sparrowhawk container tests support added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add post process to normalize.py, update tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove duplication

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update notebooks to 1.0.2 release (NVIDIA#2338)

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update ranges for omegaconf and hydra (NVIDIA#2336)

* Update ranges

Signed-off-by: smajumdar <titu1994@gmail.com>

* Updates for Hydra and OmegaConf updates

Signed-off-by: smajumdar <titu1994@gmail.com>

* Style fixes

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct tests and revert patch for model utils

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct docstring

Signed-off-by: smajumdar <titu1994@gmail.com>

* Revert unnecessary change

Signed-off-by: smajumdar <titu1994@gmail.com>

* Revert unnecessary change

Signed-off-by: smajumdar <titu1994@gmail.com>

* Guard scheduler for None

Signed-off-by: smajumdar <titu1994@gmail.com>

* default to 0.0 if bpe_dropout is None

Signed-off-by: ericharper <complex451@gmail.com>

* Correctly log class that was restored

Signed-off-by: smajumdar <titu1994@gmail.com>

* Root patch *bpe_dropout

Signed-off-by: smajumdar <titu1994@gmail.com>

Co-authored-by: ericharper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update FastPitch Export (NVIDIA#2355)

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* byt5 unicode implementation, first cut

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* add bytelevel tokenizer

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update out_dir to not collide (NVIDIA#2358)

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update container version to 21.05 (NVIDIA#2309)

* Update container version

Signed-off-by: smajumdar <titu1994@gmail.com>

* Temporarily change export format of waveglow

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add conda update for numba

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update numba compat via global flag for strictness level `--relax_numba_compat`, remove pytorchlightning.metrics, refactor out numba utils to core, update tests

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct order of numba minimum verion, remove wrong flag from test

Signed-off-by: smajumdar <titu1994@gmail.com>

* Double test of cuda numba

Signed-off-by: smajumdar <titu1994@gmail.com>

* Double test of cuda numba

Signed-off-by: smajumdar <titu1994@gmail.com>

* Enable RNNT tests

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Text Normalization Update (NVIDIA#2356)

* upper cased date support

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update whitelist, change roman weights

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* docstrings, space fix, init file

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* lgtm

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fraction with measure class

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* address comment

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Add ASR CTC tutorial on fine-tuning on another language (NVIDIA#2346)

* Add ASR CTC Language finetuning notebook

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add to documentation

Signed-off-by: smajumdar <titu1994@gmail.com>

* Improve documentation

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct name of the dataset

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Correct colab link to notebook (NVIDIA#2366)

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* sgdqa update data directories for testing (NVIDIA#2323)

* sgdqa update data directories for testing

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix syntax

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* check if data dir exists

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding pretrained model

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Added documentation for export() (NVIDIA#2330)

* Added export document

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Addressed review comments

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update Citrinet model card info (NVIDIA#2369)

* Update model card info

Signed-off-by: smajumdar <titu1994@gmail.com>

* Cleanup Docs

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* [NMT] Model Parallel Megatron Encoders (NVIDIA#2238)

* add megatron encoder

Signed-off-by: ericharper <complex451@gmail.com>

* added megatron to get_nmt_tokenizer

Signed-off-by: ericharper <complex451@gmail.com>

* add vocab_size and hidden_size to megatron bert

Signed-off-by: ericharper <complex451@gmail.com>

* add megatron encoder module

Signed-off-by: ericharper <complex451@gmail.com>

* fixed horrible typo

Signed-off-by: ericharper <complex451@gmail.com>

* fix typo and add default

Signed-off-by: ericharper <complex451@gmail.com>

* updating nlp overrides for mp nmt

Signed-off-by: ericharper <complex451@gmail.com>

* move some logic back to nlpmodel from overrides

Signed-off-by: ericharper <complex451@gmail.com>

* add checkpoint_file property

Signed-off-by: ericharper <complex451@gmail.com>

* fix property

Signed-off-by: ericharper <complex451@gmail.com>

* num_tokentypes=0

Signed-off-by: ericharper <complex451@gmail.com>

* typo

Signed-off-by: ericharper <complex451@gmail.com>

* typo

Signed-off-by: ericharper <complex451@gmail.com>

* find_unused_parameters=True

Signed-off-by: ericharper <complex451@gmail.com>

* typo

Signed-off-by: ericharper <complex451@gmail.com>

* style

Signed-off-by: ericharper <complex451@gmail.com>

* get instead of pop

Signed-off-by: ericharper <complex451@gmail.com>

* remove token type ids from megatron input example

Signed-off-by: ericharper <complex451@gmail.com>

* pop vocab_size

Signed-off-by: ericharper <complex451@gmail.com>

* fix checkpointing for model parallel

Signed-off-by: ericharper <complex451@gmail.com>

* fix bug in non model parallel

Signed-off-by: ericharper <complex451@gmail.com>

* convert cfg.trainer to dict

Signed-off-by: ericharper <complex451@gmail.com>

* make num_tokentypes configurable for nmt

Signed-off-by: ericharper <complex451@gmail.com>

* update checkpoint_file when using named megatron model in nemo

Signed-off-by: ericharper <complex451@gmail.com>

* make vocab_file configurable

Signed-off-by: ericharper <complex451@gmail.com>

* dataclass can't have mutable default

Signed-off-by: ericharper <complex451@gmail.com>

* style

Signed-off-by: ericharper <complex451@gmail.com>

* unused imports

Signed-off-by: ericharper <complex451@gmail.com>

* revert input example

Signed-off-by: ericharper <complex451@gmail.com>

* check that checkpoint version is not None

Signed-off-by: ericharper <complex451@gmail.com>

* add mp jenkins test

Signed-off-by: ericharper <complex451@gmail.com>

* update docstring

Signed-off-by: ericharper <complex451@gmail.com>

* add docs for pretrained encoders with nemo nmt

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Add notebook with recommendations for 8 kHz speech (NVIDIA#2326)

* Added a notebook with best practices for telephony speech

* Added datasets detaiils

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <jbalam@nvidia.com>

* Addressed review comments

Signed-off-by: jbalam <jbalam@nvidia.com>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <jbalam@nvidia.com>

* Made changes suggested in review

Signed-off-by: jbalam <jbalam@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Add FastEmit support for RNNT Losses (NVIDIA#2374)

* Temp commit

Signed-off-by: smajumdar <titu1994@gmail.com>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct return reg value

Signed-off-by: smajumdar <titu1994@gmail.com>

* Initial cpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Try gpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Try gpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct few impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update fastemit scaling

Signed-off-by: smajumdar <titu1994@gmail.com>

* Cleanup fastemit

Signed-off-by: smajumdar <titu1994@gmail.com>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <titu1994@gmail.com>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <titu1994@gmail.com>

Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* byt5 unicode implementation, first cut

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* add bytelevel tokenizer

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update styling

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* avoid circular import

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* fix bugs in hifigan code (NVIDIA#2392)

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update setup.py (NVIDIA#2394)

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* typo

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* missed one

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* bug fixes

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* style fix

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* bytelevelprocessor is now generic.

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* style fix

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update checkpointing (NVIDIA#2396)

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* style

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* woops, didnt merge jenkinsfile the right way

* add newline

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* undo changes to enja processor

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* processor selection decision fix

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* newline fix

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <grinchuk.alexey@gmail.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Jason <jasoli@nvidia.com>
Co-authored-by: mchrzanowski <mchrzanowski@nvidia.com>
Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
Co-authored-by: Jagadeesh Balam <4916480+jbalam-nv@users.noreply.github.com>
Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
Co-authored-by: Oktai Tatanov <oktai.tatanov@gmail.com>
Co-authored-by: root <root@dgx0026.nsv.rno1.nvmetal.net>
Co-authored-by: root <root@dgx0079.nsv.rno1.nvmetal.net>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Minor Fix

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Minor Fixes

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add TextNormalizationTestDataset and testing/evaluation code

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add TextNormalizationTaggerDataset and training code for tagger

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Restore from local nemo ckpts

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add TextNormalizationDecoderDataset

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add interactive mode for neural_text_normalization_test.py

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add options to do training or not for tagger/decoder

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Renamed

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Implemented setup dataloader for decoder

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Implemented training and validation for decoder

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Data augmentation for decoder training

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Config change

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* add blossom-ci.yml (NVIDIA#2401)

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Merge r1.1 bugfixes into main (NVIDIA#2407)

* Update notebook branch and Jenkinsfile for 1.1.0 testing (NVIDIA#2378)

* update branch

Signed-off-by: ericharper <complex451@gmail.com>

* update jenkinsfile

Signed-off-by: ericharper <complex451@gmail.com>

* [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (NVIDIA#2380)

* fix property when not using model parallel

Signed-off-by: ericharper <complex451@gmail.com>

* fix property when not using model parallel

Signed-off-by: ericharper <complex451@gmail.com>

* add debug statement

Signed-off-by: ericharper <complex451@gmail.com>

* add debug statement

Signed-off-by: ericharper <complex451@gmail.com>

* instantiate with NLPDDPPlugin with num_nodes from trainer config

Signed-off-by: ericharper <complex451@gmail.com>

* Update ASR scripts for tokenizer building and tarred dataset building (NVIDIA#2381)

* Update ASR scripts for tokenizer building and tarred dataset building

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update container

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add STT Zh Citrinet 1024 Gamma 0.25 model

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update notebook (NVIDIA#2391)

Signed-off-by: smajumdar <titu1994@gmail.com>

* ASR Notebooks fix for 1.1.0 (NVIDIA#2395)

* nb fix for spring clean

Signed-off-by: fayejf <fayejf07@gmail.com>

* remove outdated instruction

Signed-off-by: fayejf <fayejf07@gmail.com>

* Mean normalization (NVIDIA#2397)

* norm embeddings

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* move to utils

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Bugfix adaptive spec augment time masking (NVIDIA#2398)

* bugfix adaptive spec augment

Signed-off-by: smajumdar <titu1994@gmail.com>

* Revert freq mask guard

Signed-off-by: smajumdar <titu1994@gmail.com>

* Revert freq mask guard

Signed-off-by: smajumdar <titu1994@gmail.com>

* Remove static time width clamping

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct typos and issues with notebooks (NVIDIA#2402)

* Fix Primer notebook

Signed-off-by: smajumdar <titu1994@gmail.com>

* Typo

Signed-off-by: smajumdar <titu1994@gmail.com>

* remove accelerator=DDP in tutorial notebooks to avoid errors. (NVIDIA#2403)

Signed-off-by: Hoo Chang Shin <hshin@nvidia.com>

Co-authored-by: Hoo Chang Shin <hshin@nvidia.com>

* style

Signed-off-by: ericharper <complex451@gmail.com>

* update jenkins branch

Signed-off-by: ericharper <complex451@gmail.com>

* update notebook branch to main

Signed-off-by: ericharper <complex451@gmail.com>

Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: khcs <khcs@users.noreply.github.com>
Co-authored-by: Hoo Chang Shin <hshin@nvidia.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Remove unused imports

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add initial doc for text_normalization

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Fixed imports warnings

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Minor Fix

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Renamed

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Allowed duplex modes

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Minor Fix

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add docs for duplex_text_normalization_train and duplex_text_normalization_test

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* docstrings for model codes + minor fix

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add more comments and doc strings

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add doc for datasets + Use time.perf_counter()
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add code for preprocessing Google TN data
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add more docs and comments + Minor Fixes
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add more licenses + Fixed comments + Minors
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Moved evaluation logic to DuplexTextNormalizationModel
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add logging errors
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Updated validation code of tagger + Minors
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Also write tag preds to log file
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add data augmentation for tagger dataset
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Added experimental decorators
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Updated docs
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Updated duplex_tn_config.yaml
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Compute token precision of tagger using NeMo metrics
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Fixed saving issue when using ddp accelerator
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Refactoring
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add option to keep punctuations in TextNormalizationTestDataset
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Changes to input preprocessing + decoder's postprocessing
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Fixed styles + Add references
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Renamed examples/nlp/duplex_text_normalization/utils.py to helpers.py
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

Co-authored-by: Jagadeesh Balam <4916480+jbalam-nv@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
Co-authored-by: Oktai Tatanov <oktai.tatanov@gmail.com>
Co-authored-by: Jason <jasoli@nvidia.com>
Co-authored-by: Mike Chrzanowski <mike.chrzanowski0@gmail.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <grinchuk.alexey@gmail.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: mchrzanowski <mchrzanowski@nvidia.com>
Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
Co-authored-by: root <root@dgx0026.nsv.rno1.nvmetal.net>
Co-authored-by: root <root@dgx0079.nsv.rno1.nvmetal.net>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: khcs <khcs@users.noreply.github.com>
Co-authored-by: Hoo Chang Shin <hshin@nvidia.com>
paarthneekhara pushed a commit to paarthneekhara/NeMo that referenced this pull request Sep 17, 2021
* Add notebook with recommendations for 8 kHz speech (NVIDIA#2326)

* Added a notebook with best practices for telephony speech

* Added datasets detaiils

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <jbalam@nvidia.com>

* Addressed review comments

Signed-off-by: jbalam <jbalam@nvidia.com>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <jbalam@nvidia.com>

* Made changes suggested in review

Signed-off-by: jbalam <jbalam@nvidia.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add FastEmit support for RNNT Losses (NVIDIA#2374)

* Temp commit

Signed-off-by: smajumdar <titu1994@gmail.com>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct return reg value

Signed-off-by: smajumdar <titu1994@gmail.com>

* Initial cpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Try gpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Try gpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct few impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update fastemit scaling

Signed-off-by: smajumdar <titu1994@gmail.com>

* Cleanup fastemit

Signed-off-by: smajumdar <titu1994@gmail.com>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <titu1994@gmail.com>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <titu1994@gmail.com>

Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Implement inference functions of TN models

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Minor Fix

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* fix bugs in hifigan code (NVIDIA#2392)

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Update setup.py (NVIDIA#2394)

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* update checkpointing (NVIDIA#2396)

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* byt5 unicode implementation (NVIDIA#2365)

* Audio Norm (NVIDIA#2285)

* add jenkins test, refactoring

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix new test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* manifest test added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* expose more params, new test cases

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dollar sign format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dollar sign format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* addressed review comments

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix decimal in measure

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* move serial in cardinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* clean up

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update for SH zero -> oh

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* change n_tagger default

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* bumping version to 1.0.1

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Add check for numba regardless of device

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* upper bound for webdataset

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Correct Dockerfile

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update readmes

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update README (NVIDIA#2332)

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* ddp translate GPU allocation fix (NVIDIA#2312)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* ddp translate GPU allocation fix

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* map_location instead of set_device

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Shallow fusion (NVIDIA#2315)

* fixed branch in IR tutorial

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* shallow fusion init commit

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* debug info removed

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* [BUGFIX] Add upper bound to hydra for 1.0.x (NVIDIA#2337)

* upper bound hydra

Signed-off-by: ericharper <complex451@gmail.com>

* upper bound hydra

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update version number

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update package version

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* sparrowhawk tests + punctuation post processing for pynini TN (NVIDIA#2320)

* add jenkins test, refactoring

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix new test

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add serial to the default normalizer, add tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* manifest test added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* expose more params, new test cases

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix jenkins, serial clean, exclude range from cardinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dollar sign format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* jenkins dollar sign format

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* addressed review comments

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix decimal in measure

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* move serial in cardinal

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* sh tests init

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* sparrowhawk container tests support added

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add post process to normalize.py, update tests

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* remove duplication

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update notebooks to 1.0.2 release (NVIDIA#2338)

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update ranges for omegaconf and hydra (NVIDIA#2336)

* Update ranges

Signed-off-by: smajumdar <titu1994@gmail.com>

* Updates for Hydra and OmegaConf updates

Signed-off-by: smajumdar <titu1994@gmail.com>

* Style fixes

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct tests and revert patch for model utils

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct docstring

Signed-off-by: smajumdar <titu1994@gmail.com>

* Revert unnecessary change

Signed-off-by: smajumdar <titu1994@gmail.com>

* Revert unnecessary change

Signed-off-by: smajumdar <titu1994@gmail.com>

* Guard scheduler for None

Signed-off-by: smajumdar <titu1994@gmail.com>

* default to 0.0 if bpe_dropout is None

Signed-off-by: ericharper <complex451@gmail.com>

* Correctly log class that was restored

Signed-off-by: smajumdar <titu1994@gmail.com>

* Root patch *bpe_dropout

Signed-off-by: smajumdar <titu1994@gmail.com>

Co-authored-by: ericharper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update FastPitch Export (NVIDIA#2355)

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* byt5 unicode implementation, first cut

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* add bytelevel tokenizer

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update out_dir to not collide (NVIDIA#2358)

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update container version to 21.05 (NVIDIA#2309)

* Update container version

Signed-off-by: smajumdar <titu1994@gmail.com>

* Temporarily change export format of waveglow

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add conda update for numba

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update numba compat via global flag for strictness level `--relax_numba_compat`, remove pytorchlightning.metrics, refactor out numba utils to core, update tests

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct order of numba minimum verion, remove wrong flag from test

Signed-off-by: smajumdar <titu1994@gmail.com>

* Double test of cuda numba

Signed-off-by: smajumdar <titu1994@gmail.com>

* Double test of cuda numba

Signed-off-by: smajumdar <titu1994@gmail.com>

* Enable RNNT tests

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Text Normalization Update (NVIDIA#2356)

* upper cased date support

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update whitelist, change roman weights

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* docstrings, space fix, init file

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* lgtm

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fraction with measure class

Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* address comment

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Add ASR CTC tutorial on fine-tuning on another language (NVIDIA#2346)

* Add ASR CTC Language finetuning notebook

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add to documentation

Signed-off-by: smajumdar <titu1994@gmail.com>

* Improve documentation

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct name of the dataset

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Correct colab link to notebook (NVIDIA#2366)

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* sgdqa update data directories for testing (NVIDIA#2323)

* sgdqa update data directories for testing

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix syntax

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* check if data dir exists

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adding pretrained model

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Added documentation for export() (NVIDIA#2330)

* Added export document

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Addressed review comments

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update Citrinet model card info (NVIDIA#2369)

* Update model card info

Signed-off-by: smajumdar <titu1994@gmail.com>

* Cleanup Docs

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* [NMT] Model Parallel Megatron Encoders (NVIDIA#2238)

* add megatron encoder

Signed-off-by: ericharper <complex451@gmail.com>

* added megatron to get_nmt_tokenizer

Signed-off-by: ericharper <complex451@gmail.com>

* add vocab_size and hidden_size to megatron bert

Signed-off-by: ericharper <complex451@gmail.com>

* add megatron encoder module

Signed-off-by: ericharper <complex451@gmail.com>

* fixed horrible typo

Signed-off-by: ericharper <complex451@gmail.com>

* fix typo and add default

Signed-off-by: ericharper <complex451@gmail.com>

* updating nlp overrides for mp nmt

Signed-off-by: ericharper <complex451@gmail.com>

* move some logic back to nlpmodel from overrides

Signed-off-by: ericharper <complex451@gmail.com>

* add checkpoint_file property

Signed-off-by: ericharper <complex451@gmail.com>

* fix property

Signed-off-by: ericharper <complex451@gmail.com>

* num_tokentypes=0

Signed-off-by: ericharper <complex451@gmail.com>

* typo

Signed-off-by: ericharper <complex451@gmail.com>

* typo

Signed-off-by: ericharper <complex451@gmail.com>

* find_unused_parameters=True

Signed-off-by: ericharper <complex451@gmail.com>

* typo

Signed-off-by: ericharper <complex451@gmail.com>

* style

Signed-off-by: ericharper <complex451@gmail.com>

* get instead of pop

Signed-off-by: ericharper <complex451@gmail.com>

* remove token type ids from megatron input example

Signed-off-by: ericharper <complex451@gmail.com>

* pop vocab_size

Signed-off-by: ericharper <complex451@gmail.com>

* fix checkpointing for model parallel

Signed-off-by: ericharper <complex451@gmail.com>

* fix bug in non model parallel

Signed-off-by: ericharper <complex451@gmail.com>

* convert cfg.trainer to dict

Signed-off-by: ericharper <complex451@gmail.com>

* make num_tokentypes configurable for nmt

Signed-off-by: ericharper <complex451@gmail.com>

* update checkpoint_file when using named megatron model in nemo

Signed-off-by: ericharper <complex451@gmail.com>

* make vocab_file configurable

Signed-off-by: ericharper <complex451@gmail.com>

* dataclass can't have mutable default

Signed-off-by: ericharper <complex451@gmail.com>

* style

Signed-off-by: ericharper <complex451@gmail.com>

* unused imports

Signed-off-by: ericharper <complex451@gmail.com>

* revert input example

Signed-off-by: ericharper <complex451@gmail.com>

* check that checkpoint version is not None

Signed-off-by: ericharper <complex451@gmail.com>

* add mp jenkins test

Signed-off-by: ericharper <complex451@gmail.com>

* update docstring

Signed-off-by: ericharper <complex451@gmail.com>

* add docs for pretrained encoders with nemo nmt

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Add notebook with recommendations for 8 kHz speech (NVIDIA#2326)

* Added a notebook with best practices for telephony speech

* Added datasets detaiils

* Added training recommendations

* Emptied out cells with results

* Added tutorial to docs

Signed-off-by: jbalam <jbalam@nvidia.com>

* Addressed review comments

Signed-off-by: jbalam <jbalam@nvidia.com>

* Added a line to note original sampling rate of an4

Signed-off-by: jbalam <jbalam@nvidia.com>

* Made changes suggested in review

Signed-off-by: jbalam <jbalam@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Add FastEmit support for RNNT Losses (NVIDIA#2374)

* Temp commit

Signed-off-by: smajumdar <titu1994@gmail.com>

* Initial code for fastemit forward pass

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct return reg value

Signed-off-by: smajumdar <titu1994@gmail.com>

* Initial cpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Try gpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Try gpu impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct few impl

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update fastemit scaling

Signed-off-by: smajumdar <titu1994@gmail.com>

* Cleanup fastemit

Signed-off-by: smajumdar <titu1994@gmail.com>

* Finalize FastEmit regularization PR

Signed-off-by: smajumdar <titu1994@gmail.com>

* Refactor code to support fastemit regularization

Signed-off-by: smajumdar <titu1994@gmail.com>

Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* byt5 unicode implementation, first cut

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* add bytelevel tokenizer

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update styling

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* avoid circular import

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* fix bugs in hifigan code (NVIDIA#2392)

Signed-off-by: Oktai Tatanov <oktai.tatanov@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update setup.py (NVIDIA#2394)

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* Update bytelevel_tokenizer.py

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* typo

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* missed one

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* bug fixes

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* style fix

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* bytelevelprocessor is now generic.

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* style fix

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* update checkpointing (NVIDIA#2396)

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* style

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* woops, didnt merge jenkinsfile the right way

* add newline

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* undo changes to enja processor

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* processor selection decision fix

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

* newline fix

Signed-off-by: mchrzanowski <mchrzanowski@nvidia.com>

Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <grinchuk.alexey@gmail.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Jason <jasoli@nvidia.com>
Co-authored-by: mchrzanowski <mchrzanowski@nvidia.com>
Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
Co-authored-by: Jagadeesh Balam <4916480+jbalam-nv@users.noreply.github.com>
Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
Co-authored-by: Oktai Tatanov <oktai.tatanov@gmail.com>
Co-authored-by: root <root@dgx0026.nsv.rno1.nvmetal.net>
Co-authored-by: root <root@dgx0079.nsv.rno1.nvmetal.net>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Minor Fix

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Minor Fixes

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add TextNormalizationTestDataset and testing/evaluation code

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add TextNormalizationTaggerDataset and training code for tagger

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Restore from local nemo ckpts

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add TextNormalizationDecoderDataset

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add interactive mode for neural_text_normalization_test.py

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add options to do training or not for tagger/decoder

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Renamed

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Implemented setup dataloader for decoder

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Implemented training and validation for decoder

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Data augmentation for decoder training

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Config change

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* add blossom-ci.yml (NVIDIA#2401)

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Merge r1.1 bugfixes into main (NVIDIA#2407)

* Update notebook branch and Jenkinsfile for 1.1.0 testing (NVIDIA#2378)

* update branch

Signed-off-by: ericharper <complex451@gmail.com>

* update jenkinsfile

Signed-off-by: ericharper <complex451@gmail.com>

* [BUGFIX] NMT Multi-node was incorrectly computing num_replicas (NVIDIA#2380)

* fix property when not using model parallel

Signed-off-by: ericharper <complex451@gmail.com>

* fix property when not using model parallel

Signed-off-by: ericharper <complex451@gmail.com>

* add debug statement

Signed-off-by: ericharper <complex451@gmail.com>

* add debug statement

Signed-off-by: ericharper <complex451@gmail.com>

* instantiate with NLPDDPPlugin with num_nodes from trainer config

Signed-off-by: ericharper <complex451@gmail.com>

* Update ASR scripts for tokenizer building and tarred dataset building (NVIDIA#2381)

* Update ASR scripts for tokenizer building and tarred dataset building

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update container

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add STT Zh Citrinet 1024 Gamma 0.25 model

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update notebook (NVIDIA#2391)

Signed-off-by: smajumdar <titu1994@gmail.com>

* ASR Notebooks fix for 1.1.0 (NVIDIA#2395)

* nb fix for spring clean

Signed-off-by: fayejf <fayejf07@gmail.com>

* remove outdated instruction

Signed-off-by: fayejf <fayejf07@gmail.com>

* Mean normalization (NVIDIA#2397)

* norm embeddings

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* move to utils

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Bugfix adaptive spec augment time masking (NVIDIA#2398)

* bugfix adaptive spec augment

Signed-off-by: smajumdar <titu1994@gmail.com>

* Revert freq mask guard

Signed-off-by: smajumdar <titu1994@gmail.com>

* Revert freq mask guard

Signed-off-by: smajumdar <titu1994@gmail.com>

* Remove static time width clamping

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct typos and issues with notebooks (NVIDIA#2402)

* Fix Primer notebook

Signed-off-by: smajumdar <titu1994@gmail.com>

* Typo

Signed-off-by: smajumdar <titu1994@gmail.com>

* remove accelerator=DDP in tutorial notebooks to avoid errors. (NVIDIA#2403)

Signed-off-by: Hoo Chang Shin <hshin@nvidia.com>

Co-authored-by: Hoo Chang Shin <hshin@nvidia.com>

* style

Signed-off-by: ericharper <complex451@gmail.com>

* update jenkins branch

Signed-off-by: ericharper <complex451@gmail.com>

* update notebook branch to main

Signed-off-by: ericharper <complex451@gmail.com>

Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: khcs <khcs@users.noreply.github.com>
Co-authored-by: Hoo Chang Shin <hshin@nvidia.com>
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Remove unused imports

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add initial doc for text_normalization

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Fixed imports warnings

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Minor Fix

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Renamed

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Allowed duplex modes

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Minor Fix

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add docs for duplex_text_normalization_train and duplex_text_normalization_test

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* docstrings for model codes + minor fix

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add more comments and doc strings

Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add doc for datasets + Use time.perf_counter()
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add code for preprocessing Google TN data
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add more docs and comments + Minor Fixes
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add more licenses + Fixed comments + Minors
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Moved evaluation logic to DuplexTextNormalizationModel
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add logging errors
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Updated validation code of tagger + Minors
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Also write tag preds to log file
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add data augmentation for tagger dataset
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Added experimental decorators
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Updated docs
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Updated duplex_tn_config.yaml
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Compute token precision of tagger using NeMo metrics
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Fixed saving issue when using ddp accelerator
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Refactoring
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Add option to keep punctuations in TextNormalizationTestDataset
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Changes to input preprocessing + decoder's postprocessing
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Fixed styles + Add references
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

* Renamed examples/nlp/duplex_text_normalization/utils.py to helpers.py
Signed-off-by: Tuan Lai <tuanl@nvidia.com>

Co-authored-by: Jagadeesh Balam <4916480+jbalam-nv@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
Co-authored-by: Oktai Tatanov <oktai.tatanov@gmail.com>
Co-authored-by: Jason <jasoli@nvidia.com>
Co-authored-by: Mike Chrzanowski <mike.chrzanowski0@gmail.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <grinchuk.alexey@gmail.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: mchrzanowski <mchrzanowski@nvidia.com>
Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
Co-authored-by: root <root@dgx0026.nsv.rno1.nvmetal.net>
Co-authored-by: root <root@dgx0079.nsv.rno1.nvmetal.net>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: khcs <khcs@users.noreply.github.com>
Co-authored-by: Hoo Chang Shin <hshin@nvidia.com>
Signed-off-by: Paarth Neekhara <paarth.n@gmail.com>
@blisc blisc deleted the byt5 branch January 11, 2022 16:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.