Update EncDecClassificationDatasetConfig Dataclass #1815

Merged
merged 1 commit into from
Feb 26, 2021

Conversation

blisc (Collaborator) commented Feb 26, 2021

No description provided.

Signed-off-by: Jason <jasoli@nvidia.com>
okuchaiev changed the base branch from main to r1.0.0rc1 February 26, 2021 23:07
okuchaiev changed the base branch from r1.0.0rc1 to main February 26, 2021 23:07
okuchaiev merged commit 4d6cad6 into NVIDIA:main Feb 26, 2021
blisc deleted the bug_fix branch February 26, 2021 23:11

lgtm-com bot commented Feb 26, 2021

This pull request introduces 4 alerts when merging f0e045e into 0f9a772 - view on LGTM.com

new alerts:

  • 4 for Unused import

redoctopus pushed a commit that referenced this pull request Mar 11, 2021
* initial WIP of fs2

Signed-off-by: Jason <jasoli@nvidia.com>

* segmentation tutorial dir fix (#1765)

* fix dir

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* dir fix for colab

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* B4 leftovers (#1766)

* Megatron fixes: lazy init moved back to module for inference to work (#1750)

* Megatron fixes: lazy init moved back to module, Torch version bumped in Docker for ONNX

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Fixed indent

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Fixed checkpoint-dependent attr

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Format fix, extracted function

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Rolling back container version; Fixing hook reset

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Disabled ONNX unit test, kept Megatron forward test

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Restored lazy init calls from setup()

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Style fix

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* refactor lazy init

Signed-off-by: ericharper <complex451@gmail.com>

* style

Signed-off-by: ericharper <complex451@gmail.com>

Co-authored-by: ericharper <complex451@gmail.com>

* Dev deps cnt (#1732)

* added deps on new versions of packages

Signed-off-by: Tomasz Kornuta <tkornuta@nvidia.com>

* bumped version of EFF to 0.2.6, added nvidia-pypi to setup reqs

Signed-off-by: Tomasz Kornuta <tkornuta@nvidia.com>

* Using setup.py style fix to fix lack of space style in setup.py

Signed-off-by: Tomasz Kornuta <tkornuta@nvidia.com>

* removed graph surgeon

Signed-off-by: Tomasz Kornuta <tkornuta@nvidia.com>

* pinning version of webdataset to 0.1.40

Signed-off-by: Tomasz Kornuta <tkornuta@nvidia.com>

* Cleaned up unused exports

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Removing extra requirements

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

Co-authored-by: ericharper <complex451@gmail.com>
Co-authored-by: Tomasz Kornuta <56979727+tkornuta-nvidia@users.noreply.github.com>

* update Dockerfile

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>

* add new length regulator

Signed-off-by: Jason <jasoli@nvidia.com>

* add new length regulator

Signed-off-by: Jason <jasoli@nvidia.com>

* Fix Primer notebook version and typo (#1773)

Signed-off-by: smajumdar <titu1994@gmail.com>

* use existing modules

Signed-off-by: Jason <jasoli@nvidia.com>

* use old modules

Signed-off-by: Jason <jasoli@nvidia.com>

* bug fixes

Signed-off-by: Jason <jasoli@nvidia.com>

* Tarred Datasets for Monolingual Corpora (#1758)

* Initial commit for monolingual tarred dataset for NMT

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Add coverage to BPE

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Initial working commit of monolingual tarred dataset

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Return beam search results when tgt is None in model forward

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Code formatting fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Added parallel dataset translation, detokenization and unused import fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* More style and unused import fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Allow setting topk value from CLI args

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Code formatting fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Refactor creating monolingual and parallel datasets

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Add batch translate function to NMT model, refactor dpp translate and monolingual webdataset fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>

* durs wip

Signed-off-by: Jason <jasoli@nvidia.com>

* switch from spec to audio

Signed-off-by: Jason <jasoli@nvidia.com>

* Ja Source Language Preprocessing (#1781)

* japanese preprocessing

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* removing m2m blob

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* fix SentencePieceTokenizer method usage

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* kwarg messup

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* re-order so that ja/zh/else is consistent. switch \' -> \"

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* fixing style

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* register sentencepiece_model so that it gets included in .nemo file when saved

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* remove commented out line

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

Co-authored-by: Mike Chrzanowski <mchrzanowski@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>

* Update notebooks to RC1 (#1782)

* update model primer tutorial

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update all notebooks to RC1

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update all notebooks to RC1 + README.rst

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update docker instructions

Signed-off-by: smajumdar <titu1994@gmail.com>

* move the Q/DQ position for better fusion in TRT (#1783)

Signed-off-by: Vincent Huang <vincenth@nvidia.com>

* audio tb

Signed-off-by: Jason <jasoli@nvidia.com>

* didn't specify type of argument (#1785)

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

Co-authored-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* Removing attach_onnx_to_onnx (#1790)

* Removing attach_onnx_to_onnx

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Removing onnx concatenation references

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* handle aux sentencepiece tokenizer

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>

* Notebook fix and modified some scripts (#1793)

* Notebook fix and modified some scripts

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* added hi-mia script from earlier nemo 0.x versions

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* add infer script; add prosody info to 2s; switch to log_dur

Signed-off-by: Jason <jasoli@nvidia.com>

* checkpoint name fix (#1798)

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* add masking

Signed-off-by: Jason <jasoli@nvidia.com>

* Refactor of tokenization and detokenization within the NMT model (#1789)

* Cleanup tokenization and detokenization

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes to merge moses and chinese/japanese call formats

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style fixes and remove methods not part of rc

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix circular imports

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix docstring

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove unused imports

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove unused import

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove sentencepiece tokenizer from within model class

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Update nemo/collections/common/tokenizers/japanese_tokenizers.py

Co-authored-by: Mike Chrzanowski <mike.chrzanowski0@gmail.com>

* Add docstring for JapaneseTokenizer and rename variable

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Co-authored-by: Mike Chrzanowski <mike.chrzanowski0@gmail.com>

* transpose

Signed-off-by: Jason <jasoli@nvidia.com>

* bug

Signed-off-by: Jason <jasoli@nvidia.com>

* add use own predictions; fix mask

Signed-off-by: Jason <jasoli@nvidia.com>

* Add Transcription script for all ASR models  (#1786)

* Add CTC transcription scripts

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add speech transcription script

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add speech transcription script

Signed-off-by: smajumdar <titu1994@gmail.com>

* Revert old changes

Signed-off-by: smajumdar <titu1994@gmail.com>

* Revert old changes

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add jenkins test to run transcribe_speech.py

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add missing apostrophe

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct duplicate stage name

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update jenkins

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update jenkins

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update jenkins

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update jenkins

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update jenkins

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update jenkins

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update jenkins

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update jenkins

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update jenkins

Signed-off-by: smajumdar <titu1994@gmail.com>

* temp remove gpu unittests

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update Jenkinsfile

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update Jenkinsfile

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update Jenkinsfile

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update Jenkinsfile

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update Jenkinsfile

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update Jenkinsfile

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update Jenkinsfile

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update Jenkinsfile

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update Jenkinsfile

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update Jenkinsfile

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update Jenkinsfile

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update Jenkinsfile

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update Jenkinsfile

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update Jenkinsfile

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update Jenkinsfile

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update Jenkinsfile

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update Jenkinsfile

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update Jenkinsfile

Signed-off-by: smajumdar <titu1994@gmail.com>

* Give up on Jenkinsfile

Signed-off-by: smajumdar <titu1994@gmail.com>

* add new lightning trainer properties

Signed-off-by: Jason <jasoli@nvidia.com>

* share the same pre and post processing pipelines for Ja & En (#1801)

* share the same pre and post processing pipelines for Ja & En

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* reorder for ordering

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* change comment for specificity

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* renamed file

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* styling fix

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* undo styling fix.

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* unnecessary multiple variable instantiation

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

Co-authored-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* Bio meg dir update (#1796)

* update dir name

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* fix name

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* proper use

Signed-off-by: Jason <jasoli@nvidia.com>

* Fix megatron vocab file (#1803)

* use user specified vocab_file with megatron

Signed-off-by: ericharper <complex451@gmail.com>

* updating all examples

Signed-off-by: ericharper <complex451@gmail.com>

* Adding option to always create .nemo file when writing checkpoint (#1794)

* Adding option to always create .nemo file when writing checkpoint

Signed-off-by: rprenger <rprenger@nvidia.com>

* Fixing an issue where save_best_model=True would have made the trainer start from the best model at every checkpoint save instead of from the latest model

Signed-off-by: rprenger <rprenger@nvidia.com>

* Caching the path of the best model so we don't re-generate .nemo files when they haven't changed

Signed-off-by: rprenger <rprenger@nvidia.com>

* style fix

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>

Co-authored-by: rprenger <rprenger@nvidia.com>
Co-authored-by: Jason <jasoli@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>

* Hotfix for en/ ja preprocessing (#1804)

* share the same pre and post processing pipelines for Ja & En

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* reorder for ordering

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* change comment for specificity

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* renamed file

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* styling fix

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* undo styling fix.

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* unnecessary multiple variable instantiation

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* hotfix

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* share the same pre and post processing pipelines for Ja & En

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* reorder for ordering

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* change comment for specificity

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* renamed file

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* styling fix

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* undo styling fix.

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* unnecessary multiple variable instantiation

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* hotfix

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* style fix again

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

Co-authored-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* update Jenkins file

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>

* cfg.vocab_file was being updated to the .nemo location instead of the actual location (#1808)

Signed-off-by: ericharper <complex451@gmail.com>

* add proper ifs

Signed-off-by: Jason <jasoli@nvidia.com>

* set max seq length for inference (#1809)

* update inference

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* make params explicit

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* Applying PR 1794 to  r1.0.0rc1 (#1812)

* apply ryan's PR to r1.0.0rc1

* Update exp_manager.py

add new line

Co-authored-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* diarization tutorial (#1814)

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* dataclass (#1815)

Signed-off-by: Jason <jasoli@nvidia.com>

* log pitch

Signed-off-by: Jason <jasoli@nvidia.com>

* Run all tests in RC

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>

* check if attr exist (#1817)

Signed-off-by: ericharper <complex451@gmail.com>

* Remove CTC parts from RNNT transcribe (#1816)

Signed-off-by: smajumdar <titu1994@gmail.com>

* Rc1 fix bert lm model 2 (#1818)

* check if attr exist

Signed-off-by: ericharper <complex451@gmail.com>

* check if cfg.tokenizer is None

Signed-off-by: ericharper <complex451@gmail.com>

* Fix language filtering (#1791)

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

* HifiGAN finetuning on synthetic mels (#1780)

* switched LR annealing to cosine, fixed type checks

Signed-off-by: Felix Kreuk <felixkreuk@gmail.com>

* added fine-tune dataset, added fine-tuning to hifigan training

Signed-off-by: Felix Kreuk <felixkreuk@gmail.com>

* added bias denoising

Signed-off-by: Felix Kreuk <felixkreuk@gmail.com>

* added yaml config specifications

Signed-off-by: Felix Kreuk <felixkreuk@gmail.com>

* fixed bot checks

Signed-off-by: Felix Kreuk <felixkreuk@gmail.com>

* max_steps exported to yaml

Signed-off-by: Felix Kreuk <felixkreuk@gmail.com>

* switch to soundfile for audio loading, set max_steps instead of max_epochs in Trainer

Signed-off-by: Felix Kreuk <felixkreuk@gmail.com>

Co-authored-by: Jason <jasoli@nvidia.com>

* Update exp_manager and callbacks for lightning 1.2.0 (#1774)

* update exp_manager and callbacks for lightning 1.2.0

Signed-off-by: Jason <jasoli@nvidia.com>

* add back filepath; remove global_rank and local_rank from ASR models

Signed-off-by: Jason <jasoli@nvidia.com>

* remove more global_rank local_rank

Signed-off-by: Jason <jasoli@nvidia.com>

* more bug fixes

Signed-off-by: Jason <jasoli@nvidia.com>

* del not pop

Signed-off-by: Jason <jasoli@nvidia.com>

* add open_dict

Signed-off-by: Jason <jasoli@nvidia.com>

* add properties

Signed-off-by: Jason <jasoli@nvidia.com>

* remove test for now

Signed-off-by: Jason <jasoli@nvidia.com>

* Add SPE tokenizer.vocab to registered archive (#1821)

* Add SPE tokenizer.vocab to registered archives

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add SPE tokenizer.vocab to registered archives

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add support for computing CTC / RNNT alignments (#1772)

* Add logprob calculation support for RNNT (without batching)

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add batched support for RNNT alignments

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add docstring

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add beam=1 decoding support for beam search logit preservation

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update greedy alignment

Signed-off-by: smajumdar <titu1994@gmail.com>

* Alignments with beam search working

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add docstring about computing alignments

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add full alignment calculation support for ASR Models

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add hypothesis output tests

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct documentation

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update beam search doc

Signed-off-by: smajumdar <titu1994@gmail.com>

* Remove old code

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update configs

Signed-off-by: smajumdar <titu1994@gmail.com>

* Fix for variable names in tarred dataset creation (#1827)

* Fix for variable names in tarred dataset creation

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Another bug in filename variable

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Adding global and local rank that was removed for some reason

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove local/global rank again

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

* Patch ASR Notebooks (#1831)

* Patch ASR Notebooks

Signed-off-by: smajumdar <titu1994@gmail.com>

* Patch ASR Notebooks

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add SPE support for huge dataset corpus (#1822)

* Add support for extremely large corpus fitting of SentencePiece

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add support for extremely large corpus fitting of SentencePiece

Signed-off-by: smajumdar <titu1994@gmail.com>

* Remove log message

Signed-off-by: smajumdar <titu1994@gmail.com>

* Remove log message

Signed-off-by: smajumdar <titu1994@gmail.com>

* Fixing NMT DDP override that no longer works with PTL 1.2  (#1829)

* add world size attribute back to constructor

Signed-off-by: ericharper <complex451@gmail.com>

* replace r1.0.0rc1 with main in Jenkinsfile

Signed-off-by: ericharper <complex451@gmail.com>

* set find_unused_parameters to True by default for NLP models

Signed-off-by: ericharper <complex451@gmail.com>

* set find_unused_parameters to True by default for NLP models

Signed-off-by: ericharper <complex451@gmail.com>

* set find_unused_parameters to True by default for NLP models

Signed-off-by: ericharper <complex451@gmail.com>

* temporarily remove model parallel jenkins test

Signed-off-by: ericharper <complex451@gmail.com>

* check hasattr first

Signed-off-by: ericharper <complex451@gmail.com>

* check hasattr first

Signed-off-by: ericharper <complex451@gmail.com>

* check hasattr first

Signed-off-by: ericharper <complex451@gmail.com>

* overriding ddp plugin

Signed-off-by: ericharper <complex451@gmail.com>

* check if trainer is None

Signed-off-by: ericharper <complex451@gmail.com>

* add find_unused_parameters to accelerator attribute instead of connector

Signed-off-by: ericharper <complex451@gmail.com>

* move override to .setup

Signed-off-by: ericharper <complex451@gmail.com>

* use self.trainer instead of self._trainer

Signed-off-by: ericharper <complex451@gmail.com>

* set find_unused for non NLPModel

Signed-off-by: ericharper <complex451@gmail.com>

* style

Signed-off-by: ericharper <complex451@gmail.com>

* remove unused import

Signed-off-by: ericharper <complex451@gmail.com>

* Fix TTS Notebook bugs (#1837)

* fix notebooks

Signed-off-by: Jason <jasoli@nvidia.com>

* reqs

Signed-off-by: Jason <jasoli@nvidia.com>

* wandb try catch

Signed-off-by: Jason <jasoli@nvidia.com>

* typo fixes (#1838)

* punct tutorial fix

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* typos fixed

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* [WIP] Refactoring translation routines (#1805)

* using batch_translate in eval_step

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>

* refactor + some fixes

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>

* Fix chinese, japanese tokenizer imports breaking asr install

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Refactor language specific tokenizers to implement tokenize,detokenize and normalize

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Bug fix in determining target processor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Few more fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Detokenization fix for EnJa

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove comments for finished work

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix enja decoding (#1820)

* apply ryan's PR to r1.0.0rc1

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* changes

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* next

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* * undo detokenization change
* update all eval steps to tokenize ja/en correctly.

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* bug

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* undo naming change

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* get types right. annoying.

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* next changes.

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* newline

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* change

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* next round

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* next round

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* final fix?

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* switching en<>ja pipeline to integers. which prevents coding issues.

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* unnecessary logging import

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* undo change

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* remove unneeded detokenization file now that the sentencepieceprocessor
does the detokenization

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* Update mt_enc_dec_model.py

remove line

* more succinct

* remove sentencepiecedetokenizer

* remove ability to not specify a sentencepiecetokenizer path

* comment

* undo ptl 1.2.0 changes, which break draco training

Co-authored-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* fix style

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>

* Move back to PTL 1.2

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* undo changes to callbacks

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove global rank from model constructor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix enja decoding (round 2) (#1835)

* apply ryan's PR to r1.0.0rc1

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* changes

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* next

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* * undo detokenization change
* update all eval steps to tokenize ja/en correctly.

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* bug

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* undo naming change

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* get types right. annoying.

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* next changes.

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* newline

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* change

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* next round

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* next round

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* final fix?

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* switching en<>ja pipeline to integers. which prevents coding issues.

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* unnecessary logging import

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* undo change

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* remove unneeded detokenization file now that the sentencepieceprocessor
does the detokenization

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* Update mt_enc_dec_model.py

remove line

* more succinct

* remove sentencepiecedetokenizer

* remove ability to not specify a sentencepiecetokenizer path

* comment

* undo ptl 1.2.0 changes, which break draco training

* removing sentencepiece usage for en<>ja

* few more

* fix style

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* remove unused re import

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* normalize only when lang is en

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* styling

Co-authored-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* Undo lightning change

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove unused imports

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Co-authored-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Mike Chrzanowski <mike.chrzanowski0@gmail.com>
Co-authored-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* Fix text norm tutorial (#1836)

* fix nlp typos in notebooks

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix tutorial for jupyter notebook

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix path name (#1840)

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* Limit the maximum length of subwords generated from corpus (#1842)

* Add support for limiting length of subwords

Signed-off-by: smajumdar <titu1994@gmail.com>

* Add support for limiting length of subwords

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update docstring

Signed-off-by: smajumdar <titu1994@gmail.com>

* Update docstring

Signed-off-by: smajumdar <titu1994@gmail.com>

* grammar fix (#1843)

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* CI Fixes for Lightning 1.2.1 (#1839)

* updates

Signed-off-by: Jason <jasoli@nvidia.com>

* add back pleasefixme

Signed-off-by: Jason <jasoli@nvidia.com>

* rmtree

Signed-off-by: Jason <jasoli@nvidia.com>

* add cleanup_local_folder fixtures instead of rmtree

Signed-off-by: Jason <jasoli@nvidia.com>

* bugfix?

Signed-off-by: Jason <jasoli@nvidia.com>

* add back pleasefixme

Signed-off-by: Jason <jasoli@nvidia.com>

* typo

Signed-off-by: Jason <jasoli@nvidia.com>

* set melGAN to find_unused = True

Signed-off-by: Jason <jasoli@nvidia.com>

* force deletion

Signed-off-by: Jason <jasoli@nvidia.com>

* fix 'DATA_DIR not found'. (#1846)

Signed-off-by: Hoo Chang Shin <hshin@nvidia.com>

Co-authored-by: Hoo Chang Shin <hshin@nvidia.com>

* bug removed: onnx file was not getting added to the tarfile (.enemo) (#1832)

* bug removed:
**What was wrong**: renaming the onnx file and adding it to the tar was not working
**How solved**: make a temp copy of the file, rename it, add it to the tar (.enemo), and clean up the extra file

Signed-off-by: supatel <supatel@gitlab-master.nvidia.com>

* format fixed

Signed-off-by: supatel <supatel@gitlab-master.nvidia.com>

* code formatting with black

Signed-off-by: supatel <supatel@gitlab-master.nvidia.com>

* style

Signed-off-by: Jason <jasoli@nvidia.com>

Co-authored-by: supatel <supatel@gitlab-master.nvidia.com>
Co-authored-by: Jason <jasoli@nvidia.com>

* fix output path (#1845)

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix asr notebooks (#1847)

Signed-off-by: fayejf <fayejf07@gmail.com>

* Patch SPE tokenizer not being available in older ASR Checkpoints (#1848)

Signed-off-by: smajumdar <titu1994@gmail.com>

* bumping version to 1.0.0rc2

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>

* Try TTS Updates Again (#1849)

* set find_unused

Signed-off-by: Jason <jasoli@nvidia.com>

* fix t2 header

Signed-off-by: Jason <jasoli@nvidia.com>

* add header

Signed-off-by: Jason <jasoli@nvidia.com>

* more fixes

Signed-off-by: Jason <jasoli@nvidia.com>

* update headers

Signed-off-by: Jason <jasoli@nvidia.com>

* headers

Signed-off-by: Jason <jasoli@nvidia.com>

* undo tacotron2 change

Signed-off-by: Jason <jasoli@nvidia.com>

* Cleanup save/restore (#1851)

* Cleanup save/restore

* Remove EFF save/restore routes
* Once we can take EFF dependency we will use EFF.Archive directly

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>

* fix copyright headers

Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>

* Freeze modules during transcribe to prevent gradient accumulation during loop (#1853)

Signed-off-by: smajumdar <titu1994@gmail.com>

* ASR with Speaker Diarization notebook (#1850)

* ASR with Speaker Diarization notebook

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* changed format to speaker first

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* wording corrections

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Update README.rst

README is pointing to a container that hasn't been released yet.

* Fix qa tutorial (#1860)

* fix output path

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* fix

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* remote hard fix path (#1862)

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* EnJa tokenize output format fix (#1863)

* tokenization fix

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* better naming of output variable

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* revert changes and fix enja tokenize func

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

* revert chinese changes. break 1-liner into 2 in enja

Signed-off-by: Mike Chrzanowski <mchrzanowski@nvidia.com>

Co-authored-by: Mike Chrzanowski <mchrzanowski@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

* sync val metrics (#1861)

Signed-off-by: ericharper <complex451@gmail.com>

Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

* RC1 NeMo Core Docs Update (#1858)

* update docs

Signed-off-by: ericharper <complex451@gmail.com>

* update docs

Signed-off-by: ericharper <complex451@gmail.com>

* update

Signed-off-by: ericharper <complex451@gmail.com>

* update

Signed-off-by: ericharper <complex451@gmail.com>

* update

Signed-off-by: ericharper <complex451@gmail.com>

* update

Signed-off-by: ericharper <complex451@gmail.com>

* update

Signed-off-by: ericharper <complex451@gmail.com>

* update

Signed-off-by: ericharper <complex451@gmail.com>

* update

Signed-off-by: ericharper <complex451@gmail.com>

* update

Signed-off-by: ericharper <complex451@gmail.com>

* update

Signed-off-by: ericharper <complex451@gmail.com>

* update

Signed-off-by: ericharper <complex451@gmail.com>

* switch to stft_patch (#1864)

Signed-off-by: Jason <jasoli@nvidia.com>

Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

* Update CI container to 21.02 (#1865)

* Update CI container to 21.02

Signed-off-by: smajumdar <titu1994@gmail.com>

* Correct ffmpeg install

Signed-off-by: smajumdar <titu1994@gmail.com>

* update squad inference to use correct gpu if list of gpus is passed

Signed-off-by: ericharper <complex451@gmail.com>

* trainer.test seems to be working properly with ddp now

Signed-off-by: ericharper <complex451@gmail.com>

Co-authored-by: ericharper <complex451@gmail.com>

* Some renaming.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* TalkNet 1.x draft.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Three pipelines complete.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Fix some comments.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Fix style issues.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* TTS Notebook and PR issues.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* TalkNet style issues.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* TalkNet doc.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* Small fix.

Signed-off-by: Stanislav Beliaev <stasbelyaev96@gmail.com>

* round

Signed-off-by: Jason <jasoli@nvidia.com>

* pin lightning

Signed-off-by: Jason <jasoli@nvidia.com>

* print more torch stuff

Signed-off-by: Jason <jasoli@nvidia.com>

* wip of adding talknet durations

Signed-off-by: Jason <jasoli@nvidia.com>

* wip

Signed-off-by: Jason <jasoli@nvidia.com>

* training working

Signed-off-by: Jason <jasoli@nvidia.com>

Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
Co-authored-by: ericharper <complex451@gmail.com>
Co-authored-by: Tomasz Kornuta <56979727+tkornuta-nvidia@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@nvidia.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Mike Chrzanowski <mike.chrzanowski0@gmail.com>
Co-authored-by: Mike Chrzanowski <mchrzanowski@nvidia.com>
Co-authored-by: Xiaodong (Vincent) Huang <vincenth@nvidia.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Ryan Prenger <ryanprenger@baidu.com>
Co-authored-by: rprenger <rprenger@nvidia.com>
Co-authored-by: PeganovAnton <peganoff2@mail.ru>
Co-authored-by: Felix Kreuk <felixkreuk@gmail.com>
Co-authored-by: Yang Zhang <yzhang123@users.noreply.github.com>
Co-authored-by: khcs <khcs@users.noreply.github.com>
Co-authored-by: Hoo Chang Shin <hshin@nvidia.com>
Co-authored-by: SUNIL PATEL <snlpatel001213@hotmail.com>
Co-authored-by: supatel <supatel@gitlab-master.nvidia.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Stanislav Beliaev <stasbelyaev96@gmail.com>
mousebaiker pushed a commit to mousebaiker/NeMo that referenced this pull request Jul 8, 2021
Signed-off-by: Jason <jasoli@nvidia.com>
3 participants