T5 metrics fix (NVIDIA#7037)
* Fix race condition when executing with multi-node where some ranks do not wait for setup (NVIDIA#7016)

Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Added bool types to neural_types export (NVIDIA#7032)

Signed-off-by: tbartley94 <tbartley@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* rnnt and char utils (NVIDIA#6971)

* rnnt_ngram_merge

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* char level bug

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* fix tab text gen (NVIDIA#7022) (NVIDIA#7031)

Signed-off-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fixed kwargs for metric instance init

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fixed kwargs for metric instance init

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* removed kwargs

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Updated config desc

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* ASR Confidence update and tutorial (NVIDIA#6810)

* small fixes and tests

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* various fixes for the tutorial

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* tutorial added

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix for a little oops after rebasing

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix tests

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* unused import removed

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix review comments

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* deprecated parameters for greedy configs

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* move re-assigning to configs

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix comments 2

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix config tests

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix ece test (my env was bugged apparently)

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* renamings for confidence ensemble

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comments 3

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* return dropped tutorial

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* CI flips back and forth, increasing tolerance

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

---------

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* install_bs (NVIDIA#7019) (NVIDIA#7028)

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* fixes for spellmapper (NVIDIA#6994) (NVIDIA#7000)

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>
Co-authored-by: bene-ges <antonova_sasha@list.ru>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* added back the retro documents (NVIDIA#7033)

Signed-off-by: Yi Dong <yidong@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Remove pyyaml (NVIDIA#7052) (NVIDIA#7054)

Signed-off-by: smajumdar <titu1994@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* st standalone model (NVIDIA#6969)

* st standalone model

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fix

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* sacrebleu import fix, unused imports removed

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* import guard for nlp inside asr transformer bpe model

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql fixes

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comments answered

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* import ordering fix

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* yttm for asr removed

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* logging added

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* added inference and translate method

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* remove pos emb from state dict for old models (NVIDIA#7068)

* remove pos emb from state dict

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to nlp_model

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update comment

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix nmt test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix nmt test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix typo in ASR-TTS tutorial (NVIDIA#7049)

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fixed tutorial's name (NVIDIA#7047)

Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix documentation for Numba (NVIDIA#7065) (NVIDIA#7077)

* Fix documentation for Numba

* Update force float32 flag dynamically

* Update force float32 flag dynamically

* Fix nemo version

---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Update Frame-VAD doc and fix onnx export (NVIDIA#7076)

* update fvad doc

Signed-off-by: stevehuang52 <heh@nvidia.com>

* fix typo

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update fvad example

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update

Signed-off-by: stevehuang52 <heh@nvidia.com>

* fix onnx export

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update test

Signed-off-by: stevehuang52 <heh@nvidia.com>

* refactor

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update doc

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update

Signed-off-by: stevehuang52 <heh@nvidia.com>

---------

Signed-off-by: stevehuang52 <heh@nvidia.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* memmap worker arg (NVIDIA#7062)

* memmap worker arg

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <adithya.r@gmail.com>

* update

Signed-off-by: arendu <adithya.r@gmail.com>

---------

Signed-off-by: arendu <adithya.r@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix caching bug in causal convolutions for cache-aware ASR models (NVIDIA#7034) (NVIDIA#7082)

Co-authored-by: Vahid Noroozi <VahidooX@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fast Conformer global token fix (NVIDIA#7085)

* old way

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* remove extra

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* clean

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* clean

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* clean

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* fix

Signed-off-by: sam1373 <samuelkriman@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: sam1373 <samuelkriman@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Refined export_config (NVIDIA#7053) (NVIDIA#7066)

* Refined export_config
* Rolling back hierarchy change
---------

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* small Bugfix (NVIDIA#7081)

* small Bugfix (NVIDIA#7079)

* fix branch

Signed-off-by: fayejf <fayejf07@gmail.com>

* fix typo

Signed-off-by: fayejf <fayejf07@gmail.com>

* fix link

Signed-off-by: fayejf <fayejf07@gmail.com>

---------

Signed-off-by: fayejf <fayejf07@gmail.com>

* Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

* Update tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

---------

Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Added script to extract ASR CTC and RNNT models from ASR hybrid models (NVIDIA#7092)

* Added script to extract ctc and rnnt models from hybrid models

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updated hybrid extraction script for review request 1

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Updated hybrid convert script to remove --cuda flag

Signed-off-by: Daniel Egert <degert@nvidia.com>

---------

Signed-off-by: Daniel Egert <degert@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Adding docs and models for multiple lookahead cache-aware ASR (NVIDIA#7067) (NVIDIA#7094)

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* update TTS readme (NVIDIA#7088)

* update TTS readme

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix absolute path in path join call (NVIDIA#7099)

Signed-off-by: Jan Beckmann <king-jan1999@hotmail.de>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Disable distopt contiguous param buffer by default (NVIDIA#7095)

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* microphone demo (NVIDIA#7110)

Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [Fix] load_state_dict in nlp_model.py (NVIDIA#7086)

* Fix load_state_dict in nlp_model.py

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix plot function in vad_utils.py (NVIDIA#7113)

Fix plot function in vad_utils.py

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fixed small bug with NoisePerturbationWithNormalization (NVIDIA#7118)

Signed-off-by: Daniel Egert <degert@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix import guard checks (NVIDIA#7124)

Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Revert "Fix import guard checks (NVIDIA#7124)" (NVIDIA#7125)

This reverts commit a46e325.

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix import guard checks (NVIDIA#7126)

* Fix import guard checks

Signed-off-by: smajumdar <titu1994@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Add updated fc ctc and rnnt xxl models (NVIDIA#7128) (NVIDIA#7130)

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [TTS] Create EnCodec training recipe (NVIDIA#6852)

* [TTS] Create EnCodec training recipe

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Update encodec recipe

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Rename EnCodec to AudioCodec

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add EnCodec unit tests

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add copyright header to distributed.py

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Fix rank where torch.distributed may not be initialized yet and would not wait for tokenizer file caching (NVIDIA#7061)

Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Co-authored-by: David <amosalla@asu.edu>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* fix default attention size (NVIDIA#7141) (NVIDIA#7143)

Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* fix evaluator.py for various exceptions by ast (NVIDIA#7150)

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [TTS][ZH] add Chinese TTS recipes based on IPA symbol sets. (NVIDIA#6893)

* [TTS] add Chinese TTS recipe based on IPA.
* add new pinyin and ipa dictionaries with 36 finals.
* add yaml configs for 24-final pinyin and ipa.
* add copyright header
* add a directory level 24finals to discriminate from 36 finals.

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* unify configs into a single one and add detailed comments providing supported candidates.

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* choose 36-final IPA as default phoneme dict

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

---------

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [TTS] Add output audio format to preprocessing (NVIDIA#6889)

* [TTS] Add output audio format to preprocessing

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Add format validation

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Fix data tutorial

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* freeze (NVIDIA#7152)

Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* make sure any empty segments are removed (NVIDIA#7155)

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Update RIR generation scripts (NVIDIA#6547)

- fix: reduce room size if evaluation of params fails
- added randomized mic placement
- added diffuse noise generation
- added an option to specify the format and subtype for saved audio

Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* A quickstart speech enhancement tutorial (NVIDIA#6492)

A simple example of training a model for a speech enhancement task

Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* NFA subtitle file config - specify colors and vertical alignment (NVIDIA#7160)

* allow specifying colors of text in ASS subtitle file

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* specify vertical_alignment instead of marginv in ass_file_config

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

* add documentation of CTMFileConfig and ASSFileConfig to NFA README

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>

---------

Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Eagerly accumulate embedding grads into fp32 buffer (NVIDIA#6958) (NVIDIA#7153)

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* TE bug fix (NVIDIA#7027) (NVIDIA#7036)

Signed-off-by: Dmytro Pykhtar <dpykhtar@nvidia.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [TTS] Remove nested TTS configs (NVIDIA#7154)

* [TTS] Remove nested TTS configs

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Modify tutorial to support multiple sampling rates

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Clarify min_duration unit

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Default 22.05kHz highfreq to null

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Merge release r1.20.0 to main (NVIDIA#7167)

* update package info

Signed-off-by: ericharper <complex451@gmail.com>

* Add ASR with TTS Tutorial. Fix enhancer usage. (NVIDIA#6955)

* Add ASR with TTS Tutorial
* Fix enhancer usage

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* install_bs (NVIDIA#7019)

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* Fix typo and branch in tutorial (NVIDIA#7048)

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>

* fix syntax error introduced in PR-7079 (NVIDIA#7102)

* fix syntax error introduced in PR-7079

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fixes for pr review

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

---------

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix links for TN (NVIDIA#7117)

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update branch (NVIDIA#7135)

Signed-off-by: ericharper <complex451@gmail.com>

* Fixed main and merging this to r1.20 (NVIDIA#7127)

* Fixed main and merging this to r1.20

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Update vad_utils.py

Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

---------

Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>

* update branch

Signed-off-by: ericharper <complex451@gmail.com>

* fix version

Signed-off-by: ericharper <complex451@gmail.com>

* resolve conflict the other way

Signed-off-by: ericharper <complex451@gmail.com>

* keep both

Signed-off-by: ericharper <complex451@gmail.com>

* revert keep both

Signed-off-by: ericharper <complex451@gmail.com>

---------

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: bene-ges <antonova_sasha@list.ru>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Upgrade to pytorch lightning 2.0 (NVIDIA#6433)

* Upgrade pytorch lightning version in requirements

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Initial fixes for PTL2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add further fixes to support lightning 2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add replacements for replace_sampler_ddp, resume_from_checkpoint_fit_path and a few occurrences of validation_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace all occurrences of validation_epoch_end with on_validation_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace training_epoch_end, test_epoch_end with on_train_epoch_end and on_test_epoch_end respectively

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Change logger=None to logger=False in Trainer object

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove PTL2.0 deprecated Trainer args from TrainerConfig dataclass

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify trainer.precision check and other small edits

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace logger=None with logger=False in test_ptl_stateless_timer.py Trainer

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add default values for args to fix Attribute Error

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add the following modifications

1) Remove outputs arg from on_validation_epoch_end, on_test_epoch_end and make it an arg of the class
2) Replace resume_from_checkpoint with ckpt_path as needed
3) Explicitly add accelerator as 'CPU' in UTs being run on CPU

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs arg from on_validation_epoch_end, on_test_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs arg in on_validation_epoch_end in MultiBinaryAccuracy docstrings

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add val, test outputs as instance vars in PunctuationCapitalizationModel and TokenClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Replace trainer.fit_loop.max_steps with trainer.fit_loop.epoch_loop.max_steps in test_optimizers_schedulers.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Revert an extra space that was mistakenly added

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Use self.validation_step_outputs and self.test_step_outputs in test_ema.py for uniformity

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Use self.validation_step_outputs and self.test_step_outputs in test_ptl_stateless_timer.py and check_for_ranks.py for uniformity

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add self.validation_step_outputs.clear() and self.test_step_outputs.clear() wherever missing

Signed-off-by: Abhishree <abhishreetm@gmail.com>
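
(A minimal sketch of the PTL 2.0 pattern the bullets above describe — accumulate step outputs on the module and consume/clear them in the epoch-end hook. The model class and loss below are placeholders, not NeMo code:)

```python
import torch
import pytorch_lightning as pl


class ExampleModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # PTL 2.0 removed the `outputs` argument of the *_epoch_end hooks,
        # so step outputs are accumulated on the module itself.
        self.validation_step_outputs = []

    def validation_step(self, batch, batch_idx):
        loss = torch.tensor(0.0)  # placeholder for a real loss computation
        self.validation_step_outputs.append(loss)
        return loss

    def on_validation_epoch_end(self):
        # Replaces validation_epoch_end(self, outputs) from PTL 1.x.
        avg_loss = torch.stack(self.validation_step_outputs).mean()
        self.log("val_loss", avg_loss)
        self.validation_step_outputs.clear()  # free memory between epochs
```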

* Remove outputs arg from on_train_epoch_end

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs from on_validation_epoch_end in multi_binary_acc.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove output args from on_validation_epoch_end in the docstrings of some ASR files

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove output args from on_validation_epoch_end and clear memory from validation_step_outputs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add on_validation_epoch_end and remove outputs args for nlp models

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Append output of validation_step to validation_step_outputs in EncDecClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add the following changes

1) Index self.validation_step_outputs and self.test_step_outputs with dataloader_idx wherever needed
2) Initialize self.validation_step_outputs and self.test_step_outputs as empty lists and add support for multi dataloaders if they exist
3) Remove self.pre_configure_ddp from NLPDDPStrategy class as its removed in PTL 2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add default value dataloader_idx=0 for on_validation_batch_end() in megatron_base_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* TypeCast precision to str in attention.py and utils_funcs.py to avoid TypeError

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add if condition check for multiple dataloaders when appending to validation outputs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Separate validation pass to be used with both validation_step and test_step

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add if condition check for multiple dataloader while appending to test_step_outputs in punctuation_capitalization_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add condition check for multiple dataloaders based on type of trainer.val/test_dataloaders or self._validation/test_dl instead of len

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Comment Megatron T5 IA3 PP=2 in CI pipeline due to dataloader_iter issue with PTL 2.0

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify precision checks to account for 16-mixed and bf16-mixed

Signed-off-by: Abhishree <abhishreetm@gmail.com>
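
(A hedged sketch of the kind of precision check implied here, since PTL 2.0 can report precision as the strings '16-mixed' or 'bf16-mixed'; the helper name is illustrative, not the actual NeMo function:)

```python
def is_reduced_precision(precision) -> bool:
    # PTL 2.0 may report "16-mixed" / "bf16-mixed"; older setups used 16, "16", or "bf16".
    return str(precision) in ("16", "16-mixed", "bf16", "bf16-mixed")
```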

* Append output of validation/test_step to self.validation/test_step_outputs in CTCG2PModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify find_unused_parameters=True in g2p_heteronym model

1) Add find_unused_parameters=True for DDP strategy in g2p_heteronym_classification_train_and_evaluate.py
2) Remove args output in validation/test_step and add instance variables instead for heteronym_classification.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove outputs from on_test_epoch_end in DialogueGPTClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add validation/test outputs in sgdqa_model and modify dialogue_config.yaml

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add split arg self.test_step_outputs to TextClassificationModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add test_step_outputs to dialogue and text classification models

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Change condition check for multiple dataloaders:

1) Replace ds_item as list in dialogue_config.yaml
2) Check for len of val/test_dataloaders or validation/test_dl along with type check of list in sgdqa_model.py while appending outputs of validation/test_step
3) Check for len of _validation/test_dl for creating self.validation/test_step_outputs in ModelPT and punctuation_capitalization_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add additional condition for multi dataloaders

Check len(self.trainer.val/test_dataloaders) > 1 along with type(self.trainer.val/test_dataloaders) == list for multi dataloaders in validation/test_step

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add val step outputs and default val for dataloader_idx

1) Append validation_step output to self.validation_step_outputs in MultiLabelIntentSlotClassificationModel
2) Add default val for dataloader_idx for on_test_batch_start/end in TimingCallback
3) Add self.validation/test_step_outputs in BERTQAModel and remove outputs arg

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add val/test_step_outputs to S2SQAModel and GPTQAModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Edit JenkinsFile for bert_pretraining.py

Edit Jenkinsfile for this test to disable validation as a workaround for trainer.val_dataloader None error

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify precision to support 16-mixed, bf16-mixed in megatron_gpt_pretraining.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add ddp_find_unused_parameters_true and remove output args

1) Add ddp_find_unused_parameters_true for trainer.strategy in self_alignment_pretraining.py as it has unused parameters
2) Remove output args and add self.validation/test_step_outputs to validation/test_step in mt_enc_dec_model.py
3) Comment tests in JenkinsFile that need to be fixed

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix in megatron_nmt_training.py for 16-mixed, bf16-mixed

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix for megatron_bert_pretraining.py and megatron_bert_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix and validation/test_step_outputs

1) Add fix to account for 16-mixed and bf16-mixed in megatron_retro_mutransfer_pretrain.py, megatron_retro_pretraining.py
2) Reset ckpt_path for test in enc_dec_nmt.py
3) Remove outputs args and add validation/test_step_outputs in megatron_retrieval_model.py
4) Comment Megatron Bert Pretraining and Resume Training with Pipeline Parallelism and add back NMT Training Post-LN

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix and skip few failing tests

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add missing comment lines in JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Comment jenkin tests and super().on_validation_epoch_end() in megatron_gpt_sft_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edit JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edit in jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Edit in Jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Comment missed lines in Jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix precision and validation/test outputs

1) Add precision fix to account for 16-mixed and bf16-mixed in megatron_t5_pretraining.py
2) Remove outputs args and add append loss to self.validation/test_step_outputs in megatron_lm_encoder_decoder_model.py
3) Add back resume_from_checkpoint in the megatron_t5_config.yaml
4) Comment out certain tests in Jenkins file

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix precision and validation/test/predict errors in megatron_t5_prompt_learning.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Precision fix and edit precision typo in all files

1) Account for 16-mixed and bf16-mixed in megatron_bart_pretraining.py and megatron_t5_seq2seq_finetune.py
2) Fix precision typo in all files

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix all CI TTS tests and comment few Jenkins tests

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Combine xx_epoch_end and on_xx_epoch_end

Add on_inference_epoch_end to inference_epoch_end function and have a single on_validation/test_epoch_end in megatron_finetune_model.py and megatron_gpt_sft_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add a missing comment in JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add try except StopIteration in validation_step for models with dataloader_iter

Signed-off-by: Abhishree <abhishreetm@gmail.com>
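
(A minimal sketch of the guard described above for models whose validation_step consumes a dataloader_iter directly; _validation_pass is a hypothetical helper, not the actual NeMo method:)

```python
def validation_step(self, dataloader_iter, batch_idx):
    # Guard against an exhausted iterator so StopIteration does not
    # escape the validation loop under PTL 2.0.
    try:
        batch = next(dataloader_iter)
    except StopIteration:
        return None
    return self._validation_pass(batch, batch_idx)
```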

* Remove pyyaml from requirements

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add try except for inference_step in megatron_finetune_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove limit_val_batches for mockGPTDataset test

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add new self.validation_step_outputs for MegatronGPTSFTModel

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edit Jenkinsfile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Initialize self.validation/test_step_outputs in megatron_gpt_sft_model.py

Initialize self.validation/test_step_outputs in setup of MegatronGPTSFTModel to take care of cases when dataloaders are not set up in ModelPT, for example while restoring the model.

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove resume_from_checkpoint if trainer arg in conf yaml files

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove resume_from_checkpoint as trainer arg in GPT, T5 configs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove resume_from_checkpoint in duplex_tn_config.yaml

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix typos, unused imports and refactor code to remove redundant funcs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove commented code in megatron_nmt_model.py

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Fix overridden functions to match parent class functions

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Prefetch dataloader_iter to prevent hang for PP>1

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Override setup() in NLPDDPStrategy to avoid hang during predict with PP>1

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Uncomment tests in JenkinsFile

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Add '16' to precision checks and other minor fixes

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Clear validation/test_step_outputs with dataloader_idx for multi dataloaders

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Minor edits

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Modify precision checks to avoid indexing

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Remove self.validation_step_outputs_sft and add dataloader_idx to clear outputs

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Reference checkpoint with trainer.ckpt_path

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add _prefetch to NLPModel and minor fixes

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add limit_val_batches in JenkinsFile for NMT

1) Add trainer.limit_val_batches in Megatron NMT Training TP=2
2) Remove unused import in ModelPT

Signed-off-by: Abhishree <abhishreetm@gmail.com>

---------

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* Include the scripts for preprocessing OAST and unit tests for chat sft datasets (NVIDIA#7112)

* scripts for sft

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix style

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added special token only for huggingface model

Signed-off-by: Yi Dong <yidong@nvidia.com>

* change default name

Signed-off-by: Yi Dong <yidong@nvidia.com>

* print out error datapoint content

Signed-off-by: Yi Dong <yidong@nvidia.com>

* show error id

Signed-off-by: Yi Dong <yidong@nvidia.com>

* annotation script working

Signed-off-by: Yi Dong <yidong@nvidia.com>

* try to be compatible with huggingface tokenizer

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added examples

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added lang

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added lang

Signed-off-by: Yi Dong <yidong@nvidia.com>

* text to value special case

Signed-off-by: Yi Dong <yidong@nvidia.com>

* configure the slider

Signed-off-by: Yi Dong <yidong@nvidia.com>

* annotation handles lang

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added the unit test for chat sft dataset

Signed-off-by: Yi Dong <yidong@nvidia.com>

* used the file in the test dir

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix json error

Signed-off-by: Yi Dong <yidong@nvidia.com>

* load local tokenizer

Signed-off-by: Yi Dong <yidong@nvidia.com>

* remove mask count check

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added HF dataset backend

Signed-off-by: Yi Dong <yidong@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* add paths to labeler. (NVIDIA#7087)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Signed-off-by: jubick1337 <mattyson.so@gmail.com>
Signed-off-by: tbartley94 <tbartley@nvidia.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Yi Dong <yidong@nvidia.com>
Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Signed-off-by: stevehuang52 <heh@nvidia.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: sam1373 <samuelkriman@gmail.com>
Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Daniel Egert <degert@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Jan Beckmann <king-jan1999@hotmail.de>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
Signed-off-by: Ante Jukić <ajukic@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@nvidia.com>
Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
Co-authored-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com>
Co-authored-by: Aleksandr Laptev <alaptev@nvidia.com>
Co-authored-by: bene-ges <antonova_sasha@list.ru>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Aleksey Grinchuk (Oleksii Hrinchuk) <grinchuk.alexey@gmail.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Vahid Noroozi <VahidooX@users.noreply.github.com>
Co-authored-by: Samuel Kriman <samuelkriman@gmail.com>
Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
Co-authored-by: trias702 <25867060+trias702@users.noreply.github.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: Jan Beckmann <king-jan1999@hotmail.de>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: lleaver <137942999+lleaver@users.noreply.github.com>
Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com>
Co-authored-by: Ryan Langman <rlangman@nvidia.com>
Co-authored-by: David <amosalla@asu.edu>
Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com>
Co-authored-by: anteju <108555623+anteju@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Co-authored-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com>
Signed-off-by: dorotat <dorotat@nvidia.com>
1 parent c6387a3 commit 3c5c4d5
Showing 2 changed files with 22 additions and 10 deletions.
```diff
@@ -87,7 +87,7 @@ model:
       add_bos_to_input: ${data.train_ds.add_bos_to_input}
       add_eos_to_input: ${data.train_ds.add_eos_to_input}
       metric:
-        name: "exact_string_match" # Name of the evaluation metric to use.
+        name: "exact_string_match" # Name of the evaluation metric to use. Supported metrics: [`exact_string_match`, `rouge`, `pearson_corr_coef`, `spearman_corr_coef`, `f1`, `accuracy`, `average_precision`]
         average: micro # Average the metric over the dataset. Options: ['macro', 'micro']. Works only for 'F1', 'accuracy' etc. Refer to torchmetrics for metrics where this is supported.
         num_classes: null # Number of classes for the metric. Works only for 'F1', 'accuracy' and 'average_precision' etc. Refer to torchmetrics for metrics where this is supported.
         class_labels: null # If the targets in your dataset are strings and not integers/float, you need to provide a list of class labels (size = num_classes) so we can convert from strings to integer categories to compute the metric.
```
```diff
@@ -106,24 +106,36 @@ def setup_metric(self, data_cfg):
             )

         metric_name = data_cfg.metric.name
-        metric = MetricStringToTorchMetric[metric_name]
+        metric_class = MetricStringToTorchMetric[metric_name]

         # GLUE will not have a "src_file_name" attribute and will always have only a single metric.
         if hasattr(data_cfg, "src_file_name") or hasattr(data_cfg, "file_names"):
-            if hasattr(data_cfg, "src_file_name") and isinstance(data_cfg.src_file_name, ListConfig):
-                # We pass average and num_classes to the metric constructor via kwargs even if they don't exist for each metric.
+            # We pass average and num_classes to the metric constructor via kwargs even if they don't exist for each metric.
+            if (
+                hasattr(data_cfg, "src_file_name")
+                and isinstance(data_cfg.src_file_name, ListConfig)
+                and metric_name != 'rouge'
+            ):
                 metric = [
-                    metric(average=data_cfg.metric.average, num_classes=data_cfg.metric.num_classes)
+                    metric_class(average=data_cfg.metric.average, num_classes=data_cfg.metric.num_classes)
                     for _ in range(len(data_cfg.src_file_name))
                 ]
-            elif hasattr(data_cfg, "file_names") and isinstance(data_cfg.file_names, ListConfig):
+            elif (
+                hasattr(data_cfg, "file_names")
+                and isinstance(data_cfg.file_names, ListConfig)
+                and metric_name != 'rouge'
+            ):
                 metric = [
-                    metric(average=data_cfg.metric.average, num_classes=data_cfg.metric.num_classes)
+                    metric_class(average=data_cfg.metric.average, num_classes=data_cfg.metric.num_classes)
                     for _ in range(len(data_cfg.file_names))
                 ]
+            elif hasattr(data_cfg, "src_file_name") and isinstance(data_cfg.src_file_name, ListConfig):
+                metric = [metric_class() for _ in range(len(data_cfg.src_file_name))]
+            elif hasattr(data_cfg, "file_names") and isinstance(data_cfg.file_names, ListConfig):
+                metric = [metric_class() for _ in range(len(data_cfg.file_names))]
             else:
-                metric = [metric(average=data_cfg.metric.average, num_classes=data_cfg.metric.num_classes)]
+                metric = [metric_class(average=data_cfg.metric.average, num_classes=data_cfg.metric.num_classes)]
         else:
-            metric = [metric()]  # GLUE does need to specify average or num_classes.
+            metric = [metric_class()]  # GLUE does need to specify average or num_classes.

         return metric, metric_name
```
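
(Why the `metric_name != 'rouge'` guard matters: text metrics such as ROUGE take no average/num_classes arguments, while classification metrics do. A hedged illustration with torchmetrics — exact class names vary by torchmetrics version, and NeMo resolves them through MetricStringToTorchMetric:)

```python
from torchmetrics.classification import MulticlassF1Score
from torchmetrics.text.rouge import ROUGEScore

# Classification metrics accept averaging/class-count kwargs...
f1 = MulticlassF1Score(num_classes=3, average="micro")

# ...but ROUGE rejects unexpected kwargs, so it must be constructed
# without them, which is what the new branches above do.
rouge = ROUGEScore()
```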

Expand Down Expand Up @@ -221,7 +233,7 @@ def cast_for_metric(self, pred, label, metric_name, class_labels=None, labels_ar
else:
pred = class_labels.index(pred)
if label not in class_labels:
raise ValueError(f"Ground truth labe; {label} is not in the class labels list : {class_labels}")
raise ValueError(f"Ground truth label {label} is not in the class labels list : {class_labels}")
label = class_labels.index(label)
pred = torch.LongTensor([pred]).to(self.device)
label = torch.LongTensor([label]).to(self.device)
Expand Down
