Merge main to asr_normalize #7084

Merged
merged 123 commits into from
Jul 20, 2023

KunalDhawan
Collaborator

What does this PR do?

Merges main into the asr_normalize branch.

Collection: [Note which collection this PR will affect]

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

Before your PR is "Ready for review"

Pre checks:

  • Make sure you have read and followed the Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

arendu and others added 30 commits June 1, 2023 11:42
* update to load from ckpt

Signed-off-by: arendu <adithya.r@gmail.com>

* update

Signed-off-by: arendu <adithya.r@gmail.com>

* load ckpt peft model

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update style

Signed-off-by: arendu <adithya.r@gmail.com>

---------

Signed-off-by: arendu <adithya.r@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* add model, dataset, necessary utils and tests

Signed-off-by: stevehuang52 <heh@nvidia.com>

* fix tarred data

Signed-off-by: stevehuang52 <heh@nvidia.com>

* fix typo

Signed-off-by: stevehuang52 <heh@nvidia.com>

* add fvad examples and update utils

Signed-off-by: stevehuang52 <heh@nvidia.com>

* add copyright

Signed-off-by: stevehuang52 <heh@nvidia.com>

* refactor and add tests

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update dataset

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update test

Signed-off-by: stevehuang52 <heh@nvidia.com>

* refactor

Signed-off-by: stevehuang52 <heh@nvidia.com>

* refactor

Signed-off-by: stevehuang52 <heh@nvidia.com>

* fix typos

Signed-off-by: stevehuang52 <heh@nvidia.com>

---------

Signed-off-by: stevehuang52 <heh@nvidia.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
* bug fixes

Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>

* fix bugs, add preparation and evaluation scripts, add readme

Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>

* small fixes

Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add real coverage calculation, small fixes, more debug information

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* add option to pass a filelist and output folder - to handle inference from multiple input files

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* added preprocessing for yago wikipedia articles - finding yago entities and their subphrases

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* yago wiki preprocessing, sampling, pseudonormalization

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* more scripts for preparation of training examples

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* bug fixes

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* add some alphabet checks

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* add bert on subwords, concatenate it to bert on characters

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* add calculation of character_pos_to_subword_pos

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* bug fix

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* bug fix

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* pdb

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* tensor join bug fix

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* double hidden_size in classifier

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* pdb

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* default index value 0 instead of -1 because index cannot be negative

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* pad index value 0 instead of -1 because index cannot be negative

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* remove pdb

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix bugs, add creation of tarred dataset

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* add possibility to change sequence len at inference

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* change sampling of dummy candidates at inference, add candidate info file

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix import

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix bug

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* update transcription now uses info

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* write path

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* 1. add tarred dataset support (untested). 2. fix bug with ban_ngrams in indexing

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* skip short_sent if no real candidates

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix import

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* add braceexpand

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fixes

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix bug

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix bug

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix bug in np.ones

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix bug in collate

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* change tensor type to long because of error in torch.gather

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix for empty spans tensor

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* same fixes in _collate_fn for tarred dataset

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix bug from previous commit

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* change int types to be shorter to minimize tar size

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* refactoring of datasets and inference

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* bug fix

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* bug fix

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* bug fix

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* tar by 100k examples, small fixes

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* small fixes, add analytics script

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* Add functions for dynamic programming comparison to get best path by ngrams

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fixes

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* small fix

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fixes to support testing on SPGISpeech

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* add preprocessing for userlibri

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* some refactoring

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* some refactoring

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* move some functions to utils to reuse from other project

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* move some functions to utils to reuse from other project

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* move some functions to utils to reuse from other project

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* small refactoring before pr. Add bash-scripts reproducing evaluation

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* style fix

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* small fixes in inference

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* bug fix - didn't move window on last symbol

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix bug - shuffle was before truncation of sorted candidates

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* refactoring, fix some bugs

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* various fixes. Add word_indices at inference

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add candidate positions

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Move data preparation and evaluation to other repo

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* add infer_reproduce_paper. Refactoring

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* refactor inference using fragment indices

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* add some helper functions

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix bug with parameters order

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix bugs

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* refactoring, fix bug

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add multiple variants of adjusting start/end positions

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more fixes

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add unit tests, other fixes

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix CodeQL warnings

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* bug fixes

Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>

* fix bugs, add preparation and evaluation scripts, add readme

Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>

* small fixes

Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add real coverage calculation, small fixes, more debug information

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* add option to pass a filelist and output folder - to handle inference from multiple input files

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* added preprocessing for yago wikipedia articles - finding yago entities and their subphrases

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* yago wiki preprocessing, sampling, pseudonormalization

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* more scripts for preparation of training examples

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* bug fixes

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* add some alphabet checks

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* add bert on subwords, concatenate it to bert on characters

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* add calculation of character_pos_to_subword_pos

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* bug fix

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* bug fix

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* pdb

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* tensor join bug fix

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* double hidden_size in classifier

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* pdb

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* default index value 0 instead of -1 because index cannot be negative

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* pad index value 0 instead of -1 because index cannot be negative

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* remove pdb

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix bugs, add creation of tarred dataset

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* add possibility to change sequence len at inference

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* change sampling of dummy candidates at inference, add candidate info file

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix import

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix bug

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* update transcription now uses info

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* write path

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* 1. add tarred dataset support (untested). 2. fix bug with ban_ngrams in indexing

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* skip short_sent if no real candidates

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix import

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* add braceexpand

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fixes

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix bug

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix bug

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix bug in np.ones

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix bug in collate

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* change tensor type to long because of error in torch.gather

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix for empty spans tensor

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* same fixes in _collate_fn for tarred dataset

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix bug from previous commit

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* change int types to be shorter to minimize tar size

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* refactoring of datasets and inference

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* bug fix

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* bug fix

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* bug fix

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* tar by 100k examples, small fixes

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* small fixes, add analytics script

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* Add functions for dynamic programming comparison to get best path by ngrams

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fixes

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* small fix

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fixes to support testing on SPGISpeech

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* add preprocessing for userlibri

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* some refactoring

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* some refactoring

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* move some functions to utils to reuse from other project

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* move some functions to utils to reuse from other project

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* move some functions to utils to reuse from other project

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* small refactoring before pr. Add bash-scripts reproducing evaluation

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* style fix

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* small fixes in inference

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* bug fix - didn't move window on last symbol

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix bug - shuffle was before truncation of sorted candidates

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* refactoring, fix some bugs

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* various fixes. Add word_indices at inference

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add candidate positions

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Move data preparation and evaluation to other repo

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* add infer_reproduce_paper. Refactoring

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* refactor inference using fragment indices

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* add some helper functions

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix bug with parameters order

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix bugs

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* refactoring, fix bug

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add multiple variants of adjusting start/end positions

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more fixes

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add unit tests, other fixes

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix CodeQL warnings

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* add script for full inference pipeline, refactoring

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* add tutorial

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* take example data from HuggingFace

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* add docs

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix comment

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* fix bug

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* small fixes for PR

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* add some more tests

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* try to fix tests adding with_downloads

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* skip tests with tokenizer download

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

---------

Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>
Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>
Co-authored-by: Alexandra Antonova <aleksandraa@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
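Several commits in the squashed history above switch index padding from -1 to 0 ("index cannot be negative") and shorten integer types to minimize tar size. The following is a minimal, stdlib-only sketch of both ideas, not the code from this PR: `pad_indices`, `pack`, and `PAD_INDEX` are hypothetical names, and the 0/1 mask is an assumption about how padded positions would be ignored downstream. Per the commit notes, narrow types would only be used for storage; indices passed to `torch.gather` still had to be widened to long.

```python
from array import array
from typing import List, Tuple

# Hypothetical pad value: 0 is always a valid position, whereas -1 is rejected
# by gather-style ops such as torch.gather (and silently wraps in Python lists).
PAD_INDEX = 0


def pad_indices(batch: List[List[int]]) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad each index list to the batch maximum and return a 0/1 validity mask."""
    max_len = max((len(seq) for seq in batch), default=0)
    padded, mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        padded.append(list(seq) + [PAD_INDEX] * n_pad)
        mask.append([1] * len(seq) + [0] * n_pad)
    return padded, mask


def pack(values: List[int], typecode: str) -> bytes:
    """Serialize integers with a fixed-width C type, e.g. 'h' (short) or 'q' (long long)."""
    return array(typecode, values).tobytes()


if __name__ == "__main__":
    padded, mask = pad_indices([[3, 1, 4], [1, 5]])
    print(padded)  # [[3, 1, 4], [1, 5, 0]]
    print(mask)    # [[1, 1, 1], [1, 1, 0]]
    flat = [i for row in padded for i in row]
    # The serialized size scales with the type's itemsize, so narrower types shrink tars.
    print(len(pack(flat, "h")), len(pack(flat, "q")))
```

The mask is what makes 0 safe as a pad value: a padded slot still points at a real position, so the loss or output at that slot must be discarded explicitly.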
* [TTS] Implement new vocoder dataset

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Redo config structure, minor fixes

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Fix alignment logging

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Fix script usage example

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Fixed epoch LR scheduling

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Support .nemo checkpoint in FP callback

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Remove align interpolator

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Remove HiFi-GAN defaults list interpolation

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Rename weighted_sample_steps to weighted_sampling_steps_per_epoch

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>
* deb infer

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* deb infer

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* clean up

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* don't do maxlen trunc for non abs pos emb

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* don't do maxlen trunc for non abs pos emb

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* convert for training only

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add eval test, add save .nemo for sft model

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* jenkins format fix

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update jenkins

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update jenkins

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix jenkins

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* remove test, ci timeout

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix for m_gpt_eval.py

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* jenkins test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix gpt_eval with sft model

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* revert jenkins

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* keep float conversion for model.generate()

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix inference dtype

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* TDT model pull request, initial draft

Signed-off-by: Hainan Xu <hainanx@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* TDT PR WIP

Signed-off-by: Hainan Xu <hainanx@nvidia.com>

* TDT PR WIP

Signed-off-by: Hainan Xu <hainanx@nvidia.com>

* TDT PR WIP

Signed-off-by: Hainan Xu <hainanx@nvidia.com>

* TDT WIP

Signed-off-by: Hainan Xu <hainanx@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* TDT WIP

Signed-off-by: Hainan Xu <hainanx@nvidia.com>

* TDT WIP

Signed-off-by: Hainan Xu <hainanx@nvidia.com>

* TDT WIP

Signed-off-by: Hainan Xu <hainanx@nvidia.com>

* TDT WIP

Signed-off-by: Hainan Xu <hainanx@nvidia.com>

* TDT WIP

Signed-off-by: Hainan Xu <hainanx@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* TDT WIP

Signed-off-by: Hainan Xu <hainanx@nvidia.com>

* TDT WIP

Signed-off-by: Hainan Xu <hainanx@nvidia.com>

* TDT WIP

Signed-off-by: Hainan Xu <hainanx@nvidia.com>

* TDT WIP

Signed-off-by: Hainan Xu <hainanx@nvidia.com>

* TDT WIP

Signed-off-by: Hainan Xu <hainanx@nvidia.com>

* addressed some review comments, part1

Signed-off-by: Hainan Xu <hainanx@nvidia.com>

* addressed some review comments, part1, one line fix

Signed-off-by: Hainan Xu <hainanx@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add tests for comparing TDT alphas with pytorch VS kernel computation

Signed-off-by: Hainan Xu <hainanx@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add tests for comparing multiblank alphas with pytorch VS kernel computation

Signed-off-by: Hainan Xu <hainanx@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add tests for fixed case computation for TDT

Signed-off-by: Hainan Xu <hainanx@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add more comments for greedy-batch decoding for TDT

Signed-off-by: Hainan Xu <hainanx@nvidia.com>

* include config for TDT model with stateless decoders

Signed-off-by: Hainan Xu <hainanx@nvidia.com>

* add reference to TDT in Readme

Signed-off-by: Hainan Xu <hainanx@nvidia.com>

* slight modification of config file comments

Signed-off-by: Hainan Xu <hainanx@nvidia.com>

* addressed more comments

Signed-off-by: Hainan Xu <hainanx@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* more detailed comments for tdt kernel

Signed-off-by: Hainan Xu <hainanx@nvidia.com>

* one line fix

Signed-off-by: Hainan Xu <hainanx@nvidia.com>

* fixed small bug that results in test fails for rnnt_decoding

Signed-off-by: Hainan Xu <hainanx@nvidia.com>

* fixed small bug that results in test fails for rnnt_decoding

Signed-off-by: Hainan Xu <hainanx@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed small bug that results in test fails for rnnt_decoding

Signed-off-by: Hainan Xu <hainanx@nvidia.com>

* remove unused import

Signed-off-by: Hainan Xu <hainanx@nvidia.com>

---------

Signed-off-by: Hainan Xu <hainanx@nvidia.com>
Co-authored-by: Hainan Xu <hainanx@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* fix get param

* change name

---------

Signed-off-by: ericharper <complex451@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
* initial POC for LDDL Bert

* Finish LDDL POC

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* address comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix merge head

* resolving merge

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for val/test loaders

* change to new LDDL class + add winding

* fix logging level

* fix winding

* test fix

* fixes to winding

* add file system

* add preemption optimizations

* more logging

* more prints

* better logging

* asfsf

* add barrier

* removing prints

* working with mb lddl loader

* final changes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update requirements file with LDDL

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert adding to requirements

---------

Signed-off-by: wdykas <wdykas@nvidia.com>
Co-authored-by: wdykas <73254672+wdykas@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Added a visual utterance-level comparison of two ASR models

Signed-off-by: George <gzelenfroind@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
#6791)

* Construct FP8 amax reduction group

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Update Megatron-core version in CI

Signed-off-by: Tim Moon <tmoon@nvidia.com>

---------

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
* new lora test

Signed-off-by: arendu <adithya.r@gmail.com>

* updates

Signed-off-by: arendu <adithya.r@gmail.com>

* check for chat

Signed-off-by: arendu <adithya.r@gmail.com>

* update

Signed-off-by: arendu <adithya.r@gmail.com>

* update

Signed-off-by: arendu <adithya.r@gmail.com>

* small train set

Signed-off-by: arendu <adithya.r@gmail.com>

* update

Signed-off-by: arendu <adithya.r@gmail.com>

* precision change

Signed-off-by: arendu <adithya.r@gmail.com>

* fixed typo in paths

Signed-off-by: arendu <adithya.r@gmail.com>

* full data with limit val batches

Signed-off-by: arendu <adithya.r@gmail.com>

* tp2 instead of pp2

Signed-off-by: arendu <adithya.r@gmail.com>

* tp2 instead of pp2

Signed-off-by: arendu <adithya.r@gmail.com>

---------

Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: Adi Renduchintala <adithya.r@gmail.com>
Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>
* add call to p2p overlap

* update Jenkins for test

---------

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
Signed-off-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Abhinav Khattar <aklife97@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
…6793)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Markel Sanz Ausin <markelsanz14@gmail.com>
Co-authored-by: Markel Sanz Ausin <markelsanz14@gmail.com>
* repro for gpt eval mp mem issue

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* add print statements for memory allocation

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* adjusted hot fix that prevents softmax on the entire output embedding; memory is now bottlenecked by the attention softmax, which needs to be solved with FA or long attention

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* using compute_logprob to configure inference

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* enable compute logprob for peft

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* remove print statements

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix ci

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* added docstrings

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add missing config

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* remove truncate prompt length feature

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* tensor before all gather needs to be contiguous

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>

---------

Signed-off-by: Yang Zhang <yangzhang@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: tbartley94 <tbartley@nvidia.com>
Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
Co-authored-by: Abhinav Khattar <aklife97@gmail.com>
Signed-off-by: arendu <adithya.r@gmail.com>
If datasets are stored on a read-only medium, index files
cannot be created next to the dataset files, so an
alternative directory must be specified for the index
mapping files.

This commit adds an optional `index_mapping_dir` argument to
the constructors.
Unit tests are also added.

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Update path formatting for relative paths

Signed-off-by: Greg Heinrich <gheinrich@nvidia.com>
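A minimal sketch of how an optional index directory might be resolved, assuming the index file name is simply the dataset path plus a suffix. `index_file_path`, `ensure_index_dir` and `IDX_SUFFIX` are hypothetical names for illustration, not NeMo's actual helpers.

```python
import os
from typing import Optional

# Hypothetical suffix; the real index file naming may differ.
IDX_SUFFIX = ".idx"


def index_file_path(data_path: str, index_mapping_dir: Optional[str] = None) -> str:
    """Return the path where the index file for `data_path` should be written.

    Hypothetical helper: without `index_mapping_dir` the index sits next to the
    dataset file; with it, the dataset's path is mirrored inside that directory.
    """
    if index_mapping_dir is None:
        # Default behaviour: adjacent to the data file (needs write access there).
        return data_path + IDX_SUFFIX
    # Normalise "./a/b" -> "a/b" and strip the leading separator of absolute paths;
    # otherwise os.path.join would discard index_mapping_dir entirely.
    relative = os.path.normpath(data_path).lstrip(os.sep)
    return os.path.join(index_mapping_dir, relative) + IDX_SUFFIX


def ensure_index_dir(index_path: str) -> None:
    """Create the parent directory of an index file if it does not exist yet."""
    os.makedirs(os.path.dirname(index_path) or ".", exist_ok=True)
```

Mirroring the full dataset path (rather than using only the basename) keeps two `train.jsonl` files from different folders from overwriting each other's index in the shared directory.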
* Add kv cache support for transformer TE path

Signed-off-by: Yen-Shi Wang <yenshiw@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Mark get_data_parallel_group as WAR

Signed-off-by: Yen-Shi Wang <yenshiw@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Initialize process group for FP8 training

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* Update Megatron GPT eval script for non-FP8 path

Signed-off-by: Yen-Shi Wang <yenshiw@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Yen-Shi Wang <yenshiw@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Yen-Shi Wang <6960565+yen-shi@users.noreply.github.com>
Co-authored-by: Yen-Shi Wang <yenshiw@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
* initial commit

Signed-off-by: Dima Rekesh <bmwshop@gmail.com>

* typos

Signed-off-by: Dima Rekesh <bmwshop@gmail.com>

* tweaks to padding

Signed-off-by: Dima Rekesh <bmwshop@gmail.com>

* comments

Signed-off-by: Dima Rekesh <bmwshop@gmail.com>

* attempt at first working version

Signed-off-by: Dima Rekesh <bmwshop@gmail.com>

* typos and fixed p calculation

Signed-off-by: Dima Rekesh <bmwshop@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing merge artifacts

Signed-off-by: Dima Rekesh <bmwshop@gmail.com>

* typo

Signed-off-by: Dima Rekesh <bmwshop@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removing unnecessary imports

Signed-off-by: Dima Rekesh <bmwshop@gmail.com>

* if the batch split succeeded, no need to run the conv again

Signed-off-by: Dima Rekesh <bmwshop@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adding channel wise split

Signed-off-by: Dima Rekesh <bmwshop@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adding reference to pytorch issue 80020

Signed-off-by: Dima Rekesh <bmwshop@gmail.com>

* removing time chunking methods

Signed-off-by: Dima Rekesh <bmwshop@gmail.com>

* accounting for the actual self._stride value

Signed-off-by: Dima Rekesh <bmwshop@gmail.com>

* limiting the fix to dw_striding subsampling

Signed-off-by: Dima Rekesh <bmwshop@gmail.com>

* renamed methods

Signed-off-by: Dima Rekesh <bmwshop@gmail.com>

* one more accounting for the actual self._stride value

Signed-off-by: Dima Rekesh <bmwshop@gmail.com>

* support for causal convs

Signed-off-by: Dima Rekesh <bmwshop@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* option to set conv chunking size manually

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing imports

* subsampling test

Signed-off-by: Dima Rekesh <bmwshop@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rename variable

Signed-off-by: Dima Rekesh <bmwshop@gmail.com>

* imports in test

Signed-off-by: Dima Rekesh <bmwshop@gmail.com>

* more runtime checks

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* a more careful test

Signed-off-by: Dima Rekesh <bmwshop@gmail.com>

* bug in causal

Signed-off-by: Dima Rekesh <bmwshop@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix in causal

Signed-off-by: Dima Rekesh <bmwshop@gmail.com>

* change_conv_chunking_factor methods

Signed-off-by: Dima Rekesh <bmwshop@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* renamed methods

Signed-off-by: Dima Rekesh <bmwshop@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* disabling chunking by default

Signed-off-by: Dima Rekesh <bmwshop@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* typo

Signed-off-by: Dima Rekesh <bmwshop@gmail.com>

* changing default chunking to auto

Signed-off-by: Dima Rekesh <bmwshop@gmail.com>

* only split if needed

Signed-off-by: Dima Rekesh <bmwshop@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* only split if needed

Signed-off-by: Dima Rekesh <bmwshop@gmail.com>

---------

Signed-off-by: Dima Rekesh <bmwshop@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
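The chunking commits above ("only split if needed", "adding reference to pytorch issue 80020") split the input so that each convolution call stays under a size limit. The arithmetic can be sketched as follows, assuming for illustration that the limit is the signed 32-bit maximum element count; `num_batch_chunks` and `batch_slices` are hypothetical helpers, not NeMo's subsampling code.

```python
import math
from typing import Iterator, Tuple

# Assumed per-call element limit (signed 32-bit index range), for illustration only.
INT32_MAX = 2**31 - 1


def num_batch_chunks(batch_size: int, elems_per_sample: int, limit: int = INT32_MAX) -> int:
    """Smallest number of batch chunks such that each chunk has at most `limit` elements."""
    if batch_size * elems_per_sample <= limit:
        return 1  # only split if needed
    if elems_per_sample > limit:
        # Batch splitting cannot help; a channel-wise split would be required instead.
        raise ValueError("a single sample exceeds the limit")
    samples_per_chunk = limit // elems_per_sample
    return math.ceil(batch_size / samples_per_chunk)


def batch_slices(batch_size: int, n_chunks: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) index pairs covering the batch in at most n_chunks pieces."""
    step = math.ceil(batch_size / n_chunks)
    for start in range(0, batch_size, step):
        yield start, min(start + step, batch_size)
```

In a PyTorch module, each `(start, stop)` pair would select `x[start:stop]`, run the convolution on that slice, and the outputs would be concatenated along the batch dimension.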
Signed-off-by: Dima Rekesh <bmwshop@gmail.com>
Signed-off-by: Nithin Rao Koluguri <nithinraok>
Co-authored-by: Nithin Rao Koluguri <nithinraok>
* add reference to our paper

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

* add paper reference to docs

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>

---------

Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>
Signed-off-by: smajumdar <titu1994@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
* added methods.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* added methods.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* added initial code.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* added initial code.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* added initial code.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* added config files.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fixed bugs.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* updated confs.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* updated confs.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* updated confs.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* updated confs.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* improved f.conv1d

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* pulled from main.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* pulled from main.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* added postpostnorm.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fixed the target contiguous bug.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* added dw_striding causal.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* added print for debugging.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* added print for debugging.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fixed causal convolutions.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* added _midnorm.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fixed transcribe.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* cleaned code.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* moved back configs.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* moved back configs.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* updated fast emit for FC models.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* updated fast emit for FC models.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed bug.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fixed bug and addressed comments.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed configs.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* fixed configs.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

* dropped the test.

Signed-off-by: Vahid <vnoroozi@nvidia.com>

---------

Signed-off-by: Vahid <vnoroozi@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
karpnv and others added 21 commits July 13, 2023 09:05
* aliases

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* add NEMO_PATH

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* expand_aliases

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
…ble models (#7012) (#7013)

* [TTS] fastpitch: add English LibriTTS model with ASR STFT parameters (25 ms window, 10 ms hop)

* [TTS] enhancer: add pretrained model intended for ASR finetuning

---------

Signed-off-by: Roman Korostik <rkorostik@nvidia.com>
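For reference, converting the 25 ms and 10 ms STFT parameters (conventionally the window and hop lengths) into samples is simple arithmetic. This sketch assumes a 16 kHz sample rate, which is common for ASR; both the sample rate and the power-of-two `n_fft` rule are assumptions, not values read from the released model config.

```python
def ms_to_samples(duration_ms: float, sample_rate: int) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    return round(duration_ms * sample_rate / 1000)


def next_pow2(n: int) -> int:
    """Smallest power of two >= n (a common, but not mandatory, choice for n_fft)."""
    return 1 << (n - 1).bit_length()


def stft_params(sample_rate: int, win_ms: float = 25.0, hop_ms: float = 10.0) -> dict:
    """Window, hop and FFT sizes in samples for the given durations."""
    win_length = ms_to_samples(win_ms, sample_rate)
    hop_length = ms_to_samples(hop_ms, sample_rate)
    return {"win_length": win_length, "hop_length": hop_length, "n_fft": next_pow2(win_length)}
```

At 16 kHz this gives a 400-sample window, a 160-sample hop and `n_fft = 512`.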
* Add ASR with TTS Tutorial
* Fix enhancer usage

Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
* Add end_strings to SamplingParams

Signed-off-by: Gerald Shen <geshen@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Gerald Shen <geshen@nvidia.com>

* Add end_strings to megatron_gpt_inference.yaml

Signed-off-by: Gerald Shen <geshen@nvidia.com>

* Add end_strings to sampling params

Signed-off-by: Gerald Shen <geshen@nvidia.com>

* Remove extra_id_1 from default end_strings

Signed-off-by: Gerald Shen <geshen@nvidia.com>

* Fix require_grad typos (#6930)

Signed-off-by: Sergii Dymchenko <sdym@fb.com>
Signed-off-by: Gerald Shen <geshen@nvidia.com>

* fix syntax error

Signed-off-by: Gerald Shen <geshen@nvidia.com>

* fix the mpt chatbot (#6957) (#6968)

Signed-off-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com>
Signed-off-by: Gerald Shen <geshen@nvidia.com>

* add support for max_total_length=4096 for 43b (#6763)

* add support for max_total_length=4096 for 43b

Signed-off-by: Zhilin Wang <wangzhilin12061996@hotmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Zhilin Wang <wangzhilin12061996@hotmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Gerald Shen <geshen@nvidia.com>

* rnnt_greedy_decoding.py: typos? auto-repressively -> auto-regressively (#6989)

Signed-off-by: Vadim Kantorov <vadimkantorov@gmail.com>
Signed-off-by: Gerald Shen <geshen@nvidia.com>

* Cache handling without input tensors mutation (#6980) (#6996)

* Cache handling without input tensors mutation

* Cleanup

* Cleanup#2

* Cleanup#3

---------

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Gerald Shen <geshen@nvidia.com>

* Hybrid conformer export (#6983) (#6995)

* Implemented generic kv-pair setting of export_config from args

* Hybrid conformer export

* Hybrid decoder export

* Cleanup

* Changed from **kwargs

* Docstring

* Docs added

* Stringify args

* Added docs for ASR export configs

* lowercase ctc

---------

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
Signed-off-by: Gerald Shen <geshen@nvidia.com>

* Fixing an issue with confidence ensembles (#6987) (#7004)

* Bug fix for the confidence ensembles

* Relax constraints for the test

---------

Signed-off-by: Igor Gitman <igitman@nvidia.com>
Co-authored-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Gerald Shen <geshen@nvidia.com>

* [TTS] Add cosine distance option to TTS aligner (#6806)

* [TTS] Add cosine distance option to TTS aligner

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] Update aligner comments

Signed-off-by: Ryan <rlangman@nvidia.com>

---------

Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Gerald Shen <geshen@nvidia.com>

* Minor MPT-7B fixes and creation script update (#6982)

* Initial commit of minor MPT-7B fixes

Signed-off-by: Daniel Egert <degert@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Daniel Egert <degert@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Gerald Shen <geshen@nvidia.com>

* Change Jenkins timeout (#6997)

* change timeout

Signed-off-by: ericharper <complex451@gmail.com>

* change to 8 hours

Signed-off-by: ericharper <complex451@gmail.com>

---------

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Gerald Shen <geshen@nvidia.com>

* remove hard coded input and output fields (#7008)

* remove hard coded input and output fields

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: arendu <adithya.r@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Gerald Shen <geshen@nvidia.com>

* RoPE length extrapolation with interpolation (#7005)

* Push changes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* add continue training script

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [WIP] nonlinear interp

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* override encoder_seq_len

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove nonlinear

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* sft with pi (#7006)

* sft with pi

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* update values only if not None

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* Address comments

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add info

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Empty

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

---------

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Signed-off-by: Gerald Shen <geshen@nvidia.com>

* use proper config

Signed-off-by: Gerald Shen <geshen@nvidia.com>

* Add end_strings to SamplingParams

Signed-off-by: Gerald Shen <geshen@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Gerald Shen <geshen@nvidia.com>

* Add end_strings to megatron_gpt_inference.yaml

Signed-off-by: Gerald Shen <geshen@nvidia.com>

* Add end_strings to sampling params

Signed-off-by: Gerald Shen <geshen@nvidia.com>

* Remove extra_id_1 from default end_strings

Signed-off-by: Gerald Shen <geshen@nvidia.com>

* fix syntax error

Signed-off-by: Gerald Shen <geshen@nvidia.com>

* use proper config

Signed-off-by: Gerald Shen <geshen@nvidia.com>

---------

Signed-off-by: Gerald Shen <geshen@nvidia.com>
Signed-off-by: Sergii Dymchenko <sdym@fb.com>
Signed-off-by: Yi Dong <yidong@nvidia.com>
Signed-off-by: Zhilin Wang <wangzhilin12061996@hotmail.com>
Signed-off-by: Vadim Kantorov <vadimkantorov@gmail.com>
Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Daniel Egert <degert@nvidia.com>
Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sergii Dymchenko <kit1980@gmail.com>
Co-authored-by: Gerald Shen <geshen@nvidia.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com>
Co-authored-by: Zhilin Wang <wangzhilin12061996@hotmail.com>
Co-authored-by: Vadim Kantorov <vadimkantorov@gmail.com>
Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Igor Gitman <igitman@nvidia.com>
Co-authored-by: Ryan Langman <rlangman@nvidia.com>
Co-authored-by: trias702 <25867060+trias702@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: Adi Renduchintala <adithyare@nvidia.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
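The `end_strings` sampling parameter added in this commit stops generation once the output reaches one of the listed strings. A minimal sketch of such a stopping check in plain Python, assuming it runs on the detokenized text after each decoding step; the function names and the example strings are hypothetical, and this is not NeMo's implementation.

```python
from typing import Optional, Sequence


def find_end_string(text: str, end_strings: Sequence[str]) -> Optional[int]:
    """Return the index where the earliest end string starts in `text`, or None."""
    positions = [text.find(s) for s in end_strings if s]
    positions = [p for p in positions if p != -1]
    return min(positions) if positions else None


def should_stop(generated: str, end_strings: Sequence[str]) -> bool:
    """Stop decoding once the generated text ends with any (non-empty) end string."""
    return any(s and generated.endswith(s) for s in end_strings)


def strip_end_string(text: str, end_strings: Sequence[str]) -> str:
    """Truncate `text` at the first end string; the end string itself is dropped."""
    idx = find_end_string(text, end_strings)
    return text if idx is None else text[:idx]
```

Checking text rather than token IDs lets an end string span several tokens, at the cost of detokenizing at every step.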
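The "RoPE length extrapolation with interpolation" change in this commit (and "sft with pi", presumably position interpolation) rests on one idea: positions are divided by a factor so that a longer context reuses the rotary angles seen during training. A minimal pure-Python sketch, assuming interleaved (even, odd) pairing and the usual base of 10000; `interpolation_factor` is a hypothetical parameter name here, and NeMo's rotary embedding code may differ in layout and naming.

```python
import math
from typing import List, Sequence


def rope_angles(position: float, dim: int, base: float = 10000.0,
                interpolation_factor: float = 1.0) -> List[float]:
    """Rotation angle for each (even, odd) pair of a `dim`-sized vector at `position`.

    Dividing the position by `interpolation_factor` maps a sequence that is
    `factor` times longer than the training length back into the trained range.
    """
    pos = position / interpolation_factor
    return [pos * base ** (-2 * i / dim) for i in range(dim // 2)]


def apply_rope(x: Sequence[float], position: float, base: float = 10000.0,
               interpolation_factor: float = 1.0) -> List[float]:
    """Rotate consecutive pairs of `x` (assumed even length) by the RoPE angles."""
    angles = rope_angles(position, len(x), base, interpolation_factor)
    out: List[float] = []
    for i, theta in enumerate(angles):
        a, b = x[2 * i], x[2 * i + 1]
        c, s = math.cos(theta), math.sin(theta)
        out.extend([a * c - b * s, a * s + b * c])
    return out
```

With `interpolation_factor=2.0`, position 4096 receives exactly the angles that position 2048 receives without interpolation, which is the property that lets a model trained on 2048 tokens be fine-tuned on 4096.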
…es not wait for setup (#7016)

Signed-off-by: Kim Ngo <6362111+findkim@users.noreply.github.com>
Signed-off-by: tbartley94 <tbartley@nvidia.com>
* rnnt_ngram_merge

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* char level bug

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com>
* small fixes and tests

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* various fixes for the tutorial

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* tutorial added

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix for a little oops after rebase

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix tests

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* unused import removed

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix review comments

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* deprecated parameters for greedy configs

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* move re-assigning to configs

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix comments 2

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix config tests

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* fix ece test (my env was bugged apparently)

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* renamings for confidence ensemble

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix comments 3

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* return dropped tutorial

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

* CI flips back and forth, increasing tolerance

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>

---------

Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Co-authored-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Alexandra Antonova <antonova_sasha@list.ru>
Co-authored-by: bene-ges <antonova_sasha@list.ru>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Signed-off-by: Yi Dong <yidong@nvidia.com>
Signed-off-by: smajumdar <titu1994@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
* st standalone model

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* style fix

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* sacrebleu import fix, unused imports removed

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* import guard for nlp inside asr transformer bpe model

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* codeql fixes

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* comments answered

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* import ordering fix

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* yttm for asr removed

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* logging added

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* added inference and translate method

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: AlexGrinch <grinchuk.alexey@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* remove pos emb from state dict

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* move to nlp_model

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update comment

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* fix nmt test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix nmt test

Signed-off-by: Evelina <ebakhturina@nvidia.com>

---------

Signed-off-by: Evelina <ebakhturina@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com>
Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
* Fix documentation for Numba

* Update force float32 flag dynamically

* Update force float32 flag dynamically

* Fix nemo version

---------

Signed-off-by: smajumdar <titu1994@gmail.com>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
* update fvad doc

Signed-off-by: stevehuang52 <heh@nvidia.com>

* fix typo

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update fvad example

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update

Signed-off-by: stevehuang52 <heh@nvidia.com>

* fix onnx export

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update test

Signed-off-by: stevehuang52 <heh@nvidia.com>

* refactor

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update doc

Signed-off-by: stevehuang52 <heh@nvidia.com>

* update

Signed-off-by: stevehuang52 <heh@nvidia.com>

---------

Signed-off-by: stevehuang52 <heh@nvidia.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
* memmap worker arg

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <adithya.r@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

Signed-off-by: arendu <adithya.r@gmail.com>

* update

Signed-off-by: arendu <adithya.r@gmail.com>

---------

Signed-off-by: arendu <adithya.r@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
        return costs

    @staticmethod
    def backward(ctx, grad_output):

Check notice

Code scanning / CodeQL

Explicit returns mixed with implicit (fall through) returns

Mixing implicit and explicit returns may indicate an error, as implicit returns always return None.
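This notice concerns functions where some paths `return` a value and others fall off the end. A hypothetical pair of functions showing the pattern and the usual fix, an explicit `return None`; the flagged `backward` body is not shown here, so this is an illustration rather than a patch for it. In a `torch.autograd.Function.backward`, returning `None` for an input that needs no gradient is accepted, so making that return explicit documents intent without changing behaviour.

```python
from typing import Optional


def scale_grad_implicit(grad: float, needs_grad: bool) -> Optional[float]:
    if needs_grad:
        return grad * 2.0
    # Falls off the end here: Python implicitly returns None (what CodeQL flags).


def scale_grad_explicit(grad: float, needs_grad: bool) -> Optional[float]:
    if needs_grad:
        return grad * 2.0
    # Same behaviour, but the None is now visibly intentional.
    return None
```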
# Have to turn off activations_checkpoint_method for inference
try:
    model.model.language_model.encoder.activations_checkpoint_method = None
except AttributeError:

Check notice

Code scanning / CodeQL

Empty except

'except' clause does nothing but pass and there is no explanatory comment.

try:
    model.frozen_model.model.language_model.encoder.activations_checkpoint_method = None
except AttributeError:

Check notice

Code scanning / CodeQL

Empty except

'except' clause does nothing but pass and there is no explanatory comment.
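Both flagged `except AttributeError:` clauses can be resolved by stating why the error is expected. A sketch of a hypothetical helper that walks the two attribute paths from the snippets above and documents the skipped case; the helper name and the logging call are illustrative, not NeMo code.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)

# Attribute paths from the snippets: a plain model and one wrapping a frozen base model.
_ENCODER_PATHS = (
    ("model", "language_model", "encoder"),
    ("frozen_model", "model", "language_model", "encoder"),
)


def disable_activations_checkpointing(model) -> int:
    """Set `activations_checkpoint_method = None` on every encoder found; return the count."""
    updated = 0
    for path in _ENCODER_PATHS:
        obj = model
        try:
            for name in path:
                obj = getattr(obj, name)
        except AttributeError:
            # Expected: not every model has every path (e.g. `frozen_model` only exists
            # on models that wrap a frozen base model), so nothing to disable here.
            logger.debug("No encoder at %s; skipping.", ".".join(path))
            continue
        obj.activations_checkpoint_method = None
        updated += 1
    return updated


if __name__ == "__main__":
    encoder = SimpleNamespace(activations_checkpoint_method="uniform")
    demo = SimpleNamespace(model=SimpleNamespace(language_model=SimpleNamespace(encoder=encoder)))
    print(disable_activations_checkpointing(demo), encoder.activations_checkpoint_method)
```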
from nemo.collections.nlp.parts.nlp_overrides import (
    MegatronHalfPrecisionPlugin,
    NLPDDPStrategy,
    NLPSaveRestoreConnector,
    PipelineMixedPrecisionPlugin,
)

Check notice

Code scanning / CodeQL

Unused import

Import of 'MegatronHalfPrecisionPlugin' is not used. Import of 'PipelineMixedPrecisionPlugin' is not used.
def multi_test_epoch_end(self, outputs, dataloader_idx: int = 0):
    return self.multi_validation_epoch_end(outputs, dataloader_idx, eval_mode="test")

def test_dataloader(self):

Check notice

Code scanning / CodeQL

Explicit returns mixed with implicit (fall through) returns

Mixing implicit and explicit returns may indicate an error, as implicit returns always return None.
from nemo.collections.nlp.modules.common.lm_utils import get_transformer
from nemo.collections.nlp.modules.common.transformer import BeamSearchSequenceGenerator, TransformerEncoder

NLP_AVAILABLE = True

Check notice

Code scanning / CodeQL

Unused global variable

The global variable 'NLP_AVAILABLE' is not used.

    NLP_AVAILABLE = True
except (ImportError, ModuleNotFoundError):
    NLP_AVAILABLE = False

Check notice

Code scanning / CodeQL

Unused global variable

The global variable 'NLP_AVAILABLE' is not used.
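CodeQL reports `NLP_AVAILABLE` as unused because nothing reads the flag. An import guard is usually paired with a check that raises a readable error when the optional dependency is missing. A minimal sketch with a hypothetical module name and helper; note that `ModuleNotFoundError` is a subclass of `ImportError`, so catching `ImportError` alone is sufficient.

```python
import importlib
from types import ModuleType
from typing import Optional, Tuple


def optional_import(name: str) -> Tuple[Optional[ModuleType], bool]:
    """Try to import `name`; return (module, True) on success, (None, False) otherwise."""
    try:
        return importlib.import_module(name), True
    except ImportError:  # ModuleNotFoundError is a subclass, so it is caught too
        return None, False


# Hypothetical optional dependency name, for illustration only.
_nlp, NLP_AVAILABLE = optional_import("hypothetical_nlp_dependency")


def build_text_decoder():
    """Fail early with a clear message instead of a NameError deep in model code."""
    if not NLP_AVAILABLE:
        raise ModuleNotFoundError(
            "This model needs the optional NLP dependencies; please install them first."
        )
    return _nlp
```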
    zip(tarred_audio_filepaths, manifest_filepaths)
):
    conf = copy.deepcopy(config)
    conf['manifest_filepath'] = manifest_filepath

Check failure

Code scanning / CodeQL

Modification of parameter with default

This expression mutates a default value.
conf = copy.deepcopy(config)
conf['manifest_filepath'] = manifest_filepath
with open_dict(conf):
    conf['tarred_audio_filepaths'] = tarred_audio_filepath

Check failure

Code scanning / CodeQL

Modification of parameter with default

This expression mutates a default value.
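Because `conf` is a `copy.deepcopy` of `config`, the flagged assignments appear to touch only the copy, so this finding looks like a false positive; a `None` default makes that obvious to readers and analyzers alike. A sketch with plain dicts standing in for the OmegaConf config (the function name and signature are hypothetical, and `open_dict` is unnecessary for plain dicts).

```python
import copy
from typing import Any, Dict, List, Optional, Sequence


def per_bucket_configs(
    tarred_audio_filepaths: Sequence[str],
    manifest_filepaths: Sequence[str],
    config: Optional[Dict[str, Any]] = None,
) -> List[Dict[str, Any]]:
    """Build one dataset config per (tarred audio, manifest) pair without touching `config`."""
    # A None default avoids sharing one mutable dict across calls.
    base = {} if config is None else config
    confs = []
    for tarred_audio_filepath, manifest_filepath in zip(tarred_audio_filepaths, manifest_filepaths):
        conf = copy.deepcopy(base)  # each bucket gets its own independent copy
        conf["manifest_filepath"] = manifest_filepath
        conf["tarred_audio_filepaths"] = tarred_audio_filepath
        confs.append(conf)
    return confs
```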
KunalDhawan merged commit 1b17b22 into asr_normalize on Jul 20, 2023