Fix for concat map dataset #5133

Merged 10 commits on Nov 15, 2022

@1-800-BAD-CODE 1-800-BAD-CODE commented Oct 10, 2022

What does this PR do?

Fix for concat map dataset #5117

Only caveat I know of is that it will not reshuffle each epoch when using round robin. I'm not sure if anyone actually uses round robin in practice, though.

Usage

from collections import defaultdict

import torch
from pytorch_lightning import seed_everything

from nemo.collections.common.data import ConcatMapDataset

seed_everything(42)


class IntegerDataset(torch.utils.data.Dataset):
    def __init__(self, n: int, dataset_id: int):
        self._ints = list(range(n))
        self._id = torch.tensor(dataset_id)

    def __len__(self):
        return len(self._ints)

    def __getitem__(self, idx):
        worker_info = torch.utils.data.get_worker_info()
        worker_id = worker_info.id
        return self._id, worker_id, self._ints[idx]


for sampling_technique in ["temperature", "random", "round-robin"]:
    probs = [0.1, 0.2, 0.3]
    concat_dataset = ConcatMapDataset(
        datasets=[
            IntegerDataset(5, 0),
            IntegerDataset(10, 1),
            IntegerDataset(20, 2),
        ],
        seed=123456,
        sampling_temperature=2,
        sampling_probabilities=probs,
        sampling_technique=sampling_technique
    )

    dataloader = torch.utils.data.DataLoader(
        dataset=concat_dataset,
        batch_size=1,
        num_workers=2,
        shuffle=False  # Should be False to preserve round robin
    )

    print(f"Item-by-item results for technique '{sampling_technique}':")
    collected = defaultdict(list)
    for batch in dataloader:
        batch_dataset_id, batch_worker_id, batch_value = batch
        print(
            f"Dataset {batch_dataset_id.item()} worker {batch_worker_id.item()} fetched {batch_value.item()}"
        )
        collected[batch_dataset_id.item()].append(batch_value.item())

    print("*" * 80)
    print(f"Summary of returned results for each dataset for technique '{sampling_technique}'")
    if sampling_technique == "random":
        print(f"\t{probs=}")
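    for dataset_id, values in sorted(collected.items(), key=lambda x: x[0]):
        dataset_raw_length = len(concat_dataset.datasets[dataset_id])
        # Report how many items were drawn so oversampling of the shorter datasets is visible.
        print(
            f"\tDataset #{dataset_id}; raw length {dataset_raw_length}; "
            f"num drawn = {len(values)}; values = {values}"
        )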
    print("*" * 80)
    print()

Output:

Item-by-item results for technique 'temperature':
Global seed set to 42
Dataset 2 worker 0 fetched 0
Dataset 2 worker 1 fetched 12
Dataset 1 worker 0 fetched 3
Dataset 1 worker 1 fetched 1
Dataset 0 worker 0 fetched 0
Dataset 2 worker 1 fetched 18
Dataset 1 worker 0 fetched 5
Dataset 0 worker 1 fetched 4
Dataset 0 worker 0 fetched 3
Dataset 2 worker 1 fetched 1
Dataset 0 worker 0 fetched 2
Dataset 0 worker 1 fetched 1
Dataset 2 worker 0 fetched 3
Dataset 1 worker 1 fetched 6
Dataset 0 worker 0 fetched 1
Dataset 1 worker 1 fetched 2
Dataset 2 worker 0 fetched 5
Dataset 0 worker 1 fetched 3
Dataset 1 worker 0 fetched 9
Dataset 0 worker 1 fetched 0
Dataset 2 worker 0 fetched 9
Dataset 2 worker 1 fetched 8
Dataset 2 worker 0 fetched 13
Dataset 2 worker 1 fetched 7
Dataset 1 worker 0 fetched 4
Dataset 2 worker 1 fetched 19
Dataset 2 worker 0 fetched 4
Dataset 2 worker 1 fetched 16
Dataset 2 worker 0 fetched 10
Dataset 2 worker 1 fetched 17
Dataset 0 worker 0 fetched 2
Dataset 0 worker 1 fetched 4
Dataset 1 worker 0 fetched 7
Dataset 1 worker 1 fetched 0
Dataset 0 worker 0 fetched 1
Dataset 1 worker 1 fetched 8
Dataset 0 worker 0 fetched 2
Dataset 1 worker 1 fetched 5
Dataset 2 worker 0 fetched 14
Dataset 2 worker 1 fetched 2
Dataset 1 worker 0 fetched 0
Dataset 2 worker 1 fetched 6
Dataset 1 worker 0 fetched 7
Dataset 2 worker 1 fetched 15
Dataset 0 worker 0 fetched 3
Dataset 2 worker 1 fetched 11
********************************************************************************
Summary of returned results for each dataset for technique 'temperature'
	Dataset #0; raw length 5; num drawn = 13; values = [0, 4, 3, 2, 1, 1, 3, 0, 2, 4, 1, 2, 3]
	Dataset #1; raw length 10; num drawn = 13; values = [3, 1, 5, 6, 2, 9, 4, 7, 0, 8, 5, 0, 7]
	Dataset #2; raw length 20; num drawn = 20; values = [0, 12, 18, 1, 3, 5, 9, 8, 13, 7, 19, 4, 16, 10, 17, 14, 2, 6, 15, 11]
********************************************************************************

Item-by-item results for technique 'random':
Dataset 2 worker 0 fetched 0
Dataset 2 worker 1 fetched 12
Dataset 2 worker 0 fetched 18
Dataset 1 worker 1 fetched 3
Dataset 0 worker 0 fetched 0
Dataset 2 worker 1 fetched 1
Dataset 1 worker 0 fetched 1
Dataset 0 worker 1 fetched 4
Dataset 0 worker 0 fetched 3
Dataset 2 worker 1 fetched 3
Dataset 0 worker 0 fetched 2
Dataset 1 worker 1 fetched 5
Dataset 1 worker 0 fetched 6
Dataset 2 worker 1 fetched 5
Dataset 1 worker 0 fetched 2
Dataset 2 worker 1 fetched 9
Dataset 2 worker 0 fetched 8
Dataset 2 worker 1 fetched 13
Dataset 0 worker 0 fetched 1
Dataset 1 worker 1 fetched 9
Dataset 1 worker 0 fetched 4
Dataset 0 worker 1 fetched 0
Dataset 2 worker 0 fetched 7
Dataset 2 worker 1 fetched 19
Dataset 2 worker 0 fetched 4
Dataset 2 worker 1 fetched 16
Dataset 1 worker 0 fetched 7
Dataset 2 worker 1 fetched 10
Dataset 2 worker 0 fetched 17
Dataset 2 worker 1 fetched 14
Dataset 2 worker 0 fetched 2
Dataset 2 worker 1 fetched 6
Dataset 0 worker 0 fetched 3
Dataset 0 worker 1 fetched 2
Dataset 0 worker 0 fetched 1
Dataset 1 worker 1 fetched 0
Dataset 1 worker 0 fetched 8
Dataset 1 worker 1 fetched 8
Dataset 0 worker 0 fetched 4
Dataset 1 worker 1 fetched 7
Dataset 2 worker 0 fetched 15
Dataset 2 worker 1 fetched 11
********************************************************************************
Summary of returned results for each dataset for technique 'random'
	probs=[0.1, 0.2, 0.3]
	Dataset #0; raw length 5; num drawn = 10; values = [0, 4, 3, 2, 1, 0, 3, 2, 1, 4]
	Dataset #1; raw length 10; num drawn = 12; values = [3, 1, 5, 6, 2, 9, 4, 7, 0, 8, 8, 7]
	Dataset #2; raw length 20; num drawn = 20; values = [0, 12, 18, 1, 3, 5, 9, 8, 13, 7, 19, 4, 16, 10, 17, 14, 2, 6, 15, 11]
********************************************************************************

Item-by-item results for technique 'round-robin':
Dataset 0 worker 0 fetched 0
Dataset 1 worker 1 fetched 3
Dataset 2 worker 0 fetched 0
Dataset 0 worker 1 fetched 4
Dataset 1 worker 0 fetched 1
Dataset 2 worker 1 fetched 12
Dataset 0 worker 0 fetched 3
Dataset 1 worker 1 fetched 5
Dataset 2 worker 0 fetched 18
Dataset 0 worker 1 fetched 2
Dataset 1 worker 0 fetched 6
Dataset 2 worker 1 fetched 1
Dataset 0 worker 0 fetched 1
Dataset 1 worker 1 fetched 2
Dataset 2 worker 0 fetched 3
Dataset 0 worker 1 fetched 0
Dataset 1 worker 0 fetched 9
Dataset 2 worker 1 fetched 5
Dataset 0 worker 0 fetched 2
Dataset 1 worker 1 fetched 4
Dataset 2 worker 0 fetched 9
Dataset 0 worker 1 fetched 1
Dataset 1 worker 0 fetched 7
Dataset 2 worker 1 fetched 8
Dataset 0 worker 0 fetched 3
Dataset 1 worker 1 fetched 0
Dataset 2 worker 0 fetched 13
Dataset 0 worker 1 fetched 4
Dataset 1 worker 0 fetched 8
Dataset 2 worker 1 fetched 7
Dataset 0 worker 0 fetched 1
Dataset 1 worker 1 fetched 6
Dataset 2 worker 0 fetched 19
Dataset 0 worker 1 fetched 3
Dataset 1 worker 0 fetched 3
Dataset 2 worker 1 fetched 4
Dataset 0 worker 0 fetched 2
Dataset 1 worker 1 fetched 5
Dataset 2 worker 0 fetched 16
Dataset 0 worker 1 fetched 4
Dataset 1 worker 0 fetched 9
Dataset 2 worker 1 fetched 10
Dataset 0 worker 0 fetched 0
Dataset 1 worker 1 fetched 8
Dataset 2 worker 0 fetched 17
Dataset 0 worker 1 fetched 3
Dataset 1 worker 0 fetched 7
Dataset 2 worker 1 fetched 14
Dataset 0 worker 0 fetched 2
Dataset 1 worker 1 fetched 1
Dataset 2 worker 0 fetched 2
Dataset 0 worker 1 fetched 0
Dataset 1 worker 0 fetched 2
Dataset 2 worker 1 fetched 6
Dataset 0 worker 0 fetched 1
Dataset 1 worker 1 fetched 4
Dataset 2 worker 0 fetched 15
Dataset 0 worker 1 fetched 4
Dataset 1 worker 0 fetched 0
Dataset 2 worker 1 fetched 11
********************************************************************************
Summary of returned results for each dataset for technique 'round-robin'
	Dataset #0; raw length 5; num drawn = 20; values = [0, 4, 3, 2, 1, 0, 2, 1, 3, 4, 1, 3, 2, 4, 0, 3, 2, 0, 1, 4]
	Dataset #1; raw length 10; num drawn = 20; values = [3, 1, 5, 6, 2, 9, 4, 7, 0, 8, 6, 3, 5, 9, 8, 7, 1, 2, 4, 0]
	Dataset #2; raw length 20; num drawn = 20; values = [0, 12, 18, 1, 3, 5, 9, 8, 13, 7, 19, 4, 16, 10, 17, 14, 2, 6, 15, 11]
********************************************************************************
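
For readers wondering why the 'temperature' run draws from the three datasets more evenly than their raw sizes (5, 10, 20) would suggest, here is a minimal sketch of how temperature-scaled sampling probabilities are commonly computed. It assumes `sampling_temperature` follows the usual convention of raising size-proportional probabilities to the power 1/T and renormalizing; the function name `temperature_probabilities` is hypothetical and the code is not taken from this PR's diff.

```python
def temperature_probabilities(lengths, temperature):
    """Raise size-proportional probabilities to 1/temperature, then renormalize.

    Assumed convention (not confirmed by the PR): p_i is proportional to
    (n_i / N) ** (1 / T), where n_i is the dataset length and N the total.
    """
    total = sum(lengths)
    scaled = [(n / total) ** (1.0 / temperature) for n in lengths]
    norm = sum(scaled)
    return [s / norm for s in scaled]


# Dataset sizes and temperature taken from the usage example above.
proportional = temperature_probabilities([5, 10, 20], temperature=1)
flattened = temperature_probabilities([5, 10, 20], temperature=2)
print([round(p, 3) for p in proportional])
print([round(p, 3) for p in flattened])
```

With a temperature of 1 the probabilities stay proportional to dataset size; with a temperature of 2 they move toward uniform (roughly 0.23, 0.32, 0.45 for the sizes above), so the shorter datasets are drawn more often than their size alone would give.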

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

@1-800-BAD-CODE 1-800-BAD-CODE marked this pull request as ready for review October 11, 2022 00:02
@github-actions
Contributor

This PR is stale because it has been open for 14 days with no activity. Remove the stale label, comment, or update the PR, or it will be closed in 7 days.

@github-actions github-actions bot added the stale label Oct 27, 2022
@okuchaiev
Member

@1-800-BAD-CODE is this still WIP or ready for review?

@github-actions github-actions bot removed the stale label Oct 28, 2022
@1-800-BAD-CODE 1-800-BAD-CODE changed the title from "[WIP] fix for concat map dataset" to "Fix for concat map dataset" Oct 31, 2022
@1-800-BAD-CODE
Contributor Author

@okuchaiev ready for review; I previously changed the status but didn't update the title.

I'll add some of my own critical comments:

  • I believe that round-robin should shuffle every epoch, but this PR does not.
  • For large datasets, the non-round-robin techniques don't re-sample from the dataset each epoch, and some samples from each dataset are used more than the others (the ones chosen during the initial oversampling).

These are slightly harder to solve for a map-style dataset, since when the data loader says "give me item 113" we should always return the same item "113", whereas with the iterable-style datasets we simply need to return an arbitrary unique item.
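
To make the constraint above concrete, here is a minimal sketch of a map-style concatenation that fixes the index-to-item mapping once, at construction time. It is an illustration under stated assumptions, not the code merged in this PR: `build_index_table` and `FixedIndexConcat` are hypothetical names, plain lists stand in for the datasets, and the stopping rule (sample until the longest dataset is exhausted, reshuffling shorter ones as they run out) is only inferred from the "Exhaust longest dataset" commit and the output above.

```python
import random


def build_index_table(lengths, probabilities, seed):
    """Precompute (dataset_id, sample_id) pairs until the longest dataset is used up.

    Hypothetical helper; the names and the stopping rule are assumptions.
    """
    rng = random.Random(seed)
    # One shuffled permutation of sample ids per dataset.
    perms = [rng.sample(range(n), n) for n in lengths]
    cursors = [0] * len(lengths)
    longest = max(range(len(lengths)), key=lambda i: lengths[i])
    table = []
    while cursors[longest] < lengths[longest]:
        ds = rng.choices(range(len(lengths)), weights=probabilities)[0]
        if cursors[ds] == lengths[ds]:
            # A shorter dataset ran out: reshuffle it and start another pass.
            perms[ds] = rng.sample(range(lengths[ds]), lengths[ds])
            cursors[ds] = 0
        table.append((ds, perms[ds][cursors[ds]]))
        cursors[ds] += 1
    return table


class FixedIndexConcat:
    """Map-style view: index i always resolves to the same (dataset_id, sample)."""

    def __init__(self, datasets, probabilities, seed):
        self.datasets = datasets
        self.table = build_index_table([len(d) for d in datasets], probabilities, seed)

    def __len__(self):
        return len(self.table)

    def __getitem__(self, idx):
        ds, sample = self.table[idx]
        return ds, self.datasets[ds][sample]


# Plain lists stand in for the three IntegerDataset instances from the usage example.
datasets = [list(range(5)), list(range(10)), list(range(20))]
concat = FixedIndexConcat(datasets, probabilities=[0.1, 0.2, 0.3], seed=123456)
first_epoch = [concat[i] for i in range(len(concat))]
second_epoch = [concat[i] for i in range(len(concat))]
print(first_epoch == second_epoch)
```

Because the table is built once, every epoch sees the same oversampled items in the same positions, which is exactly the trade-off described in the two bullets above. Getting a fresh sample each epoch would require rebuilding the table between epochs, or switching to an iterable-style dataset.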

Contributor

@MaximumEntropy MaximumEntropy left a comment

Thanks for the fixes!

@okuchaiev okuchaiev merged commit c21074e into NVIDIA:main Nov 15, 2022
tango4j pushed a commit that referenced this pull request Nov 17, 2022
* change for concat map dataset

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Exhaust longest dataset

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: 1-800-BAD-CODE <>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
nithinraok pushed a commit that referenced this pull request Nov 21, 2022
* first commit on eval_diar_with_asr.py

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Add a standalone diarization-ASR evaluation transcript

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Fixed examples in docstrings

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed staticmethod error

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added description on eval modes

Signed-off-by: Taejin Park <tango4j@gmail.com>

* adding diar_infer_general.yaml

Signed-off-by: Taejin Park <tango4j@gmail.com>

* fix msdd_model in general yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

* fixed errors in yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

* combine into 1 commit

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Added description on eval modes

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add MoE support for T5 model (w/o expert parallel) (#5409)

* clean

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* kwarg ref

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* fix

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* fix

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* extra args

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* rm prints

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* style

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* review comments

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* review comments

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* review comments

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* fix

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix args (#5410) (#5416)

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* Fix for concat map dataset (#5133)

* change for concat map dataset

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Exhaust longest dataset

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: 1-800-BAD-CODE <>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* Add temporary fix for CUDA issue in Dockerfile (#5421) (#5422)

Signed-off-by: Yu Yao <yuya@nvidia.com>

Signed-off-by: Yu Yao <yuya@nvidia.com>

Signed-off-by: Yu Yao <yuya@nvidia.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>

* Fix GPT generation when using sentencepiece tokenizer (#5413) (#5428)

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

* Support for finetuning and finetuning inference with .ckpt files & batch size refactoring (#5339)

* Initial refactor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Resolve config before passing to load_from_checkpoint

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes for model parallel and nemo restore

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes for eval

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert config changes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Refactor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix typo

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove comments

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Minor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix validation reconfiguration

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove old comment

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixes for test_ds

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Revert "Add temporary fix for CUDA issue in Dockerfile (#5421)" (#5431) (#5432)

This reverts commit 0718b17.

Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>

* [ITN] fix year date graph, cardinals extension for hundreds (#5435)

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add lociko's hundreds extension for cardinals

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add optional end

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restart ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update doc in terms of get_label for lang id model (#5366)

* reflect PR 5278 ion doc

Signed-off-by: fayejf <fayejf07@gmail.com>

* reflect comment

Signed-off-by: fayejf <fayejf07@gmail.com>

Signed-off-by: fayejf <fayejf07@gmail.com>

* Revert workaround for T5 that sets number of workers to 0 & sync_batch_comm=False (#5420) (#5433)

* Revert workers workaround

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix in config

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

* Fixed bug in notebook (#5382) (#5394)

Signed-off-by: Virginia Adams <vadams@nvidia.com>

Signed-off-by: Virginia Adams <vadams@nvidia.com>

Signed-off-by: Virginia Adams <vadams@nvidia.com>
Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com>

* Fixing bug in Megatron BERT when loss mask is all zeros (#5424)

* Fixing bug when loss mask is fully zero

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update megatron_bert_model.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* Update dataset_utils.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update dataset_utils.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* Update dataset_utils.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* Use updated API for overlapping grad sync with pipeline parallelism (#5236)

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* support to disable sequence length + 1 input tokens for each sample in MegatronGPT (#5363)

* support to disable sequence length + 1 input tokens for MegatronGPT

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Anmol Gupta <anmolg@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* [TTS] Create script for processing TTS training audio (#5262)

* Create script for processing TTS training audio
* Update VAD trimming logic
* Remove unused import

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] remove useless logic for set_tokenizer. (#5430)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* Fix setting up of `ReduceLROnPlateau` learning rate scheduler (#5444)

* Fix tests

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Add accidentally lost changes

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Create codeql.yml (#5445)

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

* Fix for getting tokenizer in character-based ASR models when using tarred dataset (#5442)

Signed-off-by: Jonghwan Hyeon <hyeon0145@gmail.com>

Signed-off-by: Jonghwan Hyeon <hyeon0145@gmail.com>

* Combine 5 commits

adding diar_infer_general.yaml

Signed-off-by: Taejin Park <tango4j@gmail.com>

Update codeql.yml

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

Update codeql.yml

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

fix msdd_model in general yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

fixed errors in yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

* moved eval_der function and fixed tqdm options

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Changed minor error in docstrings

Signed-off-by: Taejin Park <tango4j@gmail.com>

* removed score_labels and changed leave=True

Signed-off-by: Taejin Park <tango4j@gmail.com>

Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: Yu Yao <yuya@nvidia.com>
Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: Virginia Adams <vadams@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: PeganovAnton <peganoff2@mail.ru>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Jonghwan Hyeon <hyeon0145@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Abhinav Khattar <aklife97@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Shane Carroll <50530592+1-800-BAD-CODE@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: anmolgupt <14880251+anmolgupt@users.noreply.github.com>
Co-authored-by: Anmol Gupta <anmolg@nvidia.com>
Co-authored-by: Ryan Langman <rlangman@nvidia.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: PeganovAnton <peganoff2@mail.ru>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Jonghwan Hyeon <jonghwanhyeon93@gmail.com>
1-800-BAD-CODE added a commit to 1-800-BAD-CODE/NeMo that referenced this pull request Nov 26, 2022
* first commit on eval_diar_with_asr.py

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Add a standalone diarization-ASR evaluation transcript

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Fixed examples in docstrings

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed staticmethod error

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added description on eval modes

Signed-off-by: Taejin Park <tango4j@gmail.com>

* adding diar_infer_general.yaml

Signed-off-by: Taejin Park <tango4j@gmail.com>

* fix msdd_model in general yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

* fixed errors in yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

* combine into 1 commit

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Added description on eval modes

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add MoE support for T5 model (w/o expert parallel) (NVIDIA#5409)

* clean

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* kwarg ref

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* fix

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* fix

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* extra args

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* rm prints

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* style

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* review comments

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* review comments

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* review comments

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* fix

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix args (NVIDIA#5410) (NVIDIA#5416)

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* Fix for concat map dataset (NVIDIA#5133)

* change for concat map dataset

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Exhaust longest dataset

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: 1-800-BAD-CODE <>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* Add temporary fix for CUDA issue in Dockerfile (NVIDIA#5421) (NVIDIA#5422)

Signed-off-by: Yu Yao <yuya@nvidia.com>

Signed-off-by: Yu Yao <yuya@nvidia.com>

Signed-off-by: Yu Yao <yuya@nvidia.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>

* Fix GPT generation when using sentencepiece tokenizer (NVIDIA#5413) (NVIDIA#5428)

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

* Support for finetuning and finetuning inference with .ckpt files & batch size refactoring (NVIDIA#5339)

* Initial refactor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Resolve config before passing to load_from_checkpoint

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes for model parallel and nemo restore

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes for eval

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert config changes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Refactor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix typo

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove comments

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Minor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix validation reconfiguration

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove old comment

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixes for test_ds

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Revert "Add temporary fix for CUDA issue in Dockerfile (NVIDIA#5421)" (NVIDIA#5431) (NVIDIA#5432)

This reverts commit 0718b17.

Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>

* [ITN] fix year date graph, cardinals extension for hundreds (NVIDIA#5435)

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add lociko's hundreds extension for cardinals

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add optional end

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restart ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update doc in terms of get_label for lang id model (NVIDIA#5366)

* reflect PR 5278 in doc

Signed-off-by: fayejf <fayejf07@gmail.com>

* reflect comment

Signed-off-by: fayejf <fayejf07@gmail.com>

Signed-off-by: fayejf <fayejf07@gmail.com>

* Revert workaround for T5 that sets number of workers to 0 & sync_batch_comm=False (NVIDIA#5420) (NVIDIA#5433)

* Revert workers workaround

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix in config

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

* Fixed bug in notebook (NVIDIA#5382) (NVIDIA#5394)

Signed-off-by: Virginia Adams <vadams@nvidia.com>

Signed-off-by: Virginia Adams <vadams@nvidia.com>

Signed-off-by: Virginia Adams <vadams@nvidia.com>
Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com>

* Fixing bug in Megatron BERT when loss mask is all zeros (NVIDIA#5424)

* Fixing bug when loss mask is fully zero

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update megatron_bert_model.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* Update dataset_utils.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update dataset_utils.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* Update dataset_utils.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* Use updated API for overlapping grad sync with pipeline parallelism (NVIDIA#5236)

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* support to disable sequence length + 1 input tokens for each sample in MegatronGPT (NVIDIA#5363)

* support to disable sequence length + 1 input tokens for MegatronGPT

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Anmol Gupta <anmolg@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* [TTS] Create script for processing TTS training audio (NVIDIA#5262)

* Create script for processing TTS training audio
* Update VAD trimming logic
* Remove unused import

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] remove useless logic for set_tokenizer. (NVIDIA#5430)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* Fix setting up of `ReduceLROnPlateau` learning rate scheduler (NVIDIA#5444)

* Fix tests

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Add accidentally lost changes

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Create codeql.yml (NVIDIA#5445)

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

* Fix for getting tokenizer in character-based ASR models when using tarred dataset (NVIDIA#5442)

Signed-off-by: Jonghwan Hyeon <hyeon0145@gmail.com>

Signed-off-by: Jonghwan Hyeon <hyeon0145@gmail.com>

* Combine 5 commits

adding diar_infer_general.yaml

Signed-off-by: Taejin Park <tango4j@gmail.com>

Update codeql.yml

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

Update codeql.yml

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

fix msdd_model in general yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

fixed errors in yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

* moved eval_der function and fixed tqdm options

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Changed minor error in docstrings

Signed-off-by: Taejin Park <tango4j@gmail.com>

* removed score_labels and changed leave=True

Signed-off-by: Taejin Park <tango4j@gmail.com>

Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: Yu Yao <yuya@nvidia.com>
Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: Virginia Adams <vadams@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: PeganovAnton <peganoff2@mail.ru>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Jonghwan Hyeon <hyeon0145@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Abhinav Khattar <aklife97@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Shane Carroll <50530592+1-800-BAD-CODE@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: anmolgupt <14880251+anmolgupt@users.noreply.github.com>
Co-authored-by: Anmol Gupta <anmolg@nvidia.com>
Co-authored-by: Ryan Langman <rlangman@nvidia.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: PeganovAnton <peganoff2@mail.ru>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Jonghwan Hyeon <jonghwanhyeon93@gmail.com>
Signed-off-by: shane carroll <shane.carroll@utsa.edu>
hainan-xv pushed a commit to hainan-xv/NeMo that referenced this pull request Nov 29, 2022
* change for concat map dataset

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Exhaust longest dataset

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: 1-800-BAD-CODE <>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: Hainan Xu <hainanx@nvidia.com>
hainan-xv pushed a commit to hainan-xv/NeMo that referenced this pull request Nov 29, 2022
* first commit on eval_diar_with_asr.py

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Add a standalone diarization-ASR evaluation transcript

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Fixed examples in docstrings

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed staticmethod error

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added description on eval modes

Signed-off-by: Taejin Park <tango4j@gmail.com>

* adding diar_infer_general.yaml

Signed-off-by: Taejin Park <tango4j@gmail.com>

* fix msdd_model in general yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

* fixed errors in yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

* combine into 1 commit

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Added description on eval modes

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add MoE support for T5 model (w/o expert parallel) (NVIDIA#5409)

* clean

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* kwarg ref

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* fix

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* fix

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* extra args

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* rm prints

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* style

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* review comments

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* review comments

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* review comments

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* fix

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix args (NVIDIA#5410) (NVIDIA#5416)

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* Fix for concat map dataset (NVIDIA#5133)

* change for concat map dataset

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Exhaust longest dataset

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: 1-800-BAD-CODE <>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* Add temporary fix for CUDA issue in Dockerfile (NVIDIA#5421) (NVIDIA#5422)

Signed-off-by: Yu Yao <yuya@nvidia.com>

Signed-off-by: Yu Yao <yuya@nvidia.com>

Signed-off-by: Yu Yao <yuya@nvidia.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>

* Fix GPT generation when using sentencepiece tokenizer (NVIDIA#5413) (NVIDIA#5428)

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

* Support for finetuning and finetuning inference with .ckpt files & batch size refactoring (NVIDIA#5339)

* Initial refactor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Resolve config before passing to load_from_checkpoint

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes for model parallel and nemo restore

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes for eval

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert config changes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Refactor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix typo

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove comments

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Minor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix validation reconfiguration

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove old comment

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixes for test_ds

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Revert "Add temporary fix for CUDA issue in Dockerfile (NVIDIA#5421)" (NVIDIA#5431) (NVIDIA#5432)

This reverts commit 0718b17.

Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>

* [ITN] fix year date graph, cardinals extension for hundreds (NVIDIA#5435)

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add lociko's hundreds extension for cardinals

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add optional end

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restart ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update doc in terms of get_label for lang id model (NVIDIA#5366)

* reflect PR 5278 in doc

Signed-off-by: fayejf <fayejf07@gmail.com>

* reflect comment

Signed-off-by: fayejf <fayejf07@gmail.com>

Signed-off-by: fayejf <fayejf07@gmail.com>

* Revert workaround for T5 that sets number of workers to 0 & sync_batch_comm=False (NVIDIA#5420) (NVIDIA#5433)

* Revert workers workaround

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix in config

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

* Fixed bug in notebook (NVIDIA#5382) (NVIDIA#5394)

Signed-off-by: Virginia Adams <vadams@nvidia.com>

Signed-off-by: Virginia Adams <vadams@nvidia.com>

Signed-off-by: Virginia Adams <vadams@nvidia.com>
Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com>

* Fixing bug in Megatron BERT when loss mask is all zeros (NVIDIA#5424)

* Fixing bug when loss mask is fully zero

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update megatron_bert_model.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* Update dataset_utils.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update dataset_utils.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* Update dataset_utils.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* Use updated API for overlapping grad sync with pipeline parallelism (NVIDIA#5236)

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* support to disable sequence length + 1 input tokens for each sample in MegatronGPT (NVIDIA#5363)

* support to disable sequence length + 1 input tokens for MegatronGPT

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Anmol Gupta <anmolg@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* [TTS] Create script for processing TTS training audio (NVIDIA#5262)

* Create script for processing TTS training audio
* Update VAD trimming logic
* Remove unused import

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] remove useless logic for set_tokenizer. (NVIDIA#5430)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* Fix setting up of `ReduceLROnPlateau` learning rate scheduler (NVIDIA#5444)

* Fix tests

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Add accidentally lost changes

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Create codeql.yml (NVIDIA#5445)

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

* Fix for getting tokenizer in character-based ASR models when using tarred dataset (NVIDIA#5442)

Signed-off-by: Jonghwan Hyeon <hyeon0145@gmail.com>

Signed-off-by: Jonghwan Hyeon <hyeon0145@gmail.com>

* Combine 5 commits

adding diar_infer_general.yaml

Signed-off-by: Taejin Park <tango4j@gmail.com>

Update codeql.yml

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

Update codeql.yml

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

fix msdd_model in general yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

fixed errors in yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

* moved eval_der function and fixed tqdm options

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Changed minor error in docstrings

Signed-off-by: Taejin Park <tango4j@gmail.com>

* removed score_labels and changed leave=True

Signed-off-by: Taejin Park <tango4j@gmail.com>

Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: Yu Yao <yuya@nvidia.com>
Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: Virginia Adams <vadams@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: PeganovAnton <peganoff2@mail.ru>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Jonghwan Hyeon <hyeon0145@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Abhinav Khattar <aklife97@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Shane Carroll <50530592+1-800-BAD-CODE@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: anmolgupt <14880251+anmolgupt@users.noreply.github.com>
Co-authored-by: Anmol Gupta <anmolg@nvidia.com>
Co-authored-by: Ryan Langman <rlangman@nvidia.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: PeganovAnton <peganoff2@mail.ru>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Jonghwan Hyeon <jonghwanhyeon93@gmail.com>
Signed-off-by: Hainan Xu <hainanx@nvidia.com>

fix msdd_model in general yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

fixed errors in yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

* moved eval_der function and fixed tqdm options

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Changed minor error in docstrings

Signed-off-by: Taejin Park <tango4j@gmail.com>

* removed score_labels and changed leave=True

Signed-off-by: Taejin Park <tango4j@gmail.com>

Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: Yu Yao <yuya@nvidia.com>
Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: Virginia Adams <vadams@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: PeganovAnton <peganoff2@mail.ru>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Jonghwan Hyeon <hyeon0145@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Abhinav Khattar <aklife97@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Shane Carroll <50530592+1-800-BAD-CODE@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: anmolgupt <14880251+anmolgupt@users.noreply.github.com>
Co-authored-by: Anmol Gupta <anmolg@nvidia.com>
Co-authored-by: Ryan Langman <rlangman@nvidia.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: PeganovAnton <peganoff2@mail.ru>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Jonghwan Hyeon <jonghwanhyeon93@gmail.com>
Signed-off-by: Hainan Xu <hainanx@nvidia.com>
JimmyZhang12 pushed a commit to JimmyZhang12/NeMo that referenced this pull request Dec 14, 2022
* change for concat map dataset

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Exhaust longest dataset

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: 1-800-BAD-CODE <>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
andrusenkoau pushed a commit to andrusenkoau/NeMo that referenced this pull request Jan 5, 2023
* change for concat map dataset

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Exhaust longest dataset

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: 1-800-BAD-CODE <>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>
andrusenkoau pushed a commit to andrusenkoau/NeMo that referenced this pull request Jan 5, 2023
* first commit on eval_diar_with_asr.py

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Add a standalone diarization-ASR evaluation transcript

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Fixed examples in docstrings

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed staticmethod error

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Added description on eval modes

Signed-off-by: Taejin Park <tango4j@gmail.com>

* adding diar_infer_general.yaml

Signed-off-by: Taejin Park <tango4j@gmail.com>

* fix msdd_model in general yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

* fixed errors in yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

* combine into 1 commit

Signed-off-by: Taejin Park <tango4j@gmail.com>

* Added description on eval modes

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add MoE support for T5 model (w/o expert parallel) (NVIDIA#5409)

* clean

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* kwarg ref

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* fix

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* fix

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* extra args

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* rm prints

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* style

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* review comments

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* review comments

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* review comments

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* fix

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix args (NVIDIA#5410) (NVIDIA#5416)

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* Fix for concat map dataset (NVIDIA#5133)

* change for concat map dataset

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Exhaust longest dataset

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: 1-800-BAD-CODE <>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* Add temporary fix for CUDA issue in Dockerfile (NVIDIA#5421) (NVIDIA#5422)

Signed-off-by: Yu Yao <yuya@nvidia.com>

Signed-off-by: Yu Yao <yuya@nvidia.com>

Signed-off-by: Yu Yao <yuya@nvidia.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>

* Fix GPT generation when using sentencepiece tokenizer (NVIDIA#5413) (NVIDIA#5428)

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

* Support for finetuning and finetuning inference with .ckpt files & batch size refactoring (NVIDIA#5339)

* Initial refactor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Resolve config before passing to load_from_checkpoint

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes for model parallel and nemo restore

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fixes for eval

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert config changes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Refactor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix typo

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove comments

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Minor

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix validation reconfiguration

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove old comment

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixes for test_ds

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Revert "Add temporary fix for CUDA issue in Dockerfile (NVIDIA#5421)" (NVIDIA#5431) (NVIDIA#5432)

This reverts commit 0718b17.

Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>

* [ITN] fix year date graph, cardinals extension for hundreds (NVIDIA#5435)

* wip

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add lociko's hundreds extension for cardinals

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* add optional end

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* restart ci

Signed-off-by: ekmb <ebakhturina@nvidia.com>

Signed-off-by: ekmb <ebakhturina@nvidia.com>

* update doc in terms of get_label for lang id model (NVIDIA#5366)

* reflect PR 5278 in doc

Signed-off-by: fayejf <fayejf07@gmail.com>

* reflect comment

Signed-off-by: fayejf <fayejf07@gmail.com>

Signed-off-by: fayejf <fayejf07@gmail.com>

* Revert workaround for T5 that sets number of workers to 0 & sync_batch_comm=False (NVIDIA#5420) (NVIDIA#5433)

* Revert workers workaround

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix in config

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>

* Fixed bug in notebook (NVIDIA#5382) (NVIDIA#5394)

Signed-off-by: Virginia Adams <vadams@nvidia.com>

Signed-off-by: Virginia Adams <vadams@nvidia.com>

Signed-off-by: Virginia Adams <vadams@nvidia.com>
Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com>

* Fixing bug in Megatron BERT when loss mask is all zeros (NVIDIA#5424)

* Fixing bug when loss mask is fully zero

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update megatron_bert_model.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* Update dataset_utils.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update dataset_utils.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

* Update dataset_utils.py

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>

Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* Use updated API for overlapping grad sync with pipeline parallelism (NVIDIA#5236)

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* support to disable sequence length + 1 input tokens for each sample in MegatronGPT (NVIDIA#5363)

* support to disable sequence length + 1 input tokens for MegatronGPT

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Co-authored-by: Anmol Gupta <anmolg@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>

* [TTS] Create script for processing TTS training audio (NVIDIA#5262)

* Create script for processing TTS training audio
* Update VAD trimming logic
* Remove unused import

Signed-off-by: Ryan <rlangman@nvidia.com>

* [TTS] remove useless logic for set_tokenizer. (NVIDIA#5430)

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

* Fix setting up of `ReduceLROnPlateau` learning rate scheduler (NVIDIA#5444)

* Fix tests

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Add accidentally lost changes

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

Signed-off-by: PeganovAnton <peganoff2@mail.ru>

* Create codeql.yml (NVIDIA#5445)

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

* Fix for getting tokenizer in character-based ASR models when using tarred dataset (NVIDIA#5442)

Signed-off-by: Jonghwan Hyeon <hyeon0145@gmail.com>

Signed-off-by: Jonghwan Hyeon <hyeon0145@gmail.com>

* Combine 5 commits

adding diar_infer_general.yaml

Signed-off-by: Taejin Park <tango4j@gmail.com>

Update codeql.yml

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

Update codeql.yml

Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>

fix msdd_model in general yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

fixed errors in yaml file

Signed-off-by: Taejin Park <tango4j@gmail.com>

* moved eval_der function and fixed tqdm options

Signed-off-by: Taejin Park <tango4j@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Changed minor error in docstrings

Signed-off-by: Taejin Park <tango4j@gmail.com>

* removed score_labels and changed leave=True

Signed-off-by: Taejin Park <tango4j@gmail.com>

Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: Yu Yao <yuya@nvidia.com>
Signed-off-by: ekmb <ebakhturina@nvidia.com>
Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: Virginia Adams <vadams@nvidia.com>
Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Ryan <rlangman@nvidia.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: PeganovAnton <peganoff2@mail.ru>
Signed-off-by: Somshubra Majumdar <titu1994@gmail.com>
Signed-off-by: Jonghwan Hyeon <hyeon0145@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Abhinav Khattar <aklife97@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Shane Carroll <50530592+1-800-BAD-CODE@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: yaoyu-33 <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Yi Dong <yidong@nvidia.com>
Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com>
Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com>
Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com>
Co-authored-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Co-authored-by: anmolgupt <14880251+anmolgupt@users.noreply.github.com>
Co-authored-by: Anmol Gupta <anmolg@nvidia.com>
Co-authored-by: Ryan Langman <rlangman@nvidia.com>
Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Co-authored-by: PeganovAnton <peganoff2@mail.ru>
Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
Co-authored-by: Jonghwan Hyeon <jonghwanhyeon93@gmail.com>
Signed-off-by: andrusenkoau <andrusenkoau@gmail.com>