Maximum sample-based training for Megatron NMT and Text Memmap based Seq2seq Pre-training #4396

MaximumEntropy · 2022-06-17T20:46:45Z

What does this PR do?

Trains Megatron-based NMT models up to a maximum number of samples.
Adds support for text_memmap and csv_memmap datasets to Megatron encoder-decoder models (T5, BART, UL2).
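
The PR text does not spell out how a sample limit maps onto the training loop. As a rough illustration only, a sample budget can be converted to optimizer steps through the global batch size. The function and parameter names below are hypothetical and are not taken from the NeMo code base.

```python
import math


def steps_for_sample_budget(max_samples, micro_batch_size,
                            data_parallel_size, grad_accum_steps=1):
    """Return the number of optimizer steps needed to consume max_samples.

    Hypothetical helper: assumes every step consumes exactly one global
    batch, where global batch = micro batch * data-parallel ranks *
    gradient-accumulation steps.
    """
    global_batch_size = micro_batch_size * data_parallel_size * grad_accum_steps
    # Round up so the final, partially needed batch is still counted.
    return math.ceil(max_samples / global_batch_size)


def consumed_samples(step, micro_batch_size, data_parallel_size,
                     grad_accum_steps=1):
    """Return how many samples have been consumed after `step` steps."""
    return step * micro_batch_size * data_parallel_size * grad_accum_steps
```

With a micro batch of 16, 8 data-parallel ranks and 2 accumulation steps, the global batch is 256 samples, so a budget of 1,000,000 samples rounds up to 3907 steps under these assumptions.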

Collection: NLP

Usage

Add the following overrides to the training command line:

  model.data.data_impl=text_mmap \
  +model.data.data_impl_kwargs.newline_int=10 \
  +model.data.data_impl_kwargs.header_lines=0 \
  +model.data.data_impl_kwargs.workers=null \
  +model.data.data_impl_kwargs.sort_dataset_paths=False
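
To make the `newline_int` and `header_lines` options more concrete, here is a minimal sketch of a line-indexed, memory-mapped text reader. It is not the NeMo implementation: the class name `LineIndexedText` is hypothetical, and the reading of the options (`newline_int=10` as the byte value of `\n`, `header_lines` as the number of leading lines to skip) is an assumption based on their names. The `workers` and `sort_dataset_paths` options are not modelled.

```python
import mmap


class LineIndexedText:
    """Hypothetical sketch: random access to lines of a memory-mapped file."""

    def __init__(self, path, newline_int=10, header_lines=0):
        self._file = open(path, "rb")
        # Length 0 maps the whole file; this raises ValueError on an empty file.
        self._mm = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        sep = bytes([newline_int])  # 10 is the byte value of "\n"

        # Record the byte offset of every line terminator.
        ends = []
        pos = self._mm.find(sep)
        while pos != -1:
            ends.append(pos)
            pos = self._mm.find(sep, pos + 1)

        # If the file does not end with a terminator, treat EOF as the
        # end of the last line.
        size = len(self._mm)
        if not ends or ends[-1] != size - 1:
            ends.append(size)

        self._ends = ends
        self._skip = header_lines  # leading lines hidden from the caller

    def __len__(self):
        return max(len(self._ends) - self._skip, 0)

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        idx = i + self._skip
        start = 0 if idx == 0 else self._ends[idx - 1] + 1
        return self._mm[start:self._ends[idx]].decode("utf-8")

    def close(self):
        self._mm.close()
        self._file.close()
```

Only the list of line offsets is held in memory; the text itself is read from the mapping when a line is requested, which is the usual reason to prefer a memory-mapped dataset for large corpora.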

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

…e_training

lgtm-com · 2022-06-17T20:58:36Z

This pull request introduces 5 alerts when merging 6c5a163 into 317739f - view on LGTM.com

new alerts:

4 for Wrong number of arguments in a class instantiation
1 for Wrong name for an argument in a class instantiation

lgtm-com · 2022-06-17T21:56:11Z

This pull request introduces 3 alerts and fixes 1 when merging 5299acd into 317739f - view on LGTM.com

new alerts:

3 for Wrong number of arguments in a class instantiation

fixed alerts:

1 for Unused import

lgtm-com · 2022-06-18T23:18:08Z

This pull request introduces 3 alerts and fixes 1 when merging 7a7ad85 into e542d7f - view on LGTM.com

new alerts:

3 for Wrong number of arguments in a class instantiation

fixed alerts:

1 for Unused import

lgtm-com · 2022-06-20T16:47:47Z

This pull request introduces 3 alerts and fixes 1 when merging 734edd3 into e542d7f - view on LGTM.com

new alerts:

3 for Wrong number of arguments in a class instantiation

fixed alerts:

1 for Unused import

lgtm-com · 2022-06-21T19:09:05Z

This pull request introduces 3 alerts and fixes 1 when merging 9131474 into e542d7f - view on LGTM.com

new alerts:

3 for Wrong number of arguments in a class instantiation

fixed alerts:

1 for Unused import

lgtm-com · 2022-06-22T04:07:44Z

This pull request introduces 3 alerts and fixes 1 when merging 3a101bf into 41f27a5 - view on LGTM.com

new alerts:

3 for Wrong number of arguments in a class instantiation

fixed alerts:

1 for Unused import

lgtm-com · 2022-06-23T23:26:16Z

This pull request introduces 3 alerts and fixes 1 when merging 245fc90 into 41f27a5 - view on LGTM.com

new alerts:

3 for Wrong number of arguments in a class instantiation

fixed alerts:

1 for Unused import

lgtm-com · 2022-07-28T23:15:03Z

This pull request fixes 2 alerts when merging 18207e7 into 72d78d8 - view on LGTM.com

fixed alerts:

1 for Unused local variable
1 for Unused import

… into megatron_nmt_sample_training

ericharper

Thanks!

lgtm-com · 2022-07-29T00:51:16Z

This pull request fixes 2 alerts when merging 930be3e into 59d635c - view on LGTM.com

fixed alerts:

1 for Unused local variable
1 for Unused import

lgtm-com · 2022-07-29T04:51:31Z

This pull request fixes 2 alerts when merging ef353c5 into 588c6ca - view on LGTM.com

fixed alerts:

1 for Unused local variable
1 for Unused import

lgtm-com · 2022-07-29T20:22:03Z

This pull request introduces 1 alert and fixes 2 when merging 3b41977 into 2f85541 - view on LGTM.com

new alerts:

1 for Unused import

fixed alerts:

1 for Unused local variable
1 for Unused import

lgtm-com · 2022-07-29T20:34:49Z

This pull request introduces 1 alert and fixes 2 when merging 621dbf7 into 2f85541 - view on LGTM.com

new alerts:

1 for Unused import

fixed alerts:

1 for Unused local variable
1 for Unused import

ericharper

LGTM. Thanks!

lgtm-com · 2022-07-29T21:23:09Z

This pull request introduces 1 alert and fixes 2 when merging 82e6560 into 4fef5dd - view on LGTM.com

new alerts:

1 for Unused import

fixed alerts:

1 for Unused local variable
1 for Unused import

lgtm-com · 2022-07-30T01:07:05Z

This pull request introduces 1 alert and fixes 2 when merging b739756 into 1be2bda - view on LGTM.com

new alerts:

1 for Unused import

fixed alerts:

1 for Unused local variable
1 for Unused import

lgtm-com · 2022-07-30T01:41:03Z

This pull request introduces 1 alert and fixes 2 when merging 7a8b244 into 1be2bda - view on LGTM.com

new alerts:

1 for Unused import

fixed alerts:

1 for Unused local variable
1 for Unused import

lgtm-com · 2022-07-30T03:48:55Z

This pull request introduces 1 alert and fixes 2 when merging e4d5619 into 21cf961 - view on LGTM.com

new alerts:

1 for Unused import

fixed alerts:

1 for Unused local variable
1 for Unused import

…Seq2seq Pre-training (NVIDIA#4396)

* Update blendable dataset, and refactor seq2seq data Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Blendable dataset with binarized mmap working Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Pass seed from cfg to dataset Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Fix multilingual setup Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Add on epoch start reconfiguration Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Style Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Update tokenizer creation for multilingual Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Tmp Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Update NMT script Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Remove unused import Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Update training script Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Log consumed samples Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Logging on val epoch end Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Style Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Remove redundant print Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Ckpt averaging for non model parallel megatron models Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Style Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Empty Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Update error message Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Style Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Remove check Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Restore fixes Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Remove ipdb Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Fixes Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* 1. Debugging. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* 1. Debugging. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* 1. Testing a simple solution Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* 1. Fixed. Seems to work. Need to validate. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* 1. Added support in CSV and text memmap toMEgatron encoder-decoder Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* 1. Added support in CSV. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* 1. Fixed style. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* 1. Fixed style. 2. Fixed bugs. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* 1. Debugging. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* 1. Fixed bugs. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* 1. Fixed style. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* 1. Updated yaml. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* 1. Fixed warnings. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* 1. Fixed style. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* 1. Fixed style. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* 1. Fixed a bug. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* 1. Added a test for text_memmap Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* Fix retro Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* add docstrings Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Minor Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Uncomment CI tests and fix existing gpt ci tests Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Tmp Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Remove max step hacking and move on_train_batch_end to base model Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Empty Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Co-authored-by: Micha Livne <michalivne@users.noreply.github.com>
Co-authored-by: Micha Livne <mlivne@cs.toronto.edu>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>

…Seq2seq Pre-training (NVIDIA#4396)

* Update blendable dataset, and refactor seq2seq data Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Blendable dataset with binarized mmap working Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Pass seed from cfg to dataset Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Fix multilingual setup Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Add on epoch start reconfiguration Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Style Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Update tokenizer creation for multilingual Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Tmp Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Update NMT script Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Remove unused import Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Update training script Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Log consumed samples Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Logging on val epoch end Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Style Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Remove redundant print Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Ckpt averaging for non model parallel megatron models Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Style Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Empty Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Update error message Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Style Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Remove check Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Restore fixes Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Remove ipdb Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Fixes Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* 1. Debugging. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* 1. Debugging. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* 1. Testing a simple solution Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* 1. Fixed. Seems to work. Need to validate. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* 1. Added support in CSV and text memmap to Megatron encoder-decoder Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* 1. Added support in CSV. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* 1. Fixed style. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* 1. Fixed style. 2. Fixed bugs. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* 1. Debugging. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* 1. Fixed bugs. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* 1. Fixed style. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* 1. Updated yaml. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* 1. Fixed warnings. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* 1. Fixed style. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* 1. Fixed style. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* 1. Fixed a bug. Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* 1. Added a test for text_memmap Signed-off-by: Micha Livne <mlivne@cs.toronto.edu>
* Fix retro Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* add docstrings Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Minor Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Uncomment CI tests and fix existing gpt ci tests Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Tmp Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Remove max step hacking and move on_train_batch_end to base model Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
* Empty Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Co-authored-by: Micha Livne <michalivne@users.noreply.github.com>
Co-authored-by: Micha Livne <mlivne@cs.toronto.edu>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Anas Abou Allaban <aabouallaban@pm.me>

…Seq2seq Pre-training (NVIDIA#4396) [commit message identical to the previous entry] Signed-off-by: Hainan Xu <hainanx@nvidia.com>

MaximumEntropy added 9 commits June 15, 2022 09:22

Update blendable dataset, and refactor seq2seq data

a6a42cf

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Blendable dataset with binarized mmap working

33f37b5

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Pass seed from cfg to dataset

b954b8d

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Fix multilingual setup

48913e9

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Add on epoch start reconfiguration

1d2c492

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Style

a5ec9c2

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Merge branch 'main' of github.com:NVIDIA/NeMo into megatron_nmt_sample_training

464838b

Update tokenizer creation for multilingual

41ad987

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Tmp

6c5a163

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

MaximumEntropy added 2 commits June 17, 2022 14:37

Update NMT script

4fc09cd

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Remove unused import

5299acd

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Update training script

7a7ad85

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Log consumed samples

734edd3

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
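
For readers following the "maximum number of samples" idea behind this commit, here is a minimal sketch of how consumed samples relate to optimizer steps. It is not the code from this PR: the function names are hypothetical, and the batch-size relation (micro batch size × data-parallel size × gradient-accumulation steps) is the common Megatron-style convention, assumed here rather than taken from the diff.

```python
import math


def global_batch_size(micro_batch_size: int, data_parallel_size: int, grad_accum_steps: int) -> int:
    """Samples processed per optimizer step (assumed Megatron-style convention)."""
    return micro_batch_size * data_parallel_size * grad_accum_steps


def consumed_samples(global_step: int, gbs: int) -> int:
    """Total samples seen after `global_step` optimizer steps."""
    return global_step * gbs


def steps_for_max_samples(max_samples: int, gbs: int) -> int:
    """Smallest number of optimizer steps that covers at least `max_samples` samples."""
    return math.ceil(max_samples / gbs)


# Hypothetical numbers, purely for illustration.
gbs = global_batch_size(micro_batch_size=4, data_parallel_size=8, grad_accum_steps=2)
steps = steps_for_max_samples(max_samples=1_000_000, gbs=gbs)
print(gbs, steps, consumed_samples(steps, gbs))
```

Rounding up means the run may see slightly more than the requested number of samples; a real implementation could equally choose to round down.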

MaximumEntropy added 3 commits June 21, 2022 11:46

Logging on val epoch end

be2bc94

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Style

140000d

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Merge branch 'main' into megatron_nmt_sample_training

9131474

MaximumEntropy requested review from aklife97, soumye, michalivne and ericharper June 21, 2022 18:59

Remove redundant print

3a101bf

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Ckpt averaging for non model parallel megatron models

245fc90

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
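
As a rough illustration of what checkpoint averaging means for a model without model parallelism, the sketch below averages parameters element-wise across several checkpoints. It is an assumption-laden toy, not the script added in this commit: checkpoints are represented as plain dicts of float lists rather than tensors, and `average_checkpoints` is a hypothetical name.

```python
from typing import Dict, List

StateDict = Dict[str, List[float]]


def average_checkpoints(state_dicts: List[StateDict]) -> StateDict:
    """Element-wise mean of parameters that share the same names and shapes."""
    if not state_dicts:
        raise ValueError("need at least one checkpoint")
    keys = state_dicts[0].keys()
    for sd in state_dicts[1:]:
        if sd.keys() != keys:
            raise ValueError("checkpoints have different parameter names")
    n = len(state_dicts)
    averaged: StateDict = {}
    for key in keys:
        # zip(*...) walks the same position of this parameter in every checkpoint.
        columns = zip(*(sd[key] for sd in state_dicts))
        averaged[key] = [sum(values) / n for values in columns]
    return averaged


ckpt_a = {"w": [1.0, 3.0], "b": [0.0]}
ckpt_b = {"w": [3.0, 5.0], "b": [2.0]}
print(average_checkpoints([ckpt_a, ckpt_b]))
```

With model parallelism each rank holds only a shard of the weights, so averaging would have to be done per shard; that case is outside this sketch.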

Style

cc0ec96

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Uncomment CI tests and fix existing gpt ci tests

9542e74

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

MaximumEntropy dismissed ericharper’s stale review via 9542e74 July 28, 2022 23:00

Merge branch 'main' into megatron_nmt_sample_training

18207e7

MaximumEntropy added 2 commits July 28, 2022 17:33

Fix

7a44b98

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Merge branch 'megatron_nmt_sample_training' of github.com:NVIDIA/NeMo into megatron_nmt_sample_training

930be3e

ericharper previously approved these changes Jul 29, 2022

Tmp

ef353c5

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

MaximumEntropy dismissed ericharper’s stale review via ef353c5 July 29, 2022 04:41

Remove max step hacking and move on_train_batch_end to base model

3b41977

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Merge branch 'main' into megatron_nmt_sample_training

621dbf7

Merge branch 'main' into megatron_nmt_sample_training

82e6560

ericharper approved these changes Jul 29, 2022

Merge branch 'main' into megatron_nmt_sample_training

b739756

Empty

7a8b244

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Merge branch 'main' into megatron_nmt_sample_training

e4d5619

MaximumEntropy merged commit 0b7df7a into main Jul 30, 2022

MaximumEntropy deleted the megatron_nmt_sample_training branch July 30, 2022 04:46
