
Maximum sample-based training for Megatron NMT and Text Memmap based Seq2seq Pre-training #4396

Merged
74 commits merged into main from megatron_nmt_sample_training on Jul 30, 2022
Commits
a6a42cf
Update blendable dataset, and refactor seq2seq data
MaximumEntropy Jun 15, 2022
33f37b5
Blendable dataset with binarized mmap working
MaximumEntropy Jun 15, 2022
b954b8d
Pass seed from cfg to dataset
MaximumEntropy Jun 15, 2022
48913e9
Fix multilingual setup
MaximumEntropy Jun 15, 2022
1d2c492
Add on epoch start reconfiguration
MaximumEntropy Jun 17, 2022
a5ec9c2
Style
MaximumEntropy Jun 17, 2022
464838b
Merge branch 'main' of github.com:NVIDIA/NeMo into megatron_nmt_sampl…
MaximumEntropy Jun 17, 2022
41ad987
Update tokenizer creation for multilingual
MaximumEntropy Jun 17, 2022
6c5a163
Tmp
MaximumEntropy Jun 17, 2022
4fc09cd
Update NMT script
MaximumEntropy Jun 17, 2022
5299acd
Remove unused import
MaximumEntropy Jun 17, 2022
7a7ad85
Update training script
MaximumEntropy Jun 18, 2022
734edd3
Log consumed samples
MaximumEntropy Jun 20, 2022
be2bc94
Logging on val epoch end
MaximumEntropy Jun 21, 2022
140000d
Style
MaximumEntropy Jun 21, 2022
9131474
Merge branch 'main' into megatron_nmt_sample_training
MaximumEntropy Jun 21, 2022
3a101bf
Remove redundant print
MaximumEntropy Jun 22, 2022
245fc90
Ckpt averaging for non model parallel megatron models
MaximumEntropy Jun 23, 2022
cc0ec96
Style
MaximumEntropy Jun 27, 2022
6c48ceb
Merge branch 'main' into megatron_nmt_sample_training
MaximumEntropy Jun 27, 2022
acda5b1
Empty
MaximumEntropy Jun 28, 2022
c616dc3
Merge branch 'megatron_nmt_sample_training' of github.com:NVIDIA/NeMo…
MaximumEntropy Jun 28, 2022
1b00f74
Merge branch 'main' into megatron_nmt_sample_training
michalivne Jun 29, 2022
e60d0e2
Update error message
MaximumEntropy Jul 5, 2022
90567c9
Style
MaximumEntropy Jul 5, 2022
7cdc23f
Merge branch 'main' into megatron_nmt_sample_training
MaximumEntropy Jul 5, 2022
6382ba1
Remove check
MaximumEntropy Jul 11, 2022
f082c8f
Restore fixes
MaximumEntropy Jul 12, 2022
8890561
Remove ipdb
MaximumEntropy Jul 12, 2022
3088cd4
Fixes
MaximumEntropy Jul 12, 2022
a7cf4b9
1. Debugging.
michalivne Jul 18, 2022
3b23ab1
1. Debugging.
michalivne Jul 20, 2022
a8770cc
1. Testing a simple solution
michalivne Jul 20, 2022
6fe170c
1. Fixed. Seems to work. Need to validate.
michalivne Jul 20, 2022
a6f234d
1. Added support in CSV and text memmap to Megatron encoder-decoder
michalivne Jul 21, 2022
4ecf5ab
1. Added support in CSV.
michalivne Jul 21, 2022
7d73821
1. Fixed style.
michalivne Jul 21, 2022
e5b1d81
Merge branch 'main' into megatron_nmt_sample_training
michalivne Jul 21, 2022
d65c70d
1. Fixed style.
michalivne Jul 21, 2022
fc7a75b
1. Debugging.
michalivne Jul 21, 2022
992ea7d
1. Fixed bugs.
michalivne Jul 21, 2022
d3a73c9
1. Fixed style.
michalivne Jul 21, 2022
4706bbd
1. Updated yaml.
michalivne Jul 21, 2022
fa1b965
Fix conflicts
MaximumEntropy Jul 24, 2022
94a124b
1. Fixed warnings.
michalivne Jul 25, 2022
fdd4fb5
1. Fixed style.
michalivne Jul 25, 2022
eda1939
Merge branch 'megatron_nmt_sample_training' of github.com:NVIDIA/NeMo…
michalivne Jul 25, 2022
8cf9aea
1. Fixed style.
michalivne Jul 25, 2022
8c8da12
1. Fixed a bug.
michalivne Jul 25, 2022
4230f43
Merge branch 'main' into megatron_nmt_sample_training
michalivne Jul 25, 2022
e52ee5e
Merge branch 'main' into megatron_nmt_sample_training
michalivne Jul 25, 2022
90db362
Merge branch 'main' into megatron_nmt_sample_training
michalivne Jul 26, 2022
32ddd8c
Merge branch 'main' into megatron_nmt_sample_training
michalivne Jul 26, 2022
4762d75
1. Added a test for text_memmap
michalivne Jul 26, 2022
c7ee1de
Merge branch 'main' into megatron_nmt_sample_training
michalivne Jul 26, 2022
8c6b591
Merge branch 'main' into megatron_nmt_sample_training
ericharper Jul 26, 2022
245abea
Merge branch 'main' into megatron_nmt_sample_training
michalivne Jul 26, 2022
dd84a7b
Merge branch 'main' into megatron_nmt_sample_training
michalivne Jul 27, 2022
65cf10d
Fix retro
MaximumEntropy Jul 27, 2022
257a870
add docstrings
MaximumEntropy Jul 28, 2022
f5028bc
Merge branch 'main' into megatron_nmt_sample_training
MaximumEntropy Jul 28, 2022
001e69f
Minor
MaximumEntropy Jul 28, 2022
d36001a
Merge branch 'megatron_nmt_sample_training' of github.com:NVIDIA/NeMo…
MaximumEntropy Jul 28, 2022
9542e74
Uncomment CI tests and fix existing gpt ci tests
MaximumEntropy Jul 28, 2022
18207e7
Merge branch 'main' into megatron_nmt_sample_training
MaximumEntropy Jul 28, 2022
7a44b98
Fix
MaximumEntropy Jul 29, 2022
930be3e
Merge branch 'megatron_nmt_sample_training' of github.com:NVIDIA/NeMo…
MaximumEntropy Jul 29, 2022
ef353c5
Tmp
MaximumEntropy Jul 29, 2022
3b41977
Remove max step hacking and move on_train_batch_end to base model
MaximumEntropy Jul 29, 2022
621dbf7
Merge branch 'main' into megatron_nmt_sample_training
MaximumEntropy Jul 29, 2022
82e6560
Merge branch 'main' into megatron_nmt_sample_training
MaximumEntropy Jul 29, 2022
b739756
Merge branch 'main' into megatron_nmt_sample_training
MaximumEntropy Jul 30, 2022
7a8b244
Empty
MaximumEntropy Jul 30, 2022
e4d5619
Merge branch 'main' into megatron_nmt_sample_training
ericharper Jul 30, 2022
141 changes: 74 additions & 67 deletions Jenkinsfile
@@ -2734,7 +2734,7 @@ pipeline {
trainer.devices=2 \
trainer.accelerator=gpu \
trainer.log_every_n_steps=1 \
trainer.val_check_interval=10 \
trainer.val_check_interval=2 \
trainer.limit_val_batches=2 \
trainer.accumulate_grad_batches=1 \
trainer.max_steps=3 \
@@ -2759,36 +2759,36 @@ pipeline {
model.activations_checkpoint_num_layers=1 \
model.data.data_prefix=[.5,/home/TestData/nlp/megatron_gpt/data/gpt/simple_wiki_gpt_preproc_text_document,.5,/home/TestData/nlp/megatron_gpt/data/gpt/simple_wiki_gpt_preproc_text_document] \
model.data.index_mapping_dir=examples/nlp/language_modeling/gpt_index_mappings"
// sh "python examples/nlp/language_modeling/megatron_gpt_pretraining.py \
// trainer.devices=2 \
// trainer.accelerator=gpu \
// trainer.log_every_n_steps=1 \
// trainer.val_check_interval=10 \
// trainer.limit_val_batches=1 \
// trainer.accumulate_grad_batches=1 \
// trainer.max_steps=20 \
// trainer.precision=16 \
// trainer.gradient_clip_val=1.0 \
// exp_manager.exp_dir=examples/nlp/language_modeling/gpt_pretrain_results \
// exp_manager.resume_if_exists=True \
// model.tensor_model_parallel_size=2 \
// model.optim.name=fused_adam \
// model.optim.lr=2e-4 \
// model.optim.sched.warmup_steps=2 \
// model.optim.sched.constant_steps=2 \
// model.optim.sched.min_lr=8e-5 \
// model.max_position_embeddings=128 \
// model.encoder_seq_length=128 \
// model.data.seq_length=128 \
// model.tokenizer.vocab_file=/home/TestData/nlp/megatron_gpt/data/gpt/vocab.json \
// model.tokenizer.merge_file=/home/TestData/nlp/megatron_gpt/data/gpt/merges.txt \
// model.num_layers=8 \
// model.hidden_size=256 \
// model.num_attention_heads=8 \
// model.activations_checkpoint_method='block' \
// model.activations_checkpoint_num_layers=1 \
// model.data.data_prefix=[.5,/home/TestData/nlp/megatron_gpt/data/gpt/simple_wiki_gpt_preproc_text_document,.5,/home/TestData/nlp/megatron_gpt/data/gpt/simple_wiki_gpt_preproc_text_document] \
// model.data.index_mapping_dir=examples/nlp/language_modeling/gpt_index_mappings"
sh "python examples/nlp/language_modeling/megatron_gpt_pretraining.py \
trainer.devices=2 \
trainer.accelerator=gpu \
trainer.log_every_n_steps=1 \
trainer.val_check_interval=2 \
trainer.limit_val_batches=1 \
trainer.accumulate_grad_batches=1 \
trainer.max_steps=3 \
trainer.precision=16 \
trainer.gradient_clip_val=1.0 \
exp_manager.exp_dir=examples/nlp/language_modeling/gpt_pretrain_results \
exp_manager.resume_if_exists=True \
model.tensor_model_parallel_size=2 \
model.optim.name=fused_adam \
model.optim.lr=2e-4 \
model.optim.sched.warmup_steps=2 \
model.optim.sched.constant_steps=2 \
model.optim.sched.min_lr=8e-5 \
model.max_position_embeddings=128 \
model.encoder_seq_length=128 \
model.data.seq_length=128 \
model.tokenizer.vocab_file=/home/TestData/nlp/megatron_gpt/data/gpt/vocab.json \
model.tokenizer.merge_file=/home/TestData/nlp/megatron_gpt/data/gpt/merges.txt \
model.num_layers=8 \
model.hidden_size=256 \
model.num_attention_heads=8 \
model.activations_checkpoint_method='block' \
model.activations_checkpoint_num_layers=1 \
model.data.data_prefix=[.5,/home/TestData/nlp/megatron_gpt/data/gpt/simple_wiki_gpt_preproc_text_document,.5,/home/TestData/nlp/megatron_gpt/data/gpt/simple_wiki_gpt_preproc_text_document] \
model.data.index_mapping_dir=examples/nlp/language_modeling/gpt_index_mappings"
sh "rm -rf examples/nlp/language_modeling/gpt_pretrain_results"
sh "rm -rf examples/nlp/language_modeling/gpt_index_mappings"
}
@@ -2805,7 +2805,7 @@ pipeline {
sh "python examples/nlp/language_modeling/megatron_gpt_pretraining.py \
trainer.devices=2 \
trainer.log_every_n_steps=1 \
trainer.val_check_interval=10 \
trainer.val_check_interval=2 \
trainer.limit_val_batches=2 \
trainer.accumulate_grad_batches=1 \
trainer.max_steps=3 \
@@ -2831,36 +2831,36 @@ pipeline {
model.activations_checkpoint_num_layers=1 \
model.data.data_prefix=[.5,/home/TestData/nlp/megatron_gpt/data/gpt/simple_wiki_gpt_preproc_text_document,.5,/home/TestData/nlp/megatron_gpt/data/gpt/simple_wiki_gpt_preproc_text_document] \
model.data.index_mapping_dir=examples/nlp/language_modeling/gpt_index_mappings"
// sh "python examples/nlp/language_modeling/megatron_gpt_pretraining.py \
// trainer.devices=2 \
// trainer.log_every_n_steps=1 \
// trainer.val_check_interval=10 \
// trainer.limit_val_batches=2 \
// trainer.accumulate_grad_batches=1 \
// trainer.max_steps=20 \
// trainer.precision=16 \
// trainer.gradient_clip_val=1.0 \
// exp_manager.exp_dir=examples/nlp/language_modeling/gpt_pretrain_results \
// exp_manager.resume_if_exists=True \
// model.pipeline_model_parallel_size=2 \
// model.tensor_model_parallel_size=1 \
// model.optim.name=fused_adam \
// model.optim.lr=2e-4 \
// model.optim.sched.warmup_steps=2 \
// model.optim.sched.constant_steps=2 \
// model.optim.sched.min_lr=8e-5 \
// model.max_position_embeddings=128 \
// model.encoder_seq_length=128 \
// model.data.seq_length=128 \
// model.tokenizer.vocab_file=/home/TestData/nlp/megatron_gpt/data/gpt/vocab.json \
// model.tokenizer.merge_file=/home/TestData/nlp/megatron_gpt/data/gpt/merges.txt \
// model.num_layers=8 \
// model.hidden_size=256 \
// model.num_attention_heads=8 \
// model.activations_checkpoint_method='block' \
// model.activations_checkpoint_num_layers=1 \
// model.data.data_prefix=[.5,/home/TestData/nlp/megatron_gpt/data/gpt/simple_wiki_gpt_preproc_text_document,.5,/home/TestData/nlp/megatron_gpt/data/gpt/simple_wiki_gpt_preproc_text_document] \
// model.data.index_mapping_dir=examples/nlp/language_modeling/gpt_index_mappings"
sh "python examples/nlp/language_modeling/megatron_gpt_pretraining.py \
trainer.devices=2 \
trainer.log_every_n_steps=1 \
trainer.val_check_interval=2 \
trainer.limit_val_batches=2 \
trainer.accumulate_grad_batches=1 \
trainer.max_steps=3 \
trainer.precision=16 \
trainer.gradient_clip_val=1.0 \
exp_manager.exp_dir=examples/nlp/language_modeling/gpt_pretrain_results \
exp_manager.resume_if_exists=True \
model.pipeline_model_parallel_size=2 \
model.tensor_model_parallel_size=1 \
model.optim.name=fused_adam \
model.optim.lr=2e-4 \
model.optim.sched.warmup_steps=2 \
model.optim.sched.constant_steps=2 \
model.optim.sched.min_lr=8e-5 \
model.max_position_embeddings=128 \
model.encoder_seq_length=128 \
model.data.seq_length=128 \
model.tokenizer.vocab_file=/home/TestData/nlp/megatron_gpt/data/gpt/vocab.json \
model.tokenizer.merge_file=/home/TestData/nlp/megatron_gpt/data/gpt/merges.txt \
model.num_layers=8 \
model.hidden_size=256 \
model.num_attention_heads=8 \
model.activations_checkpoint_method='block' \
model.activations_checkpoint_num_layers=1 \
model.data.data_prefix=[.5,/home/TestData/nlp/megatron_gpt/data/gpt/simple_wiki_gpt_preproc_text_document,.5,/home/TestData/nlp/megatron_gpt/data/gpt/simple_wiki_gpt_preproc_text_document] \
model.data.index_mapping_dir=examples/nlp/language_modeling/gpt_index_mappings"
sh "rm -rf examples/nlp/language_modeling/gpt_pretrain_results"
sh "rm -rf examples/nlp/language_modeling/gpt_index_mappings"
}
@@ -3064,10 +3064,14 @@ pipeline {
model.activations_checkpoint_method='block' \
model.activations_checkpoint_num_layers=1 \
model.transformer_block_type='pre_ln' \
model.data.data_prefix=[.5,/home/TestData/nlp/megatron_t5/data/pile_val_small_bert_tokenizer_text_document,.5,/home/TestData/nlp/megatron_t5/data/pile_val_small_bert_tokenizer_text_document] \
model.data.data_prefix=[.5,/home/TestData/nlp/nmt/toy_data/wmt14-de-en.src,.5,/home/TestData/nlp/nmt/toy_data/wmt14-de-en.ref] \
model.position_embedding_type=relative \
model.data.index_mapping_dir=examples/nlp/language_modeling/t5_index_mappings \
model.data.respect_document_boundaries=False \
model.data.data_impl=text_mmap \
+model.data.data_impl_kwargs.newline_int=10 \
+model.data.data_impl_kwargs.header_lines=0 \
+model.data.data_impl_kwargs.workers=null \
+model.data.data_impl_kwargs.sort_dataset_paths=False \
model.share_token_embeddings=False \
model.share_decoder_tokens_head_embeddings=False"
sh "python examples/nlp/language_modeling/megatron_t5_pretraining.py \
@@ -3091,11 +3095,14 @@ pipeline {
model.bias_activation_fusion=False \
model.activations_checkpoint_method='block' \
model.activations_checkpoint_num_layers=1 \
model.transformer_block_type='pre_ln' \
model.data.data_prefix=[.5,/home/TestData/nlp/megatron_t5/data/pile_val_small_bert_tokenizer_text_document,.5,/home/TestData/nlp/megatron_t5/data/pile_val_small_bert_tokenizer_text_document] \
model.data.data_prefix=[.5,/home/TestData/nlp/nmt/toy_data/wmt14-de-en.src,.5,/home/TestData/nlp/nmt/toy_data/wmt14-de-en.ref] \
model.position_embedding_type=relative \
model.data.index_mapping_dir=examples/nlp/language_modeling/t5_index_mappings \
model.data.respect_document_boundaries=False \
model.data.data_impl=text_mmap \
+model.data.data_impl_kwargs.newline_int=10 \
+model.data.data_impl_kwargs.header_lines=0 \
+model.data.data_impl_kwargs.workers=null \
+model.data.data_impl_kwargs.sort_dataset_paths=False \
model.share_token_embeddings=False \
model.share_decoder_tokens_head_embeddings=False"
sh "rm -rf examples/nlp/language_modeling/t5_pretrain_results"
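Two changes recur through these Jenkinsfile hunks: trainer.val_check_interval drops from 10 to 2, so that validation actually runs within the 3-step CI budget, and the previously commented-out resume runs are re-enabled (see the "Uncomment CI tests" commit). The T5 stages additionally switch to raw-text input via model.data.data_impl=text_mmap, with the '+'-prefixed overrides adding data_impl_kwargs keys that the base YAML only documents in comments. Below is a minimal sketch of launching the same command from Python; it assumes a NeMo checkout with the example script and the CI test data paths available.

import subprocess

# Hydra overrides mirroring the CI stage above; '+' adds keys that are absent
# from the base config. The data paths refer to the CI test environment and
# are placeholders anywhere else.
overrides = [
    "trainer.devices=2",
    "trainer.max_steps=3",
    "trainer.val_check_interval=2",
    "model.data.data_impl=text_mmap",
    "+model.data.data_impl_kwargs.newline_int=10",
    "+model.data.data_impl_kwargs.header_lines=0",
    "+model.data.data_impl_kwargs.workers=null",
    "+model.data.data_impl_kwargs.sort_dataset_paths=False",
    "model.data.data_prefix=[.5,/home/TestData/nlp/nmt/toy_data/wmt14-de-en.src,"
    ".5,/home/TestData/nlp/nmt/toy_data/wmt14-de-en.ref]",
]

subprocess.run(
    ["python", "examples/nlp/language_modeling/megatron_t5_pretraining.py", *overrides],
    check=True,
)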
13 changes: 13 additions & 0 deletions examples/nlp/language_modeling/conf/megatron_bart_config.yaml
@@ -122,6 +122,19 @@ model:
data_prefix: ???
index_mapping_dir: null # path to save index mapping .npy files, by default will save in the same location as data_prefix
data_impl: mmap
# data_impl_kwargs: # currently used only for text_mmap, csv_mmap (should be data_impl dependant)
# # defaults for text_memmap
# newline_int: 10 # byte-value of newline (Use ord('\n') to get value)
# header_lines: 0 # skip first N header lines
# workers: null # number of workers when creating missing index files (null defaults to cpu_num // 2)
# sort_dataset_paths: False # if True datasets will be sorted by name
# # defaults for csv_memmap
# newline_int: 10 # byte-value of newline
# header_lines: 1 # skip first N header lines
# workers: null # number of workers when creating missing index files (null defaults to cpu_num // 2)
# sort_dataset_paths: False # if True datasets will be sorted by name
# data_col: 1 # column to use for data
# data_sep: ',' # string to split text into columns
splits_string: 949,45,5
seq_length: ${model.seq_length}
skip_warmup: True
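The commented data_impl_kwargs block above documents the knobs a byte-level memmap text dataset needs: the byte value of the line separator, how many header lines to skip, how many workers to use when building missing index files, and whether to sort dataset paths. As a rough illustration of what newline_int and header_lines control (a sketch only, not NeMo's implementation), a line index over a memory-mapped file can be built like this:

import numpy as np

def build_line_index(path: str, newline_int: int = 10, header_lines: int = 0) -> np.ndarray:
    """Byte offsets at which each retained data line starts."""
    buf = np.memmap(path, dtype=np.uint8, mode="r")
    newline_pos = np.where(buf == newline_int)[0]      # every line separator
    starts = np.concatenate(([0], newline_pos + 1))    # a line starts after each one
    if starts[-1] >= len(buf):                         # file ends with a newline
        starts = starts[:-1]
    return starts[header_lines:]                       # skip header lines, if any

def get_line(path: str, starts: np.ndarray, idx: int) -> str:
    buf = np.memmap(path, dtype=np.uint8, mode="r")
    end = int(starts[idx + 1]) - 1 if idx + 1 < len(starts) else len(buf)
    return bytes(buf[int(starts[idx]):end]).decode("utf-8").rstrip("\n")

The workers option would only matter for parallelizing this indexing pass when index files are missing, and sort_dataset_paths only changes the order in which multiple input files are registered.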
15 changes: 14 additions & 1 deletion examples/nlp/language_modeling/conf/megatron_t5_config.yaml
@@ -124,7 +124,20 @@ model:
# - /raid/data/pile/my-t5_01_text_document
data_prefix: ???
index_mapping_dir: null # path to save index mapping .npy files, by default will save in the same location as data_prefix
data_impl: mmap
data_impl: mmap # mmap, retmmap, text_mmap, csv_mmap
# data_impl_kwargs: # currently used only for text_mmap, csv_mmap (should be data_impl dependant)
# # defaults for text_memmap
# newline_int: 10 # byte-value of newline (Use ord('\n') to get value)
# header_lines: 0 # skip first N header lines
# workers: null # number of workers when creating missing index files (null defaults to cpu_num // 2)
# sort_dataset_paths: False # if True datasets will be sorted by name
# # defaults for csv_memmap
# newline_int: 10 # byte-value of newline
# header_lines: 1 # skip first N header lines
# workers: null # number of workers when creating missing index files (null defaults to cpu_num // 2)
# sort_dataset_paths: False # if True datasets will be sorted by name
# data_col: 1 # column to use for data
# data_sep: ',' # string to split text into columns
splits_string: 949,45,5
seq_length: ${model.seq_length}
seq_length_dec: 128
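The T5 config additionally spells out the accepted data_impl values (mmap, retmmap, text_mmap, csv_mmap). A hedged sketch of enabling text_mmap programmatically, assuming the commented defaults above are the intended keys; the file paths are placeholders:

from omegaconf import OmegaConf

base = OmegaConf.create({"model": {"data": {"data_impl": "mmap", "data_prefix": "???"}}})

overrides = OmegaConf.create({
    "model": {"data": {
        "data_impl": "text_mmap",
        "data_prefix": [0.5, "/path/to/corpus.src", 0.5, "/path/to/corpus.ref"],  # placeholder paths
        "data_impl_kwargs": {
            "newline_int": 10,          # ord('\n')
            "header_lines": 0,          # plain text has no header row
            "workers": None,            # null -> cpu_num // 2 per the comments above
            "sort_dataset_paths": False,
        },
    }}
})

cfg = OmegaConf.merge(base, overrides)
print(OmegaConf.to_yaml(cfg.model.data))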
13 changes: 13 additions & 0 deletions examples/nlp/language_modeling/conf/megatron_ul2_config.yaml
@@ -121,6 +121,19 @@ model:
data_prefix: ???
index_mapping_dir: null # path to save index mapping .npy files, by default will save in the same location as data_prefix
data_impl: mmap
# data_impl_kwargs: # currently used only for text_mmap, csv_mmap (should be data_impl dependant)
# # defaults for text_memmap
# newline_int: 10 # byte-value of newline (Use ord('\n') to get value)
# header_lines: 0 # skip first N header lines
# workers: null # number of workers when creating missing index files (null defaults to cpu_num // 2)
# sort_dataset_paths: False # if True datasets will be sorted by name
# # defaults for csv_memmap
# newline_int: 10 # byte-value of newline
# header_lines: 1 # skip first N header lines
# workers: null # number of workers when creating missing index files (null defaults to cpu_num // 2)
# sort_dataset_paths: False # if True datasets will be sorted by name
# data_col: 1 # column to use for data
# data_sep: ',' # string to split text into columns
splits_string: 949,45,5
seq_length: ${model.seq_length}
seq_length_dec: ${model.seq_length}
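For csv_memmap the same commented block adds two more knobs, data_col and data_sep, which pick one column out of each indexed line. A toy sketch (not NeMo's code) of what that selection amounts to; a real CSV reader would also need to handle quoted separators:

def extract_csv_field(line: str, data_col: int = 1, data_sep: str = ",") -> str:
    # header_lines=1 in the defaults above means the CSV header row would
    # already have been skipped by the line indexer before this runs.
    return line.split(data_sep)[data_col]

print(extract_csv_field("source text,target text"))  # -> "target text"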
8 changes: 6 additions & 2 deletions examples/nlp/machine_translation/megatron_nmt_training.py
@@ -145,8 +145,12 @@ def main(cfg) -> None:
pretrained_cfg.train_ds = cfg.model.train_ds
pretrained_cfg.train_ds.micro_batch_size = cfg.model.micro_batch_size
pretrained_cfg.train_ds.global_batch_size = cfg.model.global_batch_size
pretrained_cfg.validation_ds = cfg.model.validation_ds
pretrained_cfg.test_ds = cfg.model.test_ds
if hasattr(cfg.model, 'validation_ds'):
pretrained_cfg.validation_ds = cfg.model.validation_ds
else:
raise AttributeError(f"No validation dataset found in config.")
if hasattr(cfg.model, 'test_ds'):
pretrained_cfg.test_ds = cfg.model.test_ds

# Class target for the new class being restored.
pretrained_cfg.target = (
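The change to megatron_nmt_training.py makes the dataset overrides on the restored pretrained config defensive: validation_ds is now required explicitly, while test_ds is copied only when present. A self-contained sketch of the same pattern; the toy configs stand in for the real cfg and pretrained_cfg, and struct mode is set explicitly to mimic Hydra-composed configs, which is what makes hasattr return False for absent keys:

from omegaconf import OmegaConf

cfg = OmegaConf.create({"model": {
    "train_ds": {"src_file_name": "train.src"},        # placeholder values
    "validation_ds": {"src_file_name": "val.src"},
}})
OmegaConf.set_struct(cfg, True)                         # mimic Hydra's struct mode

pretrained_cfg = OmegaConf.create({"train_ds": None, "validation_ds": None, "test_ds": None})

pretrained_cfg.train_ds = cfg.model.train_ds
if hasattr(cfg.model, "validation_ds"):
    pretrained_cfg.validation_ds = cfg.model.validation_ds
else:
    raise AttributeError("No validation dataset found in config.")
if hasattr(cfg.model, "test_ds"):                       # test data stays optional
    pretrained_cfg.test_ds = cfg.model.test_ds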