add support for new mcore ds features #9388

Merged: 17 commits, Jun 11, 2024
Changes from 3 commits
11 changes: 6 additions & 5 deletions examples/nlp/language_modeling/conf/megatron_gpt_config.yaml
@@ -56,7 +56,7 @@ exp_manager:

model:
# use GPTModel from megatron.core
mcore_gpt: False
mcore_gpt: True

# specify micro_batch_size, global_batch_size, and model parallelism
# gradient accumulation will be done automatically based on data_parallel_size
@@ -240,9 +240,9 @@ model:
# Dictionary: can override from CLI "model.data.data_prefix"={"train":[1.0, /path/to/data], "validation":/path/to/data, "test":/path/to/test}
# Or see example below:
# "model.data.data_prefix: {train:[1.0,/path/to/data], validation:[/path/to/data], test:[/path/to/test]}"
data_prefix: ???
data_prefix: []
index_mapping_dir: null # path to save index mapping .npy files, by default will save in the same location as data_prefix
data_impl: mmap
data_impl: mock
mmap_bin_files: True
splits_string: 900,50,50
seq_length: ${model.encoder_seq_length}
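For reference, `splits_string: 900,50,50` follows the Megatron convention of relative weights that are normalized into train/validation/test fractions. A standalone sketch (illustrative, not mcore code):

```python
# How a Megatron-style splits string maps to dataset fractions (illustrative).
splits = [float(s) for s in "900,50,50".split(",")]
fractions = [s / sum(splits) for s in splits]
print(fractions)  # [0.9, 0.05, 0.05] -> train / validation / test
```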
@@ -252,9 +252,10 @@
reset_position_ids: False # Reset position ids after end-of-document token
reset_attention_mask: False # Reset attention mask after end-of-document token
eod_mask_loss: False # Mask loss for the end of document tokens
validation_drop_last: True # Set to False if the last partial validation samples are to be consumed
validation_drop_last: False # Set to False if the last partial validation samples are to be consumed
add_extra_token: True # Option to draw sequences with one extra token to ensure the sample input tokens and sample output tokens are both of the desired sequence length
no_seqlen_plus_one_input_tokens: False # Set to True to disable fetching (sequence length + 1) input tokens, instead get (sequence length) input tokens and mask the last token
pad_samples_to_global_batch_size: False # Set to True if you want to pad the last partial batch with -1's to equal global batch size
pad_samples_to_global_batch_size: True # Set to True if you want to pad the last partial batch with -1's to equal global batch size
shuffle_documents: True # Set to False to disable documents shuffling. Sample index will still be shuffled
exchange_indices_distributed: False # Set to True to exchange indices via torch.distributed instead of filesystem
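Two of the toggles above interact: with `validation_drop_last: False` the trailing partial validation batch is kept, and `pad_samples_to_global_batch_size: True` pads it up to a full global batch. A minimal arithmetic sketch (illustrative only; the function name is made up, and the real logic lives in NeMo's Megatron samplers):

```python
# Illustrative sketch of how the last-batch options interact (not NeMo code).

def num_validation_batches(total_samples: int, global_batch_size: int, drop_last: bool) -> int:
    """Count the global batches one validation pass yields."""
    full, remainder = divmod(total_samples, global_batch_size)
    if remainder and not drop_last:
        # partial batch is consumed; with pad_samples_to_global_batch_size: True
        # it is padded up to a full global batch
        return full + 1
    return full

assert num_validation_batches(1050, 256, drop_last=True) == 4   # 26 samples dropped
assert num_validation_batches(1050, 256, drop_last=False) == 5  # partial batch kept (padded)
```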

@@ -91,8 +91,7 @@
        return (num_available_samples - 1) // self.micro_batch_times_data_parallel_size + 1

    @abc.abstractmethod
    def __iter__(self):
        ...
    def __iter__(self): ...


class MegatronPretrainingSampler(BaseMegatronSampler):
@@ -107,7 +106,7 @@
        indices = range(self.consumed_samples, self.total_samples)
        if (not self.drop_last) and self.pad_samples_to_global_batch_size:
            pad_samples_num = -len(indices) % self.global_batch_size
            pad_indices = range(-1, -pad_samples_num - 1, -1)
            pad_indices = [None for _ in range(pad_samples_num)]
            indices = chain(indices, pad_indices)

        for idx in indices:
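The replaced line changes the padding sentinel: the sampler used to emit negative indices, and now emits `None`, which downstream code can test for explicitly. A standalone sketch of the resulting behavior (the sample counts are made up):

```python
from itertools import chain

# Mirrors the diff above: pad the tail of an epoch up to a multiple of the
# global batch size, using None as the padding sentinel.
global_batch_size = 8
indices = range(0, 13)  # 13 real samples remain in the epoch

pad_samples_num = -len(indices) % global_batch_size   # 3 pads needed to reach 16
pad_indices = [None for _ in range(pad_samples_num)]  # was: range(-1, -pad_samples_num - 1, -1)

print(list(chain(indices, pad_indices)))
# [0, 1, ..., 12, None, None, None] (abbreviated)
# A dataset __getitem__ can now check `idx is None` to emit a padding sample,
# instead of overloading negative indices.
```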
@@ -1497,6 +1497,8 @@ def build_train_valid_test_datasets(self):
"reset_attention_mask": self.reset_attention_mask,
"eod_mask_loss": self.eod_mask_loss,
"mmap_bin_files": self.cfg.data.get("mmap_bin_files", True),
"drop_last_partial_validation_sequence": self.cfg.data.get("validation_drop_last", False),
"add_extra_token_to_sequence": self.cfg.data.get("add_extra_token", False),
}

        data_prefix = self.cfg.data.data_prefix
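The two added kwargs forward existing NeMo config keys (`model.data.validation_drop_last`, `model.data.add_extra_token`) to the megatron.core dataset configuration fields named in the diff. What the extra token buys is easiest to see in a standalone sketch (illustrative, not mcore code): drawing `seq_length + 1` tokens lets inputs and shifted labels both be exactly `seq_length` long.

```python
# Illustrative: why a sample carries one extra token (add_extra_token_to_sequence).
seq_length = 8
sample = list(range(seq_length + 1))  # seq_length + 1 tokens drawn from the corpus

tokens = sample[:-1]  # model inputs, length == seq_length
labels = sample[1:]   # next-token targets, length == seq_length
assert len(tokens) == len(labels) == seq_length
```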