Adds a Knob for OnlineSampling by introducing 'global_sample_mapping' in the SFT config.yaml #9913
What does this PR do ?
Adds a knob for OnlineSampling by introducing a new parameter, 'global_sample_mapping', in the SFT config YAML (default value: false).
The feature was proposed in this Nvbug. It lets users choose the data sampling method that best suits their needs.
Collection: NeMo's NLP collection
Changelog
Usage
Set 'global_sample_mapping' in megatron_gpt_finetuning_config.yaml:
False (default): Use OnlineSampleMapping, which shuffles the dataset within each epoch.
True: Replicate the dataset and shuffle all of the replicated samples together.
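The difference between the two settings can be illustrated with a minimal, standalone sketch. This is not NeMo's implementation: the function names `per_epoch_mapping` and `global_mapping`, their parameters, and the use of Python's `random` module are all assumptions made for illustration only.

```python
import random


def per_epoch_mapping(dataset_size, num_samples, seed=0):
    """Hypothetical analogue of global_sample_mapping=False.

    Each pass over the dataset is shuffled on its own, so every block of
    `dataset_size` consecutive indices is a permutation of the dataset.
    """
    rng = random.Random(seed)
    mapping = []
    while len(mapping) < num_samples:
        epoch = list(range(dataset_size))
        rng.shuffle(epoch)  # shuffle within this epoch only
        mapping.extend(epoch)
    return mapping[:num_samples]


def global_mapping(dataset_size, num_samples, seed=0):
    """Hypothetical analogue of global_sample_mapping=True.

    The dataset is replicated enough times to cover `num_samples`, and the
    replicated indices are shuffled together, so a sample may repeat before
    every other sample has been seen once.
    """
    rng = random.Random(seed)
    repeats = -(-num_samples // dataset_size)  # ceiling division
    mapping = list(range(dataset_size)) * repeats
    rng.shuffle(mapping)  # shuffle across all replicas at once
    return mapping[:num_samples]


if __name__ == "__main__":
    print("per-epoch:", per_epoch_mapping(4, 8))
    print("global:   ", global_mapping(4, 8))
```

With the per-epoch variant, every sample is seen exactly once before any sample repeats. With the global variant, the overall counts are the same when the requested number of samples is a multiple of the dataset size, but the order no longer respects epoch boundaries.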
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
@ericharper
Anyone in the community is free to review the PR once the checks have passed.
Additional Information
The second suggested improvement has not been addressed yet.