Implements the UL2 Dataset and config #4184

Merged: 57 commits, Jun 1, 2022

MaximumEntropy (Contributor)

What does this PR do?

  1. Implements the UL2 Dataset from https://arxiv.org/abs/2205.05131.
  2. Adds a megatron_ul2_config.yaml to train an encoder-decoder model on this data using the T5 pretraining script (megatron_t5_pretraining.py).

Collection: NLP

Changelog

  • UL2 dataset class that inherits from T5Dataset
  • UL2 yaml config
  • Minor refactor of LMAdaptedT5Dataset
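The changelog describes a UL2 dataset class built on top of T5Dataset. As a rough, framework-free illustration of the kind of examples such a dataset produces, the sketch below implements a simplified UL2 mixture of denoisers as described in the paper: R (regular span corruption), X (extreme span corruption) and S (sequential, prefix LM), each tagged with a paradigm token. This is not the code in this PR. The names `Denoiser`, `span_corrupt` and `make_ul2_example`, the `<extra_id_i>` sentinel spelling, the uniform choice between denoisers, and the per-denoiser settings are assumptions made for illustration; a real NeMo dataset works on token IDs from an indexed dataset rather than string tokens.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Denoiser:
    """One member of a UL2-style mixture of denoisers (hypothetical helper type)."""
    name: str            # "R" (regular), "X" (extreme) or "S" (sequential)
    mode_token: str      # paradigm token prepended to the encoder input
    mean_span: float     # mean masked-span length (unused by the prefix-LM denoiser)
    corrupt_rate: float  # fraction of tokens masked (unused by the prefix-LM denoiser)
    prefix_lm: bool = False


# Illustrative settings loosely following the UL2 paper; the values and token
# spellings used by this PR's dataset class and megatron_ul2_config.yaml may differ.
DENOISERS = [
    Denoiser("R", "[NLU]", mean_span=3, corrupt_rate=0.15),
    Denoiser("X", "[NLG]", mean_span=32, corrupt_rate=0.5),
    Denoiser("S", "[S2S]", mean_span=0, corrupt_rate=0.0, prefix_lm=True),
]


def random_partition(total: int, parts: int, rng: random.Random) -> List[int]:
    """Split `total` into `parts` positive integers that sum to `total`."""
    cuts = sorted(rng.sample(range(1, total), parts - 1))
    bounds = [0] + cuts + [total]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def span_corrupt(tokens: List[str], mean_span: float, corrupt_rate: float,
                 rng: random.Random) -> Tuple[List[str], List[str]]:
    """T5-style span corruption: each masked span becomes a sentinel in the
    input and is spelled out after the same sentinel in the target."""
    n = len(tokens)  # assumes at least 2 tokens
    num_noise = min(max(1, round(n * corrupt_rate)), n - 1)
    num_spans = min(max(1, round(num_noise / mean_span)), num_noise, n - num_noise)
    noise_lens = random_partition(num_noise, num_spans, rng)
    keep_lens = random_partition(n - num_noise, num_spans, rng)
    inputs, targets, pos = [], [], 0
    for i, (keep, noise) in enumerate(zip(keep_lens, noise_lens)):
        sentinel = f"<extra_id_{i}>"  # T5-style sentinel name (assumed)
        inputs += tokens[pos:pos + keep] + [sentinel]
        pos += keep
        targets += [sentinel] + tokens[pos:pos + noise]
        pos += noise
    return inputs, targets


def make_ul2_example(tokens: List[str],
                     rng: random.Random) -> Tuple[str, List[str], List[str]]:
    """Pick a denoiser uniformly at random (a simplification) and build one
    (encoder input, decoder target) pair, prefixed with its paradigm token."""
    d = rng.choice(DENOISERS)
    if d.prefix_lm:
        # S-denoiser: prefix LM, the decoder continues from a random split point.
        split = rng.randint(1, len(tokens) - 1)
        enc, dec = tokens[:split], tokens[split:]
    else:
        enc, dec = span_corrupt(tokens, d.mean_span, d.corrupt_rate, rng)
    return d.name, [d.mode_token] + enc, dec


if __name__ == "__main__":
    rng = random.Random(0)
    toks = [f"t{i}" for i in range(40)]
    for _ in range(3):
        name, enc, dec = make_ul2_example(toks, rng)
        print(name, enc, dec, sep="\n  ")
```

The design mirrors the "inherits from T5Dataset" idea: the R and X denoisers reuse the same span-corruption routine with different span lengths and corruption rates, and only the S denoiser switches to a prefix-LM split.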

Usage

python megatron_t5_pretraining.py -cn megatron_ul2_config

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (e.g., Numba, Pynini, Apex)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

lgtm-com bot commented May 17, 2022

This pull request introduces 3 alerts when merging e90094e into 8318980 - view on LGTM.com

new alerts:

  • 1 for Mismatch in multiple assignment
  • 1 for Unused local variable
  • 1 for Unused import

lgtm-com bot commented May 26, 2022

This pull request introduces 1 alert and fixes 1 when merging 8cbcd1c into 84265ac - view on LGTM.com

new alerts:

  • 1 for Mismatch in multiple assignment

fixed alerts:

  • 1 for Unused import

lgtm-com bot commented May 26, 2022

This pull request introduces 8 alerts and fixes 1 when merging d2e984d into 84265ac - view on LGTM.com

new alerts:

  • 7 for Module-level cyclic import
  • 1 for Mismatch in multiple assignment

fixed alerts:

  • 1 for Unused import

lgtm-com bot commented May 26, 2022

This pull request introduces 1 alert and fixes 1 when merging 3fdb609 into 84265ac - view on LGTM.com

new alerts:

  • 1 for Mismatch in multiple assignment

fixed alerts:

  • 1 for Unused import

yidong72 (Collaborator) previously approved these changes May 27, 2022

LGTM

lgtm-com bot commented May 31, 2022

This pull request introduces 1 alert and fixes 1 when merging f2c1919 into 7a9a8f0 - view on LGTM.com

new alerts:

  • 1 for Mismatch in multiple assignment

fixed alerts:

  • 1 for Unused import

lgtm-com bot commented May 31, 2022

This pull request introduces 1 alert and fixes 1 when merging 56c7a71 into 7a9a8f0 - view on LGTM.com

new alerts:

  • 1 for Mismatch in multiple assignment

fixed alerts:

  • 1 for Unused import

yidong72 previously approved these changes May 31, 2022

lgtm-com bot commented May 31, 2022

This pull request introduces 1 alert and fixes 1 when merging 48a17f3 into 7a9a8f0 - view on LGTM.com

new alerts:

  • 1 for Mismatch in multiple assignment

fixed alerts:

  • 1 for Unused import

lgtm-com bot commented May 31, 2022

This pull request introduces 1 alert and fixes 1 when merging cc940b7 into e838862 - view on LGTM.com

new alerts:

  • 1 for Mismatch in multiple assignment

fixed alerts:

  • 1 for Unused import

lgtm-com bot commented May 31, 2022

This pull request introduces 1 alert and fixes 1 when merging c983d20 into f6936ce - view on LGTM.com

new alerts:

  • 1 for Mismatch in multiple assignment

fixed alerts:

  • 1 for Unused import

MaximumEntropy merged commit 2af3786 into main on Jun 1, 2022
MaximumEntropy deleted the ul2_t5 branch on June 1, 2022, 01:03

2 participants