Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Asymmetric Encoder and Decoder Configuration for Megatron Models #4568

Merged
merged 51 commits into from
Aug 2, 2022

Conversation

MaximumEntropy
Copy link
Contributor

Signed-off-by: MaximumEntropy sandeep.subramanian.1@umontreal.ca

What does this PR do ?

Adds the ability to configure encoder and decoder asymmetrically for encoder and decoder.

Collection: NLP

Changelog

  • Make YAML 1-step hierarchical
  • Function in lm_encoder_decoder to maintain backward compatibility.

Usage

  • You can potentially add a usage example below
model.encoder.num_layers=12
model.decoder.num_layers=2

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
@lgtm-com
Copy link

lgtm-com bot commented Jul 20, 2022

This pull request introduces 1 alert when merging 997aa17 into fea3775 - view on LGTM.com

new alerts:

  • 1 for Unused import

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
@lgtm-com
Copy link

lgtm-com bot commented Jul 20, 2022

This pull request introduces 1 alert when merging d8ac136 into fea3775 - view on LGTM.com

new alerts:

  • 1 for Unused import

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Copy link
Collaborator

@michalivne michalivne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great PR! See comments.

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
@lgtm-com
Copy link

lgtm-com bot commented Jul 29, 2022

This pull request fixes 2 alerts when merging 3f567b5 into 1d25c90 - view on LGTM.com

fixed alerts:

  • 2 for Unused local variable

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
@lgtm-com
Copy link

lgtm-com bot commented Jul 29, 2022

This pull request fixes 2 alerts when merging 9e20e5f into 588c6ca - view on LGTM.com

fixed alerts:

  • 2 for Unused local variable

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
@lgtm-com
Copy link

lgtm-com bot commented Aug 1, 2022

This pull request fixes 2 alerts when merging 463d9a8 into ce16320 - view on LGTM.com

fixed alerts:

  • 2 for Unused local variable

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
@lgtm-com
Copy link

lgtm-com bot commented Aug 1, 2022

This pull request fixes 2 alerts when merging 019edef into ce16320 - view on LGTM.com

fixed alerts:

  • 2 for Unused local variable

@lgtm-com
Copy link

lgtm-com bot commented Aug 1, 2022

This pull request fixes 2 alerts when merging 763203c into 1a9daa5 - view on LGTM.com

fixed alerts:

  • 2 for Unused local variable

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
@lgtm-com
Copy link

lgtm-com bot commented Aug 1, 2022

This pull request fixes 2 alerts when merging 182a787 into 1a9daa5 - view on LGTM.com

fixed alerts:

  • 2 for Unused local variable

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
@lgtm-com
Copy link

lgtm-com bot commented Aug 1, 2022

This pull request fixes 2 alerts when merging 52a9396 into 1a9daa5 - view on LGTM.com

fixed alerts:

  • 2 for Unused local variable

@okuchaiev
Copy link
Member

/blossom-ci

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
@lgtm-com
Copy link

lgtm-com bot commented Aug 1, 2022

This pull request fixes 2 alerts when merging d18ee78 into 1a9daa5 - view on LGTM.com

fixed alerts:

  • 2 for Unused local variable

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
@lgtm-com
Copy link

lgtm-com bot commented Aug 1, 2022

This pull request fixes 2 alerts when merging 5e32eeb into 1a9daa5 - view on LGTM.com

fixed alerts:

  • 2 for Unused local variable

Copy link
Collaborator

@michalivne michalivne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! See minor comments and questions

@@ -59,9 +59,6 @@ def __init__(
encoder_attn_mask_type=AttnMaskType.padding,
hidden_dropout=0.1,
attention_dropout=0.1,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happened to

        position_embedding_type='learned_absolute',
        relative_attention_num_buckets=32,
        relative_attention_max_distance=128,

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed RPE support from perceivers for now.

@@ -130,6 +130,7 @@ def __init__(
model_type=parent_model_type,
transformer_block_type=transformer_block_type,
headscale=headscale,
gradient_accumulation_fusion=False, # TODO: This has to be False for enc-dec models for now.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the model work if grad accumulation is set to true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is just a jit fusion thing that was implemented specifically for GPT that I turned off explicitly here for T5.

decoder_arch,
vocab_size,
hidden_size,
encoder_cfg: DictConfig,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Way cleaner! Nice!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its still hard to keep track of args everywhere, but we're getting close to making it clean :)

return kv_channels

def _validate_enc_dec_hidden_size(self, encoder_cfg, decoder_cfg):
if encoder_cfg.hidden_size != decoder_cfg.hidden_size:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok enc/dec hidden size is validated here.

hidden_size % num_attention_heads == 0
), 'hidden_size must be divisible by num_attention_heads if kv_channels is None'
kv_channels = hidden_size // num_attention_heads
encoder_kv_channels, decoder_kv_channels = self._validate_config()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need those values encoder_kv_channels, decoder_kv_channels?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because these are typically provided as None in the yaml and then computed internally based on hidden size and num attention heads.

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
@lgtm-com
Copy link

lgtm-com bot commented Aug 1, 2022

This pull request fixes 2 alerts when merging fe68b84 into 1a9daa5 - view on LGTM.com

fixed alerts:

  • 2 for Unused local variable

Copy link
Collaborator

@michalivne michalivne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@lgtm-com
Copy link

lgtm-com bot commented Aug 2, 2022

This pull request fixes 2 alerts when merging 7a4b576 into 2c8eb53 - view on LGTM.com

fixed alerts:

  • 2 for Unused local variable

@lgtm-com
Copy link

lgtm-com bot commented Aug 2, 2022

This pull request fixes 2 alerts when merging db6b55e into a592f6f - view on LGTM.com

fixed alerts:

  • 2 for Unused local variable

@MaximumEntropy MaximumEntropy merged commit 8e1436b into main Aug 2, 2022
@MaximumEntropy MaximumEntropy deleted the asymmetric_enc_dec_megatron branch August 2, 2022 02:24
Davood-M pushed a commit to Davood-M/NeMo that referenced this pull request Aug 9, 2022
…DIA#4568)

* Initial asymmetric

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Update YAML

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix other yaml files

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove unused import

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Update configs

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* NMT fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Add missing arg

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Jenkins test updates

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* More fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Rank check fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Add enc/dec specific rpe configs

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* CI

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix perceiver and model type

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove RPE args from perceiver

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Validate RPE and perceivers

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix layer type

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* deep copy config and better backward compat

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Perceiver compatibility and headscale changes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix shapes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Update CI test

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix heads

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Address comments

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Co-authored-by: Micha Livne <michalivne@users.noreply.github.com>
Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>
piraka9011 pushed a commit to piraka9011/NeMo that referenced this pull request Aug 25, 2022
…DIA#4568)

* Initial asymmetric

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Update YAML

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix other yaml files

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove unused import

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Update configs

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* NMT fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Add missing arg

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Jenkins test updates

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* More fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Rank check fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Add enc/dec specific rpe configs

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* CI

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix perceiver and model type

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove RPE args from perceiver

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Validate RPE and perceivers

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix layer type

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* deep copy config and better backward compat

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Perceiver compatibility and headscale changes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix shapes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Update CI test

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix heads

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Address comments

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Co-authored-by: Micha Livne <michalivne@users.noreply.github.com>
Signed-off-by: Anas Abou Allaban <aabouallaban@pm.me>
hainan-xv pushed a commit to hainan-xv/NeMo that referenced this pull request Nov 29, 2022
…DIA#4568)

* Initial asymmetric

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Update YAML

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix other yaml files

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove unused import

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Update configs

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* NMT fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Add missing arg

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Jenkins test updates

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* More fixes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Rank check fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Add enc/dec specific rpe configs

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* CI

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix perceiver and model type

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Remove RPE args from perceiver

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Validate RPE and perceivers

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix layer type

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* deep copy config and better backward compat

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Perceiver compatibility and headscale changes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix shapes

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Update CI test

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix heads

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Fix

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Style

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

* Address comments

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>

Co-authored-by: Micha Livne <michalivne@users.noreply.github.com>
Signed-off-by: Hainan Xu <hainanx@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants