Asymmetric Encoder and Decoder Configuration for Megatron Models #4568
Conversation
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Great PR! See comments.
nemo/collections/nlp/modules/common/megatron/token_level_encoder_decoder.py
LGTM! See minor comments and questions
nemo/collections/nlp/models/language_modeling/megatron_lm_encoder_decoder_model.py
nemo/collections/nlp/modules/common/megatron/megatron_encoder_decoder.py
@@ -59,9 +59,6 @@ def __init__(
    encoder_attn_mask_type=AttnMaskType.padding,
    hidden_dropout=0.1,
    attention_dropout=0.1,
What happened to
position_embedding_type='learned_absolute',
relative_attention_num_buckets=32,
relative_attention_max_distance=128
?
I've removed RPE support from perceivers for now.
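The commit history mentions a "Validate RPE and perceivers" change. A minimal sketch of what such a check could look like, assuming a plain-dict config and illustrative key names (`arch`, `position_embedding_type`) rather than NeMo's actual API:

```python
def validate_perceiver_rpe(cfg: dict) -> None:
    """Reject relative position embeddings (RPE) when a perceiver encoder is used.

    Hypothetical validation sketch; key names are illustrative.
    """
    if cfg.get("arch") == "perceiver" and cfg.get("position_embedding_type") == "relative":
        raise ValueError(
            "Relative position embeddings (RPE) are not supported with perceiver encoders."
        )
```

Failing fast at config-validation time like this surfaces an unsupported combination before any model weights are built.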
@@ -130,6 +130,7 @@ def __init__(
    model_type=parent_model_type,
    transformer_block_type=transformer_block_type,
    headscale=headscale,
    gradient_accumulation_fusion=False,  # TODO: This has to be False for enc-dec models for now.
Will the model work if gradient accumulation fusion is set to true?
Yes, this is just a JIT fusion that was implemented specifically for GPT; I turned it off explicitly here for T5.
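The behavior discussed above can be sketched as a small helper that forces the GPT-specific fusion off for any other model type. This is an illustrative sketch, not NeMo's actual code; the `model_type` strings are assumed:

```python
def resolve_grad_accum_fusion(model_type: str, requested: bool) -> bool:
    """Return the effective gradient_accumulation_fusion setting.

    The fused kernel was implemented only for GPT, so enc-dec (T5-style)
    models must run with the fusion disabled for now, regardless of what
    the user requested.
    """
    if model_type != "gpt":  # hypothetical enum value
        return False
    return requested
```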
decoder_arch,
vocab_size,
hidden_size,
encoder_cfg: DictConfig,
Way cleaner! Nice!
It's still hard to keep track of args everywhere, but we're getting close to making it clean :)
    return kv_channels

def _validate_enc_dec_hidden_size(self, encoder_cfg, decoder_cfg):
    if encoder_cfg.hidden_size != decoder_cfg.hidden_size:
OK, enc/dec hidden size is validated here.
    hidden_size % num_attention_heads == 0
), 'hidden_size must be divisible by num_attention_heads if kv_channels is None'
kv_channels = hidden_size // num_attention_heads
encoder_kv_channels, decoder_kv_channels = self._validate_config()
Why do we need those values encoder_kv_channels, decoder_kv_channels?
Because these are typically provided as None in the yaml and then computed internally based on hidden size and num attention heads.
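The derivation described above can be sketched as a standalone helper (names are illustrative; the assertion message matches the one in the diff):

```python
def resolve_kv_channels(hidden_size: int, num_attention_heads: int, kv_channels=None) -> int:
    """Derive kv_channels when the YAML leaves it as None."""
    if kv_channels is None:
        assert (
            hidden_size % num_attention_heads == 0
        ), "hidden_size must be divisible by num_attention_heads if kv_channels is None"
        kv_channels = hidden_size // num_attention_heads
    return kv_channels
```

For example, `hidden_size=768` with 12 heads yields 64 channels per head; an explicit `kv_channels` in the config passes through untouched.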
LGTM!
…DIA#4568)
* Initial asymmetric
* Update YAML
* Fix other yaml files
* Style
* Remove unused import
* Update configs
* NMT fixes
* Fix
* Add missing arg
* Style
* Jenkins test updates
* More fixes
* Rank check fix
* Add enc/dec specific rpe configs
* Style
* Fix
* CI
* Fix perceiver and model type
* Remove RPE args from perceiver
* Validate RPE and perceivers
* Style
* Fix layer type
* deep copy config and better backward compat
* Perceiver compatibility and headscale changes
* Fix shapes
* Style
* Update CI test
* Fix
* Fix heads
* Fix
* Style
* Address comments

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Co-authored-by: Micha Livne <michalivne@users.noreply.github.com>
Signed-off-by: David Mosallanezhad <dmosallanezh@nvidia.com>
Signed-off-by: MaximumEntropy sandeep.subramanian.1@umontreal.ca
What does this PR do?
Adds the ability to configure the encoder and decoder of Megatron encoder-decoder models asymmetrically, e.g. with different numbers of layers, attention heads, or architectures on each side.
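To illustrate what an asymmetric configuration could look like, here is a hedged YAML sketch modeled loosely on NeMo-style configs; the exact key names and structure are assumptions, not copied from the repo:

```yaml
# Illustrative sketch: separate encoder/decoder blocks allow asymmetric
# depth and head counts. hidden_size must still match (validated at init).
model:
  encoder:
    arch: transformer
    num_layers: 12
    hidden_size: 768          # must equal decoder hidden_size
    num_attention_heads: 12
    kv_channels: null         # derived as hidden_size // num_attention_heads
  decoder:
    arch: transformer
    num_layers: 6             # asymmetric: shallower than the encoder
    hidden_size: 768
    num_attention_heads: 8
    kv_channels: null
```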
Collection: NLP