Hiddens modules bugfixes #8466
844271d
to
8493c96
Compare
LGTM! Fantastic PR, great catch of multiple issues.
I have a few more suggestions to add to the PR:
- Currently the reconstruction loss mixes samples within the batch. We should fix that.
- Add support for sample-level reconstruction (not only token-level). Token-level reconstruction should average over hidden_size as well.
- Add support for scaling only the gradients.
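To make the first two suggestions concrete, here is a rough sketch in plain Python (nested lists stand in for tensors so it runs without PyTorch). It assumes one possible reading of the comment: "mixing" means pooling every unmasked token in the batch into a single mean, while a sample-level loss first averages within each sample and then across samples. The function names and the squared-error choice are hypothetical and are not taken from the NeMo code.

```python
def token_mse(pred_tok, target_tok):
    """Squared error for one token, averaged over hidden_size."""
    return sum((p - t) ** 2 for p, t in zip(pred_tok, target_tok)) / len(pred_tok)


def token_level_loss(pred, target, mask):
    """Pool every unmasked token in the batch into one mean.

    Longer samples contribute more tokens, so they dominate the result.
    """
    losses = [
        token_mse(p_tok, t_tok)
        for p_seq, t_seq, m_seq in zip(pred, target, mask)
        for p_tok, t_tok, m in zip(p_seq, t_seq, m_seq)
        if m
    ]
    return sum(losses) / len(losses)


def sample_level_loss(pred, target, mask):
    """Average within each sample first, then across samples.

    Every sample carries equal weight regardless of its length.
    """
    per_sample = []
    for p_seq, t_seq, m_seq in zip(pred, target, mask):
        losses = [
            token_mse(p_tok, t_tok)
            for p_tok, t_tok, m in zip(p_seq, t_seq, m_seq)
            if m
        ]
        per_sample.append(sum(losses) / len(losses))
    return sum(per_sample) / len(per_sample)


# Shapes: [batch=2][seq=3][hidden=2]; the second sample has one valid token.
pred = [
    [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]],
    [[2.0, 2.0], [0.0, 0.0], [0.0, 0.0]],
]
target = [[[0.0, 0.0]] * 3, [[0.0, 0.0]] * 3]
mask = [[1, 1, 1], [1, 0, 0]]

pooled = token_level_loss(pred, target, mask)
balanced = sample_level_loss(pred, target, mask)
```

With this toy batch the two reductions disagree, because the short sample's single token is outweighed by the long sample's three tokens in the pooled version.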
nemo/collections/nlp/models/language_modeling/megatron_lm_encoder_decoder_model.py
nemo/collections/nlp/models/language_modeling/megatron_lm_encoder_decoder_model.py
nemo/collections/nlp/modules/common/megatron/hiddens/megatron_hiddens.py
nemo/collections/nlp/modules/common/megatron/hiddens/megatron_hiddens.py
nemo/collections/nlp/modules/common/megatron/hiddens/megatron_hiddens.py
nemo/collections/nlp/modules/common/megatron/hiddens/megatron_hidden_loss.py
8493c96
to
4202406
Compare
LGTM! Once CI passes, it is ready to be merged.
jenkins
4202406
to
d4c60bf
Compare
jenkins
d4c60bf
to
955b215
Compare
jenkins
nemo/collections/nlp/modules/common/megatron/hiddens/megatron_hidden_loss.py
Fixed
fd75f8a
to
aefe1d5
Compare
… time. Signed-off-by: John St John <jstjohn@nvidia.com>
aefe1d5
to
8418d81
Compare
jenkins
This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.
I need to get back to this.
On Wed, Mar 20, 2024 at 6:44 PM github-actions[bot] wrote:
> This PR is stale because it has been open for 14 days with no activity.
> Remove stale label or comment or update or this will be closed in 7 days.
This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.
This PR was closed because it has been inactive for 7 days since being marked as stale.
What does this PR do ?
Hiddens modules such as `a_mim` and `vae` output a number of keys, including `{"z", "z_mean"}` for example. The current implementation supports specifying a single key to be used as the encoder output, and in configs this is generally set to `z` for those variational losses. This is not what you want at inference time, though: there, the latent should be a point estimate (the mean) rather than a noisy sample. This PR adds an optional second encoder output key to be used at inference time; if it is not set, the current key is used for both. Training versus inference is determined by checking the PyTorch module property `self.training`.
Collection: nlp
Changelog
Usage
# Add a code snippet demonstrating how to use this
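The following is a minimal sketch of the key selection described under "What does this PR do ?", not the code in this PR. The class name `HiddensOutputSelector` and the argument names `enc_output_name` and `enc_inference_output_name` are assumptions made for illustration, and the class is a plain-Python stand-in that only mimics the `training` flag of `torch.nn.Module` so the example runs without PyTorch.

```python
class HiddensOutputSelector:
    """Hypothetical stand-in for the hiddens module; not NeMo's actual class."""

    def __init__(self, enc_output_name="z", enc_inference_output_name=None):
        self.enc_output_name = enc_output_name
        # If no inference key is given, fall back to the training key.
        if enc_inference_output_name is None:
            enc_inference_output_name = enc_output_name
        self.enc_inference_output_name = enc_inference_output_name
        # Mirrors torch.nn.Module.training, which defaults to True.
        self.training = True

    def train(self, mode=True):
        """Mimic torch.nn.Module.train(): set the flag and return self."""
        self.training = mode
        return self

    def eval(self):
        """Mimic torch.nn.Module.eval(): switch to inference mode."""
        return self.train(False)

    def select(self, outputs):
        """Pick the encoder output for the current mode."""
        if self.training:
            key = self.enc_output_name
        else:
            key = self.enc_inference_output_name
        return outputs[key]


# Placeholder strings stand in for the tensors a hiddens module would emit.
outputs = {"z": "noisy sample", "z_mean": "point estimate"}

selector = HiddensOutputSelector("z", "z_mean")
train_out = selector.select(outputs)         # uses "z" while training
infer_out = selector.eval().select(outputs)  # uses "z_mean" at inference

# Without a second key, both modes use the same output.
fallback_out = HiddensOutputSelector("z").eval().select(outputs)
```

Keeping the fallback in the constructor means existing configs that set only one key behave exactly as before.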
Jenkins CI
To run Jenkins, a NeMo User with write access must comment `jenkins` on the PR.
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.
Additional Information