Hiddens modules bugfixes #8466

jstjohn · 2024-02-21T00:15:30Z

What does this PR do ?

Hiddens modules such as a_mim and vae output several keys, for example {"z", "z_mean"}. The current implementation supports specifying a single key to use as the encoder output, and configs generally set this to z for the variational losses. That is not what you want at inference time, though: there the latent should be a point estimate (the mean) rather than a noisy sample. This PR adds an optional second encoder output key to be used at inference time; if it is not set, the existing key is used for both training and inference. Training versus inference is determined by checking the PyTorch module property self.training.
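A minimal sketch of the key selection described above, assuming a toy module rather than the actual NeMo hiddens classes. The class name `ToyHiddens` and the parameter names `enc_output_name` and `enc_inference_output_name` are hypothetical and only illustrate the fallback behavior; the unit-variance noise is a simplification.

```python
import torch
from torch import nn


class ToyHiddens(nn.Module):
    """Hypothetical stand-in for a hiddens module; not the NeMo implementation."""

    def __init__(self, hidden_size, enc_output_name="z", enc_inference_output_name=None):
        super().__init__()
        self.mean_proj = nn.Linear(hidden_size, hidden_size)
        self.enc_output_name = enc_output_name
        # If no inference key is configured, reuse the training key for both modes.
        self.enc_inference_output_name = enc_inference_output_name or enc_output_name

    def forward(self, hiddens):
        z_mean = self.mean_proj(hiddens)
        # Unit-variance noise keeps the sketch short; a real VAE would also predict a variance.
        z = z_mean + torch.randn_like(z_mean)
        outputs = {"z": z, "z_mean": z_mean}
        # self.training is True after .train() and False after .eval().
        key = self.enc_output_name if self.training else self.enc_inference_output_name
        return outputs[key], outputs
```

With `enc_inference_output_name="z_mean"`, calling `.eval()` on the module makes the forward pass return the mean, while `.train()` returns the noisy sample.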

Collection: nlp

Changelog

Add specific line by line info of high level changes in this PR.

Usage

You can potentially add a usage example below

# Add a code snippet demonstrating how to use this

Jenkins CI

To run Jenkins, a NeMo User with write access must comment jenkins on the PR.

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

Related to # (issue)

michalivne

LGTM! Fantastic PR, great catch of multiple issues.
I have a couple more suggestions to add to the PR:

Currently the reconstruction loss mixes between samples in the batch. We should fix that.
Add support for sample-level reconstruction (not only token-level); the token-level loss should average over hidden_size as well.
Add support for scaling only the gradients.
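One possible reading of the first two suggestions, sketched under assumptions: the error is averaged over hidden_size per token, and the reduction over tokens happens within each sample so that no sample's tokens contribute to another sample's loss. The function name `per_sample_recon_loss`, the squared-error choice, and the tensor shapes are hypothetical and are not taken from the PR.

```python
import torch


def per_sample_recon_loss(pred, target, loss_mask):
    """Hypothetical helper, not the NeMo loss.

    pred, target: [batch, seq, hidden]; loss_mask: [batch, seq], 1 for real tokens.
    Returns one loss value per sample, shape [batch].
    """
    # Token-level squared error, averaged over hidden_size.
    token_err = ((pred - target) ** 2).mean(dim=-1)  # [batch, seq]
    mask = loss_mask.to(token_err.dtype)
    # Reduce over the sequence dimension only, so samples are never mixed.
    # clamp avoids division by zero for a fully masked sample.
    return (token_err * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
```

Returning a per-sample vector leaves the final batch reduction to the caller.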

nemo/collections/nlp/models/language_modeling/megatron_lm_encoder_decoder_model.py

nemo/collections/nlp/modules/common/megatron/hiddens/megatron_hiddens.py

nemo/collections/nlp/modules/common/megatron/hiddens/megatron_hidden_loss.py

michalivne

LGTM! Once CI passes, it is ready to be merged.

michalivne · 2024-02-21T21:44:22Z

jenkins

jstjohn · 2024-02-21T22:45:11Z

jenkins

michalivne · 2024-02-21T22:50:35Z

jenkins

nemo/collections/nlp/modules/common/megatron/hiddens/megatron_hidden_loss.py


michalivne · 2024-03-06T15:15:08Z

jenkins

github-actions · 2024-03-21T01:44:13Z

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

jstjohn · 2024-03-21T03:00:15Z

I need to get back to this.


github-actions · 2024-04-05T01:44:23Z

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

github-actions · 2024-04-13T01:39:15Z

This PR was closed because it has been inactive for 7 days since being marked as stale.

github-actions bot added the NLP label Feb 21, 2024

jstjohn force-pushed the hiddens_train_v_inference branch 4 times, most recently from 844271d to 8493c96 Compare February 21, 2024 17:14

michalivne reviewed Feb 21, 2024


jstjohn force-pushed the hiddens_train_v_inference branch from 8493c96 to 4202406 Compare February 21, 2024 21:40

michalivne marked this pull request as ready for review February 21, 2024 21:43

michalivne previously approved these changes Feb 21, 2024


jstjohn dismissed michalivne’s stale review via d4c60bf February 21, 2024 22:25

jstjohn force-pushed the hiddens_train_v_inference branch from 4202406 to d4c60bf Compare February 21, 2024 22:25

jstjohn force-pushed the hiddens_train_v_inference branch from d4c60bf to 955b215 Compare February 21, 2024 22:49

github-advanced-security bot found potential problems Feb 22, 2024

nemo/collections/nlp/modules/common/megatron/hiddens/megatron_hidden_loss.py Fixed

jstjohn changed the title ~~Support different keys for training/inference modes with hiddens modules~~ Hiddens modules bugfixes Feb 23, 2024

jstjohn force-pushed the hiddens_train_v_inference branch from fd75f8a to aefe1d5 Compare February 29, 2024 16:38

Fix hiddens loss decode bug and support returning z_mean at inference…

8418d81

… time. Signed-off-by: John St John <jstjohn@nvidia.com>

jstjohn force-pushed the hiddens_train_v_inference branch from aefe1d5 to 8418d81 Compare February 29, 2024 16:43

Merge branch 'main' into hiddens_train_v_inference

183b95a

github-actions bot added the stale label Mar 21, 2024

github-actions bot removed the stale label Mar 22, 2024

github-actions bot added the stale label Apr 5, 2024

github-actions bot closed this Apr 13, 2024
