Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hiddens modules bugfixes #8466

Closed
wants to merge 2 commits into from

Conversation

jstjohn
Copy link
Collaborator

@jstjohn jstjohn commented Feb 21, 2024

What does this PR do ?

Hiddens modules such as a_mim and vae output a number of keys, including {"z", "z_mean"} for example. The current implementation supports specifying a single key to be used as the encoder output, and generally in configs this is set to z for those variational losses. This is not what you expect at inference time though. In that case you want the latent space to be a point estimate (the mean) rather than a noisy sampling. This PR accomplishes this by adding an optional second encoder output key to be used at inference time, and if it is not set then the current key is used for both. Inference/training is determined by checking the pytorch module property self.training.

Collection: nlp

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

Jenkins CI

To run Jenkins, a NeMo User with write access must comment jenkins on the PR.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

@github-actions github-actions bot added the NLP label Feb 21, 2024
@jstjohn jstjohn force-pushed the hiddens_train_v_inference branch 4 times, most recently from 844271d to 8493c96 Compare February 21, 2024 17:14
Copy link
Collaborator

@michalivne michalivne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Fantastic PR, great catch of multiple issues.
I have a couple more suggestions to add to PR:

  1. Currently reconstruction loss mixes between samples in the batch. We should fix that.
  2. Add support in sample-level (not token level recon.) - token level should average iver hidden_size as well.
  3. Add support in scaling only gradients.

@michalivne michalivne marked this pull request as ready for review February 21, 2024 21:43
michalivne
michalivne previously approved these changes Feb 21, 2024
Copy link
Collaborator

@michalivne michalivne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Once CI pass it is ready to be merged.

@michalivne
Copy link
Collaborator

jenkins

@jstjohn
Copy link
Collaborator Author

jstjohn commented Feb 21, 2024

jenkins

@michalivne
Copy link
Collaborator

jenkins

@jstjohn jstjohn changed the title Support different keys for training/inference modes with hiddens modules Hiddens modules bugfixes Feb 23, 2024
… time.

Signed-off-by: John St John <jstjohn@nvidia.com>
@michalivne
Copy link
Collaborator

jenkins

Copy link
Contributor

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

@github-actions github-actions bot added the stale label Mar 21, 2024
@jstjohn
Copy link
Collaborator Author

jstjohn commented Mar 21, 2024 via email

@github-actions github-actions bot removed the stale label Mar 22, 2024
Copy link
Contributor

github-actions bot commented Apr 5, 2024

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

@github-actions github-actions bot added the stale label Apr 5, 2024
Copy link
Contributor

This PR was closed because it has been inactive for 7 days since being marked as stale.

@github-actions github-actions bot closed this Apr 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants