Load lang id any2any training mengruw #8961

Closed

mengruwNv

What does this PR do?

Add a one line overview of what this PR aims to accomplish.

Collection: [Note which collection this PR will affect]

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

Jenkins CI

To run Jenkins, a NeMo User with write access must comment jenkins on the PR.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

titu1994 and others added 28 commits May 17, 2023 15:13
Signed-off-by: smajumdar <titu1994@gmail.com>
Signed-off-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: smajumdar <titu1994@gmail.com>
* [Temp] VP Fixes

Signed-off-by: smajumdar <titu1994@gmail.com>

* Revert logging

Signed-off-by: smajumdar <titu1994@gmail.com>

---------

Signed-off-by: smajumdar <titu1994@gmail.com>
(cherry picked from commit b6f46a0)
Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
* check for first or last stage

Signed-off-by: ericharper <complex451@gmail.com>

* remove redundant check

Signed-off-by: ericharper <complex451@gmail.com>

* fix typo

Signed-off-by: ericharper <complex451@gmail.com>

* add map_location

Signed-off-by: ericharper <complex451@gmail.com>

---------

Signed-off-by: ericharper <complex451@gmail.com>
* Bug fix to restore act ckpt

Signed-off-by: Markel Sanz Ausin <markelsanz14@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Markel Sanz Ausin <markelsanz14@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Bug fix to reset sequence parallelism

Signed-off-by: Markel Sanz Ausin <markelsanz14@gmail.com>

* Update seq par reset/restore

Signed-off-by: Markel Sanz Ausin <markelsanz14@gmail.com>

* Add nested loop

Signed-off-by: Markel Sanz Ausin <markelsanz14@gmail.com>

---------

Signed-off-by: Markel Sanz Ausin <markelsanz14@gmail.com>
…ng (NVIDIA#6744)

* fix checkpointed forward and add test for full activation checkpointing

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* add method

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* add method

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

---------

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
Signed-off-by: smajumdar <titu1994@gmail.com>
* add call to p2p overlap

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

* update Jenkins for test

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

---------

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
* fix get param

Signed-off-by: ericharper <complex451@gmail.com>

* change name

Signed-off-by: ericharper <complex451@gmail.com>

---------

Signed-off-by: ericharper <complex451@gmail.com>
* initial POC for LDDL Bert

* Finish LDDL POC

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* address comments

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix merge head

* resolving merge

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add support for  val/test loaders

* change to new LDDL class + add winding

* fix logging level

* fix winding

* test fix

* fixes to winding

* add file system

* add prepemption optimizations

* more logging

* more prints

* better logging

* asfsf

* add barrier

* removing prints

* working with mb lddl loader

* final changes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update requirements file with LDDL

Signed-off-by: wdykas <wdykas@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* revert adding to requirements

---------

Signed-off-by: wdykas <wdykas@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
NVIDIA#6740)

* Construct FP8 amax reduction group

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* update core for CI

Signed-off-by: Abhinav Khattar <aklife97@gmail.com>

---------

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
Co-authored-by: Abhinav Khattar <aklife97@gmail.com>
…#6780)

* add interfaces for tp_communication overlap

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Interface to provide custom userbuffer communicator settings by yaml file

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Construct MPI process group for userbuffers support

Signed-off-by: Tim Moon <tmoon@nvidia.com>

---------

Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Abhinav Khattar <aklife97@gmail.com>
* Fix TTS adapter tutorial

Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>

* Fix version

Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>

---------

Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: Markel Sanz Ausin <markelsanz14@gmail.com>
Signed-off-by: Abhinav Khattar <aklife97@gmail.com>
* add trainer.validate example

Signed-off-by: ericharper <complex451@gmail.com>

* clean up white space

Signed-off-by: ericharper <complex451@gmail.com>

* add mbs and gbs to the config

Signed-off-by: ericharper <complex451@gmail.com>

---------

Signed-off-by: ericharper <complex451@gmail.com>
Signed-off-by: Yi Dong <yidong@nvidia.com>
Signed-off-by: Yi Dong <yidong@nvidia.com>
* add model pretraining and customization classes

Signed-off-by: ericharper <complex451@gmail.com>

* fix

Signed-off-by: ericharper <complex451@gmail.com>

* test width

Signed-off-by: ericharper <complex451@gmail.com>

* increase middle pane width

Signed-off-by: ericharper <complex451@gmail.com>

* add modules and datasets

Signed-off-by: ericharper <complex451@gmail.com>

* remove global in t5 dataset s and fix formatting in megatron base model

Signed-off-by: ericharper <complex451@gmail.com>

---------

Signed-off-by: ericharper <complex451@gmail.com>
* Apply garbage collection inverval to validation steps

Signed-off-by: Sangkug Lym <slym@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Sangkug Lym <slym@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: ericharper <complex451@gmail.com>
github-actions bot added the core (Changes to NeMo Core) and TTS labels on Apr 17, 2024
@mengruwNv
Author

Hi @Davood-M, I have proposed a fix for the language ID mismatch issue that occurs when training an any2any NMT model. Could you please review it and help merge it into the appropriate branch? We need this code for the April release. Thank you in advance!

Contributor

github-actions bot commented May 2, 2024

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

github-actions bot added the stale label on May 2, 2024
Contributor

github-actions bot commented May 9, 2024

This PR was closed because it has been inactive for 7 days since being marked as stale.

github-actions bot closed this on May 9, 2024