-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Huvu/mcore retro #8284
Closed
Closed
Huvu/mcore retro #8284
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: eharper <eharper@nvidia.com>
* Add dist ckpt support for regular optimizers Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> * [tutorial] fixed missing RIR scripts file. (#8257) Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> * fix imports Signed-off-by: dimapihtar <dpihtar@gmail.com> * imports fix Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ci imports fix Signed-off-by: dimapihtar <dpihtar@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * revert asr notebook Signed-off-by: dimapihtar <dpihtar@gmail.com> * revert asr notebook Signed-off-by: dimapihtar <dpihtar@gmail.com> --------- Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Signed-off-by: dimapihtar <dpihtar@gmail.com> Co-authored-by: Eric Harper <complex451@gmail.com> Co-authored-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com> Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com> Co-authored-by: dimapihtar <dpihtar@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
huvunvidia
force-pushed
the
huvu/mcore_retro
branch
4 times, most recently
from
January 31, 2024 22:45
6fa440c
to
1c67ad1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
huvunvidia
force-pushed
the
huvu/mcore_retro
branch
2 times, most recently
from
February 1, 2024 17:39
367deb3
to
fab0836
Compare
* add notebook Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * rename old notebook to Buffered_Streaming Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * call setup_streaming_params in set_default_att_context_size method Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * update links in docs Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * update links to tutorials in docs Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * remove hard-coding Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> * rename var Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> --------- Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com>
* fix path location and branch Signed-off-by: Nithin Rao Koluguri <nithinraok> * change to a floating point number Signed-off-by: Nithin Rao Koluguri <nithinraok> --------- Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com>
huvunvidia
force-pushed
the
huvu/mcore_retro
branch
from
February 2, 2024 17:39
6718421
to
704f273
Compare
* add deallocate pipeline output optimization Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…megaconf (#8299) * save cp_size to self Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> * use parallel_state instead of self Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> --------- Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: Jimmy Zhang <jiemingz@nvidia.com> Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
* update peft doc Signed-off-by: Chen Cui <chcui@nvidia.com> * remove old prompt learning doc and notebook Signed-off-by: Chen Cui <chcui@nvidia.com> * fix table Signed-off-by: Chen Cui <chcui@nvidia.com> * fix table Signed-off-by: Chen Cui <chcui@nvidia.com> * fix table Signed-off-by: Chen Cui <chcui@nvidia.com> * Merge branch 'r1.23.0' into chcui/update_peft_doc Signed-off-by: Chen Cui <chcui@nvidia.com> * revert accidental changes Signed-off-by: Chen Cui <chcui@nvidia.com> * revert accidental changes Signed-off-by: Chen Cui <chcui@nvidia.com> --------- Signed-off-by: Chen Cui <chcui@nvidia.com>
…8242) (#8324) * Rebasing canary changes at current main Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Move the changes from asr transformer to nlp transformer as originally intended Signed-off-by: Piotr Żelasko <petezor@gmail.com> * update eval to strip spaces before punctuations Signed-off-by: stevehuang52 <heh@nvidia.com> * update pc strip Signed-off-by: stevehuang52 <heh@nvidia.com> * [canary] Refactor: `PromptedAudioToTextLhotseDataset` and `EncDecMultiTaskModel` (#8247) * Create a separate CanaryDataset and use it inside `transformer_bpe_models.py`. Ditches `token_sequence_format`. Signed-off-by: Piotr Żelasko <petezor@gmail.com> * [canary] Refactor: move changes in transformer_bpe_models.py to Canar… (#8252) * [canary] Refactor: move changes in transformer_bpe_models.py to CanaryModel Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Rename `CanaryModel` to `EncDecMultiTaskModel` and remove inheritance from `EncDecTransfModelBPE`; add a separate config for this model Signed-off-by: Piotr Żelasko <petezor@gmail.com> --------- Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Rename `CanaryDataset` to `PromptedAudioToTextLhotseDataset`; add `prompt_format_fn` argument; clean-up the `_canary_prompt_format` function a bit Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Move tokenization into `prompt_format_fn`, fix usage, add docs Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Backward-compatible utterance validation Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Improve type annotations Signed-off-by: Piotr Żelasko <petezor@gmail.com> * config and prompt_fn registration changes from review Signed-off-by: Piotr Żelasko <petezor@gmail.com> --------- Signed-off-by: Piotr Żelasko <petezor@gmail.com> * fix transcribe config Signed-off-by: stevehuang52 <heh@nvidia.com> * Refactor Canary to follow schema of remaining ASR models (#8260) * Initial draft of multi task beam decoding strategy Signed-off-by: smajumdar <titu1994@gmail.com> * Stabilize inference Signed-off-by: smajumdar <titu1994@gmail.com> * Update AED Multi Task model to mostly conform to Archetype-Type format. Update config Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add change decoding strategy Signed-off-by: smajumdar <titu1994@gmail.com> * Remove redundant imports Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Cleanup Signed-off-by: smajumdar <titu1994@gmail.com> * Cleanup Signed-off-by: smajumdar <titu1994@gmail.com> * remove asr transformer dependency on nlp Signed-off-by: stevehuang52 <heh@nvidia.com> * clean up Signed-off-by: stevehuang52 <heh@nvidia.com> * copy token_classifier from nlp to asr Signed-off-by: stevehuang52 <heh@nvidia.com> * Address comments Signed-off-by: smajumdar <titu1994@gmail.com> * Add typing to beam decoding Signed-off-by: smajumdar <titu1994@gmail.com> * Make prompt format configurable Signed-off-by: smajumdar <titu1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * drop asr dependency on nlp Signed-off-by: stevehuang52 <heh@nvidia.com> --------- Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: stevehuang52 <heh@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: stevehuang52 <heh@nvidia.com> * fix transcribe, update asr evaluator Signed-off-by: stevehuang52 <heh@nvidia.com> * Extend the docs for the canary prompt_fn Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Incorporate changes from Nithin's code review Signed-off-by: Piotr Żelasko <petezor@gmail.com> * training bug fix and adding launch script for speech_multitask (#8270) * bug fix and adding launch script for speech_multitask Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> * update launch script example in speech_to_text_aed.py Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> --------- Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com> * Fix: drop_last must be true in validation/test otherwise the training will hang Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com> * revert to current transcribe API Signed-off-by: stevehuang52 <heh@nvidia.com> * revert changes to NLP, update docs Signed-off-by: stevehuang52 <heh@nvidia.com> * update eval utils Signed-off-by: stevehuang52 <heh@nvidia.com> * update docs Signed-off-by: stevehuang52 <heh@nvidia.com> * Remove DALI; rename compute_audio_loss to compute_loss Signed-off-by: Piotr Żelasko <petezor@gmail.com> * set default use_model_transcribe=False Signed-off-by: stevehuang52 <heh@nvidia.com> * change os.path.dirname to pathlib Signed-off-by: stevehuang52 <heh@nvidia.com> * [canary] Test for CanaryTokenizer + refactoring (#8285) * Test for CanaryTokenizer Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Attempt at refactor... Signed-off-by: Piotr Żelasko <petezor@gmail.com> --------- Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Update config for AED models (#8294) Signed-off-by: smajumdar <titu1994@gmail.com> * set default calculate_wer=False in transcribe_speech.py Signed-off-by: stevehuang52 <heh@nvidia.com> * Attention encoder-decoder models for multiple speech-to-text tasks Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Apply suggestions from code review, part 1 Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Apply suggestions from code review, part 2 Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Document compute_loss Signed-off-by: Piotr Żelasko <petezor@gmail.com> * update transcribe_speech.py Signed-off-by: stevehuang52 <heh@nvidia.com> * add docstring Signed-off-by: stevehuang52 <heh@nvidia.com> * Attention encoder-decoder models for multiple speech-to-text tasks Signed-off-by: Piotr Żelasko <petezor@gmail.com> --------- Signed-off-by: Piotr Żelasko <petezor@gmail.com> Signed-off-by: stevehuang52 <heh@nvidia.com> Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: Krishna Puvvada <kpuvvada@nvidia.com> Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com> Co-authored-by: stevehuang52 <heh@nvidia.com> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Krishna Puvvada <93558329+krishnacpuvvada@users.noreply.github.com> Co-authored-by: Krishna Puvvada <kpuvvada@nvidia.com> Co-authored-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com> Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com> (cherry picked from commit d10726d) Co-authored-by: Piotr Żelasko <petezor@gmail.com>
huvunvidia
force-pushed
the
huvu/mcore_retro
branch
from
February 5, 2024 22:38
d3fae2d
to
2e6e1a5
Compare
…kdir, discrepancy in total samples and samples with neighbors retrieved, tokenizers)
huvunvidia
force-pushed
the
huvu/mcore_retro
branch
from
February 13, 2024 15:14
06f9517
to
09d9ce2
Compare
Signed-off-by: Chen Cui <chcui@nvidia.com>
* Turn on drop last Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Some neva fixes Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Taejin Park <tango4j@gmail.com>
Signed-off-by: Nithin Rao Koluguri <nithinraok> Co-authored-by: Nithin Rao Koluguri <nithinraok>
Signed-off-by: George <gzelenfroind@nvidia.com>
Signed-off-by: Huiying Li <willwin.lee@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Fixing mcore bert for TP, PP and SP * Fixing mcore bert for TP, PP and SP * Fixing mcore version * Fixing mcore version * Update Jenkinsfile Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> * Update Jenkinsfile Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> * Update Jenkinsfile Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> --------- Signed-off-by: Shanmugam Ramasamy <111910568+shanmugamr1992@users.noreply.github.com> Co-authored-by: Shanmugam Ramasamy <shanmugamr@shanmugamr-mlt.client.nvidia.com> Co-authored-by: Eric Harper <complex451@gmail.com>
* Add settings to suppress bf16 compile errors in CI on V100 Signed-off-by: Abhishree <abhishreetm@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Abhishree <abhishreetm@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* MoE parameter passing Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Pass EP/MoE params in consumer scripts. Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * PR fixes Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Use latest commit of mcore-0.5 Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * CI fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Co-authored-by: Alexandros Koumparoulis <akoumparouli@dgx1v-loki-21.nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Add fp8 support for SD/Update notebook paths Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: eharper <eharper@nvidia.com>
* Update requirements_multimodal.txt Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
* add dep notice Signed-off-by: eharper <eharper@nvidia.com> * revert Signed-off-by: eharper <eharper@nvidia.com> --------- Signed-off-by: eharper <eharper@nvidia.com>
* Revert FP8 integration Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Mingyuan Ma <mingyuanm@nvidia.com>
This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days. |
huvunvidia
force-pushed
the
huvu/mcore_retro
branch
from
March 4, 2024 17:33
d8ae971
to
21984a1
Compare
…able training and saving checkpoint)
huvunvidia
force-pushed
the
huvu/mcore_retro
branch
from
March 5, 2024 15:17
9223f9e
to
b5d8aec
Compare
for more information, see https://pre-commit.ci
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What does this PR do ?
Implementing RETRO's mcore model, dataloader's wrapper, training script.
Add a one line overview of what this PR aims to accomplish.
Collection: [Note which collection this PR will affect]
Changelog
Usage
# Add a code snippet demonstrating how to use this
Jenkins CI
To run Jenkins, a NeMo User with write access must comment
jenkins
on the PR.Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items you can still open "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.
Additional Information