Fix notebook misconfiguration error #3975
Merged
Signed-off-by: Yi Dong <doyend@gmail.com>
ericharper approved these changes Apr 12, 2022
LGTM. Thanks!
ericharper pushed a commit that referenced this pull request Apr 20, 2022
Signed-off-by: Yi Dong <doyend@gmail.com> Co-authored-by: Yi Dong <doyend@gmail.com>
ericharper added a commit that referenced this pull request Apr 20, 2022
* update version Signed-off-by: ericharper <complex451@gmail.com> * Stateless timer fix for PTL 1.6 (#3925) * Stateless timer fix for PTL 1.6 Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Stateless timer PTL test Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fix year Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Style Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Remove unused imports Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Style Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * GPU test Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Style Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * clean import Signed-off-by: ericharper <complex451@gmail.com> Co-authored-by: ericharper <complex451@gmail.com> * Fix issues with librosa deprecations (#3950) Signed-off-by: smajumdar <titu1994@gmail.com> * Fix notebook bugs for branch r1.8.0 (#3948) * load the model from ngc Signed-off-by: Yi Dong <doyend@gmail.com> * fix all biomegatron notebook Signed-off-by: Yi Dong <doyend@gmail.com> * fix the typos Signed-off-by: Yi Dong <doyend@gmail.com> * remove output Signed-off-by: Yi Dong <doyend@gmail.com> * fix isort Signed-off-by: Yi Dong <doyend@gmail.com> * fix merge error Signed-off-by: Yi Dong <doyend@gmail.com> * change ntpath for isort workaround Signed-off-by: Yi Dong <doyend@gmail.com> * fix unit test Signed-off-by: Yi Dong <doyend@gmail.com> * fix ci Signed-off-by: Yi Dong <doyend@gmail.com> * fix ci bert pretraining Signed-off-by: Yi Dong <doyend@gmail.com> * make it compatible with main Signed-off-by: Yi Dong <doyend@gmail.com> * add the teste for biomegatron ner Signed-off-by: Yi Dong <doyend@gmail.com> * fix argument Signed-off-by: Yi Dong <doyend@gmail.com> * fix usablity issue Signed-off-by: Yi Dong <doyend@gmail.com> * work around Signed-off-by: Yi Dong <doyend@gmail.com> Co-authored-by: 
Yi Dong <doyend@gmail.com> Co-authored-by: Eric Harper <complex451@gmail.com> * Fix global batch fit loop (#3936) * add lightning module hooks for global batch Signed-off-by: ericharper <complex451@gmail.com> * clean scripts Signed-off-by: ericharper <complex451@gmail.com> * style Signed-off-by: ericharper <complex451@gmail.com> * remove unused import Signed-off-by: ericharper <complex451@gmail.com> * DP=1 fix Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * set num dataset workers to 2 Signed-off-by: ericharper <complex451@gmail.com> * update validation_loop with GlobalDataFetcher Signed-off-by: ericharper <complex451@gmail.com> * add test global data fetcher Signed-off-by: ericharper <complex451@gmail.com> * Drop last for test ds Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fix test epoch end Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Style Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fix eval Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fix reconfigure microbatch in the complete method Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * add comments Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Set init consumed samples Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * fix shuffle Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * add save_restore_connector arg Signed-off-by: ericharper <complex451@gmail.com> * Fix padding for labels and loss mask Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * GLUE/XNLI CI tests Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * limit val batches in hydra fix Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Restart CI Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fix unittest Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> 
Co-authored-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Exports 22.03 war (#3957) * Fixed fastpitch for 22.03 Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * cleanup Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Restored mask expansion; added WAR for test container images Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * style Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Refactor restorefrom (#3927) * update package info (#3926) Signed-off-by: ericharper <complex451@gmail.com> * Refactor restore_from Signed-off-by: Ramanathan Arunachalam <rarunachalam@nvidia.com> * Move export related python files to scripts/export/ Signed-off-by: Ramanathan Arunachalam <rarunachalam@nvidia.com> * Return state dict after modification function * Remove Megatron legacy parameter in common.py restore_from function Signed-off-by: Ramanathan Arunachalam <rarunachalam@nvidia.com> * ability to set log_predictions to false (#3929) * Bumping Python version Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com> * fixing style Signed-off-by: Oleksii Kuchaiev <okuchaiev@nvidia.com> * load the model from ngc Signed-off-by: Yi Dong <doyend@gmail.com> * fix all biomegatron notebook Signed-off-by: Yi Dong <doyend@gmail.com> * fix the typos Signed-off-by: Yi Dong <doyend@gmail.com> * remove output Signed-off-by: Yi Dong <doyend@gmail.com> * fix isort Signed-off-by: Yi Dong <doyend@gmail.com> * fix merge error Signed-off-by: Yi Dong <doyend@gmail.com> * change ntpath for isort workaround Signed-off-by: Yi Dong <doyend@gmail.com> * fix unit test Signed-off-by: Yi Dong <doyend@gmail.com> * fix ci Signed-off-by: Yi Dong <doyend@gmail.com> * fix ci bert pretraining Signed-off-by: Yi Dong <doyend@gmail.com> * Rearrage export files; Style fix; Extend legacy MegatronBert conversion to NLP models nemo version updation * Glu activation variants (#3951) * Temp Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Add reglu and swiglu 
activations Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Style Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Style on unrelated file Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * CI changes to test activations Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fix unused import Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Style fix beacuse of merge from main Signed-off-by: Ramanathan Arunachalam <rarunachalam@nvidia.com> * make it compatible with main Signed-off-by: Yi Dong <doyend@gmail.com> * add the teste for biomegatron ner Signed-off-by: Yi Dong <doyend@gmail.com> * fix argument Signed-off-by: Yi Dong <doyend@gmail.com> * fix usablity issue Signed-off-by: Yi Dong <doyend@gmail.com> * FastPitch FT notebook - Improving Speech Quality clarifications (#3954) * FastPitch FT notebook - Improving Speech Quality clarifications Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com> * Add pynini dependency install to FastPitch FT notebook Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com> * Pin pynini install for FastPitch FT tutorial Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com> * work around Signed-off-by: Yi Dong <doyend@gmail.com> Co-authored-by: Eric Harper <complex451@gmail.com> Co-authored-by: Ramanathan Arunachalam <rarunachalam@nvidia.com> Co-authored-by: Dima Rekesh <bmwshop@gmail.com> Co-authored-by: Oleksii Kuchaiev <okuchaiev@nvidia.com> Co-authored-by: Yi Dong <doyend@gmail.com> Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com> Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com> Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca> Co-authored-by: Jocelyn <jocelynh@nvidia.com> * Bump TTS deprecation version to 1.9 (#3955) * bump deprecation version Signed-off-by: Jason <jasoli@nvidia.com> * update talknet depre Signed-off-by: Jason <jasoli@nvidia.com> * added conformer for zh. 
(#3970) Signed-off-by: Vahid <vnoroozi@nvidia.com> * Add pinned pynini and scipy installs to TTS training tutorial (#3967) Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com> * Fix variable name and move models to CPU in Change partition (#3972) * fixes Signed-off-by: Abhinav Khattar <aklife97@gmail.com> * add CI Signed-off-by: Abhinav Khattar <aklife97@gmail.com> Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca> * fix misconfiguration (#3975) Signed-off-by: Yi Dong <doyend@gmail.com> Co-authored-by: Yi Dong <doyend@gmail.com> * Fix NMT variable passing bug (#3985) * fix Signed-off-by: Abhinav Khattar <aklife97@gmail.com> * stylefix Signed-off-by: Abhinav Khattar <aklife97@gmail.com> * Compatability override to load_state_dict for old TTS checkpoints (#3978) * Compatability override to load_state_dict for old TTS checkpoints Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com> * Tacotron2 training notebook fix - add GPU argument Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com> * Add hann window override warning for old model loading Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com> * Notebook Bug Fixes for r1.8.0 (#3989) * Made config related bug fixes Signed-off-by: Virginia Adams <vadams@nvidia.com> * Fixed cfg.get syntax Signed-off-by: Virginia Adams <vadams@nvidia.com> * Fix compat override for TalkNet Aligner (#3993) * Fix compatibility override for TalkNet Aligner Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com> * Remove extraneous logging import Signed-off-by: Jocelyn Huang <jocelynh@nvidia.com> * docs fixes (#3987) * docs fixes Signed-off-by: ekmb <ebakhturina@nvidia.com> * rename files in docs Signed-off-by: ekmb <ebakhturina@nvidia.com> * docs improvement Signed-off-by: ekmb <ebakhturina@nvidia.com> * arg renamed Signed-off-by: ekmb <ebakhturina@nvidia.com> * Fix nemo megatron restore with artifacts (#3997) * update config_path in register_artifact Signed-off-by: ericharper <complex451@gmail.com> * fix register_artifact calls 
Signed-off-by: ericharper <complex451@gmail.com> * fix register_artifact calls Signed-off-by: ericharper <complex451@gmail.com> * update log messages to include merges file Signed-off-by: ericharper <complex451@gmail.com> * add default prompts to config Signed-off-by: ericharper <complex451@gmail.com> * Fixes val_check_interval, skip loading train data during eval (#3968) * Change stage check Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fix bugs in megatron t5 glue eval scripts Signed-off-by: Yu Yao <yuya@nvidia.com> * Fix reconfigure Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Change check Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fix hasattr Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fix typo in cfg structure Signed-off-by: Yu Yao <yuya@nvidia.com> * Update megatron t5 glue eval config file Signed-off-by: Yu Yao <yuya@nvidia.com> * Reconfigure to avoid drop last Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Fix for train step reconfigure as well Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * Update megatron t5 glue eval config file drop_last to False Signed-off-by: Yu Yao <yuya@nvidia.com> * Style Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> * limit test batches Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca> Co-authored-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com> * LogProb calculation performance fix (#3984) * performance fix for logprob computation Signed-off-by: Yi Dong <doyend@gmail.com> * fix redandant assign Signed-off-by: Yi Dong <doyend@gmail.com> * fix bug to add gather from TP workers Signed-off-by: Yi Dong <doyend@gmail.com> Co-authored-by: Yi Dong <doyend@gmail.com> * Fix link issues in export example notebook and fix pretrained model info for MegatronBert (#4004) Signed-off-by: Ramanathan Arunachalam 
<rarunachalam@nvidia.com> Co-authored-by: Ramanathan Arunachalam <rarunachalam@nvidia.com> * Fix single GPU training issue + change deprecated Lightning args (#4010) * change vars Signed-off-by: Abhinav Khattar <aklife97@gmail.com> * style fix Signed-off-by: Abhinav Khattar <aklife97@gmail.com> * Fix P-Tune T5 model (#4001) * fix ptune t5 Signed-off-by: Yi Dong <doyend@gmail.com> * fix ci test Signed-off-by: Yi Dong <doyend@gmail.com> * fix the ci fail because of the order problem Signed-off-by: Yi Dong <doyend@gmail.com> Co-authored-by: Yi Dong <doyend@gmail.com> Co-authored-by: Eric Harper <complex451@gmail.com> * Megatron work-arounds (#3998) * WAR around Apex issue, and making sure output is FP32 Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Fixing merge issues; moving dummy Trainer; adding float() casts Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Fixing ColumnParallelLinear call Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Cleanup Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * Cleanup#2 Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> * fix the broadcast shape mismatch (#4017) Signed-off-by: Yi Dong <doyend@gmail.com> Co-authored-by: Yi Dong <doyend@gmail.com> Co-authored-by: Eric Harper <complex451@gmail.com> * add known issues (#4024) Signed-off-by: ericharper <complex451@gmail.com> * update readme with conda env setup instructions Signed-off-by: ericharper <complex451@gmail.com> * typo Signed-off-by: ericharper <complex451@gmail.com> * update package info Signed-off-by: ericharper <complex451@gmail.com> * update branch Signed-off-by: ericharper <complex451@gmail.com> * update package info Signed-off-by: ericharper <complex451@gmail.com> * revert apex guard removal Signed-off-by: ericharper <complex451@gmail.com> * revert --language to --lang Signed-off-by: ericharper <complex451@gmail.com> * fix apex guard Signed-off-by: ericharper <complex451@gmail.com> * remove set_trace Signed-off-by: ericharper 
<complex451@gmail.com> * typo Signed-off-by: ericharper <complex451@gmail.com> * typo Signed-off-by: ericharper <complex451@gmail.com> * fix apex guard Signed-off-by: ericharper <complex451@gmail.com> * remove unreachable statement Signed-off-by: ericharper <complex451@gmail.com> * remove duplicate lines Signed-off-by: ericharper <complex451@gmail.com> * remove duplicate lines Signed-off-by: ericharper <complex451@gmail.com> Co-authored-by: Sandeep Subramanian <sandeep.subramanian.1@umontreal.ca> Co-authored-by: Somshubra Majumdar <titu1994@gmail.com> Co-authored-by: Yi Dong <43824965+yidong72@users.noreply.github.com> Co-authored-by: Yi Dong <doyend@gmail.com> Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com> Co-authored-by: Ramanathan Arunachalam <ramanathan.arun@rutgers.edu> Co-authored-by: Ramanathan Arunachalam <rarunachalam@nvidia.com> Co-authored-by: Dima Rekesh <bmwshop@gmail.com> Co-authored-by: Oleksii Kuchaiev <okuchaiev@nvidia.com> Co-authored-by: Jocelyn <jocelynh@nvidia.com> Co-authored-by: Jason <jasoli@nvidia.com> Co-authored-by: Vahid Noroozi <VahidooX@users.noreply.github.com> Co-authored-by: Abhinav Khattar <aklife97@gmail.com> Co-authored-by: Virginia Adams <78445382+vadam5@users.noreply.github.com> Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com> Co-authored-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
What does this PR do?
The notebook fails with a Trainer misconfiguration error because the Trainer tries to spawn processes from inside the interactive Python session. This PR fixes it by using the torch elastic environment.
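
A minimal sketch of the idea, not the notebook's actual code. It assumes the fix amounts to passing PyTorch Lightning's `TorchElasticEnvironment` through the Trainer's `plugins` argument when running in a notebook, so that Lightning treats the processes as externally launched instead of spawning them itself. The helper names `running_interactively` and `trainer_plugins` are hypothetical, and the interactive-session check is only a heuristic.

```python
import sys


def running_interactively() -> bool:
    """Best-effort guess at whether we are in a notebook or REPL (heuristic)."""
    # Jupyter kernels import ipykernel; a plain Python REPL defines sys.ps1.
    return "ipykernel" in sys.modules or hasattr(sys, "ps1")


def trainer_plugins(interactive: bool) -> list:
    """Extra Trainer plugins to use; hypothetical helper, not part of NeMo."""
    if not interactive:
        return []
    # Imported lazily so the non-interactive path does not need Lightning.
    from pytorch_lightning.plugins.environments import TorchElasticEnvironment

    # Assumption: with this cluster environment Lightning expects processes
    # to be created externally and does not try to spawn them itself.
    return [TorchElasticEnvironment()]


# Usage in a notebook cell (the Trainer arguments are illustrative only):
#   import pytorch_lightning as pl
#   trainer = pl.Trainer(
#       plugins=trainer_plugins(running_interactively()),
#       accelerator="gpu",
#       devices=1,
#   )
```

The notebook in this PR may construct its Trainer differently, for example from a Hydra config; the sketch only shows where the environment plugin would plug in.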