Mcore dist opt ckpt fix #9156
Commits on May 16, 2024
-
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Commit 9d13f15
-
pass dp_zero_gather_scatter to sharded-state-dict
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Commit 1a64904
-
Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Commit cb78ce9
-
introduce dist_ckpt_parallel_save option
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Commit 006b6d8
-
determine sharding type from dist_ckpt_parallel_save
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Commit 545fc51
-
Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Commit 4643470
Commits on May 17, 2024
-
read model.disk_ckpt_parallel_save from cfg and pass it to mcore dist ckpt
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Commit 8fa988d
-
Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Commit 4b297e0
Commits on May 21, 2024
-
Pass is_loading to mcore_optim.py's sharded_state_dict
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Commit 82b07c9
-
Apply isort and black reformatting
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Commit 27eb553
-
Merge branch 'main' into akoumparouli/mcore_dist_opt_ckpt
Signed-off-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Commit d7cf7f0
Commits on May 22, 2024
-
Update nemo/core/optim/mcore_optim.py
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Commit 0a9cd71
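Taken together, the commit titles describe a small flow: read a `dist_ckpt_parallel_save` flag from the model config, derive a sharding type from it, and forward that along with `is_loading` to the optimizer's `sharded_state_dict`. The sketch below illustrates that flow only and is not the PR's diff. The helper names `select_sharding_type` and `sharded_state_dict_kwargs`, the `ModelConfig` class, and the string `"fully_sharded_model_space"` are assumptions made for illustration. Only `dp_zero_gather_scatter`, `dist_ckpt_parallel_save`, and `is_loading` appear in the commit titles.

```python
from dataclasses import dataclass

# Sharding type named in the commit titles.
GATHER_SCATTER = "dp_zero_gather_scatter"
# ASSUMPTION: name of the sharding type used when parallel save is enabled.
PARALLEL_SAVE = "fully_sharded_model_space"


@dataclass
class ModelConfig:
    """Hypothetical stand-in for the model cfg section read by the PR."""
    dist_ckpt_parallel_save: bool = False


def select_sharding_type(dist_ckpt_parallel_save: bool) -> str:
    """Pick the optimizer sharding type from the parallel-save flag."""
    return PARALLEL_SAVE if dist_ckpt_parallel_save else GATHER_SCATTER


def sharded_state_dict_kwargs(cfg: ModelConfig, is_loading: bool = False) -> dict:
    """Build the keyword arguments a wrapper might forward to the
    underlying optimizer's sharded_state_dict (hypothetical helper)."""
    return {
        "is_loading": is_loading,
        "sharding_type": select_sharding_type(cfg.dist_ckpt_parallel_save),
    }


if __name__ == "__main__":
    print(sharded_state_dict_kwargs(ModelConfig(dist_ckpt_parallel_save=True)))
    print(sharded_state_dict_kwargs(ModelConfig(), is_loading=True))
```

Keeping the flag-to-sharding-type mapping in one small function means the config option stays a plain boolean while the string passed to the checkpointing layer can change in one place.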