feat: pretrain dfm automodel #36
linnanwang
left a comment
These look good to me, thank you Alex.
efea72b to 0ba1055
/ok to test 3c64e65
@akoumpa, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/
/ok to test fa1b851
linnanwang
left a comment
LGTM
/ok to test a867798
cf8ae22 to aa8779a
/ok to test c3ea450
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
0668fe9 to 90f9bbc
/ok to test a0c5367
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
/ok to test 7b108d1
linnanwang
left a comment
Tested and everything looks good.
/ok to test 2d48619
@linnanwang, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/
/ok to test 7b108d1
* init Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add sigma_min/amx Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add sigma_min/max Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* rename fientune.py to train.py Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add from_config Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* pass scheduler and model Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* update param Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* introduce NeMoWanPipeline Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add mode Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* update build_model_and_optimizer Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* update Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* update NeMoWanPipeline Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* rename Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* move examples Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* move Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix imports Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* lint Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* more lint Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix import Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix 3rdparty & pyproject Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* add torch Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* update uv.lock Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* update Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* revert 3rdparty Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* update uv.lock Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* update uv.lock Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
---------
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
git clone -b akoumparouli/feat_pretrain_dfm_automodel git@github.com:NVIDIA-NeMo/DFM.git
git submodule update --init --recursive 3rdparty/
Single GPU
uv run --group automodel --group torch-cu124 --with . python3 examples/automodel/pretrain/pretrain.py -c examples/automodel/pretrain/wan2_1_t2v_flow.yaml
Multi GPU
uv run --group automodel --group torch-cu124 --with . torchrun --nproc-per-node=2 examples/automodel/pretrain/pretrain.py -c examples/automodel/pretrain/wan2_1_t2v_flow.yaml
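The single-GPU and multi-GPU invocations differ only in the launcher (python3 versus torchrun with --nproc-per-node). A minimal sketch of a wrapper that prints the matching command for a given GPU count is shown below. The function name launch_cmd and the SCRIPT and CONFIG variables are hypothetical and not part of this PR; the flags and paths are copied from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): prints the launch command
# for a given number of GPUs, reusing the flags from the commands above.
SCRIPT=examples/automodel/pretrain/pretrain.py
CONFIG=examples/automodel/pretrain/wan2_1_t2v_flow.yaml

launch_cmd() {
  local ngpu="${1:-1}"   # default to a single GPU
  local prefix="uv run --group automodel --group torch-cu124 --with ."
  if [ "$ngpu" -le 1 ]; then
    # Single GPU: run the script directly with python3.
    echo "$prefix python3 $SCRIPT -c $CONFIG"
  else
    # Multi GPU: let torchrun spawn one process per GPU.
    echo "$prefix torchrun --nproc-per-node=$ngpu $SCRIPT -c $CONFIG"
  fi
}
```

For example, running launch_cmd 2 from the repository root prints the multi-GPU command shown above, which can then be inspected or executed.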