@akoumpa
Contributor

@akoumpa akoumpa commented Nov 13, 2025

  • git clone -b akoumparouli/feat_pretrain_dfm_automodel git@github.com:NVIDIA-NeMo/DFM.git
  • cd DFM
  • git submodule update --init --recursive 3rdparty/
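To confirm the submodule step took effect, note that `git submodule status` prefixes any uninitialized submodule with `-`. The helper below is a minimal sketch (the function name is hypothetical, not part of the DFM repo) that reads that output and prints the paths still needing `git submodule update --init`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the DFM repo): print the paths of
# submodules that `git submodule status` reports as uninitialized.
# Uninitialized entries start with '-'; whitespace field 2 is the path.
uninitialized_submodules() {
  awk '/^-/ { print $2 }'
}
```

Usage: `git submodule status 3rdparty/ | uninitialized_submodules`. Empty output means every submodule under `3rdparty/` is checked out.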

Single GPU

  • uv run --group automodel --group torch-cu124 --with . python3 examples/automodel/pretrain/pretrain.py -c examples/automodel/pretrain/wan2_1_t2v_flow.yaml

Multi GPU

  • uv run --group automodel --group torch-cu124 --with . torchrun --nproc-per-node=2 examples/automodel/pretrain/pretrain.py -c examples/automodel/pretrain/wan2_1_t2v_flow.yaml
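The two launch commands differ only in the launcher: `python3` for one GPU and `torchrun --nproc-per-node=N` for several. A minimal sketch, assuming `nvidia-smi` is on `PATH` for GPU detection; the function names are hypothetical, and `build_cmd` only prints the command so it can be reviewed before running.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the DFM repo).
SCRIPT=examples/automodel/pretrain/pretrain.py
CONFIG=examples/automodel/pretrain/wan2_1_t2v_flow.yaml

# Count GPUs: `nvidia-smi -L` prints one "GPU <i>: ..." line per device.
# Note: nvidia-smi does not honor CUDA_VISIBLE_DEVICES.
count_gpus() {
  nvidia-smi -L 2>/dev/null | grep -c '^GPU '
}

# Print (do not run) the launch command for the given GPU count.
build_cmd() {
  local ngpus="$1"
  local prefix="uv run --group automodel --group torch-cu124 --with ."
  if [ "$ngpus" -le 1 ]; then
    # Single GPU: plain python3, matching the single-GPU command above.
    echo "$prefix python3 $SCRIPT -c $CONFIG"
  else
    # Multi GPU: torchrun starts one worker process per GPU.
    echo "$prefix torchrun --nproc-per-node=$ngpus $SCRIPT -c $CONFIG"
  fi
}
```

Usage: `build_cmd "$(count_gpus)"` prints the matching command; pipe it to `bash` to execute it.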

@copy-pr-bot

copy-pr-bot bot commented Nov 13, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@akoumpa akoumpa changed the title from "Akoumparouli/feat pretrain dfm automodel" to "feat: pretrain dfm automodel" Nov 13, 2025
@akoumpa akoumpa marked this pull request as draft November 13, 2025 00:04
@linnanwang linnanwang self-requested a review November 13, 2025 00:12
@linnanwang linnanwang self-assigned this Nov 13, 2025
linnanwang previously approved these changes Nov 13, 2025
Contributor

@linnanwang linnanwang left a comment

These look good to me, thank you Alex.

@akoumpa akoumpa force-pushed the akoumparouli/feat_pretrain_dfm_automodel branch from efea72b to 0ba1055 November 13, 2025 07:04
@akoumpa
Contributor Author

akoumpa commented Nov 13, 2025

/ok to test 3c64e65

@copy-pr-bot

copy-pr-bot bot commented Nov 13, 2025

/ok to test 3c64e65

@akoumpa, there was an error processing your request: E2

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/

@akoumpa
Contributor Author

akoumpa commented Nov 13, 2025

/ok to test fa1b851

linnanwang previously approved these changes Nov 13, 2025
Contributor

@linnanwang linnanwang left a comment

LGTM

@akoumpa akoumpa marked this pull request as ready for review November 13, 2025 18:12
@akoumpa
Contributor Author

akoumpa commented Nov 13, 2025

/ok to test a867798

@akoumpa
Contributor Author

akoumpa commented Nov 13, 2025

/ok to test c3ea450

@akoumpa akoumpa closed this Nov 13, 2025
@akoumpa akoumpa reopened this Nov 13, 2025
@akoumpa akoumpa force-pushed the akoumparouli/feat_pretrain_dfm_automodel branch from 0668fe9 to 90f9bbc November 17, 2025 18:25
@akoumpa
Contributor Author

akoumpa commented Nov 17, 2025

/ok to test a0c5367

@akoumpa
Contributor Author

akoumpa commented Nov 17, 2025

/ok to test 7b108d1

Contributor

@linnanwang linnanwang left a comment

Tested and everything looks good.

@linnanwang
Contributor

/ok to test 2d48619

@copy-pr-bot

copy-pr-bot bot commented Nov 17, 2025

/ok to test 2d48619

@linnanwang, there was an error processing your request: E2

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/

@linnanwang
Contributor

/ok to test 7b108d1

@akoumpa akoumpa merged commit 19753e8 into main Nov 17, 2025
16 checks passed
linnanwang pushed a commit that referenced this pull request Nov 17, 2025
* init

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* add sigma_min/amx

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* add sigma_min/max

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* rename finetune.py to train.py

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* add from_config

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* pass scheduler and model

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* update param

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* introduce NeMoWanPipeline

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* add mode

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* update build_model_and_optimizer

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* update

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* update NeMoWanPipeline

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* rename

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* move examples

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* move

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix imports

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* lint

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* more lint

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix import

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix 3rdparty & pyproject

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* add torch

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* update uv.lock

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* update

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* revert 3rdparty

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* update uv.lock

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* update uv.lock

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
lbliii pushed a commit that referenced this pull request Nov 19, 2025
Signed-off-by: Lawrence Lane <llane@nvidia.com>
4 participants