
fix pp dump patch #629

Merged
Xiaoming-AMD merged 2 commits into dev/tas/moe_package_v2.0 from dev/lhz/fix_pp_dump
Mar 26, 2026

Conversation

@lhzhang333
Collaborator

No description provided.

HuangWei-95 added 2 commits March 25, 2026 18:36
- Add before_train patch to wrap get_forward_backward_func with schedule_wrapper
- Primus pipeline / ZeroBubble: only wrap schedule; handlers already use fwd_bwd_wrapper
- Megatron native: wrap schedule and patch forward_step/backward_step via set_dump_pp_data_patch
- Add guarded schedule_wrapper to avoid double-wrap in legacy MegatronTrainer flow
- Add after_train patch to call dump_pp_data for core runtime
… set_dump_pp_data

- Introduce _make_guarded_set_dump_pp_data_patch to ensure set_dump_pp_data_patch is only applied once, addressing legacy MegatronTrainer calls.
- Update patch_pp_dump_data_before_train to utilize the new guarded version, enhancing stability in the training pipeline.
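
An apply-once guard of the kind this commit describes might look like the sketch below. The signature of `_make_guarded_set_dump_pp_data_patch` is not shown in this conversation, so `make_guarded_patch`, `fake_set_dump_pp_data_patch`, and `apply_count` are hypothetical stand-ins that only illustrate the pattern.

```python
from typing import Callable


def make_guarded_patch(apply_patch: Callable[[], None]) -> Callable[[], bool]:
    """Return a callable that runs apply_patch on its first call only."""
    applied = False

    def guarded() -> bool:
        nonlocal applied
        if applied:
            return False  # repeated calls are skipped
        apply_patch()
        # Set the flag only after success so a failed attempt can be retried.
        applied = True
        return True

    return guarded


apply_count = 0


def fake_set_dump_pp_data_patch() -> None:
    """Hypothetical stand-in for the real patch function."""
    global apply_count
    apply_count += 1


guarded_patch = make_guarded_patch(fake_set_dump_pp_data_patch)
first = guarded_patch()   # applies the patch
second = guarded_patch()  # no-op on the repeated call
```

Keeping the flag in a closure means each guarded patch tracks its own state, which suits a case where two trainer flows may both try to install the same patch.
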
Xiaoming-AMD merged commit b272569 into dev/tas/moe_package_v2.0 on Mar 26, 2026
2 checks passed
lhzhang333 added a commit that referenced this pull request Apr 7, 2026
Co-authored-by: HuangWei-95 <weihuan@amd.com>
lhzhang333 deleted the dev/lhz/fix_pp_dump branch on April 13, 2026 06:18

2 participants