[Dev][fix] FSDP EP-overlap CUDA-graph guard uses post-refactor API by buptzyb · Pull Request #4834 · NVIDIA/Megatron-LM

buptzyb · 2026-05-17T02:04:10Z

Summary

PR #3796 (A2A overlap for Megatron-FSDP) introduced a guard in
mcore_fsdp_adapter.py that iterates the legacy config.cuda_graph_scope
list. After PR #4293 normalized cuda_graph_scope to None in
TransformerConfig.__post_init__, this iteration raises
TypeError: 'NoneType' object is not iterable whenever a user combines
overlap_moe_expert_parallel_comm=True with cuda_graph_impl in {"local", "transformer_engine"}.

The bug is currently dormant on dev because:

The functional test deepseek_proxy_fsdp_ep2_fsdp2_ep_overlap does not set --cuda-graph-impl (defaults to "none").
The unit test test_fsdp_1f1b_overlap.py likewise leaves cuda_graph_impl="none".
The outer guard if config.cuda_graph_impl not in ["none", "full_iteration"] short-circuits the buggy iteration in both cases.

But any legitimate config that combines EP overlap with per-layer CUDA graphs will crash immediately. The equivalent silent failure on main (PR #3797, where no outer guard exists) was triggering the analogous merge-queue failures the corresponding refactor PR (#4292) — that fix has already been pushed there.

This PR applies the same migration to dev: drop the legacy for scope in config.cuda_graph_scope iteration and the now-unused CudaGraphScope import, and rely on a single cuda_graph_impl assertion that matches the original intent (per-layer CUDA graphs are not supported with FSDP EP overlap).

Test plan

Verify existing tests/unit_tests/a2a_overlap/test_fsdp_1f1b_overlap.py continues to pass (cuda_graph_impl defaults to "none", same path).
Verify tests/functional_tests/test_cases/moe/deepseek_proxy_fsdp_ep2_fsdp2_ep_overlap continues to pass.
Manually verify a config with --overlap-moe-expert-parallel-comm --cuda-graph-impl=local now produces a clear assertion error instead of a TypeError traceback.

cc @Wohox — friendly ping, this is a follow-up to your #3796 to make it compatible with the post-refactor CUDA-graph API. Please review when you have a moment.

🤖 Generated with Claude Code

PR NVIDIA#3796 ("Support A2A Overlap for Megatron-FSDP") landed on dev with guard logic that iterates the legacy config.cuda_graph_scope list: if config.cuda_graph_impl not in ["none", "full_iteration"]: partial_cuda_graph_scopes = [ scope for scope in config.cuda_graph_scope ... ] After PR NVIDIA#4293 normalized cuda_graph_scope to None in __post_init__, the inner iteration crashes with TypeError: 'NoneType' object is not iterable whenever a user combines overlap_moe_expert_parallel_comm with cuda_graph_impl in {"local", "transformer_engine"}. Drop the legacy iteration; the outer check on cuda_graph_impl is the only signal needed under the new API. Also drop the now-unused CudaGraphScope import. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

copy-pr-bot · 2026-05-17T02:04:14Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

buptzyb · 2026-05-17T02:06:26Z

/ok to test 089a5ed

Wohox

LGTM!

svcnvidia-nemo-ci · 2026-05-18T04:56:15Z

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/26014336816

buptzyb requested a review from Wohox May 17, 2026 02:04

copy-pr-bot Bot temporarily deployed to test May 17, 2026 02:07 Inactive

Wohox approved these changes May 17, 2026

View reviewed changes

buptzyb marked this pull request as ready for review May 18, 2026 00:47

buptzyb requested review from a team as code owners May 18, 2026 00:47

svcnvidia-nemo-ci added the complexity: low label May 18, 2026

hxbai approved these changes May 18, 2026

View reviewed changes

hxbai added this pull request to the merge queue May 18, 2026

Merged via the queue into NVIDIA:dev with commit bac68ec May 18, 2026
69 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Dev][fix] FSDP EP-overlap CUDA-graph guard uses post-refactor API#4834

[Dev][fix] FSDP EP-overlap CUDA-graph guard uses post-refactor API#4834
hxbai merged 1 commit into
NVIDIA:devfrom
buptzyb:fix/dev-fsdp-adapter-cuda-graph-api

buptzyb commented May 17, 2026

Uh oh!

copy-pr-bot Bot commented May 17, 2026

Uh oh!

buptzyb commented May 17, 2026

Uh oh!

Wohox left a comment

Uh oh!

svcnvidia-nemo-ci commented May 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

buptzyb commented May 17, 2026

Summary

Test plan

Uh oh!

copy-pr-bot Bot commented May 17, 2026

Uh oh!

buptzyb commented May 17, 2026

Uh oh!

Wohox left a comment

Choose a reason for hiding this comment

Uh oh!

svcnvidia-nemo-ci commented May 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants