Refactor CUDA graph API: decompose cuda_graph_scope into full_iteration impl, inference scope, and per-layer capture modules #4292
Conversation
force-pushed from f36afad to 0f07b30
force-pushed from 0f07b30 to 113c46a
force-pushed from 614115f to 9461d5a
/ok to test 542a36d
Can we please make sure that the existing
@mathemakitten I looked at
Ah you're right! Thanks for checking.
mathemakitten left a comment
Thank you for the thorough refactor! I've tried my best to catch the outstanding inconsistencies in the docs, mostly related to inference.
```diff
 num_cuda_graphs=(
     args.inference_dynamic_batching_num_cuda_graphs
-    if args.cuda_graph_impl == "local"
+    if args.cuda_graph_impl in ("local", "full_iteration")
```
I don't think we need to keep the `full_iteration` impl + dynamic inference path here? We only want to set up inference graphs if `inference_cuda_graph_scope` is used, right?
Okay, this way the old and new behavior won't align on the `full_iter` path. But I'm good with it if you think this is safe.
I modified it to `if args.inference_cuda_graph_scope != InferenceCudaGraphScope.none` here, and also in `DynamicInferenceEngine.create_cuda_graphs()`.
| "\n\n*** WARNING: 'full_iteration' CUDA graph scope used during inference! " | ||
| "This will not create inference CUDA graphs. Use '--cuda-graph-scope=full_iteration_inference' instead. ***\n" | ||
| "\n\n*** WARNING: '--cuda-graph-impl=full_iteration' used during inference! " | ||
| "For compatibility, this preserves the legacy '--cuda-graph-modules=full_iteration' " |
| "For compatibility, this preserves the legacy '--cuda-graph-modules=full_iteration' " | |
| "For compatibility, this preserves the legacy '--cuda-graph-scope=full_iteration' " |
Following the above idea, I now check `inference_cuda_graph_scope != InferenceCudaGraphScope.none` and `cuda_graph_impl == "local"` directly here, so this warning message is no longer needed.
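For reference, a minimal sketch of that combined check, assembled from the diff context above. The import path is an assumption (the PR only confirms that `megatron.core.transformer.enums` exists), and the helper name is hypothetical:

```python
# Assumption: the enum added by this PR lives alongside the other CUDA
# graph enums; the real import path may differ.
from megatron.core.transformer.enums import InferenceCudaGraphScope


def wants_inference_cuda_graphs(args) -> bool:
    """Combined check from the reply above: inference CUDA graphs are
    set up only for the local impl with a non-none inference scope."""
    return (
        args.cuda_graph_impl == "local"
        and args.inference_cuda_graph_scope != InferenceCudaGraphScope.none
    )
```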
force-pushed from ec3e88a to aa901d7
force-pushed from aa901d7 to 86f86b0
/ok to test 86f86b0
```python
class NormalizedCudaGraphModules(NamedTuple):
```
Why do we need a class here? It's only ever returned from `normalize_cuda_graph_modules`, right? Do we actually take advantage of the `NamedTuple` functionality?
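For context, the main thing `NamedTuple` buys over a plain tuple is named-field access at the call site. A tiny sketch, with field names guessed from the `return` statement in the diff below (the boolean field's name is a pure assumption):

```python
from typing import List, NamedTuple


class NormalizedCudaGraphModules(NamedTuple):
    # Field names guessed from the return statement in this PR's diff;
    # the real definition may differ.
    normalized_scopes: List[str]
    deprecated_scopes: List[str]
    used_deprecated_input: bool


result = NormalizedCudaGraphModules(["attn", "mlp"], [], False)
print(result.normalized_scopes)  # named access instead of result[0]
```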
```python
def get_deprecated_cuda_graph_modules_migration(
```
Since this function is only called once and is so short, does it really need a whole function?
```python
    return NormalizedCudaGraphModules(normalized_scopes, deprecated_scopes, False)


def normalize_inference_cuda_graph_scope(
```
Same comment here: why does this need to be a separate function? Does carving it out really improve readability?
What does this PR do?
Dev PR #4293
This PR decomposes the overloaded `cuda_graph_scope` field into three dedicated, semantically distinct concepts and cleans up naming throughout.

Problem
The old API overloaded `--cuda-graph-scope` with three unrelated concerns:

- `--cuda-graph-scope full_iteration` → full-iteration training graphs (a capture strategy, not a module)
- `--cuda-graph-scope full_iteration_inference` → block-owned inference graphs (inference ownership, not a module)
- `--cuda-graph-scope attn mlp` → per-layer capture regions (the actual intended use)

`CudaGraphScope` mixed iteration-level control flow with per-layer module selection. The three concepts have nothing in common and cannot be meaningfully combined in one field.

Solution
Four concrete changes:

1. `full_iteration` becomes its own `cuda_graph_impl` value. `--cuda-graph-impl full_iteration` replaces `--cuda-graph-impl local --cuda-graph-scope full_iteration`.
2. `--inference-cuda-graph-scope` is a new dedicated field. `InferenceCudaGraphScope` (none/layer/block, sketched after this list) replaces `full_iteration_inference` in `cuda_graph_scope`. The default for `--cuda-graph-impl local` is `layer` (preserving prior behaviour).
3. `CudaGraphScope` is renamed to `CudaGraphModule`; `cuda_graph_scope` to `cuda_graph_modules`. With the two non-module values removed, the enum and field names now accurately reflect their purpose: selecting which per-layer modules to capture.
4. Normalization is centralized in `cuda_graph_config.py`. `normalize_cuda_graph_modules`, `normalize_inference_cuda_graph_scope`, and `validate_deprecated_cuda_graph_modules_migration_inputs` are shared between `TransformerConfig.__post_init__` and `validate_args`, removing duplication and making the migration logic testable in isolation.
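A minimal sketch of the two enums described above, with member names inferred from the CLI values in this PR (the actual definitions in `megatron.core` may differ):

```python
from enum import Enum


class CudaGraphModule(Enum):
    # Renamed from CudaGraphScope; members inferred from the
    # '--cuda-graph-modules attn mlp' example in this PR and may differ.
    attn = "attn"
    mlp = "mlp"


class InferenceCudaGraphScope(Enum):
    # New dedicated inference scope; values taken from the PR description.
    none = "none"
    layer = "layer"
    block = "block"
```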
Backward compatibility

All deprecated inputs are still accepted and silently migrated at startup:

| Deprecated input | Migration |
| --- | --- |
| `--enable-cuda-graph` | `--cuda-graph-impl local` |
| `--external-cuda-graph` | `--cuda-graph-impl transformer_engine` |
| `--cuda-graph-scope full_iteration` | `--cuda-graph-impl full_iteration` |
| `--cuda-graph-scope full_iteration_inference` | `--cuda-graph-impl local --inference-cuda-graph-scope block` |
| `--cuda-graph-scope attn mlp` | `--cuda-graph-modules attn mlp` |
| `from megatron.core.transformer.enums import CudaGraphScope` | `CudaGraphScope` alias kept; use `CudaGraphModule` |
| `TransformerConfig(cuda_graph_scope=...)` | `cuda_graph_scope` field kept, deprecated; use `cuda_graph_modules` |

Conflicting combinations (e.g. passing both a deprecated value and the new flag with an incompatible value) are rejected with a clear assertion.
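A hedged illustration of the config-level migration from the table above. The extra constructor arguments are placeholders for illustration, and the plain-string module values are an assumption (the field may expect `CudaGraphModule` members):

```python
from megatron.core.transformer.transformer_config import TransformerConfig

# Placeholder model dimensions, for illustration only.
common = dict(num_layers=2, hidden_size=64, num_attention_heads=4)

# Deprecated spelling: still accepted and silently migrated at startup.
legacy = TransformerConfig(cuda_graph_scope=["attn", "mlp"], **common)

# New spelling.
new = TransformerConfig(cuda_graph_modules=["attn", "mlp"], **common)
```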
Contribution process
Pre-checks
Code review
Feel free to message or comment @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!
All PRs start as draft. If you open a non-draft PR, it will be automatically converted to draft.
Step 1: Mark PR as "Ready for Review"
`.github/CODEOWNERS`.

Final Review might get declined if these requirements are not fulfilled.
Step 2: Final Review
For PRs that change `megatron/core`, once all expert reviewers have approved, the `Final Review` label is applied automatically and final reviewers are assigned. For PRs outside `megatron/core`, this step is skipped.

Step 3: Approved
Once all required reviewers have approved, the `Approved` label is applied automatically.

Merge
Any member of mcore-engineers will be able to merge your PR.
For MRs into `dev` branch
The proposed review process for the `dev` branch is under active discussion. MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.