[FDConfig] disable use_sequence_parallel_moe default #5222
Conversation
Thanks for your contribution!
Pull request overview
This PR temporarily disables use_sequence_parallel_moe by default when CUDAGraph is enabled in mixed mode and PD disaggregation decode nodes to work around a hang issue. The change adds configuration checks that automatically set use_sequence_parallel_moe to False when these incompatible conditions are detected.
- Adds automatic disabling of `use_sequence_parallel_moe` when using CUDAGraph in mixed and decode splitwise modes
- Removes trailing whitespace from `requirements.txt`
Reviewed changes
Copilot reviewed 1 out of 2 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| fastdeploy/config.py | Adds conditional logic to disable use_sequence_parallel_moe for mixed and decode modes when CUDAGraph is enabled, with corresponding warning messages |
| requirements.txt | Removes trailing whitespace from line 40 (formatting fix) |
fastdeploy/config.py
Outdated
```python
        if self.parallel_config.use_sequence_parallel_moe and self.graph_opt_config.use_cudagraph:
            self.parallel_config.use_sequence_parallel_moe = False
            logger.info(
                "Warning: sequence parallel moe do not support Mixed mode with cudagraph. We set use_sequence_parallel_moe to False."
            )
        self.model_config.moe_phase = MoEPhase(phase="prefill")
    elif self.scheduler_config.splitwise_role == "prefill":
        self.model_config.moe_phase = MoEPhase(phase="prefill")
    elif self.scheduler_config.splitwise_role == "decode":
        if self.parallel_config.use_sequence_parallel_moe and self.graph_opt_config.use_cudagraph:
            self.parallel_config.use_sequence_parallel_moe = False
            logger.info(
                "Warning: sequence parallel moe do not support PD's decode node with cudagraph. We set use_sequence_parallel_moe to False."
            )
```
Copilot AI · Nov 25, 2025
This code duplicates the same logic for disabling use_sequence_parallel_moe when using cudagraph in both "mixed" and "decode" modes. Consider extracting this into a helper method or consolidating the check to improve maintainability and reduce code duplication.
Example refactoring:

```python
def _disable_sequence_parallel_moe_if_needed(self, mode_name):
    if self.parallel_config.use_sequence_parallel_moe and self.graph_opt_config.use_cudagraph:
        self.parallel_config.use_sequence_parallel_moe = False
        logger.warning(
            f"Sequence parallel MoE does not support {mode_name} mode with cudagraph. "
            "Setting use_sequence_parallel_moe to False."
        )

# Then use:
if self.scheduler_config.splitwise_role == "mixed":
    self._disable_sequence_parallel_moe_if_needed("Mixed")
    self.model_config.moe_phase = MoEPhase(phase="prefill")
elif self.scheduler_config.splitwise_role == "decode":
    self._disable_sequence_parallel_moe_if_needed("PD's decode node")
    self.model_config.moe_phase = MoEPhase(phase="decode")
```
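The suggested refactor can be exercised in isolation with stub config objects. The sketch below is illustrative only: `FDConfigSketch` and its `SimpleNamespace` attributes are hypothetical stand-ins for FastDeploy's real `FDConfig`, kept just close enough to mirror the attribute paths used in the diff.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger("fastdeploy")


class FDConfigSketch:
    """Minimal stand-in for FastDeploy's FDConfig, for illustration only."""

    def __init__(self, role, use_sp_moe, use_cudagraph):
        self.scheduler_config = SimpleNamespace(splitwise_role=role)
        self.parallel_config = SimpleNamespace(use_sequence_parallel_moe=use_sp_moe)
        self.graph_opt_config = SimpleNamespace(use_cudagraph=use_cudagraph)

    def _disable_sequence_parallel_moe_if_needed(self, mode_name):
        # Shared check: sequence-parallel MoE hangs with CUDAGraph in these modes,
        # so the flag is forced off and the user is warned.
        if self.parallel_config.use_sequence_parallel_moe and self.graph_opt_config.use_cudagraph:
            self.parallel_config.use_sequence_parallel_moe = False
            logger.warning(
                f"Sequence parallel MoE does not support {mode_name} mode with cudagraph. "
                "Setting use_sequence_parallel_moe to False."
            )

    def apply_checks(self):
        # Only the mixed role and the PD-disaggregation decode role are affected.
        if self.scheduler_config.splitwise_role == "mixed":
            self._disable_sequence_parallel_moe_if_needed("Mixed")
        elif self.scheduler_config.splitwise_role == "decode":
            self._disable_sequence_parallel_moe_if_needed("PD's decode node")


cfg = FDConfigSketch("mixed", use_sp_moe=True, use_cudagraph=True)
cfg.apply_checks()
print(cfg.parallel_config.use_sequence_parallel_moe)  # False: auto-disabled

cfg2 = FDConfigSketch("prefill", use_sp_moe=True, use_cudagraph=True)
cfg2.apply_checks()
print(cfg2.parallel_config.use_sequence_parallel_moe)  # True: prefill is unaffected
```

Centralizing the check this way means a future fix for the hang only needs to touch one method.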
fastdeploy/config.py
Outdated
```python
if self.scheduler_config.splitwise_role == "mixed":
    if self.parallel_config.use_sequence_parallel_moe and self.graph_opt_config.use_cudagraph:
        self.parallel_config.use_sequence_parallel_moe = False
        logger.info(
```
Copilot AI · Nov 25, 2025
Use logger.warning() instead of logger.info() for warning messages. This message is alerting users about an automatic configuration change due to incompatibility, which should be logged at the warning level for better visibility and consistency with similar patterns in the codebase (see line 1648).
Suggested change:

```diff
-logger.info(
+logger.warning(
     "Warning: sequence parallel moe do not support Mixed mode with cudagraph. We set use_sequence_parallel_moe to False."
 )
```
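The practical difference is easy to see with the standard `logging` module. This is a small self-contained sketch, independent of FastDeploy's actual logger configuration: with a handler at the common default level of `WARNING`, an `info()` record is dropped entirely while a `warning()` record reaches the user.

```python
import io
import logging

# Capture log output in a string buffer so the effect is visible.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))

logger = logging.getLogger("fastdeploy.config.demo")
logger.addHandler(handler)
logger.setLevel(logging.WARNING)  # common default threshold

logger.info("use_sequence_parallel_moe set to False")     # filtered out silently
logger.warning("use_sequence_parallel_moe set to False")  # emitted

print(stream.getvalue().strip())  # WARNING: use_sequence_parallel_moe set to False
```

Since the message announces an automatic change to user-supplied configuration, logging it at a level that cannot be filtered out by default settings is the safer choice.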
fastdeploy/config.py
Outdated
```python
elif self.scheduler_config.splitwise_role == "decode":
    if self.parallel_config.use_sequence_parallel_moe and self.graph_opt_config.use_cudagraph:
        self.parallel_config.use_sequence_parallel_moe = False
        logger.info(
```
Copilot AI · Nov 25, 2025
Use logger.warning() instead of logger.info() for warning messages. This message is alerting users about an automatic configuration change due to incompatibility, which should be logged at the warning level for better visibility and consistency with similar patterns in the codebase (see line 1648).
Suggested change:

```diff
-logger.info(
+logger.warning(
     "Warning: sequence parallel moe do not support PD's decode node with cudagraph. We set use_sequence_parallel_moe to False."
 )
```
fastdeploy/config.py
Outdated
```python
if self.parallel_config.use_sequence_parallel_moe and self.graph_opt_config.use_cudagraph:
    self.parallel_config.use_sequence_parallel_moe = False
    logger.info(
        "Warning: sequence parallel moe do not support Mixed mode with cudagraph. We set use_sequence_parallel_moe to False."
```
Copilot AI · Nov 25, 2025
Grammar error: "sequence parallel moe do not support" should be "sequence parallel moe does not support". The subject is singular, so it requires "does not" instead of "do not".
Suggested change:

```diff
-"Warning: sequence parallel moe do not support Mixed mode with cudagraph. We set use_sequence_parallel_moe to False."
+"Warning: sequence parallel moe does not support Mixed mode with cudagraph. We set use_sequence_parallel_moe to False."
```
fastdeploy/config.py
Outdated
```python
if self.parallel_config.use_sequence_parallel_moe and self.graph_opt_config.use_cudagraph:
    self.parallel_config.use_sequence_parallel_moe = False
    logger.info(
        "Warning: sequence parallel moe do not support PD's decode node with cudagraph. We set use_sequence_parallel_moe to False."
```
Copilot AI · Nov 25, 2025
Grammar error: "sequence parallel moe do not support" should be "sequence parallel moe does not support". The subject is singular, so it requires "does not" instead of "do not".
fastdeploy/config.py
Outdated
```python
if self.parallel_config.use_sequence_parallel_moe and self.graph_opt_config.use_cudagraph:
    self.parallel_config.use_sequence_parallel_moe = False
    logger.info(
        "Warning: sequence parallel moe do not support Mixed mode with cudagraph. We set use_sequence_parallel_moe to False."
```
Copilot AI · Nov 25, 2025
The warning message starts with "Warning:" which is redundant when using a logger. When using logger.warning() (as recommended), the log level is already indicated by the logger. The message should directly state the issue.
Suggested change:

```diff
-"Warning: sequence parallel moe do not support Mixed mode with cudagraph. We set use_sequence_parallel_moe to False."
+"Sequence parallel MoE does not support Mixed mode with cudagraph. Setting use_sequence_parallel_moe to False."
```
fastdeploy/config.py
Outdated
```python
if self.parallel_config.use_sequence_parallel_moe and self.graph_opt_config.use_cudagraph:
    self.parallel_config.use_sequence_parallel_moe = False
    logger.info(
        "Warning: sequence parallel moe do not support PD's decode node with cudagraph. We set use_sequence_parallel_moe to False."
```
Copilot AI · Nov 25, 2025
The warning message starts with "Warning:" which is redundant when using a logger. When using logger.warning() (as recommended), the log level is already indicated by the logger. The message should directly state the issue.
Suggested change:

```diff
-"Warning: sequence parallel moe do not support PD's decode node with cudagraph. We set use_sequence_parallel_moe to False."
+"Sequence parallel MoE does not support PD's decode node with cudagraph. Setting use_sequence_parallel_moe to False."
```
Codecov Report
❌ Patch coverage is
Additional details and impacted files

```
@@ Coverage Diff @@
##           develop    #5222   +/-   ##
==========================================
  Coverage         ?    59.70%
==========================================
  Files            ?       317
  Lines            ?     38695
  Branches         ?      5818
==========================================
  Hits             ?     23104
  Misses           ?     13764
  Partials         ?      1827
```
Flags with carried forward coverage won't be shown.
Motivation
use_sequence_parallel_moe currently hangs when CUDAGraph is enabled in mixed mode or on the decode (D) node under PD disaggregation, so it is temporarily disabled by default in those cases.
Modifications
Usage or Command
Accuracy Tests
Checklist
- PR title tags: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run pre-commit before commit.
- For a release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.