[FDConfig] disable use_sequence_parallel_moe default #5222
Conversation
Thanks for your contribution!
Pull request overview
This PR temporarily disables use_sequence_parallel_moe by default when CUDAGraph is enabled in mixed mode and PD disaggregation decode nodes to work around a hang issue. The change adds configuration checks that automatically set use_sequence_parallel_moe to False when these incompatible conditions are detected.
- Adds automatic disabling of `use_sequence_parallel_moe` when using CUDAGraph in mixed and decode splitwise modes
- Removes trailing whitespace from `requirements.txt`
Reviewed changes
Copilot reviewed 1 out of 2 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| fastdeploy/config.py | Adds conditional logic to disable use_sequence_parallel_moe for mixed and decode modes when CUDAGraph is enabled, with corresponding warning messages |
| requirements.txt | Removes trailing whitespace from line 40 (formatting fix) |
fastdeploy/config.py
Outdated
```python
        if self.parallel_config.use_sequence_parallel_moe and self.graph_opt_config.use_cudagraph:
            self.parallel_config.use_sequence_parallel_moe = False
            logger.info(
                "Warning: sequence parallel moe do not support Mixed mode with cudagraph. We set use_sequence_parallel_moe to False."
            )
        self.model_config.moe_phase = MoEPhase(phase="prefill")
    elif self.scheduler_config.splitwise_role == "prefill":
        self.model_config.moe_phase = MoEPhase(phase="prefill")
    elif self.scheduler_config.splitwise_role == "decode":
        if self.parallel_config.use_sequence_parallel_moe and self.graph_opt_config.use_cudagraph:
            self.parallel_config.use_sequence_parallel_moe = False
            logger.info(
                "Warning: sequence parallel moe do not support PD's decode node with cudagraph. We set use_sequence_parallel_moe to False."
            )
```
Copilot AI · Nov 25, 2025
This code duplicates the same logic for disabling use_sequence_parallel_moe when using cudagraph in both "mixed" and "decode" modes. Consider extracting this into a helper method or consolidating the check to improve maintainability and reduce code duplication.
Example refactoring:

```python
def _disable_sequence_parallel_moe_if_needed(self, mode_name):
    if self.parallel_config.use_sequence_parallel_moe and self.graph_opt_config.use_cudagraph:
        self.parallel_config.use_sequence_parallel_moe = False
        logger.warning(
            f"Sequence parallel MoE does not support {mode_name} mode with cudagraph. "
            "Setting use_sequence_parallel_moe to False."
        )

# Then use:
if self.scheduler_config.splitwise_role == "mixed":
    self._disable_sequence_parallel_moe_if_needed("Mixed")
    self.model_config.moe_phase = MoEPhase(phase="prefill")
elif self.scheduler_config.splitwise_role == "decode":
    self._disable_sequence_parallel_moe_if_needed("PD's decode node")
    self.model_config.moe_phase = MoEPhase(phase="decode")
```
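The suggested refactor can be exercised in isolation with stub config objects. The sketch below is illustrative only: `FDConfigSketch` and its `SimpleNamespace` attributes are hypothetical stand-ins for FastDeploy's real `FDConfig`, kept just close enough to mirror the attribute paths used in the diff.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger("fastdeploy")


class FDConfigSketch:
    """Minimal stand-in for FastDeploy's FDConfig, for illustration only."""

    def __init__(self, role, use_sp_moe, use_cudagraph):
        self.scheduler_config = SimpleNamespace(splitwise_role=role)
        self.parallel_config = SimpleNamespace(use_sequence_parallel_moe=use_sp_moe)
        self.graph_opt_config = SimpleNamespace(use_cudagraph=use_cudagraph)

    def _disable_sequence_parallel_moe_if_needed(self, mode_name):
        # Shared check: sequence-parallel MoE hangs with CUDAGraph in these modes,
        # so the flag is forced off and the user is warned.
        if self.parallel_config.use_sequence_parallel_moe and self.graph_opt_config.use_cudagraph:
            self.parallel_config.use_sequence_parallel_moe = False
            logger.warning(
                f"Sequence parallel MoE does not support {mode_name} mode with cudagraph. "
                "Setting use_sequence_parallel_moe to False."
            )

    def apply_checks(self):
        # Only the mixed role and the PD-disaggregation decode role are affected.
        if self.scheduler_config.splitwise_role == "mixed":
            self._disable_sequence_parallel_moe_if_needed("Mixed")
        elif self.scheduler_config.splitwise_role == "decode":
            self._disable_sequence_parallel_moe_if_needed("PD's decode node")


cfg = FDConfigSketch("mixed", use_sp_moe=True, use_cudagraph=True)
cfg.apply_checks()
print(cfg.parallel_config.use_sequence_parallel_moe)  # False: auto-disabled

cfg2 = FDConfigSketch("prefill", use_sp_moe=True, use_cudagraph=True)
cfg2.apply_checks()
print(cfg2.parallel_config.use_sequence_parallel_moe)  # True: prefill is unaffected
```

Centralizing the check this way means a future fix for the hang only needs to touch one method.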
fastdeploy/config.py
Outdated
```python
if self.scheduler_config.splitwise_role == "mixed":
    if self.parallel_config.use_sequence_parallel_moe and self.graph_opt_config.use_cudagraph:
        self.parallel_config.use_sequence_parallel_moe = False
        logger.info(
```
Copilot AI · Nov 25, 2025
Use logger.warning() instead of logger.info() for warning messages. This message is alerting users about an automatic configuration change due to incompatibility, which should be logged at the warning level for better visibility and consistency with similar patterns in the codebase (see line 1648).
Suggested change:

```diff
-logger.info(
+logger.warning(
     "Warning: sequence parallel moe do not support Mixed mode with cudagraph. We set use_sequence_parallel_moe to False."
 )
```
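The practical difference is easy to see with the standard `logging` module. This is a small self-contained sketch, independent of FastDeploy's actual logger configuration: with a handler at the common default level of `WARNING`, an `info()` record is dropped entirely while a `warning()` record reaches the user.

```python
import io
import logging

# Capture log output in a string buffer so the effect is visible.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))

logger = logging.getLogger("fastdeploy.config.demo")
logger.addHandler(handler)
logger.setLevel(logging.WARNING)  # common default threshold

logger.info("use_sequence_parallel_moe set to False")     # filtered out silently
logger.warning("use_sequence_parallel_moe set to False")  # emitted

print(stream.getvalue().strip())  # WARNING: use_sequence_parallel_moe set to False
```

Since the message announces an automatic change to user-supplied configuration, logging it at a level that cannot be filtered out by default settings is the safer choice.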
fastdeploy/config.py
Outdated
```python
elif self.scheduler_config.splitwise_role == "decode":
    if self.parallel_config.use_sequence_parallel_moe and self.graph_opt_config.use_cudagraph:
        self.parallel_config.use_sequence_parallel_moe = False
        logger.info(
```
Copilot AI · Nov 25, 2025
Use logger.warning() instead of logger.info() for warning messages. This message is alerting users about an automatic configuration change due to incompatibility, which should be logged at the warning level for better visibility and consistency with similar patterns in the codebase (see line 1648).
Suggested change:

```diff
-logger.info(
+logger.warning(
     "Warning: sequence parallel moe do not support PD's decode node with cudagraph. We set use_sequence_parallel_moe to False."
 )
```
fastdeploy/config.py
Outdated
```python
if self.parallel_config.use_sequence_parallel_moe and self.graph_opt_config.use_cudagraph:
    self.parallel_config.use_sequence_parallel_moe = False
    logger.info(
        "Warning: sequence parallel moe do not support Mixed mode with cudagraph. We set use_sequence_parallel_moe to False."
```
Copilot AI · Nov 25, 2025
Grammar error: "sequence parallel moe do not support" should be "sequence parallel moe does not support". The subject is singular, so it requires "does not" instead of "do not".
Suggested change:

```diff
-"Warning: sequence parallel moe do not support Mixed mode with cudagraph. We set use_sequence_parallel_moe to False."
+"Warning: sequence parallel moe does not support Mixed mode with cudagraph. We set use_sequence_parallel_moe to False."
```
fastdeploy/config.py
Outdated
```python
if self.parallel_config.use_sequence_parallel_moe and self.graph_opt_config.use_cudagraph:
    self.parallel_config.use_sequence_parallel_moe = False
    logger.info(
        "Warning: sequence parallel moe do not support PD's decode node with cudagraph. We set use_sequence_parallel_moe to False."
```
Copilot AI · Nov 25, 2025
Grammar error: "sequence parallel moe do not support" should be "sequence parallel moe does not support". The subject is singular, so it requires "does not" instead of "do not".
fastdeploy/config.py
Outdated
```python
if self.parallel_config.use_sequence_parallel_moe and self.graph_opt_config.use_cudagraph:
    self.parallel_config.use_sequence_parallel_moe = False
    logger.info(
        "Warning: sequence parallel moe do not support Mixed mode with cudagraph. We set use_sequence_parallel_moe to False."
```
Copilot AI · Nov 25, 2025
The warning message starts with "Warning:" which is redundant when using a logger. When using logger.warning() (as recommended), the log level is already indicated by the logger. The message should directly state the issue.
Suggested change:

```diff
-"Warning: sequence parallel moe do not support Mixed mode with cudagraph. We set use_sequence_parallel_moe to False."
+"Sequence parallel MoE does not support Mixed mode with cudagraph. Setting use_sequence_parallel_moe to False."
```
fastdeploy/config.py
Outdated
```python
if self.parallel_config.use_sequence_parallel_moe and self.graph_opt_config.use_cudagraph:
    self.parallel_config.use_sequence_parallel_moe = False
    logger.info(
        "Warning: sequence parallel moe do not support PD's decode node with cudagraph. We set use_sequence_parallel_moe to False."
```
Copilot AI · Nov 25, 2025
The warning message starts with "Warning:" which is redundant when using a logger. When using logger.warning() (as recommended), the log level is already indicated by the logger. The message should directly state the issue.
Suggested change:

```diff
-"Warning: sequence parallel moe do not support PD's decode node with cudagraph. We set use_sequence_parallel_moe to False."
+"Sequence parallel MoE does not support PD's decode node with cudagraph. Setting use_sequence_parallel_moe to False."
```
Codecov Report
❌ Patch coverage is
Additional details and impacted files

```
@@ Coverage Diff @@
##           develop    #5222   +/-   ##
==========================================
  Coverage         ?    59.70%
==========================================
  Files            ?       317
  Lines            ?     38695
  Branches         ?      5818
==========================================
  Hits             ?     23104
  Misses           ?     13764
  Partials         ?      1827
```
Flags with carried forward coverage won't be shown.
Motivation
use_sequence_parallel_moe currently hangs when CUDAGraph is enabled in mixed mode or on the decode (D) node under PD disaggregation, so it is temporarily disabled by default in those cases.
Modifications
Usage or Command
Accuracy Tests
Checklist
- PR title tags: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run pre-commit before commit.
- For a release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.