
[Optimization] Auto set num_max_dispatch_tokens_per_rank#7237

Merged
freeliuzc merged 6 commits into PaddlePaddle:develop from RichardWooSJTU:auto_dispatch_tokens on Apr 15, 2026

Conversation

RichardWooSJTU (Collaborator) commented Apr 8, 2026

Motivation

Currently, the num_max_dispatch_tokens_per_rank parameter required by low-latency EP communication can only be specified via config.json in the model directory, and it must be updated manually whenever max_num_seqs changes, which is not user-friendly.

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)


Modifications

The only scenarios that currently require changing this parameter are:

  1. Speculative decoding disabled: set it to max_num_seqs.
  2. Speculative decoding enabled: set it to max_num_seqs * (num_spec_tokens + 1).
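
The two cases above can be sketched as a small helper. This is only an illustration of the rule; compute_max_dispatch_tokens is a hypothetical name, not the PR's actual code, which lives in fastdeploy/config.py:

```python
def compute_max_dispatch_tokens(max_num_seqs, num_spec_tokens=None):
    """Illustrative sketch of the auto-set rule (hypothetical helper).

    - Speculative decoding off (num_spec_tokens is None): each request
      dispatches at most one token per step, so max_num_seqs is enough.
    - Speculative decoding on: each request may dispatch the verified
      token plus num_spec_tokens draft tokens per step.
    """
    if num_spec_tokens is None:
        return max_num_seqs
    return max_num_seqs * (num_spec_tokens + 1)


# e.g. max_num_seqs=128 without speculative decoding -> 128
# e.g. max_num_seqs=128 with num_spec_tokens=3 -> 128 * 4 = 512
```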

Usage or Command

Accuracy Tests

Checklist

  • Add at least one tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If no unit tests are added, please explain why in this PR.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.


paddle-bot bot commented Apr 8, 2026

Thanks for your contribution!



codecov-commenter commented Apr 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@e0a1653).

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7237   +/-   ##
==========================================
  Coverage           ?   73.52%           
==========================================
  Files              ?      383           
  Lines              ?    53644           
  Branches           ?     8421           
==========================================
  Hits               ?    39440           
  Misses             ?    11524           
  Partials           ?     2680           
Flag | Coverage Δ
GPU  | 73.52% <100.00%> (?)

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.


PaddlePaddle-bot left a comment:

🤖 AI Code Review | 2026-04-15

📋 Review Summary

PR overview: automatically computes and sets the num_max_dispatch_tokens_per_rank parameter, adjusting it dynamically based on max_num_seqs and the speculative decoding configuration.

Scope of change: fastdeploy/config.py

Impact tag: [FDConfig]

Issues

Severity | File | Summary
🔴 Bug | fastdeploy/config.py:2173 | Scoping error: num_spec_tokens is undefined when speculative decoding is disabled, risking a NameError

Overall assessment

The optimization intent of this PR is sound and improves the user experience (no manual config syncing needed), but there is one blocking bug to fix: num_spec_tokens is not defined when speculative decoding is disabled, which can cause a runtime error. After fixing, unit tests covering the different scenarios are recommended.

Comment thread on fastdeploy/config.py:
f"Auto-setting num_max_dispatch_tokens_per_rank from "
f"{self.model_config.num_max_dispatch_tokens_per_rank} to {auto_dispatch_tokens} "
f"(max_num_seqs={self.scheduler_config.max_num_seqs}"
f"{f', num_speculative_tokens={num_spec_tokens}' if self.speculative_config is not None and self.speculative_config.method is not None else ''})."

🔴 Bug: variable scoping error

num_spec_tokens is only assigned when speculative decoding is enabled (around line 2161), but the log message at line 2173 references it. When speculative decoding is disabled, num_spec_tokens is undefined, which risks a NameError.

Although the f-string wraps the reference in a conditional if ... else '', a name that is only assigned inside the speculative-decoding branch is not guaranteed to exist on other code paths, so num_spec_tokens should be defined before it is used.

Suggested fix

# Define num_spec_tokens on every path before it is used
if self.speculative_config is not None and self.speculative_config.method is not None:
    num_spec_tokens = self.speculative_config.num_speculative_tokens
    auto_dispatch_tokens = self.scheduler_config.max_num_seqs * (num_spec_tokens + 1)
else:
    num_spec_tokens = 0
    auto_dispatch_tokens = self.scheduler_config.max_num_seqs

Alternatively, use a simpler expression in the log message:

logger.info(
    f"Auto-setting num_max_dispatch_tokens_per_rank from "
    f"{self.model_config.num_max_dispatch_tokens_per_rank} to {auto_dispatch_tokens} "
    f"(max_num_seqs={self.scheduler_config.max_num_seqs}"
    f"{', num_speculative_tokens=' + str(self.speculative_config.num_speculative_tokens) if self.speculative_config is not None and self.speculative_config.method is not None else ''})."
)
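
Since the review recommends unit tests covering both scenarios, a minimal sketch could look like the following. The auto_dispatch_tokens function here is a standalone stand-in for the logic in fastdeploy/config.py; the real code operates on FDConfig objects, so the actual wiring would differ:

```python
import unittest


def auto_dispatch_tokens(max_num_seqs, speculative_method=None, num_speculative_tokens=0):
    # Mirrors the suggested fix: num_spec_tokens is defined on every path.
    if speculative_method is not None:
        num_spec_tokens = num_speculative_tokens
        return max_num_seqs * (num_spec_tokens + 1)
    return max_num_seqs


class TestAutoDispatchTokens(unittest.TestCase):
    def test_speculative_decoding_disabled(self):
        self.assertEqual(auto_dispatch_tokens(64), 64)

    def test_speculative_decoding_enabled(self):
        # e.g. speculation with one draft token per step doubles the budget
        self.assertEqual(auto_dispatch_tokens(64, "mtp", 1), 128)


if __name__ == "__main__":
    unittest.main(exit=False)
```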

@freeliuzc freeliuzc merged commit dec0b06 into PaddlePaddle:develop Apr 15, 2026
35 of 38 checks passed
RichardWooSJTU added a commit to RichardWooSJTU/FastDeploy that referenced this pull request Apr 16, 2026
…e#7237)

* auto set num_max_dispatch_tokens_per_rank

* fix ci

* fix ci

* fix ci
RichardWooSJTU added a commit that referenced this pull request Apr 16, 2026
)(#7426) (#7436)

* [Optimization] Auto set num_max_dispatch_tokens_per_rank (#7237)

* auto set num_max_dispatch_tokens_per_rank

* fix ci

* fix ci

* fix ci

* fix deep gemm import (#7425)

* allow parallel dp starting (#7426)
4 participants