[Optimization] Auto set num_max_dispatch_tokens_per_rank #7237

freeliuzc merged 6 commits into PaddlePaddle:develop from
Conversation

Thanks for your contribution!
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff            @@
##           develop   #7237   +/-   ##
==========================================
  Coverage         ?   73.52%
==========================================
  Files            ?      383
  Lines            ?    53644
  Branches         ?     8421
==========================================
  Hits             ?    39440
  Misses           ?    11524
  Partials         ?     2680
```
PaddlePaddle-bot
left a comment
🤖 AI Code Review (2026-04-15)
📋 Review Summary

PR overview: automatically computes and sets the num_max_dispatch_tokens_per_rank parameter, adjusting it dynamically based on max_num_seqs and the speculative decoding configuration.

Changed files: fastdeploy/config.py

Impact tag: [FDConfig]

Issues

| Level | File | Summary |
|---|---|---|
| 🔴 Bug | fastdeploy/config.py:2173 | Scoping error on the variable num_spec_tokens; raises a NameError when speculative decoding is disabled |

Overall assessment

The optimization intent of this PR is sound and improves the user experience (no manual synchronization of the config is needed), but it contains one blocking bug that must be fixed: num_spec_tokens is undefined when speculative decoding is disabled, causing a runtime error. After fixing it, adding unit tests covering the different scenarios is recommended.
```python
f"Auto-setting num_max_dispatch_tokens_per_rank from "
f"{self.model_config.num_max_dispatch_tokens_per_rank} to {auto_dispatch_tokens} "
f"(max_num_seqs={self.scheduler_config.max_num_seqs}"
f"{f', num_speculative_tokens={num_spec_tokens}' if self.speculative_config is not None and self.speculative_config.method is not None else ''})."
```
🔴 Bug: variable scoping error

The variable num_spec_tokens is only defined when speculative decoding is enabled (line 2161), yet it is referenced in the log message at line 2173. When speculative decoding is disabled, num_spec_tokens is undefined and referencing it raises a NameError.

Even though the f-string guards the reference with a conditional `if ... else ''`, defining num_spec_tokens in only one branch is fragile: if the guard in the log line ever diverges from the condition under which the variable is defined, the reference fails at runtime.
Suggested fix:

```python
# Define num_spec_tokens before it is used
num_spec_tokens = (
    getattr(self.speculative_config, "num_speculative_tokens", 0)
    if self.speculative_config is not None and self.speculative_config.method is not None
    else 0
)
if self.speculative_config is not None and self.speculative_config.method is not None:
    num_spec_tokens = self.speculative_config.num_speculative_tokens
    auto_dispatch_tokens = self.scheduler_config.max_num_seqs * (num_spec_tokens + 1)
else:
    auto_dispatch_tokens = self.scheduler_config.max_num_seqs
```

Alternatively, use a simpler expression in the log message:

```python
logger.info(
    f"Auto-setting num_max_dispatch_tokens_per_rank from "
    f"{self.model_config.num_max_dispatch_tokens_per_rank} to {auto_dispatch_tokens} "
    f"(max_num_seqs={self.scheduler_config.max_num_seqs}"
    f"{', num_speculative_tokens=' + str(self.speculative_config.num_speculative_tokens) if self.speculative_config is not None and self.speculative_config.method is not None else ''})."
)
```
Motivation
Currently, the num_max_dispatch_tokens_per_rank parameter required by low-latency EP communication can only be specified through config.json in the model directory; whenever max_num_seqs changes, the parameter also has to be updated by hand, which is not user-friendly.
Modifications
Considering that the only scenario that currently requires changing this parameter is a change to max_num_seqs, the parameter can be derived automatically as max_num_seqs * (num_spec_tokens + 1).

Usage or Command
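The auto-set rule described above can be sketched as follows. This is a minimal illustration only: SchedulerConfig and SpeculativeConfig below are simplified stand-ins, not FastDeploy's actual config classes, and "mtp" is just a placeholder method name; only the formula max_num_seqs * (num_spec_tokens + 1) comes from this PR.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SchedulerConfig:
    # Simplified stand-in for FastDeploy's scheduler config
    max_num_seqs: int

@dataclass
class SpeculativeConfig:
    # Simplified stand-in; method=None means speculative decoding is off
    method: Optional[str] = None
    num_speculative_tokens: int = 0

def auto_dispatch_tokens(scheduler: SchedulerConfig,
                         speculative: Optional[SpeculativeConfig]) -> int:
    """Derive num_max_dispatch_tokens_per_rank from the batch size."""
    if speculative is not None and speculative.method is not None:
        # With speculative decoding, each sequence can dispatch the target
        # token plus num_speculative_tokens draft tokens per step.
        return scheduler.max_num_seqs * (speculative.num_speculative_tokens + 1)
    # Without speculative decoding, one token per sequence per step.
    return scheduler.max_num_seqs

print(auto_dispatch_tokens(SchedulerConfig(max_num_seqs=64), None))  # 64
print(auto_dispatch_tokens(SchedulerConfig(max_num_seqs=64),
                           SpeculativeConfig(method="mtp",
                                             num_speculative_tokens=3)))  # 64 * 4 = 256
```

With this rule in place, the user only needs to set max_num_seqs; the dispatch-token bound follows automatically instead of being read from the model's config.json.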
Accuracy Tests
Checklist
- Add at least one tag in the PR title from: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Format your code: run pre-commit before commit.
- If merging to the release branch, make sure the PR has been submitted to the develop branch first, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.