[Feature] support nvfp4 tbo #7259
Codecov Report
❌ Patch coverage is …

Coverage diff for #7259 against develop:

| Metric | Value |
|---|---|
| Coverage | 73.84% |
| Files | 383 |
| Lines | 53548 |
| Branches | 8390 |
| Hits | 39543 |
| Misses | 11238 |
| Partials | 2767 |
PaddlePaddle-bot left a comment
📋 Review Summary
PR overview: add TBO (Two Batch Overlap) support for NVFP4 MoE
Scope of changes: model_executor/layers/quantization/nvfp4.py
Impact tags: [Feature] [Quantization]
📝 PR Convention Check
✅ The PR title `[Feature] support nvfp4 tbo` follows the convention
✅ The PR description fills in the Motivation field
Suggest completing the Checklist:
- Add at least a tag in the PR title. `[Feature]`
- Format your code, run `pre-commit` before commit.
- Add unit tests. Please write the reason in this PR if no unit tests.
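Issues
| Severity | File | Summary |
|---|---|---|
| 🟡 Suggestion | nvfp4.py:708 | The global dict `global_values` is only written to and never read; it looks like leftover debug code |
| 🟡 Suggestion | nvfp4.py:730 | Many tensors are stored in a global dict, which may cause a memory leak |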
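Overall Assessment
The PR implements TBO support for NVFP4 MoE, and the overall logic is correct. It follows the same implementation pattern as fused_moe_blackwell_backend.py and fused_moe_deepgemm_backend.py. One piece of leftover debug code remains; removing it is recommended to avoid a potential memory leak.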
    if self.ep_prefill_runner.ep_engine.async_finish:
        event.current_stream_wait()

    global global_values
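🟡 Suggestion: the global dict `global_values` is only written to and never read; it looks like leftover debug code.
Nothing in the codebase reads the tensors stored in `global_values`. Consider deleting the related code (lines 708-731, 797, and 807) or adding a comment that explains its debugging purpose.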
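If the capture is kept for debugging rather than deleted, one option is to gate it behind an explicit switch so that nothing is stored by default. The sketch below is only an illustration: the environment variable `NVFP4_TBO_DEBUG_CAPTURE` and the helper `capture_debug` are hypothetical names, not part of FastDeploy, and plain Python objects stand in for tensors.

```python
import os
import threading

# Hypothetical flag name (an assumption for this sketch, not a real FastDeploy setting).
_DEBUG_CAPTURE = os.environ.get("NVFP4_TBO_DEBUG_CAPTURE", "0") == "1"

# Module-level store, mirroring the `global_values` dict discussed in the review.
global_values = {}


def capture_debug(name, value, enabled=None):
    """Store `value` under the current thread's name, but only when capture is enabled."""
    if enabled is None:
        enabled = _DEBUG_CAPTURE
    if not enabled:
        # Default path: keep no reference, so the value can be freed as usual.
        return
    thread_name = threading.current_thread().name
    global_values.setdefault(thread_name, {})[name] = value
```

With the switch off, each call returns before touching the dict, so the module keeps no extra references to the values.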
    global_values[thread_name]["recv_x_scale"] = recv_x_scale
    global_values[thread_name]["recv_topk_idx"] = recv_topk_idx
    global_values[thread_name]["recv_topk_weights"] = recv_topk_weights
    global_values[thread_name]["handle"] = handle
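🟡 Suggestion: many tensors are stored in a global dict but never read, which may cause a memory leak.
`global_values` is a module-level global that stores several tensor objects, including `x`, `recv_x_value`, and `handle`. These objects stay referenced, so under high concurrency their memory may not be reclaimed promptly.
Recommendation: if these stores exist only for debugging, remove them from production code; if they must stay, consider using `weakref` or a periodic cleanup mechanism.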
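The difference between a strong module-level dict and a `weakref`-based store can be shown with a small, self-contained sketch. `FakeTensor` is a hypothetical stand-in class; whether a real tensor type accepts weak references is an assumption that would need checking before applying this to `nvfp4.py`.

```python
import gc
import weakref


class FakeTensor:
    """Hypothetical stand-in for a tensor; ordinary class instances support weak references."""

    def __init__(self, nbytes):
        self.nbytes = nbytes


strong_store = {}                            # behaves like the module-level global dict
weak_store = weakref.WeakValueDictionary()   # entries vanish once the value is unreferenced


def run_step(store, key):
    """Create a value, record it in `store`, and return a weak probe to observe its lifetime."""
    t = FakeTensor(nbytes=1 << 20)
    store[key] = t
    return weakref.ref(t)


strong_probe = run_step(strong_store, "recv_x_value")
weak_probe = run_step(weak_store, "recv_x_value")
gc.collect()

strong_alive = strong_probe() is not None   # the plain dict still holds the object
weak_alive = weak_probe() is not None       # the weak dict does not keep it alive

# Periodic cleanup alternative: clearing the strong dict releases the object.
strong_store.clear()
gc.collect()
alive_after_clear = strong_probe() is not None
```

In this sketch the object written to the plain dict outlives the function that created it until the dict is cleared, while the entry in the `WeakValueDictionary` disappears as soon as the last strong reference is gone.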
Motivation
PR overview: add TBO (Two Batch Overlap) support for NVFP4 MoE
Modifications
Usage or Command
Accuracy Tests
Checklist
- Tags: `[FDConfig]`, `[APIServer]`, `[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`
- … `pre-commit` before commit.
- … `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.