[Cherry-Pick][Optimization] Enable text-only deployment for multimodal models #7234
fastdeploy-bot
left a comment
🤖 AI Code Review |
2026-04-08 12:17 CST
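📋 Review Summary
PR overview: Provides a text-only deployment mode for multimodal models. The --deploy-modality 'text' switch yields a clean text-only runtime and improves QPS by roughly 2.5x.
Scope of changes: config, engine, worker, attention backend
Impact tags: [Optimization] [Engine] [Models]
Issues
| Level | File | Summary |
|---|---|---|
| 🟡 Suggestion | fastdeploy/worker/input_batch.py:831 | Duplicate tensor initialization in ProposerInputBatch |
| ❓ Question | fastdeploy/config.py:1941 | Design choice of directly modifying model_config attributes |
Overall assessment
The core logic of the PR is correct. The new enable_mm_runtime and enable_rope_3d_runtime properties cleanly separate "does the model support multimodal input" from "is the multimodal runtime enabled", and the test files update their mock configs accordingly. The duplicate initialization code in ProposerInputBatch.init_share_inputs should be fixed.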
        -1,
        dtype="int32",
    )
    self.attn_mask_offsets = paddle.full(
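🟡 Suggestion: The code contains duplicate tensor initialization. attn_mask_offsets, attn_mask_offsets_full and attn_mask_offsets_decoder are each initialized twice in the same method (lines 817-821 and lines 831-839). Because the arguments are identical, behavior is unaffected, but the repeated allocation wastes memory and compute.
Consider deleting the duplicate initialization at lines 831-839.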
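Duplicates of this kind can be caught mechanically. The following is a minimal sketch, using only the Python standard library, that counts `self.<attr> = ...` assignments inside one method and reports any attribute assigned more than once. The function name `repeated_self_assignments`, the `Batch` class and the `make` call in the sample are hypothetical and are not part of FastDeploy.

```python
import ast
from collections import Counter


def repeated_self_assignments(source, method_name):
    """Return {attr: count} for attributes assigned more than once
    via `self.<attr> = ...` inside the named method."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and node.name == method_name:
            for stmt in ast.walk(node):
                if not isinstance(stmt, ast.Assign):
                    continue
                for target in stmt.targets:
                    if (
                        isinstance(target, ast.Attribute)
                        and isinstance(target.value, ast.Name)
                        and target.value.id == "self"
                    ):
                        counts[target.attr] += 1
    return {name: n for name, n in counts.items() if n > 1}


# Hypothetical sample that mimics the pattern flagged in the review.
SAMPLE = '''
class Batch:
    def init_share_inputs(self):
        self.attn_mask_offsets = make(1)
        self.other = make(2)
        self.attn_mask_offsets = make(1)
'''

print(repeated_self_assignments(SAMPLE, "init_share_inputs"))
```

The check is purely syntactic, so it also reports attributes that are assigned in separate `if`/`else` branches; such hits need a manual look before anything is deleted.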
    logger.info(
        "Deploy modality is text; forcing the multimodal-capable model onto the 2D RoPE runtime path."
    )
    setattr(self.model_config, "rope_3d", False)
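❓ Question: This code directly modifies the rope_3d and use_3d_rope attributes of model_config. It is not a bug, and the accompanying log message explains the change, but mutating the model config object in place may not be best practice.
A better approach would be for the enable_rope_3d_runtime property to check deploy_modality directly instead of modifying model_config. That leaves the original config object untouched and makes the logic clearer.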
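One possible shape for that alternative is sketched below. The classes `ModelConfig` and `RuntimeConfig` are hypothetical stand-ins and not FastDeploy's actual config classes; the default value `"mixed"` for `deploy_modality` is an assumption, and only `"text"` comes from the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    # Hypothetical stand-in: describes what the model is capable of.
    rope_3d: bool = True


@dataclass
class RuntimeConfig:
    # Hypothetical wrapper around the model config.
    model_config: ModelConfig
    deploy_modality: str = "mixed"  # assumed default; "text" is the PR's value

    @property
    def enable_rope_3d_runtime(self) -> bool:
        # Derive the runtime decision on read; never write to model_config.
        if self.deploy_modality == "text":
            return False
        return self.model_config.rope_3d


cfg = RuntimeConfig(ModelConfig(rope_3d=True), deploy_modality="text")
print(cfg.enable_rope_3d_runtime)  # runtime takes the 2D RoPE path
print(cfg.model_config.rope_3d)    # model capability is left as loaded
```

Freezing the model config in the sketch makes any accidental `setattr` fail loudly, which is one way to enforce the separation between capability and runtime choice.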
Codecov Report ❌
Coverage diff for release/2.5, #7234:
| Metric | release/2.5 | #7234 |
|---|---|---|
| Coverage | ? | 69.48% |
| Files | ? | 390 |
| Lines | ? | 54382 |
| Branches | ? | 8574 |
| Hits | ? | 37788 |
| Misses | ? | 13866 |
| Partials | ? | 2728 |
Merged commit 46ad25d into PaddlePaddle:release/2.5
Motivation
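When a multimodal model is deployed with the --deploy-modality 'text' switch enabled, the service gets a clean text-only runtime, with no leftover multimodal components competing for service resources or slowing inference. Benefit: with the switch enabled, the xx multimodal model reaches 2.5x the QPS on a text-only benchmark.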
Modifications
enable_mm indicates that the model has multimodal capability. enable_mm_runtime indicates that the multimodal runtime is in use; enable_mm_runtime=false means the text-only runtime.
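The distinction between the two flags can be illustrated with a minimal sketch. `DeployFlags`, `runtime_components`, the component names and the `"mixed"` default are all hypothetical; only the names enable_mm, enable_mm_runtime and the `"text"` modality come from this PR.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DeployFlags:
    # Hypothetical container, not FastDeploy's real config object.
    enable_mm: bool                 # capability: the model can take multimodal input
    deploy_modality: str = "mixed"  # assumed default; "text" is the PR's value

    @property
    def enable_mm_runtime(self) -> bool:
        # Runtime choice: multimodal path only if capable AND not text-only.
        return self.enable_mm and self.deploy_modality != "text"


def runtime_components(flags: DeployFlags) -> List[str]:
    """Pick components by the runtime flag, not the capability flag."""
    components = ["tokenizer", "text_decoder"]  # illustrative names only
    if flags.enable_mm_runtime:
        components.append("vision_encoder")
    return components


print(runtime_components(DeployFlags(enable_mm=True)))
print(runtime_components(DeployFlags(enable_mm=True, deploy_modality="text")))
```

Code that sizes buffers or builds preprocessing pipelines would branch on enable_mm_runtime, while code that needs to know what the checkpoint contains would keep reading enable_mm.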
Usage or Command
Start the multimodal model service with the --deploy-modality 'text' switch.
Accuracy Tests
On the Base model, with --deploy-modality 'text' turned on and off, text-only requests produce identical input tokens and output tokens.
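A comparison of this kind could be scripted roughly as follows. This is only a sketch: the function `mismatched_requests`, the request ids and the token ids are made-up placeholders, and how the token ids are collected from the two service runs is left out.

```python
from typing import Dict, List


def mismatched_requests(
    baseline: Dict[str, List[int]], text_only: Dict[str, List[int]]
) -> List[str]:
    """Return request ids whose token ids differ between the two runs,
    including ids that are present in only one run."""
    all_ids = sorted(set(baseline) | set(text_only))
    return [rid for rid in all_ids if baseline.get(rid) != text_only.get(rid)]


# Placeholder data, not real measurements.
run_switch_off = {"req-1": [101, 7, 42], "req-2": [101, 9]}
run_switch_on = {"req-1": [101, 7, 42], "req-2": [101, 9]}

print(mismatched_requests(run_switch_off, run_switch_on))
```

An empty result means every request produced the same token sequence in both runs; the same helper can be applied once to input token ids and once to output token ids.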
Checklist
Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
Run pre-commit before commit.
For a PR targeting the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.