[Cherry-Pick][Optimization] Enable text-only deployment for multimodal models by K11OntheBoat · Pull Request #7234 · PaddlePaddle/FastDeploy

K11OntheBoat · 2026-04-08T03:48:34Z

Motivation

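When deploying a multimodal model with the --deploy-modality 'text' switch enabled, the service gets a clean text-only runtime, with no leftover multimodal components consuming service resources or degrading inference performance. Benefit: with the switch enabled, the xx multimodal model shows a 2.5x QPS improvement on a text-only benchmark.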

Modifications

enable_mm indicates that the model has multimodal capability. enable_mm_runtime indicates that the multimodal runtime is in use; enable_mm_runtime=false means the text-only runtime.
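
A minimal sketch of how the two flags might relate, assuming the runtime flag is derived from the model capability and the --deploy-modality value. The function name `resolve_mm_runtime` and the non-text value `"mixed"` are hypothetical and are not taken from the FastDeploy code.

```python
def resolve_mm_runtime(enable_mm: bool, deploy_modality: str) -> bool:
    """Return True when the multimodal runtime should be enabled.

    enable_mm: the model has multimodal capability.
    deploy_modality: "text" requests a text-only runtime; any other
    value (for example the hypothetical "mixed") keeps the default.
    """
    return enable_mm and deploy_modality != "text"


# Hypothetical combinations, printed as a small truth table.
for enable_mm in (True, False):
    for modality in ("mixed", "text"):
        runtime = resolve_mm_runtime(enable_mm, modality)
        print(f"enable_mm={enable_mm} deploy_modality={modality!r} -> enable_mm_runtime={runtime}")
```

Under these assumptions, only a multimodal-capable model deployed without the text switch ends up on the multimodal runtime.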

Usage or Command

Start the multimodal model service with the --deploy-modality 'text' switch.

Accuracy Tests

Base model: with --deploy-modality 'text' turned on and off, the input tokens and output tokens of text-only requests are identical.
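
One way such a check could be scripted is sketched below. It assumes the token ids of each request have already been collected from both runs into dictionaries keyed by request id; the helper name `find_mismatches` and the record layout are hypothetical, and the token ids in the usage lines are made-up placeholders, not results from this PR.

```python
from typing import Dict, List, Tuple

# (input token ids, output token ids) for one request.
TokenRecord = Tuple[List[int], List[int]]


def find_mismatches(
    baseline: Dict[str, TokenRecord],
    text_only: Dict[str, TokenRecord],
) -> List[str]:
    """Return the sorted request ids whose token ids differ between runs.

    A request that is present in only one of the two runs also counts
    as a mismatch.
    """
    mismatched = []
    for request_id in sorted(set(baseline) | set(text_only)):
        if baseline.get(request_id) != text_only.get(request_id):
            mismatched.append(request_id)
    return mismatched


# Placeholder data for illustration only.
run_without_switch = {"req-0": ([1, 2, 3], [4, 5])}
run_with_switch = {"req-0": ([1, 2, 3], [4, 5])}
print(find_mismatches(run_without_switch, run_with_switch))
```

An empty list would correspond to the "identical tokens" outcome described above.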

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2026-04-08T03:48:42Z

Thanks for your contribution!

CLAassistant · 2026-04-08T03:48:42Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

liuruian seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.

fastdeploy-bot

🤖 AI Code Review | 2026-04-08 12:17 CST

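📋 Review Summary

PR overview: Provides a text-only deployment mode for multimodal models. The --deploy-modality 'text' switch yields a clean text-only runtime and improves QPS by about 2.5x.

Scope of changes: config, engine, worker, attention backend

Impact tags: [Optimization] [Engine] [Models]

Issues

Level	File	Summary
🟡 Suggestion	`fastdeploy/worker/input_batch.py:831`	Duplicate tensor initialization in ProposerInputBatch
❓ Question	`fastdeploy/config.py:1941`	Design choice of modifying model_config attributes directly

Overall Assessment

The core logic of the PR is correct. Introducing the enable_mm_runtime and enable_rope_3d_runtime properties cleanly separates "whether the model supports multimodality" from "whether the multimodal runtime is enabled", and the test files update their mock configs accordingly. Recommend fixing the duplicated initialization code in ProposerInputBatch.init_share_inputs.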

fastdeploy-bot · 2026-04-08T04:17:13Z

fastdeploy/worker/input_batch.py

                -1,
                dtype="int32",
            )
+            self.attn_mask_offsets = paddle.full(


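🟡 Suggestion: The code contains duplicate tensor initialization. attn_mask_offsets, attn_mask_offsets_full, and attn_mask_offsets_decoder are each initialized twice in the same method (lines 817-821 and lines 831-839). Because the arguments are identical, behavior is unaffected, but the second pass wastes memory allocation and compute.

Recommend deleting the duplicated initialization at lines 831-839.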
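
The suggestion can be illustrated with a framework-free sketch in which each buffer is allocated exactly once. Everything here is a stand-in: `full` mimics `paddle.full` with nested lists, and the class name, the shape `[max_num_seqs, 1]`, and the fill value are assumptions rather than the actual code in `fastdeploy/worker/input_batch.py`.

```python
def full(shape, fill_value):
    """Stand-in for paddle.full: build a 2D nested list of fill_value."""
    rows, cols = shape
    return [[fill_value] * cols for _ in range(rows)]


class ProposerInputBatchSketch:
    """Hypothetical, simplified stand-in for ProposerInputBatch."""

    # The three buffers named in the review comment.
    MASK_BUFFERS = (
        "attn_mask_offsets",
        "attn_mask_offsets_full",
        "attn_mask_offsets_decoder",
    )

    def __init__(self, max_num_seqs):
        self.max_num_seqs = max_num_seqs
        self.alloc_count = 0  # counts allocations so duplicates are visible

    def _alloc(self, shape, fill_value):
        self.alloc_count += 1
        return full(shape, fill_value)

    def init_share_inputs(self):
        # Allocate each buffer once; the shape and fill value are assumed.
        for name in self.MASK_BUFFERS:
            setattr(self, name, self._alloc([self.max_num_seqs, 1], -1))


batch = ProposerInputBatchSketch(max_num_seqs=4)
batch.init_share_inputs()
print(batch.alloc_count)
```

Counting allocations this way is only an illustration device; in the real method, removing the second block of assignments would have the same effect.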

fastdeploy-bot · 2026-04-08T04:17:13Z

fastdeploy/config.py

+                logger.info(
+                    "Deploy modality is text; forcing the multimodal-capable model onto the 2D RoPE runtime path."
+                )
+            setattr(self.model_config, "rope_3d", False)


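❓ Question: This directly modifies the rope_3d and use_3d_rope attributes of model_config. It is not a bug, and the code logs the change, but mutating the model config object directly may not be best practice.

A better approach would be for the enable_rope_3d_runtime property to check deploy_modality directly instead of modifying model_config. That avoids mutating the original config object and keeps the logic clearer.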
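
A rough sketch of the alternative described in this comment, assuming the property can see both the model config and the deploy modality. The class names, the default value `"mixed"`, and the field layout are hypothetical and do not reflect the real `fastdeploy/config.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfigSketch:
    """Hypothetical model config; frozen so it cannot be mutated."""

    rope_3d: bool = False


@dataclass
class FDConfigSketch:
    """Hypothetical top-level config holding the deploy modality."""

    model_config: ModelConfigSketch
    deploy_modality: str = "mixed"  # assumed default, not from the source

    @property
    def enable_rope_3d_runtime(self) -> bool:
        # Derive the runtime decision on read instead of overwriting
        # model_config.rope_3d when the deployment is text-only.
        return self.model_config.rope_3d and self.deploy_modality != "text"


cfg = FDConfigSketch(ModelConfigSketch(rope_3d=True), deploy_modality="text")
print(cfg.enable_rope_3d_runtime, cfg.model_config.rope_3d)
```

With this shape, the model's original rope_3d value stays available for inspection while the runtime path still falls back to 2D RoPE for text-only deployments.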

codecov-commenter · 2026-04-08T05:11:01Z

Codecov Report

❌ Patch coverage is 60.46512% with 17 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release/2.5@2d6fa35). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
fastdeploy/config.py	57.14%	4 Missing and 2 partials ⚠️
fastdeploy/worker/input_batch.py	50.00%	5 Missing and 1 partial ⚠️
fastdeploy/engine/common_engine.py	0.00%	1 Missing and 2 partials ⚠️
fastdeploy/engine/async_llm.py	0.00%	0 Missing and 1 partial ⚠️
...ecutor/layers/attention/flash_mask_attn_backend.py	0.00%	1 Missing ⚠️

Additional details and impacted files

@@              Coverage Diff               @@
##             release/2.5    #7234   +/-   ##
==============================================
  Coverage               ?   69.48%           
==============================================
  Files                  ?      390           
  Lines                  ?    54382           
  Branches               ?     8574           
==============================================
  Hits                   ?    37788           
  Misses                 ?    13866           
  Partials               ?     2728

Flag	Coverage Δ
GPU	`69.48% <60.46%> (?)`


[Optimization] Enable text-only deployment for multimodal models

6c4d990

K11OntheBoat had a problem deploying to Metax_ci April 8, 2026 03:48 — with GitHub Actions Failure

paddle-bot bot added the contributor External developers label Apr 8, 2026

fastdeploy-bot reviewed Apr 8, 2026

freeliuzc approved these changes Apr 8, 2026

zhoutianzi666 approved these changes Apr 8, 2026

Jiang-Jia-Jun merged commit 46ad25d into PaddlePaddle:release/2.5 Apr 8, 2026
30 of 37 checks passed

Jiang-Jia-Jun merged 1 commit into PaddlePaddle:release/2.5 from K11OntheBoat:R25_pick_mmFix

7 participants
