[Optimization] Enable text-only deployment for multimodal models by K11OntheBoat · Pull Request #7183 · PaddlePaddle/FastDeploy

K11OntheBoat · 2026-04-03T07:49:24Z

Motivation

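When deploying a multimodal model with the `--deploy-modality 'text'` switch enabled, the service gets a clean text-only runtime, with no leftover multimodal components interfering with resource usage or inference performance. Benefit: with this switch, the xx multimodal model improves QPS 2.5x on a text-only benchmark.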

Modifications

`enable_mm` indicates that the model has multimodal capability. `enable_mm_runtime` indicates that the multimodal runtime is in use; `enable_mm_runtime=false` means a text-only runtime.
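
The split between capability and runtime can be sketched as below. This is a minimal illustration under stated assumptions, not the FastDeploy implementation: the `ModelConfig` and `DeployConfig` dataclasses and the `"mixed"`/`"text"` string values are hypothetical; only the names `enable_mm` and `enable_mm_runtime` come from this PR.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    """Hypothetical stand-in for the model-level config."""
    enable_mm: bool = False  # the model HAS multimodal capability


@dataclass
class DeployConfig:
    """Hypothetical stand-in for the deployment-level config."""
    model_config: ModelConfig
    deploy_modality: str = "mixed"  # assumed values: "mixed" or "text"

    @property
    def enable_mm_runtime(self) -> bool:
        # The multimodal runtime is active only when the model supports it
        # and the deployment has not been restricted to text.
        return self.model_config.enable_mm and self.deploy_modality != "text"


mm_model = ModelConfig(enable_mm=True)
mixed = DeployConfig(mm_model)
text_only = DeployConfig(mm_model, deploy_modality="text")
```

In this sketch `text_only.model_config.enable_mm` stays true while `text_only.enable_mm_runtime` is false, which is the distinction the two flags are meant to express.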

Usage or Command

Start the multimodal model service with the `--deploy-modality 'text'` switch.

Accuracy Tests

With the base model, text-only requests produce identical input tokens and output tokens whether `--deploy-modality 'text'` is on or off.
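
A check of this kind could look like the sketch below. The record layout (`prompt_token_ids`, `output_token_ids`) and the helper name are hypothetical, and how the two runs are collected is outside the scope of the sketch.

```python
from typing import Dict, List

Record = Dict[str, List[int]]


def find_token_mismatches(baseline: List[Record], text_only: List[Record]) -> List[int]:
    """Return indices of requests whose input or output token IDs differ."""
    if len(baseline) != len(text_only):
        raise ValueError("both runs must contain the same number of requests")
    mismatches = []
    for i, (a, b) in enumerate(zip(baseline, text_only)):
        same_input = a["prompt_token_ids"] == b["prompt_token_ids"]
        same_output = a["output_token_ids"] == b["output_token_ids"]
        if not (same_input and same_output):
            mismatches.append(i)
    return mismatches


# Made-up token IDs, for illustration only.
run_default = [{"prompt_token_ids": [1, 2, 3], "output_token_ids": [7, 8]}]
run_text = [{"prompt_token_ids": [1, 2, 3], "output_token_ids": [7, 8]}]
mismatched = find_token_mismatches(run_default, run_text)
```

An empty result means the two runs agree on every request.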

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2026-04-03T07:49:32Z

Thanks for your contribution!

CLAassistant · 2026-04-03T07:50:14Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

liuruian seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

codecov-commenter · 2026-04-03T09:17:29Z

Codecov Report

❌ Patch coverage is 62.00000% with 19 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@ae2f9f4). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
fastdeploy/worker/input_batch.py	41.17%	8 Missing and 2 partials ⚠️
fastdeploy/config.py	57.14%	4 Missing and 2 partials ⚠️
fastdeploy/engine/async_llm.py	0.00%	0 Missing and 1 partial ⚠️
fastdeploy/engine/common_engine.py	66.66%	0 Missing and 1 partial ⚠️
...executor/layers/attention/dsa_attention_backend.py	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #7183   +/-   ##
==========================================
  Coverage           ?   74.18%           
==========================================
  Files              ?      376           
  Lines              ?    52966           
  Branches           ?     8266           
==========================================
  Hits               ?    39294           
  Misses             ?    10915           
  Partials           ?     2757

Flag	Coverage Δ
GPU	`74.18% <62.00%> (?)`

fastdeploy-bot

🤖 AI Code Review | 2026-04-03 17:37 CST

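📋 Review Summary

PR overview: Enables a text-only deployment mode for multimodal models and introduces the enable_mm_runtime property to control runtime multimodal features in one place
Change scope: FDConfig, Engine, Worker, Scheduler, Attention Backends, Speculative Decoding
Affected tags: [FDConfig] [Engine] [Scheduler] [Speculative Decoding] [XPU] [HPU] [GCU] [Iluvatar] [Metax]

📝 PR Convention Check

The PR title uses [Draft], which is not a valid tag, and the Motivation, Modifications, and Usage sections of the description are not filled in.

Suggested title (can be copied as-is):

[Feature] Enable text-only deployment for multimodal models

Description template (can be copied as-is):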

## Motivation
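Support deploying multimodal models in text-only mode: the `deploy_modality=TEXT` setting disables multimodal runtime features (such as 3D RoPE and the encoder cache), reducing resource usage and improving performance for text-only workloads.

## Modifications
1. Add the `enable_mm_runtime` and `enable_rope_3d_runtime` properties to `FDConfig`
2. Replace checks on `model_config.enable_mm` across modules with `fd_config.enable_mm_runtime`
3. When `deploy_modality=TEXT`, force-disable `rope_3d` and `use_3d_rope`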

## Usage
```bash
python -m fastdeploy.entrypoints.openai.api_server \
    --model /path/to/multimodal_model \
    --deploy-modality text
```

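### Issues

| Severity | File | Summary |
|------|------|------|
| 🔴 Bug | `engine/common_engine.py:1282` | Contains a debug print statement; should not be merged into develop |
| 🔴 Bug | `entrypoints/engine_client.py:364` | Contains a debug print statement |
| 🔴 Bug | `output/token_processor.py:952` | Contains a debug print statement |
| 🔴 Bug | `inter_communicator/engine_worker_queue.py:554` | Contains a debug print statement; it is called frequently and hurts performance |
| ❓ Question | `worker/input_batch.py:235` | The logic distinguishing `has_mm_model` from `enable_mm` needs confirmation |

### Overall Assessment

The design of this PR is clear: the new `enable_mm_runtime` property manages runtime multimodal features in one place, and the changes are broad yet consistent. However, several debug print statements must be removed before merging. These `print` calls would produce a large volume of output in production and seriously hurt performance and log readability.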
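
One common way to address the debug-print findings is to route such messages through a logger at debug level. The sketch below uses Python's standard `logging` module; the logger name and the `on_task_received` function are hypothetical and do not reflect the actual code at the flagged lines.

```python
import logging

logger = logging.getLogger("fastdeploy.example")  # hypothetical logger name


def on_task_received(task_id: str) -> str:
    # Instead of: print(f"received task {task_id}")
    # With %-style arguments the message string is only formatted
    # when DEBUG is enabled for this logger.
    logger.debug("received task %s", task_id)
    return task_id
```

On a hot path this keeps the diagnostic available without writing to stdout on every call.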

fastdeploy/engine/common_engine.py

fastdeploy/entrypoints/engine_client.py

fastdeploy/output/token_processor.py

fastdeploy/inter_communicator/engine_worker_queue.py

fastdeploy/worker/input_batch.py

fastdeploy-bot

🤖 AI Code Review | 2026-04-07 10:29 CST

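📋 Review Summary

PR overview: Adds text-only deployment support for multimodal models by introducing the enable_mm_runtime property, which separates "model capability" from "runtime enabled state"

Change scope: config.py, several attention backends, worker, input_batch, spec_decode, and others; 33 files in total

Affected tags: [FDConfig] [Engine] [Scheduler] [OP] [XPU] [HPU] [GCU] [Iluvatar] [Metax]

📝 PR Convention Check

The PR title lacks a valid tag, and the Motivation and Modifications sections of the description are not filled in.

Suggested title (can be copied as-is):

[Feature] Enable text-only deployment for multimodal models

Description template (can be copied as-is):

## Motivation

Allow multimodal models to be deployed in text-only mode, reducing resource consumption and simplifying deployment when image/video processing is not needed.

## Modifications

1. Add the `enable_mm_runtime` and `enable_rope_3d_runtime` properties to `FDConfig`
2. Replace runtime checks on `model_config.enable_mm` with `enable_mm_runtime`
3. Automatically disable 3D RoPE in TEXT deployment mode
4. Update the rope_3d checks in each hardware attention backend
5. Update the mock configs in the related test files

Issues

Severity	File	Summary
🟡 Suggestion	`tests/distributed/chunked_moe.py:95`	The test mock config should also cover the text-only scenario

Overall Assessment

This is a well-designed feature enhancement. Introducing the enable_mm_runtime property cleanly addresses the need to deploy multimodal models in text-only mode. The code changes are consistent, and the adaptation across hardware backends is complete. Consider filling in the PR description and adding test coverage for text-only mode.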

tests/distributed/chunked_moe.py

EmmonsCurse · 2026-04-08T03:25:54Z

❌ Cherry-pick failed: Conflicts detected when cherry-picking to release/2.5. Please resolve manually.

Co-authored-by: liuruian <liuruian@MacBook-Pro.local>

EmmonsCurse · 2026-04-08T03:26:00Z

✅ Cherry-pick successful! Created PR: #7233

fastdeploy-bot

🤖 AI Code Review | 2026-04-08

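📋 Review Summary

PR overview: Introduces the --deploy-modality 'text' switch so that multimodal models can be deployed in text-only mode, reducing resource usage and improving text-only inference performance (QPS improves 2.5x).

Change scope: core config, engine layer, worker layer, attention backend layer, test files

Affected tags: [BugFix] [Engine] [KVCache] [DataProcessor]

📝 PR Convention Check

The PR title and description follow the conventions and include:

✅ Valid tag: [BugFix]
✅ Motivation
✅ Modifications
✅ Usage/Command example
✅ Accuracy Tests description

Suggested addition: include concrete comparison data (such as QPS figures) in Accuracy Tests to make the claim more convincing.

Issues

Severity	File	Summary
🟡 Suggestion	`tests/layers/test_kv_cache_int8_dynamic_quant_backend.py:129`	The assignment logic for `enable_rope_3d_runtime` is incorrect
🟡 Suggestion	`fastdeploy/config.py:2084`	Modifying the config via `setattr` is not transparent enough
🟡 Suggestion	`fastdeploy/input/preprocess.py:60`	The parameter's default-value logic may be misleading

Overall Assessment

The overall design is sound and successfully enables text-only deployment of multimodal models. The core change (introducing the enable_mm_runtime and enable_rope_3d_runtime properties) is logically clear, and each component is adapted correctly. A few minor issues are worth fixing: the assignment logic in the test file's mock config is wrong, and the use of setattr to modify the config in postprocess could be improved.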

fastdeploy-bot · 2026-04-08T03:58:03Z

tests/layers/test_kv_cache_int8_dynamic_quant_backend.py

            },
        )()
+        self.enable_mm_runtime = self.model_config.enable_mm
+        self.enable_rope_3d_runtime = self.model_config.enable_mm


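🟡 Suggestion: The assignment logic for enable_rope_3d_runtime is incorrect.

According to the definition of FDConfig.enable_rope_3d_runtime, it should depend on the combination of enable_mm_runtime and rope_3d/use_3d_rope, not on enable_mm alone.

Suggested change:

self.enable_rope_3d_runtime = self.enable_mm_runtime and (self.model_config.rope_3d or self.model_config.use_3d_rope)

Alternatively, since enable_mm=False in this test, it can simply be set to: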

self.enable_rope_3d_runtime = False
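
The dependency described in this comment can be sketched as a small mock builder. The `make_mock_fd_config` helper and its `text_only` parameter are hypothetical and only illustrate the suggested formula; the attribute names are the ones quoted in the comment.

```python
from types import SimpleNamespace


def make_mock_fd_config(enable_mm=False, rope_3d=False, use_3d_rope=False, text_only=False):
    """Hypothetical test helper that derives both runtime flags consistently."""
    model_config = SimpleNamespace(
        enable_mm=enable_mm, rope_3d=rope_3d, use_3d_rope=use_3d_rope
    )
    enable_mm_runtime = enable_mm and not text_only
    return SimpleNamespace(
        model_config=model_config,
        enable_mm_runtime=enable_mm_runtime,
        # 3D RoPE needs the multimodal runtime AND one of the RoPE flags.
        enable_rope_3d_runtime=enable_mm_runtime and (rope_3d or use_3d_rope),
    )


mm_without_3d_rope = make_mock_fd_config(enable_mm=True)
mm_with_3d_rope = make_mock_fd_config(enable_mm=True, rope_3d=True)
```

Here `mm_without_3d_rope` has the multimodal runtime enabled but 3D RoPE disabled, the case that copying `enable_mm` into both fields would get wrong.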

fastdeploy-bot · 2026-04-08T03:58:03Z

fastdeploy/config.py

+                logger.info(
+                    "Deploy modality is text; forcing the multimodal-capable model onto the 2D RoPE runtime path."
+                )
+            setattr(self.model_config, "rope_3d", False)


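🟡 Suggestion: Using setattr to modify model_config attributes may make the configuration state less transparent.

Force-disabling 3D RoPE is needed here, but using setattr directly bypasses the normal configuration flow. Consider direct attribute access instead:

self.model_config.rope_3d = False
self.model_config.use_3d_rope = False

Alternatively, add a comment explaining that this is a dynamic adjustment made in the postprocess stage.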
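
For a fixed attribute name, `setattr` and direct assignment have the same effect in Python, so the choice is about readability. A minimal sketch, using a `SimpleNamespace` as a hypothetical stand-in for `model_config`:

```python
from types import SimpleNamespace

model_config = SimpleNamespace(rope_3d=True, use_3d_rope=True)  # hypothetical stand-in

# Dynamic form, as in the diff: mainly useful when the attribute
# name is only known at run time.
setattr(model_config, "rope_3d", False)

# Direct form, as suggested in the review: same effect, easier to grep.
# Postprocess-stage override: text-only deployment forces the 2D RoPE path.
model_config.use_3d_rope = False
```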

fastdeploy-bot · 2026-04-08T03:58:03Z

fastdeploy/input/preprocess.py

        self.mm_processor_kwargs = mm_processor_kwargs
        self.tool_parser = tool_parser
        self.enable_processor_cache = enable_processor_cache
+        self.enable_mm_runtime = self.model_config.enable_mm if enable_mm_runtime is None else enable_mm_runtime


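🟡 Suggestion: The default-value logic for the enable_mm_runtime parameter may be misleading.

Current logic: when the enable_mm_runtime parameter is None, it defaults to model_config.enable_mm. This blurs two separate concepts: "whether the model supports multimodal input" and "whether multimodal is enabled at runtime".

All actual callers pass the correct value, but the default behavior should be stated explicitly:

# If not specified explicitly, default the runtime behavior to the model config's multimodal capability
self.enable_mm_runtime = model_config.enable_mm if enable_mm_runtime is None else enable_mm_runtime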
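
The fallback can also be isolated in a small function, which makes the three cases easy to test. The helper name below is hypothetical; the logic mirrors the line in the diff.

```python
from typing import Optional


def resolve_enable_mm_runtime(
    model_enable_mm: bool, enable_mm_runtime: Optional[bool] = None
) -> bool:
    """Hypothetical helper: None means 'follow the model capability'."""
    if enable_mm_runtime is None:
        return model_enable_mm
    return enable_mm_runtime
```

The `is None` test matters here: a shortcut such as `enable_mm_runtime or model_enable_mm` would silently turn an explicit `False` back into `True` for a multimodal-capable model.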

Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com> Co-authored-by: liuruian <liuruian@MacBook-Pro.local>

K11OntheBoat had a problem deploying to Metax_ci April 3, 2026 07:49 — with GitHub Actions Failure

paddle-bot bot added the contributor External developers label Apr 3, 2026

fastdeploy-bot suggested changes Apr 3, 2026

K11OntheBoat force-pushed the dev_split_mm branch from f2a696b to 22e8d0d Compare April 6, 2026 15:26

K11OntheBoat had a problem deploying to Metax_ci April 6, 2026 15:26 — with GitHub Actions Failure

K11OntheBoat marked this pull request as ready for review April 6, 2026 15:28

K11OntheBoat added the cherry-pick: release/2.5 label Apr 6, 2026

K11OntheBoat changed the title ~~[Draft] Enable text-only deployment for multimodal models~~ [BugFix] Enable text-only deployment for multimodal models Apr 6, 2026

fastdeploy-bot reviewed Apr 7, 2026

tests/distributed/chunked_moe.py Outdated

K11OntheBoat added the cherry-pick: release/2.6 label Apr 7, 2026

K11OntheBoat force-pushed the dev_split_mm branch from 22e8d0d to a732aab Compare April 7, 2026 07:40

K11OntheBoat had a problem deploying to Metax_ci April 7, 2026 07:40 — with GitHub Actions Error

K11OntheBoat force-pushed the dev_split_mm branch from a732aab to 9898677 Compare April 7, 2026 08:03

K11OntheBoat had a problem deploying to Metax_ci April 7, 2026 08:03 — with GitHub Actions Error

K11OntheBoat force-pushed the dev_split_mm branch from 9898677 to 7c8db4e Compare April 7, 2026 08:07

K11OntheBoat had a problem deploying to Metax_ci April 7, 2026 08:07 — with GitHub Actions Error

Split enable_mm

e32f9fe

K11OntheBoat force-pushed the dev_split_mm branch from 7c8db4e to e32f9fe Compare April 7, 2026 08:08

K11OntheBoat had a problem deploying to Metax_ci April 7, 2026 08:08 — with GitHub Actions Failure

K11OntheBoat changed the title ~~[BugFix] Enable text-only deployment for multimodal models~~ [Optimization] Enable text-only deployment for multimodal models Apr 7, 2026

freeliuzc approved these changes Apr 7, 2026

zhoutianzi666 approved these changes Apr 7, 2026

gongshaotian approved these changes Apr 8, 2026

Jiang-Jia-Jun approved these changes Apr 8, 2026

Jiang-Jia-Jun merged commit bb48bcb into PaddlePaddle:develop Apr 8, 2026
32 of 39 checks passed

EmmonsCurse pushed a commit to EmmonsCurse/FastDeploy that referenced this pull request Apr 8, 2026

Split enable_mm (PaddlePaddle#7183)

4e0d632

Co-authored-by: liuruian <liuruian@MacBook-Pro.local>

EmmonsCurse mentioned this pull request Apr 8, 2026

[Cherry-Pick][Optimization] Enable text-only deployment for multimodal models(#7183) #7233

Merged

5 tasks

fastdeploy-bot reviewed Apr 8, 2026

Jiang-Jia-Jun pushed a commit that referenced this pull request Apr 8, 2026

Split enable_mm (#7183) (#7233)

6b78981

Co-authored-by: K11OntheBoat <ruianmaidanglao@163.com> Co-authored-by: liuruian <liuruian@MacBook-Pro.local>

9 participants
