[Feature] Add server-level token limits and prompt truncation control by luukunn · Pull Request #7842 · PaddlePaddle/FastDeploy

luukunn · 2026-05-18T06:33:40Z

Motivation

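This PR adds server-level default values for length parameters, so that generation-length behavior can be controlled through server configuration even when a request does not pass the request-level parameters explicitly. It also adds an input token length limit that rejects overlong requests early.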

Modifications

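Added a server-level length control config, ServingLimitsConfig, mounted on FDConfig for unified management.
Added the following server-level parameters to the CLI / config options: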
- max_completion_tokens
- reasoning_max_tokens
- response_max_tokens
- min_completion_tokens
- input_max_tokens
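Injected the server-level default length config into data_processor during initialization of async_llm, common_engine, and engine_client.
Updated text and multimodal request handling:
- When a request does not specify max_tokens, the server-level max_completion_tokens is used by default, bounded by the remaining context length;
- When a request specifies max_tokens, it is bounded by both the server-level cap and the remaining context length;
- reasoning_max_tokens / response_max_tokens are capped at the final effective max_tokens;
- min_tokens follows the rule max(server_value, request_value), and an error is raised if it exceeds max_tokens.
Added input_max_tokens validation:
- The input length is checked before the prompt is truncated;
- When the input token count exceeds input_max_tokens, the request is rejected outright.
Adjusted the default max_tokens handling in engine / engine_client:
- If max_completion_tokens is configured, it is used as the default generation length;
- Otherwise the original default based on max_model_len is kept.
Added parameter documentation in English and Chinese:
- docs/parameters.md
- docs/zh/parameters.md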
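
As a rough illustration of the injection pattern described above, the sketch below models the server-level limits as a dataclass and hands it to a processor through a `set_server_defaults` method. The field names come from this PR; the class bodies, the `SketchProcessor` name, the `default_max_tokens` helper, the assignment of the default `1` to `min_completion_tokens`, and the assumption that the original fallback is simply the context left after the prompt are hypothetical stand-ins, not FastDeploy's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServingLimitsConfig:
    """Stand-in for the server-level limits; None means 'not configured'."""

    max_completion_tokens: Optional[int] = None
    reasoning_max_tokens: Optional[int] = None
    response_max_tokens: Optional[int] = None
    min_completion_tokens: int = 1  # assumed default
    input_max_tokens: Optional[int] = None


class SketchProcessor:
    """Hypothetical processor that receives the limits once at startup."""

    def __init__(self) -> None:
        self.limits = ServingLimitsConfig()

    def set_server_defaults(self, limits: ServingLimitsConfig) -> None:
        # Called by each entry point right after the processor is created.
        self.limits = limits

    def default_max_tokens(self, max_model_len: int, prompt_len: int) -> int:
        # Assumed fallback: whatever context is left after the prompt.
        remaining = max_model_len - prompt_len
        if self.limits.max_completion_tokens is not None:
            return min(self.limits.max_completion_tokens, remaining)
        return remaining
```

Keeping the limits in one object means every entry point can pass the same value to its processor instead of copying five separate fields.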

Usage or Command

Example launch arguments:

--max-completion-tokens 1024 \
--reasoning-max-tokens 512 \
--response-max-tokens 512 \
--min-completion-tokens 1 \
--input-max-tokens 4096

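Behavior:

When a request does not specify max_tokens, the server-level max_completion_tokens is used by default, bounded by the remaining context length;
When a request specifies max_tokens, the final value is min(request value, server-level cap, remaining context length);
When a request does not specify reasoning_max_tokens / response_max_tokens, the server-level defaults can be used;
The final reasoning_max_tokens / response_max_tokens never exceed max_tokens;
The final min_tokens is the larger of the server-side setting and the request value; if it exceeds max_tokens, an error is raised;
When the number of input prompt tokens exceeds input_max_tokens, the request is rejected outright;
When the input exceeds max_model_len, an error is raised.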
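
The rules above can be sketched as a single resolution function. This is a minimal illustration under stated assumptions, not the code merged in this PR: the function name `resolve_limits`, its signature, the error messages, and the definition of remaining context as `max_model_len - prompt_len` are all hypothetical.

```python
from typing import Optional


def resolve_limits(
    prompt_len: int,
    max_model_len: int,
    request: dict,
    *,
    max_completion_tokens: Optional[int] = None,
    reasoning_max_tokens: Optional[int] = None,
    response_max_tokens: Optional[int] = None,
    min_completion_tokens: int = 1,
    input_max_tokens: Optional[int] = None,
) -> dict:
    """Merge request-level and server-level length limits (hypothetical helper)."""
    # Reject overlong input before any truncation would happen.
    if input_max_tokens is not None and prompt_len > input_max_tokens:
        raise ValueError(f"input length {prompt_len} exceeds input_max_tokens {input_max_tokens}")
    if prompt_len > max_model_len:
        raise ValueError(f"input length {prompt_len} exceeds max_model_len {max_model_len}")

    # max_tokens = min(request value, server cap, remaining context), skipping unset values.
    candidates = [max_model_len - prompt_len]
    if request.get("max_tokens") is not None:
        candidates.append(request["max_tokens"])
    if max_completion_tokens is not None:
        candidates.append(max_completion_tokens)
    max_tokens = min(candidates)
    resolved = {"max_tokens": max_tokens}

    # Request value wins over the server default; both are capped at max_tokens.
    server_defaults = {
        "reasoning_max_tokens": reasoning_max_tokens,
        "response_max_tokens": response_max_tokens,
    }
    for key, server_value in server_defaults.items():
        value = request.get(key)
        if value is None:
            value = server_value
        if value is not None:
            resolved[key] = min(value, max_tokens)

    # min_tokens takes the larger of the two sources and must fit under max_tokens.
    effective_min = max(min_completion_tokens, request.get("min_tokens") or 0)
    if effective_min > max_tokens:
        raise ValueError(f"min_tokens ({effective_min}) must not exceed max_tokens ({max_tokens})")
    resolved["min_tokens"] = effective_min
    return resolved
```

With the example launch arguments and a 100-token prompt under an assumed `max_model_len` of 8192, a request that sets `max_tokens` to 256 resolves to `max_tokens=256`, with `reasoning_max_tokens` and `response_max_tokens` both capped to 256.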

Accuracy Tests

This PR does not modify model forward computation or kernel behavior, so accuracy tests are not affected.

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2026-05-18T06:33:53Z

Thanks for your contribution!

Copilot

Pull request overview

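This PR introduces several "default token length limit" settings on the server (max_completion_tokens / reasoning_max_tokens / response_max_tokens / min_completion_tokens / input_max_tokens), allowing server-level defaults to be set via the CLI. When a request omits the corresponding field, these defaults are used; requests exceeding input_max_tokens are rejected.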

Changes:

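Added 5 length-related parameters to EngineArgs/ModelConfig and exposed them in the CLI and docs
Added set_server_defaults to BaseDataProcessor and called it at each entry point (engine_client / async_llm / engine / common_engine) to sync server defaults
Added "reject overlong input" and "take the minimum of user value / server default / context limit" merge logic to base_processor.py and multimodal_processor.py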

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
fastdeploy/engine/args_utils.py	Adds 5 server-level token length parameters and the corresponding CLI options
fastdeploy/config.py	`ModelConfig` initializes the new fields (default None / 1) to accept the new parameters
fastdeploy/input/base_processor.py	Adds `set_server_defaults` and the length merge/reject logic in `process_request_dict`
fastdeploy/input/multimodal_processor.py	Adds the same length merge/reject logic to the multimodal processing flow
fastdeploy/entrypoints/engine_client.py	Calls `set_server_defaults` and falls back to `max_completion_tokens` when `max_tokens` is missing
fastdeploy/engine/engine.py	Same as above: injects server defaults and prefers `max_completion_tokens` as the default
fastdeploy/engine/common_engine.py	Injects server defaults after creating data_processor
fastdeploy/engine/async_llm.py	Injects server defaults after creating data_processor
docs/parameters.md / docs/zh/parameters.md	Docs updated with descriptions of the 5 new parameters

PaddlePaddle-bot · 2026-05-18T07:00:45Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-06-01 12:56:31

CI report generated from the following code (updated every 30 minutes):

PR commit: 705de3b
Merge base: 92fdcf7 (branch: develop)
View full diff
CI details

1 Task overview

All Required tasks passed ✅; the PR can be merged.

Total runs (reruns)	Total tasks	✅ Passed	❌ Failed	⏳ Running	⏸️ Pending	Skipped
42(0)	42	39	3	0	0	0

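2 Task status summary

2.1 Required tasks: 10/10 passed

Required tasks block merging; failures should be handled first.

Status	Task	Duration	Root cause	Suggested fix	Log	Rerun
✅	All 10 required tasks passed	-	-	-	-	-

2.2 Optional tasks: 29/32 passed

Optional tasks do not block merging; failures are for reference only.

Status	Task	Duration	Log	Rerun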
❌	`Run iluvatar Tests / run_iluvatar_cases`	1m45s	Job	-
❌	`Trigger Jenkins for PR`	16s	Job	-
❌	`CI_HPU`	1h4m	Job	-
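✅	Remaining 29 optional tasks passed	-	-	-

3 Failure details (required only)

No required tasks failed.

💡 3 optional tasks currently fail (ILUVATAR-CI, CI_METAX, CI_HPU); all are optional and do not block merging. The failures look like environment issues (custom container error, Jenkins trigger failure, non-zero HPU environment exit code); a rerun is recommended to confirm.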

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

+        model_group.add_argument(
+            "--truncate-prompt-tokens",
+            type=lambda x: x.lower() in ("true", "1", "yes"),
+            default=EngineArgs.truncate_prompt_tokens,
+            help="Whether to truncate prompts that exceed max_model_len. "
+            "If True (default), prompts are silently truncated. "
+            "If False, a ValueError is raised.",
+        )
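
For readers unfamiliar with this idiom, the standalone sketch below reproduces the converter from the hunk above with the standard-library `argparse` module. The `parse_bool` name and the bare `ArgumentParser` are illustrative assumptions; FastDeploy's own parser setup is not shown here. The point it illustrates is that any value outside `("true", "1", "yes")`, including typos, silently parses as `False`.

```python
import argparse


def parse_bool(text: str) -> bool:
    # Same rule as the lambda in the hunk: only these three spellings count as True.
    return text.lower() in ("true", "1", "yes")


parser = argparse.ArgumentParser()
parser.add_argument("--truncate-prompt-tokens", type=parse_bool, default=True)

# Flag omitted: the default is used as-is (argparse only converts string defaults).
omitted = parser.parse_args([]).truncate_prompt_tokens
# Case-insensitive match on an accepted spelling.
accepted = parser.parse_args(["--truncate-prompt-tokens", "YES"]).truncate_prompt_tokens
# An unrecognized value such as "on" does not raise; it becomes False.
unrecognized = parser.parse_args(["--truncate-prompt-tokens", "on"]).truncate_prompt_tokens

print(omitted, accepted, unrecognized)  # True True False
```

A stricter converter could raise `argparse.ArgumentTypeError` for values it does not recognize, which turns a silent misconfiguration into a startup error.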


Copilot

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.

        )
        # Create data processor
        self.data_processor = self.input_processor.create_processor()
+        self.data_processor.set_server_defaults(cfg.serving_limits_config)


            enable_mm_runtime=self.cfg.enable_mm_runtime,
        )
        self.data_processor = self.input_processor.create_processor()
+        self.data_processor.set_server_defaults(self.cfg.serving_limits_config)


        )
        self.enable_logprob = self.fd_config.model_config.enable_logprob
        self.data_processor = input_processor.create_processor()
+        self.data_processor.set_server_defaults(self.fd_config.serving_limits_config)


Copilot

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 5 comments.

        )
        self.enable_logprob = self.fd_config.model_config.enable_logprob
        self.data_processor = input_processor.create_processor()
+        self.data_processor.set_server_defaults(self.fd_config.serving_limits_config)


            enable_mm_runtime=self.cfg.enable_mm_runtime,
        )
        self.data_processor = self.input_processor.create_processor()
+        self.data_processor.set_server_defaults(self.cfg.serving_limits_config)


        )
        # Create data processor
        self.data_processor = self.input_processor.create_processor()
+        self.data_processor.set_server_defaults(cfg.serving_limits_config)


+            if effective_min > max_tokens:
+                raise ValueError(f"min_tokens ({effective_min}) must not exceed max_tokens ({max_tokens})")
+            request["min_tokens"] = effective_min


+            if effective_min > max_tokens:
+                raise ValueError(f"min_tokens ({effective_min}) must not exceed max_tokens ({max_tokens})")
+            request["min_tokens"] = effective_min


PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-06-01 10:21:23

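📋 Review summary

PR overview: adds server-level token length limits and prompt truncation control configuration
Change scope: config, engine, entrypoints, input processor, docs, tests
Affected tags: [FDConfig] [Engine] [APIServer] [DataProcessor] [Docs]

Issues

No new blocking issues found. See below for the status of earlier findings.

Status of earlier findings

Finding	Issue	Status
F1	Overlong prompts changed from silent truncation to hard rejection (breaking behavior change)	⚠️ Still present
F2	Length limit logic is fully duplicated between base_processor and multimodal_processor	⚠️ Still present
F3	Default max_tokens logic in engine_client duplicates the processor logic	⚠️ Still present
F4	Phantom reasoning_max_tokens/response_max_tokens attributes in test_engine_client	⚠️ Still present

📝 PR convention check

The PR title [Feature] Add server-level token limits and prompt truncation control is well-formed, and the tag matches the changes. The description is structurally complete and contains all required sections: Motivation / Modifications / Usage or Command / Accuracy Tests / Checklist. ✓ Compliant.

Overall assessment

The feature is complete, the logic is correct, and test coverage is sufficient. The code duplication and behavior change raised in earlier reviews remain unresolved; a follow-up iteration should extract a shared method to remove the duplicated logic between base_processor and multimodal_processor.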

LiqinruiG

LGTM

luukunn added 4 commits May 15, 2026 17:11

Add length control parameters

57e7f26

Rename parameters

97b6cb4

Revise parameter validation

219b640

add docs

49549ca

Copilot AI review requested due to automatic review settings May 18, 2026 06:33

luukunn had a problem deploying to Metax_ci May 18, 2026 06:33 — with GitHub Actions Error

Copilot started reviewing on behalf of luukunn May 18, 2026 06:34

Copilot AI reviewed May 18, 2026

Comment thread fastdeploy/input/base_processor.py Outdated

Comment thread fastdeploy/input/multimodal_processor.py Outdated

luukunn changed the title ~~Length~~ [Feature] Add server-level token length defaults and input token limit May 18, 2026

fix default value

a9076a3

luukunn had a problem deploying to Metax_ci May 18, 2026 06:51 — with GitHub Actions Error

fix review

260b109

Copilot AI review requested due to automatic review settings May 18, 2026 07:17

luukunn had a problem deploying to Metax_ci May 18, 2026 07:18 — with GitHub Actions Error

Copilot started reviewing on behalf of luukunn May 18, 2026 07:18

fix error messages

bd93e46

luukunn had a problem deploying to Metax_ci May 18, 2026 07:32 — with GitHub Actions Error

add truncate_prompt_tokens

3b6d5a3

Copilot AI review requested due to automatic review settings May 18, 2026 07:54

luukunn had a problem deploying to Metax_ci May 18, 2026 07:54 — with GitHub Actions Failure

Copilot started reviewing on behalf of luukunn May 18, 2026 07:54

Copilot AI reviewed May 18, 2026

Merge branch 'develop' into length

9cfbe93

luukunn had a problem deploying to Metax_ci May 28, 2026 13:03 — with GitHub Actions Failure

luukunn added 2 commits May 28, 2026 21:06

remove truncate_prompt_tokens

4beac30

Merge branch 'length' of https://github.com/luukunn/FastDeploy into l…

2673be4

…ength

Copilot AI review requested due to automatic review settings May 28, 2026 13:07

luukunn had a problem deploying to Metax_ci May 28, 2026 13:07 — with GitHub Actions Failure

Copilot started reviewing on behalf of luukunn May 28, 2026 13:07

remove truncate_prompt_tokens

8284046

luukunn had a problem deploying to Metax_ci May 28, 2026 13:14 — with GitHub Actions Failure

fix review

18f9cd9

Copilot AI review requested due to automatic review settings May 28, 2026 13:54

luukunn had a problem deploying to Metax_ci May 28, 2026 13:54 — with GitHub Actions Failure

Copilot started reviewing on behalf of luukunn May 28, 2026 13:54

Copilot AI reviewed May 28, 2026

fix unit test

9509e37

luukunn had a problem deploying to Metax_ci May 29, 2026 03:21 — with GitHub Actions Failure

fix review

5ed1117

Copilot AI review requested due to automatic review settings May 29, 2026 06:50

luukunn had a problem deploying to Metax_ci May 29, 2026 06:50 — with GitHub Actions Failure

Copilot started reviewing on behalf of luukunn May 29, 2026 06:50

Copilot AI reviewed May 29, 2026

Merge branch 'develop' into length

705de3b

Jiang-Jia-Jun had a problem deploying to Metax_ci June 1, 2026 02:06 — with GitHub Actions Failure

PaddlePaddle-bot reviewed Jun 1, 2026

LiqinruiG reviewed Jun 2, 2026

Jiang-Jia-Jun merged commit 42c66a7 into PaddlePaddle:develop Jun 2, 2026
40 of 43 checks passed
