[Optimization] Accelerate the speed of tokenizer. #7544
Jiang-Jia-Jun merged 3 commits into PaddlePaddle:develop
Conversation
Thanks for your contribution!
“liuruian” does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. You have already signed the CLA but the status is still pending? Let us recheck it.
Codecov Report ❌ Patch coverage is
Additional details and impacted files — coverage diff (develop → #7544):

| Metric | develop | #7544 |
|---|---|---|
| Coverage | ? | 71.71% |
| Files | ? | 419 |
| Lines | ? | 57788 |
| Branches | ? | 9063 |
| Hits | ? | 41445 |
| Misses | ? | 13522 |
| Partials | ? | 2821 |

Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
PaddlePaddle-bot left a comment
🤖 AI Code Review | 2026-04-23 21:14:15
📋 Review Summary
PR overview: switches tokenizer preprocessing from the two-step .tokenize() + .convert_tokens_to_ids() to a single .encode() call, keeping the old path for ernie4_5 as a workaround; preprocessing is 1.24x faster in the long-input, single-concurrency scenario.
Scope of change: fastdeploy/input/base_processor.py (core logic), tests/input/test_text_processor.py (added tests)
Impact tag: DataProcessor
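The core change summarized above can be sketched with a toy whitespace tokenizer (a hypothetical stand-in — `ToyTokenizer`, its vocab, and `msg` are illustrations, not the real FastDeploy tokenizer):

```python
# Minimal sketch of the optimization: replace the two-step
# .tokenize() + .convert_tokens_to_ids() with a single .encode() call.
# ToyTokenizer is a hypothetical stand-in for the real tokenizer.

class ToyTokenizer:
    def __init__(self, vocab):
        self.vocab = vocab

    def tokenize(self, text):
        return text.split()

    def convert_tokens_to_ids(self, tokens):
        return [self.vocab[t] for t in tokens]

    def encode(self, text, add_special_tokens=False):
        # One pass over the text instead of tokenize + id lookup.
        return [self.vocab[t] for t in text.split()]

tok = ToyTokenizer({"hello": 1, "world": 2})
msg = "hello world"

# Old path (two steps):
old_ids = tok.convert_tokens_to_ids(tok.tokenize(msg))
# New path (one step):
new_ids = tok.encode(msg, add_special_tokens=False)
assert old_ids == new_ids == [1, 2]
```

The speedup comes from avoiding the intermediate token-string list; real tokenizers can fuse tokenization and id lookup inside `encode()`.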
Issues
| Level | File | Summary |
|---|---|---|
| 🔴 Bug | base_processor.py:171-172 | When hasattr(token_ids, "input_ids") triggers, token_ids["input_ids"] is used; if the object has the attribute but no __getitem__, this raises a TypeError |
| 🟡 Suggestion | base_processor.py:173 | The ndim check only covers tensors; a native Python nested list[list[int]] is not handled and may return [[1,2,3]] instead of [1,2,3] |
| 🟡 Suggestion | base_processor.py:167 | The ernie4_5 hang workaround has no linked issue; add one for future tracking |
| 🟡 Suggestion | tests/input/test_text_processor.py | The new tests only cover the three return types of the else branch; there is no unit test for the tokenizer_type == "ernie4_5" path |
Overall assessment
The optimization is well-motivated, the performance gain is backed by data, and multiple encode() return formats are handled for compatibility. However, the inconsistent hasattr + [] access logic carries a latent exception risk, and the ernie4_5 branch lacks a corresponding test; please fix before merging.
```python
else:
    token_ids = self.tokenizer.encode(spliced_message, add_special_tokens=False)
    if hasattr(token_ids, "input_ids") or (isinstance(token_ids, dict) and "input_ids" in token_ids):
        token_ids = token_ids["input_ids"]
```
🔴 Bug: hasattr check is inconsistent with [] subscript access
When hasattr(token_ids, "input_ids") is True but the object has the attribute without implementing __getitem__ (i.e. it is not dict-like), token_ids["input_ids"] raises a TypeError.
Handle attribute access and dict access separately:

```python
if isinstance(token_ids, dict) and "input_ids" in token_ids:
    token_ids = token_ids["input_ids"]
elif hasattr(token_ids, "input_ids"):
    token_ids = token_ids.input_ids  # attribute access instead of subscripting
```

```python
token_ids = self.tokenizer.encode(spliced_message, add_special_tokens=False)
if hasattr(token_ids, "input_ids") or (isinstance(token_ids, dict) and "input_ids" in token_ids):
    token_ids = token_ids["input_ids"]
if hasattr(token_ids, "ndim") and token_ids.ndim > 1:
```
🟡 Suggestion: native nested Python lists are not handled
hasattr(token_ids, "ndim") and token_ids.ndim > 1 only handles the 2D case for numpy/paddle/torch tensors. If input_ids is a plain Python list[list[int]] (e.g. [[1, 2, 3]]), there is no ndim attribute, the check is skipped, and the later isinstance(token_ids, list) returns [[1, 2, 3]] as-is, breaking downstream processing.
Add after the ndim check:

```python
if isinstance(token_ids, list) and len(token_ids) == 1 and isinstance(token_ids[0], list):
    token_ids = token_ids[0]
```

```python
tokens = self.tokenizer.tokenize(spliced_message)
token_ids = self.tokenizer.convert_tokens_to_ids(tokens)
if self.tokenizer_type == "ernie4_5":
    # NOTE: ernie4_5 tokenizer will hang when meet long input when use .encode()
```
🟡 Suggestion: link an issue for the ernie4_5 hang
The comment only says the tokenizer "will hang" without the cause, affected versions, or an expected fix date, so future maintainers cannot tell when it is safe to remove this workaround.
Add an issue link, for example:

```python
# NOTE: ernie4_5 tokenizer hangs on long input with .encode(); see issue #XXXX
# TODO: remove this branch once the upstream issue is fixed
```
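The review's two suggested fixes can be folded into one normalization step; a minimal sketch under the assumption that `.encode()` may return a dict, an object carrying an `input_ids` attribute, or a (possibly nested) list (`normalize_token_ids` and `HasAttrOnly` are hypothetical names, and plain lists stand in for tensor returns):

```python
# Hypothetical helper combining both review suggestions: normalize the
# possible return shapes of .encode() into a flat list of ids.

def normalize_token_ids(token_ids):
    # dict-like return, e.g. {"input_ids": [...]}
    if isinstance(token_ids, dict) and "input_ids" in token_ids:
        token_ids = token_ids["input_ids"]
    # object return with an input_ids attribute: attribute access, not []
    elif hasattr(token_ids, "input_ids"):
        token_ids = token_ids.input_ids
    # tensor-like values with a batch dimension expose ndim
    if hasattr(token_ids, "ndim") and token_ids.ndim > 1:
        token_ids = token_ids[0]
    # plain nested Python list, e.g. [[1, 2, 3]]
    if isinstance(token_ids, list) and len(token_ids) == 1 and isinstance(token_ids[0], list):
        token_ids = token_ids[0]
    return token_ids

class HasAttrOnly:
    """Object with an input_ids attribute but no __getitem__."""
    input_ids = [1, 2, 3]

assert normalize_token_ids({"input_ids": [1, 2, 3]}) == [1, 2, 3]
assert normalize_token_ids(HasAttrOnly()) == [1, 2, 3]   # would TypeError with []
assert normalize_token_ids([[1, 2, 3]]) == [1, 2, 3]
assert normalize_token_ids([1, 2, 3]) == [1, 2, 3]
```

Checking `isinstance(..., dict)` before `hasattr` matters: a dict never has an `input_ids` attribute, and an attribute-bearing object may not support subscripting, so each branch uses only the access style its type guarantees.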
❌ Cherry-pick failed: conflicts detected when cherry-picking to
* Change default workers and max-concurrency when launch api-server
* Change convert_tokens_to_ids to encode to get token ids
---------
Co-authored-by: zhangxiao35 <zhangxiao35@baidu.com>
Co-authored-by: “liuruian” <liuruian@baidu.com>
…le#7544)" (PaddlePaddle#7630) This reverts commit df4ed5a.
…ddlePaddle#7544)" (#…" (PaddlePaddle#7634) This reverts commit 701d268.
* Change default workers and max-concurrency when launch api-server
* Change convert_tokens_to_ids to encode to get token ids
---------
Co-authored-by: zhangxiao35 <zhangxiao35@baidu.com>
Co-authored-by: “liuruian” <liuruian@baidu.com>
…le#7544)" (PaddlePaddle#7630) This reverts commit 978b813.
…ddlePaddle#7544)" (#…" (PaddlePaddle#7634) This reverts commit 4ca1064.
Motivation
In the long-input, single-concurrency scenario, replacing .convert_tokens_to_ids with .encode speeds up preprocessing by 1.24x.
For GLM4.5-Air in the same scenario, TPS improves by 1.04x.
Modifications
Replace tokenizer.convert_tokens_to_ids with tokenizer.encode.
Accuracy Tests
Output token ids are identical before and after the change.
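That equivalence check can be expressed as a small parity test; a sketch, assuming a hypothetical `check_parity` helper and a whitespace tokenizer as a stand-in for the real one:

```python
# Sketch of the accuracy test: the single-step encode() path must yield
# exactly the same ids as the old two-step path on every input.
# WhitespaceTokenizer is a hypothetical stand-in for the real tokenizer.

class WhitespaceTokenizer:
    def __init__(self, vocab):
        self.vocab = vocab

    def tokenize(self, text):
        return text.split()

    def convert_tokens_to_ids(self, tokens):
        return [self.vocab[t] for t in tokens]

    def encode(self, text, add_special_tokens=False):
        return [self.vocab[t] for t in text.split()]

def check_parity(tokenizer, texts):
    """Return True when both preprocessing paths agree on every input."""
    for text in texts:
        old_ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(text))
        new_ids = tokenizer.encode(text, add_special_tokens=False)
        if old_ids != new_ids:
            return False
    return True

tok = WhitespaceTokenizer({"a": 0, "b": 1, "c": 2})
assert check_parity(tok, ["a b c", "c b", "a"])
```

With the real tokenizer the same loop would run over a corpus of long inputs, since the long-input case is where the two paths are most likely to diverge.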