[Engine][DataProcessor] fix decode token by zhuangzhuang12 · Pull Request #7102 · PaddlePaddle/FastDeploy

zhuangzhuang12 · 2026-03-31T04:15:55Z

Title: [Engine][DataProcessor] Simplify force decode logic in _decode_token and add unit tests

Body:

Motivation

When a stream ends while some tokens are still undecoded (e.g., byte-level tokens that form only part of a UTF-8 character),
_decode_token needs to return these remaining token IDs. The original force decode
logic used a two-level lookup involving prefix_offset, prev_cum_len, and start_idx,
which was unnecessarily complex: cum_tokens[read_offset:] is sufficient to capture
all unreturned tokens in every case.
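A minimal sketch of why the suffix slice is enough, assuming a toy byte-level vocabulary (`VOCAB`, `decode`, and `is_complete` are hypothetical illustrations, not FastDeploy code). An incremental decoder only moves `read_offset` past tokens whose text it has already emitted, and it holds back tokens whose bytes end in an incomplete UTF-8 sequence; so whatever is still held back when the stream ends sits after `read_offset`.

```python
# Hypothetical byte-level vocabulary: "你" (E4 BD A0) spans three tokens.
VOCAB = {0: b"\xe4", 1: b"\xbd", 2: b"\xa0", 3: b"hi"}


def decode(ids):
    """Concatenate byte pieces; an incomplete UTF-8 sequence becomes U+FFFD."""
    return b"".join(VOCAB[i] for i in ids).decode("utf-8", errors="replace")


def is_complete(text):
    # Only emit text that does not end in a replacement character.
    return not text.endswith("\ufffd")


cum_tokens = [3, 0, 1]  # "hi" followed by the first two bytes of "你"
read_offset = 1         # tokens before this index were already returned

assert is_complete(decode(cum_tokens[:read_offset]))  # "hi" was emitted
assert not is_complete(decode(cum_tokens))            # partial char is held back
assert decode(cum_tokens + [2]) == "hi你"             # the next byte would complete it

# If the stream ends here, the unreturned ids are exactly the suffix.
remaining = cum_tokens[read_offset:]
print(remaining)  # [0, 1]
```

Decoding `remaining` on its own yields only U+FFFD in this toy setup, so the raw token ids are the lossless thing to hand back.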

Modifications

engine/common_engine.py: Simplified the force decode path in _decode_token.
Replaced the two-level remaining token lookup (cum_tokens[start_idx:read_offset]
with fallback to cum_tokens[read_offset:]) with a single cum_tokens[read_offset:].
test_decode_token.py: Added unit tests for _decode_token covering:
- Empty end (no tokens, is_end=True)
- Incremental decoding with normal Chinese characters
- Force decode of undecoded byte-level tokens at stream end
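The three test cases listed above can be illustrated with a self-contained sketch. It assumes a hypothetical `IncrementalDecoder.decode_token(new_ids, is_end)` returning `(delta_text, token_ids)` and uses the common prefix_offset/read_offset diffing scheme for incremental detokenization; the real `_decode_token` in `common_engine.py` may differ in signature, state handling, and in what text (if any) it returns on the force decode path.

```python
from typing import List, Tuple

# Hypothetical byte-level vocabulary: "你" (E4 BD A0) is split into three
# tokens, while "好" (E5 A5 BD) is a single token.
VOCAB = {0: b"\xe4", 1: b"\xbd", 2: b"\xa0", 3: "好".encode("utf-8"), 4: b"hi"}


class IncrementalDecoder:
    """Sketch of prefix_offset/read_offset incremental detokenization."""

    def __init__(self) -> None:
        self.cum_tokens: List[int] = []
        self.prefix_offset = 0  # start of the context window used for diffing
        self.read_offset = 0    # tokens before this index have been returned

    @staticmethod
    def _decode(ids: List[int]) -> str:
        return b"".join(VOCAB[i] for i in ids).decode("utf-8", errors="replace")

    def decode_token(self, new_ids: List[int], is_end: bool) -> Tuple[str, List[int]]:
        self.cum_tokens.extend(new_ids)
        prefix_text = self._decode(self.cum_tokens[self.prefix_offset:self.read_offset])
        full_text = self._decode(self.cum_tokens[self.prefix_offset:])
        if len(full_text) > len(prefix_text) and not full_text.endswith("\ufffd"):
            # New complete characters: emit the text delta and its token ids.
            returned = self.cum_tokens[self.read_offset:]
            self.prefix_offset, self.read_offset = self.read_offset, len(self.cum_tokens)
            return full_text[len(prefix_text):], returned
        if is_end:
            # Force decode: everything after read_offset was never returned.
            remaining = self.cum_tokens[self.read_offset:]
            self.read_offset = len(self.cum_tokens)
            return self._decode(remaining), remaining
        return "", []


def test_empty_end():
    assert IncrementalDecoder().decode_token([], is_end=True) == ("", [])


def test_incremental_chinese():
    dec = IncrementalDecoder()
    assert dec.decode_token([4], is_end=False) == ("hi", [4])
    assert dec.decode_token([3], is_end=False) == ("好", [3])
    assert dec.decode_token([0, 1], is_end=False) == ("", [])  # incomplete "你"
    assert dec.decode_token([2], is_end=True) == ("你", [0, 1, 2])


def test_force_decode_at_end():
    dec = IncrementalDecoder()
    assert dec.decode_token([4], is_end=False) == ("hi", [4])
    text, ids = dec.decode_token([0, 1], is_end=True)
    assert ids == [0, 1]            # exactly cum_tokens[read_offset:]
    assert text.endswith("\ufffd")  # lossy text for the partial character


if __name__ == "__main__":
    test_empty_end()
    test_incremental_chinese()
    test_force_decode_at_end()
    print("all tests passed")
```

Returning the ids from the emit branch as `cum_tokens[read_offset:]` as well keeps both paths consistent: every token id is returned exactly once, either with its completed text or by the force decode at stream end.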

Usage or Command

python test_decode_token.py

paddle-bot · 2026-03-31T04:16:01Z

Thanks for your contribution!

codecov-commenter · 2026-03-31T06:37:33Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@4425142).

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #7102   +/-   ##
==========================================
  Coverage           ?   72.19%           
==========================================
  Files              ?      376           
  Lines              ?    52769           
  Branches           ?     8237           
==========================================
  Hits               ?    38096           
  Misses             ?    12043           
  Partials           ?     2630

Flag	Coverage Δ
GPU	`72.19% <ø> (?)`

Flags with carried forward coverage won't be shown.


CLAassistant · 2026-03-31T09:22:22Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

root seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

PaddlePaddle-bot

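📋 Review Summary

PR overview: Fixes the loss of undecoded tokens when streaming decoding ends

Scope of changes: fastdeploy/engine/common_engine.py, tests/engine/test_decode_token.py

Impact tags: [Engine] [DataProcessor]

📝 PR Convention Check

Issue: The PR title "fix decode token" lacks a valid [Tag] prefix and does not follow the project conventions.

Suggested titles (ready to copy):

[Engine] Fix decode token - Force return undecoded tokens at stream end
[DataProcessor] Fix decode token - Force return undecoded tokens at stream end
[BugFix] Fix decode token - Force return undecoded tokens at stream end

Description: The Motivation and Modifications sections are filled in properly and clearly written.

Issues

Severity	File	Summary
🟡 Suggestion	N/A	PR title does not follow the convention; missing `[Tag]` prefix

Overall Assessment

The code change correctly fixes the real problem of unreturned tokens being lost during incremental decoding. The new force decode branch is clear and correctly returns all accumulated unreturned tokens at the end of the stream. The tests cover the key scenarios and verify the boundary conditions. We recommend updating the PR title to follow the project conventions.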

fix decode token

b75f679

zhuangzhuang12 temporarily deployed to Metax_ci March 31, 2026 04:15 — with GitHub Actions Inactive

case format

a492d05

zhuangzhuang12 had a problem deploying to Metax_ci March 31, 2026 09:22 — with GitHub Actions Error

case format

b87fd2c

zhuangzhuang12 temporarily deployed to Metax_ci March 31, 2026 10:08 — with GitHub Actions Inactive

case format

9674030

zhuangzhuang12 temporarily deployed to Metax_ci March 31, 2026 11:21 — with GitHub Actions Inactive

case format

dacec20

zhuangzhuang12 had a problem deploying to Metax_ci April 1, 2026 03:02 — with GitHub Actions Error

case format

a7a883b

zhuangzhuang12 temporarily deployed to Metax_ci April 1, 2026 03:04 — with GitHub Actions Inactive

freeliuzc approved these changes Apr 8, 2026


EmmonsCurse changed the title from "fix decode token" to "[Engine][DataProcessor] fix decode token" Apr 8, 2026

EmmonsCurse merged commit 757bafe into PaddlePaddle:develop Apr 8, 2026
35 of 38 checks passed

PaddlePaddle-bot reviewed Apr 8, 2026


EmmonsCurse merged 6 commits into PaddlePaddle:develop from zhuangzhuang12:accumulate-garbled-tokens

6 participants

Conversation

zhuangzhuang12 commented Mar 31, 2026

Motivation

Modifications

Usage or Command

Uh oh!

paddle-bot bot commented Mar 31, 2026

Uh oh!

codecov-commenter commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

CLAassistant commented Mar 31, 2026

Uh oh!

Uh oh!

PaddlePaddle-bot left a comment

Choose a reason for hiding this comment

📋 Review 摘要

📝 PR 规范检查

问题

总体评价

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

codecov-commenter commented Mar 31, 2026 •

edited

Loading