[RL] Fix the incorrect routing of EOS tokens, which leads to changes in accuracy by gongshaotian · Pull Request #7960 · PaddlePaddle/FastDeploy

gongshaotian · 2026-05-29T07:56:13Z

Motivation

An incorrect calculation of the context length in the tokenizer produced one more routing entry for the output tokens than was actually captured.
In Overlap mode, the token count estimated for the current inference step can be slightly higher than the actual count, so the surplus entries can contaminate the routing cache when the CPU cache is updated.
Together, these two issues make the k3_kl value unstable once R3 supports Overlap mode.
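A minimal sketch of the Overlap problem. It assumes a per-step routing buffer that reserves one row of top-k expert ids per *estimated* token. The `flush_routing` helper and the `-1` fill value are hypothetical, not FastDeploy code. Copying every estimated row drags unused slots into the cache, while slicing to the actual token count does not.

```python
from typing import List

Routing = List[List[int]]  # one row of top-k expert ids per token slot


def flush_routing(step_buffer: Routing, actual_token_num: int) -> Routing:
    """Hypothetical flush: copy only the rows for tokens that were really computed.

    step_buffer holds one row per estimated token of the step. In Overlap mode
    the estimate can exceed actual_token_num, and the extra rows hold no valid
    routing for this step.
    """
    if not 0 <= actual_token_num <= len(step_buffer):
        raise ValueError("actual_token_num must be within the reserved rows")
    return [list(row) for row in step_buffer[:actual_token_num]]


# 6 slots reserved from the estimate, but only 4 tokens were computed.
UNUSED = [-1, -1]
step_buffer = [[0, 3], [1, 2], [5, 7], [4, 6], UNUSED, UNUSED]

naive = [list(row) for row in step_buffer]  # flushing by the estimate: 2 bad rows
flushed = flush_routing(step_buffer, actual_token_num=4)
```

Here `naive` carries the two unused rows into the cache and `flushed` does not.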

Modifications

Merge multiple operators into a single get_positions_and_slot_mapping() op
Routing output no longer includes EOS tokens
In Overlap Schedule mode, routing is flushed only for the actual token count
Add debug mode
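For context on what a positions-plus-slot-mapping op computes, the sketch below derives both arrays for a paged KV cache in plain Python. It is a reference under stated assumptions, not the semantics of the PR's kernel. `seq_lens_decoder` (tokens already cached per request), `seq_lens_this_time` (new tokens this step), `block_tables`, and `block_size` are assumed inputs. New tokens are assumed to be packed request by request, and a token at position `p` is assumed to live in slot `block_tables[b][p // block_size] * block_size + p % block_size`.

```python
from typing import List, Tuple


def positions_and_slot_mapping(
    seq_lens_decoder: List[int],    # tokens already in the KV cache, per request
    seq_lens_this_time: List[int],  # new tokens processed this step, per request
    block_tables: List[List[int]],  # physical KV block ids, per request
    block_size: int,                # tokens per KV block
) -> Tuple[List[int], List[int]]:
    """Hypothetical reference: per-token position id and KV-cache slot index."""
    position_ids: List[int] = []
    slot_mapping: List[int] = []
    for b, (start, n_new) in enumerate(zip(seq_lens_decoder, seq_lens_this_time)):
        for pos in range(start, start + n_new):
            block_id = block_tables[b][pos // block_size]  # logical -> physical block
            position_ids.append(pos)
            slot_mapping.append(block_id * block_size + pos % block_size)
    return position_ids, slot_mapping


position_ids, slot_mapping = positions_and_slot_mapping(
    seq_lens_decoder=[0, 5],
    seq_lens_this_time=[3, 1],
    block_tables=[[7], [2, 9]],
    block_size=4,
)
# position_ids == [0, 1, 2, 5]; slot_mapping == [28, 29, 30, 37]
```

Both arrays come out of the same loop over tokens, which is why computing them in one fused op can replace several separate passes.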

Usage or Command

Enable debug mode:

    --routing-replay-config '{"enable_routing_replay":true, "debug_mode":true}'

Accuracy Tests

Add tests/operators/test_get_position_ids_and_slot_mapping.py

Checklist

Add at least one tag to the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code and run pre-commit before committing.
Add unit tests. If there are none, explain why in this PR.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2026-05-29T07:56:19Z

Thanks for your contribution!

PaddlePaddle-bot · 2026-05-29T15:45:50Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-30 18:27:07

This CI report was generated from the following code (updated every 30 minutes):

PR commit: a064ee2
Merge base: ac24fcc (branch: release/2.6)

1 Task overview

2 Required tasks failed and 1 Required task is still running; address the failures first.

Total runs (reruns)	Total tasks	✅ Passed	❌ Failed	⏳ Running	⏸️ Waiting	Skipped
35(0)	35	30	3	2	0	0

2 Task status summary

2.1 Required tasks: 7/10 passed

Required tasks block merging; address failures first.

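Status	Task	Duration	Root cause	Suggested fix	Log	Rerun
❌	`Run Four Cards Tests / run_4_cards_tests`	12m17s	PR issue: the EOS routing change caused routing replay not to generate the `r3_chat_completion_stream` directory	Check the routing replay output logic in `token_processor.py` and confirm the stream directory is generated	Job	-
❌	`Approval`	18s	Approval required	Approve manually	Job	-
⏳	`Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage`	-	Running	-	Job	-
✅	Remaining 7 required tasks passed	-	-	-	-	-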

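2.2 Optional tasks: 23/25 passed

Optional tasks do not block merging; their failures are for reference only.

Status	Task	Duration	Log	Rerun
❌	`Run iluvatar Tests / run_iluvatar_cases`	1m46s	Job	-
⏳	`CI_HPU`	-	Job	-
✅	Remaining 23 optional tasks passed	-	-	-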

3 Failure details (required only)

Run Four Cards Tests / run_4_cards_tests: test failure (confidence: medium)

Status: ❌ Failed
Error type: test failure
Confidence: medium
Root cause summary: after the PR changed EOS routing, routing replay did not generate the r3_chat_completion_stream directory
Analyzer: ci_analyze_unittest_fastdeploy

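Failed cases:

Test	Error	Root cause
`test_GLM_45_AIR_mtp_tp4.py::test_r3_accuracy`	FileNotFoundError	routing replay did not generate the r3_chat_completion_stream directory
`test_GLM_45_AIR_tp4.py::test_r3_accuracy`	FileNotFoundError	routing replay did not generate the r3_chat_completion_stream directory

Root cause details:
The PR updates the baseline path from R3_BaseLine_uint8_0424 to R3_BaseLine_uint8_0530 and changes the routing logic for EOS tokens (fastdeploy/output/token_processor.py and other files). When test_r3_accuracy calls generated_base_line_routing_index, it tries to move ./R3_tmp/routing_replay_output_glm45air_mtp_tp4/r3_chat_completion_stream into the new baseline directory, but that source directory does not exist. This indicates that, after the PR's changes, routing replay did not produce the expected streaming output directory, possibly because EOS tokens are no longer routed and the structure of the stream output changed.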

Key log:

FileNotFoundError: [Errno 2] No such file or directory:
  './R3_tmp/routing_replay_output_glm45air_mtp_tp4/r3_chat_completion_stream'
  -> '/ModelData/R3_BaseLine_uint8_0530/routing_replay_output_baseline_glm45air_mtp_tp4/r3_chat_completion_stream'
tests/e2e/4cards_cases/test_GLM_45_AIR_mtp_tp4.py:208: in test_r3_accuracy
tests/e2e/utils/rollout_routing_replay_test_utils.py:168: in check_routing_replay_chat_completion
tests/e2e/utils/rollout_routing_replay_test_utils.py:129: in generated_base_line_routing_index

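Suggested fixes:

Check the routing replay streaming-output save logic in fastdeploy/output/token_processor.py and confirm that the r3_chat_completion_stream directory is still created correctly.
Check the generated_base_line_routing_index function at L129 of tests/e2e/utils/rollout_routing_replay_test_utils.py and confirm that the output directory name it expects matches the actual output path after the PR's changes.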
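The second suggestion can be made self-diagnosing. The sketch below is a hypothetical helper, not the repository's `generated_base_line_routing_index`. Before moving a routing replay output directory into the baseline location, it checks that the source exists; if not, it reports which entries are actually present next to it. That turns a bare `FileNotFoundError` like the one in the log into a message that points at a naming mismatch.

```python
import os
import shutil


def move_routing_output(src: str, dst: str) -> str:
    """Hypothetical helper: move a routing replay output dir, or explain why not.

    Raises FileNotFoundError listing the sibling entries that do exist, which
    makes an expected-vs-actual directory-name mismatch easy to spot.
    """
    if not os.path.isdir(src):
        parent = os.path.dirname(os.path.normpath(src)) or "."
        present = sorted(os.listdir(parent)) if os.path.isdir(parent) else []
        raise FileNotFoundError(
            f"routing replay output not found: {src!r}; "
            f"entries under {parent!r}: {present}"
        )
    dst_parent = os.path.dirname(os.path.normpath(dst))
    if dst_parent:
        os.makedirs(dst_parent, exist_ok=True)  # create the baseline dir if needed
    return shutil.move(src, dst)
```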

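Fix summary: check that the stream-directory generation logic in token_processor.py is consistent with the path the test expects.

Related changes: tests/e2e/utils/rollout_routing_replay_test_utils.py L159 (baseline path changed from 0424 to 0530), fastdeploy/output/token_processor.py (EOS routing logic change)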

Approval: manual approval required

This job requires manual approval; CI continues only after approval is granted.

codecov-commenter · 2026-05-29T16:09:18Z

Codecov Report

❌ Patch coverage is 25.64103% with 87 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release/2.6@ac24fcc).

Files with missing lines	Patch %	Lines
...model_executor/layers/moe/routing_indices_cache.py	18.75%	47 Missing and 5 partials ⚠️
fastdeploy/cache_manager/routing_cache_manager.py	4.54%	21 Missing ⚠️
fastdeploy/worker/gpu_model_runner.py	61.90%	7 Missing and 1 partial ⚠️
fastdeploy/config.py	33.33%	1 Missing and 1 partial ⚠️
fastdeploy/model_executor/pre_and_post_process.py	50.00%	2 Missing ⚠️
fastdeploy/output/token_processor.py	0.00%	2 Missing ⚠️

Additional details and impacted files

@@              Coverage Diff               @@
##             release/2.6    #7960   +/-   ##
==============================================
  Coverage               ?   71.63%           
==============================================
  Files                  ?      386           
  Lines                  ?    55447           
  Branches               ?     8688           
==============================================
  Hits                   ?    39718           
  Misses                 ?    12927           
  Partials               ?     2802
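The headline numbers are internally consistent, which a little arithmetic confirms. This sketch assumes Codecov's coverage ratio is hits / (hits + misses + partials), with partial lines counted as not covered. Under that assumption, the 87 uncovered patch lines and the 25.64103% figure imply 30 covered lines out of 117, and the project totals reproduce 71.63%.

```python
def coverage_pct(hits: int, misses: int, partials: int) -> float:
    """Assumed Codecov-style coverage: partial lines count as not covered."""
    return 100.0 * hits / (hits + misses + partials)


# (misses, partials) per file in this PR's patch, from the table above.
patch_uncovered = {
    "routing_indices_cache.py": (47, 5),
    "routing_cache_manager.py": (21, 0),
    "gpu_model_runner.py": (7, 1),
    "config.py": (1, 1),
    "pre_and_post_process.py": (2, 0),
    "token_processor.py": (2, 0),
}
patch_misses = sum(m for m, _ in patch_uncovered.values())    # 80
patch_partials = sum(p for _, p in patch_uncovered.values())  # 7
uncovered = patch_misses + patch_partials                     # 87

# Invert hits / (hits + uncovered) = 25.64103% to recover the hit count.
reported_patch_pct = 25.64103
patch_hits = round(reported_patch_pct / (100 - reported_patch_pct) * uncovered)  # 30

patch_check = round(coverage_pct(patch_hits, patch_misses, patch_partials), 5)  # 25.64103
overall = round(coverage_pct(39718, 12927, 2802), 2)                            # 71.63
```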

Flag	Coverage Δ
GPU	`71.63% <25.64%> (?)`

Flags with carried forward coverage won't be shown.


PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-05-30 18:02:16

📋 Review summary

PR overview: fixes the accuracy instability in R3 Routing Replay caused by incorrect routing of EOS tokens and inaccurate token counts in Overlap mode
Scope of changes: custom_ops (new fused kernel), cache_manager, worker, model_executor/layers/moe, output/token_processor
Affected tags: [RL] [OP] [KVCache] [FDConfig]

Issues

Level	File	Summary
🟡 Suggestion	`fastdeploy/output/token_processor.py:620`	The EOS check lacks a defensive guard and raises an exception when `eos_token_ids` is None or `output_token_ids` is empty

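Status of previous findings

Finding	Issue	Status
F1	`block_tables` dtype mismatch	✅ Fixed
F2	`logger.info` floods the log on the hot path	✅ Fixed (now guarded by `debug_mode`)
F3	Leftover triple-commented code	⚠️ Still present
F4	`token_num_overlap` initialized to 0	⚠️ Still present (on analysis, the execution flow guarantees it is assigned before the first call, so this is not an actual bug)
F5	Functional regression in the DSA Attention Backend	⚠️ Still present (position_ids changed from int32 to int64; compatibility on the DSA consumer side needs to be confirmed)
F6	`seq_len - 1` off-by-one in the recovery_stop scenario	✅ Fixed (the EOS check is now precisely guarded)
F7	AttributeError when `routing_replay_manager` is None	✅ Fixed (an `is not None` check was added)
F8	Kernel `<<<1, bsz>>>` fails silently when bsz > 1024	⚠️ Still present
F9	Performance issue from the serial loop in the kernel	⚠️ Still present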
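F8 comes from CUDA's limit of 1024 threads per block. A `<<<1, bsz>>>` launch with bsz above that limit fails with an invalid-configuration error, and unless the launch error is checked, the kernel silently never runs. A common remedy is to fix the block size and derive the grid size from bsz. The Python sketch below only computes such a launch configuration; the helper name and the 256-thread default are illustrative assumptions, not the PR's code.

```python
from typing import Tuple

# CUDA caps threads per block at 1024 on compute capability 2.0 and later.
MAX_THREADS_PER_BLOCK = 1024


def launch_config(bsz: int, threads_per_block: int = 256) -> Tuple[int, int]:
    """Hypothetical helper: (grid_dim, block_dim) giving one thread per request.

    The kernel itself must still bounds-check, e.g. `if (idx < bsz)`, because
    grid_dim * block_dim can exceed bsz.
    """
    if bsz <= 0:
        raise ValueError("bsz must be positive")
    if not 0 < threads_per_block <= MAX_THREADS_PER_BLOCK:
        raise ValueError("threads_per_block must be in 1..1024")
    block_dim = min(threads_per_block, bsz)
    grid_dim = (bsz + block_dim - 1) // block_dim  # ceiling division
    return grid_dim, block_dim
```

With this shape, bsz = 2048 maps to 8 blocks of 256 threads instead of one invalid 2048-thread block.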

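📝 PR convention check

The PR targets release/2.6 (not develop), so by repository convention the title should use the Cherry-Pick format. In addition, none of the Checklist items are checked.

Suggested title (ready to copy):

[Cherry-Pick][RL] Fix the incorrect routing of EOS tokens, which leads to changes in accuracy

Suggested PR description (ready to copy):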

## Motivation

An incorrect calculation of the context length in the tokenizer produced one more routing entry for the output tokens than was actually captured.
In Overlap mode, the token count estimated for the current inference step can be slightly higher than the actual count, so the surplus entries can contaminate the routing cache when the CPU cache is updated.
Together, these two issues make the k3_kl value unstable once R3 supports Overlap mode.

## Modifications

1. Merge multiple operators into `get_positions_and_slot_mapping()`: a new fused CUDA kernel replaces the separate `get_position_ids` op and the Python-side slot_mapping computation
2. Routing output no longer includes EOS tokens (`seq_len - 1` in `_finalize_routing`)
3. In Overlap Schedule mode, routing is flushed only for the actual token count (`token_num_overlap`)
4. Add a debug mode for R3 routing validation (`--routing-replay-config '{"enable_routing_replay":true, "debug_mode":true}'`)

## Usage or Command

Enable debug mode:

    --routing-replay-config '{"enable_routing_replay":true, "debug_mode":true}'


## Accuracy Tests

Add tests/operators/test_get_position_ids_and_slot_mapping.py

## Checklist

- [x] Add at least one tag to the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code and run `pre-commit` before committing.
- [x] Add unit tests. If there are none, explain why in this PR.
- [ ] Provide accuracy results.
- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

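Overall assessment

The core fix logic is correct: excluding EOS tokens from routing, correcting the actual token count in Overlap mode, and merging the computation into a fused kernel were all verified as sound. F1/F2/F6/F7 are fixed; F8 (the kernel bsz limit) and F5 (confirming DSA int64 compatibility) are recommended for a follow-up iteration.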

PaddlePaddle-bot · 2026-05-30T10:04:14Z

                    if hasattr(task, "output_token_ids")
                    else task.prompt_token_ids_len
                )
+                if task.output_token_ids[-1] in task.eos_token_ids:


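🟡 Suggestion: the EOS check lacks a defensive guard

The current expression task.output_token_ids[-1] in task.eos_token_ids raises an exception in these edge cases:

task.eos_token_ids is None → TypeError

task.output_token_ids is an empty list → IndexError

Although the outer try/except catches the exception so nothing crashes, the routing data for that request is silently lost.

Suggested fix: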

    if (
        hasattr(task, "output_token_ids")
        and task.output_token_ids
        and task.eos_token_ids
        and task.output_token_ids[-1] in task.eos_token_ids
    ):
        seq_len = seq_len - 1  # Ignore eos token
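The guarded condition can be exercised in isolation. This sketch wraps it in a hypothetical `routed_seq_len` helper and uses `SimpleNamespace` objects as stand-ins for the real task type. Only the attribute names `output_token_ids` and `eos_token_ids` come from the diff.

```python
from types import SimpleNamespace


def routed_seq_len(task, seq_len: int) -> int:
    """Hypothetical helper: drop a trailing EOS token from the routed length.

    The guards short-circuit left to right, so indexing [-1] only runs on a
    non-empty list and the `in` test only runs on a non-empty eos_token_ids.
    """
    if (
        hasattr(task, "output_token_ids")
        and task.output_token_ids
        and task.eos_token_ids
        and task.output_token_ids[-1] in task.eos_token_ids
    ):
        seq_len = seq_len - 1  # Ignore eos token
    return seq_len


# SimpleNamespace stands in for the real task object (assumption).
ends_with_eos = SimpleNamespace(output_token_ids=[11, 2], eos_token_ids=[2])
no_eos = SimpleNamespace(output_token_ids=[11, 5], eos_token_ids=[2])
empty_output = SimpleNamespace(output_token_ids=[], eos_token_ids=[2])
eos_unset = SimpleNamespace(output_token_ids=[11, 2], eos_token_ids=None)
```

The two edge cases from the review (`empty_output` and `eos_unset`) now leave `seq_len` unchanged instead of raising.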

gongshaotian added 3 commits May 20, 2026 17:26

Reset buffer size of R3

507e464

refine code

956b543

R3 fix Eos bug

ef9c4b3