[RL] Fix the incorrect routing of EOS tokens, which leads to changes in accuracy #7960
gongshaotian wants to merge 11 commits
Thanks for your contribution!
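CI report, generated from the following code (refreshed every 30 minutes):
1 Task overview
2 Required tasks have failed and 1 Required task is still running; the failures need attention first.
2 Task status summary
2.1 Required tasks: 7/10 passed
2.2 Optional tasks: 23/25 passed
3 Failure details (Required only)
Run Four Cards Tests / run_4_cards_tests: test failure (confidence: medium)
Failed cases:
Root cause details:
Key logs:
Fix suggestions:
Fix suggestion summary: check
Related changes:
Approval: manual approval required
This job requires manual approval; CI continues only after the approval is granted.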
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## release/2.6 #7960 +/- ##
==============================================
Coverage ? 71.63%
==============================================
Files ? 386
Lines ? 55447
Branches ? 8688
==============================================
Hits ? 39718
Misses ? 12927
Partials ? 2802
Flags with carried forward coverage won't be shown.
PaddlePaddle-bot left a comment
🤖 Paddle-CI-Agent | pr_review | 2026-05-30 18:02:16
📋 Review Summary
PR overview: Fixes the accuracy instability in R3 Routing Replay caused by incorrect routing of EOS tokens and by inaccurate token counts in Overlap mode
Change scope: custom_ops (new fused kernel), cache_manager, worker, model_executor/layers/moe, output/token_processor
Impact tags: [RL] [OP] [KVCache] [FDConfig]
Issues
| Severity | File | Summary |
|---|---|---|
| 🟡 Suggestion | fastdeploy/output/token_processor.py:620 | The EOS check lacks a defensive guard; it raises an exception when eos_token_ids is None or output_token_ids is empty |
Status of previous findings
| Finding | Issue | Status |
|---|---|---|
| F1 | block_tables dtype mismatch | ✅ Fixed |
| F2 | logger.info floods the log on the hot path | ✅ Fixed (debug_mode guard added) |
| F3 | Leftover triple-commented code | |
| F4 | token_num_overlap initialized to 0 | |
| F5 | DSA Attention Backend functional regression | |
| F6 | seq_len - 1 is off by one in the recovery_stop scenario | ✅ Fixed (the EOS check is now precisely guarded) |
| F7 | AttributeError when routing_replay_manager is None | ✅ Fixed (is not None check added) |
| F8 | Kernel <<<1, bsz>>> fails silently when bsz>1024 | |
| F9 | Performance problem from the serial loop in the kernel | |
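Finding F8 refers to a launch of the form `<<<1, bsz>>>`, which puts the whole batch into a single thread block. CUDA limits a block to 1024 threads on current GPUs, so a larger `bsz` cannot be launched that way. A minimal sketch of the usual remedy, sizing the grid by ceiling division, is given here in Python for illustration only; `launch_dims` and the default of 256 threads are hypothetical and do not come from the PR.

```python
MAX_THREADS_PER_BLOCK = 1024  # CUDA per-block thread limit on current GPUs


def launch_dims(bsz: int, threads_per_block: int = 256) -> tuple[int, int]:
    """Return (num_blocks, threads_per_block) covering bsz work items.

    Hypothetical helper: it only illustrates the arithmetic a host-side
    launcher would do before calling kernel<<<num_blocks, threads>>>.
    """
    if bsz <= 0:
        raise ValueError("bsz must be positive")
    if not 0 < threads_per_block <= MAX_THREADS_PER_BLOCK:
        raise ValueError("threads_per_block must be in 1..1024")
    # Ceiling division so a partially filled last block is still launched.
    num_blocks = (bsz + threads_per_block - 1) // threads_per_block
    return num_blocks, threads_per_block
```

With a multi-block launch, each thread would typically derive its index as `blockIdx.x * blockDim.x + threadIdx.x` and return early when that index is not below `bsz`.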
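📝 PR Convention Check
The PR targets release/2.6 (not develop), so by repository convention the title should use the Cherry-Pick format. In addition, none of the Checklist items are checked.
Suggested title (can be copied as-is):
[Cherry-Pick][RL] Fix the incorrect routing of EOS tokens, which leads to changes in accuracy
Suggested PR description (can be copied as-is)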
## Motivation
An incorrect calculation of the context length in the tokenizer led to one extra route for the output tokens compared to what was actually captured.
In Overlap mode, the estimated token count for the current inference step might be a bit higher than the actual count. This could cause some contamination in the routing cache when updating the CPU cache.
These two issues together cause the k3_kl value to become unstable after R3 starts supporting Overlap.
## Modifications
1. Fuse multiple operators into `get_positions_and_slot_mapping()`, a new fused CUDA kernel that replaces the separate `get_position_ids` call and the Python-side slot_mapping computation
2. Stop returning routing for EOS tokens (`seq_len - 1` in `_finalize_routing`)
3. In Overlap Schedule mode, flush routing only for the actual token count (`token_num_overlap`)
4. Add a debug mode for R3 routing validation (`--routing-replay-config '{"enable_routing_replay":true, "debug_mode":true}'`)
## Usage or Command
Enable debug mode: `--routing-replay-config '{"enable_routing_replay":true, "debug_mode":true}'`
## Accuracy Tests
Add tests/operators/test_get_position_ids_and_slot_mapping.py
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
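Overall assessment
The core fix logic is correct: excluding EOS tokens from routing, correcting the actual token count in Overlap mode, and merging the computation into a fused kernel have all been checked and are reasonable. F1/F2/F6/F7 are fixed; F8 (kernel bsz upper limit) and F5 (confirming DSA int64 compatibility) are recommended for a later iteration.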
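For readers unfamiliar with what a fused positions and slot-mapping operator produces, here is a pure-Python reference sketch. It assumes the common paged KV cache layout, where each request owns a block table and the slot of token position `p` is `block_table[p // block_size] * block_size + p % block_size`. The function name, argument names, and layout are assumptions made for illustration; the signature and semantics of the PR's `get_positions_and_slot_mapping()` kernel may differ.

```python
from typing import List, Tuple


def reference_positions_and_slot_mapping(
    seq_lens_this_step: List[int],  # new tokens per request in this step
    cached_lens: List[int],         # tokens already in the KV cache per request
    block_tables: List[List[int]],  # physical block ids per request
    block_size: int,
) -> Tuple[List[int], List[int]]:
    """Hypothetical reference: flat position ids and KV cache slots."""
    positions: List[int] = []
    slots: List[int] = []
    for new_len, start, table in zip(seq_lens_this_step, cached_lens, block_tables):
        for pos in range(start, start + new_len):
            block_id = table[pos // block_size]  # logical block -> physical block
            positions.append(pos)
            slots.append(block_id * block_size + pos % block_size)
    return positions, slots
```

A reference of this kind is mainly useful as an oracle in an operator unit test, where its output can be compared element by element with the kernel's result.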
    if hasattr(task, "output_token_ids")
    else task.prompt_token_ids_len
)
if task.output_token_ids[-1] in task.eos_token_ids:
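🟡 Suggestion: the EOS check lacks a defensive guard
The current expression task.output_token_ids[-1] in task.eos_token_ids raises an exception in the following edge cases:
- task.eos_token_ids is None → TypeError
- task.output_token_ids is an empty list → IndexError
The outer try/except catches the exception, so nothing crashes, but the routing data for that request is silently lost.
Suggested fix: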
if (
    hasattr(task, "output_token_ids")
    and task.output_token_ids
    and task.eos_token_ids
    and task.output_token_ids[-1] in task.eos_token_ids
):
    seq_len = seq_len - 1  # Ignore eos token
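The guard can be exercised in isolation. The sketch below wraps the same condition in a hypothetical helper, `routing_seq_len`, and uses `types.SimpleNamespace` objects as stand-ins for the task; neither is part of FastDeploy. It uses `getattr` with a default instead of `hasattr`, which additionally tolerates a missing `eos_token_ids` attribute.

```python
from types import SimpleNamespace


def routing_seq_len(task, seq_len: int) -> int:
    """Drop the trailing EOS token from seq_len, tolerating missing fields.

    Hypothetical helper for illustration; not the project's actual code.
    """
    output_ids = getattr(task, "output_token_ids", None)
    eos_ids = getattr(task, "eos_token_ids", None)
    # Empty or None values short-circuit before any indexing or `in` test.
    if output_ids and eos_ids and output_ids[-1] in eos_ids:
        return seq_len - 1  # ignore the EOS token
    return seq_len


# Stand-in tasks covering the normal case and the two edge cases.
finished = SimpleNamespace(output_token_ids=[5, 9, 2], eos_token_ids=[2])
no_eos_ids = SimpleNamespace(output_token_ids=[5, 9, 2], eos_token_ids=None)
empty_output = SimpleNamespace(output_token_ids=[], eos_token_ids=[2])
```

Only `finished` has its length reduced by one; the other two tasks keep `seq_len` unchanged rather than raising.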
Motivation
An incorrect calculation of the context length in the tokenizer led to one extra route for the output tokens compared to what was actually captured.
In Overlap mode, the estimated token count for the current inference step might be a bit higher than the actual count. This could cause some contamination in the routing cache when updating the CPU cache.
These two issues together cause the k3_kl value to become unstable after R3 starts supporting Overlap.
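The Overlap contamination can be pictured with a toy model. Assume, purely for illustration, that each step writes routing rows into a buffer sized for the estimated token count, so rows beyond the actual count hold stale data. `flush_routing` and the list-based buffers are hypothetical and do not reflect the real data structures in the PR.

```python
from typing import List


def flush_routing(
    step_buffer: List[List[int]],  # one row of expert ids per estimated token
    actual_token_num: int,         # tokens really produced in this step
    cpu_cache: List[List[int]],    # accumulated routing for the request
) -> None:
    """Append only rows backed by real tokens; ignore the stale tail rows."""
    if not 0 <= actual_token_num <= len(step_buffer):
        raise ValueError("actual_token_num out of range")
    # Copy each row so later writes to the step buffer cannot alter the cache.
    cpu_cache.extend(row[:] for row in step_buffer[:actual_token_num])
```

Flushing `len(step_buffer)` rows instead of `actual_token_num` rows would copy the stale tail into the cache, which is the kind of contamination described above.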
Modifications
get_positions_and_slot_mapping()
Usage or Command
Add debug mode
Accuracy Tests
Add tests/operators/test_get_position_ids_and_slot_mapping.py
Checklist
- Tag list: [`[FDConfig]`, `[APIServer]`, `[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- Format your code, run `pre-commit` before commit.
- If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.