
[RL] Fix the incorrect routing of EOS tokens, which leads to changes in accuracy#7960

Open
gongshaotian wants to merge 11 commits into PaddlePaddle:release/2.6 from gongshaotian:r3_eos_2.6

@gongshaotian (Collaborator) commented May 29, 2026

Motivation

An incorrect context-length calculation in the tokenizer produced one more routing entry for the output tokens than was actually captured.
In Overlap mode, the estimated token count for the current inference step can be slightly higher than the actual count, which can contaminate the routing cache when the CPU cache is updated.
Together, these two issues make the k3_kl value unstable once R3 supports Overlap.
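The Overlap problem can be pictured with a toy routing buffer. This is a minimal sketch, not FastDeploy code: `flush_routing`, the list-based caches and the `STALE` marker are hypothetical, and it assumes the device buffer is sized for the estimated token count, so rows past the actual count hold data that was not written in this step.

```python
from typing import List

STALE = -1  # marks a device row that was not written in this step


def flush_routing(cpu_cache: List[int], device_rows: List[int], start: int, token_num: int) -> None:
    """Copy `token_num` routing rows from the device buffer into the CPU cache at `start`."""
    cpu_cache[start:start + token_num] = device_rows[:token_num]


# The step actually produced 3 tokens, but the overlap schedule estimated 4.
device_rows = [7, 3, 5, STALE]
estimated_token_num, actual_token_num = 4, 3

cache_estimated = [0] * 6
flush_routing(cache_estimated, device_rows, start=0, token_num=estimated_token_num)

cache_actual = [0] * 6
flush_routing(cache_actual, device_rows, start=0, token_num=actual_token_num)

print(cache_estimated)  # [7, 3, 5, -1, 0, 0]: a stale row leaks into slot 3
print(cache_actual)     # [7, 3, 5, 0, 0, 0]
```

Flushing with the actual count leaves slot 3 untouched, so the next step can fill it with real routing data.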

Modifications

  1. Fuse multiple operators into get_positions_and_slot_mapping().
  2. Stop returning routing data for EOS tokens.
  3. In Overlap Schedule mode, flush routing only for the actual token count.
  4. Add a debug mode.
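For reviewers unfamiliar with the fused op, here is a plain-Python reference of what computing position ids and slot mapping in one pass can look like. It assumes the usual paged KV-cache convention (slot = physical block id × block size + offset within the block) and per-request `seq_lens_decoder` / `seq_lens_this_time` inputs; the function name and argument layout are hypothetical and are not the signature of get_positions_and_slot_mapping().

```python
from typing import List, Tuple


def position_ids_and_slot_mapping(
    seq_lens_decoder: List[int],    # tokens already in the KV cache, per request (assumed)
    seq_lens_this_time: List[int],  # tokens processed in this step, per request (assumed)
    block_tables: List[List[int]],  # physical block ids per request (assumed layout)
    block_size: int,
) -> Tuple[List[int], List[int]]:
    """CPU reference sketch: produce both outputs in a single pass over the batch."""
    position_ids: List[int] = []
    slot_mapping: List[int] = []
    for start, n, table in zip(seq_lens_decoder, seq_lens_this_time, block_tables):
        for pos in range(start, start + n):
            position_ids.append(pos)
            # Physical slot = physical block id * block size + offset inside that block.
            block_id = table[pos // block_size]
            slot_mapping.append(block_id * block_size + pos % block_size)
    return position_ids, slot_mapping


# Request 0 prefills 3 tokens; request 1 decodes one token at position 4 (block_size = 4).
pos, slots = position_ids_and_slot_mapping([0, 4], [3, 1], [[5, 9], [2, 7]], block_size=4)
print(pos)    # [0, 1, 2, 4]
print(slots)  # [20, 21, 22, 28]
```

Computing both outputs from the same loop over positions is the point of fusing: the block-table lookup and the position are derived together instead of in two separate passes.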

Usage or Command

Enable debug mode:

    --routing-replay-config '{"enable_routing_replay":true, "debug_mode":true}' 
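When the server is launched from a script, the JSON value can be generated rather than hand-quoted. This is a small sketch that uses only the two keys shown in the command above; the rest of the launch command is not part of this PR and is omitted.

```python
import json
import shlex

# Only the keys shown in the PR's command; other keys would be assumptions.
routing_replay_config = {"enable_routing_replay": True, "debug_mode": True}

# json.dumps emits lowercase `true`; shlex.quote protects the braces, quotes and spaces.
config_json = json.dumps(routing_replay_config)
cli_arg = f"--routing-replay-config {shlex.quote(config_json)}"
print(cli_arg)  # --routing-replay-config '{"enable_routing_replay": true, "debug_mode": true}'
```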

Accuracy Tests

Add tests/operators/test_get_position_ids_and_slot_mapping.py

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot commented May 29, 2026

Thanks for your contribution!


@PaddlePaddle-bot commented May 29, 2026

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-30 18:27:07

CI report generated from the following code (refreshed every 30 minutes):


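1 Job overview

2 Required jobs failed and 1 Required job is still running; handle the failed jobs first.

| Total runs (reruns) | Total jobs | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Pending | Skipped |
|---|---|---|---|---|---|---|
| 35 (0) | 35 | 30 | 3 | 2 | 0 | 0 |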

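2 Job status summary

2.1 Required jobs: 7/10 passed

Required jobs block merging; handle failures first.

| Status | Job | Duration | Root cause | Suggested fix | Log | Rerun |
|---|---|---|---|---|---|---|
| ❌ | Run Four Cards Tests / run_4_cards_tests | 12m17s | PR issue: the EOS routing change caused routing replay not to generate the r3_chat_completion_stream directory | Check the routing replay output logic in token_processor.py and confirm that the stream directory is generated | Job | - |
| ❌ | Approval | 18s | Approval required | Please complete the manual approval | Job | - |
| ⏳ | Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage | - | Running | - | Job | - |
| ✅ | The remaining 7 required jobs passed | - | - | - | - | - |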

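2.2 Optional jobs: 23/25 passed

Optional jobs do not block merging; their failures are for reference only.

| Status | Job | Duration | Log | Rerun |
|---|---|---|---|---|
| ❌ | Run iluvatar Tests / run_iluvatar_cases | 1m46s | Job | - |
| ⏳ | CI_HPU | - | Job | - |
| ✅ | The remaining 23 optional jobs passed | - | - | - |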

3 Failure details (required only)

Run Four Cards Tests / run_4_cards_tests: test failure (confidence: medium)

  • Status: ❌ Failed
  • Error type: test failure
  • Confidence: medium
  • Root cause summary: after the PR changed EOS routing, routing replay did not generate the r3_chat_completion_stream directory
  • Analyzer: ci_analyze_unittest_fastdeploy

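Failed test cases:

| Test | Error | Root cause |
|---|---|---|
| test_GLM_45_AIR_mtp_tp4.py::test_r3_accuracy | FileNotFoundError | routing replay did not generate the r3_chat_completion_stream directory |
| test_GLM_45_AIR_tp4.py::test_r3_accuracy | FileNotFoundError | routing replay did not generate the r3_chat_completion_stream directory |

Root cause details:
The PR changes the baseline path from R3_BaseLine_uint8_0424 to R3_BaseLine_uint8_0530 and modifies the routing logic for EOS tokens (fastdeploy/output/token_processor.py and other files). When test_r3_accuracy calls generated_base_line_routing_index, it tries to move ./R3_tmp/routing_replay_output_glm45air_mtp_tp4/r3_chat_completion_stream into the new baseline directory, but the source directory does not exist. This indicates that, after the PR's changes, routing replay did not produce the expected streaming output directory. A possible cause is that EOS tokens are no longer routed, which changes the structure of the stream output.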

Key logs:

FileNotFoundError: [Errno 2] No such file or directory:
  './R3_tmp/routing_replay_output_glm45air_mtp_tp4/r3_chat_completion_stream'
  -> '/ModelData/R3_BaseLine_uint8_0530/routing_replay_output_baseline_glm45air_mtp_tp4/r3_chat_completion_stream'
tests/e2e/4cards_cases/test_GLM_45_AIR_mtp_tp4.py:208: in test_r3_accuracy
tests/e2e/utils/rollout_routing_replay_test_utils.py:168: in check_routing_replay_chat_completion
tests/e2e/utils/rollout_routing_replay_test_utils.py:129: in generated_base_line_routing_index

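Suggested fixes:

  1. Check the streaming-output saving logic for routing replay in fastdeploy/output/token_processor.py and confirm that the r3_chat_completion_stream directory is still created correctly.
  2. Check the generated_base_line_routing_index function at L129 of tests/e2e/utils/rollout_routing_replay_test_utils.py and confirm that the output directory name it expects matches the actual output path after this PR's changes.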
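One way to act on the second suggestion is to make the baseline move fail with a clearer message that lists what routing replay actually produced. The helper below is hypothetical (it is not the existing generated_base_line_routing_index) and only assumes that the source and destination are directory paths.

```python
import os
import shutil


def move_stream_output(src_dir: str, dst_dir: str) -> str:
    """Hypothetical helper: move a routing replay output directory, failing with context."""
    if not os.path.isdir(src_dir):
        parent = os.path.dirname(src_dir.rstrip(os.sep)) or "."
        found = sorted(os.listdir(parent)) if os.path.isdir(parent) else []
        raise FileNotFoundError(
            f"expected routing replay output {src_dir!r} was not generated; "
            f"{parent!r} contains: {found}"
        )
    # Create the destination's parent so shutil.move performs a plain rename.
    os.makedirs(os.path.dirname(dst_dir.rstrip(os.sep)) or ".", exist_ok=True)
    return shutil.move(src_dir, dst_dir)
```

Listing the parent directory's contents in the error shows immediately whether the stream output was renamed, placed elsewhere, or never written.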

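Fix summary: check that the stream directory creation logic in token_processor.py is consistent with the path the test expects.

Related changes: tests/e2e/utils/rollout_routing_replay_test_utils.py L159 (baseline path changed from 0424 to 0530), fastdeploy/output/token_processor.py (EOS routing logic change)
Link: View log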

Approval: manual approval required

This job requires manual approval; CI continues only after the approval is granted.

@codecov-commenter commented May 29, 2026

Codecov Report

❌ Patch coverage is 25.64103% with 87 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release/2.6@ac24fcc). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...model_executor/layers/moe/routing_indices_cache.py | 18.75% | 47 Missing and 5 partials ⚠️ |
| fastdeploy/cache_manager/routing_cache_manager.py | 4.54% | 21 Missing ⚠️ |
| fastdeploy/worker/gpu_model_runner.py | 61.90% | 7 Missing and 1 partial ⚠️ |
| fastdeploy/config.py | 33.33% | 1 Missing and 1 partial ⚠️ |
| fastdeploy/model_executor/pre_and_post_process.py | 50.00% | 2 Missing ⚠️ |
| fastdeploy/output/token_processor.py | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@              Coverage Diff               @@
##             release/2.6    #7960   +/-   ##
==============================================
  Coverage               ?   71.63%           
==============================================
  Files                  ?      386           
  Lines                  ?    55447           
  Branches               ?     8688           
==============================================
  Hits                   ?    39718           
  Misses                 ?    12927           
  Partials               ?     2802           
| Flag | Coverage Δ |
|---|---|
| GPU | 71.63% <25.64%> (?) |

Flags with carried forward coverage won't be shown.


Comment thread fastdeploy/worker/gpu_model_runner.py Outdated
Comment thread fastdeploy/worker/gpu_model_runner.py Outdated


@PaddlePaddle-bot left a comment


🤖 Paddle-CI-Agent | pr_review | 2026-05-30 18:02:16

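📋 Review summary

PR overview: fixes the unstable accuracy in R3 Routing Replay caused by incorrect routing of EOS tokens and inaccurate token counts in Overlap mode
Scope of changes: custom_ops (new fused kernel), cache_manager, worker, model_executor/layers/moe, output/token_processor
Affected areas (tags): [RL] [OP] [KVCache] [FDConfig]

Issues

| Level | File | Summary |
|---|---|---|
| 🟡 Suggestion | fastdeploy/output/token_processor.py:620 | The EOS check lacks a defensive guard and raises an exception when eos_token_ids is None or output_token_ids is empty |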

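Status of earlier findings

| Finding | Issue | Status |
|---|---|---|
| F1 | block_tables dtype mismatch | ✅ Fixed |
| F2 | logger.info floods the logs on the hot path | ✅ Fixed (now guarded by debug_mode) |
| F3 | Leftover triple-commented code | ⚠️ Still present |
| F4 | token_num_overlap initialized to 0 | ⚠️ Still present (on analysis, the execution flow guarantees it is assigned before the first call, so this is not an actual bug) |
| F5 | Functional regression in the DSA attention backend | ⚠️ Still present (position_ids changed from int32 to int64; compatibility on the DSA consumer side needs to be confirmed) |
| F6 | seq_len - 1 is off by one in the recovery_stop case | ✅ Fixed (the EOS check is now precisely guarded) |
| F7 | AttributeError when routing_replay_manager is None | ✅ Fixed (an is not None check was added) |
| F8 | Kernel <<<1, bsz>>> fails silently when bsz > 1024 | ⚠️ Still present |
| F9 | Performance issue from the serial loop in the kernel | ⚠️ Still present |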
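F8 is the classic single-block launch limit: CUDA allows at most 1024 threads per block, so a `<<<1, bsz>>>` launch is rejected with a configuration error once bsz exceeds that, which goes unnoticed unless the launch status is checked. The sketch below shows the usual fix, a multi-block grid computed by ceiling division, in Python for brevity; `launch_config` is a hypothetical helper, and the kernel itself would still need an `idx < bsz` bounds check (or a grid-stride loop).

```python
from typing import Tuple

MAX_THREADS_PER_BLOCK = 1024  # CUDA per-block thread limit


def launch_config(bsz: int, threads_per_block: int = 256) -> Tuple[int, int]:
    """Return (grid_dim, block_dim) so that grid_dim * block_dim >= bsz."""
    if not 0 < threads_per_block <= MAX_THREADS_PER_BLOCK:
        raise ValueError("threads_per_block must be in (0, 1024]")
    grid_dim = (bsz + threads_per_block - 1) // threads_per_block  # ceiling division
    # Launch at least one block; the kernel's idx < bsz check makes extra threads no-ops.
    return max(grid_dim, 1), threads_per_block


print(launch_config(2048))  # (8, 256)
print(launch_config(1025))  # (5, 256)
```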

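📝 PR conventions check

The PR targets release/2.6 (not develop), so per the repository conventions its title should use the Cherry-Pick format. In addition, none of the Checklist items are checked.

Suggested title (ready to copy):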

  • [Cherry-Pick][RL] Fix the incorrect routing of EOS tokens, which leads to changes in accuracy
Suggested PR description (ready to copy):
## Motivation

An incorrect context-length calculation in the tokenizer produced one more routing entry for the output tokens than was actually captured.
In Overlap mode, the estimated token count for the current inference step can be slightly higher than the actual count, which can contaminate the routing cache when the CPU cache is updated.
Together, these two issues make the k3_kl value unstable once R3 supports Overlap.

## Modifications

1. Fuse multiple operators into `get_positions_and_slot_mapping()`: a new fused CUDA kernel replaces the separate `get_position_ids` call and the Python-side slot_mapping computation
2. Routing no longer returns entries for EOS tokens (`seq_len - 1` in `_finalize_routing`)
3. In Overlap Schedule mode, routing is flushed only for the actual token count (`token_num_overlap`)
4. Add a debug mode for R3 routing validation (`--routing-replay-config '{"enable_routing_replay":true, "debug_mode":true}'`)

## Usage or Command

Enable debug mode:

    --routing-replay-config '{"enable_routing_replay":true, "debug_mode":true}'


## Accuracy Tests

Add tests/operators/test_get_position_ids_and_slot_mapping.py

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [x] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

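Overall assessment

The core fix logic is correct: excluding EOS tokens from routing, correcting the actual token count in Overlap mode, and merging the computation into a fused kernel were all verified as reasonable. F1/F2/F6/F7 are fixed; F8 (kernel bsz limit) and F5 (DSA int64 compatibility check) should be handled in a follow-up iteration.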

Diff context at fastdeploy/output/token_processor.py:620:

        ...
            if hasattr(task, "output_token_ids")
            else task.prompt_token_ids_len
        )
        if task.output_token_ids[-1] in task.eos_token_ids:

🟡 Suggestion: the EOS check lacks a defensive guard

The current expression task.output_token_ids[-1] in task.eos_token_ids raises an exception in these edge cases:

  • task.eos_token_ids is None → TypeError
  • task.output_token_ids is an empty list → IndexError

The outer try/except catches the exception, so the process does not crash, but the routing data for that request is silently lost.

Suggested fix:

    if (
        hasattr(task, "output_token_ids")
        and task.output_token_ids
        and task.eos_token_ids
        and task.output_token_ids[-1] in task.eos_token_ids
    ):
        seq_len = seq_len - 1  # Ignore eos token
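A self-contained version of the same guard, runnable outside FastDeploy, is sketched here. `routed_seq_len` is a hypothetical helper and `SimpleNamespace` stands in for the real task object; it reads both fields with `getattr` and a default, so a missing `eos_token_ids` attribute is tolerated as well, a small extension of the suggestion.

```python
from types import SimpleNamespace


def routed_seq_len(task, seq_len: int) -> int:
    """Drop a trailing EOS token from the routed length; tolerate missing or empty fields."""
    output_token_ids = getattr(task, "output_token_ids", None)
    eos_token_ids = getattr(task, "eos_token_ids", None)
    # Short-circuiting avoids IndexError on [] and TypeError on `in None`.
    if output_token_ids and eos_token_ids and output_token_ids[-1] in eos_token_ids:
        return seq_len - 1  # ignore the EOS token
    return seq_len


task = SimpleNamespace(output_token_ids=[101, 7, 2], eos_token_ids=[2])
print(routed_seq_len(task, 10))  # 9
```

Keeping the check in a small pure function also makes the edge cases listed above straightforward to unit-test.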


Labels

None yet

Projects

None yet


3 participants