[BugFix] fix mtp reset bugs in rl #7957
Thanks for your contribution!
PaddlePaddle-bot left a comment
🤖 Paddle-CI-Agent | pr_review | 2026-05-29 12:28:08
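📋 Review summary
PR overview: In the RL reset scenario, `reset_model_inputs` now recomputes `max_chunk_tokens` before rebuilding `ids_remove_padding` / `batch_id_per_token`, fixing an out-of-bounds (OOB) access during MTP CUDAGraph replay.
Scope of changes: fastdeploy/worker/input_batch.py
Impact tags: [RL] [Speculative Decoding]
Issues
No blocking issues found. PR convention issues are reported in the section below and are not repeated here.
📝 PR convention check
The title lacks an official tag, and every section of the description is empty.
Suggested title (copy as-is):
[BugFix] fix mtp reset bugs in rl
Suggested PR description (copy as-is):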
## Motivation
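In the reset flow of RL training, `reset_model_inputs` rebuilds `ids_remove_padding` and `batch_id_per_token` using `self.max_chunk_tokens` directly, without recomputing that value first. For multimodal models (`enable_mm=True` and `mm_max_tokens_per_item is None`), this can leave the tensor sizes inconsistent with those used at CUDAGraph capture time and cause an out-of-bounds (OOB) access on replay (CUDA error 700).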
## Modifications
- `fastdeploy/worker/input_batch.py`: before `reset_model_inputs` rebuilds `ids_remove_padding` / `batch_id_per_token`, recompute `max_chunk_tokens` with the same logic as `__init__` (use `max_model_len` for multimodal models without `mm_max_tokens_per_item`; otherwise call `fd_config.get_max_chunk_tokens`).
## Usage or Command
N/A
## Accuracy Tests
N/A
## Checklist
- [ ] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
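- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Overall assessment
The code logic is consistent with `__init__`, and the fix goes in the right direction. The PR description still needs its required sections (Motivation, Modifications, and so on) filled in, and the title needs an official tag.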
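The selection rule that the description says is shared between `__init__` and `reset_model_inputs` can be sketched as a single helper, so the two call sites cannot drift apart. This is a minimal sketch, not the FastDeploy implementation: the `ModelConfig` and `FDConfig` classes below are hypothetical stand-ins, the `chunk_tokens` field and its value are placeholders, and placing `max_model_len` on the model config is an assumption. Only the names `enable_mm`, `mm_max_tokens_per_item`, `max_model_len` and `get_max_chunk_tokens` come from the PR text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfig:
    # Hypothetical stand-in: only the two fields named in the PR are modelled.
    max_model_len: int
    mm_max_tokens_per_item: Optional[int] = None


@dataclass
class FDConfig:
    # Hypothetical stand-in for the real config object.
    model_config: ModelConfig
    # Placeholder value; the real method derives its result from the config.
    chunk_tokens: int = 2048

    def get_max_chunk_tokens(self) -> int:
        return self.chunk_tokens


def compute_max_chunk_tokens(fd_config: FDConfig, enable_mm: bool) -> int:
    """Pick the buffer length using the rule described in the PR."""
    model_config = fd_config.model_config
    # Multimodal model without a per-item token cap: size for the full context.
    if enable_mm and model_config.mm_max_tokens_per_item is None:
        return model_config.max_model_len
    # Every other case defers to the config-level computation.
    return fd_config.get_max_chunk_tokens()
```

Calling the same helper from both the constructor and the reset path would make the "consistent with `__init__`" property structural rather than something a reviewer has to verify by eye.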
# NOTE(fix): These tensors are dynamically resized during runtime inference.
# Must recreate at full initial size to avoid CUDAGraph replay OOB access.
max_num_seqs = self.scheduler_config.max_num_seqs
if self.enable_mm and self.model_config.mm_max_tokens_per_item is None:
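❓ Question: can `max_chunk_tokens` be modified at runtime?
From a code search, `self.max_chunk_tokens` is assigned only in `__init__` (L106-109), and no other runtime modification path was found. If the value really never changes during inference, recomputing it here is redundant (though harmless).
If the MTP inference path does contain logic that modifies `max_chunk_tokens`, consider describing the triggering scenario in a comment so that future maintainers understand the intent of the fix. In addition, `position_ids_buffer` (L195) and `slot_mapping_buffer` (L196) in `init_share_inputs` also depend on `max_chunk_tokens` but are not reset in `reset_model_inputs`; if `max_chunk_tokens` can indeed change, do these two tensors need to be reset as well?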
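One way to answer the question about the two unreset buffers is to check, after a reset, that every buffer sized from `max_chunk_tokens` still has the expected length. The sketch below uses plain Python lists in place of tensors and assumes, for illustration only, that each buffer's leading dimension equals `max_chunk_tokens`. The helper `find_stale_buffers` is hypothetical and not part of FastDeploy.

```python
from typing import Dict, List, Sequence


def find_stale_buffers(
    buffers: Dict[str, Sequence], max_chunk_tokens: int
) -> List[str]:
    """Return the names of buffers whose length differs from max_chunk_tokens."""
    return sorted(
        name for name, buf in buffers.items() if len(buf) != max_chunk_tokens
    )


# Toy state: two buffers were rebuilt at reset time, two were left as they were.
max_chunk_tokens = 8
buffers = {
    "ids_remove_padding": [0] * 8,
    "batch_id_per_token": [0] * 8,
    "position_ids_buffer": [0] * 4,
    "slot_mapping_buffer": [0] * 4,
}

stale = find_stale_buffers(buffers, max_chunk_tokens)
print(stale)
```

A check of this kind could run as a debug assertion at the end of the reset path; whether the real buffers can ever diverge is exactly what the reviewer is asking the author to confirm.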
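CI report generated from the following code (updated every 30 minutes):
1 Task overview
Required tasks: currently 0 failed, 1 running, 0 pending. Please wait This round reviewed the PR change context as required: this PR only modifies
2 Task status summary
Note on the log column: failed tasks use the log link provided by CI; running tasks link to the corresponding workflow.
2.1 Required tasks: 9/10 passed
2.2 Optional tasks: 27/31 passed
3 Failure details (required only)
No required tasks failed; this round did not invoke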
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #7957 +/- ##
==========================================
Coverage ? 67.59%
==========================================
Files ? 467
Lines ? 65182
Branches ? 10008
==========================================
Hits ? 44060
Misses ? 18303
Partials ? 2819
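Motivation
In the reset flow of RL training, `reset_model_inputs` rebuilds `ids_remove_padding` and `batch_id_per_token` using `self.max_chunk_tokens` directly, without recomputing that value first. For multimodal models (`enable_mm=True` and `mm_max_tokens_per_item is None`), this can leave the tensor sizes inconsistent with those used at CUDAGraph capture time and cause an out-of-bounds (OOB) access on replay (CUDA error 700).
Modifications
`fastdeploy/worker/input_batch.py`: before `reset_model_inputs` rebuilds `ids_remove_padding` / `batch_id_per_token`, recompute `max_chunk_tokens` with the same logic as `__init__` (use `max_model_len` for multimodal models without `mm_max_tokens_per_item`; otherwise call `fd_config.get_max_chunk_tokens`).
Usage or Command
N/A
Accuracy Tests
N/A
Checklist
- Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- Format your code, run `pre-commit` before commit.
- Add unit tests. Please write the reason in this PR if no unit tests.
- Provide accuracy results.
- If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.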
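The failure mode in the Motivation section can be illustrated with a deliberately loose pure-Python analogy. It is not how CUDAGraph works internally, and the `capture` function below is hypothetical: a "captured" routine always touches the number of tokens fixed at capture time, so a buffer rebuilt at a smaller size is overrun on replay. Python raises `IndexError` here; on the GPU there is no such bounds check, and the PR description reports the symptom as CUDA error 700.

```python
from typing import Callable, Dict, List


def capture(state: Dict[str, List[int]], num_tokens: int) -> Callable[[], None]:
    """Record a 'graph' that always writes num_tokens entries on replay."""

    def replay() -> None:
        buf = state["ids_remove_padding"]  # looked up at replay time
        for i in range(num_tokens):
            buf[i] = i  # raises IndexError if the buffer shrank after capture

    return replay


state = {"ids_remove_padding": [0] * 8}
replay = capture(state, num_tokens=8)
replay()  # fine: the buffer still has its capture-time size

# Simulate a reset that rebuilds the buffer from a smaller size.
state["ids_remove_padding"] = [0] * 4
try:
    replay()
    overflowed = False
except IndexError:
    overflowed = True
print(overflowed)
```

Rebuilding the buffer at the capture-time length, which is what the fix aims for by recomputing `max_chunk_tokens` before the rebuild, keeps the replay within bounds in this toy model.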