[BugFix]: Slice image features correctly when prefix cache hits on multimodal requests (qwen2.5-vl) #6558
Closed
Smallhucaptain wants to merge 143 commits into PaddlePaddle:develop from
Conversation
…addle#5408) * [RL] Support Rollout Routing Replay * add routing indices cache * fix config bug and moe forward bug * R3 Support GLM * support eb4.5 * fix merge bug * Apply suggestion from @Copilot * Apply suggestion from @Copilot * Apply suggestion from @Copilot * Apply suggestion from @Copilot * add routing replay ci * support glm topk * support other top_k * fix ci bug * pre-commit * only support chatcmpl * Revert "Revert "[RL] Support Rollout Routing Replay (PaddlePaddle#5321)" (PaddlePaddle#5402)" This reverts commit c45e064. * Fix XPU and NPU bug --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Yuanle Liu <yuanlehome@163.com>
…dleOCR-VL (PaddlePaddle#5413) (PaddlePaddle#5414) * [BugFix] Fix some parameter place on CPU in PaddleOCR-VL * clean log * fix codestyle
…#5423) * fix bug * fix bug
…cess_group for RL (PaddlePaddle#5433) (PaddlePaddle#5434) * [fix] remove shutdown_process_group/restart_process_group for RL * [chore] remove log * [chore] remove log * [chore] set log to debug level
* [BugFix] fix instability after clearing weight * [chore] add todo
…Paddle#5492)(PaddlePaddle#5499) (PaddlePaddle#5498) * [BugFix] fix hung when n>1 and --enable-logprob (PaddlePaddle#5492) * check * check * check
…ing is done (PaddlePaddle#5527) (PaddlePaddle#5523) * [fix] fix ep loop * [fix] another try * [fix] again
…addlePaddle#5519) * fix dynamic load bug * update * update
…ePaddle#5578) (PaddlePaddle#5583) * [CI] Remove test_metrics.py due to incompatible forced merge (PaddlePaddle#5578) * [CI] Adapt vl_model baseline changes due to Paddle update (PaddlePaddle#5576)
…dle#5468) * [RL] R3 support rdma store * refine code * refine notes * disable prefix cache * fix ci bug * support preempted task and put cpu tensor
…ddlePaddle#5568) (PaddlePaddle#5597) * fix mtp entropy drop in RL * optimize usage and fix unit test * optimize padding_sampling_params speed(vectorized)
…addlePaddle#5491) (PaddlePaddle#5617) * [liuzichang spent 10 days] fix write qknorm cache bug * fix cachekv bug
…monitoring.(PaddlePaddle#5518) (PaddlePaddle#5614) * support spec metrics monitor per request
* [Model] tp+ep support v1_loader * fix * fix mtp_linear * fix mtp_linear * fix * fix * fix v0 loader * fix * Add get_tensor for EP * fix linear weight_loader * fix typo * fix
…Paddle#6201) * [fix] fix cache transfer tasks failure after cache cleared * [fix] fix submit_task
* Update download_dependencies.sh
…lash_mask_attn PaddlePaddle#6238 (PaddlePaddle#6232) * flash_mask_attn support mixed * enhance deep_ep and fix bug * update * fix
…addlePaddle#6193) * cherry pick * bug fix tool_calls (PaddlePaddle#6166) * fix image gen (PaddlePaddle#6175) * fix unit test
…#6096 (#…" (PaddlePaddle#6253) This reverts commit c424287.
…PaddlePaddle#6120) * fused put routing * fix bug * [draft commit]dynamic dtype * Updated to accommodate uint8 baseline changes * fix async put & numpy bug --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
…ePaddle#6256) * support glm mtp rl model * update baseline
…itly to avoid pip cache issues (PaddlePaddle#6265)
…#6082)(PaddlePaddle#6270) (PaddlePaddle#6279) * [Cherry-Pick][CI] Add 4-GPU test job and fix stable_test(PaddlePaddle#6082)(PaddlePaddle#6270) * fix error
…ddlePaddle#6193 (PaddlePaddle#6258) * cherry pick * bug fix tool_calls (PaddlePaddle#6166) * fix image gen (PaddlePaddle#6175) * fix unit test
…#6096 PaddlePaddle#6…" (PaddlePaddle#6293) This reverts commit 38ed58e.
… plugins…" (PaddlePaddle#6294) This reverts commit 5f077c5.
* add envs USE_FD_FP8_QUANT * USE_FD_FP8_QUANT -> FD_USE_PHI_FP8_QUANT, defaults to true * modify comments * use bool type
Thanks for your contribution!

huzesen seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
Problem description

With enable_prefix_caching=True, sending the same multimodal request twice makes the second request crash:
ValueError: cannot copy sequence with size 3577 to array axis with dimension 14
Root cause

input_ids is correctly sliced to [prefill_start, prefill_end),
but image_features is not sliced accordingly and still comes back as the full [3577, 3584],
which causes a shape mismatch inside get_input_embeddings().
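A minimal NumPy sketch of that mismatch (the array sizes are taken from the traceback; the boolean-mask assignment is an assumption about how the embeddings are filled, for illustration only):

```python
import numpy as np

# Prefill slice after a prefix-cache hit holds only 14 tokens,
# but the full, unsliced vision features are still [3577, 3584].
input_embeds = np.zeros((14, 3584))
image_features = np.zeros((3577, 3584))
image_token_mask = np.ones(14, dtype=bool)  # hypothetical: all 14 slots are image tokens

try:
    # Assigning 3577 feature rows into 14 masked slots fails.
    input_embeds[image_token_mask] = image_features
except ValueError as e:
    print("shape mismatch:", e)
```

The exact error text depends on the assignment path, but the first dimension (3577 vs. 14) is what triggers the ValueError reported in the issue.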
Fix

Add _calc_image_feature_range(), which uses mm_positions to compute
the slice of the image features that corresponds to the prefill range,
and applies that slice after extract_vision_features().
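A hedged sketch of what such a helper could look like (all names are illustrative, not the actual FastDeploy implementation): mm_positions is assumed to be a list of (offset, length) pairs marking image-token spans inside input_ids, and the returned range indexes rows of the stacked image_features tensor.

```python
def calc_image_feature_range(mm_positions, prefill_start, prefill_end):
    """Return the [start, end) row range of image features whose tokens
    fall inside the prefill window [prefill_start, prefill_end)."""
    feature_offset = 0  # rows consumed by images before the current one
    start = end = None
    for token_offset, length in mm_positions:
        tok_lo, tok_hi = token_offset, token_offset + length
        # Overlap of this image's token span with the prefill window.
        lo = max(tok_lo, prefill_start)
        hi = min(tok_hi, prefill_end)
        if lo < hi:
            if start is None:
                start = feature_offset + (lo - tok_lo)
            end = feature_offset + (hi - tok_lo)
        feature_offset += length
    return (start, end) if start is not None else (0, 0)
```

After extract_vision_features(), the fix would then slice image_features[start:end] so its first dimension matches the number of image tokens actually being prefilled.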
Testing