
[BugFix]: Slice image features correctly when prefix cache hits on multimodal requests (qwen2.5-vl)#6558

Closed
Smallhucaptain wants to merge 143 commits into PaddlePaddle:develop from Smallhucaptain:fix/prefix-cache-qwen2.5vl-image-feature-slicing

Conversation

@Smallhucaptain

Problem

With enable_prefix_caching=True, sending the same multimodal request twice makes the second request crash:
ValueError: cannot copy sequence with size 3577 to array axis with dimension 14

Root cause

input_ids is correctly sliced to [prefill_start, prefill_end),
but image_features is not sliced accordingly and still comes back as the full [3577, 3584] tensor,
causing a shape mismatch in get_input_embeddings().
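The mismatch can be reproduced in isolation with NumPy (shapes taken from the traceback; the variable names are illustrative, not FastDeploy's actual code):

```python
import numpy as np

# After the prefix-cache hit, only 14 image-token slots remain in the
# prefill window, but extract_vision_features() still returns all 3577 rows.
placeholder_slots = np.zeros((14, 3584))
image_features = np.zeros((3577, 3584))

try:
    placeholder_slots[...] = image_features  # shape mismatch, as in the crash
except ValueError as exc:
    print("ValueError:", exc)
```
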

Fix

Add _calc_image_feature_range(), which uses mm_positions to compute
the image-feature slice corresponding to the prefill range, and apply
the slice after extract_vision_features().
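A minimal sketch of what such a helper could look like. The (token_start, num_tokens) layout of mm_positions and the function signature are assumptions for illustration, not the actual FastDeploy code:

```python
def calc_image_feature_range(mm_positions, prefill_start, prefill_end):
    """Map the [prefill_start, prefill_end) token window onto rows of the
    concatenated image-feature tensor.

    mm_positions: list of (token_start, num_tokens) per image, in token order.
    Returns (feat_start, feat_end) so that features[feat_start:feat_end]
    lines up with the image tokens inside the prefill window.
    """
    feat_start, feat_end = None, 0
    rows_before = 0  # feature rows contributed by earlier images
    for token_start, num_tokens in mm_positions:
        lo = max(token_start, prefill_start)
        hi = min(token_start + num_tokens, prefill_end)
        if hi > lo:  # this image overlaps the prefill window
            if feat_start is None:
                feat_start = rows_before + (lo - token_start)
            feat_end = rows_before + (hi - token_start)
        rows_before += num_tokens
    return (0, 0) if feat_start is None else (feat_start, feat_end)

# One image of 3577 tokens starting at position 10; the prefix cache leaves
# a 14-token tail in the prefill window, so only the last 14 rows are kept.
start, end = calc_image_feature_range([(10, 3577)], 3573, 3600)
print(start, end, end - start)  # 3563 3577 14
```

The helper returns a single contiguous range, which matches the single-slice fix described above: as long as the prefill window is contiguous in token space, the corresponding image-feature rows are contiguous as well.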

Verification

  • Two identical requests both return normally, with no crash
  • sliced_feature_count == image_tokens_in_prefill (14 == 14)
  • The last row of the image features is identical across the two requests

bukejiyu and others added 30 commits December 6, 2025 00:47
…addle#5408)

* [RL] Support Rollout Routing Replay

* add routing indices cache

* fix config bug and moe forward bug

* R3 Support GLM

* support eb4.5

* fix merge bug

* Apply suggestion from @Copilot



* Apply suggestion from @Copilot



* Apply suggestion from @Copilot



* Apply suggestion from @Copilot



* add routing replay ci

* support glm topk

* support orther top_k

* fix ci bug

* pre-commit

* only support chatcmpl

* Revert "Revert "[RL] Support Rollout Routing Replay (PaddlePaddle#5321)" (PaddlePaddle#5402)"

This reverts commit c45e064.

* Fix XPU and NPU bug

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
…dleOCR-VL (PaddlePaddle#5413) (PaddlePaddle#5414)

* [BugFix] Fix some parameter place on CPU in PaddleOCR-VL

* clean log

* fix codestyle
…cess_group for RL (PaddlePaddle#5433) (PaddlePaddle#5434)

* [fix] remove shutdown_process_group/restart_process_group for RL

* [chore] remove log

* [chore] remove log

* [chore] set log to debug level
* [BugFix] fix instability after clearing weight

* [chore] add todo
…ing is done (PaddlePaddle#5527) (PaddlePaddle#5523)

* [fix] fix ep loop

* [fix] another try

* [fix] again
…ePaddle#5578) (PaddlePaddle#5583)

* [CI] Remove test_metrics.py due to incompatible forced merge (PaddlePaddle#5578)
* [CI] Adapt vl_model baseline changes due to Paddle update (PaddlePaddle#5576)
…dle#5468)

* [RL] R3 support rdma store

* refine code

* refine notes

* disable prefix cache

* fix ci bug

* support preempted task and put cpu tensor
…ddlePaddle#5568) (PaddlePaddle#5597)

* fix mtp entropy drop in RL

* optimize usage and fix unit test

* optimize padding_sampling_params speed(vectorized)
…addlePaddle#5491) (PaddlePaddle#5617)

* [liuzichang spend 10 dyas]fix write qknorm cache bug

* fix 'fix cachekv bug''
* [Model] tp+ep support v1_loader

* fix

* fix mtp_linear

* fix mtp_linear

* fix

* fix

* fix v0 loader

* fix

* Add get_tensor for EP

* fix linear weight_loader

* fix typo

* fix
liyonghua0910 and others added 24 commits January 26, 2026 09:52
…Paddle#6201)

* [fix] fix cache transfer tasks failure after cache cleared

* [fix] fix submit_task
* Update download_dependencies.sh
…lash_mask_attn PaddlePaddle#6238 (PaddlePaddle#6232)

* fash_mask_attn support mixed

* enhance deep_ep and fix bug

* update

* fix
…PaddlePaddle#6120)

* fused put routing

* fix bug

* [draft commit]dynamic dtype

* Updated to accommodate uint8 baseline changes

* fix async put & numpy bug

---------

Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
* add envs USE_FD_FP8_QUANT

* USE_FD_FP8_QUANT -> FD_USE_PHI_FP8_QUANT, default is true

* modify comments

* use bool type
@paddle-bot

paddle-bot bot commented Feb 28, 2026

Thanks for your contribution!

@paddle-bot paddle-bot bot added the contributor External developers label Feb 28, 2026
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
17 out of 18 committers have signed the CLA.

✅ liyonghua0910
✅ EmmonsCurse
✅ kevincheng2
✅ iosmers
✅ Deleter-D
✅ gongshaotian
✅ yuanlehome
✅ fxyfxy777
✅ a31413510
✅ yangjianfengo1
✅ plusNew001
✅ Jiang-Jia-Jun
✅ qwes5s5
✅ luukunn
✅ ApplEOFDiscord
✅ xiaoxiaohehe001
✅ ZhangYulongg
❌ huzesen


huzesen seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

