[BugFix]: Slice image features correctly when prefix cache hits on multimodal requests (qwen2.5-vl) #6558
Closed
Smallhucaptain wants to merge 143 commits into PaddlePaddle:develop from
Conversation
…addle#5408) * [RL] Support Rollout Routing Replay * add routing indices cache * fix config bug and moe forward bug * R3 Support GLM * support eb4.5 * fix merge bug * Apply suggestion from @Copilot * Apply suggestion from @Copilot * Apply suggestion from @Copilot * Apply suggestion from @Copilot * add routing replay ci * support glm topk * support other top_k * fix ci bug * pre-commit * only support chatcmpl * Revert "Revert "[RL] Support Rollout Routing Replay (PaddlePaddle#5321)" (PaddlePaddle#5402)" This reverts commit c45e064. * Fix XPU and NPU bug --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Yuanle Liu <yuanlehome@163.com>
…dleOCR-VL (PaddlePaddle#5413) (PaddlePaddle#5414) * [BugFix] Fix some parameter place on CPU in PaddleOCR-VL * clean log * fix codestyle
…#5423) * fix bug * fix bug
…cess_group for RL (PaddlePaddle#5433) (PaddlePaddle#5434) * [fix] remove shutdown_process_group/restart_process_group for RL * [chore] remove log * [chore] remove log * [chore] set log to debug level
* [BugFix] fix instability after clearing weight * [chore] add todo
…Paddle#5492)(PaddlePaddle#5499) (PaddlePaddle#5498) * [BugFix] fix hung when n>1 and --enable-logprob (PaddlePaddle#5492) * check * check * check
…ing is done (PaddlePaddle#5527) (PaddlePaddle#5523) * [fix] fix ep loop * [fix] another try * [fix] again
…addlePaddle#5519) * fix dynamic load bug * update * update
…ePaddle#5578) (PaddlePaddle#5583) * [CI] Remove test_metrics.py due to incompatible forced merge (PaddlePaddle#5578) * [CI] Adapt vl_model baseline changes due to Paddle update (PaddlePaddle#5576)
…dle#5468) * [RL] R3 support rdma store * refine code * refine notes * disable prefix cache * fix ci bug * support preempted task and put cpu tensor
…ddlePaddle#5568) (PaddlePaddle#5597) * fix mtp entropy drop in RL * optimize usage and fix unit test * optimize padding_sampling_params speed(vectorized)
…addlePaddle#5491) (PaddlePaddle#5617) * [liuzichang spent 10 days] fix write qknorm cache bug * fix cachekv bug
…monitoring.(PaddlePaddle#5518) (PaddlePaddle#5614) * support spec metrics monitor per request
* [Model] tp+ep support v1_loader * fix * fix mtp_linear * fix mtp_linear * fix * fix * fix v0 loader * fix * Add get_tensor for EP * fix linear weight_loader * fix typo * fix
…Paddle#6201) * [fix] fix cache transfer tasks failure after cache cleared * [fix] fix submit_task
* Update download_dependencies.sh
…lash_mask_attn PaddlePaddle#6238 (PaddlePaddle#6232) * flash_mask_attn support mixed * enhance deep_ep and fix bug * update * fix
…addlePaddle#6193) * cherry pick * bug fix tool_calls (PaddlePaddle#6166) * fix image gen (PaddlePaddle#6175) * fix unit test
…#6096 (#…" (PaddlePaddle#6253) This reverts commit c424287.
…PaddlePaddle#6120) * fused put routing * fix bug * [draft commit]dynamic dtype * Updated to accommodate uint8 baseline changes * fix async put & numpy bug --------- Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
…ePaddle#6256) * support glm mtp rl model * update baseline
…itly to avoid pip cache issues (PaddlePaddle#6265)
…#6082)(PaddlePaddle#6270) (PaddlePaddle#6279) * [Cherry-Pick][CI] Add 4-GPU test job and fix stable_test(PaddlePaddle#6082)(PaddlePaddle#6270) * fix error
…ddlePaddle#6193 (PaddlePaddle#6258) * cherry pick * bug fix tool_calls (PaddlePaddle#6166) * fix image gen (PaddlePaddle#6175) * fix unit test
…#6096 PaddlePaddle#6…" (PaddlePaddle#6293) This reverts commit 38ed58e.
… plugins…" (PaddlePaddle#6294) This reverts commit 5f077c5.
* add envs USE_FD_FP8_QUANT * USE_FD_FP8_QUANT -> FD_USE_PHI_FP8_QUANT, defaults to true * modify comments * use bool type
Thanks for your contribution!

huzesen seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
Problem description

With enable_prefix_caching=True, sending the same multimodal request twice makes the second request crash:
ValueError: cannot copy sequence with size 3577 to array axis with dimension 14
Root cause

input_ids is correctly sliced to [prefill_start, prefill_end),
but image_features is not sliced accordingly and still comes back as the full [3577, 3584],
which causes a shape mismatch inside get_input_embeddings().
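A minimal NumPy sketch of that mismatch (the array sizes are taken from the traceback; the boolean-mask assignment is an assumption about how the embeddings are filled, for illustration only):

```python
import numpy as np

# Prefill slice after a prefix-cache hit holds only 14 tokens,
# but the full, unsliced vision features are still [3577, 3584].
input_embeds = np.zeros((14, 3584))
image_features = np.zeros((3577, 3584))
image_token_mask = np.ones(14, dtype=bool)  # hypothetical: all 14 slots are image tokens

try:
    # Assigning 3577 feature rows into 14 masked slots fails.
    input_embeds[image_token_mask] = image_features
except ValueError as e:
    print("shape mismatch:", e)
```

The exact error text depends on the assignment path, but the first dimension (3577 vs. 14) is what triggers the ValueError reported in the issue.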
Fix

Add _calc_image_feature_range(), which uses mm_positions to compute
the slice of the image features that corresponds to the prefill range,
and applies that slice after extract_vision_features().
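A hedged sketch of what such a helper could look like (all names are illustrative, not the actual FastDeploy implementation): mm_positions is assumed to be a list of (offset, length) pairs marking image-token spans inside input_ids, and the returned range indexes rows of the stacked image_features tensor.

```python
def calc_image_feature_range(mm_positions, prefill_start, prefill_end):
    """Return the [start, end) row range of image features whose tokens
    fall inside the prefill window [prefill_start, prefill_end)."""
    feature_offset = 0  # rows consumed by images before the current one
    start = end = None
    for token_offset, length in mm_positions:
        tok_lo, tok_hi = token_offset, token_offset + length
        # Overlap of this image's token span with the prefill window.
        lo = max(tok_lo, prefill_start)
        hi = min(tok_hi, prefill_end)
        if lo < hi:
            if start is None:
                start = feature_offset + (lo - tok_lo)
            end = feature_offset + (hi - tok_lo)
        feature_offset += length
    return (start, end) if start is not None else (0, 0)
```

After extract_vision_features(), the fix would then slice image_features[start:end] so its first dimension matches the number of image tokens actually being prefilled.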
Testing