issue/1148: Add a KV cache contiguity guard to PagedAttentionPrefill #1149
Open
JoeZhang-0x000 wants to merge 1 commit into InfiniTensor:main
Conversation
Author
The current implementation is only suitable as a transitional measure for verifying that the InfiniCore PA prefill/decode path works.
Collaborator
Thank you. This operator was indeed added in a hurry, and its performance is not ideal either. We will look into an optimization plan as follow-up work.
Related Issue
Closes #1148
Changes
Added an _ensure_head_dim_contiguous helper in python/infinicore/ops/paged_attention_prefill.py that checks the last dimension (head_dim) of k_cache and v_cache for contiguity and automatically calls .contiguous() when a cache is not contiguous.
Background
The underlying paged_attention_prefill operator computes dot products along head_dim and requires the last-dimension stride of the KV cache to be 1; this is a general requirement, not a backend-specific issue. Passing non-contiguous tensors triggers a Bad Tensor Strides error, which caused 56 of 60 tests to fail. The head-dim stride of a standard vLLM KV cache view is already 1, so the normal path does not incur an extra copy.