
@Clarence-1103
Contributor

Purpose

What this PR does / why we need it?

Optimizes kvcomp performance on CUDA by modifying the CUDA SIMD code and adding a Triton hash encoder.
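The PR does not describe the encoding scheme behind the hash encoder. For reference only, a common way to turn key vectors into compact binary codes for retrieval is sign random projection (SimHash): each bit records which side of a random hyperplane the vector falls on, and the bits are packed into bytes. The sketch below is pure Python and assumes this scheme. make_hyperplanes and hash_encode are hypothetical names, not the PR's triton_hash_code API.

```python
import random
from typing import List, Sequence


def make_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> List[List[float]]:
    """Draw n_bits random Gaussian hyperplanes of size dim (seeded for reproducibility)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_encode(vec: Sequence[float], planes: List[List[float]]) -> bytes:
    """Hypothetical SimHash encoder: bit i is 1 when planes[i] . vec >= 0.

    Bits are packed little-endian within each byte, so n_bits bits need
    ceil(n_bits / 8) bytes.
    """
    n_bits = len(planes)
    out = bytearray((n_bits + 7) // 8)
    for i, plane in enumerate(planes):
        dot = sum(p * v for p, v in zip(plane, vec))
        if dot >= 0.0:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)
```

Because only the sign of each projection is kept, scaling a vector by a positive constant leaves its code unchanged, while negating it flips every bit. A GPU encoder would compute all projections as one matrix multiply instead of a Python loop.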

Modifications

Does this PR introduce any user-facing change?

Test

How was this patch tested?

1. Compared the performance of the old and new hash retrieval backends:
Old hash retrieval backend: 0.4428 s
New hash retrieval backend: 0.1712 s (about 2.6x faster)
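For context on what a hash retrieval backend computes, here is a minimal sketch. It assumes keys are stored as packed binary codes and ranked by Hamming distance to the query code; the PR does not state the metric, so this is an assumption. hamming and topk_hamming are hypothetical helpers. A CUDA backend would typically evaluate the XOR and popcount for many keys in parallel (for example with the __popc intrinsic) rather than loop in Python.

```python
from typing import List, Sequence


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length packed codes."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def topk_hamming(query: bytes, keys: Sequence[bytes], k: int) -> List[int]:
    """Indices of the k keys closest to query in Hamming distance.

    Ties are broken by key index so the result is deterministic.
    """
    order = sorted(range(len(keys)), key=lambda i: (hamming(query, keys[i]), i))
    return order[:k]
```

For example, with keys [b"\x00", b"\xff", b"\x01", b"\x03"] and query b"\x00", the distances are 0, 8, 1 and 2, so topk_hamming returns [0, 2] for k=2.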

2. Benchmarked triton_hash_code against torch_hash_code.
[Benchmark results screenshot not preserved]
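The benchmark results survive only as a screenshot, so the methodology is not visible here. As a hedged sketch of a fair comparison between two implementations such as triton_hash_code and torch_hash_code, the harness below first checks that all candidates produce the same output. It then runs warm-up calls, which also absorb one-time costs such as Triton JIT compilation, and reports the median of several timed runs. benchmark and compare are hypothetical helpers. For GPU code, each callable should synchronize the device before returning (for example with torch.cuda.synchronize()) so the timer covers kernel execution rather than just the launch.

```python
import statistics
import time
from typing import Any, Callable, Dict


def benchmark(fn: Callable[[], Any], warmup: int = 3, iters: int = 20) -> float:
    """Median wall-clock seconds of fn() over iters runs, after warmup untimed runs."""
    for _ in range(warmup):
        fn()  # untimed: JIT compilation, caches, allocator warm-up
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(candidates: Dict[str, Callable[[], Any]], **kwargs: Any) -> Dict[str, float]:
    """Check that every candidate returns the same result, then time each one."""
    results = {name: fn() for name, fn in candidates.items()}
    reference = next(iter(results.values()))
    for name, value in results.items():
        if value != reference:
            raise AssertionError(f"{name} output differs from the reference")
    return {name: benchmark(fn, **kwargs) for name, fn in candidates.items()}
```

The median is used instead of the mean so that occasional outliers, such as a scheduler hiccup, do not skew the reported time.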

3. End-to-end (E2E) test
Old: [screenshot not preserved]
New: [screenshot not preserved]

hek14 merged commit fdc31df into ModelEngine-Group:develop on Nov 28, 2025
3 checks passed
ygwpz pushed a commit that referenced this pull request Nov 28, 2025
* [fix] fix sparse attention (#397)

fix ascend attention

Co-authored-by: lijiachen19 <lijiachen19@huawei.com>

* [opt] Share Infra implementation and unify status codes (#399)

share infra module

Co-authored-by: Fang Run <Fang_Run@126.com>

* [bugfix] Fix ESA to be compatible with the latest NFSStore. (#401)

fix esa to adapt latest NFSStore

* release v0.1.0rc4 (#402)

Co-authored-by: lijiachen19 <lijiachen19@huawei.com>

* [opt] Remove unused cc impl of dramstore (#406)

remove unused cc impl of dramstore

* [Fix]remove dram docs and modify quick-start doc (#411)

* [Fix]remove dram docs and modify quick-start doc

* modify index.md

---------

Co-authored-by: t00939662 <tianxuehan@huawei.com>

* [Feature] Added performance testing tool based on the PyTest testing framework (#295)

Performance testing tool based on the PyTest testing framework.

* [Misc] Add cpp-linter.yml (#422)

* [docs]add metrics doc (#416)

* [docs]add metrics doc

* modify metrics.md

* modify metrics.md

---------

Co-authored-by: t00939662 <tianxuehan@huawei.com>

* [perf] Modify CUDA SIMD and add Triton hash encoder (#408)

* fix cpp code style

---------

Co-authored-by: Lijiachen1018 <30387633+Lijiachen1018@users.noreply.github.com>
Co-authored-by: lijiachen19 <lijiachen19@huawei.com>
Co-authored-by: Mag1c.H <hemajun815@163.com>
Co-authored-by: Fang Run <Fang_Run@126.com>
Co-authored-by: MaxWang <wangwenxin21@huawei.com>
Co-authored-by: hero0307 <tianxuehan0307@163.com>
Co-authored-by: t00939662 <tianxuehan@huawei.com>
Co-authored-by: ML <85485147+Menglths@users.noreply.github.com>
Co-authored-by: ShiXiaolei <indirashi@163.com>
sumingZero pushed a commit to sumingZero/unified-cache-management that referenced this pull request Nov 28, 2025