[fix] Adapt all sparse-attention methods to the new connector. #441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged: hek14 merged 2 commits into ModelEngine-Group:dev-ucm-v1 from wangwenxin0312:dev_sparse_fix on Dec 1, 2025.
Conversation
Branch updated: eea1826 to f1e807a
Branch updated: 5e89688 to 6276613
hek14 approved these changes on Dec 1, 2025.
Lijiachen1018 pushed a commit to Lijiachen1018/unified-cache-management that referenced this pull request on Dec 1, 2025:

…Engine-Group#441)
* sparse to adapt new connector
* Adapt the YAML configuration
ygwpz added a commit that referenced this pull request on Dec 2, 2025:
* [opt] refactor uc connector (#364)
* [Feat] Implement kv cache broadcast in MLA (#367)
  * Implement kv cache broadcast in MLA in ucm_connector
  * [Style] Change wait for broadcast into single task method
* [feature] add ucm mock connector (#375)
  * add ucm mock connector
  * fix chunk prefill bug
* [Feat] Support get launch config from yaml (#377)
  * [Feat] Support launch from config file
  * [Docs] Update documents for launch with yaml
  * [Fix] Change load only on first rank into configuration
  * [Feat] Add support for hit ratio in yaml
  * [Fix] Fix load only first rank in non mla scene
* [fix] refuse monkey patch (#383)
* [bugfix] fix gqa bug (#384)
* [bugfix] fix end == 0 bug (#385)
* [feature] optimize generate_tensor (#396)
* [Fix] fix mla bug when no broadcast in wait for save (#398)
* [feat] adapt GQA & modify config.yaml (#407)
  * adapt GQA & modify config.yaml
  * move process to UCMDirectConnector
  * modify hash function
  * init parent_block_hash_value
  * code style fixes
* [feat] Adapt vllm_ascend_0110 and Add configurable options (#415)
  * avoid type conversion in init kvcache
* [patch] seprate sparse patch (#417) (Co-authored-by: lijiachen19 <lijiachen19@huawei.com>)
* [bugfix] Support tensor parallelism across servers (#420)
* [Feat] UCM supports metrics display online via Grafana and Promethues (#414)
  * [Feat] Build metrics frame
  * [Feat] add metrics (ucm_obser.py + metrics_configs.yaml)
  * [Feat] Implementation of metrics logger on the C++ side for storing and retrieving stats
  * [Fix] Provide simple grafana and fix bugs
  * [feat] change the log position of UCM metrics
  * [fix] modify grafana.json
  * [Fix] Remove configs to examples and add liscense
  (Co-authored-by: flesher0813 <1208954694@qq.com>, hero <tianxuehan@huawei.com>)
* [feat] Merge develop to dev-ucm-v1 and fix code style (#428), which includes:
  * [fix] fix sparse attention (#397)
  * [opt] Share Infra implementation and unify status codes (#399)
  * [bugfix] Fix ESA to be compatible with the latest NFSStore (#401)
  * release v0.1.0rc4 (#402)
  * [opt] Remove unused cc impl of dramstore (#406)
  * [Fix] remove dram docs and modify quick-start doc (#411)
  * [Feature] Added performance testing tool based on the PyTest testing framework (#295)
  * [Misc] Add cpp-linter.yml (#422)
  * [docs] add metrics doc (#416)
  * [perf] Modify CUDA SIMD and add Triton hash encoder (#408)
  * fix cpp code style
* add env variable ENABLE_SPARSE (#430)
* Fix(patch): fix patch for vllm-ascend (#433) (see volcengine/verl#2564)
* [bugfix] fix accuracy problem when chunked prefill (#438)
* [bugfix] fix num_schedule-tokens=1 (#442)
  * fix num_schedule-tokens=1
  * Simplify the code
* [fix] Fix sparse patch (#444)
* [bugfix] The Metrics module uses a non-existent variable self.rank (#445)
* [Feature] Add an access bandwidth test script for ucm_connector (#418)
* [bugfix] adapt vllm0.9.1 (#446)
* [Fix] Set the multiprocessing start method of the test tool to 'spawn' and add NPU cleanup (#447)
* [fix] Adapt all sparse-attention methods to the new connector (#441)
  * sparse to adapt new connector
  * Adapt the YAML configuration
* [docs] renew docs for v1 (#448)
* set version to 0.1.0 (#450)
* [Feature] GSA adapt nfsStore (#451)
  * adapt nfsstore
  * fix codestyle

Co-authored-by: ygwpz <543529648@qq.com>, harrisonyhq <harrisonyhq@gmail.com>, qyh111 <qiuyuhao1@huawei.com>, lijiachen19 <lijiachen19@huawei.com>, sumingZero, flesher0813 <1208954694@qq.com>, Mag1c.H <hemajun815@163.com>, Fang Run <Fang_Run@126.com>, MaxWang <wangwenxin21@huawei.com>, hero0307 <tianxuehan0307@163.com>, t00939662 <tianxuehan@huawei.com>, ML, ShiXiaolei <indirashi@163.com>, zhou-haitao, zbb200819 <1130072360@qq.com>, Lijiachen1018
Purpose
What this PR does / why we need it?
Adapts all sparse-attention methods to the new connector by updating block-hash generation and layer-offset computation.
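The two updated pieces can be sketched as follows. This is a hypothetical illustration only, assuming a prefix-chained block-hash scheme and a contiguous per-layer KV layout; the names `block_hash` and `layer_offset` are not the actual UCM connector APIs.

```python
# Hypothetical sketch: chained block-hash generation and per-layer
# offset computation. Not the real UCM connector code.
import hashlib
from typing import Iterable, Optional

def block_hash(token_ids: Iterable[int], parent_hash: Optional[str] = None) -> str:
    """Hash one KV-cache block, chaining in the parent block's hash so
    identical token prefixes map to identical block IDs."""
    h = hashlib.sha256()
    h.update((parent_hash or "").encode())
    h.update(",".join(map(str, token_ids)).encode())
    return h.hexdigest()

def layer_offset(layer_idx: int, block_size: int, num_heads: int,
                 head_dim: int, elem_bytes: int = 2) -> int:
    """Byte offset of a layer's K/V slab inside one stored block,
    assuming layers are laid out contiguously (K and V per layer)."""
    per_layer = 2 * block_size * num_heads * head_dim * elem_bytes
    return layer_idx * per_layer

# Two sequences sharing the same first block produce the same first hash;
# the second block's hash depends on its parent.
h0 = block_hash(range(16))
h1 = block_hash(range(16, 32), parent_hash=h0)
```

Chaining the parent hash is what lets a shared prefix be looked up block by block in the external store without comparing raw tokens.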
Modifications
Does this PR introduce any user-facing change?
Test
How was this patch tested?

python examples/offline_inference_esa.py