wangwenxin0312 commented Sep 25, 2025

Purpose

What this PR does / why we need it?
Adds asynchronous retrieval and loading support to ESA, together with NfsStore adaptation. This enables sparse computation during the decode stage and improves TPOT (time per output token).

Modifications

Does this PR introduce any user-facing change?
unified-cache-management/ucm/ucm_sparse/retrieval: C++ implementation of retrieval
unified-cache-management/ucm/ucm_sparse/esa.py: asynchronous implementation of Example Sparse Attention (ESA) that works out of the box
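
For readers unfamiliar with the pattern, here is a minimal Python sketch of what asynchronous retrieval during decode could look like. It is not the code in this PR: the actual retrieval is implemented in C++ under `ucm_sparse/retrieval`, and every name below (`retrieve_topk_blocks`, `AsyncRetriever`, `submit`, `poll`) is hypothetical. The sketch assumes that each KV block is summarized by a single key vector, that blocks are ranked by dot product with the current query, and that a decode step keeps using the previous selection until the background retrieval finishes.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List, Optional


def retrieve_topk_blocks(query: List[float],
                         block_keys: List[List[float]],
                         k: int) -> List[int]:
    """Rank blocks by dot product with the query; return the k best indices, ascending."""
    scores = [sum(q * x for q, x in zip(query, key)) for key in block_keys]
    ranked = sorted(range(len(block_keys)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


class AsyncRetriever:
    """Hypothetical helper: runs retrieval in a worker thread so decode is not blocked."""

    def __init__(self, block_keys: List[List[float]], k: int) -> None:
        self.block_keys = block_keys
        self.k = k
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending: Optional[Future] = None
        # Assumption: until the first retrieval completes, use the first k blocks.
        self.selected: List[int] = list(range(min(k, len(block_keys))))

    def submit(self, query: List[float]) -> None:
        """Start a background retrieval for this query and return immediately."""
        self._pending = self._pool.submit(
            retrieve_topk_blocks, query, self.block_keys, self.k
        )

    def poll(self) -> List[int]:
        """Return the newest finished selection without waiting for pending work."""
        if self._pending is not None and self._pending.done():
            self.selected = self._pending.result()
            self._pending = None
        return self.selected

    def close(self) -> None:
        self._pool.shutdown(wait=True)
```

In a decode loop, one would call `submit(query)` at the start of a step and `poll()` before sparse attention, so a slow retrieval only delays when the block selection is refreshed rather than stalling token generation.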

Test

How was this patch tested?

Tested with the NFS connector, enabling the ESA feature.

wangwenxin0312 force-pushed the dev_esa_v2 branch 4 times, most recently from 1f81086 to 8052026, on September 25, 2025 13:16
hek14 merged commit a902a01 into ModelEngine-Group:develop on Sep 25, 2025
