@FangRun2
Contributor

Purpose

Add pcstore for enhanced PrefixCache performance

Modifications

  • Data is first paged out to host memory and then written to SSD asynchronously, so HBM space is freed earlier instead of waiting for the SSD write to finish.
  • Reads and writes are aggregated at block granularity, which increases the size of each SSD I/O and improves performance.
  • For MLA models, data is loaded from SSD once and shared across devices through DRAM, reducing SSD bandwidth pressure.
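
To make the three points above concrete, here is a minimal sketch in plain Python. It is not the pcstore implementation or its API: `ToyBlockStore`, `dump_async`, `load_shared`, and `demo` are hypothetical names, ordinary files stand in for the SSD, and Python `bytes` objects stand in for HBM and host buffers. It only illustrates the general pattern of copying to a host buffer, writing one block per I/O in the background, and reading a block once for several consumers.

```python
import os
import tempfile
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class ToyBlockStore:
    """Illustrative only: hypothetical names, not the pcstore API."""

    def __init__(self, root: str, workers: int = 2) -> None:
        self.root = root
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.lock = threading.Lock()
        self.shared = {}    # block_id -> bytes, host-side copy shared by all readers
        self.ssd_reads = 0  # number of reads that actually hit the file system

    def _path(self, block_id: str) -> str:
        return os.path.join(self.root, block_id + ".blk")

    def dump_async(self, block_id: str, pages: list) -> Future:
        # "Page out to host": gather the pages into one contiguous buffer.
        # In a real system the device-side pages could be released here,
        # before the SSD write has completed.
        host_buf = b"".join(pages)
        return self.pool.submit(self._write_block, block_id, host_buf)

    def _write_block(self, block_id: str, host_buf: bytes) -> int:
        # One large write per block instead of one small write per page.
        with open(self._path(block_id), "wb") as f:
            return f.write(host_buf)

    def load_shared(self, block_id: str) -> bytes:
        # Read from "SSD" only on the first request; later callers reuse
        # the host copy. The lock is held during the read to keep the
        # sketch simple, which serializes concurrent loads.
        with self.lock:
            if block_id not in self.shared:
                with open(self._path(block_id), "rb") as f:
                    self.shared[block_id] = f.read()
                self.ssd_reads += 1
            return self.shared[block_id]

    def close(self) -> None:
        self.pool.shutdown(wait=True)


def demo():
    with tempfile.TemporaryDirectory() as root:
        store = ToyBlockStore(root)
        pages = [bytes([i]) * 4096 for i in range(4)]  # four 4 KiB pages
        written = store.dump_async("block0", pages).result()
        # Eight simulated devices request the same block.
        copies = [store.load_shared("block0") for _ in range(8)]
        store.close()
        identical = all(c == b"".join(pages) for c in copies)
        return written, store.ssd_reads, identical
```

In `demo`, four 4 KiB pages are written as a single 16 KiB block, and eight simulated devices fetch that block while the file is read only once. A real store would also need pinned host buffers, eviction of the shared copies, and error handling for failed writes, all of which are omitted here.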

Test

  • ucm/store/test/e2e/pcstore_embed.py
  • ucm/store/test/e2e/pcstore_fetch.py

@mag1c-h
Contributor

mag1c-h commented Nov 22, 2025

LGTM

@mag1c-h mag1c-h merged commit b6e5f62 into ModelEngine-Group:develop Nov 22, 2025
3 checks passed
@mag1c-h mag1c-h deleted the dev_pcstore branch November 22, 2025 07:43

3 participants