[Roadmap] UCM Roadmap Q4 2025 #78

@ygwpz

UCM aims to accelerate inference on long sequences. Its approach covers three areas: looking up previously stored KVCache instead of recomputing KV in the Prefill phase, sparsifying attention in the Decode phase, and a KVCache-centric PD (Prefill-Decode) disaggregated architecture for large-scale deployments.
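The Prefill-phase idea of looking up stored KV instead of recomputing it can be illustrated with block-level prefix hashing, a technique commonly used in LLM serving engines. This is a minimal sketch under stated assumptions, not UCM's implementation: `BLOCK_SIZE`, `KVStore`, `fake_compute_kv`, and `prefill` are hypothetical names, and the stored "KV" values are placeholder strings rather than real tensors.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

BLOCK_SIZE = 4  # hypothetical block size in tokens (real systems use larger blocks)


def block_hashes(tokens: List[int], block_size: int = BLOCK_SIZE) -> List[str]:
    """Chained hash per full block: each hash depends on the whole prefix up to that block.

    Tokens in a trailing partial block are not hashed, so they are never cached here.
    """
    hashes: List[str] = []
    parent = ""
    for i in range(len(tokens) // block_size):
        block = tokens[i * block_size:(i + 1) * block_size]
        payload = parent + ":" + ",".join(map(str, block))
        parent = hashlib.sha256(payload.encode()).hexdigest()
        hashes.append(parent)
    return hashes


class KVStore:
    """Hypothetical in-memory stand-in for an external KVCache store."""

    def __init__(self) -> None:
        self._blocks: Dict[str, str] = {}

    def put(self, key: str, kv: str) -> None:
        self._blocks[key] = kv

    def get(self, key: str) -> Optional[str]:
        return self._blocks.get(key)


def fake_compute_kv(block: List[int]) -> str:
    # Placeholder for the real attention K/V computation over one block.
    return f"KV{block}"


def prefill(tokens: List[int], store: KVStore) -> Tuple[int, int]:
    """Reuse the longest cached prefix, compute the rest. Returns (reused, computed) block counts."""
    hashes = block_hashes(tokens)
    reused = 0
    # Stop at the first miss: because hashes are chained, a hit implies the whole prefix matches.
    for key in hashes:
        if store.get(key) is None:
            break
        reused += 1
    # Compute KV only for the blocks that were not found, and store them for later requests.
    for i in range(reused, len(hashes)):
        block = tokens[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
        store.put(hashes[i], fake_compute_kv(block))
    return reused, len(hashes) - reused


if __name__ == "__main__":
    store = KVStore()
    print(prefill(list(range(12)), store))                         # (0, 3): cold cache
    print(prefill(list(range(8)) + [100, 101, 102, 103], store))   # (2, 1): first two blocks reused
```

Chaining each block's hash to its parent means two requests share a cached block only when their entire preceding prefix is identical, which is what makes the stored KV safe to reuse for causal attention.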

The first version of UCM achieved the basic goal of sparsification-based acceleration for long sequences and delivered a working heterogeneous PD disaggregation example. In Q4 we will release a series of long-sequence inference acceleration features to further improve inference performance, lower inference cost, and address cases where long sequences cannot be inferred at all or are inferred too slowly.

Core

  • CacheBlend
  • Prefill KVCache Offload
  • Model Window Extrapolation
  • Sparse
    • DSA
    • GSA Optimization
    • KVComp Optimization
    • KVStar Optimization
  • PD Disaggregation
    • Heterogeneous Optimization
    • PD Scheduler
  • Store
    • Scatter Gather IO
    • GPU Direct Storage
    • NPU Direct Storage
    • localCacheStore

Others

  • Docs Optimization
  • Benchmark
    • Mooncake Trace and more datasets for PD testing
    • Benchmarks for sparse attention performance and accuracy
