[Roadmap] UCM Roadmap Q4 2025 #78

UCM aims to accelerate inference for long sequences. It covers three areas: replacing KV computation with table lookup in the Prefill phase, sparsification in the Decode phase, and a PD (Prefill-Decode) disaggregated architecture centered on KVCache for large-scale scenarios.

The first version of UCM achieved the basic goal of sparsification-based acceleration for long sequences and implemented a heterogeneous PD Disaggregation example. In Q4, we will progressively release long-sequence inference acceleration features to further improve inference performance, reduce inference costs, and address cases where long sequences either "cannot be inferred" or are "slow to infer".

### Core
- [ ] CacheBlend
- [ ] Prefill KVCache Offload
- [ ] Model Window Extrapolation
- [ ] Sparse
  - [ ] DSA
  - [ ] GSA Optimization
  - [ ] KVComp Optimization
  - [ ] KVStar Optimization
- [ ] PD Disaggregation
  - [ ] Heterogeneous Optimization
  - [ ] PD Scheduler
- [ ] Store
  - [ ] Scatter Gather IO
  - [ ] GPU Direct Storage
  - [ ] NPU Direct Storage
  - [ ] localCacheStore

### Others
- [ ] Docs Optimization
- [ ] Benchmark
  - [ ] Mooncake Trace and more datasets for PD testing
  - [ ] Benchmark for sparse performance and accuracy

