Merged
2 changes: 1 addition & 1 deletion README.md
@@ -1,7 +1,7 @@
<p align="center">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="docs/source/logos/UCM.png">
<img alt="UCM" src="docs/source/logos/UCM.png" width=70%>
<img alt="UCM" src="docs/source/logos/UCM-light.png" width=50%>
</picture>
</p>

4 changes: 4 additions & 0 deletions docs/source/_static/css/logo.css
@@ -0,0 +1,4 @@
.navbar-brand img {
max-width: 180px;
height: auto;
}
11 changes: 7 additions & 4 deletions docs/source/conf.py
@@ -39,14 +39,19 @@

html_title = project
html_theme = "sphinx_book_theme"
html_logo = "logos/UCM.png"
html_static_path = ["_static"]
html_css_files = ["css/logo.css"]
html_theme_options = {
"path_to_docs": "docs/source",
"repository_url": "https://github.com/ModelEngine-Group/unified-cache-management",
"use_repository_button": True,
"use_edit_page_button": True,
"logo": {
"image_light": "logos/UCM-light.png",
"image_dark": "logos/UCM-dark.png",
"alt_text": "UCM",
},
}

import os
import shutil

@@ -64,6 +69,4 @@ def setup(app):
app.connect("build-finished", copy_images)


# html_static_path = ['_static']

# language = 'zh_CN'
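Because this PR removes `logos/UCM.png` and introduces light and dark variants, a small pre-build check can catch any logo path in `conf.py` that no longer exists. This is a minimal sketch, assuming the Sphinx source directory is `docs/source` and the script runs from the repository root; `referenced_logos` and `missing_logos` are hypothetical helpers, not part of Sphinx or sphinx-book-theme.

```python
from pathlib import Path


def referenced_logos(html_logo, theme_options):
    """Collect logo paths from html_logo and theme_options["logo"] (hypothetical helper)."""
    paths = [html_logo] if html_logo else []
    logo = theme_options.get("logo", {})
    # The image_light/image_dark keys follow pydata-sphinx-theme,
    # which sphinx-book-theme builds on.
    for key in ("image_light", "image_dark"):
        if logo.get(key):
            paths.append(logo[key])
    return paths


def missing_logos(source_dir, html_logo, theme_options):
    """Return referenced logo paths that are not files under source_dir."""
    root = Path(source_dir)
    return [p for p in referenced_logos(html_logo, theme_options)
            if not (root / p).is_file()]


if __name__ == "__main__":
    options = {
        "logo": {
            "image_light": "logos/UCM-light.png",
            "image_dark": "logos/UCM-dark.png",
        }
    }
    # Assumes the script is run from the repository root.
    print(missing_logos("docs/source", "logos/UCM.png", options))
```

An empty list means every referenced logo file is present.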
12 changes: 0 additions & 12 deletions docs/source/developer/index.md

This file was deleted.

@@ -1,4 +1,4 @@
# Contributing
# How to contribute
## Building and testing
It’s recommended to set up a local development environment to build and test before you submit a PR.
### Run lint locally
12 changes: 12 additions & 0 deletions docs/source/developer_guide/design/index.md
@@ -0,0 +1,12 @@
# Design
This section provides a detailed design guide of UCM features. Developers can refer to this section to see how UCM works.

:::{toctree}
:caption: Design Index
:maxdepth: 1
architecture.md
add_connector.md
nfs_connector.md
vllm_institution.md
:::

@@ -3,7 +3,7 @@ This doc shows how the UcmKVStoreBase work with KV connector api in v1 to suppor
## How it works
As described in the README, KVStoreBase helps decouple sparse algorithms from external storage. UnifiedCacheConnectorV1, a class that inherits from KVConnectorBase_V1, connects vLLM v1 to this class. The figure below shows how it works:

![uc_connector](../images/ucconn_ucmconn.png)
![uc_connector](../../images/ucconn_ucmconn.png)

The interfaces designed in KVStoreBase are similar to the KV connector API in v1, which are divided into scheduler-side methods and worker-side methods, as follows:
- scheduler methods
8 changes: 8 additions & 0 deletions docs/source/developer_guide/performance/index.md
@@ -0,0 +1,8 @@
# Performance
This section provides methods to test UCM performance.

:::{toctree}
:caption: Performance
:maxdepth: 1
performance_benchmark
:::
11 changes: 0 additions & 11 deletions docs/source/feature/index.md

This file was deleted.

9 changes: 0 additions & 9 deletions docs/source/getting-started/index.md

This file was deleted.

10 changes: 10 additions & 0 deletions docs/source/getting-started/installation/index.md
@@ -0,0 +1,10 @@
# Installation
UCM supports the following hardware platforms:

:::{toctree}
:maxdepth: 1
:caption: Index
installation.md
installation_npu.md
:::

@@ -1,4 +1,4 @@
# Installation
# GPU
This document describes how to install unified-cache-management.

## Requirements
@@ -1,4 +1,4 @@
# Installation NPU
# NPU
This document describes how to manually install unified-cache-management on Ascend NPUs.

## Requirements
1 change: 1 addition & 0 deletions docs/source/getting-started/quick_start.md
@@ -0,0 +1 @@
# Quickstart
37 changes: 30 additions & 7 deletions docs/source/index.md
@@ -1,10 +1,10 @@
# Welcome to Unified Cache Manager

:::{figure} ./logos/UCM.png
:::{figure} ./logos/UCM-light.png
:align: center
:alt: UCM
:class: no-scaled-link
:width: 70%
:width: 50%
:::

:::{raw} html
@@ -23,11 +23,34 @@

Make KVCache Great Again!

## Documentation

:::{toctree}
:caption: Getting Started
:maxdepth: 1
getting-started/quick_start
getting-started/installation/index
:::

:::{toctree}
:maxdepth: 3
getting-started/index.md
feature/index.md
developer/index.md
about.md
:caption: User Guide
:maxdepth: 1
user_guide/support_matrix/index
user_guide/features/index
user_guide/connector_guide/index
user_guide/engine_guide/index
:::

:::{toctree}
:caption: Developer Guide
:maxdepth: 1
developer_guide/design/index
developer_guide/contributing
developer_guide/performance/index
:::

:::{toctree}
:caption: About Us
:maxdepth: 1
about
:::
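This PR adds many toctree entries that point at new stub pages, so a quick check that every entry resolves to a file can prevent broken navigation. This is a minimal sketch, assuming MyST colon-fence toctrees exactly as written in these index files (`:::{toctree}` ... `:::`) and source files ending in `.md` or `.rst`; `toctree_entries` and `missing_entries` are hypothetical helpers, not part of Sphinx or MyST-Parser.

```python
import re
from pathlib import Path

# Matches a MyST colon-fence toctree: the ":::{toctree}" line up to a line that is only ":::".
TOCTREE_RE = re.compile(r"^:::\{toctree\}[^\n]*\n(.*?)^:::[ \t]*$", re.MULTILINE | re.DOTALL)


def toctree_entries(text):
    """Return document targets listed in every toctree block of a MyST file."""
    entries = []
    for body in TOCTREE_RE.findall(text):
        for line in body.splitlines():
            line = line.strip()
            if not line or line.startswith(":"):
                continue  # skip blank lines and options such as ":maxdepth: 1"
            titled = re.fullmatch(r".*<(.+)>", line)  # "Title <target>" form, e.g. "Base <base>"
            entries.append(titled.group(1) if titled else line)
    return entries


def missing_entries(index_file):
    """Return toctree targets in index_file that have no matching .md or .rst file."""
    index_path = Path(index_file)
    missing = []
    for entry in toctree_entries(index_path.read_text(encoding="utf-8")):
        target = index_path.parent / entry
        if target.suffix in (".md", ".rst"):
            candidates = [target]
        else:
            candidates = [Path(f"{target}{ext}") for ext in (".md", ".rst")]
        if not any(c.is_file() for c in candidates):
            missing.append(entry)
    return missing


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    for index in Path("docs/source").rglob("index.md"):
        for entry in missing_entries(index):
            print(f"{index}: missing {entry}")
```

Sphinx also warns about unresolved toctree entries during a build; a standalone check like this is just faster to run in CI.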
Binary file added docs/source/logos/UCM-dark.png
Binary file added docs/source/logos/UCM-light.png
Binary file removed docs/source/logos/UCM.png
2 changes: 2 additions & 0 deletions docs/source/user_guide/connector_guide/index.md
@@ -0,0 +1,2 @@
# Connector Guide
This section provides a guide to connectors currently supported by UCM.
2 changes: 2 additions & 0 deletions docs/source/user_guide/engine_guide/index.md
@@ -0,0 +1,2 @@
# Engine Guide
This section provides a guide to the serving engines currently supported by UCM. Users can refer to this guide to use serving engines with UCM.
@@ -69,7 +69,7 @@ CUDA_VISIBLE_DEVICES=1 vllm serve /home/models/Qwen2.5-7B-Instruct \
Make sure prefill nodes and decode nodes can connect to each other.
```bash
cd vllm-workspace/unified-cache-management/test/
python3 toy_proxy_server.py --host localhost --port 7802 --prefiller-host <prefill-node-ip> --prefiller-port 7800 --decoder-host <prefill-node-ip> --decoder-port 7801
python3 toy_proxy_server.py --host localhost --port 7802 --prefiller-host <prefill-node-ip> --prefiller-port 7800 --decoder-host <decode-node-ip> --decoder-port 7801
```
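The proxy command above points at a prefill endpoint (port 7800) and a decode endpoint (port 7801), so both must be reachable over TCP. This is a minimal sketch for checking that from each node and from the proxy host, assuming plain TCP reachability is sufficient; `can_connect` is a hypothetical helper built on the standard `socket.create_connection`, and the host values are the same placeholders used in the command.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False


if __name__ == "__main__":
    # Placeholders: substitute the real node IPs used in the proxy command.
    endpoints = {
        "prefill": ("<prefill-node-ip>", 7800),
        "decode": ("<decode-node-ip>", 7801),
    }
    for role, (host, port) in endpoints.items():
        status = "reachable" if can_connect(host, port) else "unreachable"
        print(role, host, port, status)
```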

## Testing and Benchmarking
@@ -80,7 +80,7 @@ curl http://localhost:7802/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "/home/models/Qwen2.5-7B-Instruct",
"prompt": "content": "What date is today?",
"prompt": "What date is today?",
"max_tokens": 20,
"temperature": 0
}'
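For scripted tests, the same request can be sent from Python using only the standard library. This is a minimal sketch, assuming the proxy listens on `localhost:7802` and serves the OpenAI-compatible `/v1/completions` route used in the curl example, whose response carries the generated text in `choices[0]["text"]`; `build_completion_request` and `complete` are hypothetical helpers.

```python
import json
import urllib.request


def build_completion_request(prompt, host="localhost", port=7802,
                             model="/home/models/Qwen2.5-7B-Instruct",
                             max_tokens=20, temperature=0):
    """Build a POST request equivalent to the curl example."""
    body = {
        "model": model,
        "prompt": prompt,  # a plain string, not a nested "content" field
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"http://{host}:{port}/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first choice."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["text"]


if __name__ == "__main__":
    print(complete("What date is today?"))
```

Passing `port=7805` targets the second proxy setup shown later in this guide.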
@@ -133,7 +133,7 @@ curl http://localhost:7805/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "/home/models/Qwen2.5-7B-Instruct",
"prompt": "content": "What date is today?",
"prompt": "What date is today?",
"max_tokens": 20,
"temperature": 0
}'
13 changes: 13 additions & 0 deletions docs/source/user_guide/features/index.md
@@ -0,0 +1,13 @@
# Feature Guide
This section provides a detailed usage guide of UCM features.

:::{toctree}
:maxdepth: 1
:caption: Feature Index
prefix_cache.md
pd.md
rag_cache.md
sparse/index.md
store/index.md
:::

File renamed without changes.
@@ -3,12 +3,12 @@
Attention mechanisms, especially in LLMs, are often the bottleneck in terms of latency during inference due to their computational complexity. Despite their importance in capturing contextual relationships, traditional attention requires processing all token interactions, leading to significant delays.

<p align="center">
<img alt="UCM" src="../images/attention_overhead.png" width="80%">
<img alt="UCM" src="../../../images/attention_overhead.png" width="80%">
</p>

Researchers have found that attention in LLMs is highly sparse:
<p align="center">
<img alt="UCM" src="../images/attention_sparsity.png" width="80%">
<img alt="UCM" src="../../../images/attention_sparsity.png" width="80%">
</p>

This has motivated the active development of sparse attention algorithms to address the latency issue. These algorithms reduce the number of token interactions by focusing only on the most relevant parts of the input, thereby lowering computation and memory requirements.
@@ -24,7 +24,7 @@ By utilizing UCM, researchers can efficiently implement rapid prototyping and te
### Overview
The core concept of our UCMSparse attention framework is to offload the complete Key-Value (KV) cache to a dedicated KV cache storage. We then identify the crucial KV pairs relevant to the current context, as determined by our sparse attention algorithms, and selectively load only the necessary portions of the KV cache from storage into High Bandwidth Memory (HBM). This design significantly reduces the HBM footprint while accelerating generation speed.
<p align="center">
<img alt="UCM" src="../images/sparse_attn_arch.png" width="80%">
<img alt="UCM" src="../../../images/sparse_attn_arch.png" width="80%">
</p>
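The selective-loading idea can be illustrated with a toy block-selection step: score each stored KV block against the current query, then attend only over the top-k blocks. This is a generic sketch in plain Python, not UCMSparse's actual selection algorithm; the mean-key scoring rule, the block layout (lists of key and value vectors), and the names `select_blocks` and `sparse_attention` are assumptions for illustration. In UCM the full cache would live in external storage and only the selected blocks would be loaded into HBM; here "loading" is just list indexing.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def select_blocks(query, key_blocks, k):
    """Return sorted indices of the k blocks whose mean key best matches the query.

    Scoring a block by its mean key is a simple stand-in for a real
    sparse-attention selection rule.
    """
    scores = []
    for idx, block in enumerate(key_blocks):
        dim = len(block[0])
        mean_key = [sum(key[d] for key in block) / len(block) for d in range(dim)]
        scores.append((dot(query, mean_key), idx))
    top = sorted(scores, reverse=True)[:k]
    return sorted(idx for _, idx in top)


def sparse_attention(query, key_blocks, value_blocks, k):
    """Scaled dot-product attention restricted to the selected blocks."""
    chosen = select_blocks(query, key_blocks, k)
    # Only these blocks would need to be fetched from storage.
    keys = [key for i in chosen for key in key_blocks[i]]
    values = [val for i in chosen for val in value_blocks[i]]
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, key) * scale for key in keys]
    peak = max(logits)
    weights = [math.exp(logit - peak) for logit in logits]  # numerically stable softmax
    total = sum(weights)
    dim_v = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim_v)]
```

With `k` equal to the number of blocks this reduces to ordinary full attention; smaller `k` trades accuracy for fewer blocks read from storage.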


1 change: 1 addition & 0 deletions docs/source/user_guide/features/sparse/cacheblend.md
@@ -0,0 +1 @@
# Cache Blend
1 change: 1 addition & 0 deletions docs/source/user_guide/features/sparse/esa.md
@@ -0,0 +1 @@
# ESA
1 change: 1 addition & 0 deletions docs/source/user_guide/features/sparse/gsa.md
@@ -0,0 +1 @@
# GSA
13 changes: 13 additions & 0 deletions docs/source/user_guide/features/sparse/index.md
@@ -0,0 +1,13 @@
# Sparse

:::{toctree}
:maxdepth: 1
:caption: Index
Base <base>
esa
gsa
kvcomp
kvstar
prefill_offload
cacheblend
:::
1 change: 1 addition & 0 deletions docs/source/user_guide/features/sparse/kvcomp.md
@@ -0,0 +1 @@
# KV Comp
1 change: 1 addition & 0 deletions docs/source/user_guide/features/sparse/kvstar.md
@@ -0,0 +1 @@
# KVStar
@@ -0,0 +1 @@
# Prefill Offload
1 change: 1 addition & 0 deletions docs/source/user_guide/features/store/3fs_store.md
@@ -0,0 +1 @@
# 3FS Store
1 change: 1 addition & 0 deletions docs/source/user_guide/features/store/base.md
@@ -0,0 +1 @@
# Base
1 change: 1 addition & 0 deletions docs/source/user_guide/features/store/dram_store.md
@@ -0,0 +1 @@
# DRAM Store
13 changes: 13 additions & 0 deletions docs/source/user_guide/features/store/index.md
@@ -0,0 +1,13 @@
# Store

:::{toctree}
:maxdepth: 1
:caption: Index
base
3fs_store
dram_store
vfs_store
nfs_store
nds_store
mooncake_store
:::
1 change: 1 addition & 0 deletions docs/source/user_guide/features/store/mooncake_store.md
@@ -0,0 +1 @@
# Mooncake Store
1 change: 1 addition & 0 deletions docs/source/user_guide/features/store/nds_store.md
@@ -0,0 +1 @@
# NDS Store
1 change: 1 addition & 0 deletions docs/source/user_guide/features/store/nfs_store.md
@@ -0,0 +1 @@
# NFS Store
1 change: 1 addition & 0 deletions docs/source/user_guide/features/store/vfs_store.md
@@ -0,0 +1 @@
# VFS Store
8 changes: 8 additions & 0 deletions docs/source/user_guide/support_matrix/index.md
@@ -0,0 +1,8 @@
# Support Matrix
This section provides support matrices of UCM.

:::{toctree}
:maxdepth: 1
:caption: Support Index
support
:::