Merged
36 commits
94356b5
init repo and add intergration
flesher0813 Jul 24, 2025
a6b9f08
remove impl and fix dump failed issue
flesher0813 Jul 25, 2025
b971f91
Merge pull request #11 from flesher0813/develop
flesher0813 Jul 25, 2025
aa6cb83
adapt vllm v0.9.2
flesher0813 Jul 26, 2025
c70fe56
Merge pull request #13 from flesher0813/develop
flesher0813 Jul 26, 2025
ec32418
init doc
ygwpz Jul 27, 2025
ef46630
add build doc readme and requirements
ygwpz Jul 28, 2025
516d3b8
Merge pull request #15 from ygwpz/develop
ygwpz Jul 28, 2025
c213e94
remove impl test and add uc connector test (#14)
flesher0813 Jul 28, 2025
95413be
[Feat] Add ucm dram connector and test
harrisonyhq Jul 28, 2025
2722d73
[Doc] Installation of ucm (#17)
flesher0813 Jul 29, 2025
8bed8dc
[style] Add constant MB_TO_BYTE, add default max_cache_size as 5Gb.
harrisonyhq Jul 29, 2025
f004105
[Doc] Add doc for dram connector
harrisonyhq Jul 29, 2025
1166362
[Feat] Add some arguments in config for ucm_dram
harrisonyhq Jul 29, 2025
d5f0166
[Doc] Fix some typo in document
harrisonyhq Jul 29, 2025
c9ecc5c
[Feat] Add device support for CUDA devices
harrisonyhq Jul 29, 2025
aea666c
[Fix] fix some bugs
harrisonyhq Jul 29, 2025
562ea5a
[Style] Remove MB_TO_BYTE, change config max_cache_size from MB to byte
harrisonyhq Jul 29, 2025
dfe3e14
Merge pull request #18 from harrisonyhq/develop
ygwpz Jul 30, 2025
90945f7
[Feat] Add dockerfiles
flesher0813 Jul 30, 2025
da7b0fd
[Feature] nfsstore
propanone1006 Jul 30, 2025
fa1cd50
add readme and license
ygwpz Jul 30, 2025
510ee07
Merge pull request #24 from ygwpz/develop
ygwpz Jul 30, 2025
77c3ace
Merge pull request #20 from flesher0813/develop
flesher0813 Jul 30, 2025
1622ba4
Merge pull request #23 from ModelEngine-Group/develop_nfsstore
ygwpz Jul 31, 2025
c71290c
change docs outline
ygwpz Jul 31, 2025
01e8f66
Merge pull request #32 from ygwpz/0.0.1-release
ygwpz Jul 31, 2025
11630f9
[Feature] Add Cmake build command in setup.py
harrisonyhq Jul 31, 2025
50e57c5
Merge pull request #34 from harrisonyhq/0.0.1-release
ygwpz Jul 31, 2025
6aa8309
[Fix bug] fix issue#25 issue#31 and issue#33
flesher0813 Aug 1, 2025
73cf03b
Merge pull request #30 from flesher0813/0.0.1-release
flesher0813 Aug 1, 2025
0f81cb5
[Fix][Docs] Make example runnable and add performance data (closes #3…
harrisonyhq Aug 1, 2025
381c832
[Feat] Move kv_block_size to config (#43)
harrisonyhq Aug 1, 2025
8e18e09
[feature][docs]finish nfs store and add docs (#44)
qyh111 Aug 1, 2025
6e03b9e
[doc] Add export of device type in installation;[Fix] fix version inv…
harrisonyhq Aug 1, 2025
b17e77f
add perf data in readme (#49)
ygwpz Aug 1, 2025
7 changes: 7 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
# Development Environment
.vscode/**
.idea/**
.git/**
**/build/**
**/output/**
.venv/**
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
The MIT License

Copyright (c) 2025 Huawei Technologies Co., Ltd. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
70 changes: 70 additions & 0 deletions README.md
@@ -0,0 +1,70 @@
<p align="center">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="docs/source/logos/UCM.png">
<img alt="UCM" src="docs/source/logos/UCM.png" width=70%>
</picture>
</p>

<p align="center">
| <a href="docs/source/index.md"><b>Documentation</b></a> | <a href="https://github.com/ModelEngine-Group/unified-cache-management/issues/16"><b>Roadmap</b></a> |
</p>

---

*Latest News* 🔥
- [2025/08/01] We are excited to announce the alpha release of Unified Cache Manager.

---

## Performance
The NFS connector achieves about a 4x speedup in time to first token (TTFT).

![perf](docs/source/images/nfs_performance.png)

## Overview

### Motivation
As models grow, the KV cache becomes larger and sparser, especially for long-sequence requests. A popular way to reduce GPU memory use is to offload the full KV cache to external storage and keep only a partial or compressed KV cache in GPU memory. This can also reduce GPU computation and allow longer sequences and larger batch sizes during decoding.

There are many ways to sparsify the KV cache, and recent papers point out that no single method fits all scenarios and all models. It is therefore better to build a common framework into which different sparse algorithms can be plugged, just as the KV connector does for prefix caching (PC).

### Proposed Change
![idea](docs/source/images/idea.png)

All gray boxes are existing classes in vLLM 0.9.2. Green boxes are the proposed additions. Light green boxes show future subclasses built on this framework.

SparseKVBase is the base class for the different sparse algorithms. Like the KV connector design, it hooks into a few places in the scheduler and layer.py so that sparse algorithms can perform additional loading, dumping, and computation of sparse KV blocks.

SparseKVManager provides different KV block allocation methods for different algorithms. To keep all implementations under SparseKVBase, it calls into SparseKVBase, and the actual implementation lives in the subclass for each sparse algorithm.

KVStoreBase decouples sparse algorithms from external storage. It defines how to talk to external storage, so any sparse algorithm can work with any external storage. The core concept is a block identified by an ID and addressed with an offset. This suits not only sparse algorithms but also prefix caching. KVStoreConnector connects it to the existing KVConnectorBase_V1 to provide prefix cache (PC) functionality.

NFSStore is a sample implementation that stores blocks in the local file system or, in the multi-server case, on an NFS mount point.
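
To make the "blocks identified by ID with offset" idea concrete, here is a minimal Python sketch of what such a store abstraction and a file-backed implementation might look like. The names `BlockStore`, `DirectoryBlockStore`, `dump`, `load`, and `lookup` are hypothetical and chosen for illustration only; they are not the actual KVStoreBase or NFSStore API.

```python
import os
from abc import ABC, abstractmethod


class BlockStore(ABC):
    """Hypothetical stand-in for the store abstraction: blocks addressed by ID and offset."""

    @abstractmethod
    def dump(self, block_id: str, offset: int, data: bytes) -> None:
        """Write data into the block starting at the given byte offset."""

    @abstractmethod
    def load(self, block_id: str, offset: int, length: int) -> bytes:
        """Read up to length bytes from the block starting at offset."""

    @abstractmethod
    def lookup(self, block_id: str) -> bool:
        """Report whether the block exists in the store."""


class DirectoryBlockStore(BlockStore):
    """Keeps one file per block under root (a local directory or an NFS mount point)."""

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, block_id: str) -> str:
        # Assumes block IDs are plain names without path separators.
        return os.path.join(self.root, f"{block_id}.blk")

    def dump(self, block_id: str, offset: int, data: bytes) -> None:
        path = self._path(block_id)
        # Update in place if the block file exists, otherwise create it.
        mode = "r+b" if os.path.exists(path) else "w+b"
        with open(path, mode) as f:
            f.seek(offset)
            f.write(data)

    def load(self, block_id: str, offset: int, length: int) -> bytes:
        with open(self._path(block_id), "rb") as f:
            f.seek(offset)
            return f.read(length)

    def lookup(self, block_id: str) -> bool:
        return os.path.exists(self._path(block_id))
```

Because callers only see the abstract interface, a sparse algorithm or a prefix-cache connector written against it would not need to know whether blocks live on local disk, on NFS, or elsewhere.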

LocalCachedStore can reference any store to provide a local DRAM read cache layer.
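
As a rough illustration of a read cache layered over an arbitrary store, the sketch below wraps any object exposing a `load` method with an in-memory LRU cache. `ReadableStore` and `CachedReadStore` are hypothetical names, and the `load(block_id, offset, length)` signature is an assumption rather than the real LocalCachedStore interface. The byte-denominated `max_cache_size` borrows its name from the commit titles in this PR, but its use here is illustrative.

```python
from collections import OrderedDict
from typing import Protocol


class ReadableStore(Protocol):
    """Anything that can return bytes for a (block_id, offset, length) request."""

    def load(self, block_id: str, offset: int, length: int) -> bytes: ...


class CachedReadStore:
    """LRU read cache held in host memory in front of any backing store."""

    def __init__(self, backend: ReadableStore, max_cache_size: int) -> None:
        self.backend = backend
        self.max_cache_size = max_cache_size  # capacity in bytes
        self.used = 0  # bytes currently cached
        self.cache = OrderedDict()  # (block_id, offset, length) -> bytes

    def load(self, block_id: str, offset: int, length: int) -> bytes:
        key = (block_id, offset, length)
        if key in self.cache:
            # Cache hit: mark the entry as most recently used.
            self.cache.move_to_end(key)
            return self.cache[key]
        # Cache miss: fetch from the backing store.
        data = self.backend.load(block_id, offset, length)
        if len(data) <= self.max_cache_size:
            self.cache[key] = data
            self.used += len(data)
            # Evict least recently used entries until we fit the budget.
            while self.used > self.max_cache_size:
                _, evicted = self.cache.popitem(last=False)
                self.used -= len(evicted)
        return data
```

Keeping the wrapper independent of the backend type is what lets one cache layer sit in front of a file-backed store, an NFS-backed store, or any future store.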

---

## Quick Start
Please refer to the [installation](docs/source/getting-started/installation.md) guide and the [example](docs/source/getting-started/example/dram_conn.md).

---

## Branch Policy
Unified Cache has a main branch, a develop branch, and release branches.
- **main**: the most stable branch. Only release branches are merged into it, and release tags are created on it.
- **develop**: the daily development branch. New features are merged here.
- **x.x.x-release**: each time we decide to release a new version, we check out a release branch and test on it. This branch accepts only [bugfix] changes. Once the branch passes testing, we merge it into develop and main, create the corresponding x.x.x tag on main, and finish the release.

Usually, a commit should first be merged into the develop branch only.

---

## Contributing
To contribute a feature to the Unified Cache community, first fork the repository (usually working from the develop branch), commit your changes in your fork, and then submit a pull request to the community.

---

## License

MIT License, as found in the [LICENSE](./LICENSE) file.
26 changes: 26 additions & 0 deletions docker/Dockerfile
@@ -0,0 +1,26 @@
# Set to other image if needed
FROM vllm/vllm-openai:v0.9.2

WORKDIR /workspace

# Reinstall vLLM from source in editable mode
RUN pip uninstall -y vllm && rm -rf /vllm-workspace/*
ARG VLLM_REPO=https://github.com/vllm-project/vllm.git
ARG VLLM_TAG=v0.9.2
RUN git clone --depth 1 $VLLM_REPO --branch $VLLM_TAG /vllm-workspace/vllm

# Set a different VLLM_TARGET_DEVICE or extra index if needed
ENV VLLM_USE_PRECOMPILED=1
RUN VLLM_TARGET_DEVICE=cuda pip install -v -e /vllm-workspace/vllm --extra-index=https://download.pytorch.org/whl/nightly/cu128

# Install unified-cache-management
COPY . /vllm-workspace/unified-cache-management

RUN export PLATFORM="cuda" && \
pip install -v -e /vllm-workspace/unified-cache-management

# Apply patch for vLLM
RUN cd /vllm-workspace/vllm \
&& git apply /vllm-workspace/unified-cache-management/unifiedcache/patch/vllm-adapt.patch

ENTRYPOINT ["/bin/bash"]
22 changes: 22 additions & 0 deletions docker/Dockerfile-NPU
@@ -0,0 +1,22 @@
# Set to other image if needed
FROM quay.io/ascend/vllm-ascend:v0.9.2rc1

WORKDIR /workspace

# Install unified-cache-management
COPY . /vllm-workspace/unified-cache-management

RUN export PLATFORM="ascend" && \
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/Ascend/ascend-toolkit/latest/`uname -i`-linux/devlib && \
pip install -v -e /vllm-workspace/unified-cache-management

# Apply patch for vLLM
RUN cd /vllm-workspace/vllm \
&& git apply /vllm-workspace/unified-cache-management/unifiedcache/patch/vllm-adapt.patch

# Apply patch for vLLM-Ascend
RUN cd /vllm-workspace/vllm-ascend \
&& git apply /vllm-workspace/unified-cache-management/unifiedcache/patch/vllm-ascend-adapt.patch


CMD ["/bin/bash"]
20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
21 changes: 21 additions & 0 deletions docs/README.md
@@ -0,0 +1,21 @@
# Unified Cache Manager Documentation

Live doc: Coming soon

## Build the docs

```bash
# Install dependencies.
pip install -r requirements-docs.txt

# Build the docs.
make clean
make html


# Serve the built docs locally so they can be opened in a browser
python -m http.server -d build/html/
```

Launch your browser and open:
- English version: http://localhost:8000
35 changes: 35 additions & 0 deletions docs/make.bat
@@ -0,0 +1,35 @@
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.https://www.sphinx-doc.org/
exit /b 1
)

if "%1" == "" goto help

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd
10 changes: 10 additions & 0 deletions docs/requirements-docs.txt
@@ -0,0 +1,10 @@
sphinx
sphinx-argparse
sphinx-book-theme
sphinx-copybutton
sphinx-design
sphinx-togglebutton
myst-parser
msgspec
sphinx-substitution-extensions
sphinx-intl
1 change: 1 addition & 0 deletions docs/source/about.md
@@ -0,0 +1 @@
# About Us
52 changes: 52 additions & 0 deletions docs/source/conf.py
@@ -0,0 +1,52 @@
# Configuration file for the Sphinx documentation builder.
#
# For the full list of built-in configuration values, see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information

project = 'Unified Cache Manager'
copyright = '2025, Unified Cache Manager Team'
author = 'Unified Cache Manager Team'
release = ''

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

# Copied from https://github.com/vllm-project/vllm/blob/main/docs/source/conf.py
extensions = [
    "sphinx.ext.napoleon",
    "sphinx.ext.intersphinx",
    "sphinx_copybutton",
    "sphinx.ext.autodoc",
    "sphinx.ext.autosummary",
    "myst_parser",
    "sphinxarg.ext",
    "sphinx_design",
    "sphinx_togglebutton",
    "sphinx_substitution_extensions",
]

myst_enable_extensions = ["colon_fence", "substitution"]

# templates_path = ['_templates']
exclude_patterns = []



# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

html_title = project
html_theme = 'sphinx_book_theme'
html_logo = 'logos/UCM.png'
html_theme_options = {
    'path_to_docs': 'docs/source',
    'repository_url': 'https://github.com/ModelEngine-Group/unified-cache-management',
    'use_repository_button': True,
    'use_edit_page_button': True,
}
# html_static_path = ['_static']

# language = 'zh_CN'
1 change: 1 addition & 0 deletions docs/source/developer/add_connector.md
@@ -0,0 +1 @@
# How To Add New Connector
1 change: 1 addition & 0 deletions docs/source/developer/architecture.md
@@ -0,0 +1 @@
# Architecture
10 changes: 10 additions & 0 deletions docs/source/developer/index.md
@@ -0,0 +1,10 @@
# Developer Guide

:::{toctree}
:maxdepth: 2
architecture.md
add_connector.md
nfs_connector.md
performance_benchmark.md
:::

1 change: 1 addition & 0 deletions docs/source/developer/nfs_connector.md
@@ -0,0 +1 @@
# NFS Connector
1 change: 1 addition & 0 deletions docs/source/developer/performance_benchmark.md
@@ -0,0 +1 @@
# Performance Benchmark