55 changes: 55 additions & 0 deletions docs/source/getting-started/installation.md
# Installation
This document describes how to install unified-cache-management.

## Requirements
- OS: Linux
- Python: >= 3.9, < 3.12
- GPU: compute capability 8.0 or higher (e.g., A100, A10, L4, H100, RTX 30xx/40xx series)
- CUDA 12.8+
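The quick checks below are one way to confirm a machine meets these requirements; they assume `python3`, `nvidia-smi`, and a CUDA-enabled PyTorch build are already available.
```bash
# Python must be >= 3.9 and < 3.12
python3 --version
# The driver reported here must support CUDA 12.8+
nvidia-smi
# Compute capability should print (8, 0) or higher
python3 -c "import torch; print(torch.cuda.get_device_capability())"
```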

There are two ways to install for now:
- Setup from code: first prepare the vLLM environment, then install unified-cache-management from source.
- Setup from docker: use the pre-built unified-cache-management Docker image directly.

## Setup from code

### Prepare vLLM Environment
For the sake of environment isolation and simplicity, we recommend preparing the vLLM environment by pulling the official, pre-built vLLM Docker image.
```bash
docker pull vllm/vllm-openai:v0.9.2
```
Use the following command to run your own container:
```bash
# Use `--ipc=host` to make sure the shared memory is large enough.
docker run \
--gpus all \
--network=host \
--ipc=host \
-v <path_to_your_models>:/app/model \
-v <path_to_your_storage>:/app/storage \
--name <name_of_your_container> \
-it vllm/vllm-openai:v0.9.2 bash
```
Refer to [Set up using docker](https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html#set-up-using-docker) for more information on running your own vLLM container. After installation, please refer to this [issue](https://github.com/vllm-project/vllm/issues/21702) to enable the uc_connector provided by unified-cache-management.
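Once the fix referenced in the issue above is in place, vLLM can load an external KV connector through its `--kv-transfer-config` flag. The sketch below is illustrative only: the connector name `UnifiedCacheConnectorV1` and the model path are assumptions, so check the project README for the exact values.
```bash
# Illustrative launch sketch; the connector name is an assumed placeholder,
# not a documented value.
vllm serve /app/model \
    --kv-transfer-config \
    '{"kv_connector": "UnifiedCacheConnectorV1", "kv_role": "kv_both"}'
```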

### Build from source code
Run the following commands to install unified-cache-management:
```bash
git clone --depth 1 --branch develop https://github.com/ModelEngine-Group/unified-cache-management.git
cd unified-cache-management
pip install -v -e .
cd ..
```
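To confirm the editable install succeeded, you can query pip for the package; the distribution name below is assumed from the repository name and may differ.
```bash
# Verify the package is registered with pip (name assumed from the repo name)
pip show unified-cache-management
```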

## Setup from docker
Download the provided pre-built Docker image, then run your container using the following command. You can add or remove Docker parameters as needed.
```bash
# Use `--ipc=host` to make sure the shared memory is large enough.
docker run --rm \
--gpus all \
--network=host \
--ipc=host \
-v <path_to_your_models>:/app/model \
-v <path_to_your_storage>:/app/storage \
--name <name_of_your_container> \
-it <image_id>
```
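If you are unsure which `<image_id>` to pass, list the locally available images first; the grep pattern below is only an assumed fragment of the image name.
```bash
# List local images and locate the unified-cache-management image ID
docker images | grep unified-cache
```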
71 changes: 71 additions & 0 deletions docs/source/getting-started/installation_npu.md
# Installation (NPU)
This document describes how to manually install unified-cache-management on Ascend NPUs.

## Requirements
- OS: Linux
- Python: >= 3.9, < 3.12
- Hardware: Ascend NPU, typically the Atlas 800 A2 series.

The current version of unified-cache-management is based on vLLM-Ascend v0.9.2rc1; refer to [vLLM-Ascend Installation Requirements](https://vllm-ascend.readthedocs.io/en/latest/installation.html#requirements) to meet the remaining requirements.
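Before proceeding, it may help to confirm the NPU driver and devices are visible on the host; this assumes the Ascend driver and the `npu-smi` tool are installed.
```bash
# Check Ascend NPU driver status and device list on the host
npu-smi info
# Python must be >= 3.9 and < 3.12
python3 --version
```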

There are two ways to install for now:
- Setup from code: first prepare the vLLM-Ascend environment, then install unified-cache-management from source.
- Setup from docker: use the pre-built unified-cache-management Docker image directly.

## Setup from code

### Prepare vLLM-Ascend Environment
For the sake of environment isolation and simplicity, we recommend preparing the vLLM-Ascend environment by pulling the official, pre-built vLLM-Ascend Docker image.
```bash
docker pull quay.io/ascend/vllm-ascend:v0.9.2rc1
```
Use the following command to run your own container:
```bash
# Update DEVICE according to your device (/dev/davinci[0-7])
export DEVICE=/dev/davinci7
# Update the vllm-ascend image
export IMAGE=quay.io/ascend/vllm-ascend:v0.9.2rc1
docker run --rm \
--name vllm-ascend-env \
--device $DEVICE \
--device /dev/davinci_manager \
--device /dev/devmm_svm \
--device /dev/hisi_hdc \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /root/.cache:/root/.cache \
-it $IMAGE bash
```
The vLLM and vLLM-Ascend source code is located in /vllm-workspace; refer to [vLLM-Ascend Installation](https://vllm-ascend.readthedocs.io/en/latest/installation.html) for more information. After installation, please refer to this [vllm issue](https://github.com/vllm-project/vllm/issues/21702) and this [vllm-ascend issue](https://github.com/vllm-project/vllm-ascend/issues/2057) to enable the uc_connector provided by unified-cache-management.
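Inside the container, a quick sanity check that the stack imports cleanly might look like the following; it assumes the image ships vLLM v0.9.2 and a matching `torch_npu` build.
```bash
# Sanity-check vLLM and the Ascend PyTorch plugin inside the container
python3 -c "import vllm; print(vllm.__version__)"
python3 -c "import torch_npu; print(torch_npu.__version__)"
```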

### Build from source code
Run the following commands to install unified-cache-management:
```bash
git clone --depth 1 --branch develop https://github.com/ModelEngine-Group/unified-cache-management.git
cd unified-cache-management
pip install -v -e .
cd ..
```

## Setup from docker
Download the provided pre-built Docker image, then run your container using the following command. You can add or remove Docker parameters as needed.
```bash
# Update DEVICE according to your device (/dev/davinci[0-7])
export DEVICE=/dev/davinci7
docker run --rm \
--device $DEVICE \
--device /dev/davinci_manager \
--device /dev/devmm_svm \
--device /dev/hisi_hdc \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /root/.cache:/root/.cache \
-v <path_to_your_models>:/app/model \
-v <path_to_your_storage>:/app/storage \
--name <name_of_your_container> \
-it <image_id> bash
```