18 changes: 11 additions & 7 deletions README.md
@@ -50,11 +50,9 @@ And the request throughput of TurboMind is 30% higher than vLLM.

### Installation

-Below are quick steps for installation:
+Install lmdeploy with pip (python 3.8+) or [from source](./docs/en/build.md)

```shell
-conda create -n lmdeploy python=3.10 -y
-conda activate lmdeploy
pip install lmdeploy
```
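
As a quick sanity check that the installation worked (a minimal smoke test, not part of the original steps):

```shell
# Should exit without error if the package imports cleanly
python -c "import lmdeploy"
```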

@@ -92,7 +90,15 @@ python -m lmdeploy.turbomind.chat ./workspace
> **Note**<br />
> Tensor parallelism is available for inference on multiple GPUs. Add `--tp=<num_gpu>` to the `chat` command to enable runtime TP.
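
For example, a two-GPU run would look like the following (the GPU count is illustrative):

```shell
# Enable runtime tensor parallelism across two GPUs
python -m lmdeploy.turbomind.chat ./workspace --tp=2
```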

-#### Serving
+#### Serving with gradio
+
+```shell
+python3 -m lmdeploy.serve.gradio.app ./workspace
+```
+
+![](https://github.com/InternLM/lmdeploy/assets/67539920/08d1e6f2-3767-44d5-8654-c85767cec2ab)
+
+#### Serving with Triton Inference Server

Launch the inference server with:

@@ -109,11 +115,9 @@ python3 -m lmdeploy.serve.client {server_ip_address}:33337
or via the webui:

```shell
-python3 -m lmdeploy.app {server_ip_address}:33337
+python3 -m lmdeploy.serve.gradio.app {server_ip_address}:33337
```

-![](https://github.com/InternLM/lmdeploy/assets/67539920/08d1e6f2-3767-44d5-8654-c85767cec2ab)
-
For the deployment of other supported models, such as LLaMA, LLaMA-2, Vicuna, and so on, see the guide [here](docs/en/serving.md)

### Inference with PyTorch
18 changes: 12 additions & 6 deletions README_zh-CN.md
@@ -51,9 +51,9 @@ TurboMind's output token throughput exceeds 2000 token/s; overall, compared with DeepSpeed

### Installation

+Install LMDeploy with pip (python 3.8+), or [build from source](./docs/zh_cn/build.md)
+
```shell
-conda create -n lmdeploy python=3.10 -y
-conda activate lmdeploy
pip install lmdeploy
```

@@ -90,7 +90,15 @@ python3 -m lmdeploy.turbomind.chat ./workspace
> **Note**<br />
> Tensor parallelism can be used to run inference on multiple GPUs. Add `--tp=<num_gpu>` to the `chat` command to enable runtime TP.

-#### Deploying the inference service
+#### Launching the gradio server
+
+```shell
+python3 -m lmdeploy.serve.gradio.app ./workspace
+```
+
+![](https://github.com/InternLM/lmdeploy/assets/67539920/08d1e6f2-3767-44d5-8654-c85767cec2ab)
+
+#### Deploying the inference service via container

Launch the inference service with the following command:

@@ -107,11 +115,9 @@ python3 -m lmdeploy.serve.client {server_ip_address}:33337
You can also chat via the WebUI:

```shell
-python3 -m lmdeploy.app {server_ip_address}:33337
+python3 -m lmdeploy.serve.gradio.app {server_ip_address}:33337
```

-![](https://github.com/InternLM/lmdeploy/assets/67539920/08d1e6f2-3767-44d5-8654-c85767cec2ab)
-
For the deployment of other supported models, such as LLaMA, LLaMA-2, Vicuna, and so on, see the guide [here](docs/zh_cn/serving.md)

### Inference with PyTorch
26 changes: 26 additions & 0 deletions docs/en/build.md
@@ -0,0 +1,26 @@
## Build from source

- Make sure the local gcc version is no less than 9, which can be confirmed with `gcc --version`.
- Install the packages needed for compiling and running:
```shell
pip install -r requirements.txt
```
- Install [nccl](https://docs.nvidia.com/deeplearning/nccl/install-guide/index.html) and set the environment variables:
```shell
export NCCL_ROOT_DIR=/path/to/nccl/build
export NCCL_LIBRARIES=/path/to/nccl/build/lib
```
- Install rapidjson
- Install openmpi; building from source is recommended:
```shell
wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.5.tar.gz
tar -xzf openmpi-*.tar.gz && cd openmpi-*
./configure --with-cuda
make -j$(nproc)
make install
```
- Build and install lmdeploy:
```shell
mkdir build && cd build
sh ../generate.sh
```
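
Taken together, the steps above amount to roughly the script below. It is a sketch under stated assumptions: the NCCL paths are placeholders, `sudo` and the pinned OpenMPI version are assumptions, rapidjson is left to your system package manager, and `generate.sh` is invoked exactly as documented above.

```shell
#!/bin/bash
set -e  # abort on the first failing step

# 1. Python dependencies for compiling and running
pip install -r requirements.txt

# 2. Point the build at your NCCL installation (placeholder paths)
export NCCL_ROOT_DIR=/path/to/nccl/build
export NCCL_LIBRARIES=/path/to/nccl/build/lib

# 3. Build OpenMPI from source with CUDA support
wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.5.tar.gz
tar -xzf openmpi-4.1.5.tar.gz && cd openmpi-4.1.5
./configure --with-cuda
make -j$(nproc)
sudo make install
cd ..

# 4. Configure and build lmdeploy
mkdir -p build && cd build
sh ../generate.sh
```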
26 changes: 26 additions & 0 deletions docs/zh_cn/build.md
@@ -0,0 +1,26 @@
### Build from source

- Make sure the gcc version on the host machine is no less than 9, which can be confirmed with `gcc --version`.
- Install the packages needed for compiling and running:
```shell
pip install -r requirements.txt
```
- Install [nccl](https://docs.nvidia.com/deeplearning/nccl/install-guide/index.html) and set the environment variables:
```shell
export NCCL_ROOT_DIR=/path/to/nccl/build
export NCCL_LIBRARIES=/path/to/nccl/build/lib
```
- Install rapidjson
- Install openmpi; building from source is recommended:
```shell
wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.5.tar.gz
tar -xzf openmpi-*.tar.gz && cd openmpi-*
./configure --with-cuda
make -j$(nproc)
make install
```
- Build and install lmdeploy:
```shell
mkdir build && cd build
sh ../generate.sh
```
169 changes: 0 additions & 169 deletions lmdeploy/app.py

This file was deleted.

1 change: 1 addition & 0 deletions lmdeploy/serve/gradio/__init__.py
@@ -0,0 +1 @@
# Copyright (c) OpenMMLab. All rights reserved.