Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[cherry pick to 0.2.0] PR 418 419 #423

Merged
merged 9 commits into from
Apr 7, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 22 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -35,12 +35,29 @@ We consider deploying deep learning inference service online to be a user-facing
<h2 align="center">Installation</h2>

We highly recommend you to run Paddle Serving in Docker, please visit [Run in Docker](https://github.com/PaddlePaddle/Serving/blob/develop/doc/RUN_IN_DOCKER.md)
```
# Run CPU Docker
docker pull hub.baidubce.com/paddlepaddle/serving:0.2.0
docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.2.0
docker exec -it test bash
```
```
# Run GPU Docker
nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
nvidia-docker exec -it test bash
```

```shell
pip install paddle-serving-client
pip install paddle-serving-server
pip install paddle-serving-client
pip install paddle-serving-server # CPU
pip install paddle-serving-server-gpu # GPU
```

You may need to use a domestic mirror source (in China, you can use the Tsinghua mirror source, add `-i https://pypi.tuna.tsinghua.edu.cn/simple` to pip command) to speed up the download.

Client package support Centos 7 and Ubuntu 18, or you can use HTTP service without install client.

<h2 align="center">Quick Start Example</h2>

### Boston House Price Prediction model
Expand Down Expand Up @@ -128,6 +145,7 @@ curl -H "Content-Type:application/json" -X POST -d '{"words": "我爱北京天
- **Description**:
``` shell
Image classification trained with Imagenet dataset. A label and corresponding probability will be returned.
Note: This demo needs paddle-serving-server-gpu.
```

- **Download Servable Package**:
Expand Down Expand Up @@ -243,6 +261,8 @@ curl -H "Content-Type:application/json" -X POST -d '{"url": "https://paddle-serv

### About Efficiency
- [How to profile Paddle Serving latency?](python/examples/util)
- [How to optimize performance?(Chinese)](doc/MULTI_SERVICE_ON_ONE_GPU_CN.md)
- [Deploy multi-services on one GPU(Chinese)](doc/PERFORMANCE_OPTIM_CN.md)
- [CPU Benchmarks(Chinese)](doc/BENCHMARKING.md)
- [GPU Benchmarks(Chinese)](doc/GPU_BENCHMARKING.md)

Expand Down
22 changes: 21 additions & 1 deletion README_CN.md
Original file line number Diff line number Diff line change
Expand Up @@ -37,11 +37,28 @@ Paddle Serving 旨在帮助深度学习开发者轻易部署在线预测服务

强烈建议您在Docker内构建Paddle Serving,请查看[如何在Docker中运行PaddleServing](doc/RUN_IN_DOCKER_CN.md)

```
# 启动 CPU Docker
docker pull hub.baidubce.com/paddlepaddle/serving:0.2.0
docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.2.0
docker exec -it test bash
```
```
# 启动 GPU Docker
nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
nvidia-docker exec -it test bash
```
```shell
pip install paddle-serving-client
pip install paddle-serving-server
pip install paddle-serving-server # CPU
pip install paddle-serving-server-gpu # GPU
```

您可能需要使用国内镜像源(例如清华源, 在pip命令中添加`-i https://pypi.tuna.tsinghua.edu.cn/simple`)来加速下载。

客户端安装包支持Centos 7和Ubuntu 18,或者您可以使用HTTP服务,这种情况下不需要安装客户端。

<h2 align="center">快速启动示例</h2>

<h3 align="center">波士顿房价预测</h3>
Expand Down Expand Up @@ -167,6 +184,7 @@ curl -H "Content-Type:application/json" -X POST -d '{"words": "我爱北京天
- **介绍**:
``` shell
图像分类模型由Imagenet数据集训练而成,该服务会返回一个标签及其概率
注意:本示例需要安装paddle-serving-server-gpu
```

- **下载服务包**:
Expand Down Expand Up @@ -283,6 +301,8 @@ curl -H "Content-Type:application/json" -X POST -d '{"url": "https://paddle-serv

### 关于Paddle Serving性能
- [如何测试Paddle Serving性能?](python/examples/util/)
- [如何优化性能?](doc/MULTI_SERVICE_ON_ONE_GPU_CN.md)
- [在一张GPU上启动多个预测服务](doc/PERFORMANCE_OPTIM_CN.md)
- [CPU版Benchmarks](doc/BENCHMARKING.md)
- [GPU版Benchmarks](doc/GPU_BENCHMARKING.md)

Expand Down
14 changes: 14 additions & 0 deletions doc/MULTI_SERVICE_ON_ONE_GPU_CN.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
# 单卡多模型预测服务

当客户端发送的请求数并不频繁的情况下,会造成服务端机器计算资源尤其是GPU资源的浪费,这种情况下,可以在服务端启动多个预测服务来提高资源利用率。Paddle Serving支持在单张显卡上部署多个预测服务,使用时只需要在启动单个服务时通过--gpu_ids参数将服务与显卡进行绑定,这样就可以将多个服务都绑定到同一张卡上。

例如:

```shell
python -m paddle_serving_server_gpu.serve --model bert_seq20_model --port 9292 --gpu_ids 0
python -m paddle_serving_server_gpu.serve --model ResNet50_vd_model --port 9393 --gpu_ids 0
```

在卡0上,同时部署了bert示例和iamgenet示例。

**注意:** 单张显卡内部进行推理计算时仍然为串行计算,这种方式是为了减少server端显卡的空闲时间。
13 changes: 13 additions & 0 deletions doc/PERFORMANCE_OPTIM_CN.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
# 性能优化

由于模型结构的不同,在执行预测时不同的预测对计算资源的消耗也不相同,对于在线的预测服务来说,对计算资源要求较少的模型,通信的时间成本占比就会较高,称为通信密集型服务,对计算资源要求较多的模型,推理计算的时间成本较高,称为计算密集型服务。对于这两种服务类型,可以根据实际需求采取不同的方式进行优化

对于一个预测服务来说,想要判断属于哪种类型,最简单的方法就是看时间占比,Paddle Serving提供了[Timeline工具](../python/examples/util/README_CN.md),可以直观的展现预测服务中各阶段的耗时。

对于通信密集型的预测服务,可以将请求进行聚合,在对延时可以容忍的限度内,将多个预测请求合并成一个batch进行预测。

对于计算密集型的预测服务,可以使用GPU预测服务代替CPU预测服务,或者增加GPU预测服务的显卡数量。

在相同条件下,Paddle Serving提供的HTTP预测服务的通信时间是大于RPC预测服务的,因此对于通信密集型的服务请优先考虑使用RPC的通信方式。

对于模型较大,预测服务内存或显存占用较多的情况,可以通过将--mem_optim选项设置为True来开启内存/显存优化。
4 changes: 2 additions & 2 deletions python/examples/util/show_profile.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,15 +10,15 @@
def prase(line):
profile_list = line.split(" ")
num = len(profile_list)
for idx in range(num / 2):
for idx in range(int(num / 2)):
profile_0_list = profile_list[idx * 2].split(":")
profile_1_list = profile_list[idx * 2 + 1].split(":")
if len(profile_0_list[0].split("_")) == 2:
name = profile_0_list[0].split("_")[0]
else:
name = profile_0_list[0].split("_")[0] + "_" + profile_0_list[
0].split("_")[1]
cost = long(profile_1_list[1]) - long(profile_0_list[1])
cost = int(profile_1_list[1]) - int(profile_0_list[1])
if name not in time_dict:
time_dict[name] = cost
else:
Expand Down