
Ernie3.0 python deploy dev #2077

Merged: 30 commits merged into PaddlePaddle:develop, May 12, 2022

Conversation

yeliang2258 (Contributor):

PR types

Performance optimization

PR changes

Others

Description

Ernie3.0 python deploy script

ZeyuChen self-assigned this May 8, 2022
ZeyuChen self-requested a review May 8, 2022 15:31
@@ -0,0 +1,38 @@
# Ernie-3.0 Python deployment notes
Member:

Add a new deploy directory level.

Contributor (author):

Done

- onnxruntime-gpu >= 1.10.0
- paddleinference-trt
- onnx >= 1.10.0
- paddle2onnx develop version
Member:

paddlenlp will bundle paddle2onnx; be sure the release versions match, so there is no need to list this entry.

Contributor (author):

Done

### Run commands
1. CPU non-quantized model
```
python infer.py --model_path tnews/pruned_fp32/float32 --device ‘cpu’
Member:

The quotation marks around cpu mix full-width and half-width characters:
--device ‘cpu’ -> --device 'cpu'

Contributor (author):

Done

@@ -0,0 +1,272 @@
# C#opyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Member:

Typo in Copyright.

Contributor (author):

Done

@@ -0,0 +1,148 @@
# C#opyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Member:

Typo

Contributor (author):

Done

tokenizer=tokenizer,
is_test=False)
dev_ds = dev_ds.map(trans_func, lazy=False)
batchify_fn = lambda samples, fn=Tuple(
Member:

Discussed with liqi; this needs to be replaced with a version that does not use a batchify function.

yeliang2258 requested a review from ZeyuChen May 10, 2022 11:18
@@ -0,0 +1,69 @@
# Ernie-3.0 Python deployment guide
Member:

ERNIE 3.0, with no hyphen.

Contributor (author):

Done

# Ernie-3.0 Python deployment guide

## Installation
Ernie-3.0 deployment covers two cases, cpu and gpu; please install the dependencies that match your deployment environment.
Member:

ERNIE 3.0
cpu -> CPU
gpu -> GPU

Member:

Capitalization should be consistent.

Contributor (author):

Done

### GPU
Before deploying on GPU, first make sure the machine has CUDA11.04 and CUDNN8.2+ installed, then install the required dependencies with the following command
```
pip install -r requirement_gpu.txt
Member:

requirements_cpu.txt
requirements_gpu.txt

Member:

With an s.

Contributor (author):

Done

```
pip install -r requirement_gpu.txt
```
On GPU hardware with compute capability above 7.0, such as the T4, FP16 or Int8 quantized inference acceleration additionally requires installing TensorRT and PaddleInference-TRT; for specific hardware and precision support see: [GPU compute capability and supported precision table](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-840-ea/support-matrix/index.html#hardware-precision-matrix)
Member:

Write this as 计算能力(Compute Capability).

Contributor (author):

Done

|--model_path | Directory path containing the Paddle model|
|--device | Deployment device, ‘cpu’ or ‘gpu’|
|--batch_size | Batch size used for testing|
|--enable_fp16 | Whether to use FP16 for acceleration |
Member:

use_fp16

Contributor (author):

Done

|--task_name | Task name, defaults to tnews|
|--model_path | Directory path containing the Paddle model|
|--device | Deployment device, ‘cpu’ or ‘gpu’|
|--batch_size | Batch size used for testing|
Member:

Keep this consistent with the table above; is there no default value?

Contributor (author):

Done

|--model_path | Path of the Paddle model used for inference|
|--batch_size | Batch size used for testing, defaults to 32|
|--perf | Whether to benchmark performance |
|--enable_quantize | Whether to enable ONNX FP32 dynamic quantization for acceleration |
Member:

No need to expose ONNX here; change "whether to enable ONNX FP32 dynamic quantization for acceleration" to "whether to use dynamic quantization for acceleration".

Contributor (author):

Done

Parameter description:
| Parameter | Description |
|----------|--------------|
|--task_name | Task name, defaults to tnews|
Member:

What other options are available? Do the other options also run successfully?

enable_quantize=args.enable_quantize,
collect_shape=args.collect_shape,
num_threads=args.num_threads)
if args.collect_shape:
Member:

Add some comments.

Contributor (author):

Done

args.enable_fp16 = False
args.collect_shape = False
if args.device == 'gpu':
args.num_threads = 10
Member:

Set this automatically based on the number of cores of the current hardware to get the speedup.
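What that could look like (a minimal sketch of the suggested change, assuming the script's argparse setup; names are illustrative):

```
import argparse
import os

parser = argparse.ArgumentParser()
# Derive the default from the current hardware instead of hard-coding 10.
parser.add_argument(
    "--num_threads",
    type=int,
    default=os.cpu_count() or 1,  # all logical cores; fall back to 1
    help="Number of CPU threads used for inference.")
args = parser.parse_args()
```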

@@ -0,0 +1,72 @@
# Ernie3.0 Python deployment guide
Member:

Ernie3.0 -> ERNIE 3.0 Python deployment

Globally: Ernie 3.0 -> ERNIE 3.0

pip install -r requirements_cpu.txt
```
### GPU
Before deploying on GPU, first make sure the machine has CUDA11.04 and CUDNN8.2+ installed, then install the required dependencies with the following command
Member:

CUDA >= 11.04, cuDNN >= 8.2

Mind the standard capitalization of technical terms.

```
python infer_cpu.py --task_name tnews --model_path ./model/infer
```
On a CPU machine that supports avx512_vnni, such as Intel(R) Xeon(R) Gold 6271C or an 11th-gen or later CPU, the enable_quantize switch can be turned on to quantize the FP32 model without any data, giving a 1x to 2x speedup; the deployment command is as follows
Member:

**Note**: On CPU devices that support the avx512_vnni instruction set or Intel® DL Boost, the enable_quantize switch can be turned on to dynamically quantize the FP32 model for higher inference performance; the speedup comparison is shown in the table below:
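For reference, data-free dynamic quantization of an exported FP32 ONNX model can be done along these lines (a sketch using onnxruntime's quantization tool, which the script plausibly wraps; file names are placeholders):

```
from onnxruntime.quantization import QuantType, quantize_dynamic

# Convert the FP32 weights to INT8; dynamic quantization computes
# activation scales at runtime, so no calibration data is required.
quantize_dynamic(
    model_input="model.onnx",         # exported FP32 model (placeholder)
    model_output="model_quant.onnx",  # dynamically quantized output
    weight_type=QuantType.QInt8)
```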

|--batch_size | Batch size used for testing, defaults to 32|
|--perf | Whether to benchmark performance, off by default |
|--enable_quantize | Whether to use dynamic quantization for acceleration, off by default |
|--num_threads | Number of cpu threads, defaults to the cpu's maximum thread count |
Member:

Put the Note above at this position, and back it up with a data table.

|--batch_size | Batch size used for testing, defaults to 32|
|--use_fp16 | Whether to use FP16 for acceleration, off by default |
|--perf | Whether to benchmark performance, off by default |
|--collect_shape | Whether to auto-configure TensorRT dynamic shapes; before enabling enable_fp16 or running int8 quantized inference, first turn this option on to configure the dynamic shapes and generate shapeinfo.txt, then turn it off again; off by default |
Member:

Globally unify enable_fp16 to use_fp16 for consistency; don't switch between use and enable.
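The two-pass collect_shape workflow described in the table above maps onto the Paddle Inference API roughly as follows (a sketch, not this PR's exact code; file names are placeholders):

```
import paddle.inference as paddle_infer

config = paddle_infer.Config("model.pdmodel", "model.pdiparams")
collect_shape = True  # pass 1: run once with --collect_shape enabled

if collect_shape:
    # Record the tensor shape ranges observed at runtime into shapeinfo.txt.
    config.collect_shape_range_info("shapeinfo.txt")
else:
    # Pass 2: hand the recorded ranges to TensorRT as dynamic shapes.
    config.enable_tensorrt_engine(
        precision_mode=paddle_infer.PrecisionType.Half)
    config.enable_tuned_tensorrt_dynamic_shape("shapeinfo.txt", True)

predictor = paddle_infer.create_predictor(config)
```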

fetch_vars] = fluid.io.load_inference_model(model_dir, exe)
else:
[program, feed_var_names,
fetch_vars] = fluid.io.load_inference_model(
Member:

Use the API under paddle.static.
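The paddle.static equivalent of the deprecated fluid call looks roughly like this (a sketch; the path prefix is a placeholder):

```
import paddle

paddle.enable_static()
exe = paddle.static.Executor(paddle.CPUPlace())
# paddle.static.load_inference_model replaces fluid.io.load_inference_model;
# it takes the model path prefix (without file suffix) and an executor.
[program, feed_var_names, fetch_vars] = paddle.static.load_inference_model(
    "path/to/model_prefix", exe)
```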


def infer(self, data):
if isinstance(self.predictor,
paddle.fluid.core_avx.PaddleInferPredictor):
Member:

Why is this check needed?

Contributor (author):

Because there are two kinds of predictor here: onnxruntime and paddleinference.
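One way to avoid the isinstance check on paddle.fluid internals is to branch on a stored backend tag instead (a sketch, assuming a predictor_type string is set at construction as elsewhere in this PR; the helper methods are hypothetical):

```
class Predictor:
    def __init__(self, backend):
        # "onnxruntime" or "paddle_inference", decided when the
        # underlying session/predictor is created.
        self.predictor_type = backend

    def infer(self, data):
        # Dispatch on the tag rather than the concrete predictor class.
        if self.predictor_type == "onnxruntime":
            return self._infer_onnxruntime(data)
        return self._infer_paddle_inference(data)

    def _infer_onnxruntime(self, data):  # hypothetical helper
        raise NotImplementedError

    def _infer_paddle_inference(self, data):  # hypothetical helper
        raise NotImplementedError
```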

metric = METRIC_CLASSES[args.task_name]()
metric.reset()
for i, batch in enumerate(batches):
input_ids, segment_ids, label = batchify_fn(batch)
Member:

Gradually replace this with the non-batchify version.

@@ -0,0 +1,99 @@
#Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
Member:

There should be a space after the # here.

tokenizer=tokenizer,
is_test=False)
dev_ds = dev_ds.map(trans_func, lazy=False)
batchify_fn = lambda samples, fn=Tuple(
Member:

This should later be replaced with a non-batchify_fn version that uses the tokenizer.
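A sketch of what the tokenizer-based replacement could look like, assuming a PaddleNLP AutoTokenizer (the model name and example texts are illustrative); the tokenizer pads each batch itself, so the Tuple/Pad batchify_fn disappears:

```
import numpy as np
from paddlenlp.transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ernie-3.0-medium-zh")
texts = ["今天天气不错", "这是第二条样例文本"]  # illustrative inputs

# One call tokenizes and pads the whole batch to max_seq_len.
encoded = tokenizer(texts, max_seq_len=128, pad_to_max_seq_len=True)
input_ids = np.array([e["input_ids"] for e in encoded], dtype="int64")
token_type_ids = np.array(
    [e["token_type_ids"] for e in encoded], dtype="int64")
```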

yeliang2258 requested a review from ZeyuChen May 12, 2022 08:28
```
pip install -r requirements_gpu.txt
```
On GPU hardware with compute capability (Compute Capability) above 7.0, such as the T4, FP16 or Int8 quantized inference acceleration additionally requires installing TensorRT and PaddleInference-TRT; for compute capability (Compute Capability) and precision support see: [GPU compute capability and supported precision table](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-840-ea/support-matrix/index.html#hardware-precision-matrix)
Contributor:

Change PaddleInference-TRT to Paddle Inference.

```
On GPU hardware with compute capability (Compute Capability) above 7.0, such as the T4, FP16 or Int8 quantized inference acceleration additionally requires installing TensorRT and PaddleInference-TRT; for compute capability (Compute Capability) and precision support see: [GPU compute capability and supported precision table](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-840-ea/support-matrix/index.html#hardware-precision-matrix)
1. For TensorRT installation see the [TensorRT installation guide](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-840-ea/install-guide/index.html#overview); in brief:
(1) Download TensorRT 8.4, file name TensorRT-XXX.tar.gz, [download link](https://developer.nvidia.com/tensorrt)
Contributor:

Paddle Inference currently only supports TensorRT 8.2.

(4) pip install the matching tensorrt package from TensorRT-XXX/python
2. PaddleInference-TRT installation steps:
(1) Download the matching PaddleInference-TRT build, [PaddleInference-TRT download page](https://www.paddlepaddle.org.cn/inference/v2.3/user_guides/download_lib.html#python)
(2) pip install the downloaded PaddleInference-TRT package
Contributor:

This only covers Linux; on Windows the download location, the environment variables, and the tensorrt installation are all different.

(3) Add the lib directory to LD_LIBRARY_PATH via export LD_LIBRARY_PATH=TensorRT-XXX/lib:$LD_LIBRARY_PATH
(4) pip install the matching tensorrt package from TensorRT-XXX/python
2. PaddleInference-TRT installation steps:
(1) Download the matching PaddleInference-TRT build, [PaddleInference-TRT download page](https://www.paddlepaddle.org.cn/inference/v2.3/user_guides/download_lib.html#python)
Contributor:

Download the Paddle Inference package that matches your CUDA environment and Python version. Note that it must be a build with TensorRT support, e.g. linux-cuda11.2-cudnn8.2-trt8-gcc8.2.

Contributor (author):

Done

import onnxruntime as ort
import copy
self.predictor_type = "onnxruntime"
float_onnx_file = "model.onnx"
Contributor:

Does this really have to be written to local disk? Can't the intermediate conversion result be passed along directly?

Contributor (author):

Done
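For context, onnxruntime sessions can be built from serialized bytes, so the intermediate ONNX model does not have to be written to disk (a sketch; onnx_model_bytes stands in for whatever the converter returns):

```
import onnxruntime as ort

# InferenceSession accepts either a file path or the serialized model
# bytes, so a converted model can be passed along entirely in memory.
with open("model.onnx", "rb") as f:  # placeholder source of the bytes
    onnx_model_bytes = f.read()

sess = ort.InferenceSession(
    onnx_model_bytes, providers=["CPUExecutionProvider"])
```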

dynamic_quantize_onnx_file)
providers = ['CPUExecutionProvider']
sess_options = ort.SessionOptions()
sess_options.optimized_model_filepath = "./optimize_model.onnx"
Contributor:

Same here.

Contributor (author):

Done

pip install -r requirements_cpu.txt
```
### GPU
Before deploying on GPU, first make sure CUDA >= 11.04 and CuDNN >= 8.2 are installed, then install the required dependencies with the following command
Contributor:

Confirm whether this is 11.04 or 11.4.

Contributor (author):

11.2 is sufficient.

The model detects all entities:
entity: 玛雅 label: LOC pos: [2, 3]
entity: 华夏 label: LOC pos: [14, 15]
-----------------------------
Contributor:

Is the output printed here different from the demo in the code?

Contributor (author):

[screenshot] They are the same.

enable_quantize=False,
set_dynamic_shape=False,
num_threads=10):
file_name = model_path.split('/')[-1]
Contributor:

For Windows/Linux compatibility, don't split paths with string split; use os.path.split.
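A sketch of the portable version (the example path is illustrative):

```
import os

model_path = os.path.join("tnews", "pruned_fp32", "float32")  # example

# os.path.split handles both '/' and '\\', unlike model_path.split('/').
model_dir, file_name = os.path.split(model_path)

# os.path.splitext strips the extension; os.path.join rebuilds the prefix
# instead of concatenating with a hard-coded "/".
stem, _ = os.path.splitext(file_name)
prefix = os.path.join(model_dir, stem)
```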


def paddle_quantize_model(self, model_dir, model_file, params_file):
file_name = model_file.split('.')[0]
model = paddle.jit.load(model_dir + "/" + file_name)
Contributor:

Use os.path.join.

Contributor (author):

All the path-related code has been removed.

@jiangjiajun (Contributor):

It looks like the code and docs have not been tested on Windows. @ZeyuChen let's decide whether Windows support is necessary; if it is, it needs to be actually tested.

help="The directory or name of model.", )
parser.add_argument(
"--model_path",
default='tnews_quant_models/mse4/int8',
Member:

A parameter like this should not have a default value.
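What the suggestion amounts to (a minimal sketch): drop the baked-in path and make the flag required, so every caller must state which model is being deployed.

```
import argparse

parser = argparse.ArgumentParser()
# No developer-specific default; argparse errors out if the flag is missing.
parser.add_argument(
    "--model_path",
    type=str,
    required=True,
    help="Path prefix of the Paddle model used for inference.")
args = parser.parse_args()
```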

ZeyuChen merged commit d83be59 into PaddlePaddle:develop May 12, 2022