Ernie3.0 python deploy dev #2077

yeliang2258 · 2022-05-07T16:13:41Z

PR types

Performance optimization

PR changes

Others

Description

Ernie3.0 python deploy script

…deploy_dev

ZeyuChen · 2022-05-08T09:52:13Z

examples/language_model/ernie-3.0/python/README.md

@@ -0,0 +1,38 @@
+# Ernie-3.0 Python部署说明


新增deploy层级

ZeyuChen · 2022-05-08T09:52:55Z

examples/language_model/ernie-3.0/python/README.md

+- onnxruntime-gpu >= 1.10.0
+- paddleinference-trt
+- onnx >= 1.10.0
+- paddle2onnx develop版本


到时paddlenlp内置paddle2onnx，务必确保正式版本匹配，所以不用增加这一项。

ZeyuChen · 2022-05-08T09:53:35Z

examples/language_model/ernie-3.0/python/README.md

+### 运行指令
+1. CPU非量化模型
+```
+python infer.py --model_path tnews/pruned_fp32/float32 --device ‘cpu’


cpu这里的符号全半角有问题
--device ‘cpu’ -> --device 'cpu'

ZeyuChen · 2022-05-08T09:54:08Z

examples/language_model/ernie-3.0/python/infer.py

@@ -0,0 +1,272 @@
+# C#opyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.


Copyright有Typo

ZeyuChen · 2022-05-08T09:54:15Z

examples/language_model/ernie-3.0/python/infer_backend.py

@@ -0,0 +1,148 @@
+# C#opyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.


ZeyuChen · 2022-05-08T09:54:46Z

examples/language_model/ernie-3.0/python/infer.py

+        tokenizer=tokenizer,
+        is_test=False)
+    dev_ds = dev_ds.map(trans_func, lazy=False)
+    batchify_fn = lambda samples, fn=Tuple(


与liqi讨论，需要换成非batchify function的版本

…deploy_dev

ZeyuChen · 2022-05-10T08:50:22Z

examples/language_model/ernie-3.0/deploy/python/README.md

@@ -0,0 +1,69 @@
+# Ernie-3.0 Python部署指南


ERNIE 3.0
无横杆

ZeyuChen · 2022-05-10T08:50:40Z

examples/language_model/ernie-3.0/deploy/python/README.md

+# Ernie-3.0 Python部署指南
+
+## 安装
+Ernie-3.0的部署分为cpu和gpu两种情况，请根据你的部署环境安装对应的依赖。


ERNIE 3.0
cpu -> CPU
gpu -> GPU

大小写要保持一致

ZeyuChen · 2022-05-10T08:51:36Z

examples/language_model/ernie-3.0/deploy/python/README.md

+### GPU端
+在进行GPU部署之前请先确保机器已经安装好CUDA11.04和CUDNN8.2+，然后请使用如下指令安装所需依赖
+```
+pip install -r requirement_gpu.txt


requirements_cpu.txt
requirements_gpu.txt

ZeyuChen · 2022-05-10T11:17:33Z

examples/language_model/ernie-3.0/deploy/python/README.md

+```
+pip install -r requirement_gpu.txt
+```
+在计算能力大于7.0的GPU硬件上，比如T4，如需FP16或者Int8量化推理加速，还需安装TensorRT和PaddleInference-TRT，具体硬件和精度支持情况请参考：[GPU算力和支持精度对照表](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-840-ea/support-matrix/index.html#hardware-precision-matrix)  


计算能力（Compute Capability）

ZeyuChen · 2022-05-10T11:20:38Z

examples/language_model/ernie-3.0/deploy/python/README.md

+|--model_path | 配置包含Paddle模型的目录路径|
+|--device | 配置部署设备，可选‘cpu’或者‘gpu’|
+|--batch_size |测试的batch size大小|
+|--enable_fp16 | 是否使用FP16进行加速 |


ZeyuChen · 2022-05-10T11:22:33Z

examples/language_model/ernie-3.0/deploy/python/README.md

+|--task_name | 配置任务名称，默认tnews|
+|--model_path | 配置包含Paddle模型的目录路径|
+|--device | 配置部署设备，可选‘cpu’或者‘gpu’|
+|--batch_size |测试的batch size大小|


注意上下一致，没有默认值？

ZeyuChen · 2022-05-10T11:23:26Z

examples/language_model/ernie-3.0/deploy/python/README.md

+|--model_path | 用于推理的Paddle模型的路径|
+|--batch_size |测试的batch size大小，默认为32|
+|--perf | 是否测试性能 |
+|--enable_quantize | 是否启动ONNX FP32的动态量化进行加速 |


不用露出ONNX，
是否启动ONNX FP32的动态量化进行加速->是否使用动态量化进行加速

ZeyuChen · 2022-05-10T11:23:56Z

examples/language_model/ernie-3.0/deploy/python/README.md

+参数说明：
+| 参数 |参数说明 |
+|----------|--------------|
+|--task_name | 配置任务名称，默认tnews|


还有什么其他选项呢？换其他选项也能跑通吗？

ZeyuChen · 2022-05-10T11:24:19Z

examples/language_model/ernie-3.0/deploy/python/ernie_predictor.py

+            enable_quantize=args.enable_quantize,
+            collect_shape=args.collect_shape,
+            num_threads=args.num_threads)
+        if args.collect_shape:


补充一些注释

ZeyuChen · 2022-05-10T11:26:34Z

examples/language_model/ernie-3.0/deploy/python/ernie_predictor.py

+            args.enable_fp16 = False
+            args.collect_shape = False
+        if args.device == 'gpu':
+            args.num_threads = 10


设置为自动根据当前硬件的核数，进行加速

…deploy_dev

ZeyuChen · 2022-05-11T15:03:17Z

examples/language_model/ernie-3.0/deploy/python/README.md

@@ -0,0 +1,72 @@
+# Ernie3.0 Python部署指南


Ernie3.0 -> ERNIE 3.0 Python部署

全局的Ernie 3.0 -> ERNIE 3.0

ZeyuChen · 2022-05-11T15:03:53Z

examples/language_model/ernie-3.0/deploy/python/README.md

+pip install -r requirements_cpu.txt
+```
+### GPU端
+在进行GPU部署之前请先确保机器已经安装好CUDA11.04和CUDNN8.2+，然后请使用如下指令安装所需依赖


CUDA >= 11.04, cuDNN >= 8.2

注意专业术语的标准大小写形式

ZeyuChen · 2022-05-11T15:07:47Z

examples/language_model/ernie-3.0/deploy/python/README.md

+```
+python infer_cpu.py --task_name tnews --model_path ./model/infer
+```
+如果在支持avx512_vnni的CPU机器上，比如Intel(R) Xeon(R) Gold 6271C或11代CPU以上机器，可开启enable_quantize开关，无需数据便可对FP32模型进行量化，获得1到2倍的加速效果，具体部署指令如下


**Note**：在支持avx512_vnni指令集或Intel® DL Boost的CPU设备上，可开启enable_quantize开关对FP32模型进行动态量化以获得更高的推理性能，加速性能相比如下表所示：

ZeyuChen · 2022-05-11T15:08:04Z

examples/language_model/ernie-3.0/deploy/python/README.md

+|--batch_size |测试的batch size大小，默认为32|
+|--perf | 是否测试性能，默认关闭 |
+|--enable_quantize | 是否使用动态量化进行加速，默认关闭 |
+|--num_threads | 配置cpu的线程数，默认为cpu的最大线程数 |


将上述的Note放在这个位置，并提供数据表格支撑

ZeyuChen · 2022-05-11T15:08:35Z

examples/language_model/ernie-3.0/deploy/python/README.md

+|--batch_size |测试的batch size大小，默认为32|
+|--use_fp16 | 是否使用FP16进行加速，默认关闭 |
+|--perf | 是否测试性能，默认关闭 |
+|--collect_shape | 配置是否自动配置TensorRT的dynamic shape，开启enable_fp16或者进行int8量化推理时需要先开启此选项进行dynamic shape配置，生成shapeinfo.txt后再关闭，默认关闭 |


全局讲enable_fp16统一为use_fp16 保持统一，不要一下子use一下子enable

ZeyuChen · 2022-05-11T15:13:07Z

examples/language_model/ernie-3.0/deploy/python/ernie_predictor.py

+             fetch_vars] = fluid.io.load_inference_model(model_dir, exe)
+        else:
+            [program, feed_var_names,
+             fetch_vars] = fluid.io.load_inference_model(


使用paddle.static下的APi

ZeyuChen · 2022-05-11T15:13:51Z

examples/language_model/ernie-3.0/deploy/python/ernie_predictor.py

+
+    def infer(self, data):
+        if isinstance(self.predictor,
+                      paddle.fluid.core_avx.PaddleInferPredictor):


为什么需要这个判断？

因为这里有两种predictor，onnxruntime和paddleinference

ZeyuChen · 2022-05-11T15:14:26Z

examples/language_model/ernie-3.0/deploy/python/ernie_predictor.py

+            metric = METRIC_CLASSES[args.task_name]()
+            metric.reset()
+            for i, batch in enumerate(batches):
+                input_ids, segment_ids, label = batchify_fn(batch)


逐渐替换为非batchify版本

ZeyuChen · 2022-05-11T15:14:37Z

examples/language_model/ernie-3.0/deploy/python/infer_cpu.py

@@ -0,0 +1,99 @@
+#Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.


这里的井号后面要有一个空格

ZeyuChen · 2022-05-11T15:15:02Z

examples/language_model/ernie-3.0/deploy/python/infer_cpu.py

+        tokenizer=tokenizer,
+        is_test=False)
+    dev_ds = dev_ds.map(trans_func, lazy=False)
+    batchify_fn = lambda samples, fn=Tuple(


后面需要替换为非batchify_fn版本，使用tokenizer版本

…deploy_dev

jiangjiajun · 2022-05-12T08:51:37Z

examples/language_model/ernie-3.0/deploy/python/README.md

+```
+pip install -r requirements_gpu.txt
+```
+在计算能力（Compute Capability）大于7.0的GPU硬件上，比如T4，如需FP16或者Int8量化推理加速，还需安装TensorRT和PaddleInference-TRT，计算能力（Compute Capability）和精度支持情况请参考：[GPU算力和支持精度对照表](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-840-ea/support-matrix/index.html#hardware-precision-matrix)  


PaddleInference-TRT改成Paddle Inference

jiangjiajun · 2022-05-12T08:52:23Z

examples/language_model/ernie-3.0/deploy/python/README.md

+```
+在计算能力（Compute Capability）大于7.0的GPU硬件上，比如T4，如需FP16或者Int8量化推理加速，还需安装TensorRT和PaddleInference-TRT，计算能力（Compute Capability）和精度支持情况请参考：[GPU算力和支持精度对照表](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-840-ea/support-matrix/index.html#hardware-precision-matrix)  
+1. TensorRT安装请参考：[TensorRT安装说明](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-840-ea/install-guide/index.html#overview)，简要步骤如下：  
+    (1)下载TensorRT8.4版本,文件名TensorRT-XXX.tar.gz，[下载链接](https://developer.nvidia.com/tensorrt)  


Paddle Inference目前只适配了8.2

jiangjiajun · 2022-05-12T08:52:48Z

examples/language_model/ernie-3.0/deploy/python/README.md

+    (4)使用pip install安装TensorRT-XXX/python中对应的tensorrt安装包
+2. PaddleInference-TRT安装步骤如下：  
+    (1)下载对应版本的PaddleInference-TRT，[PaddleInference-TRT下载路径](https://www.paddlepaddle.org.cn/inference/v2.3/user_guides/download_lib.html#python)  
+    (2)使用pip install安装下载好的PaddleInference-TRT安装包


这里只考虑了Linux，Windows上的下载地址，以及环境变量tensorrt的安装都不同

jiangjiajun · 2022-05-12T08:54:46Z

examples/language_model/ernie-3.0/deploy/python/README.md

+    (3)通过export LD_LIBRARY_PATH=TensorRT-XXX/lib:$LD_LIBRARY_PATH将lib路径加入到LD_LIBRARY_PATH中  
+    (4)使用pip install安装TensorRT-XXX/python中对应的tensorrt安装包
+2. PaddleInference-TRT安装步骤如下：  
+    (1)下载对应版本的PaddleInference-TRT，[PaddleInference-TRT下载路径](https://www.paddlepaddle.org.cn/inference/v2.3/user_guides/download_lib.html#python)  


下载对应cuda环境，python版本的Paddle Inference预测库包。其中注意须下载支持tensort的预测包，如linux-cuda11.2-cudnn8.2-trt8-gcc8.2

jiangjiajun · 2022-05-12T08:58:23Z

examples/language_model/ernie-3.0/deploy/python/ernie_predictor.py

+            import onnxruntime as ort
+            import copy
+            self.predictor_type = "onnxruntime"
+            float_onnx_file = "model.onnx"


这里是否一定要写到本地，不能直接传中间转换结果吗

jiangjiajun · 2022-05-12T08:58:49Z

examples/language_model/ernie-3.0/deploy/python/ernie_predictor.py

+                                      dynamic_quantize_onnx_file)
+                providers = ['CPUExecutionProvider']
+            sess_options = ort.SessionOptions()
+            sess_options.optimized_model_filepath = "./optimize_model.onnx"


这里同理

jiangjiajun · 2022-05-12T09:02:42Z

examples/language_model/ernie-3.0/deploy/python/README.md

+pip install -r requirements_cpu.txt
+```
+### GPU端
+在进行GPU部署之前请先确保机器已经安装好CUDA >= 11.04，CuDNN >= 8.2，然后请使用如下指令安装所需依赖


确认下是11.04还是11.4

11.2就可以了

jiangjiajun · 2022-05-12T09:03:43Z

examples/language_model/ernie-3.0/deploy/python/README.md

+The model detects all entities:
+entity: 玛雅   label: LOC   pos: [2, 3]
+entity: 华夏   label: LOC   pos: [14, 15]
+-----------------------------


这里的输出打印值，跟代码里面的demo是不同的？

jiangjiajun · 2022-05-12T09:35:18Z

examples/language_model/ernie-3.0/deploy/python/ernie_predictor.py

+                 enable_quantize=False,
+                 set_dynamic_shape=False,
+                 num_threads=10):
+        file_name = model_path.split('/')[-1]


代码考虑Windows/Linux兼容性，不要用字符串split，而是os.path.split

jiangjiajun · 2022-05-12T09:35:47Z

examples/language_model/ernie-3.0/deploy/python/ernie_predictor.py

+
+    def paddle_quantize_model(self, model_dir, model_file, params_file):
+        file_name = model_file.split('.')[0]
+        model = paddle.jit.load(model_dir + "/" + file_name)


使用os.path.join

路径相关的都删除了

jiangjiajun · 2022-05-12T09:36:56Z

看起来代码和文档并未在Windows进行测试过， @ZeyuChen 我们看是否有必要支持Windows，如若有必要，需要进行实测

ZeyuChen · 2022-05-12T09:25:30Z

examples/language_model/ernie-3.0/deploy/python/infer_gpu.py

+        help="The directory or name of model.", )
+    parser.add_argument(
+        "--model_path",
+        default='tnews_quant_models/mse4/int8',


这种参数不适合设置默认值

yeliang2258 added 6 commits May 7, 2022 13:10

add python deploy scripy

a1bf878

update readme

1301f6b

remove useless file

c558bf0

udpate code

704c0bf

remove useless print

68d1e68

update dynamic quantize

283c67c

ZeyuChen self-assigned this May 8, 2022

yeliang2258 added 3 commits May 8, 2022 12:09

update daynamic quantize func

16edc76

Merge remote-tracking branch 'upstream/develop' into ernie3.0_python_…

e5073aa

…deploy_dev

mv python to deploy folder

d0a3724

ZeyuChen self-requested a review May 8, 2022 15:31

ZeyuChen reviewed May 8, 2022

View reviewed changes

yeliang2258 added 9 commits May 9, 2022 09:06

add use_inference in infer_backend

94d02ec

Merge remote-tracking branch 'upstream/develop' into ernie3.0_python_…

7270fa2

…deploy_dev

add optimize choice

641d0a1

fix doc and add infer_cpu infer_gpu

fbc7780

remove useless file

d3e5798

update doc

ef8330b

update doc for quantize model

f1b5cc1

Merge remote-tracking branch 'upstream/develop' into ernie3.0_python_…

7e3a690

…deploy_dev

fix doc

23a797a

yeliang2258 requested a review from ZeyuChen May 10, 2022 11:18

Merge remote-tracking branch 'upstream/develop' into ernie3.0_python_…

4fd26f6

…deploy_dev

ZeyuChen reviewed May 10, 2022

View reviewed changes

yeliang2258 added 3 commits May 10, 2022 12:48

update docs

b5ed9f7

fix doc and add set_dynamic_shape fuc

2116fc2

Merge remote-tracking branch 'upstream/develop' into ernie3.0_python_…

c5afd90

…deploy_dev

ZeyuChen reviewed May 11, 2022

View reviewed changes

yeliang2258 added 2 commits May 12, 2022 01:48

Merge remote-tracking branch 'upstream/develop' into ernie3.0_python_…

506e75b

…deploy_dev

update ernie_predictor and doc

6d0d575

merge remote-tracking branch 'upstream/develop' into ernie3.0_python_…

f0add51

…deploy_dev

yeliang2258 requested a review from ZeyuChen May 12, 2022 08:28

yeliang2258 added 3 commits May 12, 2022 08:42

update doc and paddle2onnx api

8f2b524

update ernie_predictor, use all cpu

95c99ee

add quantize tool doc link

761c910

jiangjiajun suggested changes May 12, 2022

View reviewed changes

update doc and use c api of paddle2onnx

7dc267b

yeliang2258 requested a review from jiangjiajun May 12, 2022 09:29

jiangjiajun suggested changes May 12, 2022

View reviewed changes

fix path bug

eb06841

ZeyuChen approved these changes May 12, 2022

View reviewed changes

ZeyuChen merged commit d83be59 into PaddlePaddle:develop May 12, 2022

ZeyuChen mentioned this pull request May 15, 2022

PaddleNLP v2.3rc Release Note Candidate #2031

Closed

		@@ -0,0 +1,272 @@
		# C#opyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.

		@@ -0,0 +1,148 @@
		# C#opyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.

		@@ -0,0 +1,99 @@
		#Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.

Ernie3.0 python deploy dev #2077

Ernie3.0 python deploy dev #2077

Conversation

yeliang2258 commented May 7, 2022

PR types

PR changes

Description

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jiangjiajun commented May 12, 2022

Choose a reason for hiding this comment