[Backend] support ipu in paddle inference backend. #437

czr-gc · 2022-10-26T06:44:36Z

PR types

Backend

Describe

Add IPU support to the Paddle Inference backend.

czr-gc · 2022-10-27T01:51:03Z

Test results:

Example tests:

The tests cover every model listed in the FastDeploy README except InceptionV3, running inference on a single image.

Test script:

import os
import re
import subprocess

model_list = {
"PPLCNet_x1_0":"https://bj.bcebos.com/paddlehub/fastdeploy/PPLCNet_x1_0_infer.tgz",
"PPLCNetV2_base":"https://bj.bcebos.com/paddlehub/fastdeploy/PPLCNetV2_base_infer.tgz",
"EfficientNetB7":"https://bj.bcebos.com/paddlehub/fastdeploy/EfficientNetB7_infer.tgz",
"EfficientNetB0_small":"https://bj.bcebos.com/paddlehub/fastdeploy/EfficientNetB0_small_infer.tgz",
"GhostNet_x1_3_ssld":"https://bj.bcebos.com/paddlehub/fastdeploy/GhostNet_x1_3_ssld_infer.tgz",
"GhostNet_x0_5_ssld":"https://bj.bcebos.com/paddlehub/fastdeploy/GhostNet_x0_5_infer.tgz",
"MobileNetV1_x0_25":"https://bj.bcebos.com/paddlehub/fastdeploy/MobileNetV1_x0_25_infer.tgz",
"MobileNetV1_ssld":"https://bj.bcebos.com/paddlehub/fastdeploy/MobileNetV1_ssld_infer.tgz",
"MobileNetV2_x0_25":"https://bj.bcebos.com/paddlehub/fastdeploy/MobileNetV2_x0_25_infer.tgz",
"MobileNetV2_ssld":"https://bj.bcebos.com/paddlehub/fastdeploy/MobileNetV2_ssld_infer.tgz",
"MobileNetV3_small_x0_35_ssld":"https://bj.bcebos.com/paddlehub/fastdeploy/MobileNetV3_small_x0_35_ssld_infer.tgz",
"MobileNetV3_large_x1_0_ssld":"https://bj.bcebos.com/paddlehub/fastdeploy/MobileNetV3_large_x1_0_ssld_infer.tgz",
"ShuffleNetV2_x0_25":"https://bj.bcebos.com/paddlehub/fastdeploy/ShuffleNetV2_x0_25_infer.tgz",
"ShuffleNetV2_x2_0":"https://bj.bcebos.com/paddlehub/fastdeploy/ShuffleNetV2_x2_0_infer.tgz",
"SqueezeNet1_1":"https://bj.bcebos.com/paddlehub/fastdeploy/SqueezeNet1_1_infer.tgz",
"PPHGNet_tiny_ssld":"https://bj.bcebos.com/paddlehub/fastdeploy/PPHGNet_tiny_ssld_infer.tgz",
"PPHGNet_base_ssld": "https://bj.bcebos.com/paddlehub/fastdeploy/PPHGNet_base_ssld_infer.tgz",
"ResNet50_vd": "https://bj.bcebos.com/paddlehub/fastdeploy/ResNet50_vd_infer.tgz",
}

for k, v in model_list.items():
    print("TESTING: {}".format(k))
    # Extract the archive base name (e.g. PPLCNet_x1_0_infer) from the URL.
    pattern = r'.*/(\w+)\.tgz$'
    model_file = re.match(pattern, v).group(1)
    # Download and unpack the model archive.
    download_cmd = f'''
    wget {v}
    tar -xvf {model_file}.tgz
    '''
    # Run the same image through infer.py on the CPU and on the IPU.
    cpu_cmd = f'''
    python infer.py --model {model_file} --image ILSVRC2012_val_00000010.jpeg --device cpu --topk 1
    '''
    ipu_cmd = f'''
    python infer.py --model {model_file} --image ILSVRC2012_val_00000010.jpeg --device ipu --topk 1
    '''
    print(subprocess.run(download_cmd, shell=True, stdout=subprocess.PIPE).stdout)
    cpu_result = subprocess.run(cpu_cmd, shell=True, stdout=subprocess.PIPE).stdout
    ipu_result = subprocess.run(ipu_cmd, shell=True, stdout=subprocess.PIPE).stdout
    # Pull the top-1 label and score out of the captured output.
    # Newlines are removed first because '.' does not match them.
    result_pattern = r'.*label_ids: (\d+).*scores: (\d*\.?\d*)'
    cpu_match = re.match(result_pattern, cpu_result.decode('utf-8').replace('\n', ''))
    ipu_match = re.match(result_pattern, ipu_result.decode('utf-8').replace('\n', ''))

    print("=============================={}==============================".format(k))
    if cpu_match and ipu_match:
        print("cpu_label: {}, cpu_score: {}".format(cpu_match.group(1), cpu_match.group(2)))
        print("ipu_label: {}, ipu_score: {}".format(ipu_match.group(1), ipu_match.group(2)))
    else:
        print("FAILED RUN")
    print("=============================={}==============================".format(k))

Test output:

==============================PPLCNet_x1_0==============================
cpu_label: 153, cpu_score: 0.612086
ipu_label: 153, ipu_score: 0.612087
==============================PPLCNet_x1_0==============================

==============================PPLCNetV2_base==============================
cpu_label: 332, cpu_score: 0.278354
ipu_label: 332, ipu_score: 0.278357
==============================PPLCNetV2_base==============================

==============================EfficientNetB7==============================
cpu_label: 332, cpu_score: 0.564357
ipu_label: 332, ipu_score: 0.564378
==============================EfficientNetB7==============================

==============================EfficientNetB0_small==============================
cpu_label: 153, cpu_score: 0.525857
ipu_label: 153, ipu_score: 0.525857
==============================EfficientNetB0_small==============================

==============================GhostNet_x1_3_ssld==============================
cpu_label: 153, cpu_score: 0.849879
ipu_label: 153, ipu_score: 0.849879
==============================GhostNet_x1_3_ssld==============================

==============================GhostNet_x0_5_ssld==============================
cpu_label: 283, cpu_score: 0.341981
ipu_label: 283, ipu_score: 0.341981
==============================GhostNet_x0_5_ssld==============================

==============================MobileNetV1_x0_25==============================
cpu_label: 153, cpu_score: 0.221087
ipu_label: 153, ipu_score: 0.221088
==============================MobileNetV1_x0_25==============================

==============================MobileNetV1_ssld==============================
cpu_label: 332, cpu_score: 0.742867
ipu_label: 332, ipu_score: 0.742867
==============================MobileNetV1_ssld==============================

==============================MobileNetV2_x0_25==============================
cpu_label: 207, cpu_score: 0.247315
ipu_label: 207, ipu_score: 0.247313
==============================MobileNetV2_x0_25==============================

==============================MobileNetV3_small_x0_35_ssld==============================
cpu_label: 153, cpu_score: 0.494442
ipu_label: 153, ipu_score: 0.494442
==============================MobileNetV3_small_x0_35_ssld==============================

==============================MobileNetV3_large_x1_0_ssld==============================
cpu_label: 153, cpu_score: 0.521042
ipu_label: 153, ipu_score: 0.521041
==============================MobileNetV3_large_x1_0_ssld==============================

==============================ShuffleNetV2_x0_25==============================
cpu_label: 259, cpu_score: 0.240480
ipu_label: 259, ipu_score: 0.240481
==============================ShuffleNetV2_x0_25==============================

==============================ShuffleNetV2_x2_0==============================
cpu_label: 153, cpu_score: 0.842726
ipu_label: 153, ipu_score: 0.842727
==============================ShuffleNetV2_x2_0==============================

==============================SqueezeNet1_1==============================
cpu_label: 338, cpu_score: 0.189432
ipu_label: 338, ipu_score: 0.189433
==============================SqueezeNet1_1==============================

==============================PPHGNet_tiny_ssld==============================
cpu_label: 153, cpu_score: 0.536040
ipu_label: 153, ipu_score: 0.536039
==============================PPHGNet_tiny_ssld==============================

==============================PPHGNet_base_ssld==============================
cpu_label: 332, cpu_score: 0.996301
ipu_label: 332, ipu_score: 0.996301
==============================PPHGNet_base_ssld==============================

==============================ResNet50_vd==============================
cpu_label: 153, cpu_score: 0.686229
ipu_label: 153, ipu_score: 0.686230
==============================ResNet50_vd==============================
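The CPU and IPU results above agree on the top-1 label and differ only in the last digits of the score. A minimal sketch of how that comparison could be automated is shown below. It parses result blocks in the format printed above; the `compare` helper and the `1e-4` tolerance are assumptions made for illustration and are not part of this PR.

```python
import re

# Two result blocks copied verbatim from the output above.
REPORT = """
==============================PPLCNet_x1_0==============================
cpu_label: 153, cpu_score: 0.612086
ipu_label: 153, ipu_score: 0.612087
==============================PPLCNet_x1_0==============================

==============================EfficientNetB7==============================
cpu_label: 332, cpu_score: 0.564357
ipu_label: 332, ipu_score: 0.564378
==============================EfficientNetB7==============================
"""

# One block: a header line followed by the cpu and ipu summary lines.
BLOCK = re.compile(
    r"=+(?P<model>\w+)=+\n"
    r"cpu_label: (?P<cpu_label>\d+), cpu_score: (?P<cpu_score>[\d.]+)\n"
    r"ipu_label: (?P<ipu_label>\d+), ipu_score: (?P<ipu_score>[\d.]+)\n"
)


def compare(report, tol=1e-4):
    """Return {model: (labels_match, abs_score_diff, within_tol)}.

    `tol` is a hypothetical threshold chosen for this sketch.
    """
    out = {}
    for m in BLOCK.finditer(report):
        diff = abs(float(m["cpu_score"]) - float(m["ipu_score"]))
        out[m["model"]] = (m["cpu_label"] == m["ipu_label"], diff, diff <= tol)
    return out


results = compare(REPORT)
for model, (same, diff, ok) in results.items():
    print(f"{model}: labels_match={same} score_diff={diff:.1e} within_tol={ok}")
```

EfficientNetB7 shows the largest gap in the results above (about 2.1e-5), so any tolerance looser than that would accept every model listed.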

benchmark

The test uses the benchmark script, with its run command changed to:

python benchmark_ppcls.py --model $model --image ILSVRC2012_val_00000010.jpeg --iter_num 2000 --backend paddle --device ipu

The benchmark covers every model listed in the README except InceptionV3. An excerpt of the output log follows:

[FastDeploy]    Running PPcls benchmark...
[Benchmark-PPcls] 1/20 ppcls_model/EfficientNetB0_small_infer ...
Total iterations: 2000
Total time of runtime: 3.46793s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.703937s.
Average time of runtime exclude warmup step: 1.72749ms.

[Benchmark-PPcls] 3/20 ppcls_model/EfficientNetB7_infer ...
Total iterations: 2000
Total time of runtime: 20.3836s.
Warmup iterations: 400
Total time of runtime in warmup step: 4.06914s.
Average time of runtime exclude warmup step: 10.1965ms.

[Benchmark-PPcls] 4/20 ppcls_model/GhostNet_x0_5_infer ...
Total iterations: 2000
Total time of runtime: 3.26153s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.6352s.
Average time of runtime exclude warmup step: 1.64145ms.

[Benchmark-PPcls] 5/20 ppcls_model/GhostNet_x1_3_ssld_infer ...
Total iterations: 2000
Total time of runtime: 3.57343s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.692799s.
Average time of runtime exclude warmup step: 1.8004ms.

[Benchmark-PPcls] 7/20 ppcls_model/MobileNetV1_ssld_infer ...
Total iterations: 2000
Total time of runtime: 2.8455s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.574721s.
Average time of runtime exclude warmup step: 1.41924ms.

[Benchmark-PPcls] 8/20 ppcls_model/MobileNetV1_x0_25_infer ...
Total iterations: 2000
Total time of runtime: 2.63379s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.518629s.
Average time of runtime exclude warmup step: 1.32198ms.

[Benchmark-PPcls] 9/20 ppcls_model/MobileNetV2_ssld_infer ...
Total iterations: 2000
Total time of runtime: 3.20334s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.61259s.
Average time of runtime exclude warmup step: 1.61922ms.

[Benchmark-PPcls] 10/20 ppcls_model/MobileNetV2_x0_25_infer ...
Total iterations: 2000
Total time of runtime: 2.93448s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.561751s.
Average time of runtime exclude warmup step: 1.48296ms.

[Benchmark-PPcls] 11/20 ppcls_model/MobileNetV3_large_x1_0_ssld_infer ...
Total iterations: 2000
Total time of runtime: 3.09113s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.614774s.
Average time of runtime exclude warmup step: 1.54772ms.

[Benchmark-PPcls] 12/20 ppcls_model/MobileNetV3_small_x0_35_ssld_infer ...
Total iterations: 2000
Total time of runtime: 2.87719s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.543467s.
Average time of runtime exclude warmup step: 1.45858ms.

[Benchmark-PPcls] 13/20 ppcls_model/PPHGNet_base_ssld_infer ...
Total iterations: 2000
Total time of runtime: 6.51754s.
Warmup iterations: 400
Total time of runtime in warmup step: 1.30042s.
Average time of runtime exclude warmup step: 3.26069ms.

[Benchmark-PPcls] 14/20 ppcls_model/PPHGNet_tiny_ssld_infer ...
Total iterations: 2000
Total time of runtime: 3.71101s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.698029s.
Average time of runtime exclude warmup step: 1.88311ms.

[Benchmark-PPcls] 15/20 ppcls_model/PPLCNetV2_base_infer ...
Total iterations: 2000
Total time of runtime: 2.87388s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.572371s.
Average time of runtime exclude warmup step: 1.43844ms.

[Benchmark-PPcls] 16/20 ppcls_model/PPLCNet_x1_0_infer ...
Total iterations: 2000
Total time of runtime: 2.88727s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.569004s.
Average time of runtime exclude warmup step: 1.44892ms.

[Benchmark-PPcls] 17/20 ppcls_model/ResNet50_vd_infer ...
Total iterations: 2000
Total time of runtime: 3.86693s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.749314s.
Average time of runtime exclude warmup step: 1.94851ms.

[Benchmark-PPcls] 18/20 ppcls_model/ShuffleNetV2_x0_25_infer ...
Total iterations: 2000
Total time of runtime: 2.76203s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.577006s.
Average time of runtime exclude warmup step: 1.36564ms.

[Benchmark-PPcls] 19/20 ppcls_model/ShuffleNetV2_x2_0_infer ...
Total iterations: 2000
Total time of runtime: 3.16924s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.640512s.
Average time of runtime exclude warmup step: 1.58046ms.

[Benchmark-PPcls] 20/20 ppcls_model/SqueezeNet1_1_infer ...
Total iterations: 2000
Total time of runtime: 2.50874s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.495713s.
Average time of runtime exclude warmup step: 1.25814ms.

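Note: these benchmark results are for PR testing only. Because the hardware will change later, the current numbers should not be taken as a performance reference.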
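The reported averages appear to be the post-warmup time divided by the post-warmup iteration count. That formula is inferred from the numbers in the log, not confirmed against the benchmark script's source. The sketch below recomputes the average for one entry copied from the log; `average_ms` is a hypothetical helper written for this illustration.

```python
import re

# One entry copied verbatim from the benchmark log above.
LOG = """
[Benchmark-PPcls] 1/20 ppcls_model/EfficientNetB0_small_infer ...
Total iterations: 2000
Total time of runtime: 3.46793s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.703937s.
Average time of runtime exclude warmup step: 1.72749ms.
"""


def average_ms(log):
    """Recompute the average per-iteration time, excluding warmup, in ms."""
    total_iters = int(re.search(r"Total iterations: (\d+)", log).group(1))
    total_s = float(re.search(r"Total time of runtime: ([\d.]+)s", log).group(1))
    warm_iters = int(re.search(r"Warmup iterations: (\d+)", log).group(1))
    warm_s = float(
        re.search(r"Total time of runtime in warmup step: ([\d.]+)s", log).group(1)
    )
    # (total time - warmup time) / (total iterations - warmup iterations)
    return (total_s - warm_s) / (total_iters - warm_iters) * 1000.0


reported_ms = float(
    re.search(r"Average time of runtime exclude warmup step: ([\d.]+)ms", LOG).group(1)
)
recomputed_ms = average_ms(LOG)
print(f"reported={reported_ms} ms, recomputed={recomputed_ms:.5f} ms")
```

For this entry the calculation is (3.46793 - 0.703937) s / 1600 iterations, which agrees with the logged 1.72749 ms to within rounding.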

jiangjiajun · 2022-10-28T03:45:53Z

fastdeploy/runtime.h

+   * \param[in] batches_per_step the number of batches per run in pipelining.
+   */
+  void EnableIpu(int device_num = 1, int micro_batch_size = 1,
+                 bool enable_pipelining = false, int batches_per_step = 1);


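Merging the UseIpu and EnableIpu interfaces into one would be more convenient, similar to how UseGpu also lets you configure the GPU's device_id.

The interface was designed this way to stay consistent with the IPU interface in Paddle Inference, which is why the two were not merged.

To keep the interface simple, IPU will use the following two user-facing interfaces for now: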

void UseIpu(int device_num = 1, int micro_batch_size = 1,
            bool enable_pipelining = false, int batches_per_step = 1);
void SetIpuConfig(bool enable_fp16 = false, int replica_num = 1,
                  float available_memory_proportion = 1.0,
                  bool enable_half_partial = false);

jiangjiajun · 2022-10-28T03:47:01Z

@leiqing1 Could you please help review the documentation changes?

czr-gc · 2022-10-28T07:13:06Z

I made a mistake while resolving the conflict with the develop branch; I have used git amend to push the correct changes.

czr-gc added 2 commits October 26, 2022 14:55

feat(ipu): add ipu support for paddle_infer backend.

1ea919c

fix(): remove unused env.

aa21ea5

czr-gc force-pushed the ipu_commit/ipu_support branch from 748a405 to aa21ea5 Compare October 26, 2022 06:55

jiangjiajun requested changes Oct 28, 2022

jiangjiajun and others added 3 commits October 28, 2022 11:47

Merge branch 'develop' into ipu_commit/ipu_support

ecd9a6a

fix(ipu): simplify user API for IPU.

ba55a88

merge develop to ipu_support.

e2fa75e

czr-gc force-pushed the ipu_commit/ipu_support branch from 2b61fdb to e2fa75e Compare October 28, 2022 07:12

czr-gc and others added 3 commits October 28, 2022 16:22

fix(cmake): fix merge conflict error in CMakeList.

542a528

Merge branch 'develop' into ipu_commit/ipu_support

f88059f

Merge branch 'develop' into ipu_commit/ipu_support

9ffaded

jiangjiajun approved these changes Oct 30, 2022

jiangjiajun merged commit ede59af into PaddlePaddle:develop Oct 30, 2022
