[Backend] support ipu in paddle inference backend. #437

Merged
merged 8 commits into from
Oct 30, 2022

@czr-gc czr-gc commented Oct 26, 2022

PR types

Backend

Describe

Add IPU support to the Paddle Inference backend.

czr-gc commented Oct 27, 2022

Test results:

Example tests:

The tests cover every model listed in the FastDeploy README except InceptionV3; each model runs inference on a single image.

  • Test script:
import os
import re
import subprocess

model_list = {
"PPLCNet_x1_0":"https://bj.bcebos.com/paddlehub/fastdeploy/PPLCNet_x1_0_infer.tgz",
"PPLCNetV2_base":"https://bj.bcebos.com/paddlehub/fastdeploy/PPLCNetV2_base_infer.tgz",
"EfficientNetB7":"https://bj.bcebos.com/paddlehub/fastdeploy/EfficientNetB7_infer.tgz",
"EfficientNetB0_small":"https://bj.bcebos.com/paddlehub/fastdeploy/EfficientNetB0_small_infer.tgz",
"GhostNet_x1_3_ssld":"https://bj.bcebos.com/paddlehub/fastdeploy/GhostNet_x1_3_ssld_infer.tgz",
"GhostNet_x0_5_ssld":"https://bj.bcebos.com/paddlehub/fastdeploy/GhostNet_x0_5_infer.tgz",
"MobileNetV1_x0_25":"https://bj.bcebos.com/paddlehub/fastdeploy/MobileNetV1_x0_25_infer.tgz",
"MobileNetV1_ssld":"https://bj.bcebos.com/paddlehub/fastdeploy/MobileNetV1_ssld_infer.tgz",
"MobileNetV2_x0_25":"https://bj.bcebos.com/paddlehub/fastdeploy/MobileNetV2_x0_25_infer.tgz",
"MobileNetV2_ssld":"https://bj.bcebos.com/paddlehub/fastdeploy/MobileNetV2_ssld_infer.tgz",
"MobileNetV3_small_x0_35_ssld":"https://bj.bcebos.com/paddlehub/fastdeploy/MobileNetV3_small_x0_35_ssld_infer.tgz",
"MobileNetV3_large_x1_0_ssld":"https://bj.bcebos.com/paddlehub/fastdeploy/MobileNetV3_large_x1_0_ssld_infer.tgz",
"ShuffleNetV2_x0_25":"https://bj.bcebos.com/paddlehub/fastdeploy/ShuffleNetV2_x0_25_infer.tgz",
"ShuffleNetV2_x2_0":"https://bj.bcebos.com/paddlehub/fastdeploy/ShuffleNetV2_x2_0_infer.tgz",
"SqueezeNet1_1":"https://bj.bcebos.com/paddlehub/fastdeploy/SqueezeNet1_1_infer.tgz",
"PPHGNet_tiny_ssld":"https://bj.bcebos.com/paddlehub/fastdeploy/PPHGNet_tiny_ssld_infer.tgz",
"PPHGNet_base_ssld": "https://bj.bcebos.com/paddlehub/fastdeploy/PPHGNet_base_ssld_infer.tgz",
"ResNet50_vd": "https://bj.bcebos.com/paddlehub/fastdeploy/ResNet50_vd_infer.tgz",
}

for k, v in model_list.items():
    print("TESTING: {}".format(k))
    # Extract the archive base name (e.g. "PPLCNet_x1_0_infer") from the URL.
    pattern = r'.*/(\w+)\.tgz$'
    model_file = re.match(pattern, v).group(1)
    download_cmd = f'''
    wget {v}
    tar -xvf {model_file}.tgz
    '''
    cpu_cmd = f'''
    python infer.py --model {model_file} --image ILSVRC2012_val_00000010.jpeg --device cpu --topk 1
    '''
    ipu_cmd = f'''
    python infer.py --model {model_file} --image ILSVRC2012_val_00000010.jpeg --device ipu --topk 1
    '''
    # Download and unpack the model, then run the same image on CPU and on IPU.
    print(subprocess.run(download_cmd, shell=True, stdout=subprocess.PIPE).stdout)
    cpu_result = subprocess.run(cpu_cmd, shell=True, stdout=subprocess.PIPE).stdout
    ipu_result = subprocess.run(ipu_cmd, shell=True, stdout=subprocess.PIPE).stdout
    # Pull the top-1 label and score out of the infer.py output.
    result_pattern = r'.*label_ids: (\d+).*scores: (\d*\.?\d*)'
    cpu_match = re.match(result_pattern, cpu_result.decode('utf-8').replace('\n', ''))
    ipu_match = re.match(result_pattern, ipu_result.decode('utf-8').replace('\n', ''))

    print("=============================={}==============================".format(k))
    if cpu_match and ipu_match:
        print("cpu_label: {}, cpu_score: {}".format(cpu_match.group(1), cpu_match.group(2)))
        print("ipu_label: {}, ipu_score: {}".format(ipu_match.group(1), ipu_match.group(2)))
    else:
        print("FAILED RUN")
    print("=============================={}==============================".format(k))
  • Test results:
==============================PPLCNet_x1_0==============================
cpu_label: 153, cpu_score: 0.612086
ipu_label: 153, ipu_score: 0.612087
==============================PPLCNet_x1_0==============================

==============================PPLCNetV2_base==============================
cpu_label: 332, cpu_score: 0.278354
ipu_label: 332, ipu_score: 0.278357
==============================PPLCNetV2_base==============================

==============================EfficientNetB7==============================
cpu_label: 332, cpu_score: 0.564357
ipu_label: 332, ipu_score: 0.564378
==============================EfficientNetB7==============================

==============================EfficientNetB0_small==============================
cpu_label: 153, cpu_score: 0.525857
ipu_label: 153, ipu_score: 0.525857
==============================EfficientNetB0_small==============================

==============================GhostNet_x1_3_ssld==============================
cpu_label: 153, cpu_score: 0.849879
ipu_label: 153, ipu_score: 0.849879
==============================GhostNet_x1_3_ssld==============================

==============================GhostNet_x0_5_ssld==============================
cpu_label: 283, cpu_score: 0.341981
ipu_label: 283, ipu_score: 0.341981
==============================GhostNet_x0_5_ssld==============================

==============================MobileNetV1_x0_25==============================
cpu_label: 153, cpu_score: 0.221087
ipu_label: 153, ipu_score: 0.221088
==============================MobileNetV1_x0_25==============================

==============================MobileNetV1_ssld==============================
cpu_label: 332, cpu_score: 0.742867
ipu_label: 332, ipu_score: 0.742867
==============================MobileNetV1_ssld==============================

==============================MobileNetV2_x0_25==============================
cpu_label: 207, cpu_score: 0.247315
ipu_label: 207, ipu_score: 0.247313
==============================MobileNetV2_x0_25==============================

==============================MobileNetV3_small_x0_35_ssld==============================
cpu_label: 153, cpu_score: 0.494442
ipu_label: 153, ipu_score: 0.494442
==============================MobileNetV3_small_x0_35_ssld==============================

==============================MobileNetV3_large_x1_0_ssld==============================
cpu_label: 153, cpu_score: 0.521042
ipu_label: 153, ipu_score: 0.521041
==============================MobileNetV3_large_x1_0_ssld==============================

==============================ShuffleNetV2_x0_25==============================
cpu_label: 259, cpu_score: 0.240480
ipu_label: 259, ipu_score: 0.240481
==============================ShuffleNetV2_x0_25==============================

==============================ShuffleNetV2_x2_0==============================
cpu_label: 153, cpu_score: 0.842726
ipu_label: 153, ipu_score: 0.842727
==============================ShuffleNetV2_x2_0==============================

==============================SqueezeNet1_1==============================
cpu_label: 338, cpu_score: 0.189432
ipu_label: 338, ipu_score: 0.189433
==============================SqueezeNet1_1==============================

==============================PPHGNet_tiny_ssld==============================
cpu_label: 153, cpu_score: 0.536040
ipu_label: 153, ipu_score: 0.536039
==============================PPHGNet_tiny_ssld==============================

==============================PPHGNet_base_ssld==============================
cpu_label: 332, cpu_score: 0.996301
ipu_label: 332, ipu_score: 0.996301
==============================PPHGNet_base_ssld==============================

==============================ResNet50_vd==============================
cpu_label: 153, cpu_score: 0.686229
ipu_label: 153, ipu_score: 0.686230
==============================ResNet50_vd==============================
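The CPU and IPU results above can also be compared mechanically rather than by eye. The following is a minimal sketch, not part of the PR: the `compare` helper, the `BLOCK` pattern, and the `1e-4` tolerance are all assumptions chosen for illustration, and `SAMPLE` holds three result blocks copied from the output above. Among the results quoted here, the largest CPU/IPU score gap is 0.000021 (EfficientNetB7).

```python
import re

# Three result blocks copied verbatim from the output above.
SAMPLE = """\
==============================PPLCNet_x1_0==============================
cpu_label: 153, cpu_score: 0.612086
ipu_label: 153, ipu_score: 0.612087
==============================PPLCNet_x1_0==============================

==============================EfficientNetB7==============================
cpu_label: 332, cpu_score: 0.564357
ipu_label: 332, ipu_score: 0.564378
==============================EfficientNetB7==============================

==============================MobileNetV2_x0_25==============================
cpu_label: 207, cpu_score: 0.247315
ipu_label: 207, ipu_score: 0.247313
==============================MobileNetV2_x0_25==============================
"""

# One header line followed by the cpu and ipu lines printed by the test script.
BLOCK = re.compile(
    r"=+(?P<model>\w+)=+\n"
    r"cpu_label: (?P<cpu_label>\d+), cpu_score: (?P<cpu_score>[\d.]+)\n"
    r"ipu_label: (?P<ipu_label>\d+), ipu_score: (?P<ipu_score>[\d.]+)\n"
)


def compare(text, tol=1e-4):
    """Return {model: (labels_match, abs_score_diff, within_tol)}.

    `tol` is an illustrative threshold, not a value taken from the PR.
    """
    report = {}
    for m in BLOCK.finditer(text):
        diff = abs(float(m["cpu_score"]) - float(m["ipu_score"]))
        report[m["model"]] = (m["cpu_label"] == m["ipu_label"], diff, diff <= tol)
    return report


if __name__ == "__main__":
    for model, (same, diff, ok) in compare(SAMPLE).items():
        print(f"{model}: labels_match={same} score_diff={diff:.6f} within_tol={ok}")
```

A "FAILED RUN" block has no label lines, so the pattern simply skips it; a stricter check could count matched models against `model_list`.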

benchmark

The test uses the benchmark script, with the run command in it changed to:

python benchmark_ppcls.py --model $model --image ILSVRC2012_val_00000010.jpeg --iter_num 2000 --backend paddle --device ipu

The tests cover every model listed in the README except InceptionV3. Part of the output log is excerpted below:

[FastDeploy]    Running PPcls benchmark...
[Benchmark-PPcls] 1/20 ppcls_model/EfficientNetB0_small_infer ...
Total iterations: 2000
Total time of runtime: 3.46793s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.703937s.
Average time of runtime exclude warmup step: 1.72749ms.

[Benchmark-PPcls] 3/20 ppcls_model/EfficientNetB7_infer ...
Total iterations: 2000
Total time of runtime: 20.3836s.
Warmup iterations: 400
Total time of runtime in warmup step: 4.06914s.
Average time of runtime exclude warmup step: 10.1965ms.

[Benchmark-PPcls] 4/20 ppcls_model/GhostNet_x0_5_infer ...
Total iterations: 2000
Total time of runtime: 3.26153s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.6352s.
Average time of runtime exclude warmup step: 1.64145ms.

[Benchmark-PPcls] 5/20 ppcls_model/GhostNet_x1_3_ssld_infer ...
Total iterations: 2000
Total time of runtime: 3.57343s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.692799s.
Average time of runtime exclude warmup step: 1.8004ms.

[Benchmark-PPcls] 7/20 ppcls_model/MobileNetV1_ssld_infer ...
Total iterations: 2000
Total time of runtime: 2.8455s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.574721s.
Average time of runtime exclude warmup step: 1.41924ms.

[Benchmark-PPcls] 8/20 ppcls_model/MobileNetV1_x0_25_infer ...
Total iterations: 2000
Total time of runtime: 2.63379s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.518629s.
Average time of runtime exclude warmup step: 1.32198ms.

[Benchmark-PPcls] 9/20 ppcls_model/MobileNetV2_ssld_infer ...
Total iterations: 2000
Total time of runtime: 3.20334s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.61259s.
Average time of runtime exclude warmup step: 1.61922ms.

[Benchmark-PPcls] 10/20 ppcls_model/MobileNetV2_x0_25_infer ...
Total iterations: 2000
Total time of runtime: 2.93448s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.561751s.
Average time of runtime exclude warmup step: 1.48296ms.

[Benchmark-PPcls] 11/20 ppcls_model/MobileNetV3_large_x1_0_ssld_infer ...
Total iterations: 2000
Total time of runtime: 3.09113s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.614774s.
Average time of runtime exclude warmup step: 1.54772ms.

[Benchmark-PPcls] 12/20 ppcls_model/MobileNetV3_small_x0_35_ssld_infer ...
Total iterations: 2000
Total time of runtime: 2.87719s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.543467s.
Average time of runtime exclude warmup step: 1.45858ms.

[Benchmark-PPcls] 13/20 ppcls_model/PPHGNet_base_ssld_infer ...
Total iterations: 2000
Total time of runtime: 6.51754s.
Warmup iterations: 400
Total time of runtime in warmup step: 1.30042s.
Average time of runtime exclude warmup step: 3.26069ms.

[Benchmark-PPcls] 14/20 ppcls_model/PPHGNet_tiny_ssld_infer ...
Total iterations: 2000
Total time of runtime: 3.71101s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.698029s.
Average time of runtime exclude warmup step: 1.88311ms.

[Benchmark-PPcls] 15/20 ppcls_model/PPLCNetV2_base_infer ...
Total iterations: 2000
Total time of runtime: 2.87388s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.572371s.
Average time of runtime exclude warmup step: 1.43844ms.

[Benchmark-PPcls] 16/20 ppcls_model/PPLCNet_x1_0_infer ...
Total iterations: 2000
Total time of runtime: 2.88727s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.569004s.
Average time of runtime exclude warmup step: 1.44892ms.

[Benchmark-PPcls] 17/20 ppcls_model/ResNet50_vd_infer ...
Total iterations: 2000
Total time of runtime: 3.86693s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.749314s.
Average time of runtime exclude warmup step: 1.94851ms.

[Benchmark-PPcls] 18/20 ppcls_model/ShuffleNetV2_x0_25_infer ...
Total iterations: 2000
Total time of runtime: 2.76203s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.577006s.
Average time of runtime exclude warmup step: 1.36564ms.

[Benchmark-PPcls] 19/20 ppcls_model/ShuffleNetV2_x2_0_infer ...
Total iterations: 2000
Total time of runtime: 3.16924s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.640512s.
Average time of runtime exclude warmup step: 1.58046ms.

[Benchmark-PPcls] 20/20 ppcls_model/SqueezeNet1_1_infer ...
Total iterations: 2000
Total time of runtime: 2.50874s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.495713s.
Average time of runtime exclude warmup step: 1.25814ms.

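Note: these benchmark results are for PR testing only. Because of subsequent hardware changes, the current numbers are not meaningful as a performance reference.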
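The "Average time of runtime exclude warmup step" figures in the log appear consistent with (total time minus warmup time) divided by (total iterations minus warmup iterations). The sketch below recomputes that value for two entries copied from the log above and compares it with the reported number. The `ENTRY` pattern and the `check` helper are hypothetical names introduced here, and the formula is inferred from the logged numbers rather than taken from the benchmark script.

```python
import re

# Two entries copied verbatim from the benchmark log above.
LOG = """\
[Benchmark-PPcls] 1/20 ppcls_model/EfficientNetB0_small_infer ...
Total iterations: 2000
Total time of runtime: 3.46793s.
Warmup iterations: 400
Total time of runtime in warmup step: 0.703937s.
Average time of runtime exclude warmup step: 1.72749ms.

[Benchmark-PPcls] 3/20 ppcls_model/EfficientNetB7_infer ...
Total iterations: 2000
Total time of runtime: 20.3836s.
Warmup iterations: 400
Total time of runtime in warmup step: 4.06914s.
Average time of runtime exclude warmup step: 10.1965ms.
"""

ENTRY = re.compile(
    r"ppcls_model/(?P<model>\w+) \.\.\.\n"
    r"Total iterations: (?P<iters>\d+)\n"
    r"Total time of runtime: (?P<total>[\d.]+)s\.\n"
    r"Warmup iterations: (?P<warmup_iters>\d+)\n"
    r"Total time of runtime in warmup step: (?P<warmup>[\d.]+)s\.\n"
    r"Average time of runtime exclude warmup step: (?P<avg>[\d.]+)ms\."
)


def check(log):
    """Return [(model, recomputed_ms, reported_ms)] for each log entry."""
    rows = []
    for m in ENTRY.finditer(log):
        timed_seconds = float(m["total"]) - float(m["warmup"])
        timed_iters = int(m["iters"]) - int(m["warmup_iters"])
        recomputed_ms = timed_seconds / timed_iters * 1000.0
        rows.append((m["model"], recomputed_ms, float(m["avg"])))
    return rows


if __name__ == "__main__":
    for model, recomputed, reported in check(LOG):
        print(f"{model}: recomputed={recomputed:.5f}ms reported={reported}ms")
```

For EfficientNetB0_small this gives (3.46793 - 0.703937) s over 1600 iterations, which is about 1.7275 ms and agrees with the logged 1.72749 ms to the precision printed.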

* \param[in] batches_per_step the number of batches per run in pipelining.
*/
void EnableIpu(int device_num = 1, int micro_batch_size = 1,
bool enable_pipelining = false, int batches_per_step = 1);
Collaborator

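Merging the two interfaces UseIpu and EnableIpu into one would be more convenient, similar to UseGpu, which also lets the GPU device_id be configured in the same call.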

Contributor Author

The interface here was designed to stay consistent with the IPU interface in Paddle Inference, which is why the two were not merged.

Contributor Author

To keep the interface simple, IPU will use the following two user-facing interfaces for now:

void UseIpu(int device_num = 1, int micro_batch_size = 1,
            bool enable_pipelining = false, int batches_per_step = 1);
void SetIpuConfig(bool enable_fp16 = false, int replica_num = 1,
                  float available_memory_proportion = 1.0,
                  bool enable_half_partial = false);

Contributor Author

Updated

@jiangjiajun
Collaborator

@leiqing1 Could you please help review the documentation changes?

czr-gc commented Oct 28, 2022

Something went wrong while resolving conflicts with the develop branch; I used git amend to push the correct changes.

@jiangjiajun jiangjiajun merged commit ede59af into PaddlePaddle:develop Oct 30, 2022