[cherry-pick][opencl] Refine opencl tune. test=develop #4765

Merged

@ysh329 ysh329 commented Nov 18, 2020

Status: cherry-pick, awaiting review

Main changes

cherry-pick #4700

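  1. The set_opencl_tune interface now takes a size_t argument instead of a bool, so that multiple tune_type values can be added later. The change touches paddle_api.cc, paddle_api.h, cl_runtime.h, and mobilenetv1_light_api.cc.
  2. Method names are normalized.
    1. DefaultWorkSize is renamed to DefaultGlobalWorkSize. Many kernels call this method: bilinear_interp_image_compute.cc, box_coder_image_compute.cc, concat_image_compute.cc, conv_image_compute.cc, dropout_image_compute.cc, grid_sample_image_compute.cc, lrn_image_compute.cc, nearest_interp_image_compute.cc, pad2d_image_compute.cc, reshape_image_compute.cc, slice_image_compute.cc, split_image_compute.cc.
    2. LocalWorkSize is renamed to DefaultLocalWorkSize, so that the name is consistent with DefaultGlobalWorkSize.
  3. The default LWS computation, the forward LWS tuning method, and the reverse LWS tuning method were redundant. They are merged into a single DefaultLocalWorkSize, and the former forward/reverse variants are now selected with bool reverse.
  4. conv is decoupled from auto-tune: the LWS generation of the auto-tune strategy is encapsulated in GenerateLocalWorkSizes (see cl_context.cc).
  5. Events are enabled by default: every enqueue now passes an event, and the CommandQueue is created with profiling when the tune option is on. See cl_context.cc and cl_runtime.cc.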
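
Items 3 and 4 can be sketched in plain C++ as follows. This is a minimal illustration under stated assumptions, not the code in cl_context.cc: the functions carry a Sketch suffix, and their signatures, the divisor-based heuristic, the {1, 2, 4} budget scaling, and the time_kernel_ms callback are all hypothetical. In a real OpenCL setup the callback would typically enqueue the kernel on a queue created with CL_QUEUE_PROFILING_ENABLE and read the start and end timestamps with clGetEventProfilingInfo, in line with item 5.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

using WorkSize = std::array<std::size_t, 3>;

// Largest divisor of n that is <= limit (returns 1 if none is larger).
std::size_t LargestDivisorAtMost(std::size_t n, std::size_t limit) {
  for (std::size_t d = std::min(n, limit); d > 1; --d) {
    if (n % d == 0) return d;
  }
  return 1;
}

// Hypothetical default heuristic: give each dimension the largest divisor of
// its global size that still fits the remaining work-group budget.
// `reverse` only changes the order in which dimensions are filled.
WorkSize DefaultLocalWorkSizeSketch(const WorkSize& gws,
                                    std::size_t max_work_group_size,
                                    bool reverse) {
  WorkSize lws = {1, 1, 1};
  std::size_t budget = max_work_group_size;
  for (int k = 0; k < 3; ++k) {
    const int dim = reverse ? 2 - k : k;  // 0,1,2 or 2,1,0
    lws[dim] = LargestDivisorAtMost(gws[dim], budget);
    budget /= lws[dim];
  }
  return lws;
}

// Hypothetical candidate generator: forward and reverse defaults at the full,
// half, and quarter work-group budget, with duplicates removed.
std::vector<WorkSize> GenerateLocalWorkSizesSketch(
    const WorkSize& gws, std::size_t max_work_group_size) {
  const std::size_t divisors[] = {1, 2, 4};
  std::vector<WorkSize> candidates;
  for (std::size_t divisor : divisors) {
    for (bool reverse : {false, true}) {
      const WorkSize lws = DefaultLocalWorkSizeSketch(
          gws, max_work_group_size / divisor, reverse);
      if (std::find(candidates.begin(), candidates.end(), lws) ==
          candidates.end()) {
        candidates.push_back(lws);
      }
    }
  }
  return candidates;
}

// Time every candidate with the caller-supplied callback and keep the fastest.
WorkSize PickFastestSketch(
    const std::vector<WorkSize>& candidates,
    const std::function<double(const WorkSize&)>& time_kernel_ms) {
  WorkSize best = {1, 1, 1};
  double best_ms = std::numeric_limits<double>::infinity();
  for (const WorkSize& lws : candidates) {
    const double ms = time_kernel_ms(lws);
    if (ms < best_ms) {
      best_ms = ms;
      best = lws;
    }
  }
  return best;
}
```

With a global work size of {112, 112, 32} and a work-group limit of 256, this sketch yields {112, 2, 1} as the forward default and {1, 8, 32} as the reverse default. Keeping generation separate from timing mirrors the decoupling described in item 4: a kernel only supplies the timing callback.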

Performance

Snapdragon 835

  • Test model: caffe_mobilenetv1
  • Baseline: 14.5 ms, range 12~22 ms
  • After tuning: 11.9 ms, range 11~12 ms
  • Gain: 17% faster, with much more stable latency

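Kirin 990

  • Test model: caffe_mobilenetv1
  • Baseline: 13.7 ms, range 12.6~15.3 ms
  • Old tune: 13.0 ms, range 10.5~21.1 ms; first run plus tuning takes 3.3 s
  • New tune: 12.5 ms, range 11.3~14.2 ms; first run plus tuning takes 4.5 s
  • Gain:
    1. New tune vs. old tune: 3.8% faster and much more stable, but tuning takes longer.
    2. New tune vs. no tune: 8.7% faster, with no change in stability.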

@zhaoyang-star zhaoyang-star left a comment

LGTM

@daming5432

LGTM

@ysh329 ysh329 merged commit cc9f5a5 into PaddlePaddle:release/v2.7 Nov 19, 2020
@ysh329 ysh329 deleted the cherry-pick-refine-opencl-tune branch November 19, 2020 09:55