[OpenCL][Kernel] Add OpenCL image kernel: split #4645

Merged
merged 8 commits into from
Nov 4, 2020


@zhaoyang-star zhaoyang-star commented Nov 3, 2020

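[Problem] split is a commonly used op, but it previously had no OpenCL implementation. This caused unnecessary io_copy and layout_cast executions, which in turn increased run time.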

[Solution] This PR adds an OpenCL image implementation of split.
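For readers unfamiliar with the op, here is a minimal reference sketch of what split does along the channel axis. It is plain Python over nested lists, not the OpenCL image kernel added in this PR; the function name `split_channels` and the NCHW nested-list layout are assumptions made for illustration only.

```python
def split_channels(tensor, sections):
    """Split an NCHW tensor (nested lists) along the channel axis.

    Hypothetical helper for illustration; not part of Paddle-Lite.
    `sections` lists how many channels each output receives.
    """
    num_channels = len(tensor[0])
    if sum(sections) != num_channels:
        raise ValueError("sections must sum to the channel count")

    outputs = []
    start = 0
    for size in sections:
        # Each output just takes a contiguous channel range from every
        # batch item. No arithmetic is performed on the values.
        outputs.append([batch[start:start + size] for batch in tensor])
        start += size
    return outputs


# 1 batch item, 4 channels, each channel a 1x2 feature map.
x = [[[[0, 1]], [[2, 3]], [[4, 5]], [[6, 7]]]]
first, second = split_channels(x, [2, 2])
```

Because the op only moves data, its cost is dominated by memory traffic rather than computation.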

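[Results] Tested on shufflenetv2; accuracy has been verified to show no diff against ARM. The model contains 16 blocks of the form concat -> shuffle_channel -> split. The PR that adds the GPU implementation of shuffle_channel has not been merged yet. Therefore: with only a GPU implementation of split, the number of io_copy and layout_cast executions does not decrease, and since split is a pure data copy with no computation, the op in theory gains nothing from running on GPU (the third row of the table below also confirms this). However, once both split and shuffle_channel have GPU implementations, each block saves the execution time of 3 io_copy and 3 layout_cast calls, which yields a net speedup on shufflenetv2.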

855                                 Run time (ms)
ARM                                 10.68
OpenCL                              58.36
+PR(split)                          43.52
+PR(split) + PR(shuffle_channel)    15.11
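As a quick check on the numbers in the table, the relative speedups can be computed directly. This is only arithmetic on the reported run times; the variable names are made up for this sketch.

```python
# Run times in ms, copied from the table above.
run_time_ms = {
    "ARM": 10.68,
    "OpenCL": 58.36,
    "+PR(split)": 43.52,
    "+PR(split) + PR(shuffle_channel)": 15.11,
}

# Speedup of each configuration relative to the original OpenCL run.
baseline = run_time_ms["OpenCL"]
speedup_vs_opencl = {name: baseline / t for name, t in run_time_ms.items()}

# How much slower the best OpenCL configuration still is than ARM.
best_opencl = run_time_ms["+PR(split) + PR(shuffle_channel)"]
slowdown_vs_arm = best_opencl / run_time_ms["ARM"]

for name, ratio in speedup_vs_opencl.items():
    print(f"{name}: {ratio:.2f}x vs OpenCL baseline")
print(f"best OpenCL config is {slowdown_vs_arm:.2f}x the ARM run time")
```

With both PRs applied the OpenCL run is about 3.86x faster than the OpenCL baseline, yet it still takes about 1.41x the ARM run time.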

Note: with only the GPU version of shuffle_channel, the shufflenetv2 run time on 835 drops from 43ms to 41ms. So the change only pays off when both shuffle_channel and split have GPU implementations (1+1>2, haha).

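[TODO]

  • The repo currently has no split unit tests for targets such as ARM or OpenCL. Because the OpenCL unit tests are being moved into the tests folder, which requires many related changes, the split unit test will be added in a separate PR.
  • shufflenetv2 is slower on OpenCL than on ARM; a detailed timing analysis is needed.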


@xiebaiyuan xiebaiyuan left a comment


LGTM.
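I had previously considered one approach: use a pass to detect whether all ops in a block run on the same backend. If they can, prefer a single backend for the whole block wherever possible to improve performance.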
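A rough sketch of how such a pass might decide on a backend for one block follows. Everything here is hypothetical: `pick_block_backend`, the backend names, and the kernel sets are illustrative assumptions, not Paddle-Lite's actual pass API or its real kernel registry.

```python
def pick_block_backend(block, preferred=("opencl", "arm")):
    """Return the first preferred backend that every op in the block supports.

    `block` is a list of (op_name, set_of_backends_with_kernels) pairs.
    Returns None when the block is empty or no single backend covers it.
    Hypothetical helper for illustration only.
    """
    if not block:
        return None
    # Backends for which every op in the block has a kernel.
    common = set.intersection(*(set(kernels) for _, kernels in block))
    for backend in preferred:
        if backend in common:
            return backend
    return None


# Assumed kernel availability before shuffle_channel gains a GPU kernel.
block_before = [
    ("concat", {"opencl", "arm"}),
    ("shuffle_channel", {"arm"}),
    ("split", {"opencl", "arm"}),
]
# Same block once shuffle_channel also has an OpenCL kernel.
block_after = [
    ("concat", {"opencl", "arm"}),
    ("shuffle_channel", {"opencl", "arm"}),
    ("split", {"opencl", "arm"}),
]

backend_before = pick_block_backend(block_before)
backend_after = pick_block_backend(block_after)
```

Under these assumed kernel sets the block falls back to ARM as a whole until shuffle_channel has an OpenCL kernel, which avoids switching backends inside the block.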

@zhaoyang-star zhaoyang-star merged commit e009848 into PaddlePaddle:develop Nov 4, 2020
@zhaoyang-star zhaoyang-star deleted the add_split branch November 4, 2020 11:38
zhaoyang-star added a commit to zhaoyang-star/Paddle-Lite that referenced this pull request Nov 19, 2020
zhaoyang-star added a commit to zhaoyang-star/Paddle-Lite that referenced this pull request Nov 20, 2020
zhaoyang-star added a commit that referenced this pull request Nov 20, 2020