Add oneDNN binary op #7319

luqiang-guo · 2022-01-20T09:41:33Z

This PR replaces some of the existing binary CPU kernels with oneDNN's binary CPU kernels.
The replaced binary CPU kernels are: add, sub, mul, div, max, min, Equal, NotEqual, LessThan, LessEqual, GreaterThan, GreaterEqual.
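As a rough aid for checking a replacement kernel, the twelve ops listed above can be written down as plain-Python reference semantics. This is only a sketch: the dictionary `REFERENCE_OPS` and the helper `apply_elementwise` are hypothetical names introduced here, the mapping onto Python operators is an assumption, and nothing below is OneFlow's or oneDNN's API. Broadcasting is deliberately not modelled.

```python
import operator

# Reference semantics for the twelve binary ops named in the PR description.
# Keys copy the names from the PR text; the mapping to Python operators is an
# assumption for illustration, not OneFlow's or oneDNN's API.
REFERENCE_OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "div": operator.truediv,
    "max": max,
    "min": min,
    "Equal": operator.eq,
    "NotEqual": operator.ne,
    "LessThan": operator.lt,
    "LessEqual": operator.le,
    "GreaterThan": operator.gt,
    "GreaterEqual": operator.ge,
}


def apply_elementwise(op_name, lhs, rhs):
    """Apply one reference op to two equal-length sequences, element by element."""
    if len(lhs) != len(rhs):
        raise ValueError("this sketch does not model broadcasting")
    fn = REFERENCE_OPS[op_name]
    return [fn(a, b) for a, b in zip(lhs, rhs)]
```

A kernel's output on small inputs could then be compared against `apply_elementwise("add", xs, ys)` and the like before looking at timings.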

Measured results:

| Op (shape) | torch | oneflow | this PR |
|---|---|---|---|
| add (13244*244) | 53 us | 422 us | 75 us |
| add (103244*244) | 520 us | 3387 us | 529 us |
| sub (13244*244) | 51 us | 431 us | 80 us |
| sub (103244*244) | 490 us | 3318 us | 473 us |
| mul (13244*244) | 55 us | 398 us | 79 us |
| mul (103244*244) | 486 us | 3332 us | 543 us |
| div (13244*244) | 54 us | 405 us | 107 us |
| div (103244*244) | 550 us | 3244 us | 451 us |
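To make the table easier to read, the numbers can be turned into two ratios per row: how much faster this PR is than the previous oneflow kernel, and how this PR compares with torch. The sketch below only does arithmetic on the values copied from the table; the names `TIMINGS_US`, `speedup_over_old`, and `ratio_to_torch` are introduced here for illustration and do not appear in the PR.

```python
# Timings in microseconds, copied from the table in the PR description.
# Tuple order: (torch, oneflow before this PR, this PR).
# Shape strings are kept exactly as they appear in the table.
TIMINGS_US = {
    ("add", "13244*244"): (53, 422, 75),
    ("add", "103244*244"): (520, 3387, 529),
    ("sub", "13244*244"): (51, 431, 80),
    ("sub", "103244*244"): (490, 3318, 473),
    ("mul", "13244*244"): (55, 398, 79),
    ("mul", "103244*244"): (486, 3332, 543),
    ("div", "13244*244"): (54, 405, 107),
    ("div", "103244*244"): (550, 3244, 451),
}


def speedup_over_old(row):
    """Old oneflow time divided by this PR's time (higher is better)."""
    _torch, old, new = row
    return old / new


def ratio_to_torch(row):
    """This PR's time divided by torch's time (below 1.0 means faster than torch)."""
    torch, _old, new = row
    return new / torch


if __name__ == "__main__":
    for (op, shape), row in TIMINGS_US.items():
        print(f"{op} ({shape}): {speedup_over_old(row):.2f}x over old kernel, "
              f"{ratio_to_torch(row):.2f}x torch time")
```

For `add` on the smaller shape this works out to roughly a 5.6x speedup over the previous kernel, while this PR's time is still about 1.4x torch's; on the larger shape the two are nearly equal.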

github-actions · 2022-02-13T17:39:47Z

Speed stats:

github-actions · 2022-02-14T04:47:37Z

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

github-actions · 2022-02-14T06:58:52Z

Speed stats:

GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 129.0ms (= 12905.0ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 140.0ms (= 14004.6ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.09 (= 140.0ms / 129.0ms)

❌ OneFlow resnet50 time: 79.3ms (= 7927.1ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 85.5ms (= 8546.9ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.08 (= 85.5ms / 79.3ms)

OneFlow resnet50 time: 52.9ms (= 10584.9ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 57.4ms (= 11470.3ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.08 (= 57.4ms / 52.9ms)

OneFlow resnet50 time: 44.4ms (= 8888.0ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 44.3ms (= 8858.4ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.00 (= 44.3ms / 44.4ms)

OneFlow resnet50 time: 40.7ms (= 8136.7ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 41.7ms (= 8346.3ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.03 (= 41.7ms / 40.7ms)

✔️ OneFlow resnet50 time: 142.3ms (= 14226.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 161.1ms (= 16106.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.13 (= 161.1ms / 142.3ms)

OneFlow resnet50 time: 87.8ms (= 8784.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 100.1ms (= 10011.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.14 (= 100.1ms / 87.8ms)

OneFlow resnet50 time: 60.8ms (= 12155.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 75.2ms (= 15035.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.24 (= 75.2ms / 60.8ms)

OneFlow resnet50 time: 51.1ms (= 10226.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 64.6ms (= 12920.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.26 (= 64.6ms / 51.1ms)

OneFlow resnet50 time: 48.6ms (= 9729.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 61.0ms (= 12198.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.25 (= 61.0ms / 48.6ms)
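The arithmetic in these bot comments is visible in the lines themselves: the per-iteration time is the total time divided by the iteration count, and the relative speed is the PyTorch time divided by the OneFlow time. A small sketch that re-derives those numbers from the text follows. The regular expression and the helpers `parse_time_line` and `relative_speed` are assumptions written for this log format only; they are not part of the CI tooling.

```python
import re

# Matches lines such as:
#   OneFlow resnet50 time: 52.9ms (= 10584.9ms / 200, input_shape=[4, 3, 224, 224])
# The pattern is an assumption based on the log lines shown in this thread.
LINE_RE = re.compile(
    r"(?P<framework>OneFlow|PyTorch) resnet50 time: (?P<per_iter>[\d.]+)ms "
    r"\(= (?P<total>[\d.]+)ms / (?P<iters>\d+), input_shape=\[(?P<shape>[^\]]*)\]"
)


def parse_time_line(line):
    """Return the fields of one timing line as a dict, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "framework": m.group("framework"),
        "reported_ms": float(m.group("per_iter")),
        "total_ms": float(m.group("total")),
        "iters": int(m.group("iters")),
        "input_shape": [int(d) for d in m.group("shape").split(",")],
        "ddp": "ddp" in line,
    }


def relative_speed(oneflow_line, pytorch_line):
    """PyTorch per-iteration time divided by OneFlow per-iteration time."""
    of = parse_time_line(oneflow_line)
    pt = parse_time_line(pytorch_line)
    return (pt["total_ms"] / pt["iters"]) / (of["total_ms"] / of["iters"])
```

Applied to the batch-size-4 pair above (10584.9ms / 200 and 11470.3ms / 200), `relative_speed` gives about 1.08, matching the value the bot reports.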

github-actions · 2022-02-14T16:26:43Z

Speed stats:

GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 129.0ms (= 12897.3ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 140.2ms (= 14020.7ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.09 (= 140.2ms / 129.0ms)

✔️ OneFlow resnet50 time: 77.5ms (= 7752.5ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 85.0ms (= 8498.1ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.10 (= 85.0ms / 77.5ms)

OneFlow resnet50 time: 52.5ms (= 10495.9ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 55.5ms (= 11094.6ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.06 (= 55.5ms / 52.5ms)

OneFlow resnet50 time: 43.1ms (= 8625.6ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 47.5ms (= 9502.5ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.10 (= 47.5ms / 43.1ms)

OneFlow resnet50 time: 39.6ms (= 7915.6ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 40.9ms (= 8170.1ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.03 (= 40.9ms / 39.6ms)

✔️ OneFlow resnet50 time: 142.0ms (= 14197.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 159.0ms (= 15898.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.12 (= 159.0ms / 142.0ms)

OneFlow resnet50 time: 89.8ms (= 8980.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.3ms (= 10234.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.14 (= 102.3ms / 89.8ms)

OneFlow resnet50 time: 60.2ms (= 12042.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 73.0ms (= 14604.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.21 (= 73.0ms / 60.2ms)

OneFlow resnet50 time: 51.3ms (= 10263.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 64.5ms (= 12899.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.26 (= 64.5ms / 51.3ms)

OneFlow resnet50 time: 51.7ms (= 10347.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 57.7ms (= 11541.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.12 (= 57.7ms / 51.7ms)

* add
* merge master
* Solve the thread pool problem
* add device local logical cores
* fix error
* Delete threadpool
* fix include file
* fix clang -lopm
* fix clang error omp.h
* fix omp cmake
* omp.h
* fix #ifdef
* test clang13 -lomp
* test -fopenmp
* add fopenmp
* rename OMP_FLAGS
* static analysis libopm-12-dev
* add tbb
* refien
* refine
* refine
* refine
* revert
* add tbb
* success add tbb
* tbb onednn ok
* fix ninja onednn
* component
* install tbb include file
* updata tbb master zip
* fix md5
* refine
* refjine
* fix
* cmake option
* modified clang 10 OMP
* add line
* fix add OMP flags
* fix tbb
* fix
* fix
* fix'
* fix
* fix
* fix OF_RUNTIME_TBB
* fix
* modified binary op
* fix
* fix
* fux error
* fix
* fix
* fix
* refine
* refine
* fix
* add seq
* refine
* fix
* fix
* fix
* add set_num_threads
* fix
* fi
* fix error
* fix
* refine
* refine
* fix
* refine
* fix
* refine
* refine
* refine
* refine
* refine
* fix
* refine
* fix
* fix
* fix
* fix
* fix
* refine
* refine
* refine
* refine
* refine
* refine
* refine
* fix
* fix
* fix
* refine
* refine
* auto format by CI
* fix
* rename mm_, dynamic_cast
* auto format by CI
* fix MAKE_NEW_ONEDNN_BROADCAST_ELEMENTWISE_BINARY_COMPARASION_AND_LOGICAL_ENTRY
* fix 0-dim tensor
* fix onednn format tag
* auto format by CI

Co-authored-by: jackalcooper <jackalcooper@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>

luqiang-guo and others added 30 commits December 13, 2021 12:01

add

c79f6e8

merge master

63eec78

merge master

95311b3

Solve the thread pool problem

d068bbf

merge master

85c0163

add device local logical cores

4783a91

fix error

26cd0e7

Delete threadpool

624d7e9

Merge branch 'master' into dev_parallel_loop

7b6e4d2

fix include file

543c726

Merge branch 'dev_parallel_loop' of https://github.com/Oneflow-Inc/on…

36a755b

…eflow into dev_parallel_loop

fix clang -lopm

2288a11

fix clang error omp.h

5fec766

fix omp cmake

3de09b6

omp.h

9c05b6c

fix #ifdef

17bd1bb

test clang13 -lomp

c226521

test -fopenmp

c4a5179

add fopenmp

4b028a6

Merge branch 'master' into dev_parallel_loop

0eb1059

rename OMP_FLAGS

784badd

Merge branch 'dev_parallel_loop' of https://github.com/Oneflow-Inc/on…

e500586

…eflow into dev_parallel_loop

Merge branch 'master' into dev_parallel_loop

b689676

Merge branch 'master' into dev_parallel_loop

bafea64

Merge branch 'master' into dev_parallel_loop

3d6c191

Merge branch 'dev_parallel_loop' of https://github.com/Oneflow-Inc/on…

72c39e9

…eflow into dev_parallel_loop

static analysis libopm-12-dev

d00a1da

add tbb

18f363f

refien

71dac72

refine

6eee93e

luqiang-guo requested a review from oneflow-ci-bot February 13, 2022 14:42

liujuncheng approved these changes Feb 13, 2022

oneflow-ci-bot removed their request for review February 13, 2022 17:41

luqiang-guo added 2 commits February 14, 2022 12:43

fix onednn format tag

d58c8bd

Merge branch 'dev_add_onednn_binary' of https://github.com/Oneflow-In…

d470a1d

…c/oneflow into dev_add_onednn_binary

luqiang-guo requested a review from oneflow-ci-bot February 14, 2022 04:45

Merge branch 'master' into dev_add_onednn_binary

5e09fd3

luqiang-guo requested review from oneflow-ci-bot and removed request for oneflow-ci-bot February 14, 2022 04:45

auto format by CI

9381fb0

luqiang-guo requested review from oneflow-ci-bot and removed request for oneflow-ci-bot February 14, 2022 05:14

oneflow-ci-bot removed their request for review February 14, 2022 07:01

guo-ran approved these changes Feb 14, 2022

luqiang-guo requested a review from oneflow-ci-bot February 14, 2022 13:19

Merge branch 'master' into dev_add_onednn_binary

533b220

luqiang-guo added the automerge label Feb 14, 2022

luqiang-guo requested review from oneflow-ci-bot and removed request for oneflow-ci-bot February 14, 2022 13:22

Merge branch 'master' into dev_add_onednn_binary

637df74

oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot February 14, 2022 15:13

oneflow-ci-bot removed their request for review February 14, 2022 17:10

oneflow-ci-bot merged commit 0dc88d5 into master Feb 14, 2022

oneflow-ci-bot deleted the dev_add_onednn_binary branch February 14, 2022 17:12
