Add oneDNN binary op #7319

Merged
merged 170 commits into from
Feb 14, 2022
@luqiang-guo (Contributor) commented Jan 20, 2022

This PR replaces a subset of the existing binary CPU kernels with oneDNN's binary CPU kernels.
The replaced binary CPU kernels are: add, sub, mul, div, max, min, Equal, NotEqual, LessThan, LessEqual, GreaterThan, GreaterEqual.
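For reviewers less familiar with oneDNN, the sketch below pairs each replaced op with a oneDNN binary algorithm kind and gives a plain-Python reference for the expected elementwise result. The pairing is an assumption based on the `dnnl::algorithm::binary_*` names, not code taken from this PR; `ONEDNN_BINARY_ALGOS` and `reference_binary` are hypothetical helpers, and broadcasting is deliberately left out.

```python
import operator

# Hypothetical pairing of the twelve ops above with oneDNN algorithm kind
# names (as spelled in dnnl::algorithm); NOT copied from the PR's source.
ONEDNN_BINARY_ALGOS = {
    "add": ("binary_add", operator.add),
    "sub": ("binary_sub", operator.sub),
    "mul": ("binary_mul", operator.mul),
    "div": ("binary_div", operator.truediv),
    "max": ("binary_max", max),
    "min": ("binary_min", min),
    "Equal": ("binary_eq", operator.eq),
    "NotEqual": ("binary_ne", operator.ne),
    "LessThan": ("binary_lt", operator.lt),
    "LessEqual": ("binary_le", operator.le),
    "GreaterThan": ("binary_gt", operator.gt),
    "GreaterEqual": ("binary_ge", operator.ge),
}


def reference_binary(op_name, lhs, rhs):
    """Elementwise reference for two same-length flat lists (no broadcasting)."""
    if len(lhs) != len(rhs):
        raise ValueError("shape mismatch")
    _, fn = ONEDNN_BINARY_ALGOS[op_name]
    return [fn(a, b) for a, b in zip(lhs, rhs)]


# Comparison ops yield booleans; arithmetic ops yield numbers.
print(reference_binary("GreaterEqual", [1, 2, 3], [2, 2, 2]))
print(reference_binary("max", [1, 5], [3, 2]))
```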

Benchmark results:

| Op (input shape) | torch (µs) | oneflow (µs) | this PR (µs) |
|---|---|---|---|
| add (1*3*244*244) | 53 | 422 | 75 |
| add (10*3*244*244) | 520 | 3387 | 529 |
| sub (1*3*244*244) | 51 | 431 | 80 |
| sub (10*3*244*244) | 490 | 3318 | 473 |
| mul (1*3*244*244) | 55 | 398 | 79 |
| mul (10*3*244*244) | 486 | 3332 | 543 |
| div (1*3*244*244) | 54 | 405 | 107 |
| div (10*3*244*244) | 550 | 3244 | 451 |
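The speedups implied by the table can be recomputed directly. This sketch assumes the `oneflow` column is OneFlow's existing (pre-PR) CPU kernels and the `this PR` column is the oneDNN-backed kernels; `TIMINGS_US` and `speedups` are illustrative names.

```python
# Timings in microseconds from the table above: (torch, oneflow, this PR).
# Keys are (op, leading dimension of the input shape).
TIMINGS_US = {
    ("add", 1): (53, 422, 75),
    ("add", 10): (520, 3387, 529),
    ("sub", 1): (51, 431, 80),
    ("sub", 10): (490, 3318, 473),
    ("mul", 1): (55, 398, 79),
    ("mul", 10): (486, 3332, 543),
    ("div", 1): (54, 405, 107),
    ("div", 10): (550, 3244, 451),
}


def speedups(timings):
    """Map each key to (oneflow time / this-PR time, this-PR time / torch time)."""
    return {
        key: (oneflow_us / pr_us, pr_us / torch_us)
        for key, (torch_us, oneflow_us, pr_us) in timings.items()
    }


for (op, n), (vs_old, vs_torch) in sorted(speedups(TIMINGS_US).items()):
    print(f"{op} ({n}*3*244*244): {vs_old:.2f}x faster than before, "
          f"{vs_torch:.2f}x the torch time")
```

Under that reading, the new kernels are roughly 3.8x to 7.2x faster than the previous ones and take between about 0.82x and 1.98x the torch time.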

@github-actions (Contributor)

Speed stats:

@oneflow-ci-bot removed their request for review February 13, 2022 17:41
@luqiang-guo requested review from oneflow-ci-bot and removed request for oneflow-ci-bot February 14, 2022 04:45
@github-actions (Contributor)

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

@luqiang-guo requested review from oneflow-ci-bot and removed request for oneflow-ci-bot February 14, 2022 05:14
@github-actions (Contributor)

Speed stats:
GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 129.0ms (= 12905.0ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 140.0ms (= 14004.6ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.09 (= 140.0ms / 129.0ms)

❌ OneFlow resnet50 time: 79.3ms (= 7927.1ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 85.5ms (= 8546.9ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.08 (= 85.5ms / 79.3ms)

OneFlow resnet50 time: 52.9ms (= 10584.9ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 57.4ms (= 11470.3ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.08 (= 57.4ms / 52.9ms)

OneFlow resnet50 time: 44.4ms (= 8888.0ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 44.3ms (= 8858.4ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.00 (= 44.3ms / 44.4ms)

OneFlow resnet50 time: 40.7ms (= 8136.7ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 41.7ms (= 8346.3ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.03 (= 41.7ms / 40.7ms)

✔️ OneFlow resnet50 time: 142.3ms (= 14226.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 161.1ms (= 16106.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.13 (= 161.1ms / 142.3ms)

OneFlow resnet50 time: 87.8ms (= 8784.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 100.1ms (= 10011.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.14 (= 100.1ms / 87.8ms)

OneFlow resnet50 time: 60.8ms (= 12155.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 75.2ms (= 15035.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.24 (= 75.2ms / 60.8ms)

OneFlow resnet50 time: 51.1ms (= 10226.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 64.6ms (= 12920.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.26 (= 64.6ms / 51.1ms)

OneFlow resnet50 time: 48.6ms (= 9729.2ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 61.0ms (= 12198.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.25 (= 61.0ms / 48.6ms)
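Each Speed stats line divides a total over the iteration count, and the relative speed is PyTorch time over OneFlow time, so values above 1.00 mean OneFlow was faster. The printed ratios are consistent with dividing the unrounded totals rather than the rounded per-iteration figures: for the last ddp line, 61.0 / 48.6 would round to 1.26, while 12198.0 / 9729.2 gives the printed 1.25. A small sketch of that arithmetic, with hypothetical helper names (not the CI script's):

```python
def per_iteration_ms(total_ms, iterations):
    """Average time per iteration, as in '(= total / iterations)'."""
    return total_ms / iterations


def relative_speed(pytorch_total_ms, oneflow_total_ms):
    """PyTorch time over OneFlow time; > 1.0 means OneFlow was faster.
    Both totals cover the same iteration count, so it cancels out."""
    return pytorch_total_ms / oneflow_total_ms


# Last ddp line above: input_shape=[1, 3, 224, 224], world size=2, 200 iterations.
oneflow_total, pytorch_total, iters = 9729.2, 12198.0, 200
print(f"OneFlow {per_iteration_ms(oneflow_total, iters):.1f}ms, "
      f"PyTorch {per_iteration_ms(pytorch_total, iters):.1f}ms, "
      f"relative speed {relative_speed(pytorch_total, oneflow_total):.2f}")
```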

@oneflow-ci-bot removed their request for review February 14, 2022 07:01
@luqiang-guo requested review from oneflow-ci-bot and removed request for oneflow-ci-bot February 14, 2022 13:22
@oneflow-ci-bot requested review from oneflow-ci-bot and removed request for oneflow-ci-bot February 14, 2022 15:13
@github-actions (Contributor)

Speed stats:
GPU Name: GeForce GTX 1080 

✔️ OneFlow resnet50 time: 129.0ms (= 12897.3ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 140.2ms (= 14020.7ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.09 (= 140.2ms / 129.0ms)

✔️ OneFlow resnet50 time: 77.5ms (= 7752.5ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 85.0ms (= 8498.1ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.10 (= 85.0ms / 77.5ms)

OneFlow resnet50 time: 52.5ms (= 10495.9ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 55.5ms (= 11094.6ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.06 (= 55.5ms / 52.5ms)

OneFlow resnet50 time: 43.1ms (= 8625.6ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 47.5ms (= 9502.5ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.10 (= 47.5ms / 43.1ms)

OneFlow resnet50 time: 39.6ms (= 7915.6ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 40.9ms (= 8170.1ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.03 (= 40.9ms / 39.6ms)

✔️ OneFlow resnet50 time: 142.0ms (= 14197.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 159.0ms (= 15898.1ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.12 (= 159.0ms / 142.0ms)

OneFlow resnet50 time: 89.8ms (= 8980.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.3ms (= 10234.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.14 (= 102.3ms / 89.8ms)

OneFlow resnet50 time: 60.2ms (= 12042.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 73.0ms (= 14604.8ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.21 (= 73.0ms / 60.2ms)

OneFlow resnet50 time: 51.3ms (= 10263.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 64.5ms (= 12899.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.26 (= 64.5ms / 51.3ms)

OneFlow resnet50 time: 51.7ms (= 10347.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 57.7ms (= 11541.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.12 (= 57.7ms / 51.7ms)

@oneflow-ci-bot removed their request for review February 14, 2022 17:10
@oneflow-ci-bot merged commit 0dc88d5 into master Feb 14, 2022
@oneflow-ci-bot deleted the dev_add_onednn_binary branch February 14, 2022 17:12
marigoold pushed a commit that referenced this pull request Mar 15, 2022
* add

* merge master

* Solve the thread pool problem

* add device local logical cores

* fix error

* Delete threadpool

* fix include file

* fix clang -lomp

* fix clang error omp.h

* fix omp cmake

* omp.h

* fix #ifdef

* test clang13 -lomp

* test -fopenmp

* add fopenmp

* rename OMP_FLAGS

* static analysis libomp-12-dev

* add tbb

* refine

* refine

* refine

* refine

* revert

* add tbb

* successfully add tbb

* tbb onednn ok

* fix ninja onednn

* component

* install tbb include file

* update tbb master zip

* fix md5

* refine

* refine

* fix

* cmake option

* modify clang 10 OMP

* add line

* fix add OMP flags

* fix tbb

* fix

* fix

* fix

* fix

* fix

* fix OF_RUNTIME_TBB

* fix

* modified binary op

* fix

* fix

* fix error

* fix

* fix

* fix

* refine

* refine

* fix

* add seq

* refine

* fix

* fix

* fix

* add set_num_threads

* fix

* fix

* fix error

* fix

* refine

* refine

* fix

* refine

* fix

* refine

* refine

* refine

* refine

* refine

* fix

* refine

* fix

* fix

* fix

* fix

* fix

* refine

* refine

* refine

* refine

* refine

* refine

* refine

* fix

* fix

* fix

* refine

* refine

* auto format by CI

* fix

* rename mm_, dynamic_cast

* auto format by CI

* fix MAKE_NEW_ONEDNN_BROADCAST_ELEMENTWISE_BINARY_COMPARASION_AND_LOGICAL_ENTRY

* fix 0-dim tensor

* fix onednn format tag

* auto format by CI

Co-authored-by: jackalcooper <jackalcooper@gmail.com>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
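Two commits near the end mention broadcast binary entries and a 0-dim tensor fix. Assuming the oneDNN binary primitive needs memory descriptors with at least one dimension and with matching rank for both inputs, a wrapper has to normalize shapes before creating the primitive. The sketch below shows one way to do that: a 0-dim shape is treated as `(1,)`, the shorter shape is left-padded with 1s, and the output shape follows the usual NumPy-style broadcasting rule. All names are hypothetical; this is not the PR's implementation.

```python
def to_onednn_dims(shape):
    """Treat a 0-dim (scalar) shape () as (1,); other shapes pass through.
    Assumption: the primitive needs at least one dimension."""
    shape = tuple(shape)
    return shape if shape else (1,)


def align_ranks(lhs, rhs):
    """Left-pad the shorter shape with 1s so both shapes have the same rank."""
    lhs, rhs = to_onednn_dims(lhs), to_onednn_dims(rhs)
    ndim = max(len(lhs), len(rhs))

    def pad(shape):
        return (1,) * (ndim - len(shape)) + shape

    return pad(lhs), pad(rhs)


def broadcast_shape(lhs, rhs):
    """NumPy-style broadcast of two shapes after rank alignment."""
    out = []
    for a, b in zip(*align_ranks(lhs, rhs)):
        if a == b or b == 1:
            out.append(a)
        elif a == 1:
            out.append(b)
        else:
            raise ValueError(f"cannot broadcast dims {a} and {b}")
    return tuple(out)


# A per-channel operand against the batch-10 shape from the benchmark table.
print(align_ranks((3, 1, 1), (10, 3, 244, 244)))
print(broadcast_shape((10, 3, 244, 244), (3, 1, 1)))
print(broadcast_shape((), ()))  # scalar op scalar becomes (1,) on the oneDNN side
```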
7 participants