hot fix trunc_normal_ bug #9711

Merged: 7 commits merged into master from hot_fix_trunc_normal_bug on Jan 7, 2023

BBuf (Contributor) commented Jan 6, 2023

Fixes a bug in the trunc_normal_ implementation.

Closes https://github.com/Oneflow-Inc/OneTeam/issues/1867

Distribution test results:

PyTorch: [image]

This PR: [image]

OneFlow master: [image]
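The diff itself is not part of this conversation, so the exact fix is not shown here. For reference, the sketch below samples a truncated normal with the inverse-CDF method that PyTorch's torch.nn.init.trunc_normal_ is based on: draw uniformly between the CDF values of the two bounds, map back through the inverse normal CDF, then clamp to absorb round-off. It also includes a crude histogram helper of the kind one could use to compare distributions like the ones in the plots above. It assumes absolute bounds a and b (as in PyTorch) and uses only the standard library; trunc_normal_sample and histogram are hypothetical names, not OneFlow or PyTorch APIs, and this is not the code from this PR.

```python
import random
from statistics import NormalDist


def trunc_normal_sample(n, mean=0.0, std=1.0, a=-2.0, b=2.0, seed=None):
    """Draw n samples from N(mean, std^2) truncated to [a, b] (inverse-CDF method).

    Hypothetical helper for illustration; a and b are absolute bounds.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()  # standard normal N(0, 1)
    # CDF values of the bounds, measured in standard-normal units.
    lo = std_normal.cdf((a - mean) / std)
    hi = std_normal.cdf((b - mean) / std)
    samples = []
    for _ in range(n):
        # Uniform draw restricted to [lo, hi] in CDF space.
        u = rng.uniform(lo, hi)
        # inv_cdf is undefined at exactly 0 or 1, so keep u strictly inside.
        u = min(max(u, 1e-12), 1.0 - 1e-12)
        x = mean + std * std_normal.inv_cdf(u)
        # Clamp to absorb floating-point round-off at the edges.
        samples.append(min(max(x, a), b))
    return samples


def histogram(samples, lo, hi, bins=8):
    """Count samples in `bins` equal-width bins over [lo, hi] (hypothetical helper)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for x in samples:
        idx = min(max(int((x - lo) / width), 0), bins - 1)
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    s = trunc_normal_sample(100_000, mean=0.0, std=1.0, a=-2.0, b=2.0, seed=0)
    print(min(s) >= -2.0, max(s) <= 2.0)  # True True (guaranteed by the clamp)
    print(histogram(s, -2.0, 2.0, bins=8))  # roughly symmetric, peaked at the center
```

A correct implementation should produce a histogram that is bell-shaped inside [a, b] with no mass outside it; a visible mismatch against the PyTorch histogram is the kind of symptom the distribution comparison is meant to catch.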

BBuf changed the title from "fix trunc_normal_ bug" to "hot fix trunc_normal_ bug" on Jan 6, 2023

github-actions bot commented Jan 6, 2023

Speed stats:
GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 140.3ms (= 14030.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 162.0ms (= 16202.0ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 162.0ms / 140.3ms)

OneFlow resnet50 time: 85.7ms (= 8571.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 111.7ms (= 11167.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.30 (= 111.7ms / 85.7ms)

OneFlow resnet50 time: 58.3ms (= 11658.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.9ms (= 15771.4ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.35 (= 78.9ms / 58.3ms)

OneFlow resnet50 time: 46.6ms (= 9320.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.3ms (= 15650.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.68 (= 78.3ms / 46.6ms)

OneFlow resnet50 time: 41.1ms (= 8229.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.8ms (= 13755.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.67 (= 68.8ms / 41.1ms)
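Each row of the bot's report derives the per-iteration time by dividing the total wall time by the iteration count, and the relative speed by dividing the PyTorch time by the OneFlow time, so a value above 1 means OneFlow took less time. A minimal sketch of that arithmetic, using hypothetical helper names and the figures from the first row:

```python
def per_iter_ms(total_ms: float, iters: int) -> float:
    """Average time per iteration, as in '140.3ms (= 14030.2ms / 100)'."""
    return total_ms / iters


def relative_speed(torch_total_ms: float, oneflow_total_ms: float, iters: int) -> float:
    """PyTorch time divided by OneFlow time; > 1 means OneFlow is faster."""
    return per_iter_ms(torch_total_ms, iters) / per_iter_ms(oneflow_total_ms, iters)


# First row of the report: input_shape=[16, 3, 224, 224], 100 iterations.
print(f"{per_iter_ms(14030.2, 100):.1f}ms")            # 140.3ms (OneFlow)
print(f"{per_iter_ms(16202.0, 100):.1f}ms")            # 162.0ms (PyTorch)
print(f"{relative_speed(16202.0, 14030.2, 100):.2f}")  # 1.15
```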

github-actions bot commented Jan 6, 2023

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9711/

github-actions bot commented Jan 6, 2023

CI failed when running job: cuda-module. PR label automerge has been removed

github-actions bot commented Jan 7, 2023

Speed stats:
GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 139.8ms (= 13978.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 161.2ms (= 16119.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 161.2ms / 139.8ms)

OneFlow resnet50 time: 85.2ms (= 8521.7ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 103.3ms (= 10326.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.21 (= 103.3ms / 85.2ms)

OneFlow resnet50 time: 57.7ms (= 11545.4ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.6ms (= 15715.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.36 (= 78.6ms / 57.7ms)

OneFlow resnet50 time: 44.4ms (= 8883.8ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.1ms (= 14014.7ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.58 (= 70.1ms / 44.4ms)

OneFlow resnet50 time: 40.1ms (= 8016.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.3ms (= 13454.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.68 (= 67.3ms / 40.1ms)

github-actions bot commented Jan 7, 2023

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9711/

github-actions bot removed the automerge label on Jan 7, 2023
github-actions bot commented Jan 7, 2023

CI failed when running job: cuda-misc. PR label automerge has been removed

1 similar comment

github-actions bot commented Jan 7, 2023

Speed stats:
GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 140.1ms (= 14013.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 162.2ms (= 16218.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.16 (= 162.2ms / 140.1ms)

OneFlow resnet50 time: 85.7ms (= 8568.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 103.7ms (= 10373.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.21 (= 103.7ms / 85.7ms)

OneFlow resnet50 time: 58.3ms (= 11669.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 79.8ms (= 15965.3ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.37 (= 79.8ms / 58.3ms)

OneFlow resnet50 time: 44.3ms (= 8860.7ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.8ms (= 15557.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.76 (= 77.8ms / 44.3ms)

OneFlow resnet50 time: 42.6ms (= 8523.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 67.9ms (= 13585.6ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.59 (= 67.9ms / 42.6ms)

github-actions bot commented Jan 7, 2023

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9711/

github-actions bot commented Jan 7, 2023

CI failed when running job: cuda-misc. PR label automerge has been removed

github-actions bot removed the automerge label on Jan 7, 2023
BBuf requested a review from oneflow-ci-bot and then removed the request on January 7, 2023 at 07:47
github-actions bot commented Jan 7, 2023

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9711/

github-actions bot commented Jan 7, 2023

Speed stats:
GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 140.3ms (= 14025.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 163.6ms (= 16358.7ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.17 (= 163.6ms / 140.3ms)

OneFlow resnet50 time: 86.3ms (= 8631.2ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 102.6ms (= 10263.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.19 (= 102.6ms / 86.3ms)

OneFlow resnet50 time: 58.3ms (= 11664.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 79.7ms (= 15930.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.37 (= 79.7ms / 58.3ms)

OneFlow resnet50 time: 44.4ms (= 8886.4ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 70.5ms (= 14095.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.59 (= 70.5ms / 44.4ms)

OneFlow resnet50 time: 40.2ms (= 8042.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.8ms (= 13766.0ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.71 (= 68.8ms / 40.2ms)

mergify bot merged commit 82ce240 into master on Jan 7, 2023
mergify bot deleted the hot_fix_trunc_normal_bug branch on January 7, 2023 at 15:44