Speed up the training #9278

Merged
merged 24 commits into from Nov 17, 2022

@Yipeng1994 Yipeng1994 commented Oct 20, 2022

As mentioned in https://github.com/Oneflow-Inc/OneTeam/issues/1735, some operators should run as late as possible because they have a large activation time on the CPU.
This feature moves those operators later in the execution order, which reduces CUDA idle time by about 40% (11.5 ms -> 7 ms per iteration).

Currently there is no obvious overall speedup.
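
To illustrate the idea of running certain operators as late as possible, here is a minimal sketch. It is not the code in this PR: the function name `order_with_late_ops`, the toy graph, and the op names are all hypothetical, and the greedy rule (emit a "late" op only when no other op is ready) is just one simple way to delay operators within a topological order.

```python
from collections import defaultdict


def order_with_late_ops(nodes, edges, late_ops):
    """Return a topological order in which ops in `late_ops` are
    emitted only when no ordinary op is ready (hypothetical helper)."""
    successors = defaultdict(list)
    indegree = {n: 0 for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    ready = [n for n in nodes if indegree[n] == 0]
    order = []
    while ready:
        # Prefer ordinary ops; fall back to a late op only if nothing else is ready.
        ordinary = [n for n in ready if n not in late_ops]
        chosen = ordinary[0] if ordinary else ready[0]
        ready.remove(chosen)
        order.append(chosen)
        for nxt in successors[chosen]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order


# Toy graph (assumed for illustration): "slow_cpu_op" feeds only the final "add".
nodes = ["load", "slow_cpu_op", "conv1", "conv2", "add"]
edges = [
    ("load", "conv1"),
    ("conv1", "conv2"),
    ("load", "slow_cpu_op"),
    ("slow_cpu_op", "add"),
    ("conv2", "add"),
]
order = order_with_late_ops(nodes, edges, late_ops={"slow_cpu_op"})
print(order)
```

In this toy graph the delayed op is scheduled right before its only consumer, after the two convolutions have already been issued.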

@mergify mergify bot mentioned this pull request Nov 14, 2022

@github-actions

Speed stats:
GPU Name: GeForce GTX 1080

❌ OneFlow resnet50 time: 139.5ms (= 13950.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 160.3ms (= 16032.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.15 (= 160.3ms / 139.5ms)

OneFlow resnet50 time: 84.9ms (= 8491.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 108.3ms (= 10834.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.28 (= 108.3ms / 84.9ms)

OneFlow resnet50 time: 57.6ms (= 11529.5ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 77.3ms (= 15463.3ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.34 (= 77.3ms / 57.6ms)

OneFlow resnet50 time: 44.8ms (= 8961.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 69.6ms (= 13926.1ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.55 (= 69.6ms / 44.8ms)

OneFlow resnet50 time: 40.0ms (= 8000.5ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 64.6ms (= 12917.4ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.61 (= 64.6ms / 40.0ms)
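
The figures above follow a simple pattern: each per-iteration time is the total time divided by the iteration count, and the relative speed is the PyTorch time divided by the OneFlow time. A minimal sketch of that arithmetic, using a hypothetical helper `relative_speed` that is not part of the CI script:

```python
def relative_speed(oneflow_total_ms, pytorch_total_ms, iterations):
    """Ratio of PyTorch to OneFlow per-iteration time (hypothetical helper).

    A value above 1 means OneFlow was faster in that run.
    """
    oneflow_ms = oneflow_total_ms / iterations
    pytorch_ms = pytorch_total_ms / iterations
    return pytorch_ms / oneflow_ms


# Totals taken from the batch-size-16 and batch-size-1 rows of the report above.
speed_bs16 = round(relative_speed(13950.6, 16032.6, 100), 2)
speed_bs1 = round(relative_speed(8000.5, 12917.4, 200), 2)
print(speed_bs16, speed_bs1)
```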

@github-actions

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/9278/

@mergify mergify bot merged commit ab9d76c into master Nov 17, 2022
@mergify mergify bot deleted the feat-speed_up-throughput branch November 17, 2022 05:07