Only gradient acc be scheduled in parallel. #5926

Merged: oneflow-ci-bot merged 3 commits into master from dev_ddp_opt_mem on Aug 18, 2021

hjchen2 (Contributor) commented Aug 17, 2021

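This PR fixes the memory increase caused by DDP scheduling the entire backward pass concurrently. The fix is to restrict concurrent scheduling to the final gradient accumulation stage.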
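
To make the motivation concrete, here is a toy model of why keeping many backward steps in flight at once can raise peak memory. It is only a sketch: the function `peak_memory`, the `max_in_flight` parameter, and the buffer sizes are hypothetical, and this is not how OneFlow's scheduler is implemented.

```python
from typing import Sequence


def peak_memory(temp_sizes: Sequence[int], max_in_flight: int) -> int:
    """Peak temporary memory when up to `max_in_flight` consecutive steps
    hold their buffers at the same time (toy model, not OneFlow's scheduler)."""
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be at least 1")
    window = min(max_in_flight, len(temp_sizes))
    peak = 0
    # Slide a window over the steps and record the largest combined footprint.
    for start in range(len(temp_sizes) - window + 1):
        peak = max(peak, sum(temp_sizes[start:start + window]))
    return peak


# Hypothetical per-step temporary buffer sizes (MB) for one backward pass.
backward_temps = [300, 200, 400, 100]

# One step at a time: only the largest single buffer is alive.
serial_peak = peak_memory(backward_temps, max_in_flight=1)
# Whole backward pass scheduled concurrently: every buffer may be alive at once.
concurrent_peak = peak_memory(backward_temps, max_in_flight=len(backward_temps))

print(serial_peak, concurrent_peak)
```

In this model, limiting concurrency to a short final stage keeps the peak close to the serial case while still overlapping the work in that stage.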

Final test results:

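  • DDP

| | mem (M) | test 1 (ms) | test 2 (ms) | test 3 (ms) |
| :-: | :-: | :-: | :-: | :-: |
| PyTorch | 1701 | 47.5 | 47.1 | 46.6 |
| OneFlow | 1043 | 52.7 | 53.4 | 53.2 |
| Relative (PyTorch / OneFlow) | | 0.90 | 0.88 | 0.88 |
| PyTorch | 1701 | 45.2 | 46.5 | 44.5 |
| OneFlow-PR | 1043 | 41.8 | 51.0 | 50.1 |
| Relative (PyTorch / OneFlow-PR) | | 1.08 | 0.91 | 0.89 |
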
  • Single-GPU training

| | mem (M, bs=32) |
| :-: | :-: |
| PyTorch | 4409 |
| OneFlow | 5493 |
| OneFlow-PR | 4023 |
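
The Relative rows are consistent with PyTorch time divided by OneFlow time, so a value above 1.0 means OneFlow was faster in that run. The short check below recomputes them from the table values and also derives the single-GPU memory saving; the helper name `relative_speed` and the variable names are illustrative only.

```python
def relative_speed(pytorch_ms: float, oneflow_ms: float) -> float:
    """Ratio of PyTorch time to OneFlow time; above 1.0 means OneFlow is faster."""
    return pytorch_ms / oneflow_ms


# Timings (ms) copied from the DDP table.
pytorch_run_a = [47.5, 47.1, 46.6]
oneflow_base = [52.7, 53.4, 53.2]
pytorch_run_b = [45.2, 46.5, 44.5]
oneflow_pr = [41.8, 51.0, 50.1]

base_ratios = [round(relative_speed(p, o), 2) for p, o in zip(pytorch_run_a, oneflow_base)]
pr_ratios = [round(relative_speed(p, o), 2) for p, o in zip(pytorch_run_b, oneflow_pr)]

# Single-GPU memory (M, bs=32) copied from the second table.
oneflow_mem, oneflow_pr_mem = 5493, 4023
saving = (oneflow_mem - oneflow_pr_mem) / oneflow_mem  # fraction of memory saved by the PR

print(base_ratios, pr_ratios, f"{saving:.1%}")
```

With only three runs per configuration, the differences between the two sets of ratios are small enough that they may be within run-to-run noise.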

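This PR should have almost no effect on DDP speed. One odd observation: the DDP speed-comparison script provided by 建浩 measures lower GPU memory for OneFlow than for PyTorch, and this PR does not change DDP memory usage under that script. My guess is that the script synchronizes on every iteration, which limits how much concurrency it can trigger.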

yuanms2 (Contributor) commented Aug 17, 2021

Does this PR address the problem of eager mode using more GPU memory than graph mode?

https://github.com/Oneflow-Inc/OneTeam/issues/538

lixinqi (Contributor) commented Aug 17, 2021

It addresses the problem of DDP using too much memory.

hjchen2 (Contributor, Author) commented Aug 17, 2021

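> Does this PR address the problem of eager mode using more GPU memory than graph mode?
>
> Oneflow-Inc/OneTeam#538

Yes, this PR should reduce DDP memory usage in real-world scenarios.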

github-actions (Contributor) commented

Speed stats:
GPU Name: GeForce GTX 1080 

PyTorch resnet50 time: 143.9ms (= 7192.9ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 128.9ms (= 6444.2ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
Relative speed: 1.12 (= 143.9ms / 128.9ms)

PyTorch resnet50 time: 82.9ms (= 4146.5ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 76.4ms (= 3822.3ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
Relative speed: 1.08 (= 82.9ms / 76.4ms)

PyTorch resnet50 time: 57.5ms (= 2876.1ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 48.6ms (= 2428.1ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
Relative speed: 1.18 (= 57.5ms / 48.6ms)

PyTorch resnet50 time: 48.1ms (= 2404.3ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 39.5ms (= 1976.7ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
Relative speed: 1.22 (= 48.1ms / 39.5ms)

PyTorch resnet50 time: 42.4ms (= 2122.5ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 40.7ms (= 2032.7ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
Relative speed: 1.04 (= 42.4ms / 40.7ms)

@oneflow-ci-bot oneflow-ci-bot removed their request for review August 17, 2021 10:11
@hjchen2 hjchen2 requested a review from lixinqi August 17, 2021 12:17
github-actions (Contributor) commented

Speed stats:
GPU Name: GeForce GTX 1080 

PyTorch resnet50 time: 138.6ms (= 6930.3ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 128.3ms (= 6413.5ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
Relative speed: 1.08 (= 138.6ms / 128.3ms)

PyTorch resnet50 time: 83.6ms (= 4180.1ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 74.5ms (= 3723.2ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
Relative speed: 1.12 (= 83.6ms / 74.5ms)

PyTorch resnet50 time: 59.0ms (= 2949.3ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 47.1ms (= 2356.3ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
Relative speed: 1.25 (= 59.0ms / 47.1ms)

PyTorch resnet50 time: 48.3ms (= 2412.8ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 38.1ms (= 1903.6ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
Relative speed: 1.27 (= 48.3ms / 38.1ms)

PyTorch resnet50 time: 44.5ms (= 2224.0ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 38.6ms (= 1931.7ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
Relative speed: 1.15 (= 44.5ms / 38.6ms)
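
The bot reports each time as a total over 50 iterations divided by 50. The following is a minimal sketch of how such a mean per-iteration measurement could be structured; it is not the CI script itself, and the name `mean_step_ms` and the `warmup` parameter are assumptions made for illustration.

```python
import time
from typing import Callable


def mean_step_ms(step: Callable[[], None], iterations: int = 50, warmup: int = 5) -> float:
    """Average wall-clock time of `step` in milliseconds over `iterations` calls."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    for _ in range(warmup):  # warm-up runs are an assumption, not taken from the CI log
        step()
    start = time.perf_counter()
    for _ in range(iterations):
        step()
    total_ms = (time.perf_counter() - start) * 1000.0
    return total_ms / iterations


# Stand-in workload; a real benchmark would run one forward and backward pass here.
calls = []
elapsed = mean_step_ms(lambda: calls.append(1), iterations=50, warmup=5)
print(f"{elapsed:.4f} ms per iteration")
```

For GPU workloads, `step` would need to synchronize the device before returning, because kernel launches are asynchronous and the host clock would otherwise stop too early.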

@oneflow-ci-bot oneflow-ci-bot merged commit b5a1a80 into master Aug 18, 2021
@oneflow-ci-bot oneflow-ci-bot deleted the dev_ddp_opt_mem branch August 18, 2021 02:54