Only gradient acc be scheduled in parallel. #5926
Merged
lixinqi
reviewed
Aug 17, 2021
Is this PR meant to fix eager mode using more GPU memory than graph mode?
It fixes DDP's high memory usage.
Yes, this PR should reduce DDP's memory usage in real-world scenarios.
daquexian
approved these changes
Aug 18, 2021
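Fixes the memory increase caused by DDP scheduling the entire backward pass concurrently. The fix is to restrict concurrent scheduling to the final gradient accumulation stage only.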
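To illustrate why narrowing the concurrently scheduled region can lower peak memory, here is a toy worst-case model in Python. It is only a sketch, not OneFlow's scheduler: the `Op` class, the `peak_temp_memory` function, the barrier semantics, and all sizes are hypothetical. It assumes that every op in a run of consecutive concurrently scheduled ops may hold its temporary buffer at the same time, while a sequentially scheduled op runs alone.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Op:
    name: str
    temp_units: int   # hypothetical size of the op's temporary buffer
    parallel: bool    # True if the op may be scheduled concurrently


def peak_temp_memory(ops: List[Op]) -> int:
    """Worst-case peak of temporary memory under the toy model."""
    peak = 0
    in_flight = 0
    for op in ops:
        if not op.parallel:
            # A sequential op acts as a barrier: everything before it has
            # finished and released its temporary buffer.
            in_flight = 0
        in_flight += op.temp_units
        peak = max(peak, in_flight)
        if not op.parallel:
            # The sequential op itself finishes before the next op starts.
            in_flight = 0
    return peak


def build_ops(backward_parallel: bool) -> List[Op]:
    # Four backward ops followed by a final gradient accumulation stage.
    backward = [Op(f"backward_{i}", 100, backward_parallel) for i in range(4)]
    grad_acc = [Op(f"grad_acc_{i}", 10, True) for i in range(4)]
    return backward + grad_acc


# Whole backward pass scheduled concurrently (behavior before the fix).
peak_all = peak_temp_memory(build_ops(backward_parallel=True))
# Only the gradient accumulation stage scheduled concurrently.
peak_acc_only = peak_temp_memory(build_ops(backward_parallel=False))

print(peak_all, peak_acc_only)
```

In this model the large backward buffers never overlap once the backward ops are scheduled sequentially, so only the small accumulation buffers can pile up. Real memory behavior depends on the actual allocator and instruction scheduling, which this sketch does not capture.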
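Final test results:
This PR should have almost no effect on DDP speed. One odd observation: the DDP speed comparison script provided by 建浩 reports lower GPU memory usage than PyTorch, and this PR does not change DDP memory usage under that script. The suspected cause is that the script synchronizes on every iteration, which limits its concurrency.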