Add use_hierarchical_allreduce for DistributedFusedLAMB #44821

Merged: 2 commits, Aug 3, 2022

@sneaxiy (Collaborator) commented Aug 2, 2022

PR types

Performance optimization

PR changes

OPs

Describe

Add use_hierarchical_allreduce for DistributedFusedLAMB.

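Suppose there are N nodes, each with M GPUs. When use_hierarchical_allreduce=True is enabled and nproc_per_node=M is set, two communication groups are created:

  • The GPUs numbered i + k * M (k = 0, 1, ..., N-1) form communication group A. Allreduce is performed on group A first.
  • The GPUs within each node form communication group B. Allreduce is then performed on group B.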
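
The grouping described above can be sketched in plain Python. This is only an illustration, not the code from this PR: it assumes global ranks are assigned contiguously per node (node k holds ranks k * M through k * M + M - 1), assumes a sum reduction, and simulates allreduce on a list instead of calling any Paddle or NCCL API. The helper names `build_groups`, `allreduce_sum`, and `hierarchical_allreduce` are hypothetical.

```python
def build_groups(num_nodes, gpus_per_node):
    """Return (groups_a, groups_b), each a list of lists of global ranks.

    Assumption: node k owns ranks k * M ... k * M + M - 1.
    """
    n, m = num_nodes, gpus_per_node
    # Group A: the GPU at local index i on every node -> ranks i + k * M.
    groups_a = [[i + k * m for k in range(n)] for i in range(m)]
    # Group B: all GPUs inside node k.
    groups_b = [[k * m + i for i in range(m)] for k in range(n)]
    return groups_a, groups_b


def allreduce_sum(values, group):
    """Simulate a sum allreduce: every rank in `group` receives the group total."""
    total = sum(values[rank] for rank in group)
    for rank in group:
        values[rank] = total


def hierarchical_allreduce(values, num_nodes, gpus_per_node):
    """Allreduce over every group A first, then over every group B."""
    groups_a, groups_b = build_groups(num_nodes, gpus_per_node)
    out = list(values)
    for group in groups_a:
        allreduce_sum(out, group)
    for group in groups_b:
        allreduce_sum(out, group)
    return out


if __name__ == "__main__":
    # N = 2 nodes, M = 4 GPUs per node; rank r starts with the value r.
    groups_a, groups_b = build_groups(2, 4)
    print(groups_a)  # [[0, 4], [1, 5], [2, 6], [3, 7]]
    print(groups_b)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
    print(hierarchical_allreduce(list(range(8)), 2, 4))  # every rank holds 28
```

Under these assumptions, the two stages together leave every rank with the global sum, because each group A covers one local index across all nodes and each group B then combines all local indices within a node.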

@paddle-bot (bot) commented Aug 2, 2022

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@FeixLiu (Contributor) left a comment

LGTM

@sneaxiy assigned sneaxiy and unassigned sneaxiy Aug 2, 2022
@sneaxiy requested a review from XieYunshen August 3, 2022 01:58
@sneaxiy merged commit c770053 into PaddlePaddle:develop Aug 3, 2022
@sneaxiy deleted the add_allreduce_opt branch August 3, 2022 02:41