Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix reduce func bug in process_group_bkcl #49749

Merged
merged 2 commits into from Jan 12, 2023

Conversation

XiaociZhang
Copy link
Contributor

PR types

Bug fixes

PR changes

Others

Describe

Fix reduce func bug in process_group_bkcl. Also catch up with a recent process_group PR that failed to add XPU branch. Note that reduce is still accomplished by allreduce for xpu. Fix this should xccl lib be updated.

Also catch up with a recent process_group PR that failed to add XPU branch.
Note that reduce is still accomplished by allreduce for xpu. Fix this should
xccl lib be updated.
@paddle-bot
Copy link

paddle-bot bot commented Jan 11, 2023

你的PR提交成功,感谢你对开源项目的贡献!
请关注后续CI自动化测试结果,详情请参考Paddle-CI手册
Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-bot-old paddle-bot-old bot added the contributor External developers label Jan 11, 2023
Copy link
Contributor

@taixiurong taixiurong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -277,7 +277,8 @@ std::shared_ptr<ProcessGroup::Task> ProcessGroupBKCL::Reduce(
const phi::DenseTensor& input,
BKCLContext_t comm,
const XPUStream& stream) {
phi::DenseTensor output_t(*output);
phi::DenseTensor output_t;
paddle::framework::TensorCopy(*output, platform::XPUPlace(), &output_t);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TensorCopy会自己分配空间,对吧?
之后可以考虑直接实现一个Reduce,在模型并行sharding中Reduce用的比较多;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

会自己分配空间,reduce等xccl kernel实现后我更新一下

@QingshuChen QingshuChen merged commit 8e291bf into PaddlePaddle:develop Jan 12, 2023
@paddle-bot
Copy link

paddle-bot bot commented Jan 12, 2023

你的PR已合入Paddle库,请关注后续测试结果。
Your PR has been merged into the repository. An official integration test will be conducted later. Stay tuned.

yjjiang11 pushed a commit to yjjiang11/Paddle that referenced this pull request Jan 13, 2023
* Fix reduce func bug in process_group_bkcl

Also catch up with a recent process_group PR that failed to add XPU branch.
Note that reduce is still accomplished by allreduce for xpu. Fix this should
xccl lib be updated.

* fix compile issue for non-XPU
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contributor External developers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants