Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Device Event Creation #57574

Conversation

eee4017
Copy link
Contributor

@eee4017 eee4017 commented Sep 21, 2023

PR types

Bug fixes

PR changes

Others

Description

In the Paddle framework, DeviceEvent are currently created using the cudaEventDefault flag. However, these events are exclusively used for synchronization purposes and not for timing measurements. Therefore, it would be more appropriate to create these events with the cudaEventDisableTiming flag. This PR is intended to correct the flag used during CUDA event creation. Additionally, the existing default flag for DeviceEvent is ambiguously set to 0. To improve clarity and prevent misunderstandings, this default flag has been removed. Instead, the flag is now explicitly generated using the GenerateDeviceEventFlag() function.

@paddle-bot
Copy link

paddle-bot bot commented Sep 21, 2023

你的PR提交成功,感谢你对开源项目的贡献!
请关注后续CI自动化测试结果,详情请参考Paddle-CI手册
Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@paddle-bot paddle-bot bot added the contributor External developers label Sep 21, 2023
@jeng1220
Copy link
Collaborator

@Xreki,

This PR should resolve the problem of NCCL not being able to overlap with compute kernels if tensor parallelism is enabled.

@eee4017
Copy link
Contributor Author

eee4017 commented Sep 22, 2023

It was mentioned that once the CUDA_DEVICE_MAX_CONNECTIONS flag is set, overlapping operations become entirely unfeasible. This limitation arises from the blocking introduced by the use of cudaEventDefault, since there's an inherent conflict between CUDA_DEVICE_MAX_CONNECTIONS and cudaEventDefault. We can configure the event to utilize cudaEventDisableTiming, so we could use CUDA_DEVICE_MAX_CONNECTIONS.

@GhostScreaming
Copy link
Contributor

GhostScreaming commented Sep 22, 2023

Origin Timeline(use paddle.ones to make all_reduce(dx) occupying SM before matmul(dw))
8a2605805222f2dc40382ddb4aaec8b5
Timeline with fix PR(remove paddle.ones, and export CUDA_DEVICE_MAX_CONNECTIONS=1
e89d1c6657cdc36a0d75cb1146704b9b

It fixes the problem of mp_async_allreduce overlap failure.

Copy link
Contributor

@GhostScreaming GhostScreaming left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR fix the problem of mp_async_allreduce overlap failure. PR-CI-ROCM-Compile compilation bug needs to be fixed.
image

@CLAassistant
Copy link

CLAassistant commented Sep 22, 2023

CLA assistant check
All committers have signed the CLA.

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ eee4017
❌ gongweibao


gongweibao seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@jeng1220
Copy link
Collaborator

@eee4017 ,

PR-CI-ROCM-Compile:

2023-09-23 00:18:56 FAILED: paddle/fluid/platform/CMakeFiles/device_event_test.dir/device_event_test.cc.o
...
2023-09-23 00:18:56 ../paddle/fluid/platform/device_event_test.cc: In member function 'virtual void DeviceEvent_CUDA_Test::TestBody()':
2023-09-23 00:18:56 ../paddle/fluid/platform/device_event_test.cc:89:26: error: no matching function for call to 'paddle::platform::DeviceEvent::DeviceEvent(phi::GPUPlace&)'
2023-09-23 00:18:56    DeviceEvent event(place);

@eee4017
Copy link
Contributor Author

eee4017 commented Sep 25, 2023

This branch was cherry-picked by #57656. Close this PR.

@eee4017 eee4017 closed this Sep 25, 2023
@Xreki
Copy link
Contributor

Xreki commented Sep 25, 2023

Hi @eee4017 , #57656 是将该PR的修改cherry-pick到incubate/new_frl分支,develop分支仍需求合入该PR。

@eee4017 eee4017 reopened this Sep 25, 2023
Copy link
Contributor

@GhostScreaming GhostScreaming left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 89, DeviceEvent event(place) -> DeviceEvent event(place, paddle::platform::GenerateDeviceEventFlag().

@eee4017 eee4017 force-pushed the fix_cuda_device_max_connection_with_event_create_flag branch from a397713 to 2c28c92 Compare September 25, 2023 08:01
Copy link
Contributor

@Xreki Xreki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Great work!

@Xreki Xreki merged commit d55bb44 into PaddlePaddle:develop Sep 26, 2023
27 checks passed
Frida-a pushed a commit to Frida-a/Paddle that referenced this pull request Oct 14, 2023
* Fix Device Event Creation

* Fix Device Event Test

* fix test
jiahy0825 pushed a commit to jiahy0825/Paddle that referenced this pull request Oct 16, 2023
* Fix Device Event Creation

* Fix Device Event Test

* fix test
danleifeng pushed a commit to danleifeng/Paddle that referenced this pull request Nov 14, 2023
* Fix Device Event Creation

* Fix Device Event Test

* fix test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
contributor External developers NVIDIA
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants