[Prepare for MultiProcess xpu] unified gen nccl id, refine imperative reducer #30455
Conversation
Thanks for your contribution!
Force-pushed from 2c2dcd6 to 82dc202
@@ -12,7 +12,8 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#include "paddle/fluid/operators/collective/gen_nccl_id_op_helper.h"
#ifdef PADDLE_WITH_NCCL
How do we stay compatible with multiple communication libraries, such as NCCL and BKCL, especially when several of them need to be used at the same time?
The BKCL and NCCL interfaces are largely identical. This file is only used for NCCL-like libraries, and a template is used below:
#ifdef PADDLE_WITH_NCCL
INSTANT_TEMPLATE(ncclUniqueId)
#endif
#ifdef PADDLE_WITH_XPU_BKCL
INSTANT_TEMPLATE(bkclUniqueId)
#endif
Force-pushed from 82dc202 to e5f4e61
->stream();
auto comm_stream =
    platform::NCCLCommContext::Instance().Get(ring_id, place_)->stream();
auto event = compute_events_[ring_id].get();
Add an assert for ring_id < compute_events_.size()? Same for WaitComm.
Done. Added asserts for ring_id >= 0 and ring_id < compute_events_.size().
int local_rank_{0};
std::vector<std::string> trainer_endpoints_{};
std::string current_endpoint_{""};
// TODO(shenliang03): support multi stream communication
Please help shenliang remove this TODO line.
Done
PADDLE_ENFORCE_CUDA_SUCCESS(
    platform::dynload::ncclGetUniqueId(&(*nccl_ids)[i]));
}
}
Could this also be unified with the nccl-id-generating function in nccl_context.h, so there is a single implementation?
In theory, yes. I plan to abstract it further later and move it into nccl helper or nccl comm.
LGTM
PR types
Others
PR changes
Others
Describe
Preparation PR for Kunlun (XPU) dygraph multi-card training (the next PR adds the actual support). Main changes:
1. Unify gen_nccl_id between the dygraph and static graph, along with the broadcast of bkcl ids, to prepare the next PR for Kunlun dygraph multi-card training and the static-graph multi-process multi-card mode.
2. Extract the dygraph ParallelContext into parallel_context.h so it can serve as the base class for communication-library contexts such as NCCLParallelContext, BKCLParallelContext, and GLOOParallelContext. Add WaitCompute() and WaitComm() interfaces to encapsulate the communication-waits-for-compute and compute-waits-for-communication logic, keeping the Reducer code as independent of the device communication library as possible.
3. Adjust the dygraph reducer code to remove device-specific code where possible.