
Add device_send, device_recv, device_sendrecv, device_multicast_sendrecv #144

Merged: 14 commits into rapidsai:branch-0.19 on Feb 22, 2021

Conversation

seunghwak (Contributor):

  • Undo the temporary exposure of a RAFT communication object's private NCCL communicator.
  • Add device_send/device_recv (send-only or receive-only), device_sendrecv (simultaneous send and receive), and device_multicast_sendrecv (simultaneous send and receive of multiple messages); see the usage sketch after this list.
  • Add test suites for newly added raft::comms_t routines.
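For context, a minimal usage sketch of the new point-to-point routines follows. It is illustrative only: the member names (get_comms, get_rank, get_size, device_sendrecv) come from RAFT and this PR, but the exact parameter order is assumed from the signatures quoted later in this thread and may not match the merged header exactly.

```cpp
// Hypothetical sketch (not taken from the PR): a ring-style exchange using the
// new raft::comms_t point-to-point routines.
#include <raft/comms/comms.hpp>
#include <raft/handle.hpp>
#include <rmm/device_uvector.hpp>

void ring_exchange(raft::handle_t const& handle, cudaStream_t stream)
{
  auto const& comm = handle.get_comms();
  int const rank   = comm.get_rank();
  int const size   = comm.get_size();
  int const dest   = (rank + 1) % size;         // right neighbor
  int const source = (rank + size - 1) % size;  // left neighbor

  rmm::device_uvector<int> send_buf(1024, stream);
  rmm::device_uvector<int> recv_buf(1024, stream);

  // Send to `dest` and receive from `source` in a single call; per the
  // discussion below, this maps to a ncclSend/ncclRecv pair inside one NCCL group.
  comm.device_sendrecv(send_buf.data(), send_buf.size(), dest,
                       recv_buf.data(), recv_buf.size(), source,
                       stream);
}
```

device_multicast_sendrecv takes vectors of sizes, offsets, and peer ranks instead of a single peer on each side, as the diff excerpt quoted later in this thread shows for the receive side.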

}

/**
* Performs a multicast send/receive
Member:

Just curious, is there an MPI equivalent for that?

seunghwak (Contributor, Author):

MPI has all-to-all, but AFAIK no multicast primitive that receives data from one subset of the ranks while sending data to another subset.

This multicast is actually implemented as ncclSend/ncclRecv operations placed between ncclGroupStart() and ncclGroupEnd().
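A rough sketch of that pattern, for illustration only (this is not the exact code in this PR; it assumes a raw ncclComm_t and byte-sized elements):

```cpp
// Illustrative sketch of the grouped-NCCL pattern described above; error
// handling and buffer setup are omitted.
#include <cuda_runtime.h>
#include <nccl.h>

#include <vector>

void multicast_sendrecv_sketch(ncclComm_t comm, cudaStream_t stream,
                               char* send_buf, std::vector<size_t> const& send_sizes,
                               std::vector<size_t> const& send_offsets, std::vector<int> const& dests,
                               char* recv_buf, std::vector<size_t> const& recv_sizes,
                               std::vector<size_t> const& recv_offsets, std::vector<int> const& sources)
{
  // Every send/recv queued between ncclGroupStart() and ncclGroupEnd() is
  // matched and launched together at ncclGroupEnd(), which avoids deadlock
  // between the individual point-to-point pairs.
  ncclGroupStart();
  for (size_t i = 0; i < send_sizes.size(); ++i) {
    ncclSend(send_buf + send_offsets[i], send_sizes[i], ncclChar, dests[i], comm, stream);
  }
  for (size_t i = 0; i < recv_sizes.size(); ++i) {
    ncclRecv(recv_buf + recv_offsets[i], recv_sizes[i], ncclChar, sources[i], comm, stream);
  }
  ncclGroupEnd();
}
```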

std::vector<size_t> const &recvoffsets,
std::vector<int> const &sources,
cudaStream_t stream) const {
// ncclSend/ncclRecv pair needs to be inside ncclGroupStart/ncclGroupEnd to avoid deadlock
Member:

In practice, do these transfers get serialized on the same stream? The API doesn't seem to allow the backend to run on multiple streams.
Thinking about the case where sendsizes.size() is large but the individual sendsizes[i] are relatively small, it would be interesting to see whether this benefits from concurrency.

seunghwak (Contributor, Author):

All the send/receive operations are placed inside ncclGroupStart() and ncclGroupEnd(), so AFAIK they all execute concurrently, at least logically, once ncclGroupEnd() is called (NCCL may or may not restrict parallelism internally to avoid congestion, depending on the interconnect).

If you are worried about the time to queue the ncclSend/ncclRecv operations (the cost of the for loops, which could become problematic if sendsizes.size() or recvsizes.size() grows very large, e.g. into the millions), note that sendsizes.size() is at most the number of GPUs, and the GPU count is unlikely to grow that large. It would be great if our code scaled to a million GPUs everywhere else and this loop became the bottleneck, but I don't expect that to happen in the foreseeable future.

@afender added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels on Feb 18, 2021.
@BradReesWork (Member):

@gpucibot merge

@rapids-bot (bot) merged commit a3461b2 into rapidsai:branch-0.19 on Feb 22, 2021.
@rapids-bot (bot) pushed a commit to rapidsai/cugraph that referenced this pull request on Feb 25, 2021:
…icast_sendrecv, gather, gatherv) (#1391)

- [x] Update cuGraph to use RAFT::comms_t's newly added device_sendrecv & device_multicast_sendrecv
- [x] Update cuGraph to use RAFT::comms_t's newly added gather & gatherv
- [x] Update the RAFT git tag once rapidsai/raft#114 (currently merged in 0.18 but not yet in 0.19) and rapidsai/raft#144 are merged into 0.19

Ready for review, but cannot be merged until RAFT PRs 114 and 144 are merged into RAFT branch-0.19.

Authors:
  - Seunghwa Kang (@seunghwak)

Approvers:
  - Alex Fender (@afender)

URL: #1391