Adding allreduce for ndarray #234
Comments
And this is my case (similar to that of batch_normalization). I had to handle …
@Hakuyume As #237 has been merged, this issue will be addressed in the 1.3.0 release. So far only the communicator will support allreduce, but do you think you need functions like AllGather? To see a real use case I tried the URL of your code (https://github.com/Hakuyume/chainercv/blob/ssd-multi-gpu-train/chainercv/links/model/ssd/multibox_loss.py#L91-L95), but it now returns a 404. Could you provide the latest allreduce code if you still have it somewhere open?
@kuenishi Thank you for supporting allreduce!
For my case, function classes are not required.
I'm sorry for removing the branch. Here is the latest code.
Glad to hear that. For now I'll close this issue to clean up the milestone list, but feel free to reopen it when you think you need more.
Yes, it is not just documented, but …
Thank you.
Actually, it wasn't implemented for cupy.ndarray, and #293 should now fix it.
The allreduce function doesn't seem to use NCCL, or am I missing something?
Hi @ankahira, ChainerMN's PureNcclCommunicator does use NCCL: https://github.com/chainer/chainer/blob/master/chainermn/communicators/pure_nccl_communicator.py#L180
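For context, allreduce means every worker ends up with the same elementwise reduction (here, the sum) of all workers' arrays, whichever backend (MPI or NCCL) carries the data. Below is a minimal in-process sketch of that semantics with a hypothetical toy function in plain numpy; it is an illustration only, not ChainerMN's implementation, and needs no MPI or NCCL.

```python
import numpy as np

def toy_allreduce(per_worker_arrays):
    # Hypothetical illustration: sum every simulated worker's
    # contribution elementwise, then hand an identical copy of the
    # result back to each worker (MPI.SUM-style semantics).
    total = np.sum(np.stack(per_worker_arrays), axis=0)
    return [total.copy() for _ in per_worker_arrays]

# Two simulated workers each contribute a gradient-like array.
results = toy_allreduce([np.array([1.0, 2.0]), np.array([3.0, 4.0])])
print(results[0])  # [4. 6.]
```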
Original issue description:

I want a CommunicatorBase.allreduce method that can work with numpy/cupy.ndarray. Although this can be achieved by using comm.mpi_comm.A(a)llreduce, that approach has two problems.

1. comm.mpi_comm.A(a)llreduce requires the MPI module to be imported, e.g. for MPI.IN_PLACE or MPI.SUM. However, we need to import MPI lazily (I don't know the reason, but ChainerMN's examples do so). With a wrapper function, handling the MPI module can be hidden.
2. For cupy.ndarray, there are two backends: mpi and nccl. However, using comm.mpi_comm cannot make use of nccl. It is better to select the backend automatically.

For example, https://github.com/chainer/chainermn/blob/master/chainermn/functions/batch_normalization.py#L121-L125 can be replaced with comm.allreduce(tmp), and we can remove self.memory_utility_module and self.mpi_module.
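The automatic backend selection requested above can be sketched as a simple dispatch on the array's type. The function name and the returned labels here are assumptions for illustration, not ChainerMN's actual code.

```python
import numpy as np

def pick_allreduce_backend(array):
    # Hypothetical dispatcher (not ChainerMN's implementation):
    # GPU-resident cupy arrays would go through NCCL when available,
    # while host-side numpy arrays fall back to MPI.
    if type(array).__module__.startswith('cupy'):
        return 'nccl'
    return 'mpi'

print(pick_allreduce_backend(np.zeros(3)))  # mpi
```

A real comm.allreduce wrapper built this way would also hide the lazy MPI import, since only the wrapper's module would ever touch MPI.IN_PLACE or MPI.SUM.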