Optimize communication within the same rank #134
Conversation
```cpp
auto const receive_buffer_ptr =
    src_buffer.data() + _src_offsets[i] * num_packets;
requests.emplace_back();
MPI_Irecv(receive_buffer_ptr, message_size, MPI_BYTE, _sources[i], 123,
```
Is it just considered good practice to hard-code the tag here?
I am always wary of hard-coding an MPI tag, because you never know what a user will pick, and things may hang if the tags happen to be the same. Typically I try to expose the tag in the interface of any code that does MPI communication, with a default value provided.
> typically I try to expose this to the user in an interface for code that does MPI communication with a default value provided
How do you do that? Do all your communications use the same tag?
In this case I would expose it in either the distributed tree constructor or the send-across-network function, basically any function that does MPI. If you have a single function that does multiple MPI communications which need different tags, then you could ask for two or more tags, or a range of tags that the library is allowed to use. See this for example: https://github.com/ECP-copa/Cabana/blob/89240fe1a605589d7c0d925b41b3b57bc5586459/core/src/Cabana_Distributor.hpp#L100
Isn't it a much better and cleaner solution to do `MPI_Comm_dup`? That creates a new tag namespace, so any tag you use won't intersect with user communication. In fact, I think this should be done in ArborX.
As long as the object is long-lived that sounds reasonable. Either way we shouldn't use a fixed tag with a user-provided communicator. Either the approach I suggested or the approach @aprokop suggested will resolve this.
Comm duplication was merged in #135.
```cpp
auto const position = it - _sources.begin();
auto const receive_buffer_ptr =
    src_buffer.data() + _src_offsets[position] * num_packets;
std::memcpy(receive_buffer_ptr, send_buffer_ptr, message_size);
```
Are you guys using GPU-aware MPI? If you are, this won't work. If you aren't, we should have a discussion about that.
Good point.
But `src_buffer` is a `std::vector` in the current implementation, so the proposed changes are valid.
Fair enough. Let's make a note to discuss a GPU-aware strategy in the future, then.
ArborX/src/details/ArborX_DetailsDistributor.hpp, lines 147 to 149 in b6b7d7e:

```cpp
// TODO
// * apply permutation on the device in a parallel for
// * switch to MPI with CUDA support (do not copy to host)
```
We typically have large message sizes for communication within the same rank. Hence, this also fixes problems with message sizes that MPI can't handle within one MPI communication call. Also, see #131.