Optimize communication within the same rank #134

Merged: 1 commit merged into arborx:master on Oct 9, 2019

Conversation

masterleinad (Collaborator)
We typically have large message sizes for communication within the same rank. Hence, this also fixes problems with message sizes that MPI can't handle within one MPI communication call. Also see #131.
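
For context, the optimization replaces the MPI send/receive pair targeting the calling rank with a plain memory copy. A rough sketch of the pattern, not the actual ArborX code: the variable names (_sources, _src_offsets, src_buffer, send_buffer_ptr, message_size, num_packets, requests) follow the diff excerpts below, while the loop structure, _comm, and the trailing MPI_Irecv arguments are assumptions.

// Sketch only; requires <mpi.h>, <cstring>, <vector>.
int comm_rank;
MPI_Comm_rank(_comm, &comm_rank);

for (int i = 0; i < static_cast<int>(_sources.size()); ++i)
{
  auto const receive_buffer_ptr =
      src_buffer.data() + _src_offsets[i] * num_packets;
  if (_sources[i] == comm_rank)
  {
    // Same rank: copy directly instead of posting a send/receive pair.
    // This also sidesteps message sizes MPI cannot handle in a single call.
    std::memcpy(receive_buffer_ptr, send_buffer_ptr, message_size);
  }
  else
  {
    requests.emplace_back();
    MPI_Irecv(receive_buffer_ptr, message_size, MPI_BYTE, _sources[i], 123,
              _comm, &requests.back());
  }
}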

auto const receive_buffer_ptr =
    src_buffer.data() + _src_offsets[i] * num_packets;
requests.emplace_back();
MPI_Irecv(receive_buffer_ptr, message_size, MPI_BYTE, _sources[i], 123,
Contributor

Is this just a good habit to fix the tag here?

Contributor

I am always wary of fixing an MPI tag, because you never know what a user will pick, and things may hang if the tags happen to be the same. For code that does MPI communication, I typically try to expose the tag to the user in the interface, with a default value provided.

Contributor

typically I try to expose this to the user in an interface for code that does MPI communication with a default value provided

How do you do that? Do all your communications use the same tag?

Contributor

In this case I would expose it in either the distributed tree constructor or the function that sends across the network, basically any function that does MPI. If you have a single function that does multiple MPI communications needing different tags, then you could ask for two or more tags, or a range of tags that the library is allowed to use. See this for example: https://github.com/ECP-copa/Cabana/blob/89240fe1a605589d7c0d925b41b3b57bc5586459/core/src/Cabana_Distributor.hpp#L100
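
For illustration only, a sketch of what exposing the tag in the interface might look like (a hypothetical signature, not the ArborX or Cabana API):

// The caller may override the tag the library uses internally;
// 123 is only a default.
template <typename Buffer>
void sendAcrossNetwork(MPI_Comm comm, Buffer const &send_buffer,
                       Buffer &receive_buffer, int mpi_tag = 123);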

Contributor

Isn't it a much better and cleaner solution to do MPI_Comm_dup? That creates a new communicator namespace, so any tag you use won't intersect with user communication. In fact, I think this should be done in ArborX.
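
A minimal sketch of the communicator-duplication approach (a hypothetical RAII wrapper, not the actual ArborX code):

class Distributor
{
public:
  explicit Distributor(MPI_Comm comm)
  {
    // Duplicate the user's communicator: tags posted on _comm live in their
    // own namespace and cannot collide with tags on the user's communicator.
    MPI_Comm_dup(comm, &_comm);
  }
  ~Distributor() { MPI_Comm_free(&_comm); }
  Distributor(Distributor const &) = delete;
  Distributor &operator=(Distributor const &) = delete;

private:
  MPI_Comm _comm;
};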

Contributor

As long as the object is long-lived, that sounds reasonable. Either way, we shouldn't use a fixed tag with a user-provided communicator. Either the approach I suggested or the approach @aprokop suggested will resolve this.

Contributor

Comm duplication was merged in #135.

auto const position = it - _sources.begin();
auto const receive_buffer_ptr =
    src_buffer.data() + _src_offsets[position] * num_packets;
std::memcpy(receive_buffer_ptr, send_buffer_ptr, message_size);
Contributor

Are you guys using GPU-aware MPI? If you are, this won't work. If you aren't, we should have a discussion about that.

Contributor

Good point.

Contributor

But src_buffer is a std::vector in the current implementation, so the proposed changes are valid.

@sslattery (Contributor), Oct 9, 2019

Fair enough, let's make a note to discuss a GPU-aware strategy in the future then.

masterleinad (Collaborator, Author)

// TODO
// * apply permutation on the device in a parallel for
// * switch to MPI with CUDA support (do not copy to host)
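
For the first TODO item, applying the permutation on the device could look roughly like the following Kokkos sketch (view and policy names are hypothetical, not the ArborX implementation):

Kokkos::parallel_for(
    "apply_permutation", Kokkos::RangePolicy<ExecutionSpace>(0, n),
    KOKKOS_LAMBDA(int i) {
      // Scatter on the device according to the permutation,
      // avoiding the host-side copy and std::memcpy.
      permuted_buffer(permute(i)) = buffer(i);
    });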

@aprokop added the "bug" label on Oct 9, 2019
@dalg24 merged commit 36f7f35 into arborx:master on Oct 9, 2019
@masterleinad deleted the avoid_selfcommunicatuin branch on Oct 9, 2019