Fix communication for large message sizes #131
Conversation
I disagree with the intent of this PR. I think we should just throw in this situation and not allow things to continue. If a user's code is trying to send a message that large, we should advocate for rethinking their approach.
If you did want to do this, however, I would advocate for a more efficient asynchronous strategy for packing and unpacking multiple messages to avoid some of the latency costs.
Please add a comment noting that you are splitting the message into chunks when it exceeds the maximum size.
No, I do not agree. This is the correct fix for now.
The Clang build is hanging and others are broken. @masterleinad please investigate.
Force-pushed d946712 to 6b03a84.
@dalg24 I fixed the implementation. @aprokop @sslattery So you are saying that the failing setup isn't reasonable at all? Surely, it is an edge case, but I don't see a reason not to support it. The current implementation also makes it easy to change the message size in case we want to play with it.
@sslattery Are you thinking about compressing via
No, I was referring to an overlapping pack/unpack strategy when multiple messages are in play, to hide the latencies of the pack/unpack kernels on GPU systems. This is fine for now - we can optimize later as needed.
I don't see why we should throw. Why do you think that is not a valid case, @aprokop?
I agree with @dalg24 and @masterleinad after more thought - it should work for huge messages even if you're in a regime of bad performance.
@masterleinad Please confirm that the message that exceeds the maximum size is sent by the rank to itself.
I am not quite sure what you are asking for, but I checked (manually) that the tests pass if
An additional concern is that the message sends/receives in this patch are not word-aligned. This could potentially influence some of the MPI machinery, including the choice of a protocol to communicate data. I'm not approving this until this is researched.
@aprokop is right - we might need an LDRD before we can merge
@dalg24 Yes, in case
@aprokop would you be happier if we filter out self-communication instead? |
Why do we even use MPI for self-communication? |
Code simplicity, if you don't treat self-communication explicitly.
So, yes, in short, just do the change where you filter out self-communication, and put in place an assert for non-self communication. |
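The suggested change could be sketched roughly as follows. This is not the PR's actual code: `send_to`, the buffer types, and the stand-in local copy are illustrative assumptions; in the real code path the non-self branch would call into MPI.

```cpp
#include <cassert>
#include <vector>

// Sketch of the suggested pattern: handle self-communication with a plain
// local copy (no MPI), and assert the non-self invariant for everything else.
// Hypothetical helper, not taken from the PR.
void send_to(int my_rank, int dest_rank,
             std::vector<char> const &src, std::vector<char> &dst)
{
  if (dest_rank == my_rank)
  {
    // Self-communication: filtered out, just copy locally.
    dst = src;
    return;
  }
  // Non-self communication: in debug builds, assert the invariant the
  // reviewers asked for (note this compiles out in release mode).
  assert(dest_rank != my_rank);
  // ... MPI_Send / MPI_Isend would go here in the real implementation ...
  dst = src; // stand-in so this sketch is runnable without MPI
}
```

As noted below, the assert only fires in debug builds, so release builds would silently take the MPI path.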
@masterleinad or @aprokop |
But only in release mode. 😉 So I was lying: there is communication to another rank for the scenario described above. |
Force-pushed 2aac7d1 to c8d2048.
The last commit avoids communication when an MPI rank tries to send to itself. For the scenario above, that yields roughly a 5% performance improvement for knn.
Please elaborate. I am a bit surprised we hit the limit on communication with neighboring processes.
No, another error on my side. I was checking the wrong array for the second check (when sending). It's actually just the communication with the same rank that is that large.
Then drop the splitting into chunks.
Didn't we agree to cover that edge case?
There is no consensus. Come back with a problem setting that triggers it and does not hit other problems, like overflowing the integers used to index views, and we'll revisit :)
I agree with @dalg24 - if we have a test case where non-self communication breaks the integer size limit, then we will need to implement the chunking capability. Should we then consider throwing, so we can easily detect this scenario if it ever arises, as unexpected as it may be?
Yes.
Closing in favor of #134. |
I was getting an MPI error telling me that the message size was too large. This pull request splits the message into multiple smaller ones in case we need to send more than `std::numeric_limits<int>::max()` bytes.
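The splitting described above can be sketched as a small helper that partitions a byte count into chunks whose sizes each fit in an `int` (the type MPI uses for count arguments). This is an illustrative assumption, not the PR's actual implementation; `split_message` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Partition a message of total_bytes into (offset, size) chunks, each at
// most max_chunk bytes, so each chunk's size fits in MPI's int count.
// Hypothetical helper for illustration only.
std::vector<std::pair<std::size_t, std::size_t>>
split_message(std::size_t total_bytes,
              std::size_t max_chunk = std::numeric_limits<int>::max())
{
  std::vector<std::pair<std::size_t, std::size_t>> chunks;
  for (std::size_t offset = 0; offset < total_bytes; offset += max_chunk)
    chunks.emplace_back(offset, std::min(max_chunk, total_bytes - offset));
  return chunks;
}
```

Each resulting chunk could then be sent with its own MPI call, with the chunk size safely cast to `int` for the count argument.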