Fix potential zero-copy for primarynamespace::bulk_service_async et.al. #1543
// )
// {
//     this->base_type::bulk_service_non_blocking(this->get_gid(), reqs, priority);
// }
Is this intentional?
Yes, I'll remove the code.
This is related to #1521
I'm not sure what this is supposed to fix. How is the cb different to what is already in place in
This change turns one of the internal
Yes, that's what I can read from the code as well, but how is that
As far as I remember, we have no such mechanism in place, do we?
The parcel is being kept alive until after it has been completely sent.
If I understand the situation correctly, this is not true. The parcel does stay alive, but zero-copy-serialized arguments are not embedded in the parcel; they are only referenced by it. Those arguments go out of scope as soon as
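To make the lifetime concern in the comment above concrete, here is a minimal sketch in plain standard C++. It does not use HPX, and `referencing_parcel` and `owning_parcel` are hypothetical stand-ins invented for illustration, not the library's types. It only models the claim being made (a container that references its arguments does not keep them alive); whether HPX actually behaves this way is exactly what is disputed in this thread.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in: a "parcel" that only observes its argument buffer,
// modelling an argument that is referenced rather than embedded.
struct referencing_parcel {
    std::weak_ptr<std::vector<int>> chunk;  // does not own the data
};

// Hypothetical stand-in: a "parcel" that shares ownership of its argument.
struct owning_parcel {
    std::shared_ptr<std::vector<int>> chunk;  // keeps the data alive
};

// Returns true if the argument is still reachable after the caller's
// local scope has ended while the parcel itself is still alive.
inline bool referenced_argument_survives() {
    referencing_parcel p;
    {
        auto args = std::make_shared<std::vector<int>>(3, 42);
        p.chunk = args;
    }  // args goes out of scope here; p is still alive
    return !p.chunk.expired();
}

inline bool owned_argument_survives() {
    owning_parcel p;
    {
        auto args = std::make_shared<std::vector<int>>(3, 42);
        p.chunk = args;
    }  // args goes out of scope, but p shares ownership of the buffer
    return p.chunk != nullptr && p.chunk->size() == 3;
}
```

In this model, keeping the parcel alive is enough only in the owning case; in the referencing case the buffer is gone once the caller's scope ends, even though the parcel object still exists.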
I think this analysis of the problem is wrong. First of all, we don't have
The proposed patch doesn't take care of the potential problems in 2 or the
Are you saying that there is no potential problem when
That is what I'm trying to get at, yes. Do we have a unit/regression test
You know as well as I do that reproducing this issue is very difficult. However, I will try to create a test. All we know is that this patch improved the problems @biddisco was seeing. Besides, even if this patch turns out not to do any good, it definitely does not do any harm either.
Yes, I know that reproducing this is hard. Yet there hasn't been a single failure in our existing test suite that shows this problem (the TCP parcelport sends fully asynchronously; the MPI one currently sends synchronously as long as no futures that aren't ready yet are involved). That's the primary reason why I highly doubt that zero-copy serialization is to blame here.
stubs::primary_namespace::bulk_service_async(
(*it).first, (*it).second, action_priority_));
(*it).first, (*it).second,
Even though the requests are stored in a shared pointer that is kept alive by the callback function, the requests are copied here. The actual requests that take part in the asynchronous communication are therefore not kept alive by the bound callback function.
That's correct. That patch, if I am not completely mistaken, is essentially a no-op that adds the cost of allocating memory and of the atomic increments/decrements of the shared_ptr reference counts.
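The point made in the two comments above can be sketched in plain standard C++. This is an illustration under stated assumptions, not the HPX code: `requests`, `send_by_value`, and `send_by_shared_ptr` are hypothetical names, and the inner lambda merely stands in for an asynchronous call that receives its arguments. It shows that when the requests are passed by value, the callee works on a copy, so a callback holding the original `shared_ptr` does not extend the lifetime of the buffer that is actually in use.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

using requests = std::vector<int>;  // hypothetical stand-in for the request list

// The callback owns `kept`, but the callee receives a copy of the requests.
// Returns true if the callee's buffer is the one the callback keeps alive.
inline bool send_by_value() {
    auto kept = std::make_shared<requests>(requests{1, 2, 3});
    std::function<void()> on_sent = [kept] {};  // keeps the original alive
    auto callee = [&kept](requests reqs) {      // pass by value: copies
        return reqs.data() == kept->data();
    };
    bool same = callee(*kept);
    on_sent();
    return same;
}

// Here the callee shares ownership, so it uses the very same buffer
// that the callback keeps alive.
inline bool send_by_shared_ptr() {
    auto kept = std::make_shared<requests>(requests{1, 2, 3});
    std::function<void()> on_sent = [kept] {};
    auto callee = [&kept](std::shared_ptr<requests> reqs) {
        return reqs->data() == kept->data();
    };
    bool same = callee(kept);
    on_sent();
    return same;
}
```

In the by-value variant the callback guards a buffer nobody is sending from, which matches the "no-op plus allocation and reference-counting cost" assessment; only the shared-ownership variant ties the callback's lifetime guarantee to the data the callee actually sees.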
This fixes one instance where zero-copy serialization was interfering with a plain async invocation.