Unnecessary deep copy causes memory flare on network comms #5107
Comments
The flare is caused by this line: `distributed/protocol/serialize.py`, line 472 (at commit 6ecb4a0).

This implies that *all* numpy buffers larger than a single frame are deep-copied upon arrival, which is wasteful. It should be possible to reassemble them directly from the network card's buffer into their final location, although that may require some low-level work.
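A minimal sketch of why joining frames implies a full copy (illustrative only, with assumed frame sizes; this is not distributed's exact code):

```python
import numpy as np

# Simulate a large array arriving as four 64 MiB frames (sizes assumed).
frames = [bytes(64 * 2**20) for _ in range(4)]

# Reassembling into one contiguous buffer via a join, similar in spirit
# to the line referenced above, allocates a second full-size buffer
# while the original frames are still alive: ~2x peak memory.
joined = bytearray().join(frames)  # deep copy of every frame
arr = np.frombuffer(joined, dtype=np.uint8)
```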
Worth noting that the default frame size is (I believe) 64 MiB, which is a pretty small chunk size, so it's probably reasonable to think this is happening often: `distributed/distributed.yaml`, line 169 (at commit 057262b).
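The setting can be inspected or overridden through dask's config system; a short sketch, assuming the relevant key is `distributed.comm.shard` (the shard/frame size for large messages):

```python
import dask

# Inspect the frame/shard size used to split large messages on the wire.
print(dask.config.get("distributed.comm.shard"))  # "64MiB" by default

# Override it for workloads dominated by very large buffers.
dask.config.set({"distributed.comm.shard": "256MiB"})
```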
To add more detail here, since initially I thought there might be a simpler approach (there isn't): at first, I thought, "instead, could we just make a non-contiguous bytearray-like object, so at least we get rid of the copy here?" But that would just move the problem around, because in the end the NumPy array needs to be contiguous. So long as we've loaded all the 64 MiB pieces of the array into non-contiguous memory before we've figured out where they need to go, and it then turns out we need them to become contiguous, using 2x the memory of the final array is unavoidable. I'd argue that this sort of problem is part of why Arrow exists.
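To make the "moving the problem around" point concrete, here is a sketch (assumed sizes, not library code): even if the frames are never joined, producing the contiguous array the consumer needs still costs a full-size allocation plus a copy.

```python
import numpy as np

# Frames held in non-contiguous memory, as a hypothetical
# "non-contiguous bytearray" would keep them.
frames = [bytes(64 * 2**20) for _ in range(4)]
total = sum(map(len, frames))

# The final NumPy array must be contiguous, so we still pay a full-size
# allocation and a copy of every frame at some point:
out = np.empty(total, dtype=np.uint8)
offset = 0
for frame in frames:
    n = len(frame)
    out[offset:offset + n] = np.frombuffer(frame, dtype=np.uint8)  # the unavoidable copy
    offset += n
```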
Yeah, actually we already read them into a single buffer after PR #4506. There still may be some copying due to how Tornado is buffering communication, but it should be limited. That said, it sounds like the serialization logic results in extra copies, as Gabe discovered in PR #5112. Generally we have been aware there are cases where extra copies are happening, but the history is complicated (for example, sometimes people have expected writable buffers when …)
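A sketch of the "read into a single buffer" idea (illustrative only; distributed's real comm layer sits on Tornado/UCX and PR #4506 differs in detail): preallocate the whole message once and receive frames directly into it, so no join or second copy is needed afterwards.

```python
import socket

def recv_message_into_one_buffer(sock: socket.socket, frame_sizes) -> bytearray:
    """Receive all frames of a message into one preallocated buffer.

    No per-frame bytes objects are created and no join step is needed,
    hence no second full-size copy.
    """
    buf = bytearray(sum(frame_sizes))  # single allocation for the message
    view = memoryview(buf)
    offset = 0
    for size in frame_sizes:
        remaining = size
        while remaining:
            # recv_into writes straight into the target buffer slice.
            n = sock.recv_into(view[offset:offset + remaining])
            if n == 0:
                raise ConnectionError("peer closed connection mid-message")
            offset += n
            remaining -= n
    return buf
```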
Environment: distributed git tip, Linux x64
Expected behaviour
Thanks to pickle protocol 5 out-of-band buffers, the peak RAM usage on each worker should be 1 GiB.
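For context, pickle protocol 5 lets large buffers travel out-of-band, so deserialization can in principle be zero-copy; a minimal demonstration:

```python
import pickle
import numpy as np

arr = np.ones(2**27)  # ~1 GiB of float64

# Serialize with out-of-band buffers: the payload holds metadata only,
# while the raw buffer is handed to buffer_callback without being copied.
buffers = []
payload = pickle.dumps(arr, protocol=5, buffer_callback=buffers.append)

# Deserialization reuses the provided buffers instead of copying them.
restored = pickle.loads(payload, buffers=buffers)
```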
Actual behaviour
I can see on the dashboard the RAM of all workers that receive the computed future over the network briefly flare up to 2 GiB and then settle down at 1 GiB.
On stderr I read:
If I reduce the memory_limit to 2 GiB, the workers get killed off.
The sender worker is unaffected by the flaring.
I tested on Python 3.8 and 3.9, and on the tcp://, ws://, and ucx:// protocols; all are equally affected.
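For reference, a hypothetical reproducer matching the description above (the original snippet is not preserved here; cluster size, memory limit, and array size are assumptions):

```python
import numpy as np
from dask.distributed import Client

client = Client(n_workers=4, memory_limit="4 GiB")  # assumed settings

# One worker materializes a ~1 GiB array (2**27 float64 values)...
future = client.submit(np.random.random, 2**27)

# ...and replicating it forces the other workers to receive it over the
# network, where the per-worker flare to ~2 GiB can be observed on the
# dashboard during the transfer.
client.replicate(future)
```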