Alternative destination-specific serialization #400

Open

mrocklin opened this issue Aug 2, 2016 · 7 comments

mrocklin commented Aug 2, 2016

In some cases objects may want to serialize themselves differently depending on the sender and recipient machines. One such example is a multi-GPU setting, where certain workers may be able to exchange GPU data efficiently if they are on the same physical host or nearby in the network.

To meet this goal it may make sense to enable host-specific serialization; that is, we would optionally use a new serialization function that is invoked for certain types:

serialize(obj, host, destination) -> bytes

We would probably have to register serialization functions for certain types or else look for a special protocol like __getstate_host__ and __setstate_host__.
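
For illustration, here is a rough sketch of what such a registry plus a dunder-protocol fallback could look like; all of the names here (register_host_serializer, serialize_for_host) are hypothetical and not part of any existing API:

```python
import pickle

# Hypothetical registry of host-aware serializers, keyed by type.
host_serializers = {}

def register_host_serializer(typ, serialize_fn, deserialize_fn):
    """Register (serialize, deserialize) callables for a given type."""
    host_serializers[typ] = (serialize_fn, deserialize_fn)

def serialize_for_host(obj, source, destination):
    """Serialize obj knowing which hosts are involved in the transfer."""
    for typ, (ser, _) in host_serializers.items():
        if isinstance(obj, typ):
            return ser(obj, source, destination)
    # Fall back to an optional special protocol on the object itself ...
    if hasattr(obj, "__getstate_host__"):
        return obj.__getstate_host__(source, destination)
    # ... and finally to ordinary pickling.
    return pickle.dumps(obj)
```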

cc @seibert @sklam @gforsyth

sklam commented Aug 4, 2016

IIUC, a use case for a GPU array would be: serialize(gpuarray, host, dest) sends a handle that lets the destination request a direct GPU transfer, without going through pickling and the network.

Is that right?

If so, who is responsible for keeping obj alive so that it is available when the destination requests the peer transfer?

seibert commented Aug 4, 2016

That is a good point. Any of these alternate-channel communication methods requires some resource on the source to exist until the destination has completed the transfer. This will require the destination to ACK back to the source before that resource can be released.

mrocklin commented Aug 8, 2016

That will commonly be true in practice but is hard to guarantee. One could hack this by implementing a bit of reference counting in the serialize function.
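
A hypothetical sketch of that hack, assuming the receiving side sends an explicit ACK once its side-channel transfer completes (none of these names exist in distributed):

```python
from collections import defaultdict

# Hypothetical: objects pinned on the sending side until every destination ACKs.
_pinned = {}                   # token -> object (keeps it alive)
_refcounts = defaultdict(int)  # token -> number of outstanding transfers

def pin_for_transfer(obj):
    """Called from the serialize function each time obj is sent somewhere."""
    token = hex(id(obj))       # same object -> same token while it stays pinned
    _pinned[token] = obj
    _refcounts[token] += 1     # one count per outstanding transfer
    return token

def release_after_ack(token):
    """Called when a destination ACKs that its side-channel transfer finished."""
    _refcounts[token] -= 1
    if _refcounts[token] <= 0:
        del _refcounts[token]
        _pinned.pop(token, None)  # the object may now be garbage collected
```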

mrocklin commented Nov 1, 2016

As brought up in #614 (comment), this would also be helpful for clusters with heterogeneous-language workers (e.g. a Python worker sending data to a Julia worker).

pitrou commented Feb 23, 2017

So, this is a feature baked into the new I/O infrastructure, since each transport (or backend) decides on the serialization it wants to use. For example, the inproc backend defined in #887 uses no serialization: Python object references are simply queued from one thread to another.

mrocklin commented

So here are a couple options:

  1. We pass the source and destination hosts to serialization functions. These functions send enough data to the other side to initiate an efficient communication back through some side channel.
  2. We create different comm types for inter-host and intra-host communications (see Intra-host communication #2046) and then allow serialization to differ among comm types. Serialization schemes do not get source and destination host information, but we can define different serialization schemes for intra-node and inter-node communication and use each where appropriate.

For GPU transfers, in either case the data serialized is just some connection and address information. When the receiving side accepts and deserializes this data, it initiates a side-channel communication. We allow serialization functions to request that the sending side only release resources after the receiving side has finished the deserialization process (which in this case is a side-channel transfer).
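
Purely as an illustration, the flow could look something like the sketch below, where the serialized payload is only a small handle and the actual bytes move over a side channel during deserialization; the transport and control-message functions are stubs, and every name is made up:

```python
import uuid

# Sender side: GPU buffers kept alive until the receiver reports completion.
_pinned = {}

def _fetch_over_side_channel(host, token):
    """Stub for the real transport (GPU peer-to-peer, NVLink, RDMA, ...)."""
    raise NotImplementedError("deployment-specific side channel")

def _notify_sender_done(host, token):
    """Stub for the control message telling the sender it may release the data."""
    raise NotImplementedError("deployment-specific control message")

def serialize_gpu_array(arr, source, destination):
    token = uuid.uuid4().hex
    _pinned[token] = arr  # keep the GPU buffer alive on the sender
    # Only connection/address metadata crosses the normal comm.
    return {"token": token, "host": source,
            "shape": getattr(arr, "shape", None),
            "dtype": str(getattr(arr, "dtype", ""))}

def deserialize_gpu_array(header):
    # Pull the actual bytes through the side channel, then tell the sender
    # that the pinned buffer can be released.
    data = _fetch_over_side_channel(header["host"], header["token"])
    _notify_sender_done(header["host"], header["token"])
    return data
```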

jakirkham commented

Is this still relevant?
