Skip to content

Revisit serialization in IPython.parallel #1504

@minrk

Description

@minrk

erialization code used in IPython.parallel could use an audit and overhaul. It will be a while before I actually get to this, but I should make a note here to keep track of current thoughts.

The serialization code (principally in utils.pickleutil/newserialized) is largely inherited from the days of IPython.kernel, and could use some improvements. Specifically:

  • the newserialized machinery for zero-copy sends should be extensible, allowing third-parties to define their own non-copying behavior. Since this is currently implemented with a simple if/elif/elif-style switch statement, it can just as easily be done with a dict-based switch, which would trivially provide extensibilty.
  • the whole can/uncan/serialize logic is a bit convoluted. It might make better sense to conflate these two libraries. I'm not sure.
  • the serialize_object method used for serializing containers of objects (applied to args/kwargs and results when using apply) serializes objects separately, and then pickles the container itself. This is extraordinarily expensive for large containers, which technically only applies to results, but is still important.

There is also a data-overhead from using this inefficient double-serialization:

from IPython.parallel.util import serialize_object
import cPickle as pickle
obj = range(1000)
print '    pickle: %6i bytes' % len(pickle.dumps(obj, -1))
print 'serialized: %6i bytes' % len(serialize_object(obj)[0])
#--
    pickle:   2752 bytes
serialized:  34086 bytes

There is also finite overhead of doing zero-copy sends (construction of Python objects and async invocation of the GIL from the ØMQ io_thread). The Session object currently does zero-copy sends exclusively, but there is probably a finite threshold below which copying is actually cheaper. We need to do some rough profiling to figure out what this should be, but I expect it's somewhere above 1kB.

A simple test case on my laptop:

In [11]: import zmq
    ...: import numpy as np
    ...: 
    ...: ctx = zmq.Context()
    ...: s = ctx.socket(zmq.PUB)
    ...: s.bind_to_random_port('tcp://127.0.0.1')
    ...: 
    ...: 
    ...: for n in np.logspace(3,5,9).astype(int):
    ...:     msg = 'x'*n
    ...:     print n
    ...:     print 'copy', 
    ...:     %timeit s.send(msg)
    ...:     print 'zero-copy',
    ...:     %timeit s.send(msg, copy=False)
1000
copy 1000000 loops, best of 3: 1.5 us per loop
zero-copy 100000 loops, best of 3: 5.85 us per loop
1778
copy 1000000 loops, best of 3: 1.53 us per loop
zero-copy 100000 loops, best of 3: 5.74 us per loop
3162
copy 1000000 loops, best of 3: 1.63 us per loop
zero-copy 100000 loops, best of 3: 5.86 us per loop
5623
copy 1000000 loops, best of 3: 1.75 us per loop
zero-copy 100000 loops, best of 3: 5.93 us per loop
10000
copy 100000 loops, best of 3: 2.22 us per loop
zero-copy 100000 loops, best of 3: 6.13 us per loop
17782
copy 100000 loops, best of 3: 3.44 us per loop
zero-copy 100000 loops, best of 3: 5.78 us per loop
31622
copy 100000 loops, best of 3: 4.92 us per loop
zero-copy 100000 loops, best of 3: 5.77 us per loop
56234
copy 100000 loops, best of 3: 7.34 us per loop
zero-copy 100000 loops, best of 3: 5.78 us per loop
100000
copy 100000 loops, best of 3: 11.7 us per loop
zero-copy 100000 loops, best of 3: 5.77 us per loop

Indicates that the threshold should be somewhere around 50kB. Now, with the memory costs of performing this copy, as well as the hidden and unpredictable cost of asynchronous GIL calls from the IO thread when doing zero-copy make this much less clear than a simple crossover plot would suggest.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions