Implement acid.structlib #41

Open
dw opened this Issue Sep 7, 2013 · 0 comments

Projects

None yet

1 participant

@dw
Owner
dw commented Sep 7, 2013

This ticket is to implement a new Struct class that avoids the immediate decoding presently involved in doing reads, to allow dynamic generation of compact encodings in order to support meta.py, and to avoid memory copies (ticket #23). The goal is to allow large (thousand rows+) range scans to be practical on a per-request basis, by translating all key and value decoding essentials into pointer/freelist manipulations.

Although the class could exist as a separate package, it will be included in Acid for convenience' sake. The class is not intended to be consumed directly by the user (even though it could), but instead to be used as a backing encoding for meta.py.

The goal is to gut py-datakit's pb_encoding.py and rewrite it as a lazy-decoding dict-like object backed by a protocol buffer. The set of supported field types will be the same, except the time type will be modified to use the same offset-aware time mechanism as keylib. Depending on mode, the object will be backed either by a memory buffer, a dict, or combination of both.

Existing Thrift/Protocol Buffer/etc. solutions all fail in various ways: they do immediate decoding, they require a separate definition/protocol compiler step, they require memory copies, they do not support defining types at runtime, etc., etc.

The module will resemble:

class StructType:
    __init__(name)
    add_field(field_name, kind, id)

class Struct:
    __init__(struct_type)
    classmethod from_raw(struct_type, buf, source=None) # Ticket #23
    classmethod from_hex(key=None) # Ticket #49
    reset()
    to_raw()
    to_hex(key=None)
    __getitem__
    __setitem__
    __delitem__
    # [all the usual dict methods]

StructType will be a type descriptor object, and Struct will be the container object itself.

acid.encoders.RecordEncoder will grow a new prepare() method, which will receive a description of the type that will be encoded (the interface to this is not yet decided). When the meta.py metaclass runs, it passes a list of the type's fields to prepare().

prepare() looks up (KIND_STRUCT, "modname.typename", "field_name", field_type) tuples in the store's metadata, expecting the metadata record's value to be an integer ID to assign to the field. If no such ID exists, a new one is assigned using a Store.count() counter.

In this way prepare() produces a persistent mapping of (type, field, field_type) to integer ID, transparently providing a space-efficient encoding for any user Model class without extra effort.


Similar to Key, Struct internally looks something like:

enum StructFlags {
    // Struct is stored in a shared buffer.
    STRUCT_SHARED = 1,
    // Struct was stored in a shared buffer, but the buffer expired, so we copied
    // it to a new heap allocation.
    STRUCT_COPIED = 2,
    // Struct was created uniquely for this instance, buffer was included in
    // instance allocation during construction time.
    STRUCT_PRIVATE = 4
};

struct StructType {
    PyObject_VAR_HEAD
    // Strong reference to source object.
    PyObject *source;
    // Size is tracked in Py_SIZE(Key).
    enum StructFlags flags;
    // In all cases, points to data.
    uint8_t *p;
    // PyDict_Type instance, or NULL
    PyObject *dict;
};

When Struct() is invoked, a single allocation is made, with Py_SIZE(struct) set to 0, and p and dict set to NULL.

When Struct.from_raw() is invoked with no source=, a single allocation is made, with the source data copied to the end of the struct's PyObject. dict is set to NULL.

When Struct.from_raw() is invoked with source=, a single allocation is made, with source data pointing to the source object and using the protocol from ticket #23.

When Struct.__getitem__ is invoked:

  • If dict is not NULL, look for the item in the dict. If it is there, return it.
  • Lazily decode the requested element. If the element is an immutable type (string, integer, time, UUID, bool, ...), return it.
  • If the element is a mutable type (list, etc.), then if dict is NULL, initialize a new dict. Now store the element in the dict and finally return it.

When Struct.__setitem__ is invoked:

  • If dict is NULL, initialize it to an empty dict.
  • Insert the modified element into the dict.

When Struct.to_raw() is invoked:

  • For each element defined in the field type, if the element is present in dict, encode it, otherwise if the element is present in struct->p, then copy it.

When Struct.reset() is invoked:

  • Py_DECREF(dict) and set it to NULL, resetting the instance back to pre-modification state. This may be used later to automatically rollback model changes.

In this way Struct can represent multi-field mutations without needing to modify its source buffer, and can track changes made to mutable data structures. The dict need never be allocated unless a mutation is attempted, or a list or embedded struct is decoded.

A final part of the design should be a to_descriptor() function, that takes a list of StructTypes and returns either a textual description of the types in Protocol Buffers syntax, or as a descriptor.pb-format protocol buffer.

@dw dw pushed a commit that referenced this issue Oct 20, 2014
David Wilson Ticket #41: beginnings of structlib d2f2162
@dw dw added a commit that referenced this issue Oct 20, 2014
@dw Ticket #41: beginnings of structlib bbf15bf
@dw dw added a commit that referenced this issue Oct 20, 2014
@dw Ticket #41: implement field coders. 7528cdf
@dw dw added a commit that referenced this issue Oct 22, 2014
@dw Ticket #41: fix array handling 0144704
@dw dw added a commit that referenced this issue Oct 22, 2014
@dw Issue #41: replace StringIO with direct access
5x faster on CPython, PyPy for some toy examples.
8a1a56c
@dw dw added a commit that referenced this issue Oct 22, 2014
@dw Issue #41: remove unused skip(). 21af2a7
@dw dw added a commit that referenced this issue Oct 22, 2014
@dw Ticket #41: implement lazy decoding.
Per original plan, Struct now lazily decodes fields. On PyPy 2.4.0 x4,
instantiating a struct is almost free:

    1000000000 loops, best of 3: 0.00101 usec per loop

With a slightly more complex example:

    stype = s.StructType()
    stype.add_field('dave', 1, 'bool', False)
    stype.add_field('dave2', 2, 'varint', True)
    stype.add_field('host', 3, 'inet4', False)
    stype.add_field('host6', 4, 'inet6', False)
    stype.add_field('host6port', 5, 'inet6port', False)
    stype.add_field('text', 6, 'bytes', False)
    stype.add_field('flue', 7, 'bool', False)
    stype.add_field('name', 8, 'bytes', False)

    st = s.Struct(stype)
    st['dave'] = True
    st['dave2'] = [1, 2, 3, 4, 5]
    st['host'] = '255.0.255.0'
    st['text'] = open('/etc/passwd').read()
    st['flue'] = True
    st['name'] = 'david wilson'
    print [len(st.to_raw()), st.to_raw()]

Decoding the 12 byte name field, which is the 6th field in the 2kb
buffer and appearing last, requires:

    1000000 loops, best of 3: 0.469 usec per loop

Or about 2.1 million rows/sec, assuming we only wanted that field.
27ac236
@dw dw added a commit that referenced this issue Oct 22, 2014
@dw Ticket #41: implement lazy decoding.
Per original plan, Struct now lazily decodes fields. On PyPy 2.4.0 x64,
instantiating a struct is almost free:

    1000000000 loops, best of 3: 0.00101 usec per loop

With a slightly more complex example:

    stype = s.StructType()
    stype.add_field('dave', 1, 'bool', False)
    stype.add_field('dave2', 2, 'varint', True)
    stype.add_field('host', 3, 'inet4', False)
    stype.add_field('host6', 4, 'inet6', False)
    stype.add_field('host6port', 5, 'inet6port', False)
    stype.add_field('text', 6, 'bytes', False)
    stype.add_field('flue', 7, 'bool', False)
    stype.add_field('name', 8, 'bytes', False)

    st = s.Struct(stype)
    st['dave'] = True
    st['dave2'] = [1, 2, 3, 4, 5]
    st['host'] = '255.0.255.0'
    st['text'] = open('/etc/passwd').read()
    st['flue'] = True
    st['name'] = 'david wilson'
    print [len(st.to_raw()), st.to_raw()]

Decoding the 12 byte name field, which is the 6th field in the 2kb
buffer and appearing last, requires:

    1000000 loops, best of 3: 0.469 usec per loop

Or about 2.1 million rows/sec, assuming we only wanted that field.
68c5cbb
@dw dw added a commit that referenced this issue Oct 22, 2014
@dw Ticket #41: inline read_key() call for read_value().
1000000 loops, best of 3: 0.44 usec per loop
51b9c53
@dw dw added a commit that referenced this issue Oct 22, 2014
@dw Ticket #41: precalculate varint to avoid some shift/masking.
1000000 loops, best of 3: 0.435 usec per loop
18541fe
@dw dw added a commit that referenced this issue Oct 22, 2014
@dw Issue #41: repr/dict interface fixes bfc1081
@dw dw added a commit that referenced this issue Oct 23, 2014
@dw Ticket #41: specialize encode for PyPy
Before:
    100000 loops, best of 3: 3.88 usec per loop

After:
    100000 loops, best of 3: 2.27 usec per loop
cb2413c
@dw dw added a commit that referenced this issue Oct 23, 2014
@dw Ticket #41: more dict interface fixes. b473b88
@dw dw added a commit that referenced this issue Oct 23, 2014
@dw Ticket #41: unroll write_varint()
PyPy 2.4.0 before:
    1000000 loops, best of 3: 2.18 usec per loop

PyPy 2.4.0 after:
    1000000 loops, best of 3: 0.893 usec per loop
c6f13e5
@dw dw added a commit that referenced this issue Oct 23, 2014
@dw Ticket #41: use bytearray() instead of StringIO.
CPython 2.7, before:
    10 loops, best of 6: 38.8 msec per loop

After:
    10 loops, best of 6: 31.5 msec per loop
0b2318c
@dw dw added a commit that referenced this issue Oct 23, 2014
@dw Ticket #41: partially unroll read_varint()
PyPy 2.4.0, before:
    1000 loops, best of 3: 3 msec per loop

After:
    1000 loops, best of 3: 2.02 msec per loop
cfdd989
@dw dw added a commit that referenced this issue Oct 23, 2014
@dw Ticket #41: only cache mutable values in Struct dict.
PyPy 2.4.0, before:
    1000000 loops, best of 3: 0.209 usec per loop

After:
    10000000 loops, best of 3: 0.165 usec per loop
ee1a98c
@dw dw added a commit that referenced this issue Oct 24, 2014
@dw ticket #41: clean up map names slightly 4a6ea81
@dw dw added a commit that referenced this issue Oct 25, 2014
@dw Ticket #41: constant time skip function selection
Avoids up between 1 and 3 branches while skipping elements. Has no
effect on PyPy, at least for this microbenchmark.

CPython, before:
        100000 loops, best of 3: 6.48 usec per loop

After:
        100000 loops, best of 3: 5.83 usec per loop
e1d2c69
@dw dw added a commit that referenced this issue Oct 25, 2014
@dw ticket #41: remove method call & reorder branch..
..to avoid jump in the usual case. Otherwise costs a double hash lookup
for mutable elements.

CPython before:

    100000 loops, best of 10: 5.91 usec per loop

CPython after:

    100000 loops, best of 10: 5.71 usec per loop
8198388
@dw dw added a commit that referenced this issue Oct 25, 2014
@dw Issue #41: delete some unused constants. 0696134
@dw dw added a commit that referenced this issue Oct 25, 2014
@dw Ticket #41: remove unused import, inline read_key()
Since field.wire_key exists, can avoid read_key() use entirely for
DelimitedCoder.read_value().
75e0df7
@dw dw added a commit that referenced this issue Oct 25, 2014
@dw Ticket #41: fold branches together.
Nets a tiny improvement on CPython.
c42f030
@dw dw added a commit that referenced this issue Oct 26, 2014
@dw Use ticket #41: use array for sequence decode where possible
PyPy, before:
        100 loops, best of 6: 2.46 msec per loop

After:
        100000 loops, best of 6: 6.42 usec per loop
f6dc2d3
@dw dw added a commit that referenced this issue Oct 26, 2014
@dw Use ticket #41: use array for sequence decode where possible
PyPy, before:
        10000 loops, best of 6: 54.9 usec per loop

PyPy, after:
        100000 loops, best of 6: 7.17 usec per loop

CPython, before:
        100 loops, best of 6: 2.46 msec per loop

CPython, after:
        100000 loops, best of 6: 6.42 usec per loop
bbe3ee9
@dw dw added a commit that referenced this issue Oct 26, 2014
@dw Ticket #41: support StructField in add_field().
Fix various small bugs too.
f4b6f17
@dw dw added a commit that referenced this issue Oct 26, 2014
@dw Ticket #41: implement DelimitedSequence.
Given  {'x': [{'x': 1000}] * 10000}, PyPy, before:

    100 loops, best of 3: 7.94 msec per loop

PyPy, after:

    1000 loops, best of 3: 282 usec per loop

Cost of a delimited list is 3 allocations, could maybe improve on this.
bdfd805
@dw dw added a commit that referenced this issue Nov 2, 2014
@dw ticket #41: Precompute wire key serialization
Saves a little encode time.
d43ece6
@dw dw added a commit that referenced this issue Nov 2, 2014
@dw ticket #41: abandon dict interface.
This deletes over 100 lines of gunk that aren't needed for meta.py, and
pushes some more branches into the type system (type_check_sequence(),
caching mutable attributes).

Since we cache values directly in the Struct instance's dict, we
no longer need to allocate a cache dict. Additionally, it is now almost
possible to use any type as the Struct class, which would allow avoiding
another allocation per row while scanning a collection of meta.py
objects, if things are arranged carefully.
8697e8b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment