Shared memory interface #23
The goal is to exploit database engines like LMDB (via py-lmdb) by transparently extending their zero-copy behaviour up into user code. Since LMDB never copies during reads, range scans on cached datasets can be extremely fast; in the current version of Acid, however, most of this efficiency is lost to immediate key decoding, string slicing (partly due to #11 and #37), and immediate value decoding.
py-lmdb already supports exposing zero-copy reads via `buffer()` objects, but the resulting buffers must be managed carefully by the user: for example, they cannot outlive the transaction they were created in. Other databases like LevelDB also export `(ptr, length)` pairs that are guaranteed to survive at least until the next iterator mutation.

It's therefore desirable to imagine an interface where Acid could receive the database engine's buffer and safely use it as the backing store for its `Key` type exactly until the moment before it becomes unavailable, without needing careful management by the user. The plain buffer interface doesn't allow this: it only permits locking, which would prevent any change to the LMDB transaction or LevelDB iterator while any sink object exists. The plain interface also doesn't support any kind of notification, so at best a transaction would be forced to remain alive for as long as any `Key` instances existed.

The plan is to design a protocol for orthogonal Python objects to communicate with each other at the C level, so that they can share read-only buffers in a way that is transparent to the (Python) programmer. The programmer need not care where an object came from, or that within a transaction its storage has no independent existence, or that during transaction finalization its storage was copied to a private heap allocation (or, for LevelDB, that the copy happened because the user kept the object around during a subsequent iteration).
If implemented correctly, creating a large number of temporary `Key` instances (e.g. for a hash join, ticket #29) should involve 0-1 allocations per key (assuming a freelist exists) and zero copies. Combined with the planned shared-memory aware value type (#41), it should also allow large range scans with minimal overhead.

Implement `KEY_SHARED`/`KEY_COPIED` modes by teaching `ext/key.c` how to accept a source buffer object and a source object, and how to communicate with the source object to get a lease on the buffer; a rough sketch of the modes follows. Reuse the new interface for `ext/structlib.c` (ticket #41) when that module exists.
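A minimal sketch of what those modes could look like inside `ext/key.c`; every name here (`KeyMode`, `Key`, the field layout) is a placeholder of mine, not the actual Acid source:

```c
#include <Python.h>
#include <stdint.h>

/* Placeholder sketch: one way to record whether a Key's backing store is
 * leased from a source object or privately owned. */
typedef enum {
    KEY_PRIVATE,  /* data points at a heap allocation owned by the Key   */
    KEY_SHARED,   /* data points into a buffer leased from `source`      */
    KEY_COPIED    /* was KEY_SHARED until the source expired; now owned  */
} KeyMode;

typedef struct {
    PyObject_HEAD
    KeyMode mode;
    PyObject *source;     /* non-NULL only while mode == KEY_SHARED */
    uint8_t *data;        /* encoded key bytes                      */
    Py_ssize_t length;
} Key;
```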
The current idea is to extend `from_raw` into something like `KeyList.from_raw(prefix, buf, source=None)`. If `source` is present, `buf` must be a `buffer` instance. `key.c` will ask the source for a `__memsource__` class attribute; it must be present, and reference a `PyCapsule` containing a struct. The struct will look like:
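A minimal sketch of what the `__memsource__` struct could contain; the member names (`listen`, `cancel`) are assumptions:

```c
#include <Python.h>

/* Sketch of the capsule payload behind __memsource__: one static
 * instance per source type, holding the source's entry points. */
typedef struct {
    /* Register `sink` with `source` and grant it a lease on the buffer
     * it was constructed from, valid until the sink is notified. */
    int (*listen)(PyObject *source, PyObject *sink);
    /* Withdraw early, e.g. because the sink was deallocated or already
     * copied the data into a private allocation. */
    void (*cancel)(PyObject *source, PyObject *sink);
} MemSource;
```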
`listen()` will internally ask the KeyList for a `__memsink__` attribute, which must be present, and reference a `PyCapsule` containing a struct. The struct will look like:
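Again a sketch with assumed member names; `node_offset` is what lets the source find a `SinkList` node embedded directly in the sink PyObject:

```c
#include <Python.h>

/* Sketch of the capsule payload behind __memsink__: one static instance
 * per sink type. */
typedef struct {
    /* Byte offset of the sink's embedded SinkList node within its
     * PyObject, so the source needs no per-sink heap allocation. */
    Py_ssize_t node_offset;
    /* Called by the source just before the shared buffer disappears; the
     * sink must copy the data out or stop referencing it. */
    void (*invalidate)(PyObject *source, PyObject *sink);
} MemSink;
```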
`SinkList` is simply an intrusive list node (sketched below). The idea is that with 2 (potentially interned) hash lookups, the source object knows how to maintain a linked list of sink objects directly inside the PyObjects themselves, avoiding a separate heap allocation for a container plus one weakref instance per sink.
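Given that intent, `SinkList` is presumably no more than a pair of list pointers (again an assumption about the exact layout):

```c
#include <Python.h>

/* Presumed layout: an intrusive doubly-linked list node, embedded in
 * each sink PyObject at the offset advertised by its MemSink. */
typedef struct {
    PyObject *prev;
    PyObject *next;
} SinkList;
```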
Exposing the protocol directly to Python code doesn't seem very useful at all, so keeping everything buried as PyCapsules doesn't feel so bad.
When the source object (the database iterator or transaction) is about to abort or commit, it walks the list like:
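A sketch of what that walk could look like, assuming the `MemSink`/`SinkList` layouts above; the helper `get_memsink()` and the capsule name `"acid.MemSink"` are likewise inventions for illustration:

```c
#include <Python.h>

/* Fetch a sink's MemSink descriptor via its type's __memsink__ capsule.
 * Real code would cache this per-type rather than looking it up on every
 * expiry; the capsule name is invented for the sketch. */
static MemSink *
get_memsink(PyObject *sink)
{
    PyObject *cap = PyObject_GetAttrString(sink, "__memsink__");
    MemSink *ms = NULL;
    if (cap != NULL) {
        ms = (MemSink *) PyCapsule_GetPointer(cap, "acid.MemSink");
        Py_DECREF(cap);
    }
    return ms;
}

/* Invoked as the source aborts or commits: notify each sink that the
 * shared buffer is about to vanish, then unlink it. Error handling is
 * elided for brevity. */
static void
expire_sinks(PyObject *source, PyObject *head)
{
    PyObject *sink = head;
    while (sink != NULL) {
        MemSink *ms = get_memsink(sink);
        if (ms == NULL) {
            PyErr_Clear();              /* shouldn't happen in practice */
            break;
        }
        SinkList *node = (SinkList *) ((char *) sink + ms->node_offset);
        PyObject *next = node->next;
        ms->invalidate(source, sink);   /* sink copies out or goes dark */
        node->prev = NULL;              /* unlink from the dying list   */
        node->next = NULL;
        sink = next;
    }
}
```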
This approach is obviously horribly fragile, so I'm not ready to commit to it just yet. The major benefit is that sink objects need only include a header file in their source tree, while all the linked-list logic and suchlike stays confined to the source object (the py-lmdb `Transaction` in this case).
Other issues with the protocol above: it does not allow multiple sources to exist for a single sink (although that is not needed in my use case), and no verification occurs that `source` really provided `buf`. I'm not sure how useful such a verification would be.
Another issue is the potential for "jankiness" when the user builds up a huge list of shared-memory objects within a transaction, then commits or aborts it. In that case, expiry will immediately cause a large number of heap allocations and memory copies, potentially leading to a `MemoryError` or a crash far away from the code that caused it. It seems easy enough to avoid, though, and there doesn't seem to be any alternative design that prevents it.

An alternative approach would be to combine `__memsource__` and the `buffer` object into a single custom type, but then `PyArg_ParseTuple()` and friends would need to call `PyTypeObject.tp_as_buffer` and create a temporary buffer object when not using the interface. It's also repugnant on the basis that it introduces Yet Another buffer type, in addition to `buffer` and `memoryview`. And it doesn't change the fact that consumers must still implement the protocol.