Join GitHub today
Add support for ```mdb_set_compare``` (and maybe ```mdb_set_dupsort```) #79
I've been lurking around the documentation and it's seems like there is no support for
I don't know what is the interest of
def compare_keys(a, b): a = msgpack.loads(a) b = msgpack.loads(b) return cmp(a, b)
Except there is no
Here is the lookup fonction:
from collections import namedtuple cursor = index.cursor() # Entries have the following format: # # (attribute_name, value, identifier) -> '' # # Where identifier is the identifier of a document # with ``attribute_name`` set to ``value``. # Value is always the empty string, I did not find an interest on using it. KeyIndex = namedtuple('KeyIndex', ('attribute', 'value', 'identifier')) def lookup_documents_identifiers(attribute, value): # The identifier placeholder is set to the empty # string so that the cursor will be positioned at the first # key found for the combination of ``attribute`` # and ``value`` if any, because the empty strings is the # smallest string value and the index is sorted. lookup = KeyIndex(attribute, value, '') next = cursor.set_range(dumps(lookup)) while next: key, _ = next # value is a useless empty string key = KeyIndex(*loads(key)) # check that key is within bound of the lookup if (key.attribute == lookup.attribute and key.value == lookup.value): yield key.identifier next = cursor.next() else: # it can mean that: # # key.attribute != lookup.attribute # # which means there is no more document indexed # with this attribute, no need to iterate over more # keys # # or that: # # key.value != lookup.value # # They are some document that have a value for # ``lookup.attribute`` but ``key.value`` is not # what we look for and will never be anymore since # the index is ordered. # # In both case, there is no more matching documents. break
This has come up a bunch of times, and it's not clear either to me or upstream whether implementing it is worthwhile. Implementing it requires tracking the current comparator for the current database on the current thread, which requires either modifying LMDB's built-in comparators to accept a
Even if we could easily track the Python comparator function, performance is going to drop off the floor - creating a Python frame (and its associated heap churn, etc.) anywhere from 2-50 or more times per lookup just won't ever be fast, thus negating one of the principle benefits of LMDB.
One more problem is that changes to the comparator can produce database corruption (if the order changes beneath LMDB, it cannot reliably maintain the database), and there is no way for the database binding to prevent the user from shooting themselves in the foot from it.
Another problem is tracking exceptions that occur within the callback function. There is no sensible way to do this so that exceptions propagate without slowing down all
Finally, in most cases, and including some motivating examples from other projects (like SQLite4), use of a custom key comparator has been replaced instead with a custom key encoding. Such encodings can be created that sort in arbitrarily useful ways, all while relying on simple fast
An example key encoding for Python that behaves like this is the
There are unique problems that would be solved by introducing support for this, though. For example, the ability to maintain exact feature parity with an alternative C implementation. I'm definitely happy to think more about this, but we'd need to find a good approach to make it work, and probably I'd prefer to defer the final decision to upstream rather than introduce something slow, unreliable, or sucky. :)
Looking forward to your thoughts
I understand that it will be a performance sink, this doesn't mean users should be disallowed to do what they want/need.
Regarding the problem of errors raised in the comparator, I don't think it's a big issue, in the end you seldom write the comparator. That's said as a maintainer you might think differently.
I don't understand the following:
I was thinking about something like
I will probably use Acid for my project (if I come back to it).
The question is not about restricting the user, but about not making too many promises to the user, especially when the overhead or maintenance burden may be quite high :)
Please feel free to cutpaste code from Acid, but I would recommend against trying to use it, it is in a very early stage ;)
I think we should leave this ticket open at least for a couple more releases, and in the meantime I'll think about ways it could be implemented. Perhaps someone else will find an example of a critical use case.
Ok, excellent :) I think at this point, it is safe to say acid.keylib.* interface will remain stable. Basically all the remaining high level interfaces are very much still a work in progress, there are two very big rewrites that have not landed yet due to lack of time.