New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for mdb_set_compare
and mdb_set_dupsort
.
#12
Comments
mdb_set_compare
and mdb_set_duport
.mdb_set_compare
and mdb_set_dupsort
.
What does a comparison function do when assumptions about key structure are not true? A
|
Bringing up context values in the LMDB mailing list wasn't very productive. It seems unlikely for a context argument to be added. Other experimentation in #11 has led me to believe that simple integer based keys like "big-endian 64-bit unsigned integers" are unlikely to be fruitful in terms of performance gains. One option that does seem plausible (though a burden on the user) is to expose a fairly raw API that accepts a C function pointer as an argument. Such a function is actually impossible to call using a Go function because the
I have a branch investigating this, bmatsuo/compare-funcs. Using the above API it also explores the penalty involved in delegating a Go func to perform comparison. The experiment |
With the experience I have now I am going to close this issue. I will continue to investigate things in the avenue of my branch, bmatsuo/compare-funcs. Exposing the low-level API may be effective. But I will probably wait at least until when the cgo proposal from #10 takes effect (Go 1.6) and the cgo dynamic check has been implemented. |
@bmatsuo Execute me, |
I haven't revisited this since go1.6 has been released. I still plan on doing it. But I just haven't gotten around to it. But from the description of your problem I'm not sure if these APIs are what you need. Can you elaborate on your use case? The use case for the mdb_set_compare and mdb_set_dupsort functions are when you have special comparison requirements. For example, keys in the database are created by concatenating two strings of variable length. A special comparison function is required to sort keys of that nature "correctly". If you are trying to compare a value to the existing value under the same key before updating (e.g. so new data can be merged into the existing record) then these functions are not what you need. In that case you just issue a Txn.Get with the key, check the value, and then issue the Txn.Put if necessary to store the updated value. In such cases it is probably best to enable Let me know if I misunderstood something. |
Thank you @bmatsuo, I want to impl a simple queue, which is similar with kafka, but lightweight. The meta db used to record procuder and consumers Topic1 (db)
The data db used to record real data. data file1 (db)
The Consumer_%s is string,dataDBFileIndex is uint32, and offset is uint64 So my compare func looks like:
I'm not familiar with lmdb, without cmp code,can I impl that? |
Ok. I have a better idea now. And, Yes. It seems that you would need the functions discussed in this issue if your data needs to be ordered this way (by key size, then by key content). For your example above specifically, it is also problematic because it is clearly using the MDB_INTEGERKEY/MDB_INTEGERDUP flag(s). And those flags are not currently supported in lmdb-go (see #11). Though unless you have another reason to use those flags equivalent ordering is achieved by encoding unsigned integers as big-endian values using the "encoding/binary" package. I am going to reopen this issue to track support for mdb_set_compare and mdb_set_dupsort. If you look at my branch, bmatsuo/compare-funcs, and the example program I have that sets the comparison function. It should be possible for you to get started with custom comparison functions. https://github.com/bmatsuo/lmdb-go/blob/bmatsuo/compare-funcs/exp/cmd/lmdb_cmp/main.go Because I wrote the example program to gauge performance it allows the user to declare that comparison functions it uses be defined in C, in Go, or additionally in Go with some form of dynamic dispatch. |
@bmatsuo Get it! 👍 |
@zwb-ict looking again your tables above (Topic1 & data file 1). It seems like you could potentially get by splitting the Topic1 db into two dbs TopicMeta1, which contains string keys "Consumer_*" and "Producer", and TopicIndex1, which contains uint32 keys (dataDBFileIndex). If you changed your data model in that way (using 3 dbs per topic instead of two) it seems to me like you wouldn't need custom comparison functions at all. Would that work for you? I'm still interested in getting real support for these functions. But you are going to take a performance hit by using them. If it is possible to easily restructure your schema to avoid them I would highly recommend you do. edit: Below is an illustration TopicMeta1 (db)
TopicIndex1 (db)
data file1 (db)
|
@bmatsuo change schema seems to be a better idea, I'll change to that. Thank you! |
Define IntegerKey and IntegerDup flags
I think there is potential value here. But there needs to be some research and, in general, the C API needs to change for applications to take full advantage of these functions.
Background
The LMDB C API lets an application specify custom functions used to sort keys and values in a database. And this feature is not supported in lmdb-go despite potential benefits.
It is common for applications to store data in LMDB with fixed-width integer keys. Currently these keys must be encoded as big-endian byte slices and are compared as c-strings when scanned.
It may also be desirable to store data using exotic keys or exploit data structure when sorting duplicates. These cases are extreme but should be possible (and useful) with LMDB. A combination of the LMDB API and cgo prevent such things from being practical today. But were circumstances to change an lmdb-go API that could eventually support this functionality would be ideal.
Proposal
Define a new Cmp type that is an opaque handle for a comparison function. And, when arbitrary comparison functions can be provided, two functions for initializing and deinitializing a Cmp respectively will be be defined.
With the Cmp type two new transaction methods can be added for supplying alternate comparison functions.
To provide some utility before the LMDB C API changes several C-based sorting functions can be defined.
These Go constants will have corresponding C functions defined in the lmdbgo C library.
Examples
Examples of the various facets covered in this proposal are given below. In each case the comparison function is set immediately when the database handle is open, which itself immediately follows the call to Env.Open. This is the only calling pattern that will be supported.
Arbitrary Cmp Functions
In this example an arbitrary comparison function is used to create a Cmp value that is then used to set the key comparison function for the database.
Many to Many uint mapping
This example demonstrates a common real world case of database foreign keys. In this case 64-bit database IDs are mapped to one-another using a builtin comparison function to optimize seek performance.
Consequences
Existing programs will continue to function the same as they due. Their binary size may grow slightly.
Programs using builtin big-endian comparison functions will function the same semantically, but should have increased performance. Having a known length and encoding allows fast decoding to an integer format that can be quickly compared. Using a big-endian integer encoding is fortuitous because the Env.Copy* methods work as expected.
Risks
Big-endian integer encodings may see no performance increase when using builtin comparison functions.
The C API and cgo may never support hooking up arbitrary Go functions as comparison functions. Or the cgo bridge required may be too slow for practical use.
Alternatives
The
MDB_INTEGERKEY
andMDB_INTEGERDUP
DBI flags perform very similar tasks for integer valued keys/data. The integer flags may even outperform custom functions. However their introduction into the lmdb-go API has not been finalized. The flags also give up specificity about key encoding (big-/little-endian) for performance which will have benefits and drawbacks.Non-opaque values such as capnproto buffers cannot typically be used as keys in a database. Instead the values must either be stored in a separate database with opaque keys or appended to an opaque fixed-width prefix before stored as a key.
The text was updated successfully, but these errors were encountered: