
Custom encodings #149

Merged: 9 commits into Level:master, Aug 14, 2013

Conversation

dominictarr
Contributor

Adds support for custom encodings.

Just pass in an object with {encode: function, decode: function, buffer: boolean} instead of a string encoding type, for the keyEncoding, valueEncoding or encoding options.
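For illustration only (the database path and codec here are hypothetical, not part of this PR), a minimal sketch of what such an encoding object can look like and how it is passed in; this toy codec just mirrors the built-in 'json' encoding:

var levelup = require('levelup')

// a toy codec equivalent to the built-in 'json' encoding, shown only to illustrate the shape
var jsonCodec = {
  encode: function (obj) { return JSON.stringify(obj) },
  decode: function (str) { return JSON.parse(str) },
  buffer: false // false: levelup/leveldown can work with plain strings, no Buffer needed
}

var db = levelup('/tmp/custom-encoding-example', { valueEncoding: jsonCodec })

db.put('greeting', { hello: 'world' }, function (err) {
  if (err) throw err
  db.get('greeting', function (err, value) {
    console.log(value.hello) // 'world'
  })
})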

So, this could probably be cleaned up to use more abstraction, like var key = this.toKey(key_), but I did it this way because it was the simplest way to make the smallest change to the current code.

closes #51

@max-mapper

Speaking on behalf of the web browser lobby, I would like to ask whether this API would let me ensure that levelup/leveldown never try to create a Buffer. E.g. what are the semantics of the buffer: boolean option? Can there be a default in which leveldown returns the value from the db in whatever encoding it is stored in?

@dominictarr
Contributor Author

Yes! This is passed straight to leveldown, and leveldown returns whatever it wants; see the leveldown asBuffer option:

https://github.com/rvagg/node-leveldown#leveldowngetkey-options-callback

That should be flexible. We could use bops in the leveldown tests, so that a Buffer-ish object is acceptable.
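For reference, a rough sketch of the asBuffer option linked above (the database path and key are made up; error handling is minimal):

var leveldown = require('leveldown')
var db = leveldown('/tmp/asbuffer-example')

db.open(function (err) {
  if (err) throw err
  db.put('some-key', 'some-value', function (err) {
    if (err) throw err
    // asBuffer: false asks leveldown for a plain string instead of a Buffer,
    // which is what a browser-friendly backend would want in order to avoid Buffers entirely
    db.get('some-key', { asBuffer: false }, function (err, value) {
      console.log(typeof value) // 'string'
    })
  })
})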

@dominictarr
Contributor Author

@maxogden the reason for the option is to avoid a memcopy in leveldown, depending on whether the custom encoding wants to retrieve a buffer or a string (for example, JSON needs strings, msgpack needs buffers).
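To make the buffer flag concrete, a hedged sketch (msgpack-js is just one illustrative msgpack implementation; any module exposing encode() and decode() would do):

var msgpack = require('msgpack-js')

// msgpack serialises to binary, so this codec asks levelup to fetch raw Buffers;
// a JSON codec would instead set buffer: false and round-trip through strings
var msgpackEncoding = {
  encode: msgpack.encode,
  decode: msgpack.decode,
  buffer: true
}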

@max-mapper

@chrisdickinson have you benchmarked bops at all? There is a question of whether it is any slower than plain Buffers when running in node on top of Buffers.

@chrisdickinson

@maxogden I haven't, but I would expect to see a (very) small slowdown, as the node version of bops just delegates directly to the appropriate Buffer operations (so it's the overhead of an extra call that could potentially be inlined by the JIT).
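For context, a small sketch of the kind of bops call in question; on node each of these delegates to the equivalent Buffer operation, in the browser to typed arrays:

var bops = require('bops')

// convert a string to a binary container and back again
var buf = bops.from('hello world', 'utf8') // Buffer under node, Uint8Array in the browser
var str = bops.to(buf, 'utf8')             // back to a plain string
console.log(str) // 'hello world'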

@@ -389,6 +390,38 @@ buster.testCase('ReadStream', {
}.bind(this))
}

, 'test injectable encoding': function (done) {
Member

What is this testing other than the ability to pass in the encoding object? Can we do a utf8 readStream and collect the data and compare it to what it should look like in msgpack format?
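A rough sketch of the kind of round-trip check being asked for (standalone rather than the buster test that was added; msgpack-js is an illustrative codec choice, and it assumes the per-operation encoding override this PR allows):

var levelup = require('levelup')
var msgpack = require('msgpack-js')

var encoding = { encode: msgpack.encode, decode: msgpack.decode, buffer: true }
var db = levelup('/tmp/readstream-encoding-check', { valueEncoding: encoding })

db.put('key', { foo: 'bar' }, function (err) {
  if (err) throw err
  // stream the raw stored bytes back and compare them with a fresh msgpack encode
  db.createReadStream({ valueEncoding: 'binary' })
    .on('data', function (entry) {
      var expected = msgpack.encode({ foo: 'bar' })
      console.log(entry.value.toString('hex') === expected.toString('hex')) // true
    })
})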

@rvagg
Member

rvagg commented Jun 18, 2013

Yeah, I'm cool with this, I think. It gives us the flexibility to still pass alternative encodings to individual operations, so you could have crazy stuff like msgpack- and bytewise-encoded values/keys in the same store.
It needs some cleaning up, and some of the methods are getting pretty big now, so it'd be good to try to pull out common utility functions.
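A hedged sketch of the kind of helper being suggested (the names and the fallback chain are illustrative, not the refactor that eventually landed):

// built-in codecs, reduced to two entries for brevity
var builtinEncodings = {
  utf8: { encode: String, decode: String, buffer: false },
  json: { encode: JSON.stringify, decode: JSON.parse, buffer: false }
}

// resolve the key encoding for one operation, falling back to the db-wide default
function resolveKeyEncoding (options, defaults) {
  var enc = options.keyEncoding || options.encoding ||
            defaults.keyEncoding || defaults.encoding || 'utf8'
  // a string names a built-in codec; an object is used directly as a custom codec
  return typeof enc === 'string' ? builtinEncodings[enc] : enc
}

// usage inside put/get would then shrink to something like:
// var key = resolveKeyEncoding(options, this.options).encode(key_)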

@dominictarr
Contributor Author

Cool, agree on need to refactor.

@mcollina
Member

In node-level-ttl there is some need for an API for encoding/decoding keys, as it does not support keys that are not utf8. How about exposing the encoding/decoding functions in the API?

Level/level-ttl#3

(if you can figure out how to deal with that without some LevelUp support, I'll be very happy!)

@mikeal

mikeal commented Jul 7, 2013

@dominictarr just enlightened me to the fact that I could use this for validation, by just throwing if the key doesn't match certain parameters. +1
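For instance, a hypothetical key encoding along those lines (not something included in this PR):

// reject anything that is not a lowercase slug; the thrown error surfaces to the caller
var validatedKeys = {
  encode: function (key) {
    if (!/^[a-z0-9-]+$/.test(key)) throw new Error('invalid key: ' + key)
    return key
  },
  decode: function (key) { return key },
  buffer: false
}

// var db = levelup('/tmp/validated-db', { keyEncoding: validatedKeys })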

@rvagg
Member

rvagg commented Jul 17, 2013

Where are we at with this, @dominictarr? It would be a good one to get out soon. Do you need someone else to help get this finalised?

@dominictarr
Contributor Author

I want to put some time into this soon, and also into getting lt, gt, lte and gte into leveldown. Probably next week?

@rvagg
Member

rvagg commented Jul 17, 2013

Excellent, let me know if you need any help cleaning up the leveldown work; looking forward to that!

@dominictarr
Contributor Author

okay!

I've reorganised this to use some utility functions,
which didn't involve any changes to the public interface.
As mentioned before, there are some additional tests.

Also, https://github.com/eugeneware/byteup will break with this change, so @eugeneware should update it.
(Note: byteup adds another encoding to levelup, but not via the public API, so them's the breaks.)

Also, I added documentation to the readme.

@dominictarr
Contributor Author

okay, this is now passing the tests on master & good to merge!

@rvagg
Member

rvagg commented Aug 14, 2013

LATM ("Looks Awesome To Me")

If there are no objections by the time you're around next, go ahead and merge and I'll push it in a release. This is fantastic.

@eugeneware

@dominictarr no probs. I'll update it to work again. Can't wait for this sucker to land! :-)

@eugeneware

Can't find any easy way to update byteup. @dominictarr any ideas?

But that's fine. The new method is better. Byteup will be deprecated once this lands in any case. It was an interim hack.

Though it would be nice if you could register global types. There is a type field that gets output for debugging purposes. It would be nice if you could have a simple way to register a default encoding for a type string.

But it's not a big deal. Let's push this out :-)

dominictarr added a commit that referenced this pull request Aug 14, 2013
@dominictarr merged commit 79c974c into Level:master on Aug 14, 2013
@dominictarr
Contributor Author

merged.

@eugeneware to be honest, I'm a little confused about how byteup is used; I can see it's called by subindex.
I see that it adds a 'bytewise' encoding to levelup's utils, but I can't see it ever passing that as an encoding option to anything?

@eugeneware

@dominictarr presently you use byteup like so:

var bytewise = byteup(); 
// adds the type "bytewise" to the encodings list with the bytewise.encode / bytewise.decode fns

var db = levelup(dbPath, { keyEncoding: 'bytewise', valueEncoding: 'json' }); 
// the string 'bytewise' will map to the codec entry added.

Once again, not a big deal, but I'm checking whether there is some way with the new API to register an entry into the encoders/decoders data structure for a specific string 'type' (i.e. in the same way that 'utf8' and 'json' exist).

Hope I'm making sense. If not, no big dramas, but worth an ask.

@eugeneware

@dominictarr in regard to subindex: I started off using byteup, but I struggled to get sublevel to play well with Buffers as key encodings (like bytewise). I eventually just encoded the key as a hex string.

You're right, the byteup reference is not actually used in subindex any more.

@dominictarr
Contributor Author

Oh right, that makes sense. I would be opposed to registering encodings; better would be to just get modules like msgpack and bytewise to export the {encode, decode, buffer, type} pattern, and then you can just do

level(path, {keyEncoding: require('bytewise'), valueEncoding: require('msgpack')})

etc.
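To make that pattern concrete, a hedged sketch of what such a module might export (the wrapper module name is hypothetical; bytewise itself exports encode() and decode()):

// hypothetical wrapper module, e.g. bytewise-codec/index.js
var bytewise = require('bytewise')

module.exports = {
  encode: bytewise.encode,
  decode: bytewise.decode,
  buffer: true,    // bytewise produces binary keys, so levelup should hand over Buffers
  type: 'bytewise' // a name that shows up in debugging output
}

// which replaces byteup's registration step with a plain require:
// var db = levelup(dbPath, { keyEncoding: require('bytewise-codec'), valueEncoding: 'json' })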

I am planning on refactoring level-sublevel so that it has real support for binary encoded keys, bytewise, etc.

Although ranges are a higher priority for me. Once that is cleaned up, it will remove a lot of crud from level-sublevel.
