
support arbitrary encoding #51

Closed · Raynos opened this issue Jan 4, 2013 · 24 comments · Fixed by #149
Labels: enhancement (New feature or request)

@Raynos (Member) commented Jan 4, 2013

Let's say I want an encoding like this:

var db = levelup("uri", {
    toEncoding: function (raw) {
        return new Type(JSON.parse(raw))
    }
    , toBuffer: function (type) {
        delete type._id
        return JSON.stringify(type)
    }
})

function Type(data) { ... }

I basically want to wrap all objects coming out of the db in a certain encoding and clean up all objects I put into the db with a certain cleanup logic.
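
For concreteness, a minimal sketch of what such a wrapper type might look like (the fields on `Type` here are hypothetical, just to show the round trip):

// Hypothetical Type: wraps the decoded JSON and carries a transient
// _id that toBuffer strips again before the value is written back.
function Type(data) {
    this.name = data.name          // persisted field
    this._id = String(Date.now())  // transient, removed on write
}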

@dominictarr (Contributor)

I think there are better names than "toEncoding" and "toBuffer"

@Raynos (Member, Author) commented Jan 4, 2013

I would agree with that.

@ralphtheninja (Member)

How about

var db = levelup("uri", {
    transform: function (raw) {
        return new Type(JSON.parse(raw))
    }
})

@juliangruber (Member)

var db = levelup('uri', {
  encode : function (raw) { return JSON.stringify(raw) },
  decode : function (json) { return JSON.parse(json) }
})

@dominictarr (Contributor)

what about

var db = levelup('uri', {
  encoding : {
    encode : function () { /* ... */ },  // or name it `stringify`
    decode : function () { /* ... */ }   // or name it `parse`
  }
})

Pretty much every node module that does this will have either encode & decode, or stringify & parse, so you can also just do

var db = levelup('uri', {
  encoding : require('yamlish')
})

or JSON, or msg-pack or whatever.

That is how I think you should do it, if you were going to do this.
I think we are better off not putting little convenience functions into the core bindings.
Only put in the things that are strictly necessary; put the sugar in a wrapper, like connect vs. express.
... though if we were to apply this principle to the end, I think it would mean removing some current features.
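
For illustration, one way core could accept either naming convention (normalizeEncoding is a hypothetical helper, not actual levelup code):

// Hypothetical helper: accept either { encode, decode } or
// { stringify, parse } shaped codecs.
function normalizeEncoding(enc) {
  return {
    encode: enc.encode || enc.stringify,
    decode: enc.decode || enc.parse
  }
}

var json = normalizeEncoding(JSON)
json.encode({ a: 1 })   // '{"a":1}'
json.decode('{"a":1}')  // { a: 1 }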

@ralphtheninja (Member)

@dominictarr With that approach we could check whether we've been given a callback or not, and also support streams: just pipe the data through a stream provided by the user, and pipe it back out as a duplex if the data should continue somewhere else. The user could then build any kind of chain of streams.

@Raynos (Member, Author) commented Jan 4, 2013

@dominictarr the reason this needs to go into core is because there's a bunch of json / buffer encoding logic all over the place already.

I just want that logic to allow me to use custom encoding / decoding functions.

If we don't want this in core then we can remove all the encoding / decoding logic.

I do agree that encoding: "json" should go in favor of encoding: JSON
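
That is (assuming core accepts any object exposing stringify/parse):

var db = levelup('uri', { encoding: JSON })  // uses JSON.stringify / JSON.parse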

@juliangruber (Member)

Stating that you need to provide both methods, by wrapping them in a config option, is a good idea.

What I don't like is big configuration objects instead of methods and chaining, which are more beautiful and also semantically more correct.

@rvagg (Member) commented Jan 4, 2013

Something to keep in mind: the binding can handle either Strings or Buffers so that's fine, but coming back out it needs to know whether to put the contents into a String or a Buffer; putting everything into a Buffer is going to have a performance hit where Strings are all that's needed (the majority case) due to an extra copy. So there would need to be some way to signal to the binding what to return. See the asBuffer boolean option that's passed to the binding from get() and also keyAsBuffer and valueAsBuffer for readStream().

Also, my vote would be for adding this to the options object, and 'encode' and 'decode' make sense I guess.
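
For instance, the signalling might look like this at the binding level (a sketch of the asBuffer option rvagg mentions; the `binding` handle and exact plumbing are assumed):

// Ask the binding to materialize the value as a utf8 String rather
// than a Buffer, avoiding the extra copy described above.
binding.get(key, { asBuffer: false }, function (err, value) {
  // value arrives as a String here, not a Buffer
})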

@dominictarr (Contributor)

@ralphtheninja streams are the wrong idea here, because the leveldb api needs a single buffer/string.
If the api supported streams, then it would need to buffer them, which would just give users the wrong idea.

I think we should remove the json encoding. just support the same base64, utf-8, ascii encodings that node core has. Small core, big Userland. Note, it's beginning to become evident that node core is still too large.

Easier to nip it in the bud, than to prune it back when fully grown.

@juliangruber (Member)

The correct thing would be to use buffers as the only data type, but that's not an option since it slows things down, and I think that 95% of users only need strings.

So what about making strings the default type and having buffers as an option like

db.put('foo', new Buffer('bar'))
db.get('foo', { buffer : true })

@juliangruber (Member)

Also, do the encode/decode functions need to handle buffers, or should they just be skipped for them?

@rvagg (Member) commented Jan 5, 2013

Maybe if you supply an encoder and/or decoder then you can't supply the encoding option for any operation (you can currently supply it for any operation). Then, at the same time as you give the encoder and decoder, you can supply the 'asBuffer' (or just 'buffer') option; this then gets passed down to the binding so it knows when to provide a Buffer for all of its operations.

There's also the issue that you can adjust the encoding for keys too; probably a < 1% use-case, but it's still there as an option.

@dominictarr (Contributor)

The binding only needs the encoding when creating a string, or accepting a string, correct?

@rvagg (Member) commented Jan 5, 2013

Only when passing back into V8; otherwise it can detect whether it's a Buffer or not. When it comes out of LevelDB it's just an array of bytes, and the binding needs to decide whether it should be utf8-ified or stuck into a raw Buffer, so it needs that information from outside of the database.

@dominictarr (Contributor)

Right, but if the encoding is base64 or something, you turn it into a different string.

Also, can't you just make a Buffer that points to that particular memory range? Or are there memory management problems there?

@rvagg (Member) commented Jan 5, 2013

Making a new Buffer performs a memcpy() on the byte array. Then making a new String does the same thing (I believe). If it's not utf8 then it's even more involved than that.
Hence, we should stick to Buffer or String where possible and not convert between them if we can help it. The performance improvements when I switched to this were pretty impressive and it'd be best not to go backwards.

@dominictarr (Contributor)

hmm... so setting any value for encoding implies a String.
Else it's a Buffer.

... this is complicated by arbitrary encodings,
if you want to support an encoding function that may require a string, or may require a buffer...

@dominictarr (Contributor)

So, the core issue here is how to handle whether an encoding goes to a buffer or a string.
We last discussed this 3 months ago, but this stuff has changed now, right? leveldown handles strings too?

@eugeneware

The other issue I've found with custom encodings (see: http://github.com/eugeneware/byteup) is that the 'batch' event that gets emitted ignores the decoding, so you just get the raw buffer, which may or may not be what you want. While there is a performance cost to decoding the buffer, it was a little disconcerting to see the abstraction leak.
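
To illustrate the leak (a hypothetical handler; the 'batch' event receives the operations array):

// The ops carry raw encoded values here, not the decoded objects
// the valueEncoding would normally hand back.
db.on('batch', function (ops) {
  ops.forEach(function (op) {
    console.log(Buffer.isBuffer(op.value))  // true, even with a JSON encoding
  })
})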

@eugeneware

Another option for specifying custom encodings would be simply to overload the existing keyEncoding and valueEncoding options. If it's a string, use the built-in codecs; if it's an object of functions, use that:

var db = levelup('uri', {
  keyEncoding: {
    encode : function (raw) { return JSON.stringify(raw) },
    decode : function (json) { return JSON.parse(json) }
  },
  valueEncoding: {
    encode : function (raw) { return JSON.stringify(raw) },
    decode : function (json) { return JSON.parse(json) }
  }
})

Then you could have custom libraries that just export { encode: fn, decode: fn } pairs:

var mycodec = require('mycodec');
var db = levelup('uri', {
  keyEncoding: mycodec,
  valueEncoding: mycodec
})

@rvagg (Member) commented Jun 18, 2013

The 'batch' event issue probably should be fixed! The test suite is very light on tests for events (in fact... I think they are almost entirely untested).

Re your suggestion: yes, I think that's the way to go, except for the issue of the codec needing to tell LevelDOWN whether it wants to receive Buffers or Strings; it matters for performance that we get the right type from the db. So there needs to be a third param.

@dominictarr (Contributor)

Okay, what about:

var encoding = {
  stringify: function (obj) { /* ... */ },
  parse: function (str) { /* ... */ },
  buffer: true // or false
}

I like stringify/parse because then you can just do

keyEncoding: JSON

Of course, some things use encode/decode... and even pack/unpack... because people are like that.
I'd go for stringify/parse because it's the most familiar.

@dominictarr (Contributor)

hmm, except that it's already called a value encoding...
var encoding = {stringify: ...} is silly; var encoding = {encode, decode} is more consistent.
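
Putting the thread together, a custom codec in the shape the discussion converges on might look like this (a sketch only, with the buffer flag standing in for the third param rvagg asked for):

// A full custom codec: encode/decode plus a buffer flag telling the
// binding whether decode wants a raw Buffer or a utf8 String.
var codec = {
  encode: function (obj) { return JSON.stringify(obj) },
  decode: function (str) { return JSON.parse(str) },
  buffer: false // decode expects a String
}

var db = levelup('uri', {
  keyEncoding: codec,
  valueEncoding: codec
})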
