Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow extra values in get/put callbacks #255

Closed
wants to merge 9 commits into from
Closed

Conversation

nlf
Copy link

@nlf nlf commented Jun 30, 2014

This is mostly to facilitate RiakDOWN, since tracking vclocks manually is sometimes desirable.

This will allow the _put method in abstract-leveldown backends to pass a value out to the client, and the _get method to pass extra metadata as well.

This is mostly to facilitate RiakDOWN, since tracking vclocks manually is sometimes desirable. This will allow the _put method in abstract-leveldown backends to pass a value out to the client, and the _get method to pass extra metadata as well.
nlf added a commit to nlf/riakdown that referenced this pull request Jun 30, 2014
@juliangruber
Copy link
Member

am I right that this is about forwarding data you provide to level's calls?

@nlf
Copy link
Author

nlf commented Jun 30, 2014

Right, exactly. It's to let the leveldown interface pass extra data back out to the client.

@juliangruber
Copy link
Member

This belongs in a level-* module that wraps db.{put,get,batch,create{Read,Key,Value}Stream} and adds the extra argument, this functionality is too specific to make sense for core.

@rvagg
Copy link
Member

rvagg commented Jul 1, 2014

hang on, I'm interested in hearing the case here; @nlf can you clarify the use-case here and if this is difficult in an addon module explain why?

@nlf
Copy link
Author

nlf commented Jul 1, 2014

So my specific use case relates to the use of riak in an abstract-leveldown compatible backend riakdown.

When creating an object, riak associates a vclock with that object. Future changes to the object should include that vclock, to make sure that riak updates the existing object rather than creating a new one.

I'm using this extra parameter to allow the put and get commands to return the vclock, that way the user can store the vclock and pass it along with future commands to get better consistency.

I saw it being potentially useful for other backends as well, since it could pass through any metadata the *down implementer wanted.

@rvagg
Copy link
Member

rvagg commented Jul 1, 2014

While I don't really want to encourage adding things just in case other backends might need it, I'm not sure I see an easy path to doing this as an addon to levelup without reimplementing the whole this.db interaction architecture. @juliangruber is it really reasonable to suggest that this should be pushed to an addon?

@juliangruber
Copy link
Member

Aah I misread the code. I thought the extra argument wouldn't even make it to leveldown.

The general solution I guess would be for backends to return arbitrary arrays instead of just the value on a get and put, and arbitrary objects on iterators.

When we merge this, another backend might need to pass two additional arguments, or more. One could argue though that they could just make extra an object with everything stuffed in there.

I'm reopening this.

@juliangruber juliangruber reopened this Jul 1, 2014
@juliangruber
Copy link
Member

the extra argument in get and puts doesn't really bother me, but streams should only add the extra key when there's a value, otherwise this could be confusing.

@nlf
Copy link
Author

nlf commented Jul 1, 2014

I can fix it so extra is only added to streams when it exists

@dominictarr
Copy link
Contributor

My question is what do you get out of using level on top of riak? Riak may well be a fine database,
but it also because of it's distributed architecture it cannot support many of the features that a embedded database has (for examlpe, in order traversals, or real time events)

What is the situation that you want to use level on top of riak in?

@nlf
Copy link
Author

nlf commented Jul 1, 2014

Specifically, it was to support dulcimer

@dominictarr
Copy link
Contributor

Hmm, so how do you use the riak features you mentioned in dulcimer?
arn't you leaking your abstraction?

"when you only have a hammer..." you said it!

@nlf
Copy link
Author

nlf commented Jul 1, 2014

I believe the usage of them is still in a development branch, but I'd have to check with @fritzy to be sure.

The general idea is that riak specific features are leveraged directly though riakdown or via a level-style plugin on top of riakdown (since the underlying riak client is exposed).

Then the same features are implemented in level plugins on top of leveldb, which essentially allows us to swap between level and riak backends, and maintain the same set of features.

@dominictarr
Copy link
Contributor

Right, but none of the other leveldowns have the vector clocks from riak so that feature leakes through, or you just don't use it...

I guess if you use something like dulcimer you can know that you can scale up to riak later, but don't have to run riak yet, but you can never use the best features any particular database offers.

@nlf
Copy link
Author

nlf commented Jul 1, 2014

That's true. This isn't necessarily only for vector clocks though, it just happens to be the use case I needed it for. The idea is just to allow for the leveldown interface to send additional data, besides just the value, back out to the client.

@phated
Copy link

phated commented Jul 2, 2014

I definitely see the need to allow *down interfaces to send extra data back from the store. Why restrict the types of databases you can use because you don't want to allow extra metadata?

@dominictarr
Copy link
Contributor

leveldown is not a generic database abstraction, it's primarily leveldb, but we have polyfills for things that are sufficiently similar to leveldb, i.e. it needs range/cursors and batch inserts, at the very least, and even - better to be an embedded database - which makes the put/del/batch events possible. Riak does not support these things so it doesn't really fit into a leveldb shaped hole.

@rvagg
Copy link
Member

rvagg commented Jul 18, 2014

dude, we are programmers, this is what we do, find a beautiful abstraction and force everything through it, SQL is an example of this.

wrong shape

I'm actually more interested in the possibilities of modularity on top of a common abstraction, there's loads of interesting things to be done in the level* ecosystem with slightly crazy backends and I'm +1 on helping that happen as long as we don't ruin our beautiful abstraction!

@fritzy
Copy link

fritzy commented Jul 24, 2014

I have two modules: level-dulcimer and riak-dulcimer which provides mixins that I need for each type of connection.

@fritzy
Copy link

fritzy commented Jul 24, 2014

I'm all for feature detection and early warning by failing early, but I haven't yet taken the time to help solve that problem.

@rvagg
Copy link
Member

rvagg commented Jul 31, 2014

This has generated a lot of discussion but I don't see much of a consensus here.

Can we please hear from any core collaborators who are still -1 on this and a succinct summary of why you don't want to see this merged:

@juliangruber
@substack
@maxogden
@pgte
@mcollina
@kesla
@No9
@hij1nx
@chesles
@dominictarr
@ralphtheninja
@Raynos
@nolanlawson

@Raynos
Copy link
Member

Raynos commented Jul 31, 2014

@rvagg thanks for following through and taking charge :)

@ralphtheninja
Copy link
Member

+1. I like rock n roll.

@juliangruber
Copy link
Member

👎, at least not in this shape, because:

  • createReadStream would emit { key, value, extra }, but neither would create{Key,Value}Stream() emit the extra arg nor is there a createExtraStream(). We would be losing symmetry. Either add createExtraStream() or drop create{Key,Value}Stream() and add { extra: true/false } as an option to createReadStream()

I would be 👍 if we can fit this in nicely though, passing around meta data sounds like a good feature.

@rvagg
Copy link
Member

rvagg commented Aug 2, 2014

@juliangruber is the asymmetry the main problem for you? Can you help us care about this because I'm struggling.

Does the additional value in { key, value, extra } make any practical difference for existing code? It's happily quarantined so that only data stores that support it actually make it on to the object.

@juliangruber
Copy link
Member

It's just odd having create{Key,Value}Stream as convenience methods for { values: false } and { keys: false }, but not createExtraStream.

This is also possibly a breaking change, as createReadStream({ values: false }) would originally only give you keys, now you get keys and extra values. Imagine

db.createReadStream({ values: false })
  .pipe(JSONStream.stringify())
  .pipe( whatever )

There goes the optimization too I guess...

@fritzy
Copy link

fritzy commented Aug 3, 2014

We're all for having extra be false by default, and adding the other convenience methods.

values: false would still result in a stream of key strings this way.

We'll update the pull request on Monday.

@dominictarr
Copy link
Contributor

My objection is really that this changes the meaning of levelup from a generalization of embedded databases,
to a database abstraction layer for any database. I don't think that is a good thing because levelup already has features that won't work if you just plonk any database in as the leveldown, everything only works if you use an embedded database.

I think dulcimer would work better if they forked levelup and deleted all the features that riak could not support, this would be much more clear and consistent. And it would be an abstraction of pretty much any database. You could still use an embedded database inside this, you couldn't do everything you can do with levelup, but it would work fine.

@juliangruber
Copy link
Member

👍 what dominic said

@rvagg
Copy link
Member

rvagg commented Aug 3, 2014

For me, one of the problems this PR highlights yet again is our inability to come up with a plugin or extension mechanism that exposes enough of the guts of levelup such that it can be extended flexibly. I'd love to be able to say that this should be experimented with in an extension to levelup before we consider it for core but that's just impossible without forking it. My experiments with externr (and whatever issue/PR the code for that is hidden in) were all about trying to expose extension points deep inside levelup so you could intercept calls as they pass through rather than being left captive to hidden closures or scoped functions that you have no way to intercept. I'd really like us to get back to exploring the possibilities here more.

Having said that, I still don't see the problem with exposing an additional argument that can be used to carry metadata. This isn't just a Riak thing, it's a common database pattern and having a floating argument that is optional and may or may not be used downstream allows for some interesting experiments that are currently not possible.

I also find it hard to get my OCD churned up over the lack of symmetry over createXStream here either.

@dominictarr
Copy link
Contributor

So @rvagg what I'm reading from for comment is an appeal to inclusiveness - you don't want to turn the these people away, just because they are doing things differently? Then the other side of this discussion is could be described as a line in the sand argument.

Inclusiveness is certainly a noble aim - and there is still room for improvement in levelup - I would have said the improvement-room is mainly polishing and tidying - but maybe these are the same thing?
recently we merged a pr to refactor out encodings into their own file, and also, readstream creation is now more decoupled from the LevelUP object - in fact, the new sublevel uses levelup/lib/encodings and levelup/lib/read-stream without actually wrapping LevelUp - this was necessary because I needed to handle prefixes in the key encoding. Maybe the way forward is to decouple the various levelup parts so that it's more easy to build your own levelup?

@fritzy
Copy link

fritzy commented Aug 3, 2014

I have a few counter points.

  1. There are other use cases. The fact is that dulcimer is just the first use case to bring it up. A leveldown fork may want to include whether the read was from LRU cache or not. A levelup mixin may want to include indexes and link-walking data for that key upon get/read. Another backend fork might generate a continuation token to preserve the snapshot diff (like rocksdb) on disk for later use when "limit" is used.
  2. Any external database that supports consistency could have atomic batching. To say that the feature-set is only covered by embedded databases is false.
  3. "Turning us away" would just be turning us away on this feature. Unless you add a clause to your license, you don't get to police the way other people use the library. "If we don't approve your use case, you need to fork the project." Since this isn't the only use case for this feature, that seems rather silly. Why do you feel the need to police the use of this library?

The use case is very tangential to the feature request. Using this as a way of preventing us from using the library hurts the project.


You don't like Riak, you don't like the idea of using levelup as an abstraction point for keystores, and so you don't like Dulcimer. I'm trying to make it easy for developers to develop against embedded databases, and potentially change to other databases later. I'm trying to service developers using key stores in general, including leveldb specifically.

@sockdrawermoney
Copy link

Since dulcimer was originally the indirect result of me requesting a couple features from Fritzy to help build a couple small level apps I was making, it feels very odd to see dulcimer as controversial in this community.

I can certainly understand the biases people have against specific technologies. We are fairly opinionated crew and tend strongly toward the same general philosophies—strongly preferring to use and contribute only to open tools.

Loosely coupled components made libuv possible and this sort of thing (while much less dramatic) feels in line with those ideals and possibilities.

In this discussion, I see no real complaints of causing harm to level, except for philosophical ones.

But we're on your "side" here, guys. We want level to be used by more people. And we absolutely want to use this for good, not evil.

It's not like there's some crazy Corporate Riak Overlord Shadow Government Conspiracy here. Hell, we don't currently have a use case that even makes &yet money.

One of our primary use cases is building a completely open and free federated messaging and WebRTC service.

We want to be able to use Level and Riak and possibly other keystores interchangeably based on the right fit for the specific needs at hand.

Why? Because it's fun and interesting. Maybe we'll find out it was a terrible idea. Maybe we'll find out it was a good idea.

Aol keyword hashtag: #science

@dominictarr
Copy link
Contributor

The argument about whether or not this feature has other uses would carry a lot more weight for me if there was another actual implementation that wanted this. currently, this is being pitched as "possible" uses.

I don't mean to exclude you by saying you should fork, you are still part of the level community even if you build your own levelup. In many ways, a module is better than a pull request. In particular, you do not have to ask permission to make a module. I realize now that some of the language I used maybe had exclusionary connotations - It's difficult to disagree with someone while showing respect - especially over the internet, but I'm doing my best. I believe that disagreements not usually because someone is wrong but rather because people are coming from different contexts. It's often difficult to state your context, because it always surrounds you like the water the fish swims in. I guess a significant part of my context is that most of the modules I have written for level-* (which is quite a lot) would not work correctly with riak or another remote backend. Merging this patch would implicitly say it's alright to use a remote back end with level, and I'd have to add disclaimers to all of my modules that they now only work with level in embedded mode.

My favorite thing about embedded databases, is that since you control everything that goes in and comes out then you can do lots of cool stuff with realtime events. levelup has events that trigger after a put, del, and batch, and so you can use this to get a realtime feed of everything that has gone into the database. This does not work with a remote backend, unless you pretend it's embedded, and run everything through one levelup instance, but that defeats the purpose of using a massively scalable database service like riak.

One technique for finding agreement is limiting the scope of the agreement. I certainly agree that there is a subset of the level api that is also in riak. None of us are making much progress persuading each other that our position is correct but is there another approach we can take where the scope of agreement is smaller?

@dominictarr
Copy link
Contributor

by the way, I'd just like to point out that every change to level gets a significant amount of discussion,
this discussion is probably above average in length, but it's not an order of magnitude.

@Raynos
Copy link
Member

Raynos commented Aug 3, 2014

Merging this patch would implicitly say it's alright to use a remote back end with level

This is an interesting point. As far as I understand this patch it to support riakdown. We should not encourage implementing a *down that implements a subset of leveldown & levelup.

We should also not encourage extending the levelup interface to support features only needed backends that do not implement the entire leveldown interface or do not work cleanly with the interface levelup exposes.

If we can find a use case & implementation (as @dominictarr ) mentions for this feature based on a backend that supports all the leveldown / levelup concerns then that is reasonable to merge.

If you find the need to extend levelup to support a thing that's not an embedded leveldown it's probably a sign that we are applying the wrong abstraction too generically.

We must remember that a lot of the value is level-* modules and we shouldn't encourage using leveldown's that would break those modules because they implement a subset.

@rvagg
Copy link
Member

rvagg commented Aug 3, 2014

One of my hopes for extension of level* is more experimentation with extending back beyond the leveldown interface. For example, we're seeing great experimentation with replication happen at above levelup, but why can't those efforts also be happening below leveldown, where you present a leveldown interface to something much more complicated. Perhaps you wrap the existing leveldown with a layer that is able to replicate with remote leveldowns to make a cluster that handles all of the atomicity and other concerns in a transparent way. This would obviously cause compatibility problems with some of the assumptions that levelup extensions are making but we already have a soup of compatibility problems, and that's OK. Currently it's caveat emptor if you're buying into this ecosystem, you either follow patterns promoted by people in the know, or forge your own path to piece things together.

So far the only experimentation with leveldown extension is in providing backends that interface with "real" databases. This is a good start IMO but there's a ton of potential down there for some really crazy mad science.

There's probably a disconnect in the way I see level* and @dominictarr's concerns for restricting this to "embedded" databases. But I find it difficult to even restrict that definition, for instance lmdb can have multiple processes reading from the same store, does that break the "embedded" contract? And doesn't multilevel already break the contract?

Just to expand this discussion beyond Riak, I might ask @brett19 to come in and without asking that he read this whole thread perhaps he could share what kind of metadata that Couchbase would share if it were made into a leveldown interface and had the ability to add an extra argument to a get(key, function (err, value[, extra]) { ... }) call (the essence of this thread).

@brett19
Copy link

brett19 commented Aug 4, 2014

Hey Guys,
Just to chime in here, Couchbase currently exposes a number of different meta-data details at various levels within our library. At the lowest level we expose the value, a CAS value (which is similar to Riak's vclock) as well as a generic 32-bit integer which is for storing the data-type of the
stored object (think node::Buffer, UTF-8 Encoded JSON, UTF-8 String, etc...). At the highest level (api consumer level), we only expose the CAS value along with the value itself which has already been decoded from the raw binary data into the appropriate type. It's worth mentioning that there are some close similarities between the proposed API above, and the API for Couchbase's Node.js SDK, the primary difference that we expose these metadata parameters alongside the value's together as a single object, so you would see something like:

db.get('some-key', function(err, res) {
  // res.value - the decoded value
  // res.cas - the CAS value (vclock-like)
  // res.flags - Not often will this end up here, but its possible
});

Cheers, Brett

@fritzy
Copy link

fritzy commented Aug 4, 2014

I think the "embedded backends that support all level features only" ship has sailed with SQLDown & RedisDown. PouchDB Levels Up is an interesting article on the topic.

@dominictarr
Copy link
Contributor

@rvagg actually multilevel is a interesting case - it's actually fairly explicit about the features it cannot support as a remote client - https://github.com/juliangruber/multilevel#api
it is still possible to use level plugins with it,
but you generally must extend the backend via a level-* module (and generate a manifest)
and most modules will not work if you try to install them onto the client.
for example, level-peek is usable from the client, but level-live-stream must be installed on the server side. Also, events are not supported - because it would be too expensive to send all events.
and too magic/leaky to asyncly register listeners.

yes, some of the *downs do support multiple processes, but just because they exist doesn't mean they are widely used, and there havn't been any features merged into levelup to support those cases. Often they are just implemented as a exercise, or out of desperation - like to be used on top of websql - which is effectively embedded. levelup has always been about leveldb, if another database can act like a passable leveldb then they can get involved, but no one has ever seriously suggested exposing non leveldb features, until now.

@mcollina
Copy link
Member

mcollina commented Aug 4, 2014

I think it's really a bad idea to have an extra argument. The purpose of all the *down is to offer an abstract interface on top of several database. This is non-sense if you have to use specific feature of something that is abstracted away.
However, I see that there might be the need for some down specific code to happen, but RiakDown is accessible as a property, e.g. db.db.

@chesles
Copy link

chesles commented Aug 5, 2014

As with so many discussions with this group, this has been really interesting to follow, and I keep going back and forth on it. I'm still not sure what my opinion is, or whether I can really form an opinion since I am not actively developing anything using level* at the moment.

So for what it's worth, here are my thoughts: if this extra argument is used to support backend-specific functionality, as proposed here for Riak, it just means you probably can't swap out your backend as cleanly as you can now; you're basically hard-coded to that backend if you depend on those extra parameters.

Keeping that in mind, as long as you know what you are getting into when you start using the extra parameter, whether for the above-mentioned vaporware metadata extensions, or to support a specific backend like riak, and you are OK with the tradeoffs, I think it could be interesting to experiment with.

So... I say +1...I think!

@nolanlawson
Copy link
Contributor

I think it's really a bad idea to have an extra argument. The purpose of all the *down is to offer an abstract interface on top of several database. This is non-sense if you have to use specific feature of something that is abstracted away.

Is there a way to abstract the vclock feature itself, so that it could apply to more than just Riak? While avoiding the free-form extra parameter entirely?

@rvagg
Copy link
Member

rvagg commented Aug 6, 2014

Sadly I'm going to have to draw a line under this and close the PR. We have been unable to make meaningful progress towards a consensus and passionate opinions are still being held against this feature.

Of course this doesn't rule out further discussion in here (I'm particularly interested in @nolanlawson's last comment about vclock abstraction possibilities) but I'd like for us to try and focus our energies on turning #270 into a compromise position that we can all get behind.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet