Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exclusive vs inclusive start/end min/max #118

Closed
rvagg opened this issue Apr 14, 2013 · 29 comments
Closed

exclusive vs inclusive start/end min/max #118

rvagg opened this issue Apr 14, 2013 · 29 comments
Labels
discussion Discussion

Comments

@rvagg
Copy link
Member

rvagg commented Apr 14, 2013

Continued from #110 where the topic is slightly different.

Currently both 'start' and 'end' are both inclusive. Options are to:

  • leave as they are
  • make 'end' exclusive
  • add additional option(s) for exclusivity, perhaps 'startExclusive' & 'endExclusive'?

Keeping in mind that we may be ditching 'start' and 'end' in favour of 'min' and 'max' as per #110.

@rvagg
Copy link
Member Author

rvagg commented Apr 14, 2013

ftr, my thoughts on this:

  • while I like the idea of having 4 options to cover all cases ('min', 'minExclusive', 'max', 'maxExclusive', assuming we move to min/max, you could even introduce 2 more and have 'min' and 'max' be aliases), that's perhaps getting a bit out of hand; more to test, more to explain.
  • making 'max' exclusive makes sense in terms of the patterns we normally use when programming I'm concerned that it may make it difficult to fetch what you really need from a range. You generally want a complete range and you often don't know what the range+1 is which makes an exclusive end a bit tricky. Using '~' or '\xff' will most likely get you what you want you have to be absolutely sure that the last key doesn't use those characters. Although I accept that we already have roughly this problem anyway since '~' or '\xff' won't catch anything with characters following (but that could be fixed if we wanted it to, the concept of an "end" is a LevelDOWN thing, not a LevelDB thing, ref, the compare could be made to include following characters so '~' could also include '~foo').

I'd like to hear from others who are actually making heavy use of ranges in production to see if there are actual pain points here to be addressed?

@dominictarr
Copy link
Contributor

when you use some high character as the end point (like '\xff') you need to be sure that it is not actually a part of a key. At least, it's not a part of a key that you will want to retrive as part of a range.

I'm having a little difficulty understanding your second bullet point,
are you saying that what you often want to retrive is all keys that start with a given prefix?
That is generally what you are doing when you specify a range such as
{min: 'a', max: 'a\xff'}, except you are assuming that there is no key that starts with 'a\xff'...

@rvagg
Copy link
Member Author

rvagg commented Apr 14, 2013

##leveldb <rvagg> yeah, I'm being very hypothetical, you'd be silly to have \xff included in your actual keys but it's not out of the realms of possibility
##leveldb <rvagg> perhaps your keys come from user-generated data, perhaps by them prefixing something with \xff allows a key to be generated with that at the start of it

just thinking through edge-cases, don't want \xff to end up being our own version of an SQL-injection.

@dominictarr
Copy link
Contributor

very good point! currently, in level-sublevel, put('\xffKEY\xff', value)
would insert into a deeper sublevel...
which could have unexpected results - I'll add this as an issue in level-sublevel.

@dominictarr
Copy link
Contributor

Hmm, I've been thinking about this, and my feeling is that this would be cleanest to fix in leveldown.

It does make sense for an Iterator to have start and end, if you think about it as a cursor that moves
but - building that into leveldown will mean that other leveldown implementations will need to match the same api
{start:s, end:e}. Most candidates to be polyfilled to leveldown have something closer to the proposed {min, max} than '{start, end}`.

in sql-lite (so, websql - and on phonegap, etc)
http://sqlite.org/lang_expr.html#booleanexpr

indexeddb

https://developer.mozilla.org/en-US/docs/IndexedDB/IDBKeyRange

The only other database I know that uses an api like that is couchdb
http://wiki.apache.org/couchdb/HTTP_view_API#Querying_Options

which confuses people there two!

http://grokbase.com/p/couchdb/user/096jayt85v/startkey-endkey-and-descending-true

Having to reverse {start, end} when you enable reverse just exposes an implementation detail than can be hidden.

@rvagg
Copy link
Member Author

rvagg commented Apr 16, 2013

I'll have a look at those links but ultimately I might just have to trust you from an end-user perspective.

@ricardobeat
Copy link

+1 for exclusive end, and auto-reverse.

Took me a while to figure out why I was getting extraneous keys :/

@dominictarr
Copy link
Contributor

we need Level/leveldown#28 in to solve this properly, because it is necessary to check each key for whether it matches the comparator.

@juliangruber
Copy link
Member

What about after for exclusive start and before for exlusive end?

@heapwolf
Copy link
Contributor

@juliangruber +10. It's also my birthday today, so i can do that. ;asudhfalkhgfa;slfhl;aj;weir 🍰

@rvagg
Copy link
Member Author

rvagg commented May 28, 2013

yeah, perhaps, it certainly makes sense, but it might be kind of ugly?

db.createReadStream({ start: 'foo', end: 'foo\xff' })
db.createReadStream({ start: 'foo', before: 'foo\xff' })
db.createReadStream({ after: 'foo', end: 'foo\xff' })
db.createReadStream({ after: 'foo', before: 'foo\xff' })

Is that intuitive enough?

More 🍻 and 🍰 for you @hij1nx, hip hip 🎂

@rvagg
Copy link
Member Author

rvagg commented May 28, 2013

if we adopt min & max to "fix" the start & end issue (that they appear the wrong way around for most people for reversed streams), then this:

db.createReadStream({ min: 'foo', max: 'foo\xff' })
db.createReadStream({ min: 'foo', before: 'foo\xff' })
db.createReadStream({ after: 'foo', max: 'foo\xff' })
db.createReadStream({ after: 'foo', before: 'foo\xff' })

perhaps a bit more messy because min/max is a bit more formal, in linguistic terms, than before/after, there's an odd disparity there, or is that just me?

@juliangruber
Copy link
Member

Yeah I feel the same way. It's just that it's super clear what before and after mean.

I am for start/end, min/max, before/after, etc. always being both inclusive or exclusive, as they are word pairs and belong together. Otherwise that could be really confusing!

@luk-
Copy link

luk- commented May 28, 2013

I am for start/end, min/max, before/after, etc. always being both inclusive or exclusive, as they are word pairs and belong together.

It's slightly confusing but I think it makes the most sense.

@juliangruber
Copy link
Member

Just some ideas:

// inc / exc
db.createReadStream({ inc: 'foo', inc: 'bar' });
db.createReadStream({ exc: 'foo', exc: 'bar' });

// sugar -> probably a module
db.createReadStream('[foo, bar)', {});
db.createReadStream({ start: '[foo', end: 'bar)' }); // collisions?

// verbose
db.createReadStream({ startingAt: 'foo', endingBefore: 'bar' });
db.createReadStream({ startingAfter: 'foo', endingAt: 'bar' });

// ..or
db.createReadStream({ startAt: 'foo', endBefore: 'bar' });
db.createReadStream({ startAfter: 'foo', endAt: 'bar' });

// ..or
db.createReadStream({ start: 'foo', endBefore: 'bar' });
db.createReadStream({ startAfter: 'foo', end: 'bar' });

@dominictarr
Copy link
Contributor

what about:

greaterThan: exclusiveMin, lessThan: exclusiveMax, greaterThanOrEqual: inclusiveMin, lessThanOrEqual: inclusive: min

I think the meaning here is completely clear... and it could be aliased to something that is easier to type:

">": exclusiveMin, "<": exclusiveMax, ">=": inclusiveMin, "<=" inclusiveMin

or gt, lt, gte, lte ?

@ricardobeat
Copy link

Not as pretty as the others but I think gt, lt, gte, lte does the job well, with start(inc)/end(exc) kept as the most common use case.

@mcollina
Copy link
Member

@dominictarr 👍.

It's also easy to support ALL of them.

@rvagg
Copy link
Member Author

rvagg commented May 28, 2013

I'm concerned about API surface area because it makes it more difficult to extend levelup, so lets not go overboard with aliasing.

@dominictarr
Copy link
Contributor

@rvagg agree. naturally, we'd convert any aliases into an internal representation immediately.

I also have a funny feeling that min, max, minExclusive(boolean), maxExclusive(boolean) might be easier to work with programatically, but the important thing is that we have an api that is obvious and covers the right cases.

@rvagg what do you think about: lt, gt, lte, gte?

@rvagg
Copy link
Member Author

rvagg commented May 28, 2013

yep, lt, gt, lte, gte is a nice compromise actually, I keep on thinking minInclusive, maxInclusive, minExclusive, maxExclusive but it reminds me of too much Java code I've written.

I think this thread is waiting for someone to submit a PR so we can just get on with it.

@dominictarr
Copy link
Contributor

right, so what we need for that is to expose the comparator on leveldown.
although, I guess, that could be done with a function that just compared strings, and then change it to the leveldown comparator later...

Okay, I might tackle that this evening.

@dominictarr
Copy link
Contributor

okay, that didn't happen, but, soon.

@rvagg
Copy link
Member Author

rvagg commented May 28, 2013

rather than using the comparator it'd probably be best to support inclusive and exclusive start & end in leveldown; it's an open question as to whether we do the reverse options switch in there too or if that's just a levelup thing.

@dominictarr
Copy link
Contributor

well, what are the reasons that leveldown shouldn't just have lt, gt, lte, gte?

@rvagg
Copy link
Member Author

rvagg commented May 29, 2013

No good reason that I can think of except that somebody needs to do it, I'll put this on my todo list for when I find a spare moment, anyone else is welcome to have a crack at it though. Might be a good one for someone who's been a bit hesitant to get their hands dirty with V8/C++, start the job, submit a PR for feedback & suggestions and go from there.

@dominictarr
Copy link
Contributor

This is next item on my levelup to do list - my plan is to add this to leveldown in C++.
that is probably what is stalling this the most - C++ just drives me to the drink...

@dominictarr
Copy link
Contributor

okay so now that we actually have support for lt,gt,lte,gte and it works... and it took a long time to get it in, I now realize that having two possible lower bound / upper bound keys complicates code that deals with ranges. you have to always check if there is a lower bound, or the other lower bound.

it would have been simpler to have a (min and minEx where min the key, and minEx is a boolean that determins whether it is exclusive...

Another thing that makes ranges awkward is that bytewise encodes undefined, but json doesn't.
undefined is thus often used to mark "no value", and indeed range.gt will evaluate to undefined if there is no lt key set. But, you can also set lt to undefined, and that is a valid value. Thus you need to use hasOwnProperty to check if there is a value and you can't have a function that returns
the lower bound, because what does it return if there is no lower bound?

I guess this is partially javascript's fault for having two nul values and that neither of them work right.
anyway, that means we have to use them in a javascripty way, and that now means travelling back in time and making bytewise not encode undefined. prehaps false should have been the lowest, and true the highest?

@dominictarr
Copy link
Contributor

make that travel back in time and fix bytewise AND indexeddb.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion Discussion
Projects
None yet
Development

No branches or pull requests

7 participants