ZRANGEBYLEX -- get lexicographic ranges from sorted sets #324

Closed
antirez opened this Issue Feb 3, 2012 · 7 comments

Projects

None yet

6 participants

@antirez
Owner
antirez commented Feb 3, 2012
ZRANGEBYLEX key min max [WITHSCORES] [LIMIT ...] [NOCASE]

How it works... To start the most important thing:

> ZRANGEBYLEX zset (aaa (bbb
-ERR Not all the elements in the sorted set have the same score.

So this is the requisite of the command, and it can do a super fast
check about this indeed, just check the first and last element of the
skiplist, and check if the score is equal.
Now assuming they are equal, what the command does is trivial, it
returns all the elements in the specified range, unless LIMIT is
specified of course.

Now how ranges are specified? I'm cloning your range structure, as
'zrangelexspec', but form the user point of view this is how it works:

The first character of the min or max specify if it is an open, closed
interval, or a string infinity or a string minus infinity:

"(foo" -> "foo" open interval
"[foo" -> "foo" closed interval
"-" -> the smallest string possible
"+" -> the biggest string possible

So for instance ZRANGEBYLEX zset - + will just return everything, and so forth.

Of course:

> ZRANGEBYLEX zset foo bar
-ERR Syntax error

> ZRANGEBYLEX zset (foo +bar
-ERR Syntax error
@Seldaek
Seldaek commented Feb 3, 2012

I assume this works like SORT .. ALPHA regarding UTF-8 support?

@antirez
Owner
antirez commented Feb 3, 2012

Seldaek: yes, for now we'll not support other than ASCII comparison, this is very important until we find an appropriate solution, since for instance you want to be able to put IP address in binary form, for instance, and then get ranges, without any collation doing strange things.

However if you want a different collation, you can do it in this way, for every element instead of adding the element itself you add instead:

<filtered_element>:element

This way you can control the collation yourself.

@atwam
atwam commented Mar 1, 2012

Sounds like a very good idea, especially for storing timeseries, which should be a breeze with this
It would be even better to be able to use long instead of string for keys, for storage purposes, but then it probably wouldn't be exactly a zset anymore.

@albrow
albrow commented Dec 19, 2013

+1 This would be an incredibly helpful feature!

Anyone using sorted sets for indexing string values (myself included) has to resort to inefficient workarounds to do range queries. Right now I'm using something along the lines of what is described here. Basically you do a ZADD and ZRANK to determine what would be the rank of the start and end string and then use ZRANGE to get the values.

Aside from the performance drawback, it's error-prone since elements could have been added/removed from the set in between the time you did ZRANK and the time you did ZRANGE. Resorting to scripting or using WATCH could avoid that issue, but that's nowhere near as easy as a builtin solution. And for sets which are changed frequently WATCH wouldn't suffice at all.

Alternative solutions involve using raw byte values or an order-preserving hash function to calculate scores. These are limited to some number of characters (I think you can get a max of 10-11 characters using some clever tricks) because of the limited precision of scores. These also usually limit the character set you can use for your strings.

A solution along the lines of what you're describing would be awesome.

Is anyone currently working on this feature and can I help?

@pixelcort

@stephenalexbrowne could your technique be put into a lua script and run inside the server?

@albrow
albrow commented Dec 19, 2013

Yes, I haven't tried but it should be possible. However...

  1. It's harder to implement and requires learning/getting comfortable with Lua (I don't really know it that well)
  2. A builtin solution would be 2-4x faster because it could avoid the ZADD, ZRANK, and ZREM commands entirely. (and of course Lua is slower than C in general, but that effect is probably dwarfed by the extra commands).

Haven't done much scripting with Redis personally, so correct me if I'm wrong.

@mattsta
Contributor
mattsta commented May 30, 2014

Implemented in commit 3a6a1e4

@mattsta mattsta closed this May 30, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment