Skip to content
This repository

ZRANGEBYLEX -- get lexicographic ranges from sorted sets #324

Open
antirez opened this Issue · 6 comments

5 participants

Salvatore Sanfilippo Jordi Boggiano Atwam Alex Browne Cortland Klein
Salvatore Sanfilippo
Owner
ZRANGEBYLEX key min max [WITHSCORES] [LIMIT ...] [NOCASE]

How it works... To start the most important thing:

> ZRANGEBYLEX zset (aaa (bbb
-ERR Not all the elements in the sorted set have the same score.

So this is the requisite of the command, and it can do a super fast
check about this indeed, just check the first and last element of the
skiplist, and check if the score is equal.
Now assuming they are equal, what the command does is trivial, it
returns all the elements in the specified range, unless LIMIT is
specified of course.

Now how ranges are specified? I'm cloning your range structure, as
'zrangelexspec', but form the user point of view this is how it works:

The first character of the min or max specify if it is an open, closed
interval, or a string infinity or a string minus infinity:

"(foo" -> "foo" open interval
"[foo" -> "foo" closed interval
"-" -> the smallest string possible
"+" -> the biggest string possible

So for instance ZRANGEBYLEX zset - + will just return everything, and so forth.

Of course:

> ZRANGEBYLEX zset foo bar
-ERR Syntax error

> ZRANGEBYLEX zset (foo +bar
-ERR Syntax error
Jordi Boggiano

I assume this works like SORT .. ALPHA regarding UTF-8 support?

Salvatore Sanfilippo
Owner

Seldaek: yes, for now we'll not support other than ASCII comparison, this is very important until we find an appropriate solution, since for instance you want to be able to put IP address in binary form, for instance, and then get ranges, without any collation doing strange things.

However if you want a different collation, you can do it in this way, for every element instead of adding the element itself you add instead:

<filtered_element>:element

This way you can control the collation yourself.

Atwam

Sounds like a very good idea, especially for storing timeseries, which should be a breeze with this
It would be even better to be able to use long instead of string for keys, for storage purposes, but then it probably wouldn't be exactly a zset anymore.

Alex Browne

+1 This would be an incredibly helpful feature!

Anyone using sorted sets for indexing string values (myself included) has to resort to inefficient workarounds to do range queries. Right now I'm using something along the lines of what is described here. Basically you do a ZADD and ZRANK to determine what would be the rank of the start and end string and then use ZRANGE to get the values.

Aside from the performance drawback, it's error-prone since elements could have been added/removed from the set in between the time you did ZRANK and the time you did ZRANGE. Resorting to scripting or using WATCH could avoid that issue, but that's nowhere near as easy as a builtin solution. And for sets which are changed frequently WATCH wouldn't suffice at all.

Alternative solutions involve using raw byte values or an order-preserving hash function to calculate scores. These are limited to some number of characters (I think you can get a max of 10-11 characters using some clever tricks) because of the limited precision of scores. These also usually limit the character set you can use for your strings.

A solution along the lines of what you're describing would be awesome.

Is anyone currently working on this feature and can I help?

Cortland Klein

@stephenalexbrowne could your technique be put into a lua script and run inside the server?

Alex Browne

Yes, I haven't tried but it should be possible. However...

  1. It's harder to implement and requires learning/getting comfortable with Lua (I don't really know it that well)
  2. A builtin solution would be 2-4x faster because it could avoid the ZADD, ZRANK, and ZREM commands entirely. (and of course Lua is slower than C in general, but that effect is probably dwarfed by the extra commands).

Haven't done much scripting with Redis personally, so correct me if I'm wrong.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.