Enhanced key-value store using the trie data structure #717

Closed
wants to merge 10 commits

Conversation

@aluzzardi

Hey,

These commits add the trie data structure to Redis and provide a new data type that makes use of it.

The main advantages over hash tables are:

  • Memory efficient. No node in the tree stores the key associated with that
    node; instead, its position in the tree defines the key with which it's
    associated. This results in memory savings, as common key prefixes are
    stored only once (see the node layout sketched below).
  • No collisions. One of the biggest problems with hash tables is collisions:
    they yield a worst-case performance of O(N), require constant re-hashing,
    and can be attack vectors, as seen here:
    #663
  • Prefix search. Traversing a trie from a given prefix doesn't require walking
    through every node, which opens up a lot of new possibilities.
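
To make the memory argument concrete, here is a minimal sketch of a compact, double-chained (left-child, right-sibling) trie node; the layout and field names are hypothetical illustrations, not the PR's actual code:

/* Hypothetical compact trie node (left-child, right-sibling).
 * No node stores its full key: the byte `ch` plus the path from the
 * root implies the key, so the shared prefix "he" of "hello"/"hey"
 * exists only once. Roughly 32 bytes per node on a 64-bit system. */
typedef struct trieNode {
    char ch;                   /* key byte this node represents */
    void *value;               /* non-NULL if a key terminates here */
    struct trieNode *children; /* first child: next key byte */
    struct trieNode *next;     /* next sibling at the same level */
} trieNode;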

In addition to the data structure, I've implemented the TRIE data type which
makes use of it.

The TRIE data type is 100% compatible with the HASH type and supports
all of its commands (HGET becomes TGET, HSET becomes TSET and so on).

In fact, the trie unit tests were simply copied from the hash table tests
and slightly modified.

Additionally, tries support prefix-based traversal for TKEYS, TVALS and
TGETALL with the same O(k) complexity.

Example:

> tset trie hello xxx
(integer) 1
> tset trie hey xxx
(integer) 1
> tset trie foobar xxx
(integer) 1
> tkeys trie
1) "foobar"
2) "hey"
3) "hello"
> tkeys trie he
1) "hey"
2) "hello"

This opens up many possibilities, such as using Redis to build an efficient
Auto Complete service.
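
As a sketch of that idea (hypothetical client code, not part of this PR), an autocomplete lookup built on the proposed TKEYS prefix form could look like this in C with hiredis:

#include <stdio.h>
#include <hiredis/hiredis.h>

int main(void) {
    redisContext *c = redisConnect("127.0.0.1", 6379);
    if (c == NULL || c->err) return 1;

    /* Every field of "trie" starting with "he" is a completion
     * candidate ("trie" and "he" are the example values above). */
    redisReply *r = redisCommand(c, "TKEYS %s %s", "trie", "he");
    if (r && r->type == REDIS_REPLY_ARRAY)
        for (size_t i = 0; i < r->elements; i++)
            printf("%s\n", r->element[i]->str);

    freeReplyObject(r);
    redisFree(c);
    return 0;
}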

Performance wise, this implementation delivers about the same performance as
hash tables.

I've run a couple of benchmarks on my laptop, so please don't consider those
as a reference:

./redis-benchmark -n 1000000 -r 1000000 -t HSET,HGET
HSET: 35617.61 requests per second
HGET: 35335.69 requests per second
used_memory_human:75.85M
used_memory_peak_human:80.69M

./redis-benchmark -n 1000000 -r 1000000 -t TSET,TGET
TSET: 36926.26 requests per second
TGET: 36587.15 requests per second
used_memory_human:42.31M
used_memory_peak_human:45.01M

While the performance is about the same, the trie saves roughly 44% of the memory (42.31M vs 75.85M).

Note that the redis-benchmark test is a little biased: in real-world use, the
memory usage will depend heavily on the key distribution and the values stored.
Because redis-benchmark uses the same key prefix for every key and only varies
the suffix, tries perform very well on it, which might not reflect real-world
usage patterns.

For now, tries live as a separate data type.

If the concept is accepted, 3 things could happen:

1/ Keep it as is - a new data type compatible with hashes that provides some
extra features.

2/ Make HT-based data types (HASH, SET, ...) use the trie data structure

3/ Whatever the outcome of 1 and 2, another possibility is to replace the
main object database with a trie. Not only could this lead to improved memory
efficiency, but the KEYS command could be special-cased to take advantage of
tries for prefix-based lookups (e.g. KEYS foo*).

Please keep in mind that I've just started wrapping my head around Redis
internals, so this should not be considered production ready.

Let me know what you think.

aluzzardi added some commits Oct 22, 2012

Trie data structure implementation
A trie is a fast, memory-efficient ordered tree data structure used to store
associative arrays.

Its lookup, insert, delete and replace complexity is O(k), where k is the
length of the key. Unlike hash tables, there are no collisions, meaning the
worst-case complexity is still O(k), and no re-hashing is required.

No node in the tree stores the key associated with that node.
Instead, its position in the tree defines the key with which it's associated.
This results in memory savings as common key prefixes are stored only once.
Implemented the TRIE data type using the trie library.
The TRIE data type is 100% compatible with the HASH type, and supports
all of its commands (HGET becomes TGET, HSET becomes TSET and so on).

Additionally, tries support prefix-based traversal for TKEYS, TVALS and
TGETALL with the same O(k) complexity.

Example:
> tset trie hello xxx
(integer) 1
> tset trie hey xxx
(integer) 1
> tset trie foobar xxx
(integer) 1
> tkeys trie
1) "foobar"
2) "hey"
3) "hello"
> tkeys trie he
1) "hey"
2) "hello"
Trie unit tests: Removed "small hash" tests.
Since the TRIE type does not switch between different underlying encodings
the way the HASH type does (ziplist/dict), there's no point in duplicating
those tests.
redis-benchmark: Added benchmarking support for HSET/HGET
Even though GET/SET already exercise a dict indirectly (the global object
dictionary is implemented with one), comparing HASH against other associative
arrays that way is not fair, because creating objects in the main object
database is more expensive, especially in terms of memory.
@samalba

samalba commented Oct 22, 2012

This looks great, I am looking forward to using it in the next stable release. Please merge :-)


@chriso

chriso commented Oct 22, 2012

I can't see this replacing the current hash implementation. It may be faster and more memory-efficient on synthetic benchmarks, but I think pointer overhead and excessive pointer indirection will kill performance compared to a hash when keys do not share a common prefix.

The iteration through node->children is O(N), so certain lookups require more comparisons than others. This is an entirely new attack vector to consider. Look at the case where my keys are ['aaa', 'aab', ..., 'zzz']: there will be 17,576 structures at 32 bytes each (on a 64-bit system), which is 549 KB. Assuming keys are loaded sequentially, a TGET zzz requires a) 26+26+26 pointer lookups, b) 26+26+26 key comparisons, and c) 2.4 KB of trie nodes to pass through the various levels of CPU cache before each comparison can be made. Compare this to a hash(zzz) operation: an O(1) lookup into the bucket array, and at most a handful of linked-list nodes to resolve a collision.

Have you considered a radix tree or burst trie? Both should have better performance characteristics and space requirements.


@jpetazzo

jpetazzo commented Oct 22, 2012

For some specific workloads, this could be a big win. E.g. (shameless plug) Hipache is an HTTP proxy which stores its configuration in Redis. Each virtual host is materialized by a key, and many virtual hosts have common prefixes (e.g. www). Also, it looks like many people use prefixes in their Redis keys as namespaces (e.g. "session:xxx", "user:yyy"...)


@aluzzardi

aluzzardi commented Oct 22, 2012

@chriso Agreed. There is a worst-case memory overhead when keys do not share a common prefix.

Hash tables have their own overhead as well: empty buckets (tables are sized in powers of 2), multiple tables in use while re-hashing, colliding keys stored in linked lists...

However, this is a worst-case scenario. I would assume the average overhead of tries will be much lower than that of hash tables, unless most keys share no common part.

Before picking tries, I considered ternary search trees and radix trees.

In the end, it's all about trading performance for space efficiency.

TSTs are very popular and have better lookup performance than tries. However, they require an extra pointer, which leads to higher overhead. I actually started by implementing both TSTs and tries to compare how they performed. The performance gains of TSTs were not noticeable (considering everything Redis does besides the actual TST lookup), but the overhead was much higher.

Radix trees work especially well on small data sets.
The value of radix trees depends on how many nodes are single-descendant (i.e. how many could be compacted together). They are more complex than tries and thus likely to yield higher access costs on average. This is especially true when inserting and removing nodes, as that requires splitting or merging nodes.

I ended up picking compact tries (double-chained) because they seemed to offer a good performance/space ratio.


@chriso

chriso commented Oct 22, 2012

@aluzzardi a hash function and O(1) lookup (+ potential LL-based collision resolution) should almost always be faster than a lookup in this trie structure. Even when keys share a common prefix [ user:1, user:2, ..., user:9999 ], an O(N) traversal of the linked list is still required at each node once you get past the common prefix.

Have you also considered the fact that hash tables use the ziplist encoding initially? The entire structure requires only a single allocation which minimises fragmentation and gives you a nice speed boost because of cache locality.


@aluzzardi

aluzzardi commented Oct 22, 2012

@chriso Even once you get past the common prefix, traversing is an O(K) operation (where K is the length of the key), not an O(N) operation.

Indeed, the best-case performance of a hash table is O(1) if there are no collisions, but first you have to compute the hash of the key, which is also an O(K) operation. Also, the hash table will have to constantly rehash the keys as it grows.

I'm sorry, but I didn't get the 1000 keys part. Past the common prefix, those 1000 keys will be stored in the tree as well. Assuming those keys are strings (which they are), there cannot be more than 256 keys at the same level, so those 1000 keys have to share some other common prefix.

Because the boundaries of traversal at each level are well known (a maximum of 256 lookups + comparisons), we can say the complexity of descending from one level to the next is O(1) (it cannot be worse than that), so all that's really left is how many levels we have to descend, which is the length of the key, hence O(K).

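To illustrate the O(K) bound being described, here is a sketch of the descent over the hypothetical node layout from the opening post (not the PR's actual code):

#include <stddef.h>

typedef struct trieNode {
    char ch;
    void *value;
    struct trieNode *children; /* first child */
    struct trieNode *next;     /* next sibling */
} trieNode;

/* Descend one level per key byte. Each level scans a sibling list
 * bounded by the alphabet size (at most 256 for byte strings), so
 * the per-level cost is constant and the total cost is O(K). */
void *trieFind(trieNode *root, const char *key, size_t len) {
    trieNode *node = root;
    for (size_t i = 0; i < len; i++) {
        trieNode *child = node->children;
        while (child && child->ch != key[i]) /* bounded sibling scan */
            child = child->next;
        if (child == NULL) return NULL;      /* key not present */
        node = child;
    }
    return node->value; /* NULL unless a key terminates here */
}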

@chriso

chriso commented Oct 22, 2012

@aluzzardi you're right regarding complexity; however, I'm still not convinced that O(k) plus the added pointer indirection and comparisons can compete with the hash on anything but synthetic benchmarks. Most hash use cases I've seen lean towards many small hashes as opposed to fewer large hashes; the ziplist encoding handles this very well and doesn't have the pointer overhead that the hash table or trie has. We'll have to wait for @antirez to analyse further.


@antirez

antirez (Owner) commented Oct 22, 2012

Thanks for the pull request and all the analysis, it is very interesting. Either way it's a great contribution, even if we don't merge it after further consideration: this was proposed a number of times, and we now have a real implementation to check how it performs. More news ASAP.


@aluzzardi

aluzzardi commented Oct 22, 2012

Thanks @antirez, looking forward to more news. Let me know if I can provide any further help.


@chriso

chriso commented Oct 22, 2012

@aluzzardi you should try a splay variant which moves the child node to the start of the node->children linked list when it's matched; this should reduce comparisons in real-world use cases where some keys are fetched more often than others.

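A sketch of that splay-like variant (hypothetical code, using the same node layout as the earlier sketches): after a hit, the matched child is unlinked and relinked at the head of its sibling list, so hot keys are found in fewer comparisons on later lookups.

typedef struct trieNode {
    char ch;
    void *value;
    struct trieNode *children; /* first child */
    struct trieNode *next;     /* next sibling */
} trieNode;

/* Find `ch` among node's children, moving the match to the front. */
trieNode *findChildMTF(trieNode *node, char ch) {
    trieNode *prev = NULL, *child = node->children;
    while (child && child->ch != ch) {
        prev = child;
        child = child->next;
    }
    if (child && prev) {              /* hit, not already at head */
        prev->next = child->next;     /* unlink */
        child->next = node->children; /* relink at head */
        node->children = child;
    }
    return child;
}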

@aluzzardi

aluzzardi commented Oct 22, 2012

@chriso That sounds great indeed, thanks!

The nice thing about this is that there's still a lot of room for improvement; my implementation only scratches the surface of what can be done.

My goal was to provide an initial implementation that would allow running analysis. If it proves to be a viable alternative to hash tables, there are lots of optimizations that can be made based on the results (trial and error with real-world data).

I didn't bother going too far: this is a very simple, easy-to-understand implementation that is compatible with existing hashes, enough for running analysis without overcomplicating things.

There are many more factors to consider. For instance, the difference in lookup time between a hash table and a trie can be negligible compared to the cost of transferring the result over the network.

A similar tradeoff was made recently when the hash tables switched from the DJB hash to MurmurHash2: slower, but better in some ways.

What I mean by that is that it's probably preferable to keep it simple at first and optimize later based on results, rather than starting with a complex solution. Bottlenecks might not be where we think they are.


@mickey

mickey commented on 5addc4f Oct 22, 2012

nice 👍


@catwell

catwell (Contributor) commented Oct 22, 2012

I am definitely going to be in favor of something like this. See this thread from 1.5 years ago ;)

Now there is another advantage that we didn't know of at the time, which is that AFAIK tree-based approaches are not vulnerable to the hash-table collision DoS attack.

Pointer indirection is a problem, but it would be significantly reduced by using a Patricia trie (aka radix tree) or a crit-bit tree, its binary version (see the Google Groups thread above).


@aluzzardi

aluzzardi commented Oct 22, 2012

@chriso

Re:

"The iteration through node->children is O(N) so certain lookups require more comparisons than others. This is an entirely new attack vector to consider. Look at the case when my keys are ['aaa', 'aab', ..., 'zzz' ] - there will 17,576 structures at 32 bytes each (on a 64-bit system) which is 549kb. Assuming keys are loaded sequentially, a TGET zzz requires a) 26+26+26 pointer lookups, b) 26+26+26 key comparisons, c) 2.4kb of trie nodes need to pass through the various levels of CPU cache before each comparison can be made. Compare this to a hash(zzz) operation, O(1) lookup into the bucket array, and at most a handful of linked-list nodes to resolve a collision."

Because a benchmark is worth a thousand words, I've put together a quick and dirty benchmark on your hypothetical case. The source code is available here: https://gist.github.com/9557d14ae83b939f1fd6

Here's the result after looking up every key a thousand times:

$ python test.py
1000 HGET performed in 44.4674818516 seconds
1000 TGET performed in 34.4841470718 seconds

The trie implementation is actually faster in this particular case than the hash table.

As for the memory usage, a hash table gives us:

used_memory_human:3.57M
used_memory_peak_human:4.34M

While the trie implementation:

used_memory_human:3.40M
used_memory_peak_human:4.17M

To sum up, in the particular case you mentioned, the trie performs better while consuming slightly less memory.

What about the other trie alternatives?

Let's compare the two other potential candidates, Ternary Search Trees and Radix Trees.

Ternary Search Trees

Ternary Search Trees would have performed slightly better. Basically, a TST is a Trie that uses a binary search tree instead of a linked list for node traversal.

The complexity of a trie boils down to O(K * M), where K is the key length and M is the key space (the number of nodes to scan at the same level), in our case 26.

A TST, on the other hand, would have a complexity of O(K * log(M)).

This saves a few pointer lookups, but because the boundaries of M are well known (at most 256), the performance improvement, while real, is not huge.

Assuming a worst-case scenario where the key space is 100 and the key length is 100, a trie would only need to traverse 10,000 pointers, which is not the end of the world. Again, we have to take this in context: besides traversing those pointers, Redis has to allocate memory, send and receive data over the network, and so forth.

On the other hand, a TST would require storing an extra pointer for every node (in our case, 17,576 extra pointers). It would have performed slightly better at the expense of using more memory.
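
For comparison, a hypothetical TST node: the binary-search step replaces the children/next pointer pair with three child pointers, which is where the extra per-node pointer (8 bytes on a 64-bit system) comes from:

typedef struct tstNode {
    char ch;
    void *value;
    struct tstNode *lo; /* subtree for bytes < ch at this level */
    struct tstNode *eq; /* subtree for the next key byte */
    struct tstNode *hi; /* subtree for bytes > ch at this level */
} tstNode;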

Radix Trees

As I mentioned earlier, the value of radix trees depends on how many nodes are single-descendant. In this particular case there are none, so it wouldn't have saved any memory.

Worse than that: because in a radix tree the key fragment is not embedded in the node but referenced by a pointer, and each node needs an extra size_t field to know its length (unless we assume that all keys are zero-terminated), it would have ended up using more memory.

As for performance, it would actually require dereferencing an additional pointer to access the key. In addition, while inserting keys, we need to dynamically split them into smaller fragments, which would have cost some performance, not to mention the memory fragmentation.
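
And a hypothetical radix (Patricia) node, showing where the extra memory and indirection described above come from:

#include <stddef.h>

typedef struct radixNode {
    char *label;                /* multi-byte edge fragment: one more
                                   pointer to dereference per level */
    size_t labelLen;            /* extra size_t per node */
    void *value;
    struct radixNode *children; /* first child */
    struct radixNode *next;     /* next sibling */
} radixNode;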

Tries

Again, it's all about trading performance for space. That's why I think a compact trie is the way to go, at least for now, without further real-world test cases, as it gives the best performance/space tradeoff.

Its implementation is simpler than that of the other tree data structures, which to me seems like the "Redis way" (keep it simple).
Actually, it's even simpler than the hash table implementation.

@chriso, @antirez: I feel like these conversations should happen on the Redis mailing list rather than on the GitHub issue. I will be more than happy to continue the conversation over there if you don't mind.


@chriso

chriso commented Oct 22, 2012

@aluzzardi the equivalent would be

keys = []
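# generate all 17,576 three-letter keys: 'aaa', 'aab', ..., 'zzz'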
for a in xrange(ord('a'), ord('z')+1):
    for b in xrange(ord('a'), ord('z')+1):
        for c in xrange(ord('a'), ord('z')+1):
            keys.append('{0}{1}{2}'.format(chr(a), chr(b), chr(c)))

@aluzzardi

aluzzardi commented Dec 19, 2012

Hey @antirez, did you get a chance to try it out?


@rahst12

rahst12 commented Jan 2, 2014

Any update on this? I'm looking to do some geohash prefix key/value storage in Redis. This would be perfect.


@ezs

ezs commented Jan 5, 2014

Is there a way to use this trie beyond auto-complete? If I understand tries correctly, they can be used for more than just auto-complete: prefix lookup as well. I should also be able to search for the word "helloworld" and have it return the closest match, "hello". I'm interested in seeing this feature added to Redis, so I can look up a word and get back its closest prefix.


@FGRibreau

FGRibreau commented Jan 5, 2014

nice 👍 !


@r8k

r8k commented Jan 6, 2014

👍


@krisjey

krisjey commented Jan 17, 2014

If anyone needs a Java implementation of Redis autocomplete, check the link below:
https://github.com/krisjey/redis.usecase

That code is an alternative to the approach in antirez's post, and faster:
http://oldblog.antirez.com/post/autocomplete-with-redis.html


@kgcrom

kgcrom commented Jan 20, 2014

@krisjey 👍


@rahst12

rahst12 commented Mar 6, 2014

@ezs Great point. @krisjey I'm not looking for an autocomplete solution.

I'm looking for an implementation of tries where I put in "helloworld" and the closest match, "hello", is returned.


@peterbe

peterbe commented Aug 12, 2014

What happened to this patch? It's almost two years old.


@rahst12

rahst12 commented Aug 12, 2014

@peterbe Not sure, I could still make use of it though if it's ever rolled in.


@canadaduane

canadaduane commented Sep 30, 2014

There are some great cache-conscious and SSD-optimized algorithms becoming available for this. See https://github.com/dcjones/hat-trie and https://github.com/silt/silt, respectively.


@aluzzardi aluzzardi closed this May 8, 2015
