
RDB loading is very slow in Redis Cluster #3800

Closed

antirez opened this issue Feb 9, 2017 · 3 comments

Comments


antirez commented Feb 9, 2017

Loading the same RDB file in a Redis Cluster node can be about 4x slower than doing it in a standalone Redis instance. This is due to the need to maintain a key -> hash-slot mapping, so we populate an internal sorted set every time a key is added or removed.

However, with the data structure currently in use, doing this is several times slower than adding keys to a hash table, so when an RDB file is composed of many small keys, the loading time in a Cluster node increases a lot. The same should be true for AOF files as well.
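For context, the slot a key maps to is computed as the CRC16 of the key modulo 16384, honoring hash tags. A minimal sketch of that computation, assuming a `crc16()` helper is available elsewhere (function names here are illustrative):

```c
#include <stdint.h>

/* CRC16 with the CCITT polynomial used by Redis Cluster; assumed to be
 * provided elsewhere (e.g. crc16.c in the Redis source tree). */
uint16_t crc16(const char *buf, int len);

/* Map a key to one of the 16384 cluster hash slots. If the key contains
 * a non-empty "{...}" hash tag, only the tag content is hashed, so that
 * related keys can be forced into the same slot. */
unsigned int keyToSlot(const char *key, int keylen) {
    int s, e; /* start and end of the hash tag, if any */

    for (s = 0; s < keylen; s++)
        if (key[s] == '{') break;

    /* No '{': hash the whole key. */
    if (s == keylen) return crc16(key, keylen) & 16383;

    /* '{' found: look for a matching '}'. */
    for (e = s + 1; e < keylen; e++)
        if (key[e] == '}') break;

    /* No '}' or empty tag "{}": hash the whole key. */
    if (e == keylen || e == s + 1) return crc16(key, keylen) & 16383;

    /* Hash only what is between '{' and '}'. */
    return crc16(key + s + 1, e - s - 1) & 16383;
}
```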

We are planning to switch to a faster data structure. @dvirsky is benchmarking a compressed trie right now that could considerably improve the current performance. Another alternative, if a compact representation does not work well, is to use N distinct hash tables.

The operations that we need to support are:

  • ADD-KEY(key)
  • DEL-KEY(key)
  • GET-KEYS-IN-SLOT(slot,count)
  • DEL-ALL-KEYS()

Currently all our ops are O(log(N)), but the constant factors are just too big.
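To make the requirements above concrete, a hypothetical C interface for the slot-to-keys index could look like the following (the names are illustrative placeholders, not the actual Redis internals):

```c
#include <stddef.h>

/* Hypothetical slot-to-keys index API; names are illustrative only. */
typedef struct slotIndex slotIndex;

slotIndex *slotIndexCreate(void);
void slotIndexRelease(slotIndex *idx);                /* DEL-ALL-KEYS() */

/* ADD-KEY(key): called every time a key is added to the keyspace. */
void slotIndexAddKey(slotIndex *idx, const char *key, size_t keylen);

/* DEL-KEY(key): called every time a key is removed from the keyspace. */
void slotIndexDelKey(slotIndex *idx, const char *key, size_t keylen);

/* GET-KEYS-IN-SLOT(slot,count): copy up to 'count' key names belonging to
 * 'slot' into 'keys', returning how many were found. Used for example by
 * CLUSTER GETKEYSINSLOT during resharding. */
unsigned int slotIndexGetKeysInSlot(slotIndex *idx, unsigned int slot,
                                    char **keys, unsigned int count);
```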


dvirsky commented Feb 9, 2017

@antirez the only thing I'll actually need to write code for is the keys in slot thing, which is basically a prefix search.

I'll try to wrap the raw trie with this API.

Also, one thing I'm not sure about: right now the maximum length of each node's internal string (usually 1-2 bytes of course) is 65535 bytes. I can either increase that to 2GB at the cost of 2 extra bytes per node, or just split the string every 65535 bytes, which will save those extra bytes but will make the implementation a little more complicated. WDYT?
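Since keys-in-slot is described above as basically a prefix search, a rough sketch of that idea follows, assuming a trie that can seek to a prefix and iterate in sorted order (the trie and iterator functions below are hypothetical placeholders, not the actual library API): each key is stored prefixed by its 2-byte slot number, so enumerating a slot becomes a scan of that prefix.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical trie API: insert a binary-safe string, seek an iterator to
 * the first element >= a given prefix, and walk elements in sorted order. */
typedef struct trie trie;
typedef struct trieIterator trieIterator;
int trieInsert(trie *t, const unsigned char *s, size_t len);
void trieIterSeek(trieIterator *it, trie *t, const unsigned char *prefix, size_t len);
int trieIterNext(trieIterator *it, const unsigned char **s, size_t *len);

/* Index a key: store it as <slot: 2 bytes big endian><key>, so that all
 * keys of the same slot are contiguous in the trie's sorted order. */
void addKeyToSlotIndex(trie *t, uint16_t slot, const char *key, size_t keylen) {
    unsigned char buf[2 + keylen];  /* C99 VLA, kept short for this sketch */
    buf[0] = (slot >> 8) & 0xff;
    buf[1] = slot & 0xff;
    memcpy(buf + 2, key, keylen);
    trieInsert(t, buf, keylen + 2);
}

/* GET-KEYS-IN-SLOT: seek to the 2-byte slot prefix and scan forward until
 * the prefix no longer matches or 'count' keys were visited. */
unsigned int getKeysInSlot(trie *t, uint16_t slot, unsigned int count) {
    unsigned char prefix[2] = { (slot >> 8) & 0xff, slot & 0xff };
    trieIterator it;
    const unsigned char *s; size_t len;
    unsigned int found = 0;

    trieIterSeek(&it, t, prefix, 2);
    while (found < count && trieIterNext(&it, &s, &len)) {
        if (len < 2 || memcmp(s, prefix, 2) != 0) break; /* left the slot */
        /* s+2, len-2 is the original key name. */
        found++;
    }
    return found;
}
```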


sgn1 commented Apr 17, 2017

I implemented the slot-to-keys mapping using N distinct hash tables, reusing the references to the keys already stored in the db's dictEntry, like the expires dictionary does. This gives us a constant space cost per key.
I benchmarked this by merging it into Redis 4.0 unstable and comparing against the radix tree implementation there.

For a key space like F1 to F10000000, performance is almost the same.
When I tested with 10 million random keys of 32 to 128 bytes, the N distinct hash tables required 25.85% less memory than the radix tree. The load time is also 3.81% lower. The redis-benchmark throughput for 1M SETs after this data is loaded is 5.29% higher with N distinct hash tables than with the radix tree.
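A rough sketch of this approach, assuming Redis's dict API and a key pointer shared with the main dictionary (the slotToKey* names and the dictType wiring below are illustrative, not the actual patch):

```c
#include "dict.h"   /* Redis's hash table implementation */
#include "sds.h"

#define CLUSTER_SLOTS 16384

/* One small hash table per slot; entries store only a pointer to the sds
 * key already owned by the main db->dict, so the per-key space overhead
 * is constant (no key duplication), exactly as db->expires does. */
dict *slots_to_keys[CLUSTER_SLOTS];

/* Illustrative dictType: keys are borrowed pointers, so no dup/free. */
extern dictType keyPtrDictType;

void slotToKeyInit(void) {
    for (int j = 0; j < CLUSTER_SLOTS; j++)
        slots_to_keys[j] = dictCreate(&keyPtrDictType, NULL);
}

/* Called when a key is added to the keyspace: 'key' is the very same sds
 * pointer stored in the main dictionary's dictEntry. */
void slotToKeyAdd(sds key, unsigned int slot) {
    dictAdd(slots_to_keys[slot], key, NULL);
}

/* Called when a key is removed from the keyspace. */
void slotToKeyDel(sds key, unsigned int slot) {
    dictDelete(slots_to_keys[slot], key);
}
```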


antirez commented Apr 18, 2017

Hello @sgn1, thanks for making the effort of testing a different idea; using N hash tables was among the possibilities. The 3.81% and 5.29% changes in speed are not very relevant since they are marginal, however the 25% memory difference in certain use cases is more interesting. Btw I find it odd that the implementation using the hash tables is faster than the radix tree, even if by a small percentage, since in my benchmarks the radix tree is much faster. Probably the trick is reusing the key reference from the dictionary. Anyway... a few things to consider:

  1. In some way the test appears to validate the radix tree for this use case, because having 16k hash tables is really a micro-optimization for this use case, and yet the speed is similar.
  2. The radix tree provides an ordered key space, so if the use case changes in the future and extracting N keys from a given hash slot is no longer enough, but we want to iterate the keys in a hash slot, for example in order to find whether there are very big keys so that a slot is not a good candidate for resharding (this is a planned feature), the radix tree will provide this out of the box, while the 16k hash tables are specifically targeted at the current use case and are much more rigid.
  3. Having to handle 16k objects may, in certain cases, be a source of latency, for example when the database is reset by FLUSHALL ASYNC. There are ways to fix this, like creating a macro-object containing the 16k hash tables that we pass to the asynchronous deletion thread, but there are still 16k objects in general, compared to a single data structure.

So basically I think that your experiment was very interesting, and it kind of compares the radix tree against the perfect ad-hoc solution, and yet it looks like the radix tree is doing well. I think it's an argument in favor of using the radix tree solution, which also avoids the need to rehash incrementally in the background, to resize the hash tables if they get too sparse (a check that would have to be performed on all the 16k objects incrementally), and so forth.

It's kind of a shame that we need to keep two copies of the key space, wasting memory, btw... With the radix tree, what we could do in the future if we want to save memory very seriously is to use a trick that is going to be used to implement the new Stream data structure, and probably also for the Hash type: to have as radix tree values "macro nodes" actually containing N keys.

So for instance, we would have just a few keys in the radix tree, and values with multiple keys (for instance max 80 keys per macro node):

00:akey: -> "00:akey,00:bkey,00:someotherkey, ..."
00:geek: -> "00:geek,00:great,00:goof, ..."

This is basically similar to Redis quicklists, but for dictionary data structures. To seek, just search for the first element <= a given one, and scan the corresponding blob. This may reduce memory usage severely... We would however need to measure the CPU usage, but likely things can be optimized so that the linear scanning is compensated by the cache misses avoided by not traversing multiple nodes.
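A minimal sketch of the lookup described above, assuming a radix tree with a "seek the greatest key <= target" operation and values that are comma-separated blobs of key names (all names below are hypothetical):

```c
#include <string.h>
#include <stdbool.h>

/* Hypothetical radix tree API: seek the greatest stored key that is
 * lexicographically <= 'key', returning its value blob (or NULL). */
typedef struct radixTree radixTree;
const char *radixSeekLessEqual(radixTree *t, const char *key, size_t keylen);

/* Check whether 'key' exists in the macro-node scheme: find the macro
 * node that would contain it, then linearly scan the comma-separated
 * blob of key names stored as its value. */
bool macroNodeContains(radixTree *t, const char *key, size_t keylen) {
    const char *blob = radixSeekLessEqual(t, key, keylen);
    if (blob == NULL) return false;

    const char *p = blob;
    while (*p) {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (len == keylen && memcmp(p, key, keylen) == 0) return true;
        if (!end) break;
        p = end + 1;
    }
    return false;
}
```

The linear scan inside a macro node trades CPU for memory, which is exactly the trade-off the comment above says would need to be measured.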
