RDB loading is very slow in Redis Cluster #3800
@antirez the only thing I'll actually need to write code for is the keys-in-slot thing, which is basically a prefix search. I'll try to wrap the raw trie with this API. Also, one thing I'm not sure about: right now the maximum length of each node's internal string (usually 1-2 bytes of course) is 65535 bytes. I can either increase that to 2GB at the cost of 2 extra bytes per node, or just split the string every 65KB, which saves those extra bytes but makes the implementation a little more complicated. WDYT?
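A minimal illustration of the tradeoff (hypothetical structs, not the actual trie layout): widening the length field from 16 to 32 bits removes the 64KB cap on a node's internal string, but costs 2 extra bytes in every node header.

```c
#include <stdint.h>

/* Hypothetical node headers illustrating the tradeoff above;
 * the real trie node layout differs. */
typedef struct nodeHeader16 {
    uint16_t len;           /* internal string capped at 65535 bytes */
    unsigned char data[];   /* the node's compressed string follows */
} nodeHeader16;

typedef struct nodeHeader32 {
    uint32_t len;           /* no practical cap, at +2 bytes per node */
    unsigned char data[];
} nodeHeader32;
```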
I implemented the slot-to-keys mapping using N distinct hash tables, reusing the key references from the dictEntry in the main db dict, just like the expires dictionary does. This gives us a constant space cost per key. For a key space like F1 to F10000000, performance is almost the same.
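A hedged sketch of that approach, assuming Redis's dict.h and sds.h types (the names slotToKeys, slotAddKey, and slotDelKey are hypothetical): one dict per hash slot, storing the same key pointers the main keyspace dict already owns, the way db->expires does, so the per-key space cost stays constant.

```c
#define CLUSTER_SLOTS 16384

typedef struct slotToKeys {
    dict *byslot[CLUSTER_SLOTS];   /* one hash table per hash slot */
} slotToKeys;

/* The sds key pointer is borrowed from the main db dict, not copied. */
static void slotAddKey(slotToKeys *m, unsigned int slot, sds key) {
    dictAdd(m->byslot[slot], key, NULL);
}

static void slotDelKey(slotToKeys *m, unsigned int slot, sds key) {
    dictDelete(m->byslot[slot], key);
}
```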
Hello @sgn1, thanks for making the effort of testing a different idea; using N hash tables was among the possibilities. The 3.81% and 5.29% changes in speed are not very relevant since they are marginal, however the 25% speed difference in certain use cases is more interesting. Btw I find it odd that the hash table implementation is faster than the radix tree even if by a small percentage, since in my benchmarks the radix tree is much faster. Probably the trick is reusing the key reference from the dictionary. Anyway... a few things to consider:
So basically I think that your experiment was very interesting, and kind of gives us a comparison of the radix tree against the perfect ad-hoc solution, and yet the radix tree looks like it is doing well. I think it's an argument for using the radix tree solution, which also avoids the need for incremental background rehashing, for resizing when the hash tables get too sparse (a check that would have to be performed incrementally across all 16k tables), and so forth. It's kind of a shame that we need to keep two versions of the key space, wasting memory, btw... With the radix tree, what we could do in the future if we want to save memory very seriously is to use a trick that is going to be used to implement the new Stream data structure, and probably also the Hash type: to have as radix tree values "macro nodes" actually containing N keys. So for instance, we would have just a few keys in the radix tree, and values with multiple keys (for instance max 80 keys per macro node):
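For example, a hedged sketch of what such a value could look like (a hypothetical layout; nothing like this existed at the time, and the 1-byte length prefix is a simplification):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical macro node: one radix-tree value packing up to ~80
 * adjacent keys, each stored with a 1-byte length prefix, back to back. */
typedef struct macroNode {
    uint8_t count;             /* keys stored here, e.g. capped at 80 */
    unsigned char packed[];    /* [len][bytes][len][bytes]... */
} macroNode;

/* Linear scan inside a macro node; the radix tree has already
 * located the node that covers the key's range. */
static int macroNodeFind(macroNode *mn, const unsigned char *key,
                         size_t keylen) {
    unsigned char *p = mn->packed;
    for (int i = 0; i < mn->count; i++) {
        size_t l = *p++;
        if (l == keylen && memcmp(p, key, keylen) == 0) return 1;
        p += l;
    }
    return 0;
}
```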
This is basically similar to Redis quicklists, but for dictionary data structures. To seek a key, just search the radix tree for the macro node covering it (its first element), then scan inside the node.
Loading the same RDB file in a Redis Cluster node can be 4x slower than loading it in a standalone Redis instance. This is because we must maintain a key -> hash-slot mapping, so we populate an internal sorted set every time a key is added or removed.
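For reference, computing the mapping itself is cheap: slot = CRC16(key) mod 16384, honoring {hash tags}, essentially what cluster.c's keyHashSlot() does (crc16() here is assumed to be the CRC16 implementation Redis ships). The cost being discussed is maintaining the reverse slot -> keys index, not computing the slot.

```c
unsigned int keyHashSlot(char *key, int keylen) {
    int s, e; /* start-end indexes of { and } */

    for (s = 0; s < keylen; s++)
        if (key[s] == '{') break;

    /* No '{': hash the whole key. */
    if (s == keylen) return crc16(key, keylen) & 16383;

    for (e = s + 1; e < keylen; e++)
        if (key[e] == '}') break;

    /* No '}' after '{', or empty {} tag: hash the whole key. */
    if (e == keylen || e == s + 1) return crc16(key, keylen) & 16383;

    /* Hash only what is between { and }. */
    return crc16(key + s + 1, e - s - 1) & 16383;
}
```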
However, with the data structure currently in use, doing this is several times slower than adding keys to a hash table, so when an RDB file is composed of many small keys, the loading time in a Cluster node increases a lot. The same should be true for AOF files as well.
We are planning to switch to a faster data structure. @dvirsky is benchmarking a compressed trie right now that could improve the current performance considerably. Another alternative, if a compact representation does not work well, is to use N distinct hash tables.
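For a trie or any single ordered structure, one natural encoding (a hedged sketch, not necessarily what was implemented) is to prefix every key with its 2-byte big-endian slot number, so all keys of a slot share a prefix and "keys in slot" becomes a prefix/range scan:

```c
#include <string.h>

/* Build the entry stored in the slot -> keys structure:
 * [slot high byte][slot low byte][key bytes]. The caller must
 * provide a buffer of at least keylen + 2 bytes. */
void encodeSlotKey(unsigned char *buf, unsigned int hashslot,
                   const char *key, size_t keylen) {
    buf[0] = (hashslot >> 8) & 0xff;   /* big-endian slot prefix */
    buf[1] = hashslot & 0xff;
    memcpy(buf + 2, key, keylen);
}
```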
The operations that we need to support are:
- Add a key.
- Remove a key.
- Get the keys in a given hash slot (for CLUSTER GETKEYSINSLOT and slot migration).
- Count the keys in a given hash slot (for CLUSTER COUNTKEYSINSLOT).
Currently all our operations are O(log(N)), but the constant factors are just too big.