hyperloglog pfmerge inflates key size #3819
Redis uses two different internal encodings for HLLs: dense, whose register array is always 12288 bytes (16384 six-bit registers), or sparse (see the …). That said, it appears that keys created by …
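To show why the dense encoding has a fixed size, here is a minimal Python sketch of a packed 6-bit register array. It assumes the 16384-register, 6-bits-per-register layout described above; `dense_get` and `dense_set` are hypothetical helpers modeled on, not copied from, the dense accessors in Redis's hyperloglog.c, and the header that precedes the registers is left out.

```python
HLL_REGISTERS = 16384                            # 2**14 registers
HLL_BITS = 6                                     # bits per register
HLL_DENSE_BYTES = HLL_REGISTERS * HLL_BITS // 8  # 12288, independent of cardinality
HLL_REGISTER_MAX = (1 << HLL_BITS) - 1           # 63


def dense_get(regs: bytearray, index: int) -> int:
    """Read 6-bit register `index` from a packed dense array."""
    bit = index * HLL_BITS
    byte, shift = bit // 8, bit % 8
    value = regs[byte] >> shift
    if shift > 8 - HLL_BITS:                     # register spills into the next byte
        value |= regs[byte + 1] << (8 - shift)
    return value & HLL_REGISTER_MAX


def dense_set(regs: bytearray, index: int, value: int) -> None:
    """Write a 6-bit value into register `index` without touching its neighbors."""
    bit = index * HLL_BITS
    byte, shift = bit // 8, bit % 8
    # Clear this register's bits in the first byte, then store the low bits of value.
    regs[byte] = (regs[byte] & ~(HLL_REGISTER_MAX << shift) & 0xFF) | ((value << shift) & 0xFF)
    if shift > 8 - HLL_BITS:
        spill = 8 - shift                        # how many bits went into the first byte
        # Clear and store the remaining high bits in the next byte.
        regs[byte + 1] = (regs[byte + 1] & ~(HLL_REGISTER_MAX >> spill) & 0xFF) | (value >> spill)
```

Because the array is sized from HLL_REGISTERS and HLL_BITS alone, a dense key pays for all 12288 register bytes whether two registers are set or all 16384 are.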
@itamarhaber thanks for the detailed explanation, it helps. I think it's an issue with Redis: one of our databases just grew from 4GB to 12GB only because we merged some HLLs as part of some maintenance we did. Do you think this issue can be solved?
Hello. It is actually possible to fix this issue... In theory, after merging, Redis should check again whether the HyperLogLog can be represented as sparse and, if so, convert it back into a sparse HLL. Tagging the issue to solve it at some point (not sure when; there's quite a backlog...).
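The fix suggested above (re-checking after the merge whether the result still fits in sparse form) could look roughly like this sketch. It assumes the sparse opcode rules from Redis's hyperloglog.c (ZERO covers 1 to 64 zero registers in one byte, XZERO up to 16384 in two bytes, VAL stores a value from 1 to 32 for up to 4 registers in one byte), a 16-byte header, and the 3000-byte default of `hll-sparse-max-bytes`; all function names are hypothetical.

```python
from itertools import groupby

HLL_REGISTERS = 16384
HLL_HDR_BYTES = 16        # assumed header size shared by both encodings
SPARSE_MAX_BYTES = 3000   # assumed default of the hll-sparse-max-bytes setting


def sparse_opcode_bytes(registers):
    """Size of the sparse opcode stream for `registers`, or None if some
    register is too large (> 32) for the sparse VAL opcode."""
    size = 0
    for value, group in groupby(registers):
        run = sum(1 for _ in group)
        if value == 0:
            size += 1 if run <= 64 else 2     # one ZERO, or a single XZERO
        elif value > 32:
            return None                       # only the dense encoding can hold it
        else:
            size += -(-run // 4)              # one VAL byte per up-to-4 equal registers
    return size


def merge(*hlls):
    """PFMERGE semantics: each register becomes the max across the inputs."""
    return [max(column) for column in zip(*hlls)]


def pick_encoding(registers):
    """Re-check after a merge whether the result still fits in sparse form."""
    opcodes = sparse_opcode_bytes(registers)
    if opcodes is not None and HLL_HDR_BYTES + opcodes <= SPARSE_MAX_BYTES:
        return "sparse"
    return "dense"
```

Under these assumptions, merging two HLLs that each have a single nonzero register yields a 6-byte opcode stream and stays sparse, instead of being promoted to the full dense array.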
@antirez thanks. Looking forward to it.
@antirez can you tell when this will be done?
Moved it to the Urgent milestone, but I'm not sure about the ETA. Being under Urgent should make it happen sooner, though. There are other priorities due by mid-September, plus the vacations in between, so let's see what happens. Cheers.
@antirez 👍
Have there been any updates on this, maybe something in a development branch that I can look at?
Sorry we are late in improving this one. Maybe @artix75 wants to take it over and improve the code faster than I could?
I'm looking at pfmergeCommand. By that point I think the merge has already been performed, but it may now be the case that … If not, I guess the steps would be something like:
As for what the subsequent code is doing and what's appropriate for the HLL_SPARSE case, I'm still looking into it.
Unfortunately I won't have time to dig into this any further until January at the earliest.
Yep, not ideal; checking the problem today.
Hello, I pushed a fix into …
The commit splits the add functions into a set() and add() set of functions, so that it's possible to set registers in an independent way just having the index and count. Related to #3819, otherwise a fix is not possible.
P.S. As a side effect, PFMERGE could be much faster in certain small-cardinality use cases.
Warning: I found a few bugs, fixing them.
Just pushed a fix; we should be fine now. I'm writing a stress tester for this code.
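The set()/add() split described in the commit message can be sketched as follows. This is a hypothetical Python illustration, not the Redis code: `hll_add` hashes an element into a (register index, count) pair and delegates to `hll_set`, so a merge can call `hll_set` directly with registers taken from another HLL, without rehashing anything. Redis hashes with MurmurHash64A; this sketch swaps in the first 8 bytes of SHA-256 from the standard library so it runs as-is.

```python
import hashlib

HLL_P = 14
HLL_REGISTERS = 1 << HLL_P   # 16384
HLL_Q = 64 - HLL_P           # hash bits left for the run-length part


def pattern_len(element: bytes):
    """Map an element to (register index, count) in the HyperLogLog style.
    SHA-256 is a stand-in for Redis's MurmurHash64A."""
    h = int.from_bytes(hashlib.sha256(element).digest()[:8], "little")
    index = h & (HLL_REGISTERS - 1)
    rest = (h >> HLL_P) | (1 << HLL_Q)   # sentinel bit guarantees the loop ends
    count = 1
    while rest & 1 == 0:                 # count = trailing zeros + 1
        count += 1
        rest >>= 1
    return index, count


def hll_set(registers, index, count):
    """'set' half: update one register from just its index and count.
    Returns True if the register changed."""
    if count > registers[index]:
        registers[index] = count
        return True
    return False


def hll_add(registers, element: bytes):
    """'add' half: hash the element, then delegate to hll_set()."""
    index, count = pattern_len(element)
    return hll_set(registers, index, count)


def merge_into(dest, src):
    """A merge only needs hll_set(): copy every nonzero source register."""
    for index, count in enumerate(src):
        if count:
            hll_set(dest, index, count)
```

The design point is that merging walks registers rather than elements, so an independent "set register" primitive is what lets the destination stay in whatever encoding the result fits.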
This is a fix for the #3819 improvements. The o->ptr may change because of hllSparseSet() calls, so 'hdr' must be correctly re-fetched.
@antirez thank you so much for this!
@antirez thank you for taking this up. Looking at the diffs, I wonder:
OK, point 1 seems to be answered by this
Hello @mbarkhau, neither of the problems you mentioned exists. What you describe in "2" is actually the bug that I fixed; that was the old behavior. About "1": if the result of PFMERGE cannot be stored in the sparse representation, it is automatically promoted to dense.
@antirez just to make sure there's no misunderstanding: I have a Redis instance running 4.0.2 with a few hundred thousand HLL values that were generated using PFMERGE and are now stored in the dense format. You're saying that if I now update to unstable and go through these keys running PFMERGE again, the resulting HLL values may end up in the sparse representation? If so, that would be amazing!
No, sorry, I misunderstood, but your problem is indeed worth fixing! I'll try to find a solution today or, if I run out of time, after returning from holidays. Thanks for reporting this problem!
Just to mention another use case for this: somebody might change the …
Yes, I was actually more focused on this last use case: the current problem can be fixed with a DEBUG command that converts the keys back, but in the long run fixing this at load time is much better.
Doing this efficiently could require a function that could then also be used to implement an even faster PFMERGE, by the way. That is, a function able to insert into the sparse representation without re-parsing everything, provided it can assume the element being inserted is bigger than any element already in it. I'll study the problem with care.
However, note that you already have a simple way to convert dense -> sparse right now (after applying the patch I published here).
Try it with care before deploying.
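The append-only insertion idea mentioned above (adding to the sparse representation without re-parsing, when each new element comes after everything already stored) might be sketched like this. It is a hypothetical builder, not Redis code, and assumes the byte layouts of the ZERO (00xxxxxx), XZERO (01xxxxxx yyyyyyyy) and VAL (1vvvvvxx) opcodes as documented in hyperloglog.c, with run lengths and values stored minus one.

```python
HLL_REGISTERS = 16384


class SparseAppender:
    """Hypothetical append-only builder for a Redis-style sparse opcode stream.
    Registers must arrive in strictly increasing index order, so nothing that
    was already emitted ever needs to be re-parsed."""

    def __init__(self):
        self.out = bytearray()
        self.next_index = 0   # first register index not yet covered
        self.run_value = 0    # value of the pending VAL run (0 = none)
        self.run_len = 0

    def _flush_val(self):
        if self.run_len:
            # VAL: 1vvvvvxx, value 1..32 stored as value-1, run 1..4 stored as len-1
            self.out.append(0x80 | ((self.run_value - 1) << 2) | (self.run_len - 1))
            self.run_len = 0

    def _emit_zeros(self, count):
        while count > 0:
            if count <= 64:
                self.out.append(count - 1)                   # ZERO: 00xxxxxx
                return
            chunk = min(count, 16384)
            l = chunk - 1
            self.out += bytes([0x40 | (l >> 8), l & 0xFF])   # XZERO: 01xxxxxx yyyyyyyy
            count -= chunk

    def append(self, index, value):
        if index < self.next_index:
            raise ValueError("registers must be appended in increasing index order")
        if not 1 <= value <= 32:
            raise ValueError("value not representable in the sparse encoding")
        gap = index - self.next_index
        if gap or value != self.run_value or self.run_len == 4:
            self._flush_val()          # close the current VAL run
            self._emit_zeros(gap)      # cover the skipped zero registers
            self.run_value = value
        self.run_len += 1
        self.next_index = index + 1

    def finish(self):
        self._flush_val()
        self._emit_zeros(HLL_REGISTERS - self.next_index)
        return bytes(self.out)
```

Feeding it the nonzero registers of a dense HLL in index order would produce the sparse stream in a single pass, which is the kind of dense -> sparse conversion discussed in this thread.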
This is a fix for redis#3819.
Pardon me if I misunderstood how this is supposed to work. I've tried the following and thought that key
@mbarkhau I don't think this fix has been released in any version yet.
@refaelos I read this in RELEASENOTES
But I guess this is only one part of this issue?
@mbarkhau why should merging a sparse and a dense HLL result in a sparse one?
The previous bug was that merging two sparse representations always produced a dense one... However, there is no "auto conversion" to the new settings; that would probably be quite costly. What we want is the ability to do that when reloading RDB files. Right now you can get this effect with AOF reloads, by the way.
@antirez the problem was that merging two HLLs made the merged one huge compared to what it should be. Is it now fixed in 4.0.7?
From your earlier comment I thought this is how it would behave.
So you're saying Dense -> Sparse is not possible with 4.0.7?
@mbarkhau no, it's not possible to go from dense to sparse right now. What the patch avoids is a sparse+sparse merge always producing a dense representation. This development is possible in the future, but to avoid wasting CPU we have to do the conversion when it's, let's say... opportunistic, which in theory means during PFCOUNT: once we scan the registers and get the number of zeroes, we can figure out whether it's worth attempting a conversion. Surely doable, but it needs some care, and there are open problems: for instance, what happens to the slaves, given that PFCOUNT is a read-only command?
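A sketch of the cheap check described above: given only the number of zero registers (and the largest register value) from the PFCOUNT scan, bound the sparse size without building it. Every nonzero register costs at most one VAL byte, and zero runs can only sit between nonzero registers, so there are at most nonzero + 1 of them at two bytes each. The 16-byte header and the 3000-byte threshold (the `hll-sparse-max-bytes` default) are assumptions, and the helper names are hypothetical.

```python
HLL_REGISTERS = 16384
HLL_HDR_BYTES = 16        # assumed header size
SPARSE_MAX_BYTES = 3000   # assumed default of hll-sparse-max-bytes


def sparse_upper_bound(zero_registers: int) -> int:
    """Worst-case sparse size (header included) from the zero-register count alone."""
    nonzero = HLL_REGISTERS - zero_registers
    # Zero runs alternate with nonzero registers: at most nonzero + 1 of them,
    # and never more runs than there are zero registers.
    zero_runs = min(nonzero + 1, zero_registers)
    # <= 1 VAL byte per nonzero register, <= 2 bytes (one XZERO) per zero run.
    return HLL_HDR_BYTES + nonzero + 2 * zero_runs


def worth_converting(zero_registers: int, max_register: int) -> bool:
    """Hypothetical PFCOUNT-time test: attempt dense -> sparse only if it surely fits."""
    if max_register > 32:     # VAL opcodes cannot hold larger values
        return False
    return sparse_upper_bound(zero_registers) <= SPARSE_MAX_BYTES
```

Because the bound is conservative, it may skip conversions that would actually fit; an exact answer needs a pass over the register runs, which is the CPU cost the comment above wants to avoid.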
Closing this issue since the unwanted promotion to the dense representation after merging is now fixed. Later, we have to consider whether we should also try to convert from dense to sparse at loading time when the new configuration makes it applicable. However, I have the feeling that to do that we need to introduce a proper HLL type in Redis instead of relying on the string format. That could be addressed in Redis 5.0, but since 5.0 targets just streams, I'm not sure there will be time.
@antirez so the issue with inflating memory is resolved? Can we safely merge HLLs now using the latest version?
Hey,
I'll go straight to the point:
I have two HyperLogLog keys that I want to PFMERGE into one. Each key has a PFCOUNT of 2 and a size of 160 bytes. When I merge them, the count is 4 (as expected), but the size becomes 14400 bytes!
Why does this happen?