PFADD hangs on empty values #1657
The bug is in
Because the string is empty, it hashes to 0. The following while loop thus never ends.
Exactly right! Redis's
Another implementation of murmur had the same problem (hash of the empty string being 0) and applied a very tiny fix to include the string terminator.
If we give
Before:
Returning H of: 510903276987443985 (hash of "a")
Returning H of: 0 (hash of "")
<infinite loop>
After:
Returning H of: 7517249135209847859 (hash of "a")
Returning H of: 6351753276682545529 (hash of "")
Returning H of: 3063979243939886323 (hash of "b")
We need to guarantee that the last bit is 1, otherwise an element may hash to just zeroes with probability 1/(2^64) and trigger an infinite loop. See issue #1657.
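As a rough illustration of the fix described in this commit message, here is a minimal Python sketch of the run-of-zeroes counting loop. The names `HLL_P`, `HLL_REGISTERS`, and `pattern_length`, as well as the `sentinel` and `max_steps` parameters, are hypothetical and chosen for this example; this is not the Redis source, and the step limit exists only so the unfixed variant fails loudly instead of hanging.

```python
HLL_P = 14                    # assumed: 14 bits select one of 16k registers
HLL_REGISTERS = 1 << HLL_P


def pattern_length(hash64, sentinel=True, max_steps=1000):
    """Return (register index, length of the 000...1 pattern) for a 64-bit hash.

    With sentinel=True the top bit is forced to 1, so the loop always
    terminates. With sentinel=False an all-zero hash never hits a set bit;
    max_steps turns that endless loop into an exception for demonstration.
    """
    index = hash64 & (HLL_REGISTERS - 1)   # low bits address the register
    if sentinel:
        hash64 |= 1 << 63                  # guarantee a terminating 1 bit
    bit = HLL_REGISTERS                    # first bit not used for the index
    count = 1                              # counts the "000...1" pattern
    while (hash64 & bit) == 0:
        count += 1
        bit <<= 1
        if count > max_steps:
            raise RuntimeError("no set bit found: the loop would never end")
    return index, count


if __name__ == "__main__":
    print(pattern_length(0))               # terminates thanks to the sentinel
    try:
        pattern_length(0, sentinel=False)
    except RuntimeError as exc:
        print("without sentinel:", exc)
```

With the sentinel bit set, an all-zero hash simply yields the longest possible pattern for its register rather than an endless scan.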
Fixed, thank you. The bug was triggered by the behavior of this specific hash function implementation, but it is actually unrelated to it, since a legitimate element may hash to 00000...0000.
Well, actually I think we should follow Matt's advice and modify the hash function, since it is uncool that a common value maps to such a rare pattern...
Ok guys, I've considered this a bit more. My opinion:
The real source of the issue is simply that I selected "0" as the seed. The seed argument is how the hash value is initialized and, as a side effect, it is the hash value of the empty string. The only issue here is that when I wrote the original code I failed to realize this and set it to zero. So I'm changing this value. However, I'm thankful for my lameness, since otherwise the bug about counting bits could still be there, even if the probability of it actually being triggered is probably too small for us to ever notice! But it could be used as a DoS triggered from outside.
Using a seed of zero has the side effect of having the empty string hash to what is a very special case in the context of HyperLogLog: a very long run of zeroes. This did not influence the correctness of the result with 16k registers because of the harmonic mean, but it is still inconvenient that such an obvious value maps to such a special hash. The seed 0xadc83b19 is used instead, which is the first 32 bits of the SHA1 of an empty line (a single newline character). Reference: issue #1657.
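The relationship between the seed and the hash of the empty string can be checked with a small Python port of the public MurmurHash64A algorithm. This is a sketch for illustration only: it assumes the little-endian variant with the usual constants and is not the Redis source; the function name `murmurhash64a` is chosen for this example.

```python
MASK64 = (1 << 64) - 1
M = 0xC6A4A7935BD1E995   # MurmurHash64A multiplication constant
R = 47                   # MurmurHash64A shift constant


def murmurhash64a(data: bytes, seed: int) -> int:
    """Pure-Python MurmurHash64A (little-endian), returning a 64-bit int."""
    n = len(data)
    h = (seed ^ (n * M)) & MASK64
    full = n - (n & 7)                     # bytes covered by whole 8-byte blocks
    for i in range(0, full, 8):
        k = int.from_bytes(data[i:i + 8], "little")
        k = (k * M) & MASK64
        k ^= k >> R
        k = (k * M) & MASK64
        h ^= k
        h = (h * M) & MASK64
    tail = data[full:]                     # remaining 0..7 bytes
    if tail:
        h ^= int.from_bytes(tail, "little")
        h = (h * M) & MASK64
    h ^= h >> R
    h = (h * M) & MASK64
    h ^= h >> R
    return h


if __name__ == "__main__":
    # Empty input: no blocks and no tail, so the result depends only on the seed.
    print(murmurhash64a(b"", 0))
    print(murmurhash64a(b"", 0xADC83B19))
```

For empty input nothing is mixed in besides the seed, and every finalization step maps zero to zero, so a seed of 0 produces a hash of 0 while any nonzero seed does not.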
👍 Two fixes for the price of one bug report! Happy to see the "empty string avoids hashing" scenario got re-fixed too.
The following Python snippet makes a Redis build of cf34507b870124f99a68f13128274af3d708b50b on Arch Linux hang at 100% CPU.