MurmurHash3 collision property #58
Comments
I hope you don't mean it in the cryptographic sense, because MMH-3 is not a cryptographically strong hash function. So again: it generates a fast and reasonably well-distributed hash value, nothing more. As for calculating the odds of a collision for a hash algorithm, it is a generalization of the birthday problem. One might assume that, for an ideal hash function with N-bit output, the number of hashes that can be generated without a collision approaches 2^N. But, as the birthday problem tells us, the expected number of N-bit hashes that can be generated before getting a collision is not 2^N, but rather only about 2^(N/2). In terms of an ideal function:
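The birthday-problem estimate above can be sketched in a few lines of Python. This is only an approximation for an idealized, uniformly distributed N-bit hash, not a measured property of MurmurHash3; the function names `collision_probability` and `expected_colliding_pairs` are made up for this illustration.

```python
import math

def collision_probability(num_keys: int, hash_bits: int) -> float:
    """Approximate chance of at least one collision among num_keys
    uniformly random hash values of hash_bits bits:
    p ~ 1 - exp(-k(k-1) / (2 * 2^N))."""
    space = 2.0 ** hash_bits
    exponent = -num_keys * (num_keys - 1) / (2.0 * space)
    # -expm1(x) equals 1 - exp(x) and stays accurate for very small values
    return -math.expm1(exponent)

def expected_colliding_pairs(num_keys: int, hash_bits: int) -> float:
    """Expected number of key pairs that share a hash value: C(k, 2) / 2^N."""
    return num_keys * (num_keys - 1) / 2.0 / (2.0 ** hash_bits)

if __name__ == "__main__":
    for bits in (32, 64, 128):
        p = collision_probability(2_000_000, bits)
        pairs = expected_colliding_pairs(2_000_000, bits)
        print(f"{bits}-bit hash, 2M keys: p(collision) ~ {p:.3g}, "
              f"expected colliding pairs ~ {pairs:.3g}")
```

For a 32-bit output the 50% point falls at roughly 77,000 keys, which is close to 2^(32/2) = 65,536, as the birthday bound suggests.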
It is not easy to calculate the real collision probability in a mathematically correct way outside of your parameter range without knowing the key sets and bucket sizes (that is, for which sets you need it), how the data to be hashed was generated, and so on. But you could also estimate it (or calculate its bounds) by simple simulation with your own key data. For example (from the article), there were more than 400 collisions among 2M different file paths (though for your data sets it may look totally different). See also "Probability of secure hash function collisions with proof", another nice article on hash collisions by John D. Cook, PhD.
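A minimal sketch of such a simulation in Python follows. It assumes the keys are distinct, and it uses a truncated BLAKE2b digest as a stand-in 32-bit hash so that the example runs with the standard library only; it is not MurmurHash3. To measure MurmurHash3 itself, pass a wrapper around whichever implementation you actually use as `hash_fn` (for example, the third-party `mmh3` package, if installed). The names `count_collisions` and `stand_in_hash32` and the generated file paths are hypothetical.

```python
import hashlib
from typing import Callable, Iterable

def count_collisions(keys: Iterable[bytes],
                     hash_fn: Callable[[bytes], int]) -> int:
    """Count keys whose hash value was already produced by an earlier key.
    Assumes the keys themselves are distinct."""
    seen = set()
    collisions = 0
    for key in keys:
        h = hash_fn(key)
        if h in seen:
            collisions += 1
        else:
            seen.add(h)
    return collisions

def stand_in_hash32(key: bytes) -> int:
    """Stand-in 32-bit hash (truncated BLAKE2b), NOT MurmurHash3.
    Replace it with your own MurmurHash3 binding to test the real function."""
    digest = hashlib.blake2b(key, digest_size=4).digest()
    return int.from_bytes(digest, "little")

# Hypothetical key set; substitute your own data (paths, messages, ...).
keys = [f"/data/file_{i}.txt".encode() for i in range(100_000)]

if __name__ == "__main__":
    print("collisions:", count_collisions(keys, stand_in_hash32))
```

Running this on the real key set, and comparing the count with the birthday estimate for the same number of keys and hash width, gives a rough indication of whether the hash behaves close to an ideal function on that data.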
When MurmurHash is used as a deterministic function (without randomization), then the answer is that you can find two keys that always collide. With 100% probability. How do I know this? Simply because there are more strings that you can hash than there are hash values. So you must have collisions. So maybe you randomize MurmurHash... MurmurHash is at best pairwise universal because it is an iterated hash function...
Even if you could randomize MurmurHash, it is not clear that there are known bounds on the universality of randomized MurmurHash. If you just want such proofs, there are related functions with formal proofs, see...
... but MurmurHash was not designed with universality in mind. Or else, maybe you mean something else... maybe you mean regularity.
I don't know that it is regular, however. So I don't know what you mean.
@lemire I think her question was not whether a collision is possible (that is pretty clear), but rather how large the chance of hitting a collision is on some set of keys, e.g. between 2 different text messages (and this probability is definitely not 100% 😄), and it is also not too large even if the keys are only pseudo-random or not random at all. This also depends heavily on the data set used as well as on the set size (in other words, on the number of hashes picked).
I don't know what the question was... so I am speculating... hoping that it will help clear up the question...
Hi,
I have to choose a hash function for a Bloom filter in my Bachelor's thesis.
As recommended in some tutorials, I used a version of MurmurHash3.
My supervisor wants me to find the value of the collision property, but I cannot find it in the documentation.
I have read that it is low, but I have not found an actual value.
If someone could help me find the collision property, I'd be very grateful.
Thanks,
Julia