Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardize hash algorithm used by strategies #247

Closed
ivarconr opened this issue Jun 29, 2017 · 9 comments
Closed

Standardize hash algorithm used by strategies #247

ivarconr opened this issue Jun 29, 2017 · 9 comments

Comments

@ivarconr
Copy link
Member

ivarconr commented Jun 29, 2017

A few of the strategy implementations in the various clients uses a hash-algoritm to normalize values in to a number between 0 and 100, as described in docs/activation-strategies.md.

I believe we should make an effort in to defining how client SDKs should hash and normalize and introduce common test vectors , as suggested in #217. Standardizing this across client SDKs allows for a more predicable outcome across services and webapps.

Because hashing is not used for crypto, only for providing a randomized, but repeatable, number between 0 and 100 we should focus on a fast algorithm which is easy to implement across languages. It must also provide enough randomness to provide an "even" distribution.

Some viable alternatives:

@ivarconr
Copy link
Member Author

ivarconr commented Sep 8, 2017

If we go for the last 4 md5 hex-numbers we can replicate the same implementation from the node-client in Java with the following code:

            String s="group:123";
            MessageDigest m=MessageDigest.getInstance("MD5");
            m.update(s.getBytes(),0,s.length());
            String hexVersion = new BigInteger(1,m.digest()).toString(16);
            int value = Integer.parseInt(hexVersion.substring(hexVersion.length() - 4), 16);
            int normalized =  value % 100;
            System.out.println(normalized);

I have no idea atm of the performance of doing the md5 thing on java.

@ivarconr
Copy link
Member Author

ivarconr commented Sep 29, 2017

I have done some performance testing on the above java-impl (with 5 iteraions varump)

  • 1000 iterations: 0.039ms per calculation
  • 10000 iterations: 0.0097ms per calculation
  • 100000 iterations: 0.00289ms per calculation
  • 1000000 iterations: 0.001533ms per calculation

@ivarconr
Copy link
Member Author

ivarconr commented Sep 29, 2017

I have tested the 32bit murmor3 implementation found in https://github.com/sangupta/murmur, a small lib doing performing the murmor-hashing without external dependencies 👍 .

Performance seems to be around 1.35E-4ms per calculation from 1000 iterations and above.

It also seems to provide fair amount of randomness needed.

@ivarconr
Copy link
Member Author

ivarconr commented Oct 3, 2017

For Node.js i have tested murmurhash3js, it provides the same hash and has similar performance as the java implementation.

My suggestion is thus to switch to 32bit Murmur3 hash algorithm in order to standardise hashing across client implementations.

@sveisvei
Copy link
Contributor

sveisvei commented Oct 3, 2017

@ivarconr good stuff!

@ivarconr
Copy link
Member Author

ivarconr commented Oct 3, 2017

@jrbarron would be nice to have your thoughts on using 32 bit Murmur3 in the go-client also? By aligning how we has and normalize IDs we can provide a consistent experience across platforms.

Seems to be a few different options:
https://golanglibs.com/top?q=murmur

@jrbarron
Copy link

jrbarron commented Oct 4, 2017

@ivarconr I don't know enough about Murmur3 to debate the merits of the hash algorithm compared to md5, but I see no problems switching to using this in the Go version. As you pointed out there seem to be several implementations lying around, the most popular being this one: https://github.com/spaolacci/murmur3 I'll create an issue in the Go repo now.

One question, is this something that should be done for the upcoming version 3 of the clients? Or is that irrelevant because this is simply an implementation detail?

@ivarconr
Copy link
Member Author

ivarconr commented Oct 4, 2017 via email

@ivarconr
Copy link
Member Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants