reverseproxy: Improve hostByHashing distribution #5229
* If the upstreams all use the same host but different ports (e.g. foobar:4001, foobar:4002, foobar:4003, ...), the hostByHashing result is not well balanced across the upstreams, because FNV-1a does not have a strong enough avalanche effect. Since the last byte of the FNV input tends to affect only a few bits of the hash, the idea is to swap the concatenation order of the key and the upstream strings, so that the upstream's last byte has more impact on hash diffusion.
Oh wow! Thanks for digging into this. I had no idea about that property of this hash function. (But I have always been skeptical of it.)
I've recently discovered Blake3: https://github.com/BLAKE3-team/BLAKE3 -- and I'm using it for one of my projects (but I have not analyzed it myself yet).
I can merge this PR for starters, but I'd be interested in your thoughts on using Blake3 for this (maybe a short output length) instead of FNV.
Thank you again, I really appreciate this contribution!
Blake3 is a cryptographic hash, maybe something like aHash could be a better fit? - https://github.com/tkaitchuck/ahash
@tirz (That's written in Rust, not Go -- I'd have to find a Go implementation.) Sure, but it's still way faster than SHA-1 for instance. If it's fast, does it matter that it's cryptographic?
Well, no, but a cryptographic hash is usually slower than a hash without this constraint. I guess we just need a good cascading effect for our use-case. It's just that I already tried aHash and Blake3 as a hashing algorithm for a HashMap and aHash was significantly faster. But since I do not know the Go ecosystem, I do not have any lib to recommend.
No worries. We appreciate your participation!
Small change, but long story. It's not easy to explain, sorry if it's not clear, feel free to ask questions.
I recently upgraded Caddy from 2.4.x to 2.6.2. Unfortunately, the load balancing became quite unbalanced, as
you can see on this graph:
I made some test code to expose this behaviour:
Here is the output:
There is clearly something wrong: 46% of requests are going to the same upstream.
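For readers who want to reproduce this kind of measurement, here is a minimal sketch. It is not the test code referred to above. It assumes that hostByHashing() picks the upstream whose FNV-1a hash of the key combined with the upstream string is the highest, and it uses made-up client keys. The names `fnv32a`, `pickHighest` and `distribution` are hypothetical helpers, not Caddy functions.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// fnv32a returns the 32-bit FNV-1a hash of s (standard library implementation).
func fnv32a(s string) uint32 {
	h := fnv.New32a()
	_, _ = h.Write([]byte(s))
	return h.Sum32()
}

// pickHighest returns the index of the upstream with the highest combined
// hash. This is an assumed model of the selection, not Caddy's code.
// keyFirst chooses the order: key+upstream (true) or upstream+key (false).
func pickHighest(upstreams []string, key string, keyFirst bool) int {
	best, bestHash := -1, uint32(0)
	for i, up := range upstreams {
		var h uint32
		if keyFirst {
			h = fnv32a(key + up)
		} else {
			h = fnv32a(up + key)
		}
		if best == -1 || h > bestHash {
			best, bestHash = i, h
		}
	}
	return best
}

// distribution counts how often each upstream is selected for n synthetic
// keys shaped like client IPs (the keys are invented for illustration).
func distribution(upstreams []string, n int, keyFirst bool) []int {
	counts := make([]int, len(upstreams))
	for i := 0; i < n; i++ {
		key := fmt.Sprintf("10.%d.%d.%d", (i>>16)&0xff, (i>>8)&0xff, i&0xff)
		counts[pickHighest(upstreams, key, keyFirst)]++
	}
	return counts
}

func main() {
	upstreams := []string{"foobar:4001", "foobar:4002", "foobar:4003", "foobar:4004"}
	fmt.Println("key+upstream:", distribution(upstreams, 100000, true))
	fmt.Println("upstream+key:", distribution(upstreams, 100000, false))
}
```

Comparing the two printed slices shows how evenly each concatenation order spreads the synthetic keys. The exact shares depend on the keys and on the number of upstreams.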
I wrote another test to pinpoint the issue:
Output:
As you can see, when using the same kind of concatenation between field and upstream as in the hostByHashing() function,
a lot of bits are the same across all hashes.
But if you swap the concatenation (as done in this commit), the hashes are well diffused.
My point is that the last byte pushed to FNV-1a tends to affect only a few bits of the resulting hash. The idea is to change
the concatenation order between the key and the upstream strings, so that the upstream's last byte has
more impact on the hash diffusion.
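The last-byte effect can be checked with a small sketch. FNV-1a processes each byte as `h = (h ^ b) * prime`, so two inputs that differ only in their final byte end up with hashes whose difference is a small multiple of the 32-bit FNV prime (16777619). The helper `lastByteDelta` and the key `10.0.0.1` are hypothetical and only serve the illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// fnvPrime32 is the 32-bit FNV prime (0x01000193).
const fnvPrime32 = 16777619

// fnv32a returns the 32-bit FNV-1a hash of s.
func fnv32a(s string) uint32 {
	h := fnv.New32a()
	_, _ = h.Write([]byte(s))
	return h.Sum32()
}

// lastByteDelta looks for a small d (|d| <= 255) such that
// a - b == d * fnvPrime32 modulo 2^32. Such a d exists when a and b are
// FNV-1a hashes of inputs that differ only in their last byte.
func lastByteDelta(a, b uint32) (d int, ok bool) {
	diff := a - b // wraps modulo 2^32
	for d = -255; d <= 255; d++ {
		if uint32(int32(d))*fnvPrime32 == diff {
			return d, true
		}
	}
	return 0, false
}

func main() {
	key := "10.0.0.1" // made-up key for illustration
	upstreams := []string{"foobar:4001", "foobar:4002", "foobar:4003"}

	fmt.Println("key+upstream (differing byte is hashed last):")
	for _, up := range upstreams {
		fmt.Printf("  %032b  %s\n", fnv32a(key+up), up)
	}
	fmt.Println("upstream+key (differing byte is followed by the key):")
	for _, up := range upstreams {
		fmt.Printf("  %032b  %s\n", fnv32a(up+key), up)
	}

	d, ok := lastByteDelta(fnv32a(key+upstreams[0]), fnv32a(key+upstreams[1]))
	fmt.Println("difference as a multiple of the FNV prime:", d, ok)
}
```

When the differing byte comes last, nothing is hashed after it, so the difference between the candidates stays a small multiple of the prime. With the upstream first, every key byte that follows goes through another XOR-and-multiply round, which spreads that difference further.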
I think there is little to no performance risk in making this change.
Another solution would be to change the hash method, but that could have more impact.
Some references to support my point:
https://softwareengineering.stackexchange.com/questions/273809/should-changes-to-fnv-1as-input-exhibit-the-avalanche-effect
HAProxy implements an avalanche algorithm on the
result of the hashing function: https://github.com/haproxy/haproxy/blob/master/doc/internals/hashing.txt
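As a rough illustration of that alternative (mixing the hash result instead of changing the input order), the sketch below post-processes the FNV-1a output with the 32-bit finalizer from MurmurHash3. This finalizer is a stand-in chosen for the example. It is not HAProxy's avalanche function and not something this PR adds to Caddy. `fmix32` and the sample key are assumptions.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// fnv32a returns the 32-bit FNV-1a hash of s.
func fnv32a(s string) uint32 {
	h := fnv.New32a()
	_, _ = h.Write([]byte(s))
	return h.Sum32()
}

// fmix32 is the 32-bit finalization mix from MurmurHash3. It is used here
// only as an example of an avalanche step applied to a finished hash.
func fmix32(h uint32) uint32 {
	h ^= h >> 16
	h *= 0x85ebca6b
	h ^= h >> 13
	h *= 0xc2b2ae35
	h ^= h >> 16
	return h
}

func main() {
	key := "10.0.0.1" // made-up key for illustration
	for _, up := range []string{"foobar:4001", "foobar:4002", "foobar:4003"} {
		raw := fnv32a(key + up) // original key+upstream order
		fmt.Printf("%s raw=%032b mixed=%032b\n", up, raw, fmix32(raw))
	}
}
```

The trade-off is the one mentioned above: the extra mixing costs a few more operations per upstream and changes the hash function's output, whereas swapping the concatenation order keeps plain FNV-1a.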