Implement a universal hash function #528
A slight variation of https://en.wikipedia.org/wiki/Universal_hashing#Avoiding_modular_arithmetic without the need for 128bit arithmetic. The 64bit hash is folded into 32bit, expanded into 64bit using two random numbers, and compressed into nb_bits using a fast power-of-2 modulo and div.

Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
therault left a comment:
LGTM. There is a line I think we can remove (see comment below), but otherwise, I think it should be merged.
Removals are consistently slower, and not marginally so. This is counter-intuitive: the average bucket length decreases and so does the number of collisions, so one would expect removal (and insertion) times to decrease as well. Why is that not the case?
For random keys, the difference is marginal, sometimes faster, sometimes slower. For 100k structured keys, the universal hash is slower because buckets are more tightly packed, so there are on average longer lists to traverse. For 1M structured keys, the universal hash is an order of magnitude faster in removals because the lists grow longer with the current hash (up to 27 and 6 more levels than the universal hash).
The rehashing function in the hash table does a poor job of scrambling bits for structured keys. At least in TTG, keys are typically generated from a 2D or 3D index, e.g., by shifting the key dimensions into a 64bit value. This PR implements a variation of [1] that avoids 128bit arithmetic by first folding the key into 32bit. That means we lose some information on the key, but I expect it to perform better than the original rehash.
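For reference, here is a minimal C sketch of that scheme (not the exact code in this PR): the names fold32, universal_hash, uhash_a, and uhash_b, the constants, and the XOR fold are assumptions; the structure follows the description above and [1].

```c
#include <stdint.h>

/* Hypothetical parameters; in practice a and b would be drawn from a
 * random number generator, with a chosen odd. */
static uint64_t uhash_a = 0x9e3779b97f4a7c15ULL; /* placeholder random odd value */
static uint64_t uhash_b = 0x60642e2a34326f15ULL; /* placeholder random value */

/* Fold the 64bit key into 32bit (the XOR fold is an assumption). */
static inline uint32_t fold32(uint64_t key)
{
    return (uint32_t)(key ^ (key >> 32));
}

/* Multiply-add-shift into 2^nb_bits buckets (1 <= nb_bits <= 32):
 * the 64bit multiply-add wraps around modulo 2^64 for free (the fast
 * power-of-2 modulo), and the right shift keeps the top nb_bits of the
 * result (the power-of-2 div). */
static inline uint32_t universal_hash(uint64_t key, int nb_bits)
{
    uint64_t x = fold32(key);
    return (uint32_t)((uhash_a * x + uhash_b) >> (64 - nb_bits));
}
```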
This PR also extends the hash benchmark with options to provide an initial seed for random keys and to force 3D key generation, shifting three counters into a single 64bit value.
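As an illustration, the 3D key generation could pack the counters along these lines; the field widths (21 bits per dimension) and the helper name are assumptions, not necessarily what the benchmark uses.

```c
#include <stdint.h>

/* Hypothetical packing of three counters into one 64bit key,
 * ~21 bits per dimension; the benchmark's actual widths may differ. */
static inline uint64_t key_from_3d(uint64_t i, uint64_t j, uint64_t k)
{
    return (i << 42) | (j << 21) | k;
}
```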
Some measurements
In all cases, we generate 100k keys, with 5 repetitions and a single thread.
Random keys
Current Hash
Universal Hash
The universal hash function seems to cause one more bucket to be used, although the average is roughly comparable. Performance seems slightly lower in some cases, but nothing I would consider concerning. The keys are already random, so this might be a result of the folding from 64bit to 32bit we do on the input key. I don't think our keys are ever truly random.
Structured keys
Instead of a random key generator, here we generate keys in a structured 3D key space. This should be closer to what we see in applications.
Current Hash
Universal Hash
Note that the universal hash requires two fewer buckets (10 instead of 12) and generally has a higher minimum and average bucket length. Insertion and removal times are naturally slower than before because buckets are longer, i.e., there are longer element lists to traverse. However, the tighter packing means we are less likely to run into the maximum bucket size, at which point buckets can become really long.
Structured keys (1M)
Current Hash
Note that the current hash function exhausts the default bucket count (8M). This will likely impact performance negatively.
Universal Hash
No excessive collisions here, and decent average length.
[1] https://en.wikipedia.org/wiki/Universal_hashing#Avoiding_modular_arithmetic