Join GitHub today
GitHub is home to over 36 million developers working together to host and review code, manage projects, and build software together.Sign up
MurmurHash3 information and brief performance results
(Note - this information is a bit stale, and refers to an earlier version of Murmur3. I'll rewrite it when I have time.)
MurmurHash3 is the successor to MurmurHash2. It comes in 3 variants - a 32-bit version that targets low latency for hash table use and two 128-bit versions for generating unique identifiers for large blocks of data, one each for x86 and x64 platforms.
MurmurHash3's mix functions are based on this snippet -
k *= c1; k = rotl(k,r1); k *= c2; h ^= k; h = rotl(h,r1); h = h*m1+n1;
k is a block of the key,
h is a block of the hash state, and
cN are constants.
For each block of the key, we pre-mix it using two constants and a rotate, xor it into the hash block, and them mix the hash block using a rotate and a multiply-add.
MurmurHash3's 32-bit finalizer is
h ^= h >> 16; h *= 0x85ebca6b; h ^= h >> 13; h *= 0xc2b2ae35; h ^= h >> 16;
and its 64-bit finalizer is
h ^= h >> 33; h *= 0xff51afd7ed558ccd; h ^= h >> 33; h *= 0xc4ceb9fe1a85ec53; h ^= h >> 33;
The constants for the finalizers were generated by a simple simulated-annealing algorithm, and both avalanche all bits of
h to within 0.25% bias.
The 128-bit variants mix multiple blocks of key data in parallel. To ensure that all the intermediate hash blocks affect each other, Murmurhash3 does a few simple operations interleaved with the block mix -
The 64-bit, 2-block inter-mix is
h1 += h2; h2 = _rotl64(h2,41); h2 += h1;
The 32-bit, 4-block inter-mix is
h1 += h2; h1 += h3; h1 += h4; h1 = _rotl(h1,17); h2 += h1; h3 += h1; h4 += h1;
hN is one block of the hash value.
Bulk speed test, hashing an 8-byte-aligned 256k block
Results are from an Intel Core 2 Quad Q9650 running at 3.0 ghz, running on a single core.
|SuperFastHash_x86_32||1224 mb/sec (1)|
|MurmurHash2_x86_64||3352 mb/sec (2)|
|MurmurHash3_x64_128||5058 mb/sec (3)|
(1) - SuperFastHash has very poor collision properties, which have been documented elsewhere.
(2) - MurmurHash2_x86_64 computes two 32-bit results in parallel and mixes them at the end, which is fast but means that collision resistance is only as good as a 32-bit hash. I suggest avoiding this variant.
(3) - That's about 1.68 bytes per cycle, or about 9.5 cycles per 16-byte chunk. The inner loop is 20 instructions long, so we're sustaining over 2 instructions per cycle. Hooray for modern platforms with fast 64-bit multipliers and superscalar architectures. :)