Skip to content
Branch: master
Clone or download
Pull request Compare This branch is 7 commits ahead, 47 commits behind rurban:master.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
doc
t1ha
.appveyor.yml
.gitignore
.travis.yml
AvalancheTest.cpp
AvalancheTest.h
Bitslice.cpp
Bitvec.cpp
Bitvec.h
CMakeLists.txt
City.cpp
City.h
CityCrc.h
CityTest.cpp
DifferentialTest.cpp
DifferentialTest.h
FarmTest.cc
Hashes.cpp
Hashes.h
KeysetTest.cpp
KeysetTest.h
MurmurHash1.cpp
MurmurHash1.h
MurmurHash2.cpp
MurmurHash2.h
MurmurHash3.cpp
MurmurHash3.h
PMurHash.c
PMurHash.h
Platform.cpp
Platform.h
README.md
Random.cpp
Random.h
SpeedTest.cpp
SpeedTest.h
Spooky.cpp
Spooky.h
SpookyTest.cpp
Stats.cpp
Stats.h
SuperFastHash.cpp
Types.cpp
Types.h
build-msys2.bat
build.sh
build32.sh
cmetrohash.h
cmetrohash64.c
crc.cpp
crc32-generated-constants.cpp
crc32_hw.c
crc32_hw1.c
crc32c.cpp
crc32c.h
falkhash-elf64.o
falkhash-macho64.o
falkhash.asm
farmhash-c-test.cc
farmhash-c.c
farmhash-c.h
farmhash.cc
farmhash.h
fasthash.cpp
fasthash.h
fhtw-elf64.o
fhtw-macho64.o
fhtw.asm
fhtw.h
halfsiphash.c
hasshe2.c
jody_hash32.c
jody_hash32.h
jody_hash64.c
jody_hash64.h
log.hashes
log.speed
lookup3.cpp
main.cpp
md5.cpp
metrohash.h
metrohash128.cpp
metrohash128crc.cpp
metrohash64.cpp
metrohash64crc.cpp
mum.cc
mum.h
opt_cmetrohash.h
opt_cmetrohash64_1.c
os_dependent_stuff.asm
sha1.cpp
sha1.h
siphash.c
siphash.h
siphash_impl.h
siphash_sse2.c
siphash_ssse3.c
speed.sh
speedall.sh
split.pl
t1ha.h
testall.sh
testpar.sh
testspeed.sh
wyhash.h
xxh3.h
xxhash.c
xxhash.h

README.md

SMhasher

Hash function MiB/sec cycles/hash Quality problems
donothing32 12954965.31 4.64 test NOP
donothing64 9224792.69 4.72 test NOP
donothing128 9074269.36 4.47 test NOP
NOP_OAAT_read64 3909.83 16.34 test NOP
BadHash 900.52 58.13 test FAIL
sumhash 20379.60 21.60 test FAIL
sumhash32 74930.18 11.33 test FAIL
--------------------------------------
crc32 544.84 92.65 insecure, 8589.93x collisions, distrib
md5_32a 399.30 550.28 8589.93x collisions, distrib
sha1_32a 648.72 850.86 collisions, 36.6% distrib
hasshe2 2054.39 76.60 insecure, fails all tests
crc32_hw 9310.60 24.63 insecure, 100% bias, collisions, distrib, machine-specific (x86 SSE4.2)
crc32_hw1 30145.28 30.98 insecure, 100% bias, collisions, distrib, machine-specific (x86 SSE4.2)
crc64_hw 10181.36 23.46 insecure, 100% bias, collisions, distrib, machine-specific (x86_64 SSE4.2)
fibonacci 20706.41 19.38 zeros, fails all tests
FNV1a 937.85 56.63 zeros, fails all tests
FNV1a_YT 15425.31 21.51 fails all tests
FNV64 893.70 56.94 fails all tests
FNV2 7885.43 28.71 fails all tests
fletcher2 26241.42 19.01 fails all tests
fletcher4 9526.54 19.53 fails all tests
bernstein 1261.71 46.20 fails all tests
sdbm 929.94 54.35 fails all tests
x17 1174.99 53.62 99.98% bias, fails all tests
JenkinsOOAT 736.83 86.43 53.5% bias, fails all tests
JenkinsOOAT_pl 741.41 71.78 1.5-11.5% bias, 7.2x collisions
MicroOAAT 819.52 63.24 100% bias, distrib
jodyhash32 1864.24 34.43 bias, collisions, distr
jodyhash64 3764.44 30.13 bias, collisions, distr
lookup3 3133.07 30.65 28% bias, collisions, 30% distr
superfast 2918.47 34.62 91% bias, 5273.01x collisions, 37% distr
MurmurOAAT 669.46 74.54 collisions, 99.998% distr
Crap8 3962.47 26.98 2.42% bias, collisions, 2% distrib
Murmur2 3712.67 31.32 1.7% bias, 81x coll, 1.7% distrib
Murmur2A 3875.69 34.80 12.7% bias
Murmur2B 7336.97 34.43 1.8% bias, collisions, 3.4% distrib
Murmur2C 5050.30 34.10 91% bias, collisions, distr
HalfSipHash 1105.93 79.31 zeroes
SipHash13 2124.26 84.20 0.9% bias
--------------------------------------
SipHash 1124.24 115.72
GoodOAAT 1237.86 52.75
PMurHash32 3070.82 44.13
Murmur3A 3166.17 38.70
Murmur3C 3404.34 48.65
Murmur3F 6853.24 38.15
fasthash32 6693.98 36.67
fasthash64 7020.23 35.51
MUM 12790.65 29.63 machine-specific (32/64 differs)
MUMlow 12973.97 34.67
MUMhigh 12465.58 35.64
City32 5802.91 40.73
City64noSeed 14124.43 29.41
City64 13964.74 39.84 2 minor collisions
City64low 13888.28 45.66
City64high 14340.83 45.69
City128 16000.61 45.74
CityCrc128 19348.29 48.34
FarmHash64 14899.89 42.41
FarmHash128 15998.86 58.12
FarmHash32 24831.45 24.99 machine-specific (x86_64 SSE4/AVX)
farmhash32_c 24647.21 25.36 machine-specific (x86_64 SSE4/AVX)
farmhash64_c 14967.76 42.01
farmhash128_c 15097.31 61.00
xxHash32 7297.39 36.91 collisions with 4bit diff
xxHash64 14879.09 44.11
xxh3 43021.12 26.00
xxh3low 37670.92 26.39
xxh3high 45932.74 27.29
xxh128 44407.83 26.89
xxh128low 43661.23 25.86
xxh128high 41762.05 28.81
Spooky32 14213.99 48.03
Spooky64 14839.81 46.62
Spooky128 14833.63 47.50
metrohash64_1 15491.63 37.56
metrohash64_2 16316.71 37.82
metrohash128_1 15902.48 43.79
metrohash128_2 16374.31 46.21
metrohash64crc_1 25856.64 41.24 cyclic collisions 8 byte, machine-specific (x64 SSE4.2)
metrohash64crc_2 26450.83 39.74 cyclic collisions 8 byte, machine-specific (x64 SSE4.2)
metrohash128crc_1 25404.51 49.07 machine-specific (x64 SSE4.2)
metrohash128crc_2 25248.70 49.57 machine-specific (x64 SSE4.2)
cmetrohash64_1o 15997.88 37.07 LongNeighbors
cmetrohash64_1 17094.64 36.85
cmetrohash64_2 15952.55 38.87
falkhash 37742.94 108.59 LongNeighbors, machine-specific (x86_64 AES-NI)
t1ha2_atonce 14747.01 36.09
t1ha2_stream 6376.62 82.41
t1ha2_atonce128 14007.67 50.51
t1ha2_stream128 7274.53 93.11
t1ha1_64le 16255.08 27.63
t1ha1_64be 12285.15 28.84
t1ha0_32le 8866.04 36.95
t1ha0_32be 7859.07 38.63
t1ha0_aes_noavx 21264.27 35.63 machine-specific (x86 AES-NI)
t1ha0_aes_avx1 20443.32 36.05 machine-specific (x64 AVX)
t1ha0_aes_avx2 36436.51 36.31 machine-specific (x64 AVX2)
wyhash 10315.00 25.10

Summary

I added some SSE assisted hashes and fast intel/arm CRC32-C and AES HW variants, but not the fastest crcutil yet. See our crcutil results. See also the old https://code.google.com/p/smhasher/w/list.

So the fastest hash functions on x86_64 without quality problems are:

  • wyhash
  • xxh3
  • t1ha
  • Metro (but not 64crc yet, WIP)
  • FarmHash (not portable, too machine specific: 64 vs 32bit, old gcc, ...)
  • Spooky32
  • fasthash
  • MUM (machine specific, mum: different results on 32/64-bit archs)

Hash functions for symbol tables or hash tables typically use 32 bit hashes, for databases, file systems and file checksums typically 64 or 128bit, for crypto now starting with 256 bit.

Typical median key size in perl5 is 20, the most common 4. Similar for all other dynamic languages. See github.com/rurban/perl-hash-stats

When used in a hash table the instruction cache will usually beat the CPU and throughput measured here. In my tests the smallest FNV1A beats the fastest crc32_hw1 with Perl 5 hash tables. Even if those worse hash functions will lead to more collisions, the overall speed advantage beats the slightly worse quality. See e.g. A Seven-Dimensional Analysis of Hashing Methods and its Implications on Query Processing for a concise overview of the best hash table strategies, confirming that the simplest Mult hashing (bernstein, FNV*, x17, sdbm) always beat "better" hash functions (Tabulation, Murmur, Farm, ...) when used in a hash table.

The fast hash functions tested here are recommendable as fast for file digests and maybe bigger databases, but not for 32bit hash tables. The "Quality problems" lead to less uniform distribution, i.e. more collisions and worse performance, but are rarely related to real security attacks, just the 2nd sanity zeroes test against \0 invariance is security relevant.

Other

TODO

Some popular SSE-improved FNV1 (sanmayce) variants, fletcher (ZFS), ... and slower cryptographic hashes or more secure hashes are still missing. BLAKE2, SHA-2, SHA-3 (Keccak), Grøstl, JH, Skein, ...

SECURITY

The hash table attacks described in SipHash against City, Murmur or Perl JenkinsOAAT or at Hash Function Lounge are not included here.

Such an attack avoidance cannot be the problem of the hash function, but the hash table collision resolution scheme. You can attack every single hash function, even the best and most secure if you detect the seed, e.g. from language (mis-)features, side-channel attacks, collision timings and independly the sort-order, so you need to protect your collision handling scheme from the worst-case O(n), i.e. separate chaining with linked lists. Linked lists chaining allows high load factors, but is very cache-unfriendly. The only recommendable linked list scheme is inlining the key or hash into the array. Nowadays everybody uses fast open addressing, even if the load factor needs to be ~50%, unless you use Cuckoo Hashing.

I.e. the usage of SipHash for their hash table in Python 3.4, ruby, rust, systemd, OpenDNS, Haskell and OpenBSD is pure security theatre. SipHash is not secure enough for security purposes and not fast enough for general usage. Brute-force generation of ~32k collisions need 2-4m for all these hashes. siphash being the slowest needs max 4m, other typically max 2m30s, with <10s for practical 16k collision attacks with all hash functions. Using Murmur is usually slower than a simple Mult, even in the worst case. Provable secure is only uniform hashing, i.e. 2-5 independent Mult or Tabulation, or using a guaranteed logarithmic collision scheme (a tree) or a linear collision scheme, such as Robin Hood or Cockoo hashing with collision counting.

One more note regarding security: Nowadays even SHA1 can be solved in a solver, like Z3 (or faster ones) for practical hash table collision attacks (i.e. 14-20 bits). So all hash functions with less than 256 bits tested here cannot be considered "secure" at all.

The '\0' vulnerability attack with binary keys is tested in the 2nd Sanity test.

You can’t perform that action at this time.