Skip to content

Logan007/pearsonC

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 

Repository files navigation

Pearson CRC32c Hashing

As discussed over at Skeeto's Hash Prospector, a CRC32c-MUL-CRC32c scheme seemingly forms a low-bias permutation on 32-bit values. Embedding it into a Pearson scheme as used with Pearson B. Hashing, we get a new hashing function. Following the naming series, let's call it Pearson C. Hashing.

Hope is that the CRC32c instruction as found at Intel comptaible CPU's starting with SSE 4.2 support will accelerate hashing a lot.

Permutation

We use an all-same constant version as it has a comparable bias to an all-different constant version. This way, we might save some cycles loading constants from cache or memory.

uint32_t cmc (uint32_t x) {
    
    x = _mm_crc32_u32(x, 0x941325ab);
    x *= 0x941325ab;
    x = _mm_crc32_u32(x, 0x941325ab);
    
    return x;
}

This permutation passes SmallCrunch (15/15), masters Crunch (140/144) and does well on BigCrush (141/160).

========= Summary results of SmallCrush =========

 Version:          TestU01 1.2.3
 Generator:        CRC32-MUL-CRC32
 Number of statistics:  15
 Total CPU time:   00:00:04.42

 All tests were passed
 ========= Summary results of Crush =========

 Version:          TestU01 1.2.3
 Generator:        CRC32-MUL-CRC32
 Number of statistics:  144
 Total CPU time:   00:19:25.30
 The following tests gave p-values outside [0.001, 0.9990]:
 (eps  means a value < 1.0e-300):
 (eps1 means a value < 1.0e-15):

       Test                          p-value
 ----------------------------------------------
 41  MaxOft, t = 5                  1 -  3.2e-7
 42  MaxOft, t = 10                 1 - 1.7e-13
 43  MaxOft, t = 20                 1 - 8.4e-13
 44  MaxOft, t = 30                 1 - 5.1e-13
 ----------------------------------------------
 All other tests were passed
========= Summary results of BigCrush =========

 Version:          TestU01 1.2.3
 Generator:        CRC32-MUL-CRC32
 Number of statistics:  160
 Total CPU time:   02:27:12.07
 The following tests gave p-values outside [0.001, 0.9990]:
 (eps  means a value < 1.0e-300):
 (eps1 means a value < 1.0e-15):

       Test                          p-value
 ----------------------------------------------
 30  CouponCollector, r = 0         4.0e-11
 31  CouponCollector, r = 10         1.1e-6
 32  CouponCollector, r = 20        7.8e-10
 33  CouponCollector, r = 27        1.4e-13
 34  Gap, r = 0                       eps  
 35  Gap, r = 25                      eps  
 36  Gap, r = 0                       eps  
 37  Gap, r = 20                      eps  
 41  Permutation, t = 5               eps  
 43  Permutation, t = 10              eps  
 46  MaxOft, t = 8                  1 - eps1
 47  MaxOft, t = 16                 1 - eps1
 48  MaxOft, t = 24                 1 - eps1
 49  MaxOft, t = 32                 1 - eps1
 58  AppearanceSpacings, r = 27      5.6e-7
 65  SumCollector                     eps  
 89  PeriodsInStrings, r = 20        0.9995 
 101  Run of bits, r = 0              2.7e-6
 102  Run of bits, r = 27             8.0e-4
 ----------------------------------------------
 All other tests were passed

Further Steps

A few tests are still missing such as the comprehensive evaluation of the whole scheme (not just the permutation) by SMHasher – to follow soon. Same for the full C implementation.

About

Pearson CRC32c Hashing

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published