input size: 201326592
scalar (32 bit) ... 0.67659
scalar (64 bit) ... 0.58455 (speedup 1.16)
SWAR (64 bit) ... 0.84144 (speedup 0.80)
SSE (lookup: naive) ... 0.35677 (speedup 1.90)
SSE (lookup: other improved) ... 0.33010 (speedup 2.05)
SSE (lookup: improved) ... 0.32233 (speedup 2.10)
SSE (lookup: pshufb-based) ... 0.54131 (speedup 1.25)
SSE (lookup: pshufb improved) ... 0.41217 (speedup 1.64)
SSE (lookup: other improved, unrolled) ... 0.32264 (speedup 2.10)
SSE (lookup: improved, unrolled) ... 0.30930 (speedup 2.19)
SSE (lookup: pshufb-based, unrolled) ... 0.52263 (speedup 1.29)
SSE (lookup: pshufb improved unrolled) ... 0.37436 (speedup 1.81)
SSE (fully unrolled improved lookup) ... 0.29208 (speedup 2.32)
SSE & BMI2 (lookup: naive) ... 0.34828 (speedup 1.94)
SSE & BMI2 (lookup: improved) ... 0.36099 (speedup 1.87)
SSE & BMI2 (lookup: pshufb improved) ... 0.38641 (speedup 1.75)
AVX2 (lookup: improved) ... 0.31569 (speedup 2.14)
AVX2 (lookup: improved, unrolled) ... 0.30359 (speedup 2.23)
AVX2 (lookup: pshufb-based) ... 0.44837 (speedup 1.51)
AVX2 (lookup: pshufb-based, unrolled) ... 0.45462 (speedup 1.49)
AVX2 (lookup: pshufb improved) ... 0.34897 (speedup 1.94)
AVX2 (lookup: pshufb unrolled improved) ... 0.35050 (speedup 1.93)
AVX2 & BMI (lookup: pshufb improved) ... 0.59088 (speedup 1.15)
AVX512 (incremental logic) ... 0.12472 (speedup 5.42)
AVX512 (incremental logic improved) ... 0.11351 (speedup 5.96)
AVX512 (incremental logic improved with gather load) ... 0.12821 (speedup 5.28)
AVX512 (binary search) ... 0.12529 (speedup 5.40)
AVX512 (gather) ... 0.25702 (speedup 2.63)
input size: 201326592
scalar (32 bit) ... 0.67584
scalar (64 bit) ... 0.58457 (speedup 1.16)
SWAR (64 bit) ... 0.84177 (speedup 0.80)
SSE (lookup: naive) ... 0.35682 (speedup 1.89)
SSE (lookup: other improved) ... 0.33041 (speedup 2.05)
SSE (lookup: improved) ... 0.32218 (speedup 2.10)
SSE (lookup: pshufb-based) ... 0.54043 (speedup 1.25)
SSE (lookup: pshufb improved) ... 0.41220 (speedup 1.64)
SSE (lookup: other improved, unrolled) ... 0.32282 (speedup 2.09)
SSE (lookup: improved, unrolled) ... 0.30934 (speedup 2.18)
SSE (lookup: pshufb-based, unrolled) ... 0.52266 (speedup 1.29)
SSE (lookup: pshufb improved unrolled) ... 0.37419 (speedup 1.81)
SSE (fully unrolled improved lookup) ... 0.29228 (speedup 2.31)
SSE & BMI2 (lookup: naive) ... 0.34835 (speedup 1.94)
SSE & BMI2 (lookup: improved) ... 0.36082 (speedup 1.87)
SSE & BMI2 (lookup: pshufb improved) ... 0.38654 (speedup 1.75)
AVX2 (lookup: improved) ... 0.31560 (speedup 2.14)
AVX2 (lookup: improved, unrolled) ... 0.30334 (speedup 2.23)
AVX2 (lookup: pshufb-based) ... 0.44785 (speedup 1.51)
AVX2 (lookup: pshufb-based, unrolled) ... 0.45432 (speedup 1.49)
AVX2 (lookup: pshufb improved) ... 0.34887 (speedup 1.94)
AVX2 (lookup: pshufb unrolled improved) ... 0.35022 (speedup 1.93)
AVX2 & BMI (lookup: pshufb improved) ... 0.59034 (speedup 1.14)
AVX512 (incremental logic) ... 0.12477 (speedup 5.42)
AVX512 (incremental logic improved) ... 0.11466 (speedup 5.89)
AVX512 (incremental logic improved with gather load) ... 0.12943 (speedup 5.22)
AVX512 (binary search) ... 0.12574 (speedup 5.38)
AVX512 (gather) ... 0.25687 (speedup 2.63)
input size: 201326592
scalar (32 bit) ... 0.67622
scalar (64 bit) ... 0.58459 (speedup 1.16)
SWAR (64 bit) ... 0.84120 (speedup 0.80)
SSE (lookup: naive) ... 0.35669 (speedup 1.90)
SSE (lookup: other improved) ... 0.33012 (speedup 2.05)
SSE (lookup: improved) ... 0.32253 (speedup 2.10)
SSE (lookup: pshufb-based) ... 0.54085 (speedup 1.25)
SSE (lookup: pshufb improved) ... 0.41274 (speedup 1.64)
SSE (lookup: other improved, unrolled) ... 0.32280 (speedup 2.09)
SSE (lookup: improved, unrolled) ... 0.30967 (speedup 2.18)
SSE (lookup: pshufb-based, unrolled) ... 0.52245 (speedup 1.29)
SSE (lookup: pshufb improved unrolled) ... 0.37415 (speedup 1.81)
SSE (fully unrolled improved lookup) ... 0.29240 (speedup 2.31)
SSE & BMI2 (lookup: naive) ... 0.34823 (speedup 1.94)
SSE & BMI2 (lookup: improved) ... 0.36100 (speedup 1.87)
SSE & BMI2 (lookup: pshufb improved) ... 0.38636 (speedup 1.75)
AVX2 (lookup: improved) ... 0.31564 (speedup 2.14)
AVX2 (lookup: improved, unrolled) ... 0.30354 (speedup 2.23)
AVX2 (lookup: pshufb-based) ... 0.44807 (speedup 1.51)
AVX2 (lookup: pshufb-based, unrolled) ... 0.45450 (speedup 1.49)
AVX2 (lookup: pshufb improved) ... 0.34926 (speedup 1.94)
AVX2 (lookup: pshufb unrolled improved) ... 0.35021 (speedup 1.93)
AVX2 & BMI (lookup: pshufb improved) ... 0.59036 (speedup 1.15)
AVX512 (incremental logic) ... 0.12488 (speedup 5.41)
AVX512 (incremental logic improved) ... 0.11455 (speedup 5.90)
AVX512 (incremental logic improved with gather load) ... 0.12808 (speedup 5.28)
AVX512 (binary search) ... 0.12535 (speedup 5.39)
AVX512 (gather) ... 0.25674 (speedup 2.63)
input size: 201326592
scalar (32 bit) ... 0.67601
scalar (64 bit) ... 0.58490 (speedup 1.16)
SWAR (64 bit) ... 0.84158 (speedup 0.80)
SSE (lookup: naive) ... 0.35687 (speedup 1.89)
SSE (lookup: other improved) ... 0.32959 (speedup 2.05)
SSE (lookup: improved) ... 0.32261 (speedup 2.10)
SSE (lookup: pshufb-based) ... 0.54107 (speedup 1.25)
SSE (lookup: pshufb improved) ... 0.41271 (speedup 1.64)
SSE (lookup: other improved, unrolled) ... 0.32302 (speedup 2.09)
SSE (lookup: improved, unrolled) ... 0.30935 (speedup 2.19)
SSE (lookup: pshufb-based, unrolled) ... 0.52317 (speedup 1.29)
SSE (lookup: pshufb improved unrolled) ... 0.37424 (speedup 1.81)
SSE (fully unrolled improved lookup) ... 0.29187 (speedup 2.32)
SSE & BMI2 (lookup: naive) ... 0.34796 (speedup 1.94)
SSE & BMI2 (lookup: improved) ... 0.36089 (speedup 1.87)
SSE & BMI2 (lookup: pshufb improved) ... 0.38607 (speedup 1.75)
AVX2 (lookup: improved) ... 0.31549 (speedup 2.14)
AVX2 (lookup: improved, unrolled) ... 0.30332 (speedup 2.23)
AVX2 (lookup: pshufb-based) ... 0.44801 (speedup 1.51)
AVX2 (lookup: pshufb-based, unrolled) ... 0.45472 (speedup 1.49)
AVX2 (lookup: pshufb improved) ... 0.34949 (speedup 1.93)
AVX2 (lookup: pshufb unrolled improved) ... 0.35024 (speedup 1.93)
AVX2 & BMI (lookup: pshufb improved) ... 0.59123 (speedup 1.14)
AVX512 (incremental logic) ... 0.12479 (speedup 5.42)
AVX512 (incremental logic improved) ... 0.11348 (speedup 5.96)
AVX512 (incremental logic improved with gather load) ... 0.12745 (speedup 5.30)
AVX512 (binary search) ... 0.12521 (speedup 5.40)
AVX512 (gather) ... 0.25683 (speedup 2.63)
input size: 201326592
scalar (32 bit) ... 0.67597
scalar (64 bit) ... 0.58455 (speedup 1.16)
SWAR (64 bit) ... 0.84142 (speedup 0.80)
SSE (lookup: naive) ... 0.35688 (speedup 1.89)
SSE (lookup: other improved) ... 0.33024 (speedup 2.05)
SSE (lookup: improved) ... 0.32250 (speedup 2.10)
SSE (lookup: pshufb-based) ... 0.54079 (speedup 1.25)
SSE (lookup: pshufb improved) ... 0.41269 (speedup 1.64)
SSE (lookup: other improved, unrolled) ... 0.32306 (speedup 2.09)
SSE (lookup: improved, unrolled) ... 0.30911 (speedup 2.19)
SSE (lookup: pshufb-based, unrolled) ... 0.52287 (speedup 1.29)
SSE (lookup: pshufb improved unrolled) ... 0.37435 (speedup 1.81)
SSE (fully unrolled improved lookup) ... 0.29212 (speedup 2.31)
SSE & BMI2 (lookup: naive) ... 0.34834 (speedup 1.94)
SSE & BMI2 (lookup: improved) ... 0.36102 (speedup 1.87)
SSE & BMI2 (lookup: pshufb improved) ... 0.38660 (speedup 1.75)
AVX2 (lookup: improved) ... 0.31562 (speedup 2.14)
AVX2 (lookup: improved, unrolled) ... 0.30371 (speedup 2.23)
AVX2 (lookup: pshufb-based) ... 0.44838 (speedup 1.51)
AVX2 (lookup: pshufb-based, unrolled) ... 0.45438 (speedup 1.49)
AVX2 (lookup: pshufb improved) ... 0.34914 (speedup 1.94)
AVX2 (lookup: pshufb unrolled improved) ... 0.35032 (speedup 1.93)
AVX2 & BMI (lookup: pshufb improved) ... 0.59048 (speedup 1.14)
AVX512 (incremental logic) ... 0.12492 (speedup 5.41)
AVX512 (incremental logic improved) ... 0.11655 (speedup 5.80)
AVX512 (incremental logic improved with gather load) ... 0.12879 (speedup 5.25)
AVX512 (binary search) ... 0.12553 (speedup 5.38)
AVX512 (gather) ... 0.25703 (speedup 2.63)
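
The speedup column above appears to be the time of the first row of each run (scalar, 32 bit) divided by the time of the given row. The log itself does not state this, nor the unit of the timings, so treat that reading as an assumption. A minimal Python sketch that parses one run and recomputes the ratios under that assumption follows; `SAMPLE`, `parse_run`, and `speedups` are hypothetical names introduced only for illustration.

```python
import re

# A few rows copied from the first run above (abbreviated).
SAMPLE = """\
scalar (32 bit) ... 0.67659
scalar (64 bit) ... 0.58455 (speedup 1.16)
SWAR (64 bit) ... 0.84144 (speedup 0.80)
AVX512 (incremental logic improved) ... 0.11351 (speedup 5.96)
"""

# "<name> ... <time>", with optional whitespace before the dots.
LINE = re.compile(r"^(?P<name>.+?)\s*\.\.\.\s*(?P<time>\d+\.\d+)")


def parse_run(text):
    """Return a list of (name, time) pairs for one run."""
    results = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:  # skips "input size: ..." and blank lines
            results.append((m.group("name"), float(m.group("time"))))
    return results


def speedups(results):
    """Assumption: the first row of the run is the baseline."""
    baseline = results[0][1]
    return [(name, baseline / t) for name, t in results]


if __name__ == "__main__":
    for name, ratio in speedups(parse_run(SAMPLE)):
        print(f"{name:45s} {ratio:.2f}")
```

Rounded to two decimals, the recomputed ratios for these rows agree with the logged speedup values, which supports the baseline reading.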