bab2min/prefix-sum

Comparison of Prefix Sum Methods

I measured the elapsed time and the mean absolute error (MAE) of various methods for computing the prefix sum of an array of 1M floats. SpeedUp is relative to the simple(baseline) row for the same compiler and element type.
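For orientation, here is a minimal sketch of what rows named `simple` and `kahan` typically correspond to, plus one way to compute an MAE. The function names `prefix_sum_simple`, `prefix_sum_kahan`, and `mean_absolute_error` are hypothetical and not taken from this repository, and measuring the error against a double-precision reference is an assumption.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Plain running sum: out[i] = in[0] + ... + in[i], accumulated in type T.
template <typename T>
std::vector<T> prefix_sum_simple(const std::vector<T>& in) {
    std::vector<T> out(in.size());
    T acc = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        acc += in[i];
        out[i] = acc;
    }
    return out;
}

// Kahan (compensated) running sum: c carries the rounding error
// lost by the previous addition and feeds it back into the next one.
template <typename T>
std::vector<T> prefix_sum_kahan(const std::vector<T>& in) {
    std::vector<T> out(in.size());
    T sum = 0;
    T c = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        T y = in[i] - c;    // apply the stored correction
        T t = sum + y;      // low-order bits of y may be lost here
        c = (t - sum) - y;  // recover what was lost
        sum = t;
        out[i] = sum;
    }
    return out;
}

// Mean absolute error of a float result against a reference
// (assumed here to be a double-precision prefix sum).
inline double mean_absolute_error(const std::vector<float>& got,
                                  const std::vector<double>& ref) {
    double total = 0.0;
    for (std::size_t i = 0; i < got.size(); ++i) {
        total += std::fabs(static_cast<double>(got[i]) - ref[i]);
    }
    return got.empty() ? 0.0 : total / static_cast<double>(got.size());
}
```

Note that options such as `-ffast-math` allow the compiler to simplify the compensation term away, so a Kahan loop should be built without them.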

Compiled with SSE2 option

| Type | Method | g++ 5.4.0 Time (sec) | g++ 5.4.0 SpeedUp | g++ 7.5.0 Time (sec) | g++ 7.5.0 SpeedUp | g++ 9.3.0 Time (sec) | g++ 9.3.0 SpeedUp | clang 11.0.3* Time (sec) | clang 11.0.3* SpeedUp | msvc 19.26 Time (sec) | msvc 19.26 SpeedUp | Avg SpeedUp | MAE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| float | simple(baseline) | 1.459 | 0% | 1.255 | 0% | 0.983 | 0% | 0.901 | 0% | 1.247 | 0% | 0% | 2.037 |
| float | simple_double | 1.546 | -6% | 1.648 | -24% | 1.334 | -26% | 1.281 | -30% | 1.156 | 8% | -11% | - |
| float | sse | 0.521 | 180% | 0.549 | 128% | 0.341 | 188% | 0.444 | 103% | 0.384 | 225% | 172% | 0.683 |
| float | kahan | 5.924 | -75% | 4.789 | -74% | 3.987 | -75% | 3.497 | -74% | 4.100 | -70% | -73% | 0.000 |
| float | unroll4 | 1.401 | 4% | 1.247 | 1% | 0.972 | 1% | 0.893 | 1% | 1.017 | 23% | 9% | 2.037 |
| float | unroll4_reorder1 | 1.068 | 37% | 0.984 | 27% | 0.802 | 23% | 0.745 | 21% | 0.818 | 52% | 36% | 0.768 |
| float | unroll4_shift | 0.896 | 63% | 0.514 | 144% | 0.919 | 7% | 0.545 | 65% | 0.917 | 36% | 57% | 0.683 |
| float | unroll8 | 1.431 | 2% | 1.240 | 1% | 1.042 | -6% | 0.886 | 2% | 1.048 | 19% | 7% | 2.037 |
| float | unroll8_reorder1 | 1.062 | 37% | 0.926 | 36% | 0.763 | 29% | 0.660 | 37% | 0.780 | 60% | 44% | 1.160 |
| float | unroll8_reorder2 | 1.249 | 17% | 0.849 | 48% | 0.645 | 52% | 0.634 | 42% | 0.692 | 80% | 55% | 0.833 |
| float | unroll8_shift | 1.210 | 21% | 0.657 | 91% | 1.248 | -21% | 0.591 | 52% | 1.294 | -4% | 23% | 0.344 |
| float | unroll16 | 1.378 | 6% | 1.242 | 1% | 1.009 | -3% | 0.897 | 0% | 1.036 | 20% | 8% | 2.037 |
| float | unroll16_reorder1 | 0.880 | 66% | 0.891 | 41% | 0.715 | 37% | 0.701 | 29% | 0.728 | 71% | 52% | 1.198 |
| float | unroll16_reorder2 | 0.657 | 122% | 0.793 | 58% | 0.613 | 60% | 0.533 | 69% | 0.847 | 47% | 65% | 2.277 |
| double | simple(baseline) | 1.486 | 0% | 1.291 | 0% | 0.997 | 0% | 0.885 | 0% | 1.526 | 0% | 0% | 2.037 |
| double | kahan | 5.563 | -73% | 4.813 | -73% | 4.079 | -76% | 3.466 | -74% | 4.267 | -64% | -70% | - |
| double | unroll4 | 1.478 | 1% | 1.248 | 3% | 1.032 | -3% | 0.878 | 1% | 1.018 | 50% | 19% | 2.037 |
| double | unroll4_reorder1 | 1.079 | 38% | 1.010 | 28% | 0.789 | 26% | 0.741 | 19% | 0.794 | 92% | 51% | 0.768 |
| double | unroll4_shift | 0.927 | 60% | 0.544 | 138% | 0.919 | 8% | 0.468 | 89% | 0.967 | 58% | 70% | 0.683 |
| double | unroll8 | 1.549 | -4% | 1.226 | 5% | 0.958 | 4% | 0.883 | 0% | 1.035 | 47% | 19% | 2.037 |
| double | unroll8_reorder1 | 0.929 | 60% | 0.944 | 37% | 0.754 | 32% | 0.671 | 32% | 0.765 | 100% | 61% | 1.160 |
| double | unroll8_reorder2 | 0.808 | 84% | 0.831 | 55% | 0.619 | 61% | 0.616 | 44% | 0.673 | 127% | 83% | 0.833 |
| double | unroll8_shift | 1.240 | 20% | 0.648 | 99% | 1.208 | -18% | 0.606 | 46% | 1.269 | 20% | 32% | 0.344 |
| double | unroll16 | 1.431 | 4% | 1.247 | 4% | 1.015 | -2% | 0.904 | -2% | 1.019 | 50% | 19% | 2.037 |
| double | unroll16_reorder1 | 0.861 | 73% | 0.907 | 42% | 0.690 | 44% | 0.622 | 42% | 0.727 | 110% | 72% | 1.198 |
| double | unroll16_reorder2 | 0.621 | 139% | 0.812 | 59% | 0.619 | 61% | 0.509 | 74% | 0.802 | 90% | 85% | 2.277 |
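As an illustration of how a row like `sse` can be implemented, the sketch below computes a prefix sum four floats at a time with two in-register shift-and-add steps, then carries the running total into the next block. This is a common SSE2 scan pattern and only an assumption about the approach; it is not necessarily the code benchmarked here, the name `prefix_sum_sse` is hypothetical, and it builds only for x86 targets.

```cpp
#include <cstddef>
#include <emmintrin.h>  // SSE2 intrinsics (also pulls in the SSE header)

// Prefix sum over n floats, processing 4 lanes per iteration.
void prefix_sum_sse(const float* in, float* out, std::size_t n) {
    __m128 carry = _mm_setzero_ps();  // running total, same value in all lanes
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 x = _mm_loadu_ps(in + i);  // lanes: [a, b, c, d]
        // Shift up by one lane (4 bytes) and add: [a, a+b, b+c, c+d]
        x = _mm_add_ps(x, _mm_castsi128_ps(
                              _mm_slli_si128(_mm_castps_si128(x), 4)));
        // Shift up by two lanes (8 bytes) and add: [a, a+b, a+b+c, a+b+c+d]
        x = _mm_add_ps(x, _mm_castsi128_ps(
                              _mm_slli_si128(_mm_castps_si128(x), 8)));
        x = _mm_add_ps(x, carry);  // add the total of all previous blocks
        _mm_storeu_ps(out + i, x);
        // Broadcast the last lane so it becomes the carry for the next block.
        carry = _mm_shuffle_ps(x, x, _MM_SHUFFLE(3, 3, 3, 3));
    }
    // Scalar tail for the remaining 0 to 3 elements.
    float acc = (i == 0) ? 0.0f : out[i - 1];
    for (; i < n; ++i) {
        acc += in[i];
        out[i] = acc;
    }
}
```

Because the additions are associated differently from the scalar loop, the rounding error of such a scan generally differs from the baseline, which is consistent with the distinct MAE values in the table.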

Compiled with AVX option

| Type | Method | g++ 5.4.0 Time (sec) | g++ 5.4.0 SpeedUp | g++ 7.5.0 Time (sec) | g++ 7.5.0 SpeedUp | g++ 9.3.0 Time (sec) | g++ 9.3.0 SpeedUp | clang 11.0.3* Time (sec) | clang 11.0.3* SpeedUp | msvc 19.26 Time (sec) | msvc 19.26 SpeedUp | Avg SpeedUp | MAE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| float | simple(baseline) | 1.245 | 0% | 1.071 | 0% | 0.995 | 0% | 0.862 | 0% | 1.243 | 0% | 0% | 2.037 |
| float | simple_double | 1.614 | -23% | 1.515 | -29% | 1.263 | -21% | 1.173 | -27% | 1.269 | -2% | -17% | - |
| float | avx | 0.514 | 142% | 0.502 | 113% | 0.492 | 102% | 0.481 | 79% | 0.577 | 116% | 108% | 0.344 |
| float | sse | 0.394 | 216% | 0.358 | 199% | 0.345 | 189% | 0.405 | 113% | 0.504 | 147% | 159% | 0.683 |
| float | kahan | 4.825 | -74% | 4.246 | -75% | 3.860 | -74% | 3.406 | -75% | 4.069 | -69% | -73% | 0.000 |
| float | unroll4 | 1.179 | 6% | 1.213 | -12% | 0.992 | 0% | 0.861 | 0% | 1.249 | 0% | -1% | 2.037 |
| float | unroll4_reorder1 | 0.988 | 26% | 0.871 | 23% | 0.813 | 22% | 0.727 | 19% | 0.972 | 28% | 24% | 0.768 |
| float | unroll4_shift | 0.679 | 84% | 0.910 | 18% | 0.937 | 6% | 0.570 | 51% | 0.534 | 133% | 76% | 0.683 |
| float | unroll8 | 1.229 | 1% | 1.049 | 2% | 0.949 | 5% | 0.862 | 0% | 1.027 | 21% | 9% | 2.037 |
| float | unroll8_reorder1 | 1.026 | 21% | 0.753 | 42% | 0.734 | 36% | 0.630 | 37% | 0.792 | 57% | 43% | 1.160 |
| float | unroll8_reorder2 | 0.758 | 64% | 0.673 | 59% | 0.627 | 59% | 0.570 | 51% | 0.715 | 74% | 63% | 0.833 |
| float | unroll8_shift | 0.756 | 65% | 1.356 | -21% | 1.246 | -20% | 0.776 | 11% | 0.631 | 97% | 42% | 0.344 |
| float | unroll16 | 1.171 | 6% | 1.009 | 6% | 1.003 | -1% | 0.876 | -2% | 1.074 | 16% | 7% | 2.037 |
| float | unroll16_reorder1 | 0.829 | 50% | 0.769 | 39% | 0.680 | 46% | 0.571 | 51% | 0.773 | 61% | 53% | 1.198 |
| float | unroll16_reorder2 | 0.726 | 71% | 0.626 | 71% | 0.569 | 75% | 0.494 | 75% | 0.943 | 32% | 58% | 2.277 |
| double | simple(baseline) | 1.214 | 0% | 1.066 | 0% | 0.952 | 0% | 0.875 | 0% | 1.537 | 0% | 0% | 2.037 |
| double | kahan | 4.790 | -75% | 4.175 | -74% | 3.821 | -75% | 3.431 | -75% | 3.986 | -61% | -70% | - |
| double | unroll4 | 1.152 | 5% | 1.032 | 3% | 0.933 | 2% | 0.866 | 1% | 1.243 | 24% | 10% | 2.037 |
| double | unroll4_reorder1 | 0.975 | 25% | 0.890 | 20% | 0.810 | 18% | 0.732 | 19% | 0.959 | 60% | 35% | 0.768 |
| double | unroll4_shift | 0.683 | 78% | 0.950 | 12% | 0.919 | 4% | 0.567 | 54% | 0.523 | 194% | 98% | 0.683 |
| double | unroll8 | 1.194 | 2% | 1.063 | 0% | 0.965 | -1% | 0.870 | 1% | 1.034 | 49% | 18% | 2.037 |
| double | unroll8_reorder1 | 0.876 | 39% | 0.761 | 40% | 0.715 | 33% | 0.639 | 37% | 0.838 | 83% | 54% | 1.160 |
| double | unroll8_reorder2 | 0.791 | 54% | 0.665 | 60% | 0.642 | 48% | 0.576 | 52% | 0.721 | 113% | 76% | 0.833 |
| double | unroll8_shift | 0.750 | 62% | 1.237 | -14% | 1.259 | -24% | 0.763 | 15% | 0.624 | 146% | 61% | 0.344 |
| double | unroll16 | 1.158 | 5% | 1.031 | 3% | 0.960 | -1% | 0.874 | 0% | 1.034 | 49% | 19% | 2.037 |
| double | unroll16_reorder1 | 0.825 | 47% | 0.750 | 42% | 0.705 | 35% | 0.609 | 44% | 0.747 | 106% | 66% | 1.198 |
| double | unroll16_reorder2 | 0.752 | 61% | 0.636 | 68% | 0.587 | 62% | 0.492 | 78% | 0.958 | 61% | 66% | 2.277 |
