Customize full accumulating loop for SVE #756

Previously, XXH3_accumulate() ran the whole accumulating loop in generic code, and only the per-stripe step, XXH3_accumulate_512() (one 512-bit, i.e. 64-byte, stripe), was architecture-optimized. For large inputs this causes frequent memory accesses, because the accumulators are loaded and stored again for every stripe. This change makes XXH3_accumulate() itself architecture-optimized, so the entire loop can be specialized for SVE.

Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org>
Signed-off-by: Devin Hussey <easyaspi314@users.noreply.github.com>

With the optimized full accumulating loop, xxh3 and XXH128 throughput improves by about 1.8x to 1.9x at 512 bytes, roughly 3.7x to 4x for inputs from 8 KB to 4 MB, and about 2.7x to 3x for inputs from 8 MB to 128 MB. XXH32 and XXH64 are essentially unchanged, as expected, since they do not use this loop. Inside the full loop, the accumulators no longer need to be saved to the stack, and SVE prefetch instructions are used to load input data ahead of time.

Without this patch, the benchmark results are:

=== benchmarking 4 hash functions ===
benchmarking large inputs : from 512 bytes (log9) to 128 MB (log27)
xxh3 , 1904, 2315, 2468, 2580, 2640, 2670, 2682, 2673, 2677, 2663, 2683, 2688, 2686, 2591, 2241, 2181, 2191, 2048, 2048
XXH32 , 1326, 1440, 1493, 1523, 1534, 1543, 1547, 1532, 1504, 1507, 1507, 1505, 1506, 1446, 1218, 1150, 1151, 1153, 1135
XXH64 , 2511, 2795, 2975, 3068, 3120, 3125, 3154, 3128, 3034, 3045, 3052, 3053, 3053, 2842, 2050, 1853, 1848, 1853, 1853
XXH128 , 1867, 2294, 2465, 2569, 2622, 2662, 2676, 2667, 2677, 2682, 2684, 2677, 2683, 2570, 2093, 2013, 2045, 2046, 2046

With this patch, the benchmark results are:

=== benchmarking 4 hash functions ===
benchmarking large inputs : from 512 bytes (log9) to 128 MB (log27)
xxh3 , 3681, 6007, 7803, 8954, 9875, 10411, 10703, 10505, 10670, 10794, 10812, 10804, 10205, 9923, 6279, 5927, 5967, 6022, 6062
XXH32 , 1281, 1434, 1494, 1523, 1534, 1543, 1547, 1535, 1500, 1502, 1502, 1502, 1501, 1443, 1242, 1169, 1193, 1196, 1195
XXH64 , 2497, 2801, 2961, 3074, 3092, 3136, 3155, 3123, 3031, 3037, 3040, 3037, 3033, 2847, 2102, 1955, 1967, 1974, 1971
XXH128 , 3419, 5798, 7488, 8854, 9787, 10357, 10673, 10468, 10647, 10748, 10785, 10751, 10805, 9698, 6011, 5677, 5999, 6065, 6074

Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org>
Signed-off-by: Devin Hussey <easyaspi314@users.noreply.github.com>

Commits on Nov 10, 2022