Implement ARM SVE optimization with assembly code #751

hzhuang1 · 2022-10-25T07:22:55Z

The whole patch set is in #748.

This patch set includes the following features:

Move the dispatch breakpoint to XXH3_accumulate() (the full accumulation loop, #744). This prepares for ARM SVE dispatch.
Add SVE intrinsic code for XXH3.
Use dispatch as a common framework for both x86 and aarch64, and import the aarch64 SVE assembly implementation.
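To illustrate what moving the dispatch breakpoint changes, here is a minimal sketch assuming a simple function-pointer dispatch. All names (`accumulate_512_scalar`, `accumulate_scalar`, `hash_block_per_stripe`, `hash_block_per_block`, `indirect_calls`) are hypothetical and are not xxHash's actual internals; the scalar round follows XXH3's 64-byte stripe step. Dispatching at the stripe level costs one indirect call per 64 bytes, while dispatching the whole loop costs one call per block and lets the callee own the loop body.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STRIPE_LEN 64          /* XXH3 consumes input in 64-byte stripes */
#define SECRET_CONSUME_RATE 8  /* the secret advances 8 bytes per stripe */

/* Portable little-endian 64-bit load. */
static uint64_t read_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Scalar XXH3-style round over one stripe (eight 64-bit lanes). */
static void accumulate_512_scalar(uint64_t acc[8], const uint8_t *input,
                                  const uint8_t *secret)
{
    for (size_t i = 0; i < 8; i++) {
        uint64_t data_val = read_le64(input + 8 * i);
        uint64_t data_key = data_val ^ read_le64(secret + 8 * i);
        acc[i ^ 1] += data_val;                                /* swap adjacent lanes */
        acc[i] += (data_key & 0xFFFFFFFFu) * (data_key >> 32); /* 32x32 -> 64 multiply */
    }
}

/* Full loop over nbStripes stripes: the level the breakpoint moves to. */
static void accumulate_scalar(uint64_t acc[8], const uint8_t *input,
                              const uint8_t *secret, size_t nbStripes)
{
    for (size_t n = 0; n < nbStripes; n++)
        accumulate_512_scalar(acc, input + n * STRIPE_LEN,
                              secret + n * SECRET_CONSUME_RATE);
}

typedef void (*acc512_fn)(uint64_t[8], const uint8_t *, const uint8_t *);
typedef void (*accfull_fn)(uint64_t[8], const uint8_t *, const uint8_t *, size_t);

static unsigned long indirect_calls; /* calls made through a dispatch pointer */

/* Old breakpoint: the dispatched unit is one 64-byte round. */
static void hash_block_per_stripe(acc512_fn f, uint64_t acc[8], const uint8_t *input,
                                  const uint8_t *secret, size_t nbStripes)
{
    assert(f != NULL);
    for (size_t n = 0; n < nbStripes; n++) {
        indirect_calls++;
        f(acc, input + n * STRIPE_LEN, secret + n * SECRET_CONSUME_RATE);
    }
}

/* New breakpoint: the dispatched unit is the whole stripe loop. */
static void hash_block_per_block(accfull_fn f, uint64_t acc[8], const uint8_t *input,
                                 const uint8_t *secret, size_t nbStripes)
{
    assert(f != NULL);
    indirect_calls++;
    f(acc, input, secret, nbStripes);
}
```

Both paths produce identical accumulators; only the number of indirect calls differs. An x86 or aarch64 build would install its own `accfull_fn` (for example an SVE loop) in place of `accumulate_scalar`.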

hzhuang1 · 2022-10-25T07:36:24Z

Let's start from #744.

hzhuang1 · 2022-10-27T02:43:19Z

> Let's start from #744.

After thinking it over, I realized the benefit of #744 isn't obvious on its own, so I created #752, which adds only the ARM SVE intrinsic implementation.

With #752, performance on the test platform is actually lower than the scalar path. However, it is only the intrinsic implementation, meant to be easy to review and to serve as a starting point for optimization.

After #752, we can continue with #744, which exposes the XXH3_accumulate() interface to all architectures. Because each architecture maintains its own implementation of this interface, it can avoid frequent memory accesses without hacking the rest of xxHash, which improves performance significantly.

Once both are merged, we can move on to the assembly implementation.

This new order should be much easier to follow.

hzhuang1 · 2022-11-10T01:34:38Z

#752 is merged. Thanks a lot.

Now we're moving on to #756, which simplifies #744. With this patch, the full accumulation loop can be customized per architecture. On SVE, this lets us avoid stack accesses and use SVE-specific prefetch instructions, which improves performance significantly.
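As a rough, portable sketch of what a customized full accumulation loop can do, here is a C version assuming the same scalar XXH3 stripe round. It loads the accumulators into a local array once per block and stores them back once, which gives the compiler room to keep them in registers across stripes (hand-written assembly can guarantee it), and it issues a software prefetch a fixed distance ahead. `accumulate_custom` and `PREFETCH_DIST` (384 bytes) are hypothetical; `__builtin_prefetch` is the GCC/Clang builtin, standing in for an SVE-specific prefetch instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STRIPE_LEN 64
#define SECRET_CONSUME_RATE 8
#define PREFETCH_DIST 384  /* hypothetical tuning value, in bytes */

#if defined(__GNUC__) || defined(__clang__)
#  define PREFETCH(p) __builtin_prefetch((p), 0, 3) /* read access, high locality */
#else
#  define PREFETCH(p) ((void)(p))
#endif

/* Portable little-endian 64-bit load. */
static uint64_t read_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical architecture-specific full loop. */
static void accumulate_custom(uint64_t acc[8], const uint8_t *input,
                              const uint8_t *secret, size_t nbStripes)
{
    uint64_t a[8];
    assert(acc != NULL);
    memcpy(a, acc, sizeof a); /* load the accumulators once per block */

    for (size_t n = 0; n < nbStripes; n++) {
        const uint8_t *in = input + n * STRIPE_LEN;
        const uint8_t *sec = secret + n * SECRET_CONSUME_RATE;

        /* Prefetch ahead only while the target stays inside this block. */
        if (n * STRIPE_LEN + PREFETCH_DIST < nbStripes * STRIPE_LEN)
            PREFETCH(in + PREFETCH_DIST);

        for (size_t i = 0; i < 8; i++) {
            uint64_t data_val = read_le64(in + 8 * i);
            uint64_t data_key = data_val ^ read_le64(sec + 8 * i);
            a[i ^ 1] += data_val;
            a[i] += (data_key & 0xFFFFFFFFu) * (data_key >> 32);
        }
    }

    memcpy(acc, a, sizeof a); /* store them back once */
}
```

Since the prefetch is only a hint, the result is the same as running the stripes one call at a time; splitting a block into two calls (advancing the secret by 8 bytes per stripe already consumed) yields identical accumulators.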

hzhuang1 mentioned this issue on Nov 7, 2022 in intel/isa-l_crypto#110, "Add xxHash algorithm family" (closed).
