Implement ARM SVE optimization with assembly code #751
Let's start from #744.
I thought about this for a while. The effect of #744 isn't intuitive, so I created #752, which only adds ARM SVE intrinsic support. With #752, performance on the test platform is actually worse than scalar. But it is only the intrinsic implementation, meant to be easy to review and to serve as a starting point for optimization. After #752, we can pick #744 back up, which exposes the XXH3_accumulate() interface to all architectures. With this self-maintained interface, we can avoid frequent memory accesses without hacking xxHash, which improves performance considerably. Once both are handled, we can continue with the assembly implementation. Logically, this new sequence should be much more intuitive.
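To illustrate why owning the whole accumulate loop can cut memory traffic, here is a minimal scalar sketch in C. It is not the code from #744 or #752: the `sketch_*` names and constants are hypothetical, and the lane update is a simplified version of the XXH3 scalar round. The per-stripe path reads and writes the accumulators in memory on every call, while the block-level path loads them once, keeps them in locals for all stripes, and stores them once. With SVE the locals would presumably be vector registers, where the saving matters more than in scalar code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_STRIPE_LEN   64  /* bytes consumed per stripe */
#define SKETCH_ACC_NB        8  /* number of 64-bit accumulator lanes */
#define SKETCH_SECRET_RATE   8  /* secret offset advance per stripe */

/* Unaligned-safe 64-bit load in native byte order. */
static uint64_t sketch_read64(const uint8_t *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Per-stripe round: touches acc[] in memory on every call. */
static void sketch_accumulate_stripe(uint64_t *acc, const uint8_t *input,
                                     const uint8_t *secret)
{
    for (size_t i = 0; i < SKETCH_ACC_NB; i++) {
        uint64_t data_val = sketch_read64(input + 8 * i);
        uint64_t data_key = data_val ^ sketch_read64(secret + 8 * i);
        acc[i ^ 1] += data_val;  /* swap adjacent lanes */
        acc[i] += (data_key & 0xFFFFFFFFu) * (data_key >> 32);
    }
}

/* Path A: the caller owns the loop and the kernel sees one stripe at a time. */
static void sketch_per_stripe_loop(uint64_t *acc, const uint8_t *input,
                                   const uint8_t *secret, size_t nb_stripes)
{
    for (size_t n = 0; n < nb_stripes; n++)
        sketch_accumulate_stripe(acc, input + n * SKETCH_STRIPE_LEN,
                                 secret + n * SKETCH_SECRET_RATE);
}

/* Path B: the kernel owns the loop; acc is loaded once and stored once. */
static void sketch_accumulate_block(uint64_t *acc, const uint8_t *input,
                                    const uint8_t *secret, size_t nb_stripes)
{
    uint64_t local[SKETCH_ACC_NB];
    memcpy(local, acc, sizeof local);            /* single load */
    for (size_t n = 0; n < nb_stripes; n++) {
        const uint8_t *in  = input  + n * SKETCH_STRIPE_LEN;
        const uint8_t *sec = secret + n * SKETCH_SECRET_RATE;
        for (size_t i = 0; i < SKETCH_ACC_NB; i++) {
            uint64_t data_val = sketch_read64(in + 8 * i);
            uint64_t data_key = data_val ^ sketch_read64(sec + 8 * i);
            local[i ^ 1] += data_val;
            local[i] += (data_key & 0xFFFFFFFFu) * (data_key >> 32);
        }
    }
    memcpy(acc, local, sizeof local);            /* single store */
}

/* Returns 1 if both paths produce identical accumulators on made-up data. */
static int sketch_paths_agree(void)
{
    enum { NB = 4 };
    uint8_t input[NB * SKETCH_STRIPE_LEN];
    uint8_t secret[(NB - 1) * SKETCH_SECRET_RATE + SKETCH_STRIPE_LEN];
    uint64_t a[SKETCH_ACC_NB], b[SKETCH_ACC_NB];

    for (size_t i = 0; i < sizeof input; i++)  input[i]  = (uint8_t)(i * 7 + 3);
    for (size_t i = 0; i < sizeof secret; i++) secret[i] = (uint8_t)(i * 13 + 5);
    for (size_t i = 0; i < SKETCH_ACC_NB; i++) a[i] = b[i] = i + 1;

    sketch_per_stripe_loop(a, input, secret, NB);
    sketch_accumulate_block(b, input, secret, NB);
    return memcmp(a, b, sizeof a) == 0;
}
```

Both paths compute the same result; only the number of accumulator loads and stores differs. A scalar compiler may already hoist these after inlining, so any real gain would need to be measured on the SVE build.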
The whole patch set is in #748.
In this patch set, some features are included.