unify _mm_hsub_* functions #432

Closed
howjmay opened this issue May 27, 2021 · 1 comment · Fixed by #498
howjmay commented May 27, 2021

The implementations of the _mm_hsub_* functions vary. Maybe we should unify them on whichever approach is faster.
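For reference, the intended semantics of two members of the family can be written as a plain C sketch. This is not sse2neon code, and the `ref_hsub_*` names are hypothetical. The lane layout follows Intel's description of `_mm_hsub_epi16` and `_mm_hsub_epi32`: adjacent pairs are subtracted, the low half of the result comes from `a`, the high half from `b`, and overflow wraps.

```c
#include <stdint.h>

/* Hypothetical scalar reference for _mm_hsub_epi16(a, b); not NEON code.
 * dst[i] = a[2i] - a[2i+1] and dst[i+4] = b[2i] - b[2i+1], wrapping mod 2^16. */
static void ref_hsub_epi16(const int16_t a[8], const int16_t b[8], int16_t dst[8])
{
    for (int i = 0; i < 4; i++) {
        /* Subtract in unsigned arithmetic so the wraparound is well defined;
         * converting back to int16_t wraps on GCC/Clang (two's complement). */
        dst[i]     = (int16_t)(uint16_t)((uint16_t)a[2 * i] - (uint16_t)a[2 * i + 1]);
        dst[i + 4] = (int16_t)(uint16_t)((uint16_t)b[2 * i] - (uint16_t)b[2 * i + 1]);
    }
}

/* Hypothetical scalar reference for _mm_hsub_epi32(a, b): same pairing, 32-bit lanes. */
static void ref_hsub_epi32(const int32_t a[4], const int32_t b[4], int32_t dst[4])
{
    for (int i = 0; i < 2; i++) {
        dst[i]     = (int32_t)((uint32_t)a[2 * i] - (uint32_t)a[2 * i + 1]);
        dst[i + 2] = (int32_t)((uint32_t)b[2 * i] - (uint32_t)b[2 * i + 1]);
    }
}
```

Any candidate NEON implementation can be checked lane by lane against functions like these.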

howjmay self-assigned this Aug 8, 2021

marktwtn commented:

Generated assembly code for the different implementations:
Compiler: ARM64 GCC 11.1
Optimization: -O2

ARM32 with unzip vector intrinsic:
https://godbolt.org/z/rhdcP7Khe

ARM64 with unzip vector intrinsic:
https://godbolt.org/z/ehvn51To3

Extract narrow and shift implementation:
https://godbolt.org/z/7Ybof1Y1K

The unzip vector implementation compiles to less assembly code than the extract narrow and shift implementation.
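A portable scalar model of the two lane shuffles being compared, for `_mm_hsub_epi16`. It assumes the old code built the even/odd vectors with `vmovn_s32` and `vshrn_n_s32(x, 16)` on the 32-bit reinterpretation of each input, and that the unzip code uses `vuzp1q_s16`/`vuzp2q_s16` (AArch64; on ARM32, `vuzpq_s16` returns both halves in an `int16x8x2_t`). All function names are hypothetical. The model only shows that both approaches feed the same lanes to the final subtraction; it says nothing about instruction count, which is what the godbolt links measure.

```c
#include <stdint.h>

/* Hypothetical scalar model; not NEON code and not the sse2neon implementation. */

/* Models vuzp1q_s16(a, b) / vuzp2q_s16(a, b): treat a:b as one 16-lane
 * sequence and keep the even-indexed (uzp1) or odd-indexed (uzp2) lanes. */
static void model_uzp_s16(const int16_t a[8], const int16_t b[8],
                          int16_t even[8], int16_t odd[8])
{
    for (int i = 0; i < 4; i++) {
        even[i] = a[2 * i];
        odd[i] = a[2 * i + 1];
        even[i + 4] = b[2 * i];
        odd[i + 4] = b[2 * i + 1];
    }
}

/* Models vsubq_s16: lane-wise subtraction wrapping mod 2^16. */
static void model_sub_s16(const int16_t x[8], const int16_t y[8], int16_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = (int16_t)(uint16_t)((uint16_t)x[i] - (uint16_t)y[i]);
}

/* Unzip approach: hsub = uzp1(a, b) - uzp2(a, b). */
static void hsub_epi16_unzip(const int16_t a[8], const int16_t b[8], int16_t dst[8])
{
    int16_t even[8], odd[8];
    model_uzp_s16(a, b, even, odd);
    model_sub_s16(even, odd, dst);
}

/* Narrow-and-shift approach: view each adjacent 16-bit pair as one 32-bit lane.
 * On little-endian, the low half (vmovn_s32) is the even element and the
 * high half (vshrn_n_s32(x, 16)) is the odd element. */
static void hsub_epi16_narrow_shift(const int16_t a[8], const int16_t b[8], int16_t dst[8])
{
    uint32_t a32[4], b32[4];
    int16_t even[8], odd[8];
    /* Build the 32-bit lanes explicitly so the model does not depend on host endianness. */
    for (int i = 0; i < 4; i++) {
        a32[i] = (uint16_t)a[2 * i] | ((uint32_t)(uint16_t)a[2 * i + 1] << 16);
        b32[i] = (uint16_t)b[2 * i] | ((uint32_t)(uint16_t)b[2 * i + 1] << 16);
    }
    for (int i = 0; i < 4; i++) {
        even[i]     = (int16_t)(uint16_t)(a32[i] & 0xFFFFu); /* vmovn_s32 */
        even[i + 4] = (int16_t)(uint16_t)(b32[i] & 0xFFFFu);
        odd[i]      = (int16_t)(uint16_t)(a32[i] >> 16);     /* vshrn_n_s32(x, 16) */
        odd[i + 4]  = (int16_t)(uint16_t)(b32[i] >> 16);
    }
    model_sub_s16(even, odd, dst);
}
```

Since both paths end in the same subtraction of the same even/odd lanes, the choice between them is purely about the shuffle cost.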

marktwtn added a commit that referenced this issue Oct 18, 2021
Unify the implementation of _mm_hsub[s]_* with unzip vector intrinsic.

The old implementation:
https://godbolt.org/z/7Ybof1Y1K

The better implementation with less assembly code for ARM32 and ARM64:
https://godbolt.org/z/rhdcP7Khe
https://godbolt.org/z/ehvn51To3

Extract variable declaration for readability.
Replace transpose vector intrinsic with unzip vector intrinsic for
unification.

Close #432.
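The saturating `_mm_hsubs_*` variants covered by this commit follow the same pattern with only the subtraction changed. A minimal scalar sketch for `_mm_hsubs_epi16`, assuming the unzip is followed by a saturating subtract such as `vqsubq_s16`; the helper names are hypothetical and this is not the sse2neon source.

```c
#include <stdint.h>

/* Hypothetical model of one vqsubq_s16 lane: subtract, then clamp to int16 range. */
static int16_t sat_sub_s16(int16_t x, int16_t y)
{
    int32_t d = (int32_t)x - (int32_t)y; /* exact in 32 bits */
    if (d > INT16_MAX) return INT16_MAX;
    if (d < INT16_MIN) return INT16_MIN;
    return (int16_t)d;
}

/* Hypothetical scalar model of _mm_hsubs_epi16: same even/odd pairing as the
 * unzip approach, but the subtraction saturates instead of wrapping. */
static void hsubs_epi16_unzip(const int16_t a[8], const int16_t b[8], int16_t dst[8])
{
    int16_t even[8], odd[8];
    for (int i = 0; i < 4; i++) { /* uzp1 / uzp2 */
        even[i] = a[2 * i];
        odd[i] = a[2 * i + 1];
        even[i + 4] = b[2 * i];
        odd[i + 4] = b[2 * i + 1];
    }
    for (int i = 0; i < 8; i++) /* vqsubq_s16 */
        dst[i] = sat_sub_s16(even[i], odd[i]);
}
```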