Vectorize alignment algorithm for x86-64 #768

Open
rhpvorderman opened this issue Mar 15, 2024 · 0 comments

rhpvorderman commented Mar 15, 2024

Ideally, in this case you would want to use __m128i _mm_blendv_epi8(__m128i a, __m128i b, __m128i mask),
where the mask can be created with any of the _mm_cmp*_epi8 intrinsics (_mm_cmpeq_epi8, _mm_cmpgt_epi8, _mm_cmplt_epi8).

But blendv is an SSE4.1 instruction, which leads to compile headaches. However, the same selection can be done using SSE2 instructions only:

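#include <emmintrin.h>

/* Per bit: take a where mask is set, b where it is clear. Expects a mask
   from a compare intrinsic (each lane all ones or all zeros). Note that
   _mm_blendv_epi8 takes b, not a, where the mask is set. */
static inline __m128i vector_blend_128(__m128i a, __m128i b, __m128i mask) {
    return _mm_or_si128(
        _mm_and_si128(mask, a),
        _mm_andnot_si128(mask, b)
    );
}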

So this might open up opportunities for vectorization using only #ifdef __SSE2__ compile guards. SSE2 is part of the x86-64 baseline, so no extra compiler flags are needed on that target.
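A minimal sketch of what such a guarded code path could look like. The function replace_byte and its parameters are hypothetical and not taken from the project; they only illustrate an SSE2 path with a scalar fallback that also handles the tail. The blend helper is repeated so the example compiles on its own.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef __SSE2__
#include <emmintrin.h>

/* Take a where mask bits are set, b elsewhere (SSE2 only). */
static inline __m128i vector_blend_128(__m128i a, __m128i b, __m128i mask) {
    return _mm_or_si128(_mm_and_si128(mask, a), _mm_andnot_si128(mask, b));
}
#endif

/* Hypothetical helper: copy src to dst, replacing every byte equal to
   needle with replacement. */
static void replace_byte(uint8_t *dst, const uint8_t *src, size_t n,
                         uint8_t needle, uint8_t replacement) {
    size_t i = 0;
#ifdef __SSE2__
    const __m128i needles = _mm_set1_epi8((char)needle);
    const __m128i replacements = _mm_set1_epi8((char)replacement);
    /* Process 16 bytes per iteration while a full vector is available. */
    for (; i + 16 <= n; i += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(src + i));
        /* 0xFF in every byte lane that equals needle, 0x00 elsewhere. */
        __m128i mask = _mm_cmpeq_epi8(chunk, needles);
        _mm_storeu_si128((__m128i *)(dst + i),
                         vector_blend_128(replacements, chunk, mask));
    }
#endif
    /* Scalar path: the tail, or everything when SSE2 is unavailable. */
    for (; i < n; i++) {
        dst[i] = (src[i] == needle) ? replacement : src[i];
    }
}
```

The scalar loop doubles as the fallback, so both builds share one function and produce the same result.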

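EDIT: This would of course work for data types other than epi8 as well, since the blend is purely bitwise and only the compare that builds the mask depends on the lane width.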
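As an illustration with a wider lane type, here is a sketch of a signed 32-bit maximum. _mm_max_epi32 is itself SSE4.1, but _mm_cmpgt_epi32 is SSE2, so the same blend can stand in for it. The name vector_max_epi32 is hypothetical, and the blend helper is repeated so the example is self-contained.

```c
#include <stdint.h>

#ifdef __SSE2__
#include <emmintrin.h>

/* Take a where mask bits are set, b elsewhere (SSE2 only). */
static inline __m128i vector_blend_128(__m128i a, __m128i b, __m128i mask) {
    return _mm_or_si128(_mm_and_si128(mask, a), _mm_andnot_si128(mask, b));
}

/* Hypothetical SSE2 stand-in for the SSE4.1 _mm_max_epi32 intrinsic. */
static inline __m128i vector_max_epi32(__m128i a, __m128i b) {
    /* All ones in each 32-bit lane where a > b (signed), zeros elsewhere. */
    __m128i a_greater = _mm_cmpgt_epi32(a, b);
    return vector_blend_128(a, b, a_greater);
}
#endif
```

Only the compare changes between lane widths; the blend itself never looks at lane boundaries.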
