Vectorize alignment algorithm for x86-64 #768

rhpvorderman · 2024-03-15T08:03:45Z

Ideally, you would use __m128i _mm_blendv_epi8(__m128i a, __m128i b, __m128i mask) here,
with the mask created by any of the _mm_cmp**_epi8 intrinsics.

However, blendv is an SSE4.1 instruction, which leads to compile headaches. The same selection can be done using SSE2 instructions only:

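#include <emmintrin.h>  /* SSE2 intrinsics */

/* Bitwise select: takes bits from a where mask is set, from b elsewhere.
   Note that _mm_blendv_epi8 selects b where the mask is set, so swap a
   and b when porting blendv calls. */
static inline __m128i vector_blend_128(__m128i a, __m128i b, __m128i mask) {
     return _mm_or_si128(
         _mm_and_si128(mask, a),
         _mm_andnot_si128(mask, b)
     );
}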

So this might open up opportunities for vectorization, using only #ifdef __SSE2__ compile guards.

EDIT: This works for data types other than epi8 as well, of course, since the selection is purely bitwise.
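
As a rough sketch of how the blend could be combined with a compare mask: a lane-wise signed byte maximum, which is otherwise only available as _mm_max_epi8 in SSE4.1. The names max_epi8_sse2 and check_max_epi8_sse2 are hypothetical and only serve the illustration; they are not taken from the project.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics only */
#include <stdint.h>

/* Bitwise select: bits of a where mask is set, bits of b elsewhere. */
static inline __m128i vector_blend_128(__m128i a, __m128i b, __m128i mask) {
    return _mm_or_si128(_mm_and_si128(mask, a), _mm_andnot_si128(mask, b));
}

/* Hypothetical helper: lane-wise maximum of 16 signed bytes using SSE2 only. */
static inline __m128i max_epi8_sse2(__m128i a, __m128i b) {
    /* Each byte lane becomes 0xFF where a > b and 0x00 otherwise. */
    __m128i a_gt_b = _mm_cmpgt_epi8(a, b);
    return vector_blend_128(a, b, a_gt_b);
}

/* Hypothetical self-check: compare the vector result with a scalar loop. */
static void check_max_epi8_sse2(void) {
    int8_t x[16], y[16], out[16];
    for (int i = 0; i < 16; i++) {
        x[i] = (int8_t)(i * 17 - 128);  /* ascending, -128 .. 127 */
        y[i] = (int8_t)(100 - i * 13);  /* descending, 100 .. -95 */
    }
    __m128i vx = _mm_loadu_si128((const __m128i *)x);
    __m128i vy = _mm_loadu_si128((const __m128i *)y);
    _mm_storeu_si128((__m128i *)out, max_epi8_sse2(vx, vy));
    for (int i = 0; i < 16; i++) {
        int8_t expected = x[i] > y[i] ? x[i] : y[i];
        assert(out[i] == expected);
    }
}
```

Because the compare intrinsics produce all-ones or all-zeros per lane, the bitwise select behaves like a per-lane blend here. With an arbitrary mask it would mix individual bits, whereas _mm_blendv_epi8 only looks at the top bit of each byte.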
