SIMD (SWAR/SSE/SSE4/AVX2/AVX512F/ARM Neon) implementations of a modified Karp-Rabin algorithm


SIMD-friendly algorithms for substring searching

Sample programs for article "SIMD-friendly algorithms for substring searching" (

The root directory contains C++11 procedures implemented using intrinsics for SSE, SSE4, AVX2, AVX512F, AVX512BW and ARM Neon (both ARMv7 and ARMv8).
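
As a rough illustration of what such procedures do, the sketch below uses portable 64-bit SWAR rather than intrinsics. It assumes the approach of comparing the first and the last byte of the needle at eight consecutive positions at once and confirming each candidate with ``memcmp``. The names ``swar64_find`` and ``load_le64`` are hypothetical, and the code is not taken from this repository.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: load 8 bytes as a little-endian 64-bit word, so that
// byte i of the input occupies bits [8*i, 8*i + 7] regardless of the host.
static inline uint64_t load_le64(const char* p) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= static_cast<uint64_t>(static_cast<unsigned char>(p[i])) << (8 * i);
    return v;
}

// Hypothetical name. Returns the index of the first occurrence of
// needle[0..k) in s[0..n), or size_t(-1) when there is none.
size_t swar64_find(const char* s, size_t n, const char* needle, size_t k) {
    const size_t npos = static_cast<size_t>(-1);
    if (k == 0) return 0;
    if (n < k) return npos;

    const uint64_t ones = 0x0101010101010101ull;
    const uint64_t low7 = 0x7f7f7f7f7f7f7f7full;
    // Broadcast the first and the last needle byte to all 8 lanes.
    const uint64_t first = ones * static_cast<unsigned char>(needle[0]);
    const uint64_t last  = ones * static_cast<unsigned char>(needle[k - 1]);

    const size_t positions = n - k + 1;  // number of candidate start offsets
    size_t i = 0;

    // Bulk part: test 8 start offsets per iteration. The loop condition
    // guarantees that both 8-byte loads stay inside s[0..n).
    for (; i + 8 <= positions; i += 8) {
        const uint64_t a = load_le64(s + i);          // bytes at offsets i..i+7
        const uint64_t b = load_le64(s + i + k - 1);  // bytes k-1 further on
        // A lane of diff is zero only if both the first and last byte match.
        const uint64_t diff = (a ^ first) | (b ^ last);
        // Exact zero-byte test: bit 7 of a lane is set iff that lane is zero.
        const uint64_t t = (diff & low7) + low7;
        const uint64_t hits = ~(t | diff | low7);

        for (int j = 0; j < 8; j++) {
            if ((hits >> (8 * j + 7)) & 1) {
                // Candidate position: confirm with a full comparison.
                if (std::memcmp(s + i + j, needle, k) == 0)
                    return i + j;
            }
        }
    }

    // Scalar tail for the remaining (fewer than 8) start offsets.
    for (; i < positions; i++) {
        if (std::memcmp(s + i, needle, k) == 0)
            return i;
    }
    return npos;
}
```

For example, ``swar64_find("the quick brown fox", 19, "fox", 3)`` returns 16. An intrinsics-based variant would follow the same outline with wider registers, replacing the two 64-bit loads and the zero-byte test with vector loads and byte-wise equality comparisons.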

The subdirectory original contains 32-bit programs with inline assembly, written in 2008 for another article.


To run unit and validation tests, type ``make test_ARCH``; to run performance tests, type ``make run_ARCH``. The value of ARCH selects the CPU architecture:

  • sse4,
  • avx2,
  • avx512f,
  • avx512bw,
  • arm,
  • aarch64.

Performance results

The subdirectory results contains raw timings from various computers.