SSSE3: enc: add inline asm codepath #109

aklomp · 2022-10-20T20:19:25Z

After adding an inline assembly implementation of the AVX2 and AVX encoders in #104 and #108, it turns out that we can easily repeat the trick for SSSE3. The code looks a lot like the AVX implementation. Benchmarking across a few machines consistently shows a speedup of around 10-20%.

One caveat is that the inline assembly codepath will only be available on 64-bit machines. 32-bit machines with SSSE3 support (rare, but they exist; I own one) have eight XMM registers instead of sixteen, and that's not enough to implement a proper pipelined, unrolled loop.

I did try to write inline assembly that uses only eight XMM registers, but under those constraints I could not implement a parallelized loop, and in fact I could not even beat the compiler for speed.
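
For illustration, the 64-bit-only restriction described above could be expressed as a compile-time guard roughly like the one below. This is a minimal sketch, not the code from the commit: the macro `SKETCH_SSSE3_USE_ASM` and both helper functions are hypothetical names invented for this example, and the assumption that the asm path is additionally limited to GCC-compatible compilers is mine.

```c
/* Hypothetical guard: enable the inline asm encoder only on x86-64 builds
 * with a compiler that understands GCC-style inline assembly. The macro
 * name is an assumption made for this sketch. */
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#  define SKETCH_SSSE3_USE_ASM 1
#else
#  define SKETCH_SSSE3_USE_ASM 0
#endif

/* Number of XMM registers the target architecture exposes:
 * sixteen on x86-64, eight on 32-bit x86, zero elsewhere. */
static int xmm_register_count(void)
{
#if defined(__x86_64__)
	return 16;
#elif defined(__i386__)
	return 8;
#else
	return 0;
#endif
}

/* Name of the encoder codepath this build would select. On targets where
 * the guard is off, the compiler-generated (intrinsics) path is used. */
static const char *ssse3_enc_codepath(void)
{
	return SKETCH_SSSE3_USE_ASM ? "inline-asm" : "intrinsics";
}
```

With a guard of this shape, a 32-bit SSSE3 build silently falls back to the intrinsics encoder, which matches the observation that hand-written assembly limited to eight registers did not beat the compiler.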

aklomp · 2022-10-20T23:13:23Z

@htot Since you use the SSSE3 level stuff, I think this might be of interest to you?

aklomp self-assigned this Oct 20, 2022

aklomp added the enhancement label Oct 20, 2022

aklomp closed this as completed in 59a8417 Oct 20, 2022

aklomp added the performance label Oct 22, 2022

aklomp mentioned this issue Nov 8, 2023

Create release 0.5.1 #122

Closed
