SSSE3: enc: add inline asm codepath #109

Closed
aklomp opened this issue Oct 20, 2022 · 1 comment

Comments

@aklomp
Owner

aklomp commented Oct 20, 2022

After adding inline assembly implementations of the AVX2 and AVX encoders in #104 and #108, it turns out that we can easily repeat the trick for SSSE3. The code looks a lot like the AVX implementation. Benchmarking across a few machines consistently shows a speedup of around 10-20%.

One caveat is that the inline assembly codepath will only be available on 64-bit machines. 32-bit machines with SSSE3 support (rare, but they exist; I own one) have eight XMM registers instead of sixteen, and that's not enough to implement a proper pipelined, unrolled loop.

I did try to write inline assembly that uses only eight XMM registers, but under those constraints I could not implement a parallelized loop, and in fact I could not even beat the compiler for speed.
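To illustrate the 64-bit-only restriction described above, here is a minimal sketch of how such a codepath could be gated at compile time. The macro `SKETCH_SSSE3_USE_ASM` and both function names are hypothetical and are not taken from the library's source; the sketch only assumes the predefined compiler macros `__x86_64__`, `__i386__` and `__SSSE3__` (GCC/Clang) and `_M_X64`, `_M_IX86` (MSVC).

```c
#include <assert.h>

/* Hypothetical gate (name invented for this sketch): take the inline
 * assembly path only on x86-64 builds with SSSE3 enabled, using a
 * compiler that understands GNU-style inline asm. */
#if defined(__x86_64__) && defined(__SSSE3__) && \
    (defined(__GNUC__) || defined(__clang__))
#  define SKETCH_SSSE3_USE_ASM 1
#else
#  define SKETCH_SSSE3_USE_ASM 0
#endif

/* Number of XMM registers addressable by legacy SSE encodings in the
 * mode we are compiling for: sixteen in 64-bit mode, eight in 32-bit. */
static int xmm_register_count(void)
{
#if defined(__x86_64__) || defined(_M_X64)
	return 16;
#elif defined(__i386__) || defined(_M_IX86)
	return 8;
#else
	return 0;	/* not an x86 target */
#endif
}

/* Returns 1 if the (hypothetical) inline asm encoder would be selected,
 * 0 if the build falls back to compiler-generated code. */
static int ssse3_enc_uses_asm(void)
{
	/* The asm path must never be chosen with fewer than sixteen
	 * XMM registers available. */
	assert(!SKETCH_SSSE3_USE_ASM || xmm_register_count() == 16);
	return SKETCH_SSSE3_USE_ASM;
}
```

On a 32-bit SSSE3 build the gate evaluates to 0, so that build would keep using whatever the compiler generates, which matches the observation that a hand-written eight-register version could not beat the compiler.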

@aklomp aklomp self-assigned this Oct 20, 2022
@aklomp aklomp closed this as completed in 59a8417 Oct 20, 2022
@aklomp
Owner Author

aklomp commented Oct 20, 2022

@htot Since you use the SSSE3 level stuff, I think this might be of interest to you?
