Add AVX2-accelerated decoder #5

AdamNiederer · 2017-10-11T21:38:28Z

Yields a 25%+ speedup on decoder-limited workflows. This means base💯 is now nearly twice as fast as base64 in many scenarios.

`_mm256_shuffle_epi8` isn't in stdsimd yet, so let's work around it with some inline assembly until it is.
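For readers unfamiliar with the instruction, here is a minimal scalar sketch of what the byte shuffle (`vpshufb` in its 256-bit form) computes. It is only a model for illustration, not the inline assembly used in this PR; the function name `shuffle_epi8_model` is hypothetical.

```rust
/// Scalar model of the 256-bit byte shuffle (`vpshufb`).
/// Hypothetical helper for illustration; not part of this PR's code.
///
/// The 32 bytes are treated as two independent 128-bit lanes. For each
/// output byte, the matching control byte selects a source byte from the
/// SAME lane (low 4 bits), or forces zero when its high bit is set.
fn shuffle_epi8_model(src: [u8; 32], idx: [u8; 32]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for i in 0..32 {
        // 0 for bytes 0..16 (low lane), 16 for bytes 16..32 (high lane).
        let lane_base = i & 0x10;
        if idx[i] & 0x80 == 0 {
            out[i] = src[lane_base + (idx[i] & 0x0F) as usize];
        }
        // High bit set: the output byte stays zero.
    }
    out
}
```

Because the shuffle never crosses the 128-bit lane boundary, a control byte of 3 picks byte 3 in the low lane but byte 19 in the high lane.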

Performs what was previously a 64+ instruction scalar load in a little over 16 instructions. This doesn't yield the same speedup I was hoping for, but hopefully it'll be more noticeable once the encoder is vectorized as well.

Remove the unnecessary division and modulus in the encoding algorithm. I may need to find a way around the u8->u16->u8 cast as well.
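As a rough illustration of the division/modulus removal, the sketch below shows that dividing by 64 and taking the remainder mod 64 are equivalent to a shift and a mask on unsigned values. The offset of 55 and all function names here are assumptions made for the example, not necessarily what the encoder in this PR does.

```rust
/// Hypothetical offset added to each input byte before splitting it;
/// an assumption for illustration only.
const OFFSET: u16 = 55;

/// Split using division and modulus (the "before" shape).
fn split_div_mod(byte: u8) -> (u8, u8) {
    let v = byte as u16 + OFFSET; // widen so the addition cannot overflow
    ((v / 64) as u8, (v % 64) as u8)
}

/// Same split using a shift and a mask (the "after" shape).
fn split_shift_mask(byte: u8) -> (u8, u8) {
    let v = byte as u16 + OFFSET;
    ((v >> 6) as u8, (v & 0x3F) as u8)
}

/// The low six bits do not need the u16 widening at all: 256 is a
/// multiple of 64, so wrapping at 8 bits leaves them unchanged.
fn low_bits_no_widening(byte: u8) -> u8 {
    byte.wrapping_add(OFFSET as u8) & 0x3F
}
```

Only the low half avoids the cast in this sketch; the high half still depends on the carry out of the 8-bit addition, so it would need a different trick.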

AdamNiederer added 6 commits October 11, 2017 17:40

Add partial implementation of vectorized loads on AVX2

db3e466

Add preliminary avx2 inline asm implementation

093811b

`_mm256_shuffle_epi8` isn't in stdsimd yet, so let's work around it with some inline assembly until it is.

AVX2 Vectorized Decoding

d2969be

Performs what was previously a 64+ instruction scalar load in a little over 16 instructions. This doesn't yield the same speedup I was hoping for, but hopefully it'll be more noticeable once the encoder is vectorized as well.

Prepare for vectorized encoding

965574a

Remove the unnecessary division and modulus in the encoding algorithm. I may need to find a way around the u8->u16->u8 cast as well.

Minor style cleanups

63644f2

Update README with AVX2 benchmarks

f6155fc

AdamNiederer force-pushed the avx2 branch from ab94a27 to f6155fc on October 11, 2017 21:40

Add doc comments to non-SSE encoder and fix erroneous subtraction

4c0d56c

AdamNiederer merged commit 0fb92a6 into master Oct 14, 2017
