Add AVX2-accelerated decoder #5

Merged
merged 7 commits into from
Oct 14, 2017


AdamNiederer
Owner

Yields a 25%+ speedup on decoder-limited workflows. This means base💯 is now nearly twice as fast as base64 in many scenarios.

_mm256_shuffle_epi8 isn't in stdsimd, so let's work around it with some inline
assembly until it is.
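
For readers unfamiliar with the instruction: the sketch below models what the AVX2 byte shuffle (vpshufb) computes, and calls it through `std::arch::x86_64::_mm256_shuffle_epi8`, which current stable Rust exposes. It is an illustration only, not the inline-assembly workaround from this PR; the function names `shuffle_bytes`, `shuffle_bytes_avx2`, and `shuffle_bytes_scalar` are hypothetical.

```rust
/// Scalar model of the AVX2 byte shuffle: each 16-byte lane is shuffled
/// independently, and a control byte with its high bit set produces 0.
/// (Hypothetical helper, used here as a reference and as a fallback.)
pub fn shuffle_bytes_scalar(src: &[u8; 32], ctrl: &[u8; 32]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for i in 0..32 {
        let lane = i & 16; // 0 for the low lane, 16 for the high lane
        if ctrl[i] & 0x80 == 0 {
            out[i] = src[lane + (ctrl[i] & 0x0f) as usize];
        }
    }
    out
}

/// Same operation through the intrinsic. Only compiled on x86_64.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn shuffle_bytes_avx2(src: &[u8; 32], ctrl: &[u8; 32]) -> [u8; 32] {
    use std::arch::x86_64::*;
    // Unaligned loads, since the arrays carry no 32-byte alignment guarantee.
    let a = _mm256_loadu_si256(src.as_ptr() as *const __m256i);
    let b = _mm256_loadu_si256(ctrl.as_ptr() as *const __m256i);
    let r = _mm256_shuffle_epi8(a, b);
    let mut out = [0u8; 32];
    _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, r);
    out
}

/// Picks the AVX2 path when the CPU supports it, otherwise the scalar model.
pub fn shuffle_bytes(src: &[u8; 32], ctrl: &[u8; 32]) -> [u8; 32] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // AVX2 support was verified at runtime just above.
            return unsafe { shuffle_bytes_avx2(src, ctrl) };
        }
    }
    shuffle_bytes_scalar(src, ctrl)
}
```

The per-lane behaviour matters for a decoder: a control byte can only select from the 16 bytes in its own half of the register, so gathering payload bytes across the two halves needs a further step.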

Performs what was previously a 64+ instruction scalar load in a little over 16
instructions. This doesn't yield the same speedup I was hoping for, but
hopefully it'll be more noticeable once the encoder is vectorized as well.

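
As a point of comparison, here is a minimal scalar decoder of the kind a vectorized load would replace. It is a sketch, not the code from this PR. It assumes that each input byte b is encoded as the 4-byte UTF-8 form of the code point U+1F3F7 + b; that mapping is not stated in this conversation, and the name `decode_scalar` is hypothetical.

```rust
/// Decode base100-style text, one 4-byte emoji per output byte.
/// Assumption: byte b was encoded as the code point U+1F3F7 + b, whose
/// UTF-8 form is [0xf0, 0x9f, 0x8f + (b + 55) / 64, 0x80 + (b + 55) % 64].
/// Returns None if the input length is not a multiple of four.
/// The 0xf0 0x9f prefix is not validated in this sketch.
pub fn decode_scalar(input: &[u8]) -> Option<Vec<u8>> {
    if input.len() % 4 != 0 {
        return None;
    }
    let mut out = Vec::with_capacity(input.len() / 4);
    for chunk in input.chunks_exact(4) {
        // Only the last two bytes of each emoji carry payload.
        let hi = chunk[2].wrapping_sub(0x8f).wrapping_mul(64);
        let lo = chunk[3].wrapping_sub(0x80);
        // Wrapping u8 arithmetic is enough: the result is taken modulo 256.
        out.push(hi.wrapping_add(lo).wrapping_sub(55));
    }
    Some(out)
}
```

Each output byte needs two loads from a strided position plus a few arithmetic steps, which is the per-byte cost that handling 32 input bytes per register aims to amortize.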
Remove the unnecessary division and modulus in the encoding algorithm. I may
need to find a way around the u8->u16->u8 cast as well.
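
One way to drop the division and modulus is a shift and a mask, since the divisor is a power of two. The sketch below shows both forms side by side. It is not the code from this commit: it assumes the mapping byte b to code point U+1F3F7 + b, and the function names are hypothetical. The widening to u16 is there because b + 55 overflows a u8 for b >= 201, which is the u8->u16->u8 cast in question.

```rust
/// Division and modulus form. Assumes byte b maps to U+1F3F7 + b.
pub fn encode_byte_divmod(b: u8) -> [u8; 4] {
    let v = b as u16 + 55; // widened: b + 55 can reach 310
    [0xf0, 0x9f, (v / 64 + 0x8f) as u8, (v % 64 + 0x80) as u8]
}

/// Equivalent form with a shift and a mask: v / 64 == v >> 6 and
/// v % 64 == v & 0x3f for unsigned v.
pub fn encode_byte_shift(b: u8) -> [u8; 4] {
    let v = b as u16 + 55;
    [0xf0, 0x9f, ((v >> 6) + 0x8f) as u8, ((v & 0x3f) + 0x80) as u8]
}

/// Encode a whole buffer, four output bytes per input byte.
pub fn encode(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len() * 4);
    for &b in input {
        out.extend_from_slice(&encode_byte_shift(b));
    }
    out
}
```

An optimizing compiler will usually perform this rewrite on its own for unsigned operands, so any measured gain from doing it by hand should be checked with a benchmark.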
@AdamNiederer AdamNiederer merged commit 0fb92a6 into master Oct 14, 2017