SSE: add SSSE3 decoding routine for long input strings
aklomp committed Oct 29, 2014
1 parent 9d320e4 commit 85414e9
Showing 2 changed files with 87 additions and 6 deletions.
10 changes: 5 additions & 5 deletions README.md
@@ -3,14 +3,14 @@
 This is an implementation of a base64 stream encoder/decoder in C89. It also
 contains wrapper functions to encode/decode simple length-delimited strings.

-If your processor supports SSSE3, encoding speed is about four times higher
-than the competition, because this library uses SSE intrinsics to encode twelve
-bytes at a time. To the author's knowledge, this is the only Base64 library
-that does this.
+If your processor supports SSSE3, encoding/decoding speed is about four times
+higher than the competition, because this library uses SSE intrinsics to
+encode/decode twelve bytes at a time. To the author's knowledge, this is the
+only Base64 library that does this.

 Notable features:

-- Really fast encoding on x86 systems, using SSE instructions;
+- Really fast on x86 systems by using SSE vector instructions;
 - Reads/writes blocks of streaming data;
 - Does not dynamically allocate memory;
 - Valid C89 that compiles with pedantic options on;
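Note on the README wording: in the base64.c hunk of this commit the fast path is guarded by `#ifdef __SSSE3__`, so whether it is used is decided at compile time rather than by probing the processor at run time. `__SSSE3__` is predefined by GCC and Clang when SSSE3 code generation is enabled (for example with `-mssse3`). A minimal sketch of a check for that follows; the function name `ssse3_path_compiled` is hypothetical and not part of the library.

```c
/* Hypothetical helper, not part of the library: reports whether this
 * translation unit was compiled with SSSE3 code generation enabled,
 * which is the same condition the decoder's #ifdef tests. */
static int ssse3_path_compiled(void)
{
#ifdef __SSSE3__
    return 1;   /* the vectorized loop would be compiled in */
#else
    return 0;   /* only the bytewise decoder would be compiled */
#endif
}
```

Calling this from a small test program built with the same compiler flags as the library would show which of the two decoders ends up in the binary.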
83 changes: 82 additions & 1 deletion base64.c
@@ -241,7 +241,88 @@ base64_stream_decode (struct base64_state *state, const char *const src, size_t
 	{
 		for (;;)
 		{
-		case 0: if (srclen-- == 0) {
+		case 0:
+#ifdef __SSSE3__
+			/* If we have SSSE3 support, pick off 16 bytes at a time for as long
+			 * as we can, but make sure that we quit before seeing any == markers
+			 * at the end of the string. Also, because we write four zeroes at
+			 * the end of the output, ensure that there are at least 6 valid bytes
+			 * of input data remaining to close the gap. 16 + 2 + 6 = 24 bytes: */
+			while (srclen >= 24)
+			{
+				__m128i str, mask, res;
+				__m128i s1mask, s2mask, s3mask, s4mask, s5mask;
+
+				/* Load string: */
+				str = _mm_loadu_si128((__m128i *)c);
+
+				/* Classify characters into five sets:
+				 * Set 1: "ABCDEFGHIJKLMNOPQRSTUVWXYZ" */
+				s1mask = _mm_andnot_si128(
+					_mm_cmplt_epi8(str, _mm_set1_epi8('A')),
+					_mm_cmplt_epi8(str, _mm_set1_epi8('Z' + 1)));
+
+				/* Set 2: "abcdefghijklmnopqrstuvwxyz" */
+				s2mask = _mm_andnot_si128(
+					_mm_cmplt_epi8(str, _mm_set1_epi8('a')),
+					_mm_cmplt_epi8(str, _mm_set1_epi8('z' + 1)));
+
+				/* Set 3: "0123456789" */
+				s3mask = _mm_andnot_si128(
+					_mm_cmplt_epi8(str, _mm_set1_epi8('0')),
+					_mm_cmplt_epi8(str, _mm_set1_epi8('9' + 1)));
+
+				/* Set 4: "+" */
+				s4mask = _mm_cmpeq_epi8(str, _mm_set1_epi8('+'));
+
+				/* Set 5: "/" */
+				s5mask = _mm_cmpeq_epi8(str, _mm_set1_epi8('/'));
+
+				/* Check if all bytes have been classified; else fall back on bytewise code
+				 * to do error checking and reporting: */
+				if (_mm_movemask_epi8(s1mask | s2mask | s3mask | s4mask | s5mask) != 0xFFFF) {
+					break;
+				}
+				/* Subtract sets from byte values: */
+				res = s1mask & _mm_sub_epi8(str, _mm_set1_epi8('A'));
+				res |= s2mask & _mm_sub_epi8(str, _mm_set1_epi8('a' - 26));
+				res |= s3mask & _mm_sub_epi8(str, _mm_set1_epi8('0' - 52));
+				res |= s4mask & _mm_set1_epi8(62);
+				res |= s5mask & _mm_set1_epi8(63);
+
+				/* Shuffle bytes to 32-bit bigendian: */
+				res = _mm_shuffle_epi8(res,
+					_mm_setr_epi8(3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12));
+
+				/* Mask in a single byte per shift: */
+				mask = _mm_set1_epi32(0x3F000000);
+
+				/* Pack bytes together: */
+				str = _mm_slli_epi32(res & mask, 2);
+				mask = _mm_srli_epi32(mask, 8);
+
+				str |= _mm_slli_epi32(res & mask, 4);
+				mask = _mm_srli_epi32(mask, 8);
+
+				str |= _mm_slli_epi32(res & mask, 6);
+				mask = _mm_srli_epi32(mask, 8);
+
+				str |= _mm_slli_epi32(res & mask, 8);
+
+				/* Reshuffle and repack into 12-byte output format: */
+				str = _mm_shuffle_epi8(str,
+					_mm_setr_epi8(3, 2, 1, 7, 6, 5, 11, 10, 9, 15, 14, 13, -1, -1, -1, -1));
+
+				/* Store back: */
+				_mm_storeu_si128((__m128i *)o, str);
+
+				c += 16;
+				o += 12;
+				outl += 12;
+				srclen -= 16;
+			}
+#endif
+			if (srclen-- == 0) {
 				ret = 1;
 				break;
 			}
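
For readers who want to check the arithmetic of the added loop without SSSE3 hardware, here is a scalar sketch of what one iteration computes: classify 16 characters into the five sets, bail out if any is unclassified, then pack each group of four 6-bit values into three bytes with the same mask-and-shift sequence (shifts of 2, 4, 6 and 8 on a big-endian 32-bit word). This is an illustrative model, not code from the commit; the names `sextet_of` and `decode_block16` are hypothetical.

```c
#include <stddef.h>

/* Map one character to its 6-bit value, or -1 if it is in none of the
 * five sets (A-Z, a-z, 0-9, '+', '/') that the vector loop classifies. */
static int sextet_of(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z') return ch - 'A';
    if (ch >= 'a' && ch <= 'z') return ch - 'a' + 26;
    if (ch >= '0' && ch <= '9') return ch - '0' + 52;
    if (ch == '+') return 62;
    if (ch == '/') return 63;
    return -1;
}

/* Decode exactly 16 input characters into 12 output bytes.
 * Returns 1 on success. Returns 0 without writing anything if a character
 * is outside the alphabet, mirroring the point where the vector loop
 * breaks and leaves error handling to the bytewise code. */
static int decode_block16(const char *src, unsigned char *out)
{
    int vals[16];
    size_t i, lane;

    /* Classification pass: all 16 characters must be valid. */
    for (i = 0; i < 16; i++) {
        vals[i] = sextet_of((unsigned char)src[i]);
        if (vals[i] < 0)
            return 0;
    }

    /* Packing pass: one 32-bit word per group of four characters. */
    for (lane = 0; lane < 4; lane++) {
        unsigned long word = 0, packed;

        /* Big-endian layout: the first character lands in the top byte. */
        for (i = 0; i < 4; i++)
            word = (word << 8) | (unsigned long)vals[4 * lane + i];

        /* Close the two-bit gaps between the four 6-bit fields. */
        packed  = (word & 0x3F000000UL) << 2;
        packed |= (word & 0x003F0000UL) << 4;
        packed |= (word & 0x00003F00UL) << 6;
        packed |= (word & 0x0000003FUL) << 8;

        /* The 24 payload bits now sit in bits 8..31; emit them high to low. */
        out[3 * lane + 0] = (unsigned char)((packed >> 24) & 0xFF);
        out[3 * lane + 1] = (unsigned char)((packed >> 16) & 0xFF);
        out[3 * lane + 2] = (unsigned char)((packed >> 8) & 0xFF);
    }
    return 1;
}
```

As a worked case, the group "TWFu" classifies to 19, 22, 5, 46, which forms the word 0x1316052E; the four shifted fields OR together to 0x4D616E00, whose top three bytes are "Man". A block containing `=` is rejected, which is why the loop in the diff only runs while padding cannot be inside the 16 bytes being read.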