Utilize SIMD to find expected end of token #499

thedrow · 2015-12-28T14:01:18Z

picohttpparser uses SIMD to scan a string in order to find a character in a string.
We can use it or something similar to be able to determine the end of the token.
Say we have the following JSON:

{
"a": 123,
"b": "fd"
}

Instead of scanning each character we can find the boundaries of the integer value (that is ':' and ',') which will allow us to use the existing buffer sliced from start to end in order to cast the value to integer.
There can be multiple characters to look ahead for like newline or " depending on the type of what is being parsed on the contents of the JSON file.
If we'd be able to optimize for the common cases I think it can be useful.

miloyip · 2015-12-29T02:32:12Z

I think it may be possible to find the ending double quote or escape character of JSON string type. However, JSON number seems quite difficult to do so. As the syntax is more complicated, and also RapidJSON currently need to determine whether it is 32/64-bit signed/unsigned integer or double during the single pass of parsing.

thedrow · 2015-12-29T08:06:10Z

With any data type that is not a string you need to be able to find either ',' a newline or '}' or ']' unless I'm missing something.
If we were able to determine that the type of the object is not a string we can scan for those tokens.
Wouldn't parsing be much shorter that way since we know where the tokens should end?

miloyip · 2015-12-29T08:58:23Z

I think the types under consideration should be JSON string and number only. I think knowing where the token should end does not help much. For number, you may check https://github.com/miloyip/rapidjson/blob/master/include/rapidjson/reader.h#L824 and see how it validates the input, and at the same time detects the actual data type and decodes it. It does not (need to) check for , or : or } or ] while doing so.

thedrow · 2015-12-29T09:07:30Z

If we know where the number ends couldn't we use what you wrote in https://github.com/miloyip/itoa-benchmark/blob/master/src/sse2.cpp

miloyip · 2015-12-29T09:08:23Z

itoa is for converting integer to ASCII.

thedrow · 2015-12-29T09:10:04Z

Oh right. My bad.
That's the example I had in mind http://www.mersenneforum.org/showthread.php?t=11590

miloyip · 2015-12-29T09:10:48Z

It may be possible to use SIMD for scanning the same kind of characters, such as 0 to 9. But for the length of integers or decimal places, it seems could not help much.

thedrow · 2015-12-29T09:34:52Z

So let's focus on strings for now. I'll open issues for other data types as I come up with algorithms that may or may not fit in order to parse them in parallel.
The following malformed json:

{
"k": "v, # no quotation mark
"k2": "v2"
}

Can cause erroneous parsing if we determine that the quotation mark that specifies the start of the "k2" key is the end of the string of the value "v,".
How do we work around that?

miloyip · 2015-12-29T09:53:54Z

The malformed JSON you quoted is invalid at k2, where a comma is expected after the previous JSON value is parsed.

To optimize string parsing (https://github.com/miloyip/rapidjson/blob/master/include/rapidjson/reader.h#L737), I think it can scan the first appearance of the characters \, " and codepoint < 0x20. At the same time copy other preceding characters into the output buffer. The comparison and copying operation should be benefited by SIMD.

thedrow · 2015-12-29T11:59:57Z

I know we can find the start of the string but how do we find the end of the string?
The example above "terminates" the string and the SIMD operation will copy it already to the buffer.
Maybe I'm not seeing it correctly because I'm not that familiar with SIMD operations (just with the concept). I am trying to learn.

miloyip · 2016-01-22T12:24:06Z

I've committed the optimizations which uses SSE2 instruction to scan (and copy) unescaped characters during string parsing. Unfortunately I have mistakenly commit it to master directly, instead of making a pull request.

Anyway, here are the results.

Before:
[       OK ] RapidJson.ReaderParseInsitu_DummyHandler_SSE42 (559 ms)
[       OK ] RapidJson.ReaderParse_DummyHandler_SSE42 (812 ms)
[       OK ] RapidJson.ReaderParse_DummyHandler_Paragraphs_SSE42 (1394 ms)

After:
[       OK ] RapidJson.ReaderParseInsitu_DummyHandler_SSE42 (403 ms)
[       OK ] RapidJson.ReaderParse_DummyHandler_SSE42 (395 ms)
[       OK ] RapidJson.ReaderParseInsitu_DummyHandler_Paragraphs_SSE42 (68 ms)

The likely/unlikely and unsafe push are also committed. This should improve performance in #275 . @xpol can try verifying it with

thedrow · 2016-01-22T12:49:04Z

Well done! Can you link to the code?

miloyip · 2016-01-22T12:51:46Z

The scan/copy thing is at https://github.com/miloyip/rapidjson/blob/master/include/rapidjson/reader.h#L796

thedrow · 2016-01-30T09:38:21Z

Couldn't we refactor the code a bit to repeat less?

thedrow · 2016-01-30T10:20:14Z

Regarding numbers, I have something in mind that might work. It's still a bit raw though.

We create a list of ranges that specifies where numbers start and end.
We search for a '.' character in all of them at once using the technic used by picohttpparser.
We utilize AVX2's fused multiply-add with 3 operands in order to get the number from the string like so:
result = number * position + result
Where position is a multiple of 10 if we're calculating the number to the left of the decimal point and a multiple of 0.1 if we're calculating the number to the right of the decimal point.
We then add the two numbers.

Unfortunately this requires the users to have CPUs that support AVX2 but this should speed up parsing.
What do you think?

miloyip · 2016-01-30T14:53:01Z

Due to page boundary issue, SIMD load needed to be aligned. I think for number it will be quite difficult to do so. You may try to do a simple atoi() kind of function to test out your idea first.
But ParseNumber() is really much more complicated than ParseString().
For the code duplication stuff, I think it is quite difficult, or the result may have more code and less readable. You may also suggest a way for so.

thedrow · 2016-01-30T15:20:05Z

As for true/false/null we can employ a slightly different method from picohttpparser's findchar_first().
We only need to know the beginning of each token (if we skip whitespaces) but instead of marking if the buffer is found in only one of the ranges it should be in all of them.

Scan the string and mark each place where 't', 'f' or 'n' is present.
Scan the entire buffer with ranges that are fixed in size e.g. 't' + 3 for true and 'f' + 4 for false and compare them to the relevant buffer.
If any of the buffers in the provided ranges are not equal to the token we're checking then we should error.
If all are equal, just set the values.

In this case I'm not sure it's worth the trouble but I figured I should mention it anyway.

miloyip · 2016-01-31T03:51:17Z

As I said, due to page boundary issue, only aligned data can be accessed. The preceding unaligned bytes needed to process in non-SIMD way. In other words, only data >= 16 bytes are useful for SIMD. So true, false, null should not able to gain performance boost via SIMD.
It is possible to do SIMD calculations in atoi() things, to do so need to store the digits into an aligned buffer, and then do the calculation in SIMD. But I think the performance gain will be limited, as it involves horizontal sum, e.g. a * 1000 + b * 100 + c * 10 + d for four digits abcd.

pah · 2016-01-31T08:01:21Z

In the SIMD functions SkipUnescapedString, ScanCopyUnescapedString, how do you enforce not to scan over the bounds of the underlying (string, insitu) buffer? I think, it would be helpful to document the mechanics in that regard, if possible.

#499

miloyip · 2016-02-02T14:51:06Z

In an early version of RapidJSON, an issue reported that the SkipWhitespace_SIMD() causes crash very rarely (around 1 in 500,000). After investigation, it is suspected that _mm_loadu_si128() accessed bytes after '\0', and across a protected page boundary.

In Intel® 64 and IA-32 Architectures Optimization Reference Manual
, section 10.2.1:

To support algorithms requiring unaligned 128-bit SIMD memory accesses, memory buffer allocation by a caller function should consider adding some pad space so that a callee function can safely use the address pointer safely with unaligned 128-bit SIMD memory operations.
The minimal padding size should be the width of the SIMD register that might be used in conjunction with unaligned SIMD memory access.

This is not feasible as RapidJSON should not enforce such requirement.

To fix this issue, currently the routine process bytes up to the next aligned address. After tha, use aligned read to perform SIMD processing. Also see #85.

miloyip · 2016-02-02T14:52:20Z

I close this issue now. If @thedrow has no idea on optimization, please drop a new issue. You may also do experiments on your fork and we can discuss there as well.

thedrow · 2016-02-02T16:31:24Z

Thanks :) 👍

miloyip added the performance label Jan 22, 2016

miloyip added a commit that referenced this issue Feb 2, 2016

Add notes about SIMD optimization issue. [skip ci]

29016f3

#499

miloyip closed this as completed Feb 2, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Utilize SIMD to find expected end of token #499

Utilize SIMD to find expected end of token #499

thedrow commented Dec 28, 2015

miloyip commented Dec 29, 2015

thedrow commented Dec 29, 2015

miloyip commented Dec 29, 2015

thedrow commented Dec 29, 2015

miloyip commented Dec 29, 2015

thedrow commented Dec 29, 2015

miloyip commented Dec 29, 2015

thedrow commented Dec 29, 2015

miloyip commented Dec 29, 2015

thedrow commented Dec 29, 2015

miloyip commented Jan 22, 2016

thedrow commented Jan 22, 2016

miloyip commented Jan 22, 2016

thedrow commented Jan 30, 2016

thedrow commented Jan 30, 2016

miloyip commented Jan 30, 2016

thedrow commented Jan 30, 2016

miloyip commented Jan 31, 2016

pah commented Jan 31, 2016

miloyip commented Feb 2, 2016

miloyip commented Feb 2, 2016

thedrow commented Feb 2, 2016

Utilize SIMD to find expected end of token #499

Utilize SIMD to find expected end of token #499

Comments

thedrow commented Dec 28, 2015

miloyip commented Dec 29, 2015

thedrow commented Dec 29, 2015

miloyip commented Dec 29, 2015

thedrow commented Dec 29, 2015

miloyip commented Dec 29, 2015

thedrow commented Dec 29, 2015

miloyip commented Dec 29, 2015

thedrow commented Dec 29, 2015

miloyip commented Dec 29, 2015

thedrow commented Dec 29, 2015

miloyip commented Jan 22, 2016

thedrow commented Jan 22, 2016

miloyip commented Jan 22, 2016

thedrow commented Jan 30, 2016

thedrow commented Jan 30, 2016

miloyip commented Jan 30, 2016

thedrow commented Jan 30, 2016

miloyip commented Jan 31, 2016

pah commented Jan 31, 2016

miloyip commented Feb 2, 2016

miloyip commented Feb 2, 2016

thedrow commented Feb 2, 2016