Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Utilize SIMD to find expected end of token #499

Closed
thedrow opened this issue Dec 28, 2015 · 22 comments
Closed

Utilize SIMD to find expected end of token #499

thedrow opened this issue Dec 28, 2015 · 22 comments

Comments

@thedrow
Copy link
Contributor

thedrow commented Dec 28, 2015

picohttpparser uses SIMD to scan a string in order to find a character in a string.
We can use it or something similar to be able to determine the end of the token.
Say we have the following JSON:

{
"a": 123,
"b": "fd"
}

Instead of scanning each character we can find the boundaries of the integer value (that is ':' and ',') which will allow us to use the existing buffer sliced from start to end in order to cast the value to integer.
There can be multiple characters to look ahead for like newline or " depending on the type of what is being parsed on the contents of the JSON file.
If we'd be able to optimize for the common cases I think it can be useful.

@miloyip
Copy link
Collaborator

miloyip commented Dec 29, 2015

I think it may be possible to find the ending double quote or escape character of JSON string type. However, JSON number seems quite difficult to do so. As the syntax is more complicated, and also RapidJSON currently need to determine whether it is 32/64-bit signed/unsigned integer or double during the single pass of parsing.

@thedrow
Copy link
Contributor Author

thedrow commented Dec 29, 2015

With any data type that is not a string you need to be able to find either ',' a newline or '}' or ']' unless I'm missing something.
If we were able to determine that the type of the object is not a string we can scan for those tokens.
Wouldn't parsing be much shorter that way since we know where the tokens should end?

@miloyip
Copy link
Collaborator

miloyip commented Dec 29, 2015

I think the types under consideration should be JSON string and number only. I think knowing where the token should end does not help much. For number, you may check https://github.com/miloyip/rapidjson/blob/master/include/rapidjson/reader.h#L824 and see how it validates the input, and at the same time detects the actual data type and decodes it. It does not (need to) check for , or : or } or ] while doing so.

@thedrow
Copy link
Contributor Author

thedrow commented Dec 29, 2015

If we know where the number ends couldn't we use what you wrote in https://github.com/miloyip/itoa-benchmark/blob/master/src/sse2.cpp

@miloyip
Copy link
Collaborator

miloyip commented Dec 29, 2015

itoa is for converting integer to ASCII.

@thedrow
Copy link
Contributor Author

thedrow commented Dec 29, 2015

Oh right. My bad.
That's the example I had in mind http://www.mersenneforum.org/showthread.php?t=11590

@miloyip
Copy link
Collaborator

miloyip commented Dec 29, 2015

It may be possible to use SIMD for scanning the same kind of characters, such as 0 to 9. But for the length of integers or decimal places, it seems could not help much.

@thedrow
Copy link
Contributor Author

thedrow commented Dec 29, 2015

So let's focus on strings for now. I'll open issues for other data types as I come up with algorithms that may or may not fit in order to parse them in parallel.
The following malformed json:

{
"k": "v, # no quotation mark
"k2": "v2"
}

Can cause erroneous parsing if we determine that the quotation mark that specifies the start of the "k2" key is the end of the string of the value "v,".
How do we work around that?

@miloyip
Copy link
Collaborator

miloyip commented Dec 29, 2015

The malformed JSON you quoted is invalid at k2, where a comma is expected after the previous JSON value is parsed.

To optimize string parsing (https://github.com/miloyip/rapidjson/blob/master/include/rapidjson/reader.h#L737), I think it can scan the first appearance of the characters \, " and codepoint < 0x20. At the same time copy other preceding characters into the output buffer. The comparison and copying operation should be benefited by SIMD.

@thedrow
Copy link
Contributor Author

thedrow commented Dec 29, 2015

I know we can find the start of the string but how do we find the end of the string?
The example above "terminates" the string and the SIMD operation will copy it already to the buffer.
Maybe I'm not seeing it correctly because I'm not that familiar with SIMD operations (just with the concept). I am trying to learn.

@miloyip
Copy link
Collaborator

miloyip commented Jan 22, 2016

I've committed the optimizations which uses SSE2 instruction to scan (and copy) unescaped characters during string parsing. Unfortunately I have mistakenly commit it to master directly, instead of making a pull request.

Anyway, here are the results.

Before:
[       OK ] RapidJson.ReaderParseInsitu_DummyHandler_SSE42 (559 ms)
[       OK ] RapidJson.ReaderParse_DummyHandler_SSE42 (812 ms)
[       OK ] RapidJson.ReaderParse_DummyHandler_Paragraphs_SSE42 (1394 ms)

After:
[       OK ] RapidJson.ReaderParseInsitu_DummyHandler_SSE42 (403 ms)
[       OK ] RapidJson.ReaderParse_DummyHandler_SSE42 (395 ms)
[       OK ] RapidJson.ReaderParseInsitu_DummyHandler_Paragraphs_SSE42 (68 ms)

The likely/unlikely and unsafe push are also committed. This should improve performance in #275 . @xpol can try verifying it with

@thedrow
Copy link
Contributor Author

thedrow commented Jan 22, 2016

Well done! Can you link to the code?

@miloyip
Copy link
Collaborator

miloyip commented Jan 22, 2016

@thedrow
Copy link
Contributor Author

thedrow commented Jan 30, 2016

Couldn't we refactor the code a bit to repeat less?

@thedrow
Copy link
Contributor Author

thedrow commented Jan 30, 2016

Regarding numbers, I have something in mind that might work. It's still a bit raw though.

  1. We create a list of ranges that specifies where numbers start and end.
  2. We search for a '.' character in all of them at once using the technic used by picohttpparser.
  3. We utilize AVX2's fused multiply-add with 3 operands in order to get the number from the string like so:
    result = number * position + result
    Where position is a multiple of 10 if we're calculating the number to the left of the decimal point and a multiple of 0.1 if we're calculating the number to the right of the decimal point.
    We then add the two numbers.

Unfortunately this requires the users to have CPUs that support AVX2 but this should speed up parsing.
What do you think?

@miloyip
Copy link
Collaborator

miloyip commented Jan 30, 2016

Due to page boundary issue, SIMD load needed to be aligned. I think for number it will be quite difficult to do so. You may try to do a simple atoi() kind of function to test out your idea first.
But ParseNumber() is really much more complicated than ParseString().
For the code duplication stuff, I think it is quite difficult, or the result may have more code and less readable. You may also suggest a way for so.

@thedrow
Copy link
Contributor Author

thedrow commented Jan 30, 2016

As for true/false/null we can employ a slightly different method from picohttpparser's findchar_first().
We only need to know the beginning of each token (if we skip whitespaces) but instead of marking if the buffer is found in only one of the ranges it should be in all of them.

  1. Scan the string and mark each place where 't', 'f' or 'n' is present.
  2. Scan the entire buffer with ranges that are fixed in size e.g. 't' + 3 for true and 'f' + 4 for false and compare them to the relevant buffer.
  3. If any of the buffers in the provided ranges are not equal to the token we're checking then we should error.
  4. If all are equal, just set the values.

In this case I'm not sure it's worth the trouble but I figured I should mention it anyway.

@miloyip
Copy link
Collaborator

miloyip commented Jan 31, 2016

As I said, due to page boundary issue, only aligned data can be accessed. The preceding unaligned bytes needed to process in non-SIMD way. In other words, only data >= 16 bytes are useful for SIMD. So true, false, null should not able to gain performance boost via SIMD.
It is possible to do SIMD calculations in atoi() things, to do so need to store the digits into an aligned buffer, and then do the calculation in SIMD. But I think the performance gain will be limited, as it involves horizontal sum, e.g. a * 1000 + b * 100 + c * 10 + d for four digits abcd.

@pah
Copy link
Contributor

pah commented Jan 31, 2016

In the SIMD functions SkipUnescapedString, ScanCopyUnescapedString, how do you enforce not to scan over the bounds of the underlying (string, insitu) buffer? I think, it would be helpful to document the mechanics in that regard, if possible.

@miloyip
Copy link
Collaborator

miloyip commented Feb 2, 2016

In an early version of RapidJSON, an issue reported that the SkipWhitespace_SIMD() causes crash very rarely (around 1 in 500,000). After investigation, it is suspected that _mm_loadu_si128() accessed bytes after '\0', and across a protected page boundary.

In Intel® 64 and IA-32 Architectures Optimization Reference Manual
, section 10.2.1:

To support algorithms requiring unaligned 128-bit SIMD memory accesses, memory buffer allocation by a caller function should consider adding some pad space so that a callee function can safely use the address pointer safely with unaligned 128-bit SIMD memory operations.
The minimal padding size should be the width of the SIMD register that might be used in conjunction with unaligned SIMD memory access.

This is not feasible as RapidJSON should not enforce such requirement.

To fix this issue, currently the routine process bytes up to the next aligned address. After tha, use aligned read to perform SIMD processing. Also see #85.

@miloyip
Copy link
Collaborator

miloyip commented Feb 2, 2016

I close this issue now. If @thedrow has no idea on optimization, please drop a new issue. You may also do experiments on your fork and we can discuss there as well.

@miloyip miloyip closed this as completed Feb 2, 2016
@thedrow
Copy link
Contributor Author

thedrow commented Feb 2, 2016

Thanks :) 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants