New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-4708: [C++] refactoring JSON parser to prepare for multithreaded impl #4148
Conversation
- don't use in-situ parsing - don't require null termination of parsed buffers - resize parser's scalar storage when it might overflow - rewrite chunker to use Buffer instead of string_view to represent memory ranges - iwyu + lint cleanup - add test for parsing JSON with a partial schema - add test for inferring timestamp type in an unexpected field of strings
@pitrou I have removed the converter/reader changes, this PR only affects the chunker, parser, and utilities. |
Codecov Report
@@ Coverage Diff @@
## master #4148 +/- ##
==========================================
+ Coverage 87.85% 89.16% +1.31%
==========================================
Files 748 617 -131
Lines 91842 81857 -9985
Branches 1251 0 -1251
==========================================
- Hits 80687 72988 -7699
+ Misses 11036 8869 -2167
+ Partials 119 0 -119
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this. This looks good on the principle. A number of small comments below.
@pitrou how's this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just two nits.
cpp/src/arrow/json/rapidjson-defs.h
Outdated
} | ||
|
||
// enable SIMD whitespace skipping, if available | ||
#if defined(__SSE4_2__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should use ARROW_HAVE_SSE4_2
from sse-util.h
instead.
@pitrou done |
Thank you @bkietz ! |
to represent memory ranges
field of strings